Exonisation rates at intronic TE insertion sites

(A) Schematic of the approach used to quantify exonisation rates. For each TE-gene breakpoint at an endogenous exon-intron junction, TEChim computes the ratio of reads supporting TE-derived splice junctions relative to canonical exon-exon junctions. (B) Ranked distribution of TE exonisation efficiencies across intronic and exonic TE insertion sites in the Drosophila brain. For each of the 264 TE-gene pairs, the breakpoint with the highest exonisation ratio is shown. Points represent mean exonisation ratios across six biological replicates; error bars show SEM. The dashed line indicates the mean exonisation rate across all loci. Junction-level exonisation ratios, replicate coverage and SEM values are provided in Supplemental Table S1.
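The per-breakpoint ratio in (A) and the per-pair summary in (B) can be illustrated with a short sketch. This is not TEChim's implementation. It assumes a hypothetical input layout in which each breakpoint of a TE-gene pair carries per-replicate counts of (TE-derived junction reads, canonical exon-exon junction reads). The function names `exonisation_ratio`, `mean_and_sem` and `summarise_pair` are illustrative only. Replicates without canonical coverage are treated as undefined and dropped, which is an assumption rather than a documented rule.

```python
import math
import statistics
from typing import Dict, List, Optional, Tuple


def exonisation_ratio(te_junction_reads: int, canonical_reads: int) -> Optional[float]:
    """Reads supporting a TE-derived splice junction relative to reads
    supporting the canonical exon-exon junction. Returns None when there is
    no canonical coverage, because the ratio is then undefined."""
    if canonical_reads == 0:
        return None
    return te_junction_reads / canonical_reads


def mean_and_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return mean, sem


def summarise_pair(breakpoints: Dict[str, List[Tuple[int, int]]]) -> Tuple[str, float, float]:
    """For one TE-gene pair, return (breakpoint ID, mean ratio, SEM) for the
    breakpoint with the highest mean exonisation ratio across replicates.

    `breakpoints` maps a hypothetical breakpoint ID to a list of
    per-replicate (te_junction_reads, canonical_reads) tuples."""
    best = None
    for bp_id, replicates in breakpoints.items():
        ratios = [r for r in (exonisation_ratio(te, can) for te, can in replicates) if r is not None]
        if not ratios:
            continue  # no replicate has canonical junction coverage
        mean, sem = mean_and_sem(ratios)
        if best is None or mean > best[1]:
            best = (bp_id, mean, sem)
    if best is None:
        raise ValueError("no breakpoint has canonical junction coverage")
    return best
```

With toy counts such as `{"bp1": [(2, 20), (4, 20), (3, 20)], "bp2": [(10, 20), (12, 20), (8, 20)]}`, `summarise_pair` selects `bp2` (mean ratio 0.5). Keeping only the maximum breakpoint per pair mirrors panel B; the full junction-level values would correspond to Supplemental Table S1.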

Antisense insertion of roo in mtd introduces cryptic splice donor- and acceptor sites

(A) Schematic of the genomic locus of the mustard gene, including a roo insertion present in abCherry flies. Bar graphs show the number of breakpoint-spanning reads identified by TEChim (grey) in gDNA-sequencing data from abCherry flies. (B, C) Schematic of mustard transcripts -RX, -RF and -RI, with coding sequences (orange) and UTRs (dark grey). The precise splice acceptor- and donor sites on roo are also shown, and sections of roo that are spliced out are shaded in light magenta. The bar graph in B shows the number of breakpoint-spanning reads found in mRNA from abCherry midbrains by TEChim (grey) and TIDAL (red). Bar graphs in C show the number of reads that support splicing events (indicated by light grey lines) between up- and downstream exons and two conserved loci within roo (grey bars). Each bar is labelled with the precise location of the breakpoint on roo and on the reference genome. TIDAL did not find any of these reads. In addition, red arrows indicate the precise locations of the RT-PCR primers used by Azad et al. Note that one primer of each primer pair maps onto a section of the TE that is actually spliced out. Figure is drawn to scale and gDNA and mRNA are aligned to each other.
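The primer-placement point can be expressed as a simple interval check: a primer whose binding site overlaps a segment removed by splicing (an intron or a spliced-out part of the TE) has no intact template in the mature mRNA, so that primer pair cannot amplify the spliced transcript. The sketch below assumes half-open genomic coordinates; the function names and all coordinates are hypothetical and are not taken from the figure. Treating any overlap as disqualifying is a conservative assumption.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on the genomic (gDNA) axis.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def primer_site_in_mrna(primer: Interval, spliced_out: List[Interval]) -> bool:
    """True only if no base of the primer binding site lies in a segment
    removed by splicing, i.e. the whole site is present in the mature mRNA."""
    return not any(overlaps(primer, seg) for seg in spliced_out)


def pair_can_amplify_spliced_mrna(fwd: Interval, rev: Interval, spliced_out: List[Interval]) -> bool:
    """A primer pair can only amplify spliced cDNA if both sites survive splicing."""
    return primer_site_in_mrna(fwd, spliced_out) and primer_site_in_mrna(rev, spliced_out)
```

For example, with a hypothetical spliced-out TE segment at `(1000, 1500)`, a pair with sites at `(800, 820)` and `(1200, 1220)` returns `False`, because the second site lies entirely within the removed sequence.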

Accurate primer design generates independent breakpoint-spanning fragments that confirm splicing between Bx and opus

(A) Schematic of the genomic locus of Beadex splice isoform -RA, drawn to scale. Primer pairs used by Azad et al. are shown in red at the positions where they map. The primers used here are drawn in black (forward primer: CCTGATCTCACGGTCTCTGT; reverse primer: CCTTAATGCCTGTCACCACG). The predicted band size for the spliced transcript is 647 nt. (B) Electrophoresis gel image of the PCR product. The ladder is the Quick-Load 100 bp ladder. The image was edited to remove surface artefacts (e.g. scratches) using global, automated image-processing software with no manual intervention. No bands were added or removed (see the original, unaltered image in Figure 2 – figure supplement 1). The dashed box indicates the fragment that was excised and Sanger sequenced. (C) Sanger sequencing result of the band from B. The sequence matches the predicted breakpoint between Beadex exon 4 and position 902 within opus.
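A predicted band size such as the 647 nt above follows from locating the forward primer and the reverse complement of the reverse primer on the spliced (cDNA) sequence. The sketch below uses the two primer sequences given in the caption, but the helper names are hypothetical and the template is a toy placeholder, not the real Beadex–opus spliced sequence; reproducing 647 nt would require that actual sequence.

```python
# Primer sequences as given in panel A.
FWD = "CCTGATCTCACGGTCTCTGT"
REV = "CCTTAATGCCTGTCACCACG"

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon_length(template: str, fwd: str, rev: str) -> int:
    """In-silico PCR on a sense-strand template: length from the 5' end of
    the forward primer to the 5' end of the reverse primer (inclusive).
    Returns -1 if either site is missing or the sites are in the wrong order."""
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return -1
    rev_site = reverse_complement(rev).upper()
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return -1
    return end + len(rev_site) - start
```

On a toy template built as `"AAAA" + FWD + "N" * 100 + reverse_complement(REV) + "TTTT"`, the function returns 140 (20 + 100 + 20). Exact string matching ignores mismatched primer binding, which real PCR can tolerate.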

Schematic of the method used to assess the rate of non-autonomous versus autonomous TE expression

(A, B) The average read depth per nucleotide within a TE (X) is calculated and compared with the number of breakpoint-spanning reads for the same element (Y). (C) Primarily non-autonomously expressed TEs fall in the top-left half of the correlation plot, while autonomously expressed TEs fall in the bottom-right half. The data for this analysis were published in Treiber and Waddell, 2020.
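The X-versus-Y comparison can be sketched as a log ratio of breakpoint-spanning reads to mean per-nucleotide depth. This is an illustration under stated assumptions, not the published analysis: the function names are hypothetical, the +1 pseudocount is an assumption, and splitting the plot at the Y = X diagonal is an assumed cut-off, since the caption only describes the top-left and bottom-right halves.

```python
import math
from typing import List


def mean_depth_per_nt(per_base_depth: List[int]) -> float:
    """X: average read depth per nucleotide along the TE sequence."""
    if not per_base_depth:
        raise ValueError("empty depth profile")
    return sum(per_base_depth) / len(per_base_depth)


def autonomy_score(per_base_depth: List[int], breakpoint_reads: int) -> float:
    """log2((Y + 1) / (X + 1)), with a +1 pseudocount to avoid log(0).
    Positive: above the assumed Y = X diagonal (top-left, breakpoint-spanning
    reads dominate, suggesting mainly non-autonomous expression).
    Negative: below it (bottom-right, suggesting autonomous expression)."""
    x = mean_depth_per_nt(per_base_depth)
    return math.log2((breakpoint_reads + 1) / (x + 1))


def classify(per_base_depth: List[int], breakpoint_reads: int) -> str:
    """Label a TE by which side of the assumed diagonal it falls on."""
    return "non-autonomous" if autonomy_score(per_base_depth, breakpoint_reads) > 0 else "autonomous"
```

A continuous score rather than a hard label keeps elements near the diagonal visible as borderline cases instead of forcing them into one group.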

tabor insertion in CG17698

Upper panel shows a schematic of the genomic locus of the CG17698 gene, including a tabor insertion present in abCherry flies. Bar graphs at the top show the number of breakpoint-spanning reads identified in gDNA-sequencing data from abCherry flies, using TEChim. Lower panel shows a schematic of the CG17698-RD transcript, with coding sequences (orange) and UTRs (dark grey). The splice acceptor site on tabor is also shown, and the section of tabor that is spliced out is shaded in light magenta. Bar graphs show the number of breakpoint-spanning reads found in mRNA from abCherry midbrains using TEChim (grey) and TIDAL (red). Bar graphs at the bottom show the number of reads that support splicing events (indicated by light grey lines) between two downstream exons and a conserved locus within tabor (grey bars). Each bar is labelled with the precise location of the breakpoint on tabor and on the reference genome. TIDAL did not find any of these reads. In addition, red arrows show the precise locations of the RT-PCR primers used by Azad et al. Note that one primer maps onto a section of the TE that is actually spliced out. Figure is drawn to scale and gDNA and mRNA are aligned to each other.

I-element insertion in Pde1C

Upper panel shows a schematic of the genomic locus of the Pde1C gene, including an I-element insertion present in abCherry flies. Bar graphs at the top show the number of breakpoint-spanning reads identified in gDNA-sequencing data from abCherry flies, using TEChim. Lower panel shows a schematic of the Pde1C-RB and -RD transcripts, with coding sequences (orange) and UTRs (dark grey). The splice acceptor site on the I-element is also shown, and the section of the I-element that is spliced out is shaded in light magenta. Bar graphs show the number of breakpoint-spanning reads found in mRNA from abCherry midbrains using TEChim (grey) and TIDAL (red). The bar graph at the bottom shows the number of reads that support splicing events (indicated by light grey line) between the upstream exon and a conserved locus within the I-element (grey bar). The bar is labelled with the precise location of the breakpoint on the I-element and on the reference genome. TIDAL did not find any of these reads. In addition, red arrows show the precise locations of the RT-PCR primers used by Azad et al. Note that one primer maps onto a section of the TE that is actually spliced out, and the other onto an exon of a different splice isoform from the one detected in abCherry flies. Figure is drawn to scale and gDNA and mRNA are aligned to each other.

blood insertion in Dscam2

Upper panel shows a schematic of the genomic locus of the Dscam2 gene, including a blood insertion present in abCherry flies. Bar graphs at the top show the number of breakpoint-spanning reads identified in gDNA-sequencing data from abCherry flies, using TEChim. Lower panel shows a schematic of the Dscam2-RE transcript, with coding sequences (orange) and UTRs (dark grey). The splice donor site on blood is also shown, and the section of blood that is spliced out is shaded in light magenta. Bar graphs show the number of breakpoint-spanning reads found in mRNA from abCherry midbrains using TEChim (grey) and TIDAL (red). The bar graph at the bottom shows the number of reads that support splicing events (indicated by light grey line) between the downstream exons and a conserved locus within blood (grey bar). The bar is labelled with the precise location of the breakpoint on blood and on the reference genome. TIDAL did not find any of these reads. In addition, red arrows show the precise locations of the RT-PCR primers used by Azad et al. Note that one primer maps onto a section of the TE that is actually spliced out. Figure is drawn to scale and gDNA and mRNA are aligned to each other.

opus insertion in mub

Upper panel shows a schematic of the genomic locus of the mub gene, including an opus insertion present in abCherry flies. Bar graphs at the top show the number of breakpoint-spanning reads identified in gDNA-sequencing data from abCherry flies, using TEChim. Lower panel shows a schematic of the mub-RF transcript, with UTRs (dark grey). The splice donor site on opus is also shown. The first 24 nucleotides of opus are spliced out, but this cannot be visualised at the current scale. Bar graphs show the number of breakpoint-spanning reads found in mRNA from abCherry midbrains using TEChim (grey) and TIDAL (red). The bar graph at the bottom shows the number of reads that support splicing events (indicated by light grey line) between the downstream exons and a conserved locus within opus (grey bar). The bar is labelled with the precise location of the breakpoint on opus and on the reference genome. TIDAL did not find any of these reads. In addition, red arrows show the precise locations of the PCR primers used by Azad et al. to genotype their w1118 flies. Note that these primers flank a region ∼31 kb upstream of the opus insertion in abCherry flies. Figure is drawn to scale and gDNA and mRNA are aligned to each other.

mdg3 insertion in SelR

Upper panel shows a schematic of the genomic locus of the SelR gene, including an mdg3 insertion present in abCherry flies. Bar graphs at the top show the number of breakpoint-spanning reads identified in gDNA-sequencing data from abCherry flies, using TEChim. Lower panel shows a schematic of the SelR-RE and -RJ transcripts, with coding sequences (orange) and UTRs (dark grey). The splice acceptor site on mdg3 is also shown, and the section of mdg3 that is spliced out is shaded in light magenta. Bar graphs show the number of breakpoint-spanning reads found in mRNA from abCherry midbrains using TEChim (grey) and TIDAL (red). The bar graph at the bottom shows the number of reads that support splicing events (indicated by light grey line) between the upstream exon and a conserved locus within mdg3 (grey bar). The bar is labelled with the precise location of the breakpoint on mdg3 and on the reference genome. TIDAL did not find any of these reads. In addition, red arrows show the precise locations of the RT-PCR primers used by Azad et al. Note that one primer maps onto a section of a different splice isoform from the one detected in abCherry flies. Figure is drawn to scale and gDNA and mRNA are aligned to each other.

Original electrophoresis gel image