Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification

  1. Matthew T Parker
  2. Katarzyna Knop
  3. Anna V Sherwood
  4. Nicholas J Schurch
  5. Katarzyna Mackinnon
  6. Peter D Gould
  7. Anthony JW Hall
  8. Geoffrey J Barton  Is a corresponding author
  9. Gordon G Simpson  Is a corresponding author
  1. University of Dundee, United Kingdom
  2. University of Liverpool, United Kingdom
  3. Norwich Research Park, United Kingdom
  4. James Hutton Institute, United Kingdom
15 figures, 3 tables and 10 additional files

Figures

Figure 1 with 1 supplement
Diverse Arabidopsis RNAs are detected by nanopore DRS.

(A) Nanopore DRS 12.7 kb read alignment at AT1G48090, comprising 63 exons. Blue, nanopore DRS read alignments; black, Araport11 annotation. (B) Nanopore DRS read alignments at the snoRNA gene U3B (AT5G53902). Blue, sample of nanopore DRS read alignments; black, Araport11 annotation. (C) PIN4 (AT2G01420) antisense RNAs detected using nanopore DRS. Blue, Col-0 PIN4 sense Illumina RNAseq coverage and sample of nanopore PIN4 sense read alignments; orange, Col-0 PIN4 antisense Illumina RNAseq coverage and nanopore PIN4 antisense read alignments; green, hen2-2 mutant PIN4 sense Illumina RNAseq coverage; purple, hen2-2 mutant PIN4 antisense Illumina RNAseq coverage; black, PIN4 sense RNA isoforms found in Araport11; grey, PIN4 antisense differentially expressed regions detected in Illumina RNAseq with DERfinder.

Figure 1—source data 1

hen2-2 vs Col-0 differentially expressed regions from Ilumina RNAseq –Figure 1C.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig1-data1-v1.tds
Figure 1—figure supplement 1
Properties of nanopore DRS sequencing data.

(A) Nanopore DRS identified a 12.8 kb transcript generated from the AT1G67120 gene that includes 58 exons. Blue, nanopore DRS isoform; black, Araport11 annotation. (B) Synthetic ERCC RNA spike-in mixes are detected in a quantitative manner. Absolute concentrations of spike-ins are plotted against counts per million (CPM) reads in log10 scale. Blue, ERCC RNA spike-in mix 1; orange, ERCC RNA spike-in mix 2. (C) Overview of the sequencing and alignment characteristics of nanopore DRS data for ERCC RNA spike-ins. Left, distribution of the length fraction of each sequenced read that aligns to the ERCC RNA spike-in reference; centre, distribution of fraction of identity that matches between the sequence of the read and the ERCC RNA spike-in reference for the aligned portion of each read; right, distributions of the occurrence of insertions (black), substitutions (orange) and deletions (blue) as a proportion of the number of aligned bases in each read. (D) Substitution preference for each nucleotide (left to right: adenine [A], uracil [U], guanine [G], cytosine [C]). When substituted, G is replaced with A in more than 63% of its substitutions, while C is replaced by U in 73%. Conversely, U is rarely replaced with G (12%) and A is rarely substituted with C (16%). (E) Nucleotide representation within the ERCC RNA Spike-In reference sequences (black dots) compared with nucleotide representation within four categories from the nanopore DRS reads. Identity matches between the sequence of the read and the ERCC RNA spike-in reference (green crosses), insertions (blue pentagons), deletions (yellow stars) and substitutions (purple diamonds).G is under-represented and U is over-represented for all three categories of error (insertion, deletion and substitution) relative to the reference nucleotide distribution. C is over-represented in deletions and substitutions. A is over-represented in insertions and deletions and under-represented in substitutions. (F) Signals originating from the RH3 transcripts are susceptible to systematic over-splitting around exons 7–9 (highlighted using a purple dashed box), resulting in reads with apparently novel 5′ or 3′ positions. This appears only to occur at high frequency in datasets collected after May 2018 (Supplementary file 1) and may result from an update to the MinKNOW software. (G) PIN7 antisense RNAs detected using nanopore DRS. Blue, Col-0 PIN7 sense Illumina RNAseq coverage and nanopore PIN7 sense read alignments; orange, Col-0 PIN7 antisense Illumina RNAseq coverage and nanopore PIN7 antisense read alignments; green, hen2–two mutant PIN7 sense Illumina RNAseq coverage; purple, hen2–two mutant PIN7 antisense Illumina RNAseq coverage; black, PIN7 sense RNA isoforms found in Araport11 annotation; grey, PIN7 antisense differentially expressed regions detected with DERfinder.

Figure 2 with 1 supplement
Nanopore DRS reveals poly(A) tail length and maps 3′ cleavage and polyadenylation sites.

(A) Normalized histogram showing poly(A) tail length of RNAs encoded by different genomes in Arabidopsis. Blue, nuclear (n = 2,348,869 reads); orange, mitochondrial (n = 2,490 reads); green, chloroplast (n = 1,848 reads). (B) Distance between the RNA 3′ end positions in nanopore DRS read alignments and the nearest polyadenylation sites identified by Helicos data. (C) Nanopore DRS identified 3′ polyadenylation sites in RNAs transcribed from FPA (AT2G43410). Blue track, coverage of nanopore DRS read alignments; blue, collapsed read alignments representing putative transcript annotations detected by nanopore DRS; black, Araport11 annotation; pPAS, proximal polyadenylation site; dPAS, distal polyadenylation sites.

Figure 2—source data 1

Poly(A) tail length estimations generated from Col-0 reads – Figure 2A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig2-data1-v1.zip
Figure 2—figure supplement 1
3′ end processing is revealed by nanopore DRS.

(A) Poly(A) tail length estimates for ERCC spike-in controls. Boxplots showing distribution of poly(A) length estimates for ERCC spike-in controls with more than 100 mapped reads for which tail length could be successfully estimated. Expected poly(A) tail lengths are shown as orange points. (B) The RNA poly(A) tail length negatively correlates with the gene expression level. Expression in log2 scale of counts per million (CPM) obtained from nanopore DRS data is plotted against the median poly(A) tail length. ρ, Spearman’s correlation coefficient; black line, locally weighted scatterplot smoothing (LOWESS) regression fit. (C) Nanopore DRS identified known 3′ polyadenylation sites in RNAs transcribed from the IBM1 (AT3G07610) locus. Blue track, nanopore DRS coverage; blue, isoforms detected by nanopore DRS; black, Araport11 annotation; PAS, proximal polyadenylation site. (D) Nanopore DRS identified novel 3’ polyadenylation sites in RNAs transcribed from the PTM (AT5G35210) locus. Black (on the top), Helicos DRS 3’ coverage; blue track, nanopore DRS coverage; blue, isoforms detected by nanopore DRS; black (on the bottom), Araport11 annotation; green, five annotated transmembrane domain regions from Uniprot entry F4JYC8 (PTM_ARATH) that mapped to exons; pPAS, proximal polyadenylation site; dPAS, distal polyadenylation sites.

Figure 2—figure supplement 1—source data 1

Poly(A) tail length estimations generated from ERCC spike-in reads – Figure 2—figure supplement 1A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig2-figsupp1-data1-v1.zip
Figure 2—figure supplement 1—source data 2

Per gene poly(A) tail length estimate distributions generated from Col-0 reads - Figure 2—figure supplement 1B.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig2-figsupp1-data2-v1.tds
Figure 3 with 1 supplement
Cap-dependent ligation of an adapter enables detection of authentic RNA 5′ ends.

(A) 5′ adapter RNA ligation reduces 3′ bias in nanopore DRS data at RCA (AT2G39730) from 0.92 to 0.03. Blue line, exonic read coverage at RCA for reads without (left) and with adapter (right); orange line, median coverage; orange shaded area, interquartile range (IQR). Change in 3′ bias can be measured using the IQR/median = quartile coefficient of variation (QCoV). (B) 5′ adapter RNA ligation reduces 3′ bias in nanopore DRS data. Histogram showing the QCoV in per base coverage for each gene in the Araport11 annotation, for reads with a 5′ adapter RNA (orange), and reads without a 5′ adapter RNA (blue). (C) Cap-dependent adapter ligation allows identification of authentic 5′ ends using nanopore DRS. The cumulative distribution function shows the distance to the nearest Transcription Start Site (TSS) identified from full-length transcripts cloned as part of the RIKEN RAFL project (left), or 5’ tag identified from nanoPARE data (right), for reads with a 5′ adapter RNA (orange) compared with reads without a 5′ adapter RNA (blue). (D) Cap-dependent adapter ligation enabled resolution of an additional 11 nt of sequence at the RNA 5′ end. Histogram showing the nucleotide shift in the largest peak of 5′ coverage for each gene in data obtained using protocols with vs without a 5′ adapter RNA ligation. (E) For RCA (AT2G39730), the 5′ end identified using cap-dependent 5′ adapter RNA ligation protocol was consistent with Illumina RNAseq and full-length cDNA start site data but differed from the 5′ ends in the Araport11 and AtRTD2 annotations. Grey, Illumina RNAseq coverage; blue, nanopore DRS 5′ end coverage generated without a cap-dependent ligation protocol; green/pink, nanopore DRS 5′ end coverage for read alignments generated using the cap-dependent ligation protocol with (green) and without (pink) 5′ adapter RNA; orange, 5’ coverage of nanoPARE data; blue, TSSs identified from full-length transcripts cloned as part of the RIKEN RAFL project; black, Araport11 and AtRTD2 annotations (with duplicated 5’ positions removed).

Figure 3—source data 1

Transcriptional start site tags from RIKEN cDNA clones – Figure 3C.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig3-data1-v1.tds
Figure 3—source data 2

Transcriptional start site tags from nanoPARE sequencing – Figure 3C.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig3-data2-v1.tds
Figure 3—figure supplement 1
Nanopore DRS with cap-dependent ligation of 5′ adapter RNA.

(A) Histogram showing the distribution of 5′ adapter RNA length in the nanopore raw current signal, as inferred from alignment of the mRNA sequence to the signal using nanopolish eventalign. The median signal length was 1441 points and 96% of adapter signals were 3000 points or less. (B) Out-of-bag receiver operator characteristic curve showing the performance of the trained convolutional neural network at detecting 5′ adapter RNA using 3000 points of signal. The curve was generated using five-fold cross validation. (C) Out-of-bag precision recall curve showing the performance of trained neural network, generated using five-fold cross validation. (D) Alternative transcription start sites were identified using nanopore DRS with cap-dependent ligation of a 5′ end adapter at the AT1G17050 and AT5G18650 genes. Orange, 5’ coverage for capped nanoPARE reads; blue track, nanopore DRS coverage with cap-dependent ligation of 5′ adapter RNA; blue, isoforms detected by nanopore DRS with cap-dependent ligation of 5′ adapter RNA; black, Araport11 annotation. (E) Reads mapping to ERCC RNA spike-ins lack approximately 11 nt of sequence at the 5′ end. Histogram showing the distance to the 5′ end for ERCC RNA spike-in reads (each spike-in is shown in a different colour; only those with > 1000 supporting reads are shown). (F) Reads mapping to in vitro transcribed mGFP lack approximately 11 nt of sequence at the 5′ end. Histogram showing the distance to the 5′ end for in vitro transcribed mGFP. (G) Araport11 annotation overestimates the length of 5′ UTRs. The cumulative distribution function shows the distance to the nearest TSS identified from full-length transcripts cloned as part of the RIKEN RAFL project (blue) and Araport11 annotation (orange). (H) Nanopore DRS detects miR170/miR171 cleavage products of Hairy Meristem 1 (HAM1, AT2G45160) transcripts. Orange, 5’ coverage from capped nanoPARE reads; purple, 5’ coverage from uncapped nanoPARE reads; blue, nanopore DRS 5’ coverage; grey, miRNA target site alignment is shown; black, Araport11 annotation.

Figure 3—figure supplement 1—source data 1

microRNA cleavage site predictions supported by enrichment of nanopore 5’ ends – Figure 3—figure supplement 1H.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig3-figsupp1-data1-v1.xlsx
Figure 4 with 1 supplement
Nanopore DRS reveals the complexity of alternative splicing.

(A) Nanopore DRS identified the mutually exclusive alternative splicing of the second and third exons of FLOWERING LOCUS M (FLM, AT1G77080) and several novel isoforms. Black arrows indicate mutually exclusive exons. Orange, isoforms present in the AtRTD2 annotation but not identified using nanopore DRS; blue, isoforms common to both AtRTD2 and nanopore DRS; green, novel isoforms identified in nanopore DRS. (B) Comparison of splicing events identified in error-corrected and non-error-corrected nanopore DRS, Illumina RNA sequencing, and Araport11 and AtRTD2 annotations. Bar size represents the number of unique splicing events common to the set intersection highlighted using circles (see Supplementary file 5 for the exact values). GU/AG splicing events are shown on the top and non-GU/AG on the bottom of the plot: yellow, splicing events common to all five datasets; green, events common to both error-corrected and non-error-corrected nanopore DRS with support in orthogonal datasets; blue, events common to both nanopore DRS datasets without orthogonal support; orange, events found in uncorrected nanopore DRS (but not error corrected) with orthogonal support; pink, events found in error-corrected nanopore DRS (but not uncorrected) with orthogonal support. (C) Comparison of RNA isoforms (defined as sets of co-spliced introns) identified in error-corrected full-length nanopore DRS, Araport11 and AtRTD2 annotations. Bar size represents the number of splicing events common to a group highlighted using circles below (see Supplementary file 5 for the exact values): yellow, unique splicing patterns found in nanopore DRS and both reference annotations; blue, novel isoforms found only in nanopore DRS.

Figure 4—figure supplement 1
Patterns of splicing revealed using nanopore DRS.

(A) Nanopore DRS novel splicing events can be validated using RT-PCR. Blue, RNA isoforms found using nanopore DRS; orange, Sanger sequencing products obtained using RT-PCR; black, RNA Araport11 annotation. (B) Splice junction classification of unannotated GU/AG splice sites found in error-corrected nanopore DRS data that also have Illumina support. Counts are plotted in log10 scale and the exact number of splice junctions in each set is indicated. (C) A novel exon skipping event at exon 3 of KH domain-containing protein AT5G56140, identified by nanopore DRS, leads to a frameshift resulting in a premature termination codon in exon 4 (orange asterisk). Blue track, nanopore DRS cleavage, blue, nanopore DRS read alignments; black, AtRTD2 annotation; orange, full novel open reading frame (ORF). (D) A selection of novel alternative acceptor site at XRCC (AT1G80420), identified in nanopore DRS, leads to a frameshift resulting in a premature termination codon in exon 6 (orange asterisk). Blue track, nanopore DRS cleavage, blue, nanopore DRS read alignments; black, AtRTD2 annotation; orange, full novel open reading frame (ORF).

Figure 4—figure supplement 1—source data 1

Sanger sequencing products from novel nanopore DRS splicing events – Figure 4—figure supplement 1A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig4-figsupp1-data1-v1.txt
Figure 5 with 1 supplement
Differential error rate analysis identifies sites of VIR-dependent m6A modifications transcriptome-wide.

(A) Loss of VIR function is associated with reduced error rate in nanopore DRS. Histogram showing the log2 fold change in the ratio of errors to reference matches at bases with a significant change in error profile in vir-1 mutant compared with the VIR-complemented line. Orange and green shaded regions indicate sites with increased and reduced errors in vir-1, respectively. (B) The motif at error rate sites matches the consensus m6A target sequence. The sequence logo is for the motif enriched at sites with reduced error rate in the vir-1 mutant. (C) Differential error rate sites are primarily found in 3′ UTRs. Bar plot showing the number of differential error rate sites per kb for different genic feature types of 48,149 protein coding transcript loci in the nuclear genome of the Araport11 reference. Upstream and downstream regions were defined as 200 nt regions upstream of the annotated transcription start sites and downstream of the annotated transcription termination sites, respectively. (D) Differential error rate sites and miCLIP peaks are similarly distributed within the 3′ UTR, without accumulation at the stop codon. Metagene plot centred on stop codons from 48,149 protein coding transcript loci, showing the frequency of nanopore DRS error sites (blue) and miCLIP peaks (orange). (E) The locations of differential error rate sites are in good agreement with the locations of sites identified by miCLIP. Histogram showing the distribution of distances to the nearest miCLIP peak for each site of reduced error. Most error sites (66%) are within 5 nt of a miCLIP peak. (F) Nanopore DRS differential error site analysis and miCLIP identify m6A sites in the 3′ UTR of PRPL34 RNA. Blue, miclip 5′ end coverage; purple, nanopore DRS differential error sites; black, RNA isoform from Araport11 annotation. Expanded region: blue, miCLIP coverage; purple, nanopore DRS differential error sites; black, motif scores using the m6A consensus position weight matrix (Figure 5B). A higher positive score denotes a higher likelihood of a match to the consensus sequence.

Figure 5—source data 1

vir-1 vs VIRc differential error sites identified by nanopore DRS – Figure 5A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig5-data1-v1.tds
Figure 5—source data 2

Motifs detected under vir-1 vs VIRc differential error sites - Figure 5B.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig5-data2-v1.tds
Figure 5—source data 3

m6A sites detected by miCLIP of Col-0 samples - Figure 5D.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig5-data3-v1.tds
Figure 5—figure supplement 1
Identification of VIR-dependent m6A transcriptome-wide.

(A) Gene track showing the alignment of nanopore DRS reads from unmodified synthetic MALAT1 RNAs (MALAT1 control) and MALAT1 RNAs with m6A modification (MALAT1 m6A). Alignments are coloured by the basecall at each position (A = green, C = blue, G = yellow, U = orange). Grey, coverage tracks (insertions to the reference are not included in coverage counts); black, test statistic derived from the G-test of aligned basecall profiles between the two alignments; square brackets, modified GGm6ACU k-mer. (B) vir-1 shows reduced levels of m6A compared with Col-0 and restored m6A levels in the VIR-complemented line (VIRc). The ratio of m6A/A obtained using LC-MS. Blue, Col-0; orange, vir-1; green, VIRc. (C) Frequency of m6A motifs detected at vir-1 reduced error sites, as detected by FIMO using the motif detected de novo by MEME and an FDR threshold of 0.1. (D) Bar plot showing the number of miCLIP peaks per kb of different genic feature types in the Araport11 reference. Upstream and downstream regions were defined as 200 nt regions upstream of the annotated transcription start sites and downstream of the annotated transcription termination sites, respectively. (E) Subsampling analysis of MALAT1. Different coloured lines show detection rate at varying subsample sizes for unmethylated sample vs. mixed unmethylated and methylated sample. Fractions in legend denote the fraction of reads in mixed sample which originate from methylated data. (F) Subsampling analysis of a selection of four methylation sites in three genes, in Arabidopsis. Different coloured lines show detection rate at varying subsample sizes for each methylation site.

Figure 5—figure supplement 1—source data 1

MALAT1 differential error rates identified by nanopore DRS – Figure 5—figure supplement 1A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig5-figsupp1-data1-v1.tds
Figure 5—figure supplement 1—source data 2

m6A:A ratios of Col-0, vir-1, and VIRc lines quantified using liquid chromatography – mass spectroscopy – Figure 5—figure supplement 1B.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig5-figsupp1-data2-v1.xlsx
Figure 6 with 1 supplement
Reduction in m6A RNA modification leads to disruption of the circadian clock.

(A) Genes with differential error rate sites have lower detectable RNA levels. Histogram showing the log2 fold change in protein coding gene expression based on counts from nanopore DRS reads in the vir-1 mutant compared to the VIR-complemented line. Blue, genes with differential error rate sites (n = 5083 genes); orange, genes without differential error rate sites (n = 14,675 genes). (B) The circadian period is lengthened in the vir-1 mutant. Mean delayed fluorescence measurements in arbitrary units: blue, Col-0 (n = 61 technical reps); green, the VIR-complemented line (VIRc; n = 29 technical reps); orange, vir-1 (n = 24 technical reps). Shaded areas show bootstrapped 95% confidence intervals for the mean. (C) Boxplot showing the period lengths calculated from delayed fluorescence measurements shown in (b): blue, Col-0; green, VIRc; orange, vir-1 (orange). (D) Poly(A) tail length is altered in the vir-1 mutant. Histogram showing the poly(A) tail length distribution of CAB1 (AT1G29930): blue, Col-0 (n = 40,841 reads); green, VIRc (n = 65,810 reads); orange, vir-1 (n = 68,068 reads). Arrows indicate phased peaks of poly(A) length at approximately 20, 40 and 60 nt. vir-1 distribution is significantly different from both Col-0 and VIRc, according to the Kolmogorov–Smirnov test (p < 1 × 10−16, p < 1 × 10−16 respectively).

Figure 6—source data 1

Differential gene expression results for vir-1 vs VIRc, using counts derived from nanopore DRS – Figure 6A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig6-data1-v1.xlsx
Figure 6—source data 2

Delayed fluorescence and circadian period length for Col-0, vir-1 and VIRc – Figure 6B.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig6-data2-v1.xlsx
Figure 6—source data 3

Differential poly(A) tail length estimate results for vir-1 vs VIRc – Figure 6D.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig6-data3-v1.tds
Figure 6—source data 4

Poly(A) tail length estimates for CAB1 for reads from Col-0, vir-1 and VIRc – Figure 6D.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig6-data4-v1.tds
Figure 6—figure supplement 1
Changes in the gene expression of circadian clock components in the vir-1 mutant.

(A) Histogram showing log2 fold changes in gene expression based on Illumina RNAseq data for the vir-1 mutant and VIRc. Blue, genes with differential error rate sites (n = 5,095 genes); orange, genes without differential error rate sites (n = 14,686 genes). (B) Expression of core circadian clock components is perturbed in the vir-1 mutant. Boxplots showing normalized gene expression measured using Illumina RNAseq in log2counts per million (CPM): green, the VIR-complemented line (VIRc); orange, the vir-1 mutant. Asterisks, significant expression changes (using an FDR threshold of 0.05); orange labelled genes, genes with 3′ UTR m6A detectable by nanopore DRS and miCLIP. Each scatter point represents a single biological replicate. UBIQUITIN LIGASE 21 (UBC21; AT5G25760) was used as a control. (C) Expression of CCA1, encoding a regulator of the circadian rhythm in Arabidopsis, is increased in the vir-1 mutant. Boxplot showing the gene expression change from Col-0 (measured by RT-qPCR) for VIRc (green) and vir-1 (orange). Three technical replicates of three biological replicates were conducted. Each scatter point represents the comparison of a technical replicate of treatment (VIRc or vir-1) against control (Col-0). The expression change in vir-1 is significant (p = 0.02; marked by asterisk). UBIQUITIN LIGASE 21 (UBC21; AT5G25760) was used as a control. (D) Expression of the Flowering Locus C (FLC) gene is decreased in the vir-1 mutant. Boxplot showing gene expression change from Col-0 (measured by RT-qPCR) for VIRc (green) and vir-1 (orange). Three technical replicates of three biological replicates were conducted. Each scatter point represents the comparison of a technical replicate of treatment (VIRc or vir-1) against control (Col-0). Expression change in vir-1 is significant (p = 2.4 × 10−14; marked by asterisk). The following source data are available for Figure 6—figure supplement 1.

Figure 6—figure supplement 1—source data 1

Differential gene expression results for vir-1 vs VIRc, using counts derived from Illumina RNAseq – Figure 6—figure supplement 1A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig6-figsupp1-data1-v1.xlsx
Figure 6—figure supplement 1—source data 2

qPCR expression of CCA1 and FLC in Col-0, vir-1 and VIRc – Figure 6—figure supplement 1C.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig6-figsupp1-data2-v1.xlsx
Figure 7 with 1 supplement
RNA methylation is closely linked to poly(A) site selection.

(A) Global shift to proximal 3′ end usage observed in the vir-1 mutant compared with the VIRc line. Histogram showing the distance in nucleotides between the most reduced and most increased 3′ end positions for genes in which the 3′ end profile is altered in vir-1 (detected with the Kolmogorov–Smirnov test, FDR < 0.05). A threshold of 13 nt was used to detect changes in the 3′ end position. (B) A shift to the use of more proximal poly(A) sites is observed in m6A containing transcripts in the vir-1 mutant. Histogram showing distance from the error site (n = 17,491 error sites) to upstream and downstream 3′ ends: orange, vir-1; green, VIRc. (C) Poly(A) site choice is phased with m6A at PRPL34 (AT1G29070). PRPL34 has four distinct poly(A) site clusters identified from VIRc reads. Analysis of the error profiles of these clusters at the closest upstream m6A motif indicates that proximally polyadenylated isoforms are less likely to be methylated. Green, VIRc 3’ coverage and poly(A) clusters; purple, nanopore differential error site tested; black, Araport11 annotation. Zoomed view shows stacked barplots of VIRc error profiles for the tested nanopore differential error site. Pairs of clusters with significantly different profiles (G-test, FDR < 0.05) are indicated with asterisks. (D) PRPL34, which is methylated in at least two positions in the 3’UTR (Figure 5F), also displays an increase in alternative polyadenylation at a proximal poly(A) site in the vir-1 mutant. The proximal poly(A) site is approximately 17 nt downstream of the closest methylated position. Purple, nanopore differential error sites; green, VIRc coverage; orange, vir-1 coverage; black, Araport11 annotation. (E) RNA methylation sites overlap with canonical and degenerate poly(A) site motifs. Line plot showing the enrichment of poly(A) site motifs (with an edit distance of one) around m6A motifs detected de novo under nanopore differential error sites: blue, the canonical PAS motif AAUAAA; grey, other similar motifs (excluding AAAAAA); orange, the UAUAAA motif; green, the AACAAA motif; yellow, the AAGAAA motif; black, the m6A motif (includes all sequences shown in Figure 5—figure supplement 1E). (F) Readthough events and chimeric RNAs are detected in vir-1. Green, nanopore DRS and Illumina RNAseq data for the VIRc line; orange, nanopore DRS and Illumina RNAseq data for the vir-1 mutant; grey and white, differentially expressed regions between vir-1 and VIRc detected using Illumina RNAseq data with DERfinder that are upregulated and downregulated, respectively; black, Araport11 annotation. Intergenic readthrough regions are highlighted by the purple dashed rectangle. (G) ALG3-GH3.9 chimeric RNAs transcribed in the vir-1 mutant. Chimeric readthrough RNAs were validated using RT-PCR. RT-PCR products were separated on agarose gel and stained with SYBR Safe. RT+ and RT-, cDNA produced in the presence and absence of reverse transcriptase, respectively; ladder, DNA size ladder; Water, no template control. TUBULIN was used as a control.

Figure 7—source data 1

Differential poly(A) site choice results for vir-1 vs VIRc, derived from nanopore 3’ ends – Figure 7A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig7-data1-v1.xlsx
Figure 7—source data 2

vir-1 vs VIRc differentially expressed regions from Illumina RNAseq - Figure 7F.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig7-data2-v1.tds
Figure 7—source data 3

Differential chimeric read results for vir-1 vs VIRc - Figure 7F.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig7-data3-v1.xlsx
Figure 7—figure supplement 1
Changes in RNA 3′ end formation in the vir-1 mutant.

(A) Splicing is moderately disrupted in the vir-1 mutant. Results of differential exon usage analysis with DEXseq are shown for contiguous regions (‘exon chunks’), which occur in the same sets of transcripts in the Araport11 reference. Regions were classified as a 5′ or 3′ variation if they were bounded by the TSS of one or more transcripts. Orange, features with increased usage; blue, features with reduced usage. Boxplots show the distribution in absolute log2 fold change for each feature set. (B) DaPars analysis of Illumina RNAseq data indicates global 3’UTR shortening in the vir-1 mutant. Histogram showing the change in the percentage of distal poly(A) site usage in vir-1 compared to VIRc. Only genes which are significantly differentially polyadenylated (FDR < 0.05) are shown. There is a clear directional effect towards proximal poly(A) site use. (C) vir-1 mutant exhibits increased readthrough of an intronic proximal poly(A) site in intron 5 of the Symplekin domain encoding gene TANG1 (AT1G27595). Green track, nanopore DRS coverage from VIRc; green, nanopore DRS isoforms from VIRc; orange track, nanopore DRS reads from the vir-1 mutant; orange, nanopore DRS isoforms from vir-1; black, Araport11 annotation; dashed black box highlights the site of proximal polyadenylation.

Figure 7—figure supplement 1—source data 1

Differential exon usage for vir-1 vs VIRc, derived from Illumina RNAseq - Figure 7—figure supplement 1A.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig7-figsupp1-data1-v1.xlsx
Figure 7—figure supplement 1—source data 2

Differential poly(A) site choice results for vir-1 vs VIRc, derived from Illumina RNAseq data – Figure 7—figure supplement 1B.

https://cdn.elifesciences.org/articles/49658/elife-49658-fig7-figsupp1-data2-v1.tds
Author response image 1
Low amounts of input RNA increase the number of Enolase II reads and short noise reads.

Read length distribution of mapped (blue) and unmapped (orange) reads originating from a standard DRS experiment (left panel) and an adapter ligation experiment (right panel).

Author response image 2
Noise reads have low quality scores.

Boxplot showing distribution of median per-read quality scores for mapped and unmapped reads from an adapter ligation DRS experiment.

Author response image 3
Differential error rate analysis of S. cerevisiae SK1 WT vs ime4.

(a) Histogram showing the log2 fold change in the ratio of errors to reference matches at bases with a significant change in error profile in ime4 mutant compared with wild type S. cerevisiae SK1. Orange and green shading indicates sites with increased and reduced errors in ime4, respectively. (b) Sequence logo showing the motif enriched at sites with reduced error rate in ime4. (c) Frequency of m6A motifs detected at ime4 reduced error sites, as detected by FIMO using the motif detected de novode novo by MEME and an FDR threshold of 0.1.

Author response image 4
Smoothed distribution of m6A sites detected by different approaches around stop codons in S.cerevisiae SK1.
Author response image 5
Antisense Illumina RNA-seq reads at RCA are likely to be spurious.

(a) Scatter plot showing correlation of sense and antisense read abundances derived from Illumina RNAseq data mapping to ERCC spike-in controls, shown in blue. Each point represents a single spike in transcript detected in a single RNAseq replicate (both Col-0 and hen2-2). Since ERCC spike-ins should not have antisense RNAs, this indicates the level of spurious antisense signal in the dataset. RCA, shown in the orange, does not display antisense RNA levels greater than the ERCC spike-ins, indicating that there are not likely to be genuine antisense RNAs at this locus. FLC, which is known to have antisense RNAs, is shown in green as a positive control. (b) Strip plot showing the sense/antisense read ratio at RCA does not change in the hen2-2 mutant (T test p=0.88). Many genuine antisense RNAs, e.g. those at FLC, are unstable and are therefore upregulated in the hen2-2 mutant relative to sense transcripts.

Author response image 6
Comparison of differential error rate analysis in MALAT1 data using older (3.1.2) and newer (3.2.4) Guppy releases.

Gene track showing the change in the per- reference base test statistic attained when comparing methylated and unmethylated synthetic MALAT1 fragments basecalled using Guppy 3.1.2, which uses recurrent neural network basecalling, and Guppy 3.2.4, which uses recurrent neural network basecalling with a flip-flop architecture.

Author response image 7
Prevalence of concatemerised reads in nanopore DRS data.

Barplot showing the percentage of read alignments in nanopore DRS data where softclipped regions (i.e. unmapped sequence at 5’ and 3’ ends of alignments) which is mappable elsewhere in the TAIR10 genome, possibly indicating undersplitting of read signal by the MinKNOW software. Adapter ligated datasets do not show increased levels, suggesting that RNA ligase I does not cause a detectable increase in the number of concatemers/chimeric reads.

Author response image 8
Nanopore DRS captures downstream products of miRNA cleavage at transcripts encoded by TAS genes.

Orange, 5’ coverage from capped nanoPARE reads; purple, 5’ coverage from uncapped nanoPARE reads; blue, nanopore DRS 5’ coverage; black miRNA target site alignment; black, Araport11 annotation.

Tables

Author response table 1
Performance of MAZTER-seq, EpiNano and Nanopore differential error rate methods using S. cerevisiae Me-RIP/m6A-seq peaks as the “ground truth”.
MAZTER
seq
EpiNano
p >= 0.0
EpiNano
p >= 0.5
EpiNano
p >= 0.9
Nanopore error sitesNanopore error site
motifs
Total ground truth
sites
1,1001,1001,1001,1001,1001,100
Total prediction
sites
1,25732,9208,0121692,294555
Ground truth sites
supported
12748923314138124
Unsupported
prediction sites
1,12632,3097,7671552,087430
Unsupported
ground truth sites
9736118671,086962976
Precision0.1010.0150.0290.0830.0620.224
Recall0.1150.4450.2120.0130.1250.113
F1 score0.1080.0290.0510.0220.0830.150
Fisher p value1.7E-1246.9E-2051.5E-1124.9E-133.9E-983.9E-161
Author response table 2
Performance of Me-RIP/m6A seq, EpiNano and Nanopore differential error rate methods using S.cerevisiae MAZTER-seq sites as the “ground truth”.
Me-RIP
seq
Me-RIP
seq motifs
Me-RIP
seq
motifs (ACA)
EpiNano
p >= 0.5
Nanopore error sitesNanopore error site motifsNanopore error site motifs
(ACA)
Total ground truth
sites
1,2571,2571,2571,2571,2571,2571,257
Total prediction
sites
1,1006693618,0122,294555348
Ground truth sites
supported
13195950123116116
Unsupported
prediction sites
9735752678,0122,130442235
Unsupported
ground truth sites
1,1261,1621,1621,2571,1341,1411,141
Precision0.1190.1420.26200.0550.2080.330
Recall0.1040.0760.07600.0980.0920.092
F1 score0.1110.0990.11700.0700.1280.144
Fisher p value1.7E-1242.6E-1551.9E-18318.5E-1436.93E-2114.26E-238
Author response table 3
Known upf1 sensitive alternative splicing events which are supported by nanopore DRS read alignments.

Table showing examples of NMD sensitive alternative splicing events taken from Table 1 of Kalyna et al.13 which are supported by nanopore DRS reads. Duplicate genes and unsupported examples have been removed from the table.

Gene IDGene NameAlternative splicing class
AT1G55310At-SCL33Exon 4 alternative acceptor
AT2G37340At-RS2Z33Exon 3 alternative acceptor
AT4G16845VRN2Intron 2 retention
AT4G39260GRP8/CCR1Exon 1 alternative donor
AT1G77080MAF1Exon 4 alternative acceptor
AT2G02960Zinc finger (C3HC4) proteinExon 2 alternative acceptor
AT2G46790APRR9/TL1Exon 2 alternative donor
AT2G38880NF-YB1/HAP3aExon 6 alternative donor
AT3G49430At-SR34aIntron 1 cassette exon
AT3G01150At-PTB2aIntron 2 Cassette exon, exon 8 alternative donor
AT3G13570At-SCL30aIntron 3 cassette exon
AT3G53500At-RS2Z32Exon 3 alternative acceptor
AT3G61860At-RS31Intron 2 cassette exon
AT2G21660GRP7/CCR2Exon 1 alternative donor
AT5G53180At-PTB2bIntron 3 cassette exon
AT4G25500At-RS40Intron 2 cassette exon
AT3G55460At-SCL30Intron 3 cassette exon
AT2G29210Splicing factor PWI proteinExon 6 alternative acceptor
AT1G07830RPL29 familyExon 1 alternative donor
AT1G02090FUS5/CSN7/COP15Exon 8 alternative acceptor
AT1G72560PSD/Exportin-tExon 13 alternative donor
AT4G33060CYP57Intron 5 cassette exon
AT5G65060MAF3Exon 4 alternative acceptor
AT5G35410SOS2/CIPK24Exon 9 alternative acceptor
AT4G36960RRM-containing proteinIntron 1 retention

Additional files

Supplementary file 1

Properties of the nanopore DRS sequencing data.

Dataset statistics for all nanopore DRS sequencing runs conducted. Datasets are sorted by the date of the sequencing run. All data was collected using a MinION with R9.4 flow cell and SQK-RNA001 library kit. Increases in mapping and over-splitting rate that occur in samples collected after September 2018 are therefore likely to have resulted from changes in the MinKNOW software. Other variations in mapping rates are likely caused by differences in the amount of RNA loaded onto the flowcell. Depletion of degraded RNA during 5’ adapter RNA ligation, for example, results in smaller libraries for nanopore sequencing. This causes lower pore occupancy during sequencing and results in many reads with low median quality scores, which are actually noise signal from empty pores (Mojarro et al., 2018; Pontefract et al., 2018). Related to Figures 17.

https://cdn.elifesciences.org/articles/49658/elife-49658-supp1-v1.xlsx
Supplementary file 2

Alignment statistics for Illumina RNAseq dataset.

Related to Figures 17.

https://cdn.elifesciences.org/articles/49658/elife-49658-supp2-v1.xlsx
Supplementary file 3

Adapter detection using BLASTN rules approach.

The number of reads with adapters were detected in two biological replicates (Tables A and B respectively) of Col-0 sequenced with and without adapter ligation protocol. Rules are applied cumulatively (i.e. row one shows reads that pass the first rule, row two shows read alignments that pass the first and second rules, etc.). The signal-to-noise ratio shows the number of positive examples detected using rules in the adapter-ligated dataset divided by the number of false positives from the dataset collected without adapters. Related to Figure 3.

https://cdn.elifesciences.org/articles/49658/elife-49658-supp3-v1.xlsx
Supplementary file 4

Cleavage products are detectable at predicted miRNA target sites in nanopore DRS data.

Table showing the genes where we were able to detect degradation intermediates from miRNA cleavage events using nanopore DRS. Predictions in 5’UTRs and across splice junctions have been filtered to reduce false positives. Only predictions with support from nanoPARE data are shown. Related to Figure 3.

https://cdn.elifesciences.org/articles/49658/elife-49658-supp4-v1.xlsx
Supplementary file 5

Splice junctions supported by nanopore DRS and Illumina RNAseq.

Numbers are shown for (A) the unique splice junction set intersections upset plot (Figure 4B) and (B) unique linked splicing events upset plot (Figure 4C). Shaded cells denote sets included in the intersection for that row, while unshaded cells denote sets excluded from the intersection. Rows are sorted by the size of the intersection for canonical splice junctions. Related to Figure 4

https://cdn.elifesciences.org/articles/49658/elife-49658-supp5-v1.xlsx
Supplementary file 6

Known upf1 sensitive alternative splicing events which are supported by nanopore DRS read alignments.

Table showing examples of NMD sensitive alternative splicing events taken from Table 1 of Kalyna et al. (2012). which are supported by nanopore DRS reads. Duplicate genes and unsupported examples have been removed from the table. [Linked to Figure 4].

https://cdn.elifesciences.org/articles/49658/elife-49658-supp6-v1.xlsx
Supplementary file 7

Flowering time gene expression.

Change in gene expression of curated genes involved in flowering time in Arabidopsis, as detected using Illumina RNAseq for vir-1 compared with the VIR-complemented line. In all, 12.2% of flowering time genes show a change in mRNA level expression in the vir-1 mutant. Source of flowering time genes: George Coupland, Cologne: https://www.mpipz.mpg.de/14637/Arabidopsis_flowering_genes. Related to Figure 6.

https://cdn.elifesciences.org/articles/49658/elife-49658-supp7-v1.xlsx
Supplementary file 8

Primers used in this study.

https://cdn.elifesciences.org/articles/49658/elife-49658-supp8-v1.xlsx
Supplementary file 9

Key resources table.

https://cdn.elifesciences.org/articles/49658/elife-49658-supp9-v1.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/49658/elife-49658-transrepform-v1.docx

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Matthew T Parker
  2. Katarzyna Knop
  3. Anna V Sherwood
  4. Nicholas J Schurch
  5. Katarzyna Mackinnon
  6. Peter D Gould
  7. Anthony JW Hall
  8. Geoffrey J Barton
  9. Gordon G Simpson
(2020)
Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification
eLife 9:e49658.
https://doi.org/10.7554/eLife.49658