Alignment parameterisation in the context of frequent multiple-alignment

a) Impact of bowtie2 parameterisation on the percentage of correct alignments. Reads were simulated as full length and with only sequencing errors. b) The percentage of single and multi-mapped reads at each level of tRNA nomenclature. c) The frequency of reads multi-mapping at the Gene locus ID, Transcript ID and Anticodon level. Frequencies were normalised by dividing by the total number of reads aligned to the two tRNAs. Data shown is from YAMAT-Seq, BT20, replicate A.
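The normalisation described for panel (c) can be illustrated with a minimal sketch. It assumes that "the total number of reads aligned to the two tRNAs" means the number of distinct reads aligned to either tRNA; the function name `multimap_frequencies` and the input layout (read ID mapped to the set of tRNAs it aligned to) are hypothetical and not taken from the original analysis.

```python
from collections import Counter
from itertools import combinations


def multimap_frequencies(read_alignments):
    """Return, for each pair of tRNAs, the fraction of reads shared between them.

    read_alignments: dict mapping a read ID to the set of tRNA names
    the read aligned to (hypothetical input layout).
    """
    totals = Counter()  # reads aligned to each tRNA
    shared = Counter()  # reads aligned to both tRNAs in a pair
    for targets in read_alignments.values():
        for trna in targets:
            totals[trna] += 1
        for a, b in combinations(sorted(targets), 2):
            shared[(a, b)] += 1

    freqs = {}
    for (a, b), n_shared in shared.items():
        # Assumption: denominator is the number of distinct reads aligned
        # to either tRNA (reads aligned to both are counted once).
        n_either = totals[a] + totals[b] - n_shared
        freqs[(a, b)] = n_shared / n_either
    return freqs


example_reads = {
    "r1": {"tRNA-A", "tRNA-B"},
    "r2": {"tRNA-A"},
    "r3": {"tRNA-B"},
    "r4": {"tRNA-A", "tRNA-B"},
}
example_freqs = multimap_frequencies(example_reads)
```

Here two of the four reads align to both tRNAs, so the pair receives a frequency of 0.5. The same function could be applied at the gene locus, transcript or anticodon level by changing which names are stored in each set.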

The observed error profiles reflect the expected misincorporations from modifications in MODOMICS

a) Misincorporation frequencies for inosine and m1A sites. b) The proportion of reads that are truncated and how often truncation occurs close to a known modification site. c) The proportion of truncated reads that are truncated within 1 nucleotide of common modifications. m2,2G = N2,N2-dimethylguanosine, m2G = N2-methylguanosine, m1Y = 1-methylpseudouridine, m5C = 5-methylcytidine, m1G = 1-methylguanosine, m1A = 1-methyladenosine, D = dihydrouridine, Y = pseudouridine.

Alignment accuracy is dependent upon the alignment strategy and tRNA-Seq method

a) Percentage of reads aligned. b) Percentage of correct alignments. c) Spearman correlations for read misalignment rates between pairs of tRNA-Seq methods or pairs of samples within a single tRNA-Seq method. QuantM-tRNA-seq is not included since these samples were only for Mus musculus. d) A comparison of correct assignments for 4 selected Homo sapiens anticodons which are most variable across tRNA-Seq methods. QuantM-tRNA-seq is not included since these samples were only for Mus musculus.

Approaches for tallying reads per tRNA from aligned reads. SHRiMP does not report MAPQ, so the MAPQ>10 approach is not possible. Decision and Salmon have not previously been utilised.

Quantification levels, using GtRNAdb nomenclature, and how quantification is achieved at each level by each read tallying approach

a) Reads assigned at anticodon and mimseq isodecoder level for each read tally method. Only results from alignment with bowtie2 or GSNAP (Mimseq) are shown. b) Mean RMSE at anticodon and mimseq isodecoder level. *=Best approach for each tRNA-Seq method, ▾=Worst approach. c) As per (b), except Pearson correlation coefficient. d) The percentage of reads from each simulation sample which are single or multiple aligned, separated by whether the alignment position(s) have the correct anticodon. Multi-aligned reads with more than one anticodon were deemed incorrect.
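For readers unfamiliar with the two accuracy metrics in panels (b) and (c), the following is a minimal sketch of how RMSE and the Pearson correlation coefficient can be computed between ground-truth and estimated read counts. The function names, the example counts, and the choice to compute on raw counts are assumptions for illustration; the original analysis may have normalised or transformed counts first.

```python
import math


def rmse(truth, estimate):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - e) ** 2 for t, e in zip(truth, estimate)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts per anticodon for one simulated sample.
true_counts = [10, 20, 30]
estimated_counts = [12, 18, 33]

sample_rmse = rmse(true_counts, estimated_counts)
sample_r = pearson(true_counts, estimated_counts)
```

A "mean RMSE" as reported in the panel would then be the average of `sample_rmse` over the samples for a given tRNA-Seq method and read tallying approach, under the same assumptions.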

Each quantification approach has specific shortfalls

a) Pearson correlation between ground truth and estimated read counts for six anticodons with variable quantification accuracy between read tallying approaches. b). c) The variance in observed fold-changes in the real tRNA-Seq data that is explained by fold-changes in the uniform simulation data. x = The mean variance explained across the four Homo sapiens datasets.

a) The complete misincorporation frequencies for all modifications. b) The frequency of read truncations with respect to the distance from the 3’ end of the tRNA.

a) Percentage of correct alignments at all levels of quantification. b) Aligned reads vs correct alignments. Aligners with a higher percentage of aligned reads tend to have a lower percentage of correct alignments. c) Example misincorporation events with differential frequency across cell lines in the mim-tRNAseq and YAMAT-Seq simulations.

a) Reads assigned at each level for each read tally method. Only results from alignment with bowtie2 or GSNAP (Mimseq) are shown. b) The percentage of reads from each simulation sample which are single or multiple aligned, separated by whether the alignment position(s) have the correct anticodon. Multi-aligned reads with more than one anticodon were deemed incorrect. c) Mean RMSE for all approaches at all levels. *=Best approach for each tRNA-Seq method, ▾=Worst approach. d) As per (c), except Pearson correlation coefficient.

a) The correlation between the fraction of reads correctly assigned for each anticodon and the Pearson correlation between the estimated and true read counts. The blue line shows a linear regression fit. b) As per (a) but for mimseq isodecoder level quantification. c) The correlation between ground truth and estimated read counts for 6 anticodons with variable quantification accuracy, from YAMAT-Seq simulations. The dashed line represents equality.