Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy

Tom Smith; Mie Monti; Anne E Willis; Lajos Kalmár

doi:10.7554/eLife.96955.2

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.

Reviewing Editor
Jong-Eun Park
Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
Senior Editor
Murim Choi
Seoul National University, Seoul, Republic of Korea

Reviewer #1 (Public review):

Summary:

In the manuscript titled "Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy," Tom Smith and colleagues conducted a comparative evaluation of various sequencing-based tRNA quantification methods. The inherent challenges in accurately quantifying tRNA transcriptional levels, stemming from their short sequences (70-100nt), extensive redundancy (~600 copies in human genomes with numerous isoacceptors and isodecoders), and potential for over 100 post-transcriptional chemical modifications, necessitate sophisticated approaches. Several wet-experimental methods (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA) combined with bioinformatics tools (bowtie2-based, SHRiMP, and mimseq) have been proposed for this purpose. However, their practical strengths and weaknesses have not been comprehensively explored to date. In this study, the authors systematically assessed and compared these methods, considering factors such as incorrect alignments, multiple alignments, misincorporated bases (experimental errors), truncated reads, and correct assignments. Additionally, the authors introduced their own bioinformatic approaches (referred to as Decision and Salmon), which, while not without flaws (as perfection is unattainable), exhibit significant improvements over existing methods.

Strengths:

The manuscript meticulously compares tRNA quantification methods, offering a comprehensive exploration of each method's relative performance using standardized evaluation criteria. Recognizing the absence of "ground-truth" data, the authors generated in silico datasets mirroring common error profiles observed in real tRNA-seq data. Through the utilization of these datasets, the authors gained insights into prevalent sources of tRNA read misalignment and their implications for accurate quantification. Lastly, the authors proposed their own downstream analysis pipelines (Salmon and Decision), enhancing the manuscript's utility.

https://doi.org/10.7554/eLife.96955.2.sa2

Reviewer #2 (Public review):

Summary:

The authors provided benchmarking study results on tRNA-seq in terms of read alignment and quantification software with optimal parameterization. This result can be a useful guideline for choosing optimal parameters for tRNA-seq read alignment and quantification.

Strengths:

Benchmarking results for read alignment can be a useful guideline for choosing optimal parameters and a mapping strategy (mapping to amino acid) for various tRNA-Seq methods.

Weaknesses:

Some explanations of the sequencing data analysis pipeline are not clear to general readers.

https://doi.org/10.7554/eLife.96955.2.sa1

Author response:

The following is the authors’ response to the original reviews.

Reviewer 1:

Because tRNA-sequencing methods have not been widely used (compared to mRNA-seq), many readers would not be familiar with the characteristics of different methods introduced in this study (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA; bowtie2-based, SHRiMP, and mimseq; what are the main features of "Salmon?"). The manuscript will read better when the basic features of these methods are described in the manuscript, however brief.

Introduction page 4 now clarifies a little more the difference between bowtie2, SHRiMP and mimseq. Results page 9 briefly summarises the differences between the tRNA-Seq methods. Results page 14 clarifies how Decision and Salmon work.

Reviewer 2:

(1) The explanation of the parameter D for bowtie2 sounds ambiguous. "How much effort to expend" needs to be explained in more detail.

Results page 6 gives a more precise explanation of the D parameter.

(2) Please provide optimal parameters (L and D) for tRNA-seq alignment.

I think optimal here is not possible to determine. It will depend on the species, the frequency of misincorporations due to modifications (tRNA-Seq protocol specific) and how long one is willing to let bowtie continue searching for a better match. The point of Figure 1a is that D needs to be increased if L is decreased and an error is allowed in the seed. I think the sentence in the results section Figure 1a is the appropriate way to express this without committing to a single ‘optimal’ parameterisation: ‘We observed that when an error in the seed is allowed, as the seed length is decreased, there needs to be a concomitant increase in effort expended to allow bowtie2 more opportunities to find the best possible alignment, especially with respect to the Transcript ID’.
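For readers who want to explore this trade-off themselves, the following is a minimal sketch of how a sweep over seed length and effort could be set up. It only builds bowtie2 command lines using the standard `-L` (seed length), `-D` (consecutive failed seed extensions before giving up) and `-N` (mismatches allowed in the seed) options. The index name, file names, helper names and the grid values are hypothetical placeholders, not the settings used in the manuscript.

```python
from itertools import product


def bowtie2_command(seed_len, effort, seed_mismatches=1,
                    index="tRNA_index", reads="reads.fastq",
                    out_sam="aligned.sam"):
    """Build a bowtie2 argument list (index and file names are placeholders)."""
    if seed_mismatches not in (0, 1):
        raise ValueError("bowtie2 -N accepts only 0 or 1")
    if not 3 < seed_len < 32:
        raise ValueError("bowtie2 -L must be >3 and <32")
    return [
        "bowtie2",
        "-N", str(seed_mismatches),  # mismatches allowed in the seed
        "-L", str(seed_len),         # seed length
        "-D", str(effort),           # failed extensions before giving up
        "-x", index,                 # hypothetical index prefix
        "-U", reads,                 # hypothetical unpaired reads file
        "-S", out_sam,               # hypothetical output SAM file
    ]


def parameter_grid(seed_lengths, efforts):
    """Return one command per (L, D) combination, L varying slowest."""
    return [bowtie2_command(L, D) for L, D in product(seed_lengths, efforts)]


if __name__ == "__main__":
    # Illustrative grid only; print the commands rather than running them.
    for cmd in parameter_grid([10, 15, 20], [15, 100]):
        print(" ".join(cmd))
```

Each resulting alignment would then need to be scored against the simulated ground truth to see how accuracy changes across the grid.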

(3) I think the authors chose L=10 and D=100 based on Figure 1A. Which dataset did you choose for this parameterization among ALL-tRNAseq, DM-tRNAseq, mim-tRNAseq, QuantM-tRNA-seq, and YAMAT-seq?

Figure 1A is based on simulation of full-length reads with only sequencing errors, i.e. not from any tRNA-Seq method in particular. This is stated in the results text and I’ve clarified in the figure legend.

(4) Salmon does not need a read alignment process such as Bowtie2. Hence, it is not clear "Only results from alignment with bowtie2" in Figure legend for Figure 4a.

I’m using Salmon in ‘alignment-mode’, taking the alignments from bowtie2. I’ve clarified this in results page 14.
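As a rough illustration of what alignment-mode means in practice, the sketch below builds a `salmon quant` command that takes existing alignments via `-a` together with the transcript sequences via `-t`, instead of a Salmon index and raw reads. The file names, the helper name and the library type passed in the usage example are assumptions for illustration and are not taken from the manuscript.

```python
def salmon_alignment_mode_command(transcripts_fasta, alignments_bam,
                                  lib_type, out_dir):
    """Build a salmon quant argument list for alignment-based mode.

    All paths are hypothetical; lib_type must match the library
    preparation and is supplied by the caller.
    """
    return [
        "salmon", "quant",
        "-t", transcripts_fasta,  # sequences the reads were aligned to
        "-l", lib_type,           # library type string
        "-a", alignments_bam,     # existing alignments, e.g. from bowtie2
        "-o", out_dir,            # output directory
    ]


if __name__ == "__main__":
    cmd = salmon_alignment_mode_command(
        "tRNA_sequences.fa", "bowtie2_alignments.bam", "SF", "salmon_out")
    print(" ".join(cmd))
```

Because the alignments are supplied directly, no `-i` index argument appears in the command, which is why a bowtie2 step still precedes Salmon in this setup.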

https://doi.org/10.7554/eLife.96955.2.sa0
