Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy

Tom Smith; Mie Monti; Anne E Willis; Lajos Kalmár

doi:10.7554/eLife.96955.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Reviewing Editor
Jong-Eun Park
Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
Senior Editor
Murim Choi
Seoul National University, Seoul, Republic of Korea

Reviewer #1 (Public Review):

Summary:

In the manuscript titled "Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy," Tom Smith and colleagues conducted a comparative evaluation of various sequencing-based tRNA quantification methods. The inherent challenges in accurately quantifying tRNA transcriptional levels, stemming from their short sequences (70-100 nt), extensive redundancy (~600 copies in human genomes with numerous isoacceptors and isodecoders), and potential for over 100 post-transcriptional chemical modifications, necessitate sophisticated approaches. Several wet-experimental methods (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA) combined with bioinformatics tools (bowtie2-based, SHRiMP, and mimseq) have been proposed for this purpose. However, their practical strengths and weaknesses have not been comprehensively explored to date. In this study, the authors systematically assessed and compared these methods, considering factors such as incorrect alignments, multiple alignments, misincorporated bases (experimental errors), truncated reads, and correct assignments. Additionally, the authors introduced their own bioinformatic approaches (referred to as Decision and Salmon), which, while not without flaws (as perfection is unattainable), exhibit significant improvements over existing methods.

Strengths:

The manuscript meticulously compares tRNA quantification methods, offering a comprehensive exploration of each method's relative performance using standardized evaluation criteria. Recognizing the absence of "ground-truth" data, the authors generated in silico datasets mirroring common error profiles observed in real tRNA-seq data. Through the utilization of these datasets, the authors gained insights into prevalent sources of tRNA read misalignment and their implications for accurate quantification. Lastly, the authors proposed their downstream analysis pipelines (Salmon and Decision), enhancing the manuscript's utility.
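As a rough illustration of the kind of error profile being simulated, the sketch below generates reads from a mature tRNA sequence with two of the error types listed above: misincorporation at modified positions, and truncation caused by reverse transcription (RT) stopping at a modification. The error model, the function name `simulate_read`, its parameters, and the toy sequence are all hypothetical; the manuscript's simulation is based on error profiles derived from real tRNA-seq datasets, not on this model.

```python
import random

BASES = "ACGT"


def simulate_read(trna_seq, mod_positions, misinc_rate, trunc_prob, rng):
    """Simulate one read from a mature tRNA (hypothetical, simplified error model).

    trna_seq:      mature tRNA sequence, 5'->3', using A/C/G/T
    mod_positions: 0-based indices of modified nucleotides
    misinc_rate:   chance of a wrong base at a modified site that RT reads through
    trunc_prob:    chance that RT stops at a modified site
    rng:           random.Random instance, for reproducibility
    """
    read = list(trna_seq)
    start = 0  # 5' end of the read; moved 3'-ward if RT stops early
    # RT copies the RNA 3'->5', so modified sites are visited from the 3' end.
    for pos in sorted(mod_positions, reverse=True):
        if rng.random() < trunc_prob:
            # RT stops here: the read keeps only the sequence 3' of this site.
            start = pos + 1
            break
        if rng.random() < misinc_rate:
            # RT reads through but misincorporates: substitute a different base.
            read[pos] = rng.choice([b for b in BASES if b != trna_seq[pos]])
    return "".join(read[start:])


if __name__ == "__main__":
    rng = random.Random(42)
    toy_trna = "ACGT" * 19  # toy 76-nt sequence, not a real tRNA
    for _ in range(3):
        print(simulate_read(toy_trna, [9, 36], misinc_rate=0.3, trunc_prob=0.2, rng=rng))
```

Truncated reads produced this way share only their 3' portion with the source tRNA, which is one reason short reads can align equally well to several related genes.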

Weaknesses:

As discussed in the manuscript, the error profiles derived from real-world tRNA-seq datasets may still harbor biases, as reads that failed to "align" in the analysis pipelines were not considered. Additionally, the authors did not validate the efficacy of their "best practice" pipelines on new real-world datasets, preferably those generated by the authors themselves. Such validation would not only confirm the improvements but also demonstrate how these pipelines could alter biological interpretations.
Because tRNA-sequencing methods are not as widely used as mRNA-seq, many readers will not be familiar with the characteristics of the different methods examined in this study (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA; bowtie2-based, SHRiMP, and mimseq; and what are the main features of "Salmon"?). The manuscript would read better if the basic features of these methods were described, however briefly.
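On the question of Salmon's main features: one is that multi-mapping reads are neither discarded nor assigned arbitrarily, but apportioned among compatible transcripts by an expectation-maximization (or variational Bayes) procedure over equivalence classes of reads. The sketch below is a stripped-down EM under simplifying assumptions (equal effective lengths, no bias or fragment-length models). It illustrates the idea only and is not Salmon's implementation; the function name, tRNA names, and read counts are hypothetical.

```python
def em_quantify(eq_classes, max_iter=1000, tol=1e-10):
    """Distribute reads among transcripts with a basic EM (simplified sketch).

    eq_classes maps a tuple of transcript names (the transcripts a group of
    reads is compatible with) to the number of reads in that group.
    Assumes all transcripts have equal effective length.
    Returns estimated read counts per transcript.
    """
    transcripts = sorted({t for ts in eq_classes for t in ts})
    total = sum(eq_classes.values())
    # Start from uniform relative abundances.
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(max_iter):
        # E-step: split each class's reads in proportion to current abundances.
        expected = {t: 0.0 for t in transcripts}
        for ts, count in eq_classes.items():
            denom = sum(abundance[t] for t in ts)
            for t in ts:
                expected[t] += count * abundance[t] / denom
        # M-step: updated abundances are the expected fractions of all reads.
        updated = {t: expected[t] / total for t in transcripts}
        delta = max(abs(updated[t] - abundance[t]) for t in transcripts)
        abundance = updated
        if delta < tol:
            break
    return {t: abundance[t] * total for t in transcripts}


if __name__ == "__main__":
    # Hypothetical counts for two tRNA genes with near-identical sequences.
    classes = {
        ("tRNA-Ala-AGC-1-1",): 60,
        ("tRNA-Ala-AGC-2-1",): 20,
        ("tRNA-Ala-AGC-1-1", "tRNA-Ala-AGC-2-1"): 40,
    }
    print(em_quantify(classes))
```

Here the reads unique to each gene (60 and 20) anchor the estimate, and the 40 shared reads converge to a 30:10 split, giving roughly 90 and 30 reads. A pipeline that discarded multi-mapping reads would instead report 60 and 20.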

https://doi.org/10.7554/eLife.96955.1.sa1

Reviewer #2 (Public Review):

Summary:

The authors present the results of a benchmarking study of read-alignment and quantification software for tRNA-seq, including optimal parameterization. These results can serve as a useful guideline for choosing optimal parameters for tRNA-seq read alignment and quantification.

Strengths:

The benchmarking results for read alignment can serve as a useful guideline for choosing optimal parameters and a mapping strategy (mapping to amino acid) for the various tRNA-seq methods.
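To illustrate why a coarser mapping level can help: a read that aligns equally well to several tRNA genes is ambiguous at the gene level but may be unambiguous at the sequence-family, anticodon, or amino-acid level. The sketch assumes GtRNAdb-style names (`tRNA-<aa>-<anticodon>-<family>-<copy>`) and assumes genes sharing a family number have identical mature sequences. The function names and example genes are hypothetical, and this is not necessarily the strategy used in the manuscript.

```python
LEVELS = ("gene", "seq_family", "anticodon", "amino_acid")


def level_keys(name):
    """Map a tRNA gene name to its feature at each level.

    Assumes GtRNAdb-style names: tRNA-<aa>-<anticodon>-<family>-<copy>.
    """
    _, aa, anticodon, family, _copy = name.split("-")
    return {
        "gene": name,
        "seq_family": f"{aa}-{anticodon}-{family}",
        "anticodon": f"{aa}-{anticodon}",
        "amino_acid": aa,
    }


def finest_unique_level(hits):
    """Return (level, feature) for the finest level at which all hits agree, else None."""
    for level in LEVELS:
        features = {level_keys(h)[level] for h in hits}
        if len(features) == 1:
            return level, features.pop()
    return None


if __name__ == "__main__":
    # Hypothetical sets of equally good alignments for three reads.
    for hits in (["tRNA-Ala-AGC-1-1", "tRNA-Ala-AGC-1-2"],
                 ["tRNA-Ala-AGC-1-1", "tRNA-Ala-AGC-2-1"],
                 ["tRNA-Ala-AGC-1-1", "tRNA-Ala-CGC-1-1"]):
        print(hits, "->", finest_unique_level(hits))
```

A read hitting tRNA-Ala-AGC-1-1 and tRNA-Ala-AGC-2-1 is ambiguous between genes but counts unambiguously toward Ala-AGC; a read hitting an Ala gene and a Gly gene cannot be resolved at any of these levels.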

Weaknesses:

The topic is highly specific, and the novelty of the analysis may be of limited use to general readers.

Some details of the sequencing data analysis pipeline are not clear for general readers:

(1) The explanation of the bowtie2 parameter D is ambiguous. "How much effort to expend" needs to be explained in more detail.

(2) Please provide optimal parameters (L and D) for tRNA-seq alignment.

(3) I think the authors chose L=10 and D=100 based on Figure 1A. Which dataset was used for this parameterization: ALL-tRNAseq, DM-tRNAseq, mim-tRNAseq, QuantM-tRNA-seq, or YAMAT-seq?

(4) Salmon does not require a separate read-alignment step such as Bowtie2. Hence, it is unclear what "Only results from alignment with bowtie2" means in the legend for Figure 4a.
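Points (1) and (2) concern what D controls. The bowtie2 manual describes -D as the number of consecutive seed-extension attempts that may "fail" before bowtie2 stops and keeps the alignments found so far, where an attempt fails if it yields neither a new best nor a new second-best alignment (15 under the default --sensitive preset). The toy model below mimics only that stopping rule; it is a simplification, not bowtie2's implementation, and the scores are invented. The hypothetical helper `build_bowtie2_cmd` shows where -L and -D go on the command line; the values 10 and 100 are the reviewer's reading of Figure 1A, not confirmed settings.

```python
def extend_until_give_up(extension_scores, D):
    """Toy model of the bowtie2 -D stopping rule (simplified, illustrative only).

    extension_scores: scores from successive seed-extension attempts, in the
                      order tried (higher is better; None means no alignment)
    D:                consecutive failed attempts tolerated before stopping
    An attempt "fails" if it yields neither a new best nor a new second-best score.
    Returns (attempts_made, best, second_best).
    """
    best = second = None
    failures = attempts = 0
    for score in extension_scores:
        attempts += 1
        if score is not None and (best is None or score > best):
            best, second = score, best  # new best; old best becomes second-best
            failures = 0
        elif score is not None and (second is None or score > second):
            second = score  # new second-best
            failures = 0
        else:
            failures += 1
            if failures >= D:
                break  # give up and keep the alignments found so far
    return attempts, best, second


def build_bowtie2_cmd(index, reads, out_sam, seed_len=10, max_failed_extends=100):
    """Assemble a bowtie2 command line (hypothetical helper; paths are placeholders)."""
    return ["bowtie2", "-x", index, "-U", reads,
            "-L", str(seed_len), "-D", str(max_failed_extends), "-S", out_sam]


if __name__ == "__main__":
    scores = [-10, -20, -30, -30, -30, -5]  # invented end-to-end alignment scores
    print(extend_until_give_up(scores, D=3))
    print(extend_until_give_up(scores, D=15))
    print(" ".join(build_bowtie2_cmd("trna_index", "reads.fq", "out.sam")))
```

With D=3 the search gives up after three unproductive extensions and never reaches the better alignment scored -5; with D=15 it finds it. In this sense a larger D means more "effort": more unproductive extensions are tolerated, at the cost of runtime. The returned list could be passed to `subprocess.run` where bowtie2 and an index are available.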

https://doi.org/10.7554/eLife.96955.1.sa0

Author response:

We thank the reviewers for their critical appraisal of our manuscript. We will address the points of confusion and/or lack of clarity in a revised manuscript. We agree with reviewer 1 that applying the best-practice pipeline(s) to new experimental data and comparing this approach with current practices would be a useful demonstration of how it alters biological interpretation. We are in the process of completing this analysis, but believe it is best addressed in a separate manuscript where we can focus on the associated biological findings, allowing this manuscript to remain focused on the accurate quantification of tRNA-Seq data.

https://doi.org/10.7554/eLife.96955.1.sa3
