Calibrated analysis framework for nanopore direct RNA sequencing uncovers cell-specific m6A stoichiometry at conserved sites

  1. Department of Microbiology, New York University School of Medicine, New York, United States
  2. Institute of Virology, Hannover Medical School, Hanover, Germany
  3. Antimicrobial-Resistant Pathogens Program, New York University Grossman School of Medicine, New York, United States
  4. German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Germany
  5. Department of Pharmacology, Weill Medical College, Cornell University, New York, United States
  6. Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hanover, Germany

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Pedro Batista
    National Institutes of Health, National Cancer Institute, Bethesda, United States of America
  • Senior Editor
    Yamini Dalal
    National Cancer Institute, Bethesda, United States of America

Reviewer #1 (Public review):

Summary:

The authors set out to evaluate how accurately direct sequencing of RNA can identify and quantify several chemical modifications on RNA molecules, focusing primarily on m6A. A central goal of the work is to compare this approach with an independent chemical-based method (glyoxal and nitrite-mediated deamination of unmethylated adenosines; GLORI), using the same RNA samples, in order to assess reproducibility, false-positive signals, and sensitivity across a range of detection strategies. The authors further aim to demonstrate the biological utility of this approach by applying it to two human cell types, primary human fibroblasts and HD10.6 neurons. While the manuscript also reports detection of additional RNA modifications (pseudouridine and m5C), the depth of analysis and strength of controls are greatest for m6A, which forms the primary focus of the study.

Strengths:

A strength of this work is the direct comparison of two distinct measurement approaches performed on the same RNA input material; this has not been done in other recently published benchmarking studies evaluating the utility of direct RNA sequencing for calling m6A. The authors systematically test multiple analysis models and show that, when appropriate filtering is applied, detection of modified sites is reproducible across software versions. The use of synthetic RNA standards and METTL3 inhibitors as negative controls helps to reinforce the overall results.

The data show good agreement between the two methods at higher m6A modification levels, supporting the conclusion that direct RNA sequencing can reliably detect high-confidence modification sites. The authors also demonstrate that this approach can, in principle, provide information at the level of individual RNA variants (although only one example was provided), which is difficult to achieve with short-read methods. The methodology described here is likely to be useful to others seeking to apply similar approaches to identify and quantify m6A. The study also explores the detection of other RNA modifications, which highlights the broader potential of the approach, although these analyses are necessarily more exploratory given the more limited controls and data available.

Weaknesses:

Despite these strengths, several issues limit the interpretation of the results and should be clarified for readers.

First, the authors appropriately address false-positive signals by estimating expected false-positive rates and by quantitatively comparing sequence motif enrichment before and after filtering. These analyses provide important support for the use of stoichiometry-based thresholds and demonstrate that filtering substantially improves specificity. However, even after filtering, a subset of detected sites remains outside the expected sequence context. It therefore remains unclear to what extent these non-canonical sites reflect genuine biology versus residual technical artifacts.

Second, claims regarding the ability of direct RNA sequencing to resolve modification patterns across different RNA variants are supported by very limited evidence. The conclusion that this approach provides superior isoform-level quantification relative to short-read methods is based largely on a single gene example. While this case is interesting, it does not establish how widespread or general this advantage is. A broader analysis indicating how many genes show isoform-specific modification patterns detectable by this method, and how often these are missed by the comparison approach, would be necessary to support a general claim.

Third, the biological interpretation of cell type-specific differences in modification levels remains underdeveloped. Although differences in modification stoichiometry are reported between fibroblasts and neuron-derived cells, the functional consequences of these differences are not addressed. It is unclear whether changes in modification levels are associated with differences in RNA abundance, stability, or translation. As a result, statements suggesting that these modifications fine-tune core cellular pathways are speculative and should either be supported with additional analyses or framed more cautiously.

Related to this point, differences in gene expression between the two cell types are a potential confounding factor. The pathway enrichment patterns presented appear biased toward particular functional categories, but without clear control for differential gene expression, it is difficult to determine whether the observed enrichment reflects cell type-specific regulation of RNA modification or simply differences in which genes are expressed. Clarifying how background gene sets were defined for these analyses would help readers interpret the results.
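The background-set concern can be illustrated with a small over-representation calculation. The sketch below is not the authors' enrichment analysis; it is a minimal one-sided hypergeometric test, and every number in it (gene counts, pathway size, number of hits) is hypothetical and chosen only to show how the same hit list can look more or less enriched depending on whether the background is the whole genome or only the genes expressed in the cell type.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_in_background, hits, pathway_hits):
    """One-sided hypergeometric p-value P(X >= pathway_hits).

    background_size       -- number of genes that could have been selected
    pathway_in_background -- pathway genes present in that background
    hits                  -- number of genes in the hit list
    pathway_hits          -- hit-list genes that belong to the pathway
    """
    max_k = min(pathway_in_background, hits)
    total = comb(background_size, hits)
    tail = sum(
        comb(pathway_in_background, k)
        * comb(background_size - pathway_in_background, hits - k)
        for k in range(pathway_hits, max_k + 1)
    )
    return tail / total


# Hypothetical numbers: 500 differentially modified genes, 15 of them in one pathway.
HITS, PATHWAY_HITS = 500, 15

# Background 1: all annotated genes (20,000), pathway has 200 members.
p_genome = hypergeom_enrichment_p(20_000, 200, HITS, PATHWAY_HITS)

# Background 2: only expressed genes (8,000), of which 150 are pathway members.
p_expressed = hypergeom_enrichment_p(8_000, 150, HITS, PATHWAY_HITS)

print(f"genome-wide background:     p = {p_genome:.2e}")
print(f"expressed-genes background: p = {p_expressed:.2e}")
```

Under these assumed numbers the expected pathway overlap rises from 5 genes (genome-wide background) to roughly 9 (expressed background), so the same observation of 15 genes is weaker evidence of enrichment once expression is taken into account.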

The manuscript also suggests broader differences in overall modification levels between cell types, but this is not validated using an independent global assay. An orthogonal measurement of total modification levels on polyadenylated RNA (for example, dot blot) would help place site-specific stoichiometry differences in a clearer biological context.

Finally, the effects of the METTL3 inhibitor on these cell types are not fully characterized. While changes in m6A modification patterns are reported following treatment, the manuscript does not address whether the treatment affects cell growth or viability.

Appraisal of conclusions and impact:

Overall, the study provides an informative technical assessment of direct RNA sequencing for modification detection and establishes clear conditions under which the method performs well. The evidence strongly supports conclusions related to technical benchmarking, reproducibility, and the importance of filtering and controls, particularly for m6A. In contrast, conclusions regarding isoform-specific regulation and cell type-specific biological roles of RNA modification are less well supported by the data currently presented, and would benefit from either additional analysis or more restrained interpretation.

The work is likely to have a meaningful impact as a practical reference for researchers using direct RNA sequencing, particularly by clarifying sources of false positives and the value of appropriate controls. With clearer limits placed on biological interpretation or more data presented in support of the biological interpretation, the study would serve as a valuable reference for users seeking to apply these technologies reliably.

Reviewer #2 (Public review):

Summary:

In this study, the authors aim to establish a calibrated framework for detecting RNA modifications using long-read sequencing and apply it to compare modification patterns between fibroblasts and neuron-like cells. The work combines long-read sequencing, in vitro transcribed controls, methyltransferase inhibition, and comparison to an orthogonal sequencing-based method in an attempt to derive filtering strategies that reduce false positive modification calls. The authors further apply this framework to explore differences in modification levels between the two cell types.

The resulting dataset may be of interest to researchers working on RNA modification detection using long-read sequencing technologies. Independent datasets across additional cellular systems can be useful for benchmarking computational methods and evaluating the behavior of modification detection models. However, the conceptual advance of the analytical framework presented here remains somewhat unclear, as many aspects of the analysis closely resemble strategies that have already been described in recent benchmarking studies.

Strengths:

A clear strength of the study is the generation of a relatively large long-read sequencing dataset together with several useful experimental controls, including in vitro transcribed RNA and pharmacological inhibition of the methyltransferase enzyme responsible for installing this modification. These controls are helpful for illustrating the challenges associated with distinguishing high-confidence modification sites from background signals. The inclusion of two different human cellular systems also provides an additional dataset that may be useful for benchmarking and cross-validation in the field. The study addresses a practically relevant question for the community, namely, how to reduce false positive calls in long-read sequencing-based RNA modification analyses.

Weaknesses:

The main weakness of the manuscript is its limited methodological novelty. Much of the analytical framework presented here closely follows benchmarking strategies that have already been described in recent studies of RNA modification detection using long-read sequencing. Several previous studies have evaluated modification-aware basecalling approaches, discussed the need for stringent filtering strategies, and compared long-read sequencing-based predictions with orthogonal mapping approaches. The manuscript would therefore benefit from a deeper engagement with the recent benchmarking literature and a clearer explanation of what conceptual or methodological advance the present study provides beyond these earlier analyses.

A second concern relates to the filtering strategy that forms the core of the proposed workflow. The manuscript applies several thresholds, including modification probability, stoichiometry, and read coverage cutoffs, but it is not clearly explained how these thresholds were determined. It remains unclear whether these cutoffs were derived from statistical calibration, empirical optimization using the presented dataset, or adopted from previous studies. Because the downstream conclusions depend strongly on these filtering choices, a clearer methodological justification would strengthen the work and help readers assess the robustness of the proposed framework.

The interpretation of the comparison between the two modification detection approaches also appears somewhat overstated. Differences between the methods are frequently interpreted as evidence that one approach produces large numbers of false positive calls, but the analyses presented do not fully exclude alternative explanations such as differences in sensitivity, sequencing depth, or methodological biases. A more cautious interpretation of these discrepancies would therefore be appropriate.

Some discussion points also appear speculative. In particular, certain interpretations propose mechanistic explanations without presenting analyses that would allow these possibilities to be distinguished. Such interpretations would benefit from either additional supporting analyses or more cautious phrasing.

From a methodological perspective, the statistical robustness of the thresholds used throughout the analysis could also be discussed in more detail. Given the relatively modest read coverage cutoff applied in the study, low stoichiometry estimates may be strongly influenced by sampling noise, and fixed stoichiometry thresholds may therefore not correspond to a consistent level of confidence across sites. In addition, the manuscript relies heavily on fixed modification probability cutoffs to define high-confidence calls, but it does not discuss whether these scores are statistically calibrated or how they relate to expected error rates. Neural network outputs are often not well-calibrated probabilities, and interpreting these values as direct confidence estimates can therefore be problematic. Finally, modification detection models trained on known modification sites may capture sequence-context patterns present in the training data, meaning that motif enrichment or positional distributions along transcripts may partly reflect model biases rather than purely biological signals. A brief discussion of these limitations would help readers better interpret the robustness of the proposed filtering strategy and the downstream biological conclusions.
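The sampling-noise point can be made concrete with a binomial confidence interval. The sketch below is an illustration rather than part of the manuscript's workflow: it assumes that reads at a site are independent draws, uses the Wilson score interval as one reasonable choice of interval, and applies it to hypothetical coverage values to show how wide the uncertainty around a 10% estimate is at low depth.

```python
import math


def wilson_interval(modified_reads, total_reads, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p = modified_reads / total_reads
    denom = 1 + z**2 / total_reads
    centre = (p + z**2 / (2 * total_reads)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / total_reads + z**2 / (4 * total_reads**2))
        / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical sites that all have an observed modified fraction of 10%
# but differ in read coverage.
for coverage in (20, 100, 1000):
    modified = round(0.10 * coverage)
    low, high = wilson_interval(modified, coverage)
    print(f"coverage {coverage:>4}: {modified}/{coverage} modified, "
          f"95% interval {low:.3f} to {high:.3f}")
```

At the hypothetical coverage of 20 reads the interval spans roughly 3% to 30%, so a fixed stoichiometry cutoff does not correspond to the same level of confidence as it does at 1,000 reads.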

Overall, while the dataset may be of interest to the community, the extent to which the study advances current methodological understanding beyond recent benchmarking efforts remains limited.

Minor comments:

The discussion of the "DRACH" versus "all-context" outputs would benefit from greater technical precision. The statement that the number of sites within DRACH motifs identified by the all-context approach was nearly identical to the number reported by the DRACH model may suggest that these outputs derive from fundamentally different predictive models. As I understand it, the underlying neural network is the same, whereas the distinction lies primarily in the classification context. Clarifying this explicitly in the manuscript would improve interpretability and avoid potential confusion for readers.

The manuscript compares results obtained with different basecalling and modification settings but refers primarily to Dorado software versions. This may be misleading, as software version and model version are not necessarily equivalent. Different basecalling or modification models can be used with the same software release, and newer software versions may still use older models. For clarity and reproducibility, the authors should report the exact basecalling and modification model names used in the analyses rather than referring only to the Dorado software version.

Reviewer #3 (Public review):

In this study, the authors aim to establish a calibrated framework for identifying RNA chemical marks from direct RNA sequencing data using a modification-aware basecalling workflow, with a particular focus on N6-methyladenosine. By combining native RNA sequencing with an unmodified control transcriptome, enzyme inhibition, comparison across multiple software versions, and orthogonal validation using an independent mapping approach, the authors seek to define a best-practice pipeline for reducing false-positive calls and improving confidence in quantitative interpretation across cell types.

A major strength of the work is the rigor of the benchmarking strategy. In particular, the inclusion of an unmodified control transcriptome is both important and useful, and the study provides compelling evidence that this control remains necessary for robust interpretation, despite being omitted in many current workflows. The comparison across software versions and the matched analysis with an independent sequencing-based approach also substantially strengthen the evidence presented. The work therefore makes a valuable contribution to the community by offering a more stringent analytical framework that will likely be broadly useful to groups applying native RNA sequencing to study RNA chemical marks.

The evidence supporting the main conclusions is solid overall. The authors convincingly show that stringent filtering substantially reduces false-positive calls and improves agreement with orthogonal approaches, particularly at highly modified sites. The observation that many sites are conserved across cell types, while showing differences in relative modification levels, is also supported by the presented analyses.

At the same time, several conceptual issues limit the strength of some downstream interpretations. Most importantly, the manuscript repeatedly refers to the reported values as "stoichiometry," whereas the underlying software output is more appropriately interpreted as a statistical estimate of the proportion of aligned reads classified as modified. This distinction is important because the conclusions regarding cell-type differences rely on quantitative comparisons of these values. In addition, the current calling framework depends on successful canonical base assignment before modification calling, which raises an important limitation: sites with the strongest signal deviations may be underrepresented if they are more likely to be miscalled during basecalling. This issue may be especially relevant for RNA marks that induce stronger mismatch signatures than N6-methyladenosine and should be more explicitly discussed.
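The distinction between stoichiometry and the proportion of reads classified as modified can be sketched as follows. This is a simplified illustration, not a description of the software used in the manuscript: the function name, the per-read probabilities, and the thresholds are all hypothetical. It assumes each aligned read with a canonical base call at the site carries a modification probability, and that reads which pass neither the "modified" nor the "canonical" confidence threshold are dropped from the denominator.

```python
def fraction_modified(mod_probs, threshold=0.9):
    """Return (fraction of confidently called reads that are modified, reads called).

    mod_probs -- per-read modification probabilities at one site (hypothetical input)
    threshold -- minimum confidence (>= 0.5) required to count a read in either class
    """
    modified = 0
    canonical = 0
    for p in mod_probs:
        if p >= threshold:
            modified += 1
        elif p <= 1 - threshold:
            canonical += 1
        # Reads between the two cutoffs are ambiguous and are not counted.
    called = modified + canonical
    if called == 0:
        return None, 0
    return modified / called, called


# Hypothetical per-read probabilities at a single site.
probs = [0.98, 0.95, 0.60, 0.50, 0.05, 0.02, 0.01, 0.03]

for cutoff in (0.9, 0.55):
    frac, n = fraction_modified(probs, threshold=cutoff)
    print(f"threshold {cutoff}: {frac:.3f} of {n} confidently called reads")
```

With these made-up values the reported fraction changes from 2/6 to 3/7 when only the threshold changes, and reads that were miscalled at the site during basecalling never enter the list at all, which is why the value is better read as an estimate conditional on the calling procedure than as a direct measurement of stoichiometry.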

Overall, the authors largely achieve their primary aim of establishing a more rigorous and broadly applicable analytical framework for direct RNA sequencing-based modification detection. The work is likely to have a meaningful impact on the field, particularly by reinforcing the importance of appropriate negative controls and benchmarking standards. With clearer framing of the quantitative outputs and explicit discussion of current software limitations, this study will serve as a highly useful resource for the community.
