Secondary structure of the SARS-CoV-2 genome is predictive of nucleotide substitution frequency

Zach Hensel

doi:10.7554/eLife.98102.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Richard Neher
University of Basel, Basel, Switzerland
Senior Editor
John Schoggins
The University of Texas Southwestern Medical Center, Dallas, United States of America

Reviewer #1 (Public Review):

Summary:

This very short paper shows a greater likelihood of C->U substitutions at sites predicted to be unpaired in the SARS-CoV-2 RNA genome, using previously published observational data on mutation frequencies in SARS-CoV-2 (Bloom and Neher, 2023).
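The paired-versus-unpaired comparison can be sketched as follows. This is a minimal illustration. It assumes that the structure model is supplied as a dot-bracket string aligned to the genome and that per-site C->U counts have already been tabulated. The function names and toy data are hypothetical and are not taken from the paper's pipeline. Pseudoknot brackets (e.g. `[` `]`) are not handled and would be treated as unpaired.

```python
def pairing_mask(dot_bracket: str) -> list[bool]:
    """Return True for each position that is base-paired in a dot-bracket string.

    Assumption: only '(' and ')' denote pairs; every other character is treated
    as unpaired, so pseudoknot brackets such as '[' ']' are NOT handled.
    """
    stack: list[int] = []
    paired = [False] * len(dot_bracket)
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            paired[i] = paired[j] = True
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return paired


def mean_counts_by_pairing(seq: str, dot_bracket: str, counts: list[float],
                           ref_base: str = "C") -> dict[str, float]:
    """Mean substitution count at ref_base sites, split by predicted pairing status."""
    if not (len(seq) == len(dot_bracket) == len(counts)):
        raise ValueError("sequence, structure and counts must be the same length")
    groups: dict[str, list[float]] = {"paired": [], "unpaired": []}
    for base, is_paired, n in zip(seq, pairing_mask(dot_bracket), counts):
        if base == ref_base:
            groups["paired" if is_paired else "unpaired"].append(n)
    # NaN signals that no ref_base site fell into a group
    return {k: (sum(v) / len(v) if v else float("nan")) for k, v in groups.items()}


# Toy example (hypothetical data): two paired C sites, one unpaired C site
print(mean_counts_by_pairing("CCAUCGG", "((...))", [1, 3, 0, 0, 10, 0, 0]))
# prints {'paired': 2.0, 'unpaired': 10.0}
```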

General comments:

A preference for unpaired bases as a target for APOBEC-induced mutations has been demonstrated previously in functional studies, so the finding is not entirely surprising. This, of course, assumes that A3A or another APOBEC enzyme is actually the cause of the majority of C->U changes observed in SARS-CoV-2 sequences.

I'm not sure why the author did not use the published mutation frequency data to investigate other potential influences on editing frequencies, such as 5' and 3' base contexts. The analysis does not provide any insight into the potential mechanisms underlying the greater frequency of C->U (or G->U) substitutions in the SARS-CoV-2 genome.
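The context analysis suggested here could start from a simple tally. The sketch below is minimal and assumes two hypothetical inputs: a genome sequence string and a list of per-site C->U counts aligned to it. Grouping by the flanking 5' and 3' bases would show, for example, whether counts are elevated at 5'-UC sites, a context commonly associated with APOBEC3A editing.

```python
from collections import defaultdict


def mean_counts_by_context(seq: str, counts: list[float],
                           ref_base: str = "C") -> dict[tuple[str, str], float]:
    """Mean substitution count at ref_base sites, grouped by (5' base, 3' base)."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    n_sites: dict[tuple[str, str], int] = defaultdict(int)
    # Skip the first and last positions, which lack a full 5'/3' context
    for i in range(1, len(seq) - 1):
        if seq[i] == ref_base:
            context = (seq[i - 1], seq[i + 1])
            totals[context] += counts[i]
            n_sites[context] += 1
    return {ctx: totals[ctx] / n_sites[ctx] for ctx in totals}


# Toy example (hypothetical data): two UCG sites and one ACA site
print(mean_counts_by_context("UCGACAUCG", [0, 4, 0, 0, 2, 0, 0, 6, 0]))
# prints {('U', 'G'): 5.0, ('A', 'A'): 2.0}
```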

https://doi.org/10.7554/eLife.98102.1.sa1

Reviewer #2 (Public Review):

Hensel investigated the implications of SARS-CoV-2 RNA secondary structure for synonymous and nonsynonymous mutation frequencies. The analysis integrated estimates of mutational fitness generated by Bloom and Neher (from publicly available patient sequences) with a population-averaged model of RNA base-pairing from Lan et al. (derived from DMS mutational profiling with sequencing, DMS-MaPseq).

The results show that base-pairing limits the frequency of some synonymous substitutions (including the most common, C->T), but not all: G->A and A->G substitutions seem unaffected by base-pairing.
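One way to summarize the per-substitution comparison is a ratio of mean counts at unpaired versus paired sites. The sketch below assumes hypothetical input records of the form (substitution type, is_paired, count). A ratio well above 1 would indicate that pairing suppresses that substitution. A ratio near 1 would match the pattern described for G->A and A->G. The numbers are toy values, not results from the paper.

```python
from collections import defaultdict
from statistics import mean


def unpaired_to_paired_ratio(records):
    """Ratio of mean count at unpaired vs. paired sites for each substitution type.

    records: iterable of (substitution_type, is_paired, count) tuples.
    Types lacking sites in either group, or with a zero paired mean, are omitted.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for sub_type, is_paired, count in records:
        groups[sub_type][bool(is_paired)].append(count)
    ratios = {}
    for sub_type, g in groups.items():
        if g[True] and g[False] and mean(g[True]) > 0:
            ratios[sub_type] = mean(g[False]) / mean(g[True])
    return ratios


# Toy records (hypothetical): C>T is suppressed at paired sites, G>A is not
toy = [("C>T", True, 2), ("C>T", True, 4), ("C>T", False, 12),
       ("G>A", True, 5), ("G>A", False, 5)]
print(unpaired_to_paired_ratio(toy))
# prints {'C>T': 4.0, 'G>A': 1.0}
```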

The author then addressed nonsynonymous CT substitutions at base-paired positions. While there is still a generally higher estimated mutational fitness at unpaired positions, they propose a coarse adjustment to disentangle base-pairing from inherent mutational fitness at a given position. This adjustment reveals that nonsynonymous substitutions at base-paired positions, which define major variants, have higher mutational fitness.

Overall, this manuscript highlights the importance of considering RNA secondary structure in viral evolution studies.

The conclusions of this work are generally well supported by the data presented. In particular, the author acknowledges most limitations of the analyses and addresses them. Even though no new sequencing results were generated, the author used available data derived from roughly seven million sequenced patient samples. Finally, the author discusses ways to improve the currently available models.

There are a number of limitations of this work that should be highlighted, specifically regarding the secondary structure data used in this paper. The Lan et al. dataset was generated at a multiplicity of infection (MOI) of 0.05 and 24 hours post-infection (h.p.i.). At such a low MOI and late time point, viral replication is not synchronous, and sequencing artifacts might arise from cell debris and viral RNA degradation, affecting the population-averaged results. In addition, the nonsynonymous base-paired positions in Figure 2 have relatively high population-averaged DMS reactivity, which suggests that those positions are dynamic. Therefore, the proposed adjustment could result in an incorrect estimate of their inherent mutational fitness.
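Dynamic positions of this kind could be identified by flagging sites that the structure model calls paired but that still show high DMS reactivity. The sketch below is a minimal illustration. It assumes normalized per-site reactivities aligned to the sequence and uses a hypothetical threshold. Only A and C are considered, because standard DMS-MaPseq mainly reports on unpaired A and C.

```python
def flag_dynamic_paired_sites(seq: str, paired: list[bool], reactivity: list[float],
                              threshold: float) -> list[int]:
    """Return 0-based indices of A/C sites modeled as paired but more reactive than threshold."""
    return [
        i
        for i, (base, is_paired, r) in enumerate(zip(seq, paired, reactivity))
        if is_paired and base in {"A", "C"} and r > threshold
    ]


# Toy example (hypothetical data and threshold)
seq = "ACGUAC"
paired = [True, True, True, False, True, False]
reactivity = [0.9, 0.1, 0.8, 0.7, 0.05, 0.6]
print(flag_dynamic_paired_sites(seq, paired, reactivity, threshold=0.5))  # prints [0]
```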

Additionally, as with all such RNA probing experiments in cells, it remains difficult to distinguish low DMS/SHAPE reactivity caused by base-pairing from low reactivity caused by reduced RNA accessibility (e.g., from protein binding).

This work presents clear methods and an easy-to-access bioinformatic pipeline that can be applied to other RNA viruses. Notably, it can be readily used with existing datasets. Finally, this study raises novel mechanistic questions about why mutational fitness is not correlated with secondary structure in the same way for every substitution.

Overall, this work highlights the importance of studying mutational fitness beyond an immune-evasion perspective. At the same time, it adds to our understanding of the intrinsic viral constraints on immune evasion.

https://doi.org/10.7554/eLife.98102.1.sa0
