Relative rate change for C to A, G to A, A to T, G to C, T to G and C to T mutations under the 11 degrees of non-reversibility, alongside the maintained rates for A to C, A to G, T to A, C to G, G to T and T to C mutations.

Ternary plots illustrating the relative fit of the NREV12, NREV6, and GTR nucleotide substitution models based on weighted AIC scores for 30 dsDNA, 31 dsRNA, 33 ssDNA, and 47 ssRNA virus nucleotide sequence datasets.

These plots were produced using the Akaike weights function with an overlaid density function (implemented in the qpcR package for R; Ritz & Spiess, 2008) to indicate point densities. Each model is represented by a corner of the triangle, and each circle represents the relative fit of the three models to a single nucleotide sequence dataset. The sides of the triangle represent model support axes ranging from 0 to 100%, with the position of a circle in relation to each side indicating the probability that each model best describes the nucleotide sequence dataset represented by that point. Strong red colours represent a very high density of nucleotide sequence datasets that favour a particular model, whereas bluer colours indicate a lower, but still substantial, density of datasets that favour a model.
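The coordinates of each circle can be illustrated with a minimal sketch of the standard Akaike weight calculation, in which each model's weight is proportional to exp(-0.5 × ΔAIC). This is not the qpcR implementation used for the plots; the function name `akaike_weights` and the AIC values below are hypothetical and serve only as an illustration.

```python
import math

def akaike_weights(aic_scores):
    """Convert a list of AIC scores into Akaike weights that sum to 1."""
    best = min(aic_scores)
    # Relative likelihood of each model given its AIC difference from the best model
    rel = [math.exp(-0.5 * (a - best)) for a in aic_scores]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AIC scores for one dataset (illustrative values only)
models = ["NREV12", "NREV6", "GTR"]
scores = [100.0, 102.0, 110.0]
weights = dict(zip(models, akaike_weights(scores)))

for name, w in weights.items():
    print(f"{name}: {100 * w:.1f}% support")
```

The three weights sum to 1, so each dataset maps to a single point inside the triangle, lying closest to the corner of the model with the largest weight.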

Weighted Robinson-Foulds distances between inferred and true phylogenetic trees for datasets simulated with different degrees of nucleotide substitution non-reversibility and different average pairwise sequence identities (APIs) (~75%, ~80%, ~85%, ~90% and ~95%).

“ns” above a pair of box and whisker plots indicates a paired t-test adjusted p-value of greater than or equal to 0.05, and “*” indicates a paired t-test adjusted p-value of <0.05.

AIC Scores and LRT results for double-stranded DNA virus datasets.

The lowest AIC scores, indicating the best-fitting models, are in bold.
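As a rough illustration of how AIC scores and LRT results of this kind are conventionally computed, the sketch below uses the textbook definitions AIC = 2k - 2lnL and LRT statistic = 2(lnL_alt - lnL_null). The log-likelihoods, parameter counts and degrees of freedom are hypothetical values, not results from the study, and the p-value helper is a closed-form shortcut that only holds for an even number of degrees of freedom.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood

def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio test statistic for nested models."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^j / j! for j = 0 .. df/2 - 1
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Hypothetical values for a simpler (null) and a more general (alternative) model
lnl_null, k_null = -1000.0, 8
lnl_alt, k_alt = -990.0, 14

stat = lrt_statistic(lnl_null, lnl_alt)
p_value = chi2_sf_even_df(stat, k_alt - k_null)
print(aic(lnl_null, k_null), aic(lnl_alt, k_alt), stat, p_value)
```

With these illustrative numbers the more general model has the lower AIC and the LRT rejects the simpler model at the 0.05 level.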

AIC Scores and LRT results for double-stranded RNA datasets. The lowest AIC scores, indicating the best-fitting models, are in bold.

AIC Scores and LRT results for single-stranded DNA datasets.

The lowest AIC scores, indicating the best-fitting models, are in bold.

AIC Scores and LRT results for single-stranded RNA datasets.

The lowest AIC scores, indicating the best-fitting models, are in bold.

Phylogenetic tree inferred from an alignment of real sequences (Avian Leukosis virus) that was used to simulate datasets with DNRs varying from 0 to 20.

The Avian Leukosis virus alignment had an average pairwise sequence identity (API) of ~90%, and the branches of this tree were scaled to produce four other trees reflecting branch tip sequences with APIs of ~75%, ~80%, ~85% and ~95%.

Summary of the viral genome/genome component datasets used in the study.