(A) Species for which susceptibility to infection and shedding of infectious virus have been assessed. Colours indicate the best available evidence, while symbols show the viruses involved. Blank …
(A) Tanglegram linking species across the two phylogenies, with sarbecovirus-susceptible species indicated in orange and putatively non-susceptible species in blue. Grey lines indicate species with …
Bars represent proportions from leave-one-out cross-validation, with error bars indicating 95% binomial confidence intervals. Dashed vertical lines indicate the performance expected from a null …
Bars represent proportions from leave-one-out cross-validation, with error bars indicating 95% binomial confidence intervals. Dashed vertical lines indicate the performance expected from a null …
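The 95% binomial confidence intervals on cross-validated proportions mentioned in these captions can be illustrated with a short sketch. The captions do not say which interval method was used; the Wilson score interval below is an assumption chosen for illustration, and the function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    Assumption: the source only says "95% binomial confidence intervals";
    the Wilson method is used here purely as an illustration.
    z = 1.959964 is the normal quantile for a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Example: 8 of 10 held-out species classified correctly (made-up numbers).
low, high = wilson_interval(8, 10)
```

With small sample sizes the interval is wide and asymmetric around the observed proportion, which is why it is useful to show it alongside each bar.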
(A) Performance of different models trained on all available data. Holdout predictions for individual species were grouped by the best available evidence of susceptibility or non-susceptibility to …
The model was trained on all hosts, with performance evaluated using leave-one-out cross-validation. In the first panel, bars represent sample sizes for susceptible species (orange) and …
The model was trained on all hosts, with performance evaluated using leave-one-out cross-validation. In the first panel, bars represent sample sizes for susceptible species (orange) and …
Positions are stratified by whether or not they form part of any features retained by the combined ACE2-based model (i.e., the model trained with access to all ACE2 representations, left panel) or …
(A) Importance of individual features in determining final predictions. Importance was measured using mean absolute SHAP values across all host species in the training data. An inset shows the same …
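As a minimal sketch of the importance measure described in panel (A), the mean absolute SHAP value of a feature is the average of the absolute per-species SHAP values for that feature. The function name `mean_abs_shap`, the input layout (one row per host species, one column per feature) and the example numbers are assumptions for illustration, not the authors' code.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value.

    shap_rows: list of rows, one per host species, each holding one SHAP
    value per feature (hypothetical layout, assumed for this sketch).
    Returns (feature, importance) pairs, most important first.
    """
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    n_features = len(feature_names)
    totals = [0.0] * n_features
    for row in shap_rows:
        if len(row) != n_features:
            raise ValueError("row length does not match feature_names")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    importance = [total / len(shap_rows) for total in totals]
    return sorted(zip(feature_names, importance), key=lambda pair: -pair[1])

# Made-up SHAP values for two species and two hypothetical features.
ranking = mean_abs_shap([[0.2, -0.1], [-0.4, 0.1]], ["site_a", "site_b"])
```

Taking the absolute value before averaging prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.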
Both models represent individual ACE2 sites as a distance between the observed amino acid and the most common amino acid at that site among known susceptible species. Spike-binding sites were …
Excluding rhinolophid bats has no effect on model performance, suggesting that any potentially different evolutionary pressures on the ACE2 sequences of putative sarbecovirus reservoirs compared to …
(A) Overall accuracy, based on all species in our data for which predictions were available in each study. Accuracy measurements are arranged by increasing sample size, also indicated in colour. A …
(A) Heatmap comparing overlapping quantitative predictions for the same species across studies, where susceptibility is known. Quantitative scores from all studies were re-scaled to lie in [0, 1] to …
(A–B) Number of species available for prediction by (A) ACE2-based models (limited by ACE2 availability) and (B) phylogeny-based models (nearly all mammals, in this figure limited primarily by the …
Mammal species are arranged by model output, with higher scores indicating species predicted as more likely to be susceptible to sarbecovirus infection. A dashed line shows the optimised cutoff …
(A) Proportion of species either observed or predicted to be susceptible by different models. Error bars indicate 95% binomial confidence intervals. A time-scaled phylogeny illustrates divergence …
All taxonomic orders within the Boreoeutheria clade containing at least one observed or predicted susceptible species are shown. Error bars indicate 95% binomial confidence intervals. The Chiroptera …
All taxonomic orders within the Boreoeutheria clade containing at least one species predicted as susceptible are shown. Only bats and rodents show variation, likely due to family-level differences …
Final data on susceptibility to, and shedding of, sarbecoviruses, along with the accession numbers for angiotensin-converting enzyme 2 (ACE2) amino acid sequences used to represent each species.
These data were used to train all models presented.
Predictions from the phylogeny-only model.
Predictions are separated into four categories: (a) species in the training data, with each prediction produced by withholding that species from model training (i.e., the predictions informing the leave-one-out cross-validation statistics presented); (b) susceptible species recognised through natural infection after data collection for this study had ended; (c) all other mammals available in the TimeTree phylogeny; and (d) all other birds in the TimeTree phylogeny. Across all tables, ‘cutoff’ refers to the optimised value beyond which predicted probabilities from a given model are considered to indicate susceptible species (labelled as ‘True’ in the ‘prediction’ column).
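The relationship between the ‘cutoff’ and ‘prediction’ columns can be sketched as follows. The function name `label_predictions`, the species names and all numeric values are hypothetical, and treating "beyond" as strictly greater than the cutoff is an assumption.

```python
def label_predictions(probabilities, cutoff):
    """Convert predicted probabilities into True/False susceptibility labels.

    probabilities: mapping of species name -> predicted probability.
    cutoff: the optimised threshold for the model in question.
    Assumption: a probability must strictly exceed the cutoff to be
    labelled True; the source does not state how ties are handled.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must lie in [0, 1]")
    return {species: prob > cutoff for species, prob in probabilities.items()}

# Made-up probabilities and cutoff, for illustration only.
example = {"Species A": 0.82, "Species B": 0.35, "Species C": 0.50}
labels = label_predictions(example, cutoff=0.5)
```

Because each model has its own optimised cutoff, the same probability can map to different labels in different tables, so labels should be compared across models rather than raw scores.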