Variation in the ACE2 receptor has limited utility for SARS-CoV-2 host prediction

  1. Nardus Mollentze (corresponding author)
  2. Deborah Keen
  3. Uuriintuya Munkhbayar
  4. Roman Biek
  5. Daniel G Streicker
  1. School of Biodiversity, One Health & Veterinary Medicine, College of Medical, Veterinary, and Life Sciences, University of Glasgow, United Kingdom
  2. Medical Research Council – University of Glasgow Centre for Virus Research, United Kingdom
5 figures and 3 additional files

Figures

Figure 1 with 1 supplement
Phylogenetic clustering of sarbecovirus host susceptibility and shedding data.

(A) Species for which susceptibility to infection and shedding of infectious virus have been assessed. Colours indicate the best available evidence, while symbols show the viruses involved. Blank …

Figure 1—figure supplement 1
Congruence between a phylogeny reconstructed from ACE2 amino acid sequences and a consensus time-scaled phylogeny for mammals and birds obtained from the TimeTree database.

(A) Tanglegram linking species across the two phylogenies, with sarbecovirus-susceptible species indicated in orange and putatively non-susceptible species in blue. Grey lines indicate species with …

Figure 2 with 4 supplements
Ability of models trained on different representations of either ACE2 sequences or a time-scaled amniote phylogeny to predict host susceptibility to sarbecovirus infection.

Bars represent proportions from leave-one-out cross-validation, with error bars indicating 95% binomial confidence intervals. Dashed vertical lines indicate the performance expected from a null …
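As an illustration of the quantities this caption describes, the following is a minimal sketch of leave-one-out cross-validation with a 95% binomial confidence interval on the resulting proportion. The caption does not say which binomial interval was used; the Wilson score interval here is an assumption. `majority_class` is a hypothetical placeholder model, not one of the models in the study.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (95% by default).

    Assumption: the source does not name the interval method used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def leave_one_out_correct(features, labels, fit_predict):
    """Count correct holdout predictions, withholding each species in turn."""
    correct = 0
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct


def majority_class(train_x, train_y, test_x):
    """Hypothetical placeholder model: predict the most common training label."""
    return max(set(train_y), key=train_y.count)


if __name__ == "__main__":
    # Toy data: four species, one feature each (values are illustrative only).
    features = [[0.1], [0.2], [0.3], [0.9]]
    labels = [1, 1, 1, 0]
    correct = leave_one_out_correct(features, labels, majority_class)
    low, high = wilson_interval(correct, len(labels))
    print(f"accuracy = {correct / len(labels):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Any function with the same `fit_predict(train_x, train_y, test_x)` shape can be swapped in for the placeholder model.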

Figure 2—figure supplement 1
Ability of models trained on different representations of either ACE2 sequences or a time-scaled amniote phylogeny to predict shedding of infectious virus after sarbecovirus infection.

Bars represent proportions from leave-one-out cross-validation, with error bars indicating 95% binomial confidence intervals. Dashed vertical lines indicate the performance expected from a null …

Figure 2—figure supplement 2
Influence of different sources of sarbecovirus susceptibility data on prediction accuracy.

(A) Performance of different models trained on all available data. Holdout predictions for individual species were grouped by the best available evidence of susceptibility or non-susceptibility to …

Figure 2—figure supplement 3
Performance of a model trained with all ACE2 representations on hosts linked to sarbecoviruses not known to use ACE2.

The model was trained on all hosts, with performance evaluated using leave-one-out cross-validation. In the first panel, bars represent sample sizes for susceptible species (orange) and …

Figure 2—figure supplement 4
Performance of a model trained with phylogenetic eigenvectors on hosts linked to sarbecoviruses not known to use ACE2.

The model was trained on all hosts, with performance evaluated using leave-one-out cross-validation. In the first panel, bars represent sample sizes for susceptible species (orange) and …

Figure 3 with 3 supplements
Phylogenetic informativeness of all ACE2 amino acid positions available for selection in the model.

Positions are stratified by whether or not they form part of any features retained by the combined ACE2-based model (i.e., the model trained with access to all ACE2 representations, left panel) or …

Figure 3—figure supplement 1
Features retained and used by the combined ACE2-based model for predicting susceptibility to sarbecovirus infection.

(A) Importance of individual features in determining final predictions. Importance was measured using mean absolute SHAP values across all host species in the training data. An inset shows the same …

Figure 3—figure supplement 2
Model performance when training the best-performing non-ensemble ACE2-only model (‘AA consensus distance’) either with access to all sites (as in Figure 2) or with representations of known SARS-CoV-2 spike-binding sites only.

Both models represent individual ACE2 sites as a distance between the observed amino acid and the most common amino acid at that site among known susceptible species. Spike-binding sites were …
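The consensus-distance representation described in this caption can be sketched as follows. This is a minimal illustration, assuming the sequences are already aligned to equal length. The amino acid distance measure is not specified in the caption, so a simple 0/1 mismatch is used as a stand-in, and all function names are hypothetical.

```python
from collections import Counter


def consensus_residues(susceptible_seqs):
    """Most common residue at each aligned position among susceptible species."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*susceptible_seqs)]


def mismatch(a, b):
    """Stand-in distance: 0.0 if residues match, 1.0 otherwise (an assumption)."""
    return float(a != b)


def consensus_distance_features(seq, consensus, distance=mismatch):
    """One feature per site: distance from the observed residue to the consensus."""
    return [distance(observed, common) for observed, common in zip(seq, consensus)]


if __name__ == "__main__":
    # Toy aligned sequences (illustrative only, not real ACE2 data).
    susceptible = ["MKT", "MKS", "MRT"]
    consensus = consensus_residues(susceptible)
    print(consensus)
    print(consensus_distance_features("MRS", consensus))
```

Restricting the model to spike-binding sites would amount to keeping only the feature columns at those positions. A physicochemical distance could replace `mismatch` without changing the rest of the sketch.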

Figure 3—figure supplement 3
Model performance after training with and without data from rhinolophid bats.

Excluding rhinolophid bats has no effect on model performance, suggesting that any potentially different evolutionary pressures on the ACE2 sequences of putative sarbecovirus reservoirs compared to …

Figure 4 with 1 supplement
Performance of existing heuristics on our susceptibility data.

(A) Overall accuracy, based on all species in our data for which predictions were available in each study. Accuracy measurements are arranged by increasing sample size, also indicated in colour. A …

Figure 4—figure supplement 1
Comparison of predictions across studies and models.

(A) Heatmap comparing overlapping quantitative predictions for the same species across studies, where susceptibility is known. Quantitative scores from all studies were re-scaled to lie in [0, 1] to …

Figure 5 with 4 supplements
Distribution of wild terrestrial mammals predicted as susceptible depends on input data and model choice.

(A–B) Number of species available for prediction by (A) ACE2-based models (limited by ACE2 availability) and (B) phylogeny-based models (nearly all mammals, in this figure limited primarily by the …

Figure 5—figure supplement 1
Estimating the value of quantitative susceptibility predictions from the phylogeny-only model for guiding surveillance.

Mammal species are arranged by model output, with higher scores indicating species predicted as more likely to be susceptible to sarbecovirus infection. A dashed line shows the optimised cutoff …

Figure 5—figure supplement 2
Species observed and predicted as susceptible to sarbecovirus infection, aggregated by taxonomic order.

(A) Proportion of species either observed or predicted to be susceptible by different models. Error bars indicate 95% binomial confidence intervals. A time-scaled phylogeny illustrates divergence …

Figure 5—figure supplement 3
Proportion of species observed or predicted to be susceptible to sarbecovirus infection in boreoeutherian families.

All taxonomic orders within the Boreoeutheria clade containing at least one observed or predicted susceptible species are shown. Error bars indicate 95% binomial confidence intervals. The Chiroptera …

Figure 5—figure supplement 4
Distribution of wild terrestrial mammals predicted as susceptible by the phylogeny-only model, separated by taxonomic order.

All taxonomic orders within the Boreoeutheria clade containing at least one species predicted as susceptible are shown. Only bats and rodents show variation, likely due to family-level differences …

Additional files

Supplementary file 1

Final data on susceptibility to, and shedding of, sarbecoviruses, along with the accession numbers for angiotensin-converting enzyme 2 (ACE2) amino acid sequences used to represent each species.

These data were used to train all models presented.

https://cdn.elifesciences.org/articles/80329/elife-80329-supp1-v1.xlsx

Supplementary file 2

Predictions from the phylogeny-only model.

Predictions are separated into four categories, reflecting predictions for (a) species in the training data, produced by withholding each species in turn from model training (i.e., predictions informing the leave-one-out cross-validation statistics presented), (b) susceptible species recognised through natural infection after data collection for this study had ended, (c) all other mammals available in the TimeTree phylogeny, and (d) all other birds in the TimeTree phylogeny. Across all tables, ‘cutoff’ refers to the optimised value beyond which predicted probabilities from a given model are considered to indicate susceptible species (labelled as ‘True’ in the ‘prediction’ column).

https://cdn.elifesciences.org/articles/80329/elife-80329-supp2-v1.xlsx
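
The role of the ‘cutoff’ column described for Supplementary file 2 can be sketched as follows. This is a minimal illustration, not the study's procedure. Treating "beyond" as strictly greater than the cutoff is an assumption, as is choosing the cutoff by maximising balanced accuracy, since the description does not state the optimisation criterion. All names and values are hypothetical.

```python
def apply_cutoff(probabilities, cutoff):
    """Label a species as predicted susceptible (True) if its score exceeds the cutoff."""
    return {species: p > cutoff for species, p in probabilities.items()}


def balanced_accuracy(truth, predicted):
    """Mean of sensitivity and specificity; needs at least one species of each class."""
    positives = [s for s in truth if truth[s]]
    negatives = [s for s in truth if not truth[s]]
    sensitivity = sum(predicted[s] for s in positives) / len(positives)
    specificity = sum(not predicted[s] for s in negatives) / len(negatives)
    return (sensitivity + specificity) / 2


def choose_cutoff(probabilities, truth):
    """Pick the candidate cutoff with the highest balanced accuracy (assumed criterion)."""
    candidates = sorted(set(probabilities.values()))
    return max(
        candidates,
        key=lambda c: balanced_accuracy(truth, apply_cutoff(probabilities, c)),
    )


if __name__ == "__main__":
    # Toy scores and labels (illustrative only).
    scores = {"species_a": 0.9, "species_b": 0.7, "species_c": 0.4, "species_d": 0.2}
    observed = {"species_a": True, "species_b": True, "species_c": False, "species_d": False}
    cutoff = choose_cutoff(scores, observed)
    print(cutoff, apply_cutoff(scores, cutoff))
```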

MDAR checklist

https://cdn.elifesciences.org/articles/80329/elife-80329-mdarchecklist1-v1.pdf
