A framework for testing scenarios of virus-host evolution, illustrated with the example of Coronaviridae and their mammalian hosts:

In (A), a scenario of ancient origination and codiversification; in (B) a scenario of recent origination and diversification by preferential host switches; and in (C) a scenario of independent evolution. For each scenario, we indicate the associated predictions in the grey boxes. Contrary to scenario C, both scenarios A and B are expected to generate a cophylogenetic signal, i.e. closely-related coronaviruses tend to infect closely-related mammals, resulting in significant reconciliations when using topology-based probabilistic cophylogenetic methods, such as the undated version of ALE, Jane, or eMPRess. However, we expect scenario B to be distinguishable from scenario A in terms of the time consistency of host-switching events. Under scenario B, cophylogenetic methods wrongly estimate a combination of cospeciations and “back-in-time” host switches (see Methods & Results). We also expect different biogeographic patterns under the different scenarios, as illustrated by the maps, where the color gradient represents diversity levels (red: high diversity, grey: low diversity).

Species-level relationships among coronaviruses and their associated mammalian hosts.

A consensus Maximum Clade Credibility phylogenetic tree of coronaviruses is shown on the left. The tree was constructed with BEAST2 based on 150-aa palmprint amino acid sequences of the RdRp gene. sOTUs of Coronaviridae followed the definition of the Serratus project. The putative location of four genera of coronaviruses, Beta, Gamma, Delta, and Alphacoronaviruses, is shown. Bar scale is in units of aa substitution. On the right, a barplot gives the number of total mammalian host species and the number of host species by main mammalian order. Ancestral states on the left were obtained for illustrative purposes with the make.simmap function of the phytools R package (Revell 2012). Mammal silhouettes taken from open-to-use sources in phylopic.org, detailed credits given in SI Appendix Table S6.

A network visualization of mammal-coronavirus interactions reveals the presence of phylogenetic signal, the isolation of bats, and the centrality of humans.

Species-level network representation of the interactions between mammal species and coronavirus sOTUs. Colored round nodes represent mammal species (colors indicate the mammalian order) and grey squared nodes correspond to coronavirus sOTUs. The position of the nodes reflects their similarity in interaction partners, i.e. the tendency of clustering of mammals belonging to the same order can be interpreted as the presence of phylogenetic signal in species interactions. Humans and SARS-Cov-2 are presented using bigger nodes. The plot was obtained using the Fruchterman-Reingold layout algorithm from the igraph R-package.

The origination of coronaviruses in mammals is estimated among bats, which tend to form a closed reservoir.

(A) Phylogenetic tree of the mammals with branches colored as the percentage of ALE reconciliations which inferred this branch or its ancestral lineages as the origination of coronaviruses in mammals. Red branches are likely originations, whereas blue branches are unlikely. (B) Boxplots recapitulating the probability of inferred origination per branch in bats versus other mammal orders, with ALE applied on the original mammal tree (left panel) or on the mammal tree transformed into a star phylogeny (right panel), therefore assuming an origination in extant species. (C) Distributions of the percentages of host switches occurring within mammalian orders (left panel) and between-orders involving bats (right panel). Observed values (in orange) are compared to null expectations if host switches were happening at random (in grey). Mammal silhouettes taken from open-to-use sources in phylopic.org, detailed credits given in SI Appendix Table S6.

Maps of the diversity of coronaviruses and their mammal hosts.

In A) the richness of species of coronaviruses; geographic range maps of coronaviruses were constructed after applying the host-filling method on the geographic range maps of mammalian hosts of coronaviruses. In B) Faith’s (1992) phylogenetic diversity of coronaviruses, calculated using the phylogenetic tree of coronaviruses (see main text). In C) and D), the richness and phylogenetic diversity of mammal hosts of coronaviruses, respectively. All maps are on the Mollweide projection.