Zooanthroponotic transmission of SARS-CoV-2 and host-specific viral mutations revealed by genome-wide phylogenetic analysis
Figures
![](https://iiif.elifesciences.org/lax:83685%2Felife-83685-fig1-v1.tif/full/617,/0/default.jpg)
Overview of available Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) genome sequences from different animal species.
(A) Barplot of the number of genome sequences available in GISAID (on February 28, 2022) sampled from each animal species. Only species with 50 or more sequences were included in the current study: cat, dog, mink, and deer. (B) Heatmap of animal-associated mutations identified in previous publications. Darker colors indicate mutations found in a greater number of studies. Each row corresponds to one of the species included in this study, and columns correspond to mutations along the SARS-CoV-2 reference genome. Mutations identified as family-wise significant (p<0.05) in our genome-wide association studies are indicated with an orange asterisk. A detailed list of publications reporting these mutations is in Supplementary file 8. Only single nucleotide variants, not insertions or deletions, are included. The heatmap illustrates the results of several key prior studies but does not represent a comprehensive meta-analysis.
![](https://iiif.elifesciences.org/lax:83685%2Felife-83685-fig2-v1.tif/full/617,/0/default.jpg)
Transmission events inferred from non-human animals to humans.
Panels a-d display a representative tree for each species, with animal-to-human transmissions marked on the tree. More detailed versions of these trees are in Figure 2—source data 1–4. Trees are rooted with the Wuhan reference genome (from one of the first sampled human COVID-19 patients).
-
Figure 2—source data 1
Detailed representative phylogeny of cat- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.
In order to make the tree topology clear, branch lengths are not to scale.
- https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data1-v1.zip
-
Figure 2—source data 2
Detailed representative phylogeny of dog- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.
In order to make the tree topology clear, branch lengths are not to scale.
- https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data2-v1.zip
-
Figure 2—source data 3
Detailed representative phylogeny of mink- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.
In order to make the tree topology clear, branch lengths are not to scale. The colored boxes to the right of the tree show the allelic state of the three mink-associated genome-wide association studies (GWAS) hits in each terminal branch of the phylogeny, with dark red indicating the animal-associated alternate allele.
- https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data3-v1.zip
-
Figure 2—source data 4
Detailed representative phylogeny of deer- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.
In order to make the tree topology clear, branch lengths are not to scale. The colored boxes to the right of the tree show the allelic state of the seven deer-associated genome-wide association studies (GWAS) hits that appeared in all ten replicate GWAS runs, with dark red indicating the animal-associated alternate allele.
- https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data4-v1.zip
![](https://iiif.elifesciences.org/lax:83685%2Felife-83685-fig3-v1.tif/full/617,/0/default.jpg)
Animal-to-human transmission events are rarely detected, except from mink.
The distribution of inferred transmission counts (across 10 replicate trees) in each animal species, in both bootstrap-filtered and unfiltered trees, is shown for (A) the animal-to-human direction and (B) the human-to-animal direction. Points are plotted with jitter to avoid overlap.
![](https://iiif.elifesciences.org/lax:83685%2Felife-83685-fig4-v1.tif/full/617,/0/default.jpg)
Manhattan plots summarizing genome-wide association studies (GWAS) hits in each animal species.
In every panel, the x-axis represents the nucleotide position in the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) reference genome and the y-axis represents the -log10 of the pointwise p-values averaged over replicates. ORFs are shown as alternating shaded bars along the x-axis. Statistically significant hits, with family-wise corrected p-values below 0.05, are shown in red (non-synonymous) or blue (synonymous), while non-significant sites are shown in black.
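The y-axis value and point color described in this caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function names are hypothetical, and it assumes the pointwise p-values are averaged across replicates before the -log10 transform is applied, which the caption does not state explicitly.

```python
import math

ALPHA = 0.05  # family-wise significance threshold stated in the caption


def manhattan_height(pointwise_p_by_replicate):
    """Average one site's pointwise p-values over replicates, then take -log10.

    Assumption: the mean is taken before the log transform.
    """
    mean_p = sum(pointwise_p_by_replicate) / len(pointwise_p_by_replicate)
    return -math.log10(mean_p)


def point_color(familywise_p, is_synonymous):
    """Color rule from the caption: red/blue for significant hits, black otherwise."""
    if familywise_p >= ALPHA:
        return "black"
    return "blue" if is_synonymous else "red"


# Illustrative inputs only (not taken from the study's replicate-level data).
height = manhattan_height([0.001, 0.001])
color = point_color(0.0365, is_synonymous=False)
```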
Tables
Average inferred transmission events between humans and animals.
Average inferred number of transitions (filtered – unfiltered) | Mink (n=1038) | Deer (n=134) | Cat (n=78) | Dog (n=39) |
---|---|---|---|---|
Animal-to-human | 38.0–112.3 | 0.7–1.4 | 4.4–4.4 | 1.4–1.7 |
Human-to-animal | 42.3–65.2 | 38.3–55.2 | 58.5–68.3 | 31.5–35.7 |
Single-nucleotide variants associated with mink by genome-wide association studies (GWAS).
“Pos.” refers to the nucleotide position in the reference genome. Homoplasy counts in focal animals (cases), humans (controls), and p-values are averaged across replicates in which the site’s family-wise p-values were <0.05. Where applicable, amino acid positions refer to the polyprotein with mature protein positions in parentheses. The ‘local transmission odds ratio’ is the result of a Fisher’s exact test of the likelihood that the alternate base (animal-associated minor allele) was enriched in the local human population where the mink sequences bearing the alternate base were sampled (Methods). n.s., not significant. Odds ratio p-value: *<0.05, **<0.01, ***<0.001.
Pos. | Ref. base | Alt. base | Amino acid change | Gene | Homoplasy count in focal animal | Homoplasy count in humans | p-value (pointwise) | p-value (familywise) | Significant in N replicates | Local transmission odds ratio |
---|---|---|---|---|---|---|---|---|---|---|
26047 | U | G | L219V | ORF3a | 6 | 0 | 0.0014 | 0.0365 | 10 | 3.93 *** |
12795 | G | A | G4177E (nsp9 G37E) | ORF1ab/pp1ab/nsp9/replicase | 6 | 0 | 0.0015 | 0.0368 | 6 | 7.53 *** |
23064 | A | C | N501T | Spike/S1/RBD/binds ACE2 | 6.4 | 0 | 0.0010 | 0.0258 | 5 | 0.48 *** |
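The "local transmission odds ratio" column can be illustrated with a small standard-library implementation of Fisher's exact test on a 2×2 table. This is a sketch under assumptions: the table layout (human sequences with the alternate or reference base, sampled inside or outside the regions where animal sequences bearing the alternate base were found), the function name, and the example counts are all hypothetical, and the authors' actual procedure is described in their Methods.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Sample odds ratio and two-sided p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical):
      a = alternate base, inside region    b = reference base, inside region
      c = alternate base, outside region   d = reference base, outside region
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x alternate-base sequences inside the region.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Two-sided p-value: sum over all tables at most as probable as the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, min(p_value, 1.0)


# Made-up counts for illustration only.
odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
```

An odds ratio above 1 would indicate that the alternate base is enriched among human sequences from the same regions as the animal sequences carrying it; a value below 1 would indicate the opposite.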
Single-nucleotide variants associated with deer by genome-wide association studies (GWAS).
“Pos.” refers to the nucleotide position in the reference genome. Homoplasy counts in focal animals (cases), humans (controls), and p-values are averaged across replicates in which the site’s family-wise p-values were <0.05. Where applicable, amino acid positions refer to the polyprotein with mature protein positions in parentheses. IG, Intergenic. The ‘local transmission odds ratio’ is the result of a Fisher’s exact test of the likelihood that the alternate base (animal-associated minor allele) was enriched in the local human population where the deer sequences bearing the alternate base were sampled (Methods). n.s., not significant. Odds ratio p-value: *<0.05, **<0.01, ***<0.001.
Pos. | Ref. base | Alt. base | Amino acid change | Gene | Homoplasy count in focal animal | Homoplasy count in humans | p-value (pointwise) | p-value (familywise) | Significant in N replicates | Local transmission odds ratio |
---|---|---|---|---|---|---|---|---|---|---|
7303 | C | U | I2346I (nsp3 I1524I) | ORF1a/pp1ab/pp1a/nsp3 | 17.8 | 1.2 | 9.99E-06 | 9.99E-06 | 10 | 2.51*** |
9430 | C | U | I3055I (nsp4 I292I) | ORF1a/pp1ab/pp1a/nsp4 | 15.2 | 6.2 | 9.99E-06 | 9.99E-06 | 10 | 2.20*** |
14960 | A | U | N4899I (nsp12 N507I) | ORF1ab/pp1ab/nsp12/RdRp | 7.8 | 0.1 | 9.99E-06 | 1.09E-05 | 10 | 0** |
20259 | C | U | F6665F (nsp15 F213F) | ORF1ab/pp1ab/nsp15 | 4.8 | 0.1 | 3.39E-05 | 0.0013 | 10 | n.s. |
28016 | C | U | F41F | ORF8 | 4 | 0 | 7.59E-05 | 0.0061 | 10 | 6.09*** |
12073 | C | U | D3936D (nsp7 D67D) | ORF1a/pp1ab/pp1a/nsp7 | 5.2 | 1.1 | 4.29E-05 | 0.0025 | 10 | n.s. |
29679 | C | U | IG | 3’UTR | 5 | 1.8 | 8.59E-05 | 0.0055 | 10 | 3.17*** |
5184 | C | U | P1640L (nsp3 P822L) | ORF1a/pp1ab/pp1a/nsp3 | 4.6 | 1.6 | 0.0002 | 0.0115 | 8 | 2.61*** |
29750 | C | U | IG | 3’UTR/S2M | 5 | 2.6 | 0.0002 | 0.0103 | 7 | 3.12*** |
7318 | C | U | F2351F (nsp3 F1533F) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.3 | 0.0001 | 0.0114 | 6 | 3.80*** |
16466 | C | U | P5401L (nsp13 P77L) | ORF1ab/pp1ab/nsp13/Hel | 5 | 1 | 4.99E-05 | 0.0019 | 5 | 4.09*** |
7267 | C | U | F2334F (nsp3 F1516F) | ORF1a/pp1ab/pp1a/nsp3 | 4.4 | 0.8 | 9.39E-05 | 0.0079 | 5 | 2.79*** |
210 | G | U | IG | 5’UTR/SL5a | 4 | 0.5 | 0.0001 | 0.0136 | 4 | 3.98*** |
6730 | C | U | N2155N (nsp3 N1337N) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.75 | 0.0002 | 0.0168 | 4 | 1.81** |
27752 | C | U | T120I | ORF7a | 4 | 0.75 | 0.0002 | 0.0169 | 4 | 4.03*** |
11152 | C | U | V3629V (nsp6 V60V) | ORF1a/pp1ab/pp1a/nsp6 | 4 | 0.7 | 0.0002 | 0.0153 | 3 | 0.80** |
5822 | C | U | L1853F (nsp3 L1035F) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.5 | 0.0001 | 0.0118 | 2 | n.s. |
9711 | C | U | S3149F (nsp4 S386F) | ORF1a/pp1ab/pp1a/nsp4 | 4 | 0.5 | 8.49E-05 | 0.0118 | 2 | 0.56** |
9679 | C | U | F3138F (nsp4 F375F) | ORF1a/pp1ab/pp1a/nsp4 | 4 | 0 | 9.49E-05 | 0.0067 | 2 | 2.32*** |
7029 | C | U | S2255F (nsp3 S1437F) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.5 | 0.0002 | 0.0149 | 2 | 0.22*** |
29738 | C | A | IG | 3’UTR/S2M | 4 | 0 | 3.99E-05 | 0.0059 | 1 | n.s. |
26767 | U | C | I82T | ORF5/M | 4 | 0 | 8.99E-05 | 0.0057 | 1 | 4.09*** |
203 | C | U | IG | 5’UTR/SL5a | 4 | 1 | 0.0003 | 0.0191 | 1 | 5.94*** |
12820 | A | G | L4185L (nsp9 L45L) | ORF1a/pp1ab/pp1a/nsp9 | 5 | 1 | 3.99E-05 | 0.0009 | 1 | 4.52*** |
4540 | C | U | Y1425Y (nsp3 Y607Y) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 1 | 0.0002 | 0.0239 | 1 | 2.80*** |
29666 | C | U | L37F | ORF10 | 4 | 1 | 0.0002 | 0.0219 | 1 | 1.54*** |
Additional files
-
Supplementary file 1
GISAID accession numbers of all sequences used in this study.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp1-v1.zip
-
Supplementary file 2
Number of viral sequences passing quality filters.
The counts show the initial number of sequences downloaded from GISAID from each animal species, and the remaining number after each consecutive quality filter. The ‘quality control’ count shows the number of sequences after removing those with incomplete sampling dates and/or >500 ambiguous bases (Ns). The ‘post-alignment pruning’ shows the count after removing sequences shorter than 29,000 bases and/or with an insertion absent in all other sequences (introducing a gap in the alignment). The ‘divergent tree branches’ shows the count after removing sequences that introduce long branches into the phylogeny (Methods). Ranges of counts indicate variation across tree replicates.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp2-v1.docx
-
Supplementary file 3
Table of transmission counts for all candidate species, in both animal-to-human and human-to-animal direction, for both bootstrap-filtered and unfiltered cases.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp3-v1.docx
-
Supplementary file 4
Human-derived sequence counts bearing each of the significant GWAS hits identified in deer inside and outside regions where deer sequences containing each mutation are found.
Odds ratio and the p-values are reported following a Fisher’s exact test. GWAS hits with OR <1 or not significantly different from 1 are highlighted in green.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp4-v1.docx
-
Supplementary file 5
Human-derived sequence counts bearing each of the significant GWAS hits identified in mink inside and outside regions where mink sequences containing each mutation are found.
Odds ratio and the p-values are reported following a Fisher’s exact test. GWAS hits with OR <1 or not significantly different from 1 are highlighted in green.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp5-v1.docx
-
Supplementary file 6
Number of times mink GWAS hits appear along human-to-mink transmission branches.
The counts are summed across all branches and all 10 tree replicates.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp6-v1.docx
-
Supplementary file 7
Number of times deer GWAS hits appear along human-to-deer transmission branches.
The counts are summed across all branches and all 10 tree replicates. Yellow-colored rows are mutations that never appear on a human-to-animal transmission branch over all deer replicates. The orange-colored row corresponds to a nucleotide position mutated along human-to-animal transmission branches, but the substitution was never identical to the animal-associated allele identified by GWAS at that position.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp7-v1.docx
-
Supplementary file 8
SARS-CoV-2 mutations previously associated with non-human animal species in the literature.
Insertions and deletions are not considered. For each study, mutations reported in the main text or main tables are included. For the Pickering et al. (2022) study, substitutions that were found in deer in that study and that also appear at least once in the database of previously reported deer sequences (listed in that paper’s Supplementary file 2) are included.
- https://cdn.elifesciences.org/articles/83685/elife-83685-supp8-v1.docx
-
MDAR checklist
- https://cdn.elifesciences.org/articles/83685/elife-83685-mdarchecklist1-v1.docx
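The consecutive sequence filters summarized for Supplementary file 2 can be sketched as below. This is a simplified illustration rather than the study's pipeline: the function names and record layout are hypothetical, complete sampling dates are assumed to follow the YYYY-MM-DD format, and the unique-insertion and divergent-branch filters are omitted because they require an alignment and a phylogeny.

```python
import re

MAX_AMBIGUOUS_BASES = 500   # sequences with more Ns than this are removed
MIN_SEQUENCE_LENGTH = 29_000  # sequences shorter than this are removed
# Assumption: a complete sampling date is written as YYYY-MM-DD.
COMPLETE_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def passes_quality_control(sequence, sampling_date):
    """Keep sequences with a complete date and at most 500 ambiguous bases (Ns)."""
    has_full_date = bool(COMPLETE_DATE.match(sampling_date))
    return has_full_date and sequence.upper().count("N") <= MAX_AMBIGUOUS_BASES


def passes_length_filter(sequence):
    """Keep sequences of at least 29,000 bases."""
    return len(sequence) >= MIN_SEQUENCE_LENGTH


def counts_after_filters(records):
    """Count the (sequence, sampling_date) records remaining after each filter."""
    after_qc = [r for r in records if passes_quality_control(*r)]
    after_length = [r for r in after_qc if passes_length_filter(r[0])]
    return {
        "initial": len(records),
        "quality control": len(after_qc),
        "length filter": len(after_length),
    }


# Synthetic records for illustration only.
example_records = [
    ("A" * 29_500, "2021-03-04"),               # passes both filters
    ("A" * 29_500, "2021-03"),                  # incomplete sampling date
    ("N" * 501 + "A" * 29_000, "2021-03-04"),   # too many ambiguous bases
    ("A" * 28_000, "2021-03-04"),               # too short
]
example_counts = counts_after_filters(example_records)
```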