Coordinated evolution at amino acid sites of SARS-CoV-2 spike

  1. Alexey Dmitrievich Neverov  Is a corresponding author
  2. Gennady Fedonin
  3. Anfisa Popova
  4. Daria Bykova
  5. Georgii Bazykin
  1. HSE University, Russian Federation
  2. Central Research Institute for Epidemiology, Russian Federation
  3. Moscow Institute of Physics and Technology (National Research University), Russian Federation
  4. Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Russian Federation
  5. Lomonosov Moscow State University, Russian Federation
  6. Skolkovo Institute of Science and Technology, Russian Federation
9 figures, 26 tables and 1 additional file

Figures

Concordantly evolving sites in SARS-CoV-2 Spike protein.

(A) Clusters of concordantly evolving sites. Graph vertices represent sites, and edges represent concordantly evolving pairs (Appendix 1—table 7). The graph consists of 13 connected components, 8 of …

Clusters of coevolving sites on the protein structure.

Sites of the five clusters that comprise multiple coevolving pairs of sites are shown as spheres, with color coding matching Figure 1A. (A) Closed conformation of the S-protein trimer (pdb: 7JJJ). (B

Figure 3 with 1 supplement
Concordantly evolving pair of sites 501 and 1118.

Leading and trailing mutations are represented by blue (site 501) or red (site 1118) right-pointing and left-pointing triangles respectively; diamond-shaped signs indicate mutations that are both …

Figure 3—figure supplement 1
Concordantly evolving pair of sites 501 and 1118.

Here, mutations on terminal branches, which are excluded from the analysis, are shown in light blue, red or violet; mutations on internal branches that are followed by mutations on terminal …

The Q677H is depleted on the background of N501Y.

Branches carrying substitutions at site 501 are shown in blue; at site 677, in red; at both sites, in violet. There is less violet color than expected. Some clades without mutations at either site …

Figure 5 with 1 supplement
Coevolution of S-protein sites 484 and 655.

Leading and trailing mutations are represented by blue (site 484), red (site 655) or violet (mutations at both sites on the same branch) right-pointing and left-pointing triangles respectively; …

Figure 5—figure supplement 1
Concordantly evolving pair of sites 484 and 655.

Here, mutations on terminal branches, which are excluded from the analysis, are shown in light blue, red or violet; mutations on internal branches that are followed by mutations on terminal …

Appendix 1—figure 1
Numbers of predicted concordantly (A) and discordantly (B) evolving pairs for different nominal p-values in the simulated data in non-epistatic mode of evolution, compared to the null distribution for the ML phylogeny reconstructed by IQ-TREE.

Black dots indicate numbers of site pairs having corresponded p-values in the simulated data, boxes with whiskers indicate distributions of numbers of site pairs having corresponded p-values among …

Appendix 1—figure 2
Numbers of predicted concordantly (A) and discordantly (B) evolving pairs for different nominal p-values in the S-gene of SARS-Cov-2, compared to the null distribution for the ML phylogeny reconstructed by IQ-TREE.

See legend for the Appendix 1—figure 1.

Author response image 1
Site pairs detected as concordantly (I) or discordantly (II) evolving for FDR 30%, blue – site from a true concordantly evolving pair, red – site from a true discordantly evolving site pair, yellow – neutral; thick lines indicate true epistatic pairs.
Author response image 2
Phylogenetic clades annotated as VOCs according to the GISAID metadata as of 07.

09.2021 (I), and to the newer PANGOLIN 4.1.3, used in the default mode (II). Α is shown in red color, Β in blue, Γ in green, and Δ in orange. There are more red (α) branches outside the main α clade …

Tables

Table 1
Concordantly evolving sites of the SARS-CoV-2 S-protein with FDR less than 10% for both reconstructed phylogenies (see Text).

The following characteristics are shown: coordinates on the S-protein sequence, nominal p-values, the value of the epistatic statistics, the total number of consecutive mutation pairs for the two …

site 1site 2clusterp-valueepistatic statistics#consec.pairs of mutations#mutations in consec. pairs at site 1#mutations in consec. pairs at site 2#mutations at site 1#mutations at site2physical distance, Å
13152<2e-52.864554516-
2041742.2e-41.3484328747.83
2619041.8e-41.02944312318.4
6321334e-50.6811112313.51
69703<2e-51.125222541.3
701443<2e-51.3013334514.03
764902.6e-40.221113345.18
18936021.6e-40.5442123329.89
19041740.00011.2723223739.93
2592613<2e-51.351.521423.81
3563602<2e-51.2832.5232310.12
3593602<2e-50.9761.512231.34
4394411<2e-52.097434983.5
44044111.4e-41.5053331281.33
4404421<2e-52.4763231263.92
4404441<2e-52.5153.5251275.6
4414421<2e-52.244444861.33
44144312e-51.281542823.23
4414441<2e-53.4925.544872.6
44244316e-51.301442621.32
4424441<2e-53.013644674.05
44344418e-51.043222271.32
501111851.4e-42.7371613124022131.71
6817165<2e-53.90516.512165921-
71698252e-52.00115911211581.66
716111854e-52.38213812212222.29
859950<2e-52.2195.54511915.17
982111851.4e-41.63715118152294.63
Table 2
Discordantly evolving sites of the SARS-CoV-2 S-protein.

The following characteristics are shown: coordinates on the S-protein sequence, nominal p-values, the value of the epistatic statistic, the total number of consecutive mutation pairs for the two …

site 1site 2p-valueepistat.#consec. pairs of mutations#mut. in consec. pairs in site1#mut. in consec. pairs in site2#mut. in site1#mut. in site2FDRUShER treephysical distance,Å
696143e-30.075155140.23146.75
2225013.1e-30.0111111400.05855.44
4406813.1e-30.0111112590.055-
5016751.2e-30.0211140240.0684.98
5016774.8e-40.053234039088.20
5706142.4e-30.1671716140.89519.30
6146534e-50.034141450.02515.47
6149823.2e-40.1271714150.89127.87
68111763.9e-30.0111159110.385-
Appendix 1—table 1
Levels of spurious signal of concordant evolution, inferred in the simulated dataset with no epistasis by four variants of the method.

Number of pairs, detected under p-value threshold 10-4, and estimated FDR for this threshold are shown.

Model#FPmin p-valueMaximal FDR threshold for#FP = 0
consider allelesshuffle mutations in fgr. only2<1e-40,25
shuffle mutations both in bgr. and fgr.3<1e-40,15
ignore allelesshuffle mutations in fgr. only4<1e-40,11
shuffle mutations both in bgr. and fgr.24<1e-40,02
Appendix 1—table 2
Levels of spurious signal of discordant evolution, inferred in the simulated dataset with no epistasis by four variants of the method.

Number of pairs, detected under p-value threshold 10-4, and maximal threshold on estimated FDR that allows to avoid false discoveries are shown.

Model#FPmin p-valueMaximal FDR threshold for #FP = 0
consider allelesshuffle mutations in fgr. only2<1e-40,2
shuffle mutations both in bgr. and fgr.19<1e-40,026
ignore allelesshuffle mutations in fgr. only4<1e-40,125
shuffle mutations both in bgr. and fgr.16<1e-40,031
Appendix 1—table 3
Numbers of truly and falsely predicted concordantly evolving pairs of sites for the simulated data with positively and negatively epistatically interacting sites.

The evolution of the population of genotypes with twenty independently evolving pairs of epistatically interacting sites was simulated by MimicrEE2. Ten pairs of sites were evolving under recurrent …

Model#FP#TP
consider allelesshuffle mutations in fgr. only510
shuffle mutations both in bgr. and fgr.556
ignore allelesshuffle mutations in fgr. only228
shuffle mutations both in bgr. and fgr.2507
Appendix 1—table 4
Predicted concordantly evolving pairs of sites for the simulated data with positively and negatively epistatically interacting sites.

The characteristics of predicted site pairs for the estimated FDR≤10% are shown: positions of sites on the primary sequence of S-protein (site1 and site2), the upper p-values (p-value), values of …

site1site2p-valueepistat#consec. pairs of mut.#mut. in consec. pairs in site1#mut. in consec. pairs in site2#mut. in site1#mut. in site2exp. epistatSE
12<1e-412,3666,552652963,970,87
34<1e-417,79946110441186,551,23
56<1e-413,8754485461053,270,83
543<1e-421,43108,5821004310512,61,89
78<1e-411,3650425081093,220,78
910<1e-417,09906488101176,621,19
1112<1e-413,4653,54851121082,540,69
1314<1e-411,6352,54356141062,850,72
1516<1e-411,84694770161165,881,08
1920<1e-49,06473947181094,290,88
4868<1e-429,11171,511016920952,60,69
17182e-48,2366386531956,611,23
19312e-411,5878,556786717620,722,09
50942e-424,34162,51001516818317,551,9
78942e-428,501721101699417820,52,07
Appendix 1—table 5
Numbers of truly and falsely predicted discordantly evolving pairs of sites for the simulated data with positively and negatively epistatically interacting sites.

See legend for Appendix 1—table 3.

Model#FP#TP
consider allelesshuffle mutations in fgr. only47
shuffle mutations both in bgr. and fgr.625
ignore allelesshuffle mutations in fgr. only49
shuffle mutations both in bgr. and fgr.75
Appendix 1—table 6
Predicted discordantly evolving pairs for the simulated data with positively and negatively epistatically interacting sites.

See legend for Appendix 1—table 4. The ‘p-value’ column contains the lower p-values.

site1site2p-valueepistat#consec. pairs of mut.#mut. in consec. pairs in site1#mut. in consec. pairs in site2#mut. in site1#mut. in site2exp. epistatSE
1650<1e-45,7285,5478310817810,511,48
17961e-47,4493,5549010919212,921,70
2324<1e-48,2770,5516813712913,701,65
2526<1e-46,1373,5377413912110,841,46
26951e-46,9491,5529412117912,181,60
2728<1e-45,9076427316814913,901,82
33341e-44,7048,537451171368,551,30
3536<1e-45,4759,5395515012710,811,51
3738<1e-44,1044,532411131468,931,39
3940<1e-44,9761,5375911514211,111,47
3947<1e-47,2786,5548411516613,091,66
Appendix 1—table 7
Predicted concordantly evolving pairs for the ML phylogeny of S-gene reconstructed by IQ-TREE.

The characteristics of predicted site pairs for the estimated FDR≤10% are shown: the positions of sites on the primary sequence of S-protein (site1 and site2), the upper p-values (p-value), values …

site1site2p-valueepistat#consec. pairs of mut.#mut. in consec. pairs in site1#mut. in consec. pairs in site2#mut. in site1#mut. in site2FDRfor the USHER treepdb distance
13152<2e-52,8645545160,03
18200,000081,6613,5332881,193,62
2026<2e-51,9854338120,3615,19
204170,000221,348432870,0547,83
261900,000181,0294431230,0618,40
63640,000020,726111241,001,31
63670,000080,726111251,009,38
63690,000120,726111251,0011,07
632130,000040,681111230,0113,51
64670,000120,726111451,005,51
64690,000140,726111451,007,05
6769<2e-51,101222550,553,59
6970<2e-51,125222540,011,30
70144<2e-51,301333450,0114,03
764900,000260,220111330,0545,18
1541071<2e-51,767423550,2595,13
155157<2e-51,1132223110,533,74
1893560,000060,544212320,3231,63
1893600,000160,544212330,0729,89
1904170,00011,272322370,0139,93
2132610,000080,562111320,3711,10
259261<2e-51,3501,521420,053,81
262272<2e-51,183441920,9431,42
3563570,000060,448111220,691,33
356360<2e-51,2832,523230,0510,12
359360<2e-50,9761,512230,011,34
439441<2e-52,097434980,013,50
4404410,000141,5053331280,011,33
440442<2e-52,4763231260,013,92
4404430,000021,2844321220,253,86
440444<2e-52,5153,5251270,015,60
441442<2e-52,244444860,011,33
4414430,000021,281542820,023,23
441444<2e-53,4925,544870,012,60
4424430,000061,301442620,051,32
442444<2e-53,013644670,014,05
4434440,000081,043222270,031,32
4846550,000041,71686534120,8973,58
50111180,000142,73716131240220,09131,71
681716<2e-53,90516,5121659210,02
7169820,000022,0011591121150,0181,66
71611180,000042,3821381221220,0122,29
859950<2e-52,2195,5451190,0115,17
98211180,000141,6371511815220,0194,63
125812590,000040,77531353NA
Appendix 1—table 8
Predicted concordantly evolving pairs for the MP phylogeny of S-gene reconstructed by USHER.

See legend for Appendix 1—table 7. The ‘p-value’ column contains the lower p-values.

site1site2p-valueepistat#consec. pairs of mut.#mut. in consec. pairs in site1#mut. in consec. pairs in site2#mut. in site1#mut. in site2IQ-TREEpdb distance
123460,000060,934221940-
128990,000040,724221930-
131520,000041,4075434131-
201900,000220,88532384022,59
204170,000181,10722188147,83
261900,000260,913434124118,40
266550,000420,9686,5361214037,53
546900,000440,714111113033,65
62251<2e-50,74811123032,01
6796<2e-50,5991113707,12
6970<2e-51,4963335511,30
691440,000020,98422255012,57
70144<2e-51,08833355114,03
764900,00010,27511123145,18
80215<2e-51,6876,5362114012,51
809500,000141,091322217056,59
1522520,000160,623221134015,26
1893600,00030,53411122129,89
1897720,000240,44011122044,17
190417<2e-51,1342,52248139,93
21511670,000420,8532221440
2552560,000220,7582228701,32
2552600,000360,8113328309,61
2562580,000120,6601117202,81
2562600,000141,0562227308,18
259260<2e-51,5362223301,31
2592610,000140,9091113213,81
357360<2e-51,0251,5122207,59
359360<2e-51,2682122211,34
3607720,000180,53411122058,61
439440<2e-52,0383,524101201,31
439441<2e-52,79553410513,50
4394440,000061,6064,53410706,21
440441<2e-52,8224,54212511,33
440442<2e-52,09233212313,92
440444<2e-52,9474,54312715,60
441442<2e-52,2923225311,33
4414430,000021,3832225213,23
441444<2e-53,8296,5445712,60
4414450,000181,0862215608,51
4424430,00011,1322213211,32
44244402,3874333714,05
4434440,000061,0314142711,32
4444450,00021,2704537601,31
50157002,7341613104619053,69
50111180,000482,02818141346191131,71
5706810,000182,53817,5151419620
57011180,000021,88113,59101919079,04
572118100,533111320
58312370,000140,242111331
6817160,000023,01720201662211
68111180,000322,33418141262190
71698204,62319,513162118181,66
716111802,87217,511132119122,29
85995001,59643497115,17
982111801,83117,510101819194,63
102711760,000161,2065,5541290
Appendix 1—table 9
Predicted discordantly evolving pairs for the MP phylogeny of S-gene reconstructed by USHER.

The characteristics of predicted site pairs for the estimated FDR≤10% are shown: the positions of sites on the primary sequence of S-protein (site1 and site2), the upper p-values (p-value), values …

site1site2p-valueepistat#consec. pairs of mut.#mut. in consec. pairs in site1#mut. in consec. pairs in site2#mut. in site1#mut. in site2IQ-TREEpdb distance
186810,003440,1025,55422620
2225010,003140,0191111246155,44
4395010,002860000104605,16
4405010,00190000124609,47
4406810,002740,00811112621
4849820,00210,0203334318035,42
5016750,001360,0251114622184,98
5016770,00010,0232224633188,2
5706770,002260,0111111933044,77
6146530,000220,037414105115,47
6757160,0056800002221038,15
67511180,0052400002219058,84
6776810,002640,09833333620
6777160,001020,0111113321042,93
6779820,001520,0041113318054,37
67711180,000660,0021113319064,36
Appendix 1—table 10
Coordinated episodic selection in concordantly evolving pairs predicted for the phylogeny reconstructed by IQ-TREE.

Z-scores < 0 and upper p-values after Benjamini-Hochberg adjustment < 0.05 indicate clustering of nonconsecutive mutations; z-scores > 0 and lower p-values after Benjamini-Hochberg adjustment < 0.05 …

site 1site 2#nonseq. pairszscorelower pvalueupper pvaluelower pvalue adj.upper pvalue adj.
131522569–0,7240,76750,232510,5813
18208610,51,1480,12250,87750,36751
202657192,4440,01750,98250,11251
2041725331,7380,040,960,21
2619036822,2580,0150,9850,11251
63641241,7360,060,940,24551
63672641,680,0750,9250,25961
63692392,2240,0250,9750,14061
63213941,4070,08250,91750,26521
646713242,3530,01250,98750,11251
646911992,7660,00250,99750,02811
676925424,104010,02811
697020144,041010,02811
701446272,9520,00250,99750,02811
7649017480,6320,250,750,53571
1541071502–1,3570,9250,07510,225
1551574820,6710,23750,76250,53441
189356108–0,1870,5550,4450,9990,9536
189360580,9270,170,830,47651
19041716310,0990,4450,5550,83441
213261778–1,2510,9050,09510,2672
259261736,5–2,3840,99750,002510,0094
2622727440,0850,440,560,83441
35635765–0,7970,75250,247510,5862
35636063,5–2,70410,002510,0094
35936064,5–0,390,62750,372510,8381
4394411196–3,98910,002510,0094
4404411647–3,5510,002510,0094
440442822–3,2310,002510,0094
440443931–2,1240,9950,00510,0173
4404441591,5–3,51110,002510,0094
441442446–5,06310,002510,0094
441443505–4,2610,002510,0094
441444864,5–5,33710,002510,0094
442443251–3,98810,002510,0094
442444429–4,72710,002510,0094
443444491–3,97210,002510,0094
484655187921,0380,180,820,47651
5011118161211,4430,070,930,25961
68171619375,5–0,9620,83250,167510,4434
71698282670,580,30750,69250,6291
716111899861,650,050,950,2251
8599503684,50,7470,20750,79250,51881
982111881030,7570,23250,76750,53441
125812591053–1,4350,93750,062510,2009
Appendix 1—table 11
Coordinated episodic selection in discordantly evolving pairs predicted for the phylogeny reconstructed by IQ-TREE.

Z-scores < 0 and upper p-values after Benjamini-Hochberg adjustment < 0.05 indicate clustering of nonconsecutive mutations; z-scores > 0 and lower p-values after Benjamini-Hochberg adjustment < 0.05 …

site 1site 2#nonseq. pairszscorelower pvalueupper pvaluelower pvalue adj.upper pvalue adj.
6961428752,270,0150,9850,061
22250117114–0,8610,7950,2050,9950,369
44068110559–2,2290,9950,0050,9950,045
50167518907–2,390,98750,01250,9950,0563
50167730478–1,4630,9250,0750,9950,1856
57061458131,7380,04250,95750,09561
61465318562,9440,002510,02251
61498249132,2260,020,980,061
68111769599–1,3260,91750,08250,9950,1856
Appendix 1—table 12
Coordinated episodic selection in concordantly evolving pairs predicted for the phylogeny reconstructed by USHER.

Z-scores < 0 and upper p-values after Benjamini-Hochberg adjustment < 0.05 indicate clustering of nonconsecutive mutations; z-scores > 0 and lower p-values after Benjamini-Hochberg adjustment < 0.05 …

site 1site 2#nonseq. pairszscorelower pvalueupper pvaluelower pvalue adj.upper pvalue adj.
1234612960,4610,290,7110,9263
128998830,7050,2450,75510,9551
131523955–0,7310,750,2510,4597
201902357–0,3720,65250,347510,5826
2041731840,6920,2250,77510,9551
2619038760,9930,14250,85750,88350,9975
2665510663,52,4860,010,990,14251
54690923–1,2880,91750,082510,1959
62251950,510,28750,712510,9263
679620270,4620,2850,71510,9263
697022412,982010,07131
691449672,5820,010,990,14251
701448332,3030,020,980,18321
7649018010,7660,21250,787510,9551
8021513261,5–2,3690,9950,00510,0204
809506321–1,4170,93250,067510,1749
15225211420,0420,48750,512510,8063
189360430,1880,3950,60510,8621
189772109–0,850,7850,21510,4226
1904172157,5–0,7410,77250,227510,4323
21511672031–1,6020,95250,047510,1425
2552562126–1,0480,8450,15510,3398
2552601285–2,5510,002510,0119
256258645–1,4410,93750,062510,1696
256260872–1,3680,92750,072510,1797
259260435–2,3260,99250,007510,0267
259261778–1,8610,990,0110,0335
35736030,5–1,5170,950,0510,1425
35936038–0,5490,67250,327510,5657
36077239–0,6740,72250,277510,4943
4394402456,5–2,5770,99250,007510,0267
4394411061–2,93510,002510,0119
4394441143,5–3,43410,002510,0119
4404411555,5–2,6710,9950,00510,0204
440442897–2,68610,002510,0119
4404441675,5–2,97610,002510,0119
441442387–4,06510,002510,0119
441443440–3,43710,002510,0119
441444721,5–4,48810,002510,0119
441445596–2,1440,98250,017510,0554
442443253–3,5810,002510,0119
442444416–4,34610,002510,0119
443444472–4,30110,002510,0119
444445640–2,5020,99750,002510,0119
501570214893,943010,07131
5011118213001,3980,080,920,571
57068126547,52,150,0150,9850,1711
570111813096,51,9260,02250,97750,18321
5721181511–0,810,7850,21510,4226
58312373077–0,9980,83250,167510,3536
68171627238–1,0290,85750,142510,3249
6811118263160,0420,4750,52510,8063
71698210836,50,0570,46250,537510,8063
716111813434,5–0,0240,51750,482510,7858
85995048920,4290,31250,687510,9263
982111810470,50,2580,39750,602510,8621
102711764023,50,980,1550,8450,88350,9975
Appendix 1—table 13
Coordinated episodic selection in discordantly evolving pairs predicted for the phylogeny reconstructed by USHER.

Z-scores < 0 and upper p-values after Benjamini-Hochberg adjustment < 0.05 indicate clustering of nonconsecutive mutations; z-scores > 0 and lower p-values after Benjamini-Hochberg adjustment < 0.05 …

site 1site 2#nonseq. pairszscorelower pvalueupper pvaluelower pvalue adj.upper pvalue adj.
1868134644,5–1,9790,97250,02750,980,2
222501207560,2220,38250,61750,681
4395017667–1,2070,88750,11250,980,45
440501112200,4220,3250,6750,6651
44068113859–0,5990,73750,26250,980,6629
484982212494,3040,002510,01331
50167522065–1,8680,96250,03750,980,2
50167736276–0,5260,710,290,980,6629
570677223094,7820,002510,01331
61465322731,7390,050,950,161
67571613924–0,7310,77250,22750,980,6629
6751118134520,4170,33250,66750,6651
67768144811–2,0430,980,020,980,2
677716228910,6040,2650,7350,6651
677982178473,5430,002510,01331
6771118221151,8690,04250,95750,161
Appendix 1—table 14
Pairs of lineage-defining sites of Alpha VOC (subset I) tend to have stronger signal of concordant evolution than the complementary subset of site pairs (subset II).

Site pairs were ordered by nominal p-values; for each of the two subsets, mean rank of site pairs included into that subset is provided. The difference between mean ranks (delta) obtained for the …

mean ranks for subset I(15 site pairs)mean ranks for subset II(17005 site pairs)delta
data234,78517,8–8283,1
simulations2182,48516,1–6333,7
  1. Number of lineage-defining sites: 6.

  2. P{delta(simulations)≤delta(data)}<0.0025.

Appendix 1—table 15
Pairs of lineage-defining sites of Beta VOC tend to have stronger signal of concordant evolution than the complementary subset of site pairs.

See legend for Appendix 1—table 14.

mean ranks for subset I(15 site pairs)mean ranks for subset II(17005 site pairs)delta
data530,58517,5–7987,1
simulations3021,48515,3–5494,0
  1. Number of lineage-defining sites: 6.

  2. P{delta(simulations)≤delta(data)}<0.0025.

Appendix 1—table 16
Pairs of lineage-defining sites of Delta VOC tend to have stronger signal of concordant evolution than the complementary subset of site pairs.

See legend for Appendix 1—table 14.

mean ranks for subset I(15 site pairs)mean ranks for subset II(17005 site pairs)delta
data3137,28515,2–5378,0
simulations5545,98513,1–2967,2
  1. Number of lineage-defining sites: 6.

  2. P{delta(simulations)≤delta(data)}=0.005.

Appendix 1—table 17
Pairs of lineage-defining sites of Gamma VOC tend to have stronger signal of concordant evolution than the complementary subset of site pairs.

See legend for Appendix 1—table 14.

mean ranks for subset I(55 site pairs)mean ranks for subset II(16965 site pairs)delta
data543,38536,3–7993,0
simulations3532,38526,6–4994,3
  1. Number of lineage-defining sites: 11.

  2. P{delta(simulations)≤delta(data)}<0.0025.

Appendix 1—table 18
Pairs of lineage-defining sites of Omicron VOC.

See legend for Appendix 1—table 14.

mean ranks for subset I(91 site pairs)mean ranks for subset II(16929 site pairs)delta
data5686,08525,7–2839,7
simulations5540,68526,5–2985,9
  1. Number of lineage-defining sites: 14.

  2. P{delta(simulations)≤delta(data)}=0.635.

Appendix 1—table 19
Pairs of lineage-defining sites of Alpha VOC tend to have stronger signal of concordant evolution than the complementary subset of site pairs even if all Alpha and related lineages are excluded from the analysis.

See legend for Appendix 1—table 14.

mean ranks for subset I(15 site pairs)mean ranks for subset II(15385 site pairs)delta
data316,87707,7–7390,9
simulations4281,37703,8–3422,5
  1. Number of lineage-defining sites: 6.

  2. P{delta(simulations)≤delta(data)}<0.0025.

Appendix 1—table 20
No difference in the strength of concordant evolution is detected for pairs of lineage-defining sites of Beta VOC and the complementary subset of site pairs if all Beta and related lineages are excluded from the analysis.

See legend for Appendix 1—table 14.

mean ranks for subset I(15 site pairs)mean ranks for subset II(16275 site pairs)delta
data4586,78148,8–3562,1
simulations4522,38148,8–3626,6
  1. Number of lineage-defining sites: 6.

  2. P{delta(simulations)≤delta(data)}=0.5628.

Appendix 1—table 21
No difference in the strength of concordant evolution is detected for pairs of lineage-defining sites of Delta VOC and the complementary subset of site pairs if all Delta and related lineages are excluded from the analysis.

See legend for Appendix 1—table 14.

mean ranks for subset I(15 site pairs)mean ranks for subset II(16638 site pairs)delta
data5934,78329,2–2394,5
simulations5975,18329,1–2354,0
  1. Number of lineage-defining sites: 6.

  2. P{delta(simulations)≤delta(data)}=0.4462.

Appendix 1—table 22
Pairs of lineage defining sites of Gamma tend to have weaker signal of concordant evolution than the complementary subset of site pairs if all Gamma and related lineages are excluded from the analysis.

See legend for Appendix 1—table 14.

mean ranks for subset I(45 site pairs)mean ranks for subset II(16245 site pairs)delta
data6424,48150,3–1725,8
simulations5393,58153,1–2759,7
  1. Number of lineage-defining sites: 10.

  2. P{delta(simulations)≤delta(data)}=0.9728.

  3. P{delta(simulations)≥delta(data)}=0.0272.

  4. GISAID Identifier: EPI_SET_20220729hq doi:10.55876/gis8.220729hq.

Author response table 1
Observed frequencies of spurious detection of concordant evolution for different types of site pairs, compared to the expected (multinomial test p-value = 0).

0032). The number of possible FP pairs is calculated as the number of unordered combinations of sites that produce false positive pairs. For example, 20 sites in the simulation are involved in …

site types#FP#total possible FPexpected FP frequencyobserved FP frequency
(pos,pos)71800,03640,0843
(pos,neg)74000,08100,0843
(neg,neg)01900,03850
(pos,neu)3312000,24290,3976
(neg,neu)612000,24290,0723
(neu,neu)3017700,35830,3614
Author response table 2
Observed frequencies of spurious detection of discordant evolution for different types of site pairs, compared to the expected.

Multinomial test p-value = 0.0018.

site types#FP#total possible FPexpected FP frequencyobserved FP frequency
(pos,pos)01900,03850
(pos,neg)74000,08100,1556
(neg,neg)41800,03640,0889
(pos,neu)512000,24290,1111
(neg,neu)1912000,24290,4222
(neu,neu)1017700,35830,2222

Additional files

Download links