Phylodynamics of SARS-CoV-2 in France, Europe, and the world in 2020

  1. Romain Coppée  Is a corresponding author
  2. François Blanquart
  3. Aude Jary
  4. Valentin Leducq
  5. Valentine Marie Ferré
  6. Anna Maria Franco Yusti
  7. Léna Daniel
  8. Charlotte Charpentier
  9. Samuel Lebourgeois
  10. Karen Zafilaza
  11. Vincent Calvez
  12. Diane Descamps
  13. Anne-Geneviève Marcelin
  14. Benoit Visseaux
  15. Antoine Bridier-Nahmias  Is a corresponding author
  1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, France
  2. Centre for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS Inserm, PSL Research University, France
  3. Sorbonne Université, Inserm, iPLESP, France
  4. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, France
  5. Service de Virologie, Hôpital de la Pitié-Salpêtrière, AP-HP, France

Abstract

Although France was one of the most affected European countries by the COVID-19 pandemic in 2020, the dynamics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) movement within France, but also involving France in Europe and in the world, remain only partially characterized in this timeframe. Here, we analyzed GISAID deposited sequences from January 1 to December 31, 2020 (n = 638,706 sequences at the time of writing). To tackle the challenging number of sequences without the bias of analyzing a single subsample of sequences, we produced 100 subsamples of sequences and related phylogenetic trees from the whole dataset for different geographic scales (worldwide, European countries, and French administrative regions) and time periods (from January 1 to July 25, 2020, and from July 26 to December 31, 2020). We applied a maximum likelihood discrete trait phylogeographic method to date exchange events (i.e., a transition from one location to another one), to estimate the geographic spread of SARS-CoV-2 transmissions and lineages into, from and within France, Europe, and the world. The results unraveled two different patterns of exchange events between the first and second half of 2020. Throughout the year, Europe was systematically associated with most of the intercontinental exchanges. SARS-CoV-2 was mainly introduced into France from North America and Europe (mostly by Italy, Spain, the United Kingdom, Belgium, and Germany) during the first European epidemic wave. During the second wave, exchange events were limited to neighboring countries without strong intercontinental movement, but Russia widely exported the virus into Europe during the summer of 2020. France mostly exported B.1 and B.1.160 lineages, respectively, during the first and second European epidemic waves. At the level of French administrative regions, the Paris area was the main exporter during the first wave. But, for the second epidemic wave, it equally contributed to virus spread with Lyon area, the second most populated urban area after Paris in France. The main circulating lineages were similarly distributed among the French regions. To conclude, by enabling the inclusion of tens of thousands of viral sequences, this original phylodynamic method enabled us to robustly describe SARS-CoV-2 geographic spread through France, Europe, and worldwide in 2020.

Editor's evaluation

This paper is a comprehensive, quantitative, and robust overview of the global, European, and French genomic epidemiology of SARS-CoV-2 in the first year of the pandemic. It contributes methodological advances in maximum likelihood phylogeography, using multiple scales and providing a simulation-based validation. The results show two distinct patterns of SARS-CoV-2 exchange events between the first and second half of 2020, with Europe being involved in most intercontinental exchanges: France experienced viral introductions primarily from North America and Europe during the first wave, while the second wave saw limited intercontinental movement and a significant contribution of the virus from Russia into Europe.

https://doi.org/10.7554/eLife.82538.sa0

Introduction

On December 1, 2019, an outbreak of severe respiratory disease was identified in the city of Wuhan, China (Huang et al., 2020). The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was rapidly identified as the agent of the disease (Zhu et al., 2020), responsible for the ongoing global pandemic of coronavirus disease 2019 (COVID-19). By the end of 2020, the virus caused over 1.8 million deaths worldwide including ~65,000 deaths in France, concomitantly with social and economic devastations in many regions of the world (Mofijur et al., 2021; Santomauro et al., 2021). Since the beginning of COVID-19 pandemic, the scientific community has thoroughly characterized the virus, including its pathogenesis, the monitoring of its circulation in human populations, and the development of several treatments or vaccines (Cevik et al., 2020; Krammer, 2020). Epidemiological models have been particularly helpful to quantify viral spread both in the short and long terms and to inform public health decisions (Hoertel et al., 2020; Kissler et al., 2020).

In addition to clinical and epidemiological insights, viral whole-genome sequencing has become a powerful and invaluable tool to better understand infection dynamics (Volz et al., 2013), including the COVID-19 pandemic. The number of available SARS-CoV-2 whole-genome sequences has rapidly grown thanks to the efforts of scientists and researchers gathered via international networks such as the Global Initiative on Sharing All Influenza Data, GISAID (https://www.gisaid.org/; Khare et al., 2021). These genomic sequences are essential to effectively reconstruct the global viral spread and the origins of variants. Genomic data have become a strong asset in addition to epidemiological data to inform governments and help public health decisions (Attwood et al., 2022; Rife et al., 2017). However, due to the computational time required for many analyses, existing phylogenetic tools are limited for studying large amounts of data such as those generated by widespread viral sequencing. Therefore, it is still necessary to develop methods to analyze large datasets while optimizing computational calculation times. Producing appropriate subsamples through several replicates may be an efficient approach in this matter.

In France, the first COVID-19 suspected case was identified in late December 2019 (Deslandes et al., 2020), and the first confirmed cases of SARS-CoV-2 infection were detected on January 24, 2020, in individuals who had recently traveled in China (Bernard Stoecklin et al., 2020). COVID-19 cases remained scarce until the end of February, when the national incidence curve of new SARS-CoV-2 infections started to rise (Figure 1). By the end of February, reinforced measures were announced, including social distancing, cessation of passenger flights to France, school closure, and finally, a complete lockdown across the entire country from March 17 to May 10, 2020. The reported daily incidence and numbers of severe cases peaked at the beginning of April 2020 before decreasing steadily until August 2020. However, after the relaxation of social distancing measures in June, a second wave of infections occurred in early September peaking at more than 100,000 positive cases and 1300 confirmed deaths in a single day on November 2, 2020 (Figure 1). After this peak, daily incidence and severe COVID-19 cases gradually diminished down to a number of positive daily cases varying between 2000 and 25,000 at the end of 2020 thanks to a second national lockdown applied between October 29 and December 15, 2020. Epidemiological trends were similar in most European countries except for Russia or Romania, where high rates of SARS-CoV-2-related deaths were reported even in the summer of 2020. Of note, the other continents showed different patterns of virus circulation: compared to Europe, the number of deaths increased about 2 weeks later in North America and remained high throughout 2020; and from early May, Asia and South America were also highly impacted by the pandemic (Figure 1—figure supplement 1).

Figure 1 with 2 supplements see all
Timeline of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-related deaths and stringency index in France, 2020.

Key events are indicated on the timeline. Official lockdowns included stay home orders and closure of schools and daycares. Based on SARS-CoV-2-related deaths, the two first French epidemic waves are, respectively, dated from March to July 2020, and September to December 2020. SARS-CoV-2-related deaths are displayed as the daily number of deaths (light blue area) and as the weekly average of daily number of deaths (dark blue curve). The stringency (Oxford) index is a composite measure based on different response indicators including school and workplace closures and travel bans, rescaled to a value from 0 to 100 (100 = strictest) (Hale et al., 2021).

Elucidating the SARS-CoV-2 dynamic throughout the various phases of the pandemic is paramount to better anticipate how to limit virus circulation for future viral epidemics (Rife et al., 2017). Here, we analyzed GISAID deposited sequences to elucidate the origins and spread of the virus in France, Europe, and the world from January 1 to December 31, 2020. Through a maximum likelihood discrete trait phylogeographic method, we estimated the main geographical areas that contributed to viral introduction into France and Europe, the countries/continents to which France exported SARS-CoV-2 the most and the contribution of the different French regions to the national circulation of the virus. The main exchanged lineages were also investigated. We looked at the differences in virus circulation during each of the two European epidemic waves of 2020 independently. Given France’s central geographic location in Europe and the high proportion of international travelers visiting this country before the pandemic, we aimed to explore the role that France played in SARS-CoV-2 exchanges both in Europe and worldwide.

Results

Defining appropriate subsamples using simulations

From January 1 to December 31, 2020, a total of 638,706 sequences were retained in our study. Inferring a phylogenic tree with such a large number of sequences would require very long calculation times. To overcome this limit, we constructed smaller datasets by randomly choosing subsamples (with replacement) of the sequences. The number of sequences for each country at each week was chosen to be proportional to the number of SARS-CoV-2-related deaths per country and per week with a 2-week shift to account for the time between infection and death.

As a proof of concept, we conducted an extensive simulation study to estimate the accuracy of the discrete trait phylogeographic inference for rates of transitions between two distinct locations. First, to evaluate the precision of such inference on a tree of 1000 leaves, we simulated a two-states model with different combinations of transition rates in 50 replicates. Parameters were correctly estimated with limited variability across the 50 replicates. The median parameters across replicates gave a very accurate estimation (Figure 2A).

Figure 2 with 4 supplements see all
Estimating variability in transition rates using simulations.

(A) Estimated versus true parameters in the simulation study of the two-states model. The two panels show the two transition rates. For each set of parameters, 50 replicates were conducted. The large red dot is the median of the replicates. The red cross is the true parameter value, on the bisector. (B) Estimated rate of transition in subsampled trees. For each replicate (n = 50), one point is the result of one subsampled smaller phylogenetic tree (from a large phylogenetic tree). The big dot shows the median for each replicate. The horizontal red line is the overall median (of the medians), across replicates. The horizontal dashed gray line is the true rate. Only one of the two rate parameters is shown. (C) Log-median error in parameter estimation as a function of the log number of replicates, when inference is conducted on truly independent replicate evolutionary histories, on a tree of 1000 leaves. The points are the data, the dashed line shows the line of slope ‘−1’ which is the expectation as the replicates are truly independent. (D) Log-median error as a function of log number of subsamples used for the inference done on subsampled phylogenetic trees. The colored points and lines show the inference done on 50 distinct realizations of the evolutionary process on the whole tree. The dashed line is the overall regression line with a slope of −0.7. We tested between 1 and 10 subsampled trees (x-axis).

Next, to evaluate how independent parameter estimates are done on randomly subsampled trees of the same larger phylogeny, we inferred parameters on 50 100-leaves trees randomly subsampled from a 10,000-leaves SARS-CoV-2 phylogenetic tree. For each resulting subtree, we conducted inferences on 50 replicates corresponding to 50 realizations of the stochastic process of evolution of the discrete character – as done in the first simulation – on the whole tree of 10,000 leaves. For each replicate, we observed some error on the estimation of the parameter, because one replicate only corresponds to one possible realization of the evolutionary process, although the overall median of inferred parameters across subsampled trees was closer to the true parameter values (Figure 2B).

Different estimations of the transition rates conducted on different subsampled trees are not expected to be fully independent because the subtrees partly share the same evolutionary history. Therefore, we estimated the level of independence of these estimations. When several estimates are perfectly independent from one another and are averaged to obtain the final estimate of the quantity of interest, we expect the error in parameter estimation to converge to 0 with a 1/N (N−1) scaling, where N is the number of replicates. This is indeed what we observed when we calculated the error on estimation of the parameter as a function of the chosen number of replicates N in the first set of simulations. Here, the replicates were truly independent replicate realizations of the evolutionary history and inference was conducted on the whole tree of 1000 leaves (Figure 2C). On the contrary, when estimates are perfectly dependent, error on the averaged parameter estimate is expected to not decrease with N. When evaluating the error on parameter estimates across subsamples of the large tree, we expected the scaling of error as a function of number of subsamples N to be intermediate between non-independence (~N0 scaling) and perfect independence (~N−1 scaling). Using the relationship between log(error) as a function of log(N), we estimated a slope of −0.7 (Figure 2D). Thus, inferences conducted on subsamples of the same phylogenetic tree are partly independent. The precise degree of independence is expected to depend on the shape of the phylogenetic tree, but the coefficient was similar when doing the same study on a randomly generated tree instead of the SARS-CoV-2 tree.

We finally conducted another round of simulations to evaluate the error on what we considered as exchange between multiple locations when using sparse subsampling. For that, a 1,000,000-leaves tree was simulated with a five-states discrete trait representing geographical units. Then, 100 subsampled 1000-leaves trees from the whole phylogenetic tree were produced and the ancestry for the discrete trait was reconstructed from the leaf data only. We estimated the number of transitions (exchanges) of each type and compared them with the one obtained from the main tree, finding a mean error rate of 2.7% over the 100 subsamples (Figure 2—figure supplement 1).

Altogether, these simulations suggested that using subsamples of 1000 sequences from a large dataset and performing partially independent replicates seems to be sufficient to accurately estimate transition events.

Description of the datasets and global diversity of SARS-CoV-2 sequences

We defined 100 subsamples of sequences proportionally to COVID-19 deaths across geographic locations and time for different geographic scales (worldwide, Europe, and French regions) and time periods (from January 1 to July 25, 2020, and from July 26 to December 31, 2020, respectively, covering the first and second European epidemic waves). We chose the sampling intensity guided by the weekly number of SARS-CoV-2-related deaths reported by public health organizations. Here, the number of SARS-CoV-2-related deaths was used rather than the number of detected cases because the latter was biased due to variable ascertainment rates across countries and time. For example, the larger number of PCR tests conducted in the second epidemic wave could wrongly suggest that the virus circulated much more during the second half of 2020 (Figure 1—figure supplement 2).

For each geographic scale and time period, there was a positive correlation between the weekly number of SARS-CoV-2-related deaths and the weekly number of sequences we included for a subsample (Spearman’s rank correlation, p < 0.001; r = 0.94 for the lowest correlation). We also confirmed that the number of sequences per territory was, on average, properly temporally distributed within each time period (Figure 2—figure supplement 2).

Some countries and French administrative regions were however discarded in the analyses because they were not sufficiently represented in the GISAID database. Overall, a total of 39,288 and 39,755 distinct SARS-CoV-2 sequences were included across the 100 sampled phylogenies for the worldwide dataset, respectively, for the first and the second time periods (Table 1). At the European scale, 26,757 and 27,658 different SARS-CoV-2 sequences covering 11 countries were analyzed across the 100 subsamples (Table 1). Focusing on French administrative regions, sequences available on the GISAID database were very sparse. The Provence-Alpes-Côte d’Azur (PACA, Marseille area) was the only region that highly sequenced SARS-CoV-2 in 2020. Île-de-France (IDF, Paris area), Auvergne-Rhône-Alpes (ARA, Lyon area), Occitanie (OCC, Toulouse and Montpellier area), and Bretagne (BRE, Rennes area) have sequenced much less than PACA, but provided sufficient data to investigate SARS-CoV-2 geographic exchange events in France. The remaining French administrative regions were discarded since too few sequences were available to properly match the number of weekly SARS-CoV-2 deaths (Figure 2—figure supplement 3). We thus considered 2543 unique sequences across the 100 subsamples between January 1 and July 25, 2020, and 3124 unique sequences between July 26 and December 31, 2020 (Table 1).

Table 1
Number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences investigated for each dataset.
DatasetGeographiesPeriod investigatedAverage number of sequences sampled in a subsampleTotal number of sequences
WorldAfrica, Asia, Europe, France, North America, Oceania, South AmericaJanuary 1 to July 25, 202084639,288
July 26 to December 31, 202077739,755
EuropeBelgium, France, Germany, Italy, The Netherlands, Poland, Romania, Russia, Spain, Sweden, United KingdomJanuary 1 to July 25, 202090426,757
July 26 to December 31, 202087227,658
FranceAuvergne-Rhône-Alpes (ARA), Bretagne (BRE), Île-de-France (IDF), Occitanie (OCC), Provence-Alpes-Côte d’Azur (PACA)January 1 to July 25, 20204162543
July 26 to December 31, 20204333124

The genomic diversity of circulating SARS-CoV-2 in the different continents, countries, and French regions was found to be similar (Figure 2—figure supplement 4). Overall, genomes showed high sequence conservation compared to the Wuhan-Hu-1 reference in 2020 (mean and median of ~13 single nucleotide polymorphisms (SNPs) with 95% of the distribution comprised between 4 and 25 SNPs).

Which continents exchanged SARS-CoV-2 with Europe and France?

Through 100 distinct, dated and ancestrally reconstructed phylogenetic trees, we first studied SARS-CoV-2 exchanges worldwide for each of the time periods studied. Between January 1 and July 25, 2020 (covering the first European epidemic wave), we found that Europe (excluding France) accounted for 57.3% of the total number of exportation events, and was the main source of SARS-CoV-2 exportations toward the other continents in all of the subsamples (Figure 3A–D and Figure 3—figure supplement 1). North America also highly participated in virus exportation during this period time (24.3%). South America and Asia were each associated with 7.1% of the total number of exportation events, consistent with a later circulation of the virus in these continents (Figure 1—figure supplement 1). France was estimated to have contributed 4.2% of the total exportation events, indicating that France was not the major European source of SARS-CoV-2 at the international level between January 1 and July 25, 2020. The exportation events from France were mostly headed toward Europe and, to a lesser extent, to North America, South America, and Asia (Figure 3B and Figure 3—figure supplement 2). These events mostly consisted of the B.1 (80.2%), B.1.1 (6.1%), and B.1.356 (3.5%) lineages (Figure 3E). North America received a large proportion of SARS-CoV-2 from other continents (28.5% of the introduction events), followed by South America (23.1%) and Europe (17.3%) (Figure 3A–D). An average of 11.7% of all SARS-CoV-2 introductions were into France, and originated from North America (50.7%) and Europe (45.7%) (Figure 3—figure supplement 2). These introductions consisted of the B.1 (71.9%) and B.1.1 (18.2%) lineages (Figure 3E). The first introductions into France were detected at the beginning of February, and progressively increased to reach a peak just before the nationwide lockdown from March 17, 2020 (Figure 3D). Only South America and Asia were associated with a continuous increase in SARS-CoV-2 introductions after this date, probably because no such drastic measures were generalized there and the circulation of the virus remained limited in these regions.

Figure 3 with 2 supplements see all
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exchange worldwide.

Exchange events were inferred with 100 subsampled phylogenies between January 1 and July 25, 2020, and between July 26 and December 31, 2020. (A) Number of introduction and exportation events for each subsample and for each continent and France. (B) SARS-CoV-2 exchange flows between continents and France during the two time periods investigated. In these plots, migration flow out of a particular location starts close to the outer ring and ends with an arrowhead at the destination location. Arrow width is proportional to the exchange strength. (C) Number of exportation and (D) introduction events per territory over time. The mean number of exchanges over the subsamples and for each week was calculated. Gray bars indicate the dates of the complete lockdowns in France. (E) Proportion of pango lineages exported from France and introduced into France. Lineages with a proportion <3% were grouped into the ‘other’ clade.

From July 26 to December 31, 2020 (second European epidemic wave), we observed 1.4 times fewer exchange events worldwide compared to the first half of 2020. Importantly, we showed the importance of analyzing several subsamples, as there was a large variation in the total number of exportation or introduction events, especially in Europe (Figure 3A). Europe was, as between January 1 and July 25, 2020, the main source of exchanges with a total of 49.9% of the exportation events across subsamples, followed by North America (18.0%), Asia (14.1%), and South America (9.5%) (Figure 3A–D and Figure 3—figure supplement 1). Most of the events occurred during the summer period (June to August 2020), corresponding to the summer holidays in most countries of the world. France accounted for 8.3% of the exportation events, but they were almost exclusively oriented toward other European countries (89.9%) and overall detected from August to November 2020 (Figure 3B and Figure 3—figure supplement 2), consistent with the SARS-CoV-2 incidence in this period in France (Figure 1—figure supplement 2). The B.1.160 lineage accounted for almost all the exportation events from France (97.9%) (Figure 3E). In a similar fashion, SARS-CoV-2 introductions into France mostly originated from Europe (81.9%) (Figure 3B) and were detected at a low rate from April 2020, then at a higher but always limited rate from June 2020, and at a strong level in September and October 2020 (Figure 3—figure supplement 2). These SARS-CoV-2 introductions into France consisted in majority of B.1.177 (28.0%), B.1.160 (24.7%), B.1 (17.7%), B.1.1 (10.0%), and B.1.258 (7.6%) lineages (Figure 3E).

How did the virus spread in Europe?

We then aimed to get a more comprehensive view of SARS-CoV-2 exchanges between France and other European countries with the same approach. Here, we only focused on European countries associated with a high incidence and without under-sampling due to a lack of data on GISAID (Table 1).

By calculating the count of introduction and exportation events between January 1 and July 25, 2020 across the subsamples, we observed that Italy was the major contributor to virus exportation toward other European countries, with an average of 41.5% of the total number of exportation events. The United Kingdom, France, and Spain also highly participated in virus exportation, and consisted of 21.6, 18.1, and 11.8% of the total number of exportation events, respectively (Figure 4A–D and Figure 4—figure supplement 1). These observations are in line with epidemiological data, since Italy was the first country in Europe to be heavily affected by the pandemic; and France, the United Kingdom, and Spain were the three other European countries associated with the highest number of SARS-CoV-2-related deaths during the first wave (Figure 1—figure supplement 1). The number of all exportation events however decreased after the implementation of lockdowns in the different countries (with the first one occurring in Italy on March 9, 2020) (Figure 4C). France mostly exported SARS-CoV-2 toward Belgium (25.5%), Germany (21.0%), and the United Kingdom (20.4%), and a little less toward Spain (10.1%) and the Netherlands (9.2%). All of these events occurred before the official lockdown in France (March 17, 2020) (Figure 4B and Figure 4—figure supplement 2), and consisted of the B.1 (78.5%), B.1.1 (8.8%), and B.1.356 (6.9%) lineages (Figure 4E). The rate of SARS-CoV-2 exportations from France then decreased until the second European epidemic wave, as it was also the case for other European countries except Russia (Figure 4C). For all introduction events, the proportions were more balanced: the United Kingdom accounted for a quarter of the total number of events, while Russia, Belgium, Germany, Italy, Spain, France, and the Netherlands represented between 6.5 and 11.9% of the total number of events (Figure 4D). In France, a high rate of introduction events was observed in February and March before the lockdown and originated mostly from Italy (44.3%), the United Kingdom (30.8%), and Spain (14.6%) (Figure 4—figure supplement 2). These introductions consisted in majority of the B.1.1 (47.9%) and B.1 (35.4%) lineages (Figure 4E).

Figure 4 with 2 supplements see all
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exchanges on the European scale.

Transmission events were calculated by averaging the results from 100 subsampled phylogenies between January 1 and July 25, 2020, and between July 26 and December 31, 2020. (A) Number of introduction and exportation events for each subsample and for each European country. (B) SARS-CoV-2 exchange flows between European countries during the two time periods investigated. In these plots, migration flow out of a particular location starts close to the outer ring and ends with an arrowhead at destination location. Arrow width is proportional to the exchange strength. (C) Number of exportation and (D) introduction events per territory over time. The mean number of exchanges over subsamples and for each week was calculated. Gray bars indicate the dates of the complete lockdowns in France. (E) Proportion of pango lineages exported from France and introduced into France. Lineages with a proportion <3% were grouped into the ‘other’ clade.

The second time period (from July 26 to December 31, 2020) showed a different pattern of exchanges. Here, we estimated 1.3 times fewer exchanges compared to the first half of 2020. Russia accounted for most of the exportation events (27.6%) (Figure 4A, B). These events were estimated to occur during the spring (after the relaxation of containment measures in most European countries) and the summer periods (Figure 4C). This result was expected since Russia was almost the sole European country to report a high number of SARS-CoV-2-related deaths during this period (Figure 1—figure supplement 1). Spain (16.9%), France (14.0%), Germany (10.2%), Italy (7.9%), Poland (7.5%), and the United Kingdom (6.6%) also highly participated in virus exportation (Figure 4A–D and Figure 4—figure supplement 1). Most of these events were detected between August and October 2020 (Figure 4C), and strongly decreased just before the second lockdown in most European countries (the first one occurring in Spain on October 9, 2020). Again, these observations are consistent with epidemiological reports, as Spain was the first country in European Union to be associated with a sharp increase of SARS-CoV-2-related deaths, rapidly followed by France. France mostly exported the virus toward Italy (22.2%), Germany (21.3%), and Belgium (19.1%) (Figure 4B and Figure 4—figure supplement 2), and mostly the B.1.160 lineage (79.2%) (Figure 4E). Focusing on introduction events, Germany accounted for 23.8% of the total number of introductions, followed by the United Kingdom (15.3%), Italy (15.1%), and France (9.8%). For the remaining European countries, the proportion of introduction events was comprised between 4.5 and 7.6%, except for Russia that accounted for <0.1% of the events. Introductions into France mostly originated from Russia (31.7%) during the summer period, and later from Spain (23.7%), Italy (12.5%), and Germany (11.0%) (Figure 4—figure supplement 2). These SARS-CoV-2 introductions into France consisted in majority of B.1.177 (31.0%), B.1.1 (22.6%) and B.1.258 (7.6%) lineages (Figure 4E).

The first and second French epidemic waves: two patterns of SARS-CoV-2 transmissions within France

We finally conducted an analysis of virus spread inside of France by studying exchanges between French administrative regions. As previously explained, only five regions (ARA, BRE, IDF, OCC, and PACA, which cover the most densely populated urban areas in France, that is, Paris, Lyon, Montpellier, Toulouse, and Marseille) were investigated since only a few, if any, sequences were available on GISAID for the other regions (Figure 2—figure supplement 3). Of note, the Grand-Est region was one of the first massively affected by SARS-CoV-2 in France during the first wave and, thus, must have played a major role. The fact that it was not sampled is a limitation that must be taken into account when reading these results.

During the first French epidemic wave, almost all exportation events originated from IDF (84.0%). ARA, BRE, OCC, and PACA only accounted for 8.2, 0.4, 2.3, and 5.1% of the total number of exportation events (Figure 5A, B and Figure 5—figure supplement 1). As for the wider geographic scales, most exportation events occurred between February and March, and largely decreased after the national lockdown (Figure 5C and Figure 5—figure supplement 2). ARA was the region having most SARS-CoV-2 introductions, with 48.3% of the total number of introduction events, followed by PACA (22.1%), OCC (17.3%), and BRE (7.0%) (Figure 5C). All the regions were mostly affected by the B.1 lineage at similar rates, except for OCC with more B.1.1 lineage (Figure 5E).

Figure 5 with 2 supplements see all
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exchanges in France.

Exchange events were calculated through 100 subsampled phylogenies between January 1 and July 25, 2020, and between July 26 and December 31, 2020. (A) Number of introduction and exportation events for each subsample and for each French administrative region. (B) SARS-CoV-2 exchange flows between French administrative regions during the two time periods investigated. Migration flow out correspond to the mean of the French sets of sequences. The number of exportation and introduction events per French region over time are detailed (C) between January 1 and July 25, 2020, and (D) between July 26 and December 31, 2020. The proportion of pango lineages exported or introduced are given for each French region during (E) the first and (F) the second time period investigated. Lineages with a proportion <3% were grouped into the ‘other’ clade.

During the second wave, IDF was no longer the single epicenter of SARS-CoV-2 inter-region exchanges. Both IDF and ARA participated concomitantly in virus spread at similarly large levels (43.3 and 42.6% of the total number of exportation events, respectively) (Figure 5A, B and Figure 5—figure supplement 1). The first exportation events mostly occurred in the spring and summer in IDF. ARA was associated with a high increase in exportation events by the end of July, consistent with the resumption of the epidemic. Again, the second national lockdown – implemented in October 29, 2020 – was associated with a strong decline of the number of exchange events (Figure 5D). Regarding introduction events, PACA received 31.3% of the total number of introductions, followed by ARA (24.7%), IDF (24.1%), OCC (11.5%), and BRE (8.2%). Overall, ARA, BRE, OCC, and PACA mostly sent and received the B.1.160 lineage. Only IDF showed a different pattern, exporting at similar rates both the B.1.160 and the B.1.177 lineages (Figure 5F).

Altogether, SARS-CoV-2 exchange events within France showed two distinct patterns between the first and second epidemic waves of 2020. While most SARS-CoV-2 exchanges involved the Paris area during the first half of 2020, the situation was much more balanced after the summer 2020 with the Lyon area (ARA) also contributing to many exchanges. Viral spread across French regions in summer 2020 could be explained by travel associated with the summer holidays.

Discussion

Prior to this analysis, the dynamics of SARS-CoV-2 transmissions within France or between France and other countries (in Europe or worldwide) has been only partially characterized. To our knowledge, three studies have explored the virus spread in France, but only focused on the first French epidemic wave and/or could not address SARS-CoV-2 introduction events from other continents or other European countries (Danesh et al., 2020; Elie and Alizon, 2020; Gámbaro et al., 2020). Other studies have focused on larger scales such as Europe as a whole, but did not explore in-depth the dynamics of exchanges especially in France (Hodcroft et al., 2021; Lemey et al., 2021; Nadeau et al., 2021). Here, we studied introduction and exportation events in France at different scales during the first year of the pandemic, covering the two first European epidemic waves. Our approach provides a way to analyze large amounts of genomic data for future epidemics and improve our understanding of viral spread in France and Europe.

Methodologically, most phylodynamics studies on SARS-CoV-2, as well as on other viruses or pathogens, are based on Bayesian approaches (notably with the popular and robust BEAST tool; Drummond and Rambaut, 2007) using one or more phylogenetic trees composed of several thousand sequences. To reconstruct the evolution parameters and ancestral states at nodes (here, the geographical area and the SARS-CoV-2 lineage), Bayesian methods require high computational power, representing their main limitation (Drummond and Rambaut, 2007). Overall, the generation of a dated phylogenetic tree and the ancestral state reconstruction for even a few thousands of sequences is no small feat, requiring a very long computational time, and sometimes parameter estimation fails to converge. Furthermore, conclusions obtained would be based on a single sample with a relatively limited number of sequences, but different samples may lead to different results (Hall et al., 2016). As stated by Dellicour et al., Bayesian phylogenetic analysis on about 2800 SARS-CoV-2 sequences required over 150 hr to obtain enough samples from the posterior distribution across 15 parallel runs with a GPU accelerated implementation (Dellicour et al., 2021). Since the GISAID database stores several hundred thousand SARS-CoV-2 sequences (n = 638,704 on May 8, 2022 for the year 2020 alone), we believe that using a phylogeny of a few thousand sequences only might not be the best way to accurately infer exchange events. In order to obtain more comprehensive trends drawing conclusions from a larger number of SARS-CoV-2 sequences, we constructed 100 semi-independent maximum likelihood phylogenies of relatively limited size (from 418 to 904 sequences depending on the geographic scale and time period investigated) and inferred ancestral states by a maximum likelihood discrete trait phylogeographic method. Bayesian approach seems not to be more accurate than maximum likelihood for the reconstruction of ancestral states (Hanson-Smith et al., 2010), and we gain a lot in terms of sampling, computational time, and trends by sampling 100 semi-independent subsamples. The maximum likelihood-based method was recently used to depict with accuracy the genomic epidemiology of the virus in Canada (McLaughlin et al., 2022), as well as in other contexts (Cunningham et al., 1998; Pupko et al., 2000).

We characterized SARS-CoV-2 spatial spread within France, and between France and European and all other countries in 2020. Between January 1 and July 25, 2020 (first European epidemic wave), France mostly exchanged SARS-CoV-2 with Europe and North America. The observation contrasts with the second European epidemic wave – that took place between late October and the end of 2020 – where France almost exclusively exchanged the virus with other European countries. This result is logical insofar as the borders of France have gradually reopened with European countries but remained mostly closed to the rest of the world. A very low rate of SARS-CoV-2 exchanges between Europe (including France) and Asia or South America was found in 2020, which is not surprising since the virus started to spread in these continents only from May 2020 where all borders of Europe were already closed. On the European scale, France was a major source of SARS-CoV-2 exchange. It ranked third regarding exportation events (mainly directed toward neighboring countries) only behind the United Kingdom and Italy between January 1 and July 25, 2020, and third again behind Spain and Russia (where exchange events happened mostly in summer 2020) between July 26 and December 31, 2020. The first cases introduced from European countries into France came from Italy and Spain, consistent with the early days of SARS-CoV-2 spread in Europe. These results are in line with the observations made across Europe through Bayesian inference (Lemey et al., 2021). In line with the lack of SARS-CoV-2 genomic diversity at the beginning of the pandemic, both B.1 and B.1.1 were the most representative lineages introduced into or exported from France. During the second European epidemic wave, some cases were introduced into France (and some other European countries) from Russia where virus incidence was high since the summer 2020. Then, most introductions into France originated from Spain which initiated the second wave in European Union, followed by Italy and Germany. Most of the introductions into France were the lineages B.1.1, B.1.177, and B.1.258 that highly circulated in Europe. By contrast, France mostly exported the B.1.160 lineage which was firstly detected in July 2020 in Marseille (PACA, France), and represented the majority of sequences in fall 2020. The frequency of the B.1.160 lineage then decreased rapidly at the beginning of 2021 in favor of B.1.1.7. Interestingly, the number of international exchanges was lower during the second wave compared to the first one, indicating the effectiveness of border control measures and that French people reduced travel outside France after the first national lockdown.

Our results on the French scale should be interpreted considering several limitations. In addition to a limited overall size, genomic data only covered 5 out of 13 administrative regions (Figure 2—figure supplement 3). The North and East of France (i.e., regions Hauts-de-France and Grand-Est, respectively) that border Belgium, Luxembourg, and Germany, were not investigated. More importantly, the region Grand-Est was associated with early large clusters that participated in virus spread across the country during the first European epidemic wave, such as the evangelical gathering of the Christian Open-Door Church (Mulhouse, Haut-Rhin) that took place from 17 to February 21, 2020 and consisted of approximately a thousand infected individuals. Keeping these limitations in mind, we observed two distinct patterns of SARS-CoV-2 exchanges within France in 2020. During the first wave, most of the exportations originated from IDF (Paris area). This may be linked with the larger number of individuals living in IDF who left to other French regions at the beginning of the epidemic. In the second epidemic wave, we observed a similar contribution of IDF to exchanges, but this time accompanied with ARA (Lyon area) which was associated with a strong SARS-CoV-2 incidence. The circulation of lineages was the same between the French regions, both during the first and second epidemic waves. The B.1 was the major circulating lineage during the first wave, while B.1.160 was mostly transmitted during the second one, concomitantly with B.1.177 in IDF.

To conclude, using an original phylodynamic approach to investigate a large SARS-CoV-2 sequence dataset, we characterized SARS-CoV-2 spatial spread within France, across Europe and in the world in 2020 and unraveled distinct patterns of exchanges between the two first epidemic waves.

Materials and methods

Sequence acquisition and curation, and dataset production

Request a detailed protocol

A phylodynamics study (which aims to depict with accuracy exchange events between locations) requires a large number of sequences (Attwood et al., 2022). SARS-CoV-2 genome sequences were downloaded from the GISAID database on May 8, 2022 (n = 10,005,024). The following inclusion and exclusion criteria were applied to select samples for analysis: (1) only complete viral genomes from infected individuals (more than 29 kb in length) were included; (2) the maximal proportion of undetermined nucleotide bases was fixed to 5%; (3) sequences with unknown or incomplete sampling dates, or unknown geographical location, were excluded; (4) sampling dates were comprised between January 1, 2020 and December 31, 2020; and (5) sequences from non-human were excluded. The final dataset included 638,704 sequences. In a second pass, 26,103 genomes were considered as outliers as they were in the extreme 5% regarding the number of non-synonymous mutations.

We studied the epidemic at three different geographic scales: (1) worldwide, to identify SARS-CoV-2 exchanges between continents and France; (2) at the European scale, to determine more precisely which European countries mostly exchanged with France; and finally (3) at the French administrative region level, to get a better understanding of virus spread inside the country. For each geographic scale, the analyses were independently run for the two first European epidemic waves (January 1 to July 25, 2020 (weeks 1–30), and July 26 to December 31, 2020 (weeks 31–52); Figure 1).

Considering the large number of sequences in GISAID that passed our criteria (n = 638,704), we produced 100 subsamples of sequences for each investigated geographic scale analysis (i.e., worldwide, Europe, and French administrative regions) and time period (January 1 to July 25, 2020 and July 26 to December 31, 2020). To determine the number of SARS-CoV-2 genome sequences to be included so as to be representative of each location epidemic, we used the number of SARS-CoV-2-related deaths per territory and per week as a proxy for virus circulation taking place 2 weeks earlier. We did not use SARS-CoV-2 detected cases as the proxy for infections because the intensity of COVID-19 testing was increased after the first wave, which probably increased the ascertainment probability (Figure 1—figure supplement 2). For each subsample, sequences were drawn randomly from our GISAID extraction. Some countries and French regions had to be discarded because of their under-representation in the GISAID database. For the European scale, we selected countries with more than 10,000 sequences. Ukraine, Czech Republic, and Hungary passed this threshold, but were excluded for having too many weeks without enough data. At the French region scale, we included only regions with less than 20% of weeks lacking data (Figure 2—figure supplement 3).

For the first time period investigated (January 1 to July 25, 2020), an average of 846, 904 and 416 SARS-CoV-2 sequences were included in each set at the worldwide, European and French scales, respectively (Table 1). When considering the second time period (July 26 to December 31, 2020), the worldwide, European and French scale sequence subsamples contained 777, 872, and 433 genomic sequences (Table 1).

Defining appropriate subsample using simulations

Request a detailed protocol

Before performing phylodynamics analysis using SARS-CoV-2 phylogenetic trees of limited sizes, we performed a series of simulations to demonstrate the accuracy of our approach. In a first simulation, we investigated the inference of transition parameters on a random phylogenetic tree of 1,000 leaves produced with the rtree function of the ape package (Paradis et al., 2004) in R (R Development Core Team, 2022). For that, we simulated a two-states model, with two different transition rates, using the ape rTraitDisc function (Paradis et al., 2004) with the ‘all rates different’ (ARD) model. From each simulation, the two transition parameters were inferred using the ace function of the ape package (which performed maximum likelihood inference of the two rates). In this simulation, we tested different combinations of the two rates.

In a second simulation analysis, we tested the accuracy of inference done on subsamples of a larger phylogenetic tree. We considered a SARS-CoV-2 phylogenetic tree of 10,000 leaves retrieved from the GISAID database, and subsampled 50 randomly chosen 100-leaves phylogenies for inference. For each resulting subtree, we conducted inferences – using the same strategy as previously explained – on 50 replicates corresponding to 50 realizations of the stochastic process of evolution of the discrete character on the whole tree of 10,000 leaves.

Then, we evaluated the level of independence between estimations done on subsampled trees, using the SARS-CoV-2 phylogenetic tree of 10,000 leaves and following the same strategy as previously detailed.

In order to evaluate the error on counting transitions between states of a discrete trait along a phylogeny when using sparse subsampling, we simulated a main tree of 1,000,000 leaves with a five-states discrete trait again using rtree and rTraitDisc (ARD model) from the R ape package (Paradis et al., 2004). We then made 100 random subsampled 1000-leaves trees and reconstructed the ancestry for the discrete trait from the leaf data only using the ape function ace (once again ARD model). The mean error rate was obtained by calculating the number of transitions of each type and comparing them with the one obtained from the main tree. The non-parametric 95% confidence interval was calculated from 1000 bootstraps.

Multiple sequence alignment and maximum likelihood phylogenetic tree

Request a detailed protocol

For each subsample, genome sequences were independently aligned against the Wuhan-Hu-1 reference genome (GenBank accession: NC_045512.2) using MAFFT v7.450 then merged altogether (Katoh and Standley, 2013). The alignments were then masked for problematic sites listed by Nicola De Maio et al. (https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). All the sites present in the list were transformed as ‘N’ and thus eliminated from the subsequent analyses. From each masked alignment, we inferred a maximum likelihood phylogenetic tree using FastTree v.2.1.11 under a general time-reversible (GTR) model with a discrete gamma distribution to model inter-site rate variation and 1000 ultrafast-bootstraps to compute support values (Price et al., 2010). The choice of FastTree was mainly due to its short runtime while giving results very close to IQTree 2.1.2 (Minh et al., 2020) as described by Rob Lanfear (https://github.com/roblanf/sarscov2phylo/blob/master/tree_estimation.md). Each phylogenetic tree was dated using LSD2 v2.3 using Wuhan Hu-1 as outgroup for rooting (To et al., 2016), with the parameter ‘-e 2’ to be more stringent about outlier nodes. Phylogenetic trees were manipulated and viewed using treeio and ggtree R packages, respectively (Wang et al., 2020; Yu et al., 2017).

Worldwide, Europe, and French region SARS-CoV-2 phylodynamics analyses

Request a detailed protocol

For each phylogenetic tree we produced, ancestral state reconstructions of discrete geographical and SARS-CoV-2 lineage data were performed with the ace function of ape package in R (Paradis et al., 2004). Three distinct Markov models of discrete character evolution through maximum likelihood were tested and compared: the Equal Rates (ER) which assumes a single rate of transition among all possible states, the All Rates Different (ARD) which allows a distinct rate for each position transition between two states, and the Symmetrical Rates (SYM) where forward and reverse transitions share the same parameter. In this study, we only present the results obtained with the SYM model because both the ARD and ER models led to nearly identical observations compared to the SYM model. Of note, the use of ARD required extensive calculation times to infer maximum likelihood transition rates.

From the ancestral state reconstruction, we assigned nodes to a location when supported by ≥50% of the state likelihood. To detect and count each exchange event, we compared the geographic state of a node with the state of its two children. If the nodes corresponded to different geographic locations, we counted an exchange event from the parent node state to the given child node state. The midpoint of the parent–child branch was chosen as the date of exchange in all cases. The same strategy was applied to investigate the proportions of introduced or exported pango lineages. Exchange events across all the subsamples were then weekly averaged.

Data availability

All genome sequences and associated metadata in the dataset are published in GISAID's EpiCoV database. To view the contributors of each individual sequence with details such as accession number, virus name, collection date, originating lab and submitting lab and the list of authors, visit: https://doi.org/10.55876/gis8.230120zd. All the scripts developed for this study were deposited in the following GitHub repository: https://github.com/Rcoppee/PhyloCoV (copy archived at swh:1:rev:9a984b6414e8c2377694ce922eb31874cdaddf1a).

The following data sets were generated

References

    1. Hodcroft EB
    2. Zuber M
    3. Nadeau S
    4. Vaughan TG
    5. Crawford KHD
    6. Althaus CL
    7. Reichmuth ML
    8. Bowen JE
    9. Walls AC
    10. Corti D
    11. Bloom JD
    12. Veesler D
    13. Mateo D
    14. Hernando A
    15. Comas I
    16. González-Candelas F
    17. SeqCOVID-SPAIN consortium
    18. González-Candelas F
    19. Goig GA
    20. Chiner-Oms Á
    21. Cancino-Muñoz I
    22. López MG
    23. Torres-Puente M
    24. Gomez-Navarro I
    25. Jiménez-Serrano S
    26. Ruiz-Roldán L
    27. Bracho MA
    28. García-González N
    29. Martínez-Priego L
    30. Galán-Vendrell I
    31. Ruiz-Hueso P
    32. De Marco G
    33. Ferrús ML
    34. Carbó-Ramírez S
    35. D’Auria G
    36. Coscollá M
    37. Ruiz-Rodríguez P
    38. Roig-Sena FJ
    39. Sanmartín I
    40. Garcia-Souto D
    41. Pequeno-Valtierra A
    42. Tubio JMC
    43. Rodríguez-Castro J
    44. Rabella N
    45. Navarro F
    46. Miró E
    47. Rodríguez-Iglesias M
    48. Galán-Sanchez F
    49. Rodriguez-Pallares S
    50. de Toro M
    51. Escudero MB
    52. Azcona-Gutiérrez JM
    53. Alberdi MB
    54. Mayor A
    55. García-Basteiro AL
    56. Moncunill G
    57. Dobaño C
    58. Cisteró P
    59. García-de-Viedma D
    60. Pérez-Lago L
    61. Herranz M
    62. Sicilia J
    63. Catalán-Alonso P
    64. Muñoz P
    65. Muñoz-Cuevas C
    66. Rodríguez-Rodríguez G
    67. Alberola-Enguidanos J
    68. Nogueira JM
    69. Camarena JJ
    70. Rezusta A
    71. Tristancho-Baró A
    72. Milagro A
    73. Martínez-Cameo NF
    74. Gracia-Grataloup Y
    75. Martró E
    76. Bordoy AE
    77. Not A
    78. Antuori-Torres A
    79. Benito R
    80. Algarate S
    81. Bueno J
    82. del Pozo JL
    83. Boga JA
    84. Castelló-Abietar C
    85. Rojo-Alba S
    86. Alvarez-Argüelles ME
    87. Melon S
    88. Aranzamendi-Zaldumbide M
    89. Vergara-Gómez A
    90. Fernández-Pinero J
    91. Martínez MJ
    92. Vila J
    93. Rubio E
    94. Peiró-Mestres A
    95. Navero-Castillejos J
    96. Posada D
    97. Valverde D
    98. Estévez-Gómez N
    99. Fernandez-Silva I
    100. de Chiara L
    101. Gallego-García P
    102. Varela N
    103. Moreno R
    104. Tirado MD
    105. Gomez-Pinedo U
    106. Gozalo-Margüello M
    107. Eliecer-Cano M
    108. Méndez-Legaza JM
    109. Rodríguez-Lozano J
    110. Siller M
    111. Pablo-Marcos D
    112. Oliver A
    113. Reina J
    114. López-Causapé C
    115. Canut-Blasco A
    116. Hernáez-Crespo S
    117. Cordón MLA
    118. Lecároz-Agara M-C
    119. Gómez-González C
    120. Aguirre-Quiñonero A
    121. López-Mirones JI
    122. Fernández-Torres M
    123. Almela-Ferrer MR
    124. Gonzalo-Jiménez N
    125. Ruiz-García MM
    126. Galiana A
    127. Sanchez-Almendro J
    128. Cilla G
    129. Montes M
    130. Piñeiro L
    131. Sorarrain A
    132. Marimón JM
    133. Gomez-Ruiz MD
    134. López-Hontangas JL
    135. González Barberá EM
    136. Navarro-Marí JM
    137. Pedrosa-Corral I
    138. Sanbonmatsu-Gámez S
    139. Pérez-González C
    140. Chamizo-López F
    141. Bordes-Benítez A
    142. Navarro D
    143. Albert E
    144. Torres I
    145. Gascón I
    146. Torregrosa-Hetland CJ
    147. Pastor-Boix E
    148. Cascales-Ramos P
    149. Fuster-Escrivá B
    150. Gimeno-Cardona C
    151. Ocete MD
    152. Medina-Gonzalez R
    153. González-Cantó J
    154. Martínez-Macias O
    155. Palop-Borrás B
    156. de Toro I
    157. Mediavilla-Gradolph MC
    158. Pérez-Ruiz M
    159. González-Recio Ó
    160. Gutiérrez-Rivas M
    161. Simarro-Córdoba E
    162. Lozano-Serra J
    163. Robles-Fonseca L
    164. de Salazar A
    165. Viñuela-González L
    166. Chueca N
    167. García F
    168. Gómez-Camarasa C
    169. Carvajal A
    170. de la Puente R
    171. Martín-Sánchez V
    172. Fregeneda-Grandes J-M
    173. Molina AJ
    174. Argüello H
    175. Fernández-Villa T
    176. Farga-Martí MA
    177. Domínguez-Márquez V
    178. Costa-Alcalde JJ
    179. Trastoy R
    180. Barbeito-Castiñeiras G
    181. Coira A
    182. Pérez-del-Molino ML
    183. Aguilera A
    184. Planas AM
    185. Soriano A
    186. Fernandez-Cádenas I
    187. Pérez-Tur J
    188. Marcos MÁ
    189. Moreno-Docón A
    190. Viedma E
    191. Mingorance J
    192. Galán-Montemayor JC
    193. Parra-Grande M
    194. Stadler T
    195. Neher RA
    (2021) Spread of a SARS-cov-2 variant through Europe in the summer of 2020
    Nature 595:707–712.
    https://doi.org/10.1038/s41586-021-03677-y

Article and author information

Author details

  1. Romain Coppée

    Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    romain.coppee@inserm.fr
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3024-5928
  2. François Blanquart

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Centre for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS Inserm, PSL Research University, Paris, France
    Contribution
    Data curation, Software, Validation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0591-2466
  3. Aude Jary

    Sorbonne Université, Inserm, iPLESP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  4. Valentin Leducq

    Sorbonne Université, Inserm, iPLESP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  5. Valentine Marie Ferré

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  6. Anna Maria Franco Yusti

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  7. Léna Daniel

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  8. Charlotte Charpentier

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  9. Samuel Lebourgeois

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  10. Karen Zafilaza

    1. Sorbonne Université, Inserm, iPLESP, Paris, France
    2. Service de Virologie, Hôpital de la Pitié-Salpêtrière, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  11. Vincent Calvez

    1. Sorbonne Université, Inserm, iPLESP, Paris, France
    2. Service de Virologie, Hôpital de la Pitié-Salpêtrière, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  12. Diane Descamps

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  13. Anne-Geneviève Marcelin

    1. Sorbonne Université, Inserm, iPLESP, Paris, France
    2. Service de Virologie, Hôpital de la Pitié-Salpêtrière, AP-HP, Paris, France
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  14. Benoit Visseaux

    1. Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    2. Service de Virologie, Hôpital Bichat Claude Bernard, AP-HP, Paris, France
    Contribution
    Conceptualization, Supervision, Funding acquisition, Validation, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  15. Antoine Bridier-Nahmias

    Université Paris Cité and Sorbonne Paris Nord, Inserm, IAME, Paris, France
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Methodology, Project administration, Writing - review and editing
    For correspondence
    antoine.bridier-nahmias@inserm.fr
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0376-6840

Funding

No external funding was received for this work.

Acknowledgements

We gratefully thank all the laboratories and organizations that deposited SARS-CoV-2 sequences on the GISAID database, without which this study would not have been possible. We would like to thank the editors of eLife and the reviewers, who provided extremely valuable feedback and greatly improved the quality of this work. Funding This study was supported in part by the ANRS MIE (Agence Nationale de la Recherche sur le SIDA et les hépatites virales – Maladies Infectieuses Emergentes), the FRM (Fondation pour la Recherche Médicale), and the Inserm UMR1137 unit.

Version history

  1. Received: August 8, 2022
  2. Preprint posted: August 11, 2022 (view preprint)
  3. Accepted: April 15, 2023
  4. Accepted Manuscript published: April 26, 2023 (version 1)
  5. Version of Record published: May 11, 2023 (version 2)

Copyright

© 2023, Coppée et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 712
    views
  • 110
    downloads
  • 4
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Romain Coppée
  2. François Blanquart
  3. Aude Jary
  4. Valentin Leducq
  5. Valentine Marie Ferré
  6. Anna Maria Franco Yusti
  7. Léna Daniel
  8. Charlotte Charpentier
  9. Samuel Lebourgeois
  10. Karen Zafilaza
  11. Vincent Calvez
  12. Diane Descamps
  13. Anne-Geneviève Marcelin
  14. Benoit Visseaux
  15. Antoine Bridier-Nahmias
(2023)
Phylodynamics of SARS-CoV-2 in France, Europe, and the world in 2020
eLife 12:e82538.
https://doi.org/10.7554/eLife.82538

Share this article

https://doi.org/10.7554/eLife.82538

Further reading

    1. Epidemiology and Global Health
    Caroline Krag, Maria Saur Svane ... Tinne Laurberg
    Research Article

    Background:

    Comorbidity with type 2 diabetes (T2D) results in worsening of cancer-specific and overall prognosis in colorectal cancer (CRC) patients. The treatment of CRC per se may be diabetogenic. We assessed the impact of different types of surgical cancer resections and oncological treatment on risk of T2D development in CRC patients.

    Methods:

    We developed a population-based cohort study including all Danish CRC patients, who had undergone CRC surgery between 2001 and 2018. Using nationwide register data, we identified and followed patients from date of surgery and until new onset of T2D, death, or end of follow-up.

    Results:

    In total, 46,373 CRC patients were included and divided into six groups according to type of surgical resection: 10,566 Right-No-Chemo (23%), 4645 Right-Chemo (10%), 10,151 Left-No-Chemo (22%), 5257 Left-Chemo (11%), 9618 Rectal-No-Chemo (21%), and 6136 Rectal-Chemo (13%). During 245,466 person-years of follow-up, 2556 patients developed T2D. The incidence rate (IR) of T2D was highest in the Left-Chemo group 11.3 (95% CI: 10.4–12.2) per 1000 person-years and lowest in the Rectal-No-Chemo group 9.6 (95% CI: 8.8–10.4). Between-group unadjusted hazard ratio (HR) of developing T2D was similar and non-significant. In the adjusted analysis, Rectal-No-Chemo was associated with lower T2D risk (HR 0.86 [95% CI 0.75–0.98]) compared to Right-No-Chemo.

    For all six groups, an increased level of body mass index (BMI) resulted in a nearly twofold increased risk of developing T2D.

    Conclusions:

    This study suggests that postoperative T2D screening should be prioritised in CRC survivors with overweight/obesity regardless of type of CRC treatment applied.

    Funding:

    The Novo Nordisk Foundation (NNF17SA0031406); TrygFonden (101390; 20045; 125132).

    1. Epidemiology and Global Health
    Xiaoxin Yu, Roger S Zoh ... David B Allison
    Review Article

    We discuss 12 misperceptions, misstatements, or mistakes concerning the use of covariates in observational or nonrandomized research. Additionally, we offer advice to help investigators, editors, reviewers, and readers make more informed decisions about conducting and interpreting research where the influence of covariates may be at issue. We primarily address misperceptions in the context of statistical management of the covariates through various forms of modeling, although we also emphasize design and model or variable selection. Other approaches to addressing the effects of covariates, including matching, have logical extensions from what we discuss here but are not dwelled upon heavily. The misperceptions, misstatements, or mistakes we discuss include accurate representation of covariates, effects of measurement error, overreliance on covariate categorization, underestimation of power loss when controlling for covariates, misinterpretation of significance in statistical models, and misconceptions about confounding variables, selecting on a collider, and p value interpretations in covariate-inclusive analyses. This condensed overview serves to correct common errors and improve research quality in general and in nutrition research specifically.