Research Article

Epidemiology and Global Health

Leveraging mobility data to analyze persistent SARS-CoV-2 mutations and inform targeted genomic surveillance

Institute for Infectious Diseases and Infection Control, Jena University Hospital, Germany
Center for Sepsis Control and Care, Jena University Hospital/Friedrich Schiller University Jena, Germany
SMA Development GmbH - epicinsights Agentur für Künstliche Intelligenz und Big Data Analytics, Germany
Department of Anaesthesiology and Intensive Care, Jena University Hospital, Germany
Thuringian State Authority for Consumer Protection, Department Health Protection, Germany
Methodology and Research Infrastructure, Genome Competence Center (MF1), Robert Koch Institute, Germany
Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Germany
Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for Geoanthropology, Germany
Center for Applied Research, InfectoGnostics Research Campus Jena, Germany

Jan 15, 2025

https://doi.org/10.7554/eLife.94045.3

Open access
Copyright information

eLife Assessment

The authors analyze the relationship between human mobility and genomic data of SARS-CoV-2 using mobile phone mobility data and sequence data and present a solid proof of concept. This useful work was conducted on a fine spatial scale and provides suggestions on how mobility-derived surveillance could be conducted, although these results are mixed. The primary significance of this work is the strong use of large datasets that were highly granular. The authors provide a rigorous study, but with less clear predictive power of mobility to inform transmission patterns.

https://doi.org/10.7554/eLife.94045.3.sa0

Significance of the findings:

Useful: Findings that have focused importance and scope

Landmark
Fundamental
Important
Valuable
Useful

Strength of evidence:

Solid: Methods, data and analyses broadly support the claims with only minor weaknesses

Exceptional
Compelling
Convincing
Solid
Incomplete
Inadequate

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife Assessments

Abstract
Introduction
Results and discussion
Methods
Data availability
References
Article and author information
Metrics

Abstract

Given the rapid cross-country spread of SARS-CoV-2 and the resulting difficulty in tracking lineage spread, we investigated the potential of combining mobile service data and fine-granular metadata (such as postal codes and genomic data) to advance integrated genomic surveillance of the pandemic in the federal state of Thuringia, Germany. We sequenced over 6500 SARS-CoV-2 Alpha genomes (B.1.1.7) across 7 months within Thuringia while collecting patients’ isolation dates and postal codes. Our dataset is complemented by over 66,000 publicly available German Alpha genomes and mobile service data for Thuringia. We identified the existence and spread of nine persistent mutation variants within the Alpha lineage, seven of which formed separate phylogenetic clusters with different spreading patterns in Thuringia. The remaining two are subclusters. Mobile service data can indicate these clusters’ spread and highlight a potential sampling bias, especially of low-prevalence variants. Thereby, mobile service data can be used either retrospectively to assess surveillance coverage and efficiency from already collected data or to actively guide part of a surveillance sampling process to districts where these variants are expected to emerge. The latter concept was successfully implemented as a proof-of-concept for a mobility-guided sampling strategy in response to the surveillance of Omicron sublineage BQ.1.1. The combination of mobile service data and SARS-CoV-2 surveillance by genome sequencing is a valuable tool for more targeted and responsive surveillance.

Introduction

On March 11, 2020, the World Health Organization (WHO) classified the SARS-CoV-2 virus (severe acute respiratory syndrome coronavirus 2) as a global pandemic due to its rapid spread and high infection rate (Zhu et al., 2020). The airborne virus has since caused significant morbidity and mortality worldwide (https://covid19.who.int/). In an attempt to control its spread, many countries initiated comprehensive surveillance efforts with molecular techniques such as polymerase chain reaction (PCR) and whole genome sequencing (WGS) (COVID-19 Genomics UK, 2020; Corona-Surveillanceverordnung, 2022). Consequently, nearly 15.8 million SARS-CoV-2 sequences have been deposited into the ‘Global Initiative on Sharing All Influenza Data’ database (as of July 21, 2023, GISAID). Many research groups have undertaken studies examining the viral spread by integrating sequencing and epidemiological data to monitor the pandemic and investigate local outbreaks (Meredith et al., 2020; Page et al., 2021). Most of these local projects are part of national surveillance programs such as the UK’s Genomics Consortium (COG-UK) or ‘national genomic surveillance’ in the USA (COVID-19 Genomics UK, 2020; Lambrou et al., 2021). In Germany, the ‘Coronavirus-Surveillanceverordnung’ (CorSurV) enacted by the State Ministry of Health on January 19, 2021, mandated that laboratories with sequencing capabilities process SARS-CoV-2-positive samples, offering financial compensation until April 2023 (Corona-Surveillanceverordnung, 2022).

Bioinformatics workflows developed in Germany, such as poreCov (for Oxford Nanopore data) and CoVpipe2 (for Illumina data), reconstruct SARS-CoV-2 consensus genomes from the sequencing data and prepare the results for upload and submission to the Robert Koch Institute (RKI) (Brandt et al., 2021; Lataretu et al., 2024). As the German government’s public health and biomedical research institute responsible for disease control and prevention, the RKI collected the genomes via the German Electronic Sequence Data Hub (DESH) and integrated them with additional epidemiological information to provide an up-to-date overview of the ongoing viral spread. For keeping track of the rapid SARS-CoV-2 evolution, PANGO (Phylogenetic Assignment of Named Global Outbreak) provides a standard naming convention based on unique mutation profiles and further criteria, resulting in the classification of over 3660 lineages (as of August 2023) (Rambaut et al., 2020; PANGO, 2023). Additionally, the WHO classified important viral lineages as ‘Variants of Concern’ (VOC), ‘Variants of Interest’ (VOI), or ‘Variants under Monitoring’ (VUM), using Greek designations in the past (e.g. ‘Alpha’ [Pango lineage main designation B.1.1.7] or ‘Omicron’ [Pango lineage main designation B.1.1.529]). Further, the WHO also de-escalated former VOCs to reflect the current SARS-CoV-2 variant landscape better. The first defined VOC (now de-escalated), the Alpha lineage, rapidly replaced almost all previously circulating lineages globally by the end of 2020 until the VOC Delta (main lineage B.1.617.2) replaced it in mid-2021 (Washington et al., 2021; Walker et al., 2021; Michaelsen et al., 2022).

To predict or monitor the rapid viral spread throughout regions, various data types, like travel data, passenger volumes, or passive wastewater monitoring, were examined previously (Alpert et al., 2021; O’Toole et al., 2021; Li et al., 2022). Furthermore, different studies explored mobility data with genomic data to retrace the origin and spatial expanse of Alpha or utilized geolocation data to model the spread in metropolitan areas to recreate case trajectories and the impact of mobility restrictions (Kraemer et al., 2021; Chang et al., 2021). Mobility data was also used in Germany during the pandemic, revealing that lockdowns leave distant parts of the country less connected due to the sharp decline in long-distance travel (Schlosser et al., 2020). These studies focused on analyzing residential movement and contact tracing to evaluate and inform health policies but were not applied to active molecular surveillance.

Here, we investigated whether mobile service data and fine-granular metadata (such as postal codes and genomic data) can help predict the spread of the Alpha lineage or guide the sampling for more targeted genomic surveillance with a focus on the German federal state of Thuringia.

Results and discussion

The Alpha lineage spread rapidly through Thuringia, showing a pattern similar to its nationwide spread

Thuringia is a rural federal state in central Germany with a population of 2.1 million and no major airports (overview of Thuringia’s population density in Figure 1—figure supplement 2). We investigated if the spread of the Alpha lineage of SARS-CoV-2 behaved differently compared to the whole of Germany. To understand its spread, we used 289,487 public SARS-CoV-2 genomes from Germany (excluding Thuringia; including 137,024 Alpha genomes) and 7394 genomes from our own sequencing data for Thuringia (including 6307 Alpha genomes) to track Alpha’s occurrence from December 2020 to August 2021 (see Figure 1, Figure 1—figure supplement 1, and Figure 1—source data 1; Figure 1—source data 2; for details, see Methods section ‘Alpha spread datasets’). For Thuringia, district-level data (full postal code) per genome were available, whereas, for Germany, only postal code data of the sending laboratories (referred to as ‘primary diagnostic laboratory’ by the RKI where the SARS-CoV-2-positive sample was detected) and sequencing laboratories were publicly available.

Figure 1 with 3 supplements see all

Download asset Open asset

Total number of all sequenced SARS-CoV-2 samples (purple) and the proportion of the Alpha lineage for all sequenced samples (yellow-red) for each state of Germany and each district of Thuringia.

289,487 publicly available German SARS-CoV-2 genomes and their metadata were used for the general German maps, excluding data from Thuringia. For Thuringia, we always used 7394 genomes and their metadata from our database for the German and Thuringian maps. Please note that for all states except Thuringia, we used the postal code of the sending laboratory as a proxy for the geographical location of a sample. A gray border on the maps of Germany highlights Thuringia.

Figure 1—source code 1 R-script to generate Figure 1, Figure 1—figure supplement 1.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig1-code1-v1.zip
Download elife-94045-fig1-code1-v1.zip
Figure 1—source data 1 Total number of reported SARS-CoV-2 samples and number of reported Alpha lineage samples for each German federal state per month. The respective proportions of Alpha and non-Alpha samples are calculated.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig1-data1-v1.xlsx
Download elife-94045-fig1-data1-v1.xlsx
Figure 1—source data 2 Total number of reported SARS-CoV-2 samples and number of reported Alpha lineage samples for each Thuringian district per month. The respective proportions of Alpha and non-Alpha samples are calculated.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig1-data2-v1.xlsx
Download elife-94045-fig1-data2-v1.xlsx

In late December 2020, six federal states in Germany (from here on called states) reported the first cases of the Alpha variant. Although sequencing was initially low, it gradually increased in the following month. However, the Corona-Surveillance regulation was passed at the end of January 2021, leading to a rapid increase in sampling and sequencing by February since sequencing costs could be reimbursed. Even though Thuringia sequenced a similar amount of SARS-CoV-2 samples compared to other German states (as shown in Figure 1), the proportion of the Alpha variant to other lineages was relatively low. However, the proportion of Alpha increased heavily in February.

By March, Alpha had spread to nearly all states and districts (districts are similar to counties or provinces) in Germany (median: 76.47% Alpha samples among a federal states total sequenced samples compared to 36.03% in February, excluding Thuringia) and Thuringia (median: 85.29 %, up from 50.00% in February). So, there was no noticeable difference in the Alpha proportions between Germany and Thuringia after February. During the summer of June and July 2021, sequencing declined in Germany (including Thuringia; Figure 1—figure supplement 1) due to the decrease in overall daily cases, as reported by Meintrup et al., 2022; Oh et al., 2022.

In summary, the spread of the Alpha lineage in Thuringia lagged roughly 2 weeks behind the general spread of other German federal states but showed similar proportions. This suggests that Thuringia experienced a delay in the initial arrival of Alpha. However, we did not observe any difference in the overall spread afterward. Thuringia was among the first states to adopt new containment measures, including contact limitations, closure of retail shops, and prohibition of tourist journeys (December 14, 2020). Jena, a city in Thuringia, was also the first German city to implement mandatory public masking in March 2020 (Pletz et al., 2023). Contacts were further restricted on January 9th, and people were urged to restrict their movement radius to 15 km, which might explain the delay besides the absence of major airports nearby.

All Thuringian genomes were evenly distributed between other German samples in the phylogenetic time tree (see Figure 1—figure supplement 3). However, due to its rapid spread from February onward, it is difficult to accurately track how the Alpha lineage specifically expanded (point of entries, exact origins, etc.). Consequently, we investigated whether ‘sublineages’ might be identifiable and trackable to address this.

Monitoring of Alpha subclusters in Thuringia reveals temporally and regionally restricted distribution patterns

To identify possible clusters among the Alpha lineage spreading in Thuringia, we called each Alpha genome’s mutations via Nextclade by analyzing them using poreCov (Brandt et al., 2021; Aksamentov et al., 2021). We identified nine clusters out of 70,429 Alpha genomes, based on their mutation profile, time period, and phylogenetic distance (from here on called Alpha subclusters; for details, see Methods ‘Subcluster identification’). All subclusters, their time period, and sample size in Thuringia are summarized in Table 1. An overview of each subcluster (phylogenetic time tree, location, and period) is also provided here as interactive views (see Methods ‘Subcluster identification’). Note that our subcluster definition is similar to the definition of a sublineage. However, PANGO sublineages were rarely defined during the Alpha wave (PANGO designation: Q.1 to Q.8; compared to the Delta and Omicron waves).

Table 1

Overview of nine Alpha subclusters in Thuringia, their sample count, their time period, and their specific mutations that are shared across all members of the subcluster (excluding characteristic Alpha mutations that are shared across all subclusters).

The mutation used to define the subcluster is highlighted in bold.

Designation	Mutations	Number of samples	Time period	Remarks
1	S:H49Y, ORF1a:I841V	44	Feb-May 2021	S:H49Y eases cell entry in S-pseudotyped lentiviral system (Ozono et al., 2021)
2	S:N354K	63	Feb-May 2021	S:N354K slightly impaired mAb h11B11 (Du et al., 2021)
3	S:G496S, ORF1a:E1013K	12	Mar-May 2021	S:G496S: compromises BA.1 replication fitness and changed mAb sensitivities, reduces ACE2 binding affinity, and increases immune evasion (Liang et al., 2022; Kimura et al., 2022; Asif et al., 2022)
4	S:N703D, ORF1a:D1228G, ORF1a:A2123V	51	Mar-May 2021	–
5	S:T716V, N:G204P, ORF1a:D1600N	22	Apr-May 2021	–
6	S:S939F	206	Feb-May 2021	S:S939F: modulates T-cell propensity (Donzelli et al., 2022)
6.1^†	S:V90F, S:S939F	55	Feb-May 2021	–
7	ORF1b:A520V	811	Feb-Jun 2021*	–
7.1 ^‡	S:N185D, ORF1b:A520V, ORF1b:L1504F	40	Feb-May 2021	–

*

Only one sample for June.
†

Branch from subcluster 6.
‡

Branch from subcluster 7.

Eight of these subclusters are based around a specific spike protein mutation, while the other contains a mutation within the ORF1b region. The subcluster 7.1 ‘S:N185D’ branched out from the subcluster 7 ‘ORF1b:A520V’ and subcluster 6.1 ‘S:V90F’ is a branch of subcluster 6 ‘S:S939F’ (see Table 1). These two branched subclusters still carry the specific mutation of their originating subcluster. The subclusters 3, 4, and 5 were observable between two and three months, and the other subclusters over at least 4 months. To investigate these subclusters’ regional spread, each sample was mapped to its Thuringian district based on the resident’s postal code from which it was isolated. We then sorted the samples according to their subclusters and visualized them throughout the subcluster’s observed period. The spread of two representative subclusters is exemplary visualized in Figure 2a, and all the subclusters are available via Figure 2—figure supplement 1 and their data in Figure 2—source data 1. Additionally, all subclusters and their metadata are also available via Microreact (see Methods ‘Subcluster identification’).

Figure 2 with 2 supplements see all

Download asset Open asset

Overview of the subclusters ‘S:S939F’ and ‘ORF1b:A520V’ in Thuringian districts.

(a) Accumulated number of sequenced samples for each subcluster per district and per month. (b) Combined visualization of each district’s ‘inbound mobility’ from other districts (color intensity) and the occurrence of a subcluster sample (red = sample found, blue = no sample found). The inbound mobility of each district (blue color intensity) is shown as a proportion of incoming mobility from other districts with or without an identified sample. The darker the blue color of a district, the higher the proportion of inbound mobility from other districts with an identified subcluster sample (red districts). The light blue color describes that most of the inbound mobility of a district comes from other districts without an identified subcluster sample (blue districts). Numbers refer to district types 1, 2, and 3, as further defined in the main text. The last month of subcluster ‘ORF1b:A520V’ is not visualized, as affected districts were unchanged.

Figure 2—source code 1 R-script to generate Figure 2, Figure 2—figure supplements 1 and 2.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig2-code1-v1.zip
Download elife-94045-fig2-code1-v1.zip
Figure 2—source data 1 Accumulated sample count for each Thuringian district per Alpha subcluster and month.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig2-data1-v1.xlsx
Download elife-94045-fig2-data1-v1.xlsx
Figure 2—source data 2 Total number of incoming trips and numbers of trips coming from all cluster-affiliated districts to each Thuringian district per Alpha subcluster and per month. The proportion of trips from cluster-affiliated districts among the total incoming trips is calculated.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig2-data2-v1.xlsx
Download elife-94045-fig2-data2-v1.xlsx

In Thuringia, the seven main mutation variant clusters were initially identified from distinct districts. Subclusters 6.1 and 7.1 subsequently emerged from the same districts as their parent clusters (6 and 7) after 12 and 13 days, respectively. The subclusters mainly spread regionally confined and not across all of Thuringia (see Figure 2a, Figure 2—figure supplement 1) but were also identified in other states of Germany (see ‘https://microreact.org/’-project). For example, the ‘S:S939F’ subcluster spread across 15 states, with the first samples being isolated outside of Thuringia. The eight Spike-mutation subclusters had expanded between 4 and 12 of the 23 Thuringian districts within the observable time period of each subcluster. They expanded by one to six districts per month, with a greater expansion accompanied mainly by a larger increase in the subcluster sample number. In contrast, the ORF1b-variant even comprised 21 districts and expanded between 2 and 7 districts per month (see Figure 2a). Most of each subcluster’s samples were identified in their region of first occurrence, and no additional samples were found after the given periods.

Several limitations need to be considered. The identified subclusters may have multiple origins or may not originate from Thuringia. Due to the lack of precise zip codes (publicly available German genomes are limited to postal codes of sending and sequencing laboratories), monitoring the subclusters in other states on a district level was impossible. Nevertheless, we could follow how the subclusters developed in Thuringia, even if multiple origins may have affected the overall speed or length of each subcluster’s occurrence.

Our surveillance sampling heavily relies on various institutions and partners, and only a portion of the provided samples can be sequenced (see ‘Sampling’ in Methods). For example, the spread of subcluster ‘S:S939F’ revealed two districts in April where no respective samples were found (Figure 2a) despite being surrounded by districts with ‘S:S939F’-samples present. This could be due to the lack of samples sent to sequencing from those regions or the low prevalence. We, therefore, investigated if mobile service data of residents, in addition to molecular surveillance, might be utilized to counteract this issue.

Mobile service data indicates Alpha subcluster spread and sampling bias

With the aim to predict the subcluster spread and, thereby, reduce surveillance-based sampling bias, we utilized anonymized mobile service data from T-Systems International GmbH. Around 200 million trips were used to determine the number of daily trips between the Thuringian districts. We then combined this information with our fine-granular genomic data to specify each district’s monthly proportion of inbound mobility from subcluster-affiliated districts (see Methods ‘Mobile service data’). The results are visualized in Figure 2b (complete overview in Figure 2—figure supplement 2; data provided in Figure 2—source data 2).

The mobile service data-based assumption of a subcluster’s spread aligned well with the subsequent regional coverage of fast-spreading, highly prevalent subclusters, such as subcluster 7, which covered 811 samples (see Figure 2). In contrast, the assumed spread for the low-prevalence subclusters did not correspond well with the actual occurrence. Yet, adding mobile service data resulted in three different ‘types’ of districts (see Figure 2b, annotated districts). Type 1 included districts with high inbound mobility from areas with an identified variant, where the variant was eventually found afterward, while Type 2 included districts with high inbound mobility from areas with an identified variant, where the variant was never identified. Type 3 included districts not directly connected to a district with an identified variant, but a variant was eventually identified while they border Type 2 district(s).

Our previous analysis of the subclusters’ spreading pattern across the districts, based solely on identified variants, indicated putative missed identifications in some districts due to the seemingly illogical spread to districts without a connection to others (Figure 2a, subcluster ‘S:S939F’). The inclusion of mobile service data revealed some of these districts to be Type 2 districts. This suggests that the specific variant should be identifiable within these districts due to the observed high incoming mobility from districts with identified variants. Type 2 districts were mainly observed for subclusters with low prevalence and, consequently, low sample counts, which are usually more difficult to monitor. For example, we assumed missing identifications in some districts of subclusters 1, 2, and 3, which, through the mobile service data, are now partially identified as Type 2 districts. However, due to their low prevalence, it is also possible that these subclusters have not spread to the indicated districts. Despite analyzing the mobile service data of districts from other federal states than Thuringia, we could not apply them, as the lack of precise location data for samples outside of Thuringia prevented the correct calculation of the incoming mobility. Based on the nine observable clusters, we concluded that mobile service data is a good prediction marker for the spread of high-prevalence variants but, more importantly, a good indication of districts that should have an identified low-prevalence variant. Next, we investigated if mobile service data can improve active surveillance via guiding sample collection for genomic sequencing.

Proof of principle: mobile service data-guided sampling for genomic surveillance for Omicron BQ.1.1

Based on our previous findings, we implemented the ‘mobility-guided’ sampling approach under real pandemic circumstances over 1 month in addition to our active surveillance.

As the subject of investigation, we searched for a newly emerging (based on global news reports) and ideally low prevalent SARS-CoV-2 lineage in Thuringia.

Among the various emerging Omicron sublineages during that time, sublineage BQ.1.1 fulfilled the defined criteria. First isolated in a northwestern Thuringian community with around 20,000 inhabitants on October 5, 2022, we identified this particular sublineage on October 14, 2022, among a routine batch of 42 samples. BQ.1.1 was a low-prevalence sublineage that was identified worldwide (https://outbreak.info/situation-reports?pango=BQ.1.1).

Following its first Thuringian identification, we utilized the latest available dataset of the past 2 years of mobile service data (October 2020 and June 2021) to investigate the residential movements for the community of first detection. Considering the highest incoming mobility from both datasets, we identified 18 communities with high (>10,000), 34 with medium (2001–10,000), and 82 with low (30–2000) number of incoming one-way trips from the originating community (purple triangles in Figure 3a). As a result, we specifically requested all the available samples from the eight communities with the highest incoming mobility. Still, we were restricted to the submission of third parties over whom we had no influence. This led to the inclusion of the following eight communities with the most residential movement from the originating community: four in central and three in NW of Thuringia, one in NW-neighboring state Saxony-Anhalt. The samples requested from central Thuringia were also due to their geographical arrangement as a ‘belt’ in central Thuringia, linking three major cities (see Figure 1—figure supplement 2). Subsequently, we collected 19 additional samples (isolated between October 17 and October 25, 2022; see ‘Guided Sampling’ for October 2022, Figure 3a) besides the randomized sampling strategy. Thus, the sampling depth was increased in communities with high incoming mobility from the first origin.

Figure 3

Download asset Open asset

Overview of the mobility-guided sampling of the Omicron sublineage BQ.1.1 in Thuringia (a) compared to the default randomized sampling (surveillance) in October (b) and November 2022 (c).

The randomized surveillance results in November 2022 (c) have been added to highlight the spreading progress of BQ.1.1. Dots reflect the location of each sample (based on residents’ zip codes). Orange dot: first identified BQ.1.1 sample; red dot: additionally identified BQ.1.1 sample; blue dot: another SARS-CoV-2 lineage. Purple triangles represent the number of one-way trips a community received from the community of the first BQ.1.1 occurrence (orange dot) based on mobility data from October 2020 and June 2021. The same mobility data from mobility-guided sampling (a) were added in grayscales to the randomized surveillance (**b and c**) as a visual reference only. Sampling dots are slightly scattered to improve visibility.

Figure 3—source code 1 R-script to generate Figure 3.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig3-code1-v1.zip
Download elife-94045-fig3-code1-v1.zip
Figure 3—source data 1 Overview of all Thuringian samples collected for the mobility-guided pilot experiment between October 5 and November 25, 2022.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig3-data1-v1.xlsx
Download elife-94045-fig3-data1-v1.xlsx
Figure 3—source data 2 Mobility data and sampling counts of communities sampled for the mobility-guided pilot experiment between October 5 and November 25, 2022. For each community, the incoming one-way trips from the community of BQ.1.1 origin in Thuringia based on the October 2020 and June 2021 datasets, the number of guided and randomized samples per community, and an indication of found BQ.1.1 samples are provided.: https://cdn.elifesciences.org/articles/94045/elife-94045-fig3-data2-v1.xlsx
Download elife-94045-fig3-data2-v1.xlsx

As part of the general Thuringian surveillance, we collected 132 samples for October (covering dates between the 5th and 31st) and 69 samples in November (covering dates between the 1st and 25th; see Figure 3b and c). Randomized sampling was not influenced or adjusted based on the mobility-guided sample collection. Thus, it also contains samples from communities with a mobility link toward the first occurrence of BQ.1.1, as they were part of the regular random collection (see gray triangles in Figure 3b). A complete overview of all samples is provided in Figure 3—source data 1. The mobility datasets from October 2020 and June 2021 for all sampled communities are provided in Figure 3—source data 2.

Among the 19 samples specifically collected based on mobile service data, we identified one additional sample of the specific Omicron sublineage BQ.1.1 in a community with high incoming mobility (n=14, number of trips = 37,499) with a distance of approximately 16 km between both towns. Our randomly sampled routine surveillance strategy did not detect another sample during the same period. This was despite a seven times higher overall sample rate, which included 31 samples from communities with an identified incoming mobility from the community of the first occurrence (October 2022, Figure 3b). Only in the 1-month follow-up were four other samples identified across Thuringia through routine surveillance (November 2022, Figure 3c).

During our attempt to implement the mobility-guided sampling approach in real-time during the pandemic, we encountered three distinct limitations, some of which are commonly observed in surveillance practices. The guided sampling depended on the individual sample submitting institutions, affecting the availability of suitable samples, especially for the communities of interest. By choosing a newly emerging Omicron sublineage for our experiment, spread and, therefore, suitability were uncertain. In our case, BQ.1.1’s prevalence in Thuringia was even lower than expected, and it also remained rare in subsequent months, with only 42 samples found until June 2023, 8 months after the first occurrence in Thuringia. Due to the short preparation time, only mobile service data from the past 2 years and no current data were available. Nevertheless, the available datasets still reflect pandemic movement behavior since the pandemic has been ongoing for 2 years.

In summary, increasing the sampling depth in the suspected regions successfully identified the specified lineage using only a fraction of the samples from the randomized sampling. Conversely, randomized surveillance, the ‘gold standard’ acting as our negative control, did not identify additional samples with similar sampling depths in regions with no or low incoming mobility or even in high mobility regions with less sampling depth. Implementing such an approach effectively under pandemic conditions poses difficult challenges due to the fluctuating sampling sizes. Although the finding of the sample may have been coincidental, our proof-of-concept demonstrated how we can leverage the potential of mobile service data for targeted surveillance sampling.

Conclusion

During the SARS-CoV-2 pandemic, diverse data sources like travel, wastewater, and mobility data have been employed in surveillance and transmission tracking (Alpert et al., 2021; O’Toole et al., 2021; Li et al., 2022; Kraemer et al., 2021; Chang et al., 2021; Schlosser et al., 2020). In the present study, we analyzed over 296,800 German SARS-CoV-2 genomes to examine whether mobile service data can predict the spatial distribution of the Alpha lineage in the German state of Thuringia and how they potentially benefit pandemic surveillance.

A plausible explanation for the delayed spread of the Alpha lineage in Thuringia is the lack of major transport hubs, as Alpha first occurred in federal states with such hubs. Previous studies have already highlighted the impact of major transportation hubs in the spread of Sars-CoV-2 (Alpert et al., 2021; Tegally et al., 2023). However, its impact on the total distribution is limited, and the spread was ultimately comparable between Germany and Thuringia. While our findings on mobile service data may also apply to Germany, we could not verify this because the limited location data of publicly available German genomes prevented in-depth investigations outside of Thuringia. Thus, precise sampling location data are crucial to utilize mobile service data in genomic surveillance, but privacy regulations may restrict access to this data. Shortly after Alpha’s emergence, mutation variants formed like the known sublineages Q.1 to Q.8 and the Thuringian subclusters identified by us. This reflects the ongoing evolution during active circulation and indicates an even greater sublineage diversity, which has not been surveyed as closely as in the subsequent Delta and Omicron waves. By monitoring the nine Thuringian subclusters, rather than focusing solely on the parental lineage B.1.1.7, we were again able to effectively track transmissions and gain a comprehensive understanding of the regional spread. So, it underscores the importance of sequencing in pandemic surveillance to explore such genomic changes and, thereby, keep track of the transmission chains and potential outbreaks.

Mobile service data can support such surveillance in different ways. Previous studies examined the capabilities of mobility data in the context of, e.g., case trajectories, but retrospectively applied to already collected data, it can be used to examine surveillance sampling coverage and possible sampling bias. We exemplified this approach with the Alpha lineage, where mobile service data indicated a putative sampling bias and partially predicted the spread of our Thuringian subclusters.

Another approach is actively guiding the sampling process through mobile service data, which we demonstrated with our proof of principle focusing on the Omicron-lineage BQ.1.1 as a real-life example. This approach could allow for a flexible allocation of surveillance resources, enabling adaptation to specific circumstances and increasing sampling depth in regions where a variant is anticipated. By incorporating guided sampling, much fewer resources may be needed for unguided or random sampling, thereby reducing overall surveillance costs.

Additionally, while this approach is particularly useful for identifying low-prevalence variants, it is not limited to such variants. Still, it can provide a guided, more cost-efficient, low-sampling alternative to general randomized surveillance that can also be applied to other viruses or lineages. For this purpose, pre-generated mobility networks automatically tailored to each state’s unique infrastructure and population dynamics could provide better-targeted sampling guidance rather than simple geographical proximity. However, the feasibility depends on the availability and cost of such mobile service data. Alternatively, financial resources could also be invested directly in increasing sampling capacity and coverage, which ultimately depends on individual factors of the respective surveillance. Mobile service data can also be used with other surveillance approaches and elements. For example, wastewater surveillance can give further indications to supplement guided sampling. At the same time, passenger data offers additional insights into traffic hubs as sources of regional movement.

Methods

Sampling

Starting mid-2020, we initially sequenced hospital-intern samples, transitioning by January 2021 to approximately 43 PCR-positive samples per week: 20 from the hospital’s microbiology department and 23 randomly sourced by the Thuringian State Authority for Consumer Protection (‘Thüringer Landesamt für Verbraucherschutz’ [TLV]).

Until June 2023, our institute sequenced 3770 SARS-CoV-2 samples, and SYNLAB Holding Deutschland GmbH, Bioscientia Healthcare GmbH, and DIANOVIS GmbH provided additional 7800 Thuringian SARS-CoV-2 genomes and their metadata.

Sample preparation and sequencing

RNA isolation used the ZymoResearch ‘Quick-RNA Viral Kit’ (Zymo Research Europe GmbH, Germany, Product-ID: R1035), according to the manufacturer’s instructions with 100 µl patient sample input and a centrifuge speed of 16,000×g.

The viral RNA underwent a reverse transcriptase (RT)-PCR followed by a multiplex-PCR using the ARTIC V1200 primer set, according to Freed and Silander’s SARS-CoV-2 sequencing protocol (version 4, updating to version 5 by March 2021) (Freed and Silander, 2021). Subsequent DNA quantification utilized the Qubit dsDNA HS assay (Invitrogen, USA).

From the amplified DNA, a sequencing library was prepared using the Nanopore SQK-LSK109 and SQK-RBK004 kits (Oxford Nanopore Technologies, Oxford, UK), sequenced for a maximum of 72 hr utilizing an Oxford Nanopore MinION Mk1b sequencer with R.9-flowcells and the MinKNOW software (versions MKE_1013_v1_revBC_11Apr2016 to MKE_1013_v1_revBR_11Apr2016 in the respective period), and analyzed with the software pipeline poreCov (versions 0.3.5–0.11.7; including basecalling, demultiplexing, adapter removal, quality filtering, and genome alignment) to reconstruct consensus genomes (Brandt et al., 2021).

Sequencing data and the respective metadata (e.g. isolation date, sending laboratory details) were submitted to the RKI through DESH. We also collected the postal code of the isolation location or at least of the sending local health authority, storing all data additionally in a local database (Jundzill et al., 2023). Due to data protection, such data is limited on the RKI’s public GitHub repository (https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland; Robert Koch-Institut, 2025), providing instead postal codes of the sequencing and sending laboratories.

Alpha spread datasets

From our local database, we extracted 8397 samples with isolation dates before October 1, 2021. After adding federal state and district information, 993 entries with non-Thuringian locations were excluded. Further, 10 entries with unspecific isolation dates were excluded, yielding 7394 samples (including 6307 Alpha genomes [lineages B.1.1.7 and Q.1 to Q.8]).

The publicly available RKI SARS-CoV-2 dataset was downloaded, containing 1,091,655 genomes with the respective metadata (October 17, 2022; Zenodo-version October 16, 2022) (Koch-Institut, 2022). 789,405 entries, isolated after September 2021, and 59 entries without ‘sending laboratory’ information were removed. For the resulting 302,191 entries, location information (location, federal state, district, longitude, latitude) were added based on the sending laboratory postal code. Five entries with a non-existing postal code and all 12,704 Thuringian samples were removed from the dataset, resulting in 289,487 samples (including 137,024 Alpha genomes). We investigated only Alpha lineage samples collected from September 2020 onward, after the first official reports of the Alpha lineage (Washington et al., 2021).

Analyzing both datasets, we calculated the monthly proportion of Alpha lineage samples in Thuringia and Germany per state/district, dividing December 20 and January 2021 into first and second halves.

Subcluster identification

Using a total of 70,429 German and Thuringian Alpha genomes, a phylogenetic time tree was created (see Supplementary file 2, Supplementary file 3, Supplementary file 4, Supplementary file 5, Supplementary file 6, Supplementary file 7, Supplementary file 8 and Figure 1—figure supplement 3). We determined the frequency of all non-Alpha-specific mutations among the 6522 Thuringian Alpha genomes. We then manually screened for mutations present in at least 20 genomes with a small phylogenetic distance and a time occurrence of at least 2 months. This led to nine mutations, each of them creating a defined cluster covering between 12 and 811 closely related genomes. We only kept mutation information of these nine subclusters in the respective metadata, which, together with the tree file of the phylogenetic time tree, was uploaded to a ‘https://microreact.org/’-project, provided as Supplementary file 1 and found under the following link: here.

Mobile service data

T-Systems International GmbH collected and aggregated mobile service data via the Cell ID method, dividing a geographical area into the so-called traffic cells. Each cell is assigned to exactly one transmitter mast, with a spatial resolution from 500 m × 500 m up to 8 km × 8 km (depending on the transmitter mast network density). Cell phones always register to the closest traffic cell, which is recorded and stored in an origin-destination matrix (ODM). For population representation, the data was extrapolated with Deutsche Telekom’s market share. Due to data privacy, the registration data is combined into movement streams between traffic cells, the status resolution is reduced to 1 hr (greater time intervals = less resolution), and individual traffic cells are grouped into districts. The degree of anonymization (k-value=30, data privacy regulation) removed movement streams with less than 30 participants, resulting in approximately 200 Mio trips in the ODM. SMA Development GmbH analyzed all movements between the single Thuringian districts, adding each Alpha sample’s isolation time and location data (per subcluster). The movements were further divided by months and originating district (subcluster-affiliated vs. -unaffiliated), determining each district’s monthly inbound mobility proportion from cluster-affiliated districts.

Research in context

Evidence before this study

We searched PubMed for studies about the use of mobile service data for surveillance written in English. For the broadest possible search, we included any publication covering mobile data and surveillance aspects, using the following search string: (‘cellular data’ OR ‘cell phone data’ OR ‘mobility data’ OR ‘movement data’ OR ‘migration data’ OR ‘phone data’) AND (‘Surveillance’ OR ‘Monitoring’ OR ‘Survey’ OR ‘Pandemic’ OR ‘Disease’ OR ‘Epidemic’ OR ‘Outbreak’). Our search yielded 1285 publications published between 1966 and 2023. We manually screened all these publications but found no study that applied mobile service data for active, targeted surveillance. Across all studies, the general focus was on tracking contacts or analyzing movements to assess, for instance, the efficiency of non-pharmaceutical interventions or generate prediction models. Some studies suggested targeted surveillance based on their results, but it was not yet applied. Additionally, we used ‘suite.ai’ and ‘chatGPT’ (with BING-search access) to let them search for ‘studies that utilize mobile service data to guide the sampling process for infectious disease surveillance’. While ‘suite.ai’ found two studies and ‘chatGPT’ found another ten studies and reviews, none covered the direct application of the mobility data in active surveillance.

Added value of this study

This study highlights the value of combining mobile service data with fine-granular metadata for integrated genomic surveillance during the SARS-CoV-2 pandemic in a German federal state. We illustrated this strategy with the Omicron sublineage BQ.1.1 and how to guide the sampling processes toward areas where the new variant was expected to emerge. Additionally, we used mobile service data during the pandemic to assess our sampling coverage. Our study is the first to actively guide part of the genomic surveillance process during a pandemic.

Implications of all the available evidence

Efficient molecular surveillance setups are crucial in managing outbreaks from the local to the global scale. Different data sources are investigated to increase this efficiency, addressing factors like the more efficient usage of scarce surveillance resources and the prediction of spread. Extending molecular surveillance with such data should improve the future management of pandemics and outbreaks.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the author(s) used Grammarly in order to correct general English and improve readability. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Data availability

All genomic data (genomes and respective metadata), supportive data, and the R-scripts used to generate all figures are provided as supplementary files. Additionally, they are available at https://doi.org/10.17605/OSF.IO/N5QJ6. These include the mobile service data used in this study, which is available in processed and anonymized form. The original mobile service data can not be made public due to legal reasons/ownership. The genomic data is further available in the provided microreact project (project file available as Supplementary File 1 and under https://doi.org/10.17605/OSF.IO/N5QJ6; https://microreact.org/project/ftR2GfjF6iXtSwbmN4ARTx-thuringianalpha-linclusters#76ir-complete-overview).

The following data sets were generated

1. Spott R
(2024) Open Science Framework
Exploring the Spatial Distribution of Persistent SARS-CoV-2 Mutations - Supplementary files.

https://doi.org/10.17605/OSF.IO/N5QJ6

The following previously published data sets were used

1. Koch-Institut R
(2022) Zenodo
SARS-CoV-2 Sequenzdaten aus Deutschland.

https://doi.org/10.5281/zenodo.7212725

References

(2021) Nextclade: clade assignment, mutation calling and quality control for viral genomes
Journal of Open Source Software 6:3773.

https://doi.org/10.21105/joss.03773
- Google Scholar
1. Alpert T
2. Brito AF
3. Lasek-Nesselquist E
4. Rothman J
5. Valesano AL
6. MacKay MJ
7. Petrone ME
8. Breban MI
9. Watkins AE
10. Vogels CBF
11. Kalinich CC
12. Dellicour S
13. Russell A
14. Kelly JP
15. Shudt M
16. Plitnick J
17. Schneider E
18. Fitzsimmons WJ
19. Khullar G
20. Metti J
21. Dudley JT
22. Nash M
23. Beaubier N
24. Wang J
25. Liu C
26. Hui P
27. Muyombwe A
28. Downing R
29. Razeq J
30. Bart SM
31. Grills A
32. Morrison SM
33. Murphy S
34. Neal C
35. Laszlo E
36. Rennert H
37. Cushing M
38. Westblade L
39. Velu P
40. Craney A
41. Cong L
42. Peaper DR
43. Landry ML
44. Cook PW
45. Fauver JR
46. Mason CE
47. Lauring AS
48. St George K
49. MacCannell DR
50. Grubaugh ND
(2021) Early introductions and transmission of SARS-CoV-2 variant B.1.1.7 in the United States
Cell 184:2595–2604.

https://doi.org/10.1016/j.cell.2021.03.061
- PubMed
- Google Scholar
1. Asif A
2. Ilyas I
3. Abdullah M
4. Sarfraz S
5. Mustafa M
6. Mahmood A
(2022) The comparison of mutational progression in SARS-CoV-2: a short updated overview
Journal of Molecular Pathology 3:201–218.

https://doi.org/10.3390/jmp3040018
- Google Scholar
1. Brandt C
2. Krautwurst S
3. Spott R
4. Lohde M
5. Jundzill M
6. Marquet M
7. Hölzer M
(2021) poreCov-an easy to use, fast, and robust workflow for SARS-CoV-2 genome reconstruction via nanopore sequencing
Frontiers in Genetics 12:711437.

https://doi.org/10.3389/fgene.2021.711437
- PubMed
- Google Scholar
1. Chang S
2. Pierson E
3. Koh PW
4. Gerardin J
5. Redbird B
6. Grusky D
7. Leskovec J
(2021) Mobility network models of COVID-19 explain inequities and inform reopening
Nature 589:82–87.

https://doi.org/10.1038/s41586-020-2923-3
- PubMed
- Google Scholar
Website
1. Corona-Surveillanceverordnung
(2022) German ‘Corona-Surveillanceverordnung’ as issued on the 18th of January 2021 and updated on the 27th of June 2022
Accessed June 27, 2022.

https://www.bundesgesundheitsministerium.de/fileadmin/Dateien/3_Downloads/C/Coronavirus/Verordnungen/CorSurV_BAnz_AT_19.01.2021_V2.pdf
1. COVID-19 Genomics UK
(2020) An integrated national scale SARS-CoV-2 genomic surveillance network
The Lancet 1:e99–e100.

https://doi.org/10.1016/S2666-5247(20)30054-9
- PubMed
- Google Scholar
1. Donzelli S
2. Spinella F
3. di Domenico EG
4. Pontone M
5. Cavallo I
6. Orlandi G
7. Iannazzo S
8. Ricciuto GM
9. Pellini R
10. Muti P
11. Strano S
12. Ciliberto G
13. Ensoli F
14. Zapperi S
15. La Porta CAM
16. Blandino G
17. Morrone A
18. Pimpinelli F
(2022) Evidence of a SARS-CoV-2 double Spike mutation D614G/S939F potentially affecting immune response of infected subjects
Computational and Structural Biotechnology Journal 20:733–744.

https://doi.org/10.1016/j.csbj.2022.01.021
- PubMed
- Google Scholar
1. Du Y
2. Shi R
3. Zhang Y
4. Duan X
5. Li L
6. Zhang J
7. Wang F
8. Zhang R
9. Shen H
10. Wang Y
11. Wu Z
12. Peng Q
13. Pan T
14. Sun W
15. Huang W
16. Feng Y
17. Feng H
18. Xiao J
19. Tan W
20. Wang Y
21. Wang C
22. Yan J
(2021) A broadly neutralizing humanized ACE2-targeting antibody against SARS-CoV-2 variants
Nature Communications 12:5000.

https://doi.org/10.1038/s41467-021-25331-x
- Google Scholar
1. Freed N
2. Silander O
(2021) SARS-CoV2 genome sequencing protocol (1200bp amplicon “midnight” primer set, using nanopore rapid kit) V5
Protocols 1:btsrnnd6.

https://doi.org/10.17504/protocols.io.btsrnnd6
- Google Scholar
1. Jundzill M
2. Spott R
3. Lohde M
4. Hölzer M
5. Viehweger A
6. Brandt C
(2023) Managing and monitoring a pandemic: showcasing a practical approach for the genomic surveillance of SARS-CoV-2
Database 2023:baad071.

https://doi.org/10.1093/database/baad071
- PubMed
- Google Scholar
1. Kimura I
2. Yamasoba D
3. Nasser H
4. Zahradnik J
5. Kosugi Y
6. Wu J
7. Nagata K
8. Uriu K
9. Tanaka YL
10. Ito J
11. Shimizu R
12. Tan TS
13. Butlertanaka EP
14. Asakura H
15. Sadamasu K
16. Yoshimura K
17. Ueno T
18. Takaori-Kondo A
19. Schreiber G
20. Genotype to Phenotype Japan (G2P-Japan) Consortium
21. Toyoda M
22. Shirakawa K
23. Irie T
24. Saito A
25. Nakagawa S
26. Ikeda T
27. Sato K
(2022) The SARS-CoV-2 spike S375F mutation characterizes the Omicron BA.1 variant
iScience 25:105720.

https://doi.org/10.1016/j.isci.2022.105720
- PubMed
- Google Scholar
Software
1. Koch-Institut R
(2022) SARS-cov-2 sequenzdaten aus deutschland
Zenodo.

https://doi.org/10.5281/zenodo.7844056
1. Kraemer MUG
2. Hill V
3. Ruis C
4. Dellicour S
5. Bajaj S
6. McCrone JT
7. Baele G
8. Parag KV
9. Battle AL
10. Gutierrez B
11. Jackson B
12. Colquhoun R
13. O’Toole Á
14. Klein B
15. Vespignani A
16. COVID-19 Genomics UK (COG-UK) Consortium
17. Volz E
18. Faria NR
19. Aanensen DM
20. Loman NJ
21. du Plessis L
22. Cauchemez S
23. Rambaut A
24. Scarpino SV
25. Pybus OG
(2021) Spatiotemporal invasion dynamics of SARS-CoV-2 lineage B.1.1.7 emergence
Science 373:889–895.

https://doi.org/10.1126/science.abj0113
- PubMed
- Google Scholar
(2021) Genomic surveillance for SARS-CoV-2 variants: predominance of the delta (B.1.617.2) and omicron (B.1.1.529) variants — United States, June 2021–January 2022
MMWR. Morbidity and Mortality Weekly Report 71:206–211.

https://doi.org/10.15585/mmwr.mm7106a4
- Google Scholar
1. Lataretu M
2. Drechsel O
3. Kmiecinski R
4. Trappe K
5. Hölzer M
6. Fuchs S
(2024) Lessons learned: overcoming common challenges in reconstructing the SARS-CoV-2 genome from short-read sequencing data via CoVpipe2
F1000Research 12:1091.

https://doi.org/10.12688/f1000research.136683.1
- Google Scholar
1. Li J
2. Ahmed W
3. Metcalfe S
4. Smith WJM
5. Tscharke B
6. Lynch P
7. Sherman P
8. Vo PHN
9. Kaserzon SL
10. Simpson SL
11. McCarthy DT
12. Thomas KV
13. Mueller JF
14. Thai P
(2022) Monitoring of SARS-CoV-2 in sewersheds with low COVID-19 cases using a passive sampling technique
Water Research 218:118481.

https://doi.org/10.1016/j.watres.2022.118481
- Google Scholar
1. Liang R
2. Ye Z-W
3. Ong CP
4. Qin Z
5. Xie Y
6. Fan Y
7. Tang K
8. Poon VK-M
9. Chan CC-S
10. Yang X
11. Cao H
12. Wang K
13. Sun H
14. Hu B
15. Cai J-P
16. Luo C
17. Chik KK-H
18. Chu H
19. Zheng Y
20. Yuen K-Y
21. Chan JF-W
22. Jin D-Y
23. Yuan S
(2022) The spike receptor-binding motif G496S substitution determines the replication fitness of SARS-CoV-2 Omicron sublineage
Emerging Microbes & Infections 11:2093–2101.

https://doi.org/10.1080/22221751.2022.2111977
- PubMed
- Google Scholar
(2022) A comparison of Germany and the United Kingdom indicates that more SARS-CoV-2 circulation and less restrictions in the warm season might reduce overall COVID-19 burden
Life 12:953.

https://doi.org/10.3390/life12070953
- PubMed
- Google Scholar
1. Meredith LW
2. Hamilton WL
3. Warne B
4. Houldcroft CJ
5. Hosmillo M
6. Jahun AS
7. Curran MD
8. Parmar S
9. Caller LG
10. Caddy SL
11. Khokhar FA
12. Yakovleva A
13. Hall G
14. Feltwell T
15. Forrest S
16. Sridhar S
17. Weekes MP
18. Baker S
19. Brown N
20. Moore E
21. Popay A
22. Roddick I
23. Reacher M
24. Gouliouris T
25. Peacock SJ
26. Dougan G
27. Török ME
28. Goodfellow I
(2020) Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study
The Lancet. Infectious Diseases 20:1263–1272.

https://doi.org/10.1016/S1473-3099(20)30562-4
- PubMed
- Google Scholar
(2022) Introduction and transmission of SARS-CoV-2 lineage B.1.1.7, Alpha variant, in Denmark
Genome Medicine 14:47.

https://doi.org/10.1186/s13073-022-01045-7
- PubMed
- Google Scholar
(2022) Advancing precision vaccinology by molecular and genomic surveillance of severe acute respiratory syndrome coronavirus 2 in germany, 2021
Clinical Infectious Diseases 75:S110–S120.

https://doi.org/10.1093/cid/ciac399
- PubMed
- Google Scholar
(2021) Tracking the international spread of SARS-CoV-2 lineages B.1.1.7 and B.1.351/501Y-V2 with grinch
Wellcome Open Research 6:121.

https://doi.org/10.12688/wellcomeopenres.16661.2
- PubMed
- Google Scholar
1. Ozono S
2. Zhang Y
3. Ode H
4. Sano K
5. Tan TS
6. Imai K
7. Miyoshi K
8. Kishigami S
9. Ueno T
10. Iwatani Y
11. Suzuki T
12. Tokunaga K
(2021) SARS-CoV-2 D614G spike mutation increases entry efficiency with enhanced ACE2-binding affinity
Nature Communications 12:848.

https://doi.org/10.1038/s41467-021-21118-2
- PubMed
- Google Scholar
1. Page AJ
2. Mather AE
3. Le-Viet T
4. Meader EJ
5. Alikhan N-F
6. Kay GL
7. de Oliveira Martins L
8. Aydin A
9. Baker DJ
10. Trotter AJ
11. Rudder S
12. Tedim AP
13. Kolyva A
14. Stanley R
15. Yasir M
16. Diaz M
17. Potter W
18. Stuart C
19. Meadows L
20. Bell A
21. Gutierrez AV
22. Thomson NM
23. Adriaenssens EM
24. Swingler T
25. Gilroy RAJ
26. Griffith L
27. Sethi DK
28. Aggarwal D
29. Brown CS
30. Davidson RK
31. Kingsley RA
32. Bedford L
33. Coupland LJ
34. Charles IG
35. Elumogo N
36. Wain J
37. Prakash R
38. Webber MA
39. Smith SJL
40. Chand M
41. Dervisevic S
42. O’Grady J
43. The COVID-19 Genomics UK (COG-UK) Consortium
(2021) Large-scale sequencing of SARS-CoV-2 genomes from one region allows detailed epidemiology and enables local outbreak management
Microbial Genomics 7:589.

https://doi.org/10.1099/mgen.0.000589
- Google Scholar
Software
1. PANGO
(2023) PANGO SARS-cov-2 lineage overview, version 2614dc1
Github.

https://github.com/cov-lineages/lineages-website/blob/master/_data/lineage_data.full.json
(2023) Introduction of mandatory masking in health care and community: experience from Jena, Germany
Infection 51:1587–1590.

https://doi.org/10.1007/s15010-023-02015-w
- PubMed
- Google Scholar
1. Rambaut A
2. Holmes EC
3. O’Toole Á
4. Hill V
5. McCrone JT
6. Ruis C
7. du Plessis L
8. Pybus OG
(2020) A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Nature Microbiology 5:1403–1407.

https://doi.org/10.1038/s41564-020-0770-5
- Google Scholar
Software
1. Robert Koch-Institut
(2025) SARS-cov-2-sequenzdaten_aus_deutschland, version b728725
GitHub.

https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland
(2020) COVID-19 lockdown induces disease-mitigating structural changes in mobility networks
PNAS 117:32883–32890.

https://doi.org/10.1073/pnas.2012326117
- PubMed
- Google Scholar
1. Tegally H
2. Wilkinson E
3. Tsui JL-H
4. Moir M
5. Martin D
6. Brito AF
7. Giovanetti M
8. Khan K
9. Huber C
10. Bogoch II
11. San JE
12. Poongavanan J
13. Xavier JS
14. Candido D
15. Romero F
16. Baxter C
17. Pybus OG
18. Lessells RJ
19. Faria NR
20. Kraemer MUG
21. de Oliveira T
(2023) Dispersal patterns and influence of air travel during the global expansion of SARS-CoV-2 variants of concern
Cell 186:3277–3290.

https://doi.org/10.1016/j.cell.2023.06.001
- PubMed
- Google Scholar
1. Walker AS
2. Vihta K-D
3. Gethings O
4. Pritchard E
5. Jones J
6. House T
7. Bell I
8. Bell JI
9. Newton JN
10. Farrar J
11. Diamond I
12. Studley R
13. Rourke E
14. Hay J
15. Hopkins S
16. Crook D
17. Peto T
18. Matthews PC
19. Eyre DW
20. Stoesser N
21. Pouwels KB
22. Covid-19 Infection Survey Team
(2021) Tracking the emergence of SARS-CoV-2 alpha variant in the United Kingdom
The New England Journal of Medicine 385:2582–2585.

https://doi.org/10.1056/NEJMc2103227
- PubMed
- Google Scholar
1. Washington NL
2. Gangavarapu K
3. Zeller M
4. Bolze A
5. Cirulli ET
6. Schiabor Barrett KM
7. Larsen BB
8. Anderson C
9. White S
10. Cassens T
11. Jacobs S
12. Levan G
13. Nguyen J
14. Ramirez JM
15. Rivera-Garcia C
16. Sandoval E
17. Wang X
18. Wong D
19. Spencer E
20. Robles-Sikisaka R
21. Kurzban E
22. Hughes LD
23. Deng X
24. Wang C
25. Servellita V
26. Valentine H
27. De Hoff P
28. Seaver P
29. Sathe S
30. Gietzen K
31. Sickler B
32. Antico J
33. Hoon K
34. Liu J
35. Harding A
36. Bakhtar O
37. Basler T
38. Austin B
39. MacCannell D
40. Isaksson M
41. Febbo PG
42. Becker D
43. Laurent M
44. McDonald E
45. Yeo GW
46. Knight R
47. Laurent LC
48. de Feo E
49. Worobey M
50. Chiu CY
51. Suchard MA
52. Lu JT
53. Lee W
54. Andersen KG
(2021) Emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States
Cell 184:2587–2594.

https://doi.org/10.1016/j.cell.2021.03.052
- PubMed
- Google Scholar
1. Zhu N
2. Zhang D
3. Wang W
4. Li X
5. Yang B
6. Song J
7. Zhao X
8. Huang B
9. Shi W
10. Lu R
11. Niu P
12. Zhan F
13. Ma X
14. Wang D
15. Xu W
16. Wu G
17. Gao GF
18. Tan W
19. China Novel Coronavirus Investigating and Research Team
(2020) A novel coronavirus from patients with pneumonia in China, 2019
The New England Journal of Medicine 382:727–733.

https://doi.org/10.1056/NEJMoa2001017
- PubMed
- Google Scholar

Article and author information

Author details

Riccardo Spott

Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany

Contribution
Writing – original draft, Writing – review and editing, Sample collection and preparation, sequencing, bioinformatic analysis, literature research

For correspondence
riccardo.spott@gmail.com

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-2103-167X
Mathias W Pletz
1. Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
2. Center for Sepsis Control and Care, Jena University Hospital/Friedrich Schiller University Jena, Jena, Germany
Contribution
Funding acquisition, Writing – review and editing

Competing interests
No competing interests declared
Carolin Fleischmann-Struzek
1. Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
2. Center for Sepsis Control and Care, Jena University Hospital/Friedrich Schiller University Jena, Jena, Germany
Contribution
Writing – review and editing, Funding acquisition

Competing interests
No competing interests declared
Aurelia Kimmig

Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Christiane Hadlich

SMA Development GmbH - epicinsights Agentur für Künstliche Intelligenz und Big Data Analytics, Jena, Germany

Contribution
Writing – review and editing, Mobile service data analysis

Competing interests
No competing interests declared
Matthias Hauert

SMA Development GmbH - epicinsights Agentur für Künstliche Intelligenz und Big Data Analytics, Jena, Germany

Contribution
Writing – review and editing, Mobile service data analysis

Competing interests
No competing interests declared
Mara Lohde

Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany

Contribution
Sample collection and preparation, sequencing, bioinformatic analysis

Competing interests
No competing interests declared
Mateusz Jundzill

Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany

Contribution
Database setup and maintenance

Competing interests
No competing interests declared
Mike Marquet

Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany

Contribution
Writing – review and editing, Sample collection and preparation, sequencing, bioinformatic analysis

Competing interests
No competing interests declared
Petra Dickmann

Department of Anaesthesiology and Intensive Care, Jena University Hospital, Jena, Germany

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Ruben Schüchner

Thuringian State Authority for Consumer Protection, Department Health Protection, Bad Langensalza, Germany

Contribution
Writing – review and editing, Sample collection and preparation

Competing interests
No competing interests declared
Martin Hölzer

Methodology and Research Infrastructure, Genome Competence Center (MF1), Robert Koch Institute, Berlin, Germany

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Denise Kühnert
1. Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Berlin, Germany
2. Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for Geoanthropology, Jena, Germany
Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-5657-018X
Christian Brandt
1. Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
2. Center for Applied Research, InfectoGnostics Research Campus Jena, Jena, Germany
Contribution
Software, Supervision, Project administration, Writing – review and editing, Bioinformatic analysis

Competing interests
No competing interests declared

Funding

Bundesministerium für Bildung und Forschung (13N15745)

Mathias W Pletz

Thüringer Aufbaubank (2021 VF 0035)

Carolin Fleischmann-Struzek

German Research Foundation (512648189)

Riccardo Spott

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Funding: This work was supported by grants from the Federal Ministry of Education and Research (project 'SARS-CoV-2Dx') (grant number 13N15745) and the Thüringer Aufbaubank (project 'Pandemie Analyse mittels Advanced Analytics Methoden') (grant number 2021 VF 0035). We acknowledge support by the German Research Foundation Projekt-Nr. 512648189 and the Open Access Publication Fund of the Thueringer Universitaets- und Landesbibliothek Jena.

Ethics

SARS-CoV-2 samples were collected and sequenced according to the German Corona-Surveillance Act issued by the German Health Ministry ("Verordnung zur molekulargenetischen Surveillance des Coronavirus SARS-CoV-2 (Coronavirus-Surveillanceverordnung - CorSurV) Vom 18. Januar 2021", BAnz AT 19.01.2021 V2, published 19th of Jan 2021). Ethics committee/IRB of the University Hospital Jena gave ethical approval for this work (No. UKJ_2018-1263_2-BO).Sampling was performed by medical institutions during routine clinical and epidemiological surveillance and samples were anonymized prior to analysis. Informed consent and consent to publication were not required in accordance with German health research regulations.

Version history

Sent for peer review: January 10, 2024
Preprint posted: April 15, 2024
Reviewed Preprint version 1: April 26, 2024
Reviewed Preprint version 2: November 29, 2024
Version of Record published: January 15, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.94045. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.