Exploring the Spatial Distribution of Persistent SARS-CoV-2 Mutations - Leveraging mobility data for targeted sampling

  1. Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
  2. Center for Sepsis Control and Care, Jena University Hospital/Friedrich Schiller University Jena, Jena, Germany
  3. SMA Development GmbH - epicinsights Agentur für Künstliche Intelligenz und Big Data Analytics, Jena, Germany
  4. Department of Anaesthesiology and Intensive Care, Jena University Hospital, Jena, Germany
  5. Thuringian State Authority for Consumer Protection, Department Health Protection
  6. Methodology and Research Infrastructure, Genome Competence Center (MF1), Robert Koch Institute, Berlin, Germany
  7. Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Berlin, Germany
  8. Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for Geoanthropology, 07745 Jena, Germany

Editors

  • Reviewing Editor
    Amy Wesolowski
    Johns Hopkins Bloomberg School of Public Health, Baltimore, United States of America
  • Senior Editor
    Jos van der Meer
    Radboud University Medical Centre, Nijmegen, Netherlands

Reviewer #1 (Public Review):

Summary:

In "Exploring the Spatial Distribution of Persistent SARS-CoV-2 Mutations - Leveraging mobility data for targeted sampling", Spott et al. combine SARS-CoV-2 genomic data with granular mobility data to retrospectively evaluate the spread of SARS-CoV-2 Alpha lineages throughout Germany and specifically Thuringia. They further prospectively identified districts with strong mobility links to the first district in which BQ.1.1 was observed, in order to direct additional surveillance efforts to these districts. The additional surveillance effort resulted in the earlier identification of BQ.1.1 in districts with strong links to the district in which BQ.1.1 was first observed.

Strengths:

There are two important strengths of this work. The first is the scale and detail of the data that have been generated and analyzed as part of this study. Specifically, the authors use 6,500 SARS-CoV-2 sequences and district-level mobility data within Thuringia. I applaud the authors for making a subset of their analyses public, e.g., on the associated Microreact page.

The second is the article's main focus: the potential utility of mobility-directed surveillance sequencing. While I may certainly be mistaken, I have not seen this proposed elsewhere, at least in the context of SARS-CoV-2. The authors were further able to test this concept in a real-world setting during the emergence of BQ.1.1. This is a unique real-world evaluation of a novel surveillance sequencing strategy and there is considerable value in publishing this analysis.

Weaknesses:

The article is quite strong and I find the analyses to generally be rigorous. However, there are places where I believe the text should be modified to slightly weaken the conclusions drawn from the presented analyses. Specific examples include:

- It seems the mobility-guided increased surveillance included only districts with significant mobility links to the origin district and did not include any "control" districts (those without strong mobility links). As such, you can only conclude that increasing sampling depth increased the rate of detection for BQ.1.1, not necessarily that doing so in a mobility-guided fashion provided an additional benefit. I absolutely understand the challenges of doing this in a real-world setting and think that the work remains valuable even with this limitation, but I would like the lack of control districts to be more explicitly discussed.

- Line 313: While this work has reliably shown that the spread of Alpha was slower in Thuringia, I don't think there have been sufficient analyses to conclude that this is due to the lack of transportation hubs. My understanding is that only mobility within Thuringia has been evaluated here and not between Thuringia and other parts of Germany.

- Line 333 (and elsewhere): I'm not convinced, based on the results presented in Figure 2, that the authors have reliably identified a sampling bias here. This is only true if you assume (as in line 235) that the variant was in these districts, but that hasn't actually been demonstrated here. While I recognize that for high-prevalence variants there is a strong correlation between inflow and variant prevalence, low-prevalence variants by definition spread less and may genuinely be missing from some districts. To support this conclusion that they identified a bias, I'd like to see some type of statistical model that is based e.g. on the number of sequences, prevalence of a given variant in other districts, etc. Alternatively, the language can be softened ("putative sampling bias").
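To make the request for a statistical model in the last point concrete, here is a minimal sketch of the simplest version of such a check. It is an illustration under stated assumptions, not an analysis from the manuscript: it assumes sequenced samples are drawn independently and at random, and that the variant's prevalence in the district equals a value borrowed from elsewhere (e.g. neighbouring districts). The function name `prob_missed` and the numbers in the usage lines are hypothetical.

```python
def prob_missed(n_sequences: int, prevalence: float) -> float:
    """Probability that a variant present at `prevalence` is absent from
    all of `n_sequences` independently and randomly sequenced samples.

    Simple binomial argument: each sample misses the variant with
    probability (1 - prevalence), so all n miss it with (1 - prevalence)**n.
    """
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return (1.0 - prevalence) ** n_sequences


# Hypothetical usage: a district with 20 sequences and an assumed
# variant prevalence of 2 % (both numbers are made up for illustration).
p = prob_missed(20, 0.02)
print(f"Probability of seeing no variant sequence: {p:.2f}")
```

If this probability is high for a district, the absence of the variant in its sequences is about equally compatible with under-sampling and with true absence, which is why "putative sampling bias" may be the safer wording.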

Reviewer #2 (Public Review):

In the manuscript, the authors combine SARS-CoV-2 sequence data from a state in Germany with mobility data to help understand the movement of the virus and to help decide where to focus sequencing. The global expansion in sequencing capability is a key outcome of the public health response. However, there remains uncertainty about how to maximise the insights the sequence data can give. An improved ability to predict the movement of emergent variants would be a useful public health outcome. Knowing where to focus sequencing to maximise insights is also key. The presented case study from one state in Germany is therefore a useful addition to the literature. Nevertheless, I have a few comments.

One of the key goals of the paper is to explore whether mobile phone data can help predict the spread of lineages. However, it appears unclear whether this was actually addressed in the analyses. To do this, the authors could hold out data from a period of time, and see whether they can predict where the variants end up being found.
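A hold-out evaluation of this kind could be sketched as follows. This is a hypothetical illustration only, not the authors' analysis: it assumes that mobility inflow from the origin district is available per destination district for the training period, and that the set of districts where the variant was detected in the held-out period is known. The function name `top_k_hit_rate`, the district labels, and all numbers are invented.

```python
def top_k_hit_rate(inflow: dict[str, float],
                   detected_later: set[str],
                   k: int) -> float:
    """Fraction of the k districts with the highest mobility inflow
    (training period) in which the variant was detected in the
    held-out period."""
    # Rank districts by inflow from the origin district, highest first.
    ranked = sorted(inflow, key=inflow.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(district in detected_later for district in ranked)
    return hits / len(ranked)


# Hypothetical data: trips from the origin district into four districts,
# and the districts where the variant appeared in the held-out weeks.
inflow = {"A": 100.0, "B": 50.0, "C": 10.0, "D": 5.0}
detected_later = {"A", "C"}
print(top_k_hit_rate(inflow, detected_later, k=2))
```

Comparing this hit rate against the rate obtained from randomly chosen districts would indicate whether the mobility ranking carries predictive value beyond chance.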

The abstract presents the mobility-guided sampling as a success; however, the results paint a much more mixed picture. Ultimately, it's unclear what this strategy really achieved. In a quickly moving pandemic, it is unclear what hunting for extra sequences of a specific, already identified variant really accomplishes. I'm not sure what public health action would result, especially given that the variant has already been identified.

Relatedly, it is unclear to me whether simply relying on spatial distance would be a simpler alternative to mobile phone data. From Figure 2, it seems clear that a simple proximity matrix would work well at reconstructing viral flow. The authors could compare how well spatial proximity and CDR data each correlate with the observed spatial spread.
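One way to frame this comparison is a rank correlation between each candidate predictor and the time of first detection per district. The sketch below is a hypothetical illustration under that assumption, not a method from the manuscript; it implements Spearman's correlation with the standard library only, and all variable names and values are invented.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-district values: day of first detection, distance to
# the origin district (km), and CDR-derived trips from the origin.
first_detection_day = [3.0, 10.0, 7.0, 21.0, 14.0]
distance_km = [12.0, 40.0, 35.0, 90.0, 60.0]
cdr_trips = [900.0, 300.0, 450.0, 20.0, 500.0]

print("distance vs. arrival:", spearman(distance_km, first_detection_day))
print("CDR trips vs. arrival:", spearman(cdr_trips, first_detection_day))
```

If the two correlations are of similar magnitude, distance alone may be an adequate and cheaper proxy; a clearly stronger correlation for the CDR data would support the added value of mobile phone data.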
