Transmission networks of SARS-CoV-2 in Coastal Kenya during the first two waves: A retrospective genomic study

  1. Charles N Agoti (corresponding author)
  2. Lynette Isabella Ochola-Oyier
  3. Simon Dellicour
  4. Khadija Said Mohammed
  5. Arnold W Lambisia
  6. Zaydah R de Laurent
  7. John M Morobe
  8. Maureen W Mburu
  9. Donwilliams O Omuoyo
  10. Edidah M Ongera
  11. Leonard Ndwiga
  12. Eric Maitha
  13. Benson Kitole
  14. Thani Suleiman
  15. Mohamed Mwakinangu
  16. John K Nyambu
  17. John Otieno
  18. Barke Salim
  19. Jennifer Musyoki
  20. Nickson Murunga
  21. Edward Otieno
  22. John N Kiiru
  23. Kadondi Kasera
  24. Patrick Amoth
  25. Mercy Mwangangi
  26. Rashid Aman
  27. Samson Kinyanjui
  28. George Warimwe
  29. My Phan
  30. Ambrose Agweyu
  31. Matthew Cotten
  32. Edwine Barasa
  33. Benjamin Tsofa
  34. D James Nokes
  35. Philip Bejon
  36. George Githinji
  1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kenya
  2. Pwani University, Kenya
  3. Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Belgium
  4. Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory for Clinical and Epidemiological Virology, KU Leuven, University of Leuven, Belgium
  5. Ministry of Health, Kenya
  6. Nuffield Department of Medicine, University of Oxford, United Kingdom
  7. Medical Research Centre (MRC)/ Uganda Virus Research Institute, Uganda
  8. MRC-University of Glasgow Centre for Virus Research, United Kingdom
  9. University of Warwick, United Kingdom

Abstract

Background:

Detailed understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) regional transmission networks within sub-Saharan Africa is key for guiding local public health interventions against the pandemic.

Methods:

Here, we analysed 1139 SARS-CoV-2 genomes from positive samples collected between March 2020 and February 2021 across six counties of Coastal Kenya (Mombasa, Kilifi, Taita Taveta, Kwale, Tana River, and Lamu) to infer virus introductions and local transmission patterns during the first two waves of infections. Virus importations were inferred using ancestral state reconstruction, and virus dispersal between counties was estimated using discrete phylogeographic analysis.

Results:

During Wave 1, 23 distinct Pango lineages were detected across the six counties, while during Wave 2, 29 lineages were detected; 9 of which occurred in both waves and 4 seemed to be Kenya specific (B.1.530, B.1.549, B.1.596.1, and N.8). Most of the sequenced infections belonged to lineage B.1 (n = 723, 63%), which predominated in both Wave 1 (73%, followed by lineages N.8 [6%] and B.1.1 [6%]) and Wave 2 (56%, followed by lineages B.1.549 [21%] and B.1.530 [5%]). Over the study period, we estimated 280 SARS-CoV-2 virus importations into Coastal Kenya. Mombasa City, a vital tourist and commercial centre for the region, was a major route for virus imports, most of which occurred during Wave 1, when many Coronavirus Disease 2019 (COVID-19) government restrictions were still in force. In Wave 2, inter-county transmission predominated, resulting in the emergence of local transmission chains and diversity.

Conclusions:

Our analysis supports moving COVID-19 control strategies in the region from a focus on international travel to strategies that will reduce local transmission.

Funding:

This work was funded by The Wellcome (grant numbers: 220985, 203077/Z/16/Z, 220977/Z/20/Z, and 222574/Z/21/Z) and the National Institute for Health and Care Research (NIHR), project references: 17/63/and 16/136/33 using UK Aid from the UK government to support global health research, The UK Foreign, Commonwealth and Development Office. The views expressed in this publication are those of the author(s) and not necessarily those of the funding agencies.

Editor's evaluation

It is important to describe patterns of SARS-CoV-2 spread across the globe, beyond high-income countries. This study provides and evaluates SARS-CoV-2 sequence data from ~1200 PCR confirmed COVID-19 patients in Coastal Kenya to characterize phylogenetically likely importation and geographic infection routes, as well as the emergence of geographically distinct SARS-CoV-2 lineages.

https://doi.org/10.7554/eLife.71703.sa0

Introduction

Coronavirus Disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was declared a pandemic on March 11, 2020 (Hu et al., 2021). By February 28, 2021, there had been at least 114 million confirmed cases of COVID-19 and more than 2.6 million deaths worldwide (https://covid19.who.int/). By the same date, Kenya, an East African country with a population of around 50 million people, had reported a total of 105,648 COVID-19 cases and 1856 associated deaths, most of which were associated with two distinct waves of infections (MOH, 2021).

Kenya reported its first COVID-19 case on March 13, 2020. In response, the government outlined a series of countermeasures to minimize the effects of the pandemic locally (Brand et al., 2021). For instance, international travel was restricted, international borders closed, public gatherings prohibited, meetings with over 15 participants forbidden, travel from hotspot counties restricted, places of worship, bars, schools, and other learning institutions closed, and a nationwide dusk-to-dawn curfew enforced (Wambua et al., 2022). Despite these measures, the COVID-19 case numbers consistently grew, and serological surveys in June 2020 indicated that the local epidemic had progressed further than could be discerned from the limited laboratory testing (Etyang et al., 2021; Uyoga et al., 2021a).

An analysis of blood donor samples collected in the first quarter of 2021 found that anti-SARS-CoV-2 IgG prevalence in Kenya was 48.5% (Adetifa et al., 2021; Uyoga et al., 2021b). Despite this progression of the local epidemic, understanding of local SARS-CoV-2 spread patterns remains limited (Githinji et al., 2021; Wilkinson et al., 2021). During the first two waves, documented cases were concentrated in the major cities, with Nairobi, the capital, accounting for a cumulative total of ~42% of the cases by February 2021 and Mombasa, a coastal city, accounting for ~8% of the cases (Brand et al., 2021). Here, we focused on the latter and its environs.

Throughout the COVID-19 pandemic period, genomic analysis has been crucial for tracking the spread of SARS-CoV-2 and investigating its transmission pathways (Bugembe et al., 2020; Geoghegan et al., 2020; Oude Munnink et al., 2020; Worobey et al., 2020). Previously, we analysed 311 SARS-CoV-2 early genomes collected in Coastal Kenya during Wave 1 (Githinji et al., 2021). In that study, we showed that several Pango lineages had been introduced into Coastal Kenya, but most of them did not take off, except for lineage B.1 (Githinji et al., 2021).

The second SARS-CoV-2 wave of infections in Kenya began in mid-September 2020 (Figure 1A), and a mathematical modelling study suggested that this wave was primarily driven by the easing of government restrictions (Brand et al., 2021). Here, we utilized a large set of genome sequences from Coastal Kenya to rule out the involvement of a new, more transmissible or immune-evasive variant in the second wave and to investigate patterns of virus importations, lineage temporal dynamics, and local spread patterns within and between the six counties of Coastal Kenya during the first two epidemic waves of SARS-CoV-2 infections in Kenya.

Figure 1
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic in Kenya and government response.

(A) The reported daily new cases in Kenya from March 2020 to February 2021 shown as 7-day-rolling average demonstrating the first two national SARS-CoV-2 waves of infections. (B) The total reported daily cases for Coastal Kenya counties during the study period shown as 7-day-rolling average per million people. (C) The Kenya government COVID-19 intervention level during the study period as summarized by the Oxford Stringency Index (SI) (Hale et al., 2021).

Figure 1—source data 1

Number of daily new cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Kenya up to February 26, 2021, and the corresponding 7-day-rolling average.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig1-data1-v2.csv
Figure 1—source data 2

Number of daily positive tests per million people for the Coastal Kenya region (all six counties combined).

https://cdn.elifesciences.org/articles/71703/elife-71703-fig1-data2-v2.csv
Figure 1—source data 3

Kenya government Coronavirus Disease 2019 (COVID-19) restrictions stringency index during the study period.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig1-data3-v2.csv

Methods

Study design and population

We analysed SARS-CoV-2 genomic sequences from nasopharyngeal/oropharyngeal (NP/OP) swab samples collected across the six coastal counties of Kenya (Mombasa, Kilifi, Kwale, Taita Taveta, Tana River, and Lamu) between March 17, 2020, and February 26, 2021. Of the six, Mombasa is the most densely populated and has a seaport, an international airport, and an island (Table 1). Kwale and Taita Taveta counties share a border with Tanzania while Lamu includes several islands in the Indian Ocean. Based on the observed nationwide peaks in SARS-CoV-2 infections, we divided the study period into (a) Wave 1, which was the period between March 17 and September 15, 2020, and (b) Wave 2, the period between September 16, 2020, and February 26, 2021 (Figure 1A and B). The Wave 2 period began when the number of national daily positive cases started to show a renewed consistent rise after the peak of Wave 1.
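The wave definition above can be expressed as a simple date rule. The following is a minimal Python sketch, not the code used in the study; the function name `assign_wave` is hypothetical and only the boundary dates come from the text.

```python
from datetime import date

# Wave boundaries as stated in the study design.
WAVE1_START = date(2020, 3, 17)
WAVE1_END = date(2020, 9, 15)
WAVE2_END = date(2021, 2, 26)


def assign_wave(collection_date):
    """Return 1 or 2 for dates inside the study period, otherwise None."""
    if WAVE1_START <= collection_date <= WAVE1_END:
        return 1
    if WAVE1_END < collection_date <= WAVE2_END:
        return 2
    return None  # sample collected outside the study period


if __name__ == "__main__":
    for d in (date(2020, 9, 15), date(2020, 9, 16)):
        print(d.isoformat(), "-> Wave", assign_wave(d))
```

A sample collected on September 15, 2020 falls in Wave 1, and one collected the next day falls in Wave 2.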

Table 1
Number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) positives reported by the Ministry of Health in Kenya by February 26, 2021, and breakdown of those conducted at KEMRI-Wellcome Trust Research Programme (KWTRP), including status of sequencing.

County | Total population size* (%) | Population density† | Ministry of Health reported positives‡ (%) | RT-PCR tests (KWTRP, %) | Positives (KWTRP, %) | No. of whole genomes sequenced (%)§
Mombasa | 1,208,333 (27.9) | 5,495 | 8450 (66.8) | 46,143 (55.8) | 3139 (49.6) | 468 (41.1)
Kilifi | 1,453,787 (33.6) | 116 | 2458 (19.4) | 12,908 (15.6) | 1443 (22.8) | 294 (25.8)
Kwale | 866,820 (20.0) | 105 | 436 (3.4) | 5491 (6.6) | 436 (6.9) | 102 (9.0)
Taita Taveta | 340,671 (7.9) | 20 | 855 (6.7) | 14,543 (17.6) | 855 (13.5) | 196 (13.5)
Tana River | 315,943 (7.3) | 8 | 106 (0.8) | 877 (1.1) | 106 (1.7) | 16 (1.7)
Lamu | 143,920 (3.3) | 23 | 350 (2.7) | 2754 (3.3) | 350 (5.5) | 63 (5.5)
Overall | 4,329,474 (100.0) | 52 | 12,655 (100.0) | 82,716 (100.0) | 6329 (100.0) | 1139 (100.0)

* Number of residents as per the 2019 national population census.
† Units here are number of persons per square kilometre.
‡ The Ministry of Health reports compiled results from all testing centres across the country including KWTRP.
§ The numbers in brackets represent the proportion sequenced of those detected following RT-PCR at the KWTRP.

Ethical statement

The study protocol was reviewed and approved by the Scientific and Ethics Review Committee (SERU) at Kenya Medical Research Institute (KEMRI), Nairobi, Kenya (SERU protocol #4035). The committee did not require individual patient consent for studies using residual diagnostic material to investigate the SARS-CoV-2 genomic epidemiology for improved public health response.

Samples analysed

The study used residual NP/OP swab samples collected by the Ministry of Health (MoH) County Department of Health rapid response teams (RRTs) for SARS-CoV-2 diagnostic testing (Agoti et al., 2020; Nyagwange et al., 2022). The RRTs delivered the NP/OP swabs to the KEMRI-Wellcome Trust Research Programme (KWTRP) laboratories within 48 hr in cool boxes with ice packs. The samples were from persons of any age collected following the MoH eligibility criteria that were periodically revised. Participants included (1) persons with acute respiratory illness symptoms, (2) travellers returning from early COVID-19 hotspot countries (i.e. China, Italy, and Iran), (3) persons seeking entry into Kenya at international border points, (4) contacts of confirmed cases, and (5) persons randomly approached as part of the ‘mass’ testing effort to understand the extent of infection spread in the communities.

SARS-CoV-2 testing and genome sequencing at KWTRP

To purify nucleic acids (NA) in the NP/OP samples, a variety of commercial kits were used, namely, QIAamp Viral RNA Mini Kit, RNeasy QIAcube HT Kit, QIASYMPHONY RNA Kit, TIANamp Virus RNA Kit, Da An Gene Nucleic acid Isolation and Purification Kit, SPIN X Extraction, and RADI COVID-19 detection Kit. The NA extracts were tested for SARS-CoV-2 genetic material using one of the following kits/protocols: (1) the Berlin (Charité) primer-probe set (targeting envelope [E] gene, nucleocapsid [N] or RNA-dependent RNA-polymerase [RdRp]), (2) European Virus Archive – GLOBAL (EVA-g) (targeting E or RdRp genes), (3) Da An Gene Co. detection Kit (targeting N or ORF1ab), (4) BGI RT-PCR kit (targeting ORF1ab), (5) Sansure Biotech Novel Coronavirus (2019-nCoV) Nucleic Acid Diagnostic real-time RT-PCR kit or (6) Standard M kit (targeting E and ORF1ab), and (7) TIB MOLBIOL kit (targeting E gene). Kit/protocol-determined cycle threshold cut-offs were used to define positives (Mohammed et al., 2020).

Though we initially intended to sequence every positive case diagnosed at KWTRP, eventually we settled on sequencing a subset of cases once the epidemic had become established (Githinji et al., 2021). Samples sequenced were those with RT-PCR cycle threshold values of <30 with spatial (at county level) and temporal (by month) representation (Figure 2—figure supplement 1). We re-extracted NA from samples selected for sequencing using QIAamp Viral RNA Mini kit following the manufacturer’s instructions and reverse-transcribed the RNA using LunaScript RT SuperMix Kit. The cDNA was amplified using Q5 Hot Start High-Fidelity 2x Mastermix along with the ARTIC nCoV-2019 version 3 primers. The PCR products were run on a 1.5% agarose gel, and samples whose SARS-CoV-2 amplification was considered successful (amplicons visible) were purified using Agencourt AMPure XP beads and taken forward for library preparation. Sequencing libraries were constructed using Oxford Nanopore Technologies (ONT) ligation sequencing kit and the ONT Native Barcoding Expansion kit as described in the ARTIC protocol (Tyson et al., 2020). Every MinION (Mk1B) run comprised 23 samples and 1 negative (no-template) control.

Genome assembly and lineage assignment

Following MinION sequencing, the FAST5 files were base-called and demultiplexed using the ONT’s software Guppy v3.5–4.2. Consensus SARS-CoV-2 sequences were derived from the reads using the ARTIC bioinformatics pipeline (https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html; last accessed August 3, 2021). A threshold of 20× read depth was required for a base to be included in the consensus genome; otherwise, it was masked with an N (Githinji et al., 2021). Only complete or near-complete genomes with N count <5980 (i.e. >80% coverage) were further analysed.
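The relationship between the N-count cut-off and genome coverage can be checked with a short calculation. The sketch below is illustrative only and is not part of the ARTIC pipeline; it assumes the consensus has the same length as the reference NC_045512 (29,903 nucleotides), and the function names are hypothetical.

```python
GENOME_LENGTH = 29903  # length of the reference genome NC_045512
MAX_N_COUNT = 5980     # cut-off stated in the text


def n_count(sequence):
    """Number of masked (N) positions in a consensus sequence."""
    return sequence.upper().count("N")


def coverage(sequence, genome_length=GENOME_LENGTH):
    """Fraction of the genome with a called (non-N) base."""
    return 1.0 - n_count(sequence) / genome_length


def passes_filter(sequence):
    """True if the genome satisfies the N count < 5980 criterion."""
    return n_count(sequence) < MAX_N_COUNT


if __name__ == "__main__":
    # Synthetic example: a genome with 5979 masked positions.
    example = "A" * (GENOME_LENGTH - 5979) + "N" * 5979
    print(passes_filter(example), round(coverage(example), 4))
```

Because 5980/29,903 is just under 20%, any genome passing the N-count criterion has more than 80% of positions called.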

The consensus genomes were assigned into Pango lineages as described by Rambaut et al., 2020 using Pangolin v3.1.16 (command line version) with Pango v1.2.101 and PangoLEARN model v2021-11-25 (O’Toole et al., 2021). Contextual information about lineages was obtained from the Pango lineage description list available at https://cov-lineages.org/lineage_list.html (last accessed December 21, 2021). Variants of concern (VOC) and variants of interest (VOI) were designated based on the WHO framework as of May 31, 2021 (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/). Amino acid sequence changes in the Coastal Kenya genomes were investigated using the Nextclade tool v0.14.2 (Hadfield et al., 2018): https://clades.nextstrain.org/ (last accessed August 3, 2021). Mutations in the Kenyan lineages were visualized using the Stanford University CORONAVIRUS ANTIVIRAL & RESISTANCE Database tool on webpage: https://covdb.stanford.edu/page/mutation-viewer/ (last accessed August 3, 2021).

Global contextual sequences

The global contextual sequences were obtained from GISAID (https://www.gisaid.org/) using the inclusion criteria: (1) presence of the full sample collection date (year–month–day), (2) host recorded as ‘Human’, (3) sample collected between March 1, 2020, and February 28, 2021, and (4) absence of >5980 ambiguous (N) nucleotides. Three analysis datasets were prepared as shown in Figure 5—figure supplement 1.

  1. Set 1 was for investigating the global context and temporal dynamics of the Pango lineages detected in Coastal Kenya. All data available on GISAID assigned Pango lineages detected in Coastal Kenya were included (n = 420,492).

  2. Set 2 was for investigating lineage temporal dynamics across widening scales of observation (Coastal Kenya, across Kenya, Eastern Africa, Africa, and globally). These included all eligible African genomes (n = 21,150) and a subset of non-African genomes selected randomly from the ‘master dataset’ using the R randomization command: sample_n(). A maximum of 30 genomes were selected from each country by year and month. The Eastern Africa subset comprised 5275 genomes from 10 countries, namely, Ethiopia, Uganda, Rwanda, Malawi, Zimbabwe, Zambia, Mozambique, Madagascar, Reunion (a French overseas territory), and The Comoros.

  3. Set 3 was for investigating global phylogenetic relationships. It included genomes from the global subset of lineages detected in Coastal Kenya, which were then randomly split into two subsamples for tractable subsequent phylogenetic analysis (Figure 5—figure supplement 1).
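The random selection described for Set 2 (at most 30 genomes per country per year and month) was done in R with sample_n(). As an illustration only, the same idea can be sketched in Python; the record fields `country` and `date` and the function name `subsample` are assumptions made for this example, not names from the study's scripts.

```python
import random
from collections import defaultdict


def subsample(records, max_per_group=30, seed=1):
    """Randomly keep at most max_per_group records per (country, year-month)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups = defaultdict(list)
    for rec in records:
        # Dates are assumed to be ISO strings (YYYY-MM-DD); the first
        # seven characters give the year and month.
        groups[(rec["country"], rec["date"][:7])].append(rec)
    selected = []
    for key in sorted(groups):
        members = groups[key]
        k = min(max_per_group, len(members))  # small groups are kept whole
        selected.extend(rng.sample(members, k))
    return selected


if __name__ == "__main__":
    # Invented records for demonstration only.
    demo = [{"country": "Italy", "date": "2020-04-%02d" % (i % 28 + 1)} for i in range(50)]
    demo += [{"country": "Peru", "date": "2020-05-10"} for _ in range(5)]
    print(len(subsample(demo)))
```

Groups with fewer than 30 genomes are retained in full, so sparsely sampled countries are not penalized by the cap.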

Phylogenetic analysis

Multiple sequence alignments were prepared in Nextalign v0.1.6 software using the initial Wuhan sequence (Accession number: NC_045512) as the reference with the command:

nextalign -r NC_045512.fasta -i input.fasta

The alignment was manually inspected in AliView v1.21 to spot any obvious problems/misalignments. Quick non-bootstrapped neighbour-joining trees were created in SEAVIEW v4.6.4 to identify any aberrant sequences which were henceforth discarded. Maximum likelihood (ML) phylogenies were reconstructed using IQTREE v2.1.3 under the GTR (general time-reversible) model of evolution using the command:

./iqtree2 -s input.aligned.fasta -nt 4 -m GTR

The ML tree was linked to the various metadata (lineage, county, source, etc.) in R programming software v4.0.2 and visualized using the R package ‘ggtree’ v2.4.2. The ML phylogenetic tree was subsequently time-calibrated with the program TreeTime, assuming a constant genomic evolutionary rate of SARS-CoV-2 of 8.4 × 10⁻⁴ nucleotide substitutions per site per year (Sagulenko et al., 2018), using the command:

treetime --tree input.aligned.fasta.treefile --aln input.aligned.fasta --clock-rate 0.00084 --dates dates.csv

Outlier sequences deviating from the molecular clock were identified by TreeTime and excluded using the R package ‘treeio’. TempEst v1.5.3 was then used to assess the consistency of nucleotide evolution of the analysed data with a molecular clock. A linear regression of root-to-tip genetic distances against sampling dates was plotted in RStudio and the coefficient of determination (R²) assessed. The resulting trees were visualized using the R package ‘ggtree’ v2.4.2.
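The root-to-tip regression was performed in R; the following Python sketch only illustrates the calculation. It fits an ordinary least-squares line to (sampling date, root-to-tip distance) pairs, so that the slope estimates the evolutionary rate and the x-intercept estimates the date of the root. The input values in the demonstration are invented and lie exactly on a line with the assumed rate of 8.4 × 10⁻⁴; they are not study data.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of distance on date; returns (slope, intercept, r2)."""
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx                    # substitutions per site per year
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)          # coefficient of determination
    return slope, intercept, r2


if __name__ == "__main__":
    # Invented decimal dates and distances for demonstration only.
    demo_dates = [2020.25, 2020.50, 2020.75, 2021.00]
    demo_dist = [0.00084 * (d - 2019.9) for d in demo_dates]
    slope, intercept, r2 = root_to_tip_regression(demo_dates, demo_dist)
    print("rate:", slope, "root date:", -intercept / slope, "R2:", r2)
```

With real data the points scatter around the line, and a low R² indicates a weak temporal signal or the presence of outlier sequences.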

Import/export analysis

We estimated the number of viral importation/exportation events between Coastal Kenya and the rest of the world by ancestral state reconstruction from the global ML tree using methods similar to those described by Tegally et al., 2021; Wilkinson et al., 2021. This was achieved using the date and location annotated tree topology to count the number of transitions between Coastal Kenya counties and the rest of the world (‘non-coastal Kenya’) using the Python script developed by the KwaZulu-Natal Research Innovation & Sequencing Platform team (KRISP, https://github.com/krisp-kwazulu-natal/SARSCoV2_South_Africa_major_lineages/tree/main/Phylogenetics; last accessed August 4, 2021). The results were plotted in R using the package ‘ggplot2’ v3.3.3. This analysis was repeated with a further two subsamples of the global background data and also with a downsampled set of the Coastal Kenya genomes that were normalized spatially and temporally (Supplementary file 5).
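The counting step can be illustrated with a toy example. The sketch below is not the KRISP script; it assumes that ancestral states have already been assigned to every node of the tree and simply counts parent-to-child state changes. The nested-dictionary tree format and the state labels are hypothetical choices made for this example.

```python
def count_transitions(node, counts=None):
    """Count state changes along branches of a tree with annotated nodes."""
    if counts is None:
        counts = {"imports": 0, "exports": 0}
    for child in node.get("children", []):
        if node["state"] != child["state"]:
            if child["state"] == "coastal_kenya":
                counts["imports"] += 1  # other location -> Coastal Kenya
            else:
                counts["exports"] += 1  # Coastal Kenya -> other location
        count_transitions(child, counts)
    return counts


if __name__ == "__main__":
    # Invented four-tip tree for demonstration only.
    toy_tree = {
        "state": "other",
        "children": [
            {"state": "other", "children": [
                {"state": "coastal_kenya"},
                {"state": "other"},
            ]},
            {"state": "coastal_kenya", "children": [
                {"state": "coastal_kenya"},
                {"state": "other"},
            ]},
        ],
    }
    print(count_transitions(toy_tree))
```

In the toy tree two branches change from ‘other’ to ‘coastal_kenya’ and one changes in the opposite direction. For trees with hundreds of thousands of tips an iterative traversal would be preferable to recursion.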

Phylogeographic analyses

We used a discrete phylogeographic approach (Lemey et al., 2009) to investigate the dispersal history of SARS-CoV-2 lineages among coastal counties while trying to mitigate the potential impact of sampling bias by subsampling Kenyan counties according to their relative epidemiological importance during the study period. For this purpose, we implemented a subsampling procedure similar to the one described by Dellicour and colleagues to analyse the circulation of SARS-CoV-2 among New York City boroughs during the first phase of the American epidemic (Dellicour et al., 2021b). Specifically, we performed replicated discrete phylogeographic analyses based on random subsets of genomic sequences. Each subset was obtained by subsampling available Kenyan genomic sequences according to the COVID-19 incidence recorded in each sampled county during the study period (Mombasa: 699 cases/100,000 people; Kilifi: 169; Kwale: 50; Taita Taveta: 251; Tana River: 34; and Lamu: 243; Table 1). Because Lamu was the proportionally least sampled county when comparing available number of sequences to local incidence, the sampling intensity of this county (63 genomic sequences sampled for a recorded incidence of 243 cases per 100,000 people) served as reference for downsampling the available number of sequences from the other counties. The resulting downsampled data sets comprised the following number of sequences: n = 181 (Mombasa), 44 (Kilifi), 13 (Kwale), 65 (Taita Taveta), 9 (Tana River), and 63 (Lamu). To investigate the impact of the stochastic subsampling procedure, we performed 10 replicated analyses each based on a distinct subsampling.
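The downsampled counts can be recovered from the incidence figures given above and the number of genomes available per county (Table 1). The following Python sketch is a worked illustration of that arithmetic rather than the authors' script; it assumes the target for each county is its incidence multiplied by the reference sampling intensity and rounded to the nearest integer.

```python
# Incidence per 100,000 people (from the text) and available genomes (Table 1).
INCIDENCE = {"Mombasa": 699, "Kilifi": 169, "Kwale": 50,
             "Taita Taveta": 251, "Tana River": 34, "Lamu": 243}
AVAILABLE = {"Mombasa": 468, "Kilifi": 294, "Kwale": 102,
             "Taita Taveta": 196, "Tana River": 16, "Lamu": 63}


def downsampling_targets(incidence, available):
    """Return the reference county and the target sample size per county."""
    # Sampling intensity: genomes available per unit of incidence.
    intensity = {c: available[c] / incidence[c] for c in incidence}
    reference = min(intensity, key=intensity.get)  # least sampled county
    targets = {c: round(incidence[c] * intensity[reference]) for c in incidence}
    return reference, targets


if __name__ == "__main__":
    ref, targets = downsampling_targets(INCIDENCE, AVAILABLE)
    print("reference county:", ref)
    for county, n in targets.items():
        print(county, n)
```

Lamu has the lowest intensity (63/243, about 0.26 genomes per unit of incidence), and applying that ratio to the other counties gives the sample sizes listed in the text.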

Discrete phylogeographic inferences were all performed using the discrete diffusion model (Lemey et al., 2009) implemented in the software package BEAST 1.10 (Suchard et al., 2018). First, following a previously described analytical pipeline (Dellicour et al., 2021a), a preliminary discrete phylogeographic reconstruction was performed to delineate clades corresponding to distinct introduction events of SARS-CoV-2 lineages into Kenya. For this initial phylogeographic analysis, we only considered two possible ancestral locations: ‘Kenya’ and ‘other location’. We conducted Bayesian inference through Markov chain Monte Carlo (MCMC) for 10⁶ iterations and sampled every 10³ iterations. To ensure that effective sample size (ESS) values associated with estimated parameters were all >200, we inspected MCMC convergence and mixing properties using the program Tracer 1.7 (Rambaut et al., 2018). We then generated a maximum clade credibility (MCC) tree using the program TreeAnnotator 1.10 (Suchard et al., 2018) after having discarded 10% of sampled trees as burn-in. Finally, we used the resulting MCC tree to delineate phylogenetic clades corresponding to independent introduction events into Kenya.

Second, each replicated phylogeographic analysis was conducted along the overall time-scaled phylogenetic tree previously obtained with TreeTime (see the ‘Phylogenetic analysis’ subsection), within which Kenyan clades were delineated in the previous step (preliminary discrete phylogeographic inference), and whose Kenyan tips were subsampled with the function ‘drop.tip’ from the R package ‘ape’ (Paradis and Schliep, 2019) according to the above-described subsampling procedure. In order to identify the best-supported lineage transition events between sampled coastal counties, we here used the Bayesian stochastic search variable selection (BSSVS) approach (Lemey et al., 2009) implemented in BEAST 1.10 (Suchard et al., 2018). Each MCMC was run for 10⁸ iterations and sampled every 10⁴ iterations. As described above, MCMC convergence and mixing properties were again inspected with Tracer. Statistical supports associated with transition events connecting each pair of sampled counties were obtained by computing adjusted Bayes factor (BF) supports, that is, BF supports that consider the relative abundance of samples by location (Dellicour et al., 2021b; Vrancken et al., 2021).
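The exact BF computation is given in the cited references and is not reproduced in this section. As a hedged illustration only, the sketch below assumes the common odds-ratio form, in which the posterior odds that a transition rate is included in the model are divided by baseline odds. In a standard BSSVS analysis the baseline is the prior inclusion probability; for the adjusted BF we assume, based on our reading of the cited papers, that the baseline is the inclusion frequency obtained when tip locations are randomly permuted. The function name and the numbers in the demonstration are hypothetical.

```python
import math


def bayes_factor(posterior_prob, baseline_prob):
    """Odds-ratio Bayes factor for inclusion of a transition rate.

    posterior_prob: fraction of posterior samples in which the rate
                    indicator equals 1.
    baseline_prob:  inclusion probability under the baseline (assumed here
                    to be the prior, or a tip-permuted analysis for the
                    adjusted version).
    """
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    if not 0.0 <= posterior_prob <= 1.0:
        raise ValueError("posterior_prob must lie between 0 and 1")
    if posterior_prob == 1.0:
        return math.inf  # rate included in every posterior sample
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    return posterior_odds / baseline_odds


if __name__ == "__main__":
    # Hypothetical inclusion frequencies, for demonstration only.
    print(bayes_factor(0.9, 0.3))
```

Under this form, a transition that is frequently included simply because one county contributes many tips receives a higher baseline and therefore a lower adjusted support.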

Epidemiological data

The Kenya daily case data between March 2020 and February 2021 was downloaded from Our World in Data (https://ourworldindata.org/coronavirus/country/kenya; last accessed August 4, 2021). The daily number of confirmed cases in each county during the study period was obtained from the Kenya Ministry of Health website, which provided the breakdown by county. Metadata for the Coastal Kenya samples was gathered from Ministry of Health case investigation forms delivered together with the samples to KWTRP.

Kenya COVID-19 response

We derived the overall status of Kenya government COVID-19 interventions using the Oxford Stringency Index (SI) available from Our World in Data database (https://ourworldindata.org/coronavirus/country/kenya, last accessed on January 18, 2022; Figure 1C). Oxford SI is based on nine response indicators rescaled to values of 0–100, with 100 being strictest (Hale et al., 2021). The nine response indicators used to form the SI are (1) school closures, (2) workplace closures, (3) cancellation of public events, (4) restrictions on public gatherings, (5) closures of public transport, (6) stay-at-home requirements, (7) public information campaigns, (8) restrictions on internal movements, and (9) international travel controls. The various government COVID-19 measures and the dates they took effect or when they were lifted are provided in Supplementary file 1 and are also reviewed in detail in Brand et al., 2021; Wambua et al., 2022.
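For orientation, the sketch below shows how a composite score could be formed from the nine indicators listed above. It assumes that each indicator has already been rescaled to 0–100 and that the index is their simple average, which is our understanding of the published methodology; the sub-index values in the demonstration are invented and do not describe Kenya on any particular day.

```python
INDICATORS = [
    "school closures",
    "workplace closures",
    "cancellation of public events",
    "restrictions on public gatherings",
    "closures of public transport",
    "stay-at-home requirements",
    "public information campaigns",
    "restrictions on internal movements",
    "international travel controls",
]


def stringency_index(sub_indices):
    """Average of the nine rescaled indicators (each between 0 and 100)."""
    missing = [name for name in INDICATORS if name not in sub_indices]
    if missing:
        raise ValueError("missing indicators: " + ", ".join(missing))
    values = [sub_indices[name] for name in INDICATORS]
    if any(v < 0 or v > 100 for v in values):
        raise ValueError("sub-indices must lie between 0 and 100")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Invented sub-index values for demonstration only.
    demo_values = [100, 50, 100, 75, 50, 0, 100, 100, 75]
    demo = dict(zip(INDICATORS, demo_values))
    print(round(stringency_index(demo), 1))
```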

Statistical analysis

Statistical data analyses were performed in R v4.0.5. Summary statistics (proportions, means, median, and ranges) were inferred where applicable. The ‘lm’ function in R was used to fit a linear regression model evaluating the relationship between sampling dates and root-to-tip genetic distance in the ML phylogeny. The goodness of fit was inferred from the correlation coefficient. Proportions were compared using chi-square test or Fisher’s exact test as appropriate.
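The comparison of proportions was done in R. As an illustration of the underlying calculation, the sketch below computes Pearson's chi-square statistic for a 2 × 2 table without continuity correction, using only the Python standard library (for one degree of freedom the p-value equals erfc(√(χ²/2))). The demonstration uses the female and male counts among all KWTRP positives by wave from Table 2, with the ‘Missing’ category excluded, so it is not expected to reproduce the p-values reported in that table.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]] (1 d.f.)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Rows: female, male; columns: Wave 1, Wave 2 (Table 2, all positives).
    chi2, p = chi_square_2x2(763, 1133, 1860, 2198)
    print("chi-square:", round(chi2, 2), "p-value:", p)
```

Note that R's chisq.test applies a continuity correction to 2 × 2 tables by default, so its output would differ slightly from this uncorrected statistic.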

Results

COVID-19 waves in Coastal Kenya and sequencing at KWTRP

By February 2021, Mombasa, Lamu, and Taita Taveta counties had experienced at least two waves of SARS-CoV-2 infections while Kilifi, Kwale, and Tana River had experienced only a single wave of infections (Figure 2A). Up to February 26, 2021, the MoH had reported a cumulative total of 12,655 cases for all the six coastal counties, a majority from Mombasa County (n = 8450, 67%; Table 1). Over the same period, KWTRP tested an aggregate of 82,716 NP/OP swabs from the six coastal counties, of which 6329 (8%) were positive, distributed by month as shown in Figure 2B. The majority of the KWTRP positives were from Mombasa County (n = 3139, 50%).

Figure 2 (with 1 supplement)
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cases on the Kenyan Coast.

(A) The epidemic curves for each of the six Coastal Kenya counties derived from the daily positive case numbers, 7-day-rolling average, as reported by the Ministry of Health. (B) The monthly count of SARS-CoV-2 RT-PCR tests undertaken at the KEMRI-Wellcome Trust Research Programme (KWTRP) and those positive during the study period. (C) The monthly proportion (black bars, primary y-axis) and number (dashed blue line, secondary y-axis) of samples sequenced from total SARS-CoV-2 positives detected at KWTRP. (D) County distribution of the sequenced 1139 samples by wave number. (E) Linear regression fit of the number of Ministry of Health-reported Coronavirus Disease 2019 (COVID-19) cases in the six Coastal Kenya counties as of February 26, 2021, against the number of SARS-CoV-2 genome sequences obtained at KWTRP during the period.

Figure 2—source data 1

Number of daily positive tests per million people for each of the six Coastal Kenya counties.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig2-data1-v2.csv
Figure 2—source data 2

Total monthly severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) tests at KEMRI-Wellcome Trust Research Programme (KWTRP) and identified positives.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig2-data2-v2.csv
Figure 2—source data 3

Monthly proportion of positive samples whole genome sequenced from the positive tests at KEMRI-Wellcome Trust Research Programme (KWTRP).

https://cdn.elifesciences.org/articles/71703/elife-71703-fig2-data3-v2.csv
Figure 2—source data 4

Number of genomes available across the six coastal counties during the two national waves of infections.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig2-data4-v2.csv
Figure 2—source data 5

Total case count and number genomes available from the six coastal counties.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig2-data5-v2.csv

Among the positive cases, we sequenced 1139 cases (18%) distributed by county as reported in Table 1. The sample flow is summarized in Figure 2—figure supplement 1. The sequenced samples were spread across Wave 1 (n = 499, 44%) and Wave 2 (n = 640, 56%; Figure 2C and D) and corresponded to approximately one sequence for every 11 confirmed cases in the region. A high correlation was observed between the MoH case count and the number of samples sequenced for each county (R² = 0.9216, Figure 2E).

Demographic characteristics of the sequenced sample

The demographic details of the SARS-CoV-2-positive participants identified at KWTRP are presented in Table 2. Compared with Wave 1, individuals identified as positive in Wave 2 were slightly older (median age, 34 vs. 35 years), more often female (26% vs. 32%), more often Kenyan (80% vs. 88%), and less often had an international travel history (12% vs. 4%). After Kenyans, Tanzanian nationals contributed the second largest number of sequenced samples (n = 34, 4%). A total of 119 samples (15%) were sequenced from people who had recently travelled internationally (within 14 days). Travel history information was missing for 613 (54%) sequenced cases (Table 2).

Table 2
Demographic characteristics of the positive cases identified at KEMRI-Wellcome Trust Research Programme (KWTRP) in Coastal Kenya by sequencing status and wave period.
| Characteristic | Total positives (n = 6329), n (%) | Sequenced (n = 1139), n (%) | Non-sequenced (n = 5190), n (%) | p-Value | Wave 1 positives (n = 2849), n (%) | Wave 2 positives (n = 3480), n (%) | p-Value | Wave 1 sequenced (n = 499), n (%) | Wave 2 sequenced (n = 640), n (%) | p-Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age category (years) | | | | <0.001 | | | 0.0149 | | | 0.0419 |
| 0–9 | 178 (2.8) | 22 (1.9) | 156 (3.0) | | 94 (3.3) | 84 (2.4) | | 11 (2.2) | 11 (1.7) | |
| 10–19 | 472 (7.5) | 85 (7.5) | 387 (7.5) | | 185 (6.5) | 287 (8.2) | | 21 (4.2) | 64 (10.0) | |
| 20–29 | 1682 (26.6) | 234 (20.5) | 1,448 (27.9) | | 769 (27.2) | 913 (26.1) | | 94 (18.9) | 140 (21.8) | |
| 30–39 | 1653 (26.1) | 290 (25.5) | 1,363 (26.3) | | 764 (27.0) | 889 (25.4) | | 123 (24.7) | 167 (26.1) | |
| 40–49 | 1140 (18.0) | 218 (19.1) | 922 (17.8) | | 488 (17.2) | 652 (18.6) | | 88 (17.7) | 130 (20.3) | |
| 50–59 | 605 (9.6) | 122 (10.7) | 483 (9.3) | | 247 (8.7) | 358 (10.2) | | 57 (11.4) | 65 (10.1) | |
| 60–69 | 187 (2.9) | 46 (4.0) | 141 (2.7) | | 78 (2.8) | 109 (3.1) | | 23 (4.6) | 23 (3.6) | |
| 70–79 | 74 (1.1) | 17 (1.5) | 57 (1.1) | | 33 (1.2) | 41 (1.2) | | 7 (1.4) | 10 (1.6) | |
| 80+ | 13 (0.2) | 4 (0.4) | 9 (0.2) | | 7 (0.2) | 6 (0.2) | | 3 (0.6) | 1 (0.2) | |
| Missing | 325 (3.25) | 101 (8.9) | 224 (4.3) | | 167 (5.9) | 158 (4.5) | | 71 (14.3) | 30 (4.7) | |
| Gender | | | | 0.554 | | | <0.001 | | | 0.1979 |
| Female | 1896 (29.9) | 333 (29.2) | 1563 (30.1) | | 763 (26.9) | 1,133 (32.4) | | 125 (25.1) | 208 (32.4) | |
| Male | 4058 (64.1) | 686 (60.2) | 3372 (65.0) | | 1860 (65.7) | 2198 (62.9) | | 288 (57.8) | 398 (62.1) | |
| Missing | 375 (5.9) | 120 (10.5) | 255 (4.9) | | 209 (7.4) | 166 (4.7) | | 85 (17.1) | 85 (5.5) | |
| Nationality | | | | <0.001 | | | <0.001 | | | <0.001 |
| Kenyan | 5356 (84.6) | 870 (76.4) | 4486 (86.4) | | 2270 (80.2) | 3086 (88.2) | | 316 (63.5) | 554 (86.4) | |
| Tanzania | 131 (2.1) | 34 (3.0) | 97 (1.9) | | 81 (2.9) | 50 (1.4) | | 25 (5.0) | 9 (1.4) | |
| Uganda | 16 (0.3) | 1 (0.1) | 15 (0.3) | | 10 (0.4) | 6 (0.2) | | 0 (0.2) | 4 (0.0) | |
| Ethiopia | 14 (0.2) | 4 (0.4) | 10 (0.2) | | 0 (0.0) | 14 (0.4) | | 1 (0.2) | 0 (0.0) | |
| Other | 117 (1.84) | 24 (2.1) | 93 (1.8) | | 46 (1.6) | 71 (2.0) | | 6 (1.2) | 18 (2.8) | |
| Missing | 695 (10.9) | 206 (18.1) | 489 (9.4) | | 425 (15.0) | 270 (7.7) | | 150 (30.1) | 56 (8.7) | |
| Travel history* | | | | <0.001 | | | <0.001 | | | <0.001 |
| Yes | 485 (7.7) | 119 (10.4) | 366 (7.1) | | 340 (12.0) | 145 (4.1) | | 83 (16.7) | 36 (5.6) | |
| No | 2562 (40.7) | 407 (35.7) | 2155 (41.5) | | 1372 (48.4) | 1190 (34.0) | | 189 (38.0) | 218 (34.0) | |
| Missing | 3282 (51.9) | 613 (53.8) | 2669 (51.4) | | 1120 (39.5) | 2162 (61.8) | | 226 (45.4) | 387 (60.4) | |
* Defined as having moved into Kenya in the previous 14 days or sampled at a point of entry (POE) into Kenya.

p-values were calculated using Pearson’s chi-squared test; for variables where some cells in the table had <5 observations, Fisher’s exact test was applied.
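As an illustration of the chi-squared comparison described in the Table 2 footnote, the following minimal sketch tests gender against sequencing status using the Female and Male counts from Table 2. Two choices here are assumptions rather than details stated in the text: the "Missing" row is excluded, and a Yates continuity correction is applied. Under those assumptions the p-value comes out close to the 0.554 reported for gender; the function name `chi2_2x2` is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            # Continuity correction (assumed, not stated in the paper).
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / e
    # With 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: Female, Male. Columns: sequenced, non-sequenced (Table 2).
stat, p = chi2_2x2(333, 1563, 686, 3372)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

For variables with sparse cells, the footnote indicates that Fisher's exact test was used instead; that test is not reproduced in this sketch.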

Viral lineages circulating in Coastal Kenya

The 1139 Coastal Kenya genomes were classified into 43 Pango lineages, including 4 first identified in Kenya (N.8, B.1.530, B.1.549, and B.1.596.1) and 2 global variants of concern (VOC): B.1.1.7 (Alpha) and B.1.351 (Beta; Table 3). A total of 23 and 29 lineages were observed during Wave 1 and Wave 2, respectively, with 9 lineages detected in both waves (Figure 3A and B). Nineteen lineages were identified in three or more samples, with the top six lineages accounting for 89% of the sequenced infections, namely, B.1 (n = 723, 63%), B.1.549 (n = 143, 13%), B.1.1 (n = 57, 5%), B.1.530 (n = 32, 3%), N.8 (n = 31, 3%), and B.1.351 (n = 26, 2%; Table 3). Many of the lineages were first detected in Mombasa (n = 21, 49%) before being observed in other counties (Supplementary file 3). The temporal pattern of detection for the lineages across the six counties is shown in Figure 3C.
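The lineage shares quoted in the paragraph above follow directly from the counts in Table 3. The short sketch below recomputes them; the variable and function names are illustrative only.

```python
TOTAL_GENOMES = 1139  # sequenced Coastal Kenya genomes

# Counts for the six most frequent lineages, as listed in Table 3.
top_lineage_counts = {
    "B.1": 723,
    "B.1.549": 143,
    "B.1.1": 57,
    "B.1.530": 32,
    "N.8": 31,
    "B.1.351": 26,
}

def percent_share(count, total):
    """Share of `count` in `total`, as a percentage."""
    return 100 * count / total

shares = {lineage: round(percent_share(n, TOTAL_GENOMES))
          for lineage, n in top_lineage_counts.items()}
combined = percent_share(sum(top_lineage_counts.values()), TOTAL_GENOMES)

for lineage, pct in shares.items():
    print(f"{lineage}: {pct}%")
print(f"Top six combined: {combined:.1f}%")
```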

Lineage introductions and temporal dynamics in Coastal Kenya.

(A) Timing of detections of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Pango lineages in the 1139 sequenced Coastal Kenya samples. Circle size is scaled by the number of daily detections. The vertical dashed line demarcates the date of transition from Wave 1 to Wave 2. (B) Cumulative detections by Pango lineage and wave number. The bars are coloured by known information about the lineages: Kenya-specific (B.1.530, B.1.549, B.1.596.1, and N.8; red bars) or international lineages (black bars). (C) Monthly distribution of the common lineages identified across the six counties, presented as raw counts of the sequenced infections. Lineages detected in fewer than four cases and not considered a variant of concern (VOC) or variant of interest (VOI) were grouped together and referred to as ‘other Coastal Kenya lineages’. This group comprises 26 lineages, namely, A.25, B.1.1.33, B.1.1.464, B.1.177.6, B.1.201, B.1.212, B.1.222, B.1.281, B.1.284, B.1.340, B.1.390, B.1.393, B.1.396, B.1.413, B.1.416, B.1.433, B.1.450, B.1.480, B.1.535, B.1.558, B.1.593, B.1.596, B.1.609, B.1.629, B.4, and B.4.7.

Figure 3—source data 1

The total daily number of sequenced cases for each identified lineage across each of the six coastal counties.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig3-data1-v2.csv
Figure 3—source data 2

Total cases sequenced for each of the 43 identified lineages in the two waves of infection in Kenya.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig3-data2-v2.csv
Figure 3—source data 3

The monthly number of cases for each lineage across the two waves of infection in Kenya.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig3-data3-v2.csv
Table 3
Lineages observed in Coastal Kenya, their county distribution, global history, and variants of concern (VOC)/variants of interest (VOI) status.
| Lineage | Frequency (%) | Mombasa | Kilifi | Kwale | Taita Taveta | Tana River | Lamu | Earliest date | Number assigned | Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 22 (0.3) | 3 | - | 13 | 6 | - | - | December 30, 2019 | 2224 | Root of the pandemic lies within lineage A; predominantly found in China |
| A.23 | 4 (0.1) | 1 | 1 | 2 | - | - | - | August 14, 2020 | 92 | Predominantly found in Uganda |
| A.23.1 | 6 (0.1) | 2 | 1 | 1 | 2 | - | - | September 21, 2020 | 1191 | International lineage |
| A.25 | 3 (0.0) | 3 | - | - | - | - | - | June 8, 2020 | 47 | Predominantly found in Uganda |
| B | 9 (0.1) | 8 | 1 | - | - | - | - | December 24, 2019 | 7358 | Second major haplotype (and first to be discovered) |
| B.1 | 723 (11.4) | 328 | 192 | 44 | 119 | 12 | 28 | January 1, 2020 | 88,731 | Predominantly found in Europe, origin corresponds to the Northern Italian outbreak early in 2020 |
| B.1.1 | 57 (0.9) | 33 | 6 | 5 | 13 | - | - | January 8, 2020 | 49,562 | Predominantly found in Europe |
| B.1.1.1 | 5 (0.1) | 1 | 2 | 2 | - | - | - | March 2, 2020 | 2827 | Predominantly found in England |
| B.1.1.33 | 1 (0.0) | 1 | - | - | - | - | - | March 1, 2020 | 2117 | Predominantly found in Brazil |
| B.1.1.464 | 1 (0.0) | - | - | 1 | - | - | - | April 1, 2020 | 666 | Predominantly found in USA |
| B.1.1.519 | 4 (0.1) | - | 2 | - | 2 | - | - | July 30, 2020 | 23,815 | Predominantly found in USA/Mexico |
| B.1.1.7 | 2 (0.0) | 2 | - | - | - | - | - | September 3, 2020 | 1,062,326 | Alpha variant of concern |
| B.1.160 | 5 (0.1) | - | 2 | 1 | - | 1 | 1 | February 2, 2020 | 28,128 | Predominantly found in Europe |
| B.1.177.6 | 1 (0.0) | - | 1 | - | - | - | - | May 29, 2020 | 949 | Predominantly found in Wales |
| B.1.179 | 5 (0.1) | 5 | - | - | - | - | - | March 9, 2020 | 242 | Predominantly found in Denmark |
| B.1.201 | 1 (0.0) | - | - | - | - | - | 1 | March 6, 2020 | 173 | Predominantly found in the UK |
| B.1.212 | 2 (0.0) | - | 2 | - | - | - | - | March 3, 2020 | 59 | Predominantly found in South America |
| B.1.222 | 2 (0.0) | 2 | - | - | - | - | - | February 24, 2020 | 568 | Predominantly found in Scotland |
| B.1.281 | 2 (0.0) | - | 2 | - | - | - | - | April 8, 2020 | 41 | Predominantly found in Bahrain |
| B.1.284 | 1 (0.0) | 1 | - | - | - | - | - | March 9, 2020 | 85 | Predominantly found in TX, USA |
| B.1.340 | 1 (0.0) | 1 | - | - | - | - | - | March 13, 2020 | 221 | Predominantly found in USA |
| B.1.351 | 26 (0.4) | 6 | 5 | 8 | 7 | - | - | September 1, 2020 | 29,720 | Beta variant of concern |
| B.1.390 | 1 (0.0) | 1 | - | - | - | - | - | March 25, 2020 | 91 | Predominantly found in USA |
| B.1.393 | 3 (0.0) | 2 | 1 | - | - | - | - | May 29, 2020 | 34 | Predominantly found in Uganda |
| B.1.396 | 1 (0.0) | - | 1 | - | - | - | - | April 6, 2020 | 1375 | Predominantly found in USA |
| B.1.413 | 1 (0.0) | - | - | - | - | 1 | - | March 12, 2020 | 195 | Predominantly found in USA |
| B.1.416 | 2 (0.0) | 1 | 1 | - | - | - | - | April 11, 2020 | 594 | Predominantly found in Senegal/Gambia, reassigned from B.1.5.12 |
| B.1.433 | 1 (0.0) | - | - | 1 | - | - | - | August 3, 2020 | 314 | Predominantly found in TX, USA |
| B.1.450 | 3 (0.0) | - | 3 | - | - | - | - | March 14, 2020 | 86 | Predominantly found in TX, USA |
| B.1.480 | 1 (0.0) | - | - | - | 1 | - | - | July 3, 2020 | 386 | Predominantly found in England, Australia, Sweden, Norway |
| B.1.525 | 1 (0.0) | - | 1 | - | - | - | - | March 28, 2020 | 8012 | Eta variant of interest |
| B.1.530 | 32 (0.5) | 3 | 4 | 2 | 22 | - | 1 | October 1, 2020 | 111 | Predominantly found in Kenya |
| B.1.535 | 1 (0.0) | 1 | - | - | - | - | - | March 22, 2020 | 29 | Predominantly found in Australia |
| B.1.549 | 143 (2.3) | 42 | 56 | 18 | 23 | - | 4 | May 11, 2020 | 171 | Predominantly found in Kenya and England |
| B.1.558 | 1 (0.0) | 1 | - | - | - | - | - | April 6, 2020 | 211 | Predominantly found in USA/Mexico |
| B.1.593 | 2 (0.0) | - | - | - | - | 2 | - | July 3, 2020 | 99 | Predominantly found in USA |
| B.1.596 | 1 (0.0) | - | - | 1 | - | - | - | April 11, 2020 | 9968 | Predominantly found in USA |
| B.1.596.1 | 24 (0.4) | 12 | 8 | 3 | 1 | - | - | September 7, 2020 | 83 | Predominantly found in Kenya |
| B.1.609 | 2 (0.0) | 1 | 1 | - | - | - | - | March 10, 2020 | 1879 | Predominantly found in USA/Mexico |
| B.1.629 | 1 (0.0) | 1 | - | - | - | - | - | July 12, 2020 | 231 | Lineage circulating in several countries |
| B.4 | 3 (0.0) | 3 | - | - | - | - | - | January 18, 2020 | 386 | Predominantly found in Iran |
| B.4.7 | 1 (0.0) | 1 | - | - | - | - | - | March 14, 2020 | 68 | Predominantly found in Africa and UAE |
| N.8 | 31 (0.5) | 2 | 1 | - | - | - | 28 | June 23, 2020 | 15 | Alias of B.1.1.33.8, predominantly found in Kenya |

We detected an average of eight Pango lineages in circulation per month during the study period, with the lowest number (n = 1) in March 2020 and the highest (n = 17) in November 2020 (Figure 4). The earliest sequences for 7 lineages (16%) came from individuals who reported recent international travel, the earliest sequences for 16 lineages (37%) came from individuals who had no history of recent travel, and the earliest sequences for 20 lineages (47%) came from individuals who had no information about travel history (Figure 4—figure supplement 1). Among the individuals with recent travel history, the top five lineages were B.1, A, B.1.1, B.1.549, and B.1.351 (Figure 4—figure supplement 2). Most of the lineages detected in Coastal Kenya were first detected in Mombasa County (n = 14, 58%; Supplementary file 3).

Figure 4 with 2 supplements
Lineage detection patterns in Coastal Kenya showing the monthly count of total detected lineages, the monthly count of newly detected lineages, and the cumulative total of detected lineages across the study period (secondary axis).
Figure 4—source data 1

New, total circulating and cumulative Pango lineage counts by month in Coastal Kenya.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig4-data1-v2.csv
Figure 4—source data 2

Distribution of the detected Pango lineages by travel history information in Coastal Kenya.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig4-data2-v2.xlsx

SARS-CoV-2 lineage dynamics beyond Coastal Kenya

We evaluated various scales of observation to illustrate the spatial-temporal lineage dynamics during our study period (Figure 5). The genome set was carefully selected to minimize sampling bias (Figure 5—figure supplement 1). A total of 33 Pango lineages were identified for the Kenya sample, 125 lineages for Eastern Africa, 337 lineages for Africa, and 950 lineages globally (Supplementary file 4). The number of lineages detected at the different scales was consistent with the widening scope, except across Kenya, where a relatively small number of genomes were available. The top 10 Pango lineages observed at each scale of observation are provided in Supplementary file 5.

Figure 5 with 2 supplements
Investigation of lineage spatial-temporal dynamics at widening scales of observation.

(A) Monthly prevalence of detected lineages in Coastal Kenya from the 1139 sequenced genomes. (B) Monthly prevalence of detected lineages in Kenya (outside coastal counties) from 605 contemporaneous genomes available in GISAID. (C) Monthly prevalence of detected lineages in Eastern Africa from 3531 contemporaneous genomes from 10 countries whose data are available in GISAID. The included countries were Comoros, Ethiopia, Madagascar, Malawi, Mozambique, Reunion, Rwanda, Uganda, Zambia, and Zimbabwe. (D) Monthly distribution of detected lineages in African countries (excluding Eastern Africa). A total of 14,874 contemporaneous genomes from 37 countries that were available in GISAID are included in the analysis. (E) Monthly prevalence of detected lineages in a global subsample of 19,993 contemporaneous genomes from 147 countries that were compiled from GISAID (see details in the ‘Methods’ section). Genomes from African samples are excluded in this panel. (F) Includes all genomes analysed across the scales (A–E). Lineages not among the top 10 in at least one of the five scales of observation investigated have been grouped together as ‘Other lineages’.

Figure 5—source data 1

Monthly counts for the top lineages observed at the different scales of observation analysed.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig5-data1-v2.csv

By January 2021, the lineages B.1.1.7 and B.1.351 were already widely spread across Eastern Africa and Africa, but there were only sporadic detections in Coastal Kenya (Figure 5A–D). Lineage B.1, which was predominant in Coastal Kenya during Waves 1 and 2, occurred in substantial proportions across the different scales early in the pandemic (Wave 1), but its prevalence outside Kenya diminished faster over time compared to the Kenya sample. Greater than 95% (909/950) of the lineages comprising infections in the global subsample (March 1, 2020, to February 28, 2021) were not seen in the Coastal Kenya samples (Supplementary file 5). The global pattern of detection of the 43 locally detected lineages is shown in Figure 5—figure supplement 2. Only two lineages in the Coastal Kenya sampling were not in the global subsample: lineage N.8 and lineage B.1.593 (Figure 5—figure supplement 2).
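The overlap figures in the paragraph above can be checked with simple arithmetic: of the 43 lineages seen in Coastal Kenya, two were absent from the global subsample, so 41 were shared, leaving 909 of the 950 global lineages unobserved locally. A minimal sketch, with illustrative variable names:

```python
global_lineages = 950   # lineages in the global subsample
coastal_lineages = 43   # lineages detected in Coastal Kenya
coastal_only = 2        # N.8 and B.1.593, absent from the global subsample

# Lineages present in both the local data and the global subsample.
shared = coastal_lineages - coastal_only
# Global lineages never observed in Coastal Kenya.
not_seen_locally = global_lineages - shared
percent_not_seen = 100 * not_seen_locally / global_lineages

print(shared, not_seen_locally, f"{percent_not_seen:.1f}%")
```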

SARS-CoV-2 genetic diversity in Coastal Kenya

A time-resolved ML phylogeny for the Coastal Kenya genomes, with the global subsample in the background, is provided in Figure 6. This phylogeny showed that (1) the Coastal Kenya genomes were represented across several but not all of the major phylogenetic clusters, (2) some of the Coastal Kenya clusters mapped into known Pango lineages, some of which appeared to expand after introduction, and (3) all six coastal counties appeared to have each had multiple virus introductions, with some of the clusters comprising genomes detected across multiple counties (Figure 6). Many of the lineages identified in Coastal Kenya formed monophyletic groups (e.g. A, B.1.549, B.1.530, and N.8), with a few exceptions such as lineages B.1, B.1.1, and B.1.351, which occurred on the phylogeny as multiple clusters. The data we analysed showed considerable correlation between the root-to-tip genetic distance and the sampling dates of the genomes (R² = 0.604; Figure 6—figure supplement 1).
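The root-to-tip correlation mentioned above is, in general terms, an ordinary least-squares regression of each genome's genetic distance from the root against its sampling date. The sketch below shows how such an R² can be computed; the function name `linear_fit` and the five data points are hypothetical and are not taken from the study, which reports R² = 0.604 for its own data.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical sampling dates (decimal years) and root-to-tip distances
# (substitutions per site), for illustration only.
dates = [2020.2, 2020.4, 2020.6, 2020.8, 2021.0]
distances = [0.0002, 0.0004, 0.0005, 0.0007, 0.0008]

rate, _, r2 = linear_fit(dates, distances)
print(f"slope = {rate:.2e} subs/site/year, R^2 = {r2:.3f}")
```

In this setting the slope estimates the evolutionary rate, and genomes falling far from the fitted line are candidates for removal as not clock-like.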

Figure 6 with 2 supplements
Global context of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) diversity observed in Coastal Kenya.

A time-resolved global phylogeny that combined 1139 Coastal Kenya SARS-CoV-2 genomes and 9906 global reference sequences. Distinct shapes are used to identify the different Coastal Kenya counties and distinct colours to identify the different lineages. Lineages detected in fewer than four cases were grouped together and referred to as ‘other Coastal Kenya lineages’. This group comprises 26 lineages, namely, A.25, B.1.1.33, B.1.1.464, B.1.177.6, B.1.201, B.1.212, B.1.222, B.1.281, B.1.284, B.1.340, B.1.390, B.1.393, B.1.396, B.1.413, B.1.416, B.1.433, B.1.450, B.1.480, B.1.535, B.1.558, B.1.593, B.1.596, B.1.609, B.1.629, B.4, and B.4.7. Sequences not fitting clock-like molecular evolution were removed using the TreeTime program (Sagulenko et al., 2018). The analysis included 292 genomes obtained from samples collected in Kenya but outside the coastal counties; these are shown as small, solid black circles.

We found that sequences from individuals reporting recent travel (n = 119) occurred throughout the local phylogeny based on the clustering of the Coastal Kenya genomes (Figure 6—figure supplement 2). Recent travellers infected with lineage B.1 (n = 60, 8%) were spread throughout the phylogeny and were captured in all six Coastal Kenya counties. In contrast, individuals reporting recent travel and infected with lineage A (n = 19, 86%), and some of those infected with lineage B.1.1 (n = 10, 18%), clustered together, suggesting a potential common infection source or origin for these lineages. Viral sequences from Kenyan nationals were spread across the tree structure. One striking exception was lineage A-infected cases, whose nationality was frequently recorded as missing but the majority of whom were travellers.

For a detailed investigation of the local SARS-CoV-2 genetic diversity, we reconstructed mutation-resolved phylogenies for the top nine lineages in Coastal Kenya (Figure 7; corresponding time-resolved phylogenies are presented in Figure 7—figure supplement 1). We observed (1) considerable within-lineage diversity (highest in the predominant lineage B.1), (2) formation of multiple subclusters within these lineages, with some of the clusters being county-specific (e.g. a cluster of Taita Taveta sequences observed in lineage B.1.530; Figure 7F), and (3) scenarios of local sequences interspersed with global comparison genomes from the same lineage, implying multiple import events of these lineages into Kenya, for example, for lineages A, B, B.1, B.1.1, and B.1.351 (Figure 7A–E). Of the four lineages that appeared to be Kenya specific, three (B.1.530, B.1.549, and B.1.596.1) had representation in other parts of Kenya outside of the coastal counties, with formation of multiple genetic subclusters (Figure 7F and I). However, lineage N.8, which was mainly detected in Lamu, formed a single monophyletic group (Figure 7H) when co-analysed with its precursor lineage B.1.1.33.

Figure 7 with 2 supplements
Mutation-resolved lineage-specific phylogenies for the top nine lineages detected in Coastal Kenya.

The Coastal Kenya genomes are indicated with different filled shapes for the different counties. Genomes from other locations within Kenya are indicated with small solid black circles. (A) Phylogeny for lineage A that combined 22 Coastal Kenya genomes and 240 global lineage A sequences. (B) Phylogeny for lineage B that combined 9 Coastal Kenya genomes and 291 global lineage B sequences. (C) Phylogeny for lineage B.1 that combined 723 Coastal Kenya genomes and 5136 global lineage B.1 sequences. (D) Phylogeny for lineage B.1.1 that combined 57 Coastal Kenya genomes and 3451 global lineage B.1.1 sequences. (E) Phylogeny for lineage B.1.351 that combined 26 Coastal Kenya genomes and 5613 global lineage B.1.351 sequences. (F) Phylogeny for lineage B.1.530 that combined 32 Coastal Kenya genomes and 45 global lineage B.1.530 sequences. (G) Phylogeny for lineage B.1.549 that combined 143 Coastal Kenya genomes and 14 lineage B.1.549 sequences from other locations. (H) Phylogeny for lineage N.8 that combined 31 Coastal Kenya genomes of lineage N.8, a single Coastal Kenya genome of lineage B.1.1.33, and 139 global lineage B.1.1.33 sequences. (I) Phylogeny for lineage B.1.596.1 that combined 24 Coastal Kenya genomes and 22 global lineage B.1.596.1 sequences.

Imports and exports from Coastal Kenya

We used ancestral location state reconstruction of the dated phylogeny (Figure 6) to infer virus import and export (Sagulenko et al., 2018). By this approach, a total of 280 virus importation events and 105 virus exportation events were detected (Table 4), distributed between the waves as summarized in Figure 8A and B. Virus importations into and exportations from the region occurred predominantly through Mombasa (n = 140, 50% and n = 85, 81%, respectively). However, relative to its population size, Mombasa was second to Taita Taveta in importation rate per 100,000 people (Table 4). The majority of the international importation events we detected occurred during Wave 1 (Figure 8B). Of the 105 detected virus exportations, 71 (68%) occurred during Wave 1 and 34 (32%) during Wave 2 (Figure 8A and B). We repeated the analysis using the second global subsample with a normalized subsample of the Coastal Kenya genomes accounting for total reported infections per county. The reanalysis found results closely aligned with those from subsample 1 (Supplementary file 6).
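Once each node of a phylogeny carries an inferred location, import and export events can be tallied by walking parent-to-child branches and recording where the location changes. The sketch below illustrates only that counting step on a toy tree; it does not perform the ancestral state reconstruction itself, and the tree, node names, and helper functions are hypothetical rather than the authors' pipeline.

```python
from collections import Counter

COASTAL = {"Mombasa", "Kilifi", "Kwale", "Taita Taveta", "Tana River", "Lamu"}

def classify_transition(parent_loc, child_loc):
    """Label a branch by how its location state changes, if at all."""
    if parent_loc == child_loc:
        return None
    parent_coastal = parent_loc in COASTAL
    child_coastal = child_loc in COASTAL
    if child_coastal and not parent_coastal:
        return "import"
    if parent_coastal and not child_coastal:
        return "export"
    if parent_coastal and child_coastal:
        return "inter-county"
    return None  # movement entirely outside Coastal Kenya

def count_events(edges, node_location):
    """Count transition types over (parent, child) branches."""
    counts = Counter()
    for parent, child in edges:
        kind = classify_transition(node_location[parent], node_location[child])
        if kind is not None:
            counts[kind] += 1
    return counts

# Toy tree with already-inferred node locations (illustrative only).
node_location = {"root": "Global", "n1": "Mombasa", "n2": "Kilifi",
                 "n3": "Mombasa", "n4": "Other Kenya", "n5": "Global"}
edges = [("root", "n1"), ("n1", "n2"), ("n1", "n3"),
         ("n2", "n4"), ("root", "n5")]

events = count_events(edges, node_location)
print(dict(events))
```

Attributing each event to the county where the transition begins or ends would give per-county tallies of the kind summarized in Table 4.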

Figure 8 with 1 supplement
Virus importations into and exportations from Coastal Kenya.

(A) Alluvium plots stratified by wave number showing the estimated number and flow of importations into and exportations from Coastal Kenya. ‘Global’ refers to origins or destinations outside Kenya, while ‘Other Kenya’ refers to origins or destinations within Kenya but outside the coastal counties. (B) Bar plot of the raw counts of location transition events observed within Coastal Kenya and between Coastal Kenya and the outside world, shown as virus exportations, importations, or inter-county transmission, and stratified by wave number. (C) Monthly trends of the observed transition events stratified by type. The findings presented in this figure are based on subsample 1.

Figure 8—source data 1

The number of importation and exportation events by county and wave period.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig8-data1-v2.csv
Figure 8—source data 2

The number of importations, inter-county transmission, and exportation events by month.

https://cdn.elifesciences.org/articles/71703/elife-71703-fig8-data2-v2.csv
Table 4
Summary of virus import and export events and rates for the coastal county populations.
| County | Virus import (%) | Import rate (per 100,000)* | Virus export (%) | Export rate (per 100,000)* |
| --- | --- | --- | --- | --- |
| Mombasa | 140 (50) | 11.6 | 85 (81) | 7.0 |
| Kilifi | 53 (19) | 3.6 | 4 (4) | 0.3 |
| Kwale | 33 (12) | 3.8 | 4 (4) | 0.5 |
| Taita Taveta | 46 (16) | 13.5 | 12 (11) | 3.5 |
| Tana River | 2 (<1) | 0.6 | - | - |
| Lamu | 6 (2) | 4.1 | - | - |
| Overall | 280 | 6.7 | 105 | 2.4 |

* Denominator population as per the 2019 national census (see Table 1).
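The rates in Table 4 are event counts scaled to a population of 100,000. A minimal sketch of that calculation follows. The population value used here is a hypothetical round figure chosen for illustration and is not the census value from Table 1, so the result only approximates the tabulated Mombasa import rate.

```python
def rate_per_100k(events, population):
    """Number of events per 100,000 people."""
    return 100_000 * events / population

mombasa_imports = 140            # from Table 4
example_population = 1_200_000   # hypothetical, NOT the Table 1 census figure

rate = rate_per_100k(mombasa_imports, example_population)
print(f"{rate:.1f} imports per 100,000 people")
```

Normalizing in this way is what allows a less populous county such as Taita Taveta to rank above Mombasa in import rate despite having fewer import events.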

Viral circulation between counties of Coastal Kenya

To explore the pattern of viral circulation within and among counties of Coastal Kenya, we conducted replicated discrete phylogeographic analyses based on random subsets of genomic sequences subsampled according to local incidence (Figure 9). We observed notable differences among the reconstructions of viral lineage dispersal history obtained from the 10 replicated analyses, meaning that the phylogeographic outcome is quite sensitive to the sampling pattern. However, the replicated phylogeographic reconstructions share a common feature: Mombasa tended to act as an important hub, associated with relatively high viral circulation and serving as the origin of numerous viral dispersal events toward surrounding counties.
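One simple way to subsample sequences according to local incidence is to give each county a quota proportional to its share of reported cases and then draw that many genomes at random. The sketch below takes this approach; it is an assumed scheme for illustration, not the exact procedure described in the ‘Methods’ section, and all counts, identifiers, and the function name are hypothetical.

```python
import random

def subsample_by_incidence(genomes_by_county, cases_by_county, target, seed=0):
    """Draw a random subsample whose county quotas follow case counts.

    A county's quota is capped at the number of genomes it has available.
    """
    rng = random.Random(seed)
    total_cases = sum(cases_by_county.values())
    subsample = {}
    for county, genomes in genomes_by_county.items():
        quota = round(target * cases_by_county[county] / total_cases)
        quota = min(quota, len(genomes))
        subsample[county] = rng.sample(genomes, quota)
    return subsample

# Hypothetical inputs: reported cases and available genome identifiers.
cases = {"Mombasa": 600, "Kilifi": 300, "Lamu": 100}
genomes = {
    "Mombasa": [f"MSA_{i}" for i in range(100)],
    "Kilifi": [f"KLF_{i}" for i in range(10)],
    "Lamu": [f"LMU_{i}" for i in range(20)],
}

# Each replicate uses a different seed, mirroring replicated analyses.
replicates = [subsample_by_incidence(genomes, cases, target=50, seed=s)
              for s in range(10)]
print({county: len(ids) for county, ids in replicates[0].items()})
```

Because each replicate draws a different random subset, differences among the resulting reconstructions give a rough sense of how sensitive the inference is to sampling.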

Replicated discrete phylogeographic reconstructions of the circulation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) lineages within and among counties of Coastal Kenya.

Each replicated analysis was based on a random subset of genomic sequences subsampled according to local incidence (see the ‘Methods’ section for further detail). We report the number of lineage dispersal events inferred among (arrows) and within (transparent grey circles) counties, both measures being averaged over posterior trees sampled from each posterior distribution. We only report among-county transition events supported by adjusted Bayes factor (BF) values >20, which corresponds to strong support according to the scale for interpreting BF values of Kass and Raftery, 1995.
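For orientation, a Bayes factor can be written as the ratio of posterior odds to prior odds, and the Kass and Raftery (1995) scale labels values of 3 to 20 as positive, 20 to 150 as strong, and above 150 as very strong evidence. The sketch below encodes that generic definition and scale. The "adjusted" BF used in the figure involves an additional correction that is not reproduced here, and the probabilities in the example are hypothetical.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Bayes factor as posterior odds divided by prior odds."""
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds

def kass_raftery_category(bf):
    """Evidence category on the Kass and Raftery (1995) scale."""
    if bf > 150:
        return "very strong"
    if bf > 20:
        return "strong"
    if bf > 3:
        return "positive"
    return "not worth more than a bare mention"

# Hypothetical posterior and prior probabilities that a given
# county-to-county transition rate is included in the model.
bf = bayes_factor(posterior_prob=0.9, prior_prob=0.1)
print(round(bf, 1), kass_raftery_category(bf))
```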

Discussion

We report patterns of SARS-CoV-2 introduction and spread in Coastal Kenya during Waves 1 and 2, and estimate that approximately 300 independent virus introductions occurred, many in the first six months of the pandemic. Given the limited diagnostic testing capacity and the relatively small number of samples sequenced, it is likely that there were more introductions than calculated here.

Multiple virus introductions occurred even at the county level, with inter-county spread predominating in Wave 2. A lockdown was put in place for Mombasa, Kilifi, and Kwale in April 2020 and was lifted on June 7, 2020, allowing mixing of the population and potential virus spread. It is notable that most imports into and exports from the Coastal Region probably passed through Mombasa, a major commercial, industrial, and tourist destination. This observation highlights the need for continuous and systematic surveillance of lineages circulating in Mombasa to provide timely knowledge of variants entering or circulating within Coastal Kenya.

During Wave 1, we detected 23 Pango lineages in Coastal Kenya with lineage B.1 accounting for 73% of the sequenced infections. B.1 was detected in all counties of Coastal Kenya and was considerably diverse. Lineage B.1 dominance may have been in part driven by the possession of the D614G change in the spike protein, which has been found to enhance viral fitness (Baric, 2020). The strict quarantine and isolation of confirmed cases in the early period may have prevented some of the other lineages introduced from widely spreading, for example, lineage A was limited to travellers.

Lineage N.8 was specific to Lamu County, with only three cases recorded elsewhere in Coastal Kenya and three cases elsewhere in Kenya. The precursor of lineage N.8 (lineage B.1.1.33) was observed earlier in Brazil. The occurrence of lineage N.8 in Lamu may have arisen from its direct introduction from outside Kenya or from introduction as B.1.1.33 followed by local evolution. Determining the exact origin of this lineage is complicated by the sparse genomic surveillance elsewhere in Kenya during the study period, and indeed in many regions across the world. The N.8 lineage has seven characteristic lineage-defining mutations, including S: D614G, N: R203K, N: G204R, and N: I292T (Figure 7—figure supplement 1).

During Wave 2, Kilifi, Tana River, and Kwale observed their first major wave of infections. This wave started when most of the government COVID-19 restriction measures had been relaxed or removed. For instance, international flights resumed on August 1, 2020, the operation of bars resumed in September 2020, phased reopening of schools started in October 2020, and the curfew hours were changed to run from 11 pm to 4 am. A total of 29 lineages were detected in Coastal Kenya during Wave 2, 9 of which had also been detected earlier during Wave 1.

Genomic data in the GISAID database indicated that lineages B.1.530, B.1.549, and B.1.596.1 were predominantly detected in Kenya. The first sequenced cases of all three lineages were identified in Taita Taveta County, but the travel history of these individuals was recorded as ‘unknown’. These lineages may have arisen in Kenya or in another East African location that had limited genomic surveillance, for example, Tanzania. Lineage B.1.530 has six characteristic mutations, including the spike P681H change adjacent to the biologically important furin cleavage site; lineage B.1.549 has seven characteristic mutations, five occurring in ORF1a or ORF1b; and lineage B.1.596.1 has eight lineage-defining mutations, three in ORF6 and three in the N protein (Figure 7—figure supplement 1).

Three of the four Kenya-specific lineages were later observed in other countries albeit in small numbers. Lineage B.1.530 was detected in seven countries, namely, Germany (n = 3), the USA (n = 3), Rwanda (n = 1), Australia (n = 1), Japan (n = 1), and the Netherlands (n = 1). Lineage B.1.549 was detected in four countries, namely, England (n = 20), the USA (n = 4), Madagascar (n = 3), and Canada (n = 1). Lineage B.1.596.1 was detected in six countries, namely, the USA (n = 21), Sweden (n = 12), Australia (n = 2), Fiji (n = 1), Finland (n = 1), and India (n = 1). Note that the ancestral location state reconstruction analysis detected up to 105 virus exportation events from the Coastal Kenya counties to the rest of the world.

Lineage B.1.351 was first detected in Kilifi in November 2020 in a local resident with no travel history and later in two asymptomatic international travellers of South African nationality. Lineage B.1.1.7 was detected in a local resident who presented to a Mombasa clinic in the second week of January 2021. In the subsequent weeks up to the end of the period covered by this analysis (February 2021), only one additional B.1.1.7 case was detected, unlike lineage B.1.351, which continued to be detected sporadically in January and February 2021. Overall, only a minor increase in cases was observed in January–February 2021 despite the arrival of these VOCs, which subsequently resulted in the third national wave of infection recorded in March–April 2021.

Despite the very large number of lineages detected globally (>900) during our study period, only a small fraction (n = 41, <5%) of these were documented in Coastal Kenya (O’Toole et al., 2021). Notably, two VOC lineages were already extensively spread across Eastern Africa (B.1.351), Africa (B.1.351), and worldwide (B.1.1.7) in the last quarter of 2020, unlike in Coastal Kenya. Thus, it is interesting that whereas in some countries (e.g. South Africa) the second wave appeared to be largely driven by the emergence of new variants, in Coastal Kenya this may not have been the case. A lag was observed in the large-scale spread of VOCs in Coastal Kenya, perhaps due to its remoteness and the public health measures in place during the period.

Our study contributes to an improved understanding of SARS-CoV-2 introduction and transmission patterns in sub-Saharan African countries (Bugembe et al., 2020; Butera et al., 2021; Githinji et al., 2021; Mashe et al., 2021; Wilkinson et al., 2021). This knowledge has the potential to inform the application of future mitigation strategies, especially in light of the growing evidence that SARS-CoV-2 will be endemic in human populations (Planas et al., 2021). Our analysis reveals lineage prevalence patterns and routes of entry into Coastal Kenya. New variants were frequently introduced via Mombasa County; thus, surveillance in the city may provide an early warning system for new variant introductions into the region. We also provide evidence that the first two waves of infection in Coastal Kenya were not driven by VOCs, indicating the presence of other important factors driving SARS-CoV-2 waves of infection.

Sampling bias is a limitation as (1) sequenced and non-sequenced samples differed significantly in their demographic characteristics, (2) only a small proportion of confirmed cases (<10%) were sequenced, prioritizing samples with a Ct value of <30.0, (3) the MoH case identification protocols were repeatedly altered as the pandemic progressed (Githinji et al., 2021), and (4) sampling intensity varied across the six coastal counties due to accessibility differences. This may have skewed the observed lineage and phylogenetic patterns. There was considerable missingness in metadata (e.g. travel history and nationality; Table 2), which made it hard to integrate genomic and epidemiological data in an analysis. Due to amplicon drop-off, some of the analysed genomes were incomplete, reducing the overall phylogenetic signal.

The accuracy of the inferred patterns of virus movement into and from Coastal Kenya depends on both the representativeness of our sequenced samples for Coastal Kenya and the comprehensiveness of the comparison data from outside Coastal Kenya. Our sequenced sample was proportional to the number of positive cases reported in the respective Coastal Kenya counties. We also carefully selected comparison data to optimize the chances of observing introductions into the coastal region (e.g. by using all Africa data). Nevertheless, some important gaps remained; for example, non-coastal Kenya genomic data were limited (n = 605). Despite this, we think the inference from ancestral state reconstruction that Mombasa is a major gateway for variants entering Coastal Kenya is consistent with the following observations: (1) the county showed the highest number of lineages circulating during the study period compared to the other five coastal counties, (2) approximately half of the detected lineages in Coastal Kenya had their first case identified in Mombasa, (3) Mombasa had an early wave of infections compared to the other coastal counties, and (4) Mombasa is the county in the region best connected to the rest of the world (a large international seaport and airport, a major railway terminus, and several bus termini).

In conclusion, we show that the first two SARS-CoV-2 waves in Coastal Kenya involved transmission of both newly introduced and potentially locally evolved lineages, many of them non-VOCs. Approximately 50% of lineage introductions into the region occurred through Mombasa City. Our findings are consistent with the mathematical modelling conclusion that relaxation or removal of some of the government COVID-19 countermeasures likely facilitated the second wave of SARS-CoV-2 infections in Kenya (Brand et al., 2021). Based on our observations of distinctive local phylogenies and the predominance of inter-county transmission, we suggest focusing COVID-19 control strategies on local transmission rather than international travel.

Data availability

(1) Sequence data have been deposited in GISAID database under accession numbers provided in Supplement File 2. (2) Source Data files have been provided for Figures 1-2 and 4-10. (3) Source Code associated with the figures has been uploaded (Source Code File 1) and also been made available through Harvard Dataverse.

The following data sets were generated
    1. Agoti CN
    (2021) Harvard Dataverse
    Replication Data for: Genomic surveillance reveals the spread patterns of SARS-CoV-2 in coastal Kenya during the first two waves.
    https://doi.org/10.7910/DVN/4ZZYIM
The following previously published data sets were used
    1. Githinji G
    (2021) Github
    ID 8402936. Genomic epidemiology of SARS-CoV-2 in coastal Kenya (March - July 2020).

Decision letter

  1. Mary Kate Grabowski
    Reviewing Editor; Johns Hopkins University, United States
  2. Jos W van der Meer
    Senior Editor; Radboud University Medical Centre, Netherlands

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Transmission networks of SARS-CoV-2 in coastal Kenya during the first two waves: a retrospective genomic study" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and David Serwadda as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

– You will see that both reviewers raised likely sampling bias as a major limitation of the study. A revision should include analyses that address this issue. One approach suggested would be to use the full African sequence data set and a balanced subsample of the non-African sequences.

– Authors should make an attempt to respond to all other issues raised by the reviewers.

Reviewer #1 (Recommendations for the authors):

– It’s quite a pain to review this text because figures are separate from figure legends, and not included at the most suitable position in the main text.

– I don’t understand the flow chart, why 24% of PCR+ samples sequenced

– line 151: please clarify what version of Pangolin was used

– line 168: given the limited sample sizes, reported percentages with one decimal digit convey a sense of precision that is not warranted. I suggest to remove decimal digits.

– line 172: I recommend to describe sample sizes as well.

– line 182: I did not follow this sentence. Please clarify.

– line 193: To interpret the biweekly distribution of lineages in each location, it is essential to consider sample sizes. How wide actually are Agresti Coull confidence intervals for each lineage in each biweek? Considering the data by 1 month or 2 month intervals may be more suitable, whilst still allowing for replacement dynamics to be recognised.

– line 199: Do you mean January 2021?

– line 225: can you date the SARS-CoV-2 phylogenies and estimate the time of introduction of B117 in your coastal sample, or perhaps more simply report the first date of diagnosis. This could provide indication on the time elapsed until VOCs arrived in coastal Kenya.

– line 231: the statement "Greater than 95% of the lineages comprising infections globally were not seen in the Coastal Kenya samples". Please specify the time frame, and support this statement in a Supplementary Figure or Table. It would also be interesting to report on the proportion of lineages in the Coastal Kenya infections that are not seen in non-Africa global samples, and non-Kenya samples.

– line 244: "with some clusters comprising genomes detected across multiple counties" -> these clusters are not well visualised in Figure 5A, and it would be of interest to visualise and discuss them in more detail. Are they all starting in Mombasa? If needed, Figure 5B-D could be moved to the Supplement. Addendum: having read on, I see that you discuss several phylogenetic subtrees in more detail. Are these all those seen in Figure 5A, or were they somehow selected?

– line 256: "possess significant genetic diversity consistent with wide scale spread" -> I am not sure how "significance" was determined, and why the level of diversity is consistent with wide scale spread.

– line 265: I am not sure what the analyses presented in this paragraph aim to show.

– line 280: can the data in Figure 8A be presented in a different way so that numbers or proportions can be easily read off? As this figure stands, the only meaningful qualitative observation I can make is that Mombasa drives inter county spread.

– line 307: what was the purpose of the amino acid evolution analysis?

– line 375: this sentence is not clear to me.

– line 416: this sentence is not clear to me.

– line 419: it is not clear to me how exactly this study informs on optimising local interventions.

– line 520: requiring >80% coverage seems high. Why could >50% coverage not be used for this analysis, and would >50% coverage have resulted in more substantial representation of African sequences in the data sets?

Reviewer #2 (Recommendations for the authors):

As mentioned, this is a very well-written paper that presents several well-constructed analyses for identifying and tracking SARS-CoV-2 lineages across coastal Kenya. However, the conclusions presented aren't particularly surprising, and the paper is mostly descriptive. I think adding some discussion of restrictions data or other local context would make the paper a stronger fit for a journal such as eLife.

Small comments for consideration by the authors:

Figure 1A. This is a small point, but I didn't find the colors particularly intuitive here. Could the authors use lighter colors for fewer cases and darker colors for more cases, rather than the divergent color scheme currently used?

Line 99. Could the authors define the Oxford Stringency index briefly in the main text?

Line 151. I believe that the updated name for lineages assigned by the PANGOLIN software is "Pango lineages".

Figure 3A. Could the authors more clearly indicate which wave each month belongs to? It would also be useful to add some information about when each lineage was present globally, either here or in another figure panel (I know that this information is present in Table 2, but a visual representation would be helpful).

Figure 3C. Could the authors somehow visually indicate which identified lineages were unique to Kenya? Perhaps by cross-hatching those colored bars?

Line 217. I think it is better to describe a lineage as "predominantly found in X country" rather than to call it a "Rwandan lineage", for example.

Line 225. Could the authors be more consistent in their use of Pango lineages versus the WHO Alpha/Beta/etc. nomenclature? Switching back and forth is a bit confusing, especially since the connection between the two nomenclatures isn't explicitly introduced.

Line 256. How can the reader evaluate the authors' claim of "significant diversity" by looking at a time tree? Could the authors provide the maximum likelihood trees for these lineages?

Line 259. I think the authors meant to say "implying multiple export (or import) events" ?

Figure 7. If possible, please avoid using red and green colors in the same figure, as these colors are hard to distinguish for many people.

Line 269. Why is it interesting that some lineages were more or less divergent from the Wuhan reference? Isn't this mostly a function of when the lineages emerged? Is there something worth noting on the root-to-tip plot shown in a previous figure?

Line 289. I think the authors mean ONT not OTN.

https://doi.org/10.7554/eLife.71703.sa1

Author response

Essential revisions:

– You will see that both reviewers raised likely sampling bias as a major limitation of the study. A revision should include analyses that address this issue. One approach suggested would be to use the full African sequence data set and a balanced subsample of the non-African sequences.

We thank the editor for an opportunity to resubmit a revised draft.

In the revised manuscript we have included the full African sequence data set that was available on GISAID as of November 2021, sampled during the study period and passing our basic quality control filters (n=21,150), and a selected subsample for the rest of the world (n=21,093); see Figure 3—figure supplement 1. The latter was randomly selected using in-house R code that enforced spatial-temporal representation: a maximum of 30 genomes was accepted per country per month per year (page 36, lines 886-893). We ended up with a selected set of 42,243 genomes, all of which were used in the widening scales of observation lineage tracking analysis.
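As an illustration only, the capped spatial-temporal subsampling described above can be sketched in Python. This is not the authors' in-house R code; the function name, the record keys (`id`, `country`, `date`) and the toy metadata are assumptions made for the example. The only detail taken from the response is the cap of 30 genomes per country per month.

```python
import random
from collections import defaultdict


def subsample_by_country_month(records, cap=30, seed=0):
    """Randomly keep at most `cap` genomes per (country, year-month) stratum.

    `records` is an iterable of dicts with hypothetical keys
    'id', 'country' and 'date' (ISO format, YYYY-MM-DD).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        year_month = rec["date"][:7]  # e.g. '2020-11'
        strata[(rec["country"], year_month)].append(rec)

    kept = []
    for key in sorted(strata):  # sorted so the result is reproducible
        group = strata[key]
        if len(group) > cap:
            group = rng.sample(group, cap)  # sample without replacement
        kept.extend(group)
    return kept


# Toy metadata, invented for illustration (not real GISAID records).
toy = (
    [{"id": f"X-{i}", "country": "Country X", "date": "2020-11-15"} for i in range(100)]
    + [{"id": f"Y-{i}", "country": "Country Y", "date": "2020-11-02"} for i in range(12)]
    + [{"id": f"X2-{i}", "country": "Country X", "date": "2020-12-01"} for i in range(45)]
)
subset = subsample_by_country_month(toy)
print(len(subset))  # 30 + 12 + 30 = 72
```

Strata with fewer than 30 genomes are kept whole, so poorly sampled countries and months are not penalised, while heavily sampled ones are capped.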

However, for phylogenetic comparisons, we had to undertake subsampling of the comparison global genomes due to the large computational requirements to run a maximum likelihood phylogeny with >40,000 taxa. Our subsampling strategy was to limit the inclusion of global genomes to those Pango lineages identified in Coastal Kenya. We then split the resulting dataset into two groups (each ~9500 taxa) to make analysis tractable, see details in Figure 3—figure supplement 1.

During the review period, an additional 389 genomes from Coastal Kenya became available and these have been added into this revised analysis.

We believe this computational subsampling from all available global and African genome sequences is perhaps the best argument against bias in our observations.

– Authors should make an attempt to respond to all other issues raised by the reviewers.

We have carefully responded to all comments and concerns raised by the reviewers, see below.

Reviewer #1 (Recommendations for the authors):

– It’s quite a pain to review this text because figures are separate from figure legends, and not included at the most suitable position in the main text.

We apologize for that experience. In the resubmitted manuscript, we have fully followed the editorial guidelines on how to organize the manuscript at the resubmission stage; we hope this improves readability.

– I don’t understand the flow chart, why 24% of PCR+ samples sequenced

We thank the reviewer for spotting this; there was an error in the previous version of the manuscript, which we have now corrected. In the revised manuscript, we have clarified that we sequenced 18% of the RT-PCR positives (i.e. 1139/6329).

– line 151: please clarify what version of Pangolin was used

This has been clarified (page 6: line 226-227, Pangolin version 3.1.16).

– line 168: given the limited sample sizes, reported percentages with one decimal digit convey a sense of precision that is not warranted. I suggest to remove decimal digits.

We concur and have removed the decimal digits.

– line 172: I recommend to describe sample sizes as well.

This is a good suggestion, and this is now included in Page 11 lines 515-526 and page 12, lines 553-564.

– line 182: I did not follow this sentence. Please clarify.

Apologies for the confusing sentence. The sentence has now been removed as the said lineage has been reclassified by updated Pango lineage assignment.

– line 193: To interpret the biweekly distribution of lineages in each location, it is essential to consider sample sizes. How wide actually are Agresti Coull confidence intervals for each lineage in each biweek? Considering the data by 1 month or 2 month intervals may be more suitable, whilst still allowing for replacement dynamics to be recognised.

In the revised draft, we have aggregated the data by month. To show the sample sizes, Figure 4C now presents the number of genomes per month per county rather than proportions. The version of the figure showing proportions has been moved to the supplementary section (Figure 4—figure supplement 2).

We agree that presenting Agresti Coull confidence intervals for each lineage in each biweek is a robust way of evaluating significant temporal changes. However, with many of the Pango lineages occurring in small numbers, deriving such confidence intervals is challenging and the resulting intervals would be difficult to interpret.
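To make the point about small counts concrete, the following minimal Python sketch computes the standard Agresti-Coull interval for a proportion. The counts used are invented for illustration and are not taken from the study.

```python
import math


def agresti_coull(successes, n, z=1.959963984540054):
    """Agresti-Coull interval for a binomial proportion (default: 95%).

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials, then applies the
    usual normal approximation around the adjusted proportion.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Same observed proportion (20%), different numbers of genomes per period.
for x, n in [(2, 10), (20, 100)]:
    lo, hi = agresti_coull(x, n)
    print(f"{x}/{n}: {lo:.2f} to {hi:.2f} (width {hi - lo:.2f})")
```

With 2 of 10 genomes assigned to a lineage, the interval spans roughly half of the unit range, whereas 20 of 100 gives an interval about a third as wide. This is the sense in which intervals for sparsely sampled lineages are hard to interpret.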

– line 199: Do you mean January 2021?

Yes, thank you for spotting this, the date has been corrected.

– line 225: can you date the SARS-CoV-2 phylogenies and estimate the time of introduction of B117 in your coastal sample, or perhaps more simply report the first date of diagnosis. This could provide indication on the time elapsed until VOCs arrived in coastal Kenya.

We thank the reviewer for this suggestion. The first date of diagnosis for both B.1.1.7 and B.1.351 in our coastal samples has now been provided on page 12, lines 545-546: “Beta (lineage B.1.351, n=26, first detected in a Kilifi sample collected on 4th November 2020); and Alpha (lineage B.1.1.7, n=2, first detected on 14th January 2021 in a Mombasa sample).”

– line 231: the statement "Greater than 95% of the lineages comprising infections globally were not seen in the Coastal Kenya samples". Please specify the time frame, and support this statement in a Supplementary Figure or Table. It would also be interesting to report on the proportion of lineages in the Coastal Kenya infections that are not seen in non-Africa global samples, and non-Kenya samples.

The timeframe is 1st March 2020 to 28th February 2021. This is now clarified in the revised manuscript (page 13, lines 612-617), which reads as below.

“Greater than 95% (909/950) of the lineages comprising infections globally for the analyzed sub-sample collected between 1st March 2020 and 28th February 2021 were not seen in the Coastal Kenya samples, Supplementary File 3. Only two lineages were observed in the Coastal Kenya sample but were not in the global subsample; lineage N.8 (predominantly found in Kenya) and lineage B.1.593 (predominantly found in USA), Figure 3.”

We have added a table in the supplementary material (Supplementary File 3) showing the Pango lineage overlap between the different scales of observation.
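The overlap between scales of observation is a set comparison. The sketch below shows one way to compute it in Python; the function name and the toy lineage sets are assumptions for illustration, while the final lines only re-derive the percentage from the counts quoted above (909 of 950).

```python
def lineage_overlap(local, global_):
    """Compare the Pango lineages seen locally with those seen globally."""
    local, global_ = set(local), set(global_)
    global_only = global_ - local
    return {
        "shared": local & global_,
        "global_only": global_only,
        "local_only": local - global_,
        "pct_global_not_seen_locally": 100 * len(global_only) / len(global_),
    }


# Toy lineage sets, chosen for illustration only (not the study data).
toy_local = {"B.1", "B.1.1", "B.1.351", "N.8"}
toy_global = {"A", "B", "B.1", "B.1.1", "B.1.1.7", "B.1.351", "B.1.177", "P.1"}
result = lineage_overlap(toy_local, toy_global)
print(sorted(result["local_only"]))           # ['N.8']
print(result["pct_global_not_seen_locally"])  # 62.5

# Re-deriving the quoted figures from the counts in the response.
not_seen, total_global = 909, 950
print(round(100 * not_seen / total_global, 1))  # 95.7
print(total_global - not_seen)                  # 41 global lineages also seen locally
```

The 41 shared lineages plus the two lineages seen only in the Coastal Kenya sample account for the 43 lineages reported for the region elsewhere in this response.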

– line 244: "with some clusters comprising genomes detected across multiple counties" -> these clusters are not well visualised in Figure 5A, and it would be of interest to visualise and discuss them in more detail. Are they all starting in Mombasa? If needed, Figure 5B-D could be moved to the Supplement. Addendum: having read on, I see that you discuss several phylogenetic subtrees in more detail. Are these all those seen in Figure 5A, or were they somehow selected?

Considering the reviewer’s suggestion, previous Figure 5B has now been moved to the supplementary material (currently Figure 8—figure supplement 2) while previous 5C and D are dropped from the manuscript since their key message is no longer core to the manuscript’s main storyline.

In the revised manuscript, Figure 9 provides a zoomed-in view of the top 9 lineages identified in Coastal Kenya (A, B, B.1, B.1.1, B.1.351, B.1.530, B.1.549, B.1.596.1 and N.8). Equivalent phylogenies that are time-resolved are provided in Figure 9—figure supplement 1.

– line 256: "possess significant genetic diversity consistent with wide scale spread" -> I am not sure how "significance" was determined, and why the level of diversity is consistent with wide scale spread.

We have rephrased to “considerable within lineage diversity (highest in the predominant lineage B.1), this is consistent with ongoing within lineage SARS-CoV-2 genetic evolution”, page 24, lines 665-667.

– line 265: I am not sure what the analyses presented in this paragraph aim to show.

We agree that the said analysis is not core to the manuscript thus we have deleted it from the revised manuscript.

– line 280: can the data in Figure 8A be presented in a different way so that numbers or proportions can be easily read off? As this figure stands, the only meaningful qualitative observation I can make is that Mombasa drives inter county spread.

Thanks for this suggestion. We have revised the plot to show not only the patterns of intercounty transmission but also the similarities and differences in patterns between wave one and wave two.

The observation that Mombasa was the main driver of inter-county spread is indeed the main conclusion from this analysis and we hope that the revised Figure 10A makes this clearer. The aggregated number of imports into each county is provided in Table 4.

In Figure 10 panel B we compare the total location transition events between the Wave one period and the Wave two period, and the observed numbers can be easily read from these bar charts.

We have added Figure 10C that shows the temporal trends of imports, exports and inter-county location transition events during the study period clearly showing the change from mostly imports during Wave one to mostly intercounty transmission during Wave two.

In Figure 10 panel D, we provide bar plots to show the quantitative distribution of the estimated origins of imports into each county.
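For readers who want to reproduce this kind of tally from a list of inferred location transitions, a minimal Python sketch follows. It assumes each transition has already been extracted from the phylogeographic reconstruction as an (origin, destination, date) triple. The function names, the toy events and the wave boundary date are placeholders and are not values from the paper.

```python
from collections import Counter
from datetime import date

# The six coastal counties covered by the study.
COASTAL = {"Mombasa", "Kilifi", "Taita Taveta", "Kwale", "Tana River", "Lamu"}


def classify(origin, destination):
    """Label a transition as import, export or inter-county (else None)."""
    origin_coastal = origin in COASTAL
    destination_coastal = destination in COASTAL
    if origin_coastal and destination_coastal:
        return "inter-county" if origin != destination else None
    if destination_coastal:
        return "import"
    if origin_coastal:
        return "export"
    return None  # both locations are outside the coastal region


def tally_by_wave(events, wave_two_start):
    """Count transition types separately for Wave one and Wave two."""
    counts = Counter()
    for origin, destination, when in events:
        kind = classify(origin, destination)
        if kind is None:
            continue
        wave = "wave 2" if when >= wave_two_start else "wave 1"
        counts[(wave, kind)] += 1
    return counts


# Invented toy events; "Outside-1" and "Outside-2" stand for any non-coastal location.
toy_events = [
    ("Outside-1", "Mombasa", date(2020, 4, 10)),
    ("Outside-2", "Kwale", date(2020, 5, 2)),
    ("Mombasa", "Kilifi", date(2020, 6, 20)),
    ("Mombasa", "Kilifi", date(2020, 11, 5)),
    ("Mombasa", "Lamu", date(2020, 12, 1)),
    ("Kilifi", "Outside-1", date(2021, 1, 15)),
    ("Outside-1", "Outside-2", date(2020, 12, 9)),
]
placeholder_boundary = date(2020, 10, 1)  # placeholder, not the paper's wave boundary
counts = tally_by_wave(toy_events, placeholder_boundary)
for key in sorted(counts):
    print(key, counts[key])
```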

– line 307: what was the purpose of the amino acid evolution analysis?

We agree that the amino acid changes analysis is not core to the current manuscript and have dropped it from the revised manuscript.

– line 375: this sentence is not clear to me.

Apologies for this confusing sentence. We have rephrased the sentence to “We note from the widening scales of observation analysis that despite the very large number of lineages detected globally (>900) during our study period (1st March 2020 to 28th February 2021), only a small fraction (n=41, <5%) of these were documented in Coastal Kenya (O’Toole et al., 2021).” See page 18, lines 860-861.

– line 416: this sentence is not clear to me.

Thanks for alerting us. We have rephrased the sentence to:

“Thus, the current analysis is consistent with findings from mathematical models that it is more likely that the relaxation or removal of some of the government COVID-19 countermeasures (reopening of international airspace in August 2020, reopening of bars and restaurants in September 2020, partial reopening of learning institutions in October 2020) was associated with a return to pre-pandemic mobility levels among the higher socioeconomic group that facilitated the second wave of SARS-CoV-2 infections in Kenya, including in the coastal region (Brand et al., 2021).” See page 20, lines 932-938.

– line 419: it is not clear to me how exactly this study informs on optimising local interventions.

We have added the following paragraph to the discussion to clarify this point.

“Improved understanding of SARS-CoV-2 lineage introductions and local spread during the early waves has potential to inform the application of future mitigation strategies. There is growing evidence that SARS-CoV-2 will be endemic in human populations for the foreseeable future (Planas et al., 2021). Our analysis reveals lineage prevalence patterns and routes of entry into Coastal Kenya that are likely to persist. Our finding that, during the early period, lineages in the first two infection waves in Coastal Kenya were frequently introduced via Mombasa County supports strengthening surveillance in Mombasa as an early warning system for new variant introductions into the region. We also provide evidence that, unlike SARS-CoV-2 waves 3-5 in Kenya, the early SARS-CoV-2 waves (1 and 2) in Coastal Kenya were not driven by the occurrence of a VOC, indicating the presence of other important factors in the observed SARS-CoV-2 waves of infection.” Page 19, lines 876-887.

– line 520: requiring >80% coverage seems high. Why could >50% coverage not be used for this analysis, and would >50% coverage have resulted in more substantial representation of African sequences in the data sets?

The 80% cut-off is supported by:

a) PANGO lineages can only be assigned confidently by the pangolin toolkit when genome coverage is >70%.

b) Since lineage assignment is one of our key analyses and there is a considerably large amount of good-quality genomic data (>20,000 genomes with >80% coverage) that is publicly available and representative across Africa, we did not find it informative to include partial genomes from Africa (<80% coverage) that may also be unassignable to PANGO lineages.

c) The missing 20% (or 50%) in many of the GISAID genomes tends to be in the spike region which is phylogenetically the most informative part of the genome, so actually using even 80% genomes (rather than a higher percentage cutoff) is problematic because you miss a lot of informative sites. Dropping to 50% is probably not a good way to improve the analysis. Yes, you may increase the number of genomes, but they are missing the informative nucleotide changes.

d) In the revised analysis we were limited by computational power for the number of genome sequences included, hence did not wish to include more than the African genomes data set available on GISAID with >80% coverage (n=21,150), see Figure 3—figure supplement 1.
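A coverage filter of this kind can be sketched in a few lines of Python. The sketch assumes coverage is measured as the fraction of reference positions with an unambiguous base call (A, C, G or T) relative to the 29,903-nucleotide Wuhan-Hu-1 reference; the response does not state the exact definition used, and the function names and toy sequences are invented for illustration.

```python
REFERENCE_LENGTH = 29903  # length of the Wuhan-Hu-1 reference genome (NC_045512.2)


def genome_coverage(sequence, reference_length=REFERENCE_LENGTH):
    """Fraction of the reference length covered by unambiguous base calls."""
    called = sum(1 for base in sequence.upper() if base in "ACGT")
    return called / reference_length


def passes_coverage_filter(sequence, min_coverage=0.80):
    """True if the genome exceeds the coverage cut-off (default: >80%)."""
    return genome_coverage(sequence) > min_coverage


# Toy sequences: called bases followed by a run of N (missing data).
near_complete = "ACGT" * 7000 + "N" * 1903  # 28,000 called bases
partial = "ACGT" * 5000 + "N" * 9903        # 20,000 called bases

for name, seq in [("near_complete", near_complete), ("partial", partial)]:
    print(name, round(genome_coverage(seq), 3), passes_coverage_filter(seq))
```

Lowering `min_coverage` to 0.50 would admit the second toy genome, but, as argued in point c), the extra genomes may lack the most informative sites.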

Reviewer #2 (Recommendations for the authors):

As mentioned, this is a very well-written paper that presents several well-constructed analyses for identifying and tracking SARS-CoV-2 lineages across coastal Kenya. However, the conclusions presented aren't particularly surprising, and the paper is mostly descriptive. I think adding some discussion of restrictions data or other local context would make the paper a stronger fit for a journal such as eLife.

We thank the reviewer for the kind words. We have added a sub-analysis and comments on the restrictions, nationality, and travel data on page 10, lines 385-396 and page 10, lines 467-479. Some of the results from this analysis are provided in Figure 7 and Figure 5—figure supplement 1 and 2.

Small comments for consideration by the authors:

Figure 1A. This is a small point, but I didn't find the colors particularly intuitive here. Could the authors use lighter colors for fewer cases and darker colors for more cases, rather than the divergent color scheme currently used?

We have revised the Figure 1A as suggested. We hope that this improves the resolution and clarity of the figure.

Line 99. Could the authors define the Oxford Stringency index briefly in the main text?

The Oxford Stringency index is now described in the methods section of the manuscript (page 8, lines 340-350):

“Oxford SI is based on nine response indicators rescaled to values of 0-100, with 100 being strictest (Hale et al., Noam Angrist). The nine response indicators used to form the SI are (a) school closures, (b) workplace closures, (c) cancellation of public events, (d) restrictions on public gatherings, (e) closures of public transport, (f) stay-at-home requirements, (g) public information campaigns, (h) restrictions on internal movements and (i) international travel controls.”
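As a simplified illustration of how such a composite index can be formed, the Python sketch below rescales each of the nine indicators to 0-100 and takes their unweighted mean. This is an assumption-laden sketch, not the published Oxford methodology: the ordinal levels and their maxima are hypothetical inputs, and any finer adjustments in the official calculation are omitted.

```python
INDICATORS = (
    "school closures",
    "workplace closures",
    "cancellation of public events",
    "restrictions on public gatherings",
    "closures of public transport",
    "stay-at-home requirements",
    "public information campaigns",
    "restrictions on internal movements",
    "international travel controls",
)


def stringency_index(levels):
    """Unweighted mean of nine indicators, each rescaled to 0-100.

    `levels` maps each indicator name to (value, max_value), where
    `value` is the recorded ordinal level and `max_value` its maximum.
    """
    missing = [name for name in INDICATORS if name not in levels]
    if missing:
        raise KeyError(f"missing indicators: {missing}")
    scores = []
    for name in INDICATORS:
        value, max_value = levels[name]
        scores.append(100 * value / max_value)  # rescale to 0-100
    return sum(scores) / len(scores)


# Hypothetical settings: every indicator at half of its maximum level.
half_strict = {name: (1, 2) for name in INDICATORS}
print(stringency_index(half_strict))  # 50.0
```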

Line 151. I believe that the updated name for lineages assigned by the PANGOLIN software is "Pango lineages".

We thank the reviewer for this correction, and we have updated the name to Pango lineages throughout the manuscript.

Figure 3A. Could the authors more clearly indicate which wave each month belongs to? It would also be useful to add some information about when each lineage was present globally, either here or in another figure panel (I know that this information is present in Table 2, but a visual representation would be helpful).

This is a useful suggestion. In the updated Figure 4A, we have added a vertical dashed line that demarcates the date of transition from Wave one to Wave two.

Also in the revised manuscript, we have included a new Figure 3 which details the global context of the 43 lineages identified in Coastal Kenya.

Figure 3C. Could the authors somehow visually indicate which identified lineages were unique to Kenya? Perhaps by cross-hatching those colored bars?

Thanks for this useful suggestion. We have indicated the Kenya unique lineages in Figure 4B where we have stratified the detected lineages into those that were Kenya specific and those that were predominantly detected elsewhere (international lineages).

Line 217. I think it is better to describe a lineage as "predominantly found in X country" rather than to call it a "Rwandan lineage", for example.

Thanks, we agree, and we have corrected this in the revised manuscript.

Line 225. Could the authors be more consistent in their use of Pango lineages versus the WHO Alpha/Beta/etc. nomenclature? Switching back and forth is a bit confusing, especially since the connection between the two nomenclatures isn't explicitly introduced.

This is a very good point, and we apologize for our inconsistent use of the nomenclature. This has been corrected in the revised manuscript where we have used the Pango lineage nomenclature throughout. On first use of the Pango lineage and at key points in the manuscript (e.g. Table headings, figures) we have added the WHO nomenclature in parentheses. We hope that this improves the readability and limits confusion.

Line 256. How can the reader evaluate the authors' claim of "significant diversity" by looking at a time tree? Could the authors provide the maximum likelihood trees for these lineages?

This is a valid concern. In the revised manuscript, we have provided mutation-resolved phylogenetic trees for the specific lineages (Figure 9). The time-resolved lineage-specific phylogenetic trees have been moved to the supplementary material section (Figure 9—figure supplement 1).

Line 259. I think the authors meant to say "implying multiple export (or import) events" ?

Yes, thanks, that has been corrected.

Figure 7. If possible, please avoid using red and green colors in the same figure, as these colors are hard to distinguish for many people.

Thanks for pointing this out. In the revised manuscript, we dropped the previous Figure 7 with this mixed red/green. We have also checked all other figures for appropriate color use.

Line 269. Why is it interesting that some lineages were more or less divergent from the Wuhan reference? Isn't this mostly a function of when the lineages emerged? Is there something worth noting on the root-to-tip plot shown in a previous figure?

Considering that our focus in this manuscript is the spatial-temporal dynamics of SARS-CoV-2 lineages rather than their molecular evolution, the analysis of divergence within the lineages has been dropped from the revised manuscript.

Line 289. I think the authors mean ONT not OTN.

We thank the reviewer for noting this, it has been corrected.

References

Brand, S.P.C., Ojal, J., Aziza, R., Were, V., Okiro, E.A., Kombe, I.K., Mburu, C., Ogero, M., Agweyu, A., Warimwe, G.M., Nyagwange, J., Karanja, H., Gitonga, J.N., Mugo, D., Uyoga, S., Adetifa, I.M.O., Scott, J.A.G., Otieno, E., Murunga, N., Otiende, M., Ochola-Oyier, L.I., Agoti, C.N., Githinji, G., Kasera, K., Amoth, P., Mwangangi, M., Aman, R., Ng'ang'a, W., Tsofa, B., Bejon, P., Keeling, M.J., Nokes, D.J., Barasa, E., 2021. COVID-19 transmission dynamics underlying epidemic waves in Kenya. Science 374(6570), 989-994.

Githinji, G., de Laurent, Z.R., Mohammed, K.S., Omuoyo, D.O., Macharia, P.M., Morobe, J.M., Otieno, E., Kinyanjui, S.M., Agweyu, A., Maitha, E., Kitole, B., Suleiman, T., Mwakinangu, M., Nyambu, J., Otieno, J., Salim, B., Kasera, K., Kiiru, J., Aman, R., Barasa, E., Warimwe, G., Bejon, P., Tsofa, B., Ochola-Oyier, L.I., Nokes, D.J., Agoti, C.N., 2021. Tracking the introduction and spread of SARS-CoV-2 in coastal Kenya. Nature Communications 12(1), 4809.

O’Toole, Á., Scher, E., Underwood, A., Jackson, B., Hill, V., McCrone, J.T., Colquhoun, R., Ruis, C., Abu-Dahab, K., Taylor, B., Yeats, C., du Plessis, L., Maloney, D., Medd, N., Attwood, S.W., Aanensen, D.M., Holmes, E.C., Pybus, O.G., Rambaut, A., 2021. Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evolution.

Planas, D., Saunders, N., Maes, P., Guivel-Benhassine, F., Planchais, C., Buchrieser, J., Bolland, W.-H., Porrot, F., Staropoli, I., Lemoine, F., Péré, H., Veyer, D., Puech, J., Rodary, J., Baele, G., Dellicour, S., Raymenants, J., Gorissen, S., Geenen, C., Vanmechelen, B., Wawina-Bokalanga, T., Martí-Carreras, J., Cuypers, L., Sève, A., Hocqueloux, L., Prazuck, T., Rey, F., Simon-Loriere, E., Bruel, T., Mouquet, H., André, E., Schwartz, O., 2021. Considerable escape of SARS-CoV-2 Omicron to antibody neutralization. Nature.

https://doi.org/10.7554/eLife.71703.sa2

Article and author information

Author details

  1. Charles N Agoti

    1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    2. Pwani University, Kilifi, Kenya
    Contribution
    Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review and editing, Investigation
    For correspondence
    cnyaigoti@kemri-wellcome.org
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2160-567X
  2. Lynette Isabella Ochola-Oyier

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Conceptualization, Investigation, Project administration, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Simon Dellicour

    1. Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Bruxelles, Belgium
    2. Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory for Clinical and Epidemiological Virology, KU Leuven, University of Leuven, Leuven, Belgium
    Contribution
    Formal analysis, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9558-1052
  4. Khadija Said Mohammed

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Arnold W Lambisia

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Zaydah R de Laurent

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
  7. John M Morobe

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2398-6717
  8. Maureen W Mburu

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  9. Donwilliams O Omuoyo

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3900-5354
  10. Edidah M Ongera

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  11. Leonard Ndwiga

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  12. Eric Maitha

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  13. Benson Kitole

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Resources, Supervision
    Competing interests
    No competing interests declared
  14. Thani Suleiman

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  15. Mohamed Mwakinangu

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  16. John K Nyambu

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  17. John Otieno

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  18. Barke Salim

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  19. Jennifer Musyoki

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Investigation, Project administration, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  20. Nickson Murunga

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
  21. Edward Otieno

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Data curation, Formal analysis, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8014-7306
  22. John N Kiiru

    Ministry of Health, Nairobi, Kenya
    Contribution
    Investigation, Methodology, Project administration, Resources, Supervision
    Competing interests
    No competing interests declared
  23. Kadondi Kasera

    Ministry of Health, Nairobi, Kenya
    Contribution
    Data curation, Investigation, Methodology, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  24. Patrick Amoth

    Ministry of Health, Nairobi, Kenya
    Contribution
    Funding acquisition, Investigation, Project administration, Resources, Supervision
    Competing interests
    No competing interests declared
  25. Mercy Mwangangi

    Ministry of Health, Nairobi, Kenya
    Contribution
    Conceptualization, Investigation, Methodology, Project administration, Resources, Supervision
    Competing interests
    No competing interests declared
  26. Rashid Aman

    Ministry of Health, Nairobi, Kenya
    Contribution
    Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision
    Competing interests
    No competing interests declared
  27. Samson Kinyanjui

    1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    2. Pwani University, Kilifi, Kenya
    3. Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    Contribution
    Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  28. George Warimwe

    1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    2. Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    Contribution
    Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  29. My Phan

    Medical Research Centre (MRC)/ Uganda Virus Research Institute, Entebbe, Uganda
    Contribution
    Formal analysis, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6905-8513
  30. Ambrose Agweyu

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Conceptualization, Investigation, Project administration, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  31. Matthew Cotten

    1. Medical Research Centre (MRC)/ Uganda Virus Research Institute, Entebbe, Uganda
    2. MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
    Contribution
    Conceptualization, Funding acquisition, Project administration, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3361-3351
  32. Edwine Barasa

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Methodology, Project administration, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  33. Benjamin Tsofa

    Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    Contribution
    Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1000-1771
  34. D James Nokes

    1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    2. University of Warwick, Coventry, United Kingdom
    Contribution
    Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5426-1984
  35. Philip Bejon

    1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    2. Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Project administration, Resources, Supervision, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  36. George Githinji

    1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Kilifi, Kenya
    2. Pwani University, Kilifi, Kenya
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9640-7371

Funding

National Institute for Health and Care Research (17/63/82)

  • D James Nokes

National Institute for Health and Care Research (16/136/33)

  • Charles N Agoti
  • Samson Kinyanjui
  • George Warimwe
  • D James Nokes
  • George Githinji

Wellcome Trust (220985)

  • D James Nokes
  • George Githinji

Wellcome Trust (203077/Z/16/Z)

  • Edwine Barasa
  • Benjamin Tsofa
  • Philip Bejon

Wellcome Trust (220977/Z/20/Z)

  • My Phan
  • Matthew Cotten

Medical Research Council (MC_PC_20010)

  • My Phan
  • Matthew Cotten

H2020 European Research Council (n°874850)

  • Simon Dellicour

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Acknowledgements

We thank (1) the members of the RRTs of the six coastal counties of Kenya for collecting the samples analysed here; (2) the members of the COVID-19 KWTRP Testing Team, who tirelessly analysed the samples received at KWTRP to identify positives (see full list of members below); (3) the KWTRP data entry team; (4) the laboratories that have shared sequence data on GISAID that we included as comparison data in our analysis (see list in Supplementary file 7); and (5) the KRISP team in South Africa for sharing the scripts we used in the import/export analysis and AFRICA-CDC for facilitating Africa genomics training. This article was published with permission of the Director of KEMRI.

Members of COVID-19 Testing Team at KWTRP: Agnes Mutiso, Alfred Mwanzu, Angela Karani, Bonface M Gichuki, Boniface Karia, Brian Bartilol, Brian Tawa, Calleb Odundo, Caroline Ngetsa, Clement Lewa, Daisy Mugo, David Amadi, David Ireri, Debra Riako, Domtila Kimani, Edwin Machanja, Elijah Gicheru, Elisha Omer, Faith Gambo, Horace Gumba, Isaac Musungu, James Chemweno, Janet Thoya, Jedida Mwacharo, John Gitonga, Johnstone Makale, Justine Getonto, Kelly Ominde, Kelvias Keter, Lydia Nyamako, Margaret Nunah, Martin Mutunga, Metrine Tendwa, Moses Mosobo, Nelson Ouma, Nicole Achieng, Patience Kiyuka, Perpetual Wanjiku, Peter Mwaura, Rita Warui, Robinson Cheruiyot, Salim Mwarumba, Shaban Mwangi, Shadrack Mutua, Sharon Owuor, Susan Njuguna, Victor Osoti, Wesley Cheruiyot, Wilfred Nyamu, Wilson Gumbi and Yiakon Sein.

Funding: This work was supported by the National Institute for Health and Care Research (NIHR) (project references 17/63/82 (PI JN) and 16/136/33 (PI MW)) using UK Aid from the UK Government to support global health research, the UK Foreign, Commonwealth and Development Office, and Wellcome Trust (grant no. 220985). Testing was supported by the KEMRI-Wellcome Core award (203077/Z/16/Z) from Wellcome to PB and thereafter by the National COVID Testing Africa AAPs/Centre Wellcome Award (222574/Z/21/Z) to PB and LIO-O, which supports the ongoing testing that identifies positive samples for sequencing. Members of COVID-19 Testing Team at KWTRP were supported by funding received by Dr Marta Maia (BOHEMIA study funded by UNITAID), Dr Francis Ndungu (Senior Fellowship and Research and Innovation Action (RIA) grants from EDCTP), Dr Eunice Nduati (USAID grant to IAVI: AID-OAA-A-16–00032) and Prof. Anthony Scott (PCIVS grant from GAVI). MC and MVTP were supported by the Wellcome Trust and FCDO – Wellcome Epidemic Preparedness – Coronavirus (AFRICO19, grant agreement number 220977/Z/20/Z), by the MRC (MC_UU_1201412), and by the UK Medical Research Council (MRC/UKRI) and FCDO (DIASEQCO, grant agreement number NC_PC_19060). SD acknowledges support from the Fonds National de la Recherche Scientifique (F.R.S.-FNRS, Belgium; grant no. F.4515.22), from the Research Foundation – Flanders (Fonds voor Wetenschappelijk Onderzoek-Vlaanderen, FWO, Belgium; grant no. G098321N), and from the European Union Horizon 2020 project MOOD (grant agreement no. 874850). The views expressed in this publication are those of the authors and not necessarily those of NIHR, the Department of Health and Social Care, the Foreign, Commonwealth and Development Office, Wellcome Trust, or the UK government.

Ethics

Human subjects: Samples analysed here were collected under the Ministry of Health protocols as part of the national COVID-19 public health response. The whole genome sequencing study protocol was reviewed and approved by the Scientific and Ethics Review Committee (SERU) at Kenya Medical Research Institute (KEMRI), Nairobi, Kenya (SERU protocol #4035). Individual patient consent was not required by the committee for the use of these samples for studies of genomic epidemiology to inform public health response.

Senior Editor

  1. Jos W van der Meer, Radboud University Medical Centre, Netherlands

Reviewing Editor

  1. Mary Kate Grabowski, Johns Hopkins University, United States

Publication history

  1. Received: June 27, 2021
  2. Preprint posted: July 7, 2021
  3. Accepted: June 10, 2022
  4. Accepted Manuscript published: June 14, 2022 (version 1)
  5. Version of Record published: July 14, 2022 (version 2)

Copyright

© 2022, Agoti et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 862
  • Downloads: 335
  • Citations: 2

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Cite this article

  1. Charles N Agoti
  2. Lynette Isabella Ochola-Oyier
  3. Simon Dellicour
  4. Khadija Said Mohammed
  5. Arnold W Lambisia
  6. Zaydah R de Laurent
  7. John M Morobe
  8. Maureen W Mburu
  9. Donwilliams O Omuoyo
  10. Edidah M Ongera
  11. Leonard Ndwiga
  12. Eric Maitha
  13. Benson Kitole
  14. Thani Suleiman
  15. Mohamed Mwakinangu
  16. John K Nyambu
  17. John Otieno
  18. Barke Salim
  19. Jennifer Musyoki
  20. Nickson Murunga
  21. Edward Otieno
  22. John N Kiiru
  23. Kadondi Kasera
  24. Patrick Amoth
  25. Mercy Mwangangi
  26. Rashid Aman
  27. Samson Kinyanjui
  28. George Warimwe
  29. My Phan
  30. Ambrose Agweyu
  31. Matthew Cotten
  32. Edwine Barasa
  33. Benjamin Tsofa
  34. D James Nokes
  35. Philip Bejon
  36. George Githinji
(2022)
Transmission networks of SARS-CoV-2 in Coastal Kenya during the first two waves: A retrospective genomic study
eLife 11:e71703.
https://doi.org/10.7554/eLife.71703