The global burden of yellow fever

  1. Katy AM Gaythorpe (corresponding author)
  2. Arran Hamlet
  3. Kévin Jean
  4. Daniel Garkauskas Ramos
  5. Laurence Cibrelus
  6. Tini Garske
  7. Neil Ferguson
  1. WHO Collaborating Centre for Infectious Disease Modelling, MRC Centre for Global Infectious Disease Analysis, Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), Imperial College London, United Kingdom
  2. Maître de conférences, Laboratoire MESuRS - Cnam Paris, France
  3. Secretariat for Health Surveillance, Brazilian Ministry of Health, Brazil
  4. World Health Organisation, Switzerland

Abstract

Yellow fever (YF) is a viral, vector-borne, haemorrhagic fever endemic in tropical regions of Africa and South America. The vaccine for YF is considered safe and effective, but intervention strategies need to be optimised; one of the tools for this is mathematical modelling. We refine and expand an existing modelling framework for Africa to account for transmission in South America. We fit to YF occurrence and serology data. We then estimate the subnational forces of infection for the entire endemic region. Finally, using demographic and vaccination data, we examine the impact of vaccination activities. We estimate that there were 109,000 (95% credible interval [CrI] [67,000–173,000]) severe infections and 51,000 (95% CrI [31,000–82,000]) deaths due to YF in Africa and South America in 2018. We find that mass vaccination activities in Africa reduced deaths by 47% (95% CrI [10%–77%]). This methodology allows us to evaluate the effectiveness of vaccination and illustrates the need for continued vigilance and surveillance of YF.

Introduction

Yellow fever is a flavivirus endemic in tropical regions of Africa and South America. In Africa, it is the third most commonly reported type of disease outbreak. In the Americas, yellow fever produces extensive epizootics in non-human primates (NHPs) and outbreaks of human cases (Mboussou et al., 2019). It is vaccine preventable, with a safe and effective vaccine available since the 1930s that has been introduced into the Expanded Programme on Immunisation (EPI) of many countries (Region V, 2003). Yellow fever is transmitted by numerous vectors including Aedes spp. and Haemagogus spp. in Africa and the Americas, respectively. Part of the sylvatic reservoir system is in NHPs; as a result, yellow fever cannot be eradicated. The clinical course of yellow fever infection leads to a variety of non-specific symptoms, with severe infections potentially exhibiting fever, nausea, vomiting, jaundice, and haemorrhaging, which can result in death (Monath and Vasconcelos, 2015).

The transmission dynamics of yellow fever have numerous components. There are two main ‘cycles’ of transmission: urban and sylvatic. The sylvatic cycle is said to be the driver of most reported transmission, with infection occurring mainly between NHPs, mediated by tree-hole breeding mosquitoes. These vectors are diurnal and feed mostly on NHPs; however, people can be infected if they encroach on this cycle through occupational or recreational activities (Monath and Vasconcelos, 2015). In South America, this accounts for the majority of cases and can potentially lead to large outbreaks; a recent yellow fever season saw over 1000 cases in Brazil alone (Couto-Lima et al., 2017). The urban cycle of yellow fever is less common, but the outbreaks have the potential to be devastating. Whilst urban outbreaks have largely been eradicated in South America (Câmara et al., 2011), they still occur in Africa, with a recent urban outbreak in Angola and the Democratic Republic of the Congo causing 962 reported cases, although this is thought to be only a fraction of the actual transmission (Organization, WH, 2017). Large outbreaks as a result of urban transmission are due to the combination of densely populated urban areas and large populations of Aedes aegypti, which bites humans preferentially and breeds rapidly in urban environments (Harrington et al., 2014). The World Health Organisation (WHO) developed the Eliminate Yellow fever Epidemics (EYE) strategy in order to eliminate urban yellow fever outbreaks by 2026 (World Health Organization, 2017). A third, intermediate cycle currently only occurs in Africa, when tree-hole breeding anthropophilic Aedes reach particularly high densities (Monath and Vasconcelos, 2015).

Control of yellow fever is primarily through vaccination, and there is no specific anti-viral treatment available. The 17D vaccine is live attenuated and was developed in 1936 (Monath, 2005). The vaccine is considered safe; estimates of adverse event occurrence are at most 0.6 per 100,000 doses, and reactions are generally mild (de Menezes Martins et al., 2015). Immunity due to yellow fever vaccination is suggested to be lifelong with WHO recommendations recently updated to reflect this (Staples et al., 2015). Efficacy is also thought to be high with recent estimates suggesting that serological response was 97.9% (95% credible interval [CrI] [82.9–99.7]) (Jean et al., 2016). An issue with the vaccine is production and corresponding stockpiles. As the vaccine is live, production is slow, which has led to vaccine shortages for large outbreaks and for travellers (Gershman et al., 2017). As a result of this, fractional dosing has become a recommendation in outbreak settings (Barrett, 2020).

Due to limited vaccine supply, efficient planning of interventions is vital to avoid large outbreaks. To facilitate this, robust estimates of disease burden and projections of future dynamics are key. Previous studies have focused on evaluating historical vaccine impact and projecting future potential impact. Garske et al., 2014 produced vaccine impact estimates for the African endemic region focusing on mass vaccination campaigns until 2013. They found that mass vaccination activities reduced burden by 57% (95% CI [54–59]) in countries where they took place, accounting for 27% (95% CI [22–31]) of the burden across the region. More recently, Shearer et al., 2018 examined the impact of vaccination globally, across Africa and South America, and found that (all) vaccination activities averted between approximately 94,000 and 119,000 cases each year.

In this study, we refine and extend the model of Garske et al. to encompass new geographic regions (South America) and new data (on occurrence, serology, vaccination, and NHPs), and produce updated estimates of burden and vaccine impact for yellow fever. In the following sections, we describe the new data and extension to the modelling approach, particularly focusing on the updated model of yellow fever occurrence. Then we present results of our projected transmission intensity considering uncertainty from estimation and the structural uncertainty of the models. Finally, we present burden estimates and reassess the impact of mass vaccination activities in Africa.

Materials and methods

We expand the existing framework of Garske et al., 2014 for Africa to account for transmission in South America as well. Countries are included in the analysis if they have been listed as at risk, endemic, or potentially at risk for yellow fever (World Health Organization, 2017). We fit a generalised linear model (GLM) of yellow fever reports to occurrence data available from 1984 to 2019, shown in Figure 1. This occurrence data denotes whether yellow fever has been reported at all over the observation period, irrespective of number of cases. The GLM then provides a probability of a reported yellow fever outbreak for the entire region. In order to estimate the force of infection that would result in these outbreaks, we use serological survey data. We use this to independently estimate the seroprevalence in the survey locations and thus the force of infection. These individual estimates allow us to calculate a probability of detection over the observation period which we may then use to provide force of infection estimates for the entire region. Finally, using demographic and vaccination data, we can calculate the burden in all provinces.

Global occurrence of yellow fever at province level.

Occurrence since 1984 is shown in yellow.

Data

We combine multiple data sets within a Bayesian framework to account for areas with sparse data and under-reporting. The model is estimated at province level, to match the available occurrence data. All data was from secondary sources, and ethics approval was thereby not required. Figure 2 summarises the included data.

Diagram of models and data sources where λ denotes the force of infection.

Circles denote a product of calculation or inference; square boxes denote data sources. Adapted from Gaythorpe et al., 2019.

Global yellow fever occurrence

A database of yellow fever occurrence was collated. This was compiled in two parts: occurrence in Africa was compiled originally in Garske et al. and has been subsequently maintained and updated (Garske et al., 2014; Gaythorpe et al., 2019; Jean et al., 2020). Occurrence of yellow fever in South America was collated by Hamlet et al., 2019. Reports of yellow fever in humans were assembled for both continents from sources including the weekly epidemiological record (World Health Organization, 2009), disease outbreak news (World Health Organization, 1996), WHO yellow fever surveillance database (YFSD), Brazilian Ministry of Health, and Pan-American Health Organisation (PAHO). The outbreak dataset for Africa up to 2018 is available to download from: https://github.com/kjean/YF_outbreak_PMVC/tree/main/formatted_data (Gaythorpe, 2021; copy archived at swh:1:rev:14703d7c5c7f63df6de04b81d5a48751604a906a). Cases of yellow fever were included if they were lab-confirmed through polymerase chain reaction. The YFSD includes confirmed and suspected yellow fever occurrence. Because only a low proportion of the suspected cases in the database are due to yellow fever, the suspected cases are used as a measure of surveillance effort: their incidence is aggregated to country level and divided by population to be used as a covariate in the GLM, following Garske et al., 2014.

Vaccination coverage and demography

We use the methodology of Hamlet et al. and Garske et al. with updated data sets and additional data for South America in order to estimate vaccination coverage across the regions (Garske et al., 2014; Hamlet et al., 2018). The coverage estimates using this methodology are visualised and available to download at district level from 1940 to 2050 in the POLICI shiny application (Hamlet et al., 2018). The coverage is informed by historic data on mass-vaccination activities, reactive campaigns, recent preventive mass vaccination campaigns, and routine infant vaccination (Durieux, 1956; Moreau et al., 1999; World Health Organization/ UNICEF, 2015). Further data was provided by the Ministry of Health for Brazil.

Demographic data was obtained from the United Nations World Population Prospects (UNWPP), which provides country-level population sizes (World population prospects, 2017). These were disaggregated to province level using Landscan data on population distributions (Dobson et al., 2000; LandScan, 2017). Age distributions were assumed to be the same across all provinces, and population distributions were assumed not to substantially vary over the observation period. Landscan provides population size estimates at 1/120 degree resolution. Combining this with UNWPP, we arrive at the total number of individuals in each age group and province over time.
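The disaggregation step can be illustrated with a minimal Python sketch. It assumes that each province receives a share of the country total proportional to its summed gridded population, and that the national age distribution applies to every province, as stated above. The function name, province labels, and all numbers are hypothetical and are not taken from UNWPP or Landscan.

```python
def disaggregate_population(country_pop_by_age, province_weights):
    """Split country-level age-group counts across provinces.

    country_pop_by_age: list of country totals, one entry per age group.
    province_weights:   dict of province -> summed gridded population,
                        used only as a relative weight.
    """
    total_weight = sum(province_weights.values())
    return {
        province: [count * weight / total_weight for count in country_pop_by_age]
        for province, weight in province_weights.items()
    }


# Hypothetical inputs: three age groups and three provinces.
country_pop_by_age = [300_000, 280_000, 250_000]
province_weights = {"A": 60_000, "B": 30_000, "C": 10_000}

province_pop = disaggregate_population(country_pop_by_age, province_weights)
```

By construction, the province values in each age group sum back to the country total, so the country-level estimates are preserved.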

Environmental and species occurrence data

The GLM of yellow fever occurrence was developed to account for dependence on environmental conditions, habitat suitability, and occurrence of NHPs. Covariates include measures of enhanced vegetation index, altitude, temperature, precipitation, and land cover types as well as NHP species occurrence, Ae. aegypti and Ae. albopictus occurrence, and Ae. aegypti temperature suitability (Fick and Hijmans, 2017; NASA, L. D, 2001; Xie and Arkin, 1996; Kraemer et al., 2015).

Covariate data sets were available as gridded datasets of various spatial resolutions. These were aggregated to province level, the same scale as the occurrence data. Temperature, altitude, and precipitation data were obtained from Worldclim version 2.0 (Fick and Hijmans, 2017). These were aggregated by calculating the mean, maximum, or minimum over the area of the province. Land cover was obtained from MODIS (Friedl and Sulla-Menashe, 2019; Friedl and Sulla-Menashe, 2015). This was aggregated by calculating the proportion of each province covered by each land cover type. NHP species distributions were obtained from the IUCN red list (IUCN, 2019). Occurrence of Ae. aegypti and Ae. albopictus was obtained from the supplementary information of Kraemer et al., 2015.

Prior to fitting, all variables were scaled to unit variance.

Serological surveys

We use serological surveys to assess transmission intensity in specific regions. Unfortunately, these are only available in the African endemic region. We include all surveys included in Gaythorpe et al. as well as a newly published survey undertaken in Kenya (Gaythorpe et al., 2019; Chepkorir et al., 2019; Diallo, 2010; Kuniholm et al., 2006; Merlin et al., 1986; Omilabu et al., 1990; Tsai et al., 1987; Werner et al., 1984). In the majority of surveys, individuals known to have been vaccinated are omitted; however, in south Cameroon, this information is unavailable and so we estimate an additional vaccination factor. In the study of Chepkorir et al., we include vaccinated proportions as stated in their evaluation after omitting those with unknown status. Where it is possible to determine whether there had been outbreaks affecting the survey, surveys describing outbreak seroprevalence were omitted.

Covariates for GLM

A full list of covariates is provided below; these are aggregated per province:

  1. Annual temperature maximum, minimum, and range calculated from Worldclim (Fick and Hijmans, 2017),

  2. Population size (on a log scale) from UNWPP disaggregated using Landscan (World population prospects, 2017; Dobson et al., 2000; LandScan, 2017),

  3. Annual precipitation maximum, minimum, and mean calculated from Worldclim (Fick and Hijmans, 2017),

  4. Enhanced vegetation index maximum, minimum, range, and mean calculated from NASA’s Land Processes Distributed Active Archive Center data (NASA, L. D, 2001),

  5. Middle infrared reflectance maximum, minimum, and mean calculated from NASA’s Land Processes Distributed Active Archive Center data (NASA, L. D, 2001),

  6. Proportion of each province covered by land cover types such as savanna or grasslands, obtained from MODIS (Friedl and Sulla-Menashe, 2019; Friedl and Sulla-Menashe, 2015),

  7. Occurrence of Ae. aegypti and Ae. albopictus as provided in the supplementary data of Kraemer et al., 2015; both vectors are included as both can carry yellow fever, although it is worth noting that Ae. aegypti is the main urban vector,

  8. Occurrence of all NHP families such as cercopithecidae from the IUCN redlist (IUCN, 2019),

  9. Mean altitude per province calculated from Worldclim (Fick and Hijmans, 2017),

  10. Temperature suitability index for Ae. aegypti as described in Gaythorpe et al., 2020.

Further details on how the NHP data was aggregated and how the temperature suitability was calculated are provided below.

Non-human primates

NHP data was acquired from the IUCN redlist (IUCN, 2019). This provided presence range maps at species level as polygons. NHP species whose range polygon covered more than 10% of a province were considered to be present in that province. In order to produce maps of primate species richness, we count all species belonging to a family in each province. We include all NHP families in the covariate selection process. If a primate family is included in the resulting model, the primate species richness is classed as the number of primate species in that family that are present in the province.
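The presence rule and the per-family count can be sketched as follows. This is a minimal illustration that assumes the fraction of each province covered by each species range polygon has already been computed from the range maps. The function name, species labels, and coverage values are hypothetical.

```python
from collections import Counter


def species_richness(coverage, family_of, min_coverage=0.10):
    """Count present species per family in each province.

    coverage:  dict of (species, province) -> fraction of the province
               covered by that species' range polygon.
    family_of: dict of species -> family name.
    A species counts as present when its range covers more than
    `min_coverage` of the province.
    """
    richness = {}
    for (species, province), fraction in coverage.items():
        if fraction > min_coverage:
            richness.setdefault(province, Counter())[family_of[species]] += 1
    return richness


# Hypothetical coverage fractions for three species and two provinces.
coverage = {
    ("species_1", "P1"): 0.45,
    ("species_2", "P1"): 0.08,  # below the 10% threshold, so absent
    ("species_3", "P1"): 0.30,
    ("species_1", "P2"): 0.10,  # not strictly above 10%, so absent
}
family_of = {
    "species_1": "Cercopithecidae",
    "species_2": "Cercopithecidae",
    "species_3": "Cebidae",
}

richness = species_richness(coverage, family_of)
```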

Temperature suitability

Temperature suitability was calculated as in Gaythorpe et al., 2020. The form of the temperature suitability index is given by:

$$z(T) = \frac{a(T)^2 \exp\left(-\mu(T)\rho(T)\right)}{\mu(T)},$$

where the bite rate, extrinsic incubation period, and mosquito mortality, given by $a$, $\rho$, and $\mu$, respectively, are affected by temperature $T$ in the following ways:

$$a(T) = a_c T (T - a_{T_0})(a_{T_m} - T)^{0.5},$$

$$\rho(T) = \frac{1}{\rho_c T (T - \rho_{T_0})(\rho_{T_m} - T)^{0.5}},$$

$$\mu(T) = \frac{1}{-\mu_c (T - \mu_{T_0})(\mu_{T_m} - T)},$$

following Mordecai et al., 2017. The subscripts $c$, $0$, and $m$ represent the positive rate constant, minimum temperature, and maximum temperature for each thermal response model. These were estimated within a Bayesian framework, and we retain the point estimates shown in Table 2 (Gaythorpe et al., 2020).
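As a rough illustration of how the index combines the three thermal responses, the following Python sketch evaluates the formulas above for a single temperature. The parameter values are illustrative placeholders chosen so that each rate is positive between its thermal limits; they are not the estimates in Table 2. Setting the index to zero outside the thermal limits is an assumption of this sketch.

```python
import math


def briere(T, c, T0, Tm):
    """Briere-type response c*T*(T - T0)*sqrt(Tm - T); zero outside (T0, Tm)."""
    if T <= T0 or T >= Tm:
        return 0.0
    return c * T * (T - T0) * math.sqrt(Tm - T)


def quadratic(T, c, T0, Tm):
    """Quadratic response -c*(T - T0)*(Tm - T); zero outside (T0, Tm)."""
    if T <= T0 or T >= Tm:
        return 0.0
    return -c * (T - T0) * (Tm - T)


def temperature_suitability(T, a_par, rho_par, mu_par):
    """z(T) = a(T)^2 * exp(-mu(T) * rho(T)) / mu(T)."""
    a = briere(T, *a_par)            # bite rate a(T)
    inv_rho = briere(T, *rho_par)    # 1 / rho(T)
    inv_mu = quadratic(T, *mu_par)   # 1 / mu(T)
    if a <= 0 or inv_rho <= 0 or inv_mu <= 0:
        return 0.0                   # assumption: unsuitable outside the limits
    rho = 1.0 / inv_rho
    mu = 1.0 / inv_mu
    return a ** 2 * math.exp(-mu * rho) / mu


# Illustrative (c, T0, Tm) triples, NOT the fitted values of Table 2.
A_PAR = (2.0e-4, 13.0, 40.0)
RHO_PAR = (7.0e-5, 10.0, 46.0)
MU_PAR = (-0.15, 9.0, 38.0)

z_mild = temperature_suitability(25.0, A_PAR, RHO_PAR, MU_PAR)
z_cold = temperature_suitability(5.0, A_PAR, RHO_PAR, MU_PAR)
```

With these placeholder values the index is positive at 25 °C and zero at 5 °C, which is the behaviour that limits predicted transmission where temperatures are too extreme.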

Models

We extend the model of Garske et al., 2014 to account for yellow fever burden in South America as well as Africa. We update currently included data and include further data where necessary to expand the scope of the modelling.

Seroprevalence

We assume a constant force of infection for each province over the observation period. This is the same assumption as Garske et al. and was found to be a better reflection of available data than another, dynamic model variant (Gaythorpe et al., 2019). We assume homogeneous mixing and account for vaccination using the following form for s(λ,u), the expected seroprevalence in age group u given force of infection λ:

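$$s(\lambda,u) = 1 - \left(1 - \frac{\sum_{a \in u}\left(1 - \exp(-\lambda a)\right)p_a}{\sum_{a \in u} p_a}\right)\left(1 - \frac{\sum_{a \in u} v_a p_a}{\sum_{a \in u} p_a}\right),$$

where $a$ indexes the annual age groups, $p_a$ is the population age distribution, and $v_a$ is the vaccination coverage in age group $a$. The binomial log likelihood is then given by the following:

$$\log L_{\mathrm{sero}} = \sum_u \log\left[\binom{N_u}{K_u}\, s(\lambda,u)^{K_u}\left(1 - s(\lambda,u)\right)^{N_u - K_u}\right],$$

where $N_u$ is the number of samples in age group $u$ and $K_u$ is the number of positive samples in age group $u$.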
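The seroprevalence formula and its binomial log likelihood can be sketched in Python as below. This is an illustration under stated assumptions rather than the code used in the study: the population, vaccination coverage, and survey counts are hypothetical, and the clamping of s away from 0 and 1 is added here only to keep the logarithms finite.

```python
import math


def expected_seroprevalence(foi, ages, pop, vac):
    """s(lambda, u): population-weighted seroprevalence over the ages in u."""
    pop_total = sum(pop[a] for a in ages)
    infected = sum((1 - math.exp(-foi * a)) * pop[a] for a in ages) / pop_total
    vaccinated = sum(vac[a] * pop[a] for a in ages) / pop_total
    # Seronegative only if neither naturally infected nor vaccinated.
    return 1 - (1 - infected) * (1 - vaccinated)


def sero_log_likelihood(foi, surveys, pop, vac):
    """Binomial log likelihood summed over survey age groups.

    surveys: list of (ages, N_u, K_u) with N_u samples and K_u positives.
    """
    total = 0.0
    for ages, n_u, k_u in surveys:
        s = expected_seroprevalence(foi, ages, pop, vac)
        s = min(max(s, 1e-12), 1 - 1e-12)  # numerical guard, not in the source
        total += (math.log(math.comb(n_u, k_u))
                  + k_u * math.log(s)
                  + (n_u - k_u) * math.log(1 - s))
    return total


# Hypothetical inputs: flat age distribution, no vaccination, two age groups.
pop = [1000] * 60
vac = [0.0] * 60
surveys = [(range(0, 15), 100, 5), (range(15, 60), 200, 40)]

ll_low = sero_log_likelihood(0.005, surveys, pop, vac)
ll_high = sero_log_likelihood(0.5, surveys, pop, vac)
```

For these made-up survey counts, a force of infection of 0.005 is far more consistent with the data than 0.5, so `ll_low` exceeds `ll_high`.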

GLM of the presence/absence of yellow fever reports

A GLM was fitted to the data set of yellow fever occurrence from 1984 to 2019 at province level. The data is assumed to be binomially distributed and a complementary log–log link function is used such that model predictions in province $i$, $q_i$, are given by

$$q = 1 - \exp\left(-e^{X\beta}\right),$$

where $X$ denotes the matrix of covariates and $\beta$ indicates the parameter vector to be fitted. The log-likelihood is given by

$$\log L_{\mathrm{glm}} = \sum_i \left(y_i \log(q_i) + (1 - y_i)\log(1 - q_i)\right),$$

where $y_i$ denotes the presence/absence in province $i$.
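A minimal Python sketch of the complementary log–log prediction and the corresponding log-likelihood is given below. The covariate rows, coefficients, and presence/absence values are hypothetical, and the clamping of q is a numerical guard added for this sketch.

```python
import math


def cloglog_probability(x_row, beta):
    """q_i = 1 - exp(-exp(x_i . beta)) for one province."""
    eta = sum(x * b for x, b in zip(x_row, beta))
    return 1 - math.exp(-math.exp(eta))


def glm_log_likelihood(X, y, beta):
    """Bernoulli log-likelihood of presence/absence data under the GLM."""
    total = 0.0
    for x_row, y_i in zip(X, y):
        q = cloglog_probability(x_row, beta)
        q = min(max(q, 1e-12), 1 - 1e-12)  # numerical guard, not in the source
        total += y_i * math.log(q) + (1 - y_i) * math.log(1 - q)
    return total


# Hypothetical design matrix (intercept plus one scaled covariate).
X = [[1.0, -1.2], [1.0, 0.3], [1.0, 1.5]]
y = [0, 1, 1]
beta = [-0.5, 1.0]

log_lik = glm_log_likelihood(X, y, beta)
```

A linear predictor of zero gives q = 1 − exp(−1) ≈ 0.63, which shows the asymmetry of this link compared with the logit, where a zero predictor gives 0.5.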

The occurrence of yellow fever depends on a number of environmental factors as well as the abundance and distribution of the vector and NHP hosts. We consider many of these variables as potential covariates in the model. As with Garske et al., the number of covariates to consider is large and has been extended for the current work by the inclusion of NHPs and temperature suitability. As such we perform a selection process, detailed below:

  1. We remove covariates that are not significantly associated with the data. For each covariate, we fit a univariate GLM to the data using the base R function glm. We remove covariates with p-value > 0.1; in this case, all covariates were significant.

  2. Highly correlated covariates are clustered such that the pairwise correlation in each cluster exceeds 0.75. This produces 38 clusters.

  3. We choose one covariate from each cluster to be further examined. Here, the covariate with the maximum absolute correlation with the data is chosen.

  4. The function stepAIC from the MASS package is used to further whittle the list of covariates down (Venables and Ripley, 2002). We set the multiplier of the number of degrees of freedom (the argument k) to log(n), where n is the number of observations, so that the test criterion is the Bayesian information criterion (BIC) instead of the Akaike information criterion (AIC).

  5. The final step is to use the R package, bestglm to produce the best 20 models according to BIC (McLeod and Xu, 2018). This uses the complete enumeration algorithm.
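Steps 2 and 3 of the selection process can be sketched in Python as follows. The source does not state which clustering algorithm was used, so the greedy grouping below is an assumption made for illustration: a covariate joins the first cluster in which its absolute correlation with every member exceeds 0.75. The covariate names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_covariates(covariates, threshold=0.75):
    """Greedy grouping: all pairwise |r| within a cluster exceed the threshold."""
    clusters = []
    for name in covariates:
        for cluster in clusters:
            if all(abs(pearson(covariates[name], covariates[m])) > threshold
                   for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def pick_representatives(covariates, response, clusters):
    """Keep, per cluster, the covariate most correlated with the response."""
    return [max(c, key=lambda n: abs(pearson(covariates[n], response)))
            for c in clusters]


# Hypothetical province-level covariates and presence/absence response.
covariates = {
    "temp_mean": [20, 22, 25, 27, 30, 31],
    "temp_max": [25, 27, 31, 32, 36, 36],
    "rainfall": [900, 300, 700, 200, 800, 400],
}
response = [0, 0, 1, 0, 1, 1]

clusters = cluster_covariates(covariates)
selected = pick_representatives(covariates, response, clusters)
```

In this toy example the two temperature covariates fall into one cluster and only one of them is carried forward, which is the purpose of the step: to avoid fitting several near-duplicate covariates.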

All models then included a measure of surveillance quality. For the 21 countries within the yellow fever surveillance database, specific data on reporting per capita was available. For countries not covered by the yellow fever surveillance database, and thus without an independent estimate of surveillance, individual country factors were fitted. However, countries not considered at risk were grouped together in order to have one country factor. This is in order to avoid infinite parameter estimates in areas which are known not to have yellow fever reports.

Transmission intensity

The transmission intensity estimates arising from the serology allow us to calculate the number of infections over the observation period in the areas where surveys were conducted. We link this to the probability of a yellow fever report through a Poisson reporting process with a probability of detection. This is calculated by comparing the GLM to predictions of the seroprevalence models in the following way:

$$q_i = 1 - (1 - \rho_c)^{n_{\mathrm{inf},i}},$$

where $\rho_c$ is the per-country probability of detection, $q_i$ is the probability of a report in province $i$, provided by the GLM, and $n_{\mathrm{inf},i}$ is the number of infections in province $i$, provided by the seroprevalence model. Equating this with the GLM prediction $q = 1 - \exp(-e^{X\beta})$, the probability of detection can be linked to the GLM covariates by:

$$-n_{\mathrm{inf},i}\log(1 - \rho_c) = \exp(X\beta),$$

and, in terms of the country factors $\beta_c$ and the baseline surveillance quality $b$, by:

$$\log\left(-\log(1 - \rho_c)\right) = \beta_c + b.$$

Once the probability of detection has been estimated for each province with a serological study, we take the mean over each and use the resulting probability to extrapolate transmission intensity in areas where there are currently no seroprevalence studies.
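The relation between report probability, detection probability, and number of infections can be inverted in either direction, as the following Python sketch shows. It uses only the first equation of this section, q = 1 − (1 − ρ)^n. The function names and numerical inputs are hypothetical.

```python
import math


def detection_probability(q, n_infections):
    """Solve q = 1 - (1 - rho)^n for rho, given q and n."""
    return 1 - (1 - q) ** (1 / n_infections)


def infections_from_report_probability(q, rho):
    """Solve q = 1 - (1 - rho)^n for n, given q and rho."""
    return math.log(1 - q) / math.log(1 - rho)


# Hypothetical province with a serological survey: the GLM gives q and
# the seroprevalence model gives the number of infections.
rho = detection_probability(q=0.4, n_infections=5000)

# Hypothetical province without a survey: reuse rho to recover the
# number of infections implied by its GLM report probability.
n_inf = infections_from_report_probability(q=0.7, rho=rho)
```

The first function corresponds to estimating detection where serology is available; the second corresponds to extrapolating transmission intensity to areas without seroprevalence studies, where a higher report probability implies more infections for the same detection probability.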

Estimation

The best-fitting models, according to BIC, were estimated within a Bayesian framework described in Gaythorpe et al., 2019 including code used. The estimation is divided into two phases. The GLMs are estimated using adaptive Markov chain Monte Carlo (MCMC) sampling, whereas the seroprevalence models are estimated within a product space framework with the probability of the force of infection model set to 1; as such, the estimation becomes an adaptive MCMC with log-transformed parameters.

Prior distributions were chosen in many cases to match those of Garske et al. Country factors retain the Gaussian prior distribution with mean 0 and standard deviation 2, except for the countries considered low risk, whose country factor had a truncated normal prior with mean 0, standard deviation 30, and limits [0, ∞), designed to be uninformative and positive. The same prior was used for the GLM coefficient for the aggregated NHP species richness, and the converse for the GLM coefficients for temperature range and altitude, which were assumed to be negative. All other GLM coefficient priors were normal with mean 0 and standard deviation 30 to be uninformative. The forces of infection for each seroprevalence study had exponential priors with rate parameter 0.001, although this was varied in earlier estimation, and the vaccine efficacy had a truncated normal prior with mean 0.975 and standard deviation 0.05, according to Jean et al., 2016; this was truncated to [0,1].

Ensemble predictions

We propagate uncertainty from both the parameter estimation and model structure. This is done through sampling proportionally from the posterior distributions of all 20 of the best-fitting models to produce 500 force of infection and thus burden predictions. We sample proportional to the area under the curve (AUC) of each model fit. We also considered sampling proportional to likelihood; however, the AUC was used to compare our models with previous works such as Garske et al., 2014 and was more computationally efficient for larger sample sizes. In order to produce estimates of the number of severe infections and deaths per province and year, we scale the model output, infections, by sampling from the full uncertainty ranges of Johansson et al., 2014 for the proportion of infections considered severe and, of those, the proportion that will then die of yellow fever.
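The ensemble sampling and burden scaling can be sketched in Python as below. This is a simplified illustration under several assumptions: each model's posterior is represented by a short list of hypothetical infection counts, the AUC values are invented, and the proportions severe and fatal are drawn uniformly from placeholder ranges rather than from the distributions of Johansson et al., 2014.

```python
import random


def sample_ensemble(posteriors, auc, n_draws=500, seed=1):
    """Draw posterior samples, choosing models with probability proportional to AUC."""
    rng = random.Random(seed)
    models = list(posteriors)
    weights = [auc[m] for m in models]
    draws = []
    for _ in range(n_draws):
        model = rng.choices(models, weights=weights, k=1)[0]
        draws.append(rng.choice(posteriors[model]))
    return draws


def burden_from_infections(infections, p_severe_range, p_death_range, rng):
    """Scale infections to severe infections and deaths."""
    p_severe = rng.uniform(*p_severe_range)  # proportion of infections that are severe
    p_death = rng.uniform(*p_death_range)    # proportion of severe infections that die
    severe = infections * p_severe
    return severe, severe * p_death


# Hypothetical posterior samples of annual infections from three models.
posteriors = {
    "model_1": [800_000, 900_000, 1_000_000],
    "model_2": [700_000, 950_000],
    "model_3": [850_000, 1_100_000, 1_200_000],
}
auc = {"model_1": 0.92, "model_2": 0.93, "model_3": 0.91}  # invented values

infection_draws = sample_ensemble(posteriors, auc)

rng = random.Random(2)
burden = [burden_from_infections(n, (0.05, 0.26), (0.31, 0.62), rng)
          for n in infection_draws]
```

Summarising `burden` by its median and 2.5% and 97.5% quantiles would give intervals that reflect both parameter and model uncertainty, which is the purpose of sampling across model variants rather than from the single best model.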

Whilst we propagate structural and parameter uncertainty to the predictions from the estimation of our models, the models themselves are inherently deterministic. As such, we will not capture the potential for large outbreaks, or stochastic die-out where a spillover event fails to spark an outbreak. Because of this, the burden estimates we present should be considered as an average behaviour of the potential number of deaths caused by yellow fever given the vaccination coverage estimated. This means that whilst the projected burden over time will reflect the data used, the year-on-year variation may be missed.

All analyses, estimation, and the original draft of the manuscript were performed in R version 3.5.1.

Role of funding source

This work was carried out as part of the Vaccine Impact Modelling Consortium, which is funded by Gavi, the Vaccine Alliance and the Bill and Melinda Gates Foundation. The views expressed are those of the authors and not necessarily those of the Consortium or its funders. The final decision on the content of the publication was taken by the authors. We acknowledge joint Centre funding from the UK Medical Research Council and Department for International Development. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Results

Regression model fitting and variable selection

All of the models included log(surveillance quality) and country factors for those countries for which country-specific surveillance information was not available. A total of 50 covariates were considered, all of which were significantly related to the data with threshold p=0.1. These were clustered into 39 groups leading to approximately 2.03×10^46 model permutations. Following the use of a step function based on BIC, we reduce this number to 13 covariates and retain the 20 best models including these, shown in Table 1.

Table 1
Composition of the 20 best-fitting generalised linear models of yellow fever reports.

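Surveillance quality is also included in all models. If an entry is 1, that covariate is included; if an entry is 0, that covariate is not included. Abbreviations used: MIR = middle infrared reflectance, Temp. = temperature, occ. = occurrence.

| Model | Cercopithecidae occ. | Cebidae occ. | Population (log) | Temp. suitability (mean) | Grasslands | Savanna | Evergreen broadleaf forests | Ae. aegypti occ. | Aotidae occ. | Woody savanna | Temp. range | Maximum MIR | Altitude | BIC |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 870 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 872 |
| 3 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 872 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 872 |
| 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 873 |
| 6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 873 |
| 7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 873 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 873 |
| 9 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 873 |
| 10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 874 |
| 11 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 874 |
| 12 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 874 |
| 13 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 875 |
| 14 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 875 |
| 15 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 875 |
| 16 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 875 |
| 17 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 875 |
| 18 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 875 |
| 19 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 875 |
| 20 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 876 |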

Similar to Garske et al., all of the 20 best-fitting models included the log of population size, relating the probability of a report to the human population. All 20 models also included the temperature suitability at mean temperature, which will limit the models in areas where the temperatures are too extreme to sustain vector transmission dynamics; Table 2 shows the parameters for the temperature suitability model. The species richness of three NHP families was included in all variants. These were aggregated in order to balance the contribution of the NHP host with the competence of vectors and human dynamics and populations. All model covariates are shown in Figure 3. Covariates such as Ae. aegypti occurrence, temperature range, altitude and barren, cropland, shrubland, and water body land cover were only included in some of the best models.

Figure 3 (with 4 supplements)
Included model covariates.

Species richness is the sum of all NHP species present per province from families listed in Table 1 and will vary as families are included/excluded. See Figure 3—figure supplements 1–4 for trace plots of all parameters.

Table 2
Temperature suitability index parameter values.

The subscripts c,0, and m represent the positive rate constant, minimum temperature, and maximum temperature for each thermal response model. Parameter a corresponds to bite rate, ρ corresponds to extrinsic incubation period, and µ corresponds to mosquito mortality.

| Parameter | a_c | a_T0 | a_Tm | ρ_c | ρ_T0 | ρ_Tm | μ_c | μ_T0 | μ_Tm |
| Value | 2.72e-4 | 2.24 | 40.13 | −0.75 | 12.71 | 38.05 | 1.36e-4 | 17.33 | 42.20 |

The 20 best-fitting models were also fitted with full MCMC and the AUC calculated. These are shown in Figure 4. The AUCs of model variants are very similar, with variant six generally the best. All model variants exceed the AUC of Gaythorpe et al., 2019, which had a point estimate of 0.916.

Posterior predicted area under the curve (AUC) for all model variants.

The AUC are calculated for 500 samples from the posterior of each model variant.

Yellow fever occurrence

We predict yellow fever occurrence over the observation period in Figure 5. These ensemble predictions indicate high probabilities of report in the Amazon region of Brazil and West Africa. Note the results shown also include a measure of surveillance effort emphasising countries such as Angola.

Median posterior predicted probability of a yellow fever report from ensemble predictions of the 20 best GLMs.

This applies over the observation period 1984–2019.

Seroprevalence

The model predictions captured the wide range of transmission intensity; see Figure 6. However, in certain conditions, such as in Kenya area 1, the fit is affected by uncertainty concerning the vaccination status of included individuals. These results show similar qualitative fits to Gaythorpe et al., 2019. Indeed, there are only two additional included studies, both from the Kenya survey of Chepkorir et al., 2019. In some cases, data points lie outside the 95% CrI; these generally align with areas where there is uncertainty in the vaccination status of included individuals or where the data are indicative of an outbreak.

Figure 6 (with 2 supplements)
Seroprevalence predictions for each serological survey.

Central blue line indicates median posterior predicted seroprevalence; blue area indicates 95% CrI. Dots indicate the data with error bar representing binomial confidence intervals. Countries are named by their ISO code with different ecological zones indexed ‘zone x’. See Figure 6—figure supplement 1 for posterior distribution of vaccine efficacy and vaccine factor for CMRs; see Figure 6—figure supplement 2 for comparison of force of infection estimates under different prior distributions.

Transmission intensity

In Figure 7, we show the median ensemble predictions of transmission intensity. In comparison to Garske et al., the force of infection in West Africa is slightly lower, and provinces in the Democratic Republic of the Congo (DRC) are highlighted as areas of high transmission intensity. However, the main area highlighted is the Amazon region of Brazil. See Figure 7—figure supplement 1 for the coefficient of variation between 100 samples from each of the 20 models, sampled equally; this further highlights that areas of low transmission intensity, such as the Sahara, have higher degrees of uncertainty.

Figure 7 (with 1 supplement)
Median posterior predicted force of infection from ensemble predictions of the 20 best GLMs.

Forces of infection are assumed to be time invariant; as such, these do not correspond to a particular year. See Figure 7—figure supplement 1 for coefficient of variation.

Burden

The annual potential number of deaths and severe infections are shown in Table 3 with deaths per country in 2018 given in Figure 8. We estimate that in 2018 there were approximately 109,000 (95% CrI [67,000–173,000]) severe infections and 51,000 (95% CrI [31,000–82,000]) deaths due to yellow fever in these two regions. Burden is distributed unevenly between countries and continents. The highest burden is seen in the DRC due to a high force of infection and low vaccination coverage. In contrast, Brazil sees the fourth highest burden purely due to high force of infection in the Amazon region rather than low vaccination coverage. The majority of the burden occurs in Africa, which holds for all years shown.

Table 3
Potential deaths and severe infections per year in Africa and South America from ensemble model projections.

| Continent | Year | Severe infections, median | Severe infections, 95% CrI low | Severe infections, 95% CrI high | Deaths, median | Deaths, 95% CrI low | Deaths, 95% CrI high |
|---|---|---|---|---|---|---|---|
| Africa | 1995 | 102,972 | 62,162 | 160,700 | 48,474 | 28,672 | 76,998 |
| Africa | 2005 | 122,101 | 74,915 | 192,773 | 57,182 | 34,446 | 90,736 |
| Africa | 2013 | 98,148 | 62,083 | 150,953 | 45,973 | 28,680 | 72,380 |
| Africa | 2018 | 100,952 | 63,001 | 158,362 | 47,318 | 29,162 | 74,981 |
| Americas | 1995 | 14,349 | 6528 | 26,016 | 6652 | 3026 | 12,577 |
| Americas | 2005 | 10,254 | 4988 | 18,436 | 4827 | 2265 | 8779 |
| Americas | 2013 | 8559 | 4264 | 15,043 | 3999 | 1969 | 7162 |
| Americas | 2018 | 8331 | 4306 | 14,608 | 3883 | 1971 | 7033 |

Figure 8
Posterior predicted potential deaths per country in 2018 from the ensemble model projections.
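
The continental split of burden can be checked roughly from the Table 3 medians. The sketch below uses the 2018 median deaths for each continent; because the sum of medians is not the median of the summed posterior, it only approximates the share reported from the full posterior, and the variable names are illustrative.

```python
# Median deaths in 2018, taken from Table 3.
deaths_2018 = {"Africa": 47_318, "Americas": 3_883}

total = sum(deaths_2018.values())
share_africa = deaths_2018["Africa"] / total

# Approximate share of deaths occurring in Africa, based on medians only.
print(f"Share of 2018 median deaths in Africa: {share_africa:.1%}")
```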

Impact of mass vaccination campaigns in Africa

It has been shown that mass vaccination campaigns can produce long-lasting effects on disease burden. We examine the effects of mass vaccination activities from 2006 until 2019 in countries in Africa, continuing the analysis of Garske et al., 2014; results are shown in Figure 9 for 2018 and in its figure supplement for 2013. We find large reductions in all countries with mass vaccination campaigns. In 2018, the largest reductions are approximately 73% (95% CrI [64–79]) in Benin, 73% (95% CrI [60–81]) in Togo, and 61% (95% CrI [52–68]) in Liberia. This demonstrates the continued benefit of those campaigns.

Figure 9
Median posterior predicted deaths averted for 2018 by country.

Yellow represents the number of deaths without mass vaccination campaigns since 2006, and black represents deaths with current vaccination coverage levels. The points denote median and the line shows the 95% credible interval. See Figure 9—figure supplement 1 for results in 2013.

Overall, the reductions in the number of deaths per year are substantial, as shown in Table 4. These amount to approximately 10,000 (95% CrI [6,000–17,000]) deaths averted in 2018 due to mass vaccination activities in Africa, corresponding to a 47% reduction (95% CrI [10–77]) in deaths.

Table 4
Deaths averted per year due to mass vaccination activities occurring from 2006 onwards in Africa.

| Year | Median deaths averted | Deaths averted, 95% CrI low | Deaths averted, 95% CrI high |
|---|---|---|---|
| 2013 | 11,414 | 6400 | 19,369 |
| 2018 | 10,140 | 5781 | 17,307 |

Discussion

In this study, we further developed models of yellow fever transmission in Africa and South America. We calculated disease burden in terms of severe infections (or cases) and deaths from an ensemble of best-fitting GLMs of yellow fever reports between 1984 and 2019 coupled to catalytic models of seroprevalence. We used this approach to evaluate the impact of mass vaccination campaigns in Africa as well as produce updated burden estimates of yellow fever in endemic regions.

We estimate that there are between 63,000 and 158,000 severe infections of yellow fever in Africa, resulting in 29,000–75,000 deaths. In South America, we estimate there are 4000–15,000 severe infections, resulting in 2000–7000 deaths. These estimates are contained within the bounds of Garske et al. who estimated between 51,000 and 380,000 severe infections and between 19,000 and 180,000 deaths occur each year in Africa. We also compare to Shearer et al., 2018 who found approximately 256,000 cases on the African continent and 28,000 within Latin America, which are at the higher end of our predicted ranges. All the above estimates fall within a similar range despite different scopes and modelling approaches.

In order to produce our burden estimates, we first estimate transmission intensity through a force of infection for each province. These estimates differ from those of Garske et al. in West Africa and the DRC. In order to account for the extended model scope, we revisited the covariates used in the GLM, leading to the changes in West African forces of infection shown. This can partly be explained by the exclusion of certain covariates, such as longitude, and the inclusion of others, such as NHP species richness. The latter highlights the DRC as an area of high transmission potential, increasing the local estimates of force of infection. The former reduces the force of infection estimates in West Africa. This decrease in West Africa has led to our range of burden lying within the lower range of Garske et al., and whilst the same proportional impact of vaccination is found, the number of deaths averted is also in the lower range of previous estimates. These differences may also indicate a general sensitivity of the approach as, whilst burden is generally agreed to be higher in West Africa, it may be difficult to determine the exact magnitude from occurrence data alone. In South America, the forces of infection are estimated to be highest in the Amazon region of Brazil, in part due to the high NHP species richness found there. This is consistent with vaccination efforts, which have focused on this area, leading to relatively low burden despite the high intensity of transmission.

In refining the GLMs of yellow fever occurrence, we expanded the pool of possible covariates. This was partly facilitated by new data becoming available, such as NHP species occurrence, and partly motivated by the need to capture more of the inherent variability of yellow fever occurrence, through a temperature suitability index. We find that certain features, such as human population size and certain land cover types, were consistently featured in the top models of occurrence. Similarly, the primate families Cercopithecidae, Cebidae, and Aotidae were all found to be important to yellow fever occurrence. The collection of covariates and model choice led to high AUCs for each model, ranging from 0.935 to 0.949, higher than Gaythorpe et al., 2019. One element that is omitted explicitly is vector abundance, although we do include occurrence of Ae. aegypti and Ae. albopictus. In recent years, there have been a number of excellent efforts to map and predict vector distributions (Kraemer et al., 2015; Brady et al., 2014). We utilise many of the same model covariates and as such have chosen to omit modelled vector distributions in this analysis. Additionally, yellow fever is transmitted by many vectors whose distributions have yet to be characterised.
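
To give a sense of how little AUC values in the reported range (0.935 to 0.949) could differentiate models if they were used as ensemble weights, the following sketch normalises a set of AUCs into weights. The 20 evenly spaced AUC values and the function name are assumptions for illustration only; the individual model AUCs are not listed here.

```python
import numpy as np


def auc_weights(aucs):
    """Normalise AUC values so that they sum to one."""
    aucs = np.asarray(aucs, dtype=float)
    return aucs / aucs.sum()


# Hypothetical AUCs: 20 values spread evenly over the reported range.
aucs = np.linspace(0.935, 0.949, 20)
weights = auc_weights(aucs)

# Ratio between the largest and smallest weight, and the largest
# departure from the uniform weight of 1/20.
ratio = weights.max() / weights.min()
max_departure = np.abs(weights - 1 / len(weights)).max()
print(ratio, max_departure)
```

Under these assumed values the best model receives only about 1.5% more weight than the worst, so such a weighting is close to a simple average of the component models.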

There are a number of additional limitations with the range of covariates used. We utilise NHP species presence/absence data, which we aggregate to province level through counts. As such, we do not have information on the population sizes of NHPs, only diversity. This could be reassessed as further data become available. Additionally, as a component of the covariate selection process, we cluster our covariates based on their correlation with each other. As such, NHP families that coexist in the same geographic locations are put in the same cluster, for example Atelidae, leading the model selection process to potentially include an NHP that is present in areas with yellow fever but does not necessarily cause or carry the virus. As with the other covariates, the selection process implies correlation, not causation, between the covariate and the occurrence of yellow fever.
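
The aggregation of presence/absence data to province-level counts can be sketched as follows. This is a minimal illustration assuming a simple mapping from province to per-species presence flags; the data structure, the function name, and the placeholder province and species names are all hypothetical and do not reflect the study's actual data handling.

```python
def species_richness(presence_by_province):
    """Count the species recorded as present in each province.

    presence_by_province maps a province name to a dict of
    {species name: True/False} presence flags.
    """
    return {
        province: sum(1 for present in flags.values() if present)
        for province, flags in presence_by_province.items()
    }


# Placeholder input (NOT study data).
presence = {
    "province_A": {"species_1": True, "species_2": True, "species_3": False},
    "province_B": {"species_1": False, "species_2": False, "species_3": False},
}
richness = species_richness(presence)
print(richness)
```

A count of this kind records diversity only; it carries no information about how many animals of each species are present.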

Apart from the NHP families, we utilise the same environmental covariates for each continent. Whilst this improves consistency, we use elements such as temperature suitability for Ae. aegypti as a proxy for the vectors of yellow fever, and there are different species in each continent, which may differ in their own ways from Ae. aegypti. An additional issue is that we aggregate our environmental covariates to province level, reducing resolution and potentially biasing the results. In this study, we feel the sparsity of the data, long time window of interest, and general uncertainty in other features will eclipse biases introduced by the aggregation mechanism, but if this model were to be refined spatially, the bias from aggregation could be readdressed. Finally, whilst we include structural uncertainties from the GLM, we do not include uncertainty in the covariates themselves.

We estimate our models from two main sources of data that we have updated where possible. The occurrence data was expanded to account for additional years and locations from Garske et al., 2014; Gaythorpe et al., 2019; Jean et al., 2020; Hamlet et al., 2019. Yet there are assumptions and uncertainty that are inherent in this data. Firstly, the case definition relies on a vague symptom set, which is prone to mis-attribution and may vary by location; this will affect under-reporting. We accommodate variation between countries through country-specific reporting factors in the GLMs; however, there are a number of elements that contribute to surveillance which we essentially aggregate into one component. The starkest difference may be that surveillance in South America uses the fact that some NHP species experience disease-related mortality, treating them as sentinels for yellow fever. In Africa, due to the co-evolution of NHPs and virus, NHPs are not known to be significantly affected in the same way. As such, the surveillance systems are substantially different.

One of the important components to assess inherent under-reporting of yellow fever is serology data. We have significantly expanded the data sources for this aspect of the model, with an addition of 36 studies compared to Garske et al. However, all studies are located in Africa; the high vaccination coverage in many of the provinces renders conventional tests of seroprevalence uninformative in terms of assessing background infection. There are further issues that may arise in using seroprevalence. We take a positive serology test to indicate exposure to the disease but also protection, adhering to the conventional assumption that immunity to yellow fever is acquired after infection or vaccination and remains for a lifetime. However, there have been recent studies suggesting that this may not be the case in children. Domingo et al., 2019 found that immunity against yellow fever waned in children following vaccination. If these results are representative of infant and child vaccination across the regions and time, our estimates of population immunity may need to be readdressed.
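
The assumption of lifelong immunity after infection or vaccination can be made concrete with a simple catalytic model. The sketch below is a textbook form under a constant force of infection, assuming infection and vaccination act independently; the function name and parameters are illustrative and this is not presented as the exact likelihood used in the study.

```python
import math


def expected_seroprevalence(foi, age, vaccine_coverage=0.0, vaccine_efficacy=1.0):
    """Probability that a person of a given age is seropositive.

    Assumes a constant force of infection `foi` (per year), lifelong
    immunity, and that vaccination and infection are independent.
    """
    if foi < 0 or age < 0:
        raise ValueError("foi and age must be non-negative")
    p_infected = 1.0 - math.exp(-foi * age)
    p_vaccine_immune = vaccine_coverage * vaccine_efficacy
    # Seronegative only if neither infection nor vaccination conferred immunity.
    return 1.0 - (1.0 - p_vaccine_immune) * (1.0 - p_infected)


# Hypothetical inputs chosen only to show the behaviour.
unvaccinated = expected_seroprevalence(foi=0.01, age=50)
vaccinated = expected_seroprevalence(foi=0.01, age=50, vaccine_coverage=0.8)
print(unvaccinated, vaccinated)
```

With high vaccination coverage the expected seroprevalence is dominated by the vaccine term, which is why serology in highly vaccinated provinces says little about background infection. If immunity wanes, as suggested for children, both terms would need a decay component that this sketch omits.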

Modelling yellow fever is inherently uncertain. We have attempted to quantify this uncertainty through a Bayesian framework and ensemble model predictions. However, there are still elements that we have not captured. Uncertainty in demography and vaccination are not propagated through our model results and yet both will be influential. Demography is captured from UNWPP and scaled according to Landscan. We assume relatively static age structures and, whilst UNWPP goes some way to accounting for population movements, we do not include them explicitly. Population movements not only affect the model directly but will also influence the resulting vaccination coverage estimates. In a similar way, vaccination activities are collated from a number of sources with all efforts made to ensure completeness. Yet activities that may have been omitted or incorrectly parameterised could affect the results of this study. A dominant area of uncertainty is in the symptom spectrum of yellow fever. In our model, we estimate infections and then scale these to arrive at severe infections or cases and, finally, deaths. In this we use the estimates of Johansson et al. to capture the uncertainty in the proportion of infections considered severe and the proportion of severe infections resulting in death; however, this remains an area of contention for yellow fever as previous estimates of case fatality ratio (CFR) have varied substantially and are significantly larger than those for other flaviviruses such as dengue (Johansson et al., 2014; Oo et al., 2017). The estimates of CFR we use also differ from those in the global burden of disease study (GBD), leading to substantial differences in burden estimates between the two, although under-reporting is also addressed differently in the GBD (Compare, 2019).
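
The scaling from infections to severe infections and deaths described above can be sketched as two multiplications. In the example below, the fatality proportion among severe infections is taken as the ratio of median deaths to median severe infections for Africa in 2018 from Table 3; the number of infections and the proportion severe are hypothetical placeholders, and the function name is illustrative.

```python
def burden_from_infections(infections, p_severe, p_death_given_severe):
    """Scale infections to severe infections and then to deaths."""
    severe = infections * p_severe
    deaths = severe * p_death_given_severe
    return severe, deaths


# Ratio of median deaths to median severe infections, Africa 2018 (Table 3).
cfr_among_severe = 47_318 / 100_952

# Hypothetical placeholders (NOT study estimates).
infections = 1_000_000
p_severe = 0.12

severe, deaths = burden_from_infections(infections, p_severe, cfr_among_severe)
print(round(severe), round(deaths))
```

In a full analysis both proportions would be drawn from their uncertainty distributions for each posterior sample rather than fixed, so that their uncertainty carries through to the burden intervals.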

We have refined an established model but have also inherited some of its limitations, one of which is constancy. We assume that dynamics do not change substantially over the observation period in each province. As such, the variation over time is dominated by changes in demography and vaccination. In reality, the epidemiology of yellow fever is likely to change and has seen changes over recent years. Brazil experienced some of the largest outbreaks in its history with yellow fever in 2017 and 2018; this was suggested to have been caused by changing patterns of human behaviour, such as urbanisation and movement, or changes in epidemiology in the sylvatic cycle; however, the full list of causes remains unclear (Couto-Lima et al., 2017; Moreira-Soto et al., 2018; Chen et al., 2019; Possas et al., 2018; Saúde, 2019). Spillover is also inherently stochastic, whereas, due to the focus on long-term burden, we assume a constant risk of spillover. As such, the model will not capture outbreak dynamics over a short time window but may highlight areas at most risk of outbreaks. The resulting estimates of burden and vaccine impact are thus the potential number of deaths given the environmental conditions in each province and country, but the realised numbers may vary year on year due to outbreaks and stochastic spillover events.

Conclusion

We have refined and extended an established model to update estimates of disease burden for yellow fever. We find consistent results that 92.2% (95% CrI [88.8–95%]) of global burden occurs in Africa and that mass vaccination activities have substantially reduced the number of cases and deaths we see today. We also highlight areas where burden is potentially high, in part due to lower-than-optimal vaccination coverage. The optimal route to avert deaths and potential yellow fever outbreaks is through tackling areas, and sub-populations, with low vaccination coverage. This is because vaccination is the main intervention for yellow fever, both as a preventative measure coupled with surveillance and as an outbreak response intervention. However, uncertainty in current data sources, and their interpretation, will limit the effectiveness of planning strategies. Our modelling approach underscores the need to examine background immunity, due to both natural infection and vaccination, in order to address not only the risk of future deaths but also assess how much of yellow fever is actually visible. For an old disease with an effective vaccine, yellow fever still poses new threats and, allowed to run unchecked, will provide a substantial health burden in many tropical areas as well as posing a significant global exportation risk.

Data availability

Public repository data: Vaccination coverage: coverage is available to download from the PoLiCi shiny app (https://shiny.dide.imperial.ac.uk/polici/). Serology surveys: There are seven published surveys used, available at DOI:10.1016/0147-9571(90)90521-T , DOI:10.1093/trstmh/tru086 , DOI: 10.1186/s12889-018-5726-9 , DOI: 10.4269/ajtmh.2006.74.1078 , PMID: 3501739 , PMID: 4004378 , PMID: 3731366. Demographic data: Population level data was obtained from UN WPP (https://population.un.org/wpp/), this was disaggregated using Landscan 2017 data (https://landscan.ornl.gov/landscan-data-availability). Environmental data: This was obtained from LP DAAC (https://lpdaac.usgs.gov/) and worldclim (http://www.worldclim.org/). Non-human primate occurrence: This was obtained from the IUCN red list (https://www.iucnredlist.org/resources/spatial-data-download). Mosquito occurrence: This was obtained from The global compendium of Aedes aegypti and Ae. albopictus occurrence (https://doi.org/10.1038/sdata.2015.35). Yellow fever outbreaks: These were compiled from the WHO weekly epidemiologic record and disease outbreak news (https://www.who.int/wer/en/ and https://www.who.int/csr/don/en/). Compiled dataset for Africa available from https://github.com/kjean/YF_outbreak_PMVC/tree/main/formatted_data (copy archived at https://archive.softwareheritage.org/swh:1:rev:14703d7c5c7f63df6de04b81d5a48751604a906a/). Data elsewhere: The data from the WHO YF surveillance database and from recent serological surveys from WHO member states in Africa underlying the results presented in the study are available from World Health Organization (contact: William Perea, pereaw@who.int or Laurence Cibrelus, cibrelusl@who.int or Jennifer Horton, jhorton@who.int). Data from the occurrence of YF in Brazil were obtained from the Brazilian MoH (contact: Daniel Garkauskas Ramos).
Code is available from: https://github.com/mrc-ide/YFestimation (copy archived at https://archive.softwareheritage.org/swh:1:rev:a352fe9369c4e1ec5915d63f0b36c8cfcc18894b/) for estimation and https://github.com/mrc-ide/YellowFeverModelEstimation2019 (copy archived at https://archive.softwareheritage.org/swh:1:rev:38f62f18a9261be7b987e042402331b6aa514233/) for specific analyses.

References

  1. Compare G (2019) Viz Hub. Institute for Health Metrics and Evaluation [Website]. Seattle, WA: Institute for Health Metrics and Evaluation, University of Washington.
  2. Diallo M (2010) Rapid Assessment of Yellow Fever Viral Activity in the Central African Republic. Central African Republic.
  3. Dobson JE, Bright EA, Coleman PR, Durfee RC, Worley BA (2000) LandScan: a global population database for estimating populations at risk. Photogrammetric Engineering and Remote Sensing 66:849–857.
  4. Durieux C (1956) Mass yellow fever vaccination in French Africa south of the Sahara. Yellow Fever Vaccination, Monograph Series 30:115–121.
  5. Friedl M, Sulla-Menashe D (2015) MCD12C1 Modis/terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg Cmg V006 [Data Set]. U.S. Department of the Interior.
  6. Friedl M, Sulla-Menashe D (2019) MCD12Q1 Modis/terra+Aqua Land Cover Type Yearly L3 Global 500m Sin Grid V006 [Data Set]. U.S. Department of the Interior.
  7. LandScan (2017) Landscan.ornl.gov. LandScan.
  8. Merlin M, Josse R, Kouka-Bemba D, Meunier D, Senga J, Simonkovich E, Malonga JR, Manoukou F, Georges AJ (1986) Evaluation of immunological and entomotological indices of yellow fever in Pointe-Noire, people’s Republic of Congo. Bulletin De La Societe De Pathologie Exotique Et De Ses Filiales 79:199–206.
  9. Moreau JP, Girault G, Dramé I, Perraut R (1999) Reemergence of yellow fever in west africa: lessons from the past, advocacy for a control program. Bulletin De La Societe De Pathologie Exotique 92:333–336.
  10. NASA, L. D (2001) NASA Land Processes Distributed Active Archive Center (LP DAAC). USGS/Earth Resources Observation and Science (EROS) Center.
  11. Region V (2003) Yellow Fever Control in Africa: Progress, Issues and Challenges. Vaccine Preventable Diseases Bulletin.
  12. Saúde (2019) Secretaria De Vigilância Em Saúde. Departamento De Análise Em Saúde E Vigilância De Doenças Não Transmissíveis, B. da. saúde brasil 2019 Uma análise da situação de saúde com enfoque nas doenças imunopreveníveis e na imunização.
  13. Staples JE, Bocchini JA, Rubin L, Fischer M, Centers for Disease Control and Prevention (CDC) (2015) Yellow fever vaccine booster doses: recommendations of the advisory committee on immunization practices, 2015. MMWR. Morbidity and Mortality Weekly Report 64:647.
  14. Tsai TF, Lazuick JS, Ngah RW, Mafiamba PC, Quincke G, Monath TP (1987) Investigation of a possible yellow fever epidemic and serosurvey for Flavivirus infections in northern Cameroon, 1984. Bulletin of the World Health Organization 65:855.
  15. Werner G, Huber H, Fresenius K (1984) Prevalence of yellow fever antibodies in north Zaire. Annales De La Societe Belge De Medecine Tropicale 65:91–93.
  16. World Health Organization (2009) Weekly epidemiological record (WER). Issues 84:485–492.
  17. World Health Organization (2017) Eliminate yellow fever epidemics (eye): A global strategy, 2017–2026–Éliminer les épidémies de fièvre jaune(EYE): Une stratégie mondiale, 2017-2026. Weekly Epidemiological Record= Relevé Épidémiologique Hebdomadaire 92:193–204.
  18. World population prospects (2017) United Nations DoE. Projections S. The United Nations.

Decision letter

  1. Miles P Davenport
    Senior Editor; University of New South Wales, Australia
  2. Jennifer Flegg
    Reviewing Editor; The University of Melbourne, Australia
  3. Jennifer Flegg
    Reviewer; The University of Melbourne, Australia
  4. Alex T Perkins
    Reviewer; University of Notre Dame, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This manuscript reports on a significant update to a leading model of the burden of yellow fever and the impact of vaccination. It provides estimates of the global burden of yellow fever in 2018 and the impact of vaccination activities in Africa. This paper is of interest to a broad range of researchers and public health practitioners engaged in the management of yellow fever.

Decision letter after peer review:

Thank you for submitting your article "The global burden of yellow fever" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Jennifer Flegg as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Miles Davenport as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Alex Perkins (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Summary:

This manuscript reports on a significant update to a leading model of the burden of yellow fever and the impact of vaccination. Some of the major updates include extending it from Africa to also include South America, and accounting for model uncertainty with different combinations of spatial covariates. Some differences with this update of the model are reported and explained, and the impact of mass vaccination campaigns is quantified. This paper is of interest to a broad range of researchers and public health practitioners engaged in the management of yellow fever. It provides estimates of the global burden of yellow fever in 2018 and the impact of vaccination activities in Africa. Given the limited volume and quality of data available on yellow fever epidemiology, the modelling approach is appropriate and supports the study conclusions.

Essential Revisions:

1) Novelty.

The novelty of the method is oversold, particularly in the Abstract. It is not accurate to say that this manuscript "develops a novel framework" or "newly developed methodology" given that it is an update of an existing framework by Garske et al. The exact novelty of the framework should be made clear. Please cite the Garske et al. paper where you first mention the framework in the Materials and methods.

2) Data.

Insufficient details are provided on some model inputs. The authors should provide further details on the YF occurrence data used in the analysis. In particular, please give details on how cases were diagnosed, and discuss the potential impact of diagnostic uncertainties on the study results. Further details and justification should also be provided on the non-human primate covariate. For example:

– YF occurrence data. Please summarise the number of occurrence records used in the analysis by geographic region and time. Briefly summarise how occurrences of YF were diagnosed. PCR? Serology? Clinically? If serological and/or clinically diagnosed cases are included, please comment on the potential impact of misdiagnosed cases (e.g. due to cross-reactivity with other flaviviruses such as dengue) and how this may have impacted your results and conclusions.

– NHPs data. What is the difference between habitat suitability and occurrence of non-human primates? Also, I suggest that you do not refer to the species range maps provided by IUCN as "species distributions" as these are not modelled species distributions (as the authors point out in the Discussion). Perhaps refer to them as "range maps". Please clarify what you mean by "species richness". Did you count the number of species range maps covering at least 10% of each province? Which NHPs did you include and what was the justification?

– Vector data. Ae. aegypti is the main vector for urban yellow fever transmission; please justify why you included Ae. albopictus as a potential covariate.

3) Materials and methods.

Some of the methods are unclear and it is not obvious which data is used to inform which model parameters. For example, it is unclear how severe infections and deaths were estimated from the transmission intensity surface. Please provide details in the Materials and methods section. Consider including a workflow diagram to help the reader to follow more easily what you have done. Please justify the choice of priors (e.g. prior on force of infection seems fairly small and strong) as well as the choice to sample proportional to AUC.

4) Presentation of results.

Please provide additional information to help readers understand the tables and figures without referring back to the text. For example:

– Table 1. Please add more informative column and row labels. Describe what 1s and 0s represent in the figure legend.

– Table 3. These are not all environmental covariates as the legend suggests – there appears to be reservoir and vector species included here. Please re-produce the table with more informative descriptions of each covariate and provide references.

– Table 4. Please define each parameter in the table legend.

– Figure 2. Please provide more descriptive labels for each plot.

– Figure 4. Are these estimates for 2018? Please state in the legend.

– Figure 5. Please provide more informative labels on each of the plots. Please describe what the dots and bars represent in the figure legend.

– Figure 6. Are these estimates for 2018? Please state in the legend.

The AUCs of the models are essentially all the same, in which case it doesn't look like there was much success in discriminating among them and the ensemble is essentially a simple average of the component models – can the authors comment?

Please define what you mean by "potential" deaths in the Materials and methods and/or Results section. You mention it in the Discussion, but it should be earlier. Conceptually this was a bit confusing because "potential" could be interpreted as assuming no vaccination.

5) Discussion – for the explanation for the discrepancy in force of infection spatially as compared to Garske et al. -- can we really be convinced that this is the correct interpretation (i.e., lower FOI in West Africa) or if we should consider this to be a sensitivity of the model?

6) Please discuss why you only estimated vaccination impact in Africa and not South America.

Reviewer #1:

This paper presents a mathematical estimation of the burden of yellow fever in Africa and South America, using multiple types of data. Results are presented based on ensemble model predictions. While the paper presents results of public health interest, I'm not convinced of the novelty of the approach, this is rather an extension of an existing methodology to more data and regions. For this reason, I feel like this work would be better suited in a specialist journal.

1) I'm not convinced that the contribution of this paper is novel enough for publication in eLife. The model framework was already largely in place, as was most of the data. I think the Abstract oversells the novelty of the methodology.

2) The first mention of a temporal component to the models was in the Results. I found the lack of introduction of the temporal nature of the models quite confusing.

3) What was done for serological surveys in South America, since there were none available? How were the serological data representative of the whole population? Can this be justified more?

4) It would be good to be clearer about which model parameters go where and which data is used to inform which parameters. E.g. a workflow diagram would help the reader to follow more easily what you have done.

5) Why is BIC used for model selection? That's not exactly a natural choice for Bayesian models since it does not consider the effect of the choice of priors.

Reviewer #2:

Abstract – It is not accurate to say that this manuscript "develops a novel framework" or "newly developed methodology" given that it is an update of an existing framework by Garske et al. This issue is handled appropriately elsewhere in the manuscript, but here it is not.

Materials and methods – The prior on force of infection seems fairly small and strong. How was this choice made, and how sensitive are the results to it?

Materials and methods – For the ensemble predictions, is there a specific rationale or precedent for sampling proportional to AUC? It sounds reasonable, but also somewhat arbitrary without better justification.

Results – The AUCs of the models are essentially all the same, in which case it doesn't look like there was much success in discriminating among them and the ensemble is essentially a simple average of the component models. Am I missing anything with that assessment?

Discussion – The explanation for the discrepancy in force of infection spatially as compared to Garske et al. is appreciated. I wonder if we can really be convinced that this is the correct interpretation (i.e., lower FOI in West Africa) or if we should consider this to be a sensitivity of the model. I'm not sure whether we can say without some sort of out of sample test of the predictions of these two models.

Reviewer #3:

Gaythorpe et al. estimated the global burden of yellow fever and the impact of mass vaccination activities in Africa in 2018. A previously published Bayesian modelling framework was extended and applied to a range of new and updated data sources. First, the authors updated an existing dataset of yellow fever occurrences (from 1987 to 2018). These data were used, along with a range of geospatial covariate data, to estimate the probability of yellow fever being reported in each first administrative region (i.e. province) within yellow fever risk zones. Measures of climatic and environmental variables and the presence of non-human primate reservoir species and mosquito vector species, among other factors, were included as model covariates. Data from a number of serological surveys was used to account for under-reporting and to estimate transmission intensity across the study region. Next, the authors updated an existing dataset of vaccination activities and used these data to calculate the number of deaths attributable to yellow fever at a province level, globally, and to estimate the number of deaths averted by vaccination in Africa. The authors estimated that in 2018, there were 51,000 (95%CrI[31,000-82,000]) deaths globally due to yellow fever, with 90% of the burden in Africa. Further, they estimated that vaccination averted 10,000 (95%CrI[6,000-17,000]) deaths in Africa in 2018. The study did not estimate the impact of vaccination in South America. The data available for studying global YF epidemiology is limited in volume and quality, which means that analyses such as the one presented here are inherently uncertain. This study considered uncertainty from estimation and model structure, but did not account for other key sources of uncertainty, i.e., in estimates of vaccination coverage or model covariates. Nonetheless, the study demonstrates a useful approach to estimating disease burden and vaccination impact over broad geographic areas. 
The data and approach seem appropriate to support the study's conclusions. However, in the manuscript's current state, some aspects of the analysis and model inputs need to be further clarified and justified:

1) The authors claim that they have developed a novel framework for estimating disease burden and vaccine impact. However, the study appears to make a number of extensions (e.g. incorporating new geographic regions, new covariates, updated data) to previously published methods. The authors should make the novelty of the framework more clear.

2) Limited details are provided on some model inputs. The authors should provide further details on the YF occurrence data used in the analysis. In particular, details on how cases were diagnosed, and discuss the potential impact of diagnostic uncertainties on the study results. Further details and justification should also be provided on the non-human primate covariate. For example, which NHP species were included and why.

3) The authors should provide information on the data and approach used to estimate the number of severe infections and deaths from the estimates of transmission intensity.

4) The authors claim that they developed a novel framework. However, the work appears to make a number of extensions (e.g. incorporating new geographic regions, new covariates, updated data) to previously published methods. Please make the novelty of the framework more clear.

5) Limited information on the model inputs are provided in the Materials and methods section. Where the authors point to previously published methods or dataset, they should at least provide a brief summary of the method/dataset. For updated or new datasets, more detailed descriptions should be provided. For example:

– YF occurrence data. Please summarise the number of occurrence records used in the analysis by geographic region and time. Briefly summarise how occurrences of YF were diagnosed. PCR? Serology? Clinically? If serological and /or clinically diagnosed cases are included, please comment on the potential impact of misdiagnosed cases (e.g. due to cross-reactivity with other flaviviruses such as dengue) and how this may have impacted your results and conclusions.

– NHPs data. What is the difference between habitat suitability and occurrence of non-human primates? Also, I suggest that you do not refer to the species range maps provided by IUCN as "species distributions" as these are not modelled species distributions (as the authors point out in the Discussion). Perhaps refer to them as "range maps". Please clarify what you mean by "species richness". Did you count the number of species range maps covering at least 10% of each province? Which NHPs did you include and what was the justification?

– Vector data. Ae aegypti is the main vector for urban yellow fever transmission, please justify why you included Ae Albopictus as a potential covariate.

6) How were severe infections and deaths estimated from transmission intensity surface? Please provide details in the Materials and methods section.

7) Please define what you mean by "potential" deaths in the Materials and methods and/or Results section. You mention it in the Discussion, but it should be earlier. Conceptually this was a bit confusing because "potential" could be interpreted as assuming no vaccination.

https://doi.org/10.7554/eLife.64670.sa1

Author response

Essential Revisions:

1) Novelty.

The novelty of the method is oversold, particularly in the Abstract. It is not accurate to say that this manuscript "develops a novel framework" or "newly developed methodology" given that it is an update of an existing framework by Garske et al. The exact novelty of the framework should be made clear. Please cite the Garske et al. paper where you first mention the framework in the Materials and methods.

Many thanks for this. We have rewritten the Abstract so that the methods are not oversold. We have also cited Garske et al. on first mention to make sure the framework is clearly explained.

In summary, this work applies the framework of Garske et al. to a new geographic region, South America, with updated datasets on yellow fever occurrence, substantially more serological survey studies, and new environmental and species occurrence datasets. Within the Materials and methods we have made updates to accommodate the new structure and aims. We included the best fitting models in a hierarchical Bayesian framework and sampled proportionally from their posterior predictive distributions to accommodate both parameter and structural uncertainty in the resulting predictions.

2) Data.

Insufficient details are provided on some model inputs. The authors should provide further details on the YF occurrence data used in the analysis. In particular, details on how cases were diagnosed, and discuss the potential impact of diagnostic uncertainties on the study results. Further details and justification should also be provided on the non-human primate covariate. For example:

– YF occurrence data. Please summarise the number of occurrence records used in the analysis by geographic region and time. Briefly summarise how occurrences of YF were diagnosed. PCR? Serology? Clinically? If serological and /or clinically diagnosed cases are included, please comment on the potential impact of misdiagnosed cases (e.g. due to cross-reactivity with other flaviviruses such as dengue) and how this may have impacted your results and conclusions.

We have included further description on the occurrence data for YF including:

– linking to the outbreak dataset available on github for Africa: https://github.com/kjean/YF_outbreak_PMVC/tree/main/formatted_data

– Adding further detail on the diagnosis methods and the inclusion of suspected YF cases as a measure of surveillance effort.

– NHPs data. What is the difference between habitat suitability and occurrence of non-human primates? Also, I suggest that you do not refer to the species range maps provided by IUCN as "species distributions" as these are not modelled species distributions (as the authors point out in the Discussion). Perhaps refer to them as "range maps". Please clarify what you mean by "species richness". Did you count the number of species range maps covering at least 10% of each province? Which NHPs did you include and what was the justification?

Thank you, we have added further information detailing how the NHP data is included and corrected the language from “species distribution” to “range maps”. The included NHP families are mentioned in Table 1 and all species within those families are included in the calculation of “species richness”; we have detailed this further in the data description.

– Vector data. Ae aegypti is the main vector for urban yellow fever transmission, please justify why you included Ae Albopictus as a potential covariate.

There are a number of vectors of YF; Ae. aegypti is the main urban vector. However, Ae. albopictus is also known to be able to carry YF. As such, we included both in the covariate selection process and have noted this in the list of covariates. Ae. albopictus was not found to be significant in the final models.

3) Materials and methods.

Some of the methods are unclear and it is not obvious which data is used to inform which model parameters. For example, it is unclear how were severe infections and deaths estimated from transmission intensity surface. Please provide details in the Materials and methods section. Consider including a workflow diagram to help the reader to follow more easily what you have done. Please justify the choice of priors (e.g. prior on force of infection seems fairly small and strong) as well as choice to sample proportional to AUC.

Thank you, we have made an effort to redress the Materials and methods section in this regard. Specifically:

– We have clarified that the proportions of infections that are severe, and of severe infections that result in death, come from Johansson et al. and are used to scale the model output (infections).

– We have included a diagram to explain the data used at each stage (Figure 2)

– We have further explained the choice of priors. These were largely retained from earlier work such as Garske et al., and sensitivity analysis was performed therein. We have also included a further supplementary figure which shows the difference in force of infection estimates with two priors: the original exponential with rate = 0.001, and another exponential with rate = 0.1. The estimates overlap extensively; as such, the results are not sensitive to this choice of prior. See new Figure 6—figure supplement 2.

– We have clarified the sampling proportional to AUC, although we found only small differences between using AUC and using, for example, likelihood.

4) Presentation of results.

Please provide additional information to help readers understand the tables and figures without referring back to the text. For example:

– Table 1. Please add more informative column and row labels. Describe what 1s and 0s represent in the figure legend.

Thank you, we have updated the table with better labels and further description in the legend.

– Table 3. These are not all environmental covariates as the legend suggests – there appears to be reservoir and vector species included here. Please re-produce the table with more informative descriptions of each covariate and provide references.

We changed the table to a bulleted list with a reference for each covariate data source.

– Table 4. Please define each parameter in the table legend.

We have updated the legend with parameter definitions.

– Figure 2. Please provide more descriptive labels for each plot.

We have updated the plot labels.

– Figure 4. Are these estimates for 2018? Please state in the legend.

We have added a description to the legend. The occurrence is a probability over the observation period 1984-2019.

– Figure 5. Please provide more informative labels on each of the plots. Please describe what the dots and bars represent in the figure legend.

Thank you, we have added further detail to the legend.

– Figure 6. Are these estimates for 2018? Please state in the legend.

We have added further detail to the legend. Forces of infection are assumed to be time invariant; as such, these estimates do not correspond to a particular year.

The AUCs of the models are essentially all the same, in which case it doesn't look like there was much success in discriminating among them and the ensemble is essentially a simple average of the component models – can the authors comment?

We have added further description on the averaging. We found that although the AUC largely overlapped, there were some differences with where the bulk of the distribution lay. As such, and because these models were similar in performance, we chose to sample proportionately, thus including any bias, and providing a good reflection of the true structural uncertainty in the Results.

Please define what you mean by "potential" deaths in the Materials and methods and/or Results section. You mention it in the Discussion, but it should be earlier. Conceptually this was a bit confusing because "potential" could be interpreted as assuming no vaccination.

We have added a paragraph on this in the Materials and methods to further explain what we indicate by “potential”. We aimed to describe the fact that what we present captures an average burden over the years and does not, for example, include some of the stochastic effects that may lead to outbreaks or lack thereof.

5) Discussion – for the explanation for the discrepancy in force of infection spatially as compared to Garske et al. – can we really be convinced that this is the correct interpretation (i.e., lower FOI in West Africa) or if we should consider this to be a sensitivity of the model?

This may be a sensitivity of the model and we have added further to the Discussion to highlight this point, particularly as data in West Africa (especially serology data) is lacking. Such data would be incredibly valuable in pinning down the exact magnitude of the force of infection in this region.

6) Please discuss why you only estimated vaccination impact in Africa and not South America.

We focused on the continuing effects of mass vaccination campaigns carried out in Africa. This is an updated analysis of Garske et al. to demonstrate the continued benefit of those campaigns.

Reviewer #1:

This paper presents a mathematical estimation of the burden of yellow fever in Africa and South America, using multiple types of data. Results are presented based on ensemble model predictions. While the paper presents results of public health interest, I'm not convinced of the novelty of the approach, this is rather an extension of an existing methodology to more data and regions. For this reason, I feel like this work would be better suited in a specialist journal.

1) I'm not convinced that the contribution of this paper is novel enough for publication in eLife. The model framework was already largely in place, as was most of the data. I think the Abstract oversells the novelty of the methodology.

We have rewritten the Abstract to avoid overselling the manuscript.

2) The first mention of a temporal component to the models was in the Results. I found the lack of introduction of the temporal nature of the models quite confusing.

3) What was done for serological surveys in South America, since there were none available? How were the serological data representative of the whole population? Can this be justified more?

Unfortunately, there are no serological surveys available for South America partly due to concerns about cross-reactivity with other flaviviruses and partly as the vaccination coverage is generally high, rendering seroprevalence studies potentially less informative. We included serological data that includes information on the vaccination status of the included individuals and was not conducted as part of an outbreak investigation. As such, these provide a valuable view of the population of the province and time when the survey was conducted. We have added a further sentence on the serological survey inclusion.

4) It would be good to be clearer about which model parameters go where and which data is used to inform which parameters. E.g. a workflow diagram would help the reader to follow more easily what you have done.

Thank you, we have added a diagram (Figure 2) to illustrate which data is used where.

5) Why is BIC used for model selection? That's not exactly a natural choice for Bayesian models since it does not consider the effect of the choice of priors.

BIC is used in the covariate selection process only. AUC and likelihoods are examined to compare the final model variants.

Reviewer #2:

Abstract – It is not accurate to say that this manuscript "develops a novel framework" or "newly developed methodology" given that it is an update of an existing framework by Garske et al. This issue is handled appropriately elsewhere in the manuscript, but here it is not.

Materials and methods – The prior on force of infection seems fairly small and strong. How was this choice made, and how sensitive are the results to it?

See above for discussion and new figure (Figure 6—figure supplement 2) where we compare the estimates for two choices of prior on the force of infection.

Materials and methods – For the ensemble predictions, is there a specific rationale or precedent for sampling proportional to AUC? It sounds reasonable, but also somewhat arbitrary without better justification.

We examined sampling both by likelihood and by AUC; there was no major difference, likely due to the point below. As such, and because we were using AUC to compare against the original model of Garske et al., 2013, and the updated model of Gaythorpe et al., 2019, we chose to sample by AUC. This was also substantially more computationally efficient for the large sample sizes.

Results – The AUCs of the models are essentially all the same, in which case it doesn't look like there was much success in discriminating among them and the ensemble is essentially a simple average of the component models. Am I missing anything with that assessment?

Whilst the distributions of AUC generally overlap, there are differences in where the majority of each distribution lies; for example, if we compare variants 1 and 2 we note that there are key differences. As such, and because we also did not want to omit potentially useful covariates, we chose to sample proportionally from all of the models.
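As an illustration of the sampling scheme discussed in this exchange, the following is a minimal Python sketch of pooling predictions from several model variants with weights proportional to AUC. It is not the authors' code: the function name `sample_ensemble`, the variant names, the AUC values and the prediction arrays are all hypothetical placeholders chosen for the example.

```python
import numpy as np


def sample_ensemble(predictions, auc, n_draws, seed=0):
    """Pool posterior predictive draws from several model variants.

    predictions: dict mapping variant name -> 1-D array of draws (hypothetical).
    auc: dict mapping variant name -> scalar AUC summary (hypothetical).
    Each variant contributes draws in proportion to its AUC.
    """
    rng = np.random.default_rng(seed)
    names = sorted(predictions)
    weights = np.array([auc[name] for name in names], dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    # Decide how many of the n_draws come from each variant.
    counts = rng.multinomial(n_draws, weights)
    pooled = [
        rng.choice(predictions[name], size=count, replace=True)
        for name, count in zip(names, counts)
    ]
    return np.concatenate(pooled), dict(zip(names, weights))


# Placeholder inputs: two variants with similar AUC, as in the reviewer's concern.
example_predictions = {
    "variant_1": np.full(100, 1.0),
    "variant_2": np.full(100, 2.0),
}
example_auc = {"variant_1": 0.90, "variant_2": 0.92}
pooled_draws, ensemble_weights = sample_ensemble(example_predictions, example_auc, 1000)
```

When the AUC values are close, as in this placeholder example, the weights are close to equal, which is the behaviour the reviewer describes as "essentially a simple average of the component models".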

Discussion – The explanation for the discrepancy in force of infection spatially as compared to Garske et al. is appreciated. I wonder if we can really be convinced that this is the correct interpretation (i.e., lower FOI in West Africa) or if we should consider this to be a sensitivity of the model. I'm not sure whether we can say without some sort of out of sample test of the predictions of these two models.

Thank you, this is a fair point and we have added to the Discussion to highlight that this may be a sensitivity of the approach, especially given that only occurrence data is available in West Africa rather than both occurrence and serology.

Reviewer #3:

Gaythorpe et al. estimated the global burden of yellow fever and the impact of mass vaccination activities in Africa in 2018. A previously published Bayesian modelling framework was extended and applied to a range of new and updated data sources. First, the authors updated an existing dataset of yellow fever occurrences (from 1987 to 2018). These data were used, along with a range of geospatial covariate data, to estimate the probability of yellow fever being reported in each first administrative region (i.e. province) within yellow fever risk zones. Measures of climatic and environmental variables and the presence of non-human primate reservoir species and mosquito vector species, among other factors, were included as model covariates. Data from a number of serological surveys was used to account for under-reporting and to estimate transmission intensity across the study region. Next, the authors updated an existing dataset of vaccination activities and used these data to calculate the number of deaths attributable to yellow fever at a province level, globally, and to estimate the number of deaths averted by vaccination in Africa. The authors estimated that in 2018, there were 51,000 (95%CrI[31,000-82,000]) deaths globally due to yellow fever, with 90% of the burden in Africa. Further, they estimated that vaccination averted 10,000 (95%CrI[6,000-17,000]) deaths in Africa in 2018. The study did not estimate the impact of vaccination in South America. The data available for studying global YF epidemiology is limited in volume and quality, which means that analyses such as the one presented here are inherently uncertain. This study considered uncertainty from estimation and model structure, but did not account for other key sources of uncertainty, i.e., in estimates of vaccination coverage or model covariates. Nonetheless, the study demonstrates a useful approach to estimating disease burden and vaccination impact over broad geographic areas. 
The data and approach seem appropriate to support the study's conclusions. However, in the manuscript's current state, some aspects of the analysis and model inputs need to be further clarified and justified:

1) The authors claim that they have developed a novel framework for estimating disease burden and vaccine impact. However, the study appears to make a number of extensions (e.g. incorporating new geographic regions, new covariates, updated data) to previously published methods. The authors should make the novelty of the framework more clear.

Thank you, we have clarified this in the Abstract and have added caveats throughout the text.

2) Limited details are provided on some model inputs. The authors should provide further details on the YF occurrence data used in the analysis. In particular, details on how cases were diagnosed, and discuss the potential impact of diagnostic uncertainties on the study results. Further details and justification should also be provided on the non-human primate covariate. For example, which NHP species were included and why.

We have added further details on the occurrence data and the non-human primate covariates.

3) The authors should provide information on the data and approach used to estimate the number of severe infections and deaths from the estimates of transmission intensity.

We have included details on this and further referenced Johansson et al.

4) The authors claim that they developed a novel framework. However, the work appears to make a number of extensions (e.g. incorporating new geographic regions, new covariates, updated data) to previously published methods. Please make the novelty of the framework more clear.

Thank you, we have addressed this in both the Abstract and the Materials and methods sections.

5) Limited information on the model inputs are provided in the Materials and methods section. Where the authors point to previously published methods or dataset, they should at least provide a brief summary of the method/dataset. For updated or new datasets, more detailed descriptions should be provided. For example:

– YF occurrence data. Please summarise the number of occurrence records used in the analysis by geographic region and time. Briefly summarise how occurrences of YF were diagnosed. PCR? Serology? Clinically? If serological and /or clinically diagnosed cases are included, please comment on the potential impact of misdiagnosed cases (e.g. due to cross-reactivity with other flaviviruses such as dengue) and how this may have impacted your results and conclusions.

We have included further description on the occurrence data for YF including:

– linking to the outbreak dataset available on github for Africa: https://github.com/kjean/YF_outbreak_PMVC/tree/main/formatted_data

– Adding further detail on the diagnosis methods and the inclusion of suspected YF cases as a measure of surveillance effort

– NHPs data. What is the difference between habitat suitability and occurrence of non-human primates? Also, I suggest that you do not refer to the species range maps provided by IUCN as "species distributions" as these are not modelled species distributions (as the authors point out in the Discussion). Perhaps refer to them as "range maps". Please clarify what you mean by "species richness". Did you count the number of species range maps covering at least 10% of each province? Which NHPs did you include and what was the justification?

Thank you, we have added further information detailing how the NHP data is included and correcting the language from “species distribution” to “range maps”. The included NHP families are mentioned in Table 1 and all species within those families are included in the calculation of “species richness”, we have detailed this further in the data description.

– Vector data. Ae aegypti is the main vector for urban yellow fever transmission, please justify why you included Ae Albopictus as a potential covariate.

There are a number of vectors of YF; Ae. aegypti is the main urban vector. However, Ae. albopictus is also known to be able to carry YF. As such, we included both in the covariate selection process and have noted this in the list of covariates. Ae. albopictus was not found to be significant in the final models.

6) How were severe infections and deaths estimated from transmission intensity surface? Please provide details in the Materials and methods section.

Severe infections and deaths were calculated from the model output: infections scaled by the full uncertainty range of Johansson et al. who estimated the proportion of infections considered severe and, of those, the proportion of those who go on to die of YF. We have added further clarification in the text and in the flow diagram, Figure 2.
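The scaling described in this response can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names, the lognormal infection draws and the Beta distributions for the two proportions are invented placeholders, and none of the numbers are taken from the paper or from Johansson et al.

```python
import numpy as np


def scale_burden(infections, p_severe, p_death_given_severe):
    """Scale infection draws to severe infections and deaths.

    All three inputs are arrays of draws of equal length, so uncertainty in
    the two proportions is propagated draw by draw.
    """
    infections = np.asarray(infections, dtype=float)
    severe = infections * np.asarray(p_severe, dtype=float)
    deaths = severe * np.asarray(p_death_given_severe, dtype=float)
    return severe, deaths


def summarise(draws):
    """Return (median, lower, upper) of a 95% interval for a set of draws."""
    lower, median, upper = np.percentile(draws, [2.5, 50, 97.5])
    return median, lower, upper


# Placeholder inputs, purely illustrative.
rng = np.random.default_rng(1)
n = 5000
infection_draws = rng.lognormal(mean=np.log(1e6), sigma=0.3, size=n)
p_severe_draws = rng.beta(12, 88, size=n)  # assumed share of infections that are severe
p_death_draws = rng.beta(47, 53, size=n)   # assumed share of severe infections that die

severe_draws, death_draws = scale_burden(infection_draws, p_severe_draws, p_death_draws)
severe_summary = summarise(severe_draws)
death_summary = summarise(death_draws)
```

Pairing each infection draw with its own draw of the two proportions is one simple way to carry the full uncertainty range of the proportions through to the burden estimates, rather than multiplying by point values.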

7) Please define what you mean by "potential" deaths in the Materials and methods and/or Results section. You mention it in the Discussion, but it should be earlier. Conceptually this was a bit confusing because "potential" could be interpreted as assuming no vaccination.

We have added a paragraph on this in the Materials and methods to further explain what we indicate by “potential”. We aimed to describe the fact that what we present captures an average burden over the years and does not, for example, include some of the stochastic effects that may lead to outbreaks or lack thereof.

https://doi.org/10.7554/eLife.64670.sa2

Article and author information

Author details

  1. Katy AM Gaythorpe

    WHO Collaborating Centre for Infectious Disease Modelling, MRC Centre for Global Infectious Disease Analysis, Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), Imperial College London, London, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    k.gaythorpe@imperial.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3734-9081
  2. Arran Hamlet

    WHO Collaborating Centre for Infectious Disease Modelling, MRC Centre for Global Infectious Disease Analysis, Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), Imperial College London, London, United Kingdom
    Contribution
    Data curation, Formal analysis, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Kévin Jean

    Maître de conférences, Laboratoire MESuRS - Cnam Paris, Paris, France
    Contribution
    Formal analysis, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6462-7185
  4. Daniel Garkauskas Ramos

    Secretariat for Health Surveillance, Brazilian Ministry of Health, Brasilia, Brazil
    Contribution
    Data curation, Validation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Laurence Cibrelus

    World Health Organisation, Geneva, Switzerland
    Contribution
    Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Tini Garske

    WHO Collaborating Centre for Infectious Disease Modelling, MRC Centre for Global Infectious Disease Analysis, Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), Imperial College London, London, United Kingdom
    Contribution
    Conceptualization, Data curation, Supervision, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Neil Ferguson

    WHO Collaborating Centre for Infectious Disease Modelling, MRC Centre for Global Infectious Disease Analysis, Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), Imperial College London, London, United Kingdom
    Contribution
    Supervision, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared

Funding

Bill and Melinda Gates Foundation (OPP1117543)

  • Katy AM Gaythorpe
  • Tini Garske
  • Neil Ferguson

Medical Research Council (MR/R015600/1)

  • Katy AM Gaythorpe
  • Arran Hamlet
  • Tini Garske
  • Neil Ferguson

Bill and Melinda Gates Foundation

  • Katy AM Gaythorpe
  • Tini Garske
  • Neil Ferguson

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Senior Editor

  1. Miles P Davenport, University of New South Wales, Australia

Reviewing Editor

  1. Jennifer Flegg, The University of Melbourne, Australia

Reviewers

  1. Jennifer Flegg, The University of Melbourne, Australia
  2. Alex T Perkins, University of Notre Dame, United States

Version history

  1. Received: November 6, 2020
  2. Accepted: February 23, 2021
  3. Version of Record published: March 16, 2021 (version 1)

Copyright

© 2021, Gaythorpe et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Katy AM Gaythorpe
  2. Arran Hamlet
  3. Kévin Jean
  4. Daniel Garkauskas Ramos
  5. Laurence Cibrelus
  6. Tini Garske
  7. Neil Ferguson
(2021)
The global burden of yellow fever
eLife 10:e64670.
https://doi.org/10.7554/eLife.64670
