Epidemiological dynamics of Ebola outbreaks
Abstract
Ebola is a deadly virus that causes frequent disease outbreaks in the human population. In this study, we analyse its rate of new introductions, case fatality ratio, and potential to spread from person to person. The analysis is performed for all completed outbreaks and for a scenario where these are augmented by a more severe outbreak of several thousand cases. The results show a fast rate of new outbreaks, a high case fatality ratio, and an effective reproductive ratio of just less than 1.
https://doi.org/10.7554/eLife.03908.001
eLife digest
The West Africa outbreak of Ebola virus disease is larger than any of the previous outbreaks over the last four decades. Most human outbreaks likely begin when a person is infected after contact with an infected wild animal, but during an outbreak the virus can spread from person to person via contact with blood or other bodily fluids. There is no vaccine against Ebola nor is there a specific treatment. The percentage of infected people who have been killed by the Ebola virus in the past outbreaks varies from 50% to 90%. However, predicting how an outbreak will progress once it has started remains difficult.
For any infectious disease, it is important to estimate how many new people, on average, each person with the disease will go on to infect. When this value—called the ‘basic reproductive ratio’ (or R_{0})—is greater than 1, a significant percentage of the population is expected to eventually become infected if medical interventions are not introduced. Conversely, when R_{0} is less than 1, chance events may lead to a large number of cases, but only a fraction of the total population will be affected.
Previous estimates of the basic reproductive ratio for Ebola gave values greater than 1, making it unclear why all the completed outbreaks of Ebola had infected at most several hundred people and had not caused global pandemics. Medical intervention and control measures were generally considered the most likely answer. However, it is important to note that these previous predictions were made using data from only two large outbreaks of Ebola in 1995 and 2000.
Now, House has used a different modelling approach to estimate Ebola's reproductive ratio. The new model is based on data from all 24 completed Ebola outbreaks and includes the time between outbreaks, the number of deaths, and the final number of cases. The model also included ‘data’ from a hypothetical scenario of a more severe outbreak with several thousand cases.
House revealed that new outbreaks tend to occur frequently and that often a large percentage of those infected with Ebola will die of the disease, although the exact values vary between different outbreaks. Furthermore, if there is no fundamental change compared to the past, the analysis predicts that the ‘effective reproductive ratio’ for person-to-person spread of Ebola (which takes into account the effect of medical intervention) is just less than 1. It also predicts that the final number of cases can be very different for different outbreaks.
House concluded that at first the current West African outbreak was unusual but still consistent with the pattern of previous outbreaks. However, as the number of people infected has continued to grow, this has become less plausible. It is now more likely that there is some fundamental difference, for example in the infectiousness of the Ebola strain, in the current outbreak compared to all the previous outbreaks, although further work would be needed to confirm this.
https://doi.org/10.7554/eLife.03908.002
Introduction
Ebola virus disease is an often fatal disease of humans that is not vaccine-preventable and has no specific treatment. A total of 25 outbreaks, believed to have arisen due to zoonotic transmission from wild mammals, have occurred since the first observed cases in humans in 1976 (World Health Organisation, 2014a). The current epidemic is the largest to date (World Health Organisation, 2014b). This gives particular urgency to quantitative estimation of epidemiological quantities relevant to Ebola, such as case fatality ratio, timing of new outbreaks, and the strength of human-to-human transmission.
The most important epidemiological quantity to estimate for an infectious disease is typically the basic reproductive ratio, ${R}_{0}$, defined as the expected number of secondary cases produced per primary case early in the epidemic (Diekmann et al., 1990). When ${R}_{0}$ is greater than 1, the expectation is that a new epidemic will eventually infect a significant percentage of the population if it is not stopped by interventions or chance extinction; conversely, when ${R}_{0}$ is less than 1, chance events may lead to a large number of cases, but these are always expected to be much less numerous than the total population size.
Previous attempts to estimate ${R}_{0}$ for Ebola have found values between 1.34 and 3.65 by fitting compartmental epidemic models to the incidence over time of the large outbreaks in the Democratic Republic of Congo in 1995 and Uganda in 2000 (Chowell et al., 2004; Ferrari et al., 2005; Legrand et al., 2007), with similar results obtained for the ongoing outbreak (Althaus, 2014). This leads to the question of why all completed outbreaks numbered at most several hundreds, with the typical answer being that the medical and social response to an outbreak reduces transmission, leading to an effective reproductive ratio ${R}_{t}<{R}_{0}$ (Chowell et al., 2004; Legrand et al., 2007), although it is also important to note that heterogeneity in transmission can lead to extremely high probabilities of an outbreak becoming extinct even if ${R}_{t}$ is slightly greater than 1 (Lloyd-Smith et al., 2005).
Results
Figure 1 shows the results of fitting to times between outbreaks, with Figure 1A showing the empirical distribution of times between outbreaks together with the fitted model distribution that has mean $1.49[1.02,2.24]$ years between outbreaks and Figure 1C showing the posterior for the rate parameter. Figure 1 also shows the results of fitting CFR to number of deaths and final size, with Figure 1B showing empirical CFRs for different outbreaks together with the fitted model distribution. Other plots in Figure 1D,E show the posteriors for the beta distribution parameters.
Figure 2 shows the results of fitting to completed outbreaks, with Figure 2A,B giving the fitted distribution against data, Figure 2C showing the posterior for the reproductive ratio, which is estimated to be ${R}_{t}=0.88[0.64,0.96]$. Figure 2D shows the posterior for the geometric parameter, which is estimated to be $p=0.089[0.029,0.19]$.
While the model is designed not to depend explicitly on the temporal dynamics of Ebola virus disease, Figure 3A shows a set of 24 outbreaks simulated from a continuous-time Markov chain with the same probability distribution for final size as the estimated model. These show behaviour that is typical of near-critical branching processes, which often become extinct early but also often grow to significant size before extinction. Figure 3B plots the likelihood surface for these simulated data, showing parameter identifiability.
Figure 4 shows the results of fitting to completed outbreak final sizes augmented by an outbreak of uncertain size in the range 1000–5000. In this study, Figure 4A gives the fitted distribution against data, and Figure 4B shows the posterior for the probability of the additional outbreak, which is estimated to be $0.023[0.0015,0.088]$. Figure 4C shows the posterior for the reproductive ratio, which is estimated to be ${R}_{t}=0.94[0.87,0.99]$, and Figure 4D shows the posterior for the geometric parameter, which is estimated to be $p=0.11[0.054,0.21]$.
Discussion
The results obtained point to the following conclusions about Ebola transmission dynamics. (i) The rate of new epidemics and CFR are both high, but with significant variability from outbreak to outbreak. (ii) The effective reproductive ratio ${R}_{t}$ for person-to-person transmission is just below 1. (iii) There is extremely large variability in the final size of outbreaks.
It is also important to consider the sensitivity of these conclusions. A larger final size for the current outbreak (but still significantly less than the population size of a country) as suggested by the analysis above will tend to lead to a narrower posterior about a value of ${R}_{t}$ closer to 1; this can be understood from general properties of branching processes (Athreya and Ney, 1992). Such a finely tuned constant value of ${R}_{t}$ would, however, become increasingly difficult to interpret as a fundamental property of the outbreak and a modelling approach in which ${R}_{t}$ was allowed to vary in time—along with the public health and behavioural responses—would be preferred.
Also, it is possible that a number of small outbreaks were not recorded by the WHO. This could be addressed by incorporating additional variability into the model through explicit overdispersal parameters, as in the studies by Lloyd-Smith et al. (2005) and Blumberg and Lloyd-Smith (2013), although for the data currently available there was no strong evidence for overdispersal beyond that implied by the geometric distributions.
All of these conclusions suggest no reason for complacency and give support to appeals for greater resources to respond to the ongoing epidemic (Médecins Sans Frontières, 2014).
Materials and methods
Description
In this study, a different approach is taken based on using the time between outbreaks, number of deaths, and final number of cases, for all 24 completed Ebola outbreaks reported by the World Health Organisation (World Health Organisation, 2014a). Full mathematical details of the approach are given below.
First, we model the start of new outbreaks as a ‘memoryless’ Poisson process with a rate λ. Secondly, we assume that each new outbreak has a case fatality ratio (CFR, the probability that a case will die) picked from a beta distribution. Thirdly, the final size model involves two components: (i) a geometrically distributed number of cases, A, which includes cases arising from animal-to-human and pre-control transmission; (ii) a branching process model of human-to-human transmission (Athreya and Ney, 1992; Ball and Donnelly, 1995), whose offspring distribution has mean ${R}_{t}$, generating Z cases. The final size is then $K=A+Z$. This quantity should be interpreted as arising from a combination of ${R}_{t}$, ${R}_{0}$, and timing of interventions.
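The final-size component described above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the MATLAB code used for the paper: it assumes the shifted geometric distribution is parametrised as $\Pr[A=a]=p{(1-p)}^{a-1}$ for $a\ge 1$, and that the offspring distribution is geometric on $0,1,2,\dots$ with mean ${R}_{t}$. The function names are hypothetical, and the sketch samples final sizes only, not incidence over time.

```python
import random


def sample_shifted_geometric(p, rng):
    """Draw A >= 1 with Pr[A = a] = p * (1 - p)**(a - 1) (assumed parametrisation)."""
    a = 1
    while rng.random() >= p:
        a += 1
    return a


def sample_geometric_offspring(q, rng):
    """Draw xi >= 0 with Pr[xi = j] = q * (1 - q)**j, so that E[xi] = (1 - q) / q."""
    j = 0
    while rng.random() >= q:
        j += 1
    return j


def simulate_final_size(p, r_t, rng, cap=10**6):
    """Simulate K = A + Z: A initial cases plus Z secondary cases from a branching process."""
    q = 1.0 / (1.0 + r_t)            # solves R_t = (1 - q) / q
    a = sample_shifted_geometric(p, rng)
    total = a                        # running final size, starts at A
    active = a                       # cases that have not yet produced their offspring
    while active > 0 and total < cap:    # cap is only a guard for R_t >= 1
        children = sample_geometric_offspring(q, rng)
        active += children - 1       # one case finishes, its children join the queue
        total += children
    return total


if __name__ == "__main__":
    rng = random.Random(1)
    # Point estimates quoted in the Results for completed outbreaks.
    sizes = [simulate_final_size(0.089, 0.88, rng) for _ in range(24)]
    print(sorted(sizes))
```

Under these assumptions the expected final size is $(1/p)/(1-{R}_{t})$ when ${R}_{t}<1$, so a simulated set of 24 outbreaks typically mixes many small outbreaks with a few much larger ones.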
Bayesian MCMC with uninformative priors was used to fit all models (Gilks et al., 1995). Since doubts have been raised in the literature about the use of final size data for emerging diseases (Drake, 2005), a simulation study was also performed to test identifiability, although a recent study by Blumberg and Lloyd-Smith (2013) of joint identifiability of two parameters in a related model is also highly relevant in this context.
Finally, the final size data were augmented by an outbreak of unknown size in the range 1000–5000 (with mathematical details given by Equation (5), below) and the model was refitted. Due to the significant uncertainty in the severity of the current outbreak, this is not intended to be a real-time analysis, but rather to show how the modelling approach responds to such a scenario in general.
Technical details
Transmission model
Each outbreak has an initial number of cases A and a secondary number of cases Z. The total outbreak size is $K=A+Z$. We model the number of initial cases as a shifted geometric distribution,
We then model the number of secondary cases as the total progeny of a Galton–Watson branching process with A initial individuals and offspring distribution given by a geometrically distributed random variable ξ with mean ${R}_{t}:=(1-q)/q$. We adapt the results from Ball and Donnelly (1995) to our model, giving
This gives a formula for the total size of the outbreak of
If the data D consists of a set of ${k}_{i}$ (which is the size of outbreak i, with N the total number of outbreaks) then the likelihood function for the transmission model is
When the data $D\prime $ consists of the set of ${k}_{i}$ augmented by an outbreak of size between ${\kappa}_{1}$ and ${\kappa}_{2}$, we use likelihood function
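The displayed equations for this subsection are not reproduced here, so the following is only a sketch of one standard way to evaluate a final-size likelihood of the kind described. It assumes $\Pr[A=a]=p{(1-p)}^{a-1}$ for the initial cases and $\Pr[\xi =j]=q{(1-q)}^{j}$ for the offspring, and it uses the Otter–Dwass total-progeny formula rather than the Ball and Donnelly results cited in the text. The augmented likelihood is assumed to multiply the likelihood of the observed sizes by the probability that one further outbreak falls between ${\kappa}_{1}$ and ${\kappa}_{2}$. All function names are hypothetical.

```python
from math import exp, lgamma, log


def log_binom(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def total_progeny_pmf(n, a, q):
    """Pr[total cases = n | a initial cases] for geometric offspring.

    Otter-Dwass: (a / n) * Pr[xi_1 + ... + xi_n = n - a], where the sum of
    n geometric variables is negative binomial.
    """
    if n < a:
        return 0.0
    m = n - a                                   # number of secondary cases Z
    log_nb = log_binom(n + m - 1, m) + n * log(q) + m * log(1.0 - q)
    return (a / n) * exp(log_nb)


def final_size_pmf(k, p, r_t):
    """Pr[K = k], summing over the assumed shifted geometric number of initial cases."""
    q = 1.0 / (1.0 + r_t)                       # solves R_t = (1 - q) / q
    return sum(p * (1.0 - p) ** (a - 1) * total_progeny_pmf(k, a, q)
               for a in range(1, k + 1))


def log_likelihood(sizes, p, r_t):
    """Log-likelihood of independent observed final sizes k_i."""
    return sum(log(final_size_pmf(k, p, r_t)) for k in sizes)


def log_likelihood_augmented(sizes, kappa1, kappa2, p, r_t):
    """Assumed form: observed sizes plus one outbreak with kappa1 <= K <= kappa2."""
    extra = sum(final_size_pmf(k, p, r_t) for k in range(kappa1, kappa2 + 1))
    return log_likelihood(sizes, p, r_t) + log(extra)


if __name__ == "__main__":
    toy_sizes = [1, 3, 12, 40]                  # illustrative values, not WHO data
    print(log_likelihood(toy_sizes, 0.1, 0.9))
```

The double sum makes the augmented likelihood quadratic in ${\kappa}_{2}$, so evaluating it over a range such as 1000–5000 inside an MCMC loop would need caching or a faster recursion than this sketch provides.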
New outbreak model
We model the start of new outbreaks in the human population as a Poisson process of rate λ. If the time period over which N outbreaks are observed is T years, then the likelihood is
We estimate $\lambda =0.67[0.45,0.98]$, with posterior distribution given in Figure 1C. The probability density function for t being the next outbreak time is
which is shown in Figure 1A.
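As a quick numerical check on the quantities in this subsection, the sketch below assumes the standard forms for a Poisson process: a log-likelihood of $N\mathrm{log}\lambda -\lambda T$ up to a constant, and an exponential waiting-time density $\lambda {e}^{-\lambda t}$. These forms are assumptions consistent with the text rather than copies of the paper's equations, and the function names are hypothetical.

```python
from math import exp, log


def poisson_log_likelihood(lam, n_outbreaks, t_years):
    """Assumed log-likelihood (up to a constant) for N outbreaks in T years at rate lam."""
    return n_outbreaks * log(lam) - lam * t_years


def waiting_time_pdf(t, lam):
    """Exponential density for the time t (in years) until the next outbreak."""
    return lam * exp(-lam * t) if t >= 0 else 0.0


def prob_outbreak_within(t, lam):
    """Probability that the next outbreak starts within t years."""
    return 1.0 - exp(-lam * t)


LAMBDA_HAT = 0.67                    # point estimate quoted in the text
mean_gap = 1.0 / LAMBDA_HAT          # mean time between outbreaks, in years

if __name__ == "__main__":
    print(f"mean time between outbreaks: {mean_gap:.2f} years")
    print(f"probability of a new outbreak within 1 year: {prob_outbreak_within(1.0, LAMBDA_HAT):.2f}")
```

The reciprocal of the estimated rate, $1/0.67\approx 1.49$, agrees with the mean time between outbreaks reported in the Results.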
Case fatality model
We let ${C}_{i}$ be a random variable for the probability of fatality for a given case in outbreak i. We assume a parametric model in which this is drawn from a beta distribution, meaning that the probability density function is
Then if ${d}_{i}\le {k}_{i}$ is the number of fatalities in outbreak i, treating each fatality as independent, conditioned on infection, gives
Then the likelihood is
We estimate $\alpha =6.1[2.8,11]$ and $\beta =3.1[1.5,5.9]$, with posterior distributions given in Figure 1D,E.
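The case fatality model can be sketched numerically as follows. The code assumes that integrating the binomial number of deaths over a beta-distributed CFR gives the usual beta-binomial probability for each outbreak, and that outbreaks are independent; this is a standard construction consistent with the description above, not a transcription of the paper's equations. Function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_binom(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta_binomial(d, k, alpha, beta):
    """Log Pr[d deaths among k cases] with the CFR integrated over Beta(alpha, beta)."""
    return log_binom(k, d) + log_beta(d + alpha, k - d + beta) - log_beta(alpha, beta)


def cfr_log_likelihood(deaths, sizes, alpha, beta):
    """Log-likelihood over outbreaks, assuming independence between outbreaks."""
    return sum(log_beta_binomial(d, k, alpha, beta) for d, k in zip(deaths, sizes))


ALPHA_HAT, BETA_HAT = 6.1, 3.1                    # point estimates quoted in the text
mean_cfr = ALPHA_HAT / (ALPHA_HAT + BETA_HAT)     # mean of a Beta(alpha, beta) variable

if __name__ == "__main__":
    print(f"mean CFR implied by the point estimates: {mean_cfr:.2f}")
    # Illustrative values only, not WHO data: 30 deaths among 45 cases.
    print(exp(log_beta_binomial(30, 45, ALPHA_HAT, BETA_HAT)))
```

With the quoted point estimates the implied mean CFR is $6.1/9.2\approx 0.66$, which lies within the 50% to 90% range described in the digest.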
Statistical methodology
The MCMC methodology used was random-walk Metropolis–Hastings with thinning to produce ${10}^{3}$ uncorrelated samples, with each posterior ultimately derived from one long chain. The parameter spaces involved are low-dimensional enough that large-scale sweeps can be performed to check for multimodality, which was not observed, and convergence of the chains was observed to be fast and independent of initial conditions.
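To make the sampling scheme concrete, here is a minimal random-walk Metropolis–Hastings sketch with burn-in and thinning, applied to the one-parameter outbreak-rate model. It is not the paper's MATLAB implementation: the flat prior on $\lambda >0$, the Gaussian proposal, the step size, the burn-in length, and the counts `N_OUTBREAKS` and `T_YEARS` are all illustrative assumptions.

```python
import math
import random


def log_posterior_rate(lam, n_outbreaks, t_years):
    """Log posterior (up to a constant) for a Poisson rate under a flat prior on lam > 0."""
    if lam <= 0.0:
        return -math.inf
    return n_outbreaks * math.log(lam) - lam * t_years


def random_walk_metropolis(log_target, x0, step, n_samples, thin, burn_in, rng):
    """Random-walk Metropolis-Hastings; returns n_samples thinned draws after burn-in."""
    x, log_p = x0, log_target(x0)
    samples = []
    for i in range(burn_in + n_samples * thin):
        proposal = x + rng.gauss(0.0, step)          # symmetric Gaussian proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        if i >= burn_in and (i - burn_in + 1) % thin == 0:
            samples.append(x)                        # keep every thin-th state
    return samples


N_OUTBREAKS, T_YEARS = 20, 30.0   # hypothetical counts, not the WHO data

rng = random.Random(0)
draws = random_walk_metropolis(
    lambda lam: log_posterior_rate(lam, N_OUTBREAKS, T_YEARS),
    x0=1.0, step=0.2, n_samples=1000, thin=20, burn_in=1000, rng=rng,
)
draws.sort()
median = draws[len(draws) // 2]
lower, upper = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]

if __name__ == "__main__":
    print(f"lambda: median {median:.2f}, central 95% interval [{lower:.2f}, {upper:.2f}]")
```

For this toy target the posterior is a gamma distribution with mean $(N+1)/T$, which gives a simple check that the chain has converged; the two-parameter models in the paper would use the same loop with a vector-valued state.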
For the simulation study, the real-time incidence curves are produced by modelling the geometric distributions as arising from Poissonian transmission with exponentially distributed rates. The times between new introductions are not explicitly modelled or shown.
Code
MATLAB code to reproduce the analysis of this paper is available at: https://github.com/thomasallanhouse/elifeebolacode.
References

Althaus (2014). Estimating the reproduction number of Zaire ebolavirus (EBOV) during the 2014 outbreak in West Africa. PLOS Currents Outbreaks, Sep 2. 10.1371/currents.outbreaks.91afb5e0f279e7f29e7056095255b288
Ball and Donnelly (1995). Strong approximations for epidemic models. Stochastic Processes and their Applications 55:1–21. https://doi.org/10.1016/03044149(94)00034Q
Blumberg and Lloyd-Smith (2013). Inference of R(0) and transmission heterogeneity from the size distribution of stuttering chains. PLOS Computational Biology 9:e1002993. https://doi.org/10.1371/journal.pcbi.1002993
Chowell et al. (2004). The basic reproductive number of Ebola and the effects of public health measures: the cases of Congo and Uganda. Journal of Theoretical Biology 229:119–126. https://doi.org/10.1016/j.jtbi.2004.03.006
Ferrari et al. (2005). Estimation and inference of ${R}_{0}$ of an infectious pathogen by a removal method. Mathematical Biosciences 198:14–26. https://doi.org/10.1016/j.mbs.2005.08.002
Legrand et al. (2007). Understanding the dynamics of ebola epidemics. Epidemiology and Infection 135:610–621. https://doi.org/10.1017/S0950268806007217
Médecins Sans Frontières (2014). Ebola in West Africa: “The epidemic is out of control”. Accessed 21 August 2014. http://www.msf.org.uk/node/25511
World Health Organisation (2014a). Ebola virus disease, Fact sheet Number 103. Accessed 21 August 2014. http://www.who.int/mediacentre/factsheets/fs103/en/
World Health Organisation (2014b). Global Alert and response, ebola virus disease. Accessed 21 August 2014. http://www.who.int/csr/don/archive/disease/ebola/en/
Decision letter

Prabhat Jha, Reviewing Editor; University of Toronto, Canada
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for choosing to send your work entitled “Epidemiological Dynamics of Ebola Outbreaks” for consideration at eLife. Your full submission has been evaluated by Prabhat Jha (Senior editor), a Reviewing editor, and 3 peer reviewers, and the decision was reached after discussions between the reviewers. We regret to inform you that your work will not be considered further for publication in its current form.
However, given the importance of scientific debate on Ebola transmission, we would review (quickly) a substantially revised version of this paper. To consider this, we would need to be assured that you have addressed each of the substantive points below. Please also comment if you can on the specific role of any putative therapies (such as monoclonal antibodies) in affecting the R0 estimation.
The following individuals responsible for the peer review of your submission have agreed to reveal their identity: Prabhat Jha, Matt Ferrari, and Sake de Vlas.
Reviewer 1:
1) In general I found this to be a sound and reasonable analysis. I find the author's assertion that R0>1 is a sufficient condition for a global pandemic to be overstated, however, and I would strongly recommend tempering the language. There are a great many conditions that contribute to the likelihood of a pandemic, and the epidemic potential, as encapsulated in R0, is only one of these: a necessary condition, but hardly sufficient.
Given that the author is motivating this manuscript with the current Ebola outbreak, it would be quite interesting to comment on how likely the current outbreak (i.e. an outbreak equal to or greater than the current outbreak) would be under the author's fitted model.
Reviewer 2:
It is an interesting idea trying to capture the epidemiological characteristics of all observed Ebola outbreaks in just 4 parameters: p, R0 (or a function of q), α, and β. However, in my view the approach is far too simplistic. In particular, the assumption that R0 (or q) has the same value everywhere and at every time is of course incorrect. This makes the suggestion of a “global pandemic” inappropriate in the first place; the conditions that have favoured spread of Ebola in tropical Africa will never apply to most other continents.
Another problem is that the same R0 is assumed to apply throughout an outbreak. This is why the estimated posterior values of R0 are <1, for the simple reason that all observed epidemics have ended. Usually, outbreak situations are characterized by an initial value of R0 > 1 (or sometimes slightly <1), followed by a decrease through interventions, behaviour change, lack of sufficient susceptible individuals, or a combination. R0 then becomes Rt. Ebola is a terrible disease and during outbreaks people are known to change their behaviour, e.g. by avoiding contact with suspected cases. Also, isolation of patients is a very effective means of control, dramatically reducing the risk of further spread as the outbreak progresses.
Thus, these analyses need to be made more sophisticated by including heterogeneity in values of R0, between outbreaks and especially within outbreaks. Within outbreaks the value of Rt (which has value R0 at time 0) should be allowed to change (i.e. decrease), which could perhaps be done by including a time series component in the MCMC approach.
Furthermore, the author assumes the data to be perfect. However, is this true? There may have been several unnoticed outbreaks of a single to a few cases that never ended up in the WHO data base. How would this change the findings? There should be more discussion and (sensitivity) analyses about this issue.
Given the current topicality it would also be interesting to relate the ongoing very large outbreak (about 700 fatalities) to the presented analyses. Can such a large outbreak still be explained by this simple approach?
Reviewer 3:
Using the estimate, he concludes R0 for Ebola is less than one. I must admit, it is a simple and elegant paper, and on the face of it, it seems like a very good idea.
My chief critique is that the author has not cited and seems to be unaware of an 8-year-old critique of the methodology by Drake:
(2006) The Difficulties of Predicting the Outbreak Sizes of Epidemics. PLOS Med 3(1): e23. doi:10.1371/journal.pmed.0030023
In sum, Drake concludes that in the event of a delayed-onset intervention, the final size of an epidemic could be anything. The final size is thus a poor metric for R0 estimation.
To be suitable for publication, the article would have to address the issues raised by Drake. In some sense the article would have to make the case that the model being used was appropriate to the task. The model is very simple, but Drake's point is that the final size is highly sensitive to departures from that model, and many of the modifying assumptions would likely be highly appropriate for Ebola. If a new model were adopted, the reanalysis would be extensive.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled “Epidemiological Dynamics of Ebola Outbreaks” for further consideration at eLife. Your revised article has been favorably evaluated by Prabhat Jha (Senior editor) and four reviewers. We would like to reach a final decision on this, however, and given the important public scrutiny to Ebola virus, we invite you to pay specific attention to the following in another revision of this work:
1) Provide a clearer explanation of Rt and the extent to which it fits the observed ongoing spread of Ebola (including the value of being less than 1). Several reviewers have raised specific queries on the estimation, to which we would like you to respond, specifically to those of Sake de Vlas. For example, please explain:
(a) If Ro >1, but then control is implemented and makes Re < 1, then the final size of the epidemic depends on when the control was implemented. Conversely, the final size distribution reflects Ro, Re, and timing, not just Ro. This must be discussed as a caveat to the use of final size to estimate Ro.
(b) The fixed value of Rt is rather unlikely. Still, with Rt slightly < 1, (sometimes large) outbreaks could occur, followed by 'natural' extinction. It remains an interesting theoretical exercise to demonstrate that this simplistic assumption fits to the Ebola data, including the current multicountry epidemic. Please ensure that you highlight this limitation properly in the discussion.
2) Please review each of the reviewer comments (below) and reply to me.
The next stage will involve a rapid review by the editor – Prabhat Jha – to see if the major concerns have been addressed, and if so, the paper will be accepted as submitted. If it is not, we will have to send this out for additional review. Thus, please do these replies carefully and make changes to the manuscript that you see fairly addressing these concerns.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
Reviewer 1:
The author has addressed many of the concerns from the earlier reviews and I appreciate the attempts to a) temper the language, and b) include some analysis in light of the current outbreak.
I still think that the author has not described the latter analysis very well. The main text states that the data were “augmented”, though there is no real detail about what this is meant to mean (either in the main text or the appendix). I can imagine what was likely done, but the reader should not have to imagine what methods were used. I would strongly suggest that the author clarify this methodology and provide some real interpretation of the output. At present, Figure 5 is presented without any real synthesis. The added paragraph in the results simply restates the figure legend without presenting any real interpretation or speculation on how to interpret these results in light of the current outbreak. I'm not suggesting that the author overreach here, but this analysis would permit some kind of statement like “if the current outbreak were to exceed ##, then there would be strong evidence of a shift in dynamics from that which resulted in the prior pattern of outbreaks.”
Reviewer 2: Sake de Vlas
First of all, I would like to re-emphasize that the model used for final size of Ebola outbreaks is too simplistic, or too 'general' in the author's words. Outbreaks are normally characterized by an initial value of Rt = R0 (usually >1), which reduces to a lower value (usually <1) during the epidemic, as a consequence of interventions or behaviour change of the population.
There is no reason to believe that this would be different for Ebola. On the contrary, Ebola is known to steadily spread (unnoticed) for some time, but after detection outbreaks are rapidly contained by stringent control through isolation of patients, often heroically organized by MSF. The current multicountry outbreak is characterized by a lack of trust by the affected populations, so that control cannot be organized well and transmission continues. A proper model, certainly a practical one to be used for making predictions of future course and the possible impact of interventions, should take into account such mechanisms.
Having said that, I still consider it an interesting mathematical thought experiment to study to which extent previous outbreaks (and the current one) might be explained by assuming a simple constant value of Rt. That this would result in a value of Rt slightly less than 1 is not a surprise at all, since this is the only way that outbreaks can both occur and eventually die out naturally. I remain a bit disappointed that the author did not manage to include a simple process of (e.g. logistic) decline in Rt during an epidemic, but hopefully this will be the subject of a follow-up study.
I am happy with the much improved Introduction and Discussion. The description of the Bayesian approach is adequate as well. Figure 5 is a useful addition, even though it expresses that the current massive outbreak is perceived by the model as just a rare random event. Crudely, with the estimated average 1.5 years between outbreaks and a 0.023 chance of a big one, such an outbreak is expected to occur once every 1.5/0.023 = 65 years. Would that be realistic?
All in all, I recommend acceptance of the study. Remaining comments:
1) The technical annex still mentions R0, whereas Rt would be expected.
2) Figure 4A, giving an illustration of the timing of epidemics, requires some more explanation. It seems that somewhere a generation time has been assumed, since otherwise the timing of successive new cases within outbreaks cannot be related to the timing of new outbreaks.
3) Perhaps the author can calculate the occurrence of large-size outbreaks more precisely, using the detailed results of the Bayesian analyses.
Reviewer 3:
The response to my comment – the one dealing with the critique of methods from Drake et al. – was inadequate. In fact, I can't tell whether the author had read the whole article.
Drake's point about variability in the final size of an epidemic is a serious challenge to the validity of this work. Such critiques must be dealt with in substantive ways, and not by putting a band-aid on the analysis.
Drake's most critical point, illustrated in Figure 4, was about delayed onset of interventions. This is the most serious challenge to interpreting the analysis of final size as an estimate of R0. In fact, it seems Ebola fits this quite well, in that interventions are applied unevenly throughout the course of an epidemic; once detected, the epidemics are generally nipped in the bud and they die out. What seems much more likely to me is that R0 is higher than one before control measures are implemented, and less than one after that. In such situations, the final size of an epidemic has much more to do with the timing of the onset of interventions.
This is the point that House has not dealt with at all, as far as I can tell, and it is a serious challenge to the validity of his estimate.
Reviewer 4: Simon Hay
I think Thomas House has done a comprehensive job at responding to the reviewers' concerns. The article is extremely brief, as per the format, but written clearly.
1) The critical arguments, namely the concerns raised by reviewer 2 and those of Drake (2006) raised by reviewer 3, will need to be evaluated by the relevant individuals, and I will be very interested in their responses.
2) Apologies if I have missed them, but there is no plot of the epidemics by size and time: that is the core dataset. This would be useful to show the reader and I imagine with a little imagination one could show all the baseline data being used. The data/code etc should also be made available on Dryad or the like.
3) When plotted elsewhere I note that there is a distinct change in the frequency of the epidemics pre and post 2000, with the frequency much greater in the last 15 years. Does this non-stationary pattern (I hesitate to use the word trend) in outbreaks affect the analyses?
4) I think the impact statement should be refined. Ebola is clearly transmissible enough to cause a major epidemic – we are in the midst of one. I don't think this is exactly what you mean to say. Is it something more along the lines of “Despite the Rt of Ebola being less than 1, it can lead to the generation of a range of outbreak sizes consistent with the scale of the ongoing epidemic”? Regardless, it needs clarifying.
https://doi.org/10.7554/eLife.03908.007
Author response
Reviewer 1:
1) In general I found this to be a sound and reasonable analysis. I find the author's assertion that R0>1 is a sufficient condition for a global pandemic to be overstated, however, and I would strongly recommend tempering the language. There are a great many conditions that contribute to the likelihood of a pandemic, and the epidemic potential, as encapsulated in R0, is only one of these: a necessary condition, but hardly sufficient.
This is a fair comment and I have significantly reworded the manuscript, removing the reference to a “global pandemic” throughout.
Given that the author is motivating this manuscript with the current Ebola outbreak, it would be quite interesting to comment on how likely the current outbreak (i.e. an outbreak equal to or greater than the current outbreak) would be under the author's fitted model.
I have carried out significant additional MCMC runs to address this issue, leading to the results shown in Figure 5.
Reviewer 2:
It is an interesting idea trying to capture the epidemiological characteristics of all observed Ebola outbreaks in just 4 parameters: p, R0 (or a function of q), α, and β. However, in my view the approach is far too simplistic. In particular, the assumption that R0 (or q) has the same value everywhere and at every time is of course incorrect. This makes the suggestion of a “global pandemic” inappropriate in the first place; the conditions that have favoured spread of Ebola in tropical Africa will never apply to most other continents.
Another problem is that the same R0 is assumed to apply throughout an outbreak. This is why the estimated posterior values of R0 are <1, for the simple reason that all observed epidemics have ended. Usually, outbreak situations are characterized by an initial value of R0 > 1 (or sometimes slightly <1), followed by a decrease through interventions, behaviour change, lack of sufficient susceptible individuals, or a combination. R0 then becomes Rt. Ebola is a terrible disease and during outbreaks people are known to change their behaviour, e.g. by avoiding contact with suspected cases. Also, isolation of patients is a very effective means of control, dramatically reducing the risk of further spread as the outbreak progresses.
Thus, these analyses need to be made more sophisticated by including heterogeneity in values of R0, between outbreaks and especially within outbreaks. Within outbreaks the value of Rt (which has value R0 at time 0) should be allowed to change (i.e. decrease), which could perhaps be done by including a time series component in the MCMC approach.
I am in general agreement with the spirit of these points. I now discuss R_{T} rather than R_{0}, and have removed discussion of a “global pandemic”. I am in slight disagreement with the reviewer on two technical matters: (1) If a pattern of minor outbreaks is really consistent with a supercritical branching process model (which it may well be) then there will in fact be some posterior mass in that region of parameter space. (2) A discrete-time stochastic model like a Galton-Watson process can be a good approximation to the final outcome of a great many real-time epidemic models as seen through coupling arguments – the standard stochastic SIR and SEIR models are just two that will lead to exactly the model considered, but a great many others will be well approximated by it.
The model proposed is not intended as a fully mechanistic approach; instead I hope to capture a small number of key mechanisms in a principled way, while still being able to implement a full likelihood-based fitting procedure without any methods such as fixing parameters by hand.
Furthermore, the author assumes the data to be perfect. However, is this true? There may have been several unnoticed outbreaks of a single to a few cases that never ended up in the WHO database. How would this change the findings? There should be more discussion and (sensitivity) analyses about this issue.
This is now considered in the Discussion section; since the models involved are reasonably transparent I argue that it is clear how they would respond to such a scenario.
Given the current topicality it would also be interesting to relate the ongoing very large outbreak (about 700 fatalities) to the presented analyses. Can such a large outbreak still be explained by this simple approach?
I have carried out significant additional MCMC runs to address this issue, leading to the results shown in Figure 5.
Reviewer 3:
Using the estimate, he concludes R0 for Ebola is less than one. I must admit, it is a simple and elegant paper, and on the face of it, it seems like a very good idea.
My chief critique is that the author has not cited and seems to be unaware of an 8-year-old critique of the methodology by Drake:
(2006) The Difficulties of Predicting the Outbreak Sizes of Epidemics. PLOS Med 3(1): e23. doi:10.1371/journal.pmed.0030023
In sum, Drake concludes that, in the event of a delayed-onset intervention, the final size of an epidemic could be anything. The final size is thus a poor metric for R0 estimation.
To be suitable for publication, the article would have to address the issues raised by Drake. In some sense the article would have to make the case that the model being used was appropriate to the task. The model is very simple, but Drake's point is that the final size is highly sensitive to departures from that model, and many of the modifying assumptions would likely be highly appropriate for Ebola. If a new model were adopted, the reanalysis would be extensive.
I have now performed a simulation study to address this issue. As I interpret it, Drake’s argument is that there can be significant variability in the distribution of the final size K | R_{0}, which has implications for predictability. Given N observed samples from such a distribution, however, the posterior distribution for R_{0} | {K_{i}}_{i=1}^{N} need not be excessively wide, as the simulation study indeed shows. I also make reference to the recent study by Blumberg and Lloyd-Smith that deals with these issues for a related model in more detail. As I argue above, the simplicity of the model (which I would prefer to call ‘generality’ but of course I’d say that) is a strength in this context since it can be fitted with full statistical rigour and approximates the behaviour of many other models.
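The identifiability argument can be illustrated with a minimal sketch. This is not the MCMC analysis of the paper: it assumes a Galton-Watson process with a geometric offspring distribution of mean R (the form obtained from the Markovian SIR model in a large population), a flat prior on a grid over (0, 2], and synthetic final sizes drawn from the model itself. All function names and parameter values are illustrative choices. Under these assumptions the final size has probability mass P(K = k) = (1/k) C(2k-2, k-1) R^(k-1) / (1+R)^(2k-1).

```python
import math
import random

def log_pmf_final_size(k, r):
    """log P(K = k) for a Galton-Watson process with geometric offspring of mean r."""
    return (math.lgamma(2 * k - 1) - 2.0 * math.lgamma(k) - math.log(k)
            + (k - 1) * math.log(r) - (2 * k - 1) * math.log(1.0 + r))

def sample_final_sizes(rng, r, n, k_max=2000):
    """Draw n synthetic final sizes from the pmf, truncated at k_max (intended for r < 1)."""
    ks = list(range(1, k_max + 1))
    weights = [math.exp(log_pmf_final_size(k, r)) for k in ks]
    return rng.choices(ks, weights=weights, k=n)

def grid_posterior(sizes, grid):
    """Normalised posterior weights for R on the grid, under a flat prior."""
    n, total = len(sizes), sum(sizes)
    # Up to a constant, the log-likelihood depends on the data only via n and total.
    loglik = [(total - n) * math.log(r) - (2 * total - n) * math.log(1.0 + r)
              for r in grid]
    top = max(loglik)
    weights = [math.exp(v - top) for v in loglik]
    norm = sum(weights)
    return [w / norm for w in weights]

def credible_interval(grid, post, mass=0.95):
    """Equal-tailed credible interval read off the cumulative grid posterior."""
    tail = (1.0 - mass) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for r, p in zip(grid, post):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = r
        if cumulative >= 1.0 - tail:
            upper = r
            break
    return lower, upper

rng = random.Random(0)
sizes = sample_final_sizes(rng, r=0.8, n=1000)   # synthetic data, true R = 0.8
grid = [i / 1000 for i in range(1, 2001)]        # R from 0.001 to 2.0
post = grid_posterior(sizes, grid)
mode = grid[post.index(max(post))]
print("posterior mode:", mode)
print("95% interval:", credible_interval(grid, post))
```

Under these assumptions the likelihood is maximised at R = 1 - 1/(mean final size), so the posterior mode under the flat prior sits at that value even though individual final sizes are highly variable. Reducing n shows how the interval widens when fewer outbreaks are observed.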
[Editors' note: further revisions were requested prior to acceptance, as described below.]
1) Provide a clearer explanation of Rt and the extent to which it fits the observed ongoing spread of Ebola (including its value being less than 1). Several reviewers have raised specific queries on the estimation, to which we would like you to respond, specifically to those of Sake de Vlas. For example, please explain:
(a) If R0 > 1, but then control is implemented and makes Re < 1, then the final size of the epidemic depends on when the control was implemented. Conversely, the final size distribution reflects R0, Re, and timing, not just R0. This must be discussed as a caveat to the use of final size to estimate R0.
I have added the following sentence to the second paragraph of the Methods, about the final outbreak size random variable: “This quantity should be interpreted as arising from a combination of R_{T}, R_{0} and timing of interventions.”
(b) The fixed value of Rt is rather unlikely. Still, with Rt slightly < 1, (sometimes large) outbreaks could occur, followed by 'natural' extinction. It remains an interesting theoretical exercise to demonstrate that this simplistic assumption fits the Ebola data, including the current multi-country epidemic. Please ensure that you highlight this limitation properly in the discussion.
The following sentence has been added to the second paragraph of the Discussion: “Such a finely tuned constant value of R_{T} would, however, become increasingly difficult to interpret as a fundamental property of the outbreak and a modelling approach in which R_{T} was allowed to vary in time – along with the public health and behavioural responses – would be preferred.”
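The caveat in points (a) and (b) can be made concrete with a small simulation. The sketch below is illustrative only and is not the model fitted in the paper: it assumes a branching process with geometric offspring whose mean switches from a hypothetical pre-control value of 1.5 to a hypothetical post-control value of 0.5 at a chosen generation. The function names, the cap on outbreak size and the parameter values are all assumptions made for the example.

```python
import math
import random

def geometric_offspring(rng, mean):
    """Draw from a geometric distribution on {0, 1, 2, ...} with the given mean (> 0)."""
    p = 1.0 / (1.0 + mean)   # success probability, so that (1 - p) / p == mean
    u = 1.0 - rng.random()   # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p))

def final_size(rng, r_before, r_after, switch_generation, cap=100_000):
    """Total cases from one index case; the mean offspring number changes from
    r_before to r_after for individuals in generation switch_generation onwards."""
    active, total, generation = 1, 1, 0
    while active > 0 and total < cap:
        r = r_before if generation < switch_generation else r_after
        new_cases = sum(geometric_offspring(rng, r) for _ in range(active))
        total += new_cases
        active = new_cases
        generation += 1
    return total

def mean_final_size(switch_generation, runs=5000, seed=1):
    """Average final size over many simulated outbreaks (hypothetical R values)."""
    rng = random.Random(seed)
    sizes = [final_size(rng, 1.5, 0.5, switch_generation) for _ in range(runs)]
    return sum(sizes) / runs

for g in (0, 2, 6):
    print("control from generation", g, "-> mean final size", mean_final_size(g))
```

With these assumed values the expected final size is 1 + 1.5 + ... + 1.5^(g-1) + 1.5^g / (1 - 0.5), which is 2, 7 and roughly 44 for g = 0, 2 and 6. The same pair of reproductive ratios therefore produces very different outbreak sizes depending only on when control begins.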
Reviewer 1:
The author has addressed many of the concerns from the earlier reviews and I appreciate the attempts to a) temper the language, and b) include some analysis in light of the current outbreak.
I still think that the author has not described the latter analysis very well. The main text states that the data were “augmented”, though there is no real detail about what this is meant to mean (either in the main text or the appendix). I can imagine what was likely done, but the reader should not have to imagine what methods were used. I would strongly suggest that the author clarify this methodology and provide some real interpretation of the output. At present, Figure 5 is presented without any real synthesis. The added paragraph in the results simply restates the figure legend without presenting any real interpretation or speculation on how to interpret these results in light of the current outbreak. I'm not suggesting that the author overreach here, but this analysis would permit some kind of statement like “if the current outbreak were to exceed ##, then there would be strong evidence of a shift in dynamics from that which resulted in the prior pattern of outbreaks.”
The method of augmentation is now given in the Methods, in particular Equation (5).
The additional sentence in the Discussion section detailed above should address this issue; I am not sure how to quantify the appropriate number, but at some point the model becomes unnatural and the solution to this is likely to be time-inhomogeneity as suggested by the reviewers.
Reviewer 2: Sake de Vlas
First of all, I would like to re-emphasize that the model used for final size of Ebola outbreaks is too simplistic, or too 'general' in the author's words. Outbreaks are normally characterized by an initial value of Rt = R0 (usually >1), which reduces to a lower value (usually <1) during the epidemic, as a consequence of interventions or behaviour change of the population.
There is no reason to believe that this would be different for Ebola. On the contrary, Ebola is known to steadily spread (unnoticed) for some time, but after detection outbreaks are rapidly contained by stringent control through isolation of patients, often heroically organized by MSF. The current multi-country outbreak is characterized by a lack of trust by the affected populations, so that control cannot be organized well and transmission continues. A proper model, certainly a practical one to be used for making predictions of future course and the possible impact of interventions, should take into account such mechanisms.
Having said that, I still consider it an interesting mathematical thought experiment to study the extent to which previous outbreaks (and the current one) might be explained by assuming a simple constant value of Rt. That this would result in a value of Rt slightly less than 1 is not a surprise at all, since this is the only way that outbreaks can both occur and eventually die out naturally. I remain a bit disappointed that the author did not manage to include a simple process of (e.g. logistic) decline in Rt during an epidemic, but hopefully this will be the subject of a follow-up study.
I am happy with the much improved Introduction and Discussion. The description of the Bayesian approach is adequate as well. Figure 5 is a useful addition, even though it expresses that the current massive outbreak is perceived by the model as just a rare random event. Crudely, with the estimated average 1.5 years between outbreaks and a 0.023 chance of a big one, such an outbreak is expected to occur once every 1.5/0.023 = 65 years. Would that be realistic?
I understand the reviewer’s disappointment, but the issue here is really one of lack of data availability and quality for a temporal analysis, together with issues of mathematical formulation and fitting of a more complex model, that will require a substantial additional study to address adequately.
Nevertheless, the reviewer’s general characterisation of the study is, I believe, fair; something more complex would be needed to inform predictions and intervention strategies, but I would argue that such a model should incorporate insights from analysis of previous outbreaks and it is this prior information that I seek to provide here.
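The reviewer's back-of-envelope return period above can be reproduced in a few lines. The two input figures are the ones quoted by the reviewer. The second function is an added illustration that assumes outbreaks arrive as a Poisson process (exponential waiting times) and that each is independently "large" with the stated probability; the 40-year window is an illustrative choice, not a figure from the analysis.

```python
import math

MEAN_YEARS_BETWEEN_OUTBREAKS = 1.5   # figure quoted by the reviewer
PROB_LARGE_OUTBREAK = 0.023          # figure quoted by the reviewer

def return_period(mean_interval, prob_large):
    """Expected number of years between large outbreaks."""
    return mean_interval / prob_large

def prob_at_least_one(window_years, mean_interval, prob_large):
    """Chance of at least one large outbreak in the window, assuming Poisson
    arrivals of outbreaks and independent thinning into 'large' ones."""
    rate_large = prob_large / mean_interval   # large outbreaks per year
    return 1.0 - math.exp(-rate_large * window_years)

years = return_period(MEAN_YEARS_BETWEEN_OUTBREAKS, PROB_LARGE_OUTBREAK)
chance = prob_at_least_one(40, MEAN_YEARS_BETWEEN_OUTBREAKS, PROB_LARGE_OUTBREAK)
print(f"return period: {years:.1f} years")
print(f"chance of at least one large outbreak in 40 years: {chance:.2f}")
```

A return period of about 65 years does not make a large outbreak within a few decades implausible under these assumptions, since the waiting time is itself exponentially distributed rather than fixed.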
1) The technical annex still mentions R0, whereas Rt would be expected.
This has been modified to read R_{T}.
2) Figure 4A, giving an illustration of the timing of epidemics, requires some more explanation. It seems that somewhere a generation time has been assumed, since otherwise the timing of successive new cases within outbreaks cannot be related to the timing of new outbreaks.
This is now detailed in the Methods section.
3) Perhaps the author can calculate the occurrence of large-size outbreaks more precisely, using the detailed results of the Bayesian analyses.
This is possible; however the estimates quickly become dominated by uncertainty in parameters.
Reviewer 3:
The response to my comment – the one dealing with the critique of methods from Drake et al. – was inadequate. In fact, I can't tell whether the author had read the whole article.
Drake's point about variability in the final size of an epidemic is a serious challenge to the validity of this work. Such critiques must be dealt with in substantive ways, and not by putting a band-aid on the analysis.
Drake's most critical point, illustrated in Figure 4, was about delayed onset of interventions. This is the most serious challenge to interpreting the analysis of final size as an estimate of R0. In fact, it seems Ebola fits this quite well, in that interventions are applied unevenly throughout the course of an epidemic; once detected, the epidemics are generally nipped in the bud and they die out. What seems much more likely to me is that R0 is higher than one before control measures are implemented, and less than one after that. In such situations, the final size of an epidemic has much more to do with the timing of the onset of interventions.
This is the point that House has not dealt with at all, as far as I can tell, and it is a serious challenge to the validity of his estimate.
I believe that I have understood Drake’s point; however the reviewer has not replied to my initial response to this objection, which I believe remains valid. This was essentially that (i) regardless of temporal behaviour, the final size of an outbreak should still follow some distribution, and (ii) sufficient samples from this distribution turn out to enable parameter identifiability for the model currently under consideration. Establishing (ii) required significant additional work as presented in the revised paper, which I believe does represent a substantive attempt to address the concerns.
Reviewer 4: Simon Hay
I think Thomas House has done a comprehensive job at responding to the reviewers' concerns. The article is extremely brief, as per the format, but written clearly.
1) The critical arguments, namely the concerns raised by reviewer 2 and those of Drake (2006) raised by reviewer 3, will need to be evaluated by the relevant individuals, and I will be very interested in their responses.
2) Apologies if I have missed them, but there is no plot of the epidemics by size and time: that is the core dataset. This would be useful to show the reader and I imagine with a little imagination one could show all the baseline data being used. The data/code etc should also be made available on Dryad or the like.
The data on which the analysis is based are shown in black in Figures 1A,B, 2A,B and 4A – since the raw dataset is owned by WHO I am unsure about release; however I will of course provide access to the code (https://github.com/thomasallanhouse/elifeebolacode) and once the code is of sufficient quality, I will release it more generally via the EpiStruct project.
3) When plotted elsewhere I note that there is a distinct change in the frequency of the epidemics pre- and post-2000, with the frequency much greater in the last 15 years. Does this nonstationary pattern (I hesitate to use the word trend) in outbreaks affect the analyses?
Figure 1A shows that, on aggregate, the pattern of epidemics is consistent with an exponential/memoryless distribution; the question of testing for clustering is an interesting one, but it is arguably beyond the scope of the current work.
4) I think the impact statement should be refined. Ebola is clearly transmissible enough to cause a major epidemic – we are in the midst of one. I don't think this is exactly what you mean to say. Is it something more along the lines of “Despite the Rt of Ebola being less than 1, it can lead to the generation of a range of outbreak sizes consistent with the scale of the ongoing epidemic”? Regardless it needs clarifying.
This was an unfortunate error; I did not notice the Impact Statement during resubmission, and will modify it to “The pattern of past Ebola outbreaks is indicative of an effective reproductive ratio of less than 1, which can lead to the generation of a range of outbreak sizes consistent with the scale of the ongoing epidemic”.
https://doi.org/10.7554/eLife.03908.008
Article and author information
Author details
Funding
Engineering and Physical Sciences Research Council
 Thomas House
The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Work supported by the UK Engineering and Physical Sciences Research Council. I would like to thank Deirdre Hollingsworth, Matt Keeling, and Graham Medley for helpful discussions and the Editors and Reviewers for helpful comments and suggestions.
Reviewing Editor
 Prabhat Jha, University of Toronto, Canada
Publication history
 Received: July 6, 2014
 Accepted: September 11, 2014
 Accepted Manuscript published: September 12, 2014 (version 1)
 Version of Record published: October 17, 2014 (version 2)
Copyright
© 2014, House
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.