The intractable challenge of evaluating cattle vaccination as a control for bovine Tuberculosis
Abstract
Vaccination of cattle against bovine Tuberculosis (bTB) has been a long-term policy objective for countries where disease persists despite costly test-and-slaughter programs. The potential use of vaccination within the European Union has been linked to a need for field evaluation of any prospective vaccine and of the impact of vaccination on the rate of transmission of bTB. We calculate that estimation of the direct protection of BCG could be achieved with 100 herds, but over 500 herds would be necessary to demonstrate an economic benefit for farmers whose costs are dominated by testing and associated herd restrictions. However, the low and variable attack rate in GB herds means field trials are unlikely to be able to discern any impact of vaccination on transmission. In contrast, experimental natural transmission studies could provide robust evaluation of both the efficacy and mode of action of vaccination using as few as 200 animals.
https://doi.org/10.7554/eLife.27694.001
eLife digest
Bovine tuberculosis is an infectious disease of livestock and wildlife in many parts of the world. It can also spread to humans. In the United Kingdom (UK), infected cattle and badgers contribute to its spread. To control bovine tuberculosis, cattle are tested and infected animals are slaughtered. Badgers in areas near cattle are killed to keep their populations small and reduce the likelihood of them infecting cattle. These control strategies are very controversial. Testing and slaughtering cattle is expensive and many people object to badger culling.
Developing a vaccine that would protect cattle against bovine tuberculosis is a potential alternative approach being investigated by the UK government. But such vaccination is currently illegal in Europe because vaccinated animals may test positive for infection, creating confusion. Tests that can tell infected animals apart from vaccinated ones exist, but these DIVA (short for “Differentiates Infected from Vaccinated Animals”) tests are not yet licensed for use in the UK. The European Union (EU) said it would consider relaxing its laws against bovine tuberculosis vaccination if the UK government is able to prove a vaccine is effective on farms.
Now, Conlan et al. show that the specific field trials recommended by the EU would have to be extremely large to show a benefit of vaccination. Mathematical models were used to calculate how many cattle herds a bovine tuberculosis vaccine study would need to show that it protects cattle from infection, reduces transmission of the disease, and saves farmers money. Conlan et al. show that a study including 100 herds would be large enough to prove the vaccine protected individual animals. But a trial would have to include 500 herds to show that vaccination saves farmers money.
Because transmission of bovine tuberculosis is slow in the UK, trials on working farms are unlikely to be able to measure whether vaccination reduces the spread of the disease. Instead, Conlan et al. show that smaller, less expensive experiments in controlled settings would be able to estimate the effects of bovine tuberculosis vaccination on transmission.
These results informed the UK government's decision to delay farm-based studies of a bovine tuberculosis vaccine until a DIVA test is available. If vaccination and the use of a DIVA test can be proven effective enough to replace test-and-slaughter policies, it could be a huge economic boon to farmers, particularly those in lower-income countries.
https://doi.org/10.7554/eLife.27694.002
Introduction
The use of cattle vaccination for the control of bovine Tuberculosis in Great Britain is currently prohibited under national and European law. In 2013, before the recent referendum decision for the UK to leave the EU, the European Commission indicated that any change in legislation to allow the deployment of cattle vaccination would depend on carrying out field trials under European production conditions. To this end, the Commission obtained detailed recommendations on the design of suitable field trials, prepared by the European Food Safety Authority (EFSA, 2013). A consortium was commissioned by the Department for Environment, Food and Rural Affairs (Defra) to design trials that addressed all the EFSA recommendations. To support these designs, we used within-herd transmission models with parameter distributions estimated from GB data (Conlan et al., 2012, 2015) to calculate the sample sizes necessary to demonstrate the likely protective benefits of vaccination. Regardless of the outcome of forthcoming negotiations for the exit of the UK from the EU, the use of cattle vaccination may require international approval to maintain trade and, perhaps more importantly, to ensure the economic buy-in of UK farmers. In this paper, we demonstrate that satisfying two key EFSA recommendations has profound implications for the likely benefits and necessary scale of any field trials: that vaccination should be used only as a supplement to the existing test-and-slaughter policy, and that field trials should demonstrate the impact of vaccination on transmission rather than just individual animal efficacy (EFSA, 2013). Use of vaccination as a supplement to, rather than a replacement for, test-and-slaughter means that a successful vaccine which reduces the overall burden and transmission of disease may nonetheless provide only limited benefit for farmers.
Our analyses suggest that field evaluation of the impact of cattle vaccination on rates of transmission is unviable in Great Britain before deployment of vaccination at scale. We propose that experimental natural transmission studies (Velthuis et al., 2007) should be prioritised in order to demonstrate the mode of action of cattle vaccination before costly, and risky, field trials.
Since the 1890s, the control of bovine Tuberculosis (bTB) in cattle has depended on the use of the tuberculin skin test to identify and remove infected animals (Francis, 1947). In countries and regions with no significant wildlife reservoirs, test and slaughter of tuberculin-positive animals has dramatically reduced bTB in cattle populations and, in the notable case of Australia, eliminated it (More et al., 2015). Tuberculin testing as carried out during the attestation era of the 1940s and 1950s brought bTB to the brink of elimination in Great Britain (GB) (Ritchie, 1959). However, the subsequent relaxation of cattle controls, with the majority of herds tested every three years by the 1970s (Wilesmith, 1983), was followed by a steady increase in incidence in the 1980s that triggered the progressive tightening of cattle controls that continues today. Relaxation of testing in GB coincided with the identification of a wildlife reservoir of bTB in the European badger (Meles meles) – a legally protected species. Culling of badgers to reduce the risk of bTB in cattle herds is a highly contentious issue politically (Grant, 2009), scientifically (Godfray et al., 2013) and in the wider arena of public opinion (Cassidy, 2012). In this context, development of a viable vaccination strategy for badgers and/or cattle that could reduce the costs associated with test-and-slaughter has been a long-term priority for Great Britain (Krebs et al., 1997).
Control of bovine tuberculosis in Great Britain
While the ultimate goal of the locally devolved strategies in Great Britain is to eliminate bovine tuberculosis from domestic cattle herds, the more economically important goal is to achieve officially TB-free (OTF) status. OTF status is defined by the EU in terms of demonstrating a long-term herd-level prevalence of confirmed bTB of less than 0.1% (Council Directive 64/432/EEC). While Scotland has already achieved this goal, herd-level prevalence in England and Wales continues to rise despite intensifying control measures. The current Welsh (Welsh Government, 2012) and English (DEFRA, 2014) strategies for achieving OTF status, and international trade regulations, depend on the continued use of tuberculin testing and compulsory removal of test-positive animals from herds. The only viable candidate vaccine for use in cattle at this time is the Bacillus Calmette-Guérin (BCG) vaccine, which sensitizes vaccinated animals to tuberculin and dramatically increases the likelihood of false-positive tests.
The practical and economic benefits of cattle vaccination therefore hinge as much on the performance of a new diagnostic test that can accurately Differentiate Infected from Vaccinated Animals (a so-called DIVA test) as on the efficacy of vaccination. DIVA tests for BCG have already been developed in the form of an interferon-gamma blood test (Vordermeier et al., 2011) and a skin test based on defined antigens (Whelan et al., 2010). However, both these tests must still be validated and, in the case of the skin test, approved by regulatory authorities. From the perspective of maintaining the security of international trade, and as highlighted by EFSA (EFSA, 2013), the most important requirement for validation is that the sensitivity of any proposed DIVA test is at least as good as that of the existing tuberculin test. However, under the intensive schedule of testing in Great Britain, where affected herds are repeatedly tested until clear, it is diagnostic specificity that provides the greatest barrier to delivering an economic benefit from vaccination (Conlan et al., 2015).
Modelling the impact of cattle vaccination
To explore this issue of DIVA specificity, and the more general costs and benefits of cattle-based control measures, we previously developed and fitted dynamic herd-level transmission models that mimic the sequence of testing in GB herds (Conlan et al., 2012, 2015). We compared two basic models for bTB transmission, which are distinguished by different assumed relationships between epidemiological and diagnostic latency and are described fully in Appendix 1. For our purposes in this study, the SOR (susceptible, occult, reactive) and SORI (susceptible, occult, reactive and infectious) models can be considered as plausible upper and lower bounds on the transmission potential of Mycobacterium bovis in Great Britain.
Such dynamic transmission models are essential to predict the effectiveness of vaccination within populations because of the indirect benefits of vaccination on transmission (Anderson and May, 1992). When some individuals in the population are directly protected from infection by vaccination (an all-or-nothing vaccine effect), they can no longer contribute to transmission; this leads to a further, indirect reduction in the potential spread of a disease within the population.
In order for a vaccine to be useful, it does not necessarily have to provide sterilising immunity to infection (Smith et al., 1984). ‘Leaky’ vaccines that reduce, but do not eliminate, the risk of infection of vaccinates can still control the spread of disease, particularly if the vaccine also reduces the infectiousness of vaccinated individuals. This distinction between the direct and indirect modes of action of vaccination is particularly relevant for BCG. Evidence from challenge (Hope et al., 2005) and natural transmission studies (Ameni et al., 2010) argues more strongly for a reduction in the rate of progression, with a larger proportion of vaccinated animals showing a reduction in the extent of lesions than showing sterilising immunity. For this reason, EFSA specified that field trial designs for the evaluation of BCG in cattle should be able to directly estimate the impact of vaccination on transmission (EFSA, 2013).
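To make the leaky-vaccine argument concrete, the following is a minimal sketch under simplifying assumptions of our own, not the SOR/SORI models used in this paper: a homogeneously mixing herd, a fraction of animals vaccinated, vaccination multiplying susceptibility by (1 − ε_S) and the infectiousness of vaccinates that become infected by (1 − ε_I). Under these assumptions the next-generation matrix has rank one, so the herd reproduction number is R0[(1 − p) + p(1 − ε_S)(1 − ε_I)]. The function name and the R0 value used below are hypothetical.

```python
def vaccinated_reproduction_number(r0: float, coverage: float,
                                   eps_s: float, eps_i: float) -> float:
    """Reproduction number in a partially vaccinated, homogeneously mixing herd.

    Assumes a leaky vaccine: vaccinates are (1 - eps_s) times as susceptible
    and, if infected, (1 - eps_i) times as infectious as unvaccinated animals.
    The two-type next-generation matrix is rank one, so its dominant
    eigenvalue equals its trace, which is what is returned here.
    """
    for name, value in (("coverage", coverage), ("eps_s", eps_s), ("eps_i", eps_i)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    unvaccinated = 1.0 - coverage                          # unvaccinated animals infecting like-for-like
    vaccinated = coverage * (1.0 - eps_s) * (1.0 - eps_i)  # reduced susceptibility and infectiousness
    return r0 * (unvaccinated + vaccinated)


# Hypothetical example: R0 = 2, 50% coverage.
print(vaccinated_reproduction_number(2.0, 0.5, 0.5, 0.5))  # 1.25 (leaky on both axes)
print(vaccinated_reproduction_number(2.0, 0.5, 0.5, 0.0))  # 1.5 (susceptibility effect only)
```

The comparison of the two calls illustrates the point made above: with the same protection against infection, an additional reduction in the infectiousness of vaccinates lowers transmission further, even though no animal gains sterilising immunity.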
Experimental transmission studies can be designed such that the impact of vaccination on transmission can be directly estimated, but achieving this in the field and within an ongoing test-and-slaughter program is considerably more challenging. The UK bTB control program is complex and dynamic, with the scheduling and interpretation of tests linked to the (apparent) burden of infection within herds (Conlan et al., 2012). Furthermore, because test-positive animals are removed from herds as soon as they are disclosed, the force of infection, which drives statistical power, depends on unobserved infection within cattle, wildlife and the wider environment. Exact likelihood-based methods of inference that can deal with this missing information, such as data-augmented MCMC (Jewell et al., 2009), have so far proven computationally intractable for bTB. As a result, published estimates of transmission rates in Great Britain have all used approximate methods of inference, which depend on aggregating data at the population level from large numbers of herds (Conlan et al., 2012; O'Hare et al., 2014; Brooks-Pollock et al., 2014). Given the scale of data required for these advanced methods and the need for the results of any field trial to be transparent and easily communicated to stakeholders, we consider them inappropriate as a framework to design field trials. Instead, we focus on classical relative risk measures of vaccine efficacy (Smith et al., 1984; Halloran et al., 1991), commonly used in field trial design for human vaccines, to quantify the likely impact of BCG vaccination on transmission.
Relative risk measures of vaccine efficacy
The basic requirement to estimate the indirect benefit of vaccination from either field (Halloran et al., 1991) or experimental trial designs (Velthuis et al., 2007) is the inclusion of at least two groups with differing levels of vaccine coverage. By comparing the relative risk of infection for unvaccinated individuals within herds that contain different proportions of vaccinated animals, the reduction in infectiousness of vaccinates that subsequently become infected can be estimated. For such designs, three separate vaccine efficacy measures can be defined (Figure 1), each equal to one minus a ratio of attack rates. Direct efficacy quantifies the protection of individuals from infection and compares the risk of infection of vaccinated animals relative to unvaccinated animals, either within the same herd or in a control herd. Indirect efficacy compares the risk of infection of unvaccinated animals within a partially vaccinated herd to that of unvaccinated animals in an unvaccinated control herd. Finally, total efficacy compares the risk of infection of all animals within a partially vaccinated herd to that in unvaccinated control herds.
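The three relative risk measures can be written down directly in terms of attack rates (confirmed cases divided by animals at risk). The sketch below is illustrative only: the `Group` type, the function names and the counts in the example are hypothetical, and it computes point estimates without any confidence intervals. Group A denotes partially vaccinated herds and group B unvaccinated control herds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """Confirmed cases among animals at risk in one trial group (hypothetical record)."""
    cases: int
    at_risk: int

    @property
    def attack_rate(self) -> float:
        if self.at_risk <= 0:
            raise ValueError("at_risk must be positive")
        return self.cases / self.at_risk


def efficacy(exposed: Group, reference: Group) -> float:
    """Vaccine efficacy as one minus the relative risk (ratio of attack rates)."""
    reference_rate = reference.attack_rate
    if reference_rate == 0:
        raise ValueError("reference group has no cases; relative risk is undefined")
    return 1.0 - exposed.attack_rate / reference_rate


def efficacy_measures(vacc_in_a: Group, unvacc_in_a: Group, unvacc_in_b: Group) -> dict:
    """Direct (within/between herd), indirect and total efficacy for a two-group design."""
    all_in_a = Group(vacc_in_a.cases + unvacc_in_a.cases,
                     vacc_in_a.at_risk + unvacc_in_a.at_risk)
    return {
        "direct_within_herd": efficacy(vacc_in_a, unvacc_in_a),   # vaccinates vs. within-herd controls
        "direct_between_herd": efficacy(vacc_in_a, unvacc_in_b),  # vaccinates vs. control-herd animals
        "indirect": efficacy(unvacc_in_a, unvacc_in_b),           # unvaccinated in A vs. unvaccinated in B
        "total": efficacy(all_in_a, unvacc_in_b),                 # everyone in A vs. unvaccinated in B
    }


# Hypothetical counts: 500 animals at risk in each of the three groups.
measures = efficacy_measures(Group(5, 500), Group(18, 500), Group(20, 500))
```

With these made-up counts the direct (between-herd) efficacy is 0.75, the indirect efficacy is 0.1 and the total efficacy is 0.425, which shows how a small indirect effect pulls the total efficacy well below the direct efficacy.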
We define the end point for calculating these risk ratios as the presence of visible lesions or culture confirmation in any animal removed over the course of a trial, whether because of a positive test reaction or through natural turnover. Following EFSA recommendations, this controls for the impact that the imperfect specificity of bTB diagnostic tests may have on estimates of vaccine efficacy (EFSA, 2013).
Herd level measures of vaccine effectiveness in field trials
Of equal importance to the efficacy of vaccination, and essential to quantify the potential costs and benefits of a cattle vaccination program, is the population-level effectiveness of vaccination within the existing surveillance system. The herd-level effectiveness of vaccination strategies can be assessed by comparing vaccinated, or partially vaccinated, herds to whole-herd controls. As the lion’s share of the costs associated with bovine tuberculosis, for both farmers and government, is incurred from testing and compensation, we choose to measure the effectiveness of vaccination through statistical measures of within-herd persistence. Specifically, we consider the risk of breakdown (herd-level incidence), the duration of breakdowns and the probability of recurrence. Note that despite similar definitions, these measures are not directly comparable to published estimates of within-herd persistence in Great Britain (Karolemeas et al., 2010; Karolemeas et al., 2011; Conlan et al., 2012) because of differences in the scheduling of testing during the proposed trials and the replacement of tuberculin with DIVA testing for both vaccinated and unvaccinated herds (see Appendix 1).
We define herd-level incidence as the proportion of study herds that have a breakdown over the fixed time horizon of the trial design (3 years); prolonged breakdowns as the proportion of herds that require more than one DIVA test, in addition to the disclosing test, to clear restrictions; and recurrence as the proportion of herds that experience a breakdown and subsequently see a second incident within the time horizon of the trial.
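These three definitions reduce to simple counts over herd histories. The sketch below assumes hypothetical record types (`Breakdown`, `HerdHistory`) of our own design, takes "day" as the unit of time, and assumes that all three proportions use the full set of study herds as the denominator; the wording above is consistent with this reading but does not state it explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Breakdown:
    """One herd breakdown (incident); a hypothetical record for illustration."""
    start_day: int       # day of the disclosing test, counted from the start of the trial
    clearing_tests: int  # DIVA tests needed after the disclosing test to lift restrictions


@dataclass
class HerdHistory:
    breakdowns: List[Breakdown] = field(default_factory=list)


def herd_level_measures(herds: List[HerdHistory], horizon_days: int = 3 * 365) -> dict:
    """Proportions of study herds with a breakdown, a prolonged breakdown, or a recurrence."""
    if not herds:
        raise ValueError("need at least one herd")
    incidence = prolonged = recurrence = 0
    for herd in herds:
        # Only breakdowns that start within the trial horizon count.
        episodes = [b for b in herd.breakdowns if 0 <= b.start_day < horizon_days]
        if not episodes:
            continue
        incidence += 1
        # Prolonged: more than one DIVA test beyond the disclosing test.
        if any(b.clearing_tests > 1 for b in episodes):
            prolonged += 1
        # Recurrence: a second incident within the horizon.
        if len(episodes) > 1:
            recurrence += 1
    n = len(herds)
    return {"incidence": incidence / n, "prolonged": prolonged / n, "recurrence": recurrence / n}


# Hypothetical histories for four herds over a 3-year horizon.
example = [
    HerdHistory(),                                        # no breakdown
    HerdHistory([Breakdown(100, 1)]),                     # short breakdown
    HerdHistory([Breakdown(50, 3), Breakdown(700, 1)]),   # prolonged, then recurrent
    HerdHistory([Breakdown(2000, 2)]),                    # outside the horizon
]
summary = herd_level_measures(example)
```

For these made-up histories the incidence is 0.5 and both the prolonged-breakdown and recurrence proportions are 0.25. Comparing such proportions between vaccinated and control arms gives the herd-level effectiveness.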
Conceptual design to estimate vaccine efficacy and herd level effectiveness
As previously discussed, estimating the indirect efficacy of vaccination requires at least two groups with different levels of vaccination coverage. By selecting one of these groups to be a set of unvaccinated controls, a two-phase design can also be used to estimate the herd-level effectiveness of vaccination. We use our herd-level simulation models to predict the effectiveness of vaccination and to calculate appropriate sample sizes for different measures of vaccine efficacy. Throughout, we aim for 80% statistical power, defined as the probability of detecting a given effect at a one-sided significance level of 2.5% (i.e. 97.5% confidence). Vaccinated and control animals are tested at 60-day intervals throughout the trial, with slaughter of DIVA test-positive animals.
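For orientation, a textbook approximation of power for a relative risk can be written in a few lines. This is a deliberately simpler calculation than the simulation-based approach used in this paper, which draws attack rates from posterior predictive distributions: it assumes independent binomial outcomes and a normal approximation for log(RR), so it ignores the between-herd heterogeneity discussed later and will tend to overstate power. The function name and the attack rates in the example are hypothetical.

```python
import math
from statistics import NormalDist


def power_log_relative_risk(p_vacc: float, p_ctrl: float, n_vacc: int, n_ctrl: int,
                            alpha: float = 0.025) -> float:
    """Approximate power of a one-sided Wald test that the relative risk is below 1.

    Uses Var[log RR_hat] ~= (1 - p_vacc)/(n_vacc p_vacc) + (1 - p_ctrl)/(n_ctrl p_ctrl),
    i.e. independent binomial outcomes with no between-herd heterogeneity.
    """
    if not (0.0 < p_vacc < 1.0 and 0.0 < p_ctrl < 1.0):
        raise ValueError("attack rates must lie strictly between 0 and 1")
    if n_vacc <= 0 or n_ctrl <= 0:
        raise ValueError("group sizes must be positive")
    log_rr = math.log(p_vacc / p_ctrl)
    se = math.sqrt((1.0 - p_vacc) / (n_vacc * p_vacc) + (1.0 - p_ctrl) / (n_ctrl * p_ctrl))
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    # H0 (RR >= 1) is rejected when log(RR_hat)/se < -z_crit.
    return NormalDist().cdf(-log_rr / se - z_crit)


# Hypothetical attack rates of 2% (vaccinated) and 5% (controls).
small = power_log_relative_risk(0.02, 0.05, 500, 500)
large = power_log_relative_risk(0.02, 0.05, 2000, 2000)
```

When the two attack rates are equal, the function returns the significance level (0.025), as a power function should; power then rises with the number of animals at risk, but only slowly when attack rates are low, which is the core difficulty for bTB trials.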
Within the vaccinated group, the statistical power to estimate the direct efficacy of vaccination depends on our ability to estimate the attack rate in both the vaccinated and unvaccinated subpopulations. As such, for within-herd controls a balanced design with a target coverage of 50% will be optimal. This balanced design will also be optimal for estimating indirect efficacy for a vaccine that halves the rate of transmission. Estimates of the indirect efficacy depend on comparing the attack rate in the unvaccinated controls within the partially vaccinated group to that in unvaccinated herds. In the case of indirect efficacy, we must balance our ability to estimate the attack rate in the within-herd controls against the effect size generated by the presence of vaccinated animals within the group. A priori, for an efficacious vaccine, we would expect rates of transmission to be lower in the vaccinated herds, and thus we weight the design to vaccinate 75% of recruited herds, retaining 25% as unvaccinated controls. This design places greater importance on our ability to measure the direct effect of vaccination, while still allowing for the estimation of a relatively large impact of vaccination on transmission should it exist.
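A normal-approximation argument, which is our own addition rather than a derivation from this paper, indicates when a balanced within-herd split is optimal for estimating direct efficacy. Suppose a fraction $f$ of $N$ at-risk animals is vaccinated, with attack rates $p_v$ (vaccinated) and $p_u$ (unvaccinated), and ignore between-herd heterogeneity. Then

$$
\operatorname{Var}\left[\log \widehat{RR}\right] \approx \frac{a}{fN} + \frac{b}{(1-f)N},
\qquad a = \frac{1-p_v}{p_v}, \quad b = \frac{1-p_u}{p_u},
$$

and setting the derivative with respect to $f$ to zero gives

$$
\frac{a}{f^{2}} = \frac{b}{(1-f)^{2}}
\quad\Longrightarrow\quad
f^{*} = \frac{\sqrt{a}}{\sqrt{a}+\sqrt{b}}.
$$

Under these assumptions $f^{*} = 1/2$ exactly when $p_v = p_u$, and it stays close to $1/2$ while the two attack rates are similar; for a strongly protective vaccine ($p_v \ll p_u$, so $a > b$) the variance-minimising split tilts towards vaccinated animals.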
Results
The predicted effect size of vaccination and statistical power, at least for direct efficacy, are largely consistent between our two alternative transmission models (SOR and SORI). Important differences are manifest for the indirect impacts of vaccination so both sets of model results are presented, with SORI results presented first and SOR results as supplementary figures.
Across all the considered scenarios, relative risk measures of vaccine efficacy are systematically lower than the true assumed reduction in susceptibility and infectiousness. For example, reductions in susceptibility of ${\epsilon}_{S}$ = 30, 60 and 90% correspond to predicted direct efficacies of ~25, 50 and 75% (Figure 2). This discrepancy between the true efficacy and that measured by relative risk is a consequence of the (assumed) limited duration of immunity, an average of one year (Shim and Galvani, 2012), and of systematic biases in the relative risk measures arising from heterogeneity in attack rate between herds (discussed further below).
The statistical power associated with these predicted effect sizes also depends critically on the variability of the posterior predictive distributions (PPD), which for some measures is extreme. To allow comparison between different measures, we summarise the effect size as the median value of the PPD (Figure 2, Figure 2—figure supplement 2), and plot the 95% posterior predictive intervals for the most optimistic vaccination scenario (${\epsilon}_{S}=90\%,{\epsilon}_{I}=90\%$) separately (Figure 2—figure supplement 1, Figure 2—figure supplement 3).
Direct efficacy
In this conceptual design, direct efficacy can be estimated relative to either within-herd (WH) or between-herd (BH) control animals (Figures 1 and 2). For an assumed direct protection (${\epsilon}_{S}$) of 90%, and an average duration of immunity of 1 year, the power calculations are relatively insensitive to this design choice and to the assumed effect of vaccination on infectiousness (${\epsilon}_{I}$). For this baseline assumed effect size of 90% (Figure 2E), which corresponds to an effective efficacy (~60%) comparable with existing experimental and field estimates for BCG (Hope et al., 2005; Ameni et al., 2010; Lopez-Valencia et al., 2010), 100 randomly selected herds in GB would comfortably provide >90% power to estimate a positive direct efficacy for both the alternative SORI (Figure 2E,F) and SOR models (Figure 2—figure supplement 1E,F).
This lack of sensitivity of statistical power to the choice of controls extends to lower levels of protection (${\epsilon}_{S}=30\%$). However, in this scenario there is an increased sensitivity to the effect of vaccination on infectiousness (${\epsilon}_{I}$), and >300 herds would be necessary to achieve the target of 80% power (Figure 2A,B). Alternative designs with a single target level of vaccination (distributed between or within herds) can mitigate this reduction in power and achieve the same statistical power with 100 herds (results not shown; Triveritas, 2014). The necessity for designs to directly estimate the indirect effects of vaccination therefore has a very real impact on the necessary scale of trials and on the statistical power to estimate the basic individual-level protection afforded by the vaccine.
Indirect efficacy
In contrast to direct efficacy, estimates of indirect efficacy are more sensitive to the choice of model, with an indirect efficacy of ~0 predicted by the SORI model (Figure 2) and a positive indirect efficacy of up to 10% predicted by the SOR model (Figure 2—figure supplement 2). This is a consequence of the different assumptions, discussed in detail in Appendix 1, concerning the time from infection to infectiousness. For the SORI model, estimated transmission rates are higher than for the SOR model, but animals must pass through a period of latency before becoming infectious. For the SOR model, animals have lower estimated transmission rates but are immediately infectious upon infection. As a result, the SOR model is more sensitive to the impact of vaccination on infectiousness and predicts a greater indirect benefit of vaccination.
Nonetheless, both models predict such a small indirect efficacy that there is a high probability of estimating a negative vaccine efficacy (implying an increase in infectiousness in vaccinated animals) even when a true protective effect exists (Figure 2—figure supplements 1, 2 and 3).
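Why a small true effect so often produces a negative estimate can be seen from a simple normal approximation, which is our illustration rather than the paper's posterior predictive calculation: if the estimated log relative risk is normally distributed around log(1 − VE) with standard error `se_log_rr`, the estimated efficacy is negative whenever the estimated relative risk exceeds 1. The function name and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist


def prob_negative_efficacy_estimate(true_efficacy: float, se_log_rr: float) -> float:
    """Probability that the estimated efficacy 1 - RR_hat falls below zero.

    Assumes log(RR_hat) ~ Normal(log(1 - true_efficacy), se_log_rr**2).
    """
    if not 0.0 <= true_efficacy < 1.0:
        raise ValueError("true_efficacy must lie in [0, 1)")
    if se_log_rr <= 0.0:
        raise ValueError("se_log_rr must be positive")
    log_rr = math.log(1.0 - true_efficacy)
    # Negative efficacy estimate  <=>  RR_hat > 1  <=>  log(RR_hat) > 0.
    return 1.0 - NormalDist(mu=log_rr, sigma=se_log_rr).cdf(0.0)


# Hypothetical: a true indirect efficacy of 5% estimated with se 0.5 on the log scale.
p_wrong_sign = prob_negative_efficacy_estimate(0.05, 0.5)
```

In this hypothetical case the probability of a negative estimate is about 0.46, barely better than a coin toss. Because the true effect is so small, only a very large reduction in the standard error, and hence a very large trial, moves this probability substantially below 0.5.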
Total efficacy
The magnitude of indirect protection for both models is constrained considerably by the removal of infectious animals as soon as they become DIVA-positive and by the extrinsic rate of infection, which captures the risk from both animal movements and the unobserved environmental reservoir. These within-herd models assume that vaccination has no impact on the reservoir of infection, hence the small magnitude of the predicted indirect benefits of vaccination. As a consequence of these two factors, the total efficacy is estimated to be approximately half the direct efficacy, and the number of herds required to power a trial based on total efficacy is correspondingly larger. Both models suggest that an 80% power of estimating a positive total efficacy would require $>300$ herds, even for a true direct effect of vaccination on susceptibility of ${\epsilon}_{S}=90\%$ (Figure 2, Figure 2—figure supplement 2).
Systematic bias in estimates of vaccine efficacy through relative risk measures
The underestimate of the (instantaneous) efficacy of vaccination (${\epsilon}_{S}$, ${\epsilon}_{I}$) through (cumulative) relative risk measures is the natural consequence of the limited duration of immunity and of the dynamics of transmission within herds. To explore how this systematic underestimate depends on trial duration and design, we examine the posterior predictive distributions for the within-herd prevalence of infection (Figure 3). We define within-herd prevalence as the proportion of the total at-risk population during a trial that is found to be either test-positive or culture-confirmed at slaughter.
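One generic contributor to this gap between instantaneous and cumulative measures can be isolated in a few lines. The sketch below assumes, purely for illustration, a constant proportional reduction ε_S in the hazard of infection with no waning and no transmission dynamics, so it omits the limited duration of immunity that the models include. Even so, the ratio of cumulative attack rates drifts towards 1 as the cumulative force of infection grows, so the measured efficacy falls below ε_S. The function name is hypothetical.

```python
import math


def cumulative_risk_efficacy(eps_s: float, cumulative_hazard: float) -> float:
    """Efficacy measured as 1 - (ratio of cumulative attack rates).

    Unvaccinated attack rate: 1 - exp(-H); vaccinated: 1 - exp(-(1 - eps_s) * H),
    where H is the cumulative force of infection over the trial.
    """
    if not 0.0 <= eps_s <= 1.0:
        raise ValueError("eps_s must lie in [0, 1]")
    if cumulative_hazard <= 0.0:
        raise ValueError("cumulative_hazard must be positive")
    ar_unvacc = -math.expm1(-cumulative_hazard)                 # 1 - exp(-H), accurate for small H
    ar_vacc = -math.expm1(-(1.0 - eps_s) * cumulative_hazard)
    return 1.0 - ar_vacc / ar_unvacc


for h in (1e-6, 1.0, 3.0):
    print(h, round(cumulative_risk_efficacy(0.9, h), 3))
```

For ε_S = 0.9 the measured efficacy is essentially 0.9 when the cumulative hazard is tiny, about 0.85 when it is 1 and about 0.73 when it is 3. Under this mechanism the herds with high attack rates, which carry most of the statistical information, are also the ones in which the cumulative risk ratio is most attenuated.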
These distributions reveal a high-frequency mode of TB incidents with a single reactor (or very few reactors), even for trial durations extending up to 9 years. This right-skewed distribution of within-herd prevalence is consistent with the distribution of reactor animals seen within UK herds, where fewer than half of bTB breakdowns have more than one reactor animal disclosed.
The consequence of this low predicted attack rate in the majority of trial herds is a systematic underestimate of vaccine efficacy through relative risk measures. The power to discriminate between the attack rates in vaccinated and unvaccinated animals rests almost entirely with the relatively few herds that experience a high attack rate (Figure 3). The origins of this variability are multifactorial, including systematic differences in the within-herd reproduction ratio resulting from the demographic structure of herds, parametric uncertainty, and variability in the extrinsic (environmental) risk of infection between herds, even within the same risk areas.
Herd size and the residence time of animals within a herd could in theory be used to target herds with a greater potential for transmission. However, the practicality of such targeting is limited by the relative infrequency of such herds, the necessity that participation in any trial would be voluntary and the additional requirement from the EU/EFSA that the study population for field evaluation should be representative of European production systems (EFSA, 2013). Targeting herds with a greater environmental risk of infection is impractical due to the lack of useful data or robust methodology to quantify these risks.
Perhaps the most natural step to increase the risk of transmission would be to retain, rather than cull, test-positive animals for the duration of any trial. However, such action has been ruled out by policy makers because of the legal and ethical issues of leaving animals known to be infected in herds, where they may also pose a risk of transmission to farm workers or researchers.
Nonetheless, it is important to consider what effect this might have on the likely success of field trials. To this end, we explore the effect that retaining reactor animals has on the posterior predictive distribution for within-herd prevalence for trial durations of 3, 6 and 9 years (Figure 3, Figure 3—figure supplement 1). For a 3-year trial, this option, unpalatable to policy makers, would make no difference to the predicted attack rate in unvaccinated control herds because of the relatively low cattle-to-cattle transmission rates and long generation time of bTB in cattle (Figure 3, panel A). Even for impractically long trials of up to 9 years, retaining reactors only serves to thicken the long tail of herds that experience a relatively high rate of transmission (Figure 3, panel A).
Herd-level effectiveness of vaccination
To maintain consistency with our design for efficacy, we consider the predicted impact of vaccination for all three of our herd level measures at a target vaccination coverage of 50% and compare with an alternative design with 100% whole herd vaccination.
For all three herd-level measures, the impacts of vaccination predicted by the SORI model at a baseline efficacy of ${\epsilon}_{S}=90\%$ (corresponding to a predicted effective direct efficacy of vaccination of ~75%) are modest and variable, with an average improvement of between 10 and 20% for whole-herd vaccination, halving to between 5 and 10% for a target coverage of 50% (Figures 4, 5 and 6). As with the measures of vaccine efficacy, the predictive distributions for persistence measures manifest considerable variability, with a substantial probability of observing a negative effect of vaccination even when a true protective effect exists (Figure 4—figure supplement 1, Figure 5—figure supplement 1, Figure 6—figure supplement 1).
As a consequence of this variation, achieving the target statistical power of 80% would require a study population of at least 500 herds for whole-herd vaccination and in excess of 2000 herds (the upper limit considered) for a target coverage of 50%. The SOR model predicts a similar but more variable effect size (Figure 4—figure supplement 2, Figure 5—figure supplement 2, Figure 6—figure supplement 2) that is more sensitive to the effect of vaccination on infectiousness (Figure 5—figure supplement 3, Figure 6—figure supplement 3).
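The scaling behind these herd numbers can be seen in the classical sample-size formula for comparing two proportions. The sketch below is a textbook approximation, not the simulation-based calculation used here: it treats each herd outcome (for example, a recurrence) as an independent Bernoulli trial and so omits the parametric uncertainty and between-herd heterogeneity captured by the posterior predictive distributions. The function name and the control proportion in the example are hypothetical.

```python
import math
from statistics import NormalDist


def herds_per_arm(p_ctrl: float, relative_reduction: float,
                  alpha: float = 0.025, power: float = 0.8) -> int:
    """Herds per arm to detect a lower proportion of herds with an adverse outcome.

    Normal-approximation formula for two independent proportions with a
    one-sided test at level alpha.
    """
    if not 0.0 < p_ctrl < 1.0:
        raise ValueError("p_ctrl must lie strictly between 0 and 1")
    if not 0.0 < relative_reduction <= 1.0:
        raise ValueError("relative_reduction must lie in (0, 1]")
    p_vacc = p_ctrl * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_ctrl * (1.0 - p_ctrl) + p_vacc * (1.0 - p_vacc)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_ctrl - p_vacc) ** 2)


# Hypothetical control proportion of 0.5 with 20% and 10% relative reductions.
n_20 = herds_per_arm(0.5, 0.2)
n_10 = herds_per_arm(0.5, 0.1)
```

With a hypothetical control proportion of 0.5, a 20% relative reduction needs 385 herds per arm, while halving the reduction to 10% raises this to about 1,560 herds per arm. The required sample size scales roughly with the inverse square of the effect, which is why the halved effect at 50% coverage inflates trial size so sharply.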
Discussion
We have used within-herd transmission models, with parameter distributions estimated from field data in Great Britain, to calculate indicative sample sizes for field trials of cattle vaccination with BCG as a supplement to an ongoing test-and-slaughter program for bTB.
Our models suggest that evaluation of the direct protective effect of BCG in the field would be viable in the UK. A three-year trial with 100 herds should provide 80% power to estimate an individual protective efficacy of at least 30%. The scale of such a trial is affected by the requirement that test-positive animals are removed from trial herds, but is driven primarily by the heterogeneity in within-herd prevalence of bTB in Great Britain.
At the most basic level, demonstrating the efficacy of a vaccine depends on achieving sufficient exposure of vaccinated and control animals. This is a fundamental challenge for managed cattle herds in Great Britain, where high attack rates are limited to a very small proportion of affected herds. The relative rarity of these herds and their dependence on (unmeasurable) confounding factors, such as the environmental risk of infection, make targeting this subpopulation of herds impractical and bias estimates of vaccine efficacy through relative risk ratios.
This distribution of disease has even greater implications for the potential of field trials to measure the indirect efficacy of vaccination on transmission. For all of the conceptual trial designs and model scenarios considered in this paper, the indirect efficacy estimated from relative risk ratios would be essentially zero, with a high probability of estimating a negative efficacy in underpowered trials even when a considerable individual-level reduction in infectiousness exists. Given the slow timescale of bTB transmission, even the controversial step of retaining test-positive (reactor) animals within trial herds would not reduce the risk of a trial failing.
Should BCG be licensed for use in cattle, at least in the UK, vaccination will be at the discretion of individual farmers who will be expected to bear the costs of vaccination. In the UK, the major economic costs for farmers accrue with respect to the frequency of testing and the period of time under restrictions. The individual efficacy of vaccination is therefore of far less interest to farmers than the herd level effects in terms of the impact of the surveillance and testing regime on their business (Bennett and Balcombe, 2012).
Our models predict relatively modest improvements for farmers who choose to vaccinate, with at most a 15% predicted reduction in the risk of a recurrent or prolonged breakdown. Part of the reason for this modest estimated effectiveness of vaccination in these model scenarios is that unvaccinated trial herds also share the likely benefits of the prospective DIVA test: the limited data available from challenge studies suggest that DIVA tests will have a higher sensitivity than tuberculin testing (Conlan et al., 2015). The overall benefit of vaccination and DIVA testing together, relative to current tuberculin testing, would therefore be expected to be larger than the effect of vaccination alone.
Another factor likely to limit the effectiveness of vaccination in our models is the constant extrinsic rate of infection, estimated from data, which is unaffected by the level of infection within the herd. This is a pragmatic modelling assumption, made because no information is routinely collected on the burden of disease within environmental and wildlife reservoirs. More complex dynamic models of the reservoir could be, and have been, constructed in national level models (Brooks-Pollock et al., 2014). However, no model can overcome our basic data gap concerning the balance of transmission between cattle and wildlife populations, which will ultimately determine the long-term outcome of vaccination (Brooks-Pollock and Wood, 2015). The appropriateness of our assumption of a constant reservoir will depend on the extent to which the rate of extrinsic infection into herds varies over the course of a simulation. For the purposes of trial evaluation, this should be considered a worst-case scenario, as vaccination will have no direct impact on reducing the infection risk from the static reservoir. However, over the (relatively) short timescale of a trial, we believe this will be a reasonable approximation. The extent to which the impact of vaccination will be greater over longer timeframes depends critically on the relative rates of infection to and from the environmental reservoir (Woodroffe et al., 2016) and between species (Brooks-Pollock and Wood, 2015), the magnitudes of which are highly uncertain and difficult to quantify.
A consequence of this modest predicted benefit of vaccination is that herd level effectiveness would be exceptionally difficult to estimate from partially vaccinated herds, requiring a sample size in excess of 2000 herds. This highlights once more the devastating impact on the necessary scale of trials of including partially vaccinated herds in the design, as is required to estimate the indirect effect of vaccination. The number of herds required could be reduced by a three-arm design that includes fully vaccinated, partially vaccinated and unvaccinated control herds. However, such a design would still require of the order of 500 fully vaccinated herds and controls, compared with 100 to evaluate the direct protection, and would still carry a high risk of failing to provide actionable information on the impact of vaccination on transmission.
On advice from Defra, and informed by the results of this paper, the Triveritas consortium proposed an alternative to a three-arm design: a phased series of trials that would first evaluate vaccine efficacy and then proceed to larger-scale trials to quantify herd level effectiveness (Triveritas, 2014). Such an approach to mitigating risk is implicit in the established standards for evaluation of human vaccines, a comparison that warrants further discussion.
For human vaccines, evaluation of the population level effectiveness and indirect protection of a vaccine is typically reserved for Phase IV trials, carried out after the licensing and deployment of a vaccine at scale. In this light, the EU requirement that the impact of BCG on rates of transmission be demonstrated before a vaccine can be licensed is notable. Although unusual, this requirement for cattle vaccination against bTB is motivated by important biological reasons. The use of an ineffective vaccine in combination with a less sensitive DIVA test could have the perverse consequence of increasing the rate of silent transmission of infection. This important question must be addressed before the widespread deployment of BCG, but we would argue that field trials are not the most effective way to achieve this.
In Appendix 2, we illustrate that a natural transmission experiment involving as few as 200 animals over a two-year period could provide greater power to estimate not only the efficacy of BCG but also its mode of action, in terms of its impact on susceptibility to infection and on infectiousness. Equivalent to a Phase II trial of a human vaccine, a successful experimental transmission study could provide the confidence to proceed with field evaluation of the efficacy and effectiveness of vaccination without the need to compromise the power of trials by including partially vaccinated herds.
Our calculated sample sizes for natural transmission studies presented in Appendix 2 depend on estimates of transmission rates from field data, where transmission rates scale with herd size (discussed in Appendix 3). The validity of density-dependent scaling of transmission rates in small group settings is debatable, as the empirical relationship may be a consequence of husbandry factors that correlate with herd size rather than a true dependence on group size. For this reason, we suspect that field estimates may underestimate the transmission potential in small groups, which would allow a shorter contact time.
Nonetheless, the experience of previous transmission experiments with reactor animals from Great Britain (Khatri et al., 2012) would caution against committing to a large, and potentially expensive, natural transmission study in the absence of more encouraging pilot data. Our proposed design recommends a group size of 52 animals and a contact period of 1 year, in line with the more optimistic model scenarios. Endemically infected countries where the feasibility of natural transmission models has already been demonstrated (Ameni et al., 2010) are more promising locations for such experiments than Great Britain. However, the two-phase design of our natural transmission trial allows for a stop/go point, where Phase I can be continued for an additional year if insufficient transmission is seen among the unvaccinated control animals. In this way, a trial could still provide key information on the direct efficacy of vaccination, even if low rates of transmission rule out evaluation of the indirect effects.
Experimental trials for vaccine efficacy have the advantage (and disadvantage) that extrinsic sources of infection from wildlife and the environment can be eliminated and controlled for. Such experimental designs would provide more precise information on the efficacy and mode of action of vaccination, for predicting its potential impact, than could realistically be achieved in a field setting. However, they would not satisfy the current EC requirement, and EFSA recommendation, that trials should be carried out under European production conditions (EFSA, 2013), nor would they convince farmers of the practicality of cattle vaccination alongside an unmanaged wildlife reservoir. Natural transmission studies should therefore be considered as an initial screening step for any prospective vaccine before larger, more expensive and riskier trials in the field. Such field trials could (or should) be designed using models of transmission in the cattle-wildlife system, parameterised with, among other sources, estimates from these transmission studies.
From challenge data, we already know that BCG has the potential to provide a protective benefit to cattle. However, our results highlight the enormous scale of trials that would be necessary to evaluate BCG alongside continuing testing in the field. The scale of such trials could be dramatically reduced by addressing the mode of action of vaccination through smaller scale natural transmission studies.
Based on our current knowledge of the likely efficacy of BCG, our models do not predict a substantial benefit of vaccination at the herd level when used as a supplement to ongoing test-and-slaughter. Indeed, the primary benefits predicted by our model come from the likely increase in diagnostic sensitivity provided by a replacement DIVA test rather than from vaccination itself. The format of the tuberculin skin test used in Great Britain, the Single Intradermal Comparative Cervical Tuberculin (SICCT) test, prioritises diagnostic specificity over sensitivity. This contrasts with countries that have successfully achieved TB-free status using the more sensitive Single Intradermal Test (SIT). Although not the primary focus of our study, our results reinforce the benefits for the management of bTB that would come from routine use of a more sensitive and equally specific test. Likewise, our results highlight that ruling out the use of vaccination as a replacement for, rather than a supplement to, test-and-slaughter will inevitably limit its effectiveness and perceived benefits for farmers. Reconsidering this policy option would revolutionise the economic case for the deployment of an effective vaccine, not only in Great Britain but also in developing countries that cannot afford to adopt expensive test-and-slaughter programmes.
Materials and methods
Simulation protocols for field trial designs
For each vaccination scenario, defined by a unique level of vaccination coverage and assumed efficacy of vaccination, we simulate 5000 trials using a sample of herds representative of the range of herd sizes and demography seen in Great Britain.
Model parameters for each simulation are sampled from approximate Bayesian posterior distributions estimated for the relevant model, as described in Appendix 1. Sensitivity to model parameters is thus implicit in our analysis, with simulations used to generate predictive posterior distributions for the statistical measure under consideration. We use the median value of these predictive distributions to quantify the expected effect size of vaccination and the full distribution to estimate the statistical power for each measure of vaccine efficacy.
Sensitivity of our results to model structure is explored by comparing the two alternative within-herd transmission models (SOR and SORI) described in full in Appendix 1.
Simulations are initiated with no infection within herds, which then become infected at an extrinsic force of infection estimated from breakdown herds in high-incidence (historic annual testing) areas. Our simulated study population will therefore contain both affected (breakdown) and unaffected herds. Thus, estimated sample sizes correspond to the total number of herds that must be recruited rather than the number of breakdowns.
As the model is fitted to breakdown herds only, this background rate of infection should only be considered representative of herds with a past history of bTB. Herds with no previous history of bTB might be expected to experience a lower rate of challenge from outside the herd, which would increase the calculated sample sizes.
Power calculations for relative risk measures of vaccine efficacy
Relative risk measures of vaccine efficacy compare the attack rate in unvaccinated and vaccinated groups within a defined population, as illustrated in Figure 1. The attack rate within each group is calculated as the number of cases divided by the total at-risk population. For our purposes, the at-risk population is defined as the total population of animals removed from herds over the duration of a trial, and cases can be either culture-confirmed test-positive animals or TB-lesioned animals found at routine slaughter.
Direct Efficacy compares the attack rate in vaccinated animals ($AR_V$) with that in unvaccinated control animals ($AR_U$) and is calculated as:
$$\text{Direct Efficacy} = 1 - \frac{AR_V}{AR_U}$$
where $AR_V$ and $AR_U$ are calculated for each scenario using 10,000 independent samples from a pool of 5000 model simulations, as described above.
Indirect Efficacy can only be measured within designs with whole-herd controls and vaccinated herds with a target vaccination coverage of < 100%. Indirect Efficacy compares the attack rate in unvaccinated animals within a vaccinated herd ($AR_{U_V}$) with that in unvaccinated control herds ($AR_U$) and is calculated as:
$$\text{Indirect Efficacy} = 1 - \frac{AR_{U_V}}{AR_U}$$
Total Efficacy can also only be measured within designs with whole-herd controls. It compares the attack rate in all animals on a partially vaccinated herd ($AR$) with that in unvaccinated control herds ($AR_U$) and is calculated as:
$$\text{Total Efficacy} = 1 - \frac{AR}{AR_U}$$
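The three relative-risk measures share one form. As a minimal sketch, assuming the standard definition efficacy = 1 - (attack rate in the group of interest) / (attack rate in unvaccinated controls), they can be computed from case and at-risk counts as below. The function names and all counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def attack_rate(cases: int, at_risk: int) -> float:
    """Number of cases divided by the total at-risk population."""
    return cases / at_risk


def efficacy(ar_group: float, ar_control: float) -> float:
    """Relative-risk efficacy measure: 1 - AR_group / AR_control."""
    if ar_control <= 0:
        raise ValueError("control attack rate must be positive")
    return 1.0 - ar_group / ar_control


# Invented counts, for illustration only (not model output).
ar_u = attack_rate(10, 1000)    # animals in unvaccinated control herds
ar_v = attack_rate(6, 1000)     # vaccinated animals
ar_u_v = attack_rate(9, 1000)   # unvaccinated animals in vaccinated herds
ar_all = attack_rate(15, 2000)  # all animals in partially vaccinated herds

direct = efficacy(ar_v, ar_u)
indirect = efficacy(ar_u_v, ar_u)
total = efficacy(ar_all, ar_u)
print(f"direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
# prints: direct=0.40 indirect=0.10 total=0.25
```

A negative value from `efficacy` simply means the attack rate in the group of interest exceeded that in the controls, which is how underpowered trials can return a negative efficacy estimate.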
Power calculations for relative risk measures of vaccine efficacy for field trial designs
We base our power calculations for field trial designs on a classical hypothesis test on the relative risk of infection (RR) in vaccinated compared with unvaccinated animals (Kirkwood and Sterne, 2003). We test against a null hypothesis of no difference between the two populations ($RR = 1$). To account for the high probability of estimating a negative efficacy even when a protective efficacy exists, we use a one-sided test with alternative hypothesis $H_1: RR < 1$. The hypothesis test takes the form of a z-test with $z = \log(RR)/\mathrm{s.e.}(\log(RR))$, where the standard error of $\log(RR)$ is calculated using the standard result based on the numbers of cases and at-risk animals in the vaccinated and unvaccinated groups. Power is then estimated from the empirical distribution of RR generated by sampling 10,000 independent outcomes from our pool of 5000 model simulations for each scenario. For each sample, we calculate the z-statistic as described above and compare it with the critical value ($z_{cr}$) defining the 95% level ($p = 0.025$ for a one-sided test). The power, defined as the probability of observing a significantly protective effect when it exists, is then calculated as the proportion of samples where $z < z_{cr}$.
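To make the power calculation concrete, the sketch below computes the one-sided z-statistic for log(RR) with the usual large-sample standard error and estimates power as the share of simulated trials with z below the critical value. It is illustrative only: the pool of transmission-model simulations is replaced by a hypothetical stand-in (independent binomial draws of cases in each arm with assumed attack rates), and trials with zero cases in either arm are counted as non-significant, a conservative choice of ours rather than something specified in the paper.

```python
import math
import random
from statistics import NormalDist


def log_rr_z(cases_v: int, n_v: int, cases_u: int, n_u: int) -> float:
    """z = log(RR) / s.e.(log RR), with the large-sample standard error
    s.e.(log RR) = sqrt(1/cases_v - 1/n_v + 1/cases_u - 1/n_u)."""
    rr = (cases_v / n_v) / (cases_u / n_u)
    se = math.sqrt(1 / cases_v - 1 / n_v + 1 / cases_u - 1 / n_u)
    if se == 0:
        return 0.0
    return math.log(rr) / se


def simulated_power(ar_v: float, ar_u: float, n_per_arm: int,
                    n_sims: int = 500, alpha: float = 0.025, seed: int = 1) -> float:
    """Share of simulated trials with z < z_cr (one-sided test, H1: RR < 1).

    Each 'trial' is a pair of independent binomial draws, a hypothetical
    stand-in for sampling outcomes from a pool of transmission-model runs.
    """
    rng = random.Random(seed)
    z_cr = NormalDist().inv_cdf(alpha)  # about -1.96
    significant = 0
    for _ in range(n_sims):
        a = sum(rng.random() < ar_v for _ in range(n_per_arm))
        c = sum(rng.random() < ar_u for _ in range(n_per_arm))
        if a == 0 or c == 0:
            continue  # log(RR) undefined: conservatively counted as not significant
        if log_rr_z(a, n_per_arm, c, n_per_arm) < z_cr:
            significant += 1
    return significant / n_sims


print(simulated_power(ar_v=0.01, ar_u=0.02, n_per_arm=2000))
```

Under the null (equal attack rates) the returned share should sit near the nominal 2.5%, which is a quick sanity check on the test direction.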
Power calculations for herd level measures of vaccine effectiveness
The effectiveness of vaccination at the herd level can be quantified in terms of the risk of breakdown (herd level incidence), the duration of breakdowns and the probability of recurrence. Note that, owing to differences in the scheduling of testing during the proposed trials, these measures are not directly comparable to those previously used to quantify within-herd persistence under the current statutory testing regime. Quantifying these herd level measures requires a design with both vaccinated and unvaccinated herds subject to the same (DIVA) testing protocol.
We consider three complementary measures of the potential effectiveness of cattle vaccination:
Herd level incidence
The proportion of study herds that have a breakdown over the fixed time horizon of the simulation (3 years unless otherwise stated).
Prolonged breakdowns
The proportion of herds that require more than 1 (DIVA) test in addition to the disclosing test to clear restrictions.
Recurrent breakdowns
The proportion of breakdowns that recur within the fixed time horizon of the simulation (3 years unless otherwise stated).
All these herd level effects are defined in terms of probabilities or proportions. We can therefore estimate statistical power for these measures using a hypothesis test on the difference $d$ between two proportions (Kirkwood and Sterne, 2003). We test against a null hypothesis of no difference between the two proportions ($d = 0$). To account for the high probability of estimating a negative efficacy even when a protective efficacy exists, we use a one-sided test with alternative hypothesis $H_1: d > 0$. The hypothesis test takes the form of a z-test with $z = d/\mathrm{s.e.}(d)$, where the standard error of the difference is calculated using the standard result based on the numbers of events and the numbers at risk in the vaccinated and unvaccinated groups. Power is then calculated from the empirical distribution of $d$ generated by sampling a given number of herds from a pool of 10,000 model simulations. For each sample, we calculate the z-statistic as described above and compare it with the critical value ($z_{cr}$) defining the 95% level ($p = 0.025$ for a one-sided test). The power, defined as the probability of observing a significantly protective effect when it exists, is then calculated as the proportion of samples where $z > z_{cr}$.
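A corresponding sketch for the herd level measures is given below. It assumes that d is defined as the proportion in unvaccinated herds minus the proportion in vaccinated herds (so that a protective effect gives d > 0 and z above about 1.96), and it uses the pooled standard error commonly used when testing two proportions. Herd outcomes are drawn as independent Bernoulli trials with hypothetical probabilities, standing in for the pool of model simulations; the function names are ours.

```python
import math
import random
from statistics import NormalDist


def diff_z(events_u: int, n_u: int, events_v: int, n_v: int) -> float:
    """z = d / s.e.(d) with d = p_u - p_v and a pooled standard error."""
    p_u, p_v = events_u / n_u, events_v / n_v
    pooled = (events_u + events_v) / (n_u + n_v)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_u + 1 / n_v))
    return 0.0 if se == 0 else (p_u - p_v) / se


def herd_power(p_u: float, p_v: float, herds_per_arm: int,
               n_sims: int = 1000, alpha: float = 0.025, seed: int = 1) -> float:
    """Share of simulated trials with z > z_cr (one-sided test, H1: d > 0).

    p_u, p_v: probability that an unvaccinated / vaccinated herd shows the
    outcome (e.g. a breakdown within the time horizon); illustrative inputs.
    """
    rng = random.Random(seed)
    z_cr = NormalDist().inv_cdf(1 - alpha)  # about +1.96
    significant = 0
    for _ in range(n_sims):
        e_u = sum(rng.random() < p_u for _ in range(herds_per_arm))
        e_v = sum(rng.random() < p_v for _ in range(herds_per_arm))
        if diff_z(e_u, herds_per_arm, e_v, herds_per_arm) > z_cr:
            significant += 1
    return significant / n_sims


# Hypothetical example: 20% vs 17% of herds with a breakdown, 500 herds per arm.
print(herd_power(0.20, 0.17, 500))
```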
Appendix 1
Within-herd models for transmission of bovine Tuberculosis
To explore the likely sample sizes required for field evaluation of BCG, we use individual-based models previously developed to explore the potential deployment of cattle vaccination and reported in Conlan et al. (2015). Briefly, these herd level models were designed to estimate the effectiveness of testing at clearing infection from herds, set against empirically estimated rates of within-herd transmission and extrinsic introduction of disease from cattle movements and environmental reservoirs. As such, the complexity of the models reflects a trade-off between achieving a realistic level of biological complexity and the potential to infer model parameters directly from epidemiological data. Full details of the formulation and estimation of these models have already been published (Conlan et al., 2015); we describe the structure and estimated parameter distributions of these models below. Full source code for the model implementation and scripts used to generate all of the figures in this paper are available from a publicly accessible git repository: https://bitbucket.org/MonkeyMyshkin/bcgtrials.git (Conlan, 2018; copy archived at https://github.com/elifesciences-publications/bcgtrials).
Modelling latency of bovine Tuberculosis
A particular challenge with modelling the transmission of bovine Tuberculosis is the uncertainty surrounding the rate of progression from infection to infectiousness and the relationship between diagnostic status and infectiousness. The traditional view of bTB progression in cattle is captured in the SORI compartmental model framework, where susceptible animals (S) progress through a series of latent classes in which they are first undetectable (occult, O) and then detectable (reactive to the tuberculin skin test, R) before finally becoming infectious (I). However, we have found that, at least at the within-herd level, the epidemiological patterns of transmission are equally well described by a simpler SOR model in which all infected animals (O, R) are potentially infectious but transmit at a lower average rate. For the purpose of this study, these two models provide plausible upper and lower transmission scenarios as an additional sensitivity analysis to assess the robustness of the power calculations for trial designs.
The SORI and SOR models, with extensions to model the action of vaccination, are implemented as stochastic continuous-time Markov processes with parameters detailed in Appendix 1—table 1 and defined by the events and transitions in Appendix 1—table 2. The two models are distinguished by the reactive compartments ($R, R_V$) being absorbing states for the SOR model, with $1/T_V = 0$, and by the form of the force of infection $\lambda(a,t)$:
Transmission within herds increases nonlinearly with the size of the herd through the (estimated) parameter $q$, and susceptibility is assumed to vary with age according to an independently estimated relative risk of infection ($RR(a)$, Appendix 1—table 3). Herds are treated as independent, coupled to a constant extrinsic reservoir of infection $\chi$ that varies ($\chi_1, \chi_2, \chi_4$) according to the risk area in which the herd is located (the historical Parish Testing Interval (PTI) 1, 2 and 4).
Demography of herds
The study population used to estimate these models includes a representative sample of herd sizes and management models of herds from Great Britain with a past history of bTB breakdowns. For each herd-level simulation, the size of the herd is fixed, with the rates of on- and off-movements of animals sampled from the cattle tracing system (CTS) to generate a realistic age structure and distribution of residence times for individual animals, as described in Conlan et al. (2015).
The demographic events of birth, death and movement off a herd are simulated as non-Markov processes. The time of each demographic event is sampled from an extract of the CTS database for each individual animal included in the model. During each step of the model, we simulate the time of the next Markov event ($t_{MARKOV}$) and compare it with the time of the next non-Markov (demographic) event ($t_{NON\_MARKOV}$). If $t_{NON\_MARKOV} < t_{MARKOV}$, we carry out the demographic event first and recalculate a new $t_{MARKOV}$. Otherwise, a Markov (epidemiological) event is simulated and the model time updated. For a complete specification of the simulation algorithm and handling of demographic events, see Conlan et al. (2015).
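The interleaving of Markov and scheduled demographic events can be sketched as a next-event loop. Because Markov waiting times are memoryless, discarding the pending $t_{MARKOV}$ after a demographic event and drawing a fresh one is valid. This is a minimal Python illustration under simplifying assumptions (a toy frequency-dependent infection process and pre-sampled off-movement times), not the authors' implementation; every name below is hypothetical.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Action = Callable[[State], None]


def run_hybrid(state: State,
               markov_events: Callable[[State], List[Tuple[float, Action]]],
               scheduled: List[Tuple[float, Action]],
               t_end: float,
               rng: random.Random) -> float:
    """Interleave memoryless (Markov) events with pre-scheduled demographic
    events up to t_end. Returns the time of the last event processed."""
    # Heap of (time, tie-breaker, action); the index avoids comparing functions.
    queue = [(t, i, act) for i, (t, act) in enumerate(scheduled)]
    heapq.heapify(queue)
    t = 0.0
    while True:
        events = markov_events(state)
        total = sum(rate for rate, _ in events)
        t_markov = t + rng.expovariate(total) if total > 0 else math.inf
        t_demog = queue[0][0] if queue else math.inf
        if t_demog < t_markov:
            if t_demog > t_end:
                break
            t, _, action = heapq.heappop(queue)
            action(state)  # demographic event first; t_markov is redrawn next pass
            continue
        if t_markov > t_end:
            break
        t = t_markov
        u = rng.random() * total  # pick a Markov event with probability rate/total
        for rate, action in events:
            u -= rate
            if u <= 0:
                action(state)
                break
        else:
            events[-1][1](state)  # guard against floating-point round-off
    return t


def infection_events(beta: float) -> Callable[[State], List[Tuple[float, Action]]]:
    """Toy infection S -> I at rate beta*S*I/N (hypothetical model)."""
    def events(s: State) -> List[Tuple[float, Action]]:
        n = s["S"] + s["I"]
        rate = beta * s["S"] * s["I"] / n if n else 0.0

        def infect(st: State) -> None:
            st["S"] -= 1
            st["I"] += 1
        return [(rate, infect)] if rate > 0 else []
    return events


def move_off_susceptible(st: State) -> None:
    """Toy demographic event: one susceptible animal leaves the herd."""
    if st["S"] > 0:
        st["S"] -= 1


herd = {"S": 9, "I": 1}
moves = [(30.0, move_off_susceptible), (90.0, move_off_susceptible)]
run_hybrid(herd, infection_events(0.01), moves, t_end=365.0, rng=random.Random(42))
print(herd)
```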
Model parameterisation
Distributions for key model parameters (Appendix 1—table 1) describing the current control regime were estimated using Approximate Bayesian Computation (ABC) based on statistical measures of herd-level incidence and persistence of bTB (Conlan et al., 2015). These approximate posterior distributions capture both the uncertainty in estimates of model parameters and the sensitivity of model predictions to the values of these parameters. Summary parameter estimates are presented in Appendix 1—table 4; full details and posterior predictive checks of these model estimates can be found in Conlan et al. (2015). Samples from the estimated ABC posterior distribution are provided as part of the source code for all results and figures in this paper, which is publicly available from the following git repository: https://bitbucket.org/MonkeyMyshkin/bcgtrials.git (Conlan, 2018; copy archived at https://github.com/elifesciences-publications/bcgtrials).
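For readers unfamiliar with ABC, the rejection form of the idea can be sketched in a few lines: draw parameters from the prior, simulate summary statistics, and keep the draws whose summaries fall close to the observed ones. The sketch below assumes a single hypothetical parameter (a yearly extrinsic breakdown rate `chi` with a flat prior), a toy model in which each herd breaks down within a year with probability 1 - exp(-chi), and one summary statistic (the proportion of herds with a breakdown). The published fits use several summary statistics and the full within-herd model, and may use a more efficient ABC scheme.

```python
import math
import random
import statistics
from typing import List


def simulate_incidence(chi: float, n_herds: int, rng: random.Random) -> float:
    """Toy model: each herd breaks down within a year with probability 1 - exp(-chi)."""
    p = 1.0 - math.exp(-chi)
    return sum(rng.random() < p for _ in range(n_herds)) / n_herds


def abc_rejection(observed: float, n_herds: int, n_draws: int,
                  tolerance: float, seed: int = 0) -> List[float]:
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted: List[float] = []
    for _ in range(n_draws):
        chi = rng.uniform(0.0, 1.0)  # hypothetical flat prior on the yearly rate
        if abs(simulate_incidence(chi, n_herds, rng) - observed) <= tolerance:
            accepted.append(chi)
    return accepted


posterior = abc_rejection(observed=0.18, n_herds=500, n_draws=3000, tolerance=0.01)
print(len(posterior), statistics.median(posterior))
```

In this toy case the accepted draws should cluster around -log(1 - 0.18), roughly 0.2, which is the value that reproduces the observed incidence exactly.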
Auxiliary parameters relating to the age-dependent risks of testing positive (Appendix 1—table 3), demonstrating evidence of visible lesions (Appendix 1—table 5) and testing positive to the SICCT test (Appendix 1—table 6) were independently estimated and fixed to the specified values. Parameters relating to the individual-level efficacy of vaccination (Appendix 1—table 2) were informed by a mixture of experimental and field data, as described in Conlan et al. (2015), and subjected to sensitivity analysis with respect to both the direct reduction in susceptibility ($\epsilon_S$) due to vaccination and the impact on infectiousness ($\epsilon_I$).
Testing and vaccination schedules for efficacy trials
To accommodate the practical requirements of field trials, herds are subjected to a simplified schedule of testing and surveillance based on the current regulatory regime. To allow trials to be blinded, tuberculin testing is assumed to be replaced by DIVA testing for all vaccinated and unvaccinated animals on trial herds (EFSA, 2013). Hence, we assume that $p_T = p_D$ and $p_{FP} = p_{FPD}$, and that the test characteristics do not change with the confirmation status of breakdowns.
Herds are assumed to be recruited from a high-incidence area (historical annual testing interval) and to have a past history of infection, but to be disease-free at the beginning of the trial. Simulated herds become infected at a rate determined by the extrinsic infectious pressure ($\chi$). Appendix 1—table 7 and Appendix 1—table 8 tabulate the predicted herd level incidence for the SORI and SOR models, respectively, for a three-year trial period and different combinations of vaccine coverage and efficacy.
Herds are revaccinated on an annual schedule, with the entire herd vaccinated on day 0. To reduce the frequency of researcher visits to herds during trials, imports and births into the herd are assigned as vaccinates or controls on entry to the herd, batched and vaccinated at 180-day intervals, and then revaccinated according to the herd schedule. Therefore, even for a 100% target vaccination coverage, the instantaneous level of vaccination coverage will change dynamically over a simulation. The variability in coverage over time will depend on the demography of the herd, and in particular the rate of demographic turnover (i.e. moves on/moves off). This turnover of animals will limit the effectiveness of vaccination at the herd level compared with that which could be achieved under more controlled experimental conditions.
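One reading of the entry and revaccination rules is sketched below. It assumes that batch visits fall on fixed multiples of 180 days counted from day 0 and that whole-herd revaccination falls on multiples of 365 days; both are our assumptions, and the function name is hypothetical. Applied literally, the rules can leave an entrant vaccinated at a batch visit only days before the next whole-herd round, one way in which timing and coverage drift over a simulation.

```python
import math
from typing import List


def vaccination_days(entry_day: int, horizon: int,
                     batch_interval: int = 180, herd_cycle: int = 365) -> List[int]:
    """Days on which an animal entering on `entry_day` is vaccinated.

    Assumes batch visits fall on multiples of `batch_interval` counted from
    day 0, and whole-herd revaccination on multiples of `herd_cycle`.
    """
    if entry_day <= 0:
        first = 0  # present at the start: vaccinated with the whole herd
    else:
        first = math.ceil(entry_day / batch_interval) * batch_interval
    if first > horizon:
        return []
    days = [first]
    k = first // herd_cycle + 1  # next whole-herd revaccination after `first`
    while k * herd_cycle <= horizon:
        days.append(k * herd_cycle)
        k += 1
    return days


# A calf born on day 200 waits for the day-360 batch visit, then follows
# the herd's annual cycle.
print(vaccination_days(200, 3 * 365))  # prints [360, 365, 730, 1095]
```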
Breakdowns are triggered on trial herds by the failure of routinely scheduled six-monthly tests or by the detection of a lesioned animal at slaughter. Once infection is disclosed in a trial herd, it is subject to 60-day short interval DIVA testing. After a single clear DIVA test, short interval testing is suspended and follow-up tests at 6 and 12 months are scheduled. In contrast to the status quo in GB, where test intervals are variable and subject to veterinary discretion (Conlan et al., 2012), trial herds are assumed to be tested at these precisely specified intervals.
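The post-disclosure testing rules can be written as a small schedule generator. The sketch below is a hypothetical illustration: it assumes that the 6- and 12-month follow-up tests are timed from the first clear short-interval test and correspond to 182 and 365 days, and it applies the definition of a prolonged breakdown used for the herd level measures (more than one DIVA test after the disclosing test needed to clear restrictions).

```python
from typing import Iterable, List, Tuple


def breakdown_schedule(disclosure_day: int, sit_positive: Iterable[bool],
                       sit_interval: int = 60,
                       follow_ups: Tuple[int, ...] = (182, 365)) -> List[Tuple[str, int]]:
    """Test dates after a breakdown is disclosed.

    sit_positive: outcome of each successive short-interval DIVA test
    (True = at least one reactor). Testing continues every `sit_interval`
    days until the first clear test; follow-up tests are then scheduled
    `follow_ups` days after that clear test (assumed 182 and 365 days).
    """
    day = disclosure_day
    schedule: List[Tuple[str, int]] = []
    for positive in sit_positive:
        day += sit_interval
        schedule.append(("short-interval", day))
        if not positive:
            schedule += [("follow-up", day + d) for d in follow_ups]
            return schedule
    raise ValueError("no clear short-interval test: restrictions not lifted")


def is_prolonged(schedule: List[Tuple[str, int]]) -> bool:
    """More than one DIVA test, after the disclosing test, needed to clear restrictions."""
    return sum(kind == "short-interval" for kind, _ in schedule) > 1


s = breakdown_schedule(0, [True, False])
print(s, is_prolonged(s))
```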
DIVA test-positive animals are removed from herds and subject to slaughterhouse inspection. The probability of an animal having visible lesions is assumed to depend only on its infection status and age, following the empirical relationship between age and the probability of visible lesions being detected (Appendix 1—table 5) estimated by Brooks-Pollock et al. (2013) and previously used by Conlan et al. (2015).
Given that the intention of trials is to inform the likely benefit of vaccination within an ongoing schedule of surveillance testing, we assume that a validated DIVA test with a specificity of at least 99.85% and a sensitivity of at least 73.3% will be available, and that this is a necessary requirement for progressing to any field trials of cattle vaccination. These values correspond to the break-even scenario considered in our previous modelling study, where the costs of additional false positive DIVA results balance the expected benefits of a cattle vaccine with at least 60% protective efficacy for an average duration of protection of 1 year (Conlan et al., 2015). Data from animal challenge studies suggest that this break-even specificity of > 99.85% is achievable in vaccinates with an interferon-gamma DIVA sensitivity (relative to visible lesions) of 73.3% (95% CI: 61.9, 82.9%) (using ESAT-6, CFP-10 and Rv3615c antigens) (GJ Jones, M Vordermeier, personal communication).
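A back-of-the-envelope calculation shows why specificity carries so much weight in the break-even analysis. Assuming an uninfected herd and independent test errors across animals (simplifications of ours), a specificity of 99.85% gives an expected herd_size * 0.0015 false positives per whole-herd test and a probability of 1 - 0.9985^herd_size of at least one false positive. The herd sizes below are illustrative and the function names hypothetical.

```python
def expected_false_positives(herd_size: int, specificity: float) -> float:
    """Expected number of uninfected animals testing positive in one herd test."""
    return herd_size * (1.0 - specificity)


def p_any_false_positive(herd_size: int, specificity: float) -> float:
    """Probability of at least one false positive in one herd test,
    assuming independent test errors in an uninfected herd."""
    return 1.0 - specificity ** herd_size


for size in (50, 100, 200, 400):
    print(size,
          round(expected_false_positives(size, 0.9985), 3),
          round(p_any_false_positive(size, 0.9985), 3))
```

For a 100-animal herd this gives about 0.15 expected false positives and roughly a 14% chance of at least one per whole-herd test, before accounting for the repeated tests in the trial schedule.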
Appendix 2
Alternative natural transmission study design to evaluate vaccine efficacy
In this appendix, we carry out sample size calculations for an alternative natural transmission study design to evaluate the mode of action of BCG vaccination. A natural transmission study has the key advantage over field trials that test-positive animals can be retained throughout the experiment (Appendix 2—figure 1). We consider the following design:
Two experimental groups with 4N co-housed animals: 2N seeder reactor animals, N vaccinated sentinel animals and N unvaccinated sentinels.
60-day interval DIVA testing.
Retention of test-positive animals.
Two-phase design, with sentinel animals from Phase I used as seeder animals in Phase II.
The basic experimental unit of this design (Appendix 2—figure 1) is a group in which an equal number of (presumed infected) seeder animals ($S$) are co-housed with susceptible sentinel animals ($U, V$) that also differ with respect to their vaccination status (Vaccinated $V$, Unvaccinated $U$). The first experimental phase (Phase I) consists of two experimental replicates (Group 1 and Group 2) that essentially compare the relative rate of transmission of M. bovis to unvaccinated and vaccinated sentinel animals when challenged by naturally infected seeder animals ($S_R$) recruited from the field. Any reduction in the rate of transmission to the vaccinated group in Phase I will have contributions from the direct protection offered through a reduction in susceptibility and from the indirect effect of the reduced susceptibility and infectiousness of the sentinel animals.
Taken alone, and in common with past experimental estimates of the efficacy of BCG, the two experimental groups in Phase I can only estimate the direct efficacy of vaccination. Phase II provides information on the effect of vaccination on infectiousness by using the exposed sentinel animals from Phase I as seeder animals for a second round of transmission with fresh vaccinated ($V$) and unvaccinated ($U$) sentinels. This comparison is essential to separate the relative effects of vaccination on susceptibility and infectiousness. Note that we can distinguish the (indirect) effect of reduced infectiousness from the indirect effect of reduced susceptibility because we use the information on which animals are infected. In contrast to field trials, where the extrinsic force of infection acting on different herds is completely unobserved, the numbers of seeder and test-positive animals can be used as a first estimate to adjust for the change in the force of infection over the course of an experiment. Thus, the rate of transmission of bTB to sentinel animals in both vaccinated and unvaccinated groups can be estimated directly from the rate at which animals become DIVA positive, rather than relying on relative risk ratios based on the clinical endpoint. This is a key advantage, which opens up the use of more powerful methods of analysis such as survival or mechanistic chain-binomial models (Velthuis et al., 2007).
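As an illustration of the interval-based analysis, the sketch below fits a single transmission rate `beta` by maximum likelihood to interval data under a chain-binomial model. It assumes frequency-dependent transmission in which each susceptible animal is infected during an interval of length dt with probability 1 - exp(-beta * I * dt / N), with I taken as the number of seeder and test-positive animals. The grid search is a deliberately simple stand-in (a binomial GLM with a complementary log-log link is a common alternative), and the data layout, names and numbers are hypothetical.

```python
import math
from typing import List, Tuple

# One observation interval: (susceptible S, infectious I, group size N,
# new cases C, interval length dt in days).
Interval = Tuple[int, int, int, int, float]


def log_likelihood(beta: float, data: List[Interval]) -> float:
    """Chain-binomial log-likelihood, C ~ Binomial(S, 1 - exp(-beta*I*dt/N)).
    Binomial coefficients are omitted because they do not depend on beta."""
    ll = 0.0
    for s, i, n, c, dt in data:
        x = beta * i * dt / n  # infectious pressure on each susceptible
        if x == 0.0:
            if c > 0:
                return -math.inf  # cases without any infectious animals
            continue
        ll += c * math.log(-math.expm1(-x)) - (s - c) * x
    return ll


def estimate_beta(data: List[Interval], lo: float = -4.0, hi: float = 1.0,
                  steps: int = 5001) -> float:
    """Maximum-likelihood beta by grid search over log10(beta) in [lo, hi]."""
    grid = (10 ** (lo + (hi - lo) * k / (steps - 1)) for k in range(steps))
    return max(grid, key=lambda b: log_likelihood(b, data))


# Hypothetical 60-day intervals for one group of 52 animals.
unvaccinated = [(13, 26, 52, 2, 60.0), (11, 28, 52, 3, 60.0)]
vaccinated = [(13, 26, 52, 1, 60.0), (12, 27, 52, 1, 60.0)]
print(estimate_beta(unvaccinated), estimate_beta(vaccinated))
```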
Sample size calculations for natural transmission study to evaluate vaccine efficacy
For natural transmission studies of a chronic infection such as bTB, we can use the reproductive ratio R as a design parameter. R can be manipulated directly (at least within the bounds set by the bovine lifespan) by setting the in-contact time ($T_C$) of each experimental phase. The basic requirement for the design of a natural transmission study is that we see transmission in all experimental groups during both experimental phases. Thus, we must ensure that the reproductive ratio R is greater than one, and ideally large enough that sufficient seeder animals are generated in Phase I to be used in Phase II.
The sample size calculations to power the proposed transmission study are conceptually equivalent to the comparison of two treatments, for which the power can be readily estimated. The statistical comparison of two treatments with respect to transmission can be based on the final size (FS) of the two transmission chains in the different treatment groups, or on measurements of the numbers of infected and susceptible animals at the beginning of each interval and the number of cases in each interval (Velthuis et al., 2007a, 2007b). The latter methods have higher power, as the time series of transmission events provides more information than the final size alone. However, for the calculation of the sample size we want a method that gives a conservative estimate (too large rather than too small). Thus, the FS methods are more suitable for this purpose.
The power of experiments analysed by FS is approximately the same irrespective of how animals are distributed over groups, provided that, as in our designs, we aim for a 50:50 mix of seeder (infected) and sentinel animals in each transmission group (Velthuis et al., 2007a). Thus 20 pairs (each of one seeder and one sentinel) have similar power to one experiment with 40 animals (20 seeders and 20 sentinels). The statistical analysis with FS depends on finding the joint final size probability distribution for all the groups. For larger groups this is difficult, but for pairs it is straightforward. Each pair has only two possible outcomes: the recipient becomes infected or not. This depends on what happens with the infected animal: it either recovers first or infects the recipient first. The events that can happen are infection of a susceptible animal, at rate $\beta S I/N$, and recovery of an infected animal, at rate $\alpha I$.
The probability that the infection occurs first (between now and infinity, or in other words given that one of the events occurs) is (competing risks):
$$P_{\text{inf}} = \frac{\beta S I/N}{\beta S I/N + \alpha I}$$
which, assuming $I \neq 0$ and writing the basic reproduction ratio as $R = \frac{\beta}{\alpha}$, simplifies to:
$$P_{\text{inf}} = \frac{R S}{R S + N}$$
Thus, for a pairwise experiment with $S = 1$, $I = 1$ and $N = 2$:
$$P_{\text{inf}} = \frac{R}{R + 2}$$
The other possibility, that recovery occurs before the contact infection, has the complementary probability $1-p$. Thus each pairwise experiment is one Bernoulli trial with $p=\frac{R}{R+2}$.
Thus the number of infected sentinels in $n$ pairwise experiments follows a binomial distribution with this $p$ and $n$ trials.
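The competing-risks argument can be checked numerically. The sketch below assumes exponentially distributed waiting times for the two events (as in the continuous-time model above) and uses illustrative values of $\beta$ and $\alpha$ chosen by us; it simulates many independent seeder-sentinel pairs and compares the fraction of infected sentinels with $p=R/(R+2)$. The function names are hypothetical.

```python
import random


def pair_infection_probability(R: float) -> float:
    """Analytical probability that the sentinel in a seeder-sentinel pair is infected."""
    return R / (R + 2.0)


def infection_first(beta: float, alpha: float, rng: random.Random) -> bool:
    """One pairwise experiment as two competing exponential clocks.

    With S = 1, I = 1 and N = 2, infection fires at rate beta * S * I / N = beta / 2
    and recovery of the seeder at rate alpha * I = alpha.
    """
    S, I, N = 1, 1, 2
    t_infection = rng.expovariate(beta * S * I / N)
    t_recovery = rng.expovariate(alpha * I)
    return t_infection < t_recovery


def simulate_final_size(n_pairs: int, beta: float, alpha: float, seed: int = 1) -> int:
    """Number of infected sentinels among n independent pairs, i.e. one Binomial(n, p) draw."""
    rng = random.Random(seed)
    return sum(infection_first(beta, alpha, rng) for _ in range(n_pairs))


if __name__ == "__main__":
    beta, alpha = 1.5, 1.0  # illustrative rates only, giving R = beta / alpha = 1.5
    n = 100_000
    simulated = simulate_final_size(n, beta, alpha) / n
    print(f"simulated p = {simulated:.4f}, analytical p = {pair_infection_probability(beta / alpha):.4f}")
```

Because each pair is an independent Bernoulli trial, the simulated count of infected sentinels is itself a binomial draw, which is what the FS sample size calculation relies on.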
Sample size can be calculated from this information, either numerically using Fisher's exact test or analytically using an asymptotic normal approximation. When $n$ is large ($np>5$ and $n(1-p)>5$), the binomial distributions can be approximated by normal distributions, leading to the following expression for $n$, which in our case is the number of pairs:
$$n=2\left(p_1(1-p_1)+p_2(1-p_2)\right)\left(\frac{Z_\alpha+Z_\beta}{p_1-p_2}\right)^2$$
where $Z_\alpha$ and $Z_\beta$ are the critical values of the standard Normal distribution for the two types of error, e.g. $Z_\alpha=1.96$ (one-sided, error rate 2.5%) and $Z_\beta=0.84$ (power 80%), and $p_1=\frac{R_1}{R_1+2}$ and $p_2=\frac{R_2}{R_2+2}$ are based on the reproduction ratios $R_1$ and $R_2$ under the alternative hypothesis (i.e. when there is a difference).
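As a worked example of the normal approximation, the sketch below evaluates the expression for $n$ with critical values taken from Python's `statistics.NormalDist`. Purely for illustration, it assumes that vaccination scales the unvaccinated reproduction ratio $R_1$ by $(1-E)$ for a total efficacy $E$, i.e. $R_2=(1-E)R_1$; this mapping and the function names are our assumptions, and the output is not intended to reproduce the values tabulated in Appendix 2.

```python
import math
from statistics import NormalDist


def pair_probability(R: float) -> float:
    """Probability that the sentinel of a seeder-sentinel pair is infected, p = R / (R + 2)."""
    return R / (R + 2.0)


def sample_size_normal(R1: float, R2: float, alpha: float = 0.025, power: float = 0.80) -> int:
    """Evaluate n = 2 (p1(1-p1) + p2(1-p2)) ((Z_alpha + Z_beta) / (p1 - p2))^2, rounded up.

    alpha is the one-sided type I error rate; power = 1 - type II error rate.
    """
    p1, p2 = pair_probability(R1), pair_probability(R2)
    if p1 == p2:
        raise ValueError("R1 and R2 must differ for a finite sample size")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # about 1.96 for alpha = 0.025
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil(2.0 * variance_sum * ((z_alpha + z_beta) / (p1 - p2)) ** 2)


if __name__ == "__main__":
    R1 = 1.5  # unvaccinated experimental reproduction ratio (illustrative)
    for efficacy in (0.75, 0.50, 0.25):
        R2 = (1.0 - efficacy) * R1  # assumed effect of vaccination on R
        print(f"total efficacy {efficacy:.0%}: n = {sample_size_normal(R1, R2)}")
```

Smaller effect sizes shrink $p_1-p_2$, and because this difference enters squared in the denominator, the required $n$ grows rapidly as the efficacy falls.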
In order to proceed we require an estimate of the reproductive ratio ($R$) for bTB in an experimental transmission setting. We define this experimental reproductive ratio $R$ as the expected number of infections when a single infectious seeder animal is placed in contact with a completely susceptible population of size $H$ for an in-contact time $T_C$. This definition contrasts with the basic reproduction ratio ($R_0$), which for a chronic infection such as bTB would be the expected number of infections over the lifetime of a single infectious seeder animal.
Calculated sample size and experimental duration
For a balanced design with an equal number of seeder and in-contact animals, the sample size therefore depends on two parameters: the number of seeder-sentinel pairs (or equivalently the group size) and the average expected number of infections per seeder animal over the course of the experiment (the experimental reproduction ratio $R$).
For a chronic infection like bovine Tuberculosis, $R$ is essentially a design parameter that can be tuned by adjusting the duration of the in-contact period between seeder and sentinel animals. The statistical power to detect a significant difference between vaccinated and unvaccinated animals is relatively insensitive to the value of $R$, provided it is greater than the threshold value of 1. For $R$ between 1.5 and 2.0, a group size of 52 animals would provide 80% power to detect an effect size (total vaccine efficacy) of 75% at the 95% significance level (Appendix 2—table 1). Reducing the effect size to 50% and 25% would increase the required group size to 128 and 600 animals, respectively (Appendix 2—table 1).
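The numerical route via Fisher's exact test can also be sketched. Rather than simulating trials, the code below computes the exact power by enumerating every pair of outcomes $(x_1, x_2)$, the numbers of infected sentinels among $n$ unvaccinated and $n$ vaccinated pairs, weighting each by its binomial probability and summing over those for which a one-sided Fisher's exact test rejects at level $\alpha$. The one-sided alternative, the assumed mapping $R_2=(1-E)R_1$ and all names are illustrative assumptions, not the analysis behind the tabulated group sizes.

```python
from math import comb


def pair_probability(R: float) -> float:
    """Probability that the sentinel of a seeder-sentinel pair is infected, p = R / (R + 2)."""
    return R / (R + 2.0)


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def fisher_one_sided_p(x1: int, x2: int, n: int) -> float:
    """One-sided Fisher exact p-value for H1: p1 > p2 with n pairs per arm.

    Conditional on m = x1 + x2 infected sentinels in total, X1 is hypergeometric and
    the p-value is P(X1 >= x1). math.comb returns 0 when k > n, so impossible
    tables contribute nothing.
    """
    m = x1 + x2
    tail = sum(comb(n, k) * comb(n, m - k) for k in range(x1, min(m, n) + 1))
    return tail / comb(2 * n, m)


def exact_power(n: int, p1: float, p2: float, alpha: float = 0.025) -> float:
    """Probability that the one-sided Fisher test rejects, summed over all outcomes."""
    pmf1 = [binom_pmf(k, n, p1) for k in range(n + 1)]
    pmf2 = [binom_pmf(k, n, p2) for k in range(n + 1)]
    power = 0.0
    for x1 in range(n + 1):
        for x2 in range(n + 1):
            if fisher_one_sided_p(x1, x2, n) <= alpha:
                power += pmf1[x1] * pmf2[x2]
    return power


if __name__ == "__main__":
    R1, efficacy = 1.5, 0.75  # illustrative values
    p1, p2 = pair_probability(R1), pair_probability((1.0 - efficacy) * R1)
    for n in (20, 40, 60):
        print(f"{n} pairs per arm: power = {exact_power(n, p1, p2):.3f}")
```

Because Fisher's test conditions on the total number of infections, its type I error never exceeds $\alpha$, which makes this route more conservative than the normal approximation for small groups.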
Predicting the in-contact period ($T_C$) necessary to achieve this target value of $R\sim 1.5$ is challenging given the sparsity of quantitative estimates of bTB transmission rates in small groups. A natural transmission experiment was attempted at the Animal and Plant Health Agency (APHA) facilities at Weybridge using GB reactor animals. However, specific limitations of the animal housing and the lack of time-series data make these data difficult to interpret, and the posterior distributions for the estimated rate of transmission range over several orders of magnitude (see Appendix 3).
Robust estimates of transmission rates are almost exclusively derived from population-level field data, whose relevance to a controlled experimental transmission setting is questionable. There is considerable variability between published estimates of the rate of cattle-to-cattle transmission (a brief review is presented in Appendix 3). A further challenge for application to small groups is that field estimates show a clear increase in the rate of bTB transmission with herd size.
In Appendix 2—figure 2 we explore the impact of density dependence on the likely duration of natural transmission studies, based upon the experimental reproduction ratio $R$ predicted by our two most recent within-herd transmission models (Conlan et al., 2015). For an effect size equivalent to a 75% total efficacy of vaccination and a 1-year contact time, a group size of between 80 (SORI model) and 150 (SOR model) animals would be necessary to achieve 80% power at the 95% significance level. For a two-year contact time these sample sizes fall to between 50 and 100 animals for the SORI and SOR models, respectively. For a 1-year duration, sample sizes increase to between 200 and 350 animals for a 50% total efficacy (Appendix 2—figure 2–figure supplement 1) and exceed 800 animals for a 25% total efficacy of vaccination (Appendix 2—figure 2–figure supplement 2).
Appendix 3
Mini-review of estimated cattle-to-cattle transmission rates of bTB
The duration of a challenge study depends critically on our estimate of the likely rate of cattle-to-cattle transmission. A pilot natural transmission study with vaccinated and unvaccinated animals was carried out by the Animal and Plant Health Agency (APHA, formerly the Animal Health and Veterinary Laboratories Agency, AHVLA) at Weybridge in the UK. Unfortunately, for reasons discussed below, this study does not provide useful estimates of transmission rates. We therefore need to place these findings in the wider context of bTB transmission estimates from the literature to inform the range of transmission rates we might expect to achieve in new experimental designs.
Weybridge pilot study and other natural transmission experiments
A pilot natural transmission study was carried out by APHA in Weybridge using 40 reactor animals recruited from UK herds and 60 sentinel animals, giving a study population of 100. Animals had to be grouped in pens of 10 animals (6 sentinels and 4 reactor animals) due to the physical design of the barn (Khatri et al., 2012), which is likely to have limited the potential for transmission. Only 8 transmission events were observed after an in-contact period ($T_C$) of 12 months. With no time-series information, this final size distribution is the only information from which transmission rates can be inferred.
We estimated the (frequency-dependent) transmission parameter ($\beta$) from these final size data using an exact likelihood for the stochastic SI model, calculated by numerically solving the master equations (Allen 2003). The stochastic SI model has a single event with rate $\beta S\frac{I}{N}$, where $S$ is the number of susceptible sentinel animals, $I$ the number of infected animals and $N=S+I$ the size of the group.
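A final size likelihood of this kind can be sketched for a single pen. Under the stochastic SI model the only event is infection, so the number of new infections $k$ is a pure birth process with rate $\lambda_k=\beta(S_0-k)(I_0+k)/N$, and the master equations $\mathrm{d}P_k/\mathrm{d}t=\lambda_{k-1}P_{k-1}-\lambda_kP_k$ can be integrated numerically. The code below uses a fixed-step fourth-order Runge-Kutta scheme; we do not know which solver the authors used. The pen composition (6 sentinels, 4 reactors, 12 months) follows the text, while the values of $\beta$ and the per-pen counts in the usage example are made up and are not the Weybridge data.

```python
import math
from typing import List


def si_final_size_distribution(beta: float, S0: int, I0: int, T: float, steps: int = 2000) -> List[float]:
    """P[k] = probability of k new infections by time T in a closed pen (stochastic SI model).

    Frequency-dependent infection rate beta * S * I / N with S = S0 - k, I = I0 + k, N = S0 + I0.
    The master equations are integrated with a fixed-step RK4 scheme starting from P[0] = 1.
    """
    N = S0 + I0
    rates = [beta * (S0 - k) * (I0 + k) / N for k in range(S0 + 1)]  # rates[S0] == 0 (absorbing)

    def derivative(P: List[float]) -> List[float]:
        return [(rates[k - 1] * P[k - 1] if k > 0 else 0.0) - rates[k] * P[k] for k in range(S0 + 1)]

    P = [1.0] + [0.0] * S0
    dt = T / steps
    for _ in range(steps):
        k1 = derivative(P)
        k2 = derivative([p + 0.5 * dt * d for p, d in zip(P, k1)])
        k3 = derivative([p + 0.5 * dt * d for p, d in zip(P, k2)])
        k4 = derivative([p + dt * d for p, d in zip(P, k3)])
        P = [p + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d) for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P


def pens_log_likelihood(beta: float, infections_per_pen: List[int], S0: int = 6, I0: int = 4, T: float = 1.0) -> float:
    """Log-likelihood of per-pen final sizes, treating pens as independent."""
    P = si_final_size_distribution(beta, S0, I0, T)
    return sum(math.log(P[k]) for k in infections_per_pen)


if __name__ == "__main__":
    hypothetical_counts = [1, 0, 2, 0, 0, 0, 1, 0, 0, 1]  # made-up outcomes for 10 pens
    for beta in (0.05, 0.2, 1.0, 2.0):
        print(f"beta = {beta}: log-likelihood = {pens_log_likelihood(beta, hypothetical_counts):.3f}")
```

Since the master equations are linear and conserve total probability, the RK4 scheme keeps $\sum_k P_k = 1$ up to rounding, and $P_0(T)=e^{-\lambda_0 T}$ provides a simple check on the integration.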
The point (maximum likelihood) estimate of 2.24 × 10^{−5} per year from these data is exceptionally small compared with an estimated frequency-dependent transmission parameter of the order of 2 per year from field data (Fischer et al., 2005; Barlow et al., 1997). However, given the limited information in the final size distribution, the 95% credible interval of the posterior estimate (Appendix 3—figure 1) ranges over several orders of magnitude (95% CI: 1.4 × 10^{−14} to 5.97). The consistency of this estimated effective (i.e. frequency-dependent) rate of transmission with density-dependent estimates of bTB transmission from the field depends on what we interpret as the herd size for this population (Appendix 3—figure 1). The maximum likelihood estimate from the Weybridge study is more consistent with an effective group size of 10 (the size of the holding pens) than with a group size equivalent to the size of the barn (100 animals). While this suggests that rates of transmission may be higher in a more suitable facility that allows free mixing of animals, we must be cautious, given that the full posterior predictive distributions from field estimates lie within the posterior estimate of transmission from the Weybridge study.
Despite the lack of success of this experiment, there are further good reasons to expect that higher rates of transmission are achievable in designs that balance the number of infectious and susceptible contacts within each group (which as noted before was limited in the pilot experiment due to facility constraints). Indeed, experimental studies in endemically infected countries, such as Ethiopia, have achieved higher rates of transmission (Ameni et al., 2010). This success has since been repeated with nearly identical patterns of transmission to the published study (Ameni and Vordermeier, unpublished data). Unfortunately, rates of transmission have not yet been quantified empirically from these studies.
The imprecision of the transmission parameter estimate from Khatri et al. (2012) makes it unsuitable for defining a prior range of estimates with which to design future experimental transmission studies. We therefore look to field estimates of cattle-to-cattle transmission to provide a more appropriate basis to progress. This route requires careful discussion, as estimates vary considerably between different studies and populations. Furthermore, transmission rates in the field and in an experimental setting cannot necessarily be expected to be equivalent, owing to differences in the management, health and welfare of animals and other extrinsic factors acting on herds. However, in the absence of alternatives, field estimates of bTB transmission provide the only reasonable basis to proceed.
Published models of bovine Tuberculosis in cattle share a common basic structure but differ in two key respects: the assumed progression of infected animals to infectiousness (latency) and the scaling of transmission rates with herd size (so-called density dependence).
Latency of bovine Tuberculosis
Tuberculosis is a chronic and progressive infectious disease that has been described as having an incubation period ranging from a few weeks to a lifetime (Comstock, Livesay, and Woolpert 1974). Within-herd transmission models have been primarily concerned with assessing the efficacy of test-and-slaughter protocols. Taking a standard compartmental approach, these models subdivide the period of epidemiological latency between infection and infectiousness into two further compartments based on the animal's reactivity to the tuberculin skin test. When susceptible ($S$) animals become infected within such models they enter an “Occult” ($O$) compartment, where they are insensitive to testing, before progressing to become “Reactive” ($R$) to the skin test and eventually (reactive and) “Infectious” ($I$). Such SORI models typically assume that the average time between infection and infectiousness is of the order of 1 year or longer.
However, because of the fundamental uncertainty in the relationship between diagnostic status and infectiousness for bTB discussed earlier, this period of latency has not been directly estimated and is poorly identified in models estimated from population-level data (Conlan et al., 2012; O'Hare et al., 2014; Brooks-Pollock et al., 2014; Conlan et al., 2015). Evidence from experimental challenge studies (Kao et al., 2007) and the rates of detection of visible lesions in young reactor animals (Brooks-Pollock et al., 2013) suggests that this latent period has the potential to be short in some contexts. For this reason, but primarily for the sake of parsimony, Conlan et al. estimated simpler alternative SOR models in which animals are potentially infectious immediately upon infection and the latent stages only affect the reaction of animals to the diagnostic skin test (Conlan et al., 2012, 2015). Model fits to field data have not so far provided any basis to choose between these two structures, which make very different predictions for the rates of cattle-to-cattle transmission within herds, the burden of infection remaining after herds clear movement restrictions, and the scaling of transmission rates with herd size (Conlan et al., 2012).
Scaling of rates of cattle-to-cattle transmission with herd size
An empirical relationship between herd size and the abundance of bovine TB has been described in a variety of contexts since well before the introduction of systematic control measures for bTB (Francis 1947). This empirical relationship has motivated modellers to represent bTB transmission using a so-called ‘density’-dependent transmission function (Barlow et al. 1997; Biek et al. 2012; O’Hare et al. 2014). This terminology is somewhat of a misnomer, as under this assumption transmission (and thus the basic reproduction ratio, $R_0$) is actually assumed to scale with herd size rather than with the true density of animals (De Jong, Diekmann, and Heesterbeek 1995; Begon et al. 2002). The biological mechanisms responsible for generating such a herd size dependence are unclear, motivating other authors (Fischer et al. 2005) to use the more theoretically appealing and justifiable assumption of frequency-dependent transmission (where transmission rates are independent of herd size).
Conlan et al. introduced a nonlinear density-dependent transmission function (Melegaro et al., 2004; Smith et al., 2009) in an attempt to select between these two extreme assumptions (Conlan et al., 2012), in which the rate at which susceptible individuals within a herd become infectious is modelled as:
$I$ is the number of infectious individuals (taken as the sum of the $O$ and $R$ compartments for a SOR model) in a herd of size $H$. $H_m=165$ is a constant defining a centering transformation used to improve convergence of parameter estimation. Finally, $q$ measures the strength of dependence of transmission rates on $H$, with $q=1$ corresponding to density dependence and $q=0$ to frequency dependence.
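To make the role of $q$ concrete, the sketch below assumes one interpolating form consistent with the description above, $\beta S\frac{I}{H}\left(\frac{H}{H_m}\right)^{q}$, which reduces to frequency dependence ($\beta SI/H$) at $q=0$, to density dependence ($\beta SI/H_m$) at $q=1$, and does not depend on $q$ when $H=H_m$. This exact parameterisation is our assumption rather than a quotation of Conlan et al. (2012), and the function name is hypothetical.

```python
def infection_rate(beta: float, S: float, I: float, H: float, q: float, H_m: float = 165.0) -> float:
    """Assumed total rate of new infections in a herd of size H.

    q = 0: frequency dependence, beta * S * I / H
    q = 1: density dependence,   beta * S * I / H_m
    """
    return beta * S * (I / H) * (H / H_m) ** q


if __name__ == "__main__":
    beta = 1.0  # illustrative
    for q in (0.0, 0.5, 1.0):
        # per-susceptible hazard at 10% prevalence in a small and a large herd
        small = infection_rate(beta, 9, 1, 10, q) / 9
        large = infection_rate(beta, 180, 20, 200, q) / 180
        print(f"q = {q}: per-capita hazard ratio (H=200 vs H=10) = {large / small:.2f}")
```

At a fixed prevalence, the per-susceptible hazard under this form scales as $(H/H_m)^q$, so the ratio between a herd of 200 and a group of 10 is $20^q$: 1 for frequency dependence and 20 for density dependence.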
Comparison of field estimates of cattletocattle transmission rates
Appendix 3—table 1 summarises the point estimates of transmission rates and of the occult and reactive periods from seven relevant models in the literature. The estimates by Barlow and Fischer were calculated (rather than formally estimated) from the same observation of transmission within a single herd (of 200 cattle) in New Zealand, but make opposite assumptions about density dependence (Barlow et al., 1997; Fischer et al., 2005). Conlan and O’Hare (Conlan et al., 2012; O'Hare et al., 2014; Conlan et al., 2015) used national GB data to estimate their models using approximate Bayesian methods of inference. The two variants of the SOR and SORI models published by Conlan et al. in 2012 and 2015 differ primarily with respect to the demographic structure of herds and the prior distributions used for estimation of the occult and reactive periods. The 2012 variants used a particularly simple demographic model for herds with an exponential age distribution (Conlan et al., 2012). The 2015 variants implement the SOR and SORI models as individual-based models with a realistic age structure reconstructed from Cattle Tracing System (CTS) records (Conlan et al., 2015) and incorporate an age-dependent risk of infection and detection of visible lesions (Brooks-Pollock et al., 2013). The 2015 variants are also distinguished, in common with Brooks-Pollock et al. (2014), by using a far less restrictive prior distribution on the duration of the reactive period, leading to much longer estimates of this parameter than in previous models.
Although the majority of these studies estimated full posterior distributions for model parameters, for the purpose of comparison only point (median) estimates are discussed here. The transmission parameter in itself has no straightforward biological interpretation, as the potential for transmission also depends on the estimated values of $q$, $T_O$ and $T_R$. It is therefore more appropriate to compare estimates of transmission through the corresponding reproductive ratio for a defined population.
As previously discussed, the reproductive ratio for an experimental group ($R$) will depend on the group size ($H$) and the in-contact time between seeder and sentinel animals ($T_C$). $R$ can be calculated as a function of the transmission parameters through the next-generation operator (De Jong, Diekmann, and Heesterbeek 1995). For the SOR model, $R$ is independent of the occult and reactive periods:
$$R=\beta T_C\left(\frac{H}{H_m}\right)^{q}$$
while for the SORI model, $R$ also depends on the probabilities that a seeder animal is in the occult ($p_O$), reactive ($p_R$) or infectious ($p_I$) compartment:
$$R=\beta\left(\frac{H}{H_m}\right)^{q}\left(\left(p_I+p_R+p_O\right)T_C+\frac{p_R}{\sigma_R}\left(e^{-\sigma_R T_C}-1\right)+\frac{p_O}{\sigma_O-\sigma_R}\left(\frac{\sigma_O}{\sigma_R}\left(e^{-\sigma_R T_C}-1\right)-\frac{\sigma_R}{\sigma_O}\left(e^{-\sigma_O T_C}-1\right)\right)\right)$$
where $\sigma_O=1/T_O$ and $\sigma_R=1/T_R$.
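The expressions for $R$ can be evaluated, and inverted for the in-contact time needed to reach a target such as $R\sim 1.5$, with a few lines of code. The sketch below assumes that $R$ scales with herd size as $(H/H_m)^q$, and that for the SORI model $R$ is $\beta(H/H_m)^q$ times the expected time a seeder spends infectious during $[0,T_C]$, with exponentially distributed occult and reactive periods of distinct means $T_O\neq T_R$. A Monte Carlo function cross-checks that closed form, and a bisection search inverts $R(T_C)$, which is valid because $R$ increases with $T_C$. Function names and all parameter values are illustrative assumptions, not the point estimates tabulated in Appendix 3.

```python
import math
import random


def r_sor(beta: float, T_C: float, H: float, q: float, H_m: float = 165.0) -> float:
    """Experimental reproduction ratio for the SOR model (infectious from infection)."""
    return beta * T_C * (H / H_m) ** q


def expected_infectious_time(T_C, p_O, p_R, p_I, T_O, T_R):
    """Expected time a seeder spends infectious during [0, T_C] in the SORI model.

    Occult and reactive periods are exponential with means T_O and T_R (T_O != T_R).
    """
    s_O, s_R = 1.0 / T_O, 1.0 / T_R
    from_R = (math.exp(-s_R * T_C) - 1.0) / s_R
    from_O = (s_O / s_R * (math.exp(-s_R * T_C) - 1.0)
              - s_R / s_O * (math.exp(-s_O * T_C) - 1.0)) / (s_O - s_R)
    return (p_I + p_R + p_O) * T_C + p_R * from_R + p_O * from_O


def simulated_infectious_time(T_C, start, T_O, T_R, n=200_000, seed=3):
    """Monte Carlo check: mean infectious time in [0, T_C] for a seeder starting in "O" or "R"."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        delay = rng.expovariate(1.0 / T_R)  # remaining reactive period
        if start == "O":
            delay += rng.expovariate(1.0 / T_O)  # occult period comes first
        total += max(0.0, T_C - delay)
    return total / n


def r_sori(beta, T_C, H, q, p_O, p_R, p_I, T_O, T_R, H_m=165.0):
    """Experimental reproduction ratio for the SORI model."""
    return beta * (H / H_m) ** q * expected_infectious_time(T_C, p_O, p_R, p_I, T_O, T_R)


def contact_time_for_target(R_target, R_of_T, T_max=20.0, tol=1e-9):
    """Bisection for the in-contact time T_C at which R_of_T(T_C) reaches R_target."""
    lo, hi = 0.0, T_max
    if R_of_T(hi) < R_target:
        raise ValueError("target R is not reached within T_max")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R_of_T(mid) < R_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    beta, q, H = 1.0, 1.0, 20  # illustrative values only
    T_O, T_R = 0.5, 2.0
    print("occult seeder, closed form vs simulation:",
          round(expected_infectious_time(1.0, 1.0, 0.0, 0.0, T_O, T_R), 4),
          round(simulated_infectious_time(1.0, "O", T_O, T_R), 4))
    for name, R_of_T in [
        ("SOR", lambda T: r_sor(beta, T, H, q)),
        ("SORI, occult seeder", lambda T: r_sori(beta, T, H, q, 1.0, 0.0, 0.0, T_O, T_R)),
    ]:
        try:
            print(f"{name}: T_C for R = 1.5 is {contact_time_for_target(1.5, R_of_T):.2f} years")
        except ValueError as err:
            print(f"{name}: {err}")
```

When every seeder is already infectious ($p_I=1$), the SORI expression reduces to the SOR one, and fixing $H=H_m$ removes any dependence on $q$.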
In Appendix 3—figure 1 we compare the predicted values (point estimates) of $R$ for a contact time of 1 year, using the point estimates from Appendix 3—table 1. For SORI-type models, the predicted $R$ depends heavily on whether we assume the initially infected individual is latently infected (Occult, right panel) or infectious (Infective, left panel). For the purpose of powering these designs we make the conservative assumption that we will only be able to recruit seeder animals based upon SICCT test status, giving the empirical distributions summarised in Appendix 3—figure 2. In practice, we would hope to be able to increase the proportion of seeder animals in the infectious class by requiring that they are both SICCT and IGRA positive, or through the use of more sophisticated assays (IL-2 or microRNA expression) that predict more advanced disease.
Appendix 3—figure 3 clearly illustrates the importance of how transmission rates scale with herd size for the design of transmission studies for bTB. With frequency-dependent scaling of transmission rates, a 1-year contact time will be comfortably sufficient to ensure that $R$ is greater than the epidemic threshold (1) even for the smallest group. However, for density-dependent estimates this will only be true for group sizes in excess (for some parameter sets far in excess) of 50 animals. There is considerable variability between models in the assumed (or estimated) scaling of $R$ with group size, although some commonalities emerge. In particular, we note that the predicted values of $R$ for the traditional SORI model structures (Barlow et al. 1997; Fischer et al., 2005; O’Hare et al. 2014) are comparable in large herds of ~200 animals. The predicted $R$ for models without epidemiological latency (SOR 2012, SOR 2015) is consistently lower than for the more traditional SORI models. Finally, the SOR 2015 and SORI 2015 models show the greatest discrepancy, both from each other and from the rest of the estimates, but they also differ considerably with respect to both model demography and the prior assumptions used for their estimation (Conlan et al. 2015). The variability in these predicted estimates of $R$ demonstrates the challenge of translating parameter estimates from the field (which in turn depend on estimates of the efficacy of SICCT testing) to an experimental setting.
Given this variability, selecting a single estimate to power and benchmark designs would be inappropriate. For convenience, as these models are directly available to us, we define a set of scenarios based upon estimates from the most recent SOR 2015 and SORI 2015 models (Conlan et al., 2012, 2015). The scaling of transmission rates with herd size is an empirical observation and may well arise from husbandry or herd management factors that are correlated with herd size and may not apply in an experimental setting. If this is the case, density-dependent estimates from the field may underestimate the potential for transmission in small groups. To allow for this possibility we define a set of additional “Optimistic” scenarios in which transmission rates are fixed to be independent of herd size. We achieve this by fixing the effective herd size to $H_m=165$, at which the SORI model estimates of $R$ approach the upper frequency-dependent estimate of $R$ (Fischer et al., 2005).
Data availability
All data generated or analysed during this study are included in the manuscript and supporting files. Source code for all models and simulated data sets used to generate all figures is provided in a git repository.
References

An Introduction to Stochastic Processes with Applications to Biology. London, UK: Pearson.
Field evaluation of the efficacy of Mycobacterium bovis Bacillus Calmette-Guerin against bovine tuberculosis in neonatal calves in Ethiopia. Clinical and Vaccine Immunology 17:1533–1538. https://doi.org/10.1128/CVI.00222-10
A simulation model for the spread of bovine tuberculosis within New Zealand cattle herds. Preventive Veterinary Medicine 32:57–75. https://doi.org/10.1016/S0167-5877(97)00002-0
A clarification of transmission terms in host-microparasite models: numbers, densities and areas. Epidemiology and Infection 129:147–153. https://doi.org/10.1017/S0950268802007148
Farmers’ willingness to pay for a tuberculosis cattle vaccine. Journal of Agricultural Economics 63:408–424. https://doi.org/10.1111/j.1477-9552.2011.00330.x
Age-dependent patterns of bovine tuberculosis in cattle. Veterinary Research 44:97. https://doi.org/10.1186/1297-9716-44-97
Eliminating bovine tuberculosis in cattle and badgers: insight from a dynamic model. Proceedings of the Royal Society B: Biological Sciences 282:20150374. https://doi.org/10.1098/rspb.2015.0374
The prognosis of a positive tuberculin reaction in childhood and adolescence. American Journal of Epidemiology 99:131–138. https://doi.org/10.1093/oxfordjournals.aje.a121593
Potential benefits of cattle vaccination as a supplementary control for bovine tuberculosis. PLOS Computational Biology 11:e1004038. https://doi.org/10.1371/journal.pcbi.1004038
Estimating the hidden burden of bovine tuberculosis in Great Britain. PLoS Computational Biology 8:e1002730. https://doi.org/10.1371/journal.pcbi.1002730
A Strategy for Achieving Officially Bovine Tuberculosis Free Status for England. London, United Kingdom: Department for Environment, Food and Rural Affairs.
A restatement of the natural science evidence base relevant to the control of bovine tuberculosis in Great Britain. Proceedings of the Royal Society B: Biological Sciences 280:20131634. https://doi.org/10.1098/rspb.2013.1634
Intractable policy failure: the case of bovine TB and badgers. The British Journal of Politics and International Relations 11:557–573. https://doi.org/10.1111/j.1467-856X.2009.00387.x
Direct and indirect effects in vaccine efficacy and effectiveness. American Journal of Epidemiology 133:323–331. https://doi.org/10.1093/oxfordjournals.aje.a115884
Vaccination of neonatal calves with Mycobacterium bovis BCG induces protection against intranasal challenge with virulent M. bovis. Clinical and Experimental Immunology 139:48–56. https://doi.org/10.1111/j.1365-2249.2005.02668.x
Predicting undetected infections during the 2007 foot-and-mouth disease outbreak. Journal of the Royal Society Interface 6:1145–1151. https://doi.org/10.1098/rsif.2008.0433
Epidemic Models: Their Structure and Relation to Data. Mollison D, editor. Cambridge: Cambridge University Press.
Mycobacterium bovis shedding patterns from experimentally infected calves and the effect of concurrent infection with bovine viral diarrhoea virus. Journal of the Royal Society Interface 4:545–551. https://doi.org/10.1098/rsif.2006.0190
Predicting prolonged bovine tuberculosis breakdowns in Great Britain as an aid to control. Preventive Veterinary Medicine 97:183–190. https://doi.org/10.1016/j.prevetmed.2010.09.007
Recurrence of bovine tuberculosis breakdowns in Great Britain: risk factors and prediction. Preventive Veterinary Medicine 102:22–29. https://doi.org/10.1016/j.prevetmed.2011.06.004
A natural-transmission model of bovine tuberculosis provides novel disease insights. Veterinary Record 171:448. https://doi.org/10.1136/vr.101072
Field evaluation of the protective efficacy of Mycobacterium bovis BCG vaccine against bovine tuberculosis. Research in Veterinary Science 88:44–49. https://doi.org/10.1016/j.rvsc.2009.05.022
Estimating the transmission parameters of pneumococcal carriage in households. Epidemiology and Infection 132:433–441. https://doi.org/10.1017/S0950268804001980
Estimating epidemiological parameters for bovine tuberculosis in British cattle using a Bayesian partial-likelihood approach. Proceedings of the Royal Society B: Biological Sciences 281:20140248. https://doi.org/10.1098/rspb.2014.0248
Host-pathogen time series data in wildlife support a transmission function between density and frequency dependence. Proceedings of the National Academy of Sciences 106:7905–7909. https://doi.org/10.1073/pnas.0809145106
Assessment of the protective efficacy of vaccines against common diseases using case-control and cohort studies. International Journal of Epidemiology 13:87–93. https://doi.org/10.1093/ije/13.1.87
Design and analysis of small-scale transmission experiments with animals. Epidemiology and Infection 135:202–217. https://doi.org/10.1017/S095026880600673X
Comparing methods to quantify experimental transmission of infectious agents. Mathematical Biosciences 210:157–176. https://doi.org/10.1016/j.mbs.2007.04.009
Development of a skin test for bovine tuberculosis for differentiating infected from vaccinated animals. Journal of Clinical Microbiology 48:3176–3181. https://doi.org/10.1128/JCM.00420-10
Decision letter

Neil M Ferguson, Reviewing Editor; Imperial College London, United Kingdom
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "The intractable challenge of evaluating cattle vaccination as a control for bovine Tuberculosis" for consideration by eLife. Your article has been favorably evaluated by Tadatsugu Taniguchi (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
All three reviewers thought the work to be important and of broad interest, but each raised a number of distinct and important major issues which need to be addressed in a revised submission. In addition, all reviewers found the current manuscript overlong and lacking in clarity. We would urge greater selectivity in presenting key results, and a substantial overhaul (and shortening) of the main text of the paper. Additional analyses and detailed methods can be included as supplementary information.
The reviewers also raised concerns about how study power was considered, and the issues of likely attack rates in transmission experiments and extrinsic infection rates should also be given particular consideration, in addition to the other detailed comments made.
Reviewer #1:
This paper explores an important topic – how to test the efficacy of a bovine TB vaccine – using simulation models of TB transmission to test different trial designs. Overall, the analysis seems rigorous to the extent I could judge it, but ambiguity or missing details left me with several significant questions. The presentation of the paper is far from ideal overall – it is much too long, difficult to follow, and lack of required detail in the Materials and methods means it is not always clear how results have been derived. Really the paper needs major reorganisation, with a (much) shorter and selective main text, then a more detailed supplement to detail methods, parameterisation and sensitivity analyses.
Issues:
 Subsection “Review of estimated cattletocattle transmission rates for bovine TB” is critical – the failure of the AHVLA experiment fundamentally calls into question the feasibility of experimental studies of vaccine efficacy in the UK context. This really needs to be highlighted up front in the paper (not just in the Materials and methods). I note the authors contention that increasing group size and the proportion of the group initially infected would likely substantially increase infection rates, but given the cost of such studies and the need to demonstrate that the results will likely transfer to the field setting, I would suggest that funding any largescale transmission study should be contingent upon achieving a higher attack rate in a small pilot (e.g. 50 animals, 25 infected, 25 uninfected). Such a pilot would also give invaluable data with which to better power future vaccine studies.
 How exactly are the expected values plotted in Figure 3A (and like figures) being estimated? No details are given in the Materials and methods. I'm guessing it is the average of the estimated VE over a large number of simulations of the experimental transmission study of X herds? How was VE calculated – using the expressions given in Figure 1? Giving some representation of the 95% range of the estimates from single experiments would be useful.
 Likewise, how exactly are the estimates of power calculated in Figure 3B? I presume from their simulations? In doing so, are they also simulating a 2level analysis of the simulated trial results (i.e. accounting for variation between herds)? What primary endpoint is being evaluated – a difference in attack rates between vaccinated and control herds, or estimation of VE to some level of precision (e.g. +/0.05). I would suggest the latter is more useful. i.e. for a fixed set of assumptions about VE, run 1000 simulations of the trial in X herds, and count the number of simulations, Z, for which the desired measure of efficacy is estimated to within +/0.05 (say) of its true value. Power is then Z/1000.
 "Clearly the latter methods have a higher power and that is the method that is used for estimation and calculation of the posthoc power from the simulation studies described below (A4)". I am confused. What does A4 refer to? Which tables and figures use the FS based approximate analytical calculation and which use direct simulation? Precise details of how vaccine efficacy and power was calculated from simulations of the experimental design should be given. I don't frankly see that the analytical approach in the subsection “Sample size calculations for natural transmission study” adds anything very much. Given any experiment must have a fixed duration, what is important is the net attack rate seen in the vaccinated and control animals over the duration of the experiment. This depends on the transmission rate (and how that varies as a function of the time from infection) and the duration of the study phases (i.e. contact time). Going from transmission rates to R and back confuses things, at least for me. I would rather see Figure 10 show the posterior distribution for transmission rate (including the herd size factor), therefore. This would be more informative than the estimates given in Table 12. It would also allow Figure 9 to be removed – which doesn't add anything informative beyond Figure 11 in my view – indeed the addition of the red vertical dashed lines to Figure 9 is confusing.
 Continuing in the same vein, the paper gives the impression that Table 11 is driven by the results of Table 10, which misses the subtleties of Figure 11. This latter figure is the most interesting in the paper, but I didn't understand some of the trends in Figure 11. Why does increasing contact time decrease power (for fixed group size) for some model variants and vaccine efficacies? I can only assume this is because infection rates are saturating in both groups. However, if the experiment was analysed making use of all the DIVA test results in a survival analysis, this shouldn't matter – the higher infection hazard in the control group animals should still be resolvable in the first phase, and the lower infectiousness of the vaccinated animals in the second phase. As I've said above, the authors need to give precise details of how power is being estimated from the simulation for Figure 11 (see above) – how are the (simulated) experimental results being analysed, what is the trial primary endpoint (i.e. what statistical test is being examined when calculating power)? Again, given the cost of such experiments, analysis needs to make best possible use of the data collected – which survival analysis is more likely to achieve than simple comparison of final attack rates.
 In the first paragraph of the subsection “Group size and duration of transmission studies under different transmission scenarios” – What are Figures B3 and B4 and what is Table B1? Assuming Table B1 is actually Table 12, how are the values given in that table used to generate Figure 9 on? In particular, were the first 2 rows of Table 12 used for any simulations, or just the Conlan et al. estimates? As mentioned elsewhere, I would drop the Conlan 2012 results – presumably they were superseded by the 2015 ones, and they give rather optimistic results for the trial contact times.
 Table 11, 25% effect size – the bottommost rows have a group size of 800, while this group size isn't mentioned in Table 10. Is this a typo? As commented below, I don't feel including all the 2012 model variants adds anything here.
 The second paragraph of the subsection “Comparison of field estimates of cattle-to-cattle transmission rates” is unnecessary. Presumably the 2015 models are preferred, so reference to and results relating to the 2012 model (half of Figure 8, Figures 9, 11, Figure 11—figure supplements 2, 4, half of Table 11) can be removed.
Reviewer #2:
This paper is a well written, thorough and painstaking analysis of a narrow technical issue, sample size calculations for a hypothetical trial of vaccination to protect cattle from bovine TB. It is a substantial and technically useful piece of work, though not easily generalizable given the specific and complicated details of bovine TB epidemiology and management in the UK.
I agree with the statement (subsection “Conceptual design to estimate vaccine efficacy and herd level effectiveness”, fifth paragraph) that it is important to evaluate the impact of vaccination on infectiousness as well as susceptibility – as a rule the former is ignored.
The prediction that there would be, in the situation modelled, only a very small indirect impact of vaccination means that very large sample sizes would be needed for a trial to detect it. The situation modelled includes current test and slaughter practices, but presumably if a vaccine were to be used it would not be used in conjunction with these. If it was, as is spelt out later, the additional benefit would be very small and presumably not cost effective. Or is that the point the authors wish to make? Either way, a vaccination-only scenario would be of interest (regardless of current EC requirements).
The way statistical power is estimated (more than "slightly" unusual in my view – Discussion, third paragraph) also makes the study less generalizable. It would help to set out what efficacy we are looking for, greater than zero seems a very low bar. (What's more, the subsequent discussion about the DIVA test implies that even if the vaccine itself had zero efficacy there would be some effect of the more sensitive test – Discussion, sixth paragraph). These impacts could be partitioned by varying parameter settings appropriately.
The role of extrinsic infection (Discussion, eighth paragraph) could also be explored more systematically. The issue of testing efficacy locally when the ultimate aim would be to intervene over a whole population is problematic for many intervention trials for infectious disease.
Overall, I felt that, though a rather daunting volume of results is presented already, more could have been done to dissect out the likely multiple contributors to the low efficacy anticipated in a herd level vaccination trial.
Reviewer #3:
General Comments/Suggestions:
This manuscript describes the adaptation of previously published models to understand how field trials might (or might not!) detect the benefits of a bTB vaccine deployed in Britain. In general, the methods and conclusions seem sound, and represent an important warning on relying on these sorts of trials. I did have some problems understanding the interpretation of results shown in the figures associated with this manuscript, and I suspect that some figures may not be referred to correctly. I have included some suggestions below on how to make the figures more easily readable. The mathematical typesetting in the manuscript also made it somewhat less readable – typesetting that sets apart mathematical entities like R and R_{0} more clearly would have helped me, and would also have limited confusion when "R" is used both as a reproduction parameter, and as a compartment (e.g. in the SORI model).
Because this work is based on mathematical modelling, I would urge the authors to make as much of the modelling code as possible available on a public repository, or provide links to the previously published code used. Publishing code in this way makes the work much more reproducible.
[Editors’ note: the authors were asked to provide a plan for revisions before the editors issued a final decision. What follows is the editors’ letter requesting such plan.]
Thank you for sending your article entitled "The intractable challenge of evaluating cattle vaccination as a control for bovine Tuberculosis" for peer review at eLife. Your revised article has been favorably evaluated by Tadatsugu Taniguchi (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.
Please review the major comments of reviewer #1 (the Reviewing Editor), which center on the interpretation of your results and their relevance for policy. We would then ask you to respond within the next two weeks with your views on how justified you feel these comments to be, and an action plan and timetable for the completion of any additional work. We plan to share your responses with the reviewers and then issue a binding recommendation.
Reviewer #1:
The rewritten paper is much clearer and more comprehensible. My few technical comments are detailed below. I do have more major issues with the conclusions and tone of the paper however:
 Given the problematic experience with previous transmission experiments, my own conclusion from reading this paper was that a relatively small field trial of 200 herds would give valuable information about the likely veterinary health impact of vaccination at the individual animal level (but that a trial with 500 herds would be needed to measure herd-level effectiveness).
 Data from an experimental study are fundamentally different in quality from those of a randomised field trial. In human public health, the former might be viewed as equivalent to a pre-phase II human challenge study, which gives proof of concept. It does not guarantee that the results can be read across to the natural setting – which is why phase III trials are still needed.
 The authors seem unnecessarily pessimistic about what their simulations imply for the feasibility of field trials, at least at the individual animal level. I interpret Figure 3 as showing that a trial run in 200 herds would have excellent power at measuring the direct effect of vaccination, and reasonable power at measuring the 'total' effect.
 Yes, measuring indirect effects is difficult, but arguably is addressable in a post-marketing (phase IV) implementation study.
 I think there are issues with a cluster-randomised trial with 50% coverage in each cluster (herd). Even for the vaccinated animals, outcomes will be different in a herd with 100% vs 50% coverage of a leaky vaccine. Plus, presumably the goal for any wide-scale vaccination policy would be 100% coverage? A 3-arm trial with herd level vaccination coverage of 0, 50% and 100% in the three arms might be more informative. Comparing attack rates in the 0 and 100% arms would give a measure of total effect (the most important outcome). Comparing vaccinated animals in the 50% and 100% coverage arms and unvaccinated animals in the 50% and 0% arms would give more information (and therefore power) to differentiate the impacts of vaccination on infectiousness and susceptibility.
 Appendix 2 on whole-herd effectiveness is interesting and critically important (to the extent I would much rather see this in the main text and the discussion of experimental transmission studies in an appendix) – and in my view calls into question the whole viability of vaccination, if the overall impact on herd breakdowns is really only likely to be in the range of 10-20%. Putting that (major) issue to one side, the results in this section also highlight the potential benefits of a 3-arm design.
 Regarding the discussion of bias in RR measures – it is unsurprising that such measures underestimate ε_{s} – it's the difference between comparing a hazard with a cumulative hazard.
 Indeed, Figure 3 (and the supplementary version) seems to show that one could use models to quite reliably go back from the measured relative risks to the underlying effect of vaccine on susceptibility – albeit not on infectiousness.
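The point about hazards versus cumulative hazards can be made concrete with a minimal sketch. It assumes, purely for illustration, that every animal experiences the same cumulative hazard `cum_hazard` over the trial, reduced by a factor (1 - ε_{S}) if vaccinated; the function names are hypothetical and this is not the within-herd model used in the paper. Under these assumptions the relative-risk measure drifts below ε_{S} as attack rates rise, while the ratio of cumulative hazards, -log(1 - AR), recovers ε_{S} exactly.

```python
import math
from typing import Tuple


def attack_rates(eps_s: float, cum_hazard: float) -> Tuple[float, float]:
    """Attack rates (control, vaccinated) under a constant force of infection.

    Hypothetical simplification: homogeneous exposure, with vaccination
    scaling each animal's cumulative hazard by (1 - eps_s)."""
    ar_control = 1.0 - math.exp(-cum_hazard)
    ar_vacc = 1.0 - math.exp(-(1.0 - eps_s) * cum_hazard)
    return ar_control, ar_vacc


def rr_efficacy(eps_s: float, cum_hazard: float) -> float:
    """Relative-risk efficacy, 1 - AR_v / AR_c (compares cumulative incidence)."""
    ar_c, ar_v = attack_rates(eps_s, cum_hazard)
    return 1.0 - ar_v / ar_c


def hazard_efficacy(eps_s: float, cum_hazard: float) -> float:
    """Efficacy from the ratio of cumulative hazards, -log(1 - AR)."""
    ar_c, ar_v = attack_rates(eps_s, cum_hazard)
    return 1.0 - math.log1p(-ar_v) / math.log1p(-ar_c)


if __name__ == "__main__":
    for h in (0.01, 0.5, 2.0, 5.0):
        print(f"H={h:<5} RR-based VE={rr_efficacy(0.6, h):.3f} "
              f"hazard-based VE={hazard_efficacy(0.6, h):.3f} (true eps_S=0.6)")
```

In this toy setting the inversion is exact; in the field, heterogeneous exposure, extrinsic infection and test-and-removal break the simple correspondence, which is why any back-calculation would have to go through a transmission model.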
Reviewer #2:
The first round of reviews seems to have picked up a large number of errors and presentational issues. The authors have addressed these fairly comprehensively and the manuscript is greatly improved as a result. If the topic is thought appropriate for eLife then I recommend that the manuscript is now acceptable for publication.
Reviewer #3:
In general I am satisfied with the reorganisation and changes made. I find the manuscript is more focused and easier to get through in its current form. I still find the Appendices somewhat arduous, and would encourage the authors to consider any last-minute changes they can make to streamline them, but I accept that sometimes Appendices with technical content can be long.
I was a bit disappointed that the authors felt they could not address the impact of the distribution of latencies on the design, but accept their justification that, with the very high level of uncertainty on these distributions, they do not want to "muddy the waters" in this already very long submission.
I appreciate the links to public code repositories.
https://doi.org/10.7554/eLife.27694.044
Author response
Reviewer #1:
This paper explores an important topic – how to test the efficacy of a bovine TB vaccine – using simulation models of TB transmission to test different trial designs. Overall, the analysis seems rigorous to the extent I could judge it, but ambiguity or missing details left me with several significant questions. The presentation of the paper is far from ideal overall – it is much too long, difficult to follow, and lack of required detail in the Materials and methods means it is not always clear how results have been derived. Really the paper needs major reorganisation, with a (much) shorter and selective main text, then a more detailed supplement to detail methods, parameterisation and sensitivity analyses.
Issues:
We have sharpened the focus of the paper, moving technical details of methods, models, review of transmission rates and sensitivity analyses to a new supplementary information document.
 Subsection “Review of estimated cattle-to-cattle transmission rates for bovine TB” is critical – the failure of the AHVLA experiment fundamentally calls into question the feasibility of experimental studies of vaccine efficacy in the UK context. This really needs to be highlighted up front in the paper (not just in the Materials and methods). I note the authors' contention that increasing group size and the proportion of the group initially infected would likely substantially increase infection rates, but given the cost of such studies and the need to demonstrate that the results will likely transfer to the field setting, I would suggest that funding any large-scale transmission study should be contingent upon achieving a higher attack rate in a small pilot (e.g. 50 animals, 25 infected, 25 uninfected). Such a pilot would also give invaluable data with which to better power future vaccine studies.
We acknowledge, and to a certain extent, share this concern about the feasibility of natural transmission studies using UK reactor animals. However, specific aspects of the design of the Khatri et al. study make interpretation of the findings challenging. Indeed reviewer #2 takes the contrasting view that our discussion of the AHVLA study was too negative as the rate of transmission seen in this study was not inconsistent with estimates from the field.
As discussed in the manuscript, and acknowledged by reviewer #1, the practicality of such studies depends on how transmission rates scale with group size. Likewise, the extent to which the rate of transmission seen in the Khatri et al. study is consistent with field estimates depends critically on what we consider the effective group size to have been in this study. We have estimated the effective (frequency-dependent) transmission rate from the results of the AHVLA transmission experiment. The maximum likelihood estimate from these data is indeed comparable to the field estimate (SOR model) for a herd size of 10 animals, which corresponds to the size of pens within the barn used at AHVLA. This grouping was imposed by the physical design of the facility and is likely to have reduced the opportunities for transmission between the 100 animals that were held within the same barn. However, perhaps the most surprising outcome of this study was that, out of the eight transmission events observed, two were associated with a genotype that was not present in reactor animals housed within the same pen. As transmission appears to have taken place between pens, the group size for this experiment could be argued to be the total number of animals in the barn. In this case, the maximum likelihood estimate of transmission would be far less consistent with field estimates, falling within the lower tail of the posterior distribution estimated from field data.
However, the posterior distribution for the transmission parameter estimated from the AHVLA study ranges over several orders of magnitude and includes both of these extreme assumptions for the effective group size. Given the level of uncertainty in this estimate we chose not to dwell on this issue in the original manuscript. We would further hesitate to rule out the feasibility of a transmission study in the UK solely on these data which were compromised by the practical constraints of the available facilities at AHVLA.
To address the concerns of both reviewers #1 and #2 we have added this analysis and comparison to the new supplementary information file. However, we concur that there is a strong argument for carrying out a smaller-scale pilot transmission study and have added this to the Discussion. We would go further and suggest that carrying out such studies in endemic countries, although ruled out by EFSA for policy decisions in the UK/EU, would be a more practical way to establish the mode of action of BCG. We are currently planning such experiments in Ethiopia and India based on the design presented in this study and have added this wider biological context in the discussion of the revised manuscript.
 How exactly are the expected values plotted in Figure 3A (and like figures) being estimated? No details are given in the Materials and methods. I'm guessing it is the average of the estimated VE over a large number of simulations of the experimental transmission study of X herds? How was VE calculated – using the expressions given in Figure 1? Giving some representation of the 95% range of the estimates from single experiments would be useful.
 Likewise, how exactly are the estimates of power calculated in Figure 3B? I presume from their simulations? In doing so, are they also simulating a 2-level analysis of the simulated trial results (i.e. accounting for variation between herds)? What primary endpoint is being evaluated – a difference in attack rates between vaccinated and control herds, or estimation of VE to some level of precision (e.g. +/-0.05)? I would suggest the latter is more useful, i.e. for a fixed set of assumptions about VE, run 1000 simulations of the trial in X herds, and count the number of simulations, Z, for which the desired measure of efficacy is estimated to within +/-0.05 (say) of its true value. Power is then Z/1000.
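The precision-based definition of power suggested above can be sketched as a Monte Carlo loop. This is a minimal illustration under stated assumptions, not the authors' calculation: the unvaccinated herd attack rate is drawn from a hypothetical Beta(`beta_a`, `beta_b`) distribution to mimic between-herd variation, vaccination is assumed to act only on susceptibility (scaling each herd's attack probability by 1 - VE), the estimator is a pooled relative risk that ignores clustering (no 2-level analysis), and all function and parameter names are invented for this example.

```python
import random
from typing import Optional


def simulate_ve_estimate(rng: random.Random, ve: float, n_herds: int,
                         herd_size: int, beta_a: float, beta_b: float) -> Optional[float]:
    """Simulate one trial and return the pooled relative-risk VE estimate.

    Returns None when no control animal is infected (estimate undefined)."""
    infected_control = 0
    infected_vacc = 0
    for _ in range(n_herds):
        # Hypothetical between-herd heterogeneity in the unvaccinated attack rate.
        p_control = rng.betavariate(beta_a, beta_b)
        infected_control += sum(rng.random() < p_control for _ in range(herd_size))
    for _ in range(n_herds):
        # Vaccinated herds: same attack-rate distribution, scaled by (1 - VE).
        p_vacc = rng.betavariate(beta_a, beta_b) * (1.0 - ve)
        infected_vacc += sum(rng.random() < p_vacc for _ in range(herd_size))
    if infected_control == 0:
        return None
    # Both arms contain n_herds * herd_size animals, so the count ratio is the relative risk.
    return 1.0 - infected_vacc / infected_control


def precision_power(ve: float, n_herds: int, herd_size: int, tol: float = 0.05,
                    n_sims: int = 1000, beta_a: float = 1.0, beta_b: float = 9.0,
                    seed: int = 1) -> float:
    """Fraction Z / n_sims of simulated trials whose VE estimate lies within +/- tol of ve."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        estimate = simulate_ve_estimate(rng, ve, n_herds, herd_size, beta_a, beta_b)
        if estimate is not None and abs(estimate - ve) <= tol:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Small n_sims keeps the pure-Python loop quick; increase for stable estimates.
    for herds in (50, 100, 200):
        print(herds, precision_power(0.6, herds, 30, n_sims=200))
```

Replacing the Beta draw with attack rates simulated from a within-herd transmission model, and the pooled estimator with a mixed-effects analysis, would bring this sketch closer to the calculation the reviewer describes.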
This was a key omission in the original submission and we appreciate the opportunity to correct this. In the reorganized manuscript, we have added a new Materials and methods section in the main manuscript which focuses on the definition of efficacy measures from field and experimental designs and the explicit calculations used to power each respective trial design.
On reflection, we realize that the distinction between how, and why, models have been used to inform field and experimental designs was not made clearly or powerfully enough in the original manuscript. As a consequence, one of our key messages – that field trials are a fundamentally imprecise way of assessing the mode of action of cattle vaccination with BCG – was also obscured.
Experimental transmission studies can be designed to directly estimate the effect of vaccination on susceptibility to infection (ε_{S}) and reduction in infectiousness (ε_{I}) through either a mechanistic (e.g. chain-binomial) or survival analysis. The assumed effect size and the expected effect size in this situation are the same and the power to estimate an effect depends only on the expected rate of transmission that can be achieved.
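As a minimal sketch of the mechanistic route mentioned above, the following code writes down a chain-binomial log-likelihood in ε_{S} and ε_{I}. It assumes, for illustration only, frequency-dependent mixing within a pen, a constant transmission rate `beta` per year, that vaccination multiplies a susceptible animal's force of infection by (1 - ε_{S}) and an infectious vaccinated animal's contribution by (1 - ε_{I}), and that new infections in each testing interval are observed (e.g. via DIVA conversions). The `Interval` record and all function names are hypothetical; in practice the log-likelihood would be maximised jointly over `beta`, ε_{S} and ε_{I} with a numerical optimiser.

```python
import math
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """Counts for one pen over one testing interval (hypothetical layout)."""
    n_group: int      # animals mixing in the pen
    inf_unvacc: int   # infectious unvaccinated animals at interval start
    inf_vacc: int     # infectious vaccinated animals at interval start
    sus_unvacc: int   # susceptible unvaccinated animals at interval start
    sus_vacc: int     # susceptible vaccinated animals at interval start
    new_unvacc: int   # new infections among unvaccinated susceptibles
    new_vacc: int     # new infections among vaccinated susceptibles
    dt: float         # interval length (years)


def infection_prob(beta: float, eps_s: float, eps_i: float,
                   iv: Interval, susceptible_vaccinated: bool) -> float:
    """Probability that one susceptible is infected during the interval."""
    # Vaccinated infectious animals transmit less: weight them by (1 - eps_i).
    pressure = iv.inf_unvacc + (1.0 - eps_i) * iv.inf_vacc
    foi = beta * iv.dt * pressure / iv.n_group  # frequency-dependent force of infection
    if susceptible_vaccinated:
        # Vaccinated susceptibles experience a reduced hazard.
        foi *= 1.0 - eps_s
    return 1.0 - math.exp(-foi)


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(K = k) for K ~ Binomial(n, p), handling p at 0 or 1."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def log_likelihood(intervals: Iterable[Interval], beta: float,
                   eps_s: float, eps_i: float) -> float:
    """Chain-binomial log-likelihood summed over pens and intervals."""
    total = 0.0
    for iv in intervals:
        total += log_binom_pmf(iv.new_unvacc, iv.sus_unvacc,
                               infection_prob(beta, eps_s, eps_i, iv, False))
        total += log_binom_pmf(iv.new_vacc, iv.sus_vacc,
                               infection_prob(beta, eps_s, eps_i, iv, True))
    return total
```

Because each interval contributes separately, the time series of DIVA results enters the likelihood directly, which is what gives this kind of analysis more power than a comparison of final attack rates.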
Using a mechanistic model to directly estimate ε_{S} and ε_{I} is not feasible in a field trial setting due to our inability to quantify or control the extrinsic infectious pressure acting on cattle from sympatric wildlife reservoirs. Randomisation can be used to deal with this confounder, but the cost is that we must use population-level relative risk measures of efficacy which, although related to the assumed individual-level efficacy of vaccination (ε_{S} and ε_{I}), are population-level measures whose expected value also depends on the rates of transmission within herds, the balance of intrinsic and extrinsic transmission rates and the frequency of testing and removal of reactor animals.
This was the motivation for using within-herd transmission models, estimated from field data and representative of the range of herd demographics and transmission settings seen in GB, to predict the likely effect size of these relative risk measures of efficacy for different assumed individual effects (ε_{S}, ε_{I}). The variability in these measures from model simulations is extensive. In our original submission, we omitted confidence intervals on the plots of predicted effect size due to the difficulty in visualizing such wide and overlapping distributions. While this uncertainty is reflected in the low power to see a significant effect of vaccination for small numbers of herds, we acknowledge that this does not make clear the high probability of estimating a negative efficacy of vaccination for small effects. To bring home this point we have added supplemental plots that illustrate the full range of the predictive distribution for each relative risk measure of vaccine efficacy for the most optimistic individual level effect (Figure 3—figure supplement 1, Figure 3—figure supplement 4 and others in supplementary information).
While we agree with both reviewers #1 and #2 that precision is normally a more useful measure of the power of a trial design, in the face of such fundamental imprecision we strongly believe it is not meaningful here. The “low bar” of estimating any significant effect of vaccination is, we would argue, the best we might hope to achieve from a field setting. The large remaining risk that an underpowered trial will estimate a (statistically significant) negative efficacy is our main argument that, in particular for quantifying the mode of action and impact of BCG on infectiousness, controlled natural challenge designs are the only realistic proposition.
 "Clearly the latter methods have a higher power and that is the method that is used for estimation and calculation of the post-hoc power from the simulation studies described below (A4)". I am confused. What does A4 refer to? Which tables and figures use the FS based approximate analytical calculation and which use direct simulation? Precise details of how vaccine efficacy and power was calculated from simulations of the experimental design should be given. I don't frankly see that the analytical approach in the subsection “Sample size calculations for natural transmission study” adds anything very much. Given any experiment must have a fixed duration, what is important is the net attack rate seen in the vaccinated and control animals over the duration of the experiment. This depends on the transmission rate (and how that varies as a function of the time from infection) and the duration of the study phases (i.e. contact time). Going from transmission rates to R and back confuses things, at least for me. I would rather see Figure 10 show the posterior distribution for transmission rate (including the herd size factor), therefore. This would be more informative than the estimates given in Table 12. It would also allow Figure 9 to be removed – which doesn't add anything informative beyond Figure 11 in my view – indeed the addition of the red vertical dashed lines to Figure 9 is confusing.
Apologies first for the inclusion of this reference to A4, a post-hoc simulation study carried out to benchmark the analytical results for a range of simulation scenarios. In the interests of length, and given the consistency and robustness of the analytic results, we chose not to include this simulation study in this manuscript.
While we appreciate the reviewer's point about moving between transmission rates and reproduction ratios, we disagree that transmission rates – even when scaled by herd size – are a more biologically meaningful quantity. In addition to potentially scaling with group size, estimated transmission rates depend on the form of the assumed latency distributions between infection and infectiousness associated with a particular model. The reproduction ratio for a particular duration of experiment folds in this additional information and is therefore a more meaningful comparator. Furthermore, the reproduction ratio serves as a design parameter (as noted by reviewer #2) for the existing experimental design theory that we depend on for the proposed design.
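To illustrate why a reproduction ratio over a fixed experiment folds in the latency distribution, here is a minimal sketch under simplifying assumptions that are ours, not the paper's model: an index case becomes infectious after an exponentially distributed latent period with rate `sigma`, then transmits at rate `beta` per year until the experiment ends, with no removal and no depletion of susceptibles. The function names are hypothetical. Two models with the same `beta` but different latency give different expected numbers of secondary cases over the same contact time.

```python
import math


def expected_infectious_time(sigma: float, duration: float) -> float:
    """Expected time spent infectious within [0, duration] for an index case
    whose latent period is exponential with rate sigma (sigma > 0)."""
    # Integral over [0, T] of P(infectious at t) = 1 - exp(-sigma * t).
    return duration - (1.0 - math.exp(-sigma * duration)) / sigma


def reproduction_ratio_within(beta: float, sigma: float, duration: float) -> float:
    """Expected secondary infections during a fixed-duration experiment
    (hypothetical simplification: constant beta, no removal, no depletion)."""
    return beta * expected_infectious_time(sigma, duration)


if __name__ == "__main__":
    # Same transmission rate, different mean latent periods (1 / sigma years).
    for sigma in (0.5, 1.0, 2.0):
        print(sigma, round(reproduction_ratio_within(1.0, sigma, 2.0), 3))
```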
However, we agree that the presentation of the analytical results that are used to power the proposed natural transmission study could have been more clearly communicated. To this end we have generated new simplified figures that explore power as a function of group size and three discrete contact times of one, two and three years (Figure 5), which replace Figures 9, 11 and associated tables from the original manuscript.
 Continuing in the same vein, the paper gives the impression that Table 11 is driven by the results of Table 10, which misses the subtleties of Figure 11. This latter figure is the most interesting in the paper, but I didn't understand some of the trends in Figure 11. Why does increasing contact time decrease power (for fixed group size) for some model variants and vaccine efficacies? I can only assume this is because infection rates are saturating in both groups. However, if the experiment was analysed making use of all the DIVA test results in a survival analysis, this shouldn't matter – the higher infection hazard in the control group animals should still be resolvable in the first phase, and the lower infectiousness of the vaccinated animals in the second phase. As I've said above, the authors need to give precise details of how power is being estimated from the simulation for Figure 11 (see above) – how are the (simulated) experimental results being analysed, what is the trial primary endpoint (i.e. what statistical test is being examined when calculating power)? Again, given the cost of such experiments, analysis needs to make best possible use of the data collected – which survival analysis is more likely to achieve than simple comparison of final attack rates.
Once again, apologies for the lack of detail in the specific power calculations and use of simulation (for field trial designs) and analytic results (for experimental designs). As the reviewer highlights, the sample size calculation for the proposed experimental design is based on the final size distribution. Thus, based on the analytic calculation, power will decrease as the difference in proportion infected between the vaccinated and control groups shrinks – in particular when infection saturates in both groups. We agree that using the time series information provided by DIVA testing has the potential to greatly increase statistical power through the use of a survival or mechanistic chain-binomial transmission model. Indeed, we carried out such an analysis based upon the chain-binomial model in the simulation study mentioned above and found that the estimates were unbiased and that they validated the robustness of the analytic estimates of sample sizes as described in this paper.
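The saturation effect described above can be illustrated with a deliberately simple calculation. Assuming, hypothetically, a constant force of infection `beta` per year in controls, reduced by (1 - ε_{S}) in vaccinates, and analysing only final attack rates with a two-sided two-proportion z-test (normal approximation, `n` animals per group), power first rises and then falls as contact time lengthens, because both groups approach 100% infected. This is not the final-size calculation used in the paper, which rests on a transmission model and also accounts for ε_{I}; the sketch only reproduces the qualitative trend, and the normal approximation is itself crude near saturation.

```python
import math
from statistics import NormalDist


def attack_rate(beta: float, years: float, eps_s: float = 0.0) -> float:
    """Final attack rate under a constant force of infection (hypothetical)."""
    return 1.0 - math.exp(-beta * (1.0 - eps_s) * years)


def two_proportion_power(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test, n per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    diff = abs(p1 - p2)
    p_bar = (p1 + p2) / 2.0
    se_null = math.sqrt(2.0 * p_bar * (1.0 - p_bar) / n)
    se_alt = math.sqrt((p1 * (1.0 - p1) + p2 * (1.0 - p2)) / n)
    if se_alt == 0.0:
        # Degenerate case: both proportions are exactly 0 or 1.
        return 1.0 if diff > 0.0 else 0.0
    # Ignores the (negligible) probability of rejecting in the wrong direction.
    return z.cdf((diff - z_crit * se_null) / se_alt)


def power_by_contact_time(beta: float, eps_s: float, n: int, years: float) -> float:
    """Power to detect a difference in final attack rates after `years` of contact."""
    p_control = attack_rate(beta, years)
    p_vacc = attack_rate(beta, years, eps_s)
    return two_proportion_power(p_control, p_vacc, n)


if __name__ == "__main__":
    for years in (0.5, 1, 2, 4, 8, 16):
        print(years, round(power_by_contact_time(0.5, 0.5, 50, years), 3))
```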
We have updated the paper to make clear that these methods would be the most powerful way to analyse data from such experimental designs, but that the analytic final size method provides a robust, and most importantly conservative, method of sample size calculation.
 In the first paragraph of the subsection “Group size and duration of transmission studies under different transmission scenarios” – What are Figures B3 and B4 and what is Table B1? Assuming Table B1 is actually Table 12, how are the values given in that table used to generate Figure 9 on? In particular, were the first 2 rows of Table 12 used for any simulations, or just the Conlan et al. estimates? As mentioned elsewhere, I would drop the Conlan 2012 results – presumably they were superseded by the 2015 ones, and they give rather optimistic results for the trial contact times.
 Table 11, 25% effect size – the bottommost rows have a group size of 800, while this group size isn't mentioned in Table 10. Is this a typo? As commented below, I don't feel including all the 2012 model variants adds anything here.
 The second paragraph of the subsection “Comparison of field estimates of cattle-to-cattle transmission rates” is unnecessary. Presumably the 2015 models are preferred, so reference to and results relating to the 2012 model (half of Figure 8, Figures 9, 11, Figure 11—figure supplements 2, 4, half of Table 11) can be removed.
Apologies once more for the formatting errors which were introduced during the submission process and the lack of clarity between numerical and analytic results. On advice, we have dropped the Conlan, 2012 results from the revised manuscript and simplified the presentation of sample size calculations as discussed above.
Reviewer #2:
This paper is a well written, thorough and painstaking analysis of a narrow technical issue, sample size calculations for a hypothetical trial of vaccination to protect cattle from bovine TB. It is a substantial and technically useful piece of work, though not easily generalizable given the specific and complicated details of bovine TB epidemiology and management in the UK.
I agree with the statement (subsection “Conceptual design to estimate vaccine efficacy and herd level effectiveness”, fifth paragraph) that it is important to evaluate the impact of vaccination on infectiousness as well as susceptibility – as a rule the former is ignored.
The prediction that there would be, in the situation modelled, only a very small indirect impact of vaccination means that very large sample sizes would be needed for a trial to detect it. The situation modelled includes current test and slaughter practices, but presumably if a vaccine were to be used it would not be used in conjunction with these. If it was, as is spelt out later, the additional benefit would be very small and presumably not cost effective. Or is that the point the authors wish to make? Either way, a vaccination-only scenario would be of interest (regardless of current EC requirements).
The way statistical power is estimated (more than "slightly" unusual in my view – Discussion, third paragraph) also makes the study less generalizable. It would help to set out what efficacy we are looking for, greater than zero seems a very low bar. (What's more, the subsequent discussion about the DIVA test implies that even if the vaccine itself had zero efficacy there would be some effect of the more sensitive test – Discussion, sixth paragraph). These impacts could be partitioned by varying parameter settings appropriately.
The role of extrinsic infection (Discussion, eighth paragraph) could also be explored more systematically. The issue of testing efficacy locally when the ultimate aim would be to intervene over a whole population is problematic for many intervention trials for infectious disease.
Overall, I felt that, though a rather daunting volume of results is presented already, more could have been done to dissect out the likely multiple contributors to the low efficacy anticipated in a herd level vaccination trial.
The European Union and UK policy makers have consistently held the view that the use of cattle vaccination will only be acceptable as a supplement to ongoing test-and-slaughter. We agree with the reviewer that this poses a particular barrier for the potential cost-effectiveness of any roll out as well as limiting the likely success of any prospective field trials. As such, we also agree that it is important to consider the potential for a vaccination-only trial and have addressed this question in the revised manuscript.
Rather than replicating our analysis of the full range of efficacy and persistence measures for this new scenario, and to address the related question as to the factors contributing to the low expected efficacy, we examine the impact that retention of DIVA test positive animals has on the predicted number of reactor animals seen in different trial scenarios.
In Figure 4 of the revised manuscript we present posterior predictive distributions for the within-herd prevalence of bTB. These distributions reveal a high-frequency mode of single (or very few) reactor TB incidents. This prediction is consistent with the distribution of reactor animals seen within UK herds, where fewer than half of bTB breakdowns have more than 1 reactor animal disclosed. This empirical observation is also reflected in the cattle-to-cattle reproduction ratios from within-herd models, which suggest a bimodal distribution of subcritical (R_{0} < 1) and supercritical herds (R_{0} > 1).
The low predicted attack rate in the majority of trial herds leads to relative risk measures systematically underestimating vaccine efficacy. Indeed the ability to discriminate between the attack rate in vaccinated and unvaccinated groups rests almost entirely with the relatively few herds which experience a high attack rate. This heterogeneity is, we would argue, the fundamental constraint that makes relative risk measures at the population level a poor way to assess the efficacy of cattle vaccination for bTB in Great Britain.
The new Figure 4 illustrates the extent to which retaining reactor animals or increasing the duration of the trial might increase the expected attack rate. Perhaps surprisingly, retaining reactor animals has almost no effect on the expected herd-level prevalence for a three-year trial. This is due to the long generation time of bTB transmission, with model herds requiring in excess of 15 years to reach an endemic equilibrium. Increasing the duration of the trial has a more pronounced effect, but has little impact on the mode of low-prevalence herds, instead serving to thicken the long tail of herds which see a relatively higher rate of transmission.
The only feasible solution we can see to increasing the utility of relative risk measures of vaccine efficacy would be to target herds likely to see a higher rate of transmission. For example, if we could target large herds with a low turnover of animals we could increase the chances that a randomly selected herd has a reproduction ratio greater than 1. However, this would not be a practical design for a field trial due to the relative rarity of such farms (larger farms tend also to have higher turnovers) and the dependence on farmers to volunteer for participation. Even within model simulations, such targeting would only have limited effect due to the high level of stochasticity and parametric uncertainty. Likewise, targeting herds with a high extrinsic rate of transmission could increase the probability of exposure during a trial but is impractical due to the complete lack of data or methodology to quantify the local infection risk for particular herds.
Practicality notwithstanding, such targeting of herds would not satisfy the EU/EFSA requirement that field trials be carried out in herds that are representative of European production conditions.
The way statistical power is estimated (more than "slightly" unusual in my view – line 417) also makes the study less generalizable. It would help to set out what efficacy we are looking for; greater than zero seems a very low bar. (What's more, the subsequent discussion about the DIVA test implies that even if the vaccine itself had zero efficacy there would be some effect of the more sensitive test – lines 453–5.) These impacts could be partitioned by varying parameter settings appropriately.
As argued above in our response to reviewer #1, one of our main conclusions is that the imprecision of relative risk measures of vaccine efficacy for bovine TB would generate a high risk of estimating a negative vaccine efficacy even when a true effect exists. In this context, an efficacy greater than zero may in fact be too high a bar for any vaccine, a point we now make more explicit in the revised paper.
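The risk of a non-positive efficacy estimate can be made concrete with a deliberately simplified sketch. It assumes, purely for illustration, that infections in each arm are independent binomial draws with a single common attack rate (no clustering by herd, no between-herd heterogeneity, no indirect effects and perfect diagnostic tests), so it most likely understates the problem described above. The function names and parameters are hypothetical and are not taken from the authors' simulation code.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k cases among n animals at attack rate p."""
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    # Work in log space so that large n does not overflow the binomial coefficient.
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log1p(-p))


def prob_nonpositive_efficacy(n_per_arm: int, attack_rate: float, efficacy: float) -> float:
    """Exact P(cases among vaccinates >= cases among controls).

    With equal arm sizes this is the probability that the estimate 1 - RR is
    zero, negative or undefined (0/0). Hypothetical assumption: independent
    binomial infection at `attack_rate` in controls and at
    (1 - efficacy) * attack_rate in vaccinates.
    """
    n = n_per_arm
    p_u = attack_rate
    p_v = (1.0 - efficacy) * attack_rate
    pmf_u = [binom_pmf(k, n, p_u) for k in range(n + 1)]
    pmf_v = [binom_pmf(k, n, p_v) for k in range(n + 1)]
    # tail_v[k] = P(X_v >= k), accumulated from the top down.
    tail_v = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tail_v[k] = tail_v[k + 1] + pmf_v[k]
    return sum(pmf_u[k] * tail_v[k] for k in range(n + 1))


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the trial simulations.
    for attack_rate in (0.01, 0.05, 0.2):
        risk = prob_nonpositive_efficacy(200, attack_rate, efficacy=0.6)
        print(f"attack rate {attack_rate:.2f}: P(estimate <= 0) = {risk:.3f}")
```

Even in this idealised setting the probability of a non-positive estimate grows quickly as the attack rate falls, which is the regime most trial herds are predicted to be in.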
Reviewer #3:
General Comments/Suggestions:
This manuscript describes the adaptation of previously published models to understand how field trials might (or might not!) detect the benefits of a bTB vaccine deployed in Britain. In general, the methods and conclusions seem sound, and represent an important warning on relying on these sorts of trials. I did have some problems understanding the interpretation of results shown in the figures associated with this manuscript, and I suspect that some figures may not be referred to correctly. I have included some suggestions below on how to make the figures more easily readable. The mathematical typesetting in the manuscript also made it somewhat less readable – typesetting that sets apart mathematical entities like R and R_{0} more clearly would have helped me, and would also have limited confusion when "R" is used both as a reproduction parameter, and as a compartment (e.g. in the SORI model).
Typesetting of all mathematical entities using equation editor has been carried out in the revised manuscript.
Because this work is based on mathematical modelling, I would urge the authors to make as much of the modelling code as possible available on a public repository, or provide links to the previously published code used. Publishing code in this way makes the work much more reproducible.
A data repository containing all simulation code, parameter sets, data and scripts to reproduce every figure in the manuscript has been made available with the resubmission.
[Editors’ note: what follows is the authors’ plan to address the revisions.]
Reviewer #1:
The rewritten paper is much clearer and more comprehensible. My few technical comments are detailed below. I do have more major issues with the conclusions and tone of the paper however:
 Given the problematic experience with previous transmission experiments, my own conclusion from reading this paper was that a relatively small field trial of 200 herds would give valuable information about the likely veterinary health impact of vaccination at the individual animal level (but that a trial with 500 herds would be needed to measure herd-level effectiveness).
 Data from an experimental study is fundamentally different in quality than that from a randomised field trial. In human public health, the former might be viewed as equivalent to a pre-phase II human challenge study, which gives proof of concept. It does not guarantee that the results can be read across to the natural setting – which is why phase III trials are still needed.
 The authors seem unnecessarily pessimistic about what their simulations imply for the feasibility of field trials, at least at the individual animal level. I interpret Figure 3 as showing that a trial run in 200 herds would have excellent power at measuring the direct effect of vaccination, and reasonable power at measuring the 'total' effect.
 Yes, measuring indirect effects is difficult, but arguably is addressable in a post-marketing (phase IV) implementation study.
 I think there are issues with a cluster-randomised trial with 50% coverage in each cluster (herd). Even for the vaccinated animals, outcomes will be different in a herd with 100% vs 50% coverage of a leaky vaccine. Plus, presumably the goal for any wide-scale vaccination policy would be 100% coverage? A 3-arm trial with herd-level vaccination coverage of 0, 50% and 100% in the three arms might be more informative. Comparing attack rates in the 0 and 100% arms would give a measure of total effect (the most important outcome). Comparing vaccinated animals in the 50% and 100% coverage arms and unvaccinated animals in the 50% and 0% arms would give more information (and therefore power) to differentiate impacts of vaccination on infectiousness and susceptibility.
 Appendix 2 on whole-herd effectiveness is interesting and critically important (to the extent I would much rather see this in the main text and the discussion of experimental transmission studies in an appendix) – and in my view calls into question the whole viability of vaccination, if the overall impact on herd breakdowns is really only likely to be in the range of 10–20%. Putting that (major) issue to one side, the results in this section also highlight the potential benefits of a 3-arm design.
 Regarding the discussion of bias in RR measures – it is unsurprising that such measures underestimate ε_{s} – it's the difference between comparing a hazard with a cumulative hazard.
 Indeed, Figure 3 (and the supplementary version) seems to show that one could use models to quite reliably go back from the measured relative risks to the underlying effect of vaccine on susceptibility – albeit not on infectiousness.
Reviewer #2:
The first round of reviews seems to have picked up a large number of errors and presentational issues. The authors have addressed these fairly comprehensively and the manuscript is greatly improved as a result. If the topic is thought appropriate for eLife then I recommend that the manuscript is now acceptable for publication.
Reviewer #3:
In general I am satisfied with the reorganisation and changes made. I find the manuscript is more focused and easier to get through in its current form. I still find the Appendices somewhat arduous, and would encourage the authors to consider any last-minute changes they can make to streamline them, but I accept that sometimes Appendices with technical content can be long.
I was a bit disappointed that the authors felt they could not address the impact of the distribution of latencies on the design, but accept their justification that, with the very high level of uncertainty on these distributions, they do not want to "muddy the waters" in this already very long submission.
I appreciate the links to public code repositories.
We thank the reviewers again for the careful review of our manuscript and insightful comments.
In particular, the reviewing editor makes careful reference to the standards and practice of field evaluation of vaccines in human public health to make a case for the inherent usefulness of a field trial of BCG in cattle. We find very little to disagree with in these comments and acknowledge that the presentation of our results may have obscured our support for the proper evaluation of BCG vaccination in the field.
We wholeheartedly agree that field evaluation of the individual efficacy of BCG in 200 herds would be feasible and would provide valuable information to inform decisions on deployment. Indeed, as part of the government-funded Triveritas consortium, authors on this paper made this specific recommendation to Defra. We agree that a 3-arm design would additionally provide actionable information on the effectiveness of BCG under conditions of deployment in the field. Such a design was not offered within the Triveritas report due to concerns over the likely cost of field trials and the requirement to mitigate the risk of failure. Hence, a two-stage trial was proposed, with progression to a larger trial to assess effectiveness contingent on the success of a smaller-scale trial to demonstrate individual efficacy.
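The contrasts that a 3-arm design (0%, 50% and 100% within-herd coverage, as suggested by the reviewing editor) would make available can be written down explicitly. The sketch below is a hypothetical illustration using simple ratio measures on cumulative attack rates, in the spirit of the standard direct/indirect/total vaccine-effect definitions; it ignores clustering and the bias of cumulative-risk ratios, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ThreeArmAttackRates:
    """Cumulative attack rates (infected / at risk) observed in each group.

    Hypothetical container for a 3-arm trial with 0%, 50% and 100% herd coverage.
    """
    arm0_unvacc: float    # 0% coverage arm: all animals unvaccinated
    arm50_vacc: float     # 50% coverage arm: vaccinated animals
    arm50_unvacc: float   # 50% coverage arm: unvaccinated animals
    arm100_vacc: float    # 100% coverage arm: all animals vaccinated


def ratio_effect(exposed: float, reference: float) -> float:
    """1 - (attack rate in exposed group / attack rate in reference group)."""
    if reference <= 0.0:
        raise ValueError("reference attack rate must be positive")
    return 1.0 - exposed / reference


def three_arm_contrasts(r: ThreeArmAttackRates) -> Dict[str, float]:
    return {
        # Total effect of full coverage: 100% arm versus 0% arm.
        "total": ratio_effect(r.arm100_vacc, r.arm0_unvacc),
        # Within-herd direct effect, as estimated by a 50%-coverage design.
        "direct_at_50": ratio_effect(r.arm50_vacc, r.arm50_unvacc),
        # Extra protection of vaccinates when all herd-mates are also vaccinated.
        "indirect_vaccinated": ratio_effect(r.arm100_vacc, r.arm50_vacc),
        # Protection of unvaccinated animals conferred by vaccinated herd-mates.
        "indirect_unvaccinated": ratio_effect(r.arm50_unvacc, r.arm0_unvacc),
    }
```

A design with 50% coverage in every herd yields only `direct_at_50`; the remaining three contrasts require the 0% and 100% arms. The two indirect contrasts can arise from reduced susceptibility (fewer infected herd-mates) as well as reduced infectiousness, so separating those mechanisms would still need a transmission model.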
However, the intractable – not impossible – challenge of field evaluation of BCG in Great Britain comes from the relationship between the scale of such trials, the current legal status of vaccination and the expectation of what can be achieved by a single trial. When Defra commissioned the design of field trials and evaluated the proposed designs, they took the view that field trials must satisfy all of the recommendations of the EFSA report, including the two key requirements highlighted in our manuscript relating to indirect transmission and the use of vaccination as a supplement to testing rather than a replacement. This position goes beyond the EFSA recommendations themselves, which clearly state that deviation from the recommendations could be made provided there was a strong scientific reason for the decision.
A key objective of this paper is to provide a clear argument for the value of carrying out a series of more tractable trials, each of which is better suited to assessing a different facet of the efficacy of BCG. The analogy to the phases of vaccine trials raised by the reviewing editor is a powerful one, which we have used ourselves in private discussions with Defra. In particular, we agree with the reviewing editor that addressing the indirect effects of vaccination would potentially be achievable in a phase IV trial after deployment. However, the current legal situation is that the use of cattle vaccination is explicitly prohibited by EU, EC and GB law. For the purposes of a trial, BCG could be used under authorisation of the Secretary of State; however, those animals would likely be subject to movement and trade restrictions for the rest of their lives. Deployment of vaccination at scale would require a change in UK and EU/EC law (subject still to negotiations surrounding trade agreements following Brexit), which at the moment is linked to satisfying all of the requirements of the EFSA report, including the indirect effects of vaccination. The UK government’s interpretation of the EU position is that the impact of BCG on transmission of bTB must be demonstrated before marketing authorisation is given to allow deployment of the vaccine at scale. We believe that this is an unreasonable expectation. Although we agree that future field evaluation of indirect effects would be important, we also contend that experimental quantification of the mode of action of BCG would be a more tractable solution to break this legal and political impasse.
We agree with the reviewing editor about the importance of the Appendix on whole-herd effectiveness. However, we would disagree that these results call into question the viability of vaccination per se; rather, they bring into question the viability of vaccination that is only used as a supplement to ongoing test-and-slaughter. The official Defra position is that field trials have been postponed until the diagnostic DIVA test – on which the power of our trial designs critically depends – is appropriately validated. However, a contributing factor to this decision was the overwhelming challenge of seeing a positive cost-benefit for vaccination after factoring in the additional costs of a new diagnostic test and the vaccine itself. We believe that vaccination could still play an important role in the control of bovine tuberculosis – particularly in developing countries where test-and-slaughter is not acceptable economically or, in the case of India, ethically. In the UK we would also argue that the use of vaccination as a replacement for test-and-slaughter could transform the case for vaccination – and the potential to evaluate its effectiveness robustly in the field.
To address these major issues from the reviewing editor we propose the following plan of action:
 Swap the vaccine effectiveness scenarios into the main body of the paper and move the experimental design to a technical appendix
 Revise the discussion to:
Explicitly state that field evaluation of BCG is viable, providing it focuses on validating the direct individual level efficacy
Explicitly state that a trial becomes considerably more challenging if:
Indirect effects of cattle vaccination must be estimated before deployment of the vaccine
Vaccination is only considered as a supplement to test-and-slaughter
Propose that these challenges can be mitigated by planning a phased series of trials in line with established standards in human public health:
Use natural transmission experiments to address the mode of action of BCG before field evaluation (Phase II)
Field evaluation of individual efficacy of BCG in 200 herds (Phase III)
Trials to establish the effectiveness and indirect effects of BCG (Phase IV)
Caution that the use of vaccination only as a supplement to test-and-slaughter will necessarily limit the perceived benefits of vaccination to farmers, and argue for policy makers to consider strategies for the use of vaccination as a replacement for test-and-slaughter.
We also propose to make the following more minor changes in response to the reviewing editor's comments:
1) “Regarding the discussion of bias in RR measures – it is unsurprising that such measures underestimate ε_{s} – it's the difference between comparing a hazard with a cumulative hazard.”
We will add this clarification, but feel the general exposition of between-herd variability is important and should be retained.
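The reviewing editor's point about hazards and cumulative hazards can be illustrated under the simplest possible assumption: a constant force of infection shared by vaccinated and unvaccinated animals, with a leaky vaccine that multiplies susceptibility by 1 - ε_s. In the sketch, `cum_hazard` is the cumulative hazard (force of infection times trial duration) faced by an unvaccinated animal; the function names are hypothetical, and the sketch deliberately ignores the between-herd variability discussed above.

```python
import math


def rr_based_efficacy(eps_s: float, cum_hazard: float) -> float:
    """1 - RR computed from cumulative risks for a leaky vaccine.

    Unvaccinated risk: 1 - exp(-L); vaccinated risk: 1 - exp(-(1 - eps_s) * L).
    """
    risk_unvacc = -math.expm1(-cum_hazard)
    risk_vacc = -math.expm1(-(1.0 - eps_s) * cum_hazard)
    return 1.0 - risk_vacc / risk_unvacc


def hazard_based_efficacy(risk_vacc: float, risk_unvacc: float) -> float:
    """Back-calculate eps_s as 1 minus the ratio of cumulative hazards,
    -log(1 - risk); valid only under the constant-hazard assumption above."""
    return 1.0 - math.log1p(-risk_vacc) / math.log1p(-risk_unvacc)


if __name__ == "__main__":
    # Illustrative cumulative hazards only.
    for cum_hazard in (0.01, 0.5, 2.0):
        print(cum_hazard, round(rr_based_efficacy(0.6, cum_hazard), 3))
```

As the cumulative hazard grows, 1 - RR falls further below ε_s, whereas the hazard-based estimate recovers ε_s exactly in this idealised setting. This mirrors the reviewer's observation that models can be used to work back from measured relative risks to the underlying effect on susceptibility.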
Timetable for revisions
Based on the action plan outlined above, we would expect to be able to prepare a revised manuscript within two weeks. Recalculating the power curves using a non-parametric Mann-Whitney test would require an additional two weeks to edit, check and re-run the code to update the eight affected figures in the main manuscript and technical appendix. As argued above, we believe that the power calculations based on relative risk are sufficient and appropriate to support our conclusions, but we are happy to make this revision should the reviewing editor require it.
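For reference, a Mann-Whitney comparison and a simulation-based power estimate might look like the following sketch. It uses the textbook normal approximation to the U statistic and omits the tie correction, which makes p-values somewhat conservative when many herds record zero reactors. `simulate_trial` is a hypothetical stand-in for a trial simulator returning per-herd outcomes for two groups; it is not a function from the authors' repository, and a real analysis would more likely rely on an established implementation such as `scipy.stats.mannwhitneyu`.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p-value) via the normal approximation.

    U counts pairs with x_i > y_j, scoring ties as 0.5. No tie or
    continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided


def simulated_power(
    simulate_trial: Callable[[random.Random], Tuple[List[float], List[float]]],
    n_reps: int = 1000,
    alpha: float = 0.05,
    seed: int = 1,
) -> float:
    """Fraction of simulated trials in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        group_a, group_b = simulate_trial(rng)
        _, p = mann_whitney_u(group_a, group_b)
        if p < alpha:
            rejections += 1
    return rejections / n_reps
```

Because the test uses only ranks, it does not depend on the ratio of two small attack rates, which is one motivation for considering it alongside relative risk measures.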
https://doi.org/10.7554/eLife.27694.045
Article and author information
Author details
Funding
Department for Environment, Food and Rural Affairs (SE 3287)
 Andrew James Kerr Conlan
 Martin Vordermeier
 James LN Wood
Department for Environment, Food and Rural Affairs (SE 32100)
 Andrew James Kerr Conlan
 Martin Vordermeier
The Alborada Trust
 Andrew James Kerr Conlan
 James LN Wood
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
 Neil M Ferguson, Imperial College London, United Kingdom
Publication history
 Received: April 11, 2017
 Accepted: May 2, 2018
 Version of Record published: June 5, 2018 (version 1)
Copyright
© 2018, Conlan et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.