Abstract
Beneficial and deleterious mutations cause the fitness of lineages to vary across a phylogeny and thereby shape its branching structure. While standard phylogenetic models do not allow mutations to feedback and shape trees, birthdeath models can account for this feedback by letting the fitness of lineages depend on their type. To date, these multitype birthdeath models have only been applied to cases where a lineage’s fitness is determined by a single character state. We extend these models to track sequence evolution at multiple sites. This approach remains computationally tractable by tracking the genotype and fitness of lineages probabilistically in an approximate manner. Although approximate, we show that we can accurately estimate the fitness of lineages and sitespecific mutational fitness effects from phylogenies. We apply this approach to estimate the populationlevel fitness effects of mutations in Ebola and influenza virus, and compare our estimates with in vitro fitness measurements for these mutations.
https://doi.org/10.7554/eLife.45562.001Introduction
The fitness effects of new mutations is a key determinant of a population’s evolutionary potential to adapt over time. Studies exploring the distribution of fitness effects (DFE) in a wide range of organisms have revealed that, while many mutations are neutral, a smaller but significant fraction have substantial effects on fitness (Sanjuán et al., 2004; EyreWalker and Keightley, 2007; Visher et al., 2016). These findings have spurred interest in molecular evolutionary models that consider how nonneutral mutations shape sequence evolution and patterns of genetic diversity. Such models range in complexity from simple models assuming that selection operates uniformly across all sites (Muse and Gaut, 1994; Goldman and Yang, 1994; Yang and Nielsen, 2008) to parameter rich models with sitespecific fitness effects (Halpern and Bruno, 1998; Lartillot and Philippe, 2004; Rodrigue et al., 2010; Hilton and Bloom, 2018). While all of these models assume sequences evolve along an underlying phylogenetic tree representing their shared common ancestry, all also assume that the mutation process driving sequence evolution is independent of the other evolutionary processes giving rise to the tree. This independence assumption implies that mutations do not feedback and affect the fitness of lineages in the tree, such that lineages carrying highly beneficial mutations are just as likely to survive and produce sampled descendants as lineages riddled with deleterious mutations (Figure 1A).
While questionable in terms of biological realism, independence between the tree generating process and the mutation process allows for tractable statistical models. Assuming independence, the joint likelihood of a phylogenetic tree $\mathcal{T}$ and the sequence data $\mathcal{S}$ at the tips of the tree having evolved as observed can be factored into two distinct components:
The likelihood of the sequence data $L(\mathcal{S}\mathcal{T},\mu )$ conditional on the tree and the mutational parameters $\mu $ can be computed efficiently for most continuoustime Markov models of sequence evolution (Felsenstein, 1981). The probability density $p(\mathcal{T}\theta )$ of the tree $\mathcal{T}$ given the parameters generating the tree $\theta $ can likewise be computed under widely used coalescent (Griffiths and Tavaré, 1994; Pybus et al., 2000) or birthdeath models (Rannala and Yang, 1996; Stadler, 2009). In Bayesian phylogenetics, $p(\mathcal{T}\theta )$ is normally thought of as the prior distribution over trees rather than a likelihood, because the tree itself is inferred from the sequence data.
The assumption of independence between the mutation and tree generating processes may be unproblematic in certain scenarios, such as if mutations are truly neutral or do not contribute to substantial fitness differences among lineages. A common argument invoked in defense of ignoring nonneutral mutations is that macroevolutionary tree generating processes like speciation and extinction play out on longer timescales than the substitution process fixing or removing mutations within a population (Bustamante, 2005). In this case, fitness variation drives the substitution process within a population but does not ultimately drive the formation of a phylogeny at the species level. But such separationoftimescales arguments do not hold when segregating mutations contribute to substantial fitness variation between lineages in a phylogeny, such as for rapidly evolving microbes where several different mutant strains can cocirculate. In these cases, the tree generating and mutation processes occur on the same timescale, and the fitness effects of mutations can feedback and shape the branching structure of a phylogeny (Kaplan et al., 1988; Nicolaisen and Desai, 2012; Neher and Hallatschek, 2013). Ignoring nonneutral evolution in this case may introduce biases into phylogenetic inference. But perhaps more importantly, fitness differences among lineages can be correlated with ancestral genotypes, providing information about the molecular basis of adaptive evolution we would otherwise ignore.
We therefore explore an approach that couples molecular sequence evolution to the treegenerating process using multitype birthdeath (MTBD) models. Under this approach, mutations can directly impact the fitness of a lineage in the phylogeny by altering its birth or death rate (Figure 1B). For a single evolving site or other character state, the joint likelihood of the phylogeny together with the observed tip states can be computed exactly under the MTBD model (Maddison et al., 2007; Stadler and Bonhoeffer, 2013; Kühnert et al., 2016). However, this approach is impractical for more than a few nonneutrally evolving sites due to the need to track all possible genotypes as separate types in the state space of the model. We therefore explore an approximate birthdeath model that considers how mutations at multiple sites contribute to a lineage’s overall fitness, without the need to track all possible genotypes in sequence space. This approach allows us to infer the fitness effects of individual mutations and the fitness of any particular lineage at any time (based on its inferred ancestral genotype) from the branching structure of a phylogeny. Because our approach is particularly relevant to rapidly adapting microbial pathogens, we apply it to Ebola and influenza virus sequence data in order to quantify the fitness effects of naturally occurring amino acid substitutions.
Materials and methods
The MTBD at a single evolving site
Request a detailed protocolAt a single evolving site, the multitype birthdeath (MTBD) model of Stadler and Bonhoeffer (2013) can be used to compute the joint likelihood $L(S,\mathcal{T}\mu ,\theta )$ of the sequence or character state data $\mathcal{S}$ and phylogenetic tree $\mathcal{T}$ in a way that couples the mutation process with changes in fitness along a lineage. Let ${D}_{n}(t)$ represent the probability density (i.e. the likelihood) that the subtree descending from lineage $n$ evolved between time $t$ and the present exactly as observed (Figure 1C). Further, let ${D}_{n,i}(t)$ represent this probability density conditional on lineage $n$ being in state $i$ out of $M$ possible states at time $t$. Here the state of a lineage refers to a particular allele or character state (e.g. nucleotide or amino acid) at a single site. We reserve the term genotype to refer to a particular configuration of states across multiple sites in a sequence.
The density ${D}_{n,i}(t)$ can be computed going backwards in time from the present ($t=0$) to time $t$ along a lineage by numerically solving a system of ordinary differential equations:
Here, ${\lambda}_{i}$ is the birth rate and ${d}_{i}$ is the death rate of lineages in state $i$, and thus reflect a lineage’s fitness. Mutations between states $i$ and $j$ occur at a rate ${\gamma}_{i,j}$, independently of birth events. Each term in (Equation 2) describes how ${D}_{n,i}$ changes through time by accounting for all of the different events that could have occurred along the lineage. The first term (a) considers the change in probability density given that no birth, death or mutation event occurred. The second term (b) considers the probability of a birth event that went unobserved because one of the child lineages produced no sampled descendants (this event has probability ${E}_{i}(t)$, see below). The third term (c) reflects the probability that the lineage mutated from state $i$ to $j$.
${E}_{i}(t)$ represents the probability that a lineage in state $i$ is not sampled and has no sampled descendants. This probability can be computed at any time $t$ by solving a second set of ODEs:
The first term (a) reflects the probability that a lineage dies and is not sampled, where ${s}_{i}$ is the probability that a lineage in state $i$ is sampled upon dying. Terms bd have similar interpretations as in (Equation 2).
At a tip lineage $n$, we initialize ${D}_{n,i}(t)={d}_{i}{s}_{i}$ if the lineage was sampled upon death at time $t$. Alternatively, if $n$ was sampled at the present time $t=0$ before dying, then ${D}_{n,i}(t)={\rho}_{i}$, where ${\rho}_{i}$ is the probability that an individual in state $i$ was sampled at present. At a branching event, the probability density ${D}_{a,i}$ of the parent lineage $a$ in state $i$ giving rise to two descendent lineages $n$ and $m$ is updated as:
The factor of two enters because either lineage $m$ or $n$ could have given birth and we must consider both possible events.
At the root, we can compute the probability density of the entire tree by summing over all possible root states:
where ${q}_{i}$ is the prior probability that the root is in state $i$ at time ${t}_{root}$. Including the term $1{E}_{i}({t}_{root})$ in the denominator conditions the birthdeath process on giving rise to at least one sampled individual. ${D}_{n}$ represents the probability that the entire tree and the tip states $\mathcal{S}$ evolved as exactly as observed. It is therefore equivalent to the joint likelihood $L(\mathcal{S},\mathcal{T}\mu ,\theta )$ we seek where $\mu =\{\gamma \}$ and $\theta =\{\lambda ,d,s\}$.
In theory, this approach could be extended to evolution at any number of sites as long as we track ${D}_{n,i}(t)$ for all possible genotypes $i$. Unfortunately, this approach has limited utility because the number of possible genotypes in sequence space scales exponentially with the number of sites $L$ (i.e. ${4}^{L}$ possible genotypes for nucleotide sequences), making the MTBD model impractical for modeling evolution at more than a few sites.
The marginal fitness birthdeath model
Request a detailed protocolWhile the fitness of a lineage will generally depend on its genotype across multiple sites, tracking evolution in the space of all possible genotypes is, as just discussed, computationally infeasible. We therefore seek an approach that considers how mutations at multiple sites determine the fitness of a lineage without the need to track ${D}_{n,i}$ for all possible genotypes. In the approach described below and outlined in Figure 1D, we therefore track molecular evolution at each site, computing the probability that each site occupies each state, and then approximate the probability of a lineage being in any particular genotype based on these site probabilities. To compute the expected fitness of a lineage, we can then sum, or marginalize, over the fitness of each genotype weighted by its approximate probability. We therefore refer to this approach as the marginal fitness birthdeath (MFBD) model.
First, in order to couple a lineage’s fitness with the birthdeath process, we will assume that the birth rate ${\lambda}_{n}$ of any lineage $n$ scales according to the fitness ${f}_{g}$ of its genotype:
where ${\lambda}_{0}$ is the base birth rate assigned to a particular reference genotype (e.g. the wildtype). A lineage’s death rate can also be coupled to its fitness, but for simplicity we will assume a lineage’s fitness is reflected only in its birth rate ${\lambda}_{n}$.
Let $\mathcal{G}$ be the set of all possible genotypes in sequence space and ${g}_{k}$ be the state of genotype $g$ at site $k$. To make it clear when we are considering evolution in genotype space rather than at a particular site, we will write the probability density ${D}_{n,i}$ as ${D}_{n,g}$ when $i$ refers to a particular genotype. Furthermore, let ${D}_{n,k,i}$ be the probability density of the subtree descending from lineage $n$ given that site $k$ is in state $i$. By definition,
where the sum is over all genotypes in $\mathcal{G}$ with site $k$ in state $i$.
We can derive a difference equation for ${D}_{n,k,i}$ from ${D}_{n,g}$ in a straightforward manner:
Taking the limit as $\mathrm{\Delta}t\to 0$, we get a new system of differential equations for ${D}_{n,k,i}(t)$:
Unfortunately, (Equation 9) would still require us to track ${D}_{n,g}(t)$ for all possible genotypes, precisely what we wish not to do. We show below that, if we can approximate ${f}_{g}$ and ${E}_{n,g}$ for any given lineage, we can write (Equation 9) in terms of only ${D}_{n,k,i}$ (see (Equation 19)) and therefore do not need to track each genotype.
Approximating the fitness of a lineage
Request a detailed protocolWe begin by approximating the fitness ${f}_{n}$ of a lineage $n$. Even if we do not know the exact genotype of a lineage at a particular time, we can compute the lineage’s expected fitness by summing over the fitness of each genotype ${f}_{g}$ weighted by the probability ${\omega}_{n,g}$ that lineage $n$ is in genotype $g$:
The same logic can be extended to compute the expected marginal fitness $\mathbb{E}({f}_{n,k,i})$ of a lineage $n$ that at site $k$ is in state $i$:
Computing $\mathbb{E}({f}_{n,k,i})$ using (Equation 11) requires knowledge of the genotype probabilities ${\omega}_{n,g}$, which would again require us to track evolution in genotype space. We therefore introduce our major assumption: that we can approximate genotype probabilities using only the marginal site probabilities ${\omega}_{n,k,i}$ that site $k$ is in state $i$. We describe how we compute ${\omega}_{n,k,i}$ below. For now, we make the approximation that
This approximation assumes that all sites evolve independently of one another, which is not generally true because mutations at different sites are linked together in genotypes with shared ancestral histories, creating correlations among sites that we ignore.
Using the approximate genotype probabilities ${\widehat{\omega}}_{n,g}$, we can in turn approximate the expected marginal fitness of a lineage:
If the fitness effects of each site act multiplicatively to determine the overall fitness of a lineage, we can compute ${\widehat{f}}_{n,k,i}$ as:
where ${\sigma}_{ki}$ is the fitness effect of site $k$ being in state $i$. This formulation of ${\widehat{f}}_{n,k,i}$ is useful if the number of sites $L$ is large and the number of genotypes we need to sum over in (Equation 13) is therefore also extremely large.
Approximating the probability of no sampled descendants
Request a detailed protocolThe ${E}_{n,g}(t)$ term in (Equation 9) represents the probability that a lineage $n$ alive at time $t$ in the past is not sampled and leaves behind no sampled descendants. ${E}_{n,g}(t)$ therefore necessarily depends on the fitness of unobserved lineages descending from $n$ and how fitness along these lineages evolves through changes in their genotype. Because it is often easier track evolution in one dimensional fitness space rather than highdimensional sequence space (Kepler and Perelson, 1993; Tsimring et al., 1996), we simplify this problem by tracking a proxy for ${E}_{n,g}(t)$ though fitness space.
Let ${E}_{u}$ be the probability that a lineage with expected fitness $u$ leaves no sampled descendants. While fitness can take on a continuous range of values, we track these probabilities only for a discrete set of points $\mathcal{V}$ in fitness space. We can track ${E}_{u}$ for $u\in \mathcal{V}$ by modifying (Equation 3) to obtain:
We can then substitute ${E}_{n,g}(t)$ in (Equation 9) with ${E}_{u}$ for the fitness value $u$ closest to ${f}_{g}$ or ${\widehat{f}}_{n,k,i}$ in fitness space.
Tracking evolution in fitness space requires us to specify rates ${\gamma}_{u,v}$ for how lineages transition between fitness classes $u$ and $v$. Let ${\mathcal{G}}_{u}$ be the set of genotypes with expected fitness closest to $u$ out of all fitness values in $\mathcal{V}$. We approximate ${\gamma}_{u,v}$ as:
where ${\mu}_{ij}$ is the mutation rate between genotypes $i$ and $j$. In other words, we compute the average rate of transitions out of fitness class $u$ into $v$ by summing over all possible transitions between genotypes contained within each fitness class. Note that if each genotype falls in a unique fitness class such that ${\mathcal{G}}_{u}=1$ for all $u\in \mathcal{V}$, then ${E}_{u}$ is computed exactly. In the Results, we compare using the approximate transition rates above to compute ${E}_{u}$ versus an even simpler approximation where we assume no transitions between fitness classes along unobserved lineages, which has been assumed in earlier multitype birthdeath models (Rabosky et al., 2014; BaridoSottani et al., 2018).
Computing the marginal site densities ${D}_{n,k,i}$
Request a detailed protocolRecall that (Equation 9) provided an exact way to track the marginal site densities ${D}_{n,k,i}$ based on the genotype densities ${D}_{n,g}$. To efficiently evaluate ${D}_{n,k,i}$ without the need to track ${D}_{n,g}$ for all genotypes, we apply the three approximations made above. First, we approximate the genotype probabilities ${\widehat{\omega}}_{n,g}$ based on the marginal site probabilities. Second, we marginalize over the fitness of each genotype (weighted by its genotype probability) to compute ${\widehat{f}}_{n,k,i}$ and then substitute ${\widehat{f}}_{n,k,i}$ for ${f}_{g}$ for all genotypes where ${g}_{k}=i$ below. Third, we approximate ${E}_{n,g}$ by ${E}_{u}$ for a single fitness value $u$ closest to ${\widehat{f}}_{n,k,i}$. Making these approximations in (Equation 9) leads to:
Assuming that the mutation rate from $i$ to $j$ at site $k$ does not depend on the genetic background, we can substitute ${\sum}_{j=1}^{M}{\sum}_{\{{g}^{\prime}\in \mathcal{G}:{g}_{k}^{\prime}=j\}}{\gamma}_{g,{g}^{\prime}}$ with ${\sum}_{j=1}^{M}{\gamma}_{i,j}$, where ${\gamma}_{i,j}$ is the per site mutation rate. We can likewise substitute ${\sum}_{j=1}^{M}{\sum}_{\{{g}^{\prime}\in \mathcal{G}:{g}_{k}^{\prime}=j\}}{\gamma}_{g,{g}^{\prime}}{D}_{n,{g}^{\prime}}(t)$ with ${\sum}_{j=1}^{M}{\gamma}_{i,j}{\sum}_{\{{g}^{\prime}\in \mathcal{G}:{g}_{k}^{\prime}=j\}}{D}_{n,{g}^{\prime}}(t)$. Making these substitutions and rearranging the sums in (Equation 17), we have:
Recalling that ${D}_{n,k,i}={\sum}_{\{g\in \mathcal{G}:{g}_{k}=i\}}{D}_{n,g}$ (and by extension ${D}_{n,k,j}={\sum}_{\{g\in \mathcal{G}:{g}_{k}=j\}}{D}_{n,g}$), we have:
The significance of (Equation 19) is twofold. First, we can track sequence evolution at each site individually without tracking all genotypes. Second, given ${\widehat{f}}_{n,k,i}$, we can track the overall fitness of a lineage by marginalizing over the fitness effects of all possible mutations at other sites. We can therefore track sequence evolution at each site while simultaneously taking into account the coupled fitness effects of mutations at all other sites on a lineage’s fitness.
Computing ${\widehat{f}}_{n,k,i}$ still requires us to approximate the genotype probabilities using (Equation 12), which in turn requires the marginal site probabilities ${\omega}_{n,k,i}$. In our notation, ${\omega}_{n,k,i}$ represents the conditional probability $p(i{\mathcal{T}}_{n},{\mathcal{S}}_{n})$ that lineage $n$ is in particular state $i$, where ${\mathcal{T}}_{n}$ represents the subtree descending from $n$ with tip sequences $\mathcal{\mathcal{S}}}_{n$ represents the inverse conditional probability density $p({\mathcal{T}}_{n},{\mathcal{S}}_{n}i)$. We can therefore apply Bayes theorem to compute ${\omega}_{n,k,i}$ given ${D}_{n,k,i}$:
The $q(i)$ terms represent the prior probability that the lineage is in state $i$. Here we make a simplification in assuming that the tree ancestral and sister to lineage $n$ has no information regarding ${\omega}_{n,k,i}$, and thus assume a uniform prior on $q(i)=1/M$. The $q(i)$ terms therefore cancel above.
Because the fitness of a lineage depends on the state of all sites, we must solve (Equation 19) for all sites simultaneously as one coupled system of differential equations. This requires updating ${D}_{n,k,i}$ at each time step, which suggests the following iterative procedure.
At a tip $n$ observed to be in genotype $g$, we initialize ${\widehat{f}}_{n,k,i}$ as ${f}_{g}$ if ${g}_{k}=i$ or else ${\widehat{f}}_{n,k,i}=0$, ${\widehat{D}}_{n,k,i}=ds$ or $\rho $, and ${\omega}_{n,k,i}=1$ if ${g}_{k}=i$, else ${\omega}_{n,k,i}=0$. Then at each time step backwards through time from time $t$ to time $t+\mathrm{\Delta}t$, for each site and state we:
Update ${D}_{n,k,i}$ by numerically integrating (Equation 19) over time step $\mathrm{\Delta}t$.
Update the marginal site probabilities ${\omega}_{n,k,i}$ using (Equation 20)
Update the expected marginal fitness values ${\widehat{f}}_{n,k,i}$ using (Equation 13) or (Equation 14).
Computing the full joint likelihood
Request a detailed protocolWe can now compute the joint likelihood of the tree and sequence data if we track ${D}_{n,k,i}$ at each site back to the root. At the root, ${D}_{n,k,i}({t}_{root})$ represents $p(\mathcal{T},{\mathcal{S}}_{k}\mu ,\theta ,i)$, the probability density of the entire tree $\mathcal{T}$ and the observed sequence data ${\mathcal{S}}_{k}$ as site $k$, conditional on site $k$ being in state $i$ at the root. To be precise, ${D}_{n,k,i}$ only approximates $p(\mathcal{T},{\mathcal{S}}_{k}\mu ,\theta ,i)$ because we computed ${D}_{n,k,i}$ using the expected marginal fitness of a lineage ${\widehat{f}}_{n,k,i}$ based on approximate genotype probabilities. We therefore introduce an additional auxiliary variable $\mathcal{F}$ representing the entire set of expected fitness values ${\widehat{f}}_{n,k,i}$ computed over all lineages, sites and states. Using this notation, ${D}_{n,k,i}({t}_{root})=p(\mathcal{T},{\mathcal{S}}_{k}\mu ,\theta ,\mathcal{F},i)$. By summing over all possible root states at site $k$ (and conditioning on survival), we can then compute:
Likewise, we can compute the conditional probability density $p({\mathcal{S}}_{k}\mathcal{T},\mu ,\theta ,\mathcal{F})$ of the sequence data at site $k$ given the tree:
We already know $p(\mathcal{T},{\mathcal{S}}_{k}\mu ,\theta ,\mathcal{F})$ from above but now need the tree density $p(\mathcal{T}\mu ,\theta ,\mathcal{F})$. This can easily be computed using a birthdeath process where the birth rate of each lineage at any time $t$ is always rescaled by its expected fitness ${\widehat{f}}_{n}(t)$ contained within $\mathcal{F}$.
We can now compute the joint density $p(\mathcal{T},{\mathcal{S}}_{1:L}\mu ,\theta )$ for all sites. Because each site is conditionally independent of all other sites given $\mathcal{F}$, we can factor $p(\mathcal{T},{\mathcal{S}}_{1:L}\mu ,\theta ,\mathcal{F})$ into a product of densities for ${\mathcal{S}}_{k}$ at each site and the density of the entire tree $\mathcal{T}$:
We can thus approximate the joint likelihood of the sequence data and the phylogeny $p(\mathcal{T},{\mathcal{S}}_{1:L}\mu ,\theta )$ as $p(\mathcal{T},{\mathcal{S}}_{1:L}\mu ,\theta ,\mathcal{F})$. This allows us to consider how selection shapes sequence evolution at each site while simultaneously considering how the fitness effects of mutations at multiple sites act together to shape the phylogeny. As (Equation 23) makes clear though, the goodness of our approximation depends on how well the fitness values in $\mathcal{F}$ are approximated, which in turn depends on how well we can approximate genotypes based on the marginal site probabilities. We explore the goodness of these approximations in the Results section.
Implementation
Request a detailed protocolWe first implemented the marginal fitness birthdeath (MFBD) model in Matlab version R2017b. The Matlab implementation was used to test how well the MFBD model can approximate likelihoods and genotype probabilities relative to the exact multitype birth death model tracking all possible genotypes for a simple model with only four genotypes. For statistical inference, the MFBD was implemented as an addon package for BEAST 2 (Bouckaert et al., 2014) named Lumière, which extends the existing BDMM package for multitype birthdeath models (Kühnert et al., 2016). BEAST two is a general software platform that allows a wide range of evolutionary models including birthdeath models to be fit to phylogenetic trees while jointly inferring the phylogeny using Bayesian MCMC sampling. The BEAST 2 implementation of Lumière therefore allows the joint posterior distribution of all parameters in the MFBD model and the phylogeny to be estimated from sequence data. Source code for Lumière and the Matlab implementation are freely available at https://github.com/davidrasm/Lumiere.
Simulations
Request a detailed protocolTo test the statistical performance of our approach, mock phylogenies and sequence data were simulated under a birthdeathmutationsampling process using a variant of the Gillespie stochastic simulation algorithm (Gillespie, 2007) that recorded the ancestry of all individuals in the population. A binary sequence was associated with each lineage and allowed to mutate with a constant persite mutation rate $\gamma $. Mutations could alter the fitness of a lineage by either increasing or decreasing its birth rate according to sitespecific fitness effects. At death events, lineages were sampled with probability $s$, in which case they were included in the mock phylogeny. Code for these simulations is available at https://github.com/davidrasm/Lumiere/tree/master/sim (Rasmussen, 2019; copy archived at https://github.com/elifesciencespublications/Lumiere).
Ebola analysis
Request a detailed protocolWe used the Lumière implementation of the MFBD model to estimate the fitness effects of amino acid mutations previously identified to increase the infectivity of Ebola virus in human cell lines (Diehl et al., 2016; Urbanowicz et al., 2016). We reanalyzed a set of 1610 whole genome EBOV sequences sampled from Guinea, Sierra Leone and Liberia in 2014 to 2016. The sequence alignment along with the timecalibrated molecular phylogeny we used for our analysis were downloaded from https://github.com/ebov/spacetime/tree/master/Data.
(Urbanowicz et al., 2016) measured the fitness effects of 17 viral genotypes carrying 18 different amino acid mutations in either single, double or triple mutant backgrounds relative to the Makona genotype first sampled at the beginning of the epidemic. Because our methods cannot estimate fitness effects of mutations at very low frequencies, we only analyzed 9 of these mutations that were present in at least 10 of the 1610 viral samples. Preliminary analysis revealed that these mutations fall within eight unique genetic backgrounds because of the way mutations are nested within other single or double mutant lineages in the phylogeny. Because the data of Urbanowicz et al. (2016) strongly suggest that epistatic interactions between mutations affect viral fitness, we estimated the genotypic fitness ${f}_{g}$ of these eight major genotypes rather than sitespecific fitness effects $\sigma $. We therefore used the MFBD to track sequence evolution at each site, but used (Equation 13) to marginalize over these genotypes when approximating the fitness of a lineage.
We estimated the fitness of each genotype relative to the Makona genotype, assuming a uniform $[0,2]$ prior distribution on these fitness values. For the other parameters in the model, we assumed a fixed death or removal rate $d$ of 0.1667 per day based on earlier estimates (Gire et al., 2014; Stadler et al., 2014). Sampling was modeled as occurring upon removal, with the sampling proportion $s$ set to zero before March 2014, when the first sample was collected. After March 2014, we assumed a fixed sampling proportion of 0.056, reflecting the fact that the dataset included samples from 1610 individuals out of the 28,652 probable cases reported by the WHO (WHO, 2016). Lastly, we assumed a constant amino acid mutation rate over all sites with an exponential prior on both the forward and backward mutation rate with a mean rate of 2 × 10^{−3} per site per year. We also ran a second analysis where we included the geographic locations of lineages (Guinea, Sierra Leone and Liberia) as an additional evolving character state in our model. In this analysis, we estimated the effect of geographic location on transmission rates in Sierra Leone and Liberia relative to the base transmission rate in Guinea. Both analyses can be reproduced in Lumière with the XML input files available at https://github.com/davidrasm/Lumiere/tree/master/ebola.
Influenza H3N2 analysis
Request a detailed protocolWe used the Lumière implementation of the MFBD model to estimate the fitness effects of amino acid mutations in the hemagglutinin (HA) protein of human influenza virus subtype H3N2. In order to ensure our fitness estimates were directly comparable to the mutational fitness effects previously estimated by Lee et al. (2018), we focused our analysis on viral samples in the same antigenic cluster as the A/Perth/16/2009 strain studied by Lee et al. for two reasons. First, Lee et al. (2018) showed that the fitness effects of amino acid mutations in HA vary depending on the genetic background, with greater fitness differences between more divergent strains. We therefore only considered strains with low genetic divergence from A/Perth/16/2009. Second, the deep mutational scanning experiments were performed in cell culture, and therefore do not reflect the antigenic component of viral fitness in the human population. Only considering a single antigenic cluster therefore minimizes the effect of antigenic mutations.
To further minimize additional background variation in fitness due to geography, we only considered samples collected in the United States from January 2009 to the end of 2012. Overall, we downloaded 2150 sequences from the Influenza Research Database (https://www.fludb.org/) that met these criteria. Nucleotide sequences of the HA segment were aligned in Muscle (Edgar, 2004) and a maximum likelihood phylogenetic tree was estimated in RAxML (Stamatakis, 2014) using a GTR + Gamma substitution model. To get a timecalibrated phylogeny, branch lengths in the ML tree were converted into units of real calendar time with Least Squares Dating v0.3 (To et al., 2016) using a previously estimated molecular clock rate for the HA segment of H3N2 of 5.72 x 10^{−3} substitutions per site per year (Rambaut et al., 2008).
In our first analysis, we estimated mutational fitness effects from the H3N2 phylogeny under the MFBD model assuming that fitness effects are multiplicative across sites, as in (Equation 14). Because of the large number of naturally occurring mutations in the HA sequences, we limited our analyses to the 17 most abundant amino acid mutations that were present in more than 10% of the sampled sequences. To compare our estimates of populationlevel fitness effects to fitness effects measured in vitro, we converted the relative amino acid preferences at each site from the deep mutational scanning experiments to mutational fitness effects:
where ${\pi}_{k,i}$ is the relative preference for amino acid $i$ at site $k$. To compute these fitness effects, we used the averaged relative amino acid preferences reported in Dataset S3 of Lee et al. (2018).
In our second analysis, we used the relative preference data from the deep mutational scanning experiments to predict the populationlevel fitness of viral lineages. For this analysis, we considered all naturally occurring mutations in the HA protein that were present in at least 10 samples. In all, the fitness effects of 67 mutations distributed across 56 sites were included. To map relative amino acid preferences across multiple sites to populationlevel fitness, we assume that the mutational fitness effects computed from the relative amino acid preferences are additive on a log_{2} scale, such that the fitness ${f}_{n}$ of a lineage is:
Here, $\alpha $ is a linear scaling term that allows us to calibrate populationlevel fitness in terms of the sum of the sitespecific fitness effects. We also include the scaling exponent $\kappa $ to account for curvature in the fitness landscape, as might be expected to arise if mutations interact globally through synergistic ($\kappa >1$) or antagonistic ($\kappa <1$) epistatic effects across sites (Elena et al., 2010). A complete list of the HA mutations considered, their fitness effects predicted by DMS and the XML input file needed to reproduce our analysis are available at https://github.com/davidrasm/Lumiere/tree/master/influenzaH3N2.
Results
The four genotype model
We first consider a simple model of molecular evolution in order to compare the marginal fitness birthdeath (MFBD) model against the exact multitype birthdeath (MTBD) model tracking all genotypes. Specifically, we consider a binary evolving sequence of length $L=2$ where all mutations are deleterious and carry a selective fitness cost $\sigma $. Fitness effects of individual mutations act multiplicatively, such that the double mutant has fitness ${(1\sigma )}^{2}$. With this simple model, it is therefore possible to track the evolutionary dynamics of all four genotypes ($\mathcal{G}=\{00,01,10,11\}$) under both models.
Figure 2A shows a phylogeny simulated under the four genotype model, colored according to the genotype of each lineage. We computed the joint likelihood that this tree and observed tip genotypes evolved under a range of different fitness values $\sigma $ for both the exact MTBD and approximate MFBD models (Figure 2B). The likelihood profiles under both models peak around the true value of $\sigma $ and closely match at lower values of $\sigma $, but begin to diverge at higher values. The probability of a single hypothetical lineage being in each genotype approximated under the MFBD model is also shown against the exact genotype probabilities computed under the MTBD in (Figure 2C).
Because the MFBD approximates the probability of a lineage being in each genotype based on the marginal sites probabilities, we also compared how well the MFBD model approximates the genotype probability densities ${D}_{n,g}$ relative to the exact multitype birthdeath model. Recall that ${D}_{n,g}$ provides the probability that the subtree descending from a lineage $n$ has evolved exactly as observed, and therefore forms the foundation of all likelihood calculations under our model. Averaged over all genotypes, the error introduced by approximating ${D}_{n,g}$ under the MFBD model is greatest at intermediate mutation rates (Figure 3A). When there is no selection ($\sigma =0$), the MFBD introduces no error, but the error increases as the strength of selection increases (Figure 3B). We can also consider a variant of the four genotype model where each of the single mutant genotypes is neutral with $\sigma =0$ but an epistatic interaction between the two sites causes the double mutant to be deleterious with some fitness cost $\u03f5$. Again, the error introduced by the MFBD grows as the strength of the epistatic fitness effect increases (Figure 3C).
Taken together, these results suggest that the MFBD model introduces error in ${D}_{n,g}$ by ignoring correlations among sites due to the fact that selection acts at the level of genotypes, especially when epistasis is strong. The additional correlations between sites induced by selection then causes the genotype probabilities to deviate from those expected based on the marginal site probabilities. Conversely, at very high mutation rates, correlations between sites quickly break down so that sites evolve effectively independently of one another, such that the error introduced by the MFBD also decreases as the mutation rate becomes very high.
Overall, the magnitude of the error introduced by approximating the genotype probabilities is small, especially when we can compare the MFBD model against a more naive approximation that tracks sequence evolution at each site completely independent of all other sites by setting the expected marginal fitness ${\widehat{f}}_{n,k,i}=\sigma $ instead of using (Equation 14). This approximation completely ignores how the fitness of a lineage depends on mutations at other sites, and the error in ${D}_{n,g}$ is generally considerably greater than under the MFBD model (Figure 3A–C; dashed lines). Moreover, even when the error introduced by the MFBD model is relatively large, the model still tracks the dynamics of ${D}_{n,g}$ backwards through time along a lineage well (Figure 3D–F; for parameter values marked by the black asterisks in AC).
The MFBD model also approximates ${E}_{n}$, the probability that a lineage has no sampled descendants, using a discretized fitness space and is therefore another source of potential error. Mirroring the results for ${D}_{n,g}$, the error introduced by this approximation peaks at intermediate mutation rates while it increases monotonically with the strength of selection and epistatic fitness effects (Figure 4A–C). Interestingly, tracking how lineages transition between fitness classes in fitness space does not improve the approximation relative to simply ignoring changes in fitness along unobserved lineages (Figure 4A–C; dashed lines). The overall magnitude of error introduced by approximating ${E}_{n}$ is also small, although using a discretized fitness space does lead to some jaggedness in the dynamics of ${E}_{n}$ (Figure 4D–F). However, only when selection is very strong ($\sigma >0.8$) does tracking ${E}_{n}$ in fitness space result in significant errors, and then only in the more distant past (Figure 4E). In this case, a lineage’s fitness in the distant past may be a poor predictor of its probability of leaving sampled descendants at a time point in the distant future because the fitness of the lineage and its descendants may greatly change over time in a way that is difficult to predict without considering the exact mutational pathways through which a lineage can move in sequence space.
Estimating sitespecific fitness effects
Next, we simulated phylogenies under a model where the fitness effect of the mutant allele at each site is drawn independently from a distribution of fitness effects (DFE) in order to test how well we can estimate sitespecific fitness effects. Because there can be considerable uncertainty surrounding these fitness effects, we now estimate the posterior distribution of fitness effects using Bayesian MCMC. The accuracy and precision of the estimated fitness effects varies considerably across sites, as shown for a representative phylogeny with five evolving sites in Figure 5.
In order to better understand this variability, we simulated 100 phylogenies with randomly drawn fitness effects at either 2, 5 or 10 evolving sites. Overall, the estimated posterior median fitness effects are well correlated with their true values, although the strength of this correlation decreases as the number of sites increases (Figure 6A–C). Coverage of the 95% credible intervals on the other hand increased from 71.0 to 72.8% to 77.4%.
While there is no systematic directional bias, fitness effects are underestimated for sites at which the mutant allele is at low frequency among sampled individuals and overestimated for sites where the mutant allele is at high frequencies. This however appears to be an intrinsic feature of estimating fitness effects from the branching structure of a phylogeny, as the same phenomena is observed under the exact MTBD model with two sites and four genotypes (Figure 6D), and the estimates made under the approximate MFBD model are highly correlated with estimates made under the exact MTBD model (Figure 6E).
Across all sites and simulations, accuracy decreased when the mutant allele at a given site was at low or high frequencies, and there was considerably more uncertainty for sites where the mutant allele was at very low frequencies (Figure 6F). Thus, while the MFBD model generally performs well at estimating sitespecific fitness effects, the accuracy and precision of these estimates varies greatly depending on the frequency of a given mutation in a phylogeny.
Ebola virus adaptation to humans
The Ebola virus glycoprotein (GP) binds to cells during viral cell entry and is therefore thought to be a key determinant of viral fitness in different hosts. Previously, (Urbanowicz et al., 2016) analyzed a large set of naturally occurring amino acid mutations in the GP isolated from patients during the 2013–16 epidemic in Western Africa. The effect of these GP mutations on fitness were then experimentally determined using infectivity assays in cell culture. Several mutant genotypes dramatically increased viral infectivity relative to the Makona genotype isolated during the earliest stages of the epidemic. However, the effect of these mutations on viral transmission and fitness at the host population level have not yet been determined. We therefore applied the MFBD model to a large dataset of 1610 Ebola virus (EBOV) genomes sampled during the 2013–16 epidemic to infer the populationlevel fitness effects of these GP mutations.
We analyzed 9 out of the 18 amino acid mutations analyzed by Urbanowicz et al. (2016) that were present in at least 10 of the 1610 viral samples. These nine mutations fall in eight different genetic backgrounds or genotypes (Figure 7). Because Urbanowicz et al. (2016) found evidence for epistatic interactions between several of these mutations, we estimated the fitness of these eight genotypes rather than sitespecific mutational fitness effects. Table 1 shows the relative fitness of these genotypes estimated at the populationlevel versus their fitness in cell culture.
Mapping the genotypes and fitness of lineages inferred under the MFBD model onto the phylogeny allows us to reconstruct the series of events by which EBOV adapted to humans (Figure 7). Shortly after the epidemic started in 2013, the A82V mutation occurred and gave rise to lineage B, which then spread to Sierra Leone, Liberia and Mali. (Urbanowicz et al., 2016) found that the A82V mutation increases infectivity by 2–3 fold in cell culture. At the populationlevel, this mutation appears to have a less dramatic effect, increasing transmissibility by only 5% relative to the Makona genotype. The P330S mutation appears to have temporarily decreased the fitness of the main surviving clade in lineage A, although mutations N107D and G480D later rescue the fitness of this lineage, consistent with the findings of Urbanowicz et al. (2016). Meanwhile, the R410S mutation occurred within lineage B but did not have an immediate effect on fitness. However, R410S appears to epistatically interact with mutation K439E, which occurs twice along the same lineage carrying the R410S mutation and in this genetic background increases infectivity 2–3 fold in cell culture. We estimate that the A82V+R410S+K439E genotype had the highest populationlevel fitness, but only increased fitness by 14% relative to the Makona genotype. Three other mutations, R29K, T230A and I371V, also occurred in the A82V genetic background, but were not estimated to have further increased the fitness of the A82V genotype.
Because the A82V mutation occurred along a lineage that spread from Guinea to Sierra Leone and several of the genotypes we considered were also geographically restricted, we performed a second analysis to check whether our estimates of genotype fitness were confounded by geographic differences in transmission rates. In this model, we accounted for geographic effects by including location (Guinea, Sierra Leone or Liberia) as an additional evolving character state or ‘site’ in the model. We found no evidence that transmission rates differed by location; relative transmission rates were 1.01 (95% CI: 0.97–1.05) in Sierra Leone and 0.99 (95% CI: 0.94–1.04) in Liberia compared with Guinea. All mutant genotypes had higher estimated fitness relative to the Makona genotype under the model with geographic effects due to a lower estimated fitness of the Makona genotype. However, the rank order of genotypic fitness values is consistent across models (Table 1). Overall, the population level fitness of all eight genotypes agree with their fitness in cell culture in terms of the sign or direction of their effects, but these genotypes had much greater fitness relative to the Makona genotype in cell culture than at the population level.
Influenza H3N2 fitness variation
We also applied the MFBD model to estimate the fitness effects of mutations in the hemagglutinin (HA) protein of human influenza virus subtype H3N2. Lee et al. (2018) recently estimated the relative preference for each amino acid residue at all sites in the HA protein in cell culture using a reverse genetics approach known as deep mutational scanning (DMS). The fitness effect of mutating one amino acid to another is expected to correlate strongly with the relative preference for each amino acid in these experiments. We therefore sought to compare the populationlevel fitness effects of naturally occurring mutations estimated under the MFBD model with their fitness measured in vitro by DMS.
To minimize the effect of antigenic mutations, which would not be reflected in the DMS experiments, we limited our analysis to viral lineages in the same antigenic cluster as the A/Perth/16/2009 strain studied by Lee et al. (2018). We first estimated the fitness effects of the 17 most abundant mutations that reached a frequency of 10% or greater among viruses sampled in the United States between 2009 and 2012. We found no apparent relationship between the estimated populationlevel fitness effects of these mutations and their in vitro effects, although there is agreement that most of these mutations are nearly neutral (Figure 8A). Although the 95% credible intervals on our estimates are narrow, these results need to be interpreted with extreme caution because our MCMC algorithm never converged on a stable posterior distribution for several mutations (Effective Sample Size < 10) due to strong correlations between mutations in their estimated fitness effects. This is likely due to the fact that many of these mutations occur only once in the phylogeny and share the same ancestry and therefore genetic background as other mutations in the phylogeny (Figure 8B).
While our populationlevel estimates did not correlate with the in vitro data, the fitness effects predicted by DMS correlate strongly with the maximum frequency that naturally occurring mutations reach in the human population (Lee et al., 2018). We therefore sought to test whether using the DMS experimental data to inform the MFBD model about the fitness effects of mutations, rather than estimating them independently from the phylogeny, would result in a better fit of the model to the H3N2 phylogeny and sequence data. Doing so requires a fitness model that aggregates mutational fitness effects across sites and then maps this combined fitness to the populationlevel fitness of a lineage. In our model, we sum the mutational fitness effects predicted by the relative amino acid preferences across all sites to get a composite predictor of fitness: ${\theta}_{DMS}={\sum}_{k}^{L}lo{g}_{2}\frac{{\pi}_{k,i}}{{\pi}_{k,l}}$. We then use (Equation 25) to map ${\theta}_{DMS}$ to overall populationlevel fitness.
Fitting our model to the H3N2 phylogeny allows us to calibrate how the mutational fitness effects based on relative preferences scale to populationlevel fitness. Overall, large changes in ${\theta}_{DMS}$, resulting from mutations to more or less preferred amino acid residues, have a relatively small impact on populationlevel fitness. Populationlevel fitness grows slowly and roughly linearly with mutations to more preferred amino acids (Figure 9; inset). Nevertheless, when mutational fitness effects are aggregated across all sites, there are substantial fitness differences between lineages (Figure 9). Relative to a hypothetical lineage bearing the consensus sequence, fitness ranges from 0.84 to 1.04 across lineages with many lineages having a relative fitness less than one, indicating a slightly deleterious mutation load. Accounting for these fitness differences results in the MFBD model informed by the DMS data fitting the H3N2 phylogeny substantially better (Log likelihood: −4184) than a model assuming all mutations are neutral (Log likelihood: −7510). As would be expected, lineages predicted to be more fit also tend to persist between influenza seasons. Most notably, a lineage with higher than average fitness circulates in 2009 and 2010 during the H1N1 pandemic. This lineages carries the T228A mutation, which is predicted to have a large beneficial effect in the DMS experiments. It is therefore tempting to speculate that this mutation may have conferred an advantage that helped seasonal H3N2 compete with the pandemic H1N1 virus.
Discussion
Many assumptions are made in phylogenetics to model molecular evolution in a statistically tractable way. Historically, one of the most pervasive yet biologically questionable of these assumptions has been that sequences evolve neutrally along lineages, such the mutations do not feedback and alter the branching process shaping the phylogeny. Our marginal fitness birthdeath (MFBD) model allows us to relax this core assumption in order to consider how nonneutral evolution at multiple sites affects sequence evolution, the fitness of lineages, and the overall branching structure of a phylogeny. While our approach is not exact in that it approximates genotype probabilities by assuming sites evolve independently when computing the marginal fitness of a lineage, we have shown that this approximation generally works well and only produces significant errors in rather extreme situations, such as the four genotype model with very strong selection or epistasis. While an earlier approach based on birthdeath models allowed for lineagespecific fitness values to be inferred from the branching pattern of a phylogeny (Neher et al., 2014), this approach did not connect fitness back to the mutational process nor allow for the fitness effects of individual mutations or genotypes to be estimated. Using our approach, we demonstrated that the fitness effects of specific mutations can be estimated from simulated phylogenies under the MFBD with accuracy comparable to an exact multitype birth death model. The MFBD model therefore provides a new, statistically powerful way of incorporating adaptive molecular evolution into phylodynamics.
The MFBD model allows us to exploit phylogenetic information about adaptive evolution that most methods for inferring selection from patterns in sequence data ignore. Currently, codonsubstation models (Goldman and Yang, 1994; Muse and Gaut, 1994) and the related class of mutationselection models (Yang and Nielsen, 2008) are by far the most widely used approach for inferring selection. These approaches rely on comparing sequence substitution patterns such as the dN/dS ratio of nonsynonymous to synonymous substitutions across sites. These approaches can be very powerful when sequences from highly divergent taxa are compared, such that enough time has elapsed for multiple substitutions at a single site to have accumulated between lineages. But on the shorter timescales relevant to evolution within a population, substitution patterns like the dN/dS ratio are relatively insensitive to selection pressures and may produce misleading inferences of selection (Kryazhimskiy and Plotkin, 2008). For example, a highly beneficial nonsynonymous mutation that occurs in a single lineage and then spreads through a population may produce a very low dN/dS ratio, indicative of purifying selection rather than adaptive evolution. In contrast, comparing the evolutionary dynamics of lineages with and without the mutation allows us to infer if that mutation confers a competitive advantage. Thus, considering the branching pattern of phylogenies provides additional information about molecular evolution not visible from substitution patterns in sequence data alone.
While new technologies increasingly allow researchers to quantify mutational fitness effects in vitro or even in vivo (Zanini and Neher, 2013; Thyagarajan and Bloom, 2014), how fitness measured in the lab translates to fitness in nature is largely unknown. This is especially pertinent for emerging pathogens whose epidemic potential often depends on new adaptive mutations (Antia et al., 2003; Longdon et al., 2014). Phylodynamic approaches like the MFBD model that can quantify fitness at the host population level are therefore greatly needed, as they offer a means to assess the epidemiological significance of mutant lineages. Extrapolating from our experience with Ebola, where the populationlevel fitness effects of each mutant genotype we considered matched the sign of their effect in cell culture, we suspect that fitness measured in the lab will generally agree with fitness in nature. This seems reasonable, as mutations that increase replication or cellular infectivity within hosts should generally promote transmissibility between hosts (e.g. Quinn et al., 2000; Fraser et al., 2007). But at the same time, there is no reason to believe that transmission rates will increase linearly or even monotonically with increasing withinhost growth rates. We therefore expect that the magnitude of fitness effects might often greatly differ across scales, as we found for the A82V glycoprotein mutation in Ebola. While A82V doubles infectivity in cell culture, we estimated that it only increases transmissibility at the population level by 5% (95% CI: 4–7%). Interestingly, (Diehl et al., 2016) found that A82V only slightly increases viral titers in Ebola patients, which is likely a much better proxy for transmissibility than cellular infectivity, lending support to our more moderate estimates at the population level.
For influenza, we were unable to reliably estimate the fitness effects of individual mutations from the H3N2 phylogeny. We believe that these inference problems likely stem from the fact that many of these mutations occur only once in the phylogeny and in the same genetic background as other mutations in the HA protein. The shared phylogenetic ancestry of mutations creates an identifiability problem akin to the problem of collinearity in more standard regressiontype models. In either case, the individual effects of highly correlated variables are difficult or impossible to infer. Nevertheless, including the mutational fitness effects predicted by deep mutational scanning experiments improved the fit of the MFBD model to the H3N2 phylogeny by thousands of log likelihood units. Accounting for these fitness effects in the MFBD model also revealed substantial variation in populationlevel fitness among viral lineages within a single antigenic cluster. Most lineages were reconstructed to have a slightly deleterious mutation load, consistent with earlier reports that background variation in fitness arising from deleterious mutations, not just antigenic mutations, plays a large role in determining which H3N2 lineages ultimately persist (Illingworth and Mustonen, 2012; Luksza and Lässig, 2014; Koelle and Rasmussen, 2015). Moreover, the fitness variation uncovered by our analysis likely represents only the ‘tip of the iceberg’, since there are likely mutations in other genomic segments besides HA with large fitness effects (Raghwani et al., 2017), which we did not consider.
The influenza analysis highlights some of the inevitable difficulties encountered when inferring mutational fitness effects from phylogenies. Increasing the number of sites under consideration also increases the complexity of the genetic background in which mutations occur due to the increased probability of mutations being linked to other mutations rather than occurring in isolation. This leads to strong correlations between the fitness effects of different sites in an increasingly high dimensional parameter space, making statistical inference challenging, especially using MCMC methods. Spurious correlations may also arise due to additional, unmodeled sources of fitness variation. For example, if a mutation occurs coincidently with another beneficial mutation or the mutation occurs by chance along a lineage spreading through a higher fitness environment, it will likely be inferred to increase fitness even if it is actually neutral. In the future, the MFBD should therefore be extended to account for unmodeled sources of fitness variation. For example, each lineage could be assigned a random fitness effect representing the unmodeled components of fitness variation. These random effects could then be modeled as a continuous trait evolving along lineages, such that more closely related lineages would be expected to have similar fitness and overly large changes between closely related lineages would be penalized. Such a model would then allow us to say whether a fitness effect attributed to a particular mutation could be equally well explained by random effects arising from unmodeled fitness variation. Until such a principled approach is implemented, the fitness effects of individual mutations need to be interpreted carefully unless they occur in multiple genetic backgrounds and confounding sources of fitness variation can be accounted for, as we tried to do for Ebola by including potentially confounding geographic fitness effects.
In spite of these shortcomings, we believe the MFBD model offers a powerful means to explore many questions not previously possible with strictly neutral phylodynamic models. Even if the fitness effects of individual mutations are not identifiable, it may still be possible to infer the distribution of fitness effects across sites, a key determinant of adaptive evolution that has only been explored in a few systems (EyreWalker and Keightley, 2007). The MFBD model can also be used to compare the fitness of a mutation or lineage across different environments, such as in different hosts of a pathogen. Finally, the MFBD is not limited to exploring sequence evolution, as the model is generalizable to any discrete character state, including phenotypic, geographic or environmental characters. Thus, more generally, our model can be thought of as a multitrait, multitype birthdeath model that can be used to explore how different molecular and nonmolecular characters interact to shape the overall fitness of lineages in a phylogeny.
References

Detection of HIV transmission clusters from phylogenetic trees using a multistate birth–death modelJournal of the Royal Society Interface 15:20180512.https://doi.org/10.1098/rsif.2018.0512

BEAST 2: a software platform for bayesian evolutionary analysisPLOS Computational Biology 10:e1003537.https://doi.org/10.1371/journal.pcbi.1003537

MUSCLE: multiple sequence alignment with high accuracy and high throughputNucleic Acids Research 32:1792–1797.https://doi.org/10.1093/nar/gkh340

Simple genomes, complex interactions: epistasis in RNA virusChaos: An Interdisciplinary Journal of Nonlinear Science 20:026106.https://doi.org/10.1063/1.3449300

The distribution of fitness effects of new mutationsNature Reviews Genetics 8:610–618.https://doi.org/10.1038/nrg2146

Evolutionary trees from DNA sequences: a maximum likelihood approachJournal of Molecular Evolution 17:368–376.https://doi.org/10.1007/BF01734359

Stochastic simulation of chemical kineticsAnnual Review of Physical Chemistry 58:35–55.https://doi.org/10.1146/annurev.physchem.58.032806.104637

A codonbased model of nucleotide substitution for proteincoding DNA sequencesMolecular Biology and Evolution 11:725–736.https://doi.org/10.1093/oxfordjournals.molbev.a040153

Sampling theory for neutral alleles in a varying environmentPhilosophical Transactions of the Royal Soceity B Biological Sciences 344:403–410.https://doi.org/10.1098/rstb.1994.0079

Evolutionary distances for proteincoding sequences: modeling site specific residue frequenciesMolecular Biology and Evolution 15:910–917.https://doi.org/10.1093/oxfordjournals.molbev.a025995

The population genetics of dN/dSPLOS Genetics 4:e1000304.https://doi.org/10.1371/journal.pgen.1000304

Phylodynamics with migration: a computational framework to quantify population structure from genomic dataMolecular Biology and Evolution 33:2102–2116.https://doi.org/10.1093/molbev/msw064

A bayesian mixture model for acrosssite heterogeneities in the aminoacid replacement processMolecular Biology and Evolution 21:1095–1109.https://doi.org/10.1093/molbev/msh112

The evolution and genetics of virus host shiftsPLOS Pathogens 10:e1004395.https://doi.org/10.1371/journal.ppat.1004395

Estimating a binary character's effect on speciation and extinctionSystematic Biology 56:701–710.https://doi.org/10.1080/10635150701607033

Distortions in genealogies due to purifying selectionMolecular Biology and Evolution 29:3589–3600.https://doi.org/10.1093/molbev/mss170

An integrated framework for the inference of viral population history from reconstructed genealogiesGenetics 155:1429–1437.

Viral load and heterosexual transmission of human immunodeficiency virus type 1New England Journal of Medicine 342:921–929.https://doi.org/10.1056/NEJM200003303421303

BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic treesMethods in Ecology and Evolution 5:701–707.https://doi.org/10.1111/2041210X.12199

Probability distribution of molecular evolutionary trees: a new method of phylogenetic inferenceJournal of Molecular Evolution 43:304–311.https://doi.org/10.1007/BF02338839

On incomplete sampling under birth–death models and connections to the samplingbased coalescentJournal of Theoretical Biology 261:58–66.https://doi.org/10.1016/j.jtbi.2009.07.018

Uncovering epidemiological dynamics in heterogeneous host populations using phylogenetic methodsPhilosophical Transactions of the Royal Society B: Biological Sciences 368:20120198.https://doi.org/10.1098/rstb.2012.0198

Fast dating using leastsquares criteria and algorithmsSystematic Biology 65:82–97.https://doi.org/10.1093/sysbio/syv068

RNA virus evolution via a fitnessspace modelPhysical Review Letters 76:4440–4443.https://doi.org/10.1103/PhysRevLett.76.4440

The mutational robustness of influenza A virusPLOS Pathogens 12:e1005856.https://doi.org/10.1371/journal.ppat.1005856

Ebola situation reportsEbola situation reports, http://apps.who.int/ebola/ebolasituationreports.

MutationSelection models of Codon substitution and their use to estimate selective strengths on Codon usageMolecular Biology and Evolution 25:568–579.https://doi.org/10.1093/molbev/msm284

Quantifying selection against synonymous mutations in HIV1 env evolutionJournal of Virology 87:11843–11850.https://doi.org/10.1128/JVI.0152913
Decision letter

Aleksandra M WalczakReviewing Editor; École Normale Supérieure, France

Diethard TautzSenior Editor; MaxPlanck Institute for Evolutionary Biology, Germany

Trevor BedfordReviewer; Fred Hutchinson Cancer Research Center, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Coupling adaptive molecular evolution to phylodynamics using fitnessdependent birthdeath models" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Diethard Tautz as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Trevor Bedford (Reviewer #1).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
The reviewers and editor agree that the subject of introducing selection and fitness into phylogenic inference is extremely important, neglected and very hard. For these reasons everyone is very enthusiastic about this effort. Nevertheless, all reviewers raise concerns both about presentation and about validation and applications. The reviewers should be understanding if not all their suggestions can be incorporated but we would urge you to try and satisfy them as much as possible, and very clearly explain instances where you cannot follow their suggestions. In our opinion, given the scope of eLife and that many readers will probably be less expert in probability theory than the reviewers, we think it is extremely important to be very clear about the assumptions and where the method works and where it falls short (see comments by reviewer 2). In light of this, reviewer 1 comments that testing different systems will provide such intuition for the less mathematically trained. You do not have to worry about selling the method – I think this is a point where we also learn from negative results.
We are including the reviewers' comments in full, since they are all instructive and each is very different.
Reviewer #1:
I've been working in the field of phylodynamics for the last several years, and I can say that it's widely acknowledged how difficult it's been to bridge the more traditional craft of demographic inferences of population size and migration to more fundamental evolutionary insights of strain fitnesses. I applaud the efforts of Rasmussen and Stadler to tackle such an important issue in their manuscript "Coupling adaptive molecular evolution to phylodynamics using fitnessdependent birthdeath models". I believe this work represents an innovative modeling avenue to address fitness in phylodynamic inference and will eventually lead to many fruitful biological insights.
I found the model explication clear and well motivated. Although I admit difficulty with the full mathematical details, I could certainly follow the overall approach and was given a good sense of what approximations had been taken and why.
I believe the core modeling advance is completely sound and warrants publication. However, I believe that for eLife in particular, a manuscript should point the way towards biological insight. The application to Ebola in West Africa is not as biologically exciting as other systems; the method appears to work, but there's just little adaptive evolution going on. I believe the manuscript would find more traction with readers if it showed a clear example of elucidating impactful fitness differences in an evolving population.
Major suggestions:
1) My central suggestion would be to include a second biological example in which strain variation should be more apparent. Given David Rasmussen's expertise and the known biology of the system, I would suggest H3N2 influenza. Working with influenza phylogenies, there are often clear cut examples of new mutations emerging that rapidly increase in the population (such as HA1 F159Y, https://nextstrain.org/flu/seasonal/h3n2/ha/12y?c=gtHA1_159). If a few of these genotypes are selected (such as those dictated by the Koel 7, see Koel et al., 2013), are there fitness differences visible in the phylogeny?
I'm not suggesting to do any sort of formal prediction from these fitnesses. Just to see if H3N2 phylogenies give reasonable model outputs.
Alternatively, if flu doesn't work as a motivating example, this should be discussed, as it's so obvious. I don't know if the low sampling fraction would pose an issue to the model.
There were a couple issues I had with Ebola analysis. Nothing fundamental, but issues that should be addressed to get at the biology. These were:
2) I remain concerned about the conflation of epidemiological circumstance and evolutionary fitness in these inferences (Bedford and Malik, 2016, Cell). In particular, the GP A82V mutation corresponds cleanly to a transition from early circulation in Guinea to circulation in Sierra Leone. Could this be addressed technically by adding an additional "site" to the model distinguishing between Guinea, Sierra Leone and Liberia?
3) It seems funny to me to only look the subset of sites investigated by Urbanowicz et al. in the lab. There were other common mutations in GP that weren't looked at by Urbanowicz et al. For example, both R321K and E580G (https://nextstrain.org/ebola?c=gtGP_321,580) are significantly more common than R29K, I371V and K439E (https://nextstrain.org/ebola?c=gtGP_29,371,439). Thus, it's weird to me to present a table that lists fitness impacts of several mutations, when there are other mutations that are largely colinear with the presented mutations, that may have actually been the causal variants.
I have this same issue with genes outside of GP. There are highly successful mutations like NP R111C (https://nextstrain.org/ebola?c=gtNP_111) that are not accounted for. These may be driving the fitness differences observed. For example, L177A visually correlates better with the successful clade than GP E410V, which received a lot of model weight.
Is it possible to put forth a more systematic approach here? I don't think with what's been done that it's possible to conclude much about genetic variants driving transmission differences during the West African epidemic. I worry that handing out an approach as described will lead to people cherry picking what sites they use to define strains in subsequent analyses. There should be some guidance for how to approach this objectively.
Reviewer #2:
The manuscript "Coupling adaptive molecular evolution to phylodynamics using fitnessdependent birthdeath model" introduces a probabilistic framework to estimate the joint distribution of the sequence data and of the phylogenetic tree, in which the tree is affected by the mutational process. The authors first introduce a general set of coupled different equations for the backwards evolution of the probabilities of observed subtrees and nonobserved lineages. As these equations are intractable, a series of approximations to reduce the size of the problem are considered, including projections onto lowdimensional subspaces, decorrelation between sites in the genome… These simplified equations are solved in three cases: two toy models (small number of sites), for which the ground truth is known, and Ebola glycoprotein (GP) data sampled during the 201316 epidemic. Based on their computational approach, the authors can quantitatively estimate the fitness effects of 8 GP mutants.
I fully agree with the authors that estimating the likelihood of sequence data and tree altogether, without any unrealistic separation between the tree dynamics and the mutation processes, is an important and very challenging goal. Despite the biological importance of this issue, few works have considered the coupling between the tree dynamics and fitness so far, see for instance Neher, Russel and Shraiman (2014). The present work by Rasmussen and Stadler is an interesting and sophisticated step in this direction. I have however several criticisms against the current version of the manuscript, both on form and content.
My first comment is about the accessibility of the paper, more precisely, whether it is selfcontained or not. While some compromise has always to be made between a full, selfcontained presentation and the necessity to have a reasonable length, I think the manuscript should be made more accessible. I had to go to Stadler and Bonhoeffer (2013) to understand clearly the meaning of the main quantities (D_{n,i} and E_{i}); inserting a figure similar to Figure 1 of that paper would be useful. Equations 2 and 3, which are identical to the ones of Stadler and Bonhoeffer (apart from the presence of mutations – γ_{i,j} – instead of lineage – λ_{i,j} – rates), are also better explained in that paper. Furthermore, the presence of the normalization factor including the probability of being observed in Equation 5 is not really explained here. Again, the 2013 paper was helpful to understand the origin of this denominator.
Secondly, I agree that the set of coupled equations over the D and E variables is intractable, as its solution would require to track an exponentially large (in the genome length) number of quantities. There is no doubt that simplifying approximations are needed, but I am worried that the range of validity of these approximations is not adequately studied here. My concerns are:
1) The dimensionality of the problem is reduced from exponential to linear (in the number of sites) for the D probabilities, while the E probabilities are restricted to a discrete set of points in the fitness space. The corresponding equations, respectively, 19 and 15, are obtained by ad hoc modifications from the original evolution equations, i.e. they are not derived by clear methods that would help the reader understand the magnitude of the terms neglected. It is far from being clear if and when these approximations are acceptable. Again, I understand that approximations are needed, I simply would like to see some justification or some conditions under which the approximations would not affect too much the outcome. The necessity of dimensional reduction is not specific to computational evolutionary biology. Chemical master equations are also virtually of infinite dimensions, and cannot be solved without simplifications, in particular finitestate projections, reminiscent of the procedures followed in the present manuscript. The accuracy of these simplifications is a subject of concern and is studied, see for instance Munsky and Kammash (2006, J. Chem. Phys.).
2) Another drastic hypothesis, besides dimensional reduction, is the decorrelation of sites in the fitness function, see Equation 12. Obviously, epistatic effects are present in reality, and it is therefore important to understand how they would bias the estimates of the fitness effects predicted by the algorithm. This point is studied with the simulated 2site model (bottom panels in Figure 1), but I was unable to understand what the conclusion (the obvious fact that errors grow with the strength of selection against the double mutant) entails for more complex cases. What can we expect as the genome size grows, and epistatic effects accumulate? Even in the absence of coupling in the fitness, the quality of the prediction of sitefitness effects seems to deteriorate rapidly with the genome size, from 2 to 10 sites, see top panels in Figure 3. The latter size is comparable to the number (9) of mutations studied for GP, but how much could the results of Table 1 affected by the other, not studied mutations in this Ebola virus protein?
3) Across a large phylogenetic tree, it is expected that the relationship between the fitness and the sequences varies across lineages, e.g. owing to changes in the environmental conditions. I wonder how much such effects could influence the predictions for the fitness effects output by the algorithm. To be more specific, suppose that the fitness f changes by some epsilon (depending on the lineage n), how much would the D and E probabilities change? I expect that this could be studied numerically in the small size models, or even analytically with linearresponse theory. Assessing the robustness of the predictions against slight variations in the sequencetofitness relation is an important issue.
As a conclusion, this work is interesting and can be appreciated as a valuable effort to solve an important problem in phylogeny/sequences estimation. A substantial number of further investigations is, however, required to assess the validity and robustness of the method, before one can fully trust its predictions.
Reviewer #3:
This is an interesting paper that addresses the problem of jointly modeling mutation, selection, and sampling from a population, where the individuals are related through a phylogenetic tree. The idea is that selection can alter the expected shape of a tree, and therefore the tree shape is informative about both the genotypic state, and the selection coefficient.
The model builds upon an earlier work (Stadler and Bonhoeffer, 2013) in which a model for the evolution of a single site under selection was introduced. The current paper extends this model to multiple sites under selection. The authors achieve this by making additional assumptions, boiling down to assuming no linkage between these sites.
I am unsure about one aspect of the model – the initialisation of the variable D(t), which is initialized as the product of the probability a lineage is sampled upon death, s, and the rate that the lineage dies, d. I don't see why the death rate comes into this – if one state has a particularly low death rate, it seems that we should be likely to see it, rather than less likely – after all the pathogen does not need to die for the sample to be taken.
The authors show that they are able to infer the selection coefficient / fitness cost in a 4genotype scenario, where the fitness cost is 0.5. In the application to actual data, the corresponding parameter is inferred to be in the range 515%, rather than 50%; and many of the inferences include 0 in their 95% confidence interval. As it stands it is hard to have strong confidence in these results, as there are no simulations to show that the model is sufficiently powered and provides reasonably unbiased estimates in this regime. It would be very helpful if the authors could provide a simulation with parameters in the regime appropriate for the real data.
https://doi.org/10.7554/eLife.45562.013Author response
Reviewer #1:
I've been working in the field of phylodynamics for the last several years, and I can say that it's widely acknowledged how difficult it's been to bridge the more traditional craft of demographic inferences of population size and migration to more fundamental evolutionary insights of strain fitnesses. I applaud the efforts of Rasmussen and Stadler to tackle such an important issue in their manuscript "Coupling adaptive molecular evolution to phylodynamics using fitnessdependent birthdeath models". I believe this work represents an innovative modeling avenue to address fitness in phylodynamic inference and will eventually lead to many fruitful biological insights.
I found the model explication clear and well motivated. Although I admit difficulty with the full mathematical details, I could certainly follow the overall approach and was given a good sense of what approximations had been taken and why.
I believe the core modeling advance is completely sound and warrants publication. However, I believe that for eLife in particular, a manuscript should point the way towards biological insight. The application to Ebola in West Africa is not as biologically exciting as other systems; the method appears to work, but there's just little adaptive evolution going on. I believe the manuscript would find more traction with readers if it showed a clear example of elucidating impactful fitness differences in an evolving population.
Thanks for your comments, Trevor. We acknowledge that Ebola virus has had limited time to adapt to the human population, and other rapidly evolving viruses like influenza exhibit more adaptation.
Major suggestions:
1) My central suggestion would be to include a second biological example in which strain variation should be more apparent. Given David Rasmussen's expertise and the known biology of the system, I would suggest H3N2 influenza. Working with influenza phylogenies, there are often clear cut examples of new mutations emerging that rapidly increase in the population (such as HA1 F159Y, https://nextstrain.org/flu/seasonal/h3n2/ha/12y?c=gtHA1_159). If a few of these genotypes are selected (such as those dictated by the Koel 7, see Koel et al., 2013), are there fitness differences visible in the phylogeny?
I'm not suggesting to do any sort of formal prediction from these fitnesses. Just to see if H3N2 phylogenies give reasonable model outputs.
Alternatively, if flu doesn't work as a motivating example, this should be discussed, as it's so obvious. I don't know if the low sampling fraction would pose an issue to the model.
We have now also applied our method to influenza H3N2 as a second motivating example. In particular, we applied our method to look at fitness variation among strains in the Perth 2009 antigenic cluster. This allows for direct comparisons of populationlevel fitness effects estimated using our method with those estimated by Lee et al. (2018) in vitro using deep mutational scanning.
However, we found it very difficult to estimate sitespecific mutational fitness effects for H3N2, even when we restricted our analysis to a small subset of naturally occurring mutations. We think that this is likely due to the fact that many of these mutations occur only once in the phylogeny and in the same genetic background as other mutations, and this shared common ancestry between mutations creates a problem with identifiability similar to the problem of colinearity among highly correlated variables in regressiontype models.
While this is disappointing, we also performed a second analysis where we let the deep mutational scanning data from Lee et al. inform our MFBD about the fitness effects of the naturally occurring mutations. This model fit the H3N2 phylogeny dramatically better than a model where mutations were assumed to be neutral. Moreover, reconstructing the ancestral fitness of lineages in the phylogeny under this model revealed substantial fitness variation among lineages within the Perth 2009 cluster, with most lineages carrying a mildly deleterious mutation load.
Given that previous work on H3N2 has argued for the importance of fitness variation within antigenic clusters in determining which lineages survive (Luska and Lässig, 2014; Koelle and Rasmussen, 2015), we believe this application to H3N2 provides another example of how our model can be applied to gain biological insights into the mechanisms of pathogen adaptation.
Looking at the F159Y mutation would have also been be interesting. But we only considered lineages circulating between 2009 and 2012 in our analysis and all lineages circulating before 2014 have the F allele.
There were a couple issues I had with Ebola analysis. Nothing fundamental, but issues that should be addressed to get at the biology. These were:
2) I remain concerned about the conflation of epidemiological circumstance and evolutionary fitness in these inferences (Bedford and Malik, 2016, Cell). In particular, the GP A82V mutation corresponds cleanly to a transition from early circulation in Guinea to circulation in Sierra Leone. Could this be addressed technically by adding an additional "site" to the model distinguishing between Guinea, Sierra Leone and Liberia?
This is an excellent point. We have now rerun the Ebola analysis with the geographic location as an additional “site” as suggested. However, we found no differences in transmission rates between locations. The A82V and other mutant genotypes are actually estimated to have slightly larger fitness effects when we account for geographic effects due to a lower estimated fitness for the Makona genotype. The rank order of the genotype fitness values is largely conserved. These results are now included in the main text.
3) It seems funny to me to only look the subset of sites investigated by Urbanowicz et al. in the lab. There were other common mutations in GP that weren't looked at by Urbanowicz et al. For example, both R321K and E580G (https://nextstrain.org/ebola?c=gtGP_321,580) are significantly more common than R29K, I371V and K439E (https://nextstrain.org/ebola?c=gtGP_29,371,439). Thus, it's weird to me to present a table that lists fitness impacts of several mutations, when there are other mutations that are largely colinear with the presented mutations, that may have actually been the causal variants.
I have this same issue with genes outside of GP. There are highly successful mutations like NP R111C (https://nextstrain.org/ebola?c=gtNP_111) that are not accounted for. These may be driving the fitness differences observed. For example, L177A visually correlates better with the successful clade than GP E410V, which received a lot of model weight.
While we acknowledge that any of these additional mutations might be driving fitness differences between lineage, we don’t believe including these mutations in our analysis is as interesting as including geographic effects. There will likely always be additional unmodeled sites affecting fitness, but our method is not intended to provide a complete “genome scan” for sites under selection. Rather it is a way to formally test hypotheses about the fitness effects of particular mutations. Thus while our analysis is limited to the mutations studied by Urbanowicz et al., this is the intended use of our method and similar to what we imagine others will use the method for.
Is it possible to put forth a more systematic approach here? I don't think with what's been done that it's possible to conclude much about genetic variants driving transmission differences during the West African epidemic. I worry that handing out an approach as described will lead to people cherry picking what sites they use to define strains in subsequent analyses. There should be some guidance for how to approach this objectively.
We discuss issues with differentiating between the fitness effects of individual mutations versus unmodeled fitness variation in the Discussion. We also warn against over interpreting estimates of mutational fitness effects unless “they occur in multiple genetic backgrounds and confounding sources of fitness variation can be accounted for”. Furthermore, we think that including the additional Ebola analysis with geographic effects provides a good example of how conclusions about the fitness effects of particular mutations can and should be tested by including other potentially confounding sources of fitness variation in the analysis.
However, we agree with the reviewer that there is likely a more general way of accounting for unmodeled fitness effects. We now suggest one potential approach in the Discussion:
“In the future, the MFBD could be extended to account for unmodeled sources of fitness variation. For example, each lineage could be assigned a random fitness effect representing the unmodeled components of fitness variation. These random effects could then be modeled as a continuous trait evolving along lineages, such that more closely related lineages would be expected to have similar fitness due to shared genetic ancestry and overly large changes in fitness between closely related lineages would be penalized. Such a model would then allow us to say whether or not a mutational fitness effect could be equally well explained by random effects arising from unmodeled effects.”
Reviewer #2:
[…]
My first comment is about the accessibility of the paper, more precisely, whether it is selfcontained or not. While some compromise has always to be made between a full, selfcontained presentation and the necessity to have a reasonable length, I think the manuscript should be made more accessible. I had to go to Stadler and Bonhoeffer (2013) to understand clearly the meaning of the main quantities (D_{n,i} and E_{i}); inserting a figure similar to Figure 1 of that paper would be useful. Equations 2 and 3, which are identical to the ones of Stadler and Bonhoeffer (apart from the presence of mutations – γ_{i,j} – instead of lineage – λ_{i,j} – rates), are also better explained in that paper. Furthermore, the presence of the normalization factor including the probability of being observed in Equation 5 is not really explained here. Again, the 2013 paper was helpful to understand the origin of this denominator.
We have tried our best to make the model description as clear, precise and accessible as possible. But we envision our paper having two types of readers: those more interested in the biological applications and those more interested in the mathematical details. For the first type of reader, we have included a highlevel summary of the model in Figure 1. For those interested in the mathematical details, we have chosen to devote more space to describing the new marginal fitnessbirthdeath model in detail than the original multitype model. We agree with the reviewer that a reader interested in the full details of the original multitype model may need to go back and consult Stadler and Bonhoeffer (2013).
We have made adjustments to improve accessibility as the reviewer suggested. Most notably, we have followed the reviewers suggestion and added a panel to Figure 1 illustrating the original multitype birthdeath model (including the D_{n,I} and E_{i} probabilities) and how this differs from the new marginal fitness birthdeath model. We also now describe the normalization factor used in Equation 5 to condition the birthdeath process on leaving at least on sampled descendent.
Secondly, I agree that the set of coupled equations over the D and E variables is intractable, as its solution would require to track an exponentially large (in the genome length) number of quantities. There is no doubt that simplifying approximations are needed, but I am worried that the range of validity of these approximations is not adequately studied here. My concerns are:
1) The dimensionality of the problem is reduced from exponential to linear (in the number of sites) for the D probabilities, while the E probabilities are restricted to a discrete set of points in the fitness space. The corresponding equations, respectively, 19 and 15, are obtained by ad hoc modifications from the original evolution equations, i.e. they are not derived by clear methods that would help the reader understand the magnitude of the terms neglected. It is far from being clear if and when these wild approximations are acceptable. Again, I understand that approximations are needed, I simply would like to see some justification or some conditions under which the approximations would not affect too much the outcome. The necessity of dimensional reduction is not specific to computational evolutionary biology. Chemical master equations are also virtually of infinite dimensions, and cannot be solved without simplifications, in particular finitestate projections, reminiscent of the procedures followed in the present manuscript. The accuracy of these simplifications is a subject of concern and is studied, see for instance Munsky and Kammash (2006, J. Chem. Phys.).
We agree with the reviewer that we gave too little attention to the validity of our approximations under different conditions in the original manuscript. Thus, instead of focusing our analysis of the fourgenotype model on individual simulated phylogenies, we now more broadly explore how well the MFBD performs against the exact multitype birthdeath model under different evolutionary dynamics including over a wide range of mutation rates, selection coefficients and epistatic fitness effects. In particular, Figures 3 and 4 now compare the D and E probabilities along a lineage approximated under the MFBD model with the exact model. This new analysis shows that the MFBD approximates the D and E probabilities well under most conditions and that only under rather extreme conditions (very strong selection or epistatic interactions) do approximations introduce appreciable error. But even in these extreme cases the overall magnitude of error introduced is rather small.
In the revision, we also now give more attention to the approximation we used to track the E probabilities in discretized fitness space. While the reviewer sees our approximation as “wild”, we show that our approximation tracks the E probabilities well for a given lineage (see new Figure 4). In fact, using an even more dramatic approximation used in previous multitype birthdeath methods (Rabosky et al., 2014; BaridoSottani et al., 2018), where transitions between states/fitness classes are ignored altogether along unobserved lineages, still provides accurate results. Thus, while neither approximation tracks the detailed dynamics of how unobserved lineages move through genotype space and thus fitness space, this does not seem necessary in order to capture the overall dynamics of how the E probabilities evolve backwards in time, which is all that is necessary for the MFBD model.
2) Another drastic hypothesis, besides dimensional reduction, is the decorrelation of sites in the fitness function, see Equation 12. Obviously, epistatic effects are present in reality, and it is therefore important to understand how they would bias the estimates of the fitness effects predicted by the algorithm. This point is studied with the simulated 2site model (bottom panels in Figure 1), but I was unable to understand what the conclusion (the obvious fact that errors grow with the strength of selection against the double mutant) entails for more complex cases. What can we expect as the genome size grows, and epistatic effects accumulate? Even in the absence of coupling in the fitness, the quality of the prediction of sitefitness effects seems to deteriorate rapidly with the genome size, from 2 to 10 sites, see top panels in Figure 3. The latter size is comparable to the number (9) of mutations studied for GP, but how much could the results of Table 1 affected by the other, not studied mutations in this Ebola virus protein?
We expect that the betweensite correlations we found for the twosite, fourgenotype model would persist between any pair of sites, regardless of the total genome length, because we are assuming complete linkage between sites (no recombination). Thus, we always expect the correlations to become stronger as the strength of selection or epistatic fitness effects grows, regardless of genome length.
However, we do not believe these correlations cause the accuracy of our estimated sitespecific fitness effects to decline with more sites. A far more likely reason for this decline is that the complexity of the genetic background in which mutations occur increases with more sites. Mutations therefore start to occur in the same genetic background as other mutations, making it difficult to estimate the effects of individual mutations. This leads to more uncertainty and variability in our estimates of sitespecific effects, which explains the loss of accuracy with increasing numbers of sites.
3) Across a large phylogenetic tree, it is expected that the relationship between the fitness and the sequences varies across lineages, e.g. owing to changes in the environmental conditions. I wonder how much such effects could influence the predictions for the fitness effects output by the algorithm. To be more specific, suppose that the fitness f changes by some epsilon (depending on the lineage n), how much would the D and E probabilities change? I expect that this could be studied numerically in the small size models, or even analytically with linearresponse theory. Assessing the robustness of the predictions against slight variations in the sequencetofitness relation is an important issue.
We also expect that the sequencetofitness relationship would vary across the phylogeny due to changes in environmental conditions or other unmodeled fitness effects, such as mutations at other sites. One way of assessing the impact of this background fitness variation on our fitness estimates is to directly incorporate potentially confounding variables into the model. Following the suggestion of reviewer #1, we now take this approach in our Ebola analysis by including the geographic location of lineages in our fitness model.
In the Discussion, we also now suggest a more principled approach to dealing with background variation in fitness due to unmodeled effects: random fitness effects could be assigned to individual lineages, allowing us to say whether a fitness effect ascribed to a particular mutation could equally well be explained by random effects representing unmodeled sources of fitness variation.
We could, as the reviewer suggests, also perform an analysis where we add background “noise” into the mapping between sequences and fitness, but it seems reasonable to expect that the quality of our estimates would depend entirely on the magnitude of this noise.
As a conclusion, this work is interesting and can be appreciated as a valuable effort to solve an important problem in phylogeny/sequences estimation. A substantial number of further investigations is, however, required to assess the validity and robustness of the method, before one can fully trust its predictions.
Reviewer #3:
This is an interesting paper that addresses the problem of jointly modeling mutation, selection, and sampling from a population, where the individuals are related through a phylogenetic tree. The idea is that selection can alter the expected shape of a tree, and therefore the tree shape is informative about both the genotypic state, and the selection coefficient.
The model builds upon an earlier work (Stadler and Bonhoeffer, 2013) in which a model for the evolution of a single site under selection was introduced. The current paper extends this model to multiple sites under selection. The authors achieve this by making additional assumptions, boiling down to assuming no linkage between these sites.
Yes, except to clarify we are actually assuming complete linkage between sites because we do not consider recombination. However, along an individual lineage, we are assuming correlations between the state of different sites can be ignored when calculating genotype probabilities. This is actually quite different from assuming no linkage in a standard population genetics model, where recombination would break up correlations between alleles at different sites at the population level. Our assumption is much more restricted: we are only assuming these correlations can be ignored at the level of an individual lineage.
I am unsure about one aspect of the model – the initialisation of the variable D(t), which is initialized as the product of the probability a lineage is sampled upon death, s, and the rate that the lineage dies, d. I don't see why the death rate comes into this – if one state has a particularly low death rate, it seems that we should be likely to see it, rather than less likely – after all the pathogen does not need to die for the sample to be taken.
In our version of the multitype birthdeath model, sampling is assumed to be coupled to death/removal events. A sampled individual must therefore have died to be sampled, and the D(t) probabilities need to reflect this. However, a lineage can die and not be sampled, and in this case the reviewer’s reasoning is correct: a lineage with a lower death rate will have a higher probability of surviving and therefore of being observed at a future sampling event.
The MTBD can be reformulated such that individuals can be sampled independently of death rates, but we would then need to take into account sampled ancestors where a sampled lineage can survive and produce other sampled lineages after the time of sampling. This can be done (see Kühnert et al., 2016), but we feel that this is beyond the scope of an already rather long paper.
The authors show that they are able to infer the selection coefficient / fitness cost in a 4genotype scenario, where the fitness cost is 0.5. In the application to actual data, the corresponding parameter is inferred to be in the range 515%, rather than 50%; and many of the inferences include 0 in their 95% confidence interval. As it stands it is hard to have strong confidence in these results, as there are no simulations to show that the model is sufficiently powered and provides reasonably unbiased estimates in this regime. It would be very helpful if the authors could provide a simulation with parameters in the regime appropriate for the real data.
We do consider estimating sitespecific fitness effects in the 515% range in Figure 6, where the mutational fitness effects in these simulations were chosen to randomly vary between ~0.4 and ~1.2. While these simulations were not exactly calibrated to the epidemiology of Ebola, they do show that our model can consistently estimate sitespecific fitness effects over a wide range of birth rates, mutation rates, sampling fractions and even mutational fitness effects.
https://doi.org/10.7554/eLife.45562.014Article and author information
Author details
Funding
Seventh Framework Programme (European Research Commission)
 Tanja Stadler
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We would like to thank Denise Kühnert for helpful advice on how to implement the MFBD model in BEAST 2, Katia Koelle for advice on the influenza H3N2 analysis, and Louis du Plessis for permission to reproduce the model schematic shown in Figure 1A.
Senior Editor
 Diethard Tautz, MaxPlanck Institute for Evolutionary Biology, Germany
Reviewing Editor
 Aleksandra M Walczak, École Normale Supérieure, France
Reviewer
 Trevor Bedford, Fred Hutchinson Cancer Research Center, United States
Publication history
 Received: January 27, 2019
 Accepted: July 26, 2019
 Accepted Manuscript published: August 14, 2019 (version 1)
 Accepted Manuscript updated: August 15, 2019 (version 2)
 Version of Record published: August 29, 2019 (version 3)
Copyright
© 2019, Rasmussen and Stadler
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,387
 Page views

 218
 Downloads

 4
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.