Dengue genetic divergence generates withinserotype antigenic variation, but serotypes dominate evolutionary dynamics
Abstract
Dengue virus (DENV) exists as four genetically distinct serotypes, each of which is historically assumed to be antigenically uniform. Recent analyses suggest that antigenic heterogeneity may exist within each serotype, but its source, extent and impact remain unclear. Here, we construct a sequencebased model to directly map antigenic change to underlying genetic divergence. We identify 49 specific substitutions and four colinear substitution clusters that robustly predict dengue antigenic relationships. We report moderate antigenic diversity within each serotype, resulting in genotypespecific patterns of heterotypic crossneutralization. We also quantify the impact of antigenic variation on realworld DENV population dynamics, and find that serotypelevel antigenic fitness is a dominant driver of dengue clade turnover. These results provide a more nuanced understanding of the relationship between dengue genetic and antigenic evolution, and quantify the effect of antigenic fitness on dengue evolutionary dynamics.
https://doi.org/10.7554/eLife.42496.001Introduction
Dengue virus (DENV) is a mosquitoborne flavivirus which consists of four genetically distinct clades, canonically thought of as serotypes (DENV1 – DENV4) (Lanciotti et al., 1997). DENV circulates primarily in South America and Southeast Asia, infecting 400 million people annually. Primary DENV infection is more often mild and is thought to generate lifelong homotypic immunity and temporary heterotypic immunity, which typically wanes over 6 months to 2 years (Sabin, 1952; Reich et al., 2013; Katzelnick et al., 2016). Subsequent heterotypic secondary infection induces broad crossprotection, and symptomatic tertiary and quaternary cases are rare (Gibbons et al., 2007; Olkowski et al., 2013). However, a small subset of secondary infections are enhanced by nonneutralizing, crossreactive antibodies, resulting in severe disease via antibodydependent enhancement (ADE) (Halstead, 1979; Katzelnick et al., 2017; Sangkawibha et al., 1984; Salje et al., 2018). Approximately 1–3% of cases progress to severe dengue hemorrhagic fever, causing $\sim $9000 deaths each year (Bhatt et al., 2013; Stanaway et al., 2016) and relative risk of severe dengue from secondary heterotypic infection relative to primary infection is estimated to be $\sim $24 (Mizumoto et al., 2014). Thus, the antigenic relationships between dengue viruses — describing whether the immune response generated after primary infection results in protection or enhancement of secondary infection — are key drivers of DENV case outcomes and epidemic patterns.
While each serotype is clearly genetically and antigenically distinct, it is not clear how subserotype clades of DENV interact antigenically. Each DENV serotype consists of broad genetic diversity (Figure 1A), including canonical clades termed ‘genotypes’ (RicoHesse, 1990; Twiddy et al., 2003). Specific genotypes have been associated with characteristically mild or severe disease, and heterogeneous neutralization titers suggest that the immune response to some genotypes is more crossprotective than others (Gentry et al., 1982; Russell and Nisalak, 1967). Until recently, it has been assumed that these intraserotype differences are minimally important compared to interserotype differences. However, empirical evidence has demonstrated that these genotypespecific differences can drive case outcomes and epidemic severity (reviewed in Holmes and Twiddy, 2003). For example, analysis of a longitudinal cohort study demonstrated that specific combinations of primary infection serotype and secondary infection genotype can mediate individual case outcomes (OhAinle et al., 2011). On a population scale, the DENV1immune population of Iquitos, Peru, experienced either entirely asymptomatic or very severe epidemic seasons in response to two different genotypes of DENV2 (Kochel et al., 2002).
One explanation for these and similar observations is that overlooked intraserotype antigenic variation contributes to these genotypespecific case outcomes and epidemic patterns. Recent efforts to antigenically characterize diverse DENV viruses suggests that each serotype may contain antigenic heterogeneity, but the source and impact of this heterogeneity is not clear (Katzelnick et al., 2015). Here, we characterize the evolutionary basis for observed antigenic heterogeneity among DENV clades. We also quantify the impact of within and betweenserotype antigenic variation on realworld DENV population dynamics.
Results
Dengue neutralization titer data
Antigenic distance between a pair of viruses $i$ and $j$ is experimentally quantified using neutralization titers, which measure how well serum drawn after infection with virus $j$ is able to neutralize virus $i$ in vitro (Russell and Nisalak, 1967). Throughout the following, we refer to serum raised against virus $j$ as serum $j$ for brevity. To measure the pairwise antigenic distances for a panel of diverse DENV viruses (Figure 1), Katzelnick et al. (2015) infected naive nonhuman primates (NHP) with each virus, drew sera at 3 months postinfection, and then titered this sera against a panel of test viruses. To compare patterns of crossprotection in NHP and humans, they also drew sera from 31 study participants 6 weeks after inoculation with a monovalent component of the NIH dengue vaccine candidate. This sera was also titered against a broad panel of DENV viruses. As originally reported, we find generally consistent patterns of neutralization between the NHP and human sera data; see Katzelnick et al. (2015) for a detailed comparison. In total, our dataset consists of 454 NHP sera titers spanning the breadth of DENV diversity, and 728 human sera titers providing deep coverage of a small subset of viruses.
To normalize these measurements, we take the log${}_{2}$ of each value, such that one antigenic unit corresponds to a twofold drop in neutralization, and we define antigenic distance between autologous serumvirus pairs (i.e. virus $i$ and serum $i$) as zero. Normalized antigenic distance between virus $i$ and serum $j$ is thus calculated as ${D}_{ij}={\mathrm{log}}_{2}({T}_{ii}){\mathrm{log}}_{2}({T}_{ij})$, such that a higher value of ${D}_{ij}$ indicates that serum $j$ is less effective at neutralizing virus $i$, implying greater antigenic distance between viruses $i$ and $j$. For brevity, these normalized titer values are hereafter referred to simply as log${}_{2}$(titers).
The full dataset of standardized titer values is shown in Figure 1B. Here, we see that homotypic virusserum pairs are more closely related antigenically than heterotypic pairs. However, we also observe large variance around this trend, both within and between serotypes. This suggests that treating each serotype as antigenically uniform potentially overlooks important antigenic heterogeneity across viruses within each serotype.
Dengue antigenic evolution corresponds to genetic divergence
Titer measurements are prone to noise, and there is a limited amount of available titer data. If the antigenic heterogeneity observed in the raw data is truly the result of an underlying evolutionary process, we expect that differences in antigenic phenotype correspond to underlying mutations in surface proteins. Dengue has two surface proteins, prM (membrane) and E (envelope). While previous studies have identified epitopes on both prM and E, it is believed that antibodies involved in ADE primarily target prM, while neutralizing antibodies primarily target E (de Alwis et al., 2014). The assay used to generate this titer dataset captures neutralization, but does not capture the effects of ADE; we thus focus our analysis on the E gene.
To fully map the relationship between DENV genetic and antigenic evolution, we adapt a substitutionbased model originally developed for influenza (Neher et al., 2016). Conceptually, this model predicts titer values through three steps. First, we align E gene sequences from titered dengue viruses and catalog the amino acid mutations between each serum strain and test virus strain in our dataset. Next, we infer how much antigenic change is attributable to specific mutations by constructing a parsimonious model that links normalized antigenic distances to observed mutations. This assigns each mutation $m$ an antigenic effect size, ${d}_{m}\ge 0$; forward and reverse mutations are assigned separate values of ${d}_{m}$. With this in hand, we estimate the asymmetrical antigenic distance ${\widehat{D}}_{ij}$ between all pairs of sera and test viruses by summing over ${d}_{m}$ for all mutations observed between the serum and the test virus (Materials and methods, Equation 2).
To learn these values of ${d}_{m}$, we first split our dataset into training (random 90% of measurements) and test data (the remaining 10% of values). We take the training data and fit ${d}_{m}$ for each mutation that is observed two or more times, subject to regularization as follows (also detailed in Materials and methods, Equation 3). Parsimoniously, we expect that antigenic change is more likely to be incurred by a few key mutations than by many mutations; correspondingly, our prior expectation of values of ${d}_{m}$ is exponentially distributed such that most values of ${d}_{m}=0$. This is directly analogous to lasso regression to identify a few parameters with positive weights and set other parameters to 0 (Tibshirani, 1996). Additionally, some viruses have greater binding avidity, and some sera are more potent than others (Figure 1—figure supplement 1); these ‘row’ and ‘column’ effects, respectively, are normally distributed and are taken into account when training the model. The model uses convex optimization to learn the values of ${d}_{m}$ that minimize the sum of squared errors (SSE) between observed and predicted titers in the training data. We thus learn model parameters from the training data, and then use those parameters to predict test data values. We assess model performance by comparing the predicted test titer values to the actual values, aggregated across 100fold Monte Carlo cross validation.
This model formulation is an effective tool for estimating antigenic relationships between viruses based on their genetic sequences. On average across crossvalidation replicates, this model yields a root mean squared error (RMSE) of 0.75 when predicting titers relative to their true value (95% CI 0.74–0.77, RMSE), and explains 78% of the observed variation in neutralization titers overall (95% CI 0.77–0.79, Pearson ${R}^{2}$). This is comparable to the model error from a cartographybased characterization of the same dataset (RMSE 0.65–0.8 log${}_{2}$ titer units) (Katzelnick et al., 2015). Prediction error was comparable between human and nonhuman primate sera, indicating that these genetic determinants of antigenic phenotypes are not host speciesspecific (Figure 2—figure supplement 2).
The 48 strains included in the titer dataset (as serum strains, test virus strains, or both) are 25.7% divergent on average (amino acid differences in E). Pairwise comparisons of all serum strains and test viruses yields 1534 unique mutations that are observed at least twice. Our parsimonious model attributes antigenic change to a total of 49 specific mutations and four colinear mutation clusters (each consisting of 2–6 cooccurring mutations) (Figure 2, Table 1). Each of these mutations confers 0.01–2.11 (median 0.19) log${}_{2}$ titer units of antigenic change; 27 mutations or mutation clusters have ${d}_{m}\ge 0.2$. These mutations span all domains of E, and most occur both between and within serotypes (Figure 2).
Each serotype of dengue contains moderate antigenic heterogeneity
By linking antigenic change to specific mutations, we are able to estimate unmeasured antigenic distances between any pair of viruses in the dataset based on their genetic differences. As an example, we estimated the antigenic distance between serum raised against each monovalent component of the NIH vaccine candidate and all other viruses in the dataset. As shown in Figure 3, vaccineelicited antibodies result in strong homotypic neutralization, but heterotypic crossneutralization varies widely between specific strains. This has important ramifications for vaccine design and trial evaluation.
We also observe antigenic heterogeneity at the genotype level. On average, heterotypic genotypes are separated by 6.9 antigenic mutations (or colinear mutation clusters) and 2.18 log${}_{2}$ titers. Homotypic genotypes are separated by a mean of 1.9 antigenic mutations, conferring a total of 0.30 log${}_{2}$ titers of antigenic distance (Figure 4). Notably, the titer dataset spans the breadth of canonical DENV genotypes, but in most cases lacks the resolution to detect withingenotype antigenic diversity. We thus expect that these results represent a lower bound on the true extent of DENV intraserotype antigenic diversity.
In summary, we have identified a small number of antigenically relevant mutations that explain most of the observed antigenic heterogeneity in dengue, as indicated by neutralization titers. These mutations occur both between and within serotypes, suggesting that dengue antigenic evolution is an ongoing, although gradual, process. This results in strainspecific and genotypelevel antigenic variation, although the scale of this variation is small compared to serotypelevel differences. From this, we conclude that there is antigenic variation within each serotype of DENV, and that this is driven by underlying genetic divergence.
Antigenic novelty predicts serotype success
From the titer model, we find evidence that homotypic genotypes of DENV vary in their ability to escape antibody neutralization. However, antibody neutralization is only one of many factors that shape epidemic patterns. We investigate whether the observed antigenic diversity influences dengue population dynamics in the real world.
The size of the viral population (i.e. prevalence, commonly analyzed using SIR models as reviewed in Lourenço et al., 2018) is determined by many complex factors, and reliable values for population prevalence are largely unavailable. Contrastingly, the composition of the viral population (i.e. the relative frequency of each viral clade currently circulating) can be estimated over time by examining historical sequence data (Lee et al., 2018; Neher et al., 2016), and is primarily driven by viral fitness (Bedford et al., 2011).
In meaningfully antigenically diverse viral populations, antigenic novelty (relative to standing population immunity) contributes to viral fitness: as a given virus $i$ circulates in a population, the proportion of the population that is susceptible to infection with $i$–and other viruses antigenically similar to $i$–decreases over time as more people acquire immunity (Bedford et al., 2012; Luksza and Lässig, 2014). Antigenically novel viruses that are able to escape this population immunity are better able to infect hosts and sustain transmission chains, making them fitter than the previously circulating viruses (Zhang et al., 2005; Bedford et al., 2012; Gupta et al., 1998; Wearing and Rohani, 2006; Lourenço and Recker, 2013). Thus, if antigenic novelty constitutes a fitness advantage for DENV, then we would expect greater antigenic distance from recently circulating viruses to correlate with higher growth rates.
To test this hypothesis, we examine the composition of the dengue virus population in Southeast Asia from 1970 to 2015. We estimate the relative population frequency of each DENV serotype at three month intervals, ${x}_{i}(t)$ (Figure 5A), based on their observed relative abundance in the ‘slice’ of the phylogeny corresponding to each timepoint (N = 8644 viruses; see Materials and methods, Equations 45). While there is insufficient data to directly compare these estimated frequencies to regional case counts, we see good qualitative concordance between frequencies similarly estimated for Thailand and previously reported case counts from Bangkok (Figure 5—figure supplement 1).
Fitter virus clades increase in frequency over time, such that ${x}_{i}(t+dt)>{x}_{i}(t)$. It follows that these clades have a growth rate—defined as the foldchange in frequency over time—greater than one: $\frac{{x}_{i}(t+dt)}{{x}_{i}(t)}>1$. To isolate the extent to which antigenic fitness contributes to clade success and decline, we extend work by Luksza and Lässig (2014) to build a simple model that attempts to predict clade growth rates based on two variables: the antigenic fitness of the clade at time $t$, and a timeinvariantfree parameter representing the intrinsic fitness of the serotype the clade belongs to. We estimate the antigenic fitness of clade $i$ at time $t$ as a function of its antigenic distance from each viral clade $j$ that has circulated in the same population over the previous 2 years, weighted by the relative frequency of $j$ and adjusted for waning population immunity (Figure 5B; Materials and methods, Equation 610). Growth rates are estimated based on a 2year sliding window (Figure 5C).
This simple model explains 54.7% of the observed variation in serotype growth rates, and predicts serotype growth vs. decline correctly for 66.0% of predictions (Figure 5D). This suggests that antigenic fitness is a major driver of serotype population dynamics. This also demonstrates that this model captures key components of dengue population dynamics; examining the formulation of this model in more detail can yield insights into how antigenic relationships influence DENV population composition. The fitness model includes six free parameters that are optimized such that the model most accurately reproduces the observed fluctuations in DENV population composition (minimizing the RMSE of frequency predictions, see Materials and methods). We find that serotype fluctuations are consistent with a model wherein population immunity wanes linearly over time, with the probability of protection dropping by about 63% per year for the first 2 years after primary infection. This model assumes no fundamental difference between homotypic and heterotypic reinfection; rather, homotypic immunity is assumed to wane at the same rate as heterotypic immunity, but starts from a higher baseline of protection based on closer antigenic distances. We also find that these dynamics are best explained by intrinsic fitness that moderately varies by serotype (Table 2); we are not aware of any literature that directly addresses this observation via competition experiments. However, intrinsic fitness alone is unable to predict serotype dynamics (Table 3) and relative strength of antigenic fitness and intrinsic fitness are approximately matched in determining overall serotype fitness.
Antigenic novelty also partially predicts genotype success
To estimate how well antigenic fitness predicts genotype dynamics, we used the same model to predict genotype success and decline. As before, fitness of genotype $i$ is based on the intrinsic fitness of the serotype $i$ belongs to, and the antigenic distance between $i$ and each other genotype, $j$, that has recently circulated (Equation 610). For genotypes, we can calculate antigenic distance between $i$ and $j$ at either the serotype level or the genotype level. In the ‘interserotype model’, we treat each serotype as antigenically uniform, and assign the mean serotypelevel antigenic distances to all pairs of constituent genotypes. In the ‘intergenotype model’, we incorporate the observed withinserotype heterogeneity, and use the mean genotypelevel antigenic distances (as shown in Figure 4). If withinserotype antigenic heterogeneity contributes to genotype fitness, then we would expect estimates of antigenic fitness based on the ‘intergenotype model’ to better predict genotype growth rates.
We find that antigenic fitness contributes to genotype turnover, although it explains less of the observed variation than for serotypes. As for serotypes, intrinsic fitness alone was unable to predict genotype turnover (Table 3). When antigenic distance is estimated from the ‘interserotype model’, we find that our model of antigenic fitness explains approximately 28.6% of the observed variation in genotype growth rates, and correctly predicts genotype growth vs. decline 66.6% of the time (Figure 6C). Perhaps surprisingly, more precise estimates of antigenic distance between genotypes from the ‘intergenotype model’ does not improve our predictions of genotype success (${R}^{2}=0.254$, 61.0% accuracy; Figure 6D, Table 3). This suggests that although we find strong evidence that genotypes vary in their ability to escape neutralizing antibodies, these differences are subtle enough that they do not impact broadscale regional dynamics over time.
Discussion
Withinserotype antigenic heterogeneity
We show that mapping antigenic change to specific mutations and interpolating across the DENV alignment is able to explain a large majority of the observed variation in antigenic phenotypes, as measured by neutralization titers. We identify 49 specific mutations and 4 colinear mutation clusters that contribute to antigenic variation, of which 27 mutations or mutation clusters have an antigenic impact of 0.20 log${}_{2}$ titers or greater. These mutations span all major domains of E, and occur both within and between serotypes. This demonstrates that DENV antigenic divergence is closely coupled to genetic divergence. We use these mutations to infer unmeasured antigenic relationships between viruses, revealing substantial withinserotype antigenic variation. For comparison, we reconstructed the ancestral sequence of each serotype and constrained the model to only permit antigenic change to be attributed to these serotypelevel differences. While this interserotypeonly model predicts titers to a reasonable degree, we find that it has higher error (RMSE = 0.86) than the full model which accounts for withinserotype heterogeneity (RMSE = 0.79; Table 4). This supports and expands upon previous reports (Katzelnick et al., 2015; Forshey et al., 2016; Waggoner et al., 2016) that the null hypothesis of antigenically uniform serotypes is inconsistent with observed patterns of crossprotection and susceptibility.
Consistent with the relatively long timescale of dengue evolution, we observe many sites in the dengue phylogeny to have mutated multiple times. These represent instances of parallelism, reversion and homoplasy. For example, we observe that site 390 is consistently S in DENV1, N in DENV3 and H in DENV4, while DENV2 genotypes show a mixture of D, N and S (Figure 2—figure supplement 1). We estimate an antigenic impact of 0.18 log${}_{2}$ titers of the N390S mutation. Our model predicts that the parallel N390S mutations in DENV1 and DENV2 Cosmopolitan makes these viruses slightly more antigenically similar rather than more antigenically distinct. Along these lines, we compared the ‘substitution’ model to a similar model formulation (termed the ‘tree’ model) which assigns ${d}_{m}$ values to individual branches in the phylogeny, rather than to individual mutations, so that each branch with a positive ${d}_{m}$ value increases antigenic distance between strains (Neher et al., 2016). As expected from the high degree of homoplasy across the dengue phylogeny, we observe that the ‘substitution’ model outperforms the ‘tree’ model in predicting titers in validation datasets (Table 3).
To investigate the impact of this observed variation, we examine patterns of neutralization in response to vaccination with each monovalent component of the NIH vaccine candidate (Figure 3). Here, we see that each monovalent component elicits broad homotypic protection, but levels of heterotypic cross protection vary widely between heterotypic genotypes. This is consistent with previous reports of genotypespecific interactions between standing population immunity and subsequent heterotypic epidemics as modulating epidemic severity (OhAinle et al., 2011; Kochel et al., 2002). We hypothesize that this observed withinserotype variation primarily effects heterotypic secondary infection outcomes, rather than modulating homotypic immunity. Although we note that Juraska et al. (2018) demonstrate that vaccine efficacy decreases with increasing amino acid divergence of breakthrough infections from the vaccine insert.
Overall, we expect that these antigenic phenotypes represent a lower bound on the extent, magnitude, and nature of antigenic heterogeneity with DENV. Our current titer dataset spans the breadth of DENV diversity, but due to small sample size, it lacks the resolution to detect most subgenotype antigenic variation. The appearance of the deep antigenic divergence of the four serotypes, and the more recent antigenic divergences within each serotype, suggest that DENV antigenic evolution is likely an ongoing, although gradual, process. We therefore expect that future studies with richer datasets will find additional antigenic variation within each genotype. This dataset also contains many leftcensored titer values, where we know two viruses are at least $T$ titer units apart, but do not know exactly how far apart. If we knew the true value of these censored titers, many of them would indicate larger antigenic distances than the reported values, $T$, which are used to train the model. Thus, it is likely that our model systematically underestimates the magnitude of titer distances.
Finally, antibody neutralization and escape (as measured by PRNT titers) is only one component of the immune response to DENV. Although analysis of a longitudinal cohort study shows that these neutralization titers correlate with protection from severe secondary infection, it is unclear how PRNT titers correspond to antibodydependent enhancement (Katzelnick et al., 2016). It is also important to note that DENV case outcomes are partially mediated by interactions with innate and Tcell immunity, the effects of which are not captured in neutralization titers (Green et al., 2014). Overall, while richer datasets and the development of more holistic assays will be required in order to fully characterize the extent of DENV antigenic diversity, it is clear that the fourserotype model is insufficient to explain DENV antigenic evolution.
Viral clade dynamics
We use these inferred antigenic relationships to directly quantify the impact of antigenic fitness on DENV population composition. To do so, we measure serotype frequencies across Southeast Asia over time and construct a model to estimate how they will fluctuate (Materials and methods, Equation 616). This model places a fitness value on each serotype that derives from a constant intrinsic component alongside a timedependent antigenic component. Antigenic fitness declines with population immunity, which is accumulated via the recent circulation of antigenically similar viruses. Our primary model parameterization assumes that both heterotypic and homotypic immunity wane linearly over time at the same rate, with homotypic immunity starting from a higher baseline of protection based on closer antigenic distances. We compared this to a secondary model parameterization with only heterotypic waning (see Materials and methods), under which we observe similar model performance (Table 3).
We find that antigenic fitness is able to explain much of the observed variation in serotype growth and decline (Figure 5). Forward simulations under the optimized parameter set display damped oscillations around the serotypespecific ‘set points’ determined by intrinsic fitnesses, but intrinsic fitness alone is unable to explain serotype fluctuations (${R}^{2}=0.04$; Table 3, Figure 5—figure supplement 2). This demonstrates that although intrinsic fitness plays an important role in dictating longterm dynamics, wherein particular serotypes tend to circulate at low frequency (e.g. DENV4) and others at high frequency (e.g. DENV1 and DENV2), antigenic fitness plays out on shorterterm time scales, dictating circulation over several subsequent years.
We similarly use this model to quantify the effect of withinserotype antigenic variation on the success and decline of canonical DENV genotypes (Figure 6). As above, genotype antigenic fitness declines with population immunity. Here, we estimate population immunity based on antigenic distance from recently circulating genotypes, using distances that are either genotypespecific or based only on the serotype that each genotype belongs to. We then directly compare how strongly these coarser serotypelevel versus specific genotypelevel antigenic relationships impact DENV population dynamics. Overall, we find that antigenic fitness explains a moderate portion of the observed variation in genotype growth and decline. Surprisingly, however, we find that incorporating withinserotype antigenic differences does not improve our predictions (Figure 6C–D). We suggest two possible explanations for this observation.
First, it may be that although genotypes are antigenically diverse, these differences do not influence largescale regional dynamics over time. We may then hypothesize that withinserotype antigenic heterogeneity mediates disease severity, but does not influence infection or onward transmission. This hypothesis is consistent with the findings of Nagao and Koelle (2008), who demonstrated that dengue epidemiological dynamics are compatible with a model wherein immunity confers protection against severe symptoms, but not asymptomatic infection. This is also consistent with Ten Bosch et al. (2018)’s findings that asymptomatic dengue infections contribute to onward transmission.
Alternatively, this lack of signal could be methodologically explained if either (A) genotypelevel frequency trajectories estimated from public data are overly noisy for this application or (B) our model of antigenic fitness based on PRNT assay data does not match reality, due to either PRNT assay data not well reflecting human immunity or due to our particular model formulation that parameterizes immunity from titer distances (Equation 610). In the present analysis, we are not able to firmly resolve these disparate possibilities.
These observations are also subject to caveats imposed by the available data and model assumptions. Our estimates of antigenic fitness are informed by the antigenic distances inferred by the substitution model; thus, as above, we are unable to account for nuanced antigenic differences between subgenotype clades of DENV due to limited titer data. We estimate DENV population composition over time based on available sequence data, pooled across all of Southeast Asia (Materials and methods, Equation 4). As the vast majority of cases of DENV are asymptomatic, sequenced viruses likely represent a biased sample of more severe cases from urban centers where patients are more likely to seek and access care. We also assume that Southeast Asia represents a closed viral population with homogeneous mixing. However, increasing globalization likely results in some amount of viral importation that is not accounted for in this model (Allicock et al., 2012). Finally, although Southeast Asia experiences hyperendemic DENV circulation, the majority of DENV transmissions are hyperlocal (Salje et al., 2017), and viral populations across this broad region may not mix homogeneously each season. Thus, it is possible that these subserotype antigenic differences impact finerscale population dynamics, but we lack the requisite data to examine this hypothesis.
Conclusions
We find that withinserotype antigenic evolution helps explain observed patterns of crossneutralization among dengue genotypes. We also find that serotypelevel population immunity is a strong determinant of viral clade dynamics across Southeast Asia. As richer datasets become available, future studies that similarly combine viral genomics, functional antigenic characterization, and population modeling have great potential to improve our understanding of how DENV evolves antigenically and moves through populations.
Model sharing and extensions
We have provided all source code, configuration files and datasets at https://github.com/blab/dengueantigenicdynamics (copy archived at https://github.com/elifesciencespublications/dengueantigenicdynamics), and wholeheartedly encourage other groups to adapt and extend this framework for further investigation of DENV antigenic evolution and population dynamics.
Materials and methods
Data
Titers
Request a detailed protocolAntigenic distance between pairs of viruses $i$ and $j$ is experimentally measured using a neutralization titer, which measures how well serum drawn after infection with virus $i$ is able to neutralize virus $j$ in vitro (Russell and Nisalak, 1967). Briefly, twofold serial dilutions of serum $i$ are incubated with a fixed concentration of virus $j$. Titers represent the lowest serum concentration able to neutralize 50% of virus, and are reported as the inverse dilution. We used two publicly available plaque reduction neutralization titer (PRNT50) datasets generated by Katzelnick et al. (2015). The primary dataset was generated by infecting each of 36 nonhuman primates with a unique strain of DENV. NHP sera was drawn after 12 weeks and titered against the panel of DENV viruses. The secondary dataset was generated by vaccinating 31 human trial participants with a monovalent component of the NIH DENV vaccine. Sera was drawn after 6 weeks and titered against the same panel of DENV viruses. As discussed in Katzelnick et al., these two datasets show similar patterns of antigenic relationships between DENV viruses. In total, our dataset includes 1182 measurements across 48 virus strains: 36 of these were used to generate serum, and 47 were used as test viruses.
Sequences
Request a detailed protocolFor the titer model analysis, we used the full sequence of E (envelope) from the 48 strains in the titer dataset.
For the clade frequencies analysis, we downloaded all dengue genome sequences available from the Los Alamos National Lab Hemorrhagic Fever Virus Database as of March 7, 2018, that contained at least the full coding sequence of E (envelope) (total N = 12,645) (Kuiken et al., 2012). We discarded sequences which were putative recombinants, duplicates, lab strains, or which lacked an annotated sampling location and/or sampling date. We selected all remaining virus strains that were annotated as a Southeast Asian isolate (total N = 8,644).
For both datasets, we used the annotated reference dataset from Pyke et al. (2016) to assign sequences to canonical genotypes.
Alignments and trees
Request a detailed protocolWe used MAFFT v7.305b to align nucleotide E gene sequences for each strain before translating the aligned sequences (no frameshift indels were present) (Katoh and Standley, 2013). All maximum likelihood phylogenies were constructed with IQTREE version 1.6.8 and the GTR + I + G15 nucleotide substitution model (Nguyen et al., 2015).
Titer model
Request a detailed protocolWe compute standardized antigenic distance between virus $i$ and serum $j$ (denoted ${D}_{ij}$) from measured titers ${T}_{ij}$ relative to autologous titers ${T}_{ii}$, such that
We then average normalized titers across individuals. To predict unmeasured titers, we employ the ‘substitution model’ from Neher et al. (2016) and implemented in Nextstrain (Hadfield et al., 2018), which assumes that antigenic evolution is driven by underlying genetic evolution.
In the substitution model, observed titer drops are mapped to mutations between each serum and test virus strain after correcting for overall virus avidity, ${v}_{i}$, and serum potency, ${p}_{j}$ (‘row’ and ‘column’ effects, respectively), so that
where ${d}_{m}$ is the titer drop assigned to each mutation, $m$, between serum $i$ and virus $j$, and $m$ iterates over mutations. We randomly withhold 10% of titer measurements as a test set. We use the remaining 90% of titer measurements as a training set to learn values for virus avidity, serum potency, and mutation effects. As in Neher et al. (2016), we formulate this as a convex optimization problem and solve for these parameter values to minimize the cost function
We used $\lambda =3.0$, $\kappa =0.6$, and $\delta =1.2$ to minimize test error. Respectively, these terms represent the squared training error; an L1 regularization term on mutation effects, such that most values of ${d}_{m}=0$; and L2 regularization terms on virus avidities and serum potencies, such that they are normally distributed. These parameter values are then used to predict the antigenic distance between all pairs of viruses, $i$ and $j$. We assess performance by comparing predicted to known titer values in our test data set, and present test error (aggregated from 100fold Monte Carlo crossvalidation) throughout the manuscript.
Viral clade dynamics
Empirical clade frequencies
Request a detailed protocolAs discussed in Neher et al. (2016) and Lee et al. (2018), we estimate empirical clade frequencies from 1970 to 2015 based on observed relative abundances of each clade in the ‘slice’ of the phylogeny corresponding to each quarterly timepoint.
Briefly, the frequency trajectory of each clade in the phylogeny is modeled according to a Brownian motion diffusion process discretized to 3month intervals. Relative to a simple Brownian motion, the expectation includes an ‘inertia’ term that adds velocity to the diffusion and the variance includes a term $x(1x)$ to scale variance according to frequency following a WrightFisher population genetic process. This results in the diffusion process
with ‘volatility’ parameter ${\sigma}^{2}$ and inertia parameter $\u03f5$. The term $dx$ is the increment in the previous timestep, so that $dx=x(t)x(tdt)$. We used $\u03f5=0.7$ and $\sigma =2.0$ to maximize fit to empirical trajectory behavior.
We also include an Bernoulli observation model for clade presence/absence among sampled viruses at timestep $t$. This observation model follows
where $v\in V$ represents the set of viruses that belong to the clade and $v\notin V$ represents the set of viruses that do not belong to the clade. Each frequency trajectory is estimated by simultaneously maximizing the likelihood of the process model and the likelihood of the observation model via adjusting frequency trajectory $\overrightarrow{x}=({x}_{1},\mathrm{\dots}{x}_{n})$.
Population immunity
Request a detailed protocolFor antigenically diverse pathogens, antigenic novelty represents a fitness advantage (Lipsitch and O'Hagan, 2007). This means that viruses that are antigenically distinct from previously circulating viruses are able to access more susceptible hosts, allowing the antigenically novel lineage to expand. We adapt a simple deterministic model from Luksza and Lässig (2014) to directly quantify dengue antigenic novelty and its impact on viral fitness. We quantify population immunity to virus $i$ at time $t$, ${P}_{i}(t)$, as a function of which clades have recently circulated in the past $N$ years, and how antigenically similar each of these clades is to virus $i$, so that
where ${D}_{ij}$ is the antigenic distance between $i$ and each nonoverlapping clade $j$, $n$ is the number of years since exposure, and ${x}_{j}(tn)$ is the relative frequency of $j$ at year $tn$. Waning immunity is modeled as a nonnegative linear function of time following
The relationship between antigenic distance and the probability of protection, $C$, is also assumed to be nonnegative and linear with slope $\sigma $, such that
In addition to this primary analysis, we conducted a secondary analysis with a different parameterization of immunity that removes waning of homotypic immunity while allowing waning of heterotypic immunity. In this case, we assume the relationship between antigenic distance and the probability of protection, $C$, to be 50% at antigenic distance $1/\sigma $ and to wane based on years since infection $n$ modified by ${\gamma}_{\mathrm{het}}$ following
We model the effects of population immunity, ${P}_{i}(t)$, on viral antigenic fitness, ${f}_{i}(t)$, as
where $\beta $ and ${f}_{0}$ are fit parameters representing the slope of the linear relationship between immunity and fitness, and the intrinsic relative fitness of each serotype, respectively.
Frequency predictions
Request a detailed protocolSimilar to the model implemented in Luksza and Lässig (2014), we estimate predicted clade frequencies at time $t+dt$ as
for shortterm predictions (where $dt<1$ year).
We do not attempt to predict future frequencies for clades with ${x}_{i}(t)<0.05$.
For longterm predictions, we must account for immunity accrued at each intermediate timepoint between $t$ and $dt$. We divide the interval between $t$ and $dt$ into 3 month timepoints, $[t+u,t+2u,\mathrm{\dots},t+U]$, such that $U=dt$. We then compound immunity based on predicted clade frequencies at each intermediate timepoint following
We then calculate clade growth rates, defined as the foldchange in relative clade frequency between time $t$ and time $t+dt$
Null models
Request a detailed protocolTo quantify the impact of antigenic fitness on DENV clade success, we compare our antigenicallyinformed model to two null models.
Under the ‘equal fitness null’ model, all viruses have equal total fitness (antigenic and intrinsic fitness) at all timepoints
Under the ‘intrinsic fitness null’ model, all viruses have equal antigenic fitness but serotypespecific intrinsic fitness at all timepoints
Model performance assessment and parameter fitting
Request a detailed protocolWe assess predictive power as the root mean squared error between predicted and empirical clade frequencies. To assess both the final frequency predictions and the predicted clade trajectories, this RMSE includes error for each clade, for each starting timepoint $t$, and for each intermediate predicted timepoint $t+u$.
Our frequency prediction model has a total of six free parameters. We jointly fit these parameters to minimize RMSE of serotype frequency predictions via the NelderMead algorithm as implemented in SciPy v.1.0.0 (Table 2) (Jones et al., 2001; Gao and Han, 2012). We use $N=2$ years of previous immunity that contribute to antigenic fitness and project $dt=2$ years in the future when predicting clade frequencies.
Simulations
Request a detailed protocolTo ensure the model machinery functions correctly, we seeded a forward simulation of clade dynamics with 2 years of empirical frequencies and simulated predicted dynamics over the remainder of the time course (Figure 5—figure supplement 3). We then fit model parameters as described above, and obtained parameter values that well recover input values (Table 5).
Data and software availability
Request a detailed protocolSequence and titer data, as well as all source code used for analyses and figure generation, is publicly available at https://github.com/blab/dengueantigenicdynamics (copy archived at https://github.com/elifesciencespublications/dengueantigenicdynamics). Our work relies upon many open source Python packages and software tools, including iPython (Perez and Granger, 2007), Matplotlib (Hunter, 2007), Seaborn (Waskom, 2017), Pandas (McKinney, 2010), CVXOPT (Andersen et al., 2013), NumPy (van der Walt et al., 2011; Gao and Han, 2012), Biopython (Cock et al., 2009), SciPy (Jones et al., 2001), Statsmodels (Seabold and Perktold, 2010), Nextstrain (Hadfield et al., 2018), MAFFT (Katoh and Standley, 2013), and IQTREE (Nguyen et al., 2015). Package versions are documented in the GitHub repository.
Data availability
All data, code, model implementations, analyses and figures are available via our online repository at github.com/blab/dengueantigenicdynamics (copy archived athttps://github.com/elifesciencespublications/dengueantigenicdynamics).
References

Phylogeography and population dynamics of dengue viruses in the americasMolecular Biology and Evolution 29:1533–1543.https://doi.org/10.1093/molbev/msr320

SoftwareCVXOPT: a Python package for convex optimizationAbel Ee Ucla Edu/cvxopt.

Strength and tempo of selection revealed in viral gene genealogiesBMC Evolutionary Biology 11:220.https://doi.org/10.1186/1471214811220

Incomplete protection against dengue virus type 2 Reinfection in peruPLOS Neglected Tropical Diseases 10:e0004398.https://doi.org/10.1371/journal.pntd.0004398

Implementing the NelderMead simplex algorithm with adaptive parametersComputational Optimization and Applications 51:259–277.https://doi.org/10.1007/s1058901093293

Identification of distinct antigenic determinants on dengue2 virus using monoclonal antibodiesThe American Journal of Tropical Medicine and Hygiene 31:548–555.https://doi.org/10.4269/ajtmh.1982.31.548

Analysis of repeat hospital admissions for dengue to estimate the frequency of third or fourth dengue infections resulting in admissions and dengue hemorrhagic fever, and serotype sequencesThe American Journal of Tropical Medicine and Hygiene 77:910–913.https://doi.org/10.4269/ajtmh.2007.77.910

Innate immunity to dengue virus infection and subversion of antiviral responsesJournal of Molecular Biology 426:1148–1160.https://doi.org/10.1016/j.jmb.2013.11.023

Nextstrain: realtime tracking of pathogen evolutionBioinformatics 34:4121–4123.https://doi.org/10.1093/bioinformatics/bty407

In vivo enhancement of dengue virus infection in rhesus monkeys by passively transferred antibodyJournal of Infectious Diseases 140:527–533.https://doi.org/10.1093/infdis/140.4.527

The origin, emergence and evolutionary genetics of dengue virusInfection, Genetics and Evolution 3:19–28.https://doi.org/10.1016/S15671348(03)000042

Matplotlib: a 2D graphics environmentComputing in Science & Engineering 9:90–95.https://doi.org/10.1109/MCSE.2007.55

SciPy: Open source scientific tools for PythonSciPy: Open source scientific tools for Python, http://www.scipy.org/.

MAFFT multiple sequence alignment software version 7: improvements in performance and usabilityMolecular Biology and Evolution 30:772–780.https://doi.org/10.1093/molbev/mst010

The LANL hemorrhagic fever virus database, a new platform for analyzing biothreat virusesNucleic Acids Research 40:D587–D592.https://doi.org/10.1093/nar/gkr898

Molecular evolution and phylogeny of dengue4 virusesJournal of General Virology 78:2279–2284.https://doi.org/10.1099/002213177892279

Patterns of antigenic diversity and the mechanisms that maintain themJournal of the Royal Society Interface 4:787–802.https://doi.org/10.1098/rsif.2007.0229

Challenges in dengue research: a computational perspectiveEvolutionary Applications 11:516–533.https://doi.org/10.1111/eva.12554

Natural, persistent oscillations in a spatial multistrain disease system with application to denguePLOS Computational Biology 9:e1003308.https://doi.org/10.1371/journal.pcbi.1003308

ConferenceData structures for statistical computing in PythonProceedings of the 9th Python in Science Conference. pp. 51–56.

On the risk of severe dengue during secondary infection: a systematic review coupled with mathematical modelingJournal of Vector Borne Diseases 51:153.

IQTREE: a fast and effective stochastic algorithm for estimating maximumlikelihood phylogeniesMolecular Biology and Evolution 32:268–274.https://doi.org/10.1093/molbev/msu300

Dynamics of dengue disease severity determined by the interplay between viral genetics and serotypespecific immunityScience Translational Medicine 3:114ra128.https://doi.org/10.1126/scitranslmed.3003084

Reduced risk of disease during postsecondary dengue virus infectionsThe Journal of Infectious Diseases 208:1026–1033.https://doi.org/10.1093/infdis/jit273

IPython: a system for interactive scientific computingComputing in Science & Engineering 9:21–29.https://doi.org/10.1109/MCSE.2007.53

Interactions between serotypes of dengue highlight epidemiological impact of crossimmunityJournal of the Royal Society Interface 10:20130414.https://doi.org/10.1098/rsif.2013.0414

Dengue virus identification by the plaque reduction neutralization testJournal of Immunology 99:291–296.

Research on dengue during world war IIThe American Journal of Tropical Medicine and Hygiene 1:30–50.https://doi.org/10.4269/ajtmh.1952.1.30

Risk factors in dengue shock syndrome: a prospective epidemiologic study in Rayong, Thailand. I. the 1980 outbreakAmerican Journal of Epidemiology 120:653–669.https://doi.org/10.1093/oxfordjournals.aje.a113932

ConferenceStatsmodels: econometric and statistical modeling with Python9th Python in Science Conference.

The global burden of dengue: an analysis from the global burden of disease study 2013The Lancet Infectious Diseases 16:712–723.https://doi.org/10.1016/S14733099(16)000268

Regression shrinkage and selection via the lassoJournal of the Royal Statistical Society: Series B (Methodological) 58:267–288.https://doi.org/10.1111/j.25176161.1996.tb02080.x

Inferring the rate and timescale of dengue virus evolutionMolecular Biology and Evolution 20:122–129.https://doi.org/10.1093/molbev/msg010

The NumPy array: a structure for efficient numerical computationComputing in Science & Engineering 13:22–30.https://doi.org/10.1109/MCSE.2011.37

Homotypic dengue virus reinfections in nicaraguan childrenJournal of Infectious Diseases 214:986–993.https://doi.org/10.1093/infdis/jiw099
Article and author information
Author details
Funding
National Science Foundation (DGE1256082)
 Sidney M Bell
Pew Charitable Trusts
 Trevor Bedford
National Institute of General Medical Sciences (R35GM11977401)
 Trevor Bedford
National Institute of Allergy and Infectious Diseases (R01AI11470301)
 Leah Katzelnick
National Institute of Allergy and Infectious Diseases (P01AI106695)
 Leah Katzelnick
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Richard Neher, John Huddleston, Andrew Rambaut, Molly OhAinle, David Shaw, Paul Edlefsen, Michal Juraska, and all members of the Bedford Lab for useful discussion and advice. SB is a Graduate Research Fellow and is supported by NSF DGE1256082. TB is a Pew Biomedical Scholar and is supported by NIH R35 GM11977401. LK is supported by NIH awards R01AI11470301 and P01AI106695. Our work depends on open data sharing and many open source software tools. We gratefully acknowledge the authors and developers who make our work possible.
Version history
 Received: October 2, 2018
 Accepted: August 5, 2019
 Accepted Manuscript published: August 6, 2019 (version 1)
 Version of Record published: September 6, 2019 (version 2)
 Version of Record updated: April 5, 2023 (version 3)
Copyright
© 2019, Bell et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 3,253
 views

 463
 downloads

 37
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Computational and Systems Biology
 Evolutionary Biology
A comprehensive census of McrBC systems, among the most common forms of prokaryotic Type IV restriction systems, followed by phylogenetic analysis, reveals their enormous abundance in diverse prokaryotes and a plethora of genomic associations. We focus on a previously uncharacterized branch, which we denote coiledcoil nuclease tandems (CoCoNuTs) for their salient features: the presence of extensive coiledcoil structures and tandem nucleases. The CoCoNuTs alone show extraordinary variety, with three distinct types and multiple subtypes. All CoCoNuTs contain domains predicted to interact with translation system components, such as OBfolds resembling the SmpB protein that binds bacterial transfermessenger RNA (tmRNA), YTHlike domains that might recognize methylated tmRNA, tRNA, or rRNA, and RNAbinding Hsp70 chaperone homologs, along with RNases, such as HEPN domains, all suggesting that the CoCoNuTs target RNA. Many CoCoNuTs might additionally target DNA, via McrC nuclease homologs. Additional restriction systems, such as Type I RM, BREX, and Druantia Type III, are frequently encoded in the same predicted superoperons. In many of these superoperons, CoCoNuTs are likely regulated by cyclic nucleotides, possibly, RNA fragments with cyclic termini, that bind associated CARF (CRISPRAssociated Rossmann Fold) domains. We hypothesize that the CoCoNuTs, together with the ancillary restriction factors, employ an echeloned defense strategy analogous to that of Type III CRISPRCas systems, in which an immune response eliminating virus DNA and/or RNA is launched first, but then, if it fails, an abortive infection response leading to PCD/dormancy via host RNA cleavage takes over.

 Evolutionary Biology
 Neuroscience
Neuropeptides are ancient signaling molecules in animals but only few peptide receptors are known outside bilaterians. Cnidarians possess a large number of G proteincoupled receptors (GPCRs) – the most common receptors of bilaterian neuropeptides – but most of these remain orphan with no known ligands. We searched for neuropeptides in the sea anemone Nematostella vectensis and created a library of 64 peptides derived from 33 precursors. In a largescale pharmacological screen with these peptides and 161 N. vectensis GPCRs, we identified 31 receptors specifically activated by 1 to 3 of 14 peptides. Mapping GPCR and neuropeptide expression to singlecell sequencing data revealed how cnidarian tissues are extensively connected by multilayer peptidergic networks. Phylogenetic analysis identified no direct orthology to bilaterian peptidergic systems and supports the independent expansion of neuropeptide signaling in cnidarians from a few ancestral peptidereceptor pairs.