Adaptive Evolution: Can we read the future from a tree?
Darwinian evolution is a dynamical principle that connects the past and the future. According to this principle, fitness differences between the individuals in a population are an important driving force of evolution. Biologists have long used fitness effects to explain observed evolutionary changes. For good reasons, however, they have been hesitant to make predictions about the future of a species. Given the bewildering complexity of what is possible in evolution, attempting to say what will happen in a specific instance may appear futile. Moreover, we cannot test any predictions, because we have not seen the evolutionary past and will not see the future.
Recently, however, evolutionary biology has been gaining predictive power in a growing number of systems, including viruses, bacteria and populations of cancer cells. In these systems, high mutation rates make evolution happen in front of our eyes. Every year, for example, the human influenza virus replaces 2% of the amino acids in the protein domains that interact with the immune system of its host. Using modern genome sequencing, we can now monitor the genetic history of entire populations and reconstruct their genealogical trees. Such trees show how the individuals of today's populations are connected to their evolutionary ancestors. Now, in eLife, Richard Neher, Colin Russell and Boris Shraiman investigate how much these trees can tell us about the future of a population (Neher et al., 2014).
Inferring evolutionary patterns from genealogical trees has a long history. Geneticists use probabilistic methods to map mutations onto specific tree branches (Figure 1A). Counting how often these mutations appear in different lineages tells us which fitness effects are predominant in a population (McDonald and Kreitman, 1991; Strelkowa and Lässig, 2012). From the statistics of the genealogical tree itself, epidemiologists infer the growth rate of pathogen populations and use that information to predict the future course of an epidemic (Figure 1B; Stadler, 2010). Neher, Russell and Shraiman (who are at the Max Planck Institute for Developmental Biology, the University of Cambridge, and the University of California at Santa Barbara, respectively) extend this genealogy-based inference to genetic changes within a population (Figure 1C). This requires new ways to extract information from genealogical trees: predictions must now be made for clades of genetically similar individuals, so we need a model that captures growth rate differences between clades within one genealogical tree.
To meet this challenge, Neher and colleagues build on a formalism that is rooted in statistical physics and has become a major new development in population genetics (Tsimring et al., 1996; Rouzine et al., 2003; Desai and Fisher, 2007). The basic idea is simple. Given that fitness differences within a population are carried by genetic mutations, we can imagine splitting each mutation and its fitness effect into ever-smaller pieces. This leads to a model in which the overall fitness variation of a population is made up of many small-effect mutations. By the central limit theorem, the fitness distribution then becomes bell-shaped. Such distributions are called travelling fitness waves (Tsimring et al., 1996). In a given lineage, the accumulation of many small fitness effects follows a diffusive random walk. This picture applies to fast adaptive processes in asexual populations where the expansion of a successful clade is fuelled by multiple beneficial mutations, for example when viruses evolve to escape their hosts' immune defences.
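The bell-shaped fitness distribution can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the model of Neher and colleagues: every individual carries a hypothetical number of independent sites (n_sites), each mutated with probability p_mut and contributing the same small fitness effect. All parameter values and function names are invented for illustration.

```python
import random
import statistics


def simulate_fitness(n_individuals=5000, n_sites=200, effect=0.01,
                     p_mut=0.5, seed=0):
    """Return one fitness value per individual.

    Fitness is the sum of many small, independent mutation effects
    (toy assumption: identical effect size at every site).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    fitness = []
    for _individual in range(n_individuals):
        n_mutations = sum(1 for _site in range(n_sites)
                          if rng.random() < p_mut)
        fitness.append(effect * n_mutations)
    return fitness


def skewness(values):
    """Sample skewness; close to 0 for a symmetric, bell-shaped sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


fitness = simulate_fitness()

# Expected moments of the toy model (binomial number of mutations).
expected_mean = 200 * 0.5 * 0.01
expected_var = 200 * 0.5 * 0.5 * 0.01 ** 2

print("mean:", statistics.fmean(fitness), "expected:", expected_mean)
print("variance:", statistics.pvariance(fitness), "expected:", expected_var)
print("skewness:", skewness(fitness))
```

With many sites of small effect, the sample mean and variance should sit close to the expected values and the skewness close to zero, which is the signature of a roughly normal distribution. Making the effects fewer and larger (small n_sites, large effect) visibly breaks the bell shape.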
Neher and colleagues link their fitness wave model to simpler heuristic measures of growth, which can easily be used to analyse data from a large genealogical tree. Specifically, they look at the local tree ‘volume’ λ(τ), which sums all tree branches in the vicinity of a given node with a discounting scale τ. This quantity provides a (nonlinear) measure of how fast the number of individuals grows around that node. For example, in a subtree growing exponentially with rate r, the volume λ(τ) is simply τ/(1 – τr), provided that τr < 1. By interpreting this growth rate as fitness, Neher and colleagues obtain a measure of fitness differences between clades. A substantial fraction of the local tree volume is generated by small-effect mutations ‘hitch-hiking’ in successful clades (for example, synonymous mutations, which do not change a protein). This explains why the local tree volume is closely related to fitness measures used in previous prediction schemes (Łuksza and Lässig, 2014).
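The closed form for an exponentially growing subtree can be checked numerically. The sketch below assumes, for illustration only, that the local tree volume is the number of lineages n(t) = exp(rt) weighted by an exponential discount exp(-t/τ) and integrated over time t from the node. That kernel is consistent with the formula quoted in the text, but the function names, the integration cutoff and the step count are hypothetical choices, not the authors' implementation.

```python
import math


def discounted_volume(r, tau, t_max=200.0, steps=200_000):
    """Midpoint-rule integral of exp(r*t) * exp(-t/tau) from 0 to t_max.

    Assumes a deterministic subtree with exp(r*t) lineages at time t.
    """
    if tau * r >= 1:
        raise ValueError("the integral diverges unless tau * r < 1")
    dt = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt  # midpoint of the k-th interval
        total += math.exp((r - 1.0 / tau) * t) * dt
    return total


def closed_form(r, tau):
    """Closed form tau / (1 - tau * r), valid for tau * r < 1."""
    return tau / (1.0 - tau * r)


for r in (0.0, 0.25, 0.5):
    print(r, discounted_volume(r, tau=1.0), closed_form(r, tau=1.0))
```

For a fixed τ, the volume increases with the growth rate r and diverges as τr approaches 1. This is why the volume is a nonlinear measure of growth, and why it can still rank clades by how fast they expand.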
The key strength of this method is that it uses only the information contained in a genealogical tree. Thus, it can be applied in cases where we do not know which functions undergo adaptive evolution or where in the genome they are encoded. This feature is also important for interpreting the results: genealogy-based inference reveals growth rate differences within a population sample, but it remains agnostic about their cause. In the fitness wave model, adaptive evolution is that cause, but the demographic structure of a population or variations in sampling density may generate a similar signal in tree data.
Neher and colleagues apply their method to predict the evolution of the human influenza virus A/H3N2. This is a challenging problem: one year in advance, we need to forecast the prevalent clades circulating in a given winter season. Despite the simplicity of their method, Neher and colleagues predict the ancestor sequence of next year's clades with remarkable accuracy for the majority of northern winters between 1995 and 2013.
We do not yet know in detail how the genetic evolution of the influenza virus is related to its interactions with the human immune system. These ‘antigenic’ properties determine how effective influenza vaccines are. They depend on a smaller number of mutations, some of which have individually large effects (Koel et al., 2013). Thus, prediction schemes geared towards antigenic properties must go beyond examining the overall sequence genealogies and weigh mutations by their antigenic effect (Bedford et al., 2014; Łuksza and Lässig, 2014).
Altogether, as Neher and colleagues show, current predictions reach about halfway between random picks and optimal predictions. This poses big conceptual and practical questions: How much can future methods improve on that score? And where does the inherent unpredictability of evolution start? Prediction is the ultimate test of any dynamical principle. Quantitative evolutionary science is being put to that test now.
References
- Rouzine et al. (2003) The solitary wave of asexual evolution. Proceedings of the National Academy of Sciences of USA 100:587–592. https://doi.org/10.1073/pnas.242719299
- Stadler (2010) Sampling-through-time in birth-death trees. Journal of Theoretical Biology 267:396–404. https://doi.org/10.1016/j.jtbi.2010.09.010
- Tsimring et al. (1996) RNA virus evolution via a fitness-space model. Physical Review Letters 76:4440–4443. https://doi.org/10.1103/PhysRevLett.76.4440
Copyright
© 2014, Lässig and Łuksza
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.