Adaptation dynamics between copy-number and point mutations
Abstract
Together, copy-number and point mutations form the basis for most evolutionary novelty, through the process of gene duplication and divergence. While a plethora of genomic data reveals the long-term fate of diverging coding sequences and their cis-regulatory elements, little is known about the early dynamics around the duplication event itself. In microorganisms, selection for increased gene expression often drives the expansion of gene copy-number mutations, which serves as a crude adaptation, prior to divergence through refining point mutations. Using a simple synthetic genetic reporter system that can distinguish between copy-number and point mutations, we study their early and transient adaptive dynamics in real time in Escherichia coli. We find two qualitatively different routes of adaptation, depending on the level of functional improvement needed. In conditions of high gene expression demand, the two mutation types occur as a combination. However, under low gene expression demand, copy-number and point mutations are mutually exclusive; here, owing to their higher frequency, adaptation is dominated by copy-number mutations, in a process we term amplification hindrance. Ultimately, due to high reversal rates and pleiotropic cost, copy-number mutations may not only serve as a crude and transient adaptation, but also constrain sequence divergence over evolutionary time scales.
Editor's evaluation
This is an important paper that proposes a novel evolutionary mechanism by which copy-number mutations can slow down the accumulation of point mutations in populations evolving in certain environments. The authors use an evolution experiment in bacteria equipped with a clever reporter system to provide convincing evidence that this mechanism indeed operates. This paper will be of broad interest to readers in evolutionary biology and related fields.
https://doi.org/10.7554/eLife.82240.sa0
Introduction
Adaptive evolution proceeds by selection acting on mutations, which are often implicitly equated with point mutations, that is, changes to a single nucleotide in the DNA sequence. However, nature is full of different types of bigger-scale mutations, such as mutations to the copy-number of genomic regions ranging from only a few base pairs up to half a bacterial chromosome (Anderson and Roth, 1977; Darmon and Leach, 2014). The specific properties of mutations, such as their rate of formation and reversal, might influence the evolutionary dynamics in major ways, but are rarely considered.
In bacteria, which are our focus, the duplication of genes or genomic regions occurs orders of magnitude more frequently than point mutations, at rates ranging from 10⁻⁶ up to 10⁻² per cell per generation (Roth, 1988; Drake et al., 1998; Andersson and Hughes, 2009; Elez et al., 2010; Reams and Roth, 2015). Moreover, while duplications can form via different mechanisms, they are all genetically unstable (Andersson and Hughes, 2009): the repeated stretch of DNA sequence is prone to recA-dependent homologous recombination. At rates between 10⁻³ and 10⁻¹ per cell per generation, duplications revert to the single copy (deletion) or duplicate further (amplification) (Roth, 1988; Andersson and Hughes, 2009; Pettersson et al., 2009; Reams and Roth, 2015; Tomanek et al., 2020). Amplification of a gene or genomic region will, to a first approximation, increase its expression by means of elevated gene dosage (Elde et al., 2012; Gruber et al., 2012; Näsvall et al., 2012; Yona et al., 2015; Steinrueck and Guet, 2017; Belikova et al., 2020; Todd and Selmecki, 2020). Given their high rate of formation, it is not surprising that gene amplifications are adaptive in situations where a rapid increase in gene expression is needed: resistance to antibiotics, pesticides or drugs via over-expression of resistance determinants (Prody et al., 1989; Albertson, 2006; Bass and Field, 2011; Nicoloff et al., 2019), immune evasion (Belikova et al., 2020), or novel metabolic capabilities through increased expression of spurious enzymatic side activities (Blount et al., 2020; Richts et al., 2021). Due to their high intrinsic rate of deletion, often combined with a significant fitness cost (Bergthorsson et al., 2007; Pettersson et al., 2009; Reams et al., 2010), copy-number mutations differ from point mutations not only in their frequency of occurrence, but also in their reversibility.
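To make the consequence of these rates concrete, the following is a minimal sketch, assuming a neutral two-state model (single copy vs. duplicated) with no selection, no fitness cost and no higher-order amplification; the function names and the chosen rates are illustrative assumptions, not values from this study. At equilibrium, formation balances reversal, so for a formation rate d and a reversal rate r the duplicated fraction settles at d/(d + r).

```python
def equilibrium_duplication_fraction(dup_rate: float, rev_rate: float) -> float:
    """Steady state of f' = f + d*(1 - f) - r*f, i.e. f* = d / (d + r)."""
    return dup_rate / (dup_rate + rev_rate)


def duplicated_fraction(dup_rate: float, rev_rate: float,
                        generations: int, f0: float = 0.0) -> float:
    """Iterate the neutral formation/reversal recursion once per generation."""
    f = f0
    for _ in range(generations):
        # single-copy cells gain a duplication; duplicated cells revert to one copy
        f = f + dup_rate * (1.0 - f) - rev_rate * f
    return f


# Rates picked from within the ranges quoted above (illustrative values only).
d, r = 1e-4, 1e-2
print(equilibrium_duplication_fraction(d, r))  # about 0.0099
print(duplicated_fraction(d, r, generations=2000))
```

In this toy model, even duplications arising at 10⁻⁴ per generation stay at roughly 1% of a neutral population because reversal is fast, so their persistence at high frequency requires selection.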
Together, copy-number and point mutations are responsible for the evolution of most functional novelty of genes through the process of duplication and divergence of existing genes (Ohno, 1970; Kacser and Beeby, 1984; Conant and Wolfe, 2008; Andersson et al., 2015). Owing to the dynamic nature of gene duplication formation and reversal, the interplay between copy-number and point mutations may lead to complex evolutionary dynamics around the time point of origin of a new gene duplication event. However, so far most attention has been focused on understanding the long-lasting process of how duplicate gene pairs diverge by accumulating point mutations (Lynch and Conery, 2000; Teufel et al., 2015; Friedlander et al., 2017), while we know little about the potentially short-lived initial duplication event itself (Innan and Kondrashov, 2010). On the one hand, this bias is due to significant technical challenges in studying transient copy-number variation experimentally (Andersson and Hughes, 2009; Lauer and Gresham, 2019; Belikova et al., 2020; Tomanek et al., 2020); on the other hand, research has focused on the plethora of long-term evolutionary data that document the sequence divergence of paralogs, as ‘attention is shifted to where the data are’ (Kondrashov, 2012).
In bacteria, adaptive amplification, that is, amplification as a response to selection as opposed to neutral duplication and divergence, is considered the default mode of paralog evolution (Andersson and Hughes, 2009; Treangen and Rocha, 2011; Copley, 2020) and has been conceptualized in the innovation-amplification-divergence (IAD) model (Bergthorsson et al., 2007), which was later validated by evolution experiments (Elde et al., 2012; Näsvall et al., 2012). The IAD model posits that, when an existing enzyme exhibits a low level of a beneficial secondary enzymatic activity (also referred to as a promiscuous function; Aharoni et al., 2005; Tawfik, 2010; Copley, 2017), selection for this novel activity leads to adaptive amplification of the gene, which increases the enzyme's expression. Eventually, protein sequences diverge as point mutations improve the secondary enzymatic function: a new protein function is born from an existing one. Once the new (improved) function is present, superfluous additional gene copies are lost due to their cost and high rate of reversibility, leaving only the two (ancestral and evolved) paralogs (Bergthorsson et al., 2007; Reams et al., 2010; Elde et al., 2012; Näsvall et al., 2012).
Similarly, adaptive amplification can precede the divergence of promoter sequences under selection favouring increased gene expression (Steinrueck and Guet, 2017). Thus, gene amplifications serve as a fast adaptation which can later be replaced by point mutations either within the coding region of a gene, increasing a cryptic enzymatic activity, or in its non-coding promoter region, increasing its expression (Elde et al., 2012; Näsvall et al., 2012; Yona et al., 2015; Steinrueck and Guet, 2017).
Since elevated numbers of gene copies provide an increased target for point mutations (San Millan et al., 2017), it has been suggested that copy-number mutations speed up the process of divergence (Andersson and Hughes, 2009). However, if both copy-number and point mutations are adaptive (Gruber et al., 2012), they also have the potential to interact, either epistatically or through clonal interference (Gerrish and Lenski, 1998). Given the different rates of formation and reversal of the two mutation types, this interaction could result in unexpected evolutionary dynamics.
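The "increased target" argument can be made concrete with a one-line probability. This sketch assumes that each of k identical copies mutates independently at the same per-site rate μ; the value of μ below is hypothetical, and the sketch describes only mutational supply, not the selective fate of the mutation.

```python
import math


def prob_snp_in_any_copy(mu: float, copies: int) -> float:
    """P(at least one of `copies` independent sites acquires the SNP) = 1 - (1 - mu)^k.

    expm1/log1p keep the result accurate when mu is tiny.
    """
    return -math.expm1(copies * math.log1p(-mu))


mu = 1e-9  # hypothetical per-site, per-generation rate for one specific SNP
for k in (1, 2, 4, 8):
    # for small mu the supply of the SNP grows roughly linearly with copy-number
    print(k, prob_snp_in_any_copy(mu, k) / mu)
```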
To fill the knowledge gap that exists at around ‘time zero’ of the duplication-divergence process (Innan and Kondrashov, 2010), we designed a synthetic genetic system with which we can monitor, in real time, arising copy-number and point mutations in evolving populations of Escherichia coli. Importantly, while our results are also relevant to the divergence of paralogous protein sequences, here we study the process of divergence in a model gene promoter. Our genetic reporter system allows us to phenotypically distinguish between copy-number and point mutations, by specifically selecting for the increased expression of an existing but barely expressed gene. With our system at hand, we set out to test whether adaptive copy-number mutations facilitate or hinder adaptation by point mutation.
Results
The motivation for this work was sparked by an evolution experiment conducted in E. coli at a locus exhibiting high rates of gene amplification (Steinrueck and Guet, 2017), which failed to produce any evolved clones with point mutations and thus led us to hypothesize that copy-number mutations may interfere with evolution by point mutations under certain conditions.
An experimental system that distinguishes copy-number and point mutations
To study the interplay between copy-number and point mutations during adaptation, we follow the fate of a barely expressed gene during its evolution towards higher expression. Our experimental system consists of an intact endogenous galK gene of E. coli in which a random sequence (P0) replaces the endogenous promoter. By growing E. coli in the presence of the sugar galactose, we select for increased galK expression. Adaptation to selection for increased expression can happen in two different, non-mutually exclusive ways: through increased copy-number (duplication or amplification) or through point mutations in the P0 promoter region of galK (divergence) (Tomanek et al., 2020).
Importantly, our genetic reporter system allows us to distinguish between the two mutation types. The galK gene is part of a chromosomal reporter gene cassette and is transcriptionally fused to a yfp gene (Figure 1A). Hence, any increase in galK expression – be it by copy-number or point mutations – can be detected as an increase in YFP fluorescence. However, only mutations to the copy-number of the entire galK locus additionally increase the expression of an independently transcribed cfp gene downstream of galK-yfp (Steinrueck and Guet, 2017; Tomanek et al., 2020; Figure 1A, Figure 1—figure supplement 1A–C). Hence, an increase in YFP alone indicates divergence of the galK promoter sequence P0 by point mutations, while an increase in both fluorophores indicates copy-number mutations of the whole locus. Finally, clones with increased YFP but without point mutations in P0 would indicate either a trans-acting mutation at a different chromosomal locus or a rare amplification event that occurs independently of the repeated IS elements and excludes cfp (Steinrueck and Guet, 2017; Tomanek et al., 2020). Moreover, while possible in principle, an adaptive mutation in the coding sequence of galK itself is extremely unlikely to be selected under our experimental conditions, given that growth is limited only by expression of the endogenous and fully functional galactokinase enzyme.
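The inference logic of the two reporters can be written as a small decision rule. This is a sketch only: the function name, the fold-change inputs (relative to the ancestor) and the 1.5-fold cut-off are hypothetical choices for illustration, not values used in the study.

```python
def classify_clone(yfp_fold: float, cfp_fold: float, threshold: float = 1.5) -> str:
    """Map YFP/CFP fold changes (relative to the ancestor) to an inferred mutation type.

    `threshold` is a hypothetical cut-off for calling a fluorophore 'increased'.
    """
    yfp_up = yfp_fold >= threshold
    cfp_up = cfp_fold >= threshold
    if yfp_up and cfp_up:
        # cfp only rises when the copy-number of the whole galK-yfp-cfp locus increases
        return "copy-number"
    if yfp_up:
        # candidate P0 point mutation; sequencing must rule out trans-acting
        # mutations or rare IS-independent yfp-only amplifications
        return "point (verify by sequencing)"
    if cfp_up:
        # not expected: cfp is co-amplified with galK-yfp, never alone
        return "unexpected"
    return "ancestral"


print(classify_clone(3.0, 1.0))
print(classify_clone(2.0, 2.1))
```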
Different substrate levels result in different enzyme expression demands
Our experimental environment consists of liquid minimal medium containing amino acids as a basic carbon and energy source, such that cells can grow even in the absence of galK expression (Figure 1B – grey line). Adding galactose to this basic medium renders galK expression highly beneficial. To characterize the relation between fitness and galK expression, we engineered a construct in which galK expression is induced by the addition of arabinose. Growth rate increased with galK expression and saturated at an expression level that depended on the galactose concentration (Figure 1B). Thus, our system allows us to study adaptation in environments with different gene expression demands: low concentrations of galactose demand a low level of galK expression (and increasing expression above this level provides no extra benefit), while high concentrations of galactose demand a higher level of galK expression to obtain the maximum growth rate. In other words, by growing cells in different galactose concentrations, we can select for different levels of improvement of a biological function (in our case, increased galK expression).
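The idea of a galactose-dependent "demand" can be sketched with a toy piecewise-linear growth model. This is an assumed functional form, not a fit to Figure 1B: growth rises linearly with galK expression until a demand level and is flat above it, and all parameter values (basal growth on amino acids, maximum growth, demand levels, expression units) are hypothetical. The sketch also ignores the low-expression plateau discussed later for the random promoters.

```python
def growth_rate(expression: float, demand: float,
                g_basal: float = 0.3, g_max: float = 1.0) -> float:
    """Toy model: growth rises with galK expression until it reaches `demand`,
    then saturates at `g_max`. `g_basal` stands for growth on amino acids alone."""
    return g_basal + (g_max - g_basal) * min(expression / demand, 1.0)


low_demand, high_demand = 1.0, 4.0  # hypothetical saturation points (arbitrary units)
for expr in (0.5, 1.0, 2.0, 4.0):
    print(expr, growth_rate(expr, low_demand), growth_rate(expr, high_demand))
```

Under low demand, doubling expression from 1 to 2 adds nothing, whereas under high demand each further increase still raises growth.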
Evolution of galK expression in IS+ and IS- strains
Given the vast range of duplication rates observed at different chromosomal loci in bacteria (Roth, 1988; Andersson and Hughes, 2009; Elez et al., 2010; Reams and Roth, 2015), our objective was to experimentally manipulate the ability of galK to form duplications and study the effect on evolutionary dynamics. A common way to manipulate the duplication rate is to delete the recA gene, which is involved in homologous recombination (Goldberg and Mekalanos, 1986; Reams et al., 2010; Dhar et al., 2014). However, given the role of recA in DNA repair, a comparison between wild-type recA and ΔrecA strains would be strongly influenced by the growth defects that such a deletion entails. To avoid pleiotropic effects caused by a difference in the genome-wide duplication rate, we instead compare two strains whose difference in duplication rate is restricted to a single genomic locus. To this end, we take advantage of a chromosomal location characterized by high rates of duplication and amplification due to homologous recombination between two identical endogenous insertion sequence (IS) elements that flank this locus (Steinrueck and Guet, 2017; Tomanek et al., 2020). By deleting one of the two IS1 copies, we generated two otherwise isogenic strains of E. coli that differ solely by the presence of one IS1 element approximately 10 kb downstream of galK (Figure 1C), and that are thus predicted to differ strongly in their rates of duplication formation at this locus. In the following, we refer to these strains as IS+ and IS-.
To understand how the duplication rate affects adaptive dynamics, we conducted an evolution experiment with 96 replicate populations of the IS+ and IS- strains (Figure 1D). Growing these populations in minimal medium containing only amino acids (control) or supplemented with three different galactose concentrations enabled us to follow adaptation to different gene expression demands (levels of selective pressure) (Figure 2A). Daily measurements of population fluorescence prior to dilution (1:820) allowed us to monitor population phenotypes roughly every 10 generations over 12 days.
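The figure of "roughly every 10 generations" follows from the dilution factor. Assuming each daily culture regrows to the same final density, the number of doublings per transfer is log₂ of the dilution factor; the function name below is illustrative.

```python
import math


def generations_per_transfer(dilution_factor: float) -> float:
    """Doublings needed to regrow to the same density after a 1:dilution_factor transfer."""
    return math.log2(dilution_factor)


per_day = generations_per_transfer(820)
print(round(per_day, 2))    # 9.68
print(round(12 * per_day))  # 116 generations over the 12-day experiment
```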
The evolution experiment confirmed that the two strains differ strongly in their rate of copy-number mutations at the galK locus. The strain lacking one of the flanking IS1 elements (IS-) showed a drastically reduced ability to undergo galK amplification: in contrast to the IS+ strain, very few IS- populations evolved increased CFP expression (Figure 2A – red traces). Interestingly, in the IS+ strain, the number of populations amplified by the end of the experiment depended on the environment. At least twice as many populations were amplified in the low (0.01%) galactose environment as in the other two environments (68, 19, and 34 populations for low, intermediate, and high galactose, respectively) (Figure 2—figure supplement 1A). Not only the number of amplified populations, but also the maximum CFP fluorescence attained by IS+ populations differed significantly between the low (0.01%) and the higher (0.1% and 1%) galactose environments (Figure 2—figure supplement 1B). Populations that evolved increased CFP fluorescence did so within 2 days and maintained this level relatively stably for the rest of the experiment. (See Figure 2—figure supplement 2A for an independent evolution experiment confirming the environment-dependent patterns of amplification.) The observed difference in the number of galK copies is consistent with the three environments selecting for different levels of increased gene expression (‘levels of improvement’) (Figure 1B) and confirms that amplifications are an efficient way of tuning gene expression (Tomanek et al., 2020).
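The "at least twice as many" statement can be checked directly from the reported counts of amplified IS+ populations out of 96 per environment. This is plain arithmetic on the numbers in the text, not a statistical test.

```python
amplified = {"low (0.01%)": 68, "intermediate (0.1%)": 19, "high (1%)": 34}
n_populations = 96

for env, n in amplified.items():
    print(f"{env}: {n}/{n_populations} = {n / n_populations:.1%}")

low = amplified["low (0.01%)"]
# fold excess of amplified populations in low galactose over each other environment
fold_vs_others = {env: low / n for env, n in amplified.items() if env != "low (0.01%)"}
print(fold_vs_others)
```

The smallest fold excess is exactly 2.0 (68 vs. 34, high galactose); against intermediate galactose it is about 3.6.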
We then asked whether other differences in the nature of adaptive mutations exist between the three different environments. To get a coarse-grained overview, we plotted the YFP fluorescence of evolving populations as a proxy for galK expression against their CFP fluorescence as a proxy for galK copy-number for all time points (Figure 2B). The YFP-CFP plot shows that evolving populations exhibit qualitatively different distributions of fluorescence levels in the three different environments, indicating that adaptation has followed different trajectories.
In the absence of galactose, populations retain their ancestral fluorescence phenotype. In the lowest galactose concentration (0.01%), data points show a correlated increase in YFP and CFP fluorescence, indicative of gene copy-number mutations (‘YFP+CFP+’ in Figure 2B). In the intermediate galactose concentration (0.1%), 5/96 IS- populations exhibit increased YFP fluorescence with ancestral (single-copy) CFP fluorescence, indicative of promoter mutants (‘YFP+’ fraction in Figure 2B; Figure 2—figure supplement 3A). However, sequencing the P0 region upstream of galK in clones isolated from these populations with strongly increased YFP fluorescence showed that they harboured the ancestral P0 sequence (Figure 2—figure supplement 3A). We hypothesized that the YFP+ populations carried an amplification extending into galK-yfp, yet excluding cfp, which quantitative real-time PCR confirmed (Figure 2—figure supplement 3B). As the IS- strain cannot undergo the frequent duplication via the two flanking IS elements, it cannot access a major adaptive route available to the IS+ strain. Thus, its adaptation follows an alternative trajectory: a repeat-independent, lower-frequency duplication with junctions between yfp and cfp (Figure 2—figure supplement 3C).
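A qPCR copy-number estimate of this kind can be illustrated with the comparative Ct (2^-ΔΔCt) method. The text does not state which quantification scheme was used, so this is an assumed, generic sketch: it presumes perfect amplification efficiency (one doubling per cycle) and a hypothetical single-copy reference locus, and all Ct values are invented for illustration.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_ancestor: float, ct_reference_ancestor: float) -> float:
    """Comparative Ct (2^-ddCt) estimate of target copy-number relative to the ancestor."""
    d_ct_sample = ct_target - ct_reference                    # normalize to the reference locus
    d_ct_ancestor = ct_target_ancestor - ct_reference_ancestor
    dd_ct = d_ct_sample - d_ct_ancestor
    return 2.0 ** (-dd_ct)                                    # assumes one doubling per cycle


# hypothetical evolved clone: a yfp amplicon crosses threshold one cycle earlier than
# in the ancestor, while a cfp amplicon is unchanged
print(relative_copy_number(19.0, 20.0, 20.0, 20.0))  # yfp: about two copies
print(relative_copy_number(20.0, 20.0, 20.0, 20.0))  # cfp: single copy
```

A yfp-only amplification would show this pattern: a ratio near 2 for a yfp amplicon and near 1 for a cfp amplicon.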
While increased CFP still reliably reports increased copy-number, the yfp-only amplification undermines our ability to infer ancestral copy-number unambiguously from ancestral CFP fluorescence alone. As increased CFP itself bears no adaptive benefit, populations with increased CFP must carry amplifications that also include galK. Ancestral CFP fluorescence, in contrast, does not rule out an amplification, so ancestral copy-number can only be confirmed by qPCR. The fact that some populations carry IS-independent yfp-only amplifications implies that our fluorescence reporter system slightly underestimates the number of amplified populations in both the IS+ and IS- strains. However, as we were ultimately interested in the divergence of promoter sequences, from here on we relied on sequencing to determine unambiguously the presence of adaptive promoter mutations.
In the high (1%) and intermediate (0.1%) galactose environments, data points occupy an additional region (‘mixed fraction’ in Figure 2B) between the other two fractions, where both YFP and CFP are increased, but the YFP increase is larger than in the YFP+CFP+ fraction. Based on these population-level data, we hypothesized that this phenotypic region is occupied either by populations of double mutants carrying a combination of point and copy-number mutations, or by populations consisting of cells with only promoter mutations and cells with only copy-number mutations (i.e. the two mutations being mutually exclusive). Knowing the single-cell phenotype is therefore crucial for distinguishing between the two cases. Importantly, single-cell fluorescence measured by flow cytometry (FACS) recapitulated the population measurements, with the YFP-CFP phenotype falling into three distinct fractions (Figure 2C).
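A few lines of arithmetic show why population averages cannot separate the two hypotheses. A bulk measurement is the fraction-weighted mean of its subpopulations, so a 50:50 mixture of single mutants can land exactly where a double mutant sits. The function name and all fold-change values below are hypothetical.

```python
def bulk_fluorescence(subpops):
    """subpops: list of (fraction, yfp_fold, cfp_fold). Returns fraction-weighted means."""
    total = sum(f for f, _, _ in subpops)
    yfp = sum(f * y for f, y, _ in subpops) / total
    cfp = sum(f * c for f, _, c in subpops) / total
    return yfp, cfp


# hypothetical double mutant: amplification plus promoter mutation
double_mutant = bulk_fluorescence([(1.0, 6.0, 2.0)])
# hypothetical mixture: promoter-only cells (YFP up, CFP ancestral) and
# amplification-only cells on the YFP = CFP diagonal
mixture = bulk_fluorescence([(0.5, 9.0, 1.0), (0.5, 3.0, 3.0)])
print(double_mutant, mixture)  # identical bulk signals
```

Only single-cell data distinguish the two: the double mutant forms one cluster, the mixture forms two.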
Copy-number and point mutations occur as a combination in the intermediate and high demand environment
To understand whether copy-number and point mutations are mutually exclusive or occur as a combination in the IS+ strain after evolution in intermediate (0.1%) and high (1%) galactose, we determined the single-cell fluorescence of all mixed-fraction populations using flow cytometry (Figure 3A–B). Notably, after 12 days of evolution, cells with ancestral YFP and CFP fluorescence were still present in every amplified population. While some populations consisted largely of cells with elevated CFP fluorescence, mutants had not reached fixation in any of them, highlighting that our experiments capture transient adaptive dynamics.
Flow cytometry showed that IS+ populations of the mixed fraction from intermediate (0.1%) galactose (Figure 3A) consisted of a single type of mutant with increased YFP/CFP fluorescence relative to the ancestral values (Figure 3C). If instead a population consisted of two mutually exclusive mutants, we would expect cells to fall into two distinct phenotypic clusters, one with only increased YFP (corresponding to the ‘YFP+’ fraction) and one with only amplifications (corresponding to the ‘YFP+CFP+’ fraction). Moreover, the YFP fluorescence of mixed-fraction cells was greater than that of pure amplification mutants, whose fluorescence falls along the diagonal (Figure 2C – right panel), again indicating a combination of copy-number and promoter mutations. To confirm the presence of combination mutants, we randomly picked three populations of the mixed fraction. Sequencing revealed that within these populations, only amplified clones, and not clones with single-copy cfp, harboured a SNP (–30T>A) in P0 (Figure 3E).
Similar to intermediate galactose, IS+ populations from the high (1%) galactose mixed fraction (Figure 3B) harboured cells with the combination mutation phenotype and, in addition, cells with pure amplifications (Figure 3D). Taken together, these data indicate that copy-number and point mutations can occur as a combination in environments with sufficiently high gene expression demand.
Copy-number and point mutations are mutually exclusive in the low demand environment
After finding combination mutants in the intermediate and high galactose environments, we analysed the single-cell fluorescence of all IS+ populations from the low (0.01%) galactose environment. Surprisingly, and in contrast to the intermediate and high galactose environments, adaptive amplification of IS+ populations in low galactose happened more rapidly, with the majority of populations (68/96) showing increased CFP fluorescence during the course of the experiment (Figure 4A – left top and bottom panels, Figure 4—figure supplement 1A–B). Notably, cells of the few populations that did not follow this general trend (Figure 4A – right top and bottom panels) showed an increase in YFP without a concomitant increase in CFP. As this small increase in YFP was not visible in the initial population measurements of liquid cultures (Figure 2B), we turned to patching populations onto LB agar, a potentially more sensitive method that minimizes growth-rate-related differences in fluorescence. Imaging the patches confirmed the increase in YFP for all populations with elevated YFP in single-cell measurements (Figure 4—figure supplement 2A). We first examined population B1, which had clearly increased YFP, more carefully by re-streaking it on LB agar (Figure 4C). Consistent with flow cytometry results (Figure 4B), we found colonies with three different fluorescence phenotypes: ancestral, increased YFP (‘YFP+’), and a small subpopulation with both increased YFP and CFP (amplified). Sequencing of the amplified colony type confirmed a bona fide amplification without additional promoter SNPs. Sequencing of the YFP+ colony type uncovered two adaptive SNPs in P0 (–30T>A and –37C>T), which were identical to a previously identified promoter mutation, ‘H5’ (Figure 2—figure supplement 2B; Steinrueck and Guet, 2017; Tomanek et al., 2020).
As we failed to find combination mutants (i.e. a mixed fraction) in population measurements from the low galactose environment (Figure 2B), we used agar patches from four different time points of the evolution experiment (Figure 4—figure supplement 2A) to screen IS+ populations more comprehensively (Figure 4D). Re-streaking, sequencing and flow cytometry analysis revealed that all populations with elevated YFP and ancestral CFP (Figure 4D – red triangles) harboured either only promoter mutants or a mix of a few amplified cells and a majority of promoter mutants (Table 1). In contrast to high and intermediate galactose, we did not find a single population with combination mutants in low galactose. Moreover, the mutual exclusivity of the two mutation types within populations was also reflected in their fate over time. Quantitative analysis of the fluorescence intensity of patched populations (Figure 4D) confirmed that populations with a significant fraction of promoter mutants (i.e. visibly YFP+ on the agar patch) did not become amplified later in the experiment. The single exception, population F6, gained the YFP+ phenotype early, but became dominated by gene amplifications by the end of the experiment (Figure 4D – right panel, blue triangle). Even in this case, however, copy-number and point mutations did not occur in the same genetic background. Conversely, all YFP+ populations evolved exclusively from populations with the ancestral phenotype; not a single amplified population gained a functional promoter within the time frame of the experiment (Figure 4D).
The complete absence of combination mutants in the low demand environment is consistent with the fact that only a modest increase in galK expression is necessary to reach maximal fitness (Figure 1B). Thus, while a combination of amplification and promoter point mutation evolves in response to selection for a strong increase in galK expression (intermediate and high demand environments), either mutation alone might provide a sufficient increase in gene expression to allow for maximal growth in the low demand environment. This would mean that the fitness benefit of either mutation does not add up when combined. In other words, we hypothesize that negative epistasis precludes the evolution of combination mutants in the low demand environment. Alternatively, the lack of combination mutants could be explained by clonal interference between competing adaptive amplifications and point mutations (a possibility we discuss in the last section of Results).
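The negative-epistasis hypothesis can be illustrated with a toy saturating growth model. This is an assumption loosely mirroring the saturation in Figure 1B, not the measured curve: growth is linear in expression up to a demand level and flat above it, the two mutations multiply expression (hypothetically, a duplication doubles it and the promoter mutation triples it), and epistasis is measured on an additive fitness scale as ε = w_AB - w_A - w_B + w_0. Clonal interference, the alternative explanation, is not modelled here.

```python
def growth_rate(expression, demand, g_basal=0.3, g_max=1.0):
    """Toy saturating growth: linear in expression up to `demand`, flat above it."""
    return g_basal + (g_max - g_basal) * min(expression / demand, 1.0)


def additive_epistasis(demand, base=1.0, dup_effect=2.0, promoter_effect=3.0):
    """epsilon = w_AB - w_A - w_B + w_0, with multiplicative effects on expression."""
    w0 = growth_rate(base, demand)
    w_dup = growth_rate(base * dup_effect, demand)
    w_prom = growth_rate(base * promoter_effect, demand)
    w_both = growth_rate(base * dup_effect * promoter_effect, demand)
    return w_both - w_dup - w_prom + w0


print(additive_epistasis(demand=2.0))  # low demand: negative, either mutation alone saturates growth
print(additive_epistasis(demand=8.0))  # high demand: positive, the combination still pays off
```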
An increased fraction of adaptive promoter mutations is found in IS- populations evolved in the low demand environment
If copy-number mutations are more frequent than point mutations and their combination does not spread to observable frequencies in the low demand environment, we would expect divergence to proceed more slowly as compared to an intermediate or high demand environment.
To test this hypothesis directly, we estimated the level of divergence in all IS+ and IS- populations evolved in the low demand (0.01% galactose) environment. We combined the 96 populations into three pools of 32 and quantified the fraction of SNPs in P0 previously known to be adaptive (Tomanek et al., 2020). To do so, we subjected PCR amplicons of the pooled populations to next-generation sequencing (Figure 5A, Figure 5—figure supplement 1A). We designed our sequencing experiment such that we were able to analyse 39 bp upstream and downstream of the galK start codon. We calculated the fraction of sequence reads carrying either one or both of the most frequently observed adaptive SNPs, at positions –30 and –37 upstream of the galK start codon (Table 1). As a control, we also compared the fraction of SNPs within the galK gene of IS+ and IS- populations evolved under different galactose conditions. In our experimental system, galactose selection is not expected to lead to adaptive mutations anywhere in the coding region of galK, as the enzyme itself is fully functional despite lacking a functional promoter sequence. Using the fraction of reads with SNPs (i.e. the number of reads with a single SNP in galK divided by the number of reads with ancestral galK) allowed us to compare samples with different absolute numbers of sequencing reads (Figure 5—figure supplement 1A). Consistent with our expectation, the fraction of sequencing reads with a single SNP at any position in galK was similar in populations evolved in different galactose concentrations and in the control populations evolved in the absence of galactose (Figure 5A–B).
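The reason this ratio permits comparison across samples is that it does not depend on total sequencing depth. The read counts below are hypothetical, and the sketch assumes that deeper sequencing samples the same library proportionally.

```python
def snp_ratio(snp_reads: int, ancestral_reads: int) -> float:
    """Reads carrying the SNP divided by reads with the ancestral sequence."""
    return snp_reads / ancestral_reads


shallow = snp_ratio(30, 970)   # hypothetical read counts
deep = snp_ratio(90, 2910)     # same library sequenced three times deeper
print(shallow, deep)           # the ratio is unchanged by depth
```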
We then compared the fraction of reads with the two adaptive SNPs in P0 previously known to confer increased galK expression (Figure 5A). While the fraction of reads carrying SNPs in galK was similar in all media, SNPs in P0 were more frequent in media containing galactose than in the control (Figure 5A – left and right panels), in agreement with strains adapting to galactose selection. Intriguingly, in low galactose, the fraction of reads carrying each of the two adaptive SNPs (–30T>A and –37C>T) was higher in IS- populations than in IS+ populations. This is consistent with our hypothesis that the more frequent amplification mutants effectively out-compete point mutants in the low demand environment.
Here we use the fraction of sequencing reads (‘alleles’) with adaptive SNPs, divided by the number of ancestral reads, as a simple metric of divergence. However, this normalization underestimates SNPs that occur in an amplified background. For instance, a cell with four P0-galK copies, one of which carries a SNP, contributes less to this metric than a cell with a single P0-galK copy carrying the same SNP. Our rationale for using the fraction of adaptive alleles as our metric of divergence, rather than the number of SNPs per cell, is twofold. First, the methodology used here does not allow us to compare absolute read counts between samples. Second, and more importantly, due to the random nature of deletion mutations, a single SNP in an amplified array of four copies has only a one-in-four chance of being retained as a lasting divergent copy in the process of amplification and divergence. Hence, the dilution of SNPs by additional amplified copies is not simply a counting artefact, but reflects a biological reality relevant to the very process that we are studying. Therefore, we conclude that in the low demand environment, a strain that cannot adapt by gene amplification exhibits a higher level of divergence than a strain that frequently adapts by gene amplification.
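Both effects in this argument, dilution of the read-based metric and the one-in-four retention chance, can be sketched numerically. The sketch makes several simplifying assumptions: every gene copy is sequenced with equal coverage, each SNP-carrying cell has exactly one mutant copy, and when an array collapses back to a single copy, that copy is chosen uniformly at random. The function names and population numbers are hypothetical.

```python
def snp_ratio_for_population(n_cells: int, copies_per_cell: int, n_snp_cells: int) -> float:
    """SNP-to-ancestral read ratio when each SNP-carrying cell has exactly one mutant
    copy and all other copies are ancestral (equal coverage per copy assumed)."""
    snp_reads = n_snp_cells
    ancestral_reads = n_cells * copies_per_cell - n_snp_cells
    return snp_reads / ancestral_reads


def retention_probability(copies_per_cell: int) -> float:
    """Chance that the mutant copy survives if the array collapses to one random copy."""
    return 1.0 / copies_per_cell


print(snp_ratio_for_population(100, 1, 10))  # 10/90
print(snp_ratio_for_population(100, 4, 10))  # 10/390, diluted by the extra copies
print(retention_probability(4))              # 0.25
```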
Evolutionary dynamics between mutation types differ for different initial random promoter sequences
Given the limited variety of point mutations that we observed during the evolution of the random P0 sequence (either the combination of –30T>A and –37C>T or each SNP alone), we wondered whether a greater variety of mutations could be obtained by starting evolution from a different random promoter sequence. Therefore, we repeated our evolution experiment in the intermediate (0.1%) galactose environment with three additional random promoter sequences (P0-1, P0-2, P0-3).
After 10 days of evolution, only two of the four random P0 sequences had evolved increased galK-yfp expression (Figure 6A). This is roughly consistent with estimates that approximately 60–80% of random sequences are one point mutation away from a functional constitutive promoter (Yona et al., 2018; Lagator et al., 2022). Interestingly, P0-1 and P0-3 did not gain any gene duplications or amplifications. At first glance, this drastic difference in gene amplification was unexpected, since the IS+ strains differ only in their P0 sequence, and not in their gene duplication rate. However, random sequences differ in their ability to recruit RNA polymerase and, as a result, in their baseline expression levels (Yona et al., 2018; Lagator et al., 2022). Given that the relation between expression and growth has a plateau at low expression levels (Figure 1B), the initial expression level conferred by P0-1 and P0-3 might be too low for a gene duplication alone to yield a selective benefit. According to this hypothesis, these random (non-)promoters are not only two (or more) point mutations away from a beneficial sequence, but also two (or more) copy-number mutations away from a beneficial expression level.
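With only four starting sequences, the phrase "roughly consistent" can be quantified with a binomial calculation. As a simplifying assumption, this sketch treats "failed to evolve increased expression within 10 days" as equivalent to "not one point mutation away from a promoter", and uses the 60–80% estimate as the per-sequence probability p.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial P(X <= k) for n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


for p in (0.6, 0.7, 0.8):
    # chance that at most 2 of 4 random sequences are one mutation from a promoter
    print(p, round(prob_at_most(2, 4, p), 4))
```

Under these assumptions, seeing at most two "successes" out of four has a probability of roughly 0.18 to 0.52, so the observation is unremarkable.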
Copy-number and point mutations are mutually exclusive in the intermediate demand environment for P0-2
For P0, the evolution experiment in intermediate galactose reproduced our previous findings, namely a YFP+CFP+ (amplified) and a mixed (amplified with increased YFP) fraction for IS+ populations, and a YFP+ fraction for IS- populations (compare Figure 6A with Figure 2B), which corresponds to an increase in YFP, but not CFP, fluorescence (Table 2).
For P0-2, the evolutionary dynamics differed from those of P0. In the IS+ strain, almost every population evolved amplifications within the first 2 days of the evolution experiment (Figure 6B, Figure 6—figure supplement 1A). Moreover, only two fractions are visible in the YFP-CFP plots of P0-2. The first fraction is occupied by YFP+ populations carrying a single copy of cfp. The second fraction, along the diagonal between YFP and CFP, is occupied by amplified populations (YFP+CFP+). This amplified fraction is shifted towards higher values of YFP/CFP relative to those found for P0 (Figure 6—figure supplement 1B), suggesting that P0-2 exhibits a higher baseline expression level than the other three random promoter sequences. In contrast to the population-level measurements, single-cell measurements were not sufficiently sensitive to detect any difference in leaky expression amongst the four random promoter sequences (Figure 6—figure supplement 1C). However, in line with the observed evolutionary dynamics, P0, and even more so P0-2, confer a significant growth advantage over the other two promoters (Figure 6—figure supplement 1D). Consistent with the hypothesis outlined above, this growth advantage of P0-2 can explain the rapid amplification dynamics of P0-2 populations. As in the evolution experiments with P0, the YFP+CFP+ (amplification) fraction is also strongly reduced in the IS- strain for P0-2.
Intriguingly, while the majority (88/96) of P0-2 IS+ populations became amplified, six of the P0-2 IS+ populations that failed to evolve amplifications showed an increase in YFP/CFP early in the evolution experiment (Figure 6B, left panel; Figure 6—figure supplement 1A). This result, combined with the relatively high baseline expression level of P0-2 and the absence of a mixed fraction for P0-2 (Figure 6A), suggests that increases in gene expression evolve either via gene amplification or via point mutation, but not both. In other words, because initial galK expression is high for P0-2, a small improvement (either an amplification or a promoter mutation) is sufficient to meet the gene expression demand. Thus, the adaptive trajectory of P0-2 in intermediate galactose resembles that of P0 in low galactose, as both environments select only for a modest improvement in galK expression.
In contrast to the IS+ strain, where only six populations showed increased YFP/CFP fluorescence, which emerged only within the first 3 days of evolution, populations of the IS- strain evolved increased YFP/CFP fluorescence throughout the experiment (Figure 6B, right panel). We asked whether the increase in YFP/CFP in both IS+ and IS- populations was due to promoter mutations. Sequencing of randomly picked evolved clones revealed that the majority (4/6 for IS+, 11/21 for IS-) of clones with increased YFP/CFP indeed harboured a mutation in P0-2, including an SNP, a 12 bp deletion and a 13 bp deletion (Table 2; Figure 6C). Importantly, colonies from the same populations but with ancestral fluorescence harboured ancestral P0-2 sequences (Table 1), indicating that the observed mutations (Table 2) are causal for the increased YFP expression. While identifying the causal mutations for the remaining evolved clones with increased YFP but ancestral P0-2 (Figure 6C) lies outside the scope of the current work, we speculate that they may occur further upstream of P0-2 or act in trans, such as mutations in the transcription termination factor Rho (Steinrueck and Guet, 2017).
To confirm that the 12 bp deletion, the 13 bp deletion and the SNP were in fact adaptive, we reconstituted these mutations in the ancestral P0-2 strain, where they conferred increased YFP expression (Figure 6D), resulting in increased growth in medium supplemented with galactose (Figure 6E). That the promoter mutations were responsible for increased galK-yfp expression was further corroborated by the fact that these mutations occurred exclusively in populations with increased YFP but ancestral CFP, and were completely absent from amplified (YFP+CFP+) and ancestral colonies from a random set of 14 IS+ populations (Figure 6C). Notably, the mutations observed in P0-2 were more diverse than those observed in P0 (seven different mutations, including indels, an IS insertion and an SNP, in P0-2 versus three different SNPs in P0; compare Tables 1 and 2). Thus, amplification can interfere not only with divergence by point mutations, but also with divergence by small insertions and deletions.
Taken together, the facts that (i) the majority of IS+ populations became rapidly amplified, (ii) the few promoter mutations in IS+ populations arose exclusively in the first day and only in non-amplified populations (that is, the two mutation types were mutually exclusive), and (iii) many more promoter mutations occurred in IS- populations throughout the evolution experiment strongly suggest that negative epistasis between frequent copy-number mutations and point mutations hinders the fixation of the latter.
Amplification hinders divergence by point mutations in the low demand environment
The experimental results presented so far suggest that the evolutionary dynamics of duplication/amplification and divergence depend on the level of gene expression increase selected for (Figure 7). In both environments, promoter point mutations evolve at a low rate in a single-copy background. However, if rates of copy-number mutation are high, evolutionary dynamics are dominated by amplification. Irrespective of the environment, this amplification increases the mutational target size for rarer adaptive point mutations. However, only if a strong increase in galK expression is selected for (high demand environment) do the beneficial effects of both types of mutation add up, such that we observe combinations of amplifications and point mutations, in agreement with the IAD model (Bergthorsson et al., 2007; Näsvall et al., 2012; Andersson et al., 2015; Figure 7A).
The IAD model assumes that amplifications and point mutations occur in the same genetic background. However, whether the two types of mutation fix consecutively in the same genetic background or in different competing clones depends on the effective population size and the respective mutation rates (Gerrish and Lenski, 1998). High rates of duplication and amplification may cause clonal interference between competing mutants, slowing down the fixation of either. Moreover, there needs to be sufficient selective benefit (‘demand’) for two consecutive selective sweeps to occur. If, however, only a modest increase in gene expression is selected for (low demand environment) (Figure 1B), a single mutational event may be sufficient to provide it. Adaptation is then dominated by the more frequent type of mutation, namely copy-number mutation. In other words, amplifications effectively hinder divergence in the low demand environment due to their negative epistatic interaction with point mutations. Thus, in a process we term amplification hindrance, the high rate of amplification results in evolutionary dynamics that slow down divergence via two non-mutually exclusive mechanisms: clonal interference and negative epistasis.
However, in our experiments, mutation rates can be assumed to be equal across environments. Moreover, in the absence of galK expression (i.e. for the ancestral strain), population sizes are similar across galactose concentrations (Figure 7—figure supplement 1A). Hence, clonal interference is an unlikely explanation for the absence of combination mutants in the low galactose environment. There is, however, a difference in the degree to which strains harbouring amplifications meet the gene expression demand posed by the environment they evolved in. Strains with amplifications that evolved in the high and intermediate galactose environments grow more slowly and to lower densities than a strain with a strong constitutive promoter. In contrast, strains with amplifications that evolved in the low galactose environment exhibit growth rates and yields comparable to those of the promoter mutant strain (Figure 7—figure supplement 1B–D).
These results suggest that gaining additional promoter point mutations on top of an amplification would only be beneficial in the higher galactose concentrations, but yield little or no fitness benefit in the low galactose environment. Therefore, under the experimental conditions presented here, gene expression demand – and hence negative epistasis – plays a major role in amplification hindrance.
Discussion
In this study, we investigated the interaction dynamics between two different types of mutations, adaptive copy-number and point mutations. While the process of gene duplication and divergence per se has been intensely studied since the pioneering work of Ohno more than half a century ago, no experiments have scrutinized the early phase of this process, where transient evolutionary changes may prevail. So far, the few existing experimental studies simply introduced mutations a priori without studying their formation dynamics (Dhar et al., 2014), while in silico studies used genomics to query the ‘archaeological’ results of millions of years of sequence evolution (Innan and Kondrashov, 2010).
Here, we used experimental evolution to investigate how the early adaptive dynamics of diverging promoter sequences are influenced by the rate of copy-number mutations as well as by the level of expression increase selected for. We found that the spectrum of adaptive mutations differed drastically between environments selecting for different levels of expression of the same gene (Figures 1B, 3A and 6A). Combination mutants carrying both copy-number and promoter point mutations only evolved under conditions selecting for large increases in galK expression. In contrast, selection for only a modest increase in galK expression led to populations adapting by either gene amplifications or point mutations in their random promoter sequence, but not both simultaneously. Moreover, if amplification occurred early in the experiment, the random promoter sequence P0 did not diverge within the timespan of the experiment (Figure 4D). This phenomenon was even more pronounced for a second random promoter sequence, P0-2 (Figure 6B–C).
Moreover, comparing the number of point mutations between strains that differ solely in their rate of copy-number mutation at the galK locus, we found that in the low demand environment, a strain with a high duplication rate (IS+) diverged more slowly than a strain with a low duplication rate (IS-).
Taken together, our results suggest that frequent gene amplification hinders the fixation of adaptive point mutations, most likely due to negative epistasis between these two mutation types. While epistatic interactions can occur between any two adaptive mutations, copy-number mutations are unique in that they are orders of magnitude more frequent than point mutations, both in bacteria (Roth, 1988; Drake et al., 1998; Andersson and Hughes, 2009; Elez et al., 2010; Reams and Roth, 2015) and in eukaryotes (Lynch et al., 2008; Lipinski et al., 2011; Schrider et al., 2013; Keith et al., 2016). This large difference in rates means that a competition between point and copy-number mutations is heavily skewed in favour of the latter (Figure 7B).
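The skew can be quantified under a deliberately simple origin-fixation model. The sketch below assumes strong negative epistasis, as in the low demand environment, so that whichever adaptive mutation establishes first wins, and it ignores clonal interference. New mutants of each type arise at a rate proportional to the population size times the mutation rate and escape drift with probability of roughly twice their selection coefficient (Haldane's approximation). The mutation rates and selection coefficients are hypothetical placeholders, not measured values.

```python
def prob_first_established(mu_a, s_a, mu_b, s_b):
    """Probability that the first beneficial mutation to escape drift is of type a.

    Origin-fixation approximation: type-i mutants arise at a rate
    proportional to N * mu_i and survive drift with probability ~2 * s_i
    (valid for small s). The population size N cancels out of the ratio.
    """
    w_a = mu_a * 2 * s_a
    w_b = mu_b * 2 * s_b
    return w_a / (w_a + w_b)

# Hypothetical values: copy-number mutations 1000-fold more frequent,
# point mutations twice as beneficial.
p_cn = prob_first_established(mu_a=1e-5, s_a=0.05, mu_b=1e-8, s_b=0.10)
print(f"P(first established mutation is copy-number) = {p_cn:.3f}")
```

Even when the point mutation confers twice the benefit, a thousandfold rate advantage leaves the copy-number mutation winning the race in nearly every population under these assumptions.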
Unlike clonal interference (which occurs between any two beneficial mutations, even if their adaptive benefits are additive) (Gerrish and Lenski, 1998), negative epistasis does not slow down adaptation per se, as adaptation is agnostic to whether point or copy-number mutations lead to an improved phenotype. However, negative epistasis slows down divergence, because populations have already reached the fitness peak via an alternative type of adaptive mutation. Negative epistasis between point and copy-number mutations can be expected in any selective condition that requires only a relatively modest increase in a particular biological function, such as an increase in gene expression or enzyme activity by only a few-fold. Thus, amplification hindrance may be of general relevance not only for the evolution of gene expression in bacteria, but also for the evolution of promiscuous enzyme functions, which, analogous to a barely expressed gene, can be enhanced either by copy-number mutations or by point mutations in the coding sequence.
While we found that amplification slows down divergence under conditions of negative epistasis, the consensus in the literature has been that copy-number mutations not only serve as a first step in the ‘relay race of adaptation’ (Yona et al., 2015), but also facilitate divergence, either indirectly, by providing a first ‘crude’ adaptation to a new environment until more refined adaptation occurs by point mutations, or directly, by increasing the target size for point mutations (Andersson and Hughes, 2009; Elde et al., 2012; Yona et al., 2015; Cone et al., 2017; Bayer et al., 2018; Lauer et al., 2018; Todd and Selmecki, 2020). The intuitive idea that amplification speeds up divergence (Andersson et al., 1998) was originally put forward as strong evidence against the adaptive mutagenesis hypothesis proposed by Cairns and others (Cairns et al., 1988; Cairns and Foster, 1991).
Building on this idea, various experimental studies interpreted observations of adaptation to dosage selection in the light of ‘amplification as a facilitator of divergence’ (Song et al., 2009; Pränting and Andersson, 2011; Elde et al., 2012; Näsvall et al., 2012; Yona et al., 2012; Yona et al., 2015; Cone et al., 2017; Bayer et al., 2018; Lauer et al., 2018; Todd and Selmecki, 2020). However, while showing that adaptive amplification precedes divergence by point mutations, none of these studies directly tested the hypothesis that amplification causes increased rates of divergence. Experiments controlling for the rate of amplification were needed to dissect the ensuing evolutionary dynamics and establish causality.
All else being equal, more copies indeed mean more DNA targets for point mutations (San Millan et al., 2017). However, as our experiments show, all else is not necessarily equal, and the evolutionary dynamics may differ strongly between an organism that can increase copy-number as an adaptation and an organism that cannot. Intriguingly, indications of more complex dynamics can be found in the existing literature (Yona et al., 2012; Lauer et al., 2018; Richts et al., 2021). One study showed that rapid adaptive gene amplification in yeast results in strong clonal interference between lineages (Lauer et al., 2018). A second study in yeast found that adaptation to an abrupt increase in temperature was dominated by rapid copy-number mutation, with SNPs occurring only much later (Yona et al., 2012; Yona et al., 2015). Lastly, in an experimental evolution study in Bacillus, adaptation was dominated by copy-number mutations, and the authors noted a surprising lack of promoter mutations (Richts et al., 2021).
The transient dynamics of gene amplification allow tuning of gene expression on short evolutionary time scales in the absence of an evolved promoter (Tomanek et al., 2020). In principle, such transient evolutionary dynamics do not leave traces in the record of genomic sequence data on evolutionary time scales, and as such, their detailed study may not seem warranted. This is especially true in the context of duplication and divergence of paralogs, which is studied precisely because abundant genomic sequence data are available (Kondrashov, 2012). Our present study challenges this intuition, as we uncovered a potentially long-lasting effect resulting from the transient dynamics associated with copy-number mutations: if adaptation by amplification is both the fastest route and sufficient, other, less frequent mutations may not have a chance to compete. While our evolution experiments were conducted under continuous selection, natural environments are often characterized by fluctuating selection. Due to the pleiotropic cost often associated with copy-number increases, as well as their high rate of deletion, adaptive amplifications return to the ancestral single-copy state in the absence of selection (Andersson and Hughes, 2009; Reams et al., 2010). This means that once the selective benefit of the transient adaptation ceases, no change at the level of genomic DNA remains (Roth, 1996). Therefore, the idea that gene amplifications act as a transient ‘regulatory state’ rather than a mutation (Roth, 1996; Tomanek et al., 2020) can be extended by an implication found here, namely that amplifications could effectively act as a buffer against long-lasting point mutations. In this view, amplification could repeatedly provide rapid adaptation to selection for increased gene expression, collapse back to the single-copy ancestral state once selection has subsided, and yet hinder sequence divergence each time it does so.
Thus, on sufficiently long time scales, the transient dynamics that play out before the fixation of mutations may ultimately shape entire genomes (Cvijović et al., 2018).
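The collapse of an amplification once selection subsides can be sketched with a simple deterministic model. The fitness cost and deletion rate below are hypothetical, and the model ignores new amplification events during the relaxed phase; it only illustrates how a pleiotropic cost and frequent deletion together return a population to the single-copy state, leaving no trace at the sequence level.

```python
def amplified_fraction(f0, cost, deletion_rate, generations):
    """Fraction of amplified cells over time once selection is relaxed.

    Per generation, amplified cells grow with relative fitness (1 - cost)
    and a fraction deletion_rate of their offspring collapse to one copy.
    Re-amplification of single-copy cells is ignored (assumption).
    """
    f = f0
    trajectory = [f]
    for _ in range(generations):
        amplified = f * (1 - cost)           # selection against the cost
        mean_fitness = amplified + (1 - f)   # single-copy cells have fitness 1
        f = amplified * (1 - deletion_rate) / mean_fitness
        trajectory.append(f)
    return trajectory

# Hypothetical parameters: 2% fitness cost, 1% deletion rate per generation.
traj = amplified_fraction(f0=1.0, cost=0.02, deletion_rate=0.01, generations=500)
print(f"amplified fraction after 500 generations: {traj[-1]:.4f}")
```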
Amplification hindrance is in agreement with the observation that gene duplication and divergence is not a dominant force in the expansion of protein families in bacteria (Treangen and Rocha, 2011; Tria and Martin, 2021). Consequently, in all situations where rapid amplification provides sufficient adaptation, amplification hindrance could act as a mutational force that, in addition to purifying selection, conserves existing genes and their expression levels. While purifying selection acts only on deleterious alleles, amplification hindrance, counterintuitively, prevents beneficial mutations from fixing.
Methods
Bacterial strain construction
To construct the IS- strain, we replaced the second copy of IS1 downstream of the selection and reporter cassette in IT030 (Tomanek et al., 2020) with a kanamycin cassette using pSIM6-mediated recombineering (Datta et al., 2006). Recombinants were selected on 25 µg/ml kanamycin to ensure single-copy integration.
To generate the additional random promoter sequences P0-1, P0-2, and P0-3, we generated three random 189-nucleotide sequences using the ‘Random DNA sequence generator’ (https://faculty.ucr.edu/~mmaduro/random.htm) with the same GC content as P0 (55%). We synthesized these three sequences as gBlocks (Integrated DNA Technologies, BVBA, Leuven, Belgium) with flanking XmaI and XhoI restriction sites, which we used to clone P0-1, P0-2, and P0-3 into plasmid pMS6* (Tomanek et al., 2020), replacing P0. We used pMS6* carrying the respective P0 sequence as a template to amplify the selection and reporter cassette and integrate it into MS022 (IS+) and IT049 (IS-) as described previously (Tomanek et al., 2020).
>P0
ACCGGAAAGACGGGCTTCAAAGCAACCTGACCACGGTTGCGCGTCCGTATCAAGATCCTCTTAATAAGCCCCCGTCACTGTTGGTTGTAGAGCCCAGGACGGGTTGGCCAGATGTGCGACTATATCGCTTAGTGGCTCTTGGGCCGCGGTGCGTTACCTTGCAGGAATTGAGGCCGTCCGTTAATTTCC.
>P0_1
GTAGGCCCGCACGCAAGACAAACTGCTGGGGAACCGCGTTTCCACGACCGGTGCACGATTTAACTTCGCCGACGTGACGACATTCCAGGCAGTGCCTCCGCCGCCGGACCCCCCTCGTGATCGGGTAGCTGGGCATGCCCTTGTGAGATATAACGAGAGCCTGCCTGTCTAATGATCTCACGGCGAAAG.
>P0_2
TCGGGGGGACAGCAGCGGCTGCAGACATTATACCGCAACAACACCAAGGTGAGATAACTCCGTAGTTGACTACGCGTCCCTCTAGGCCTTACTTGACCGGATACAGTGTCTTTGACACGTTTGTGGGCTACAGCAATCACATCCAAGGCTGGCTATGCACGAAGCAACTCTTGGGTGTTAGAATGTTGA.
>P0_3
CCCCTGTATTTGGGATGCGGGTAGTAGATGAGCGCAGGGACTCCGAGGTCAAGTACACCACCCTCTCGTAGGGGGCGTTCCAGATCACGTTACCACCATACCATTCGAGCATGGCACCATCTCCGCTGTGCCCATCCTGGTAGTCATCATCCCTATCACGCTTTCGAGTGTCTGGTGGCGGATATCCCC.
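The random sequences above were generated with a web tool whose exact algorithm is not described here. As a hedged stand-in, the sketch below shows one way to draw a random 189 nt sequence with a fixed GC content of 55% in Python: it fixes the base composition (splitting G/C and A/T evenly, which is an assumption) and then shuffles it, so the realized GC content is exactly round(0.55 × 189)/189. The function names are hypothetical.

```python
import random

def random_dna(length, gc=0.55, seed=None):
    """Random DNA sequence with a fixed base composition.

    The composition is set first and then shuffled, so the realized
    GC fraction is round(gc * length) / length.
    """
    rng = random.Random(seed)
    n_gc = round(gc * length)
    n_at = length - n_gc
    bases = (["G"] * (n_gc // 2) + ["C"] * (n_gc - n_gc // 2)
             + ["A"] * (n_at // 2) + ["T"] * (n_at - n_at // 2))
    rng.shuffle(bases)
    return "".join(bases)

def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

candidate = random_dna(189, gc=0.55, seed=1)
print(len(candidate), round(gc_content(candidate), 3))
```

Fixing the composition rather than drawing each base independently guarantees the target GC content; drawing bases independently would instead give a GC content that fluctuates around 55% from sequence to sequence.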
Reconstitution of P0-2 mutants in the ancestral strain
The reconstituted P0-2 mutant strains were obtained by pSIM6-mediated oligo recombineering (Sawitzke et al., 2011) of the ancestral strain, selecting recombinants on M9 0.1% galactose agar. The sequences of the oligonucleotides used are listed below. Successful recombinants were confirmed by Sanger sequencing of P0-2. Amongst the recombinants transformed with the –122_–134del construct, we also recovered one colony with higher YFP fluorescence intensity than the other recombinants. Sequencing revealed a single-nucleotide deletion (–118del) in addition to the –122_–134del created by recombineering. Fluorescence and growth rate of this serendipitously obtained mutant are shown in Figure 6D–E, along with those of the three intended mutants.
>A11 oligo (–131_–144del)
ACCGCAACAACACCAAGGTGAGATAACTCCGTAGTTGACTGGCCTTACTTGACCGGATACAGTGTCTTTGACACGTTTGTGGG.
>H12 oligo (–100C>T)
CTAGGCCTTACTTGACCGGATACAGTGTCTTTGATACGTTTGTGGGCTACAGCAATCACATCCAAGGCTG.
>F2 oligo (–122_–134del)
CAACACCAAGGTGAGATAACTCCGTAGTTGACTACGCGTCCCTTGACCGGATACAGTGTCTTTGACACGTTTGTGGGCTACAGCA.
List of strains used
Strain name | Genotype | Purpose | Source |
---|---|---|---|
MG1655 | F- λ- ilvG- rfb-50 rph-1 | Strain background for all evolution experiments | Lab collection |
IT013-TCD | BW27784, JA23100::galP, mglBAC::FRT, galK::FRT, locus1::pBAD-galK | Strain with pBAD-galK for testing expression-growth relation | Tomanek et al., 2020 |
BW25142 | lacIq rrnB3 ∆lacZ4787 hsdR514 ∆(araBAD)567 ∆(rhaBAD)568 ∆phoBR580 rph-1 galU95 ∆endA9 uidA(∆MluI)::pir-116 recA1 | Host for pir plasmid pMS6* | Khlebnikov et al., 2001 |
MS022 | MG1655, JA23100::galP, mglBAC::FRT, galK::FRT | IS+ background for ancestor strain construction | Lab collection |
IT030 | MS022 locus2::P0-RBS-galK -RBS-yfp -FRT-pR-cfp | IS+ ancestor strain | Tomanek et al., 2020 |
IT049 | MS022 deleted for IS1C | IS- background for ancestor strain construction | This study |
IT049-P0 | IT049 locus2::P0-RBS-galK -RBS-yfp -FRT-pR-cfp | IS- ancestor strain P0 | This study |
IT049-P0-1 | IT049 locus2::P0-1-RBS-galK -RBS-yfp -FRT-pR-cfp | IS- ancestor strain P0-1 | This study |
IT049-P0-2 | IT049 locus2::P0-2-RBS-galK -RBS-yfp -FRT-pR-cfp | IS- ancestor strain P0-2 | This study |
IT049-P0-3 | IT049 locus2::P0-3-RBS-galK -RBS-yfp -FRT-pR-cfp | IS- ancestor strain P0-3 | This study |
MS022-P0 | MS022 locus2::P0-RBS-galK -RBS-yfp -FRT-pR-cfp | IS+ ancestor strain P0 | This study |
MS022-P0-1 | MS022 locus2::P0-1-RBS-galK -RBS-yfp -FRT-pR-cfp | IS+ ancestor strain P0-1 | This study |
MS022-P0-2 | MS022 locus2::P0-2-RBS-galK -RBS-yfp -FRT-pR-cfp | IS+ ancestor strain P0-2 | This study |
MS022-P0-3 | MS022 locus2::P0-3-RBS-galK -RBS-yfp -FRT-pR-cfp | IS+ ancestor strain P0-3 | This study |
IT030-H5r | MS022 locus2::pconst-RBS-galK -RBS-yfp -FRT-pR-cfp | Strain with constitutive galK expression conferred by two SNPs in P0 | Tomanek et al., 2020 |
IT030-D8c | MS022 locus2::pconst-RBS-galK -RBS-yfp -FRT-pR-cfp | Strain with constitutive galK expression conferred by one SNP in P0 | Tomanek et al., 2020 |
List of primers used
Name | Sequence | Purpose |
---|---|---|
E_flank_f | GCTGGAGCCACTTGTAGCC | cassette integration test locus 2, sequencing P0s |
E_flank_r | TCCTTGCTGAATCATTTTGTTC | cassette integration test locus 2 |
P0_check_Fw | GTGTGAGTGGCAGGGTAG | sequencing P0s (together with E_flank_f) |
qPCR_galK _Fw | GCTACCCTGCCACTCACA | estimating galK copy number |
qPCR_galK _Rv | CGCAGGGCAGAACGAAAC | estimating galK copy number |
rbsB_qPCR_Fw | GGCACAAAAATTCTGCTGATTAA | qPCR control locus |
rbsB_qPCR_Rv | GCAGCTCGATAACTTTGGC | qPCR control locus |
P1_P0-1 | GCCTTAGTTGTAAGTGTCTACCAT GTCCCCGAACAAGTGTTCACTATG TCTAGGCCCGCACGCAAGAC | integration of the selection and reporter cassette with P0-1 (Fw primer) |
P1_P0-2 | GCCTTAGTTGTAAGTGTCTACCAT GTCCCCGAACAAGTGTTCACTATG TCTCGGGGGGACAGCAGCG | integration of the selection and reporter cassette with P0-2 (Fw primer) |
P1_P0-3 | GCCTTAGTTGTAAGTGTCTACCAT GTCCCCGAACAAGTGTTCACTATG TCTGTATTTGGGATGCGGGTAGTAGA | integration of the selection and reporter cassette with P0-3 (Fw primer) |
E_int_Rv | TCGGAAGGGAAGAGGGAGTGCGGG AAATTTAAGCTGGATCACATATTGCC GAGGCCTTATGCTAGCTTC | integration of the selection and reporter cassette (Rv primer) |
E_int_Fw | GCCTTAGTTGTAAGTGTCTACCATGTC CCCGAACAAGTGTTCACTATGTCACCG GAAAGACGGGCTTC | integration of the selection and reporter cassette with P0 (Fw primer) |
deep_seq_Fw | TCGTCGGCAGCGTCAGATGTGTATAAG AGACAGACGGGTTCTTATGCCTTAGTT | 1st step PCR for amplicon deep sequencing (with 5´nextera anchor for Illumina sequencing) |
deep_seq_Rv | GTCTCGTGGGCTCGGAGATGTGTATAA GAGACAGGTGTGAGTGGCAGGGTAG | 1st step PCR for amplicon deep sequencing (with 5´nextera anchor for Illumina sequencing) |
Culture conditions
Bacterial strains were grown at 37°C. All evolution experiments, as well as growth experiments with the purpose of measuring OD600 and fluorescence, were conducted in M9 medium supplemented with 2 mM MgSO4, 0.1 mM CaCl2, 0.1% casaminoacids (‘evolution medium’), and carbon source (galactose, glucose, or glycerol) at the concentration indicated in the respective figures (Sigma-Aldrich, St Louis, MO), with the exception of Figure 2—figure supplement 2B, where bacteria were grown in M9 medium without casaminoacids (carbon sources as indicated in the figure).
Evolution experiments
Evolution experiments were inoculated with overnight cultures of ancestral IS+ and IS- colonies grown in 3 ml of LB medium, which were washed twice in M9 medium without carbon source (M9 buffer) and diluted 1:200.
Bacterial cultures were grown in 200 µl of liquid evolution medium with the indicated galactose concentrations in clear flat-bottom 96-well plates and shaken in a Titramax plate shaker at 750 rpm (Heidolph, Schwabach, Germany), allowing for a total population size of ~10⁸ colony forming units for the ancestral strain. Every day, populations were transferred to fresh plates using a VP408 pin replicator (V&P SCIENTIFIC, Inc, San Diego, CA), resulting in a dilution of ~1:820 (Steinrueck and Guet, 2017), corresponding to ~10 generations. Immediately after the transfer, growth and fluorescence were measured in the overnight plates using a Biotek H1 plate reader (Biotek, Winooski, Vermont). Thus, population phenotypes were measured every ~10 generations.
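The conversion from dilution factor to generations follows from the fact that a population diluted 1:820 must double log2(820) times to regrow to its previous density. The sketch below assumes that populations reach the same final density every day and that cell death is negligible; the function name is hypothetical.

```python
import math

def generations_per_transfer(dilution_factor):
    """Doublings needed to regrow to the same density after a 1:dilution transfer."""
    return math.log2(dilution_factor)

per_day = generations_per_transfer(820)
print(f"{per_day:.2f} generations per daily transfer")
```

With a dilution of ~1:820, this gives roughly 9.7 doublings per day, consistent with the ~10 generations stated above.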
Growth rate measurements in liquid cultures
To measure the growth rate in a 2D gradient of arabinose and galactose (Figure 1B), an overnight culture of strain IT013 (Tomanek et al., 2020) grown in M9 supplemented with 1% glycerol and 0.1% casaminoacids was diluted 1:200 into 96-well plates containing 200 µl of M9 supplemented with 0.1% casaminoacids, with concentrations of galactose and the inducer arabinose as indicated in Figure 1B. For the full duration of the experiment, cultures were grown in the plate reader with continuous orbital shaking, and OD600 and fluorescence were measured at 10 min intervals.
Growth rate was calculated using a custom R script. Briefly, the script applies a linear model (base R function lm()) to a 20-datapoint sliding window of log(OD600) as a function of time. The script then outputs the steepest slope (maximal growth rate) amongst all possible sliding windows (Figure 1—source data 1). The growth rates plotted in Figure 6E and Figure 2—figure supplement 2B were obtained in the same manner (see Figure 6—source data 2 and Figure 2—figure supplement 2—source data 1), with strains and carbon sources as indicated in the respective figures.
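The sliding-window procedure can be sketched outside R. The version below mirrors the described steps (natural log of OD600, a 20-point window, least-squares slope, maximum over all windows), using a hand-written least-squares slope in place of `lm()`. It illustrates the approach and is not the authors' script; the synthetic logistic growth curve is made up for testing.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y on x (the slope a linear model y ~ x estimates)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def max_growth_rate(time_h, od600, window=20):
    """Steepest slope of ln(OD600) versus time over all sliding windows."""
    if len(time_h) != len(od600) or len(time_h) < window:
        raise ValueError("need matching series with at least `window` points")
    log_od = [math.log(v) for v in od600]
    return max(
        ols_slope(time_h[i:i + window], log_od[i:i + window])
        for i in range(len(time_h) - window + 1)
    )

# Synthetic logistic growth (carrying capacity 1, N0 = 0.005, r = 0.8 per h),
# sampled every 10 min for 12 h.
times = [i / 6 for i in range(72)]
od = [0.005 / (0.005 + 0.995 * math.exp(-0.8 * t)) for t in times]
print(f"max growth rate: {max_growth_rate(times, od):.3f} per hour")
```

Because the window averages over 20 readings (about 3 h at 10 min intervals), the estimate falls slightly below the intrinsic rate r whenever growth is already decelerating within the steepest window.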
Flow cytometry experiments
Frozen evolved populations (–80°C, 15% glycerol) from day 4, day 8, or day 12 (as indicated in the figures) were pinned (1:820) into M9 buffer and kept on ice until the measurement. Fluorescence was measured using a BD FACSCanto II system (BD Biosciences, San Jose, CA) equipped with FACSDiva software. CFP fluorescence was collected with a 450/50 nm bandpass filter upon excitation with a 405 nm laser. YFP fluorescence was collected with a 510/50 nm bandpass filter upon excitation with a 488 nm laser. The bacterial population was gated on the FSC and SSC signals, resulting in approximately 6000 of the 10,000 recorded events being analysed per sample.
Quantitative real-time PCR
For qPCR, gDNA was isolated with the Wizard Genomic DNA purification kit (Promega, Madison, WI) from overnight cultures grown in the respective evolution medium and inoculated with single evolved colonies. We performed qPCR using Promega qPCR 2× Mastermix (Promega, Madison, WI) on a C1000 instrument (Bio-Rad, Hercules, CA). To quantify the galK copy-number of samples from an evolving population, we designed one primer pair within galK (target) and one primer pair within rbsB as a reference, which lies outside the amplified region. We compared the ratio of the target and reference loci to the ratio of the same two loci in the single-copy control. Using dilution series of one of the gDNA extracts as template, we calculated the efficiency of each primer pair and quantified the copy-number of galK in each sample with the Pfaffl method, which takes amplification efficiency into account (Pfaffl, 2001). qPCR was performed in three technical replicates.
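The Pfaffl calculation can be written out explicitly. Efficiencies follow from the slope of the standard curve (Ct versus log10 template amount) as E = 10^(-1/slope), and the relative copy-number is the target efficiency raised to the Ct difference (control minus sample) divided by the same quantity for the reference. The Ct values and slopes below are hypothetical, and averaging technical replicates before this step is an assumption.

```python
import math

def efficiency_from_slope(slope):
    """Amplification efficiency E from a standard-curve slope (Ct vs log10 input).
    E = 2 corresponds to perfect doubling in every cycle."""
    return 10 ** (-1.0 / slope)

def pfaffl_ratio(e_target, e_ref, ct_target_control, ct_target_sample,
                 ct_ref_control, ct_ref_sample):
    """Target/reference ratio in a sample relative to the single-copy control."""
    return (e_target ** (ct_target_control - ct_target_sample)
            / e_ref ** (ct_ref_control - ct_ref_sample))

# Hypothetical standard-curve slopes and mean Ct values (technical replicates averaged).
e_galk = efficiency_from_slope(-3.32)   # close to 2 (about 100% efficiency)
e_rbsb = efficiency_from_slope(-3.40)
copies = pfaffl_ratio(e_galk, e_rbsb,
                      ct_target_control=22.0, ct_target_sample=20.0,
                      ct_ref_control=18.0, ct_ref_sample=18.1)
print(f"estimated galK copy number: {copies:.1f}")
```

With perfect efficiencies and an unchanged reference Ct, a galK Ct that is two cycles earlier than in the single-copy control corresponds to about four copies.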
Measurement of colony fluorescence
Evolving populations were pinned onto LB agar supplemented with 1% charcoal and imaged using a macroscope setup (https://openwetware.org/wiki/Macroscope) (Chait et al., 2010). To obtain median colony YFP and CFP fluorescence intensity, a region of interest was determined using the ImageJ plugin ‘Analyze Particles’ (settings: 200px-infinity, 0.5–1.0 roundness) to identify colonies on 16-bit images with threshold adjusted according to the default value. The region of interest including all colonies was then used to measure intensity and plotted using a custom R script (Figure 4—source data 1).
Amplicon deep sequencing of P0
Frozen samples of evolved populations were diluted 1:10 into 100 µl of LB and grown for 5 hr (37°C, shaking) to increase cell numbers prior to DNA extractions. Columns 1–4 (populations A1, B1, C1, …, F4, G4, H4), 5–8 (populations A5, B5, C5, …, F8, G8, H8), and 9–12 (populations A9, B9, C9, …, F12, G12, H12) of each 96-well plate were pooled prior to DNA extraction using Wizard Genomic DNA purification kit (Promega, Madison, WI). The P0 region including the beginning of galK was amplified for 25 PCR cycles using primers deep_seq_Fw and deep_seq_Rv carrying 5′ adaptors for Illumina sequencing. In parallel, PCRs were performed for 35 cycles to confirm bands on a gel. Illumina sequencing was carried out by Microsynth (Balgach, Switzerland).
We note that our amplicon libraries of P0 were contaminated with reads carrying the sequence of P0-2, which we had prepared for sequencing in parallel (Figure 5—figure supplement 1). We therefore excluded all reads of P0-2 for our analysis of P0 and do not report the result of the P0-2-specific samples as they could not be trusted.
Reads of P0 were analysed using a custom R script. Briefly, we defined four 39 bp sequence motifs, representing the ancestral P0 sequence and the same region carrying the known adaptive SNPs (–30T>A, –37C>T, or both). We calculated the ratio of reads with an evolved versus the ancestral 39 bp motif in all samples, including those of control populations evolved in the absence of galactose. We also calculated the ratio of reads carrying a 39 bp ancestral galK sequence motif with any single SNP versus reads carrying the same 39 bp motif of the ancestral galK sequence.
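The motif-based classification can be sketched as exact substring matching, plus a Hamming-distance scan for reads that differ from a reference motif at exactly one position. The example below uses short made-up motifs in place of the real 39 bp motifs, ignores reverse-complement reads and base-call quality, and assigns each read to at most one motif. It illustrates the counting logic only and is not the authors' R script; all names are hypothetical.

```python
def count_motif_reads(reads, motifs):
    """Assign each read to the first motif it contains exactly (or to none)."""
    counts = {name: 0 for name in motifs}
    for read in reads:
        for name, motif in motifs.items():
            if motif in read:
                counts[name] += 1
                break
    return counts

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def has_single_mismatch(read, motif):
    """True if the read has no exact motif match but some window differs at one position."""
    k = len(motif)
    distances = [hamming(read[i:i + k], motif) for i in range(len(read) - k + 1)]
    return bool(distances) and min(distances) == 1

# Toy 8 bp motifs standing in for the 39 bp motifs (hypothetical sequences).
motifs = {"ancestral": "ACGTTGCA", "evolved": "ACGATGCA"}
reads = ["TTACGTTGCATT", "GGACGATGCAGG", "ACGTTGCA", "NNNN"]

counts = count_motif_reads(reads, motifs)
print(counts, counts["evolved"] / counts["ancestral"])
```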
Data availability
Source Data and R scripts to generate the plots shown in the Figures are uploaded as the respective source code files. Flow cytometry and Illumina sequencing data are uploaded on Dryad together with R scripts to generate the plots shown in the respective Figures (Flow cytometry data: Figure 2C, 3C-D and Figure Supplements, 4A; Illumina sequencing data: Figure 5 and Figure Supplement).
Dryad Digital Repository. Flow cytometry YFP and CFP data and deep sequencing data of populations evolving in galactose. https://doi.org/10.5061/dryad.rfj6q57ds
References
The “evolvability” of promiscuous protein functions. Nature Genetics 37:73–76. https://doi.org/10.1038/ng1482
Gene amplification in cancer. Trends in Genetics 22:447–455. https://doi.org/10.1016/j.tig.2006.06.007
Tandem genetic duplications in phage and bacteria. Annual Review of Microbiology 31:473–505. https://doi.org/10.1146/annurev.mi.31.100177.002353
Gene amplification and adaptive evolution in bacteria. Annual Review of Genetics 43:167–195. https://doi.org/10.1146/annurev-genet-102108-134805
Evolution of new functions de novo and from preexisting genes. Cold Spring Harbor Perspectives in Biology 7:1–19. https://doi.org/10.1101/cshperspect.a017996
Gene amplification and insecticide resistance. Pest Management Science 67:886–890. https://doi.org/10.1002/ps.2189
Adaptation by copy number variation in monopartite viruses. Current Opinion in Virology 33:7–12. https://doi.org/10.1016/j.coviro.2018.07.001
Turning a hobby into a job: how duplicated genes find new functions. Nature Reviews Genetics 9:938–950. https://doi.org/10.1038/nrg2482
Shining a light on enzyme promiscuity. Current Opinion in Structural Biology 47:167–175. https://doi.org/10.1016/j.sbi.2017.11.001
Evolution of new enzymes by gene duplication and divergence. The FEBS Journal 287:1262–1283. https://doi.org/10.1111/febs.15299
Experimental studies of evolutionary dynamics in microbes. Trends in Genetics 34:693–703. https://doi.org/10.1016/j.tig.2018.06.004
Bacterial genome instability. Microbiology and Molecular Biology Reviews 78:1–39. https://doi.org/10.1128/MMBR.00035-13
Increased gene dosage plays a predominant role in the initial stages of evolution of duplicate TEM-1 beta lactamase genes. Evolution; International Journal of Organic Evolution 68:1775–1791. https://doi.org/10.1111/evo.12373
Seeing mutations in living cells. Current Biology 20:1432–1437. https://doi.org/10.1016/j.cub.2010.06.071
The fate of competing beneficial mutations in an asexual population. Genetica 102:127–144.
Effect of a RecA mutation on cholera toxin gene amplification and deletion events. Journal of Bacteriology 165:723–731. https://doi.org/10.1128/jb.165.3.723-731.1986
The evolution of gene duplications: classifying and distinguishing between models. Nature Reviews Genetics 11:97–108. https://doi.org/10.1038/nrg2689
Evolution of catalytic proteins or on the origin of enzyme species by means of natural selection. Journal of Molecular Evolution 20:38–51. https://doi.org/10.1007/BF02101984
Gene duplication as a mechanism of genomic adaptation to a changing environment. Proceedings. Biological Sciences 279:5048–5057. https://doi.org/10.1098/rspb.2012.1108
An evolving view of copy number variants. Current Genetics 65:1287–1295. https://doi.org/10.1007/s00294-019-00980-0
A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Research 29:e45. https://doi.org/10.1093/nar/29.9.e45
Mechanisms of gene duplication and amplification. Cold Spring Harbor Perspectives in Biology 7:a016592. https://doi.org/10.1101/cshperspect.a016592
A Bacillus subtilis ΔpdxT mutant suppresses vitamin B6 limitation by acquiring mutations enhancing pdxS gene dosage and ammonium assimilation. Environmental Microbiology Reports 13:218–233. https://doi.org/10.1111/1758-2229.12936
Rearrangements of the bacterial chromosome: formation and applications. Science 241:1314–1318.
Rearrangements of the bacterial chromosome: formation and applications. In: Neidhardt FC, editors. Escherichia coli and Salmonella: Cellular and Molecular Biology (2nd edn). Washington, D.C.: American Society for Microbiology. pp. 2256–2276.
Multicopy plasmids potentiate the evolution of antibiotic resistance in bacteria. Nature Ecology & Evolution 1:10. https://doi.org/10.1038/s41559-016-0010
What fraction of duplicates observed in recently sequenced genomes is segregating and destined to fail to fix? Genome Biology and Evolution 7:2258–2264. https://doi.org/10.1093/gbe/evv139
Gene amplification as a form of population-level gene expression regulation. Nature Ecology & Evolution 4:612–625. https://doi.org/10.1038/s41559-020-1132-7
Gene duplications are at least 50 times less frequent than gene transfers in prokaryotic genomes. Genome Biology and Evolution 13:1–14. https://doi.org/10.1093/gbe/evab224
Random sequences rapidly evolve into de novo promoters. Nature Communications 9:1530. https://doi.org/10.1038/s41467-018-04026-w
Article and author information
Funding
No external funding was received for this work.
Acknowledgements
We are grateful to N Barton, F Kondrashov, M Lagator, M Pleska, R Roemhild, D Siekhaus, and G Tkacik for input on the manuscript and to K Tomasek for help with flow cytometry.
Copyright
© 2022, Tomanek and Guet
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.