DNA replication errors are a major source of adaptive gene amplification

Julie N Chuong; Nadav Ben Nun; Ina Suresh; Julia Matthews; Titir De; Grace Avecilla; Farah Abdul-Rahman; Nathan Brandt; Yoav Ram; David Gresham

doi:10.7554/eLife.98934.1

eLife assessment

This study provides important new insights into the contribution of local DNA features to the molecular mechanisms and dynamics of copy number variation (CNV) formation during adaptive evolution. While limited to a single CNV, the experiments are carefully controlled and present convincing evidence that supports the conclusions. This work will be of general interest to those studying genome architecture and evolution from yeast biologists to cancer researchers.

https://doi.org/10.7554/eLife.98934.1.sa3

Significance of findings

important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence

convincing: Appropriate and validated methodology in line with current state-of-the-art

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Detecting and understanding heritable changes in DNA that contribute to adaptive evolution is a primary goal of evolutionary biology. Copy number variants (CNVs)—gains and losses of genomic sequences—are an important source of genetic variation underlying rapid adaptation and genome evolution. However, despite their central role in evolution little is known about the factors that contribute to the structure, size, formation rate, and fitness effects of adaptive CNVs. Local genome elements are likely to be an important determinant of these properties. Whereas it is known that point mutation rates vary with genomic location and local sequence features, the role of genome architecture in the formation, selection, and the resulting evolutionary dynamics of CNVs is poorly understood. Previously, we have found that the GAP1 gene in Saccharomyces cerevisiae undergoes frequent and repeated amplification and selection under long-term experimental evolution in glutamine-limiting conditions. The GAP1 gene has a unique genomic architecture consisting of two flanking long terminal repeats (LTRs) and a proximate origin of DNA replication (autonomously replicating sequence, ARS), which are likely to promote rapid GAP1 CNV formation. To test the role of these genomic elements on CNV-mediated adaptive evolution we performed experimental evolution in glutamine-limited chemostats using engineered strains lacking either the adjacent LTRs, ARS, or all elements. Using a CNV reporter system and neural network simulation-based inference (nnSBI) we quantified the formation rate and fitness effect of CNVs for each strain. We find that although GAP1 CNVs repeatedly form and sweep to high frequency in all strains, removal of local DNA elements significantly impacts the rate and fitness effect of CNVs and the rate of adaptation. We performed genome sequence analysis to define the molecular mechanisms of CNV formation for 177 CNV lineages. 
We find that 49% of all GAP1 CNVs are mediated by the DNA replication-based mechanism Origin Dependent Inverted Repeat Amplification (ODIRA) regardless of background strain. In the absence of the local ARS, a distal ARS can mediate ODIRA CNV formation. In the absence of local LTRs homologous recombination mechanisms still mediate gene amplification following de novo insertion of retrotransposon elements at the locus. Our study demonstrates the remarkable plasticity of the genome and reveals that DNA replication errors are a predominant source of adaptive CNVs.

Introduction

Defining the genetic basis and evolutionary dynamics of adaptation is a central goal in evolutionary biology. Mutations underlying adaptation or biological innovation can depend on multiple factors including genetic backgrounds, phenotypic states, and genome architecture (Blount et al., 2008, 2012). One important class of mutations mediating adaptive evolution is copy number variants (CNVs), which comprise duplications or deletions of genomic sequences that range in size from gene fragments to whole chromosomes. Quantifying the rates at which CNVs occur, the factors that influence their formation, and the fitness and functional effects of CNVs is essential for understanding their role in evolutionary processes.

CNVs play roles in rapid adaptation in multiple contexts and are an initiating event in biological innovation. In laboratory evolution experiments, a spontaneous tandem duplication captured a promoter for expression of a citrate transporter, enabling Escherichia coli cells, which are typically unable to use citrate, to metabolize it as a carbon source (Blount et al., 2012). CNVs can be beneficial in cancer cells, promote tumorigenesis (Ben-David & Amon, 2020), enhance cancer cell adaptability (Rutledge et al., 2016), and accelerate resistance to anti-cancer therapies (Lukow et al., 2021). Over longer time scales, CNVs serve as a substrate from which new genes evolve (Ohno, 1970; Taylor & Raes, 2004), as duplicated genes that are redundant in function can accumulate mutations and evolve to acquire new functions. For example, the globin gene family in mammals arose from rounds of gene duplication and subsequent diversification (Storz, 2016). CNVs also contribute to macro-evolutionary processes and thereby contribute to species differences, such as between humans and chimpanzees (Cheng et al., 2005), and to speciation (Zuellig & Sweigart, 2018).

Mutations, including CNVs, occur in part because of errors made during DNA replication or DNA repair. Two general processes underlie CNV formation: (1) DNA recombination-based mechanisms and (2) DNA replication-based mechanisms (Harel et al., 2015; Hastings, Lupski, et al., 2009; Malhotra & Sebat, 2012; Pös et al., 2021; Zhang, Gu, et al., 2009). Recombination-mediated mechanisms of CNV formation include non-allelic homologous recombination (NAHR) and nonhomologous end joining. NAHR occurs via recombination between homologous sequences that are not allelic. As such, NAHR occurs more frequently with repetitive sequences due to improper alignment of DNA segments and can occur either between chromosomes (interchromosomal) or within a chromosome (intrachromosomal) (Harel et al., 2015). One prevalent class of repetitive sequences is retrotransposons, and both full-length and partial sequences, such as long terminal repeats (LTRs), are substrates for homologous recombination generating gene amplifications (Avecilla et al., 2023; Dunham et al., 2002; Gresham et al., 2008; Lauer et al., 2018; Spealman et al., 2022). DNA replication-based mechanisms include fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced replication (MMBIR) (Carvalho et al., 2013; Gu et al., 2008a; Hastings, Ira, et al., 2009; Lee et al., 2007). During FoSTeS and MMBIR, the DNA replication fork stalls due to a single strand nick and a replication error occurs in which the lagging strand switches to an incorrect template strand mediated by microhomology. Reinitiation of DNA synthesis at the incorrect site can form CNVs. A particular type of DNA replication-based error is Origin-Dependent Inverted Repeat Amplification (ODIRA), in which short inverted repeats near an origin of DNA replication enable template switching of the leading strand to the lagging strand. 
Subsequent replication generates an intermediate linear DNA molecule that can recombine into the original genome to form a triplication with an inverted middle copy (Brewer et al., 2015; Martin et al., 2024).

In microbes, CNVs can mediate rapid adaptation to selective conditions imposed through nutrient limitation in a chemostat. Selected CNVs often include genes encoding nutrient transporters that facilitate import of the limiting nutrient (Dunham et al., 2002; Gresham et al., 2008; Horiuchi et al., 1963; Payen et al., 2016; Sonti & Roth, 1989), likely as a result of improved nutrient transport capacity due to increased protein production. Previous studies have found amplification of the general amino acid permease gene, GAP1, when Saccharomyces cerevisiae populations are continuously cultured in glutamine-limited chemostats (Gresham et al., 2010; Lauer et al., 2018). Amplification of GAP1 confers increased fitness in the selective environment (Avecilla et al., 2023). Sequence characterization of these CNVs revealed that a diversity of de novo CNV alleles arose and were selected for including tandem duplications of GAP1, complex large CNVs, aneuploidies, and translocations. However, little is known about the molecular basis of this genetic diversity.

Local genome elements are likely to be an important determinant of CNV formation rates and mechanisms. Genomic context can influence multiple properties including mutation rate, epigenetic regulation, chromatin state, transcription levels, DNA replication, and recombination rate (Arndt et al., 2005; Chuang & Li, 2004; Lang & Murray, 2011; Lercher & Hurst, 2002; Matassi et al., 1999; Nishant et al., 2009; Wolfe et al., 1989). Prior work has shown that CNVs occur more frequently in repetitive regions in the genome (Harel et al., 2015; Pentao et al., 1992; Stankiewicz et al., 2003; Turner et al., 2008). However, little is known about the role of local genomic architecture and organization on CNV formation rates, the types of CNVs that are generated, their associated fitness effects, and ultimately the paths taken during adaptive evolution.

Here, we aimed to directly investigate the effect of local genome architecture elements on de novo GAP1 CNV formation and selection dynamics during adaptive evolution of Saccharomyces cerevisiae. We hypothesized that sequence elements proximate to GAP1 potentiate CNV formation. The GAP1 locus, which is located on the short arm of chromosome XI, consists of two flanking Ty1 long terminal repeats (LTRs) that share 82% sequence identity and an origin of DNA replication or autonomously replicating sequence (ARS) (Figure 1A). Both the LTRs and the ARS may facilitate GAP1 CNV formation due to their proximity. First, the flanking LTRs can undergo inter-chromatid NAHR to form tandem duplications of GAP1 on a linear chromosome (Lauer et al., 2018; Spealman et al., 2022). Second, intra-chromatid NAHR between the flanking LTRs can form an extrachromosomal circle containing GAP1 and an ARS that is able to self-propagate and integrate into the genome (Gresham et al., 2010). Finally, GAP1 triplications can form through ODIRA using short inverted repeats and the proximate ARS (Brewer et al., 2015; Lauer et al., 2018; Martin et al., 2024). These elements are thought to facilitate a high rate of GAP1 amplification, estimated to be on the order of 10⁻⁴ per haploid genome per generation (Avecilla et al., 2022). To test our hypothesis, we used a CNV reporter, wherein a constitutively expressed GFP gene is inserted adjacent to GAP1 (Lauer et al., 2018). We engineered strains that lacked either the ARS (ARSΔ), both flanking LTRs (LTRΔ), or all three elements (ALLΔ) (Figure 1A). We performed experimental evolution using wildtype (WT) and genomic architecture mutant populations in glutamine-limited chemostats for 137 generations and quantified GAP1 CNVs using flow cytometry (Figure 1). Surprisingly, we find that the proximate DNA elements are not required for GAP1 CNV formation, as GAP1 CNVs were identified in all evolving populations. 
We used neural network simulation-based inference (nnSBI) to infer the CNV formation rate and selection coefficient (Avecilla et al., 2022). We find that genomic architecture mutants have significantly reduced CNV formation rates relative to WT and significantly lower selection coefficients even though GAP1 CNVs repeatedly form and sweep to high frequency in all strains. We performed genome sequence analysis to define the molecular mechanisms of CNV formation for 177 CNV lineages and found that 49% of GAP1 CNVs are mediated by ODIRA regardless of background strain. In the absence of the local ARS, a distal ARS facilitates CNV formation through ODIRA. We also find that homologous recombination mechanisms still mediate gene amplification in the absence of LTRs in part initiated by de novo insertion of retrotransposon elements at the locus. Our study reveals the remarkable plasticity of the genome and that DNA replication errors are a predominant source of adaptive CNVs.

Local ARS contributes to CNV dynamics during adaptive evolution.
**(A)** The *Saccharomyces cerevisiae GAP1* gene is located on the short arm of chromosome XI (beige rectangle). Light blue rectangle - Ty long terminal repeats (LTR). Purple rectangle - autonomously replicating sequences (ARS). Orange rectangles - tRNA genes. *GAP1* ORF - white rectangle. The *GAP1* gene (white rectangle) is flanked by Ty1 LTRs (YKRCδ11, YKRCδ12), which are remnants of retrotransposon events, and is directly upstream of an autonomously replicating sequence (ARS1116). Variants of the *GAP1* locus were engineered to remove either both LTRs, the single ARS, or all three elements. All engineered genomes contain a CNV reporter. **(B)** We evolved the four different strains in 5-8 replicate populations, for a total of 27 populations, in glutamine-limited chemostats and monitored the formation and selection of *de novo GAP1* CNVs for 137 generations using flow cytometry. Population samples were taken every 8-10 generations and 100,000 cells were assayed using a flow cytometer. Colored lines show the median proportion of cells in a population with *GAP1* amplifications across 5-8 replicate populations of the labeled strain. The shaded regions represent the median absolute deviation across the replicates. **(C)** We summarized CNV dynamics and found that strain has a significant effect on CNV appearance (Kruskal-Wallis, p = 0.001384). There are significant differences in CNV appearance between LTRΔ (blue) and ARSΔ (red), and LTRΔ (blue) and ALLΔ (yellow) (pairwise Wilcoxon test with Bonferroni correction, p = 0.0059 and p = 0.0124, respectively). **(D)** Strain has a significant effect on the per generation increase in proportion of cells with CNV (ANOVA, p = 0.00318). There is a significant difference between LTRΔ (blue) and ALLΔ (yellow) (pairwise t-test with Bonferroni correction, p = 0.0026). **(E)** Strain has a significant effect on time to CNV equilibrium phase (ANOVA, p = 0.00833). There is a significant difference in time to CNV equilibrium between WT and ARSΔ (pairwise t-tests with Bonferroni correction, p = 0.050).

Results

Accurate estimation of CNV allele frequencies remains challenging using molecular methods such as DNA sequencing and qPCR. To address this challenge we developed a CNV reporter comprising a constitutively expressed fluorescent gene inserted upstream of GAP1 and observed recurrent amplification and selection of GAP1 in glutamine-limiting conditions (Lauer et al., 2018). Subsequently, we showed that a high rate of GAP1 CNV formation and strong fitness effects explain the highly reproducible evolutionary dynamics (Avecilla et al., 2022). Noncoding sequence elements proximate to GAP1, including flanking LTRs in tandem orientation and an ARS, contribute to GAP1 CNV formation (Gresham et al., 2010; Lauer et al., 2018). Many studies have shown that repetitive sequence regions and origins of replication are hotspots of CNVs (Arlt et al., 2012; Cardoso et al., 2016; Di Rienzi et al., 2009; Gresham et al., 2010; Lauer et al., 2018; Martin et al., 2024). Thus, we hypothesized that the local genomic architecture of GAP1 facilitates its high rate of CNV formation.

To test the role of proximate genomic features we engineered strains deleted for each element and thus differing from our wildtype strain (WT) containing a GAP1 CNV reporter by a single modification. Specifically, we constructed ARSΔ, a strain lacking the single ARS, LTRΔ, a strain lacking the flanking LTRs, and ALLΔ, a strain lacking all three elements (Figure 1A). All strains contain the CNV reporter at the identical location as the WT strain. We confirmed scarless deletions of genetic elements using Sanger and whole-genome sequencing.

Local genomic architecture contributes to GAP1 CNV evolutionary dynamics

We founded independent populations with each of the three engineered strains lacking proximate genomic features and a WT strain. We studied GAP1 CNV dynamics in populations maintained in glutamine-limited chemostats over 137 generations (Figure 1). For each of the four strains, we propagated 5-8 clonal replicate populations, each originating from the same inoculum (founder population) derived from a single colony. Approximately every 10 generations, we measured GFP fluorescence of sampled populations using a flow cytometer and quantified the proportion of cells containing GAP1 CNVs (Methods). We observed similar CNV dynamics across independent populations within each strain (Figure S3B). Therefore, we summarized CNV dynamics for each strain using the median proportion of the population with a GAP1 CNV. In every strain, GAP1 CNVs are generated and selected resulting in qualitatively similar dynamics in WT and mutant strains (Figure 1B).

Deletion of the ARS, but not the flanking LTRs alters CNV dynamics

We quantified three phases of CNV dynamics: (1) the time to CNV appearance, defined by the inflection point before the rise in frequency (Figure 1C); (2) the selection phase, corresponding to the increase in proportion of CNVs per generation during the initial expansion of CNVs (Figure 1D); and (3) the equilibrium phase, corresponding to the plateau (Figure 1E). The time to CNV appearance (Figure 1C) and the rate of increase during the CNV selection phase (Figure 1D) do not differ between WT and LTRΔ populations (pairwise Wilcoxon test, adjusted p = 1; pairwise t-test, p = 1, respectively). In the WT and LTRΔ populations, GAP1 CNVs appear at generation 50 and increase in frequency at similar rates, ∼15% per generation in WT and ∼18% per generation in LTRΔ. The two strains both reach their equilibrium phase at the same point, around generation 75 (pairwise t-test, adjusted p = 1). The absence of a significant difference in CNV adaptation dynamics between the two strains suggests that the LTRs are not a major determinant of GAP1 CNV evolutionary dynamics.

By contrast, in ARSΔ and ALLΔ populations, we observe a delay in the time to CNV appearance. In both of these strains, CNVs are first detected at generations 65-80, whereas in WT and LTRΔ populations CNVs are first detected at generation 50 (ARSΔ vs. LTRΔ, pairwise Wilcoxon test, adjusted p = 0.0059; ALLΔ vs. LTRΔ, pairwise Wilcoxon test, adjusted p = 0.0124) (Figure 1). Thus, the local ARS contributes to the initial GAP1 CNV dynamics. Similarly, the rate of increase during CNV selection is significantly different between LTRΔ (18%) and ALLΔ (13%) (pairwise t-test, adjusted p = 0.0026). Finally, we also observe a significant delay (ANOVA, p = 0.00833) in the generation at which the CNV frequency reaches equilibrium in ARSΔ (∼generation 112) compared to WT (pairwise t-test, adjusted p = 0.05) (Figure 1E). These observations suggest that absence of the ARS in the ARSΔ and ALLΔ strains delays the appearance of GAP1 CNVs compared with the presence of the ARS in WT or LTRΔ strains.

GAP1 amplifications can occur without CNV reporter amplification

In both WT and LTRΔ populations we observed that GAP1 CNV abundance stabilized around 75% during the equilibrium phase (Figure 1B). In these populations, inspection of raw flow cytometry data revealed a persistent single-copy GFP subpopulation (Figure 2A). These data could be explained by two possible scenarios: (1) the existence of a non-GAP1 CNV subpopulation comprising beneficial variation at other loci with fitness effects equivalent to GAP1 CNVs or (2) lineages with GAP1 CNVs without co-amplification of the CNV reporter. To distinguish between these two possibilities we sequenced clones from the single-copy GFP subpopulation in each of the five WT populations (Supplementary Table 1) and identified the presence of GAP1 amplifications without co-amplification of the CNV reporter (Supplementary Figure S2A). The presence of GAP1 CNVs lacking the CNV reporter amplification in four out of five populations suggests that they may have arisen in the founder population. We found eight distinct CNV breakpoint pairs (Supplementary Figure S2A), indicating that at least eight independent events occurred, either in the founder population or just after it was propagated into independent populations. All clones from one of the five populations (i.e., population 3) contained one copy of GFP and one copy of GAP1, suggesting these clones have a beneficial mutation elsewhere in the genome allowing them to stably coexist with the GAP1 CNV subpopulation.

CNV reporter failure does not impact parameter inference.
(A) Flow cytometry of a representative WT population with a persistent single-copy GFP subpopulation comprising ∼25% of the population. (B) Model illustration. X_A is the frequency of ancestor cells in the chemostat; X_C⁺, X_C⁻ are the frequencies of cells with *GAP1* duplications with two or one reporters, respectively, and a selection coefficient s_C; X_B is the frequency of cells with other beneficial mutations and a selection coefficient s_B. *GAP1* duplications form with a rate δ_C, other beneficial mutations occur with rate δ_B. At generation 0, only genotypes C⁻ and A are present, with frequencies of X_C⁻ = φ and X_A = 1 − φ. (C) Examples of total CNV proportions (solid) and reported CNV proportions (dashed) for two parameter combinations, both with s_C = 0.15, φ = 10⁻⁴.

Incorporating unreported pre-existing CNVs in an evolutionary model

To quantify the evolutionary parameters underlying empirically measured CNV dynamics (Figure 1) we built an evolutionary model with the goal of performing neural network simulation-based inference (nnSBI) (Avecilla et al., 2022; Cranmer et al., 2020; Gonçalves et al., 2020). Previously, our evolutionary model assumed the GAP1 CNV reporter allowed us to detect all GAP1 CNVs (Avecilla et al., 2022). However, our new experimental results indicate the existence of a small subpopulation of unreported GAP1 CNVs present at the beginning of the experiments (Figure 2A). Therefore, we expanded the evolutionary model to include φ, the proportion of cells with GAP1 CNVs without co-amplification of the reporter, at the commencement of the experiment (i.e., generation 0) (Figure 2B). The remaining model parameters are δ_C, the rate at which GAP1 duplications form, and δ_B, the rate at which other beneficial mutations occur. We find that this updated evolutionary model can accurately describe the observed dynamics (Supplementary Figure S3B), which are clearly affected by the value of φ. When the total CNV proportion is very different from the reported proportion, e.g., when φ ≫ δ_C > δ_B, a reduced CNV formation rate results in a greater discrepancy between reported and total CNV proportions (Figure 2C).
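The model described above can be illustrated with a small deterministic recursion over genotype frequencies. This is a minimal sketch rather than the simulator used for inference: the update order (mutation from the ancestor, then selection and normalization), the assumption that every de novo CNV co-amplifies the reporter, the function name `simulate_cnv_dynamics`, and the default values of `s_b` and `delta_b` are all assumptions made for illustration.

```python
def simulate_cnv_dynamics(s_c, delta_c, phi, s_b=0.001, delta_b=1e-5, generations=137):
    """Deterministic sketch of the four-genotype model (A, C+, C-, B).

    Returns a list of (reported, total) GAP1 CNV proportions, one per generation.
    All default parameter values are hypothetical.
    """
    # Generation 0: only the ancestor (A) and unreported CNV cells (C-) are present.
    x_a, x_cp, x_cm, x_b = 1.0 - phi, 0.0, phi, 0.0
    trajectory = [(x_cp, x_cp + x_cm)]
    for _ in range(generations):
        # Mutation (assumed): ancestors gain a reported CNV (C+) or another
        # beneficial mutation (B) at rates delta_c and delta_b.
        new_c = delta_c * x_a
        new_b = delta_b * x_a
        x_a -= new_c + new_b
        x_cp += new_c
        x_b += new_b
        # Selection: weight each genotype by relative fitness, then normalize.
        w_a = x_a
        w_cp = x_cp * (1.0 + s_c)
        w_cm = x_cm * (1.0 + s_c)
        w_b = x_b * (1.0 + s_b)
        mean_w = w_a + w_cp + w_cm + w_b
        x_a, x_cp, x_cm, x_b = w_a / mean_w, w_cp / mean_w, w_cm / mean_w, w_b / mean_w
        trajectory.append((x_cp, x_cp + x_cm))
    return trajectory
```

Calling `simulate_cnv_dynamics(0.15, 1e-5, 1e-4)` and comparing the two columns of the result shows how a pre-existing unreported subpopulation keeps the reported proportion below the total proportion, which is the qualitative behavior shown in Figure 2C.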

Decreased CNV formation rates in mutants suggest adjacent elements can drive GAP1 CNV formation

We used nnSBI to estimate CNV formation rates and selection coefficients from evolutionary dynamics observed in glutamine-limited chemostats. We trained a neural density estimator using evolutionary simulations (Methods). This neural density estimator then allows us to infer posterior distributions over model parameters (formation rate, δ; selection coefficient, s) from the CNV dynamics in single replicates and a collective posterior distribution from a set of replicates of the same strain. δ_C is the GAP1 CNV formation rate. s_C is the GAP1 CNV selection coefficient, wherein the fitness effect is 1 + s. We estimated the confidence of our inference approach on synthetic simulations by computing its coverage, i.e., the probability that the true parameter falls within the 95% highest density interval (HDI) of the posterior distribution. Our neural density estimator is slightly over-confident for φ, with a coverage of 0.934, and under-confident for the GAP1 CNV selection coefficient and formation rate, with a coverage of 0.992 for s_C and 0.995 for δ_C, respectively (Supplementary Table 2). Despite this under-confidence, the posterior distributions are narrow in biological terms: the 95% HDI represents less than an order of magnitude for both s_C and δ_C. Thus, we did not apply post-training adjustments to the neural density estimator, such as calibration (Cook et al., 2006) or ensembles (Caspi et al., 2023; Hermans et al., 2022).
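The coverage check described above can be sketched for a single scalar parameter as follows. This is an illustrative sketch, not the authors' code: it assumes each posterior is represented by a list of samples and is unimodal, and the function names `hdi` and `coverage` are hypothetical. A coverage below the nominal 0.95 indicates over-confidence, and a coverage above it indicates under-confidence.

```python
import math

def hdi(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples.

    Assumes a unimodal posterior represented by scalar samples.
    """
    xs = sorted(samples)
    n = len(xs)
    # Number of samples the interval must contain (guarding against float fuzz).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

def coverage(true_values, posterior_sample_sets, mass=0.95):
    """Fraction of synthetic simulations whose true parameter lies in the HDI."""
    hits = 0
    for truth, samples in zip(true_values, posterior_sample_sets):
        low, high = hdi(samples, mass)
        hits += low <= truth <= high
    return hits / len(true_values)
```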

We find that the individual maximum a posteriori (MAP) estimates vary across strains and replicates (Supplementary Figure S3A). Overall, the CNV selection coefficient, s_C, ranges from 0.1 to 0.22 (with one exception of 0.3) whereas the CNV formation rate, δ_C, ranges from 10⁻⁶ to 10⁻⁴ (with one exception of 10⁻³ and two of 10⁻⁷); and the proportion of pre-existing GAP1 CNVs that do not amplify the reporter φ ranges from 10⁻⁶ to 10⁻² (with two exceptions of 10⁻⁸). We found that the MAP estimates for replicate populations within the same strain cluster together with some outliers (Supplementary Figure S3A). We performed posterior predictive checks, drawing parameter values from the posterior distributions and simulating the frequency dynamics (Supplementary Figure S3H), which agree with the observed data (Supplementary Figure S3B). For each strain, we estimate the collective posterior distribution based on all individual posteriors, which generates a posterior distribution conditioned on all observations, P(θ|X₁, …, X_n) (Methods). The collective posterior allows us to estimate whether there is a difference in CNV formation rate and fitness effects across the four strains.
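One standard way to write a posterior conditioned on all replicates is shown below. It is offered as a sketch under two assumptions that this section does not state explicitly: the replicates X₁, …, X_n are conditionally independent given θ, and all individual posteriors share the same prior P(θ). The exact construction used here is given in the Methods.

```latex
% Assumptions: replicates are conditionally independent given \theta and share the prior P(\theta).
P(\theta \mid X_1, \ldots, X_n)
  \;\propto\; P(\theta) \prod_{i=1}^{n} P(X_i \mid \theta)
  \;\propto\; P(\theta)^{\,1-n} \prod_{i=1}^{n} P(\theta \mid X_i)
% The second step uses Bayes' rule, P(X_i \mid \theta) = P(\theta \mid X_i)\, P(X_i) / P(\theta),
% and drops the factors P(X_i), which do not depend on \theta.
```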

Collective posterior HDIs are very narrow (Figure 3A) and fit empirical observations (Figure 3C). The collective MAP estimates of the CNV selection coefficient are similar for the WT and LTRΔ (0.182). For ARSΔ and ALLΔ, the selection coefficient is estimated to be lower, with values of 0.146 and 0.126, respectively. However, these selection coefficients are still large, consistent with these populations containing GAP1 CNVs that are highly beneficial under glutamine-limitation. The collective MAP estimate for the CNV formation rate in WT is 4.5⋅10⁻⁵. Unlike the selection coefficients, the CNV formation rate is markedly lower in all three mutant strains, ranging from 1⋅10⁻⁵ for LTRΔ and ALLΔ to 2.4⋅10⁻⁶ in ARSΔ. These results are consistent with our hypothesis that proximate sequence features facilitate GAP1 CNV formation.
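To give a sense of scale for these selection coefficients, the worked example below compares how quickly a CNV lineage would gain on the ancestor. It is a simplification that assumes discrete generations, a CNV fitness of 1 + s_C relative to the ancestor, and no new mutations or competing beneficial lineages; the values are rounded.

```latex
% Assumptions: discrete generations, fitness 1 + s_C relative to the ancestor, no new mutations.
\frac{X_C(t)}{X_A(t)} = \frac{X_C(0)}{X_A(0)}\,(1 + s_C)^{t}
% Fold increase of the CNV-to-ancestor ratio over 10 generations at the collective MAP estimates:
(1.182)^{10} \approx 5.3 \;\; (\text{WT and LTR}\Delta), \qquad
(1.146)^{10} \approx 3.9 \;\; (\text{ARS}\Delta), \qquad
(1.126)^{10} \approx 3.3 \;\; (\text{ALL}\Delta)
```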

Inference of CNV formation rate and selection coefficient from experimental evolutionary data.
(A) Collective MAP estimate (black markers) and 50% HDR (colored areas) of *GAP1* CNV formation rate, δ_C, and selection coefficient, s_C. Marginal posterior distributions are shown on the top and right axes. (B) Collective posterior prediction of Shannon diversity of CNV lineages (e^{−Σ_i p_i log(p_i)}, Jost, 2006). Line and shaded area show mean and 50% HDI. (C) CNV reported frequency (X_C⁺) prediction using collective MAP (solid line) compared to empirical observations (dotted lines).

In all strains, collective parameter estimations are highly correlated, as expected for joint estimation of selection coefficients and beneficial mutation rates (Gitschlag et al., 2023). Thus, the collective MAP predictions can be interpreted as a range of parameter values for each strain. Indeed, other than the very final time point for the WT population, all collective MAP predictions lie within the interquartile ranges (Supplementary Figure S3B). The observed GAP1 CNV frequency stabilizes at different levels in the different experiments (Figure 1B). This can be explained by pre-existing unreported CNVs with frequencies estimated to be between φ = 4⋅10⁻⁶ and 1.6⋅10⁻⁴ by the collective MAPs (Supplementary Figure S3C). Indeed, our model predicts that the total (reported and unreported) final CNV frequency is nearly one in all cases (Supplementary Figure S3G).

Next, we estimated the de novo CNV diversity. Previous work showed a diversity of CNV allele types formed under glutamine-limited selection including tandem duplications, segmental amplification, translocations, and whole chromosome amplification (Lauer et al., 2018), and that lineage richness decreases rapidly over the course of evolution due in part to competition and clonal interference (Lauer et al., 2018; Levy et al., 2015; Nguyen Ba et al., 2019). Our model does not include competition, clonal interference, or recurrent CNV formation. Therefore, diversity calculations are likely overestimations. Nonetheless, a comparison of diversity between strains is informative of whether proximate genome elements affect CNV allele diversity. Therefore, for each strain, we used its collective MAP to simulate a posterior prediction for the genotype frequencies (Figure 3C), which we then used to predict the posterior Shannon diversity (Jost, 2006). We predict the set of CNV alleles to be highly diverse: the final predicted Shannon diversity ranges from 1.6⋅10⁴ in ARSΔ to 3.2⋅10⁵ in WT (Figure 3B). Our model predicts that the diversity increases rapidly during the selection phase and stabilizes in the equilibrium phase. This is because CNV alleles that form towards the end of the experiment would have a low frequency with a minor effect on diversity. We observe the greatest diversity in WT populations, with lower diversity in the three genomic architecture mutants. Moreover, diversity saturates faster in WT populations. This suggests that the WT strain is able to form more unique CNV allele types earlier compared to the other three strains (Figure 3B). Shannon diversity is lower in LTRΔ and lower still in ALLΔ and ARSΔ (Figure 3B). This rank order is the same as that of the CNV formation rates (Figure 3A).
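The diversity measure used above, the exponential of the Shannon entropy (Jost, 2006), can be read as an effective number of equally frequent lineages. The sketch below computes it from a list of lineage frequencies; the function name `effective_lineage_number` and the choice to renormalize the input are assumptions made for illustration.

```python
import math

def effective_lineage_number(frequencies):
    """Exponential of the Shannon entropy: exp(-sum_i p_i * log(p_i)).

    `frequencies` holds non-negative lineage abundances; they are renormalized
    to sum to one, and zero-frequency lineages contribute nothing.
    """
    total = sum(frequencies)
    probs = [f / total for f in frequencies if f > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

For example, four equally frequent lineages give a value of 4, whereas a population dominated by one lineage gives a value close to 1, which is why late-arising, low-frequency CNV alleles change the measure very little.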

We used a modified version of the evolutionary model with the inferred parameters to simulate an evolutionary competition between WT and each of the three architecture mutant strains over 116 generations, a point at which CNVs have reached high frequency in the original experiment. To “win” these competitions, the competitor strains need to adapt to glutamine-limitation by producing CNVs (Supplementary Figure S3J and supplementary information). The results of the simulated evolutionary competitions predict that the WT would dominate over all three strains, as its predicted final proportion almost always exceeds its initial frequency of 0.5. The average predicted proportion of WT cells when competing with LTRΔ is 0.717. In contrast, ARSΔ and ALLΔ are predicted to be almost eliminated by generation 116, as the average predicted WT proportion is 0.998 and 0.999, respectively. These simulated competitions further suggest that the ARS is a more important contributor than the LTRs to adaptive evolution mediated by GAP1 CNVs.

Inference of CNV mechanisms in genome architecture mutants

Contrary to our expectations, removal of proximate genomic elements from the GAP1 locus does not inhibit the formation of GAP1 CNVs. We sought to determine the molecular basis by which GAP1 CNVs form in the absence of these local elements. Therefore, we isolated ∼40 GAP1 CNV-containing clones from each population containing the four different strains at generations 79 and 125 and performed Illumina whole-genome sequencing. Using a combination of read depth, split read, and discordant read analysis, we defined the extent of the amplified region, the precise CNV breakpoints, and GAP1 copy number. On the basis of these features, we inferred the CNV-forming mechanisms for each GAP1 CNV (Methods). Over the 177 analyzed GAP1 CNVs, we observed tandem amplifications, tandem triplications with an inverted middle copy, intra-chromosomal translocations, aneuploidy, and complex CNVs. GAP1 copy numbers range from two to six in any given clone. Each of the four strains is able to produce a diversity of CNV alleles ranging from small (tens of kilobases) to large (∼hundreds of kilobases) segmental amplifications (Figure 4). We identified six major CNV-forming mechanisms across the four strains: ODIRA, LTR NAHR, NAHR, transposon-mediated, complex CNVs, and whole chromosome duplication (aneuploidy) (Figure 4A and Methods).

Local and distal elements contribute to generation of *GAP1* CNV alleles.
**(A)** Schematic of *Saccharomyces cerevisiae GAP1* locus on Chromosome XI: 513332-518060 with LTR, ARS elements and tRNA genes labeled. ODIRA is a DNA replication-error based CNV mechanism. Here, we classify a clone as ODIRA if it has an inverted sequence in at least one breakpoint. ODIRA typically forms tandem triplications with an inverted middle copy and contains an ARS. Long terminal repeat non-allelic homologous recombination (LTR NAHR) is a mechanism we define by having both CNV breakpoints at LTR sites. Sometimes we detect a hybrid sequence between two LTR sequences, a result of recombination between the two LTRs. Non-allelic homologous recombination (NAHR) is defined by having at least one CNV breakpoint not at LTR sites, i.e., at other homologous sequences in the genome. Sometimes we detect a hybrid sequence between the two homologous sequences. Transposon-mediated mechanisms involve at least one intermediate novel LTR retrotransposon insertion. The newly deposited LTR sequence recombines with other LTR sequences, either pre-existing or introduced by a second *de novo* retrotransposition, to form the resulting CNV. Complex CNV is defined by a clone having more than two breakpoints in chromosome XI, indicative of more than one amplification event. **(B)** Violin plot of CNV length in each genome-sequenced clone, n = 177. Strain has a significant effect on CNV length, Kruskal-Wallis test, p = 3.0 x 10^-4. **(C)** Barplot of inferred CNV mechanisms, described in A, for each CNV clone isolated from evolving populations. Inference came from a combination of read depth, split read, and discordant read analysis (see Methods). Strain is significantly associated with CNV mechanism, Fisher’s Exact Test, p = 5.0 x 10^-4. There is a significant increase in ODIRA prevalence between WT and LTRΔ, chi-sq, p = 0.02469. There is a significant decrease in ODIRA prevalence from WT to ARSΔ and ALLΔ, chi-sq, p = 0.002861 and 0.002196, respectively.
There is a significant decrease of LTR NAHR from WT to LTRΔ, chi-sq, p = 0.03083.
**(D) Top**: Schematic of *S. cerevisiae* chromosome XI, with LTR, ARS elements, tRNA genes annotated. LTR-blue, ARS-purple, tRNA-orange, *GAP1* ORF-white rectangle. Using a combination of read depth, split read, and discordant read analysis, we defined the extent of the amplified region, the precise CNV breakpoints, and *GAP1* copy number. *GAP1* copy numbers were estimated using read depth relative to the average read depth of chromosome XI. We define the upstream and downstream breakpoints as kilobases away from the start codon of the *GAP1* ORF (vertical dotted line). **Bottom:** Dumbbell plots represent the amplified region (>1 copy) relative to the WT architecture reference genome. The ends of the dumbbells mark the approximate CNV breakpoints. Select clones were chosen as representative of the observed diversity of amplifications.
**(E)** Scatterplots of CNV length for all genome-sequenced clones, n = 177. We defined the upstream and downstream breakpoints as kilobases away from the start codon of the *GAP1* ORF (vertical dotted line in dumbbell plot). CNV mechanisms are defined in Figure 4A.

ODIRA is a predominant mechanism of CNV formation

We inferred that GAP1 CNVs formed through ODIRA in all four genotypes at high frequencies: 22 out of 37 WT clones (59%), 42 out of 52 LTRΔ clones (81%), 11 out of 42 ARSΔ clones (26%), and 12 out of 46 ALLΔ clones (26%). Considering the set of all CNVs in all strains, ODIRA is the most common CNV mechanism, comprising almost half of all CNVs (87/177, 49%). The second most common mechanism is NAHR between flanking LTRs (38/177, 21%), which generates tandem amplifications. In the WT background, ODIRA (22/37) and NAHR between LTRs (11/37) account for 89% of GAP1 CNVs.

In LTRΔ, GAP1 CNVs form via ODIRA, chromosome missegregation, and NAHR using other sites. As expected, we did not detect NAHR between LTRs in any of the 52 LTRΔ clones and, as a result, focal amplifications were not detected (Figure 4C). In LTRΔ, CNVs are formed predominantly by ODIRA (42/52, 81%) (Supplementary Table 3), a significant increase relative to WT clones (chi-sq, p = 0.02469) (Figure 4C). By contrast, aneuploidy (5/52), complex CNV (3/52), and NAHR (2/52) each account for less than 10% of GAP1 CNVs in LTRΔ. Consequently, we observe an increase in GAP1 CNV size in LTRΔ relative to WT (Figure 4B) as there is an increased prevalence of segmental amplifications and aneuploidy (Figure 4E).

Aneuploidy was rarely observed. Whole amplification of chromosome XI was detected in six out of 177 clones (3.4%) (Figure 4C). It was detected only in 2 strains: WT and LTRΔ (Figure 4D).

ODIRA generates CNVs using distal ARS

Whereas removal of proximate LTRs abrogates the formation of CNVs through NAHR, removal of the local ARS does not prevent the formation of GAP1 CNVs through ODIRA (Figure 4C). In the absence of the proximate ARS, distal ones are used to form ODIRA CNVs, as all amplified regions of ODIRA clones contain a distal ARS (Figure 4D), with one exception (see Methods, Supplementary Figure S4B). We observe an increase of LTR NAHR in the ARSΔ clones (27/42, 64%) relative to WT clones (11/37, 30%) (Figure 4C, chi-sq, p = 0.03083). In ARSΔ, we find two CNV size groups (Figure 4B). Small amplifications are formed via NAHR of the flanking LTRs (Figure 4E) and large segmental amplifications are formed via ODIRA and by NAHR between one local and one distal LTR (Figure 4E).

Novel retrotransposition events potentiate GAP1 CNVs

CNVs in the ALLΔ clones form by two major mechanisms: 1) ODIRA using distal ARS sites to form large amplifications and 2) LTR NAHR following novel Ty LTR retrotransposon insertions to form focal amplifications (transposon-mediated, Figure 4). ALLΔ clones have larger ODIRA-generated amplifications than WT and LTRΔ clones (Figure 4E) because they encompass distal ARS and inverted repeats (Figure 4D). Surprisingly, we detected novel LTR retrotransposon events that generated new LTRs that subsequently formed GAP1 CNVs through NAHR with a pre-existing LTR in the genome or an LTR from a second novel retrotransposition (Figure 4E). Regions upstream of tRNA genes are known to be hotspots for Ty retrotransposons (Ji et al., 1993; Mularoni et al., 2012). We find the novel retrotransposons insert near one or both of the previously deleted LTR sites (Supplementary File 1), which flank GAP1 and are downstream of tRNA genes (Figure 1A). We only detected novel retrotranspositions in ALLΔ populations. In total we detected 15 unique Ty retrotransposon insertion sites of which eight were upstream of the deleted LTR, YKRCδ11, and four were downstream of deleted LTR, YKRCδ12 (Supplementary File 1). The remaining two insertions were distal to the GAP1 gene: one on the short arm and the second on the long arm of chromosome XI. Every insertion was upstream of a tRNA gene, consistent with the biased preference of Ty LTR insertions (Ji et al., 1993; Mularoni et al., 2012). Recombination between a new and preexisting LTR produces large amplifications whereas recombination between two newly inserted Ty1 elements flanking the GAP1 gene forms focal amplifications of the GAP1 gene (Figure 4E).

Discussion

In this study we sought to understand the molecular basis of repeated de novo amplifications and selection of the general amino acid permease gene, GAP1, in S. cerevisiae continuously cultured in glutamine-limited selection. We hypothesized that a high formation rate of GAP1 CNVs is due to the unique genomic architecture at the locus, which comprises two flanking long terminal repeats and a DNA replication origin. We used genetic engineering, experimental evolution, and neural network simulation-based inference to quantify de novo CNV dynamics and estimate the CNV formation rate and selection coefficient in engineered mutants lacking the proximate genome elements. We find that removal of these elements has a significant impact on de novo CNV dynamics, CNV formation rate, and selection coefficients. However, CNVs are formed and selected in the absence of these elements highlighting the plasticity of the genome and diversity of mechanisms that generate CNVs during adaptive evolution.

Despite their proximity to GAP1 we found that flanking LTRs are not an essential driver of CNV formation. The de novo CNV dynamics of WT and LTRΔ populations are similar and we find that although the CNV formation rate is reduced, the effect is small. In contrast, a significantly decreased CNV formation rate and delayed CNV appearance time was observed in the absence of the ARS in ARSΔ and ALLΔ populations, which suggests that the local ARS is a major determinant of GAP1 CNV-mediated adaptive dynamics. Indeed, ODIRA was identified as the predominant CNV mechanism in sequence-characterized clones revealing that DNA replication errors are a major source of CNV formation during adaptive evolution.

The prevalence of ODIRA is a result of the many replication origins and the pervasiveness of inverted repeat sequences throughout the chromosome (Figure 4). In particular, breakpoint analysis of LTRΔ CNV clones shows that ODIRA produces a continuum of CNV sizes along the short arm of chromosome XI. Downstream breakpoints range from near the GAP1 gene (∼3 kilobases) all the way to the right telomere of chromosome XI (153 kilobases) (Figure 4E). The S. cerevisiae genome contains a high number, 1-1000 per kilobase, of inverted repeats ranging from 3bp to 14bp (Martin et al., 2024). The longer repeats are more likely to be used in ODIRA (Martin et al., 2024). The ubiquity of inverted repeats is in stark contrast to the relative paucity of LTR sequences, which are dispersed throughout the genome. Thus, ODIRA supplies a diverse and high number of gene amplifications for selection to act on, setting the stage for genome evolution and adaptation.

Consistent with previous reports of increased Ty insertions in S. cerevisiae under stress conditions (Morillon et al., 2000, 2002), we observed novel retrotransposon insertions in populations evolved in glutamine-limited chemostats. Transposon insertions can be harmful and lead to loss-of-function mutations but are also substrates for generating beneficial alleles including CNVs (Blanc & Adams, 2003; Dunham et al., 2002; Gresham et al., 2008; Wilke & Adams, 1992). We only detected novel Ty insertions in the ALLΔ strain. This is likely because regions upstream of tRNA genes unoccupied by LTRs are predisposed to transposition. Our detection of novel retrotransposon insertions is consistent with a previous experimental evolution study that suggested that Ty insertions were rare under constant nitrogen-limitation and substantially more common under fluctuating nitrogen limitation, in which cells experience total nitrogen starvation periodically (Hays et al., 2023). We observe a substantially lower frequency of novel Ty retrotransposition events, 15 in 177 clones, compared to 898 in 345 clones, with multiple insertions per genome, isolated from serial transfer ammonium-limited experimental evolution (Hays et al., 2023). Importantly, the role of Ty differs in the two studies: in our case beneficial CNVs formed after novel retrotransposition through recombination of newly introduced LTR sequences, whereas Hays et al. found Ty-associated null alleles that are beneficial in nitrogen-limited conditions. Our results show that Ty insertions occur even under constant nitrogen-limiting conditions and demonstrate how new retrotransposition events can contribute to CNV formation.

Aneuploidy was not a major source of adaptation in our experiments as it was infrequently detected (n = 6/177). This contrasts with studies suggesting aneuploidy is a rapid and transient route to adaptation over short evolutionary time scales (Chen, Bradford, et al., 2012; Chen et al., 2015; Chen, Rubinstein, et al., 2012; Pavelka et al., 2010; Selmecki et al., 2015; Yona et al., 2012).

However, aneuploidy incurs a fitness cost (Robinson et al., 2023; Tsai et al., 2019; Yang et al., 2021) and therefore can be outcompeted by slow-forming but less costly beneficial mutations in large populations (Kohanovski et al., 2024). Our observations of higher frequencies of focal and segmental amplifications may be because they are less costly than whole-chromosome amplifications.

A variety of DNA replication errors generate CNVs. Replication slippage at palindromic DNA and DNA repeats can cause fork stalling and downstream CNV formation (Lee et al., 2007; Zhang, Khajavi, et al., 2009). DNA repeats can form secondary structures like R loops, cruciforms, non-B DNA structures, and hairpins which stimulate CNV formation (Gu et al., 2008b). Untimely replication, faulty fork progression, S-phase checkpoint dysfunction, defective nucleosome assembly, and DNA repeat sites including LTRs are sources of replication-associated genome instability (Aguilera & García-Muse, 2013). However, DNA replication alone might not be the only contributor. The GAP1 gene is highly transcribed under glutamine-limitation (Airoldi et al., 2016). Transcription-replication collisions may fuel ODIRA CNV formation at this locus (Lauer & Gresham, 2019; Wilson et al., 2015). CNV formation can also be stimulated by transcription-associated replication stress and histone acetylation (Hull et al., 2017; Salim et al., 2021; Whale et al., 2022) and replication fork stalling at tRNA genes (Osmundson et al., 2017; Yeung & Smith, 2020). Testing the role of transcription in promoting the formation of adaptive CNVs warrants further investigation.

Recent work has proposed that ODIRA CNVs are a major mechanism of CNVs in human genomes (Brewer et al., 2015; Martin et al., 2024). Studies of human and yeast genomes have typically considered homologous recombination as the predominant mechanism of CNV formation (Lupski & Stankiewicz, 2005). CNV hotspots identified in the human (Chance et al., 1994; Lupski, 1998; Lupski & Stankiewicz, 2005; Pentao et al., 1992) and yeast genomes are indeed mediated by NAHR of long repeat sequences (Green et al., 2010; Gresham et al., 2010). However, a focus on recombination-based mechanisms as a means of generating copy number variation may be the result of ascertainment bias or the comparative ease of studying the effect of long repeat sequences over short palindromic ones. Our study demonstrates that experimental evolution in yeast is a useful approach to elucidating the molecular mechanisms by which DNA replication errors generate CNVs.

Methods

Strains and Media

Each of the three architecture mutants was constructed independently starting with the GAP1 CNV reporter strain (DGY1657). To construct each deletion strain, we performed two rounds of transformations, both using PCR-amplified donor templates designed for homology-directed repair. The first transformation used a repair template containing a nourseothricin resistance cassette to replace the pre-existing kanamycin resistance cassette and GAP1 gene. The repair template was designed to also delete the elements of interest (i.e., ARS, both flanking LTRs, or both LTRs and ARS). The second transformation replaced the nourseothricin cassette with a kanamycin resistance cassette and GAP1 gene, thus yielding a genomic architecture Δ strain. We confirmed scarless deletions with Sanger sequencing and whole-genome sequencing. Final identifiers are DGY1657 for the WT architecture strain, DGY2076 for the LTRΔ strain, DGY2150 for the ARSΔ strain, and DGY2071 for the ALLΔ strain.

DGY1, DGY500, and DGY1315 are zero-, one-, and two-copy GFP controls, respectively, described in Lauer et al. (2018) and Spealman et al. (2023). Briefly, GFP under the ACT1 promoter was inserted at neutral loci that do not undergo amplification in glutamine-limited continuous culture. The 400μM glutamine-limited media is described in Lauer et al. (2018).

Long-Term Experimental Evolution

We performed experimental evolution of 30 S. cerevisiae populations in miniature chemostats (ministats) for ∼137 generations under nitrogen limitation with 400μM glutamine as in Lauer et al. (2018). Of the 30 populations, there were three controls: one control population with no fluorescent reporter (DGY1), one with one GFP fluorescent reporter (DGY500), one with two GFP fluorescent reporters (DGY1315). The remaining 27 populations have the GAP1 CNV GFP reporter. Of these, five populations are WT (DGY1657), seven are LTRΔ (DGY2076), seven are ARSΔ (DGY2150), and eight are ALLΔ (DGY2071). We inoculated each ministat containing 20ml of glutamine-limited media with 0.5 ml culture from its corresponding genotype founder population. The founder population was founded by a single colony grown overnight in glutamine-limited media at 30°C. Replicate populations of the same strain were inoculated from the same founder population derived from a single colony. Strains were randomized among the 30-plex ministat setup to account for the possibility of systematic position effects. After inoculation, populations were incubated in a growth chamber at 30°C for 24 hours with the media inflow pump off. After 24 hours, the populations had reached early stationary phase and we turned on the media inflow pump and waited 4 hours for the populations to reach steady-state equilibrium, at which the population size was ∼10⁸ cells. This was generation zero. Ministats were incubated in a growth chamber at 30°C with a dilution rate of 0.12 culture volumes/hr. Since the ministats had a 20ml culture volume, the population doubling time was 5.8 hours. Approximately every 10 generations, we froze 2-ml samples of each population in 15% glycerol stored at -80°C. Approximately every 30 generations, we pelleted cells from 1-ml samples of each population and froze them at -80°C for genomic DNA extraction.
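The doubling time quoted above follows from the standard chemostat relation that, at steady state, the population growth rate equals the dilution rate. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not taken from the authors' code.

```python
import math


def doubling_time_hours(dilution_rate_per_hour):
    """Steady-state doubling time in a chemostat: ln(2) / D."""
    return math.log(2) / dilution_rate_per_hour


def elapsed_generations(hours, dilution_rate_per_hour):
    """Number of population doublings completed after `hours` at steady state."""
    return hours / doubling_time_hours(dilution_rate_per_hour)


# Dilution rate reported in the text: 0.12 culture volumes per hour.
print(round(doubling_time_hours(0.12), 1))  # 5.8
```

Under this relation, the doubling time depends only on the dilution rate, so sampling "every 10 generations" corresponds to roughly 58 hours between samples.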

Flow Cytometry analysis to study GAP1 CNV dynamics

To track GAP1 CNV dynamics, we sampled 1 ml from each population approximately every 10 generations. We sonicated cell populations for 1 minute to remove any cell clumping and immediately analyzed samples on the Cytek Aurora flow cytometer. We sampled 100,000 cells per population and recorded forward scatter, side scatter, and GFP fluorescent signals for every cell. We performed hierarchical gating to define cells, single cells, unstained (zero-copy-GFP control) cells, cells with one copy of GFP (GAP1), and two or more copies of GFP (GAP1) (Spealman et al., 2023). First, we gated for cells (filtering out any debris and bacteria) by graphing forward scatter area (FSC-A) against side scatter area (SSC-A). Second, we gated for single cells by graphing forward scatter area against forward scatter height and drawing along the resulting diagonal. Finally, we drew non-overlapping gates to define three subpopulations: zero copy, one copy, and two or more copies of GFP by graphing B2 channel area (B2-A), which detects GFP (excitation = 516 λ, emission = 529 λ), against forward scatter area (FSC-A). We note that the one-copy and two-copy events overlap somewhat, which is a limitation of this experiment (Spealman et al., 2023).

We found that two architecture mutants, DGY2150 and DGY2071, had strain-specific GFP fluorescence even though they only harbored one copy of GFP. DGY2150 and DGY2071 had slightly higher fluorescence than the one-copy GFP control strain, DGY500, but less than that of the two-copy GFP control strain, DGY1315. The third architecture mutant, DGY2076, had the same GFP fluorescence as the one-copy GFP control strain (DGY500). We ruled out that they were spontaneous diploids by looking at forward scatter signals. The forward scatter signal was not different from that of the one-copy control (a haploid) and was not as high as that of a diploid. Therefore, due to strain-specific fluorescence, we decided to perform strain-based gating, i.e., one set of gates for the WT strain, a second set of gates for the LTRΔ strain, and so on. Since the controls are also a strain of their own, they were not used to set universal gates for one-copy or two-copy. Thus, for each strain, we chose as the basis of our one-copy gate the timepoint, aggregated across that strain's populations, with the lowest median cell-size-normalized fluorescence. The two-or-more-copy (CNV) gate was drawn directly above and non-overlapping with the one-copy gate.

Quantification of dynamics

To obtain the proportion of CNVs for each population at each timepoint, we applied gates that correspond to zero-, one-, and two-or-more-copy subpopulations. Using these proportions per population per timepoint, we summarize population CNV dynamics as follows (Lauer et al., 2018; Spealman et al., 2023). We calculate the generation of CNV appearance for each of the evolved populations. We defined CNV appearance as the generation where the proportion of CNV-containing cells first surpasses a threshold of 10% for three consecutive generations. Next, modified from Lang et al. (2011) and Lauer et al. (2018), we calculate the percent increase in CNVs per generation for each evolved population. We compute the natural log of the proportion of the population with CNVs divided by the proportion of the population without CNVs for each timepoint. These proportions were obtained previously by gating. We plot these values across time and perform linear regression during the initial increase of CNVs. The slope of the linear regression is the percent increase in CNVs per generation. Finally, we calculate the time to CNV equilibrium, defined as the generation at which a linear regression results in a slope < 0.005 after the selection phase.
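The two summary statistics described above can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, "three consecutive generations" is interpreted here as three consecutive sampled timepoints (an assumption), and the caller is assumed to pass only the timepoints covering the initial increase of CNVs to the regression.

```python
import numpy as np


def cnv_appearance(generations, cnv_prop, threshold=0.10, n_consecutive=3):
    """First sampled generation that starts a run of `n_consecutive`
    timepoints with CNV proportion above `threshold`; None if no such run."""
    above = np.asarray(cnv_prop) > threshold
    for i in range(len(above) - n_consecutive + 1):
        if above[i:i + n_consecutive].all():
            return generations[i]
    return None


def log_ratio_slope(generations, cnv_prop):
    """Least-squares slope of ln(p / (1 - p)) against generation, where p is
    the proportion of cells with a CNV."""
    g = np.asarray(generations, dtype=float)
    p = np.asarray(cnv_prop, dtype=float)
    log_ratio = np.log(p / (1.0 - p))
    slope, _intercept = np.polyfit(g, log_ratio, 1)
    return slope
```

Proportions of exactly 0 or 1 make the log-ratio undefined, so such timepoints would need to be excluded before fitting.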

Neural network simulation-based inference of evolutionary parameters

Evolutionary model

We developed a Wright-Fisher model that describes the evolutionary dynamics, similar to that of our previous study (Avecilla et al., 2022), in which we showed that a Wright-Fisher model is suitable for describing evolutionary dynamics in a chemostat. This is a discrete-time, non-overlapping generations model with a constant population size. Every generation has three stages: selection, in which the frequency of genotypes with beneficial alleles increases; mutation, in which genotypes can gain a single beneficial mutation or CNV; and drift, in which the population of the next generation is generated by sampling from a multinomial distribution. Our model follows the change in frequency of four genotypes (Figure 2B): A, the ancestor genotype; B, a genotype with a non-CNV beneficial mutation; C⁺, a genotype with two copies of GAP1 and two copies of the CNV reporter; and C⁻, a genotype with two copies of GAP1 but only a single copy of the CNV reporter. CNV and non-CNV alleles are formed at a rate of δ_C and δ_B and have a selection coefficient of s_C and s_B, respectively. The frequency of genotype i is X_i. Unlike X_B and X_C⁺, which may increase due to both mutation and selection, we assume that C⁻ is not generated after generation 0 (as experimental results suggest that the reporter is working properly). Hence, the C⁻ genotype only increases in frequency due to selection, with s_C as its selection coefficient. We assume C⁻ has an initial frequency φ. Model equations and further details are in the supplementary information.
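The three stages of each generation can be illustrated with the sketch below. It is a hedged approximation rather than the model itself: the exact equations are given only in the supplementary information, so the update rules, the restriction of mutation to the ancestor genotype, the population size, and the default values of s_B and δ_B are all placeholder assumptions made for illustration.

```python
import numpy as np


def simulate_wright_fisher(s_c, delta_c, phi, s_b=0.001, delta_b=1e-5,
                           pop_size=1_000_000, n_generations=116, seed=0):
    """Return an array of shape (n_generations + 1, 4) holding the frequencies
    of genotypes A, B, C+ and C- in every generation.

    s_b, delta_b and pop_size are placeholder values, not those of Table 1.
    """
    rng = np.random.default_rng(seed)
    # Relative fitness of A, B, C+, C-; C- shares the CNV selection coefficient.
    fitness = np.array([1.0, 1.0 + s_b, 1.0 + s_c, 1.0 + s_c])
    freqs = np.array([1.0 - phi, 0.0, 0.0, phi])
    history = [freqs]
    for _ in range(n_generations):
        # Selection: reweight genotype frequencies by relative fitness.
        selected = freqs * fitness
        selected /= selected.sum()
        # Mutation: only the ancestor A mutates (to B or C+); C- is never
        # generated after generation 0.
        to_b = selected[0] * delta_b
        to_c = selected[0] * delta_c
        mutated = selected + np.array([-(to_b + to_c), to_b, to_c, 0.0])
        # Drift: multinomial sampling of the next generation.
        counts = rng.multinomial(pop_size, mutated)
        freqs = counts / pop_size
        history.append(freqs)
    return np.array(history)
```

The quantity comparable to the reporter-based measurements would be the C⁺ column, since C⁻ cells carry a GAP1 CNV that the reporter does not detect.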

Simulation-based inference

We use a neural network simulation-based inference method, Neural Posterior Estimation or NPE (Papamakarios, 2019), to estimate the joint posterior distribution of three model parameters, s_C, δ_C and φ, while the other parameters, s_B and δ_B, are fixed to specific values (Table 1). Inferring all five model parameters resulted in similar prediction accuracy and similar s_C and δ_C estimates.

Model parameters and priors.
Fixed parameters from (Avecilla et al., 2022; Hall et al., 2008; Joseph & Hall, 2004; Venkataram et al., 2016).

We applied NPE implemented in the Python package sbi (Tejero-Cantero et al., 2020) using a masked autoregressive flow (Papamakarios et al., 2018) as the neural density estimator: an artificial neural network that “learns” an amortized posterior of model parameters from a set of synthetic simulations. Posterior amortization allows us to estimate the posterior distribution P(θ|Χ) for a new observation Χ without the need to re-run the entire inference pipeline, i.e., generating new simulations and re-training the network (as is the case in sampling-based methods such as Markov chain Monte Carlo or MCMC).

We generated 100,000 synthetic observations simulated from our evolutionary model using parameters drawn from the prior distribution (Table 1). The neural density estimator was trained using early stopping with a convergence threshold of 100 epochs without decreases in minimal validation loss (the default in sbi is 20). Using 100 epochs as a threshold resulted in improved predictions. We validated that this improvement in prediction accuracy is not a result of over-fitting (Supplementary Figure S3F).

We validated the trained neural density estimator by measuring the coverage property: the probability that the true parameters fall within the estimated posterior marginal 95% HDI. Then, we used the distribution of the log-ratio of MAP estimates to true parameter values (Supplementary Figure S3D) and posterior predictive checks (Supplementary Figure S3B) as quantitative and qualitative measures of prediction accuracy, respectively.
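The coverage property can be estimated from posterior samples as sketched below. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the HDI is approximated as the narrowest interval containing the requested fraction of sorted samples (appropriate for unimodal marginals), and the posterior samples are assumed to have been drawn already for each synthetic simulation.

```python
import numpy as np


def hdi(samples, mass=0.95):
    """Narrowest interval containing a fraction `mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(mass * n - 1e-9))  # number of samples inside the interval
    widths = x[k - 1:] - x[:n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]


def coverage(true_values, posterior_samples, mass=0.95):
    """Fraction of simulations whose true parameter lies inside its HDI.

    posterior_samples has one row of marginal posterior samples per simulation.
    """
    hits = 0
    for truth, samples in zip(true_values, posterior_samples):
        lo, hi = hdi(samples, mass)
        hits += int(lo <= truth <= hi)
    return hits / len(true_values)
```

A coverage close to the nominal 0.95 indicates a well-calibrated estimator; values below it indicate over-confidence and values above it indicate under-confidence.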

Collective posterior distribution

NPE estimates a posterior distribution per observation, i.e., P(θ|X). Given n observations X₁, …, X_n generated from the same model distribution P(X|θ), where each observation is a time-series of GAP1 CNV frequency, NPE infers n individual posterior distributions, each conditioned on a single observation, P(θ|X_i). We estimate the collective posterior distribution based on n individual posteriors, that is, a posterior distribution conditioned on all observations,

$$P(\theta \mid X_1, \ldots, X_n) = \frac{P(\theta)^{1-n} \prod_{i=1}^{n} P(\theta \mid X_i)}{\int P(\theta')^{1-n} \prod_{i=1}^{n} P(\theta' \mid X_i)\, d\theta'} \qquad (1)$$

This can be computed using the individual posteriors P(θ|X_i) and the prior P(θ) (see supplementary information for derivation).

However, as P(θ|X_i) could be infinitesimally small, a single observation could potentially reject a parameter value that is likely according to other observations. We want the collective posterior to be robust to such non-representative observations. Therefore, we define P_ε(θ|X_i) = max(ε, P(θ|X_i)) and use this quantity instead of P(θ|X_i) in eq. 1. For a correct choice of ε, the collective posterior mode should reflect a value with high posterior density for multiple observations, rather than a value that no individual posterior completely rejects. Thus, we implemented an amortized collective posterior: given a set of observations, X₁, …, X_n, an amortized posterior for individual observations, P(θ|X), a prior, P(θ), and ε, a collective posterior distribution P(θ|X₁, …, X_n) can be computed. This distribution can then be used to evaluate the posterior distribution of specific parameters given the observations, generate random samples from the posterior distribution (e.g., using rejection sampling), or find the collective MAP.

We set ε = e⁻¹⁵⁰ based on a visual grid-search. To find the normalizing factor (denominator in eq. 1), the integral is approximated by a dense Riemann sum (300³ points). Maximizing the distribution, i.e., finding the collective MAP, is implemented using scipy’s minimize method with the Nelder-Mead algorithm.
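The floored combination of individual posteriors and its Riemann-sum normalization can be sketched on a parameter grid as follows. This is a simplified illustration, not the authors' implementation: the function name and arguments are hypothetical, the individual log-posteriors are assumed to have been evaluated on a common grid beforehand, and the computation is done in log space so that the floor of e⁻¹⁵⁰ becomes a lower bound of -150 on each individual log-density.

```python
import numpy as np


def collective_log_posterior(log_posteriors, log_prior, cell_volume,
                             log_floor=-150.0):
    """Normalized collective log-posterior on a grid.

    log_posteriors: array (n_obs, n_grid) of log P(theta | X_i) at grid points.
    log_prior:      array (n_grid,) of log P(theta) at the same grid points.
    cell_volume:    volume of one grid cell, used in the Riemann sum.
    """
    log_posteriors = np.asarray(log_posteriors, dtype=float)
    n_obs = log_posteriors.shape[0]
    # Floor each individual posterior so that one observation cannot veto a
    # parameter value supported by the others.
    floored = np.maximum(log_posteriors, log_floor)
    unnormalized = (1 - n_obs) * np.asarray(log_prior) + floored.sum(axis=0)
    # Log of the Riemann-sum approximation of the normalizing integral,
    # computed stably by factoring out the maximum.
    peak = unnormalized.max()
    log_norm = peak + np.log(np.sum(np.exp(unnormalized - peak)) * cell_volume)
    return unnormalized - log_norm
```

The grid point maximizing the returned array gives a coarse estimate of the collective MAP, which a local optimizer could then refine.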

Genetic diversity

Using our evolutionary model with the inferred parameters, we can estimate the diversity of CNV alleles in the experiments. For each strain, we used samples from its collective posterior to simulate a posterior prediction for the CNV allele frequencies (Figure 3C), which we then used to compute the posterior Shannon diversity (Jost, 2006), as detailed in the supplementary information.
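For reference, Shannon diversity can be computed from a vector of allele frequencies as sketched below. This is an assumption-laden illustration: the exact definition used for the posterior diversity is given in the supplementary information, and whether the entropy itself or its exponential (the effective number of alleles, following Jost, 2006) is reported is not stated here, so both are shown. Function names are hypothetical.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of allele frequencies or counts."""
    total = sum(freqs)
    probs = [f / total for f in freqs if f > 0]
    return -sum(p * math.log(p) for p in probs)


def effective_number_of_alleles(freqs):
    """Exponential of the Shannon entropy: the number of equally frequent
    alleles that would give the same entropy."""
    return math.exp(shannon_entropy(freqs))
```

Four equally frequent CNV alleles give an effective number of four, whereas a population dominated by a single allele gives a value close to one.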

Whole genome sequencing of isolated clones

Clones were isolated from archived populations and verified to harbor a GAP1 CNV by measuring GFP fluorescence signal consistent with two or more copies. Populations of each strain from generation 79 were streaked out from the -80°C archive on YPD and incubated at 30°C for 2 days. Plates containing single colonies were viewed under a blue light to identify GFP fluorescent colonies by eye. Relative to the fluorescence of the two-copy control strain, we picked single colonies that fluoresced as bright or brighter, reasoning that these colonies would likely contain GAP1 CNVs. Single colonies were used to inoculate cultures in glutamine-limited media and incubated at 30°C for 18 hours. The cultures were analyzed on the Cytek Aurora to verify they indeed harbored two or more copies of GAP1 based on GFP fluorescence signal. For Illumina whole genome sequencing, genomic DNA was isolated using the Hoffman-Winston method. Libraries were prepared using a Nextera kit and Illumina adapters. Libraries were sequenced on the Illumina NextSeq 500 platform PE150 (2 x 150 300 Cycle v2.5) or Illumina NovaSeq 6000 SP PE150 (2 x 150 300 Cycle v1.5). We also used custom Nextera Index Primers reported in S1 Table of Baym et al. (2015).

Breakpoint analysis and CNV mechanism inference in sequenced clones

Reference genomes

We created a custom reference genome for each of the genomic architecture mutants. The custom reference genome containing the GAP1 CNV reporter in Lauer et al. 2018 (NCBI assembly R64) was modified to delete the flanking LTRs, single ARS, or all three elements.

Copy number estimation by read depth

The estimation of GAP1 copy number from read depth is described in Lauer et al. (2018), except that we searched for ≥ 1000 base pairs of contiguous sequence. CNV boundaries were refined by visual inspection.
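The core of a read-depth copy number estimate, the mean depth over the target relative to the mean depth of chromosome XI, can be sketched as follows. This is a minimal illustration with hypothetical names; it omits the contiguous-sequence search and any depth normalization that the published procedure may include.

```python
def estimate_copy_number(depth, region_start, region_end):
    """Mean read depth of a region relative to the mean depth of the whole
    chromosome.

    depth: per-base read depths along the chromosome.
    region_start, region_end: 0-based, half-open coordinates of the target.
    """
    chromosome_mean = sum(depth) / len(depth)
    region = depth[region_start:region_end]
    region_mean = sum(region) / len(region)
    return region_mean / chromosome_mean
```

Because the amplified region itself contributes to the chromosome-wide mean, this ratio slightly underestimates copy number when the amplification spans a large fraction of the chromosome.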

Structural variation calling and breakpoint analysis

Whole genome sequences of clones were run through CVish, a structural variant caller (Spealman, 2019). Structural variant calling was also done on each of the ancestor genomes: WT, ARSΔ, LTRΔ, and ALLΔ. Output .bam files containing split reads and discordant reads of each evolved clone and its corresponding ancestor were visualized on JBrowse 2 to confirm locations of de novo CNV breakpoints and orientation of sequences at the novel breakpoint junctions. Novel contigs relative to the reference genome were output in addition to the supporting split reads that generated each contig. Blastn was used to verify orientation of contigs, namely inverted sequences used to define ODIRA (see Definitions of Inferred CNV Mechanisms).

Definitions of Inferred CNV mechanisms

We used the following liberal classifications for each CNV category. We called a clone ODIRA if we found inverted sequences in at least one breakpoint (Figure 4A). We define LTR NAHR as having both breakpoints at LTR sites (Figure 4A). This is evidence of recombination between the homologous LTR sequences. This mechanism typically forms tandem amplifications. In some cases, we find the hybrid sequence between two LTRs, but this is hard to retrieve with short-read sequencing.

We define NAHR as having breakpoints at homologous sequences, with at least one breakpoint not at an LTR sequence (Figure 4A). We define transposon-mediated as a clone having a breakpoint at a novel LTR retrotransposon insertion and the other breakpoint at a different LTR site (Figure 4A). This supports that the newly deposited LTR sequence recombined with other LTR sequences (either pre-existing or introduced by a second de novo retrotransposition) to form CNVs. Sometimes we are able to recover the hybrid sequence between LTR sequences. We define complex CNV as having more than two breakpoints on chromosome XI and a read depth profile that looks like more than one amplification event occurred (multi-step profile). For the complex CNV clones, we were not able to resolve the CNV mechanisms due to the limitations of short-read sequencing.
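The classification rules above can be summarized as a small decision function. This is a sketch for illustration only: the field names are hypothetical, the order in which the rules are tested is an assumption (the text does not specify precedence when a clone matches more than one definition), and in practice the classifications relied on manual inspection of breakpoints.

```python
from dataclasses import dataclass


@dataclass
class CloneEvidence:
    n_breakpoints_chr11: int             # breakpoints detected on chromosome XI
    whole_chromosome_gain: bool = False  # read depth elevated across all of XI
    inverted_junction: bool = False      # inverted sequence at >= 1 breakpoint
    breakpoints_at_ltr: int = 0          # number of breakpoints at LTR sites
    novel_ltr_insertion: bool = False    # a breakpoint lies at a de novo Ty LTR


def classify_cnv(ev):
    """Assign one of the six CNV categories (assumed precedence order)."""
    if ev.whole_chromosome_gain:
        return "aneuploidy"
    if ev.n_breakpoints_chr11 > 2:
        return "complex CNV"
    if ev.inverted_junction:
        return "ODIRA"
    if ev.novel_ltr_insertion and ev.breakpoints_at_ltr == 2:
        return "transposon-mediated"
    if ev.breakpoints_at_ltr == 2:
        return "LTR NAHR"
    # Remaining case: at least one breakpoint at a non-LTR homologous sequence.
    return "NAHR"
```

For example, `classify_cnv(CloneEvidence(n_breakpoints_chr11=2, inverted_junction=True))` returns `"ODIRA"`.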

Supplement

Summary of genome sequence analysis of clones containing a single copy of the GAP1 CNV reporter. Estimated copy number of the GAP1 gene and inserted GFP gene of sequenced clones from five 1-copy-GFP subpopulations of the WT genome architecture strain. Copy number estimation is defined as the read depth of the target gene relative to the average read depth of the chromosome XI. Populations 1, 2, 4, 5 contain clones harboring GAP1 CNVs but only 1 copy of GFP. Population 3 and 5 contain clones containing 1 copy each of GAP1 and GFP suggesting these lineages have beneficial mutations elsewhere in the genome, allowing coexistence with the GAP1 CNV major subpopulation.

**Independent *GAP1* amplifications lacking CNV reporter amplification**. Read depth plots of the *GAP1* CNV reporter locus of sequenced clones from five 1-copy-GFP subpopulations. Identification of eight distinct CNV breakpoint pairs, shown above, across the populations indicates the occurrence of at least eight independent amplifications of *GAP1* without GFP amplification. GFP reference gene - green rectangle, *GAP1* reference gene - purple rectangle.

MAP estimates of *GAP1* CNV formation rates (δ_C) and selection coefficients (*s_C*) for all replicate populations.
Markers show MAP estimates from individual replicates; crosses show the 50% HDI of the collective posteriors. Extreme points are marked for comparison to data and posterior prediction; see Supplementary Figure S3B for posterior predictive checks.

Posterior predictive checks for all replicates.
Black markers are the empirical observations, dashed line shows MAP prediction. The leftmost plot of each row shows the collective MAP prediction with empirical data’s interquartile range (gray bars).

Pairwise and marginal collective posteriors for all estimated model parameters.
Diagonals show marginal collective posteriors per parameter per strain. Below-diagonal plots show pairwise KDEs for all pairs of model parameters. Collective joint MAPs (which may differ from collective marginal MAPs, as the marginal distribution integrates over all other parameters), are marked by a red vertical line. Panels are separated by strain: **(A)** WT, **(B)** ARSΔ, **(C)** LTRΔ, **(D)** ALLΔ.

Parameter estimation accuracy on synthetic data.
Log-ratio of MAP estimate and true parameter value for 829 synthetic simulations in which the final reported *GAP1* CNV proportion is at least 0.3.

Estimation of network confidence.
Coverage, defined as the probability that the true parameter falls within the 95% highest density interval (HDI) of the posterior distribution, for 829 synthetic simulations in which the final reported *GAP1* CNV proportion is at least 0.3. The 95% HDI was calculated for each simulation using 200 posterior samples. Our neural density estimator is slightly over-confident for φ (coverage of 0.934) and under-confident for the *GAP1* CNV selection coefficient and formation rate (coverage of 0.992 for s_C and 0.995 for δ_C). Despite this under-confidence, the posterior distributions are narrow in biological terms: the 95% HDI spans less than an order of magnitude for both s_C and δ_C. Thus, we did not apply post-training adjustments to the neural density estimator, such as calibration (Cook et al., 2006) or ensembles (Caspi et al., 2023; Hermans et al., 2022).
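A coverage estimate of this kind can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the figure: the HDI is approximated as the narrowest window containing the requested fraction of sorted posterior samples (which assumes a unimodal posterior), and the function names `hdi` and `coverage` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hdi(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples; keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def coverage(true_values: Sequence[float],
             posterior_samples: Sequence[Sequence[float]],
             mass: float = 0.95) -> float:
    """Fraction of simulations whose true parameter lies inside its HDI."""
    hits = 0
    for truth, samples in zip(true_values, posterior_samples):
        low, high = hdi(samples, mass)
        if low <= truth <= high:
            hits += 1
    return hits / len(true_values)


# Toy example: two "simulations" sharing the same flat posterior on [0, 1].
flat = [i / 100 for i in range(101)]
print(coverage([0.5, 10.0], [flat, flat]))
```

For a well-calibrated estimator the coverage of a 95% HDI should be close to 0.95; values below that indicate over-confidence and values above it indicate under-confidence, which is how the figures 0.934, 0.992, and 0.995 are interpreted in the caption.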

Neural density estimator training and validation loss during training.
Convergence threshold of 100 unimproved epochs (no decrease in minimal validation loss) was reached after 569 epochs.
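The stopping rule in this caption is a patience-based early stopping criterion. The sketch below shows the general idea on a list of validation losses; the function name `stopping_epoch` is hypothetical, and the exact epoch-counting convention used by the training library is not stated in the text, so the convention here (1-based epochs, counter reset on any strict improvement) is an assumption.

```python
from typing import Iterable, Optional


def stopping_epoch(val_losses: Iterable[float], patience: int = 100) -> Optional[int]:
    """Return the 1-based epoch at which training stops, or None if the
    patience threshold is never reached."""
    best = float("inf")
    unimproved = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New minimal validation loss: reset the counter.
            best, unimproved = loss, 0
        else:
            unimproved += 1
            if unimproved >= patience:
                return epoch
    return None


# Toy example with a patience of 3 epochs.
print(stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 3.1], patience=3))
```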

Total *GAP1* CNV frequency.
Solid lines show collective MAP predictions, dashed lines show the total proportion of *GAP1* CNVs, comprising unreported pre-existing CNVs and reported CNVs generated during the experiment, as predicted by the evolutionary model.

Error estimation of parameter inference.
Average root mean square errors (RMSE) of 50 posterior samples against the observed data. **(A)** Individual posteriors and individual replicates. **(B)** Collective posterior and individual replicates. **(C)** Collective posterior and empirical mean.
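The error measure in this caption can be sketched as below. The sketch assumes that each posterior sample has already been turned into a predicted CNV-proportion trajectory by the evolutionary model, evaluated at the same time points as the observations; that simulation step is not shown, and the function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between one predicted and one observed trajectory."""
    if len(predicted) != len(observed):
        raise ValueError("trajectories must have the same number of time points")
    return math.sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))


def average_rmse(predicted_trajectories: Sequence[Sequence[float]],
                 observed: Sequence[float]) -> float:
    """Mean RMSE over trajectories simulated from several posterior samples."""
    return mean([rmse(traj, observed) for traj in predicted_trajectories])


# Toy example: two simulated trajectories compared with one observed trajectory.
observed = [0.0, 0.2, 0.6, 0.9]
simulated = [[0.0, 0.1, 0.5, 0.9], [0.0, 0.3, 0.7, 0.9]]
print(average_rmse(simulated, observed))
```

Panels (A) to (C) would then differ only in which posterior the samples are drawn from and which observed trajectory (an individual replicate or the empirical mean) they are compared against.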

Pairwise evolutionary competition predictions.
We simulated evolutionary competitions in the experimental conditions of WT vs. genomic architecture mutants, starting from equal frequencies. The proportion of WT at generation 116 was predicted using 10,000 combinations of collective posterior samples for each pairwise competition. Overall, WT outcompetes all mutants because it adapts faster (due to a higher CNV formation rate), but its advantage over ARSΔ and ALLΔ is much larger than its advantage over LTRΔ. (A) Histograms for three pairwise competitions. Note that ARSΔ and ALLΔ values overlap at this scale and are all in the rightmost bar. (B) High-resolution histograms for ARSΔ and ALLΔ.

No significant interaction between strain and generation on CNV length.
Boxplot of CNV length of clones by strain and generation of isolation. There is no significant interaction between strain and generation of clone isolation, and no significant effect of generation on CNV length (two-way ANOVA, Strain x Generation, p = 0.33).

**Types of ODIRA detected**. We found 87 ODIRA clones in total across all strains. The majority, 55/87 clones (63%), fit the canonical definition of having two inverted junctions and 3 copies (ODIRA_3). We found four non-canonical types: 17 clones (20%) with only one inverted junction detected and 3 copies (ODIRA_oneEnd_3); 11 clones (13%) with two inverted junctions but only 2 copies (ODIRA_2); 3 clones (3.4%) with only one inverted junction detected and 2 copies (ODIRA_oneEnd_2); and 1 clone (1.1%) with two inverted junctions but an amplified region that did not contain an ARS.
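The subtype labels and percentages above follow from two quantities per clone (number of inverted junctions detected and copy number) and the reported counts. The sketch below reproduces that bookkeeping. The function `odira_label` is a hypothetical helper, and `ODIRA_noARS` is a placeholder name introduced here for the single clone lacking an ARS, which the caption leaves unlabeled.

```python
from collections import Counter


def odira_label(inverted_junctions: int, copies: int) -> str:
    """Build a subtype label from the junction count and copy number."""
    base = "ODIRA" if inverted_junctions >= 2 else "ODIRA_oneEnd"
    return f"{base}_{copies}"


# Counts as reported in the caption; "ODIRA_noARS" is a placeholder label.
counts = Counter({
    "ODIRA_3": 55,
    "ODIRA_oneEnd_3": 17,
    "ODIRA_2": 11,
    "ODIRA_oneEnd_2": 3,
    "ODIRA_noARS": 1,
})
total = sum(counts.values())
percent = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(odira_label(2, 3), total)
for label, pct in percent.items():
    print(f"{label}: {pct}%")
```

The labeling helper covers only the four named subtypes; the clone without an ARS in the amplified region needs a separate check on the amplified interval, which is not modeled here.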

Inferred CNV mechanisms by strain.
Counts of inferred CNV mechanisms for each sequenced clone, n=177, separated by strain.

Supplementary File 1. Ty-associated clones and locations of novel Ty insertions.

Supplementary File 2. CNV Clone Sequencing Analysis

Data availability

Sequencing data is available at SRA PRJNA1098800.

Other associated data are available here: https://osf.io/js7z8/.

Source code repository for simulation-based inference: https://github.com/yoavram-lab/chuong_et_al.

Scripts for flow cytometry-based evolutionary dynamics and analysis of CNV clones: https://github.com/GreshamLab/local_arch_variants.

Acknowledgements

We thank all the members of the Gresham lab for helpful discussions, NYU Gencore for sequencing samples, and NYU High Performance Cluster for computing and storage. We thank Joshua Caleb Macdonald, Saharon Rosset, Uri Obolski, and Adi Stern for discussions and advice. This work was supported by NSF GRFP DGE1839302 (JNC), NIGMS T32GM132037 (JNC), NIGMS R01GM134066 (DG), R01GM107466 (DG), and R35GM153419 (DG), NIAID R01AI140766 (DG), NSF 1818234 (DG), Israel Science Foundation (ISF, Y.R. 552/19), US–Israel Binational Science Foundation (Y.R. & D.G. 2021276), Minerva Center for Live Emulation of Evolution in the Lab (Y.R.), a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University (N.B.N.), and a fellowship from the AI and Data Science Center at Tel-Aviv University (N.B.N.).

References

1. Aguilera A.
2. García-Muse T
2013Causes of genome instabilityAnnual Review of Genetics 47:1–32https://doi.org/10.1146/annurev-genet-111212-133232 Google Scholar
1. Airoldi E. M.
2. Miller D.
3. Athanasiadou R.
4. Brandt N.
5. Abdul-Rahman F.
6. Neymotin B.
7. Hashimoto T.
8. Bahmani T.
9. Gresham D
2016Steady-state and dynamic gene expression programs in Saccharomyces cerevisiae in response to variation in environmental nitrogenMolecular Biology of the Cell 27:1383–1396https://doi.org/10.1091/mbc.E14-05-1013 Google Scholar
1. Arlt M. F.
2. Wilson T. E.
3. Glover T. W
2012Replication stress and mechanisms of CNV formationCurrent Opinion in Genetics & Development 22:204–210https://doi.org/10.1016/j.gde.2012.01.009 Google Scholar
1. Arndt P. F.
2. Hwa T.
3. Petrov D. A
2005Substantial Regional Variation in Substitution Rates in the Human Genome: Importance of GC Content, Gene Density, and Telomere-Specific EffectsJournal of Molecular Evolution 60:748–763https://doi.org/10.1007/s00239-004-0222-5 Google Scholar
1. Avecilla G.
2. Chuong J. N.
3. Li F.
4. Sherlock G.
5. Gresham D.
6. Ram Y
2022Neural networks enable efficient and accurate simulation-based inference of evolutionary parameters from adaptation dynamicsPLOS Biology 20:e3001633https://doi.org/10.1371/journal.pbio.3001633 Google Scholar
1. Avecilla G.
2. Spealman P.
3. Matthews J.
4. Caudal E.
5. Schacherer J.
6. Gresham D
2023Copy number variation alters local and global mutational toleranceGenome Research https://doi.org/10.1101/gr.277625.122 Google Scholar
1. Baym M.
2. Kryazhimskiy S.
3. Lieberman T. D.
4. Chung H.
5. Desai M. M.
6. Kishony R
2015Inexpensive Multiplexed Library Preparation for Megabase-Sized GenomesPLOS ONE 10:e0128036https://doi.org/10.1371/journal.pone.0128036 Google Scholar
1. Ben-David U.
2. Amon A
2020Context is everything: Aneuploidy in cancerNature Reviews. Genetics 21:44–62https://doi.org/10.1038/s41576-019-0171-x Google Scholar
1. Blanc V. M.
2. Adams J
2003Evolution in Saccharomyces cerevisiae: Identification of Mutations Increasing Fitness in Laboratory PopulationsGenetics 165:975–983https://doi.org/10.1093/genetics/165.3.975 Google Scholar
1. Blount Z. D.
2. Barrick J. E.
3. Davidson C. J.
4. Lenski R. E
2012Genomic analysis of a key innovation in an experimental Escherichia coli populationNature 489:513–518https://doi.org/10.1038/nature11514 Google Scholar
1. Blount Z. D.
2. Borland C. Z.
3. Lenski R. E
2008Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coliProceedings of the National Academy of Sciences 105:7899–7906https://doi.org/10.1073/pnas.0803151105 Google Scholar
1. Brewer B. J.
2. Payen C.
3. Rienzi S. C. D.
4. Higgins M. M.
5. Ong G.
6. Dunham M. J.
7. Raghuraman M. K
2015Origin-Dependent Inverted-Repeat Amplification: Tests of a Model for Inverted DNA AmplificationPLOS Genetics 11:e1005699https://doi.org/10.1371/journal.pgen.1005699 Google Scholar
1. Cardoso A. R.
2. Oliveira M.
3. Amorim A.
4. Azevedo L
2016Major influence of repetitive elements on disease-associated copy number variants (CNVs)Human Genomics 10:30https://doi.org/10.1186/s40246-016-0088-9 Google Scholar
1. Carvalho C. M. B.
2. Pehlivan D.
3. Ramocki M. B.
4. Fang P.
5. Alleva B.
6. Franco L. M.
7. Belmont J. W.
8. Hastings P. J.
9. Lupski J. R
2013Replicative mechanisms for CNV formation are error proneNature Genetics 45:1319–1326https://doi.org/10.1038/ng.2768 Google Scholar
1. Caspi I.
2. Meir M.
3. Ben Nun N.
4. Abu Rass R.
5. Yakhini U.
6. Stern A.
7. Ram Y
2023Mutation rate, selection, and epistasis inferred from RNA virus haplotypes via neural posterior estimationVirus Evolution 9:vead033https://doi.org/10.1093/ve/vead033 Google Scholar
1. Chance P. F.
2. Abbas N.
3. Lensch M. W.
4. Pentao L.
5. Roa B. B.
6. Patel P. I.
7. Lupski J. R
1994Two autosomal dominant neuropathies result from reciprocal DNA duplication/deletion of a region on chromosome 17Human Molecular Genetics 3:223–228https://doi.org/10.1093/hmg/3.2.223 Google Scholar
1. Chen G.
2. Bradford W. D.
3. Seidel C. W.
4. Li R
2012Hsp90 stress potentiates rapid cellular adaptation through induction of aneuploidyNature 482:246–250https://doi.org/10.1038/nature10795 Google Scholar
1. Chen G.
2. Mulla W. A.
3. Kucharavy A.
4. Tsai H.-J.
5. Rubinstein B.
6. Conkright J.
7. McCroskey S.
8. Bradford W. D.
9. Weems L.
10. Haug J. S.
11. Seidel C. W.
12. Berman J.
13. Li R
2015Targeting the Adaptability of Heterogeneous AneuploidsCell 160:771–784https://doi.org/10.1016/j.cell.2015.01.026 Google Scholar
1. Chen G.
2. Rubinstein B.
3. Li R
2012Whole chromosome aneuploidy: Big mutations drive adaptation by phenotypic leapBioEssays: News and Reviews in Molecular, Cellular and Developmental Biology 34:893https://doi.org/10.1002/bies.201200069 Google Scholar
1. Cheng Z.
2. Ventura M.
3. She X.
4. Khaitovich P.
5. Graves T.
6. Osoegawa K.
7. Church D.
8. DeJong P.
9. Wilson R. K.
10. Pääbo S.
11. Rocchi M.
12. Eichler E. E
2005A genome-wide comparison of recent chimpanzee and human segmental duplicationsNature 437:88–93https://doi.org/10.1038/nature04000 Google Scholar
1. Chuang J. H.
2. Li H
2004Functional Bias and Spatial Organization of Genes in Mutational Hot and Cold Regions in the Human GenomePLOS Biology 2:e29https://doi.org/10.1371/journal.pbio.0020029 Google Scholar
1. Cook S. R.
2. Gelman A.
3. Rubin D. B
2006Validation of Software for Bayesian Models Using Posterior QuantilesJournal of Computational and Graphical Statistics 15:675–692https://doi.org/10.1198/106186006X136976 Google Scholar
1. Cranmer K.
2. Brehmer J.
3. Louppe G
2020The frontier of simulation-based inferenceProceedings of the National Academy of Sciences 117:30055–30062https://doi.org/10.1073/pnas.1912789117 Google Scholar
1. Di Rienzi S. C.
2. Collingwood D.
3. Raghuraman M. K.
4. Brewer B. J.
2009Fragile Genomic Sites Are Associated with Origins of ReplicationGenome Biology and Evolution 1:350–363https://doi.org/10.1093/gbe/evp034 Google Scholar
1. Dunham M. J.
2. Badrane H.
3. Ferea T.
4. Adams J.
5. Brown P. O.
6. Rosenzweig F.
7. Botstein D
2002Characteristic genome rearrangements in experimental evolution of Saccharomyces cerevisiaeProceedings of the National Academy of Sciences of the United States of America 99:16144–16149https://doi.org/10.1073/pnas.242624799 Google Scholar
1. Gitschlag B. L.
2. Cano A. V.
3. Payne J. L.
4. McCandlish D. M.
5. Stoltzfus A
2023Mutation and Selection Induce Correlations between Selection Coefficients and Mutation RatesThe American Naturalist 202:534–557https://doi.org/10.1086/726014 Google Scholar
1. Gonçalves P. J.
2. Lueckmann J.-M.
3. Deistler M.
4. Nonnenmacher M.
5. Öcal K.
6. Bassetto G.
7. Chintaluri C.
8. Podlaski W. F.
9. Haddad S. A.
10. Vogels T. P.
11. Greenberg D. S.
12. Macke J. H
2020Training deep neural density estimators to identify mechanistic models of neural dynamicseLife 9:e56261https://doi.org/10.7554/eLife.56261 Google Scholar
1. Green B. M.
2. Finn K. J.
3. Li J. J
2010Loss of DNA Replication Control Is a Potent Inducer of Gene AmplificationScience 329:943–946https://doi.org/10.1126/science.1190966 Google Scholar
1. Gresham D.
2. Desai M. M.
3. Tucker C. M.
4. Jenq H. T.
5. Pai D. A.
6. Ward A.
7. DeSevo C. G.
8. Botstein D.
9. Dunham M. J
2008The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeastPLoS Genetics 4:e1000303https://doi.org/10.1371/journal.pgen.1000303 Google Scholar
1. Gresham D.
2. Usaite R.
3. Germann S. M.
4. Lisby M.
5. Botstein D.
6. Regenberg B
2010Adaptation to diverse nitrogen-limited environments by deletion or extrachromosomal element formation of the GAP1 locusProceedings of the National Academy of Sciences of the United States of America 107:18551–18556https://doi.org/10.1073/pnas.1014023107 Google Scholar
1. Gu W.
2. Zhang F.
3. Lupski J. R
2008aMechanisms for human genomic rearrangementsPathoGenetics 1:4https://doi.org/10.1186/1755-8417-1-4 Google Scholar
1. Gu W.
2. Zhang F.
3. Lupski J. R
2008bMechanisms for human genomic rearrangementsPathoGenetics 1:4https://doi.org/10.1186/1755-8417-1-4 Google Scholar
1. Hall D. W.
2. Mahmoudizad R.
3. Hurd A. W.
4. Joseph S. B
2008Spontaneous mutations in diploid Saccharomyces cerevisiae: Another thousand cell generationsGenetics Research 90:229–241https://doi.org/10.1017/S0016672308009324 Google Scholar
1. Harel T.
2. Pehlivan D.
3. Caskey C. T.
4. Lupski J. R.
5. Rosenberg R. N
6. Pascual J. M
2015Chapter 1—Mendelian, Non-Mendelian, Multigenic Inheritance, and EpigeneticsIn: Rosenberg’s Molecular and Genetic Basis of Neurological and Psychiatric Disease (Fifth Edition) Academic Press pp. 3–27https://doi.org/10.1016/B978-0-12-410529-4.00001-2 Google Scholar
1. Hastings P. J.
2. Ira G.
3. Lupski J. R
2009A Microhomology-Mediated Break-Induced Replication Model for the Origin of Human Copy Number VariationPLOS Genetics 5:e1000327https://doi.org/10.1371/journal.pgen.1000327 Google Scholar
1. Hastings P. J.
2. Lupski J. R.
3. Rosenberg S. M.
4. Ira G
2009Mechanisms of change in gene copy numberNature Reviews Genetics 10:551–564https://doi.org/10.1038/nrg2593 Google Scholar
1. Hays M.
2. Schwartz K.
3. Schmidtke D. T.
4. Aggeli D.
5. Sherlock G
2023Paths to adaptation under fluctuating nitrogen starvation: The spectrum of adaptive mutations in Saccharomyces cerevisiae is shaped by retrotransposons and microhomology-mediated recombinationPLOS Genetics 19:e1010747https://doi.org/10.1371/journal.pgen.1010747 Google Scholar
1. Hermans J.
2. Delaunoy A.
3. Rozet F.
4. Wehenkel A.
5. Begy V.
6. Louppe G
2022A Trust Crisis In Simulation-Based Inference? Your Posterior Approximations Can Be UnfaithfularXiv https://doi.org/10.48550/arXiv.2110.06581 Google Scholar
1. Horiuchi T.
2. Horiuchi S.
3. Novick A
1963The genetic basis of hyper-synthesis of beta-galactosidaseGenetics 48:157–169https://doi.org/10.1093/genetics/48.2.157 Google Scholar
1. Hull R. M.
2. Cruz C.
3. Jack C. V.
4. Houseley J
2017Environmental change drives accelerated adaptation through stimulated copy number variationPLOS Biology 15:e2001333https://doi.org/10.1371/journal.pbio.2001333 Google Scholar
1. Ji H.
2. Moore D. P.
3. Blomberg M. A.
4. Braiterman L. T.
5. Voytas D. F.
6. Natsoulis G.
7. Boeke J. D
1993Hotspots for unselected Ty1 transposition events on yeast chromosome III are near tRNA genes and LTR sequencesCell 73:1007–1018https://doi.org/10.1016/0092-8674(93)90278-x Google Scholar
1. Joseph S. B.
2. Hall D. W
2004Spontaneous Mutations in Diploid Saccharomyces cerevisiae: More Beneficial Than ExpectedGenetics 168:1817–1825https://doi.org/10.1534/genetics.104.033761 Google Scholar
1. Jost L
2006Entropy and diversityOikos 113:363–375https://doi.org/10.1111/j.2006.0030-1299.14714.x Google Scholar
1. Kohanovski I.
2. Pontz M.
3. Vande Zande P.
4. Selmecki A.
5. Dahan O.
6. Pilpel Y.
7. Yona A. H.
8. Ram Y
2024Aneuploidy can be an evolutionary diversion on the path to adaptationMolecular Biology and Evolution 41:msae052https://doi.org/10.1093/molbev/msae052 Google Scholar
1. Lang G. I.
2. Botstein D.
3. Desai M. M
2011Genetic variation and the fate of beneficial mutations in asexual populationsGenetics 188:647–661https://doi.org/10.1534/genetics.111.128942 Google Scholar
1. Lang G. I.
2. Murray A. W
2011Mutation Rates across Budding Yeast Chromosome VI Are Correlated with Replication TimingGenome Biology and Evolution 3:799–811https://doi.org/10.1093/gbe/evr054 Google Scholar
1. Lauer S.
2. Avecilla G.
3. Spealman P.
4. Sethia G.
5. Brandt N.
6. Levy S. F.
7. Gresham D
2018Single-cell copy number variant detection reveals the dynamics and diversity of adaptationPLOS Biology 16:e3000069https://doi.org/10.1371/journal.pbio.3000069 Google Scholar
1. Lauer S.
2. Gresham D
2019An evolving view of copy number variantsCurrent Genetics 65:6https://doi.org/10.1007/s00294-019-00980-0 Google Scholar
1. Lee J. A.
2. Carvalho C. M. B.
3. Lupski J. R
2007A DNA Replication Mechanism for Generating Nonrecurrent Rearrangements Associated with Genomic DisordersCell 131:1235–1247https://doi.org/10.1016/j.cell.2007.11.037 Google Scholar
1. Lercher M. J.
2. Hurst L. D
2002Human SNP variability and mutation rate are higher in regions of high recombinationTrends in Genetics 18:337–340https://doi.org/10.1016/S0168-9525(02)02669-0 Google Scholar
1. Levy S. F.
2. Blundell J. R.
3. Venkataram S.
4. Petrov D. A.
5. Fisher D. S.
6. Sherlock G
2015Quantitative evolutionary dynamics using high-resolution lineage trackingNature 519:181–186https://doi.org/10.1038/nature14279 Google Scholar
1. Lukow D. A.
2. Sausville E. L.
3. Suri P.
4. Chunduri N. K.
5. Wieland A.
6. Leu J.
7. Smith J. C.
8. Girish V.
9. Kumar A. A.
10. Kendall J.
11. Wang Z.
12. Storchova Z.
13. Sheltzer J. M
2021Chromosomal instability accelerates the evolution of resistance to anti-cancer therapiesDevelopmental Cell 56:2427–2439https://doi.org/10.1016/j.devcel.2021.07.009 Google Scholar
1. Lupski J. R
1998Genomic disorders: Structural features of the genome can lead to DNA rearrangements and human disease traitsTrends in Genetics 14:417–422https://doi.org/10.1016/S0168-9525(98)01555-8 Google Scholar
1. Lupski J. R.
2. Stankiewicz P
2005Genomic Disorders: Molecular Mechanisms for Rearrangements and Conveyed PhenotypesPLoS Genetics 1:e49https://doi.org/10.1371/journal.pgen.0010049 Google Scholar
1. Malhotra D.
2. Sebat J
2012CNVs: Harbingers of a Rare Variant Revolution in Psychiatric GeneticsCell 148:1223–1241https://doi.org/10.1016/j.cell.2012.02.039 Google Scholar
1. Martin R.
2. Espinoza C. Y.
3. Large C. R. L.
4. Rosswork J.
5. Bruinisse C. V.
6. Miller A. W.
7. Sanchez J. C.
8. Miller M.
9. Paskvan S.
10. Alvino G. M.
11. Dunham M. J.
12. Raghuraman M. K.
13. Brewer B. J
2024Template switching between the leading and lagging strands at replication forks generates inverted copy number variants through hairpin-capped extrachromosomal DNAPLOS Genetics 20:e1010850https://doi.org/10.1371/journal.pgen.1010850 Google Scholar
1. Matassi G.
2. Sharp P. M.
3. Gautier C
1999Chromosomal location effects on gene sequence evolution in mammalsCurrent Biology 9:786–791https://doi.org/10.1016/S0960-9822(99)80361-3 Google Scholar
1. Morillon A.
2. Bénard L.
3. Springer M.
4. Lesage P
2002Differential Effects of Chromatin and Gcn4 on the 50-Fold Range of Expression among Individual Yeast Ty1 RetrotransposonsMolecular and Cellular Biology 22:2078–2088https://doi.org/10.1128/MCB.22.7.2078-2088.2002 Google Scholar
1. Morillon A.
2. Springer M.
3. Lesage P
2000Activation of the Kss1 Invasive-Filamentous Growth Pathway Induces Ty1 Transcription and Retrotransposition in Saccharomyces cerevisiaeMolecular and Cellular Biology 20:5766–5776https://doi.org/10.1128/MCB.20.15.5766-5776.2000 Google Scholar
1. Mularoni L.
2. Zhou Y.
3. Bowen T.
4. Gangadharan S.
5. Wheelan S. J.
6. Boeke J. D
2012Retrotransposon Ty1 integration targets specifically positioned asymmetric nucleosomal DNA segments in tRNA hotspotsGenome Research 22:693–703https://doi.org/10.1101/gr.129460.111 Google Scholar
1. Nguyen Ba A. N.
2. Cvijović I.
3. Rojas Echenique J. I.
4. Lawrence K. R.
5. Rego-Costa A.
6. Liu X.
7. Levy S. F.
8. Desai M. M
2019High-resolution lineage tracking reveals travelling wave of adaptation in laboratory yeastNature 575:494–499https://doi.org/10.1038/s41586-019-1749-3 Google Scholar
1. Nishant K. T.
2. Singh N. D.
3. Alani E
2009Genomic mutation rates: What high-throughput methods can tell usBioEssays: News and Reviews in Molecular, Cellular and Developmental Biology 31:912–920https://doi.org/10.1002/bies.200900017 Google Scholar
1. Ohno S
1970Evolution by Gene DuplicationNew York: Springer-Verlag Google Scholar
1. Osmundson J. S.
2. Kumar J.
3. Yeung R.
4. Smith D. J
2017Pif1-family helicases cooperate to suppress widespread replication fork arrest at tRNA genesNature Structural & Molecular Biology 24:162–170https://doi.org/10.1038/nsmb.3342 Google Scholar
1. Papamakarios G
2019Neural Density Estimation and Likelihood-free InferencearXiv https://doi.org/10.48550/arXiv.1910.13233 Google Scholar
1. Papamakarios G.
2. Pavlakou T.
3. Murray I
2018Masked Autoregressive Flow for Density EstimationarXiv https://doi.org/10.48550/arXiv.1705.07057 Google Scholar
1. Pavelka N.
2. Rancati G.
3. Zhu J.
4. Bradford W. D.
5. Saraf A.
6. Florens L.
7. Sanderson B. W.
8. Hattem G. L.
9. Li R
2010Aneuploidy confers quantitative proteome changes and phenotypic variation in budding yeastNature 468:321–325https://doi.org/10.1038/nature09529 Google Scholar
1. Payen C.
2. Sunshine A. B.
3. Ong G. T.
4. Pogachar J. L.
5. Zhao W.
6. Dunham M. J
2016High-Throughput Identification of Adaptive Mutations in Experimentally Evolved Yeast PopulationsPLoS Genetics 12:e1006339https://doi.org/10.1371/journal.pgen.1006339 Google Scholar
1. Pentao L.
2. Wise C. A.
3. Chinault A. C.
4. Patel P. I.
5. Lupski J. R
1992Charcot-Marie-Tooth type 1A duplication appears to arise from recombination at repeat sequences flanking the 1.5 Mb monomer unitNature Genetics 2:292–300https://doi.org/10.1038/ng1292-292 Google Scholar
1. Pös O.
2. Radvanszky J.
3. Buglyó G.
4. Pös Z.
5. Rusnakova D.
6. Nagy B.
7. Szemes T
2021DNA copy number variation: Main characteristics, evolutionary significance, and pathological aspectsBiomedical Journal 44:548–559https://doi.org/10.1016/j.bj.2021.02.003 Google Scholar
1. Robinson D.
2. Vanacloig-Pedros E.
3. Cai R.
4. Place M.
5. Hose J.
6. Gasch A. P
2023Gene-by-environment interactions influence the fitness cost of gene copy-number variation in yeastG3 13:jkad159https://doi.org/10.1093/g3journal/jkad159 Google Scholar
1. Rutledge S. D.
2. Douglas T. A.
3. Nicholson J. M.
4. Vila-Casadesús M.
5. Kantzler C. L.
6. Wangsa D.
7. Barroso-Vilares M.
8. Kale S. D.
9. Logarinho E.
10. Cimini D
2016Selective advantage of trisomic human cells cultured in non-standard conditionsScientific Reports 6:22828https://doi.org/10.1038/srep22828 Google Scholar
1. Salim D.
2. Bradford W. D.
3. Rubinstein B.
4. Gerton J. L
2021DNA replication, transcription, and H3K56 acetylation regulate copy number and stability at tandem repeatsG3 (Bethesda, Md.) 11:jkab082https://doi.org/10.1093/g3journal/jkab082 Google Scholar
1. Selmecki A. M.
2. Maruvka Y. E.
3. Richmond P. A.
4. Guillet M.
5. Shoresh N.
6. Sorenson A. L.
7. De S.
8. Kishony R.
9. Michor F.
10. Dowell R.
11. Pellman D
2015Polyploidy can drive rapid adaptation in yeastNature 519:349–352https://doi.org/10.1038/nature14187 Google Scholar
1. Sonti R. V.
2. Roth J. R
1989Role of gene duplications in the adaptation of Salmonella typhimurium to growth on limiting carbon sourcesGenetics 123:19–28https://doi.org/10.1093/genetics/123.1.19 Google Scholar
1. Spealman P.
2019CVish Structural Variant Breakpoint IdentifierGithub https://github.com/pspealman/CVish
1. Spealman P.
2. Avecilla G.
3. Matthews J.
4. Suresh I.
5. Gresham D
2022Complex Genomic Rearrangements following Selection in a Glutamine-Limited Medium over Hundreds of GenerationsMicrobiology Resource Announcements 11:e00729–22https://doi.org/10.1128/mra.00729-22 Google Scholar
1. Spealman P.
2. De T.
3. Chuong J. N.
4. Gresham D
2023Best Practices in Microbial Experimental Evolution: Using Reporters and Long-Read Sequencing to Identify Copy Number Variation in Experimental EvolutionJournal of Molecular Evolution 91:356–368https://doi.org/10.1007/s00239-023-10102-7 Google Scholar
1. Stankiewicz P.
2. Shaw C. J.
3. Dapper J. D.
4. Wakui K.
5. Shaffer L. G.
6. Withers M.
7. Elizondo L.
8. Park S.-S.
9. Lupski J. R
2003Genome architecture catalyzes nonrecurrent chromosomal rearrangementsAmerican Journal of Human Genetics 72:1101–1116https://doi.org/10.1086/374385 Google Scholar
1. Storz J. F
2016Gene Duplication and Evolutionary Innovations in Hemoglobin-Oxygen TransportPhysiology 31:223–232https://doi.org/10.1152/physiol.00060.2015 Google Scholar
1. Taylor J. S.
2. Raes J
2004Duplication and divergence: The evolution of new genes and old ideasAnnual Review of Genetics 38:615–643https://doi.org/10.1146/annurev.genet.38.072902.092831 Google Scholar
1. Tejero-Cantero A.
2. Boelts J.
3. Deistler M.
4. Lueckmann J.-M.
5. Durkan C.
6. Gonçalves P. J.
7. Greenberg D. S.
8. Macke J. H
2020SBI -- A toolkit for simulation-based inferencearXiv https://doi.org/10.48550/arXiv.2007.09114 Google Scholar
1. Tsai H.-J.
2. Nelliat A. R.
3. Choudhury M. I.
4. Kucharavy A.
5. Bradford W. D.
6. Cook M. E.
7. Kim J.
8. Mair D. B.
9. Sun S. X.
10. Schatz M. C.
11. Li R
2019Hypo-osmotic-like stress underlies general cellular defects of aneuploidyNature 570:117–121https://doi.org/10.1038/s41586-019-1187-2 Google Scholar
1. Turner D. J.
2. Miretti M.
3. Rajan D.
4. Fiegler H.
5. Carter N. P.
6. Blayney M. L.
7. Beck S.
8. Hurles M. E
2008Germline rates of de novo meiotic deletions and duplications causing several genomic disordersNature Genetics 40:90–95https://doi.org/10.1038/ng.2007.40 Google Scholar
1. Venkataram S.
2. Dunn B.
3. Li Y.
4. Agarwala A.
5. Chang J.
6. Ebel E. R.
7. Geiler-Samerotte K.
8. Hérissant L.
9. Blundell J. R.
10. Levy S. F.
11. Fisher D. S.
12. Sherlock G.
13. Petrov D. A
2016Development of a Comprehensive Genotype-to-Fitness Map of Adaptation-Driving Mutations in YeastCell 166:1585–1596https://doi.org/10.1016/j.cell.2016.08.002 Google Scholar
1. Whale A. J.
2. King M.
3. Hull R. M.
4. Krueger F.
5. Houseley J
2022Stimulation of adaptive gene amplification by origin firing under replication fork constraintNucleic Acids Research 50:915–936https://doi.org/10.1093/nar/gkab1257 Google Scholar
1. Wilke C. M.
2. Adams J
1992Fitness effects of Ty transposition in Saccharomyces cerevisiaeGenetics 131:31–42https://doi.org/10.1093/genetics/131.1.31 Google Scholar
1. Wilson T. E.
2. Arlt M. F.
3. Park S. H.
4. Rajendran S.
5. Paulsen M.
6. Ljungman M.
7. Glover T. W
2015Large transcription units unify copy number variants and common fragile sites arising under replication stressGenome Research 25:189–200https://doi.org/10.1101/gr.177121.114 Google Scholar
1. Wolfe K. H.
2. Sharp P. M.
3. Li W.-H
1989Mutation rates differ among regions of the mammalian genomeNature 337:283–285https://doi.org/10.1038/337283a0 Google Scholar
1. Yang F.
2. Todd R. T.
3. Selmecki A.
4. Jiang Y.
5. Cao Y.
6. Berman J
2021The fitness costs and benefits of trisomy of each Candida albicans chromosomeGenetics 218:iyab056https://doi.org/10.1093/genetics/iyab056 Google Scholar
1. Yeung R.
2. Smith D. J
2020Determinants of Replication-Fork Pausing at tRNA Genes in Saccharomyces cerevisiaeGenetics 214:825–838https://doi.org/10.1534/genetics.120.303092 Google Scholar
1. Yona A. H.
2. Manor Y. S.
3. Herbst R. H.
4. Romano G. H.
5. Mitchell A.
6. Kupiec M.
7. Pilpel Y.
8. Dahan O
2012Chromosomal duplication is a transient evolutionary solution to stressProceedings of the National Academy of Sciences 109:21010–21015https://doi.org/10.1073/pnas.1211150109 Google Scholar
1. Zhang F.
2. Gu W.
3. Hurles M. E.
4. Lupski J. R
2009Copy Number Variation in Human Health, Disease, and EvolutionAnnual Review of Genomics and Human Genetics 10:451–481https://doi.org/10.1146/annurev.genom.9.081307.164217 Google Scholar
1. Zhang F.
2. Khajavi M.
3. Connolly A. M.
4. Towne C. F.
5. Batish S. D.
6. Lupski J. R
2009The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humansNature Genetics 41:849–853https://doi.org/10.1038/ng.399 Google Scholar
1. Zuellig M. P.
2. Sweigart A. L
2018Gene duplicates cause hybrid lethality between sympatric species of MimulusPLoS Genetics 14:e1007130https://doi.org/10.1371/journal.pgen.1007130 Google Scholar

Article and author information

Author information

Julie N Chuong
Department of Biology, Center for Genomics and Systems Biology, New York University
ORCID iD: 0000-0002-4388-9458
Nadav Ben Nun
School of Zoology, Faculty of Life Sciences, Tel Aviv University, Edmond J. Safra Center for Bioinformatics, Tel Aviv University
Ina Suresh
Department of Biology, Center for Genomics and Systems Biology, New York University
Julia Matthews
Department of Biology, Center for Genomics and Systems Biology, New York University
Titir De
Department of Biology, Center for Genomics and Systems Biology, New York University
Grace Avecilla
Biobus, Inc
Farah Abdul-Rahman
Memorial Sloan Kettering Cancer Center
Nathan Brandt
Department of Biological Sciences, North Carolina State University
Yoav Ram
School of Zoology, Faculty of Life Sciences, Tel Aviv University, Edmond J. Safra Center for Bioinformatics, Tel Aviv University
David Gresham
Department of Biology, Center for Genomics and Systems Biology, New York University
ORCID iD: 0000-0002-4028-0364
- Correspondence: dgresham@nyu.edu

Version history

Preprint posted: May 6, 2024
Sent for peer review: May 13, 2024
Reviewed Preprint version 1: August 2, 2024
Reviewed Preprint version 2: December 13, 2024
Version of Record published: February 3, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.98934. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

