Local ARS contributes to CNV dynamics during adaptive evolution.

(A) The Saccharomyces cerevisiae GAP1 gene is located on the short arm of chromosome XI (beige rectangle). Light blue rectangles - Ty long terminal repeats (LTRs). Purple rectangle - autonomously replicating sequence (ARS). Orange rectangles - tRNA genes. White rectangle - GAP1 ORF. GAP1 is flanked by Ty1 LTRs (YKRCδ11, YKRCδ12), which are remnants of retrotransposition events, and is directly upstream of an autonomously replicating sequence (ARS1116). Variants of the GAP1 locus were engineered to remove either both LTRs, the single ARS, or all three elements. All engineered genomes contain a CNV reporter. (B) We evolved the four strains in 5-8 replicate populations each, for a total of 27 populations, in glutamine-limited chemostats and monitored the formation and selection of de novo GAP1 CNVs for 137 generations using flow cytometry. Population samples were taken every 8-10 generations and 100,000 cells were assayed using a flow cytometer. Colored lines show the median proportion of cells in a population with GAP1 amplifications across the 5-8 replicate populations of the labeled strain. The shaded regions represent the median absolute deviation across the replicates. (C) We summarized CNV dynamics and found that strain has a significant effect on CNV appearance (Kruskal-Wallis, p = 0.001384). There are significant differences in CNV appearance between LTRΔ (blue) and ARSΔ (red), and between LTRΔ (blue) and ALLΔ (yellow) (pairwise Wilcoxon test with Bonferroni correction, p = 0.0059 and p = 0.0124, respectively). (D) Strain has a significant effect on the per-generation increase in the proportion of cells with a CNV (ANOVA, p = 0.00318). There is a significant difference between LTRΔ (blue) and ALLΔ (yellow) (pairwise t-test with Bonferroni correction, p = 0.0026). (E) Strain has a significant effect on time to CNV equilibrium phase (ANOVA, p = 0.00833). There is a significant difference in time to CNV equilibrium between WT and ARSΔ (pairwise t-tests with Bonferroni correction, p = 0.050).
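
The summary statistics plotted in panel B (median across replicates, with the median absolute deviation as the shaded band) can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function name `median_and_mad` and the replicate values are hypothetical placeholders.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation) of a sequence of numbers."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, mad


# Hypothetical CNV proportions from five replicate populations at one time point.
replicates = [0.10, 0.12, 0.15, 0.11, 0.30]
med, mad = median_and_mad(replicates)
print(f"median = {med:.2f}, MAD = {mad:.2f}")
```

Both statistics are robust to a single outlying replicate (0.30 in the placeholder data), which is presumably why they suit summaries over only 5-8 populations.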

CNV reporter failure does not impact parameter inference.

(A) Flow cytometry of a representative WT population with a persistent single-copy GFP subpopulation comprising ∼25% of the population. (B) Model illustration. XA is the frequency of ancestor cells in the chemostat; XC+, XC are the frequencies of cells with GAP1 duplications with two or one reporters, respectively, and a selection coefficient sC; XB is the frequency of cells with other beneficial mutations and a selection coefficient sB. GAP1 duplications form with a rate δC, other beneficial mutations occur with rate δB. At generation 0, only genotypes C and A are present, with frequencies of XC = φ and XA = 1 − φ. (C) Examples of total CNV proportions (solid) and reported CNV proportions (dashed) for two parameter combinations, both with sC = 0.15, φ = 10⁻⁴.
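
To make the distinction between total and reported CNV proportions concrete, here is a deterministic sketch of a four-genotype model of this kind. It is an illustration under stated assumptions, not the model used for inference: it ignores genetic drift and chemostat dilution, applies mutation before selection in each generation, and treats every newly formed CNV as carrying two reporters. The function name and the values of δC, sB and δB are hypothetical; sC = 0.15 and φ = 10⁻⁴ follow the caption.

```python
def simulate_cnv(s_c, delta_c, s_b, delta_b, phi, generations):
    """Return (total, reported) CNV proportions for generations 0..generations."""
    # Generation 0: ancestor at 1 - phi, pre-existing one-reporter CNVs at phi.
    x_a, x_c_two, x_c_one, x_b = 1.0 - phi, 0.0, phi, 0.0
    total, reported = [], []
    for _ in range(generations + 1):
        total.append(x_c_two + x_c_one)
        reported.append(x_c_two)
        # Mutation (assumed to act before selection): ancestors acquire a new,
        # reported CNV at rate delta_c or another beneficial mutation at delta_b.
        new_c = delta_c * x_a
        new_b = delta_b * x_a
        x_a -= new_c + new_b
        x_c_two += new_c
        x_b += new_b
        # Selection: weight each genotype by relative fitness, then renormalize.
        w_c, w_b = 1.0 + s_c, 1.0 + s_b
        mean_w = x_a + w_c * (x_c_two + x_c_one) + w_b * x_b
        x_a = x_a / mean_w
        x_c_two = w_c * x_c_two / mean_w
        x_c_one = w_c * x_c_one / mean_w
        x_b = w_b * x_b / mean_w
    return total, reported


# s_c and phi follow the caption; the other values are hypothetical.
total, reported = simulate_cnv(
    s_c=0.15, delta_c=1e-5, s_b=0.01, delta_b=1e-5, phi=1e-4, generations=137
)
print(f"final total = {total[-1]:.3f}, final reported = {reported[-1]:.3f}")
```

The gap between the two curves comes entirely from the pre-existing one-reporter lineage, which is selected as strongly as reported CNVs but does not increase the reporter signal.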

Inference of CNV formation rate and selection coefficient from experimental evolutionary data.

(A) Collective MAP estimate (black markers) and 50% HDR (colored areas) of GAP1 CNV formation rate, δC, and selection coefficient, sC. Marginal posterior distributions are shown on the top and right axes. (B) Collective posterior prediction of Shannon diversity of CNV lineages (exp(−Σ_i p_i log p_i); Jost, 2006). Line and shaded area show mean and 50% HDI. (C) Reported CNV frequency (XC+) predicted using the collective MAP (solid line) compared to empirical observations (dotted lines).
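
The diversity measure in panel B is the exponential of the Shannon entropy, which can be read as an effective number of lineages. A minimal sketch follows; the function name and example frequencies are hypothetical, and the input is assumed to be a list of non-negative lineage frequencies or counts.

```python
import math


def effective_lineage_number(frequencies):
    """Exponential of Shannon entropy: exp(-sum_i p_i * log(p_i))."""
    total = sum(frequencies)
    # Normalize to proportions; zero-frequency lineages contribute nothing.
    probs = [f / total for f in frequencies if f > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


even = effective_lineage_number([0.25, 0.25, 0.25, 0.25])    # four equal lineages
skewed = effective_lineage_number([0.97, 0.01, 0.01, 0.01])  # one dominant lineage
print(even, skewed)
```

Four equally frequent lineages give a value of 4, while a population dominated by one lineage gives a value close to 1, regardless of how many rare lineages are present.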

Local and distal elements contribute to generation of GAP1 CNV alleles.

(A) Schematic of the Saccharomyces cerevisiae GAP1 locus (chromosome XI: 513332-518060) with LTRs, ARS elements, and tRNA genes labeled. Origin-dependent inverted-repeat amplification (ODIRA) is a DNA replication-error-based CNV mechanism. Here, we classify a clone as ODIRA if it has an inverted sequence in at least one breakpoint. ODIRA typically forms a tandem triplication with an inverted middle copy, and the amplified region contains an ARS. Long terminal repeat non-allelic homologous recombination (LTR NAHR) is defined by both CNV breakpoints being at LTR sites. Sometimes we detect a hybrid sequence between two LTR sequences, a result of recombination between the two LTRs. Non-allelic homologous recombination (NAHR) is defined by at least one CNV breakpoint not being at an LTR site, i.e., being at other homologous sequences in the genome. Sometimes we detect a hybrid sequence between the two homologous sequences. The transposon-mediated mechanisms observed involve at least one intermediate novel LTR retrotransposon insertion. The newly deposited LTR sequence recombines with other LTR sequences, either pre-existing or introduced by a second de novo retrotransposition, to form the resulting CNV. A complex CNV is defined by a clone having more than two breakpoints on chromosome XI, indicative of more than one amplification event. (B) Violin plot of CNV length in each genome-sequenced clone, n = 177. Strain has a significant effect on CNV length (Kruskal-Wallis test, p = 3.0 × 10⁻⁴). (C) Barplot of inferred CNV mechanisms, described in A, for each CNV clone isolated from evolving populations. Inference came from a combination of read depth, split read, and discordant read analysis (see Methods). Strain is significantly associated with CNV mechanism (Fisher's exact test, p = 5.0 × 10⁻⁴). There is a significant increase in ODIRA prevalence from WT to LTRΔ (chi-squared test, p = 0.02469). There is a significant decrease in ODIRA prevalence from WT to ARSΔ and to ALLΔ (chi-squared test, p = 0.002861 and p = 0.002196, respectively). There is a significant decrease in LTR NAHR from WT to LTRΔ (chi-squared test, p = 0.03083).

(D) Top: Schematic of S. cerevisiae chromosome XI, with LTR, ARS elements, tRNA genes annotated. LTR-blue, ARS-purple, tRNA-orange, GAP1 ORF-white rectangle. Using a combination of read depth, split read, and discordant read analysis, we defined the extent of the amplified region, the precise CNV breakpoints, and GAP1 copy number. GAP1 copy numbers were estimated using read depth relative to the average read depth of chromosome XI. We define the upstream and downstream breakpoints as kilobases away from the start codon of the GAP1 ORF (vertical dotted line). Bottom: Dumbbell plots represent the amplified region (>1 copy) relative to the WT architecture reference genome. The ends of the dumbbells mark the approximate CNV breakpoints. Select clones were chosen as representative of the observed diversity of amplifications.
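
The copy number estimate described above (read depth of the gene relative to the average read depth of chromosome XI) reduces to a simple ratio. The sketch below assumes that the chromosome-wide average corresponds to one copy and that the gene's depth is summarized by its mean; both are assumptions for illustration, and the function name and depth values are hypothetical.

```python
from statistics import mean


def estimate_copy_number(gene_depths, chromosome_depths):
    """Ratio of mean read depth over the gene to mean depth over the chromosome."""
    return mean(gene_depths) / mean(chromosome_depths)


# Hypothetical per-position read depths.
gap1_depths = [90, 93, 87]
chr_xi_depths = [30, 29, 31]
copies = estimate_copy_number(gap1_depths, chr_xi_depths)
print(f"estimated GAP1 copy number: {copies:.1f}")
```

In practice the chromosome-wide average is itself inflated slightly by the amplified region, so a large amplification would make this ratio a mild underestimate.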

(E) Scatterplots of CNV length for all genome-sequenced clones, n = 177. We defined the upstream and downstream breakpoints as kilobases away from the start codon of the GAP1 ORF (vertical dotted line in dumbbell plot). CNV mechanisms are defined in Figure 4A.

Model parameters and priors.

Fixed parameters from (Avecilla et al., 2022; Hall et al., 2008; Joseph & Hall, 2004; Venkataram et al., 2016).

Summary of genome sequence analysis of clones containing a single copy of the GAP1 CNV reporter.

Estimated copy number of the GAP1 gene and the inserted GFP gene in sequenced clones from five 1-copy-GFP subpopulations of the WT genome architecture strain. Copy number is estimated as the read depth of the target gene relative to the average read depth of chromosome XI. Populations 1, 2, 4, and 5 contain clones harboring GAP1 CNVs but only 1 copy of GFP. Populations 3 and 5 contain clones with 1 copy each of GAP1 and GFP, suggesting that these lineages have beneficial mutations elsewhere in the genome, allowing coexistence with the GAP1 CNV major subpopulation.

Independent GAP1 amplifications lacking CNV reporter amplification.

Read depth plots of the GAP1 CNV reporter locus of sequenced clones from five 1-copy-GFP subpopulations. The identification of eight distinct CNV breakpoint pairs, shown above, across the populations indicates the occurrence of at least eight independent amplifications of GAP1 without GFP amplification. GFP reference gene - green rectangle, GAP1 reference gene - purple rectangle.

MAP estimates of GAP1 CNV formation rates (δC) and selection coefficients (sC) for all replicate populations.

Markers show MAP estimates from individual replicates; crosses show the 50% HDI of collective posteriors. Extreme points are marked for comparison to data and posterior prediction; see Supplementary Figure S3B for posterior predictive checks.

Posterior predictive checks for all replicates.

Black markers are the empirical observations, dashed line shows MAP prediction. The leftmost plot of each row shows the collective MAP prediction with empirical data’s interquartile range (gray bars).

Pairwise and marginal collective posteriors for all estimated model parameters.

Diagonals show marginal collective posteriors per parameter per strain. Below-diagonal plots show pairwise KDEs for all pairs of model parameters. Collective joint MAPs (which may differ from collective marginal MAPs, because the marginal distribution integrates over all other parameters) are marked by a red vertical line. Panels are separated by strain: (A) WT, (B) ARSΔ, (C) LTRΔ, (D) ALLΔ.

Parameter estimation accuracy on synthetic data.

Log-ratio of MAP estimate and true parameter value for 829 synthetic simulations in which the final reported GAP1 CNV proportion is at least 0.3.

Estimation of network confidence.

Coverage, defined as the probability that the true parameter falls within the 95% highest density interval (HDI) of the posterior distribution, for 829 synthetic simulations in which the final reported GAP1 CNV proportion is at least 0.3. The 95% HDI was calculated for each simulation using 200 posterior samples. Our neural density estimator is slightly over-confident for φ (coverage of 0.934) and under-confident for the GAP1 CNV selection coefficient and formation rate (coverage of 0.992 for sC and 0.995 for δC). Despite this under-confidence, the posterior distributions are narrow in biological terms: the 95% HDI spans less than an order of magnitude for both sC and δC. Thus, we did not apply post-training adjustments to the neural density estimator, such as calibration (Cook et al., 2006) or ensembles (Caspi et al., 2023; Hermans et al., 2022).
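
A sample-based HDI and the resulting coverage can be sketched as below. This is a generic illustration for a single scalar parameter, not the authors' code: the HDI is taken as the narrowest interval containing the requested fraction of sorted samples, and all names and values are hypothetical.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def coverage(true_values, posterior_sample_sets, mass=0.95):
    """Fraction of simulations whose true value lies inside the HDI."""
    hits = 0
    for truth, samples in zip(true_values, posterior_sample_sets):
        low, high = hdi(samples, mass)
        hits += low <= truth <= high
    return hits / len(true_values)


# Hypothetical posterior samples, reused for two simulated "true" values.
samples = [0, 1, 2, 3, 10, 20, 30, 40]
interval = hdi(samples, mass=0.5)
cov = coverage([1.5, 25], [samples, samples], mass=0.5)
print(interval, cov)
```

A well-calibrated estimator would give coverage close to the nominal mass; values above it indicate intervals that are wider than necessary (under-confidence) and values below it indicate over-confidence.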

Neural density estimator training and validation loss during training.

The convergence threshold of 100 epochs without improvement (no decrease in the minimal validation loss) was reached after 569 epochs.
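
The stopping rule described above is a patience-based criterion: training ends once the minimal validation loss has not decreased for a fixed number of consecutive epochs. A generic sketch, independent of any training library, is shown below; the function name and the loss values are hypothetical.

```python
def stopping_epoch(validation_losses, patience=100):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    unimproved = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best:
            # New minimal validation loss: reset the counter.
            best = loss
            unimproved = 0
        else:
            unimproved += 1
            if unimproved >= patience:
                return epoch
    return None


# Hypothetical losses: improvement for 3 epochs, then a plateau.
losses = [1.0, 0.9, 0.8] + [0.85] * 5
stop = stopping_epoch(losses, patience=3)
print(stop)
```

With a patience of 100 and a stop at epoch 569, a rule of this form would imply that the last improvement occurred at epoch 469.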

Total GAP1 CNV frequency.

Solid lines show collective MAP predictions, dashed lines show the total proportion of GAP1 CNVs, comprising unreported pre-existing CNVs and reported CNVs generated during the experiment, as predicted by the evolutionary model.

Error estimation of parameter inference.

Average root mean square errors (RMSE) of 50 posterior samples against the observed data. (A) Individual posteriors and individual replicates. (B) Collective posterior and individual replicates. (C) Collective posterior and empirical mean.
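
The error measure can be sketched as the RMSE between each posterior-sample prediction and the observed time series, averaged over samples. This is an illustrative sketch; the function names and numbers are hypothetical, and each prediction is assumed to be evaluated at the same time points as the observations.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(observed))


def average_rmse(predictions, observed):
    """Mean RMSE over a collection of predicted trajectories."""
    return sum(rmse(p, observed) for p in predictions) / len(predictions)


# Hypothetical observed series and two predicted trajectories.
observed = [0.0, 0.0]
predictions = [[1.0, 1.0], [3.0, 3.0]]
avg = average_rmse(predictions, observed)
print(avg)
```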

Pairwise evolutionary competition predictions.

We simulated evolutionary competitions between WT and each genomic architecture mutant under the experimental conditions, starting from equal frequencies. The proportion of WT at generation 116 was predicted using 10,000 combinations of collective posterior samples for each pairwise competition. Overall, WT outcompetes all mutants because it adapts faster (due to a higher CNV formation rate), but its advantage over ARSΔ and ALLΔ is much larger than its advantage over LTRΔ. (A) Histograms for three pairwise competitions. Note that ARSΔ and ALLΔ values overlap at this scale and are all in the rightmost bar. (B) High-resolution histograms for ARSΔ and ALLΔ.

No significant interaction between strain and generation on CNV length.

Boxplot of CNV length of clones by strain and generation of isolation. There is no significant interaction between strain and generation of clone isolation, and no significant effect of generation on CNV length (two-way ANOVA, strain × generation, p = 0.33).

Types of ODIRA detected.

We found 87 ODIRA clones in total, regardless of strain. The majority of ODIRA clones, 55/87 (63%), fit the canonical definition of having two inverted junctions and 3 copies (ODIRA_3). We found four non-canonical types: 17 clones (20%) with only one inverted junction detected and 3 copies (ODIRA_oneEnd_3); 11 clones (13%) with two inverted junctions but only 2 copies (ODIRA_2); 3 clones (3.4%) with only one inverted junction detected and 2 copies (ODIRA_oneEnd_2); and 1 clone (1.1%) with two inverted junctions in which the amplified region did not contain an ARS.
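
The subtype labels follow directly from the number of inverted junctions detected and the copy number, which the sketch below encodes. The function is hypothetical, and `ODIRA_noARS` is an illustrative label for the single clone lacking an ARS, which the caption leaves unnamed. The counts are those reported above.

```python
def odira_subtype(inverted_junctions, copies, contains_ars=True):
    """Map junction count and copy number to the subtype labels used above."""
    if inverted_junctions not in (1, 2) or copies not in (2, 3):
        raise ValueError("expected 1-2 inverted junctions and 2-3 copies")
    if not contains_ars:
        return "ODIRA_noARS"  # hypothetical label, not from the source
    prefix = "ODIRA" if inverted_junctions == 2 else "ODIRA_oneEnd"
    return f"{prefix}_{copies}"


# Clone counts per subtype as reported in the caption (87 in total).
counts = {
    "ODIRA_3": 55,
    "ODIRA_oneEnd_3": 17,
    "ODIRA_2": 11,
    "ODIRA_oneEnd_2": 3,
    "ODIRA_noARS": 1,
}
n_clones = sum(counts.values())
percentages = {name: round(100 * c / n_clones, 1) for name, c in counts.items()}
print(n_clones, percentages)
```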

Inferred CNV mechanisms by strain.

Counts of inferred CNV mechanisms for each sequenced clone, n=177, separated by strain.