Local and distal elements contribute to generation of GAP1 CNV alleles.
(A) Schematic of the Saccharomyces cerevisiae GAP1 locus on Chromosome XI: 513332-518060 with LTR, ARS elements and tRNA genes labeled. Long terminal repeat non-allelic homologous recombination (LTR NAHR) is defined on the basis of both CNV breakpoints occurring at LTR sites, as revealed by read depth plots (left, pink and blue vertical lines), and increased read depth relative to the genome-wide read depth (left). In some cases we detect a hybrid sequence between two LTR sequences, a result of recombination between the two LTRs (right). LTR NAHR typically forms tandem duplications.
(B) ODIRA is a DNA replication error-based CNV mechanism generated by template switching of the leading and lagging strand template at short inverted repeats. In this clone example, the relative read depth estimate of 2.67 copies of GAP1 is rounded to 3 copies (left) and has apparent breakpoints in the CAF4 and AIM29 genes. We classify a clone as being formed by ODIRA if it has a de novo inverted sequence in at least one breakpoint. In the clone example, the short inverted repeat pairs are CAAT, ATTG (ChrXI:508938, ChrXI:508986) in CAF4 and TAAAA, AAAAT (ChrXI:582561, ChrXI:582610) in AIM29. The contig sequence at the breakpoint (rectangle) is aligned to the reference within the CAF4 coding sequence fragment. The 5’ and 3’ ends of the contig are labeled and a dashed line indicates contiguity (no gaps). The contig spans the CAF4 breakpoint junction and contains a de novo inversion, i.e., two fragments of the CAF4 gene are in opposite orientations, with the mediating short inverted repeats shown in blue and underlined. The contig was generated using CVish (Methods) and is supported by split reads at the breakpoint junction (not shown). The contig containing a de novo inversion across the AIM29 breakpoint is not shown. ODIRA typically forms tandem triplications with an inverted middle copy and contains an ARS (bottom).
(C) Non-allelic homologous recombination (NAHR) is defined by having at least one CNV breakpoint not at the proximate LTR sites, i.e., at other homologous sequences in the genome. In the example clone, we detect a hybrid sequence between the two homologous sequences in BET3 (ChrXI) and SPT4 (ChrVI). Because these two sequences are on different chromosomes, we infer that an interchromosomal translocation occurred. The other breakpoint is unresolved. A read depth plot supports the amplified segment containing the GAP1 gene. NAHR is able to produce supernumerary chromosomes, as is the case in Clone 2968 (Figure 4G).
(D) Transposon-mediated mechanism is defined by an inference of at least one intermediate novel LTR retrotransposon insertion followed by LTR NAHR. In the ALLΔ strains which have the flanking LTRs deleted, we find novel LTR retrotransposon insertions near previously deleted LTR sites. The newly deposited LTR sequence (downstream of GAP1) recombines with another LTR sequence (upstream of GAP1), either pre-existing or introduced by a second de novo retrotransposition, to form a resulting CNV (focal amplification or segmental amplification). Read depth estimation (not shown) supports the CNV breakpoints at pre-existing or newly deposited LTRs.
(E) Violin plot of CNV length in each genome-sequenced clone, n = 177. Strain has a significant effect on CNV length, Kruskal-Wallis test, p = 1.008 × 10⁻⁵. Pairwise Wilcoxon rank sum tests with Bonferroni correction show significant CNV length differences between WT and LTRΔ (p = 0.00490), WT and ALLΔ (p = 0.01230), LTRΔ and ARSΔ (p = 0.00056), and ARSΔ and ALLΔ (p = 0.002).
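As a minimal sketch of the Bonferroni step named in (E), each raw pairwise p-value can be multiplied by the number of comparisons and capped at 1. The function name `bonferroni` and the raw p-values below are hypothetical placeholders for illustration; they are not values from this study, and the rank sum tests themselves are not reproduced here.

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Hypothetical raw p-values (illustration only, NOT data from the study).
# Four strains give six pairwise comparisons.
raw = {
    ("WT", "LTRΔ"): 0.001,
    ("WT", "ARSΔ"): 0.2,
    ("WT", "ALLΔ"): 0.004,
    ("LTRΔ", "ARSΔ"): 0.0001,
    ("LTRΔ", "ALLΔ"): 0.5,
    ("ARSΔ", "ALLΔ"): 0.0005,
}

adjusted = bonferroni(raw)

# Pairs that remain below an assumed 0.05 threshold after correction.
significant = [pair for pair, p in adjusted.items() if p < 0.05]
```

With six comparisons, a raw p-value must fall below roughly 0.0083 to stay under 0.05 after correction, which is why the correction is conservative when many strain pairs are tested.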
(F) Barplot of inferred CNV mechanisms, described in A-D, for each CNV clone isolated from glutamine-limited evolving populations. Complex CNV is defined by a clone having more than two breakpoints in chromosome XI, indicative of more than one amplification event. Inference came from a combination of read depth, split read, and discordant read analysis and the output of CVish (see Methods). Strain is significantly associated with CNV mechanism, Fisher’s exact test, p = 5.0 × 10⁻⁴. There is a significant increase in ODIRA prevalence from WT to LTRΔ, chi-squared test, p = 0.02469. There is a significant decrease in ODIRA prevalence from WT to ARSΔ and ALLΔ, chi-squared test, p = 0.002861 and 0.002196, respectively. There is a significant decrease in LTR NAHR from WT to LTRΔ, chi-squared test, p = 0.03083.
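The definitions in A-D and the complex CNV definition in (F) can be read as a set of decision rules. The sketch below is one possible encoding, not the classification procedure used for this figure: the `CloneEvidence` fields, the order in which rules are checked, and the "Unresolved" fallback label are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CloneEvidence:
    """Hypothetical per-clone summary of breakpoint evidence (illustrative fields)."""
    n_breakpoints_chrXI: int            # breakpoints detected on chromosome XI
    breakpoints_at_ltr: int             # breakpoints located at LTR sites
    has_de_novo_inversion: bool         # inverted sequence at >= 1 breakpoint
    has_novel_ltr_insertion: bool       # >= 1 novel LTR retrotransposon insertion
    breakpoint_at_other_homology: bool  # >= 1 breakpoint at non-LTR homology


def infer_mechanism(e: CloneEvidence) -> str:
    """Apply the legend's definitions in an assumed order of precedence."""
    if e.n_breakpoints_chrXI > 2:
        return "Complex CNV"            # more than two breakpoints on chr XI
    if e.has_de_novo_inversion:
        return "ODIRA"                  # de novo inversion in >= 1 breakpoint
    if e.has_novel_ltr_insertion:
        return "Transposon-mediated"    # novel insertion followed by LTR NAHR
    if e.breakpoint_at_other_homology:
        return "NAHR"                   # >= 1 breakpoint not at proximate LTRs
    if e.breakpoints_at_ltr == 2:
        return "LTR NAHR"               # both breakpoints at LTR sites
    return "Unresolved"                 # assumed fallback, not a legend category


# Toy clone with both breakpoints at LTR sites and no other evidence.
example = CloneEvidence(2, 2, False, False, False)
label = infer_mechanism(example)
```

Checking the transposon-mediated rule before LTR NAHR matters in this sketch, because a clone whose breakpoints sit at newly deposited LTRs would otherwise satisfy the LTR NAHR rule as well.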
(G) Top: Schematic of S. cerevisiae chromosome XI, with LTRs, ARS elements, and tRNA genes annotated (LTR, blue; ARS, purple; tRNA, orange; GAP1 ORF, white rectangle). Using a combination of read depth, split read, and discordant read analysis, we defined the extent of the amplified region, the precise CNV breakpoints, and GAP1 copy number. GAP1 copy numbers were estimated using read depth relative to the average read depth of chromosome XI. Bottom: Dumbbell plots represent the amplified region (>1 copy) relative to the WT reference genome. The ends of the dumbbells mark the approximate CNV breakpoints shown relative to the start codon of the GAP1 ORF (vertical dotted line). Select clones were chosen as representative of the observed diversity of amplifications.
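The read depth estimate described in (G) amounts to a ratio of mean depths. The following is a minimal sketch assuming per-position depth values are already available as lists and that the chromosome XI average corresponds to one copy; the function name `estimate_copy_number` and the depth values are hypothetical, chosen only so the ratio resembles the 2.67 example in (B).

```python
import math


def estimate_copy_number(region_depths, chromosome_depths):
    """Return (relative depth, rounded copy number) for a region.

    relative depth = mean depth over the region / mean depth over the chromosome.
    """
    region_mean = sum(region_depths) / len(region_depths)
    chrom_mean = sum(chromosome_depths) / len(chromosome_depths)
    relative = region_mean / chrom_mean
    # Round half up explicitly; Python's built-in round() rounds halves to even.
    return relative, math.floor(relative + 0.5)


# Toy depths (illustration only): 80x over the region, 30x chromosome average.
relative, copies = estimate_copy_number([80] * 10, [30] * 100)
```

Here the relative depth is about 2.67 and the rounded copy number is 3. In practice the chromosome average may itself be inflated by the amplified segment, so this sketch would slightly underestimate copy number for large amplifications.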
(H) Scatterplots of CNV length for all genome-sequenced clones, n = 177. We defined the upstream and downstream breakpoints as kilobases away from the start codon of the GAP1 ORF (vertical dashed line in the (G) dumbbell plot and in this scatterplot). CNV mechanisms are defined in Figure 4A-D and Methods. Select clones from the (G) dumbbell plots are annotated. Note the focal amplifications resulting from LTR NAHR in WT clones and ARSΔ clones. In ARSΔ, note NAHR between one local and one distal LTR ∼60 kb upstream. In ALLΔ, note focal amplifications mediated by two newly deposited LTR sequences from two transposon insertions. In ALLΔ, also note amplifications formed between one newly inserted LTR and one distal pre-existing LTR sequence, 30 kb or 60 kb upstream.