Spatial inter-centromeric interactions facilitated the emergence of evolutionary new centromeres

Abstract

Centromeres of Candida albicans form on unique and different DNA sequences but a closely related species, Candida tropicalis, possesses homogenized inverted repeat (HIR)-associated centromeres. To investigate the mechanism of centromere type transition, we improved the fragmented genome assembly and constructed a chromosome-level genome assembly of C. tropicalis by employing PacBio sequencing, chromosome conformation capture sequencing (3C-seq), chromoblot, and genetic analysis of engineered aneuploid strains. Further, we analyzed the 3D genome organization using 3C-seq data, which revealed spatial proximity among the centromeres as well as telomeres of seven chromosomes in C. tropicalis. Intriguingly, we observed evidence of inter-centromeric translocations in the common ancestor of C. albicans and C. tropicalis. Identification of putative centromeres in closely related Candida sojae, Candida viswanathii and Candida parapsilosis indicates loss of ancestral HIR-associated centromeres and establishment of evolutionary new centromeres (ENCs) in C. albicans. We propose that spatial proximity of the homologous centromere DNA sequences facilitated karyotype rearrangements and centromere type transitions in human pathogenic yeasts of the CUG-Ser1 clade.

Introduction

The efficient maintenance of the genetic material and its propagation to subsequent generations determine the fitness of an organism. Genomic rearrangements are often associated with the development of multiple diseases, including cancer. Chromosomal rearrangements, on the other hand, are often observed during speciation (Searle, 1998). Such structural changes begin with the formation of at least one DNA double-strand break (DSB), which is generally repaired by homologous recombination (HR) or non-homologous end joining (NHEJ) in vivo. Studies using engineered in vivo model systems suggested that the success of DSB repair through HR depends upon the efficient identification of a template donor. This process of ‘homology search’ is facilitated by physical proximity and the extent of DNA sequence homology (Lee et al., 2016; Agmon et al., 2013; Burgess and Kleckner, 1999). Multi-invasion-induced rearrangements (MIRs) involving more than one template donor have recently been shown to be influenced by physical proximity and homology (Piazza et al., 2017). Therefore, the nature of genomic rearrangements is mostly dependent on the type of spatial genome organization. In yeasts, apicomplexans, and certain plants, centromeres cluster inside the nucleus (Muller et al., 2019), which may facilitate translocations between two chromosomes involving their centromeric and adjacent pericentromeric loci.

The centromere, one of the guardians of genome stability, assembles a large DNA-protein complex to form the kinetochore, which ensures fidelity of chromosome segregation by correctly attaching chromosomes to the spindle. Paradoxically, this conserved process of chromosome segregation is carried out by highly diverse species-specific centromere DNA sequences. For example, the length of centromere DNA is ~125 bp in budding yeast Saccharomyces cerevisiae (Clarke and Carbon, 1980), but it can be as long as a few megabases in humans (Mahtani and Willard, 1990). Centromeres have been cloned and characterized from a large number of fungal species. The only factor that remains common to most fungal centromeres is the presence of histone H3 variant CENP-A^Cse4 except in some Mucorales like Mucor circinelloides (Navarro-Mendoza et al., 2019). Many kinetochore proteins are believed to have evolved from pre-eukaryotic lineages and remained conserved within closely related species complexes or expanded through gene duplication (Meraldi et al., 2006; Tromer et al., 2019; van Hooff et al., 2017). It remains a paradox that despite the rapid evolution of centromere DNA, the kinetochore structure remains relatively well-conserved (Ekwall, 2007). Therefore, an examination of the evolutionary processes driving species-specific changes in centromere DNA is essential for a better understanding of centromere biology.

The first cloned centromere, that of the budding yeast S. cerevisiae, carries conserved genetic elements capable of forming a functional centromere de novo when cloned into a yeast replicative plasmid (Clarke and Carbon, 1980). Such genetic regulation of centromere function also exists in the fission yeast Schizosaccharomyces pombe, where centromeres possess inverted repeat-associated structures of 40–100 kb (Clarke and Baum, 1990). Other closely related budding and fission yeasts were also found to harbor a DNA sequence-dependent regulation of centromere function (Gordon et al., 2011; Tong et al., 2019; Kobayashi et al., 2015), but the advantage of having such genetic regulation is not well understood. In fact, the majority of species with known centromeres are thought to be regulated by an epigenetic mechanism (Ekwall, 2007). A truly epigenetically-regulated fungal centromere carrying a 3–5 kb long CENP-A^Cse4-bound unique DNA sequence exists in another budding yeast, C. albicans (Sanyal et al., 2004), a CUG-Ser1 clade species in the fungal phylum Ascomycota. Subsequently, such unique centromeres were also discovered in the closely related Candida dubliniensis (Padmanabhan et al., 2008) and Candida lusitaniae (Kapoor et al., 2015). Strikingly, all seven centromeres of C. tropicalis, another CUG-Ser1 clade species, carry 3–4 kb long inverted repeats (IR) flanking a ~3 kb long CENP-A^Cse4-rich central core (CC). The centromere sequences are highly similar to each other in C. tropicalis. Intriguingly, centromere DNA of C. tropicalis can facilitate de novo recruitment of CENP-A^Cse4 to some extent (Chatterjee et al., 2016). In contrast, centromeres of C. albicans completely lack such a DNA sequence-dependent mechanism (Baum et al., 2006). Such a rapid transition in the structural and functional properties of centromeres within two closely related species offers a unique opportunity to study the process of centromere type transition.

Kinetochore proteins appeared as a single punctum at the periphery of the nucleus, indicating the presence of constitutively clustered centromeres in C. tropicalis (Chatterjee et al., 2016). Our previous analysis also showed that centromeres of C. tropicalis were located near interchromosomal synteny breakpoints (ICSBs), relics of ancient translocations in the common ancestor of C. tropicalis and C. albicans (Chatterjee et al., 2016). Do homologous centromere DNA regions in close spatial proximity facilitate chromosomal translocation events? Due to the fragmented nature of the then-available genome assembly, the genome-wide distribution of the ICSBs and the spatial organization of the genome in C. tropicalis remained unexplored. However, the near-complete C. albicans genome assembly was available. Therefore, examining whether the spatial proximity of clustered centromeres drives interchromosomal translocation events guiding speciation in the CUG-Ser1 clade required a chromosome-level complete genome assembly of C. tropicalis.

In this study, we constructed a chromosome-level gapless genome assembly of the C. tropicalis type strain MYA-3404 by combining information from previously available contigs, NGS reads and high-throughput 3C-seq data. Using this assembly and 3C-seq data, we studied the spatial genome organization in C. tropicalis. Next, we mapped the ICSBs in the C. tropicalis genome with reference to that of C. albicans (ASM18296v3) to test whether the frequency of ICSB correlated with the spatial genome organization. In addition, we performed Oxford Nanopore and Illumina sequencing and assembled the genome of Candida sojae (strain NCYC-2607), a sister species of C. tropicalis in the CUG-Ser1 clade (Shen et al., 2018). Finally, using this genome assembly of C. sojae and publicly available genome assembly of C. viswanathii (ASM332773v1), we identified the putative centromeres of these two species as HIR-associated loci syntenic to the centromeres of C. tropicalis. Based on our results, we propose a model that suggests homology and proximity guided centromere-proximal translocations facilitated karyotype evolution and possibly aided in rapid transition from HIR-associated to unique centromere types in the members of the CUG-Ser1 clade.

Results

A chromosome-level gapless assembly of the C. tropicalis genome in seven chromosomes

C. tropicalis has seven pairs of chromosomes (Chatterjee et al., 2016; Butler et al., 2009). However, the current publicly available genome assembly (ASM633v3) has 23 nuclear contigs and one mitochondrial contig. To completely assemble the nuclear genome of C. tropicalis in seven chromosomes, we combined results of short-read Illumina sequencing and long-read single molecule real-time sequencing (SMRT-seq) with a high-throughput 3C-seq (simplified Hi-C) experiment (Figure 1A, Figure 1—figure supplement 1A–D; Sexton et al., 2012). We started from the publicly available genome assembly of C. tropicalis strain MYA-3404 in 23 nuclear contigs (ASM633v3, Assembly A) (Butler et al., 2009). We used Illumina sequencing reads to scaffold them into 16 contigs to get Assembly B (Figure 1A). Next, we used the SMRT-seq long reads to join these contigs, which resulted in an assembly of 12 contigs (Assembly C, Supplementary file 1). Based on contour-clamped homogeneous electric field (CHEF) gel karyotyping (Figure 1B) and 3C-seq data (Figure 1—figure supplement 1E–G), we joined two contigs and rectified a misjoin in Assembly C to produce an assembly of seven chromosomes and five short orphan haplotigs (OHs). We suspected that the OHs are heterozygous loci in the diploid genome of C. tropicalis. Analysis of the de novo contigs (Figure 1—figure supplement 1H, Materials and methods), sequence coverage data (Figure 1—figure supplement 2A–B), and Southern hybridization of engineered aneuploid strains demonstrated that the small OHs mapped to heterozygous regions of the genome (Figure 1—figure supplement 2C–I, Materials and methods). Next, we used de novo contigs to fill 104 pre-existing N-gaps and scaffolded 14 sub-telomeres (Figure 1—figure supplement 3A–C, Supplementary file 2). Finally, we used 3C-seq reads to polish the complete genome assembly of C. tropicalis, constituting 14,609,527 bp in seven telomere-to-telomere long gapless chromosomes (Figure 1B). We call this new assembly Assembly2020.

Construction of the gapless assembly of *C. tropicalis* type strain MYA-3404 in seven chromosomes.

(A) Schematic showing the stepwise construction of the gapless chromosome-level assembly (Assembly2020) of *C. tropicalis* (also see Figure 1—figure supplement 1 and Figure 1—figure supplement 2). (B) An ethidium bromide (EtBr)-stained CHEF gel image of separated chromosomes of *C. tropicalis* (strain MYA-3404) and *C. albicans* (strain SC5314) (Materials and methods). *C. albicans* chromosomes are used as size markers for estimation and validation of lengths and identities of *C. tropicalis* chromosomes in the newly constructed Assembly2020. (C) An ideogram of seven chromosomes of *C. tropicalis* as deduced from Assembly2020 and drawn to scale. The genomic locations of the three loci showing copy number variations (CNVs), *DUP4, DUP5* and *DUPR*, located on Chr4, Chr5 and ChrR respectively, are marked and depicted as striped boxes. The CNVs for which the correct homolog-wise distribution of the duplicated copy is unknown are marked with asterisks. Homolog-specific differences for Chr1 and Chr4, which arose from an exchange of chromosomal parts in a balanced heterozygous translocation between Chr1B and Chr4B, are highlighted with black borders (also see Figure 1—figure supplement 4C). (D) A circos plot showing the genome-wide distribution of various sequence features. Very high sequence coverage at the rDNA locus is clipped for more precise representation and marked with an asterisk.

We numbered the chromosomes according to length, from the longest, chromosome 1 (Chr1), through the shortest, chromosome 6 (Chr6). The remaining chromosome, the one containing the rDNA locus, was named chromosome R (ChrR) (Figure 1C). Accordingly, the centromere on each chromosome was named after the respective chromosome number. Additionally, we oriented the DNA sequence of each chromosome so that the short arm is consistently at the 5′ end. The statistics of these genome assemblies of C. tropicalis are summarized in Supplementary file 3. In Assembly2020, 1278 out of 1315 Ascomycota-specific BUSCO gene sets could be identified compared to 1255 identified using Assembly A (Supplementary file 4, Materials and methods). The inclusion of 23 additional BUSCO gene sets suggests significantly improved contiguity and completeness of Assembly2020.
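The BUSCO comparison above can be recomputed with a short sketch. The counts are taken from the text; the function name `busco_completeness` is illustrative and is not part of the BUSCO software.

```python
def busco_completeness(found: int, total: int) -> float:
    """Return the percentage of expected BUSCO gene sets that were identified."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * found / total


# Counts reported in the text for the Ascomycota-specific BUSCO set.
TOTAL_BUSCO = 1315
FOUND_ASSEMBLY_A = 1255
FOUND_ASSEMBLY2020 = 1278

# Number of additional gene sets recovered in the new assembly.
gain = FOUND_ASSEMBLY2020 - FOUND_ASSEMBLY_A

pct_a = busco_completeness(FOUND_ASSEMBLY_A, TOTAL_BUSCO)
pct_2020 = busco_completeness(FOUND_ASSEMBLY2020, TOTAL_BUSCO)
print(f"Assembly A: {pct_a:.1f}%  Assembly2020: {pct_2020:.1f}%  gain: {gain}")
```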

Previously, using centromere-proximal probes, we could distinctly identify five chromosomes (Chr1, Chr2, Chr3, Chr5, and Chr6) in chromoblot analysis (Chatterjee et al., 2016). However, the lengths of Chr4 and ChrR could not be determined. To validate the correct assembly of these two chromosomes (Chr4 and ChrR), we performed additional chromoblot analysis. We observed that Chr4 homologs differed in size (Figure 1—figure supplement 4A). Analysis of the sequence coverage across Chr4 identified an internal duplication of a ~235 kb region, which could explain the size difference between the homologs Chr4A and Chr4B (Figure 1C, Figure 1—figure supplement 4B). We named this duplicated locus DUP4. Subsequently, we scanned the entire genome for the presence of copy number variations (CNVs), which led to the identification of two additional large-scale duplication events: one each on Chr5 (DUP5, ~23 kb) and ChrR (DUPR, ~80 kb) (Figure 1C, Figure 1—figure supplement 4B). Further, using CNAtra software (Khalil et al., 2020), we confirmed these duplication events and identified additional small-scale CNV loci with copy number <1.5 or >2.5 (Figure 1—figure supplement 4C). Additionally, we detected a balanced heterozygous translocation event between Chr1 and Chr4 (Figure 1—figure supplement 5A) through analyses of 3C-seq data and de novo contigs (Figure 1—figure supplement 5B). This translocation was validated using chromoblot analysis (Figure 1—figure supplement 5C) as well as Illumina and SMRT-seq read mapping (Figure 1—figure supplement 5D). For ChrR, chromoblot analysis suggests an actual length of ~2.8 Mb (Figure 1—figure supplement 5E), whereas the assembled length is 2.1 Mb (Figure 1C). Considering that the length of the rDNA locus is ~700 kb in C. albicans (Jones et al., 2004), we reason that the difference between the assembled length and the actual length (derived from chromoblot analysis) of ChrR in C. tropicalis can be attributed to the presence of the repetitive rDNA locus of ~700 kb, which is not completely assembled in Assembly2020.
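The copy number thresholds quoted above (<1.5 or >2.5 in a diploid genome) can be illustrated with a minimal sketch. This is not the CNAtra algorithm: it assumes that the median windowed read depth corresponds to the diploid state, and the function names and depth values are hypothetical.

```python
from statistics import median


def estimate_copy_number(depths, ploidy=2):
    """Scale per-window read depth so that the median window has copy number == ploidy.

    Assumption: most of the genome is at the baseline ploidy.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [ploidy * d / baseline for d in depths]


def flag_cnv(copy_numbers, low=1.5, high=2.5):
    """Label each window as 'loss', 'gain' or 'neutral' using the thresholds from the text."""
    calls = []
    for cn in copy_numbers:
        if cn < low:
            calls.append("loss")
        elif cn > high:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


# Toy depths (hypothetical): two windows at 1.5x depth, one at 0.5x depth.
toy_depths = [100, 100, 100, 150, 150, 100, 50, 100]
toy_copy_numbers = estimate_copy_number(toy_depths)
toy_calls = flag_cnv(toy_copy_numbers)
print(list(zip(toy_copy_numbers, toy_calls)))
```

A window at 1.5x the median depth corresponds to three copies (one homolog duplicated), which is why a cutoff of 2.5 separates such duplications from the diploid baseline.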

Next, we performed phasing of the diploid genome of C. tropicalis using SMRT-seq and 3C-seq data to identify the homolog-specific variations (Materials and methods). This analysis produced 16 nuclear contigs, which were colinear with the chromosomes of Assembly2020, except for the previously validated heterozygous translocation between Chr1 and Chr4 (Figure 1—figure supplement 5F). To characterize the sequence variations in the diploid genome of C. tropicalis, we identified the single nucleotide polymorphisms (SNPs) and insertion-deletion (indel) mutations (Materials and methods). Intriguingly, we detected a long chromosomal region depleted of SNPs and indels on the left arm of ChrR (Figure 1D). We named this region that lost heterozygosity on ChrR LOH^R. Strikingly, we found parts of the syntenic region of LOH^R to be SNP and indel depleted in the C. sojae strain NCYC-2607, a closely related species of C. tropicalis, as well as in the C. albicans reference strain SC5314 (Figure 1—figure supplement 6). We also identified the genome-wide distribution of transposons and simple repeats but could not detect preferential enrichment of these sequence elements at any specific genomic location in C. tropicalis (Figure 1D). Together, we demonstrate, for the first time, multiple CNVs, a long-tract LOH, and evidence of a heterozygous reciprocal translocation event in the diploid genome of C. tropicalis. Possible implications of these events in conferring virulence and drug resistance in this successful human fungal pathogen remain to be explored.

Conserved principle of the spatial genome organization in C. tropicalis and C. albicans

Indirect immunofluorescence imaging of the C. tropicalis strain (CtKS102) expressing Protein-A tagged CENP-A^Cse4 suggested that centromeres are clustered and localized at the periphery of the DAPI-stained nuclear DNA mass as a single punctum (Figure 2A–B). We mapped 3C-seq data generated using DpnII (Materials and methods) to Assembly2020 to construct the genome-wide chromatin contact map of C. tropicalis. The resultant heatmap depicts high signal intensities along the diagonal, indicating that the intrachromosomal interactions are generally stronger than interchromosomal interactions, as observed before (Figure 2C; Duan et al., 2010). However, the most striking feature of the heatmap is the presence of conspicuous puncta in the interchromosomal areas, which signify strong spatial proximity between centromeres (Figure 2C–D). The aggregate signal analysis further confirmed the enrichment of centromere-centromere interactions (Figure 2E). Strikingly, we also noted the enrichment of telomere-telomere interactions as compared to the neighboring regions (Figure 2C–E). Statistical comparison between these telomere-telomere interactions and bulk chromatin revealed that the interchromosomal telomeric interactions were significantly greater than all interchromosomal interactions (Mann-Whitney U test P value = 1.129⋅10⁻¹¹) (Figure 2—figure supplement 1A). In addition, cis interactions between the two telomeres of an individual chromosome (intrachromosomal telomeric interactions) were also significantly enhanced compared to all intrachromosomal long-range (>100 kb) interactions (Mann-Whitney U test P value = 7.374⋅10⁻¹¹) (Figure 2—figure supplement 1B). All these lines of evidence prompted us to propose that C. tropicalis chromosomes adopt the Rabl-like configuration, a characteristic feature of the higher-order genome organization in yeasts (Duan et al., 2010; Descorps-Declère et al., 2015; Burrack et al., 2016).
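The Mann-Whitney U comparisons above rank two groups of contact values jointly and ask whether one group tends to rank higher. The following minimal sketch computes the U statistic with average ranks for ties. The software used for the reported P values is not stated here, and the contact values below are toy numbers, not data from this study.

```python
def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic for sample x against sample y.

    U equals the number of (x_i, y_j) pairs with x_i > y_j, counting ties as 0.5.
    """
    # Tag each value with its group (0 for x, 1 for y) and sort jointly.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at index i.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            if combined[k][1] == 0:
                rank_sum_x += avg_rank
        i = j
    n1 = len(x)
    return rank_sum_x - n1 * (n1 + 1) / 2.0


# Toy contact values (hypothetical): telomere-telomere bins vs. all other bins.
telomeric = [5.0, 6.0, 7.0]
background = [1.0, 2.0, 3.0]
u_stat = mann_whitney_u(telomeric, background)

# Fraction of pairs in which the telomeric value exceeds the background value.
effect = u_stat / (len(telomeric) * len(background))
print(u_stat, effect)
```

For real data, an established implementation such as `scipy.stats.mannwhitneyu` would also provide the P value.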

Spatial genome organization reveals centromere-centromere and telomere-telomere contacts in *C. tropicalis*.

(A) A representative field image of *C. tropicalis* (strain CtKS102) cells expressing Protein-A tagged CENP-A^Cse4. CENP-A signals (red) were obtained using anti-Protein A antibodies by indirect immunofluorescence microscopy. Nuclei of the corresponding cells were stained by DAPI (blue). The images were acquired using a DeltaVision imaging system (GE) and processed using FIJI software (Schindelin et al., 2012). Scale, 2 µm. (B) A 3D reconstruction showing clustered kinetochores marked by CENP-A^Cse4 (red) at the periphery of the DAPI-stained nucleus (blue) using Imaris software (Oxford Instruments) in *C. tropicalis*. Scale, 2 µm. (C) A genome-wide contact probability heatmap (bin size = 10 kb) generated using 3C-seq data. Chromosome labels and their corresponding ideograms are shown on the axes of the heatmap. Colorbar represents the contact probability in the log2 scale. (D) Zoomed-in view of the heatmap showing Chr4 and Chr5 from panel C (blue box). (E) Heatmaps plotted from aggregate signal analysis of matrices (bin size = 2 kb) surrounding centromere-centromere (top) or telomere-telomere interactions (bottom). *Top*, genomic loci containing mid-points of centromeres are aligned at the center; *bottom*, genomic loci from 5′ or 3′ ends of chromosomes are aligned at the bottom right corner.

Previously, microscopic and Hi-C studies revealed similar centromere clustering and strong physical interactions among centromeres in C. albicans (Burrack et al., 2016; Sreekumar et al., 2019a; Sreekumar et al., 2019b). This study now reveals that despite substantial karyotypic changes, a conserved principle of genome organization exists in two yeast species, C. albicans and C. tropicalis, with diverged centromere features.

Centromere and telomere proximal loci are hotspots for complex translocations

Using the chromosome-level assemblies of C. tropicalis type strain MYA-3404 and C. albicans type strain SC5314 (ASM18296v3), we performed a detailed genome-wide synteny analysis employing four different approaches. We used two analytical tools, Symap (Soderlund et al., 2011) and Satsuma synteny (Grabherr et al., 2010), and a custom approach to identify the ICSBs based on the synteny of the conserved orthologs (Figure 3A). Next, we compared and validated the results obtained from our custom approach of analysis with another published tool Synchro (Drillon et al., 2014). Considering the C. albicans genome as the reference, all four methods of analyses suggest that six out of seven centromeres (except CEN6) of C. tropicalis are located proximal to multiple ICSBs (Figure 3A, Figure 3—figure supplement 1A). Although it appears that CtCEN6 escaped inter-centromeric translocations, synteny analysis suggested that a chromosomal region carrying three consecutive CtCEN6-proximal ORFs was lost in the C. albicans genome (Figure 3—figure supplement 1B). Strikingly, these ICSBs are rare at the chromosomal arms (Figure 3A). ORF-level synteny analysis further revealed that four out of seven centromeres (CEN2, CEN3, CEN5, and CENR) in C. tropicalis are precisely located at the ICSBs (Figure 3—figure supplement 1C), while multiple ICSBs are located within ~100 kb of other two centromeres (Figure 3A). Additionally, a convergence of orthoblocks from as many as four different chromosomes of C. albicans was detected within 100 kb of C. tropicalis centromeres (Figure 3B). It is important to note that by using the C. tropicalis genome as the reference, all centromeres of C. albicans, except CaCEN2, were found to be associated with ICSBs (Figure 3—figure supplement 1D). Taken together, centromeres of both these species are found to be associated with chromosomal translocations.
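The idea behind ortholog-based ICSB mapping can be sketched as follows, using the definition of an orthoblock adopted in this study (a stretch of the target genome carrying more than two syntenic ORFs from the same reference chromosome). This is a simplified illustration and not the custom pipeline used here: the input format, the function names, and the toy labels are assumptions, and ORF order and orientation within a block are ignored.

```python
from itertools import groupby


def find_orthoblocks(ref_chroms, min_orfs=3):
    """Group consecutive target ORFs whose orthologs lie on the same reference chromosome.

    ref_chroms: reference-chromosome label of each ORF, in target-genome order.
    Returns (label, first_index, last_index) for every run of at least min_orfs ORFs.
    """
    blocks = []
    position = 0
    for label, run in groupby(ref_chroms):
        length = len(list(run))
        if length >= min_orfs:
            blocks.append((label, position, position + length - 1))
        position += length
    return blocks


def find_icsbs(blocks):
    """Return (last ORF of left block, first ORF of right block) wherever
    neighbouring orthoblocks derive from different reference chromosomes."""
    breakpoints = []
    for left, right in zip(blocks, blocks[1:]):
        if left[0] != right[0]:
            breakpoints.append((left[2], right[1]))
    return breakpoints


# Toy target chromosome (hypothetical labels): orthologs map to C. albicans
# chromosomes "Ca1", "Ca3" and "Ca2"; the single "Ca5" ORF is too short a run
# to count as an orthoblock.
toy_orfs = ["Ca1"] * 4 + ["Ca3"] * 3 + ["Ca5"] + ["Ca3"] * 3 + ["Ca2"] * 5
toy_blocks = find_orthoblocks(toy_orfs)
toy_icsbs = find_icsbs(toy_blocks)
print(toy_blocks)
print(toy_icsbs)
```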

Genome-wide mapping of interchromosomal synteny breakpoints in *C. tropicalis* identifies a spatial cue for karyotype evolution.

(A) Scaled representation of the color-coded orthoblocks (relative to *C. albicans* chromosomes) and ICSBs (white lines) in *C. tropicalis* (Materials and methods). Orthoblocks are defined as stretches of the target genome (*C. tropicalis*) carrying more than two syntenic ORFs from the same chromosome of the reference genome (*C. albicans*). The centromeres are represented with black arrowheads. (B) Zoomed-in view of the *C. tropicalis* centromere-specific ICSBs on *CEN2*, *CEN3*, *CEN5* and *CENR* showing the color-coded (relative to *C. albicans* chromosomes) ORFs flanking each centromere. *C. tropicalis*-specific unique ORFs proximal to *CEN3* and *CEN5* are shown in red. (C) A plot showing the chromosome-wise ICSB density, calculated as the number of ICSBs per 100 kb of the *C. tropicalis* genome (y-axis), as a function of the linear distance from the centromere in nine bins. These bins are (a) 0–100 kb on both sides of the centromere (bin I), (b) 100–200 kb (bin II), (c) 200–300 kb (bin III), (d) 300–400 kb (bin IV), (e) 400–500 kb (bin V), (f) 500–600 kb (bin VI), (g) 600–700 kb (bin VII), (h) >700 kb to 200 kb from telomere ends (bin VIII), and (i) 200 kb from the telomere ends (bin IX). Chr6 was excluded from this analysis, as it does not harbor any ICSB. (D) A violin plot comparing the distribution of lengths of orthoblocks (y-axis) at three different genomic zones: (a) the centromere-proximal zone (CP), (b) the centromere-distal zone (CD), and (c) the telomere-proximal zone (TP). Orthoblocks that span more than one zone were assigned to the zone with maximum overlap. The centromere-distal dataset was compared with the other two groups using the Mann-Whitney U test and the respective P values are mentioned. (E–F) Circos plots representing the convergence of centromere-proximal ORFs of *C. tropicalis* chromosomes near the centromeres (*CEN4* and *CEN7*) of *C. albicans*. Chromosomes of *C. tropicalis* and *C. albicans* are marked with black and purple filled circles at the beginning of each chromosome, respectively.

To correlate the frequency of translocations with the spatial genome organization, we quantified ICSB density (the number of ICSBs per 100 kb of the genome) for different zones across the chromosome for all chromosomes except CtChr6 (Figure 3C). Our analysis reveals that the ICSB density is maximum at the centromere-proximal zones for all six chromosomes, but drops sharply at the chromosomal arms. However, the ICSB density near the telomere-proximal zone for Chr2, Chr4, and ChrR shows an increase compared to the chromosomal arms, albeit at a lower magnitude than centromeres. We also compared the lengths of orthoblocks across three different genomic zones - the centromere-proximal (0–300 kb from the centromere on both sides), centromere-distal (>300 kb from the centromere to 200 kb away from the telomere ends), and telomere-proximal (0–200 kb from the telomere ends) zones. This analysis further reveals that the lengths of the orthoblocks located proximal to centromeres and telomeres are significantly smaller than orthoblocks located at the centromere-/telomere-distal zones (Figure 3D).
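The ICSB density described above (number of ICSBs per 100 kb as a function of distance from the centromere) can be sketched for a single chromosome as follows. This simplified version covers only centromere-anchored distance bins and omits the telomere-anchored bins used in the actual analysis; the function name, coordinates, and ICSB positions are hypothetical.

```python
def icsb_density(chrom_len, cen_mid, icsb_positions, bin_size=100_000, n_bins=7):
    """Return ICSBs per 100 kb for successive distance bins around the centromere.

    Each bin covers distances [b*bin_size, (b+1)*bin_size) from the centromere
    midpoint on both arms; bins are clipped where an arm is shorter than the bin.
    """
    densities = []
    for b in range(n_bins):
        lo, hi = b * bin_size, (b + 1) * bin_size
        # Base pairs actually available at this distance on each arm.
        left_bp = max(0, min(hi, cen_mid) - lo)
        right_bp = max(0, min(hi, chrom_len - cen_mid) - lo)
        covered_bp = left_bp + right_bp
        count = sum(1 for p in icsb_positions if lo <= abs(p - cen_mid) < hi)
        densities.append(count / (covered_bp / 100_000) if covered_bp else 0.0)
    return densities


# Toy chromosome (hypothetical): 1 Mb long, centromere midpoint at 500 kb,
# three ICSBs within 100 kb of the centromere and one further out.
toy_density = icsb_density(
    chrom_len=1_000_000,
    cen_mid=500_000,
    icsb_positions=[480_000, 510_000, 590_000, 650_000],
    n_bins=3,
)
print(toy_density)
```

Normalizing by the base pairs actually covered keeps short arms from artificially lowering the density in distal bins.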

We further probed into the consequences of strong inter-centromeric interactions, as described above. Synteny analysis across centromere-proximal regions of the two species hints that inter-centromeric translocations may have occurred in the common ancestor of C. albicans and C. tropicalis. If such is the case, the centromere-proximal ORFs of different chromosomes in C. tropicalis should have converged on the C. albicans genome. Indeed, we identified at least ten loci where a convergence of C. tropicalis ORFs from different chromosomes had taken place in C. albicans (Figure 3—figure supplement 1E). Intriguingly, we found four such loci that are proximal to the centromeres (CEN3, CEN4, CEN7, and CENR) in C. albicans (Figure 3E–F, Figure 3—figure supplement 1F–G). This observation strongly supports the possibility of inter-centromeric translocation events in the common ancestor of C. albicans and C. tropicalis. Additionally, the other four centromeres in C. albicans are located proximal to ORFs, orthologs of which are also proximal to the centromeres in C. tropicalis (Figure 3—figure supplement 1E). We posit that the ancestral HIR-associated centromeres were lost in C. albicans, and ENCs formed proximal to the ancestral centromere loci on unique DNA sequences. A similar centromere type transition within two isolates of C. parapsilosis, another species of the CUG-Ser1 clade, has been recently reported (Ola et al., 2020).

Rapid transition in the centromere type within the members of the CUG-Ser1 clade

Since multiple translocation events near centromeric regions of the C. tropicalis genome could be detected, we hypothesized that complex translocations between HIR-associated centromeres in the common ancestor of C. albicans and C. tropicalis led to the loss of HIR and the evolution of unique centromere types observed in C. albicans and C. dubliniensis. However, genomic rearrangements are rare events, even at the evolutionary time scale. Therefore, if HIR-associated centromeres are the ancestral state from which unique centromeres were derived, some other closely related species should have retained HIR-associated centromeres. Indeed, we identified eight HIR-associated structures in the reference genome of C. parapsilosis strain CDC317 (ASM18276v2). Identification of the HIR-associated structures present at the intergenic and transcription-poor regions, one each on all eight chromosomes, suggests that these loci are the putative centromeres of C. parapsilosis. Indeed, it was recently reported that all eight CENP-A^Cse4 enriched centromeres in the CLIB214 strain of C. parapsilosis are located at HIR-associated loci (Ola et al., 2020). Based on these lines of evidence, we conclude that the common ancestor of C. albicans and C. tropicalis possibly carried HIR-associated centromeres. Surprisingly, two centromeres in another isolate (90-137) of C. parapsilosis have been shown to be formed on non-HIR-associated loci (Ola et al., 2020). However, the driving force triggering polymorphisms in centromere locations within the same species is yet to be understood.

Although IRs are present in CEN4, CEN5, and CENR of C. albicans, these sequences are not homogenized like the HIR-associated centromeres in C. tropicalis (Figure 4A). To study the presence of HIRs in C. sojae (NCYC-2607), a sister species of C. tropicalis (Shen et al., 2018), we assembled its genome into 42 contigs, including seven chromosome-length contigs (Materials and methods). Using this assembly, we identified seven putative centromeres in C. sojae as intergenic and HIR-associated loci syntenic to the centromeres in C. tropicalis (Figure 4—figure supplement 1A–C). Each of these seven putative centromeres in C. sojae consists of a ~2 kb long CC region flanked by 3–12 kb long inverted repeats (Supplementary file 5). Using a similar approach, we identified six HIR-associated centromeres in the publicly available genome assembly (ASM332773v1) of Candida viswanathii, another species closely related to C. tropicalis (Figure 4—figure supplement 1D–E, Supplementary file 6; Tsui et al., 2008). A dot-plot analysis identified the presence of homologous sequences shared across IRs but not among the CC elements (Figure 4A) of the HIR-associated centromeres present in C. tropicalis and the putative centromeres of C. sojae and C. viswanathii (Supplementary file 7). Moreover, we detected extensive structural conservation in centromere DNA elements, especially among IRs within an individual species (Figure 4—figure supplement 2A). These structural features of IRs are also significantly conserved across the three species, C. tropicalis, C. sojae, and C. viswanathii (Figure 4—figure supplement 2B).

Genome-wide analysis of centromere DNA sequences across the CUG-Ser1 clade reveals the emergence of unique centromeres from an ancestral homogenized inverted repeat-associated centromere type.

(A) A dot-plot matrix representing the sequence and structural homology among species of the CUG-Ser1 clade was generated using Gepard (Materials and methods). (B) A logo plot showing the 12-bp-long IR-motif, identified using the MEME Suite (Materials and methods). (C) The distribution of IR-motif density on centromere DNA sequences and across the entire genome of each species was calculated as the number of motifs per kb of DNA (Materials and methods). Note that *C. albicans* and *C. dubliniensis* centromeres that form on unique and different DNA sequences do not contain the IR-motif. (D) IGV track images showing the IR-motif density across seven chromosomes of *C. tropicalis.* The location of the centromere on each chromosome is marked with a black arrowhead. (E) IGV track images showing the IR-motif distribution across seven HIR-associated centromeres of *C. tropicalis.*

Cloning of a full-length centromere of C. tropicalis in a replicative plasmid facilitated de novo CENP-A^Cse4 deposition, but the centromere failed to do so when the native IRs were replaced with CaCEN5 IRs (Chatterjee et al., 2016). This result indicated that DNA sequence specificity is required for centromere function in C. tropicalis. To identify the DNA sequence acting as a putative genetic element, we analyzed centromere DNA sequences of all three Candida species with HIR-associated centromeres and the unique centromeres of C. albicans for the presence of any conserved motif(s) (Materials and methods). This analysis identified a highly conserved 12-bp motif (dubbed the IR-motif) (Figure 4B) clustered specifically at centromeres but not anywhere else in the entire genome of C. tropicalis, C. sojae and C. viswanathii (Figure 4C–D, Figure 4—figure supplement 2C). In contrast, the IR-motif density at centromeres in C. albicans remains approximately an order of magnitude lower than that of C. tropicalis (Figure 4C). This observation indicates a potential function of IR-motifs in the regulation of de novo CENP-A^Cse4 loading in C. tropicalis. Moreover, this CEN-enriched motif found at IRs is absent at the central core region in C. tropicalis (Figure 4E) and at the putative centromeres in C. sojae and C. viswanathii (Figure 4—figure supplement 2D). Additionally, we noted that the direction of the IR-motif is diverging away from the central core in C. tropicalis (Figure 4—figure supplement 2E) as well as in the other two species (Figure 4—figure supplement 2F). The conserved structure and organization of the IR-motif sequences in the HIR-associated centromeres of three Candida species suggest an inter-species conserved function of the IR DNA sequence. However, the clusters of IR-motifs are located at a variable distance from CC in these species (Figure 4—figure supplement 2G). The importance of the sequence and the density of IR-motifs for centromere function is yet to be determined.
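Motif density, expressed as the number of motifs per kb of DNA, can be computed along the lines of the sketch below. It counts exact matches on both strands, whereas motif scanning tools typically score matches against a position weight matrix. The 12-bp sequence used here is a made-up placeholder and not the IR-motif identified in this study; the function names are likewise illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(seq, motif):
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_density_per_kb(seq, motif):
    """Return exact motif matches on both strands per kb of sequence."""
    seq, motif = seq.upper(), motif.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    forward = count_occurrences(seq, motif)
    rc = reverse_complement(motif)
    # A palindromic motif would otherwise be counted twice.
    reverse = 0 if rc == motif else count_occurrences(seq, rc)
    return (forward + reverse) / (len(seq) / 1000)


# Placeholder 12-bp motif (hypothetical, for illustration only).
PLACEHOLDER_MOTIF = "ACGTTGCAAGGT"

# Toy 1 kb sequences: one forward copy, and one copy on each strand.
seq_one = "A" * 494 + PLACEHOLDER_MOTIF + "A" * 494
seq_two = PLACEHOLDER_MOTIF + "T" * 976 + reverse_complement(PLACEHOLDER_MOTIF)
print(motif_density_per_kb(seq_one, PLACEHOLDER_MOTIF))
print(motif_density_per_kb(seq_two, PLACEHOLDER_MOTIF))
```

Comparing this density at centromeres with the density over the whole genome is the kind of contrast that reveals centromere-specific enrichment.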

Discussion

In this study, we improved the current genome assembly of the human fungal pathogen C. tropicalis by employing SMRT-seq, 3C-seq, and chromoblot experiments, and present Assembly2020, the first chromosome-level gapless genome assembly of this organism. We further identified three large-scale duplication events and a few small-scale CNV loci in its genome, phased the diploid genome of C. tropicalis, and mapped SNPs and indels. We constructed a genome-wide chromatin contact map and identified significant centromere-centromere as well as telomere-telomere spatial interactions. Comparative genome analysis between C. albicans and C. tropicalis reveals that six out of seven centromeres of C. tropicalis map precisely at or proximal to ICSBs. Strikingly, ORFs proximal to the centromeres of C. tropicalis converge into specific regions on the C. albicans genome, suggesting that inter-centromeric translocations may have occurred in their common ancestor. Moreover, the presence of HIR-associated putative centromeres in C. sojae and C. viswanathii, like in C. tropicalis, suggests that such a centromere structure is plausibly the ancestral form in the CUG-Ser1 clade but was lost in both C. albicans and C. dubliniensis. We propose that loss of such a centromere structure might have occurred during translocation events involving centromeres of homologous DNA sequences in the common ancestor, giving rise to ENCs on unique DNA sequences and facilitating speciation.

Unlike other centromeres, CEN6 of C. tropicalis did not seem to undergo inter-centromeric translocations. A closer analysis revealed that three CEN6-associated ORFs of C. tropicalis are absent in the C. albicans genome while the other flanking ORFs remain conserved. This observation can be explained by a double-stranded DNA break at the centromere followed by the fusion of broken ends resulting in the loss of those ORFs.

The availability of the chromosome-level genome assembly and improved annotations of genomic variants and genes absent in the publicly available fragmented genome assembly of C. tropicalis should greatly facilitate genome-wide association studies to understand the pathobiology of this organism including the cause of antifungal drug resistance. Besides, this study sheds light on how genetic elements required for de novo centromere establishment in an ancestral species could be lost in the derived lineages to give rise to epigenetically-regulated centromeres.

C. tropicalis is a human pathogenic ascomycete, closely related to the well-studied model fungal pathogen C. albicans (Legrand et al., 2019). These two species diverged from their common ancestor ~39 million years ago (Kumar et al., 2017) and evolved distinct karyotypes (Chatterjee et al., 2016), phenotypic traits (Cavalheiro and Teixeira, 2018), and ecological niches (Pappas et al., 2018). While C. albicans remains the primary cause of candidiasis worldwide, systemic ICU-acquired candidiasis is primarily (30.5–41.6%) caused by C. tropicalis in tropical countries including India (Chakrabarti et al., 2015), Pakistan (Farooqi et al., 2013), and Brazil (da Costa et al., 2014). Moreover, the occurrence of drug resistance, particularly multidrug resistance, in C. tropicalis is on the rise (Chakrabarti et al., 2015; Xiao et al., 2015; Gonçalves et al., 2016). Therefore, the relatively less-studied C. tropicalis is emerging as a major threat for nosocomial candidemia, with a broad-spectrum mortality rate of 29–72% (Lamoth et al., 2018). Fluconazole resistance in C. albicans can be gained through segmental aneuploidy of Chr5, which contains long IRs at its centromere, by the formation of isochromosomes (Selmecki et al., 2006); a similar event was also identified in Chr4, which has IRs at its centromere (Todd et al., 2019). All seven centromeres in C. tropicalis are associated with long IRs with the potential to form isochromosomes.

Since the mechanism of homology search during HR is positively influenced by spatial proximity and the extent of DNA sequence homology (Agmon et al., 2013; Seeber et al., 2018), at least in the engineered model systems, it is expected that spatially clustered homologous DNA sequences undergo more translocation events than other loci. Although these factors were not shown to be involved in karyotypic rearrangements during speciation, a retrospective survey in light of spatial proximity and homology now offers a better explanation. For example, the bipolar to the tetrapolar transition of the mating type locus in the Cryptococcus species complex was associated with inter-centromeric recombination following pericentric inversion (Sun et al., 2017). Similar inter-centromeric recombination has been reported in the common ancestor of two fission yeast species, Schizosaccharomyces cryophilus and Schizosaccharomyces octosporus (Tong et al., 2019). These examples raise the intriguing notion that centromeres serve as sites of recombination, which may lead to centromere loss and/or the emergence of ENCs. This notion is supported by the fact that DSBs at centromeres, followed by fusion of the acentric fragments to other chromosomes, led to chromosome number reduction in Ashbya species (Gordon et al., 2011) and Malassezia species (Sankaranarayanan et al., 2020). Genomic instability at the centromere can also lead to fluconazole resistance, as in the case of isochromosome formation on Chr5 of C. albicans (Selmecki et al., 2006). Additionally, breaks at the centromeres were reported to be associated with cancers in humans (Barra and Fachinetti, 2018).

What would be the consequence of the spatial proximity of chromosomal regions with high DNA sequence homology in other domains of life? Interchromosomal contacts between chromosome pairs have been correlated with the number of translocation events in both naturally occurring populations and experimentally induced mammalian cells (Arsuaga et al., 2004; Bickmore and Teague, 2002; Branco and Pombo, 2006; Canela et al., 2017; Engreitz et al., 2012; Hlatky et al., 2002; Holley et al., 2002; Klein et al., 2011; Roukos et al., 2013; Zhang et al., 2012). It has been suggested that contacts between various chromosomal territories as well as their relative positions in the nucleus influence the sites and frequency of translocation events both in flies and mammals (Engreitz et al., 2012; Aten et al., 2004; Foster et al., 2013; Savage, 1998; Savage, 2000; Meaburn, 2016). While centromeres remain clustered either throughout the cell cycle or most parts of it in many fungal species, such is not the case in metazoan cells. Nevertheless, one of the well-studied translocation events, Robertsonian translocation (RT) involving fusion between arms of two different chromosomes near a centromere, is the most frequently detected chromosomal abnormality in humans (Therman et al., 1989). The occurrence of RT was first reported in grasshoppers (Robertson, 1916) and subsequently it has been implicated in the karyotype evolution in humans (Therman et al., 1989), mice (Castiglia and Capanna, 2002; Dumas and Britton-Davidian, 2002), and wheat (Friebe et al., 2005). Moreover, RTs cause sterility in humans (Guichaoua et al., 1990), are often linked with the heterogeneity of carcinomas (Hermsen et al., 2005), and are implicated in genetic disorders (Mattei et al., 1984). Intriguingly, cytological and Hi-C based evidence (Imakaev et al., 2012) of spatial proximity (reviewed in Muller et al., 2019) among the repeat-associated centromere DNA sequences (Kalitsis et al., 2006) in these species supports the possibility that RTs may have been guided by spatial proximity. Similarly, chromoplexy, involving a series of translocation events among multiple chromosomes without alterations in the copy number, was identified in prostate cancers (Zhang et al., 2013; Baca et al., 2013). Although fine mapping of translocation events at the repetitive regions in human cancer cells is challenging, the growing evidence that such events are associated with the formation of micronuclei (Crasta et al., 2012) supports the idea that the spatial genome organization may influence chromoplexy as well (Meaburn et al., 2007).

The identification of HIR-associated putative centromeres in C. parapsilosis, C. sojae, and C. viswanathii supports the idea that the unique centromeres might have evolved from an ancestral HIR-associated centromere (Coughlan et al., 2016; Figure 5A). While HIR-associated centromeres of C. tropicalis, C. sojae, and C. viswanathii form on different DNA sequences, a well-conserved IR-motif was identified in this study that is present in multiple copies on the centromeric IR sequences across these three species. Some centromeres in C. albicans carry chromosome-specific IRs but lack IR-motifs. Besides, CaCEN5 IRs could not functionally complement the centromere function in C. tropicalis for the de novo CENP-A^Cse4 recruitment. This indicates a possible role of the conserved IR-motifs in species-specific centromere function (Chatterjee et al., 2016). Therefore, the loss of HIR-associated centromeres in C. albicans, whose centromeres are only epigenetically propagated (Baum et al., 2006), illustrates how the ability to establish kinetochore assembly de novo in an ancestral lineage can be lost in a derived lineage. However, the mechanism through which IR-motifs may regulate centromere identity remains to be explored.

Figure 5

The spatial genome organization remained conserved in the CUG-Ser1 clade despite centromere type diversity.

(A) A maximum likelihood-based phylogenetic tree of closely related CUG-Ser1 species analyzed in this study. The centromere structure of each species is shown and drawn to scale. (B) A model showing possible events during the loss of HIR-associated centromeres and emergence of the unique centromere type through inter-centromeric translocations that possibly occurred in the common ancestor of *C. tropicalis* and *C. albicans.* The model is drawn to show translocation events involving two *C. tropicalis* chromosomes (CtChr3 and CtChr4) as representatives, which can be mapped proximal to the centromere on *C. albicans* ChrR (CaChrR) as shown in Figure 3F. (C) Rabl-like chromosomal conformation is maintained despite inter-centromeric translocations that facilitated centromere type transition.

Loss of HIR-associated centromeres during inter-centromeric translocations or MIR must have been catastrophic for the cell, and the survivor was obligated to activate another centromere at an alternative locus. How is such a location determined? Artificial removal of a native centromere in C. albicans leads to the activation of a neocentromere (Thakur and Sanyal, 2013; Ketel et al., 2009), which then becomes part of the centromere cluster (Burrack et al., 2016). This evidence supports the existence of a spatial determinant, known as the CENP-A cloud or CENP-A-rich zone (Thakur and Sanyal, 2013; Fukagawa and Earnshaw, 2014), influencing the preferential formation of neocentromeres at loci proximal to the native centromere (Thakur and Sanyal, 2013; Scott and Sullivan, 2014). We found that the unique and different centromeres of C. albicans are located proximal to the ORFs, which are also proximal to the centromeres in C. tropicalis. This observation indicates that the formation of the new centromeres in C. albicans may have been influenced by spatial proximity to the ancestral centromere cluster. However, new centromeres of C. albicans are formed on loci with completely unique and different DNA sequences. Similar to centromeres of C. albicans, centromere repositioning events may lead to the formation of ENCs, which are often associated with speciation in mammals (Rocchi et al., 2012; Stanyon et al., 2008). It was found that the location of one centromere in horse varies across individuals (Wade et al., 2009; Purgato et al., 2015). Although there are cases where ENCs formed without genomic rearrangements, the driving force facilitating centromere relocation was proposed to be associated with chromosomal inversion and translocation in certain cases (Schubert, 2018). For these reasons, it may be logical to consider the centromeres of C. albicans as ENCs (Figure 5B). Intriguingly, even after the catastrophic chromosomal rearrangements, the ENCs in C. albicans remain clustered, similar to those in C. tropicalis (Figure 5C). This observation identifies spatial clustering of centromeres as a matter of cardinal importance for the fungal genome organization.

Materials and methods

Media

C. tropicalis and C. sojae strains (Supplementary file 8) used in this study were grown in non-selective YPDU (2% dextrose, 2% peptone, 1% yeast extract, and 0.01% uracil) and incubated at 30°C with shaking at 180 rpm. For growing C. albicans strains, YPD medium was supplemented with 0.1 mg/mL of uridine. The transformation of C. tropicalis was performed as described previously (Chatterjee et al., 2016). The selection of transformants was based on prototrophy for the metabolic markers used. In the case of selection for the antibiotic marker (CaSAT1), conferring nourseothricin (NTC) resistance, growth medium was supplemented with 100 μg/mL NTC (Werner Bioagents, CAS No. 96736-11-7). Recycling of the CaSAT1 marker was done by growing the NTC-resistant strains in YPMU (4% maltose, 2% peptone, 1% yeast extract, and 0.01% uracil), and NTC-sensitive segregants were selected by patching them on YPDU and on YPDU supplemented with NTC. For counter-selection against CaURA3, 5-fluoroorotic acid (5-FOA; Sigma-Aldrich, CAS No. 207291-81-4) was used at a concentration of 1 mg/mL. The strains, primers, and plasmids used in this study are listed in Supplementary files 8, 9, and 10, respectively.
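As a worked example of the arithmetic behind these recipes, the following Python sketch converts the stated percentages and supplement concentrations into amounts for a chosen batch volume. It assumes the percentages are weight/volume (1% = 1 g per 100 mL), which the text does not state explicitly, and the function and variable names are hypothetical helpers, not part of any published protocol.

```python
def grams_needed(percent_wv: float, volume_ml: float) -> float:
    """Grams of solute for a % (w/v) component; assumes 1% = 1 g per 100 mL."""
    return percent_wv / 100.0 * volume_ml

def recipe(components: dict, volume_ml: float) -> dict:
    """Map each component name to the grams required for the given volume."""
    return {name: grams_needed(pct, volume_ml) for name, pct in components.items()}

def supplement_mg(conc_ug_per_ml: float, volume_ml: float) -> float:
    """Milligrams of a supplement dosed in micrograms per mL."""
    return conc_ug_per_ml * volume_ml / 1000.0

# Compositions as listed in the Media section (percentages assumed to be w/v).
YPDU = {"dextrose": 2.0, "peptone": 2.0, "yeast extract": 1.0, "uracil": 0.01}
YPMU = {"maltose": 4.0, "peptone": 2.0, "yeast extract": 1.0, "uracil": 0.01}

batch_ml = 500.0
ypdu_batch = recipe(YPDU, batch_ml)
ypmu_batch = recipe(YPMU, batch_ml)
ntc_mg = supplement_mg(100.0, batch_ml)    # NTC at 100 ug/mL
foa_mg = supplement_mg(1000.0, batch_ml)   # 5-FOA at 1 mg/mL = 1000 ug/mL

for name, grams in ypdu_batch.items():
    print(f"{name}: {grams:g} g per {batch_ml:g} mL")
```

Keeping each recipe as a dictionary makes it straightforward to scale a batch or to add another medium without changing the helper functions.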

Author details

Krishnendu Guin
Yao Chen
Radha Mishra
Siti Rawaidah BM Muzaki
Bhagya C Thimmappa
Caoimhe E O'Brien
Geraldine Butler
Amartya Sanyal (for correspondence)
Kaustuv Sanyal (for correspondence)
