Microevolution of Trypanosoma cruzi reveals hybridization and clonal mechanisms driving rapid genome diversification

  1. Gabriel Machado Matos
  2. Michael D Lewis
  3. Carlos Talavera-López
  4. Matthew Yeo
  5. Edmundo C Grisard
  6. Louisa A Messenger
  7. Michael A Miles
  8. Björn Andersson  Is a corresponding author
  1. Departamento de Biologia Celular, Embriologia e Genética, Universidade Federal de Santa Catarina, Brazil
  2. Department of Cell and Molecular Biology, Karolinska Institute, Sweden
  3. Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, United Kingdom
  4. Institute of Computational Biology, Computational Health Centre, Helmholtz Munich, Germany
  5. Departamento de Microbiologia, Imunologia e Parasitologia, Universidade Federal de Santa Catarina, Brazil

Abstract

Protozoa and fungi are known to have extraordinarily diverse mechanisms of genetic exchange. However, the presence and epidemiological relevance of genetic exchange in Trypanosoma cruzi, the agent of Chagas disease, has been controversial and debated for many years. Field studies have identified both predominantly clonal and sexually recombining natural populations. Two of six natural T. cruzi lineages (TcV and TcVI) show hybrid mosaicism, using analysis of single-gene locus markers. The formation of hybrid strains in vitro has been achieved and this provides a framework to study the mechanisms and adaptive significance of genetic exchange. Using whole genome sequencing of a set of experimental hybrids strains, we have confirmed that hybrid formation initially results in tetraploid parasites. The hybrid progeny showed novel mutations that were not attributable to either (diploid) parent showing an increase in amino acid changes. In long-term culture, up to 800 generations, there was a variable but gradual erosion of progeny genomes towards triploidy, yet retention of elevated copy number was observed at several core housekeeping loci. Our findings indicate hybrid formation by fusion of diploid T. cruzi, followed by sporadic genome erosion, but with substantial potential for adaptive evolution, as has been described as a genetic feature of other organisms, such as some fungi.

Editor's evaluation

The authors dissected the across-the-genome consequences of sexual recombination in Trypanosoma cruzi, a serious human pathogen. They had discovered hybrid formation in this species 20 years ago, here they went at length by culturing parental and hybrid clones for 5 years, demonstrating that tetraploid T cruzi hybrids undergo genome erosion.

https://doi.org/10.7554/eLife.75237.sa0

Introduction

Trypanosoma cruzi is a kinetoplastid protozoan and the etiologic agent of Chagas disease, one of the neglected and highest impact parasitic infections in the Americas (Pérez-Molina and Molina, 2018). Chagas disease is estimated to cause great loss in both health-care costs and disability-adjusted life years (Lee et al., 2013). Human migration and specific modes of transmission than the canonical vector-based have led to a spreading beyond its natural geographical boundaries, becoming a global issue (Pérez-Molina and Molina, 2018; Bern, 2015). T. cruzi transmission is a zoonosis maintained by numerous species of triatomine insects and different species of mammals (Jansen et al., 2018). This parasite has a complex life cycle, where transmission to humans occurs most frequently by contamination with infected feces from triatomine vectors. Other routes of human infection include congenital, oral (mainly consumption of raw contaminated foods), and contaminated transplant tissues and blood products. To evade the immune responses, the parasite displays a vast, complex repertoire of surface proteins involved in cell invasion and pathogenicity (De Pablos and Osuna, 2012; Pech-Canul et al., 2017; Talavera-López et al., 2021). The regions of the genome coding for these molecules are composed of highly repetitive sequences, with hundreds to thousands of members of each family, as well as substantial numbers of transposable elements, microsatellite, and tandem repeats (Talavera-López et al., 2021; El-Sayed et al., 2005; Wang et al., 2021).

The reproductive mode of these parasitic protozoans has been widely debated, and both preponderate clonal evolution (Tibayrenc and Ayala, 2012; Tibayrenc and Ayala, 2013) and sexual reproduction (Ramírez and Llewellyn, 2014; Schwabl et al., 2019; Inbar et al., 2019) have been proposed. Many eukaryotic pathogens have essentially clonal population structures while also having non-obligate sexual cycles, which may enable them to adapt to environmental changes (Heitman, 2006; Lewis et al., 2011; Steensels et al., 2021). In trypanosomatids, the most well-studied mating system is that of Trypanosoma brucei, for which a meiotic process is well supported based on the identification of a haploid parasite stage in the vector (Peacock et al., 2014) and on the patterns of allele inheritance and recombination observed in experimental hybrids (MacLeod et al., 2005). In T. cruzi, great genetic and phenotypic diversity is observed, and six distinct genetic clades have been recognized, named TcI to TcVI (discrete typing units or DTU-I to -VI) (Zingales et al., 2012). Genotyping analysis indicates that TcV and TcVI are recent, natural inter-lineage hybrids of TcII and TcIII, showing that recombination events have shaped the evolution of T. cruzi lineages (Schwabl et al., 2019; Lewis et al., 2011; Messenger and Miles, 2015). In addition to natural evidence of hybridization, the formation of T. cruzi hybrids in vitro was also described (Gaunt et al., 2003). Genetic marker analyses of hybrid strains showed multi-allelic (non-Mendelian) inheritance of microsatellite alleles and an elevated DNA content (Gaunt et al., 2003; Lewis et al., 2009). The hybridization events in T. cruzi were proposed to occur via a parasexual mechanism, similar to those observed in certain fungi (Lewis et al., 2011; Steensels et al., 2021; Gaunt et al., 2003; Bennett, 2015). While certain aspects of the molecular mechanisms involved in this process have been studied (Alves et al., 2018), the contribution of each parental strain in hybrid genomes, as well as the mechanisms and adaptive significance of the hybridization phenomenon, are not well understood.

Experimental evolution approaches in defined environments have been widely used to study the evolution of model organisms (such as bacteria, yeast, and Drosophila sp.) (Zeyl, 2006; Burke and Rose, 2009; Kawecki et al., 2012; Lenski, 2017) although this is not the case for parasitic protozoa. To investigate the effects of clonal reproduction and genetic exchange on the T. cruzi genome at the microevolutionary scale, we selected two closely related TcI strains (P1, P2) and three hybrid clones that were generated from a P1 × P2 cross (1C2, 1D12, 2C1) (Gaunt et al., 2003). In addition, to dissect the underlying mechanisms of hybridization in T. cruzi, we applied a comparative genomics approach based on genome sequence data generated for parental strains and hybrid clones at the beginning and at the end of the in vitro microevolution experiment.

Results

Stability of DNA content during experimental in vitro microevolution

To investigate the effects of clonal reproduction and genetic exchange on the T. cruzi genome at the microevolutionary scale, we selected two TcI strains (P1, P2) and three hybrid clones that were generated from a P1 × P2 cross (1C2, 1D12, 2C1) (Gaunt et al., 2003). The two parents and three hybrids were included in an in vitro evolution experiment in which they were continuously cultured as epimastigote forms for 5 years, equivalent to approximately 800 generations of replication by binary fission (Figure 1A). At the end of the experiment, we re-cloned the parasite lines and selected three clones per line for analysis. During this microevolution experiment, we monitored the DNA content of each parasite line at approximately 200-generation intervals to check for evidence of large-scale structural genomic changes. We observed that the DNA content of the two parental lines remained approximately constant (Figure 1B–D). At the start of the experiment, the total DNA content of the hybrid clones was approximately 70% greater than the parentals (Figure 1C, D, Lewis et al., 2009), which equated to slightly below expectations for a tetraploid parasite with one kinetoplast (containing mitochondrial DNA), that is, a 4n1k (4C) genome size. Over the course of the microevolution experiment, in contrast to P1 and P2, we found that the DNA content of the hybrids became progressively lower over time, up to a maximum decrease of 22.5% by generation 800. By extrapolation from the estimated genome sizes of P1 (94.5 Mb) and P2 (92.5 Mb) (Lewis et al., 2009), these data indicate genome erosion in the hybrids occurred at an average rate of 23 kb per generation. There were, however, notable differences in genome erosion patterns between the three hybrid lines. Hybrid 2C1 showed gradual genome erosion at each time point, with all three 800 generation evolved clones approximating a 3C genome size. Hybrid 1C2 showed a more intermittent pattern – most of the erosion occurred between 0–200 and 400–600 generations, reaching a 3C genome size, but interestingly, 2 of the 800 generation clones gained ~10% more DNA than at t=600. Lastly, hybrid 1D12 had an initial 9% loss between t=0 and t=200, but was then quite stable over time between 3C and 4C.

In vitro microevolution of Trypanosoma cruzi hybrids.

(A) Overview of experimental design. (B) DNA content profiles shown as propidium iodide (PI) staining intensity histograms measured by flow cytometry. PI fluorescence = relative fluorescence units; y-axis shows number of events from the population of parasites at the indicated generation of the in vitro culture experiment from start (t=0) to finish (t=800). Plots of one parent and one hybrid are shown, representative of three biological replicates for each time point. The vertical dashed line marks the position of the G1 fluorescence intensity peak at t=0. (C) Quantitative analysis of DNA content for both parents and three hybrid strains at each time point and for three clones per strain generated after the final time point of 800 generations. Data are the mean + SEM of three biological replicates after normalization against an internal standard and conversion to a ratio of the mean of the two parents at t=0. The predicted total DNA content ranges equating to tetraploid (4n1k) and triploid (3n1k) parasites are indicated, under the assumption that kDNA constitutes between 20% and 30% of total DNA. (D) The same data as shown in C, represented as the % change in DNA content over time compared to t=0. (E) Distinct genomic signatures exclusive to each parental strain were inherited by the hybrid strains. In black are represented SNPs present in both parental strains, in green SNPs exclusive to the P1 strain, in purple SNPs exclusive to the P2 strain, in dark blue SNPs common to all hybrids, and in light blue SNPs exclusive to each hybrid clone.

Next, we looked for evidence of instability at specific loci in the hybrid genomes using PCR-based multilocus microsatellite genotyping, which had previously provided some evidence for non-Mendelian inheritance in the hybrids from the original P1 × P2 cross (Gaunt et al., 2003). Specifically, at many loci the hybrids inherited more than one allele per parent. We found that three of the nine evolved hybrid clones had lost at least one microsatellite allele (Supplementary file 1). These data indicated that the enlarged genomes of T. cruzi hybrids were subject to erosion over an extended period of clonal replication. To dissect the underlying mechanisms, we applied a comparative genomics approach based on sequence data for parental and hybrid samples from the beginning and end of the microevolution experiment.

Parental strains display distinct genomic signatures

The genomes of the two parental strains were characterized using de novo genome assemblies from short Illumina reads. Despite high sequence coverage and the linking information provided by additional long insert size libraries, the final assemblies only reconstructed 77.1% and 74.8% of the P1 and P2 genomes, respectively (Supplementary file 2), due to the high repeat content of the genomes as also observed to other previously sequenced T. cruzi genomes. The haploid estimated genome size of P1 differed from P2 by approximately 2 Mbp, and both parental strains genomes were found to be slightly larger than the 44 Mbp reference TcI-Sylvio X10/1 strain (Talavera-López et al., 2021), in line with flow cytometry-based DNA content measurements (Lewis et al., 2009).

While the parental strains both belong to the TcI clade, with highly conserved synteny in the core regions, they still show significant genetic diversity. Reads from the parental strains were mapped to the reference TcI-Sylvio X10/1 genome, followed by SNP calling and a comparative analysis. The resultant genome-wide average SNP density was 9.28 and 3.71 SNP/kb for P1 and P2, respectively (Table 1). Most SNPs were located within repetitive regions, particularly in areas containing surface molecule gene family members, which can be related to gene expansion in these regions. Indeed, copy number variation (CNV) analysis showed that P1 displays these tandem-repeated and surface molecule-coding regions expanded in comparison to P2. To avoid the influence of low mapping quality in repeated regions, we applied a strict mapping quality filter, removing any SNP in regions with mapping quality below 50. SNP density in the retained high-quality regions was 2.84 and 2.11 SNPs/kb for P1 and P2, respectively (Table 1), and a total of 68,616 and 34,843 SNPs exclusive for each parental strain were catalogued for use in the analysis of the hybrid strains (Figure 1E). This level of differentiation between P1 and P2 provided ample scope to explore the genetic composition and inheritance patterns in the hybrid clones.

Table 1
Average SNP density per kilobase in starting generation parental strains and hybrid clones.
SNP/kb genome-wideSNP/kb high-quality mapping regions
P1-09.282.84
P2-03.712.11
1C2-07.633.48
1D12-07.463.29
2C1-08.223.49

Mitochondrial DNA (kDNA) comprises up to 20–30% of the total T. cruzi DNA, consisting of relatively conserved dozens of maxi- and thousands of minicircles (Souza et al., 2011). This range was used to infer 3C and 4C genome sizes (Figure 1E) and the frequency of kDNA-mapping reads was 10.1% for parents and 11.5% for hybrids in average. However, there was substantial variation between samples as expected, and no correlation with the flow cytometric measurement of total DNA content. Despite the large amounts of kDNA maxi- and minicircles/cell, our interpretation is that extraction efficiency of kDNA is lower than for nuclear DNA and more variable between samples, probably due to the highly intercalated molecular network structure of kDNA. All kDNA-mapping reads were removed prior genetic diversity analysis.

An increase in genetic diversity is observed after hybridization

To evaluate the impact of hybridization on genetic diversity, we compared variant densities in the high-quality mapping regions of the genomes of all hybrid and parental strains. The average SNP density in these regions was higher in all hybrids than in parental strains: 3.48 SNP/kb for 1C2-0, 3.29 SNP/kb for 1D12-0, and 3.49 SNP/kb for 2C1-0 (Table 1), partially due to the presence of alleles from both parental strains. In addition to the SNPs clearly inherited from each parental strain, additional SNPs common to all hybrids and SNPs exclusive to each hybrid clone were identified in both coding and non-coding regions, revealing sample-specific patterns (Figure 1E). Some SNPs that were common to all hybrid strains were not detected in the parental genomes, mainly consisting in mutations that occurred in culture during the approximately 70 generations of parent parasite growth before the cross was set up (Figure 1A). They are unlikely to be attributable to growth of a common progenitor hybrid prior to the cloning step because of differential inheritance of kDNA in 1D12 vs. 2C1 and 1C2 (Gaunt et al., 2003), which suggests they are derived from at least two independent hybridization events. Interestingly, de novo mutations common to all hybrids and exclusive to each hybrid clone were predicted to generate a higher percentage of non-synonymous changes in comparison to the mutations inherited from the parental strains (Supplementary file 3). Indeed, the ratio of non-synonymous changes per synonymous changes was higher in the de novo mutations common to all hybrids (2.36 for 1C2-0, 2.25 for 1D12-0, and 2.19 for 2C1-0), and exclusive to each hybrid clone (2.40 for 1C2-0, 2.23 for 1D12-0, and 2.37 for 2C1-0) than in those inherited from the parental strains (1.41 for 1C2-0, 1.40 for 1D12-0, and 1.42 for 2C1-0) (Supplementary file 3).

To investigate the impact of these mutations on the variable regions of the genome, we compared the ratio of surface protein-coding genes per non-surface protein-coding genes. This ratio is approximately 4-fold higher in SNPs common to all hybrids and in SNPs exclusive to each hybrid clone than in those inherited from the parental strains (Supplementary file 4). In addition, we compared the ratio of non-synonymous mutations in surface protein-coding genes per non-synonymous mutations in other genes. Interestingly, this ratio was approximately 3.3-fold higher in the hybrid clone-specific SNPs in comparison to those inherited from the parental strains (Supplementary file 5).

CCNV analysis reveals tetraploid hybrids

Chromosome copy number variation (CCNV) was determined using the combination of read depth coverage (RDC) and allele balance (AB) methods (Reis-Cunha and Bartholomeu, 2019). An increase or decrease in the mean RDC of a chromosome when compared to the overall genome coverage is an indicator of a gain or loss of chromosomal sequences, respectively (Reis-Cunha and Bartholomeu, 2019). If the ratio between the median chromosome coverage and the median genome coverage is approximately one, the chromosome has the same copy number as the genome, while fluctuations in this ratio indicate aneuploidies. Here, the RDC analysis was based on the ratio between the mean coverage of predicted single-copy genes in each chromosome and the mean coverage of all single-copy genes in the genome as described by Reis-Cunha et al., 2015 (Figure 2A). In this CCNV estimation, values of ‘0.5’, ‘1’, and ‘2’ denote that the chromosome has, respectively, ‘0.5’, ‘1’, or ‘2’ copies per haploid genome. This means that, if the studied genome is mainly diploid, a value of ‘1’ in this estimation represents two chromosomal copies (one per haploid genome), whereas a value of ‘1.5’ represents three copies and a value of ‘2’ represents four copies. This methodology was shown to eliminate bias caused by chromosomal repetitive content in T. cruzi (Reis-Cunha et al., 2015), which would otherwise obscure the correct copy numbers. We were unable to estimate CCNV in chromosomes 17, 22, 30, and 47 by this methodology due to the lack of single-copy genes in these chromosomes, and therefore the chromosomal somy prediction presented for them was estimated based on the ratio between the mean RDC of each chromosome position and the mean coverage of all genome positions (Figure 2A). Aneuploidy was observed in the P2 strain, involving trisomy of chromosomes 2 and 21 (Figure 2A, B). The RDC analysis indicated aneuploidy in chromosome 44 in the P1 strain, but due to the lack of regions with high-quality SNPs, it was not possible to confirm this in the AB analysis (Figure 2A, B). The RDC analysis also indicated aneuploidies in chromosome 6 in both parental strains and in chromosome 31 of P1 strain (Figure 2A). However, the AB analysis was not consistent across the entire chromosome (Figure 2B), which may be related to CNV in specific regions of these chromosomes rather than aneuploidy, as observed in other T. cruzi strains (Schwabl et al., 2019; Reis-Cunha et al., 2015).

Somy estimation based on read depth coverage (RDC) and allelic balance (AB) reveals aneuploidies in parental strains and tetrasomic hybrid clones.

(A) Aneuploidy analysis based on RDC among all 47 chromosomes in the starting generation of parental strains (P1-0 and P2-0) and hybrids (1C2-0, 1D12-0, and 2C1-0). An increase or decrease in the mean RDC of a chromosome when compared to the genome coverage is an indicator of a gain or loss of chromosomal sequences. When the ratio between the median chromosome coverage and the median genome coverage was approximately one, the chromosome was considered having the same somy as the genome, while fluctuations in this ratio (lower than 0.75 or higher than 1.25) were considered as putative aneuploidies. (B) Proportion of the alleles in heterozygous SNP positions in chromosomes with RDC ratio higher than 1.25 to confirm somy estimations. Blue points represent the proportion of the alleles in heterozygous SNP positions along the chromosome. Darker blue represents the frequency of the first allele while lighter blue represents the frequency of the second allele, black lines represent the median value in windows. (C) Ploidy estimation based on the proportion of the alleles in heterozygous SNP positions of single-copy genes. It is expected that diploid genomes display a proportion of each allele around 50%, triploid genomes around 33.3% and 66.6%, while tetraploid genomes around 25% and 75% or 50% (Lenski, 2017). Darker blue represents the frequency of the first allele while lighter blue represents the frequency of the second allele.

RDC analysis does not allow differentiation between different levels of euploidy (or near-euploidy), therefore whole genome ploidy was estimated by analysing the proportion of the alleles in heterozygous SNP positions of all single-copy genes (Figure 2C). In both parental strains, a peak at 50% was observed, while in the hybrids, peaks at 25% and 75% or 50% were observed (Figure 2C), which is expected for diploid and tetraploid genomes, respectively (Reis-Cunha and Bartholomeu, 2019). This ploidy estimation is in accordance with the previous DNA content estimation based on flow cytometry data, in which the hybrids displayed ~70% greater DNA content than the parental strains (Figure 1C, Lewis et al., 2009), given that this method could not determine the relative amounts of mitochondrial kDNA and nuclear DNA. Inheritance of SNPs exclusive to both parental strains were observed in the hybrid genomes (Figure 1E), which shows that the tetrasomic profile observed resulted from a contribution from both parental chromosomes.

In order to address chromosome specificities and confirm the somy estimates, the AB analysis was performed independently for each chromosome. The proportion of each allele with heterozygous SNPs per chromosome position was plotted for 46 TcI chromosomes (Figure 3—figure supplements 15), thus excluding chromosome 17 due to the absence of coding regions. Clear patterns of somy were observed in 37 chromosomes (chromosomes 1–16, 18–19, 21, 23, 25–29, 31–33, 35–39, 41, 43, 45–46; Figure 3—figure supplements 15). Due to a lack of regions containing high-quality heterozygous SNPs generating inconsistent AB, somy estimation was not reliable for chromosomes 17, 20, 22, 24, 30, 34, 40, 42, 44 (Figure 3—figure supplements 15). Based on our filtering criteria, all SNPs called in chromosome 47 were removed from further analysis, therefore no AB plot is presented for this chromosome. AB analysis shown to be reliable for estimating aneuploidy in chromosomes with longer core regions that contain housekeeping genes and other genes that are conserved between kinetoplastids, while the presence of large tandemly repeated regions and surface protein-coding gene families made read mapping, and thus copy number estimation difficult for some chromosomes. High resolution representations of the CCNV analysis are shown for six chromosomes (Figure 3). Chromosomes 2, 7, and 13 have longer tandem repeated regions and coding sequences for surface proteins (percentage of surface protein-coding genes: chr2 7.26%, chr7 9.55%, chr13 8.38%), while chromosomes 18, 21, and 37 have longer core regions (percentage of surface protein-coding genes: chr18 1.25%, chr21 1.98%, chr37 3.74%) (Figure 3A). The mapping quality in surface protein-coding sequences was often observed to drop significantly, which affected the inferred proportions of alleles in those regions, and this was found to lead to erroneous somy estimates (Figure 3B). To avoid this, a strict mapping quality filter was applied, and all the allele proportions were plotted by chromosome position. Using this strategy, it was possible to identify specific regions in chromosomes that could be used to infer CCNV with high confidence. Chromosomes 2, 7, and 13 showed longer regions with diverse allele proportions, nonetheless it was still possible to identify the trisomy of chromosome 2 of P2 and tetrasomies in hybrids based on using the smaller core regions (Figure 3C). Chromosomes 18, 21, and 37 showed somy patterns that were clearly consistent with the whole genome CCNV, confirming the trisomy in chromosome 21 of P2 and tetrasomy in the hybrid clones (Figure 3D).

Figure 3 with 5 supplements see all
Somy estimation per chromosome based on allele balance (AB) in starting generation confirming aneuploidies in parental strains and tetrasomic hybrid clones.

(A) Gene distribution and its implication in mapping quality (MQ) and somy estimation of six representative TcI chromosomes with longer surface molecule-encoding regions (chromosomes 2, 7, and 13) and shorter encoding surface molecule-encoding regions (chromosomes 18, 21, and 37). Surface protein-coding genes are represented as cyan blue boxes while other genes are represented as black boxes. (B) Implication of chromosome gene distribution in MQ and somy estimation based on AB in chromosome 2 (longer surface molecule-encoding regions) of parental strain 1 (P1-0). Dark green represents the fluctuation in MQ along chromosome positions. Blue points represent the proportion of the alleles in heterozygous SNP positions along the chromosome. Darker blue represents the frequency of the first allele while lighter blue represents the frequency of the second allele, black lines represent the median value in windows. (C) Proportion of the alleles in heterozygous SNP positions in chromosomes with longer surface molecule-encoding regions. (D) Proportion of the alleles in heterozygous SNP positions in chromosomes with shorter surface molecule-encoding regions. Blue points represent the proportion of the alleles in heterozygous SNP positions along the chromosome. Darker blue represents the frequency of the first allele while lighter blue represents the frequency of the second allele, black lines represent the median value in windows.

CCNV reveals sequential loss of chromosomal copy in hybrid clones after culture growth

CCNV and whole genome ploidy were evaluated in three replicate clones for each parental and hybrid strains after 800 generations in continuous in vitro culture. The trisomies in chromosomes 2 and 21 of P2 (Figure 2A) were no longer observed in any replicate clone after growth in culture (Figure 4A). New aneuploidies were identified in the parental strains, involving trisomy of chromosomes 37 and 45 in all P1 clones and trisomy of chromosomes 32 and 37 in all P2 clones (Figure 4A, B). As observed in the first generation, the RDC and AB analyses in chromosomes 6 and 31 were not consistent throughout the chromosomes (Figure 5—figure supplements 115). Whole genome ploidy estimation revealed a shrinking pattern in the hybrids, which was consistent with the prior flow cytometry analysis (Figure 1C). While parental strains remained essentially diploid (apart from the few aforementioned trisomies), a mixture between trisomic and tetrasomic profiles was observed in the hybrids after growth in culture (Figure 4C). Indeed, hybrid 2C1 displayed a clear transition from a tetraploid to a triploid pattern after growth in culture (Figure 4C), suggesting that genome erosion eventually leads to a return towards diploidy after the hybridization event, as observed in the naturally occurring TcV and TcVI hybrids (Lewis et al., 2009). In contrast with the first generation, the RDC analysis indicates putative aneuploidies in hybrid clones after extended culture growth (Figure 4A). Despite the patterns of genome erosion, tetrasomic profiles in chromosomes 19 and 37 were present in all hybrid clones after culture growth (Figure 4A, B). Taken together with previous DNA content analysis of intermediate time points (Figure 1), our results from the different in vitro evolved clones indicate that the genome erosion seems to be gradual phenomena, with sequential losses of chromosome copies and regions rather than coordinated jumps between ploidy levels. The proportion of each allele in heterozygous SNPs per chromosome position was also plotted for the 47 chromosomes in each replicate after culture growth (Figure 5—figure supplements 115). As observed in the representative chromosomes, the evolved parental clones still have a clear disomic pattern, while a mixture of trisomic and tetrasomic patterns is observed in evolved hybrid clones (Figure 5; Figure 5—figure supplements 115).

Somy estimation based on read depth coverage (RDC) and allele balance (AB) reveals novel aneuploidies and a shift in hybrid strains ploidy after 800 generations in in vitro culture.

(A) Aneuploidy analysis based on RDC among the 47 chromosomes in all clones after culture growth. (B) Proportion of the alleles in heterozygous SNP positions in chromosomes showing aneuploidies in all parental or hybrid clones. (C) Comparison of somy estimation based on the proportion of the alleles in heterozygous SNP positions of single-copy genes. First generation is presented in shaded blue, while the 800 generation is presented in orange.

Figure 5 with 15 supplements see all
Somy estimation confirming novel aneuploidies after in vitro culture.

(A) Proportion of the alleles in heterozygous SNP positions in chromosomes with longer surface molecule-encoding regions. (B) Proportion of the alleles in heterozygous SNP positions in chromosomes with shorter surface molecule-encoding regions. Points represent the proportion of the alleles in heterozygous SNP positions along the chromosome.

Patterns of genome erosion after in vitro microevolution

To investigate the patterns of genome erosion and CCNV in the hybrids, we performed a CNV analysis throughout the whole genome of parental and hybrid clones after extended culture growth. Despite the stability of DNA content in parental clones during our experiment (Figure 1C, D), similar gene categories showed CNV in both parental and hybrid clones after culture growth (Supplementary files 6–12). Gene losses occurred mainly in host-parasite interaction genes such as surface protein-coding genes and genes related to exocytosis and cell secretion, while genes related to macromolecule metabolism, ion transport, and cell division showed an increase in copy numbers (Figure 6; Figure 6—figure supplement 1). Despite the events of genome erosion, chromosomes 19 and 37 remained tetrasomic in all hybrid clones and an extra copy of chromosome 37 is observed in all parental clones after culture growth (Figure 4B and Figure 5B). Chromosomes 19 and 37 contain long core regions with abundant housekeeping genes involved in cellular metabolism, DNA replication and transcription (Supplementary files 10 and 11) which may contribute to fitness in culture. Altogether, these results suggest that the presence or absence of selective pressure from the environment may shape CCNV and genome erosion in T. cruzi, resulting in distinct genome patterns after hybridization events occur.

Figure 6 with 1 supplement see all
Chromosome gene composition and copy number variation (CNV) comparison indicate genome erosion patterns in surface molecule-coding regions after in vitro culture growth.

(A) CNV comparison in parental strain. (B) CNV comparison in hybrid strain. Each dot in the CNV profile chart represents the normalized depth per kb, while lines represent the median ratio (Cingolani et al., 2012). Blue points and lines represent CNV profiles in the first generation, while red points and lines represent CNV profiles after 800 generations of in vitro culture growth. surface protein-coding genes are represented as cyan blue boxes, while other genes are represented as black boxes.

Genetic diversity in evolved genomes

The effect of extended culture growth on genetic diversity was evaluated in all parental and hybrid evolved clones. Regarding the parental strains, all P1 replicates showed a similar SNP density in the retained high-quality regions after culture growth (P1-0: 2.84; P1-1: 2.80; P1-2: 2.77; P1-3: 2.82), while all P2 replicates showed an increase in SNP density after culture growth (P2-0: 2.11; P2-1 2.78; P2-2 2.81; P2-3 2.66, Mann-Whitney, p=0.042) (Figure 7A). The P2 replicates have a reduction in copies of HUS1-like encoding genes, which are involved in DNA repair, after culture growth (Supplementary file 12). Our gene enrichment analysis showed that a number of these genes were located on chromosome 2 (Supplementary file 12), suggesting that the loss of trisomy in this chromosome could possibly have contributed to the higher number of mutations that was found in the P2 clones. New genomic variants were found in both evolved parental clones (Figure 7A). All evolved hybrid replicate clones, with exception of 1D12-3, showed a slight decrease in SNP density (1C2-0: 3.48; 1C2-1: 2.96; 1C2-2: 3.26; 1C2-3: 3.10; 1D12-0: 3.29; 1D12-1: 3.24; 1D12-2: 3.22; 1D12-3: 3.36; 2C1-0: 3.49; 2C1-1: 3.34; 2C1-2: 2.64; 2C1-3: 3.25, Figure 7A), likely caused by the loss of genetic material. Multiple new genomic variants were found in the in vitro evolved hybrid progeny, displaying patterns specific to each clonal isolate (Figure 7A). However, these variants were distinct from the novel mutations that emerged in the period between hybrid formation and the start of the microevolution experiment (t=0) as described above.

SNP density analysis throughout the genome indicates that surface molecule-coding regions displayed a higher number of variants than other regions of the genome for all strains.

(A) Number of SNPs in all clones after long-term in vitro culture. In black are represented SNPs common to all hybrid strains, in green SNPs specific to P1 strain, in purple SNPs specific to the P2 strain, in dark blue SNPs common to all hybrids in the first generation, in light blue SNPs exclusive to each hybrid clone in the first generation, and in orange SNPs exclusive to each clone after culture growth. (B) Chromosome average SNP density per kilobase in all samples before (1 g) and after culture growth (800 g). (C) 1C2 chromosomes with higher SNP density plotted displaying gene composition and SNP density. Surface protein-coding genes are represented as cyan blue boxes, while other genes are represented as black boxes.

To better understand how these genomes evolved during culture growth, the SNP density was also evaluated in each chromosome separately. The SNP density pattern per chromosome remained similar for all samples in first generation (t=0) and after culture growth (t=800). Some chromosomes (e.g. chromosomes 5, 11, 24, and 37) display higher SNP density than the others across all samples (Figure 7B). P2 evolved clone replicates had an increase in SNP density in nearly all chromosomes (Figure 7B). The emergence of novel SNPs after culture growth resulted in most of the non-synonymous mutations. The ratio of non-synonymous per synonymous changes in novel SNPs was similar between both parental strains (P1-1: 2.26; P1-2: 2.22; P1-3: 2.38; P2-1: 2.25; P2-2: 2.22; P2-3: 2.26) and hybrid clones (1C2-1: 2.20; 1C2-2: 2.48; 1C2-3: 2.26; 1D12-1: 2.41; 1D12-2: 2.5; 1D12-3: 2.40; 2C1-1: 2.21; 2C1-2: 2.18; 2C1-3: 2.30). Interestingly, the regions of the genome with higher genetic variation in all clone isolates contain surface protein-coding sequences (Figure 7C). To investigate the impact of new mutations in these regions, we compared the ratio of non-synonymous mutations in surface protein-coding genes to non-synonymous mutations in other genes. This ratio was higher for the mutations that appeared in each evolved parental clone in comparison to those present in the first generation (approx. 2.5-fold higher for P1 clones and 3.5-fold higher for P2 clones; Supplementary file 13). Regarding the hybrid strains, a similar ratio was observed in the SNPs that appeared in each evolved hybrid clone and those exclusive to each hybrid clone present in the first generation (Supplementary file 13). These results indicate that surface molecule-coding regions may mutate faster than the rest of the genome either by clonal or parasexual mechanisms, representing an important region of genomic diversification.

Discussion

The results from whole genome sequencing analysis of this T. cruzi genetic cross have clearly demonstrated that, while the parental strains were diploid, all initial hybrid clones were essentially tetraploid. We showed that genomic variants exclusive to both parental strains were present in all hybrid clones, indicating that the tetraploid profile observed is obtained by the fusion of the two disomic parental genomes, although more complex pathways cannot be ruled out in the absence of direct observational evidence. In addition, we showed that genome erosion seems to be gradual, with sequential losses of chromosome copies rather than coordinated jumps between ploidy levels, though with some variation in erosion rates between different evolved hybrid lines. As previously proposed (Lewis et al., 2011; Gaunt et al., 2003), our results confirmed that hybridization in T. cruzi happened via a parasexual mechanism rather than a canonical meiotic process. This phenomenon resembles the parasexual cycle of the pathogenic fungus Candida albicans, which also involves diploid fusion followed by non-meiotic genome reduction back to aneuploids and diploids (reviewed by Bennett, 2015). In contrast, hybridization in T. brucei and possibly in Leishmania spp. has been associated with parental strains having undergone a meiosis-like process followed by a fusion of haploid cells (Inbar et al., 2019; Peacock et al., 2014; Gibson et al., 2008; Rogers et al., 2014; Louradour et al., 2020; Van den Broeck et al., 2020). In this process, mainly diploid hybrids are formed, but polyploid hybrids may also be produced by the fusion between parental cells that failed to undergo meiosis (Inbar et al., 2019; Peacock et al., 2014; Gibson et al., 2008; Rogers et al., 2014; Louradour et al., 2020; Van den Broeck et al., 2020). Despite the clear evidence of parasexual hybridization in in vitro T. cruzi hybrids, it is still not clear if a meiosis-like mechanism could also contribute to the generation of natural T. cruzi hybrids, as observed in other trypanosomatids. Further work will be required to establish whether there is a single common mechanism or several alternative modes of hybridization.

Our microevolution experiment has shown that T. cruzi genomes are highly responsive to the environmental conditions. The CNV analysis showed that while surface gene numbers were being eroded after culture growth, genes involved in cell metabolism, cell division, and ion transport were being expanded. This genome plasticity was not only observed at the level of gene CNV, but also at the chromosome level. The parental strains displayed distinct aneuploidies before and after culture, and despite the overall pattern of genome erosion, some chromosomes remained tetrasomic in the hybrid strains. Indeed, aneuploidies have been widely reported in natural T. cruzi populations, and although CCNV varies among and within DTUs, it seems constant within a given population (Schwabl et al., 2019; Reis-Cunha et al., 2015; Reis-Cunha et al., 2018). While in many multicellular organisms, aneuploidy is known to have severe consequences, in trypanosomatids it represents an important mechanism for rapidly overcoming environmental diversity (Reis-Cunha et al., 2018; Prieto Barja et al., 2017; Bussotti et al., 2018; Grünebast and Clos, 2020). Aneuploidy is reversed quickly once the benefits of an extra chromosome expire, as observed in P2 aneuploidies in chromosomes 2 and 21. In addition, trypanosomatids may control overexpression caused by additional gene copies, at the protein level (Dumetz, 2017). We found aneuploidy in chromosome 37 in all parental and hybrid clones after culture growth, suggesting a benefit to parasite fitness in culture. As observed in Leishmania spp., the fixation of genetic alterations is the result of positive selection processes that adapt parasite fitness to a given ecology or transmission cycle (Bussotti et al., 2018). We can speculate that in natural T. cruzi hybrids, distinct CCNV and genome erosion patterns would be observed due to specific environmental selective pressure with the expansion of genes involved in host-parasite interactions.

Genome variant analyses showed that an increase in genetic sequence diversity was present after hybridization. Hybridization events are associated with a burst of novel mutations and a myriad of genomic rearrangements (e.g. transpositions, genome size changes, chromosomal rearrangements) in distinct eukaryotes (Steensels et al., 2021; Samarasinghe et al., 2020). Our results indicate that the process of T. cruzi hybrid formation included both new mutations and most likely recombination between parental chromosomes. Here, we were able to show an increase in the number of SNPs in the hybrid strains with the accumulation of amino acid substitutions. Interestingly, the numbers of new SNPs in culture showed a clear difference between the two parental strains after 800 generations, in which P2 showed a higher mutation rate. A previous study has reported that TcI strains have a mismatch repair machinery, which repairs base misincorporation and erroneous insertions and deletions during DNA recombination and replication, that is more efficient than in other DTUs (Reis-Cunha et al., 2015; Augusto-Pinto et al., 2003; Machado et al., 2006). As described in Leishmania major, HUS1 was shown to be required for genome stability under non-stressed conditions and its suppression had a genome-wide mutagenic effect (Damasceno et al., 2018). Since many of the genes coding for the HUS1-like proteins are encoded on chromosome 2, the loss of trisomy in this chromosome after culture growth could possibly be related to the increase in mutations observed in the P2 clones. In general, culture growth led to a reduction in genetic variants in nearly all evolved clones in comparison to the first generation. This reduction may be related to the loss of heterozygosity, but also to the loss of gene copies and genome erosion of surface molecules regions after culture growth. In addition, we could not identify any bias in the loss of a specific parental chromosome even in triploid hybrid clones, suggesting recombination between the parental chromosomes. As material from both parental strains has been lost during over 800 generations, it is likely that the hybrid clones would stabilize at diploidy and contain chromosomes from both parental strains, with a higher polymorphism than the parental strains. This is similar to strains from the known natural hybrid clades, TcV and TcVI. Strains from these clades have diploid genomes with high repeat content, and high levels of polymorphism in many regions, resulting from alleles from two divergent parental strains (from clades TcII and TcIII) (Lewis et al., 2011; Messenger and Miles, 2015; Lewis et al., 2009). From our analyses, it thus appears that the in vitro hybrid formation presented here is highly similar to the process that generated the natural hybrids.

Our microevolution experiment has shown that surface protein-coding genes represent the regions of the genome with higher genetic diversity and more rapid evolution. Accumulations of distinct genomic signatures and amino acid substitutions were observed in both parental and hybrid clones after relatively few generations. We have previously observed that there is a higher frequency of recombination in repeat-rich regions of the T. cruzi genome (Talavera-López et al., 2021), and it is likely that this is also the case for the hybrids.

We can speculate that this is a likely explanation for variability in repeat and gene copy numbers between hybrid and non-hybrid clades in response to the distinct environmental conditions. Since these parasites were cultivated solely in culture, it is still not clear if this mechanism is related to the lack of pressure caused by the progression within triatomine bugs or by the interaction with the host immune system, or if these genomes display an intrinsic mechanism of high diversification of these regions. Recent comparative genomics of natural T. cruzi strains have shown a similar pattern, suggesting that these regions indeed evolve more rapidly than other regions of the genome (Wang et al., 2021).

For T. cruzi, there is clear evidence for relatively recent inter-DTU hybridization between TcII and TcIII as the origin of TcV and TcVI, which are now widespread in domestic transmission cycles in southern South America (Lewis et al., 2011; Messenger and Miles, 2015; Carrasco et al., 1996). Other evidence for inter-DTU genetic exchange between TcI and TcIV comes from phylogenetic incongruence between nuclear and mitochondrial (kDNA maxicircle) sequences. However, the frequency of natural genetic exchange events between lineages/clades/DTUs appears to be too low to disrupt the overall stability of the main six clades, or seven clades if the TcBat lineage is included (Lewis et al., 2011; Messenger and Miles, 2015). The exception is the validity and position of the poorly sampled TcIV clade, which includes strains with TcI-like and/or TcIII-like features. The robustness of the other clades is likely promoted by ecological and/or geographical isolation, for example, association with distinct vector and host species. There may also be a high fitness cost for tetraploidy in the wild that is not replicated in laboratory conditions.

In summary, we confirmed that in vitro hybrid formation in T. cruzi happened in a parasexual mechanism, representing an important mechanism of genetic diversification. Tetraploid hybrids showed patterns of progressive genome erosion shaped by the environment’s selective pressure, which may explain distinct phenotypes in isolated natural hybrid populations. In addition, an increase in coding sequence diversity is observed in hybrids in comparison to the parental strains, related to both new mutations and most likely recombination events. Finally, our microevolution experiment has shown that repetitive gene families related to immune evasion evolved more rapidly than other regions of the genome, indicating an intrinsic mechanism of genetic variation of these regions.

Materials and methods

Parasites and microevolution experiment

Request a detailed protocol

For routine culture, T. cruzi epimastigotes were grown axenically in supplemented RPMI as previously described (Carrasco et al., 1996). Parasites were cloned by limiting dilution. The microevolution experiment comprised continuous in vitro cultures of each parasite line for 5 years. Three experimentally generated hybrid clones (1C2, 1D12, and 2C1) and their two parents (P1 and P2) were seeded in the primary culture at 2 × 105/ml and then allowed to grow for seven generations to stationary phase. The parent and hybrid lines used to initiate the microevolutions experiment (t=0) were estimated to be respectively 70 and 95 generations removed from the hybridization events. Parasites were passaged into fresh media every 2 weeks at a seeding density of 2 × 105/ml, equating typically to 1% of the stationary phase cultures.

Flow cytometry

Request a detailed protocol

DNA content was determined as previously described (Lewis et al., 2009). In brief, mid-log phase epimastigotes were washed in PBS, then fixed overnight in ice-cold 70% methanol/30% PBS. Fixed cells were washed in PBS, adjusted to 1 × 106cells/ml, and incubated for 45 min with 10 μg/ml propidium iodide and 10 μg/ml RNAse A at 37°C. Fluorescence was detected using a FACSCalibur flow cytometer for a minimum of 10,000 events. FlowJo software (Tree Star Inc, Ashland, OR) was used to plot histograms and identify G1-0 and G2-M peaks. Mean G1-0 values were taken to infer relative DNA content. An internal control T. cruzi strain, Esm cl3, was included in each run. Relative DNA content values were calculated as ratios compared to the internal standard. For experimental hybrids, the ratios relative to each parent (P1 or P2) were also determined using the mean standard:parent ratios derived from 12 independent experiments.

Microsatellite analysis

Request a detailed protocol

Genotyping was done for four microsatellite loci: MCLF10, 10,101(TA), 7,093(TC), and 10,101(TC) as previously described (Lewis et al., 2009). Briefly, microsatellites were PCR amplified from genome DNA samples using fluorescent-labelled primers targeting conserved flanking regions. Amplicon lengths were determined using a 48-capillary 3730 DNA analyzer (Applied Biosystems, UK) and analysed using Genemapper v3.5 software (Applied Biosystems, UK). The size of different PCR products (alleles), visualized as fluorescence peaks, were determined automatically by the software using a size standard to calibrate the calculations. All allele size calls made by the software were checked manually against a library of known TcI alleles (Llewellyn et al., 2009).

Whole genome sequencing

Request a detailed protocol

Total genomic DNA was isolated directly from long-term parasite cultures using the Gentra Puregene Tissue Kit (Qiagen) according to the manufacturers’ instructions. A total of 40 ng of genomic DNA was used as a template to prepare the sequencing libraries with the Rubicon Kit and the Illumina TruPlex kit. For the two parent strains, a 180 and 350 bp insert sizes paired end libraries were produced, plus an additional 8 kb insert size mate paired library. For the hybrid offspring, a single paired end library was produced with an insert size of 350 bp. All libraries were sequenced using the Illumina HiSeq 2500 platform. The library complexity of each parent was analysed using the 17mer distribution of the Illumina libraries using Jellyfish (Llewellyn et al., 2009).

De novo genome assembly

Request a detailed protocol

The genomes of the parent strains were assembled using the 180 bp Illumina paired end library and the 8 kb Illumina mate pair library using the ALLPATHS-LG v52488 (https://software.broadinstitute.org/allpaths-lg/blog/) assembler with K=96, using the TcI Sylvio X10/1 reference strain (Talavera-López et al., 2021) for evaluations. The resulting contigs were scaffolded using both Illumina libraries and the BESST scaffolder (Sahlin et al., 2014). Later, gaps in the scaffolded assemblies were filled using the 180 bp with GapFiller (Nadalin et al., 2012) and chromosomes were numbered according to their sizes as for the human genome. Each genome was submitted for annotation, prior removal of repetitive elements using RepeatMasker (https://www.repeatmasker.org/), to the Companion annotation pipeline (Steinbiss et al., 2016).

Data processing and read mapping

Request a detailed protocol

Paired end libraries were quality filtered using Nesoni Clip program (https://github.com/Victorian-Bioinformatics-Consortium/nesoni; Harrison, 2020) removing bases with a Phred quality score below 30 and reads shorter than 64 nucleotides; later, sequencing adaptors were removed. Mate paired libraries were processed as above, plus an additional step to reverse complement the filtered reads. Filtered reads, from all the sequencing libraries produced, were mapped against the TcI Sylvio X10/1 reference genome using Burrows-Wheeler Aligner (Li and Durbin, 2009). The mapping files were sorted, PCR and optical duplicates were removed, and read groups were added using Picard Tools v1.134 (https://broadinstitute.github.io/picard/; Schreiber, 2022). These mapping files were used for downstream analyses.

Identification of genomic variation

Request a detailed protocol

SNPs were called using a mapping-based approach. Mapping files were sorted with Samtools v1.11, PCR-duplicates were marked with Picard v2.22.4 and SNPs were identified with Genome Analysis Toolkit (GATK – https://gatk.broadinstitute.org/hc/en-us) using the HaplotypeCaller algorithm with a minimum quality value of 30 and a minimum depth of coverage of 10. To avoid the influence of low mapping quality in repeated regions, a strict mapping quality filter was applied and any SNP in regions with mapping quality below 50 was removed using BCFtools (Li et al., 2009).

VCFtools package (Li et al., 2009) was used to infer SNP density per 1 kb (option --SNPdensity) and SNPs exclusive to each strain were identified using both VCFtools and BEDtools (Li et al., 2009; Danecek et al., 2011; Quinlan and Hall, 2010). The functional effect of these variants and the ratio of synonymous per non-synonymous mutations was predicted using SNPEff v4.4 (Cingolani et al., 2012).

CCNV and ploidy estimation

Request a detailed protocol

CCNV and ploidy estimation were performed using a combination of RDC and AB methods as previously described (Reis-Cunha and Bartholomeu, 2019; Reis-Cunha et al., 2015). Briefly, the estimation of CCNV was based on the ratio between the mean coverage of predicted single-copy genes in a given chromosome and the mean coverage of all single-copy genes in the genome. This approach was based on the RDC of 2602 1:1 orthologs between the haplotypes of the reference SylvioX10/1 genome and the parental strains assemblies, identified using OrthoVenn2 (https://orthovenn2.bioinfotoolkits.net/home). For chromosomes lacking single-copy genes, CCNV was estimated based on the ratio between the mean RDC of each chromosome position and the mean coverage of all genome positions. If the ratio between the median chromosome coverage and the median genome coverage was approximately one, the chromosome had the same copy as the genome overall, while fluctuations in this ratio (lower than 0.75 or higher than 1.25) were putative aneuploidies.

Heterozygous SNPs with a mapping quality higher than 50 were selected for AB analyses to confirm aneuploidies and to estimate chromosome somy and whole genome ploidy. Whole genome ploidy was estimated by analysing the proportion of the alleles in heterozygous SNP positions of all single-copy genes. To confirm the RDC somy estimations, the proportion of each allele with heterozygous SNPs per chromosome position was plotted for 46 TcI chromosomes. All RDC and AB graphs were generated in R with ggplot2 (https://ggplot2.tidyverse.org), pheatmap (https://CRAN.R-project.org/package=pheatmap), and vcfR (Knaus and Grünwald, 2017) packages.

Genome erosion patterns were identified by CNV estimations using Control-FREEC package (Boeva et al., 2012). Gene Ontology Enrichment analyses in CNV regions were performed using TriTryp tools (https://tritrypdb.org/tritrypdb/app/) and surface molecules genes were assigned from the available TcI Sylvio X10/1 annotation (Talavera-López et al., 2021) (available here).

Data availability

The data generated in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/https://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA748998.

The following data sets were generated
    1. Matos GM
    2. Lewis MD
    3. Talavera-López C
    4. Yeo M
    5. Grisard EC
    6. Messenger LA
    7. Miles MA
    8. Andersson B
    (2022) NCBI BioProject
    ID PRJNA748998. Comparative genomic analyses of Trypanosoma cruzi experimental hybrids.

References

    1. Bern C
    (2015) Chagas’ Disease
    The New England Journal of Medicine 373:456–466.
    https://doi.org/10.1056/NEJMra1410150
    1. Burke MK
    2. Rose MR
    (2009) Experimental evolution with Drosophila
    American Journal of Physiology. Regulatory, Integrative and Comparative Physiology 296:R1847–R1854.
    https://doi.org/10.1152/ajpregu.90551.2008

Decision letter

  1. Antoine Claessens
    Reviewing Editor; University of Montpellier, France
  2. Jos W van der Meer
    Senior Editor; Radboud University Medical Centre, Netherlands
  3. Julius Lukes
    Reviewer; Biology Centre, ASCR, v.v.i., Czech Republic
  4. Fernán Agüero
    Reviewer

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Experimental microevolution of Trypanosoma cruzi reveals hybridization and clonal mechanisms driving rapid diversification of genome sequence and structure" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Jos Van der Meer as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Julius Lukes (Reviewer #1); Fernán Agüero (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. The authors should revise their methods for SNP calling. What GATK variant caller was used? UnifiedGenotyper or HaplotypeCaller? The latter is recommended. It is also unclear which filtering criteria were used to reduce the likelihood of false-positive SNPs. Simply setting the mapping quality to a minimum of 50 is not sufficient. We recommend to have a look at the methods of Schwabl et al. 2019 (https://doi.org/10.1038/s41467-019-11771-z) where rigorous analyses were done to remove false-positive SNPs. In particular, it would be helpful to mask the reference genome based on the mappability of its own reads. This is also important for later sections in the manuscript where the authors claim increased diversity in repetitive surface molecule genes.

Editor: Also, it is not clear whether Indels were analysed at all, they appear once in the Methods and not in the rest of the manuscript.

2. Figure 1e, Table 1 and corresponding main text – We are puzzled regarding the number of SNPs that are unique to each hybrid clone and that are common to all hybrids.

First, it seems to us that the authors have done SNP calling in all strains independently, which may result in genotypes being missed because of low local coverages. Joint genotyping would mitigate such biases. We would recommend to do a joint genotyping across all parental and hybrid strains, and then only retain those SNP positions for which there is no missing data.

Second, in the main text on page 7, it is argued that these SNPs would have occurred in culture during the approximately 50 generations of growth BEFORE hybrid formation. But in the last sentence of that paragraph it is stated that these SNPs appeared after the hybridisation event. How can the authors tell whether these SNPs occurred during culture before hybridization, or whether they happened just after hybridization? The setup of the experiment should be more clearly explained: how many generations happened before hybridization, how many after, when exactly was sequencing done during this process? This is also important for later sections that claim that hybridization resulted in a burst of novel mutations.

It seems like the authors are discovering the impact of tandem repeat regions (copy number variations) on SNP discovery. Most of page 10 and Figure 3 merely describes the fact that you will have more false-positive SNPs within the repeat regions. I'm not sure why this is of interest here. Such SNP's should not have been included in the first place.

3. Why was the mapping quality in surface molecules inferior to the rest? That begs for explanation. "Surface molecule genes", frequently used here, sounds somewhat weird to this reviewer. Couldn't it be rather "Surface protein-coding genes" or the like?

4. I understand the implications of trying to map short reads to repetitive regions. For example, PAINTS (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7912377/) removes noisy genomic regions (large gene families) before attempting to do ploidy estimations. This seems intuitive, and this is also explained in this Ms, in light of this maybe authors can save some space by omitting text and figures that are used to discuss and show these difficulties? Maybe there is not much to gain by showing that estimating ploidies in chromosomes with low numbers of housekeeping genes is difficult. Just state that and put the data into Supplementary file?

5. Based on the results presented, I don't agree that there is gradual erosion overall across all clones.

1D12 hybrid seems to have gone through an initial loss of DNA content but then its DNA content seems to have remained stable (Figure 1D). In addition, it's genome-wide somy profile remained stable at tetrasomy (Figure 4C). Also, there also seems to be an increase in DNA content after 600 generations. Finally, microsatellite alleles are only consistently lost in hybrid 2C1, at three different loci. No loss was observed in hybrid 1D12 and only one allele was lost in hybrid 1C2. This also raises the question what has caused the loss of DNA content in hybrid 1D12. Based on microsats and somy, there is no clear evidence for genomic erosion. So, what part of the genome is lost in the beginning of the experiment? The authors could do a comparative genomic analyses of the parental genomes and the hybrid genomes, to figure out what portions of the genomes are commonly/uniquely lost in the hybrids.

6. One of the conclusions from the study of accumulation of mutations, is that surface molecules may mutate faster. The reasoning for this if I read and understood correctly, is that there were many "mutations that appeared in each parental clone after culture growth". Do authors mean to say that these mutations appeared after the genetic cross experiment in Gaunt et al. 2003? Can the authors clarify how these parental clones were kept and passaged since then? How many passages did they undergo before being sequenced? The same clarification is warranted for hybrids. When the authors state that "while the parental strains were diploid, all initial hybrid clones were essentially tetraploid" (Discussion, page 15), the readership would like to know if these essential tetraploid states were observed just after the hybridization (genetic exchange) event (e.g. clones were kept in liquid nitrogen since then and only thawed for brief culture before sequencing) or if these states had some passages in culture after the genetic exchange event (how many?).

7. Data and code availability: The mentioned accession number was not found in the BioProject database. Please make code available via GitHub or other repository.

https://doi.org/10.7554/eLife.75237.sa1

Author response

Essential revisions:

1. The authors should revise their methods for SNP calling. What GATK variant caller was used? UnifiedGenotyper or HaplotypeCaller? The latter is recommended. It is also unclear which filtering criteria were used to reduce the likelihood of false-positive SNPs. Simply setting the mapping quality to a minimum of 50 is not sufficient. We recommend to have a look at the methods of Schwabl et al. 2019 (https://doi.org/10.1038/s41467-019-11771-z) where rigorous analyses were done to remove false-positive SNPs. In particular, it would be helpful to mask the reference genome based on the mappability of its own reads. This is also important for later sections in the manuscript where the authors claim increased diversity in repetitive surface molecule genes.

Editor: Also, it is not clear whether Indels were analysed at all, they appear once in the Methods and not in the rest of the manuscript.

We thank the reviewers for highlighting these issues. These are important points for this complex genome and we agree that they should be clear. HaplotypeCaller was used, and more information on the SNP calling was added in the revised manuscript. We are familiar with the suggested methods, since the corresponding author co-authored the Schwabl et al. (2019) paper. We have now used the masking used in that paper to compare the two strategies. Only a small number of SNPs are lost when the masking is added, and the distribution is unchanged. Thus, the conclusions are not changed. In addition, we found that our strategy removed a larger proportion of the genome from the analysis than the masking strategy. As the strains used in the hybrid strategy differ significantly from the reference in many repetitive regions, it is likely that the strategy using only the parent/hybrid reads gives more accurate results. The strict mapping, combined with visualization of the mapping quality and comparing with the repeat content, has enabled us to more reliably remove unreliable SNPs in these extremely repetitive regions. As we produced the reference genome (Talavera-Lopez et al., 2021), we have performed extensive mapping of the reference reads to the sequence to identify repetitive regions. Indels were not found to be informative.

2. Figure 1e, Table 1 and corresponding main text – We are puzzled regarding the number of SNPs that are unique to each hybrid clone and that are common to all hybrids.

First, it seems to us that the authors have done SNP calling in all strains independently, which may result in genotypes being missed because of low local coverages. Joint genotyping would mitigate such biases. We would recommend to do a joint genotyping across all parental and hybrid strains, and then only retain those SNP positions for which there is no missing data.

As explained below, these results are not caused by the SNP calling. As the coverage is high, joint genotyping would add significantly to the analysis. Moreover, the independent clone/parental strain analysis would not cause biased SNP detection, since a detailed analysis has compared the same regions of each sequenced genome, being consistent among the raw reads despite the sequencing coverage.

Second, in the main text on page 7, it is argued that these SNPs would have occurred in culture during the approximately 50 generations of growth BEFORE hybrid formation. But in the last sentence of that paragraph it is stated that these SNPs appeared after the hybridisation event. How can the authors tell whether these SNPs occurred during culture before hybridization, or whether they happened just after hybridization? The setup of the experiment should be more clearly explained: how many generations happened before hybridization, how many after, when exactly was sequencing done during this process? This is also important for later sections that claim that hybridization resulted in a burst of novel mutations.

We agree that more detail on sample origins and hybrids formation is required, though exact passage numbers from the original crossing experiment (performed in 1998) were not recorded, so these are best estimates from a revised assessment of all the available information (not included in the main text).

P1 and P2 clones were transfected with the drug resistance genes on episomal constructs, drug selected and polyclonal populations were grown out (~15 generations). Cloning generated the P1-hyg and P2-neo lines (~25 generations). Stationary phase cultures containing metacyclic trypomastigotes were used for the mixed Vero infections (crosses) as described in Gaunt et al., 2003 (~10 generations to point of mixing). The P1 and P2 samples that were sequenced in this study, for t=0, were grown from the pre-transfection parasites, adding another ~20 generations of culture. Thus, the t=0 parent sequences are removed by ~70 generations from the experimental cross.

Parasites were recovered from mixed Vero cell infections between 7 and 28 days post-inoculation (~15 generations as amastigotes), then selected (on both drugs) and a polyclonal population was grown out (~15 generations). Cloning generated the original 6 hybrid lines (~25 generations), which were cryopreserved. When work resumed on analysing the hybrids’ DNA contents (see Lewis et al., 2009) the t=0 DNA extraction occurred after ~40 generations of epimastigote culture. Thus, the t=0 hybrid sequences are removed by ~95 generations from the experimental cross. So, at the beginning of the microevolution experiment presented in this study, the parents and hybrids were already separated by approximately 165 generations. Figure 1 and the methods section has now been modified to reflect these details.

In hindsight, it would have been better to have less divergence between the time of DNA sampling and the original cross experiment, but such a long-term microevolution experiment was not planned in advance and the idea developed over time. However, it would never have been possible to avoid some distance between the parent samples, the hybrid clone samples and the actual hybrisation event because of the culture bottlenecks inherent to the process of cell infections, drug selection and cloning, as well as growing out sufficient numbers of parasites to yield enough DNA for the sequencing technologies available at the time.

In light of the above-mentioned aspects, we consider that SNPs that are common to all hybrids were all almost certainly generated from before hybridisation, falling into two main categories: SNPs that were detected in the three t=0 hybrid samples but not detected in the parent samples, and SNPs occuring in the parent parasite lines during culture before the cross was set up. As above, DNA for this study was derived from parent samples approximately 70 generations separating them from the cross.

We might rule out less likely origins for SNPs common to all hybrid such as homoplasies, that would be extremely rare across three independent hybrid lines, or those occuring during growth of a hypothetical common progenitor hybrid before the cloning procedure was done. This latter seems unlikely because hybrid ID12 inherited P2 kDNA maxicircles while 1C2 and 2C1 inherited P1 kDNA maxicircles. Thus, a minimum of two independent hybridisation events is more parsimonious than a mixed maxicircle common progenitor. It is noteworthy to mention that maxicircle inheritance is uniparental in T. brucei and Lesishmania.

The hybrid clone-specific SNPs are assumed to all be downstream of the hybridisation events, though it is conceivable that a very small fraction might have been undetectable in all samples but one for technical reasons. These are the SNPs being referred to at the end of this section where we state “this ratio was approximately 3.3-fold higher in the SNPs that appeared after the hybridization event”

We thus changed on page 7 “50 generations of growth before hybrid formation” to “70 generations of parent parasite growth before the cross was set up“ and added a sentence to explain about the kDNA. Figure 1 was changed to show the time separating the DNA samples from the hybridisation event. We also clarify at the end of section that SNPs that appeared after hybridisation equates to hybrid clone-specific SNPs.

It seems like the authors are discovering the impact of tandem repeat regions (copy number variations) on SNP discovery. Most of page 10 and Figure 3 merely describes the fact that you will have more false-positive SNPs within the repeat regions. I'm not sure why this is of interest here. Such SNP's should not have been included in the first place.

As the reviewers are already aware, the T. cruzi genome is unusual, even for kinetoplastid parasites. It is effectively divided into two parts, core and repetitive, and the two behave very differently in most respects, both regarding sequence analysis and biology. We have therefore chosen to include this information in order to illustrate the features of the genome for readers are less familiar with this organism and its genome, specially concerning the sequence variability within the multicopy gene families of a clonal organism that would allow detection of distinct SNPs among distinct members of a given gene family. We feel that this will be helpful, as T. cruzi is radically different from other trypanosomatids and it is suited for the readership of this more broad journal. We have rephrased the text to reflect to clarify this.

3. Why was the mapping quality in surface molecules inferior to the rest? That begs for explanation. "Surface molecule genes", frequently used here, sounds somewhat weird to this reviewer. Couldn't it be rather "Surface protein-coding genes" or the like?

The difference in mapping quality thus reflects the extreme nature of the sequence composition of the surface-molecule coding gene-containing regions of the genome. Even though rigorous mapping parameters were used, results such as these are expected due to the large and variable number of gene copies and other repeats that cannot reliably be used for mapping. We have changed the terminology as suggested.

4. I understand the implications of trying to map short reads to repetitive regions. For example, PAINTS (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7912377/) removes noisy genomic regions (large gene families) before attempting to do ploidy estimations. This seems intuitive, and this is also explained in this Ms, in light of this maybe authors can save some space by omitting text and figures that are used to discuss and show these difficulties? Maybe there is not much to gain by showing that estimating ploidies in chromosomes with low numbers of housekeeping genes is difficult. Just state that and put the data into Supplementary file?

As mentioned above, we included and opted for maintaining such information in order to illustrate the unique features of the T. cruzi genome for readers are less familiar with this organism and its genome. We consider that this will be useful and helpful, as comprehension of distinct T. cruzi DTU genome structures have been impaired by a number of variables such as the number of multicopy gene families and their contents.

5. Based on the results presented, I don't agree that there is gradual erosion overall across all clones.

1D12 hybrid seems to have gone through an initial loss of DNA content but then its DNA content seems to have remained stable (Figure 1D). In addition, it's genome-wide somy profile remained stable at tetrasomy (Figure 4C). Also, there also seems to be an increase in DNA content after 600 generations. Finally, microsatellite alleles are only consistently lost in hybrid 2C1, at three different loci. No loss was observed in hybrid 1D12 and only one allele was lost in hybrid 1C2. This also raises the question what has caused the loss of DNA content in hybrid 1D12. Based on microsats and somy, there is no clear evidence for genomic erosion. So, what part of the genome is lost in the beginning of the experiment? The authors could do a comparative genomic analyses of the parental genomes and the hybrid genomes, to figure out what portions of the genomes are commonly/uniquely lost in the hybrids.

We agree that the three hybrid lines have distinct genome erosion patterns not fully captured by our use of the term “gradual”. Revision of figure 1C to account for kDNA and infer 3n1k and 4n1k levels has allowed us to better visualise these trends. This shows there was in fact little DNA loss between the hybridisation (diploid fusion) events and t=0 for all three lines. While 2C1 shows the most consistent rate of genome erosion, with all three 800 generation evolved clones approximating a 3C genome size, the 1C2 shows more of a “stop-start” pattern, where most of the erosion occurs between 0-200 and 400-600 generations, reaching a 3C genome size, but two of the 800 generation clones then gain ~10%. The 1D12 clone, as the reviewer correctly observes, shows an initial 9% loss between t=0 and t=200 but was then quite stable between 3C and 4C.

In terms of the microsatellite data, we only typed four loci so there is very limited value in using them to interpret the total DNA erosion patterns. The purpose of such typing was just to help guide decisions on the analytical approach, i.e., whether to continue culturing for more years, whether to pursue a whole genome sequencing approach and for which samples. We reasoned that once we saw any evidence of microsattellite allele dropout then sequencing was likely to be justified.

Comparative genomic analyses of the parental genomes and the hybrid genomes is described on section “Patterns of genome erosion after in vitro microevolution” (Pg. 13) and we have now expanded the text of the Results section relating to Figure 1C and 1D to cover these aspects of the data and provided more nuance. Also, Gene Ontology analysis of contracted genes in hybrid strains after culture growth is presented in Supplementary Table 9.

6. One of the conclusions from the study of accumulation of mutations, is that surface molecules may mutate faster. The reasoning for this if I read and understood correctly, is that there were many "mutations that appeared in each parental clone after culture growth". Do authors mean to say that these mutations appeared after the genetic cross experiment in Gaunt et al. 2003? Can the authors clarify how these parental clones were kept and passaged since then? How many passages did they undergo before being sequenced? The same clarification is warranted for hybrids. When the authors state that "while the parental strains were diploid, all initial hybrid clones were essentially tetraploid" (Discussion, page 15), the readership would like to know if these essential tetraploid states were observed just after the hybridization (genetic exchange) event (e.g. clones were kept in liquid nitrogen since then and only thawed for brief culture before sequencing) or if these states had some passages in culture after the genetic exchange event (how many?).

Please refer to the answer to point #2 above about the origin and timing of the DNA samples. We have added text to clarify this issue.

7. Data and code availability: The mentioned accession number was not found in the BioProject database. Please make code available via GitHub or other repository.

The sequence data was submitted to the NCBI BioProject database under accession number PRJNA748998 and will be publicly available upon publication as now stated on the MS. URL: https://dataview.ncbi.nlm.nih.gov/object/PRJNA748998?reviewer=ejap1p42k4u6baufrhtpgls5e2.

The scripts, mostly used for plotting the figures in “R” are relatively standard and we do not think they warrant inclusion.

https://doi.org/10.7554/eLife.75237.sa2

Article and author information

Author details

  1. Gabriel Machado Matos

    1. Departamento de Biologia Celular, Embriologia e Genética, Universidade Federal de Santa Catarina, Florianopolis, Brazil
    2. Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
    Contribution
    Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Michael D Lewis
    Competing interests
    The authors declare no competing interests
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3744-2673
  2. Michael D Lewis

    Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Gabriel Machado Matos
    Competing interests
    No competing interests declared
  3. Carlos Talavera-López

    1. Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
    2. Institute of Computational Biology, Computational Health Centre, Helmholtz Munich, Munich, Germany
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Matthew Yeo

    Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Investigation, Methodology
    Competing interests
    The authors declare no competing interests
  5. Edmundo C Grisard

    Departamento de Microbiologia, Imunologia e Parasitologia, Universidade Federal de Santa Catarina, Florianopolis, Brazil
    Contribution
    Formal analysis, Investigation, Supervision, Writing – original draft, Writing – review and editing
    Competing interests
    The authors declare no competing interests
  6. Louisa A Messenger

    Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Data curation, Investigation
    Competing interests
    The authors declare no competing interests
  7. Michael A Miles

    Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Project administration, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Björn Andersson

    Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review and editing
    For correspondence
    bjorn.andersson@ki.se
    Competing interests
    The authors declare no competing interests
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4624-0259

Funding

Swedish Research Council (Project Grant)

  • Gabriel Machado Matos
  • Björn Andersson
  • Michael A Miles

CAPES (Student Scholarship)

  • Björn Andersson
  • Edmundo C Grisard

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was financed in part by the Swedish Research Council, Dnr 20016–02951, and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – CAPES (Brazilian Government Agency) – Finance Code 001, GMM was supported by a scholarship provided by CAPES-PrInt and ECG was funded by grants from CNPq and CAPES-STINT and CAPES-PrInt (Brazilian Government Agencies). We are thankful to João Luís Reis Cunha for his guidance on ploidy estimation.

Senior Editor

  1. Jos W van der Meer, Radboud University Medical Centre, Netherlands

Reviewing Editor

  1. Antoine Claessens, University of Montpellier, France

Reviewers

  1. Julius Lukes, Biology Centre, ASCR, v.v.i., Czech Republic
  2. Fernán Agüero

Publication history

  1. Preprint posted: October 24, 2021 (view preprint)
  2. Received: November 3, 2021
  3. Accepted: April 22, 2022
  4. Accepted Manuscript published: May 10, 2022 (version 1)
  5. Version of Record published: May 12, 2022 (version 2)
  6. Version of Record updated: May 13, 2022 (version 3)

Copyright

© 2022, Matos et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,084
    Page views
  • 170
    Downloads
  • 2
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Gabriel Machado Matos
  2. Michael D Lewis
  3. Carlos Talavera-López
  4. Matthew Yeo
  5. Edmundo C Grisard
  6. Louisa A Messenger
  7. Michael A Miles
  8. Björn Andersson
(2022)
Microevolution of Trypanosoma cruzi reveals hybridization and clonal mechanisms driving rapid genome diversification
eLife 11:e75237.
https://doi.org/10.7554/eLife.75237
  1. Further reading

Further reading

    1. Genetics and Genomics
    2. Plant Biology
    Simon Snoeck, Bradley W Abramson ... Adam D Steinbrenner
    Research Article Updated

    As a first step in innate immunity, pattern recognition receptors (PRRs) recognize the distinct pathogen and herbivore-associated molecular patterns and mediate activation of immune responses, but specific steps in the evolution of new PRR sensing functions are not well understood. We employed comparative genomic and functional analyses to define evolutionary events leading to the sensing of the herbivore-associated peptide inceptin (In11) by the PRR inceptin receptor (INR) in legume plant species. Existing and de novo genome assemblies revealed that the presence of a functional INR gene corresponded with ability to respond to In11 across ~53 million years (my) of evolution. In11 recognition is unique to the clade of Phaseoloid legumes, and only a single clade of INR homologs from Phaseoloids was functional in a heterologous model. The syntenic loci of several non-Phaseoloid outgroup species nonetheless contain non-functional INR-like homologs, suggesting that an ancestral gene insertion event and diversification preceded the evolution of a specific INR receptor function ~28 my ago. Chimeric and ancestrally reconstructed receptors indicated that 16 amino acid differences in the C1 leucine-rich repeat domain and C2 intervening motif mediate gain of In11 recognition. Thus, high PRR diversity was likely followed by a small number of mutations to expand innate immune recognition to a novel peptide elicitor. Analysis of INR evolution provides a model for functional diversification of other germline-encoded PRRs.

    1. Chromosomes and Gene Expression
    2. Genetics and Genomics
    Evgeniya N Andreyeva, Alexander V Emelyanov ... Dmitry V Fyodorov
    Research Article

    Asynchronous replication of chromosome domains during S phase is essential for eukaryotic genome function, but the mechanisms establishing which domains replicate early versus late in different cell types remain incompletely understood. Intercalary heterochromatin domains replicate very late in both diploid chromosomes of dividing cells and in endoreplicating polytene chromosomes where they are also underrelicated. Drosophila SNF2-related factor SUUR imparts locus-specific underreplication of polytene chromosomes. SUUR negatively regulates DNA replication fork progression; however, its mechanism of action remains obscure. Here we developed a novel method termed MS-Enabled Rapid protein Complex Identification (MERCI) to isolate a stable stoichiometric native complex SUMM4 that comprises SUUR and a chromatin boundary protein Mod(Mdg4)-67.2. Mod(Mdg4) stimulates SUUR ATPase activity and is required for a normal spatiotemporal distribution of SUUR in vivo. SUUR and Mod(Mdg4)-67.2 together mediate the activities of gypsy insulator that prevent certain enhancer-promoter interactions and establish euchromatin-heterochromatin barriers in the genome. Furthermore, SuUR or mod(mdg4) mutations reverse underreplication of intercalary heterochromatin. Thus, SUMM4 can impart late replication of intercalary heterochromatin by attenuating the progression of replication forks through euchromatin/heterochromatin boundaries. Our findings implicate a SNF2 family ATP-dependent motor protein SUUR in the insulator function, reveal that DNA replication can be delayed by a chromatin barrier and uncover a critical role for architectural proteins in replication control. They suggest a mechanism for the establishment of late replication that does not depend on an asynchronous firing of late replication origins.