Evolution of the complex transcription network controlling biofilm formation in Candida species
Abstract
We examine how a complex transcription network composed of seven ‘master’ regulators and hundreds of target genes evolved over a span of approximately 70 million years. The network controls biofilm formation in several Candida species, a group of fungi that are present in humans both as constituents of the microbiota and as opportunistic pathogens. Using a variety of approaches, we observed two major types of changes that have occurred in the biofilm network since the four extant species we examined last shared a common ancestor. First, master regulator ‘substitutions’ occurred over relatively long evolutionary times, resulting in different species having overlapping but distinct sets of master regulators of biofilm formation. Second, massive changes in the connections between the master regulators and their target genes occurred over much shorter timescales. We believe this analysis is the first detailed, empirical description of how a complex transcription network has evolved.
Introduction
Many of the most medically relevant fungi belong to the Candida genus. These microbes are part of the human microbiota, but under specific circumstances — such as imbalances in components of the microbiota or suppression of the immune system of the host — they can proliferate as opportunistic pathogens and cause disease (Calderone and Clancy, 2012; Turner and Butler, 2014; Kullberg and Arendrup, 2015; Romo and Kumamoto, 2020). These diseases, which were already documented by the ancient Greeks, range from mild cutaneous disorders to systemic infections with high mortality rates (Lynch, 1994; Calderone and Clancy, 2012; Kullberg and Arendrup, 2015; Nobile and Johnson, 2015). Although they are usually studied in planktonic (suspension) cultures in the laboratory, Candida species, like many microbes, are often found in nature as biofilms, communities of cells associated with surfaces. For Candida albicans, the best studied and most clinically relevant of the Candida species, biofilms consist of a lower sheet of cells in the yeast form (spherical, budding cells) overlaid by a layer of filamentous cells (hyphae and pseudohyphae) and surrounded by an extracellular matrix composed of proteins and secreted polysaccharides (Blankenship and Mitchell, 2006; Nobile and Johnson, 2015; Lohse et al., 2018). The matrix, together with specific gene expression changes within biofilms (e.g., the upregulation of drug efflux pumps), provides protection from environmental stresses including antifungal drug treatment. The ability of C. albicans to form biofilms has been associated both with its versatility in occupying different niches in the human host and its inherent resistance to antifungal drugs. These features are especially important for individuals with implanted medical devices, which provide substrates for biofilm formation and where often the only effective treatment is replacement of the device (Donlan, 2001). 
Biofilms also shed live yeast-form cells and thereby serve as reservoirs for further colonization in the human body (Nobile and Johnson, 2015).
C. albicans biofilm formation begins with the adhesion of yeast cells to a surface, followed by cell division and morphological differentiation to form an upper layer of filamentous cells. The biofilm matures through the secretion of the extracellular matrix (Blankenship and Mitchell, 2006; Nobile and Johnson, 2015; Lohse et al., 2018). In C. albicans, a complex transcription network regulates this process; it consists of seven ‘master’ transcription regulators (Bcr1, Brg1, Efg1, Flo8, Ndt80, Rob1, and Tec1) that control each other’s expression and, collectively, bind to the control regions of more than a thousand target genes, around one-sixth of the total number of genes present in the genome of this species (Figure 1; Nobile et al., 2012; Fox et al., 2015). All seven regulators appear to be positive regulators that are required for normal biofilm development. Despite the complexity of the biofilm regulatory network, several lines of evidence suggest that this network originated relatively recently. For example, genes that are highly expressed during biofilm formation are enriched for genes that are relatively young, meaning that they only have a clear ortholog in species closely related to C. albicans (Nobile et al., 2012). Apart from the literature available for C. albicans, most of the work to understand biofilm formation in Candida species has been carried out with Candida parapsilosis (Ding and Butler, 2007; Connolly et al., 2013; Holland et al., 2014). C. parapsilosis diverged from a last common ancestor with C. albicans nominally 70 million years ago (Mishra et al., 2007; Butler et al., 2009). Although six of the seven master regulators of biofilm formation in C. albicans have clear orthologs in C. parapsilosis, only two of them are required for biofilm formation in the latter species (Holland et al., 2014). Candida dubliniensis and Candida tropicalis are more closely related to C. albicans (see Figure 2) and are also known to form biofilms (Ramage et al., 2001; Silva et al., 2011; Pujol et al., 2015; Araújo et al., 2017; Dominguez et al., 2018; Kumari et al., 2018), but the regulatory circuits that control this process are largely unknown.

Figure 1. The biofilm transcription network in Candida albicans.
(A) The seven master transcription regulators identified in genetic screens and the interactions among them as determined by ChIP-chip and ChIP-qPCR (Nobile et al., 2012; Fox et al., 2015). (B) Binding interactions (determined by ChIP-chip) between the master regulators (red) and their target genes (black). Figure adapted from Nobile et al., 2012. Many target genes are bound by more than one regulator. Note that genome-wide binding data is not available for Flo8, and thus it is missing from the larger network diagram in (B).

Figure 2. Diversity in biofilm formation across fungal species.
(A) Biofilm biomass dry weight was determined for different fungal species grown on the bottoms of polystyrene 6-well plates in Spider 1% glucose medium at 37°C for 48 hr. The mean and standard deviation were calculated from five replicates. Hashtags denote species that do not grow well at 37°C and for which biofilms were grown at 30°C. The cladogram to the left shows the phylogenetic relationship of the species (Byrne and Wolfe, 2005; Maguire et al., 2013). All species analyzed belong to the CTG-Ser1 clade apart from C. glabrata and S. cerevisiae. (B) Morphology of biofilms formed by five representative CTG clade species visualized by confocal scanning laser microscopy. C. tropicalis biofilm morphology is similar to that of C. albicans and C. dubliniensis as shown in Figure 2—figure supplement 2 and Figure 3—figure supplement 1. Biofilms were grown as described above, but on the surfaces of silicone squares. Scale bars represent 50 μm. (C) Biofilm formation by Candida species in an in vivo rat catheter model (Andes et al., 2004). Biofilms were grown for 24 hr and were visualized by scanning electron microscopy. Two magnifications are shown in the lower and upper panels for each species, and the scale bars represent 20 and 100 μm, respectively. Micrographs of C. albicans were adapted from Dalal et al., 2016, but were obtained as part of the same set of experiments performed in parallel.
To understand how the complex transcription network that controls biofilm formation evolved, we began with the seven master regulators of biofilm formation in C. albicans and determined whether their orthologs also controlled biofilm formation in C. dubliniensis and C. tropicalis. Using ChIP-seq, we mapped the targets of the orthologs in C. dubliniensis, C. tropicalis, and C. parapsilosis. A comparison of the extant networks showed that the two main components of the network, master regulators and target genes, moved in and out of the network at very different rates over evolutionary time. While the regulators moved gradually and in rough correlation with small phenotypic changes in biofilm structure, the master regulator-target gene connections changed very quickly. The large-scale changes in connections observed between closely related species did not appear to have a major impact on biofilm phenotypes, at least as monitored in vitro. These results suggest an evolutionary route through which complex regulatory networks could rapidly explore new network configurations (and perhaps new phenotypes) without disrupting existing functions.
Results
Only species closely related to C. albicans form complex biofilms
To understand how the transcription network that controls biofilm formation changed over evolutionary timescales, we first phenotypically characterized the biofilms formed in vitro by the different species of the so-called CTG clade. This clade, which includes but extends beyond Candida species, was traditionally named CTG due to its unusual property of decoding the CTG codon as serine instead of the usual leucine (Figure 2). Recently, this clade has been renamed CTG-Ser1 because other Ascomycota clades were discovered to also have unusual codon usage (Krassowski et al., 2018). To define an optimal growth medium for these assays, we tested biofilm formation under several conditions typically used to study biofilms of Candida species (García-Sánchez et al., 2004; Richard et al., 2005; Kucharíková et al., 2011; Nobile et al., 2012; Lohse et al., 2017). For the initial tests, we focused on C. albicans and the three species that are most closely related to it and that commonly inhabit humans, C. dubliniensis, C. tropicalis, and C. parapsilosis (Figure 2; Turner and Butler, 2014; Gabaldón et al., 2016). The estimated divergence time for C. dubliniensis, C. tropicalis, and C. parapsilosis from the last common ancestor with C. albicans is approximately 20, 45, and 70 million years, respectively (Mishra et al., 2007; Butler et al., 2009; Moran et al., 2012). C. albicans, C. tropicalis, and C. parapsilosis have been previously shown to form biofilms, while less is known about biofilm formation in C. dubliniensis (Silva et al., 2011; Araújo et al., 2017; Dominguez et al., 2018; Kumari et al., 2018). Biofilms were grown in vitro on silicone squares at 37°C for 48 hr with shaking and were monitored by confocal scanning laser microscopy (CSLM), as has been previously described (Nobile et al., 2012). 
We tested eight different growth media, and only Spider medium with glucose (rather than mannitol) as the carbon source allowed all four species to form thick, well-structured biofilms (Supplementary file 1a). Our results also showed that environmental conditions are important determinants of biofilm formation for some of these species. While C. albicans formed thick biofilms in all media tested, C. tropicalis biofilm formation, for example, depended very much on carbon source (Supplementary file 1a).
Given that there could be differences in the speed at which different species form biofilms, we also assessed biofilm formation as a function of time for the same four species. Biofilms were formed as described above and were monitored under the confocal microscope at seven different time points, from 30 min to 96 hr after cell adhesion. Although C. albicans formed biofilms more rapidly, by 48 hr all four species had formed mature biofilms that did not significantly change at later time points (Figure 2—figure supplement 1).
Once we had defined an optimal biofilm growth medium (Spider + glucose) and time point (48 hr), we extended our analysis to other species of the CTG clade (Maguire et al., 2013). In addition to CSLM as described above, we monitored biofilm formation in two additional ways: we determined the biomass dry weight of biofilms formed on the bottoms of polystyrene plates, and, using a microfluidic flow cell, we continuously monitored biofilm formation by time-lapse photography using an optical microscope (Nobile et al., 2012; Lohse et al., 2017). The three methods are complementary: biomass determination is a quantitative method that reduces biofilm formation to a single number, confocal microscopy is qualitative, but allows detailed characterization of the structure of the biofilm, and the microfluidic assay reveals biofilm formation in real time under a defined flow; the flow rate was adjusted to mimic that of an average catheter implanted in a vein (Gulati et al., 2017; Lohse et al., 2017). Because some of the species we tested are known to grow poorly at 37°C, we performed the assays at 30°C for those species (Kurtzman et al., 2011). Although not all species were tested in the three assays, overall, of the 15 species analyzed, those closest to C. albicans formed the thickest biofilms and, in general, the greater the phylogenetic distance from C. albicans the thinner the biofilm formed (Figure 2, Figure 2—figure supplement 2). Only the two species closest to C. albicans (C. dubliniensis and C. tropicalis) formed biofilms that are structurally very similar to C. albicans biofilms, with a basal layer of yeast cells underlying a thick layer of filamentous cells (hyphae and pseudohyphae). Biofilms formed by C. parapsilosis appeared similar at low resolution, but a closer examination showed that the layer of filamentous cells is composed largely of pseudohyphae rather than a mixture of true hyphae and pseudohyphae (Figure 2B). 
Under these conditions, Lodderomyces elongisporus formed thinner biofilms composed only of yeast cells. Moving further away from C. albicans, Spathaspora passalidarum, Meyerozyma guilliermondii, and Clavispora lusitaniae formed even thinner biofilms, while Scheffersomyces stipitis, Debaryomyces hansenii, Metschnikowia bicuspidata, Hyphopichia burtonii, and Candida tenuis did not form biofilms under the conditions we tested; only a few cells were observed adhering to the surface (Figure 2, Figure 2—figure supplement 2). We also performed CSLM assays in an additional medium (RPMI) with a selection of the species, and the results generally agreed with those described above (Figure 2—figure supplement 2). Our results with the microfluidic assays showed a similar trend: only those species that are phylogenetically closest to C. albicans were able to rapidly form biofilms under flow conditions in the microfluidic device (Figure 2—figure supplement 3).
As a reference, we also characterized biofilm formation in two other ascomycetous yeast species that lie outside the CTG clade, Candida glabrata and Saccharomyces cerevisiae. Although not closely related to the CTG clade (despite its name), C. glabrata is an important opportunistic human pathogen, while S. cerevisiae is used extensively in the food and beverage industries and is a widely employed model organism. As can be seen in Figure 2, neither of these species formed biofilms that resembled those formed by C. albicans and its close relatives in the assays and conditions that we tested.
To assess whether the results observed in vitro can be recapitulated in vivo, we tested the ability of C. albicans, C. dubliniensis, C. tropicalis, and C. parapsilosis to form biofilms in the rat catheter model, a well-established in vivo biofilm model (Andes et al., 2004). All four species were able to form biofilms, although the biofilms formed by C. albicans and C. dubliniensis were considerably thicker and more filamentous (Figure 2C). These results agree with previous in vivo characterizations performed for C. albicans and C. parapsilosis (Nobile et al., 2012; Connolly et al., 2013) and provide new information on the intermediate species.
In summary, our results show that the ability to form biofilms that resemble those of C. albicans is limited to its most closely related species. In terms of biomass, there is a sharp drop-off beyond C. parapsilosis, while in terms of biofilm structure, only C. dubliniensis and C. tropicalis form biofilms similar to those of C. albicans, with all three morphological cell types represented. Of all the species studied, biofilm formation by C. albicans is the most rapid and most robust to environmental changes, as it formed similar biofilms in all media tested (Supplementary file 1a, Figure 2—figure supplement 2); moreover, the biofilms formed by this species are the most stable to physical manipulation (results not shown). When interpreting these results, it is important to note that the biofilm assays employed here have been developed for C. albicans and its closely related species. Therefore, it is possible that our observations may not only reflect differences in the intrinsic ability to form biofilms, but could also be due to changes in the ways the species have adapted to the different environments they inhabit. The species that were not observed to form biofilms in our study could, in principle, be able to form biofilms in other conditions, but, to our knowledge, there is no evidence for this even in well-studied species such as S. cerevisiae.
The regulatory core of the biofilm transcription network changed gradually over time
To gain insight into the evolutionary changes that occurred in the transcription network that controls biofilm formation at a molecular level, we first studied the function of the seven master regulators of the C. albicans network (Figure 1A). Given the phenotypic results described above, we centered the analysis on C. albicans, C. dubliniensis, C. tropicalis, and C. parapsilosis. All four of these species are common in humans (Turner and Butler, 2014), and the first three form similar structural types of biofilms. As described above, C. parapsilosis also forms biofilms, but its biofilms show more pronounced differences. All seven master regulators of the network in C. albicans (Figure 1) have clear orthologs in the other three closely related species, with the exception of Rob1. Rob1 has a patchy phylogenetic distribution, with syntenic orthologs present in C. albicans, C. dubliniensis, and C. tropicalis, but apparently absent from C. parapsilosis and closely related species. However, Rob1 orthologs are present in other more distantly related CTG species, which supports the hypothesis that Rob1 either was lost or evolved sufficiently rapidly in the C. parapsilosis lineage that it cannot be recognized (Maguire et al., 2013).
To test whether the orthologs of the C. albicans master regulators are involved in biofilm formation in the other species, we generated gene deletion knockouts in C. dubliniensis and C. tropicalis. The knockouts in C. albicans and C. parapsilosis had been previously generated as part of large transcription regulator deletion projects (Homann et al., 2009; Holland et al., 2014). To make the knockouts in C. dubliniensis and C. tropicalis, we used amino acid auxotrophic strains and employed a gene knockout strategy similar to that previously used for C. albicans and C. parapsilosis (Mancera et al., 2019).
The ability of the different gene knockout strains to form biofilms was monitored by biomass dry weight determination and CSLM. As can be observed in Figure 3 and Figure 3—figure supplement 1, all seven master regulators identified in C. albicans were also required for biofilm formation in C. dubliniensis. The results were different for C. tropicalis; here, only five of the seven were required, with Rob1 and Flo8 appearing dispensable for biofilm formation under our laboratory conditions. The biofilms formed by the rob1 and flo8 deletion mutants in C. tropicalis were actually slightly heavier and the hyphal layer was denser than biofilms formed by the parental (wildtype) strain, suggesting that these two regulators may have assumed a subtle inhibitory role in this species. For C. parapsilosis, it was previously shown that, of the seven master biofilm regulators in C. albicans, only Bcr1 and Efg1 are indispensable for biofilm formation. The ndt80 deletion mutant in C. parapsilosis fails to form biofilms but also has a general growth defect, making it difficult to ascertain the precise role of Ndt80. Previous work also established that deletions of CZF1, UME6, CPH2, GZF3, and ACE2 all cause biofilm-specific defects in C. parapsilosis but not in C. albicans (Holland et al., 2014). Overall, our results, together with previous observations, show that the group of master regulators underlying biofilm formation in C. albicans differs from that in other Candida species, with the degree of difference roughly paralleling their evolutionary distance from C. albicans.

Figure 3. Roles of orthologs of the seven C. albicans master regulators in biofilm formation.
(A) Phenotypic characterization of biofilms formed by the gene deletion knockouts of orthologs of the seven master C. albicans biofilm regulators. Images show biofilms grown on the bottoms of polystyrene 6-well plates in Spider 1% glucose medium at 37°C for 48 hr. (B) Dry weights of biofilms formed by the gene deletion mutants grown as described in (A). The means and standard deviations were calculated from five replicates for two independent gene deletion knockout isolates (KO-1 and KO-2). Asterisks denote weights that differ significantly from that of the corresponding parental strain according to a Student’s two-tailed paired t test (p<0.05). Although the dry weight of the C. dubliniensis brg1 mutant is not statistically different from that of the wildtype, detailed analysis of this mutant by confocal scanning laser microscopy showed a clear biofilm formation defect (Figure 3—figure supplement 1). (C) Summary diagram showing the conservation of the seven master regulators in biofilm formation across the three species most closely related to C. albicans. The data for C. albicans were obtained from Nobile et al., 2012 and Fox et al., 2015, and those for C. parapsilosis from Holland et al., 2014.
Target genes of the master biofilm regulators differ greatly among Candida species
To determine how the binding connections between the biofilm master regulators and their target genes have changed over evolutionary time, we determined genome-wide protein-DNA interactions of the master regulators in C. albicans, C. dubliniensis, C. tropicalis, and C. parapsilosis by chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) (Johnson et al., 2007). To this end, we tagged each of the regulators with a Myc epitope tag that can be immunoprecipitated using a commercially available antibody. This strategy has the advantage that a single antibody with the same affinity could be used in all experiments, and control experiments could be performed with untagged strains. All the ChIP-seq experiments were performed in mature biofilms grown for 48 hr at 37°C in the optimal medium described above. For unknown technical reasons, not all regulators could be immunoprecipitated in all species. Of all the ChIP-seq experiments performed, protein-DNA interactions could be reliably determined for 18 regulators across the four species (Supplementary file 1b). We believe that this coverage, although not complete, is more than sufficient to uncover the general trends of biofilm network evolution. In the following sections, we consider how the biofilm network (defined as the set of master regulators and connections between them and their target genes) differs among Candida species and what these differences reveal about the evolution of the network.
To compare gene targets between species, we assigned each ChIP occupancy site to the ORF with the nearest downstream start codon. To compare the changes in master regulator-target gene connections across species, we first examined the target genes that are conserved across the species. The overall percentage of one-to-one orthologs between the four species ranged from a high of 91% between C. albicans and C. dubliniensis to a low of 74% between C. tropicalis and C. parapsilosis (Maguire et al., 2013). If we consider only the one-to-one orthologs, it becomes clear that the connections between master regulators and conserved target genes vary greatly across these species (Figure 4). Between the two most closely related species (C. albicans and C. dubliniensis), the master regulator that showed the highest conservation of target gene connections was Ndt80, but the overlap was only about 50% (634 of 1,297). Between C. albicans and C. parapsilosis, this value drops to about 26% (722 of 2,725). Although only 12% (371 of 3,016) of Ndt80 target gene connections are common to all four species, the overlap for each species pair is larger than expected by chance (hypergeometric test, p<0.05). This conclusion does not necessarily mean that selective constraints preserve such overlaps over evolutionary time. Although selection could be responsible, drift is also a possible explanation, particularly given the phylogenetic proximity of the species. The other master regulators show an even lower degree of conserved regulator-target gene connections. For example, Rob1 shares only about 12% (2 of 17) of its target gene connections between the two most related species, C. albicans and C. dubliniensis, but the overlap is still greater than expected by chance (hypergeometric test, p<0.05).
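The overlap statistics in this paragraph can be illustrated with a minimal Python sketch that computes the shared fraction of targets between two species and the one-sided hypergeometric probability of observing at least that many shared targets by chance. The gene identifiers, the set sizes, and the use of the union of the two target sets as the denominator are illustrative assumptions, not values or definitions taken from this study, and the function names are hypothetical.

```python
from math import comb


def shared_fraction(targets_a, targets_b):
    """Fraction of the combined target set that is bound in both species.

    Assumption: the denominator is the union of the two target sets.
    """
    union = targets_a | targets_b
    if not union:
        raise ValueError("both target sets are empty")
    return len(targets_a & targets_b) / len(union)


def overlap_pvalue(n_orthologs, n_targets_a, n_targets_b, n_shared):
    """One-sided hypergeometric tail probability P(X >= n_shared).

    X is the number of shared targets expected if n_targets_b genes were
    drawn at random from n_orthologs one-to-one orthologs, of which
    n_targets_a are targets in the other species.
    """
    total = comb(n_orthologs, n_targets_b)
    upper = min(n_targets_a, n_targets_b)
    tail = sum(
        comb(n_targets_a, k) * comb(n_orthologs - n_targets_a, n_targets_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


# Hypothetical toy data: 10 one-to-one orthologs; 4 and 3 targets; 2 shared.
species_a = {"g1", "g2", "g3", "g4"}
species_b = {"g3", "g4", "g5"}
fraction = shared_fraction(species_a, species_b)
p_value = overlap_pvalue(
    10, len(species_a), len(species_b), len(species_a & species_b)
)
```

With real data, the universe passed as `n_orthologs` would be restricted to genes that have one-to-one orthologs in both species, so that absent genes do not inflate or deflate the expected overlap.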

Figure 4. Connections between master regulators and target genes are highly divergent across species.
(A) Pairwise comparison of shared target genes for Ndt80, Efg1, and Rob1 between species. Target genes were determined by ChIP-seq as detailed in Materials and methods. The numbers represent the percentage of overall target genes conserved between each pair of species considering only genes that have orthologs in the two species. Note that Rob1 is absent in C. parapsilosis (Maguire et al., 2013). (B) Venn diagrams depicting the overlap of regulator-target gene connections across species, considering only genes that have orthologs in all four species for Efg1 and Ndt80 and considering genes that have orthologs in C. albicans, C. dubliniensis, and C. tropicalis for Rob1. Numbers in each section of the diagrams represent the percentage of master regulator-target gene connections, with the total number of connections for each regulator set at 100%, and the gross number of target genes. Note that for Efg1 and Ndt80 the size of the color sections does not correspond to the percentage. (C) Venn diagram depicting the overlap between target genes of Efg1 and Ndt80, considering only target gene orthologs that are present in all four species. As in (B), numbers represent the percentage of master regulator-target gene connections and gross number of target genes. The diagram indicates that, for genes that are targets of Efg1 and Ndt80 in all species, most Efg1-target gene connections are also Ndt80-target gene connections, even though the target genes themselves are different across species.
Our results also show that each species has target genes in its biofilm network that lack orthologs in the other three species. In C. albicans, 26% of the genes bound by at least one of the biofilm master regulators do not have orthologs in the other three species compared with 19% for the genome as a whole. This analysis extends the previous observation that genes that are upregulated during biofilm formation in C. albicans are often ‘young’ genes, that is, genes that lack orthologs in related species (Nobile et al., 2012). We observed a similar enrichment of unique genes in the biofilm network of C. parapsilosis, but not for C. dubliniensis and C. tropicalis; in the latter two species, the fraction of non-orthologous genes in the biofilm network was approximately the same as that observed for the whole genome. These observations indicate that the biofilm networks of C. albicans and C. parapsilosis have been more dynamic in recent evolutionary time than those of the other two species.
Differences in master regulator-target gene connections are robust to different methods of comparison
We considered the possibility that our results could be skewed by false-positive signals intrinsic to ChIP-seq experiments, even when proper controls are used (Chen et al., 2012). To deal with this potential problem, we employed two additional criteria for increasing stringency in our analysis of transcription networks (Nocedal et al., 2017). First, we filtered the regions identified as enriched in ChIP signal to include only those regions that also had a high-scoring regulator binding motif in the intergenic region. As discussed below, the binding motif for some of the master regulators does not vary significantly across the species we studied. Second, we also incorporated gene expression data to further filter the gene targets to those that (1) have ChIP enrichment, (2) show the presence of a regulator binding motif in the intergenic region, and (3) whose expression changes under biofilm-forming conditions. Although the gross number of regulator-target gene connections decreased as the stringency of the filtering criteria increased, the high proportion of differences across Candida species described above did not significantly change (Figure 4—figure supplement 1).
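The increasing-stringency filtering described above can be sketched as nested set intersections. This is a minimal illustration that assumes each criterion has already been reduced to a set of gene identifiers; the function name, tier labels, and gene names are hypothetical.

```python
def stringency_tiers(chip_enriched, motif_present, expression_changed):
    """Return target gene sets under three nested levels of stringency."""
    tier1 = set(chip_enriched)               # (1) ChIP enrichment only
    tier2 = tier1 & set(motif_present)       # (2) plus a binding motif
    tier3 = tier2 & set(expression_changed)  # (3) plus an expression change
    return {
        "chip": tier1,
        "chip+motif": tier2,
        "chip+motif+expression": tier3,
    }


# Hypothetical gene identifiers for one regulator in one species.
tiers = stringency_tiers(
    chip_enriched={"g1", "g2", "g3", "g4"},
    motif_present={"g2", "g3", "g4", "g9"},
    expression_changed={"g3", "g4", "g7"},
)
tier_sizes = {name: len(genes) for name, genes in tiers.items()}
```

Each tier is a subset of the previous one, so the number of connections can only decrease as criteria are added; the cross-species comparison is then repeated within each tier.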
Another potential caveat is that our conclusions are based on the specific conditions under which we induced biofilm formation. To test whether this concern is significant, we performed the ChIP-seq experiments for Ndt80 in C. albicans, C. dubliniensis, and C. tropicalis in biofilms grown in an alternative growth medium. In all three species, the binding intensities of the regulator in all the intergenic regions of the genome were highly correlated between the two media tested (Figure 4—figure supplement 2). This finding indicates that there are no significant changes in Ndt80 binding when the growth medium is modified.
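The comparison between media amounts to correlating, for each species, the binding intensity of every intergenic region in one medium with its intensity in the other. The sketch below computes a Pearson correlation coefficient in plain Python; the choice of Pearson rather than a rank-based coefficient, the function name, and the intensity values are assumptions made for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical binding intensities for the same four intergenic regions
# measured in two growth media.
medium_a = [1.0, 2.0, 3.0, 4.0]
medium_b = [1.1, 1.9, 3.2, 3.8]
r = pearson_r(medium_a, medium_b)
```

The two lists must be ordered identically, with one entry per intergenic region, so that each pair of values refers to the same region.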
High connectivity of the biofilm network is observed in all four species
The initial characterization of the biofilm transcription circuit in C. albicans showed that many target genes were directly connected (by binding) to more than one master regulator (Nobile et al., 2012), and this general feature of the network architecture is observed across the Candida species studied here, despite the low conservation of individual regulator-target gene connections (Figure 4A). Perhaps the most notable example is seen by comparing the target genes of Ndt80 and Efg1 across the four species. The set of target genes of Ndt80 is considerably larger, but over 75% of the Efg1 target genes are also Ndt80 targets in all four species. In addition, the binding motifs of these two regulators are enriched in each other’s binding locations in all four species studied (Figure 5A). These observations indicate that, in all species, Efg1 binds in conjunction with Ndt80 even though the target genes of the regulator combination differ greatly across species. The association between Efg1 and Ndt80 agrees with previous planktonic ChIP-seq experiments of Efg1 performed in C. parapsilosis where the most enriched binding motif found was that of Ndt80 (Connolly et al., 2013). The DNA-binding motifs of these two regulators have also been shown to co-occur in the binding regions of Sfl1 and Sfl2, two regulators of filamentation in C. albicans (Znaidi et al., 2013). Taken together, these results suggest that the Efg1-Ndt80 association is ancient with respect to the Candida species studied here and that it remains preserved across them despite large species-to-species differences in the target genes bound by the two regulators. Analysis of the other master regulators indicates that combinatorial control of target genes is very common in all species, although the other examples do not seem as deeply conserved as the Efg1-Ndt80 example.

Figure 5. Master regulators retain their DNA-binding specificity while there is considerable variation in gene expression across species.
(A) Logos of the most enriched motif in the binding locations of the different master regulators, determined by ChIP-seq across species. The two circles to the right of each logo show whether the Efg1 (green circle) or Ndt80 (blue circle) previously known motifs are enriched in each set of regulator binding locations. (B) Pairwise comparison of transcription profiles under biofilm-forming conditions (a time point of 48 hr) of C. albicans against C. dubliniensis and C. tropicalis. As a reference, the comparison between two isolates of C. albicans is shown in the left panel. Biofilm-specific expression changes were calculated comparing gene expression between biofilm and planktonic growth conditions in the same media. Linear regressions are shown in blue for each comparison.
In terms of overall network structure across species, we also examined whether, as is the case for C. albicans, the master regulators bind to their own control region as well as those of the other master regulators. The extent of conservation of the binding connections between one master regulator and the others varies from regulator to regulator (Supplementary file 1c), with Ndt80 showing the highest conservation. In all four species, Ndt80 binds to its own control region as well as those of all six other regulators, with the exception of the control region of FLO8 in C. albicans and ROB1 in C. dubliniensis. Efg1, Brg1, and Tec1, in that order, follow Ndt80 in their degree of connection conservation, with Rob1 and Bcr1 having the least conserved set of connections. However, we note that the binding data for these two regulators is also the least complete (Supplementary file 1b). Ndt80 and Efg1 each bind to their own control regions in all four species analyzed, while Brg1 exhibits this interaction only in C. albicans and C. dubliniensis. The binding of the other regulators to their own upstream intergenic regions appears less conserved. Overall, despite the extensive changes in the network across species, high connectivity among the master regulators is conserved as a structural feature of the network in each of the four Candida species analyzed.
Changes in master regulator-target gene interactions are due to changes in cis-regulatory regions
A possible mechanistic explanation for the high rates of change among the target genes of the biofilm network is that these changes are due to modifications in the trans components of the network, for example, changes in the DNA-binding specificity of a master regulator. This type of change has the potential to dramatically change the network over relatively short evolutionary timescales. To explore this possibility, we examined the ChIP-seq binding data for motifs recognized by the master regulators. Performing de novo motif searches, we found that the enriched DNA-binding motifs found in the binding regions of the biofilm master regulators were very similar across the four species we examined (Figure 5). Moreover, the de novo generated DNA-binding motifs for Efg1 and Ndt80 are similar to those previously reported for their orthologs in C. albicans and other species (Nobile et al., 2012; Nocedal et al., 2017).
As an additional test of whether the DNA-binding specificity of the regulators changed over the evolutionary time considered here, we used the de novo motifs generated from the ChIP-seq data for Efg1 or Ndt80 in each species and identified the occurrence of these motifs in the upstream regions of the orthologous genes from the other species. The overlap of potential targets between species varied from 51 to 100% depending on the exact motif used, but was never low enough to account for the differences determined directly from the ChIP-seq experiments (Figure 4). For example, the lowest overlap observed when using motif scoring was 51% for the Efg1 motifs in C. dubliniensis compared with C. parapsilosis, while the actual gene target overlap of ChIP-seq data for this regulator was only 13% (Figure 4). In other words, the large differences in regulator-target interactions across species observed in the ChIP-seq data cannot be accounted for by slight differences in the binding motifs generated in each individual species. Although we cannot fully rule out that small differences in binding affinity contribute to differences in master regulator-target gene connections observed across the species, overall, all the analyses show that the DNA-binding specificity of at least some of the master regulators has not changed significantly across these species. This conclusion is further supported by analysis of a C. albicans–C. dubliniensis hybrid, as described below.
Biofilm-specific gene expression changed rapidly over evolutionary time
A strong prediction of the vast number of species-to-species differences in master regulator-target gene connections documented above is that the genes transcriptionally induced during biofilm formation should differ substantially among species. To test this prediction, we generated genome-wide transcription profiles of C. albicans, C. dubliniensis, and C. tropicalis under biofilm-forming conditions. To reveal biofilm-specific changes, we compared these profiles to expression data obtained when the species were grown in suspension cultures in the same medium. As a reference, we also performed the biofilm expression profile in a second C. albicans isolate. As seen in Figure 5B, the pairwise differences in the transcription profiles across the three species are significant and reflect their phylogenetic position: the further apart the two species are from one another, the less correlated their transcription profiles are. If we use a lax cutoff of twofold over/underexpression to define genes that change their expression during biofilm formation, the overlap between pairs of species is relatively low. For example, only 29% of the genes that change their expression during biofilm formation are shared between C. albicans and C. dubliniensis, and only 24% are shared between C. albicans and C. tropicalis. The overlap of differentially expressed genes between the two C. albicans isolates was 48%. This result is consistent with previous work showing that clinical isolates differ in their biofilm-forming abilities (Hirakawa et al., 2015; Huang et al., 2019). As noted in the 'Discussion', we believe that many of the differences among clinical strains arose after the C. albicans–C. dubliniensis branch.
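The twofold cutoff and the pairwise overlap described above can be illustrated with a minimal sketch. It assumes that expression changes are stored as log2(biofilm/planktonic) ratios keyed by a shared ortholog identifier, that direction of change is ignored, and that the shared percentage is computed over the union of the two gene sets; the text does not specify these details, and the function names and toy values are hypothetical.

```python
import math


def changed_genes(log2_ratios, min_fold=2.0):
    """Return genes over- or underexpressed at least min_fold in biofilms.

    log2_ratios maps an ortholog identifier to log2(biofilm / planktonic).
    """
    threshold = math.log2(min_fold)
    return {gene for gene, ratio in log2_ratios.items() if abs(ratio) >= threshold}


def percent_shared(genes_a, genes_b):
    """Percentage of changed genes shared by two species.

    Assumption: the denominator is the union of both sets.
    """
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 100.0 * len(genes_a & genes_b) / len(union)


# Toy values, not data from the study.
species_a = {"orth1": 1.6, "orth2": -1.2, "orth3": 0.3, "orth4": 2.5}
species_b = {"orth1": 1.1, "orth2": 0.2, "orth3": -0.1, "orth4": -3.0}

shared = percent_shared(changed_genes(species_a), changed_genes(species_b))
print(f"{shared:.1f}% of changed genes are shared")
```

A different denominator (for example, the changed genes of one reference species) would give a different percentage, so the definition should be fixed before comparing species pairs.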
To test whether the large differences in the gene expression profiles were specific to the media conditions used, we also performed the transcription profiles in different media, namely Spider for C. albicans and C. dubliniensis, and RPMI for C. albicans and C. tropicalis. In the alternative media, the conservation between the sets of genes that changed their expression at least twofold during biofilm formation is even lower, 22 and 11%, respectively, for C. albicans and C. dubliniensis, and C. albicans and C. tropicalis. Despite these major differences, all the interspecific pairwise overlaps are greater than would be expected by chance (hypergeometric test, p<<0.05), although we cannot distinguish if this is due primarily to selection or to the shared ancestry of their promoter sequences. Overall, the low degree of conservation in genome-wide gene expression agrees well with the low conservation of regulator-target gene connections across species described above.
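The hypergeometric test mentioned above asks how likely an overlap at least as large as the observed one would be if the two gene sets were drawn independently from the same pool of genes. The sketch below computes that upper-tail probability with the standard library only. The choice of gene universe (here called `population`) is an assumption, since the text does not state it, and the numbers are toy values.

```python
from math import comb


def hypergeom_overlap_pvalue(population, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes that could have been called in both species
                (assumed here to be the shared ortholog set).
    size_a:     number of changed genes in species A.
    size_b:     number of changed genes in species B.
    overlap:    number of genes changed in both species.
    """
    largest_possible = min(size_a, size_b)
    total = comb(population, size_b)
    tail = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, largest_possible + 1)
    )
    return tail / total


# Toy example: 10 genes, 4 changed in A, 3 changed in B, 2 shared.
print(hypergeom_overlap_pvalue(10, 4, 3, 2))
```

With realistic genome-scale inputs the same function applies unchanged, because Python integers do not overflow; a statistics library would be the more usual choice in practice.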
To further understand the relationship between gene expression and the binding of the biofilm regulators in each species, we assessed whether differentially expressed genes were directly bound by the biofilm regulators. Considering genes that change their expression at least twofold during biofilm formation in the same conditions in which the ChIP-seq experiments were performed, the fraction of these genes bound by one or more regulators ranges from 30% in C. albicans to 51% in C. tropicalis. Combining our data with previous transcriptional profiling experiments performed in C. parapsilosis during biofilm formation (Holland et al., 2014), we estimated that 67% of differentially expressed genes in this species were bound by at least one of the biofilm regulators under the experimental conditions tested. The overlap between differential gene expression and regulator binding in all species is larger than what would be expected by chance (hypergeometric test, p<<0.05), suggesting that direct binding is an important factor in gene regulation in the biofilm network.
DNA binding of a regulator is not expected to always produce a change in mRNA production, but we did observe a correlation between these two properties. Pairwise comparisons of C. albicans, C. dubliniensis, and C. tropicalis showed that most genes that are expressed differentially between species (as defined above) are genes whose intergenic region is bound by at least one regulator in one species, but not in the other species. This association is statistically significant (Fisher’s two-tailed exact test, p<<0.05) again suggesting that the majority of differences in biofilm-specific gene expression between species can be explained by differences in the cis-regulatory sequences of target genes that alter binding of the regulators.
Analysis of an interspecies hybrid independently supports the conclusions from the species-to-species comparisons
Many challenges exist in mapping and comparing regulator-target gene connections in transcription networks between yeast species and, more generally, between any species (Chen et al., 2012). These difficulties include technical issues such as differential nucleic acid recovery and signal-to-noise ratios, which can vary considerably from one species to the next. However, probably the most difficult problem to circumvent arises from different species having different physiological responses to the same external environment. For example, 30°C could be the optimal temperature for one species but might induce a stress response in a closely related species. Therefore, a network comparison between these species at 30°C might be dominated not by evolutionary changes in the transcription circuitry per se but simply by the fact that only one species has induced a stress response. This problem can be overcome to a large extent by creating and analyzing interspecies hybrids, where the genomes of two different species are present in the same cell and thus exposed to the same physiological state (Wilson et al., 2008). This approach, which can only be carried out between closely related species, specifically reveals the cis-regulatory changes that have accumulated between the two genomes since the species last shared a common ancestor.
We took advantage of the fact that it is possible to mate C. albicans and C. dubliniensis (each diploid) to generate tetraploid hybrids (Pujol et al., 2004). These hybrids form biofilms similar to those formed by C. albicans (Figure 6—figure supplement 1). We performed ChIP-seq of Ndt80 in this hybrid, immunoprecipitating the C. albicans Ndt80 protein in one set of experiments and the C. dubliniensis Ndt80 in another set. The results showed that — in the hybrid — the target genes bound by the C. albicans Ndt80 and the C. dubliniensis Ndt80 were highly correlated, similar to two biological replicates carried out in the same species (Figure 6A). In other words, we obtained the same target genes in the hybrid regardless of which Ndt80 was tagged for immunoprecipitation. Importantly, the binding positions on the C. dubliniensis genome in the hybrid were characteristic of the results in C. dubliniensis, specifically 97% of the targets in the hybrid are targets in C. dubliniensis, and the positions on the C. albicans genome were characteristic of C. albicans with 96% of the targets in the hybrid being targets in C. albicans. Although we only carried out this experiment with one master regulator, the results independently validate our earlier conclusions based on the much more extensive species-to-species comparisons. These observations confirm our previous conclusion that the extreme differences in regulator-target gene connections observed across these Candida species are due to changes in the cis-regulatory sequences in the target genes rather than changes in the regulators themselves or differences in the physiological state of the species at the time of analysis.

Ndt80 ChIP-seq in a hybrid and rate of conservation change of the different network components.
(A) Genome-wide comparison of C. albicans and C. dubliniensis Ndt80 binding in the hybrid strain. Binding to both the C. albicans (dark blue filled dots) and the C. dubliniensis (light blue empty dots) genomes is depicted. The maximal fold enrichment for each upstream intergenic region in the genome is plotted as well as the linear regression for each comparison. The left panel shows the C. albicans Ndt80–C. dubliniensis Ndt80 comparison while the right panel shows, as a reference, the comparison of the two experimental replicates that are most dissimilar. (B) Comparison between the master regulators required for biofilm formation, the Efg1 and Ndt80 binding targets, and biofilm gene expression, as a function of evolutionary distance. Master regulator conservation is depicted as the percentage of C. albicans regulators required for biofilm formation. Efg1 and Ndt80 target conservation reflect the percentage of targets shared by the different species pairs. Gene expression conservation represents the number of genes whose expression changes at least 1.5 log2 fold under biofilm-forming conditions between each species pair. The C. parapsilosis gene expression data is from Holland et al., 2014, and 1.5 log2 fold was chosen as a cutoff because this was the cutoff used in this prior study. There are three estimates of master regulator conservation because comparisons were performed between C. albicans and each of the other three species, while there are six estimates of binding target and gene expression conservation since comparisons were performed in pairs between all four species. Linear regressions are shown in the corresponding color. Evolutionary distance as substitutions per site was calculated from a phylogenetic tree of these species, inferred from protein sequences of 73 highly conserved genes (Lohse et al., 2013).
Discussion
In this work, we examined how a complex transcriptional network underlying a specific phenotype (Figure 1) evolved over a span of approximately 70 million years. The phenotype is biofilm formation by Candida species, a group of fungi that colonize humans, sometimes leading to disease. We documented phenotypic differences in biofilm formation across many fungal species and mapped the transcriptional networks underlying biofilm formation in four of them, C. albicans, C. dubliniensis, C. tropicalis, and C. parapsilosis. All four species form complex biofilms both in vitro and in vivo in a rat catheter model (Figure 2).
Using C. albicans as a reference species, our analysis leads to the following five main conclusions. (1) As we move away from C. albicans, biofilms become less complex, both in terms of structure and of composition; that is, fewer cell types are involved and the resulting biofilm is less regular. At larger evolutionary distances, fungal species did not form biofilms at all under the conditions we tested. (2) Of the seven master transcriptional regulators of biofilm formation in C. albicans, all seven are needed in the most closely related species (C. dubliniensis), but, as we move further away evolutionarily, fewer are required for biofilm formation. For example, two of the master regulators (Rob1 and Flo8) are dispensable for biofilm formation in the next most closely related species (C. tropicalis), and three of the seven are not required in C. parapsilosis. As shown by Holland and colleagues (Holland et al., 2014), other transcriptional regulators (present in C. albicans but not required for biofilm formation in this species) have assumed the role of master regulators in C. parapsilosis. If other master biofilm regulators exist in C. dubliniensis and C. tropicalis and are identified in the future, their analysis in C. albicans and C. parapsilosis will allow a more complete model of the biofilm regulatory network across the four species considered here. (3) In contrast to the relatively slow evolutionary substitutions of master regulators, the connections between the master regulators and their target genes have changed very rapidly over evolutionary time (Figure 4, Figure 6B). This conclusion is most obvious when we compare the two most closely related species, C. albicans and C. dubliniensis, estimated to have last shared a common ancestor 20 million years ago. Depending on the regulator, fewer than 50% of the master regulator-target gene connections were observed to be conserved (Figure 4). 
This conclusion was independently verified for one regulator — Ndt80 — by analyzing its binding distribution across the two genomes in a C. albicans–C. dubliniensis hybrid; here, the binding distribution of Ndt80 across one genome differed considerably from that of the other, and each resembled that seen in the cognate individual species (Figure 6A). This result strongly supports the conclusion that the differences in regulator-target gene connections across species are due largely to changes in the cis-regulatory sequences of the target genes rather than changes in the regulators. (4) As predicted from the extensive changes in regulator-target gene connections, mRNA expression during biofilm formation differs considerably from one species to the next. Like the other changes we have documented in this paper, mRNA expression divergence becomes greater as the phylogenetic distance increases (Figure 5B). (5) Despite the extensive changes in the transcription networks underlying biofilm formation across the species we examined, several key features of the overall architecture of the network appear to be preserved. For example, all species show high connectivity in the sense that many target genes are directly connected (by binding) to more than one master regulator. The high connectivity observed is dominated by the DNA binding of Ndt80 and Efg1, and we note that these regulators are also involved in several other cellular functions and have been suggested to cooperatively regulate their target genes (Sellam et al., 2010; Znaidi et al., 2013; Mancera et al., 2015). Moreover, many of the master regulators bind to their own control regions as well as those of the other master regulators. 
We have argued elsewhere that these two features are likely to be common to many complex transcription networks (Sorrells and Johnson, 2015), and the results presented here show that, despite many changes in individual regulator-target connections, the basic ‘structural features’ of the network are preserved across the biofilm networks of the four species examined.
The large amount of new genome-wide protein-DNA interaction and gene expression data reported here will be useful in future studies of biofilm formation in these Candida species. Very few ‘structural’ genes have been implicated in biofilm formation, and the ChIP-seq and transcriptional profiling results obtained across species could greatly facilitate the identification of key non-regulatory genes required for biofilm formation. For example, specific master regulator-target gene connections that are preserved across multiple species may point to target genes that are notably important for biofilm formation in these Candida species. Such a hypothesis could be tested in future studies by deleting these target genes of interest and assessing their roles in biofilm formation across species.
To place our findings in context, it is also instructive to compare our analyses of biofilm formation across species with recent studies where biofilm formation has been analyzed across different isolates of a single species, C. albicans (Hirakawa et al., 2015; Huang et al., 2019). Hirakawa et al., 2015 determined the genome sequences of 21 clinical isolates of C. albicans and examined their abilities to form biofilms. The genome comparisons revealed many differences among strains including aneuploidies, losses of heterozygosity, and mutations in coding sequences; moreover, the strains differed substantially in their abilities to form biofilms. Among the strains analyzed, the one used in our study (SC5314) was among the thickest biofilm producers as assayed by dry weight; the majority of isolates formed thinner biofilms. One clinical isolate that formed very poor biofilms was found to have an inactivating mutation in EFG1, one of the biofilm master regulators, indicating a relatively recent change. Because the C. dubliniensis strain used in our study (CD36) formed biofilms that are similar to those of strain SC5314, we believe that SC5314 is a good representative of C. albicans and that most of the clinical isolates probably acquired mutations (including aneuploidies and losses of heterozygosity) relatively recently. Huang et al., 2019 examined five of the previously sequenced strains in much more detail including the dependence on individual transcriptional regulators for biofilm formation. Although the magnitude of the effect of transcriptional regulator deletions on biofilm formation varied across strains, SC5314 again appears to be a good representation of the ability of C. albicans as a species to form thick, complex biofilms.
To our knowledge, this study is the first to examine in detail how a complex transcription network changes over a relatively short evolutionary time — 70 million years — represented by four different species. During that time, the master transcription regulators controlling biofilm formation have undergone slow substitutions, but their connections to the target genes they control have changed rapidly. We do not know which, if any, of these changes were adaptive; in this regard, it is important to note that, although the biofilms produced by the Candida species have many similarities, they do differ from species to species in at least subtle aspects. Even considering the two most closely related species (C. albicans and C. dubliniensis), it is possible to distinguish their biofilms. Although they appear very similar under the confocal microscope, the C. albicans biofilms form faster and under a greater range of conditions; once formed, they are more difficult to disrupt than those of C. dubliniensis. Although these differences may help to explain why C. albicans is a greater problem in the clinic than C. dubliniensis, it is difficult to reconcile these subtle differences in phenotype with the large differences in the underlying transcriptional circuitry. Given the large magnitude of changes underlying such similar phenotypic output, we propose as a default hypothesis that many of the changes in transcription circuitry result from neutral evolution, more specifically, constructive neutral evolution whereby molecular complexity can change without an increase in fitness (Stoltzfus, 1999; Lynch, 2007; Wagner, 2014; Sorrells and Johnson, 2015; Brunet and Doolittle, 2018). This study clearly shows that complex transcription networks responsible for the same basic phenotype can undergo evolutionary changes that appear much greater in magnitude than the resulting differences in phenotype.
Materials and methods
Characterization of biofilm formation
Visualization of biofilms by CSLM of the different species and strains was performed on silicone squares as described previously (Nobile et al., 2012). The strains used for each species and media employed are shown in Supplementary file 1d and 1a, respectively. Briefly, for the adhesion phase, silicone squares pretreated with adult bovine serum albumin (BSA) were inoculated to an OD600 of 0.5 with cells from an overnight culture grown at 30°C in YPD medium. After incubation for 90 min at 37°C and 200 rpm in the specific medium (Supplementary file 1a) for adhesion, the squares were washed with phosphate-buffered saline and then placed in fresh media and incubated for 48 hr at 37°C and 200 rpm. Biofilms of the species that do not grow well at 37°C were grown at 30°C as indicated in Figure 2. After 48 hr, the biofilms were stained for 1 hr with 50 mg/mL of concanavalin A-Alexa Fluor 594 conjugate and visualized on a Nikon Eclipse C1si upright spectral imaging confocal microscope using a 40×/0.80W Nikon objective. At least two independent silicone squares were observed per strain analyzed.
Visualization of biofilm formation over time was also performed by CSLM in Spider medium. For each of the four species observed (C. albicans, C. dubliniensis, C. tropicalis, and C. parapsilosis), seven independent silicone squares were used for biofilm formation as described above and the biofilms at each square were visualized at 30 min, 4, 8, 12, 24, 48, and 96 hr after the adhesion phase (Figure 2—figure supplement 1).
Determination of the biomass dry weight of biofilms of the different species and strains was performed by growing biofilms on the bottoms of 6-well polystyrene plates pretreated with BSA as previously described (Nobile et al., 2012). The cells were adhered for 90 min. These assays were performed in a modified Spider medium that contained 1% glucose rather than mannitol as the carbon source. After 48 hr of biofilm formation, supernatants were aspirated and biofilms were scraped and placed to dry on top of a filter paper. Dried biofilms were weighed on an analytic scale subtracting the weight of a filter paper in which the media without cells was filtered. Five technical replicates were performed per strain as it has been previously done for large screens using this assay (Nobile et al., 2012). As was performed for CSLM visualization, strains that do not grow well at 37°C were grown at 30°C.
Biofilm formation in a Bioflux microfluidic device (Fluxion Biosciences) was assayed as described previously (Gulati et al., 2017). The medium used was Spider with 1% glucose and without mannitol, and assays were performed at 37 and 30°C for 24 hr.
In vivo biofilm formation assays were performed using the rat central-venous catheter infection model as previously described (Andes et al., 2004). After 24 hr of infection by the four species tested, biofilm formation on the intraluminal surface of the catheters was observed by scanning electron microscopy. Procedures were approved by the Institutional Animal Care and Use Committee at the University of Wisconsin, Madison (protocol MV1947).
Generation of gene deletion strains
Gene deletion strains were constructed using a similar fusion PCR strategy as that described by Noble and Johnson, 2005 employing histidine and leucine auxotrophic strains. Construction of these strains was performed using the SAT1 flipper strategy as previously described (Mancera et al., 2019). All the strains employed and generated in this study are shown in Supplementary file 1d. In brief, the two alleles of each regulator in C. dubliniensis were sequentially deleted using the C. albicans HIS1 and LEU2 genes. In C. tropicalis, the first allele was deleted using the C. albicans LEU2 gene while the second was deleted using the CaHygB gene that confers resistance to hygromycin B (Basso et al., 2010). To generate the gene deletion cassettes, ~350 bp flanking 5′ and 3′ regions of each regulator were PCR-amplified from genomic DNA and fused to the corresponding auxotrophic/drug resistance marker by fusion PCR. Transformation was performed by electroporation as previously described (Porman et al., 2011). Verification of correct integration of the gene deletion cassettes was performed by colony PCR with primers directed to both flanks of the disrupted gene. Final gene deletion confirmation was performed by colony PCR with primers that anneal at the ORF of each regulator. Two independent isolates of each deletion mutant originating from two separate transformations were generated for each regulator deletion. The regulator knockout strains of C. albicans and C. parapsilosis had been previously generated as part of efforts to generate collections of regulator gene knockout strains (Homann et al., 2009; Holland et al., 2014).
Chromatin immunoprecipitation followed by sequencing
ChIP-seq to identify the target genes of the seven regulators was performed as previously described (Hernday et al., 2010; Lohse and Johnson, 2016) and sequenced using Illumina HiSeq 2500 or 4000 platforms. Each of the seven regulators in C. dubliniensis and C. tropicalis was tagged in the wildtype strain background with a 13× Myc epitope tag at the C-terminus from the pADH34 or pEM019 plasmids, respectively, as previously described (Hernday et al., 2010; Mancera et al., 2019). C. albicans Myc-tagged strains had been similarly generated previously (Nobile et al., 2012). C. parapsilosis Brg1, Ndt80, and Tec1 were tagged using a 6× C-terminal Myc tag amplified from plasmid pFA-MYC-HIS1 as previously described (Connolly et al., 2013). C. parapsilosis Efg1 had been previously tagged with the same 6× C-terminal Myc epitope (Connolly et al., 2013). We were not able to tag the regulator Bcr1 in this species. Genotype details for all the strains generated and used are given in Supplementary file 1d. All the tagged regulator strains were tested for their abilities to form biofilms on the bottoms of 6-well polystyrene plates as previously described (Nobile et al., 2012), and no biofilm defects were observed.
To generate the C. albicans/C. dubliniensis-tagged hybrid strains, the α or a allele of the mating-type-like (MTL) locus was deleted in the Ndt80-tagged strains described above. These deletions allowed the strains to become capable of white-opaque switching, and thus mating competent. In C. albicans, the α allele of the MTL locus was deleted by replacing it with ARG4 using the plasmid pJD1 as previously described (Lohse et al., 2016). In C. dubliniensis, the a allele of the MTL locus was deleted using a cassette containing the SAT1 nourseothricin resistance marker from plasmid pSFS2A flanked by ~300 bp homology regions identical to the 3′ and 5′ upstream/downstream regions of the MTL locus. pSFS2A is a plasmid derived from pSFS2 (Reuss et al., 2004) that contains the SAT1 reusable cassette in the backbone of vector pBC SK+ instead of pBluescript II KS and that was kindly provided by Joachim Morschhauser (U. Würzburg). The deletion of the α or a alleles was verified by colony PCR of the two flanks. Generation of the hybrid strains was done by overlaying the mating competent wildtype and Myc-tagged strains on a YPD plate for 48 hr at 30°C. Single colonies were then streaked out and hybrids were selected by growing on media containing nourseothricin and lacking arginine. The hybrid strains were further verified measuring DNA content using FACS. As controls, the two wildtype untagged mating competent strains were hybridized.
All immunoprecipitation experiments were performed under biofilm growth conditions in 6-well polystyrene plates as previously described (Nobile et al., 2012). After 48 hr of biofilm growth in Spider 1% glucose at 37°C and 200 rpm, cells were fixed with 1% formaldehyde for 15 min. Cell disruption and immunoprecipitation were performed as previously described (Hernday et al., 2010) using a c-Myc tag monoclonal antibody (RRID:AB_2536303). After crosslink reversal, instead of performing a phenol/chloroform extraction, we used a QIAGEN MinElute kit to purify the immunoprecipitated DNA. Library preparation for Illumina sequencing was performed using an NEBNext ChIP-Seq Library Prep Master Mix Set for Illumina sequencing. Between 12 and 24 samples were multiplexed per lane. As controls, immunoprecipitations were performed in matched strains that lacked the Myc tag. In agreement with the ChIP-seq guidelines and practices of the ENCODE consortia (Landt et al., 2012), two biological replicates were performed for each regulator in the four species.
Identification of regulator directly bound target genes by ChIP-seq
ChIP-seq reads were mapped to their corresponding genome using Bowtie 2 with default parameters (Langmead and Salzberg, 2012). The genome sequences and annotations were obtained from CGD (Skrzypek et al., 2017) for versions: C_albicans_SC5314_version_A21-s02-m09-r02, C_dubliniensis_CD36_version_s01-m02-r08, C_tropicalis_MYA-3404_2013_12_11, and C_parapsilosis_CDC317_version_s01-m03-r13. The SAMtools package was used to convert, sort, and index the sequenced reads to BAM format (Li et al., 2009). We observed that the peak calling algorithm was more specific and sensitive if the number of reads in the treatment and control datasets was similar. Therefore, we adjusted the number of reads in the different treatment-control dataset pairs using SAMtools view -s function prior to peak calling. Peak calling was performed using MACS2 (Zhang et al., 2008) with a q-value cutoff of 0.01; the shiftsize parameter was determined using the SPP package in R (Kharchenko et al., 2008). Peaks were considered as true binding events only if the peak was identified in both biological replicates. Assignment of peaks to ORFs was done using MochiView when the peak was present in the intergenic region immediately upstream of the ORF (Homann and Johnson, 2010).
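Two steps of this procedure lend themselves to a short illustration: choosing the fraction of reads to keep so that treatment and control datasets have similar depth, and retaining only peaks found in both biological replicates. The sketch below is not the pipeline used in the study. It assumes that the larger dataset is downsampled to the size of the smaller one and that a peak counts as "identified in both replicates" when its interval overlaps a peak on the same chromosome in the other replicate; neither criterion is spelled out in the text, and all names and values are hypothetical.

```python
def subsample_fraction(treatment_reads, control_reads):
    """Return (dataset_to_downsample, fraction_to_keep) that equalizes depth.

    The fraction is the kind of value one would hand to a read-subsampling
    tool; how it is passed on is left out of this sketch.
    """
    if treatment_reads <= 0 or control_reads <= 0:
        raise ValueError("read counts must be positive")
    if treatment_reads > control_reads:
        return "treatment", control_reads / treatment_reads
    if control_reads > treatment_reads:
        return "control", treatment_reads / control_reads
    return None, 1.0


def peaks_in_both_replicates(replicate_1, replicate_2):
    """Keep replicate_1 peaks that overlap any replicate_2 peak.

    Peaks are (chromosome, start, end) tuples with half-open coordinates.
    Assumption: any overlap of one base or more counts as the same peak.
    """
    kept = []
    for chrom, start, end in replicate_1:
        if any(c == chrom and s < end and start < e for c, s, e in replicate_2):
            kept.append((chrom, start, end))
    return kept


# Toy values, not data from the study.
print(subsample_fraction(4_000_000, 1_000_000))
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
print(peaks_in_both_replicates(rep1, rep2))
```

The quadratic scan over peaks is adequate for a toy example; with genome-wide peak lists, sorting the intervals or using an interval tree would be the usual approach.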
To identify regulator binding target genes in the hybrid strains, ChIP-seq reads were aligned to the C. albicans and C. dubliniensis genomes as described above. Reads that aligned to both genomes were subsequently filtered out. Further processing, peak calling, and assignment of peaks to ORFs were then performed independently for reads that mapped to the C. albicans and C. dubliniensis genomes as described above.
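The filtering of ambiguous reads in the hybrid strains can be sketched as a set operation on read identifiers. This is only a schematic illustration: the function name and the use of read IDs collected from each alignment are assumptions, and the actual filtering may have been done directly on alignment files.

```python
def species_specific_reads(albicans_ids, dubliniensis_ids):
    """Drop reads that aligned to both parental genomes.

    Each argument is a set of read identifiers that aligned to one genome.
    Returns the reads unique to C. albicans and those unique to
    C. dubliniensis, in that order.
    """
    shared = albicans_ids & dubliniensis_ids
    return albicans_ids - shared, dubliniensis_ids - shared


if __name__ == "__main__":
    ca = {"read1", "read2", "read3"}
    cd = {"read3", "read4"}
    ca_only, cd_only = species_specific_reads(ca, cd)
    print(sorted(ca_only), sorted(cd_only))
```

Each returned set would then go through peak calling separately, mirroring the independent processing of the two genomes described above.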
De novo sequence motif discovery and enrichment for the regulators
DNA-binding motifs were generated de novo for the regulators from the ChIP-seq experiments using DREME (Bailey, 2011). The union of the sequences under the peaks of the two biological replicates for each experiment was tested against a background of equivalent-length random genomic sequences from that species. The top-scoring motif was taken and is shown in Figure 5.
Based on the motifs generated de novo using DREME as well as previously reported DNA-binding motifs (Lassak et al., 2011; Nobile et al., 2012; Connolly et al., 2013; Nocedal et al., 2017), a high-confidence ‘consensus motif’ was generated for Ndt80 (CACAAA) and Efg1 (TGCAT). To determine enrichment of these consensus motifs in the peaks identified for each ChIP-seq experiment, the number of consensus motifs in the union of the sequences under the peaks of both biological replicates was compared to the number of motifs in intergenic regions for that species. A one-tailed Fisher’s exact test was performed to generate a p-value representing enrichment of the motif in peaks compared to equivalent-length random intergenic sequences.
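For readers unfamiliar with the test, a one-tailed Fisher's exact test on a 2×2 table can be computed from the hypergeometric distribution, as in the minimal Python sketch below. How the counts were arranged into a table is not stated above, so the layout used here (peak versus background sequences, with versus without the motif) is an assumption, and the counts in the example are invented.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """One-tailed (enrichment) Fisher's exact test for the table

                      with motif   without motif
        peaks             a             b
        background        c             d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, col1)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(fisher_one_tailed(40, 60, 15, 85))
```

Exact integer arithmetic with math.comb avoids floating-point underflow for the table sizes typical of peak sets, at the cost of speed on very large counts.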
To determine potential gene targets based on motif presence, we scored the presence of the de novo DNA-binding motifs of Efg1 or Ndt80 generated for each species described above in all the intergenic regions of the four species using the Motif scoring function of MochiView (Homann and Johnson, 2010). Potential gene targets were defined as those having at least one motif in their upstream intergenic region. Then, for each species, the overlap was calculated between the potential gene targets found using the motifs derived from its own ChIP-seq data and the potential gene targets found using the motifs derived from the ChIP-seq data of each of the other three species.
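The cross-species comparison described here can be sketched as follows. Note that this sketch substitutes exact matching of a consensus string on both strands for the MochiView motif scoring used in the analysis, and the gene names, sequences, and the choice to express overlap as a fraction of the species' own target set are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_motif(seq, motif):
    """True if the motif occurs on either strand (exact match only)."""
    seq = seq.upper()
    return motif in seq or reverse_complement(motif) in seq


def potential_targets(intergenic, motif):
    """Genes whose upstream intergenic region holds at least one motif.

    'intergenic' maps a gene name to its upstream intergenic sequence.
    """
    return {gene for gene, seq in intergenic.items() if has_motif(seq, motif)}


def overlap_fraction(own_targets, other_targets):
    """Fraction of a species' own-motif targets recovered with another motif."""
    if not own_targets:
        return 0.0
    return len(own_targets & other_targets) / len(own_targets)


if __name__ == "__main__":
    # Hypothetical intergenic regions; CACAAA is the Ndt80 consensus above.
    regions = {"geneA": "TTCACAAATT", "geneB": "GGTTTGTGCC", "geneC": "GGGGGG"}
    own = potential_targets(regions, "CACAAA")
    other = potential_targets(regions, "TGCAT")
    print(sorted(own), overlap_fraction(own, other))
```

In the actual analysis, the two motifs compared for a given regulator would be the de novo motifs derived from different species' ChIP-seq data, scored against the same genome.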
Genome-wide transcription profiling
Biofilms for the extraction of total RNA were grown on the bottom of 6-well polystyrene plates for 48 hr at 37°C and 200 rpm, as previously described for the determination of biofilm biomass dry weight (Nobile et al., 2012). The media used were Spider 1% glucose for all species, Spider for C. albicans and C. dubliniensis, and RPMI 1% glucose for C. albicans and C. tropicalis. Planktonic cultures for total RNA were grown in the corresponding media by inoculating with cells from an overnight 30°C YPD culture to an OD600 of 0.05. Cultures were then grown in flasks at 37°C with shaking at 225 rpm until they reached an OD600 of 1.0. Biofilm and planktonic cultures were harvested immediately by centrifugation at 3000 g for 3 min and snap-frozen in liquid nitrogen. Total RNA was extracted from the frozen pellets using the RiboPure-Yeast RNA kit (Ambion, AM1926) following the manufacturer’s recommendations. Transcription profiling was performed by hybridization to custom-designed Agilent 8×15K oligonucleotide microarrays that contain between 2 and 3 independent probes for each ORF (C. albicans AMADID #020166; C. dubliniensis AMADID #042592; C. tropicalis AMADID #042593). cDNA synthesis, dye coupling, hybridization, and microarray analysis were performed as previously described (Nobile et al., 2012). In agreement with previous reports (Nobile et al., 2012; Nocedal et al., 2017), two biological replicates were performed for each species in each condition using the wildtype strains. The genes that are differentially expressed in C. parapsilosis during biofilm formation were obtained from Holland et al., 2014 (Table_S3.xls), where a cutoff of 1.5 log2 fold change was used to define differentially expressed genes.
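To make the fold-change cutoff concrete, the short Python sketch below calls genes as up- or downregulated in biofilms from per-probe log2(biofilm/planktonic) ratios. The 1.5 log2 cutoff is the one quoted above for the C. parapsilosis dataset; summarizing the 2 to 3 probes per ORF by their median, and the ORF names and values in the example, are assumptions made for illustration rather than the pipeline actually used.

```python
from statistics import median


def differentially_expressed(probe_log2_ratios, cutoff=1.5):
    """Classify ORFs by the median log2(biofilm/planktonic) of their probes.

    'probe_log2_ratios' maps an ORF name to a list of per-probe log2 ratios.
    ORFs whose median lies within the cutoff are omitted from the result.
    """
    calls = {}
    for orf, ratios in probe_log2_ratios.items():
        summary = median(ratios)
        if summary >= cutoff:
            calls[orf] = "up"
        elif summary <= -cutoff:
            calls[orf] = "down"
    return calls


if __name__ == "__main__":
    # Hypothetical ORFs and ratios.
    example = {
        "orf1": [2.0, 1.8, 1.6],
        "orf2": [0.2, -0.1],
        "orf3": [-2.0, -1.7],
    }
    print(differentially_expressed(example))
```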
Data deposition
ChIP-seq and microarray gene expression data have been deposited in the NCBI Gene Expression Omnibus (GEO) repository under SuperSeries GSE160783.
Data availability
- NCBI Gene Expression Omnibus, ID GSE160783. Evolution of biofilm formation in Candida.
- NCBI Gene Expression Omnibus, ID GSE57451. Comparative phenotypic analysis of the major fungal pathogens Candida parapsilosis and Candida albicans.
References
- Portrait of Candida species biofilm regulatory network genes. Trends in Microbiology 25:62–75. https://doi.org/10.1016/j.tim.2016.09.004
- DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27:1653–1659. https://doi.org/10.1093/bioinformatics/btr261
- How to build a biofilm: a fungal perspective. Current Opinion in Microbiology 9:588–594. https://doi.org/10.1016/j.mib.2006.10.003
- The generality of constructive neutral evolution. Biology & Philosophy 33:2. https://doi.org/10.1007/s10539-018-9614-6
- Systematic evaluation of factors influencing ChIP-seq fidelity. Nature Methods 9:609–614. https://doi.org/10.1038/nmeth.1985
- Biofilm formation: a clinically relevant microbiological process. Clinical Infectious Diseases 33:1387–1392. https://doi.org/10.1086/322972
- An expanded regulatory network temporally controls Candida albicans biofilm formation. Molecular Microbiology 96:1226–1239. https://doi.org/10.1111/mmi.13002
- Evolutionary genomics of yeast pathogens in the saccharomycotina. FEMS Yeast Research 16:fow064. https://doi.org/10.1093/femsyr/fow064
- Visualization of biofilm formation in Candida albicans using an automated microfluidic device. Journal of Visualized Experiments 130:56743. https://doi.org/10.3791/56743
- Genetics and molecular biology in Candida albicans. Methods in Enzymology 470:737–758. https://doi.org/10.1016/S0076-6879(10)70031-8
- Genetic and phenotypic intra-species variation in Candida albicans. Genome Research 25:413–425. https://doi.org/10.1101/gr.174623.114
- Circuit diversification in a biofilm regulatory network. PLOS Pathogens 15:e1007787. https://doi.org/10.1371/journal.ppat.1007787
- Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nature Biotechnology 26:1351–1359. https://doi.org/10.1038/nbt.1508
- Invasive candidiasis. New England Journal of Medicine 373:1445–1456. https://doi.org/10.1056/NEJMra1315399
- Role of biofilm morphology, matrix content and surface hydrophobicity in the biofilm-forming capacity of various Candida species. Journal of Medical Microbiology 67:889–892. https://doi.org/10.1099/jmm.0.000747
- ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research 22:1813–1831. https://doi.org/10.1101/gr.136184.111
- Fast gapped-read alignment with bowtie 2. Nature Methods 9:357–359. https://doi.org/10.1038/nmeth.1923
- Target specificity of the Candida albicans Efg1 regulator. Molecular Microbiology 82:602–618. https://doi.org/10.1111/j.1365-2958.2011.07837.x
- The sequence alignment/Map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352
- Assessment and Optimizations of Candida albicans In Vitro Biofilm Assays. Antimicrobial Agents and Chemotherapy 61:e02749. https://doi.org/10.1128/AAC.02749-16
- Development and regulation of single- and multi-species Candida albicans biofilms. Nature Reviews Microbiology 16:19–31. https://doi.org/10.1038/nrmicro.2017.107
- Identification and characterization of Wor4, a new transcriptional regulator of White-Opaque switching. G3: Genes, Genomes, Genetics 6:721–729. https://doi.org/10.1534/g3.115.024885
- Oral candidiasis history, classification, and clinical presentation. Oral Surgery, Oral Medicine, and Oral Pathology 78:189–193. https://doi.org/10.1016/0030-4220(94)90146-5
- Comparative genome analysis and gene finding in Candida species using CGOB. Molecular Biology and Evolution 30:1281–1291. https://doi.org/10.1093/molbev/mst042
- Finding a missing gene: Efg1 regulates morphogenesis in Candida tropicalis. G3: Genes, Genomes, Genetics 5:849–856. https://doi.org/10.1534/g3.115.017566
- Genetic modification of closely related Candida species. Frontiers in Microbiology 10:357. https://doi.org/10.3389/fmicb.2019.00357
- Centromere size and position in Candida albicans are evolutionarily conserved independent of DNA sequence heterogeneity. Molecular Genetics and Genomics 278:455–465. https://doi.org/10.1007/s00438-007-0263-8
- Candida albicans versus Candida dubliniensis: why is C. albicans more pathogenic? International Journal of Microbiology 2012:205921. https://doi.org/10.1155/2012/205921
- Candida albicans biofilms and human disease. Annual Review of Microbiology 69:71–92. https://doi.org/10.1146/annurev-micro-091014-104330
- Biofilm formation by Candida dubliniensis. Journal of Clinical Microbiology 39:3234–3240. https://doi.org/10.1128/JCM.39.9.3234-3240.2001
- Candida albicans biofilm-defective mutants. Eukaryotic Cell 4:1493–1502. https://doi.org/10.1128/EC.4.8.1493-1502.2005
- Adherence and biofilm formation of non-Candida albicans Candida species. Trends in Microbiology 19:241–247. https://doi.org/10.1016/j.tim.2011.02.003
- On the possibility of constructive neutral evolution. Journal of Molecular Evolution 49:169–181. https://doi.org/10.1007/PL00006540
- The Candida pathogenic species complex. Cold Spring Harbor Perspectives in Medicine 4:a019778. https://doi.org/10.1101/cshperspect.a019778
- Model-based analysis of ChIP-Seq (MACS). Genome Biology 9:R137. https://doi.org/10.1186/gb-2008-9-9-r137
Decision letter
- Patricia J Wittkopp, Senior Editor; University of Michigan, United States
- Christian R Landry, Reviewing Editor; Université Laval, Canada
- Bin He, Reviewer
- Mira Edgerton, Reviewer
- Sadri Znaidi, Reviewer; Institut Pasteur de Tunis/Institut Pasteur, Tunisia
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
Mancera and colleagues examine the evolution of the regulatory circuitry involved in biofilm formation in Candida albicans and closely related species. Using a combination of genomics approaches that they apply in a comparative manner across closely related species, they show that some features of this network have changed significantly, while others are conserved. This work contributes to a better understanding of how traits that are key to the success of opportunistic pathogenic fungi evolve through changes in regulatory networks.
Decision letter after peer review:
Thank you for submitting your article "Evolution of the complex transcription network controlling biofilm formation in Candida species" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Patricia Wittkopp as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Bin He (Reviewer #1); Mira Edgerton (Reviewer #2); Sadri Znaidi (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)
Summary:
Mancera and colleagues examine the evolution of the regulatory circuitry involved in biofilm formation in Candida and other closely related species. Using a combination of genomics approaches that they apply in a comparative manner across closely related species, they show that this network has changed significantly while some features have been conserved.
Three experts with complementary expertise have evaluated your manuscript. They all identified key issues that would need to be addressed in a revised version of the manuscript. I added the list of comments below.
Essential revisions:
1. The first reviewer suggests many approaches that would help anchor the results and analysis in an evolutionary framework. Some of them are critical to the paper, for instance considering the species non-independence when performing comparative analysis. Other issues related to statistical analysis are also critical. The changes requested will significantly strengthen the quality and impact of your work.
2. One reviewer was more critical about the inclusion of more distantly related species because the conditions associated with biofilm formation differ for those. However, another reviewer found these experiments important. I suggest you keep them in the paper but include a discussion of these points in the manuscript.
3. The third reviewer suggests ways in which the text can be clarified and made easier to follow.
Reviewer #1:
This work produced highly valuable phenotypic and molecular binding datasets for an important phenotype, namely biofilm formation, which will have significant impact both for the medical community in terms of better understanding and potentially advancing the treatment for Candida infections, and also in the evolution of complex transcriptional networks. The major weakness of the manuscript as noted in the detailed review below is that it didn't connect the molecular binding differences with the differences in gene expression and phenotype, and some of the statistical tests didn't properly account for the phylogenetic relationships between the species and thus are likely invalid.
In this work Mancera and co-authors built upon a series of significant findings from the Johnson lab on the architecture and evolution of the biofilm network, and produced highly valuable datasets for future research. The key question in this work is how a complex network such as the one controlling biofilm formation could evolve over a relatively short amount of time. To do so, they focused on two closely related species of C. albicans to determine the evolution of the key biofilm regulators and their targets. By performing detailed phenotypic comparisons and collecting a large amount of functional genomics data, they reached five main conclusions, including 1) enhanced and more robust biofilm forming abilities in C. albicans, followed by its close relatives, 2) the slow substitution of the regulators, their DNA specificity and interacting partners, and yet fast divergence in their targets. The latter, namely that trans factors evolved more slowly than their cis counterparts, is consistent with previous studies in other systems, such as Drosophila species and mammals (Wilson et al. 2008, the human chromosome 21 placed in a mouse cell). This paper made several important contributions. First, it performed careful phenotypic assays to characterize the biofilm phenotypes under different inducing conditions in a group of closely related and medically relevant species. Second, it provided a rich set of molecular binding and gene expression data for future studies to identify and test the importance of individual regulators and their targets' role in biofilm.
While reading the manuscript, I had a number of concerns and what I see as missed opportunities for the authors to realize the full potential of their work. First and foremost, this work produced three categories of comparative data and yet failed to connect them. These three data categories are phenotypic (biofilm characteristics), gene expression (RNA-seq) and regulators binding (ChIP). An obvious causal relationship among the three categories exists, namely divergence in regulator binding leading to changes in gene expression, which in turn results in phenotypic divergence. However, in the most part the authors analyzed and presented the data as separate entities, not allowing an examination on their causal relationships. I acknowledge that it is difficult to make explicit correlations between the three given the complexity of the network. However, certain analyses should be attempted and seem feasible. For example, one could analyze the extent to which the observed differences in regulator binding correlate with the divergence in gene expression during biofilm formation. Without such analyses, the three categories of data are disconnected and don't allow for a deep understanding of how the complex network evolved.
A related issue is that the results are primarily descriptive, without inferences about the ancestral state and the direction of evolution. This is particularly true of the binding data, where the major types of analyses were about overlaps between the species. This feeds into the issue above as it is not clear how those differences in binding relate to the differences in the biofilm phenotype. Notably, the authors used the words "evolve" and "diversity" in several contexts, which to me imply a direction in evolution, while the presented analyses only concern differences between species.
A second concern is that in most of the analyses the authors considered a gene as a "target in the biofilm network" if orthologs of one of the seven major regulators of biofilm formation in C. albicans bound in its upstream region (as evidenced by ChIP). It is well known that TF binding is not always productive, that is, leading to gene induction or repression, which in turn don't always affect the phenotype. This raises a serious question in my mind: how much of the observed binding difference actually relates to the biofilm phenotype vs stochastic changes due to neutral evolution? I could imagine a model in which only a small percentage of the binding events are functional, while the rest represent spurious binding motifs that evolve more or less neutrally. The observed difference could then be due to genetic drift in the gain and loss of short DNA motifs. The authors did include a few additional criteria in their comparative analyses, including the requirement for a high scoring motif in the binding region and a two-fold change in the gene's expression under biofilm forming conditions. These do not completely address the concern above and were also presented towards the end of the Results section, leaving me as a reader pondering on the above question for most of the manuscript. In my opinion, the issue of binding ≠ function should be addressed head-on and controlled for using complementary data, such as RNA-seq of the wild type and TF-KO strain under biofilm formation conditions. If that's not possible, the authors should attempt to estimate the proportion of ChIP identified sites as being truly in the biofilm network.
Third, I have doubts about the statistical tests behind a number of conclusions in the paper, e.g. on lines 337-340, the authors said "the overlap for each species pair is larger than expected by chance (hypergeometric test, P < 0.05), indicating that a small but significant group of Ndt80-target gene connections have been preserved across these species", and on lines 407 to 410, the authors said "the number of connections observed between the regulators is greater than what would be expected by chance (P << 0.05).… high connectivity between the master regulators is conserved in the four Candida species analyzed". Based on the description, I think the authors performed the tests under the null hypothesis that the regulator-target relationships evolved independently in these species, while they were actually non-independent due to the species sharing a common ancestor. The concept of phylogenetically independent contrasts and a number of statistical tests based on it were developed so as to take into account the relatedness among the species being compared. One would need to specify a neutral rate at which a trait evolves, and compare the observed level of divergence to the expected level in order to make conclusions about their conservation due to selective constraint.
Reviewer #2:
This comprehensive study describes the relatedness and evolution of biofilm specific genes among several Candida strains. The strength and impact of this study is elucidating the transcriptional regulators controlling biofilms with resulting functional differences in biofilm formation among closely and distantly related Candida species.
Candida species differ in their relative ability to form biofilms, and this study documents differences in the transcriptional networks among species that influence this phenotype. This work shows that among seven master transcriptional regulators needed for robust C. albicans biofilm formation, fewer are involved in biofilm formation in more distantly related species and this accompanies less organized or poorer biofilm production. The authors find that this network of regulator-target connections is predictive of evolutionary changes across species and provides insight into key components involved in biofilm formation.
This manuscript comprises an impressive breadth of experiments, yet the data are easy to follow and the figures are an understandable synthesis of an enormous amount of data. The biofilm conditions selected are comprehensive and the resulting gene expression data well validated and statistically supported. There are major formatting errors in Suppl Figure 4, and minor errors in Suppl Figure 1 (top legend) and Suppl Figure 2 (species names below blank?).
Reviewer #3:
This work highlights similarities and differences in the way pathogenic Candida species control biofilm formation, contributing to our understanding of how pathogenic traits evolved in species of the Candida lineage. Although this study is limited to four related species, with some overstated conclusions, it could serve as a nice example for how transcriptional circuits operate in microbial pathogens that recently shared a common ancestor.
An important virulence trait of fungal pathogens is their ability to form biofilms, which are communities of adherent cells encased in a polymeric matrix acting as protective structures. Transcriptional control of biofilm development is a complex process, involving a plethora of regulators with variable impact on both structure and function of biofilms. Here authors build on their previous findings to further map the transcriptional regulatory network that controls biofilm formation in four pathogenic Candida species of medical importance, namely C. albicans, C. dubliniensis, C. tropicalis and C. parapsilosis. They provide interesting clues as to how such a circuitry operates in these species. Although authors attempted to include some additional species from the Candida and Saccharomyces lineages in their analyses, they are faced with the complexity of the environmental cues/specificities that trigger biofilm formation in these more distantly-related organisms, reflecting the challenge of studying and explaining the evolution of some traits over extended evolutionary time scales. This study could've been certainly more impactful and comprehensive if some technical issues had been addressed/resolved (e.g. failure of tagging/expressing a subset of transcription factors in the species being analyzed/compared) or if some conclusions had been tempered. Collectively, authors present a valuable work on the transcriptional control of an important pathogenicity trait in medically-important Candida species and extend the reach of the regulatory circuitry operating during biofilm formation in fungal pathogens.
In this work, Mancera and colleagues investigate the transcriptional circuitries that control biofilm development in four medically-important Candida species using functional genomics. They characterize a subset of regulators that were already studied in Candida albicans and Candida parapsilosis in major publications by the groups involved in this study (Nobile et al. 2012, Holland et al. 2014) in two additional related species from the Candida lineage, namely Candida tropicalis and Candida dubliniensis. Although authors attempted to include some additional species from the Candida and Saccharomyces lineages in their analyses, they are faced with the complexity of the environmental cues/specificities that trigger biofilm formation in these more distantly-related organisms, reflecting the challenge of studying and explaining the evolution of some traits over extended evolutionary time scales. Still, authors provide interesting clues as to how the transcriptional circuitry that controls biofilm development operates in the four phylogenetically-close species being compared. This study could have been more impactful and comprehensive if some technical issues had been addressed/resolved (e.g. failure of tagging/expressing a subset of transcription factors and getting ChIP-seq data in the species being analyzed/compared) or if some non-useful data (e.g. other CTG species, S. cerevisiae/C. glabrata) had been removed and conclusions tempered (see some examples from specific comments).
Specific comments:
– Authors should rather focus on biofilm development in the 4 Candida species, because conditions have been optimized for the set of 4 species. For those distantly-related species, the conditions and biofilm development might not be similar at all. For instance, S. cerevisiae does make biofilms but in quite different ways and under different conditions/stimuli (e.g.#1, mat formation, e.g.#2, glucose inhibits adhesion and biofilm formation in S. cerevisiae, see PMID: 32054862).
– Lines 189-207, description of results but no data are shown. This section does not bring much information and the conclusions are overstated (e.g. "the greater the phylogenetic distance from C. albicans, the thinner the biofilm formed") because in distantly-related species the conditions conducive to biofilm formation may not be similar to those optimized for C. albicans, C. dubliniensis, C. tropicalis and C. parapsilosis. Same for lines 210-217.
– Lines 234-236: "Of all the species studied (…) to physical manipulation (results not shown)". This is again an overstatement. Authors optimized biofilm conditions based on experiments performed on only 4 closely-related species.
– Figure 2: C. tropicalis CSLM data are missing in Panel B.
– Figure 3, Supp Figure 3 and lines 270-271: "All seven master regulators identified in C. albicans were also required for biofilm formation in C. dubliniensis" – However this does not appear to stand true for C. dubliniensis BRG1. Any explanations?
– Figure 3 and lines 271-284: Data from C. albicans and C. dubliniensis should also be shown in Figure 3 to serve as a reference/benchmark. The conclusion pertaining to this figure still needs to be tempered (i.e. lines 286-287, "with the degree of diversity roughly paralleling their evolutionary distance from C. albicans"). No clear evidence supports such a conclusion, because observations were made based on mutant phenotypes from only 4 closely-related Candida species.
– Lines 304-309: This is a major weakness in this manuscript. First, did the authors test the functionality of the tagged transcription factors (TFs)? What about their expression level in biofilm-stimulating conditions? The fact that the authors failed to ChIP some of them might be due to epitope tagging, which could have altered their function. Which TFs failed to be tested? It is not clearly stated in the manuscript. Again, failure to clearly show which regulators have been successfully ChIPed and the lack of data from those regulators weakens the manuscript. Maybe the authors should give them a second try (by tagging differently, C-term vs N-term, and testing the functionality?)
– Supp Table 2 is a key table and should be included in the main text. Legend is missing, it is not clear what "N/A" vs "0" stand for? It appears that Ndt80 and Efg1 targets account for the majority of binding events in the species being studied. Unlike the other regulators, these are major regulators for many important processes in Candida species including, for instance, filamentous growth. Consequently, the high connectivity of the biofilm network would not be surprising if one takes the Ndt80/Efg1 network as an example. This has been also shown for the Sfl1/Sfl2 transcriptional circuitry that controls filamentous growth in C. albicans (PMID: 23966855).
– Figure 4B/C: It is not advised to show only percentages, as we could have the impression that we are dealing with big numbers whereas in reality we are not (e.g. Rob1, only 20 targets). Same for lines 340-343: the low number of Rob1 targets should prevent authors from drawing strong conclusions.
– Lines 345-355 and 369-385: These sections should rather go to the Discussion section or should be rewritten as data originating from experiments (i.e. results per se, not discussion). In many occurrences, authors discuss their data in the Results section. This strongly alters the quality of the reading flow. Same for lines 404-408, appearing as data but could also be moved to the Discussion section.
– Lines 415-441 and Supp Figure 4: This section is rather technical in nature and should have been presented (or summarized) earlier in the manuscript (may be following the ChIP-seq section, lines 290-343). Still, Ndt80 is not a good example for performing robustness analyses with regard to the specific conditions under which biofilm formation was induced, because this major regulator appears to exert pleiotropic functions.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Evolution of the complex transcription network controlling biofilm formation in Candida species" for further consideration by eLife. Your revised article has been evaluated by Patricia Wittkopp (Senior Editor) and a Reviewing Editor.
The manuscript has been improved but there are some remaining issues identified by one of the two reviewers that need to be addressed, as outlined below:
1. Regarding the new text added on page 21-22, lines 506-570 (there is a jump from 506 to 560), I understand that the goal is to determine whether trans-changes, specifically the DNA binding specificity, could explain part of the observed binding target divergence. However, I can't quite follow the text, as it is not clear to me which species' motif was used to predict in which species' genome, and how were the overlaps actually calculated. I did look for additional information in the Methods section and couldn't find any.
2. Regarding the new text from lines 619-632: am I correct in that the new results were meant to determine the relationship between binding and gene induction *in each species*, rather than attributing gene induction *differences between species* to differences in TF binding in those species? The reason for the question is because I came in expecting answers to the latter question and got confused for a moment.
3. In lines 646-649, the authors laid out the challenges involved in between-species comparison, which I fully agree with. But I don't think the Ca-Cd hybrid experiment can address that. It does address a different question, i.e. specifically revealing cis-changes behind gene expression divergence, which I feel is different from the first one.
4. In Figure 6A, I wonder if the authors can comment on the concave shape of the point cloud on the left.
5. In Figure 6B, there are more than one data points for the two binding and one expression change datasets in the middle and right time points. Are those biological replicates?
6. In the Discussion section, the authors stated that this work examined how a complex transcriptional network underlying a specific phenotype (biofilm formation) evolved over a span of ~70 million years. I think it would be useful to point out, the authors did this to some extent later, that to fully reconstruct the evolutionary history of this network, it is critical to identify all the regulators in the other three species, and that the data in this work constitutes a partial picture for the three non-albicans species.
https://doi.org/10.7554/eLife.64682.sa1
Author response
Essential revisions:
Reviewer #1:
[…]
While reading the manuscript, I had a number of concerns and what I see as missed opportunities for the authors to realize the full potential of their work. First and foremost, this work produced three categories of comparative data and yet failed to connect them. These three data categories are phenotypic (biofilm characteristics), gene expression (RNA-seq) and regulators binding (ChIP). An obvious causal relationship among the three categories exists, namely divergence in regulator binding leading to changes in gene expression, which in turn results in phenotypic divergence. However, in the most part the authors analyzed and presented the data as separate entities, not allowing an examination on their causal relationships. I acknowledge that it is difficult to make explicit correlations between the three given the complexity of the network. However, certain analyses should be attempted and seem feasible. For example, one could analyze the extent to which the observed differences in regulator binding correlate with the divergence in gene expression during biofilm formation. Without such analyses, the three categories of data are disconnected and don't allow for a deep understanding of how the complex network evolved.
As suggested by the reviewer, we have included an analysis of the overlap between the DNA binding data of the regulators and the gene expression of their target genes (ln. 619). Additional analyses will require extensive experimentation, especially given the number of regulators and species considered in the current study; we plan to carry out such experiments in future studies, where their findings can be published in detail.
A related issue is that the results are primarily descriptive, without inferences about the ancestral state and the direction of evolution. This is particularly true of the binding data, where the major types of analyses were about overlaps between the species. This feeds into the issue above as it is not clear how those differences in binding relate to the differences in the biofilm phenotype. Notably, the authors used the words "evolve" and "diversity" in several context, which to me imply a direction in evolution, while the presented analyses only concern differences between species.
We think it is not currently possible to infer accurately a specific ancestral state from the type of data presented in this paper. In Tuch et al. 2008 (PLoS Biology 6(2):e38) we inferred, using a maximum likelihood approach, the size of an ancestral circuit, but there was no credible way to accurately infer the details of the circuit. In the case of this work it is clear that an ancestral circuit existed and that it changed over time to give rise to the extant species we analyzed. Although we cannot provide specific details, there is no doubt that evolution has occurred, with the “direction” being ancestral to modern.
A second concern is that in most of the analyses the authors considered a gene as a "target in the biofilm network" if orthologs of one of the seven major regulators of biofilm formation in C. albicans bound in its upstream region (as evidenced by ChIP). It is well known that TF binding is not always productive, that is, leading to gene induction or repression, which in turn don't always affect the phenotype. This raises a serious question in my mind: how much of the observed binding difference actually relates to the biofilm phenotype vs stochastic changes due to neutral evolution? I could imagine a model in which only a small percentage of the binding events are functional, while the rest represent spurious binding motifs that evolve more or less neutrally. The observed difference could then be due to genetic drift in the gain and loss of short DNA motifs. The authors did include a few additional criteria in their comparative analyses, including the requirement for a high scoring motif in the binding region and a two-fold change in the gene's expression under biofilm forming conditions. These do not completely address the concern above and were also presented towards the end of the Results section, leaving me as a reader pondering on the above question for most of the manuscript. In my opinion, the issue of binding ≠ function should be addressed head-on and controlled for using complementary data, such as RNA-seq of the wild type and TF-KO strain under biofilm formation conditions. If that's not possible, the authors should attempt to estimate the proportion of ChIP identified sites as being truly in the biofilm network.
As also suggested by reviewer 3, we have moved the section where we integrated different criteria to determine gene targets after we first present the results about the target gene connections so that the issue is addressed early on in the manuscript (ln. 403).
In our experience, determining whether the binding of a regulator is functional is one of the most challenging aspects of analyzing a transcriptional network in any species. As mentioned above, given the number of regulators and species considered in this study, experiments to rigorously establish this point for each piece of binding data are currently beyond what is feasible in our laboratories. We believe that the integration of motif presence and gene expression change during biofilm formation, combined with the binding data, give an adequate estimation of the target gene connections that are most important for biofilm formation in the conditions assessed.
In addition, even with the incorporation of additional experimental data, it is not always possible to show that a binding site is functional; for example, it could only be functional under very specific conditions. Therefore, its modification would only result in a measurable phenotype in those conditions. Given that it is not possible to evaluate every possible condition, there will always be some uncertainty regarding the functionality of binding events in genome-wide binding studies.
Third, I have doubts about the statistical tests behind a number of conclusions in the paper, e.g. on lines 337-340, the authors said "the overlap for each species pair is larger than expected by chance (hypergeometric test, P < 0.05), indicating that a small but significant group of Ntd80-target gene connection have been preserved across these species", and on lines 407 to 410, the authors said "the number of connections observed between the regulators is greater than what would be expected by chance (P << 0.05).… high connectivity between the master regulators is conserved in the four Candida species analyzed". Based on the description, I think the authors performed the tests under the null hypothesis that the regulator-target relationships evolved independently in these species, while they were actually non-independent due to the species sharing a common ancestor. The concept of phylogenetically independent contrast and a number of statistical tests based on it was developed so as to take into account the relatedness among the species being compared. One would need to specify a neutral rate at which a trait evolves, and compare the observed level of divergence to the expected level in order to make conclusions about their conservation due to selective constraint.
This is an important point that is worth clarifying in the manuscript. The aim of these statistical tests was not to evaluate whether the binding connections are evolving under a selective constraint. Although we would be very interested in testing this, we do not think it is feasible with the available data; there is simply not enough information to credibly estimate a neutral rate at which binding connections are evolving and to see whether the rate of change that we observe deviates from it. Instead, these tests were aimed at assessing whether the binding connections have changed at such a fast rate that there is no longer any evidence of common descent, independently of whether they are evolving by natural selection or drift. We have now clarified this point in lines 337-340 (now ln. 360) and also when we discuss the overlap between differentially expressed genes (ln. 613).
In Figure 6B we compared how the different datasets that we estimated change as a function of substitutions in DNA sequence within ORFs. In lines 387-399 we also compare the rate of binding connection change to the rate of gene loss/gains in the four species. Although these comparisons cannot be used to assess selective constraint, they allow us to place the rate of change in the network in the context of the other measures of evolutionary change in the species analyzed.
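As an illustration only, and not the code used for the manuscript, the kind of overlap test discussed here can be sketched in Python. The function name `hypergeom_overlap_pvalue` and all gene counts below are hypothetical; the sketch assumes that both target sets are drawn from the same universe of one-to-one orthologs and, as Reviewer 1 notes, it treats the two species as independent samples.

```python
from math import comb


def hypergeom_overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes that could be targets in both species
    n_a, n_b:   number of targets bound in species A and in species B
    n_overlap:  number of targets shared by the two species
    """
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers, chosen only to show the call.
n_universe, n_a, n_b, n_shared = 6000, 800, 700, 150
expected_by_chance = n_a * n_b / n_universe  # mean of the null distribution
p_value = hypergeom_overlap_pvalue(n_universe, n_a, n_b, n_shared)
print(f"expected overlap: {expected_by_chance:.1f}, observed: {n_shared}")
print(f"P(overlap >= observed) = {p_value:.3g}")
```

A small p-value under this null only indicates that the shared targets exceed what two unrelated gene lists would produce; it says nothing about selective constraint, which would require a neutral rate of binding-site turnover.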
Regarding the randomizations to test whether the connections between the regulators are prevalent (previously lns. 407 – 410), we think that the test does not contribute significantly to our general conclusions and therefore have removed it from the current version of the manuscript.
Reviewer #2:
This manuscript comprises an impressive breadth of experiments, yet the data is easy to follow and the figures are an understandable synthesis of an enormous amount of data. The biofilm conditions selected are comprehensive and resulting gene expression data well validated and statistically supported. There are major formatting errors in Suppl Figure 4, and minor errors in Suppl Figure 1 (top legend) and Suppl Figure 2 (species names below blank?).
As mentioned above in the response to Reviewer 1, this is probably an issue related to the files provided to the reviewers since the figures in the files that we are able to download from the eLife system are properly formatted. We will make sure that the figures are properly formatted in the proof review.
Reviewer #3:
In this work, Mancera and colleagues investigate the transcriptional circuitries that control biofilm development in four medically-important Candida species using functional genomics. They characterize a subset of regulators that were already studied in Candida albicans and Candida parapsilosis in major publications by the groups involved in this study (Nobile et al. 2012, Holland et al. 2014) in two additional related species from the Candida lineage, namely Candida tropicalis and Candida dubliniensis. Although authors attempted to include some additional species from the Candida and Saccharomyces lineages in their analyses, they are faced with the complexity of the environmental cues/specificities that trigger biofilm formation in these more distantly-related organisms, reflecting the challenge of studying and explaining the evolution of some traits over extended evolutionary time scales. Still, authors provide interesting clues as to how the transcriptional circuitry that controls biofilm development operates in the four phylogenetically-close species being compared. This study could have been more impactful and comprehensive if some technical issues had been addressed/resolved (e.g. failure of tagging/expressing a subset of transcription factors and getting ChIP-seq data in the species being analyzed/compared) or if some non-useful data (e.g. other CTG species, S. cerevisiae/C. glabrata) had been removed and conclusions tempered (see some examples from specific comments).
Specific comments:
– Authors should rather focus on biofilm development in the 4 Candida species, because conditions have been optimized for the set of 4 species. For those distantly-related species, the conditions and biofilm development might not be similar at all. For instance, S. cerevisiae does make biofilms but in quite different ways and under different conditions/stimuli (e.g.#1, mat formation, e.g.#2, glucose inhibits adhesion and biofilm formation in S. cerevisiae, see PMID: 32054862).
This is an important point that we now bring up in the manuscript. However, we do think that, despite its limitations, the phenotypic comparison with other species is valuable. It shows that the phenotype is unique to these species either because of the biofilm forming abilities of the four species or due to other factors, such as the way they respond to the growth conditions. Therefore, and as suggested by the editor, we have included discussion of the issue in the corresponding Results section (ln. 254).
– Lines 189-207, description of results but no data are shown. This section does not bring much information and the conclusions are overstated (e.g. "the greater the phylogenetic distance from C. albicans, the thinner the biofilm formed") because in distantly-related species the conditions conducive to biofilm formation may not be similar to those optimized for C. albicans, C. dubliniensis, C. tropicalis and C. parapsilosis. Same for lines 210-217.
We have included a supplementary figure (Figure 2—figure supplement 2) that shows the data described in the section regarding biofilm formation by the different species studied. As mentioned above, we have also included discussion about the limitations of the comparison in the same section (ln. 254).
– Lines 234-236: "Of all the species studied (…) to physical manipulation (results not shown)". This is again an overstatement. Authors optimized biofilm conditions based on experiments performed on only 4 closely-related species.
We have included discussion about this limitation in the corresponding Results section (ln. 254).
– Figure 2: C. tropicalis CSLM data are missing in Panel B.
Due to space restrictions and the fact that biofilms formed by C. tropicalis are quite similar to those formed by C. albicans and C. dubliniensis, we initially decided not to show the corresponding CSLM micrograph for this species in Figure 2. We now show the C. tropicalis micrographs in the new Figure 2—figure supplement 2 and in Figure 3—figure supplement 1, and include this information in the figure legend of Figure 2 (ln. 1071).
– Figure 3, Supp Figure 3 and lines 270-271: "All seven master regulators identified in C. albicans were also required for biofilm formation in C. dubliniensis" – However this does not appear to stand true for C. dubliniensis BRG1. Any explanations?
Although the results of the dry-weight assay of Figure 3 do not show a clear difference between the biofilms formed by the BRG1 mutant and the WT strain, the CSLM micrographs in Figure 3—figure supplement 1 show a clear biofilm formation defect of this mutant. We have included a note in the legend of Figure 3 (ln. 1091) where we refer to the CSLM results shown in Figure 3—figure supplement 1.
– Figure 3 and lines 271-284: Data from C. albicans and C. dubliniensis should also be shown in Figure 3 to serve as a reference/benchmark. The conclusion pertaining to this figure still needs to be tempered (i.e. lines 286-287, "with the degree of diversity roughly paralleling their evolutionary distance from C. albicans"). No clear evidence supports such a conclusion, because observations were made based on mutant phenotypes from only 4 closely-related Candida species.
The data for C. albicans and C. parapsilosis (the data for C. dubliniensis is already shown) are not shown because these results were previously generated with slight variations to the assays employed (different media, for example), such that a side-by-side comparison would not be easy to interpret. At the end of the legend of Figure 3 (ln. 1096), we cite the corresponding references for the C. albicans and C. parapsilosis data.
The concluding sentence has been slightly modified as suggested by Reviewer 1 (ln. 314). We do think that our results support the notion that, among the four species we analyzed, the function of the TFs in the species that are further away phylogenetically from C. albicans are less conserved in biofilm formation.
– Lines 304-309: This is a major weakness in this manuscript. First, did author test the functionality of the tagged transcription factors (TFs)? What about their expression level in biofilm-stimulating conditions? The fact that authors failed to ChIP some of them might be due to epitope tagging which could have altered their function. Which TFs failed to be tested? It is not clearly stated in the manuscript. Again, failure to clearly show which regulators have been successfully ChIPed and the lack of data from those regulators weakens the manuscript. Maybe authors should give them a second try (by tagging differently C-term vs N-term/testing the functionality?)
We did test the functionality of the tagged TFs by performing biofilm formation assays. All tagged strains formed biofilms that could not be distinguished from the biofilms formed by the WT strains; we have added a note in the Methods section to that effect (ln. 900). Since we did not observe a defect in the tagged strains, we did not perform further analyses on them, such as measuring the expression level of the tagged regulators.
We have included the TFs for which the ChIP-seq experiments did not work in Sup Table 2 (now Supplementary File 1b). We have expanded the legend of the table so that it is easier to understand what the different values in the table mean, including the experiments that failed or that were not attempted (ln. 1170). We did perform experimental variations in the ChIP-seq experiments in an attempt to map the binding sites of the regulators for which the standard ChIP-seq experiments did not work. For example, for the C. parapsilosis regulators, we performed the experiment under planktonic conditions, but with no success. As the reviewer suggested, we also considered using other epitopes for tagging. However, we decided not to proceed with this strategy since it complicates the comparison to other DNA binding results as the antibodies and ChIP-seq conditions that would be used are different, bringing their own biases. In addition, since we needed to tag 27 regulators, using a variety of tags to optimize the ChIP-seq for each regulator was not a viable option. It is important to point out that tagging the strains and ensuring that they work for ChIP-seq is quite time and resource consuming; the only way to know whether the tagged regulator can be precipitated for ChIP-seq in a new species is by actually tagging and performing the ChIP-seq experiment.
Most importantly, our major conclusions would most likely not change with the addition of the DNA binding data from the regulators that were not precipitated successfully. As this reviewer noted below, most of our DNA binding data comes from Efg1 and Ndt80, both of which we were able to successfully ChIP in all four species. Therefore, the general patterns that we observed would most likely hold true with the addition of other binding data from other regulators ChIPed in a selection of species.
– Supp Table 2 is a key table and should be included in the main text. Legend is missing, it is not clear what "N/A" vs "0" stand for? It appears that Ndt80 and Efg1 targets account for the majority of binding events in the species being studied. Unlike the other regulators, these are major regulators for many important processes in Candida species including, for instance, filamentous growth. Consequently, the high connectivity of the biofilm network would not be surprising if one takes the Ndt80/Efg1 network as an example. This has been also shown for the Sfl1/Sfl2 transcriptional circuitry that controls filamentous growth in C. albicans (PMID: 23966855).
We agree that Sup Table 2 (now Supplementary File 1b) is important as it shows technical results of the work. Since we would like to keep the main figures and tables of the paper to show biological findings, we think this table should be kept as supplementary. As suggested, we have expanded the legend of the table so that the meaning of its content is better explained (ln. 1170).
Regarding the comment about the high connectivity of the network given that it involves Ndt80 and Efg1, we agree and include discussion of this in the Discussion section together with the Sfl1/Sfl2 reference (ln. 729). We have also incorporated this reference where we talk about Ndt80 and Efg1 working together (ln. 459).
– Figure 4B/C: It is not advised to show only percentages, as we could have the impression that we are dealing with big numbers whereas in reality we are not (e.g. Rob1, only 20 targets). Same for lines 340-343: the low number of Rob1 targets should prevent authors from drawing strong conclusions.
We have added the gross number of target genes to Figure 4 B and C and also to the whole paragraph where the overlaps of the regulators are described (ln. 356).
– Lines 345-355 and 369-385: These sections should rather go to the Discussion section or should be rewritten as data originating from experiments (i.e. results per se, not discussion). In many occurrences, authors discuss their data in the Results section. This strongly alters the quality of the reading flow. Same for lines 404-408, appearing as data but could also be moved to the Discussion section.
The three sections describe different analyses of the binding connection data and place the results in context with previous findings. Therefore, we prefer not to move them to the Discussion.
As suggested by this Reviewer and also Reviewer 1, we have moved some sections to the Discussion to improve the flow of the Results.
– Lines 415-441 and Supp Figure 4: This section is rather technical in nature and should have been presented (or summarized) earlier in the manuscript (may be following the ChIP-seq section, lines 290-343). Still, Ndt80 is not a good example for performing robustness analyses with regard to the specific conditions under which biofilm formation was induced, because this major regulator appears to exert pleiotropic functions.
As suggested, we have moved this section after the ChIP-seq section (ln. 403). The major robustness analysis that we performed was the inclusion of the gene expression and binding motif data in the analysis of the binding connections. We did this for the two regulators for which we were able to identify a binding motif, Efg1 and Ndt80. Both of these regulators are involved in other cellular functions. However, our data show that their binding connections are not under very strict evolutionary constraints; they change very rapidly even between closely related species. Therefore, the robustness of their binding connections is not guaranteed a priori.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
The manuscript has been improved but there are some remaining issues identified by one of the two reviewers that need to be addressed, as outlined below:
1. Regarding the new text added on page 21-22, lines 506-570 (there is a jump from 506 to 560), I understand that the goal is to determine whether trans-changes, specifically the DNA binding specificity, could explain part of the observed binding target divergence. However, I can't quite follow the text, as it is not clear to me which species' motif was used to predict in which species' genome, and how were the overlaps actually calculated. I did look for additional information in the Methods section and couldn't find any.
To improve clarity of these analyses we have rewritten the section (ln. 468) and added description of the methodology employed in the Methods section “de novo sequence motif discovery and enrichment for the regulators” (ln. 928).
2. Regarding the new text from lines 619-632: am I correct in that the new results were meant to determine the relationship between binding and gene induction *in each species*, rather than attributing gene induction *differences between species* to differences in TF binding in those species? The reason for the question is because I came in expecting answers to the latter question and got confused for a moment.
That is correct. We have added the phrase “in each species” to the section to make it clearer (ln. 552). We have also tested the association of regulator binding differences with gene expression changes between species and have included the results at the end of the same Results section (ln. 567).
3. In lines 646-649, the authors laid out the challenges involved in between-species comparison, which I fully agree with. But I don't think the Ca-Cd hybrid experiment can address that. It does address a different question, i.e. specifically revealing cis-changes behind gene expression divergence, which I feel is different from the first one.
We have added the phrase “to a large extent” to tone down our claim (ln. 593). We do think that the ChIP-seq experiment in the hybrid addresses most of the problems associated with species having different physiological responses to the same external environment, at least for Ndt80. In the hybrid, Ndt80 from C. albicans is subject to the same upstream stimuli and signaling pathways as Ndt80 from C. dubliniensis, and therefore the differences in regulator binding have to be mostly due to differences in the regulators binding.
4. In Figure 6A, I wonder if the authors can comment on the concave shape of the point cloud on the left.
This is an interesting observation. It would seem that Ndt80 from C. dubliniensis has a higher occupancy in most of the intergenic regions of the genome, but the trend reverts for the genes that are overall most strongly bound by Ndt80. Unfortunately, without further experimentation we think it would be difficult to establish a mechanistic explanation for the observation.
5. In Figure 6B, there are more than one data points for the two binding and one expression change datasets in the middle and right time points. Are those biological replicates?
For the conservation of master regulators there are three estimates because comparisons were performed between C. albicans and each of the other three species. On the other hand, there are six estimates of binding target and gene expression conservation since comparisons were performed in pairs between all four species. We have explained this in the corresponding figure legend (ln. 1085).
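The counts of three and six estimates follow directly from the comparison scheme, as the short Python sketch below shows. It is only an illustration; the variable names are hypothetical and the species list is simply the four species analyzed in the paper.

```python
from itertools import combinations

species = ["C. albicans", "C. dubliniensis", "C. tropicalis", "C. parapsilosis"]

# Master regulator conservation: each other species is compared to C. albicans.
reference = species[0]
vs_reference = [(reference, other) for other in species[1:]]

# Binding targets and gene expression: every unordered pair of species.
all_pairs = list(combinations(species, 2))

print(len(vs_reference))  # 3
print(len(all_pairs))     # 6
```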
6. In the Discussion section, the authors stated that this work examined how a complex transcriptional network underlying a specific phenotype (biofilm formation) evolved over a span of ~70 million years. I think it would be useful to point out, the authors did this to some extent later, that to fully reconstruct the evolutionary history of this network, it is critical to identify all the regulators in the other three species, and that the data in this work constitutes a partial picture for the three non-albicans species.
We have included a sentence in the Discussion to clarify that further work will be needed to identify all biofilm regulators in C. dubliniensis and C. tropicalis to have a more complete picture of the network in these two species (ln. 648). In C. parapsilosis, a screen of gene deletion mutants to find biofilm regulators has already been performed and we do mention the regulators found in these analyses throughout the manuscript.
https://doi.org/10.7554/eLife.64682.sa2
Article and author information
Author details
Funding
Human Frontier Science Program (LT000484/2012-L)
- Eugenio Mancera
UC MEXUS
- Eugenio Mancera
Consejo Nacional de Ciencia y Tecnología (CB-2016-01 282511)
- Eugenio Mancera
Wellcome Trust (209077/Z/17/Z)
- Eugenio Mancera
National Institutes of Health (R01AI083311)
- Alexander D Johnson
National Institutes of Health (R01AI049187)
- Alexander D Johnson
National Institutes of Health (R01AI073289)
- David R Andes
National Institutes of Health (R35GM124594)
- Clarissa J Nobile
National Institutes of Health (R21AI125801)
- Clarissa J Nobile
Pew Charitable Trusts (Pew Biomedical Scholar Award)
- Clarissa J Nobile
Kamangar family endowed chair
- Clarissa J Nobile
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Derek Sullivan and especially Joachim Morschhäuser for providing strains, plasmids, and advice, and Victor Hanson-Smith for help with the ChIP-seq data analysis.
Ethics
Animal experimentation: Procedures were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Wisconsin, Madison (protocol MV1947).
Senior Editor
- Patricia J Wittkopp, University of Michigan, United States
Reviewing Editor
- Christian R Landry, Université Laval, Canada
Reviewers
- Bin He
- Mira Edgerton
- Sadri Znaidi, Institut Pasteur de Tunis/Institut Pasteur, Tunisia
Version history
- Received: November 6, 2020
- Accepted: April 6, 2021
- Accepted Manuscript published: April 7, 2021 (version 1)
- Version of Record published: April 26, 2021 (version 2)
Copyright
© 2021, Mancera et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.