Intrinsic cooperativity potentiates parallel cis-regulatory evolution
Abstract
Convergent evolutionary events in independent lineages provide an opportunity to understand why evolution favors certain outcomes over others. We studied such a case where a large set of genes—those coding for the ribosomal proteins—gained cis-regulatory sequences for a particular transcription regulator (Mcm1) in independent fungal lineages. We present evidence that these gains occurred because Mcm1 shares a mechanism of transcriptional activation with an ancestral regulator of the ribosomal protein genes, Rap1. Specifically, we show that Mcm1 and Rap1 have the inherent ability to cooperatively activate transcription through contacts with the general transcription factor TFIID. Because the two regulatory proteins share a common interaction partner, the presence of one ancestral cis-regulatory sequence can ‘channel’ random mutations into functional sites for the second regulator. At a genomic scale, this type of intrinsic cooperativity can account for a pattern of parallel evolution involving the fixation of hundreds of substitutions.
https://doi.org/10.7554/eLife.37563.001
eLife digest
Sometimes evolution repeats itself. For example, independent butterfly species can evolve the same warning pattern to ward off predators. In many cases, the reason that a certain trait crops up again and again in parallel evolution is unknown.
One example is from the evolution of fungi, where a particular DNA sequence appeared several times independently in a large range of genes in different fungus species. This DNA sequence binds to a protein called Mcm1, which regulates nearby genes. Exactly why this DNA sequence has evolved in parallel so often in fungi has not been clear until now.
Researchers want to find out what is so special about this DNA binding sequence for Mcm1, as there are many other proteins that could do the same job. Sorrells et al. investigated this further by testing whether a binding site for another protein, Rap1, that is often found close by had a role to play. Experiments using 162 different fungus species showed that Mcm1 binding sites had evolved 12 times in parallel. Rap1 and Mcm1 did indeed turn out to work together to regulate nearby genes. The two proteins interact with a large protein complex critical for activating genes.
As a result, Mcm1 binding sites are more likely to evolve and play a role in gene regulation in different species when they are located near Rap1 binding sites. This could explain why this particular DNA sequence has evolved in parallel so many times. The same principle may apply to other genetic sequences involved in parallel evolution. With this understanding, it could be possible to predict when and where this event might occur in the future in fungi. This could be particularly useful for working towards being able to predict and anticipate the evolution of drug-resistant fungal pathogens.
https://doi.org/10.7554/eLife.37563.002
Introduction
Contingency is widespread in evolution, with chance historical changes dictating the repertoire of future possibilities (Bershtein et al., 2006; Bloom et al., 2006; Blount et al., 2012; Ortlund et al., 2007; Sorrells et al., 2015; Starr et al., 2017). Nevertheless, repeated evolutionary events in different lineages demonstrate that evolution is, at least in some instances, predictable. Repeatability is broadly referred to as convergent evolution, and when a trait arises repeatedly by a similar molecular mechanism, it is referred to as parallel evolution (Stern, 2013). Beginning with Darwin (Darwin, 1883), convergence has been taken as evidence for adaptation, but it can also be caused by drift within the constraints that arise from the properties of biological systems (Losos, 2011). Instances of repeated events can be thought of as natural experimental replicates for finding general principles that result in convergent evolution, and this information could be used to predict which evolutionary paths are most probable.
Genomic studies have found sets of genes that underlie convergent evolution of a wide variety of traits from across the tree of life (Bellott et al., 2010; Denoeud et al., 2014; Gallant et al., 2014; Mycorrhizal Genomics Initiative Consortium et al., 2015; Marvig et al., 2015; McCutcheon et al., 2009; Nagy et al., 2014; Pfenning et al., 2014; Sherwood et al., 2014; Soria-Carrasco et al., 2014). Such sets of genes tend to evolve in functionally related groups and are controlled by cis-regulatory sequences for particular transcription regulators, forming a transcription network. One way that an entire network can be up- or down-regulated is through alterations in the expression of a ‘master’ regulator, propagating a new expression level to all of its downstream target genes (Chan et al., 2010; McGregor et al., 2007; Rebeiz et al., 2009; Reed et al., 2011). However, in many cases—instead of changing the expression pattern of a master regulator—each gene in the network independently acquires the same cis-regulatory sequence, requiring hundreds of mutations across the genome (Booth et al., 2010; Borneman et al., 2007; Gasch et al., 2004; Kasowski et al., 2010; Lowe et al., 2011; Paris et al., 2013; Piasecki et al., 2010; Schmidt et al., 2010; Tanay et al., 2005; Tuch et al., 2008b). An outstanding question is how this process occurs.
Several molecular mechanisms for gene-by-gene rewiring have been proposed (Britten and Davidson, 1971; Tuch et al., 2008b). Hitchhiking of a cis-regulatory site on a transposable element is a mechanism well-documented in plant and animal evolution, and can rapidly bring genes under new regulatory control (Bourque et al., 2008; Chuong et al., 2016; Kunarso et al., 2010; Lynch et al., 2011; Schmidt et al., 2012). Alternatively, one transcription regulator can gain a protein-protein interaction with another, followed by the evolution of cis-regulatory sequences (Baker et al., 2012; Lynch et al., 2008; Pérez et al., 2014; Tsong et al., 2006). This latter mechanism is able, at least in principle, to rewire an entire set of genes at once upon evolution of the new protein-protein interaction; the individual gains of binding sites could occur secondarily. Nevertheless, many—if not most—examples of network rewiring seem to have occurred in the absence of evidence for either of these two mechanisms (Borneman et al., 2007; Gasch et al., 2004; Kasowski et al., 2010; Lowe et al., 2011; Paris et al., 2013; Piasecki et al., 2010; Schmidt et al., 2010; Tanay et al., 2005; Tuch et al., 2008a).
Here, we studied an example of transcription network rewiring in which ~100 ribosomal protein genes gained binding sites for the Mcm1 transcription regulator, and we describe evidence supporting a new mechanism for the concerted gains. The gain of Mcm1 binding sites in the ribosomal protein genes occurred in parallel in two respects: (1) the cis-regulatory sites were gained upstream of a large proportion of the ribosomal proteins in each species; and (2) they were gained independently approximately 12 – 13 times during Ascomycete evolution. At each gene, several point mutations were probably necessary to produce a close match to the 16-basepair Mcm1 binding site, thus requiring several hundred mutations across the entire gene set.
Based on results from a variety of experimental approaches, we argue that the gain of Mcm1 sites was potentiated in several different clades by the presence of an ancestral transcription regulator of the ribosomal protein genes, Rap1. We demonstrate that Rap1 and Mcm1 have the intrinsic ability to cooperate in the activation of transcription, even when artificially introduced in species in which their sites are not found together. Biochemical and genetic experiments show that both regulators interact with the general transcription factor TFIID. We propose that the intrinsic, ancient ability of both proteins to interact with a common component of the general transcription machinery facilitated the repeated evolution of the Mcm1 sites in independent lineages.
Results
Gains of functional Mcm1 cis-regulatory sites
Previously, genome-wide chromatin immunoprecipitation and bioinformatics experiments revealed that Mcm1 cis-regulatory sites evolved independently at the ribosomal protein genes (RPGs) in several yeast lineages (Tuch et al., 2008a). To expand this analysis, we used 162 sequenced fungal genomes, identified RPG regulatory regions, and searched for cis-regulatory sequences for Mcm1 and 11 other transcription regulators that are known to regulate ribosomal components in at least one species (Figure 1A, Figure 1—figure supplement 1). Mcm1 sites were found highly enriched (-log10(P) > 6) at the RPGs in six different monophyletic groups including the clades represented by Kluyveromyces lactis, Candida glabrata, and Yarrowia lipolytica, as well as the individual species Kazachstania naganishii, Pachysolen tannophilus, and Arthrobotrys oligospora. Mcm1 sites were found moderately enriched (6 > -log10(P) > 3) in the RPGs in seven additional lineages.
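The enrichment score used here can be illustrated with a short calculation. The following is a minimal sketch, not the authors' pipeline: it assumes the P-value is the upper tail of a hypergeometric distribution over promoter counts, and the function name and all counts are hypothetical.

```python
from math import comb, log10


def hypergeom_upper_tail(n_promoters, n_with_site, n_rpgs, n_rpgs_with_site):
    """P(X >= n_rpgs_with_site) when n_rpgs promoters are drawn without
    replacement from n_promoters, of which n_with_site contain a site."""
    most_possible = min(n_rpgs, n_with_site)
    favourable = sum(
        comb(n_with_site, k) * comb(n_promoters - n_with_site, n_rpgs - k)
        for k in range(n_rpgs_with_site, most_possible + 1)
    )
    return favourable / comb(n_promoters, n_rpgs)


# Hypothetical counts for one species (not taken from the paper).
p_value = hypergeom_upper_tail(
    n_promoters=5000, n_with_site=500, n_rpgs=78, n_rpgs_with_site=30
)
enrichment = -log10(p_value)
print(f"-log10(P) = {enrichment:.1f}")
```

With these made-up counts, about 8 of the 78 RPG promoters would be expected to carry a site by chance, so observing 30 gives a score well above the threshold of 6 used for the "highly enriched" category.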

Repeated evolution of Mcm1 cis-regulatory sites at RPGs.
(A) The Mcm1 sites are found upstream of the ribosomal protein genes (RPGs) in several different clades in the Ascomycete fungi. The first column, colored in green, shows the proportion of ribosomal proteins in each species that contains at least one Mcm1 site at a cutoff of ~50% the maximum log likelihood score using a position weight matrix. The second column, colored in blue, shows the –log10(P) for the enrichment of these Mcm1 binding sites relative to upstream regulatory regions genome-wide, as expected under the hypergeometric distribution. Dis-enrichment values are not highlighted. The phylogeny is a maximum likelihood tree based on the protein sequences of 79 genes found in single copy in most species. Key species discussed in this paper and previous literature are highlighted with blue background indicating enrichment for Mcm1 binding sites or gray indicating no enrichment. (B) Results of two models estimating the numbers of gains and losses of Mcm1 cis-regulatory sites at the RPGs. Shown is one example tree out of 10,000 sampled for each of the two models. Gains are indicated in filled circles and losses are indicated in open circles. Major nodes with a high amount of uncertainty over all of the sampled trees (0.2 < proportion with Mcm1 sites < 0.8) are shown as pie graphs with the proportion of simulations with that ancestor having Mcm1 sites shown in green. Other nodes have a high proportion (>0.8) of trees matching the example tree. (C) Intergenic regions upstream of two ribosomal proteins in the species Kluyveromyces lactis were positioned upstream of a GFP reporter. The Mcm1 cis-regulatory sites were scrambled and the wild-type and mutant reporters were integrated into the Kl. lactis genome. Cells were grown for 6 hr in rich media and expression was measured by flow cytometry. Shown is the single-cell fluorescence distribution for three independent genetic isolates and the median (red bar), normalized by forward-scatter values.
The values were divided by the average fluorescence for a cell lacking a GFP reporter (fold above background). (D) The RPL37 reporter strains were diluted into rich media and fluorescence and optical cell density (OD600) were measured every 15 min in a plate reader. Shown is the change in fluorescence between consecutive time points divided by the OD600 for eight technical replicates comprised of the three independent genetic isolates of each strain.
The pattern of yeast lineages containing Mcm1 binding sites upstream of the RPGs could occur through three general scenarios: the Mcm1 binding sites were gained multiple times, they were present in an ancient ancestor and lost multiple times, or a combination of gains and losses. To estimate how many gains and losses occurred during the evolution of Mcm1 sites, we treated the presence of Mcm1 sites (-log10(P) > 3) as a discrete character and sampled 10,000 stochastic character maps from the posterior probability distribution of each of two models (Bollback, 2006; Revell, 2012) (Figure 1B). One model assumed gains and losses to be equal in probability and the other estimated them as two different rates. Under both models, the number of gains was substantial: an average of 13.4 and 12.0 for the equal and different rates models, respectively (95% highest posterior density interval of 12 – 14 and 7 – 17 gains, respectively). The number of losses differed between the models, resulting in an average of 3.0 losses for the equal rates model and 17.0 for the different rates model. These models indicate that the number of times Mcm1 sites were gained during Ascomycete evolution was high (12 – 13 on average), whereas the number of losses is sensitive to the assumptions of the model.
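The difference between the two models can be made concrete with the standard closed-form transition probabilities of a two-state continuous-time Markov model. This is only an illustrative sketch under that assumption: it does not reproduce the stochastic character mapping itself, and the rates and branch length below are hypothetical.

```python
from math import exp


def transition_probs(gain_rate, loss_rate, branch_length):
    """Return (P[absent -> present], P[present -> absent]) along one branch
    of a two-state continuous-time Markov model."""
    total = gain_rate + loss_rate
    if total == 0:
        return 0.0, 0.0
    changed = 1.0 - exp(-total * branch_length)
    return gain_rate / total * changed, loss_rate / total * changed


# Hypothetical rates and branch length, chosen only for illustration.
equal = transition_probs(gain_rate=0.5, loss_rate=0.5, branch_length=1.0)
different = transition_probs(gain_rate=0.2, loss_rate=0.8, branch_length=1.0)
print("equal rates:     gain %.3f, loss %.3f" % equal)
print("different rates: gain %.3f, loss %.3f" % different)
```

In the equal rates case the two probabilities are identical by construction, whereas in the different rates case losses can be made far more likely than gains, which is one way to see why the inferred number of losses depends on the model.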
Three observations support the idea that the pattern of Mcm1 cis-regulatory sequences in the RPGs requires independent gains for most of the monophyletic clades and species with Mcm1 sites. First, the gain model is more parsimonious because of the sparse distribution of clades with Mcm1 sites. This is still true when taking into account the ancient interspecies hybridization that occurred in two ancient members of Saccharomycetaceae (Marcet-Houben and Gabaldón, 2015) because in most likely scenarios, the ancestors are concordant for the absence of Mcm1 sites. Second, other cis-regulatory sequences show similar patterns of evolution to that of Mcm1. The Dot6/Tod6 and Rim101 motifs are also found upstream of the RPGs in several distantly related clades, although fewer clades than for Mcm1 (Figure 1—figure supplement 1). Other regulators show clear single gains in the common ancestor of all Saccharomycotina yeasts (e.g. Rap1), or Pezizomycotina and Saccharomycotina (e.g. Cbf1) as well as losses in sparsely distributed individual clades and species. These results show that gains and losses of cis-regulatory sequences are common upstream of the RPGs, and that they can be distinguished from each other based on their distinct distributions among species (Lavoie et al., 2010; Tanay et al., 2005). Third, it was previously shown that the entire set of genes that Mcm1 regulates changes extensively over the timescale of Ascomycete evolution (Tuch et al., 2008a), suggesting that conservation from a distant ancestor would be a marked exception to this trend.
To test whether the Mcm1 cis-regulatory sites we identified upstream of the RPGs are functional, we linked several full-length RPG upstream intergenic regions to the fluorescent reporter GFP. We chose RPL37 and RPS18 from Kl. lactis, each of which has Mcm1 binding sites, and measured the expression of these reporters under nutrient-replete conditions. To test whether the sites contributed to expression, we scrambled them. In both cases, the scrambled site reduced expression of the reporter (RPS18 p=0.017; RPL37 p=0.027; Welch’s t-test) but left the cell-to-cell variability unchanged (Figure 1C and D; Figure 1—figure supplement 2). Given that there are many known regulators of ribosomal protein transcription, this demonstrates that Mcm1 plays a non-redundant role in activating these genes in Kl. lactis under conditions that require high expression of the translational machinery.
Selection on RPG expression levels
Yeast ribosomes are composed of four ribosomal RNAs and 78 ribosomal proteins, assembled in equal stoichiometry (Woolford and Baserga, 2013). In rapidly growing cells, ribosomal protein transcripts are among the most highly expressed, and have short half-lives, leading to the estimation that approximately 50% of all RNA polymerase II initiation events occur at ribosomal proteins (Warner, 1999). These observations suggest that the expression of these genes is under strong selection as it plays a major role in energy expenditure in the cell.
Given that the Mcm1 cis-regulatory sites increase RPG transcription levels during rapid growth, there are at least two plausible hypotheses for their appearance upstream of the RPGs. (1) In those species that have acquired Mcm1 cis-regulatory sequences, the expression of the RPGs is higher than in species without these sequences. (2) The gain of Mcm1 cis-regulatory sequences compensates for other cis-regulatory changes that lower expression of the genes, with no net gain in expression levels. This second hypothesis is plausible, in principle, because RPGs are known to lose regulator binding sites as well as gain them (Ihmels et al., 2005; Lavoie et al., 2010; Tanay et al., 2005; Tuch et al., 2008a).
To distinguish between these hypotheses, we examined directly whether the RPGs have a higher expression level in species that have acquired Mcm1 sites compared to those that have not. (We performed these experiments under conditions in which we have shown Mcm1 sites are functional, but we do not rule out different evolutionary scenarios under other environmental conditions). To accurately measure differences in RPG expression, we mated different species pairs to form interspecies hybrids; we then measured mRNA levels by RNA-seq and assigned each sequencing read to one genome or the other (Figure 2A). Comparing the expression of orthologous genes in this way controls for differences in ‘trans-acting’ factors like transcription regulators and therefore reflects only differences caused by cis-regulatory changes (Wittkopp et al., 2004).

Selection on RPG expression level in Kluyveromyces.
(A) Schematic of the experimental approach to measure differences in gene expression between Kluyveromyces yeast species. Two interspecies hybrids were constructed through mating and mRNA and genomic DNA were sequenced. The differential allelic expression is the ratio of the number of reads mapping to the coding sequence of one gene vs. its ortholog in the genome of the other species. The mRNA reads for each gene were normalized to the total reads and to the genomic DNA reads mapping to the same region to control for biases introduced in the sequencing and analysis process. (B) The log2-ratio of allelic expression with the Kl. lactis allele in the numerator is shown for (left) the Kl. lactis × Kl. marxianus hybrid (n = 3) and (right) the Kl. lactis × Kl. wickerhamii hybrid (n = 7). Shown are histograms for ribosomal protein genes and the rest of the identified orthologs in the genome.
For this allelic expression experiment, we constructed hybrids between Kl. lactis and two other species, Kl. marxianus and Kl. wickerhamii. These two additional species are relatively closely related to Kl. lactis (such that they can form interspecies hybrids), but have fewer Mcm1 sites at their RPGs (Figure 2A), suggesting that Mcm1 binding site gains are ongoing in some lineages over this timescale. mRNA reads for each gene were normalized to genomic DNA reads to control for mappability and gene length (see methods). Expression levels and differential allelic expression of genes in the two species’ genomes were reproducible between replicates (Figure 2—figure supplement 1). At a false discovery rate of 0.05, we identified 2925 genes showing differential allelic expression out of a total of 4343 orthologs in the Kl. lactis × Kl. marxianus hybrid, and 3432 out of 4319 in the Kl. lactis × Kl. wickerhamii hybrid.
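The normalization described for the allelic expression measurements can be sketched in a few lines. This is a minimal illustration that assumes each allele's mRNA read fraction is divided by its genomic DNA read fraction before the log2-ratio is taken; the function names and read counts are hypothetical and the exact procedure is the one given in the methods.

```python
from math import log2


def normalized_expression(mrna_reads, mrna_total, dna_reads, dna_total):
    """mRNA read fraction for one allele divided by its genomic DNA read fraction."""
    return (mrna_reads / mrna_total) / (dna_reads / dna_total)


def log2_allelic_ratio(lactis_allele, other_allele):
    """Each argument is (mrna_reads, mrna_total, dna_reads, dna_total)."""
    return log2(normalized_expression(*lactis_allele)
                / normalized_expression(*other_allele))


# Hypothetical read counts for one pair of orthologs in a hybrid.
lactis = (400, 1_000_000, 120, 500_000)
other = (800, 1_000_000, 100, 500_000)
ratio = log2_allelic_ratio(lactis, other)
print(f"log2(Kl. lactis / other) = {ratio:.2f}")
```

A negative value means the Kl. lactis allele is expressed at a lower level than its ortholog within the same hybrid cell, so the difference reflects cis-regulatory changes rather than trans-acting factors.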
To ask broadly whether RPGs have experienced concerted cis-regulatory evolution, in each hybrid we asked whether the RPGs were more likely to show differential allelic expression in one direction relative to all genes in the genome (Figure 2B). In both hybrids, the RPGs showed evidence for cis-regulatory evolution (hypergeometric test, p=2.36e-3 for Kl. lactis × Kl. marxianus hybrid; p=1.52e-6 for Kl. lactis × Kl. wickerhamii hybrid). We conclude that in these species, the expression of the ribosomal proteins has evolved directionally as a group through cis-regulatory changes.
However, in both hybrids, the ribosomal protein genes were expressed, on average, lower in the Kl. lactis alleles than in alleles of either of the other species. This observation rules out the hypothesis that the gain of Mcm1 sites over this timescale was simply due to directional selection to increase expression of the ribosomal proteins as a whole (see also Figure 2—figure supplement 2). Instead, it favors the second hypothesis: ongoing evolution of Mcm1 sites in the Kluyveromyces clade compensates (perhaps incompletely) for other cis-regulatory changes that reduce RPG transcript levels. A compensatory model is consistent with the observation that the binding sites for other regulators (including Fhl1 and Rrn7) show a reduction in the Kluyveromyces clade (Figure 1—figure supplement 1).
We also revisited previously published yeast interspecies hybrid experiments performed with species from the Saccharomyces clade. Surprisingly, in all cases, the ribosomal proteins were among the sets of genes with reported directional cis-regulatory evolution (Bullard et al., 2010; Clowers et al., 2015; Lee et al., 2013; Martin et al., 2012). These observations demonstrate that RPG expression may frequently experience different evolutionary forces between closely related yeast species, and they are consistent with the high rate of gains and losses of cis-regulatory sequences that control RPG transcription.
Why were Mcm1 sites gained repeatedly?
Although RPGs have experienced different selection pressures across different species, this observation does not explain why Mcm1 sites, as opposed to cis-regulatory sequences for many other transcription regulators, are repeatedly gained in the lineages we examined. One possibility is that Mcm1 expression changes under certain environmental conditions, conferring optimal RPG expression levels for the Kluyveromyces clade (as well as the other clades that gained Mcm1 sites). The fact that Mcm1 is expressed across all yeast cell types and, in combination with dedicated regulators, controls many different genes that are part of different expression programs (Tuch et al., 2008a) makes this possibility unlikely, and we do not explicitly examine it here.
We therefore investigated whether Mcm1 has other characteristics that allow it to gain cis-regulatory sites especially easily at the RPGs. We previously noted that, in Kl. lactis, the Mcm1 sites were gained a fixed distance away from the binding site for another transcription regulator, Rap1, whose sites at the RPGs were ancestral to Kl. lactis and S. cerevisiae (Tuch et al., 2008a). By pooling data from additional species with Rap1-Mcm1 sites, we discovered that the spacing between Mcm1 and Rap1 sites was particularly precise with peaks at 54, 65, and 74 basepairs apart, favoring a configuration with the two proteins on the same side of the DNA helix (Figure 3A–C).
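The relationship between these spacing peaks and the face of the helix can be checked with simple arithmetic. The sketch below assumes a helical repeat of roughly 10.5 bp per turn for B-form DNA, a textbook approximation that is not stated in the text, and treats the reported spacings as the relevant distances between the two sites; the function name is hypothetical.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of B-form DNA (assumed value)


def helical_phase(spacing_bp, bp_per_turn=BP_PER_TURN):
    """Offset, in fractions of a turn within [-0.5, 0.5), between two sites.
    Values near 0 place the sites on the same face of the helix."""
    return (spacing_bp / bp_per_turn + 0.5) % 1.0 - 0.5


for spacing in (54, 65, 74):  # spacing peaks reported in the text
    print(spacing, "bp ->", round(helical_phase(spacing), 2), "turns off-phase")
```

Under this assumption, all three peaks fall within a fifth of a turn of an integer number of helical turns, which is consistent with the two proteins lying on the same side of the DNA.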

Ancestral cooperativity of Mcm1 with Rap1.
(A) Schematic of Mcm1 and Rap1 cis-regulatory sites at the RPGs. (B) RPG promoters were aligned at the strongest hit to the Mcm1 position weight matrix and the relative location of Rap1 cis-regulatory sites was plotted. Sites for Rap1 with log-likelihood >6.0 are shown at 1 bp resolution. Hemiascomycete yeast (the 29 species at the top of the tree in Figure 1A) are divided into those with large numbers of Mcm1 sites at the ribosomal protein genes (purple shading) and those without (gray line). (C) Schematic using published structures of Mcm1 and Rap1 DNA-binding domains (PDB IDs: 1MNM and 3UKG) bound to DNA connected by a DNA linker corresponding to 55 bp spacing between their cis-regulatory sites. Rap1 is shown in purple and Mcm1 is shown in green. (D–H) The ability of Mcm1 to work with the ribosomal protein gene regulator Rap1 was tested using a GFP reporter. (D) The Rap1 and Mcm1 cis-regulatory sites from Kl. lactis RPS23 and RPS17 were placed in a reporter containing the S. cerevisiae CYC1 basal promoter. Reporter variants were generated by altering the spacing between these sites and by mutating the sites individually and in combination. (E, F) Reporter variants were integrated into the genome of Kl. lactis. Cells were grown for 4 hr in rich media and expression was measured by flow cytometry. (E) Shown is the mean fluorescence for at least three independent genetic isolates. The values were divided by the average fluorescence for a cell lacking a GFP reporter (fold above background). The measurements for RPS23 and RPS17 were collected on separate days and are shown on the same axes for clarity. (F) Shown is the single-cell fluorescence distribution for three independent genetic isolates and the median (red bar), normalized by side-scatter values. (G) Diagrams showing the phylogenetic distribution of ancestral or derived cooperativity. (H) The RPS23 reporter variants were integrated into the genome of S. cerevisiae, a species that lacks Mcm1 cis-regulatory sequences at the ribosomal protein genes. Cells were grown and measured as described in (F). (For the third construct, one isolate had multiple reporter insertions and was not included.)
In order to understand whether the spacing between Mcm1 and Rap1 sites affects transcriptional activation, we moved a segment of the Kl. lactis RPS23 upstream region that contains the Rap1 and Mcm1 sites into a heterologous reporter containing a basal promoter from S. cerevisiae CYC1 (Guarente and Ptashne, 1981) (Figure 3D). This construct allowed us to study the Rap1 and Mcm1 cis-regulatory sequences independent of the additional regulators of the RPGs. We systematically varied the spacing between these two cis-regulatory sequences in 2 bp increments and found that the transcriptional output remained similar in most constructs (Figure 3E). In these experiments the Mcm1 and Rap1 sites were both close to optimal, and we hypothesized that weaker sites would be more likely to reveal a spacing preference. To test this idea, we performed a second series of experiments with the Rap1 and Mcm1 cis-regulatory sequences from the Kl. lactis RPS17 gene. The binding sites from this gene are weaker matches to the Rap1 and Mcm1 position weight matrices (log odds sum of 9.34 versus 12.48 for Rap1 and 7.24 versus 14.33 for Mcm1). The transcriptional output of this second series of constructs showed sensitivity to spacing between the sites (Figure 3E).
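The log odds sums quoted above come from scoring a candidate site against a position weight matrix. The following is a minimal sketch of that kind of scoring, assuming a uniform background and base-2 logarithms (the logarithm base and background used in the study are not stated here); the toy matrix and function names are hypothetical and do not represent the Rap1 or Mcm1 motifs.

```python
from math import log2

# Toy 4-position weight matrix (hypothetical probabilities, not a real motif).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def log_odds_sum(site, pwm, background=0.25):
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    if len(site) != len(pwm):
        raise ValueError("site and matrix must have the same length")
    return sum(log2(column[base] / background) for base, column in zip(site, pwm))


def max_log_odds(pwm, background=0.25):
    """Score of the best possible site for this matrix."""
    return sum(log2(max(column.values()) / background) for column in pwm)


strong = log_odds_sum("ACGT", TOY_PWM)  # matches the preferred base everywhere
weak = log_odds_sum("ACTT", TOY_PWM)    # one disfavoured base
print(f"strong = {strong:.2f}, weak = {weak:.2f}, max = {max_log_odds(TOY_PWM):.2f}")
```

A single disfavoured base lowers the sum substantially, which is the sense in which the RPS17 sites are weaker matches than the RPS23 sites.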
We hypothesized that the observed sensitivity of transcriptional output to the strength and spacing of Rap1 and Mcm1 cis-regulatory sequences reflected cooperative activation of transcription. Deletion of the Mcm1 and Rap1 binding sites individually and in combination showed that the proteins KlMcm1 and KlRap1 activated transcription cooperatively in Kl. lactis: their combined expression was approximately four times the sum of their individual contributions (Figure 3F; p=5.5e-4, one sample t-test).
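The comparison behind this statement can be written out explicitly. The sketch below assumes that each regulator's individual contribution is measured as expression above a reporter in which both sites are mutated; that baseline choice, the function name, and the numbers are all hypothetical.

```python
def cooperativity_ratio(both_sites, rap1_only, mcm1_only, no_sites):
    """Expression driven by both sites divided by the sum of the two
    single-site contributions, each measured above the no-site reporter."""
    rap1_contribution = rap1_only - no_sites
    mcm1_contribution = mcm1_only - no_sites
    additive = rap1_contribution + mcm1_contribution
    if additive <= 0:
        raise ValueError("single-site contributions must sum to a positive value")
    return (both_sites - no_sites) / additive


# Hypothetical fold-above-background values, chosen for illustration only.
ratio = cooperativity_ratio(both_sites=9.0, rap1_only=2.0, mcm1_only=2.0, no_sites=1.0)
print(ratio)  # 4.0 for these made-up inputs
```

A ratio of 1 would indicate purely additive activation, and values above 1 indicate that the two sites together drive more expression than the sum of their separate effects.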
One likely explanation for these observations is the evolution of a favorable protein-protein interaction between Rap1 and Mcm1 that occurred around the time that Mcm1 sites appeared at the RPGs in the Kluyveromyces and C. glabrata-Nakaseomyces clades (Tuch et al., 2008b), that is, a ‘derived cooperativity’ model (Figure 3G). To test this model, we placed the Kl. lactis Rap1-Mcm1 reporter into the genome of S. cerevisiae, a species that lacks Mcm1 sites at the RPGs. We found that ScRap1 and ScMcm1 activated expression of the reporter cooperatively (p=3.9e-3, one sample t-test), thereby demonstrating that these proteins have the capacity to work together even in a species where their binding sites did not evolve in close proximity (Figure 3H). This result rejects the derived cooperativity model and strongly suggests that the ability of Rap1 and Mcm1 to activate transcription cooperatively was ancestral, existing even in species that did not take advantage of it in regulating the RPGs.
Mechanism of intrinsic Mcm1-Rap1 cooperativity
Having established that cooperativity of Rap1 and Mcm1 was likely ancestral to the monophyletic clade containing Kluyveromyces and Saccharomyces, we considered three possible mechanisms that could explain ancestral cooperativity: (1) Rap1 and Mcm1 could bind DNA cooperatively through an ancient, favorable protein-protein interaction, (2) they could bind nucleosomal DNA through cooperative displacement of histones, or (3) they could both bind a third transcription regulator resulting in cooperative transcriptional activation. To test the first possibility, we used a gel-mobility shift assay with a radiolabeled DNA sequence from the Kl. lactis RPS23 gene (Figure 3D). We asked whether purified full-length S. cerevisiae and Kl. lactis proteins as well as cell lysates from three additional species bound to Rap1 and Mcm1 binding sites cooperatively. In all cases, each protein bound to the RPG promoter DNA independently and did not appear to increase the other’s affinity, indicating that Rap1 and Mcm1 do not, on their own, bind DNA cooperatively (Figure 4A, Figure 4—figure supplement 1).

Mcm1 interacts with TFIID.
Experiments were performed in S. cerevisiae or using S. cerevisiae proteins to test the mechanism of Rap1-Mcm1 cooperativity. (A) Gel shift DNA binding assays were performed to test possible cooperative DNA binding between purified ScMcm1 and ScRap1. Gel shift reactions were performed by incubating 10 fmol (~7000 cpm) of a 79 bp 32P-labeled fragment of the Kl. lactis RPS23 promoter containing the Rap1 and Mcm1 binding sites (see Figure 3D) with either no protein, 2.5 fmol Rap1, 5 fmol Rap1, 10 fmol Rap1, 10 fmol Mcm1, 20 fmol Mcm1, 30 fmol Mcm1, or 2.5 fmol Rap1 with 10, 20, or 30 fmol Mcm1. Reactions also included either no cold competitor DNA or a 100-fold molar excess of cold ~20 bp DNA containing either the Rap1 WT (RWT) or Rap1 scrambled (Rsc) sequences and/or the Mcm1 WT (MWT) or Mcm1 scrambled (Msc) sequences in a final volume of 20 μl. Reactions were fractionated on non-denaturing polyacrylamide gels, vacuum dried, and imaged using a Bio-Rad Pharos FX imager. Radiolabeled species are indicated on the left (R,M-DNA = Rap1-Mcm1-DNA, R-DNA = Rap1 DNA, M-DNA = Mcm1 DNA). (B) Sypro Ruby stain of SDS-PAGE fractionated MBP (2.4 pmol) and MBP-Mcm1 (1.3 pmol) probe proteins used for Far Western protein-protein binding analyses. (C) Far Western protein-protein binding analysis of Mcm1 binding to TFIID. Purified TFIID, His6-Taf3, and His6-Taf4 were separated on two SDS-PAGE gels. One gel was stained with Sypro Ruby for total protein visualization (left panel). The other was electrotransferred to a membrane for protein-protein binding analysis (middle and right panels). Membranes were probed with either control MBP (middle panel) or MBP-Mcm1 (right panel). Binding of probe proteins to Tafs was detected using an anti-MBP antibody. (D) Mapping the Mcm1 Binding Domain (MBD) of Taf4. Roughly equal molar amounts of His6-Taf4, His6-Taf3, GST, GST-Taf4, and GST-Taf4 deletion variants were fractionated on two SDS-PAGE gels.
One gel was stained with Sypro Ruby for total protein visualization. The other gel was electrotransferred to a membrane and Mcm1-Taf protein-protein binding was assayed as described in (C) using the MBP-Mcm1 as the overlay protein. (E) Taf4 protein map indicating the location of the Taf4 Mcm1 Binding Domain mapped in this study (MBD, green) as well as the Rap1 Binding Domain (RBD, purple) mapped in a previous study (Layer et al., 2010).
A second mechanism explaining cooperative activation is through nucleosome displacement. Many transcription regulators have the inherent property of competing for binding to DNA with nucleosomes, with some more effective than others (Zaret and Carroll, 2011). However, it is unlikely that cooperative nucleosome displacement is the primary mechanism for cooperative activation by Rap1 and Mcm1 at the RPGs. Although Rap1 can indeed bind nucleosomal Rap1 binding sites (Koerber et al., 2009; Rossetti et al., 2001), and displace nucleosomes (Kubik et al., 2015; Lickwar et al., 2012; Platt et al., 2013), it remains bound at the RPGs even during stress conditions when the genes are repressed and show higher levels of nucleosome occupancy (Bernstein et al., 2004; Lee et al., 2004). Furthermore, a general mechanism for the cooperative assembly of all transcription regulators has little explanatory power for why cis-regulatory sites for Mcm1 were repeatedly gained across RPGs rather than sites for many of the ~250 other regulators coded in the yeast genome.
We next considered the third possibility, namely that the cooperative transcriptional activation we observe for Rap1 and Mcm1 occurs through the interaction of both with a third factor that catalyzes a rate-limiting step in transcription activation (Lin et al., 1990). In principle, a regulator could activate transcription through contacts with general transcription factors, Mediator, SAGA, various chromatin remodeling complexes, or RNA polymerase itself. To identify possible factors that might directly bind both Rap1 and Mcm1, we searched the S. cerevisiae BioGRID database for common interaction targets of both proteins (Oughtred et al., 2016). Two complexes, SWI/SNF and TFIID, fit this criterion. SWI/SNF is not required for ribosomal protein transcription (Sudarsanam et al., 2000), so it is unlikely that binding to SWI/SNF plays a role in Rap1 and Mcm1 cooperativity at the RPGs.
TFIID is a general transcription factor whose direct interaction with Rap1 is required for RPG transcription in S. cerevisiae (Garbett et al., 2007; Layer et al., 2010; Mencía et al., 2002; Papai et al., 2010; Reja et al., 2015). Furthermore, TFIID activates transcription through contacts with RNA Polymerase II, and its binding to the promoter is a rate-limiting step in the activation of TFIID-dependent genes (Wu and Chiang, 2001), such as the RPGs. The mechanism of how activators increase transcription rate through contacts with TFIID is not entirely understood but occurs either through a structural rearrangement of the complex or simply by an increase in its occupancy on DNA (Coleman et al., 2017; Fuda et al., 2009; Nogales et al., 2017; Papai et al., 2010; Sauer et al., 1995a; Sauer et al., 1995b).
The interaction of Rap1 with TFIID has been extensively documented (Garbett et al., 2007). Of particular importance is the interaction between the ‘activation domain’ of Rap1 and the Taf5 subunit of TFIID, as mutations that compromise this interaction strongly reduce ribosomal protein gene transcription (Johnson and Weil, 2017). To test directly whether Mcm1 also binds to TFIID (as suggested by mass spectrometry-based analyses of TFIID and associated proteins [Sanders et al., 2002]), we performed a Far-Western protein-protein binding assay using purified S. cerevisiae TFIID, ScMcm1, and individual TFIID subunits. We found that Mcm1 binds directly to the Taf4 subunit (Figure 4B,C); using deletions of Taf4, we further mapped the interaction to the N-terminal region of Taf4 (Figure 4D,E). Although Rap1 also binds to Taf4 (in addition to Taf5 and Taf12), its target is in the C-terminus of this subunit, distinct from the Mcm1 interaction site.
The finding that Rap1 and Mcm1 both interact with distinct domains of a common component of the transcription machinery, one whose assembly at the promoter is rate limiting, provides a simple explanation for their ability to activate transcription cooperatively. To test this idea explicitly, we performed a series of experiments in vivo. Because Rap1 is an essential gene, we took advantage of a version of Rap1 with altered DNA-binding specificity (Rap1AS) that binds to a non-natural cis-regulatory sequence and confers expression of a reporter (Johnson and Weil, 2017). Rap1AS could then be manipulated without compromising the function of the endogenous Rap1. As shown in Figure 5A–C, Rap1AS shows cooperative transcriptional activation with Mcm1. When the activation domain of Rap1AS was mutated by introducing seven point mutations (Rap1AS7Ala), its ability to activate transcription of a Rap1 reporter strongly decreased but was not entirely eliminated (Johnson and Weil, 2017). When introduced in the presence of the Rap1-Mcm1 HIS3 reporter, these mutations strongly reduced growth on media containing 3-aminotriazole (3-AT), a competitive inhibitor of the HIS3 gene product (Brennan and Struhl, 1980), indicating that efficient expression requires an intact Rap1 activation domain (Figure 5D–F).

Mcm1-Rap1 cooperative activation requires Rap1-TFIID contacts.
(A) A series of reporter constructs were designed to test the mechanism of Rap1-Mcm1 cooperative activation. Experiments were performed in S. cerevisiae using S. cerevisiae proteins. (B) Growth analysis of yeast strains carrying the UASRap1-Mcm1 reporter (containing a fragment of the Kl. lactis RPS23 promoter) indicated in the diagram and either an altered DNA-binding specificity Rap1 variant (Rap1AS, magenta) or a second copy of Rap1WT (purple). To perform these analyses, yeast were grown overnight to saturation, serially diluted 1:4 and spotted using a pinning tool onto either non-selective media plates (+His) or plates containing 3-aminotriazole (+3 AT), which selects for expression of the HIS3 reporter gene. Plates were incubated for 3 days at 30°C and imaged using a Bio-Rad ChemiDoc MP imager. (C) Immunoblot analyses of the expression levels of Myc-tagged Rap1WT and Myc-tagged Rap1AS (Myc IB) compared to actin (Actin IB) and total protein (Ponceau S) loading controls. (D) Rap1 protein map indicating the ScRap1 AD mapped to a location C-terminal of the Rap1 DBD and the seven key AD amino acids. These amino acids were mutated to alanine to inactivate the ScRap1 AD and create the Rap1AS7Ala mutant variant. (E) Growth analyses performed using yeast carrying the UASRap1AS-Mcm1-HIS3 reporter and either Rap1AS, a second copy of Rap1WT, or Rap1AS7Ala, performed as described in (B). (F) Immunoblot analysis of the Rap1 forms tested in (E), performed as described in (C).
Taken together, these experiments indicate that the cooperative transcriptional activation by Rap1 and Mcm1 at the RPGs in these clades is due to both proteins interacting with TFIID, a component of the general transcriptional machinery. This idea explains how Mcm1—and not a random mixture of other regulatory proteins—came to be repeatedly gained at the ribosomal protein genes.
Implications and predictions of the intrinsic cooperativity model
We tested several evolutionary and molecular predictions of this model. One prediction is that Rap1 and Mcm1 cis-regulatory sequences would occur at the prescribed distance apart but located at other genes besides the RPGs. We searched a subset of hemiascomycete yeast genomes for Rap1-Mcm1 sites with similar spacing and orientation to that observed in the RPGs (Figure 6—figure supplement 1). Most species analyzed had only a few such genes of questionable significance, but Ka. naganishii had 33, showing that Rap1 and Mcm1 sites can evolve at genes other than the RPGs.
A second prediction of our model is that even a sub-optimal Mcm1 site would activate transcription of a regulatory region with an ancestral, strong Rap1 site. Because sub-optimal sites can evolve de novo with much higher probability than optimal sites (Dermitzakis and Clark, 2002; Stone and Wray, 2001), this property would increase the ease by which a large number of functional Mcm1 sites could arise during evolution. To test this prediction, we created a series of GFP reporter constructs with Mcm1 binding sites that drive different levels of expression (Acton et al., 1997). We tested the expression level of each of these Mcm1 sites in the presence and absence of a neighboring Rap1 site in Kl. lactis (Figure 6A). Consistent with the prediction, most of the sub-optimal Mcm1 sites displayed near wild-type levels of expression in the presence—but not in the absence—of neighboring Rap1 sites.

Evolutionary implications of intrinsic cooperative activation.
(A) A series of reporters were designed to test the transcriptional activation of a weak Mcm1 site in the presence and absence of a Rap1 site. A series of Mcm1 cis-regulatory sites were chosen with a range of affinities that correlate with transcription rate (Acton et al., 1997). The order of the sequences shown corresponds to their expression level on the x-axis. These sites were introduced to the S. cerevisiae CYC1 reporter and tested with (y-axis) and without (x-axis) an upstream Rap1 binding site. Cells were grown and measured as described in Figure 3E. The expression level of the WT RPS23 operator is shown as a dotted line. (B) A computational analysis was designed to detect evolution of Mcm1 sites at fixed distances from other ribosomal protein regulators. Ribosomal protein gene promoters were aligned at the strongest hit to the Mcm1 position weight matrix and the relative location of cis-regulatory sites for other transcription regulators was plotted. (C) The shading in each rectangle represents the proportion of ribosomal protein gene promoters in that species that have the given cis-regulatory site in that 10 bp interval. The clades with a large number of Mcm1 cis-regulatory sequences are shown with black boxes.
Finally, we asked whether our conclusions are generalizable to other pairs of transcription factors besides Mcm1 and Rap1. In particular, some of the clades identified in Figure 1A gained Mcm1 sites at RPGs that lack Rap1 sites. To address this question, we centered each RPG intergenic region on the best Mcm1 site and plotted the location of other RPG regulators relative to Mcm1 (Figure 6B). This analysis revealed that, in some of these clades, Mcm1 sites were gained at varying distances from the pre-existing sites of other regulators (Tbf1, Rrn7, and Fhl1), suggesting that the evolutionary mechanism we identified with Rap1 and Mcm1 might be generalizable to other instances of cis-regulatory evolution (Figure 6C). Consistent with this idea, all three of these regulators are reported to interact (directly or indirectly) with TFIID (Gavin et al., 2002; Knutson et al., 2014; Mallick and Whiteway, 2013; Zhong and Melcher, 2010).
In summary, the experiments we have presented describe a special relationship between Rap1 and Mcm1 by virtue of their interaction with different surfaces of the same rate-limiting component of transcription, TFIID. This relationship between Rap1 and Mcm1 is ancient, and in the next section, we discuss how this property can predispose transcription networks to evolve repeatedly along the same trajectory.
Discussion
Here we have investigated an example of parallel evolution where binding sites for a particular transcription regulator (Mcm1) were gained in a large group of genes (the ribosomal protein genes, RPGs). These gains occurred repeatedly in several independent fungal lineages. In three of these lineages, Mcm1 binding sites were gained a fixed distance from the sites for another transcription regulator, Rap1, and we show that these newly acquired Mcm1 sites are required for full activation of the RPGs. We also show that Mcm1 and Rap1 cooperatively activate these genes. The direct interaction of both of these regulators with a common target, the general transcription factor TFIID, provides a plausible mechanism for this cooperative transcription activation. It also explains why the ability of Rap1 and Mcm1 to work together was ancestral to the more recent gains of Mcm1 sites adjacent to Rap1 sites at the RPGs.
How do these observations account for the fact that Mcm1 sites (as opposed to sites for other regulators) were repeatedly gained in parallel next to the Rap1 site at the RPGs? And how do they account for the distance constraints? One common explanation for parallelism is a specific environmental adaptation that occurs through a similar molecular mechanism. However, the yeast species with Mcm1 sites at the RPGs are from diverse ecological niches (e.g. plant leaves, mangrove sediment, the human body, soil) and utilize different nutrient sources (e.g. lactate, xylose, feline skin, nematode predation), defying a specific environmental adaptation explanation. Consistent with this view is our observation, based on analyzing interspecies hybrids, that a species in which the Mcm1 sites were gained at the RPGs does not express these genes at higher levels than related species that evolved fewer Mcm1 sites.
The model that best fits all of our data holds that the parallel gains arose from the ease with which the functional Mcm1 sites (and not the sites of other regulators) appeared in evolution, rather than selective pressure for particular adaptation. Specifically, we propose that, in the Kluyveromyces-Saccharomyces ancestor (before the parallel gains of Mcm1 sites) Rap1 bound to the ribosomal protein genes and activated transcription through interactions with TFIID, as it does in extant species. We further propose that, in the ancestor, Mcm1 activated non-ribosomal genes by interaction with a second site on TFIID, as it does in the extant species S. cerevisiae. Any suboptimal Mcm1 site that arose by chance point mutation at a specified distance from a Rap1 site would immediately be functional (Figure 7) because, as we show, even a weak Mcm1 DNA interaction would be stabilized by Mcm1’s intrinsic ability to directly bind TFIID (see Figure 4C and 6A). In this way, even suboptimal cis-regulatory sequences (which are much more likely to appear by chance than optimal sites [Dermitzakis and Clark, 2002; Stone and Wray, 2001]) could form under selection. The appearance of Mcm1 sites likely occurred concomitantly with the gradual losses of other cis-regulatory sites in the RPGs; in other words, the Mcm1 cis-regulatory sites would fall under selection as other cis-regulatory sites deteriorated by mutation. In essence, we propose that the free energy gained from the intrinsic interaction between Mcm1 and TFIID would favor formation of new Mcm1 sites at the expense of pre-existing cis-regulatory sequences, particularly since the latter provide a larger target for inactivating mutations. This model accounts for why Mcm1 sites (and not those of other transcription regulators) were repeatedly gained at the RPGs and why the distance between Rap1 and Mcm1 sites is constrained in those species in which the gains occurred.

Model for evolution of cis-regulatory sites through intrinsic cooperativity.
Multiple gains of new Mcm1 sites occur in the ribosomal protein genes because Rap1 and Mcm1 both bind to TFIID, a general transcription factor. Due to the intrinsic cooperativity of Rap1 and Mcm1 (which is ancestral to the gains of Mcm1 cis-regulatory sequences) the evolution of even a weak Mcm1 site near an existing Rap1 site would produce an effect on transcription. Because they are more likely to be functional, weak Mcm1 cis-regulatory sequences are preferentially retained in the population if they arise at a specified distance (as determined by the shape of TFIID) from Rap1 cis-regulatory sequences. These sites would be preserved if there is direct selection to increase RPG expression, or if they are combined over time with the mutational degradation of other regulatory elements that bring the Mcm1 site under purifying selection.
Numerous experimental observations support this model and rule out alternative explanations: (1) in extant species, Rap1 and Mcm1 both interact with TFIID; (2) they interact with different parts of TFIID; (3) cooperative transcriptional activation by Mcm1 and Rap1 requires the activation domain of Rap1, which is known to interact with TFIID; (4) the spacing between Rap1 and Mcm1 sites in the ribosomal protein genes places the proteins on the same side of the helix but at least 50 bp apart, consistent with a physical interaction with a large complex; (5) engineered suboptimal Mcm1 sites are functional as long as they are adjacent to Rap1 sites; (6) Mcm1 and Rap1 have the intrinsic ability to cooperate (through interactions with TFIID) even in a species where Mcm1 sites were not gained at the RPGs.
We note that this model does not require any change in Rap1 or Mcm1 during the gains of Mcm1 sites at the RPGs. Presumably, Rap1 and Mcm1 activated many genes separately in the ancestor, thus preserving by stabilizing selection their ability to interact with TFIID. This idea is consistent with the observation that the two proteins were able to cooperate on artificial constructs introduced into S. cerevisiae even though their binding sites are not found together at the RPGs. Mutations that alter the function of transcription regulators (for example, creating a new protein-protein interaction) can be pleiotropic, decreasing the likelihood that they can arise without disrupting the proteins' ancestral functions (Carroll, 2005; Stern and Orgogozo, 2008). (We note that the transcriptional output was slightly more cooperative in Kl. lactis than in S. cerevisiae, leaving open the possibility that additional fine-scale evolutionary changes may have occurred in how these proteins interact.) According to our model, the ability of the two regulators to work together was ancestral, part of each protein’s intrinsic mechanism of transcriptional activation; therefore, their coupling at the RPGs would avoid such pleiotropic changes.
How does this model account for the gains of Mcm1 sites observed in clades where Rap1 does not regulate the RPGs? In these cases, Mcm1 cis-regulatory sequences also show preferred spacing relative to known regulators of the RPGs, specifically Tbf1 and Rrn7, and we propose that the same type of cooperativity with TFIID can also account for these cases. These two regulators are reported to interact with TFIID (Knutson et al., 2014; Mallick and Whiteway, 2013). Indeed, TFIID occupies the promoters of RPGs in human cell lines as well (ENCODE Project Consortium et al., 2012), raising the possibility that TFIID is a conserved general activator of the RPGs across fungi and animals, while the specific transcription regulators that interact with TFIID simply interchange over this timescale.
While our cooperative activation model provides an explanation for the parallel acquisition of Mcm1 cis-regulatory sites, selection must have operated to preserve the Mcm1 sites as they arose in the population. As described earlier, we favor a model where the gains of Mcm1 sites compensated for the degradation of other cis-regulatory sequences and thereby fell under selection. The allelic expression data (Figure 2) strongly support this model for one clade, represented by Kluyveromyces species. However, it is also possible that, in other clades or over shorter timescales, the Mcm1 sites could have been gained due to selection for higher levels of RPG expression. The widespread differences in RPG expression revealed by the published interspecies hybrid experiments suggest that RPG expression may experience strong and shifting selection, helping to account for the surprising observation that transcriptional regulators that control the RPGs vary substantially across species (Gasch et al., 2004; Lavoie et al., 2010; Mallick and Whiteway, 2013; Tanay et al., 2005; Tuch et al., 2008a; Zeevi et al., 2014; Zeevi et al., 2011). We note that the intrinsic cooperativity model is sufficient to explain the gain of Mcm1 sites whether or not any change in selection occurs: if functional Mcm1 sites are relatively easy to form (because of intrinsic cooperativity) they will be favored over gains of other cis-regulatory sequences by mutational processes alone. Mutations creating Mcm1 sites and weakening other sites could occur sequentially in either order or simultaneously (on the same haplotype), depending on the expression requirements of the RPG at that point in its evolutionary history. We think it likely that most or all of the selection scenarios outlined above have occurred in at least one RPG at some point during the multiple and ongoing gains of Mcm1 sites over millions of years of fungal evolution. While the precise circumstances of each Mcm1 site gain are unknown, intrinsic cooperativity biases the RPGs as a whole toward gaining these sites.
In conclusion, we have proposed a mechanism, supported by multiple lines of experimental evidence, to account for convergent regulatory evolution of a large set of genes. Although, on the surface, the parallel gains of Mcm1 sites at the ribosomal protein genes would seem to require a special evolutionary explanation, our model does not require an extraordinary mechanism beyond individual point mutations in the cis-regulatory region of each gene. However, the ancestral ability of the two key regulators to activate transcription simplifies the path to gaining these sites by producing a phenotypic output from even non-optimal sites. Mcm1, because of its intrinsic ability to cooperate with Rap1, can significantly activate transcription at the ribosomal protein genes more easily than it would elsewhere in the genome; likewise, Mcm1 (or another regulator that interacted with TFIID) would be preferred at the RPGs over regulators that did not share this common direct protein interaction. Thus, the intrinsic cooperativity of Rap1 and Mcm1 ‘channels’ random mutations into functional Mcm1 cis-regulatory sequences, accounting for the observed parallel evolution. We speculate that gene activation through intrinsic cooperativity may be a general mechanism to explain the rapid and ubiquitous rewiring of transcription networks.
Materials and methods
Computational genomics of cis-regulatory sequences
Genomes were compiled from the Yeast Gene Order Browser (YGOB) (Gordon et al., 2011), the Department of Energy Joint Genome Institute MycoCosm portal (Grigoriev et al., 2014), and individual genome releases (Supplementary file 3). The Kl. wickerhamii, Kl. marxianus, and H. vinae genomes were annotated using the Yeast Gene Annotation Pipeline associated with YGOB. The annotations of genes and proteins in other genomes were obtained from the source of the genome sequence. RPGs were defined as any gene starting with ‘Rps’ or ‘Rpl’ in S. cerevisiae and were identified using the ortholog annotation in YGOB (for species contained in this repository). These genes were identified in genomes from other sources through PSI-BLAST in BLAST+ (Camacho et al., 2009), based on their high conservation.
The species tree was created as described (Lohse et al., 2010). In total, 79 single-copy orthologs were chosen to create the species tree based on the ortholog mapping repositories YGOB and Fungal Orthogroups (Byrne and Wolfe, 2005; Wapinski et al., 2007). They were aligned individually using MUSCLE with default parameters (Edgar, 2004). For species that were not included in these repositories, the corresponding ortholog was identified using PSI-BLAST (Altschul et al., 1997). Using the orthologs from every species, the genes were then re-aligned using MUSCLE with default parameters and concatenated into a single alignment. To calculate the tree topology, FastTree 2.1.8 was used with the Blosum45 matrix, the JTT model with 20 rate categories, and two rounds of +NNI +SPR (default parameters).
Intergenic regions were extracted upstream of each gene using the Python script intergenic.py. For scoring of potential transcription factor binding sites, motifs were obtained from the ScerTF database (Spivak and Stormo, 2012) as position weight matrices. Each intergenic region (in the ribosomal proteins or genome-wide, depending on the question) was scored using the script TFBS_score.py by adding up the log likelihood values of each base at each position in the motif, forward and backward, and then repeating for each position in the intergenic region. It is important to note that the motifs from ScerTF are corrected for the GC-content of the S. cerevisiae genome (or the genome(s) from which the matrix is derived), but not individually for the genomes of each species in other parts of the tree. The motif was left as-is instead of correcting for the GC-content in each genome, because the purpose of this scoring was to identify DNA sequences that are most similar to the Mcm1 binding site, not those that are most statistically enriched given the GC-content. Calculating enrichment of the binding site in ribosomal protein genes relative to the rest of the genome was done to take into account forces (including, but not limited to, GC-content) that affect the prevalence of the motif genome-wide. After determining that the conclusions were unchanged when using 500, 1000, and 1500 bp for the maximum length of an intergenic region, the length of 1000 bp was chosen. A cutoff was chosen for presence of the cis-regulatory site (Figure 1—figure supplement 1).
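The scoring step described above can be illustrated with a minimal sketch. This is not the TFBS_score.py script itself, whose contents are not reproduced here; the function names, the tuple layout of a hit, and the three-column toy matrix are assumptions made only for illustration, and sequences are assumed to contain only uppercase A, C, G and T.

```python
# Hypothetical sketch of position weight matrix (PWM) scanning on both strands.
# A PWM is represented as a list of columns, each a dict of log-likelihood values.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def score_window(pwm, window):
    """Add up the log-likelihood value of each base at each motif position."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def scan_region(pwm, region):
    """Score every window of the region, forward ('+') and backward ('-').

    Returns a list of (start, strand, score) tuples, with 0-based starts.
    """
    width = len(pwm)
    hits = []
    for start in range(len(region) - width + 1):
        window = region[start:start + width]
        hits.append((start, "+", score_window(pwm, window)))
        hits.append((start, "-", score_window(pwm, reverse_complement(window))))
    return hits


def best_hit(pwm, region):
    """Return the highest-scoring (start, strand, score) tuple in the region."""
    return max(scan_region(pwm, region), key=lambda hit: hit[2])


# Toy matrix (an assumption, not a ScerTF motif) whose best match is "ACG".
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
```

For example, `best_hit(TOY_PWM, "TTACGAA")` reports the forward-strand match starting at position 2, while `best_hit(TOY_PWM, "CGT")` reports a reverse-strand match, since "CGT" is the reverse complement of "ACG".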
Ribosomal protein gene intergenic regions were also screened for motifs that were not present or were more information-rich than the corresponding motif in the ScerTF database. This was done by querying the intergenic regions in each species using MEME with the zero-or-one occurrence per sequence option. From these results the following motifs were chosen: Cbf1 from Spathaspora passalidarum, Rrn7 from Arxula adeninivorans, Tbf1 from Ascoidea rubescens and Schizosaccharomyces japonicus, Hmo1 from Lachancea thermotolerans and a widespread but previously unidentified motif from Botrytis cinerea. The Dot6, Fhl1, Rap1, Mcm1, Rim101, Sfp1, and Stb3 motifs were obtained from ScerTF.
To estimate the number of gains and losses, species were categorized based on whether their RPGs contained Mcm1 sites (-log10(P) > 3) or not. We used phytools to simulate 10,000 stochastic character maps under the equal rates model (ER) and a different-rates model (ARD) assuming Mcm1 sites were a discrete character (Bollback, 2006; Revell, 2012). These trees were used to determine the number of gains and losses of Mcm1 sites under each model and find which ancestral nodes were likely to have Mcm1 sites at the RPGs and which were uncertain.
Previous studies have indicated that many features of the RPGs are defined by their relative location to the Rap1 binding site (Reja et al., 2015). To identify the relative locations, the best hit for the Rap1 site in each species was identified, then a cutoff was set (approximately half of the maximum potential score for a given position weight matrix) for the motif of the second transcription factor. The location of a binding site was defined as the midpoint of the motif. This analysis was carried out using the scripts TFBS_score.py and rel_locs_RPs.py.
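As a rough illustration of the relative-location calculation, the sketch below takes the best Rap1 hit as an anchor, keeps hits for a second motif that score at least half of that motif's maximum possible score, and reports the offset between motif midpoints. The function names, the (start, strand, score) hit format, the midpoint convention and the example numbers are all assumptions; the authors' scripts may differ in these details.

```python
# Hypothetical sketch: offsets of a second motif relative to an anchor site.
# Hits are (start, strand, score) tuples with 0-based start coordinates.


def max_pwm_score(pwm):
    """Maximum potential score: the best value in every column, summed."""
    return sum(max(column.values()) for column in pwm)


def midpoint(start, width):
    """Midpoint of a motif occupying positions start .. start + width - 1."""
    return start + (width - 1) / 2.0


def relative_locations(anchor_hit, anchor_width, other_hits, other_pwm):
    """Offsets (other midpoint minus anchor midpoint) for hits passing the cutoff.

    The cutoff is half of the second motif's maximum potential score,
    following the approximate rule given in the text.
    """
    cutoff = 0.5 * max_pwm_score(other_pwm)
    anchor_mid = midpoint(anchor_hit[0], anchor_width)
    width = len(other_pwm)
    return [
        midpoint(start, width) - anchor_mid
        for start, _strand, score in other_hits
        if score >= cutoff
    ]


# Toy inputs (assumptions): a 3-column matrix with maximum score 3.0,
# and an anchor motif that is 5 bp wide starting at position 10.
EXAMPLE_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
EXAMPLE_ANCHOR = (10, "+", 8.0)
EXAMPLE_HITS = [(70, "+", 3.0), (20, "-", 1.0), (0, "+", 2.0)]
EXAMPLE_OFFSETS = relative_locations(EXAMPLE_ANCHOR, 5, EXAMPLE_HITS, EXAMPLE_PWM)
```

Positive offsets here mean that the second motif lies at a higher coordinate than the anchor; whether that corresponds to upstream or downstream depends on how the intergenic region was oriented before scoring.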
Genome-wide scoring of relative motif locations using rel_loc_RPs.py was used to identify additional genes beyond the RPGs that show a similar pattern of Rap1 and Mcm1 sites near to each other. Rap1 sites that faced toward the gene and had a score greater than 6.0 were identified, as were Mcm1 sites with a score above 6.0. Then, genes that had both a forward-facing Rap1 site and an Mcm1 site between 52 and 78 bp downstream (the spacing of most sites at the RPGs) were identified. The order of Rap1 and Mcm1 cis-regulatory site appearance in evolution was inferred by the distribution of the sites in closely related species with available genome sequences.
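The genome-wide filter described above can be sketched as follows. The score threshold of 6.0 and the 52 to 78 bp window come from the text; the `Site` record, the inclusive treatment of the window boundaries, and the convention that positions increase toward the gene are assumptions made for this example.

```python
# Hypothetical sketch: find Rap1-Mcm1 site pairs with RPG-like spacing.
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    pos: float               # motif midpoint; larger values lie closer to the gene (assumption)
    score: float             # position weight matrix score
    faces_gene: bool = True  # orientation; only consulted for Rap1 sites


def paired_sites(
    rap1_sites: List[Site],
    mcm1_sites: List[Site],
    min_score: float = 6.0,
    min_gap: float = 52,
    max_gap: float = 78,
) -> List[Tuple[Site, Site]]:
    """Return (Rap1, Mcm1) pairs where a gene-facing Rap1 site scoring above
    min_score has an Mcm1 site scoring above min_score located between
    min_gap and max_gap bp downstream (boundaries treated as inclusive)."""
    pairs = []
    for rap1 in rap1_sites:
        if not rap1.faces_gene or rap1.score <= min_score:
            continue
        for mcm1 in mcm1_sites:
            gap = mcm1.pos - rap1.pos
            if mcm1.score > min_score and min_gap <= gap <= max_gap:
                pairs.append((rap1, mcm1))
    return pairs


# Toy promoter (all coordinates and scores are made up for illustration).
EXAMPLE_RAP1 = [Site(100, 7.5, True), Site(300, 9.0, False), Site(500, 6.0, True)]
EXAMPLE_MCM1 = [Site(160, 6.5), Site(140, 8.0), Site(560, 7.0), Site(178, 6.1), Site(179, 9.0)]
EXAMPLE_PAIRS = paired_sites(EXAMPLE_RAP1, EXAMPLE_MCM1)
```

A gene would then be flagged if `paired_sites` returns at least one pair for its intergenic region.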
Strain and reporter construction
The GFP and HIS3 reporters used in K. lactis and S. cerevisiae have been previously described (Garbett et al., 2007; Mencía et al., 2002; Sorrells et al., 2015). These reporters allow full intergenic regions to be cloned upstream of GFP or HIS3, with the break between the original gene sequence and the reporter gene occurring at the start codon. A second version of the GFP reporter uses the CYC1 promoter from S. cerevisiae (Guarente and Ptashne, 1981) with its upstream activation sequence replaced by two restriction enzyme sites. Different versions of these vectors were made to integrate into the K. lactis genome and into the S. cerevisiae genome. Plasmids and strains are reported in Supplementary files 1 and 2.
To make the full-length RPG GFP reporters, the wild-type intergenic regions were obtained by PCR with ExTaq (Takara) from genomic DNA, flanked by directional restriction enzyme sites for SacI and AgeI. These were cloned using a 2:1 ratio into pTS16 digested with the same restriction enzyme sites, and ligated using Fastlink ligase (Epicentre) to make pTS170 and pTS174.
To scramble Mcm1 and Rap1 binding sites, these sites were put into a text scrambler, then the resulting sequences were queried in ScerTF (Spivak and Stormo, 2012) to see if they contained matches to any other known transcription factor binding site motifs. If not, they were used for further experiments.
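The scrambling step can be approximated in code. The sketch below is not the text scrambler the authors used; it simply shuffles the bases of a site so that base composition is preserved and rejects any shuffle identical to the input. The function name and the example sequence are arbitrary, and the subsequent check against ScerTF is a manual database query that is not reproduced here.

```python
# Hypothetical sketch: scramble a binding site while preserving base composition.
import random


def scramble_site(site, rng=None, max_tries=1000):
    """Return a random permutation of `site` that differs from the original.

    Raises ValueError if no different permutation is found, which happens
    for sequences made of a single repeated base.
    """
    rng = rng or random.Random()
    bases = list(site)
    for _ in range(max_tries):
        rng.shuffle(bases)
        candidate = "".join(bases)
        if candidate != site:
            return candidate
    raise ValueError("could not produce a sequence different from the input")


# Arbitrary example sequence (not one of the sites used in the study).
EXAMPLE_SITE = "ACGTTGCAAC"
EXAMPLE_SCRAMBLED = scramble_site(EXAMPLE_SITE, random.Random(0))
```

Passing a seeded `random.Random` makes the scramble reproducible, which is convenient when the resulting sequence has to be recorded for oligo synthesis.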
To make pTS171, pTS175, and pTS243-246, DNA sequences were synthesized containing scrambled Mcm1 sites, scrambled Rap1 sites, or both. The inserts for pTS171 and pTS243-244 were cloned into pTS170 using the restriction enzymes SacI and Bsu36I. The inserts for pTS175 and pTS245-246 were cloned into pTS174 using SacI and EcoRV.
The vectors pTS176-179 contain the K. lactis RPS23 Rap1-Mcm1 operator upstream of the CYC1 promoter. These vectors were made by annealing oligos and ligating them in a 50:1 ratio into pTS26 digested with NotI and XhoI. The equivalent vectors for S. cerevisiae are pTS181-184 and were made by cloning into pTS180 with NotI and XhoI. The constructs testing the spacing between Rap1 and Mcm1 sites (pTS189-203 and pTS209-224) were cloned using the same approach. For pTS189-203 the intervening sequence between the sites was partially duplicated for some constructs to increase the spacing. For pTS209-224 the endogenous spacing was 80 bp so the entire series was made with deletions starting immediately downstream from the Rap1 site.
The reporters testing how weak Mcm1 sites cooperate with Rap1 were cloned by first adding a BamHI site along with a palindromic Mcm1 site (Acton et al., 1997) into the reporter containing the K. lactis RPS23 Rap1-Mcm1 operator to make pTS247 and pTS248 (with a scrambled Rap1 site). Then variants of the palindromic site containing point mutations were cloned into these two vectors using BamHI and XhoI. For each variant, two point mutations were made to preserve the palindromic nature of the Mcm1 binding site (Acton et al., 1997).
These reporters were digested with KasI and HindIII and integrated into the K. lactis genome by transformation as previously described (Kooistra et al., 2004) and into the S. cerevisiae genome using a standard lithium-acetate transformation. Yeast were grown on non-selective media for 24 hr then replica plated onto plates containing at least 100 μg/mL Hygromycin B. For the spacing and weak Mcm1 reporter series, the reporters are enumerated in the plasmid list but not the strain list. This is because they were transformed into K. lactis, tested, then discarded due to their large numbers. For each reporter, four independent isolates were measured, but isolates where the full reporter had not integrated were discarded, resulting in three or four replicates per construct.
The HIS3 reporters were generated by performing PCR on the equivalent GFP reporters to generate wild-type, Rap1AS, and scrambled versions of the RPS23 fragment containing Rap1 and Mcm1 binding sites with NcoI and SacII restriction sites on the ends. These fragments were cloned in a 3:1 ratio into similarly digested UASRap1WT-HIS3 reporter plasmid (Garbett et al., 2007; Mencía et al., 2002). These reporters were digested with SpeI and SalI and integrated into the S. cerevisiae genome via lithium-acetate transformation. Transformants were selected on media lacking tryptophan, and integration at the correct locus was confirmed via PCR.
GFP reporter assays
K. lactis and S. cerevisiae reporters were grown overnight in 1 mL cultures in 96 well plates in synthetic complete media. The next day, cells were diluted into synthetic complete media to OD600 = 0.025 – 0.05 and grown for 3 hr. Cells were measured by flow cytometry on a BD LSR II between 3 hr and 4 hr after dilution. A total of 10,000 cells per strain were recorded. Cells were gated to exclude debris, and the mean fluorescence for each strain was used for comparing among different strains. For each reporter, three to four independent isolates were checked, as we were interested in large differences and standard deviations within samples were small. In the case that one of the isolates anomalously showed expression equivalent to background, while the other isolates showed similar but detectable fluorescence, the anomalous isolate was excluded from analysis. Experiments were performed a minimum of two times on different days. Strains were not blinded for data collection or analysis.
HIS3 reporter assays
HIS3 reporter expression in S. cerevisiae was scored by growth assays performed on three independent biological replicates. In these assays, S. cerevisiae were grown overnight to saturation and serially diluted 1:4 in sterile water in 96-well plates. These dilution series were spotted using a pinning tool (Sigma) onto Synthetic Complete (SC) media (0.67% (w/v) yeast nitrogen base without amino acids, 2% (w/v) dextrose, 0.2% (w/v) amino acid dropout mixture) either with His (+His, non-selective media) or without His and with 3-aminotriazole (-His + 3 AT, selective media). Plate images were acquired using the ChemiDoc MP Imager (Bio-Rad) and processed using ImageLab software (Bio-Rad) after 2 days of growth at 30˚C.
Interspecies hybrids
To construct the interspecies hybrids for allelic expression measurements, multiple isolates of different species were mated together. Strains with complementary markers were grown on YEPD plates for 2 days, then mixed together in patches on 5% malt extract plates for 2 – 4 days. These patches were then observed under the microscope to check for zygotes and streaked out onto plates that select for mating products. Hybrids were tested by PCR for products that were species- and mating-type-specific. Kl. lactis × Kl. dobzhanskii matings were attempted, but were unsuccessful, perhaps because the Kl. dobzhanskii isolate used was an a/α strain. Kl. lactis × L. kluyveri zygotes were observed when the two species were mixed with Kl. lactis alpha-factor, but no mating products were obtained. One cross of Kl. lactis and Kl. aestuarii produced zygotes and mating products, but the Kl. aestuarii isolate turned out to be an isolate of Kl. wickerhamii instead (discovered upon genome sequencing). In all, three Kl. lactis × Kl. wickerhamii matings and one Kl. lactis × Kl. marxianus mating, each with three isolates, were obtained and carried forward for analysis. Kl. lactis × Kl. wickerhamii hybrids are yTS347 and yTS349 (two matings between yLB13a and yLB122), and yTS353 (a mating between yLB72 and yLB66c). The Kl. lactis × Kl. marxianus mating was yTS352 (a mating between yLB72 and CB63). Sample sizes were determined by the number of samples in our sequencing kit and the number of isolates we recovered from matings.
Both mRNA and genomic DNA were sequenced from the hybrids. Genomic DNA of one isolate of each of the Kl. lactis × Kl. wickerhamii hybrids (yTS347, yTS349 and yTS353) was sequenced, along with all three isolates of the Kl. lactis × Kl. marxianus hybrid, and one isolate of each of the parental strains. Cells were grown in 5 mL cultures overnight in YEPD. Genomic DNA was prepared using a standard ‘smash and grab’ protocol, where cells are lysed with phenol/chloroform/isoamyl alcohol and glass beads. DNA was precipitated twice and treated with RNase A, then sheared on a Diagenode Bioruptor for 2 × 10 min (30 s on, 1 min off) on medium intensity. Genomic DNA was prepared for sequencing using the NEBNext Ultra DNA Prep Kit for Illumina E7370 (New England BioLabs).
All three isolates of each of the four hybrids were prepared for mRNA sequencing. Cells were grown overnight, then diluted back into YEPD to OD600 = 0.2 and grown for 4 – 8 hr until they reached OD600 = 0.7 – 1.0. The growth rate of the Kl. lactis × Kl. wickerhamii hybrids was slower and more variable between isolates. At this point, cells were pelleted and frozen in liquid nitrogen. mRNA was extracted using the RiboPure kit AM1926 (Applied Biosystems). Polyadenylated RNAs were selected using two rounds of the Oligotex mRNA Kit 70022 (Qiagen). The samples were then concentrated using the RNA Clean and Concentrator-5 (Zymo Research).
Libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina E7420 (New England BioLabs). mRNA and library quality were assessed using a Bioanalyzer 2100 (Agilent). Sequencing was performed at the University of California, San Francisco Center for Advanced Technology on an Illumina HiSeq 4000.
Allelic expression analysis
Raw sequencing data was checked for quality control using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Next, each of the genomic DNA isolates was aligned separately to each of four genomes: Kl. lactis, Kl. marxianus, Kl. wickerhamii, and Kl. aestuarii. In each case, reads uniquely mapped to the expected genome(s) and few reads mapped to other genomes. The strain yTS349 was previously thought to be a Kl. lactis × Kl. aestuarii hybrid, but sequencing revealed it was in fact a Kl. lactis × Kl. wickerhamii hybrid, and was treated as such for the analyses.
The genomes for Kl. lactis, Kl. marxianus, and Kl. wickerhamii were annotated using the yeast genome annotation pipeline from YGOB to standardize the gene annotation and ortholog assignment. Kl. lactis is already included in YGOB. These files were converted to gff format using convert_YGAP_GFF.py and genes were extracted using pull_genes.py. Next, hybrid genomes were created in silico by concatenating fasta sequences of the genes of each species. mRNA and gDNA reads were aligned to the hybrid genomes on a computer cluster using the script aln_reads.py, which calls Bowtie 2 (Langmead and Salzberg, 2012). Default parameters were used, which allow mismatches in Bowtie 2. However, reads that mapped equally well to multiple locations in the genome were removed after alignment using discard_multimapping.py. Because ribosomal proteins are highly conserved, they contain stretches of more than 50 bp that are identical between the orthologs belonging to each species in the hybrid. Thus, this filtering step is necessary to assure reads map uniquely to the ortholog from one species or the other.
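The multi-mapping filter described above can be illustrated with a minimal Python sketch. This is not the authors' discard_multimapping.py, whose implementation is not shown here. It assumes uncompressed SAM text from Bowtie 2 and relies on the `AS:i` (best alignment score) and `XS:i` (second-best alignment score) optional tags that Bowtie 2 writes. The function names are hypothetical.

```python
def alignment_tags(sam_fields):
    """Return the optional SAM tags of one record as a {tag: value} dict."""
    tags = {}
    for field in sam_fields[11:]:  # optional tags start at column 12
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def is_uniquely_mapped(sam_line):
    """True if the read aligned and no other alignment scored as well as the best one."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # SAM flag bit 0x4 marks an unmapped read
        return False
    tags = alignment_tags(fields)
    best = tags.get("AS")
    second = tags.get("XS")  # absent when no second alignment was found
    if best is None:
        return False
    return second is None or second < best


def filter_unique(sam_lines):
    """Yield header lines and uniquely mapped reads; drop everything else."""
    for line in sam_lines:
        if line.startswith("@") or is_uniquely_mapped(line):
            yield line
```

With a concatenated two-species reference, a read from a stretch that is identical in both orthologs receives equal `AS` and `XS` scores and is dropped, so only reads that distinguish the two alleles are counted.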
To quantify differential allelic expression, the reads aligning to each gene were counted using ASE_server.py. mRNA counts of the three Kl. lactis × Kl. marxianus replicates and seven of the nine Kl. lactis × Kl. wickerhamii hybrids that were sequenced were highly similar. In the two other isolates, one of the genomes in the hybrid was present at a lower level than the other, suggesting it had been lost from some of the cells (although each of the two isolates lost a different parental genome). The seven reproducible isolates were then treated as replicates for the rest of the analysis. Genome sequencing also revealed that there were two copies of the Kl. marxianus genome in each of the Kl. lactis × Kl. marxianus hybrids, suggesting that our parental Kl. marxianus strain was a diploid. The mRNA read counts for each gene in each replicate were first normalized to the total reads in the experiment. Next, they were divided by the gDNA read counts from each gene, thus controlling for the effect of two Kl. marxianus genomes in the Kl. lactis × Kl. marxianus hybrids. (The gDNA read counts per gene were averaged across replicates for each hybrid, so all the replicates were divided by the same gDNA count.) Finally, the Kl. lactis ortholog read count was divided by either the Kl. marxianus or the Kl. wickerhamii ortholog to get a differential allelic expression value for each ortholog pair in each replicate.
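The three normalization steps can be written out as a short Python sketch. It is an illustration only, not the code in ASE_server.py. The function names, the dictionary layout, and the example counts are hypothetical, and the gDNA counts are assumed to be used as raw averages.

```python
from statistics import mean


def normalized_expression(mrna_count, total_mrna_reads, gdna_counts):
    """Library-size-normalized mRNA count divided by the mean gDNA count for the gene."""
    gdna_mean = mean(gdna_counts)  # averaged across gDNA replicates of the hybrid
    return (mrna_count / total_mrna_reads) / gdna_mean


def differential_allelic_expression(lactis, other, total_mrna_reads):
    """
    Ratio of Kl. lactis to the other species' ortholog in one replicate.

    lactis, other: dicts with 'mrna' (read count in this replicate) and
    'gdna' (list of gDNA read counts for the same gene).
    """
    kl = normalized_expression(lactis["mrna"], total_mrna_reads, lactis["gdna"])
    ot = normalized_expression(other["mrna"], total_mrna_reads, other["gdna"])
    return kl / ot


# Hypothetical counts: the other allele has twice the gDNA coverage (two genome
# copies) and twice the mRNA reads, so the per-copy expression is about equal.
lactis = {"mrna": 200, "gdna": [100, 100, 100]}
marxianus = {"mrna": 400, "gdna": [200, 200, 200]}
ratio = differential_allelic_expression(lactis, marxianus, total_mrna_reads=1_000_000)
```

Dividing by the gDNA count is what keeps the extra Kl. marxianus genome copy from appearing as a twofold expression difference.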
To calculate significance of differential allelic expression, a two-sided one-sample t-test was used on the log2(differential allelic expression), across replicates (three replicates in the case of the Kl. lactis × Kl. marxianus hybrid and seven in the case of the Kl. lactis × Kl. wickerhamii hybrid). The significance of each gene was calculated using the Benjamini-Hochberg procedure to control the false-discovery rate at 0.05. To test for concerted differential allelic expression in the ribosomal proteins, as well as across all gene ontology (GO) terms, the geometric mean of each group of genes was calculated, then tested by the hypergeometric test to see if they were enriched for genes at least 1.1-fold up or down in Kl. lactis. Altering this fold cutoff had little effect on the results. These tests were carried out using the ASE_local.py script.
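As a rough illustration of the per-gene test and the multiple-testing correction, the sketch below computes the one-sample t statistic on log2 ratios and applies the Benjamini-Hochberg step-up rule to a list of p-values. It is not the ASE_local.py script, and the function names are hypothetical. Converting the t statistic into a two-sided p-value is left out, since it needs the t distribution; a library routine such as `scipy.stats.ttest_1samp` would be one option.

```python
from math import log2, sqrt
from statistics import mean, stdev


def one_sample_t(ratios):
    """t statistic and degrees of freedom for H0: mean log2(ratio) == 0."""
    logs = [log2(r) for r in ratios]
    n = len(logs)
    t = mean(logs) / (stdev(logs) / sqrt(n))  # stdev is the sample standard deviation
    return t, n - 1


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return booleans, True where the hypothesis is rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a gene whose p-value misses its own rank threshold is still rejected if a higher-ranked p-value passes.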
Protein expression and purification
His6-Rap1 (S. cerevisiae) used in main text gel shift experiments was expressed using a previously generated pET28a-Rap1 expression vector in Rosetta II DE3 E. coli and purified as described (Johnson and Weil, 2017). In brief, after inducing expression in E. coli cells grown to an OD600 of 0.5 – 1 for 4 hr with 1 mM IPTG at 37°C, cell pellets from 500 mL of culture were resuspended in 20 mL of Rap1 Lysis/Wash buffer (25 mM HEPES-NaOH (pH 7.6), 10% v/v glycerol, 300 mM NaCl, 0.01% v/v Nonidet P-40, 1 mM Benzamidine, 0.2 mM PMSF) and lysed via treatment with 1 mg/mL lysozyme and sonication. Lysate cleared via centrifugation was incubated with 2.5 mL Ni-NTA agarose (Qiagen) equilibrated with Rap1 Lysis/Wash buffer for 3 hr at 4°C to allow for His6-Rap1 protein binding. Following 3 washes with Rap1 Lysis/Wash buffer, Ni-NTA agarose-bound proteins were transferred to a disposable column and eluted using Rap1 Lysis/Wash buffer containing 200 mM Imidazole.
To generate S. cerevisiae Mcm1 for the gel shift and Far Western protein-protein binding assays presented in the main text, S. cerevisiae MCM1 was cloned into a vector that would allow its expression and purification with an N-terminal MBP tag and PreScission protease cleavage site. Specifically, S. cerevisiae MCM1 was amplified from S. cerevisiae genomic DNA and cloned into a p425 GAL1 expression vector (Mumberg et al., 1994) containing an MBP-3C tag (Feigerle and Weil, 2016) using the SpeI and XhoI restriction enzymes. MBP-3C-Mcm1 vector was expressed in yeast grown to an OD600 of ~3 in 1% w/v raffinose via induction with 2% w/v galactose for 3 hr at 30˚C. Yeast cell pellets obtained from 1 L of culture were resuspended in 4 mL Mcm1 Lysis Buffer (20 mM HEPES-KOH (pH 7.6), 500 mM potassium acetate, 10% v/v glycerol, 0.5% v/v Nonidet P-40, 1 mM DTT + 1X protease inhibitors (0.1 mM PMSF, 1 mM Benzamidine HCl, 2.5 μg/mL aprotinin, 2.5 μg/mL leupeptin, 1 μg/mL pepstatin A)). Cells were lysed via glass bead lysis. Soluble cell extract was obtained via centrifugation and mixed with 2 mL DE-52 resin pre-equilibrated with Mcm1 Lysis Buffer for 5 min at 4˚C. Flowthrough from the DE-52 purification was collected and diluted with 20 mM HEPES-KOH, 10% v/v glycerol, and protease inhibitors to reduce the concentration of potassium acetate and Nonidet P-40 to 200 mM and 0.2% v/v, respectively. Binding to 600 μl amylose resin was performed in batch for 2 hr at 4˚C. Amylose resin-bound proteins were transferred to a disposable column, washed with 10 column volumes of Mcm1 Wash Buffer (20 mM HEPES-KOH, 200 mM potassium acetate, 10% v/v glycerol, and protease inhibitors) and eluted with 10 column volumes of Mcm1 Wash Buffer containing 10 mM maltose. For gel shift reactions, the N-terminal MBP tag was removed by incubating multiple 50 μl reactions each containing 12 pmol MBP-Mcm1 and 48 pmol lab-generated 3C protease for 30 min at 4˚C.
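The dilution of the DE-52 flowthrough follows from C1·V1 = C2·V2, because the diluent contains neither potassium acetate nor Nonidet P-40. The Python sketch below works through that arithmetic. The 4 mL flowthrough volume is a hypothetical placeholder, since the actual volume is not stated, and the function name is likewise an assumption.

```python
def diluent_volume(sample_volume, start_conc, target_conc):
    """Volume of solute-free diluent that brings start_conc down to target_conc."""
    if not 0 < target_conc <= start_conc:
        raise ValueError("target concentration must be positive and not exceed the start")
    # C1 * V1 = C2 * (V1 + V_diluent), solved for V_diluent
    return sample_volume * (start_conc / target_conc - 1)


flowthrough_ml = 4.0  # hypothetical volume, for illustration only
for_acetate = diluent_volume(flowthrough_ml, 500, 200)  # mM potassium acetate
for_np40 = diluent_volume(flowthrough_ml, 0.5, 0.2)     # % v/v Nonidet P-40
```

Both solutes require the same 2.5-fold dilution (500/200 and 0.5/0.2), so a single addition of 1.5 sample volumes of diluent reaches both targets at once.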
Taf1-TAP TFIID used for Far Western protein-protein binding analyses was purified via a modified tandem affinity protocol as previously described (Feigerle and Weil, 2016). GST, His6-Taf3, His6-Taf4 and GST-Taf4 were expressed in E. coli and purified via chromatographic methods that varied by protein (Layer et al., 2010).
To generate material for the gel shift experiments in the supplement, the full-length Rap1-His6 protein from Kl. lactis was purified from E. coli. Rap1 was amplified from genomic DNA and cloned into the pLIC-H3 expression vector using XmaI and XhoI to make pTS207. This protein was purified using a His6 tag as previously described (Lohse et al., 2013). The full-length Mcm1-HA protein from Kl. lactis and S. cerevisiae was purified from S. cerevisiae. The genes were amplified from genomic DNA and cloned into p426 Gal1P-MCS (ATCC 87331) using BamHI and HindIII to make pTS226 (S. cerevisiae Mcm1) and pTS227 (Kl. lactis Mcm1). These plasmids were transformed into S. cerevisiae W303 for expression. Cells were grown 12 hr in SC –Leu at 30 ˚C, then in YPGL (1% yeast extract, 2% peptone, 3% glycerol, 2% lactate) for 10 – 12 hr. Then cells were diluted into 1L YPGL to OD600 = 0.3 and grown for 4 – 5 hr until OD600 = 0.6. Cells were induced with 2% galactose (added from a 40% galactose stock) for 1 hr. Then cells were pelleted, resuspended in an equal volume of lysis buffer, and pipetted into liquid nitrogen to make pellets. Cells were lysed in a Cryomill (Retsch) 6 times for 3 min at 30 Hz, refreezing in liquid nitrogen between cycles.
Cell powder was thawed on ice and diluted to 4 mL/g frozen pellet with lysis buffer (100 mM Tris pH 8.0, 1 mM EDTA, 10 mM fresh β-mercaptoethanol, 10% glycerol, 200 mM NaCl). Lysate was resuspended by pipetting, then cleared by centrifugation for 2 hr at 200,000 x g. Protein was bound to 250 μl of HA-7-agarose (Sigma) slurry per liter yeast culture for 1 hr at 4 ˚C. Lysate was applied to a Polyprep mini disposable gravity column (Biorad) and washed 4 times with 10 mL lysis buffer. The protein was eluted 4 times with one bed volume of 1 mg/mL HA peptide in lysis buffer after incubating 30 min on a tilt board at room temperature. The protein was stored in aliquots at −80 ˚C.
Extract for gel shifts was prepared as previously described (Baker et al., 2011). 25 mL cultures of cells were grown to OD = 0.8 – 1.0, and frozen at −80 ˚C. Cells were lysed in 300 μl extract gel shift buffer (100 mM Tris pH 8.0, 200 mM NaCl, 1 mM EDTA, 10 mM MgCl2, 10 mM β-mercaptoethanol, 20% glycerol, EDTA-free protease inhibitor cocktail (Roche)) with 200 μl glass beads on a vortexer for 30 min. Lysate was cleared by centrifugation at 18,000 x g for 20 min and diluted for experiments.
Gel-mobility shift assays
For the main text experiments, purified S. cerevisiae His6-Rap1 and Mcm1 were incubated either individually or in combination in increasing amounts as indicated in the figure legend. All binding reactions were performed using 10 fmol (7000 cpm) of a 79 bp 32P-labeled fragment of the Kl. lactis RPS23 promoter containing the Rap1 and Mcm1 binding sites generated via PCR, EcoRI restriction enzyme digestion, and native PAGE purification. Binding reactions were performed in binding buffer (20 mM HEPES-KOH (pH 7.6), 10% v/v glycerol, 100 mM KCl, 0.1 mM EDTA, 1 mM DTT, 25 μg/μl BSA, 25 μg/μl Poly(dG-dC) (double-stranded, alternating copolymer)) in a final volume of 20 μl. For competition reactions, binding was performed in the presence of 100-fold molar excess of cold Rap1 WT (RWT) or Rap1 scrambled (Rsc) sequences and/or the Mcm1 WT (MWT) or Mcm1 scrambled (Msc) sequences. Reactions were allowed to proceed for 20 min at room temperature before loading onto 0.5X TBE-buffered (44.5 mM Tris, 44.5 mM Boric acid, 1 mM EDTA (pH 8.0)) 5% polyacrylamide gels and electrophoresed for 1 hr at 200 V at room temperature. Gels were vacuum dried and 32P-DNA signals detected via K-screen imaging using a BioRad Pharos FX imager.
Gel shift experiments in the supplemental material were performed as previously described (Lohse et al., 2010). Binding reactions were carried out in 20 mM Tris pH 8.0, 50 mM potassium acetate, 5% glycerol, 5 mM MgCl2, 1 mM DTT, 0.5 μg/μl BSA, 25 μg/μl poly(dI-dC) (Sigma).
Far western assay
Purified proteins tested for direct interaction with Mcm1 in Far western protein-protein binding assays were separated on parallel 4 – 12% NuPAGE Bis-Tris polyacrylamide gels (Life Technologies). 0.5 pmol Taf1-TAP TFIID, 1 pmol His6-Taf3, and 1 pmol His6-Taf4 were used in the assay to test for direct Mcm1 interaction with TFIID subunits. In the assay used to identify the Taf4 Mcm1 binding domain, ~0.4 pmol of each Taf4 form, 0.8 pmol His6-Taf3, and 0.8 pmol GST were used. For each experiment, one gel was stained using Sypro Ruby protein stain (Invitrogen) to monitor protein integrity and amount. The other gel(s) were transferred to PVDF membranes pre-equilibrated in transfer buffer. Following transfer, PVDF membranes were incubated with renaturation buffer (20 mM HEPES-KOH pH 7.6, 75 mM KCl, 25 mM MgCl2, 0.25 mM EDTA, 0.05% v/v Triton X-100 and 1 mM DTT freshly added) for 90 min at 4˚C on a tiltboard. The PVDF membrane was then blocked using 5% non-fat milk in renaturation buffer for 30 min at room temperature on a tiltboard. The overlays were performed overnight with 10 nM MBP or 10 nM MBP-Mcm1 with 1% BSA (Roche) as a nonspecific competitor in renaturation buffer. Bound MBP or MBP-Mcm1 was detected using a standard immunoblotting protocol (primary antibody against MBP (NEB Catalog#E8032), used at a dilution of 1:5,000; secondary antibody horse anti-mouse IgG, HRP-linked (Cell Signalling Catalog#7076), used at a dilution of 1:5,000). Detection of bound proteins was achieved via incubation with ECL (GE) and X-ray film.
Data and code availability
Interspecies hybrid expression data is available at the Gene Expression Omnibus (GEO) repository under accession number GSE108389. Flow cytometry data is available at Flow Repository under accession numbers FR-FCM-ZYWS, FR-FCM-ZYWT, FR-FCM-ZYWU, FR-FCM-ZYWV, FR-FCM-ZYJZ, FR-FCM-ZYJY, and FR-FCM-ZYJ2. Code used in computational analyses is available at doi.org/10.5281/zenodo.1341284.
Data availability
- Interspecies hybrid expression data. Publicly available at the NCBI Gene Expression Omnibus (accession no. GSE108389).
- Flow cytometry data for Figure 1C, S2A. Publicly available at FlowRepository (accession no. FR-FCM-ZYWS).
- Flow cytometry data for Figure 3E Rps23. Publicly available at FlowRepository (accession no. FR-FCM-ZYWT).
- Flow cytometry data for Figure 3E Rps17. Publicly available at FlowRepository (accession no. FR-FCM-ZYWU).
- Flow cytometry data for Figure 3F. Publicly available at FlowRepository (accession no. FR-FCM-ZYWV).
- Flow cytometry data for Figure 3H. Publicly available at FlowRepository (accession no. FR-FCM-ZYJZ).
- Flow cytometry data for Figure 6A. Publicly available at FlowRepository (accession no. FR-FCM-ZYJY).
- Flow cytometry data for Figure S2B. Publicly available at FlowRepository (accession no. FR-FCM-ZYJ2).
References
- DNA-binding specificity of Mcm1: operator mutations that alter DNA-bending and transcriptional activities by a MADS box protein. Molecular and Cellular Biology 17:1881–1889. https://doi.org/10.1128/MCB.17.4.1881
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389–3402. https://doi.org/10.1093/nar/25.17.3389
- Mechanisms of increasing expression of a yeast gene in Escherichia coli. Journal of Molecular Biology 136:333–338. https://doi.org/10.1016/0022-2836(80)90377-0
- Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. The Quarterly Review of Biology 46:111–138. https://doi.org/10.1086/406830
- BLAST+: architecture and applications. BMC Bioinformatics 10:421. https://doi.org/10.1186/1471-2105-10-421
- p53 dynamically directs TFIID assembly on target gene promoters. Molecular and Cellular Biology 37:e00085. https://doi.org/10.1128/MCB.00085-17
- Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Molecular Biology and Evolution 19:1114–1121. https://doi.org/10.1093/oxfordjournals.molbev.a004169
- Yeast TFIID serves as a coactivator for Rap1p by direct protein-protein interaction. Molecular and Cellular Biology 27:297–311. https://doi.org/10.1128/MCB.01558-06
- MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Research 42:D699–D704. https://doi.org/10.1093/nar/gkt1183
- Architecture of the Saccharomyces cerevisiae RNA polymerase I Core Factor complex. Nature Structural & Molecular Biology 21:810–816. https://doi.org/10.1038/nsmb.2873
- Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357–359. https://doi.org/10.1038/nmeth.1923
- Direct transactivator-transcription factor IID (TFIID) contacts drive yeast ribosomal protein gene transcription. Journal of Biological Chemistry 285:15489–15499. https://doi.org/10.1074/jbc.M110.104810
- Evidence for nucleosome depletion at active regulatory regions genome-wide. Nature Genetics 36:900–905. https://doi.org/10.1038/ng1400
- Divergence of iron metabolism in wild Malaysian yeast. G3: Genes|Genomes|Genetics 3:2187–2194. https://doi.org/10.1534/g3.113.008011
- Convergence, adaptation, and constraint. Evolution 65:1827–1840. https://doi.org/10.1111/j.1558-5646.2011.01289.x
- The evolutionary rewiring of the ribosomal protein transcription pathway modifies the interaction of transcription factor heteromer Ifh1-Fhl1 (interacts with forkhead 1-forkhead-like 1) with the DNA-binding specificity element. Journal of Biological Chemistry 288:17508–17519. https://doi.org/10.1074/jbc.M112.436683
- Evolution of a membrane protein regulon in Saccharomyces. Molecular Biology and Evolution 29:1747–1756. https://doi.org/10.1093/molbev/mss017
- Use of the BioGRID database for analysis of yeast protein and genetic interactions. Cold Spring Harbor Protocols 2016:prot088880. https://doi.org/10.1101/pdb.prot088880
- Molecular mechanisms of ribosomal protein gene coregulation. Genes & Development 29:1942–1954. https://doi.org/10.1101/gad.268896.115
- phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution 3:217–223. https://doi.org/10.1111/j.2041-210X.2011.00169.x
- Specific interactions of the telomeric protein Rap1p with nucleosomal binding sites. Journal of Molecular Biology 306:903–913. https://doi.org/10.1006/jmbi.2001.4458
- Molecular characterization of Saccharomyces cerevisiae TFIID. Molecular and Cellular Biology 22:6000–6013. https://doi.org/10.1128/MCB.22.16.6000-6013.2002
- ScerTF: a comprehensive database of benchmarked position weight matrices for Saccharomyces species. Nucleic Acids Research 40:D162–D168. https://doi.org/10.1093/nar/gkr1180
- The genetic causes of convergent evolution. Nature Reviews Genetics 14:751–764. https://doi.org/10.1038/nrg3483
- Rapid evolution of cis-regulatory sequences via local point mutations. Molecular Biology and Evolution 18:1764–1770. https://doi.org/10.1093/oxfordjournals.molbev.a003964
- The economics of ribosome biosynthesis in yeast. Trends in Biochemical Sciences 24:437–440. https://doi.org/10.1016/S0968-0004(99)01460-7
- TATA-binding Protein-associated factors enhance the recruitment of RNA polymerase II by transcriptional activators. The Journal of Biological Chemistry 276:34235–34243. https://doi.org/10.1074/jbc.M102463200
- Pioneer transcription factors: establishing competence for gene expression. Genes & Development 25:2227–2241. https://doi.org/10.1101/gad.176826.111
- Identification and characterization of the activation domain of Ifh1, an activator of model TATA-less genes. Biochemical and Biophysical Research Communications 392:77–82. https://doi.org/10.1016/j.bbrc.2009.12.172
Decision letter
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Intrinsic cooperativity potentiates parallel cis-regulatory evolution" for consideration by eLife. Your article has been reviewed by Patricia Wittkopp as the Senior Editor, a Reviewing Editor, and three reviewers. The following individual involved in review of your submission has agreed to reveal his identity: Brian P.H. Metzger (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
All three reviewers were overwhelmingly positive and believe that this manuscript is an important and rigorous contribution that presents a novel mechanism by which transcriptional regulatory networks can evolve.
Essential revisions:
The consensus is that no additional experiments or analyses are required. Nonetheless, the authors should revise the manuscript in the following ways:
1) Provide a more specific model(s) about how new Mcm1 binding sites could have been selected for and/or maintained, even if this model(s) should be tested in more detail by future studies.
2) Clearly document the sources of all genome sequences analyzed and, for unpublished data, verify that the present analyses meet any terms of use the authors may have agreed to.
3) Provide sufficient detail about the phylogenetic analyses that the results can be replicated.
Reviewer #1:
Several cases have been observed where large regulatory networks have acquired binding sites for the same transcription factor (TF) across dozens of genes during evolution. In previous cases, these seemingly improbable events have been shown to occur through facilitation by transposable elements or protein-protein interactions among TFs. The present work provides an attractive new model where the interactions of two TFs with a third protein (TFIID) seem to have facilitated the acquisition of binding sites.
In general, the model was well articulated, and the experiments performed were insightful and rigorous. The diverse methods included hybridization, gene expression analyses, protein biochemistry, genetic reporter tests of cooperativity, and a particularly insightful test of the importance of the second TF in allowing weak binding sites from the first TF to have a stronger effect (Figure 6A).
Nonetheless several major issues should be addressed:
The phylogenetic and ancestral state reconstruction analyses are not sufficiently documented, do not appear to be up to modern standards, and do not discuss conflicts with published phylogenomic analyses that are likely more rigorous.
Subsection “Gains of functional Mcm1 cis-regulatory sites”, "data not shown" does not meet eLife reporting standards for any type of data or analyses.
The trend of gene expression for K. lactis versus the other species is in the opposite direction from what one might have expected under a simple model where binding sites were adaptively added to increase expression. The authors propose a compensatory model in subsection “Gains of functional Mcm1 cis-regulatory sites”, but they do not explore or discuss much how this would work, which somewhat undermines an otherwise neat and tidy story.
Were promoters analyzed separately that had TATA boxes versus those that were TATA-less? How would this impact conclusions?
Can the authors clarify that JGI is specifically allowing their unpublished genomes to be analyzed for this purpose? Downloading genomes from MycoCosm requires users to agree to terms of use. The list of investigators acknowledged suggests the authors have sought the necessary permissions, but it would be best to be certain, especially since JGI often considers phylogenomic analyses to be reserved for the publication of the genome.
No table of genome versions and citations is provided.
Reviewer #2:
The paper by Sorrells et al. addresses how binding sites for the same transcription factor can be gained across dozens of genes in independent lineages. The paper hypothesizes and tests a novel mechanism that the shared binding of two TFs with mediators of transcription can result in a bias of mutations towards functional effects on expression. Overall the paper is excellent, using multiple lines of evidence from genomics through functional tests to suggest that this mechanism may be important not only for the specific example studied but also more generally. I have only a few concerns that I believe are easily addressed.
1) A false dichotomy is set up in subsection “Gains of functional Mcm1 cis-regulatory sites”. It may be that Mcm1 doesn't change the overall expression, but instead changes regulation. This is addressed slightly in the next section, but the given impression is too clean given the lack of data that directly addresses this question.
2) Regardless of this concern, I like the experimental design. However, the species aren't that different in Mcm1 binding gains, i.e. both are in the full gain group. An alternative explanation for the results is that the organisms chosen simply aren't that different in their RPG regulation compared to what would happen upon the initial gain of Mcm1 binding sites. I believe that this needs to be acknowledged.
3) Compensation seems to imply an order, i.e. that the Mcm1 gain was after the loss of some other TF binding site. It seems more likely it was the other way around, that an Mcm1 gain permitted the loss of other binding sites. However, the extent to which the addition of Mcm1 binding site alone directly influenced expression isn't tested (i.e. adding a Mcm1 binding site to a promoter from a species without the Mcm1 gain). If the simple addition of Mcm1 alters expression, it becomes unclear how the authors think evolution proceeded given the strong purifying selection on RPG expression.
4) The figures don't show that Rap1AS7Ala doesn't act cooperatively as stated.
5) The experiments only address the mediator binding hypothesis from the Rap1 perspective, not Mcm1 (which is the thing that is new), nor TFIID. While these experiments aren't necessary, I think this omission needs to be acknowledged.
Reviewer #3:
This is a convincing and enjoyable manuscript that presents evidence that multiple independent co-options of the transcription factor Mcm1 to regulate ribosomal protein genes (RPGs) occurred in different yeast clades (about 7-9 independent occurrences), building on the authors' previous work in Tuch et al. (2008a). The proposed mechanism is a pre-existing ability of Mcm1 to cooperate with Rap1, through their separate interactions with TFIID. The authors' evolutionary model is convincing and provides an explanation for several otherwise puzzling aspects such as the recurrence of particular spacings between Mcm1 and Rap1 sites. As always with Johnson lab papers, this one is very well written, the figures are excellent, and the experiments are rigorous. The work is of broad significance, because it provides a general rationale for how a cis-regulatory motif can abruptly emerge in a large set of genes such as the RPGs.
The one question that the manuscript did not seem to answer adequately is what is the selective pressure to maintain Mcm1 sites in RPG genes after they arise by chance (model in Figure 7)? Since these sites do not contribute to increased transcription of RPGs, I cannot see what prevents them from decaying again. The text mentions this issue: the Discussion section says that new Mcm1 sites at RPGs "would fall under selection as other cis-regulatory sites deteriorated by mutation" (citing Figure 7, though Figure 7 doesn't illustrate this situation) but leaves me wondering what TFs bind to these other sites, or if there is evidence to support this idea. In principle, if the idea is correct, the deterioration of the other sites should be evident in Figure 1—figure supplement 1. For example, Sfp1 sites seem to disappear in the Kluyveromyces clade around the same time as Mcm1 sites appear, so is Sfp1 a candidate?
https://doi.org/10.7554/eLife.37563.040
Author response
Summary:
All three reviewers were overwhelmingly positive and believe that this manuscript is an important and rigorous contribution that presents a novel mechanism by which transcriptional regulatory networks can evolve.
Essential revisions:
The consensus is that no additional experiments or analyses are required. Nonetheless, the authors should revise the manuscript in the following ways:
1) Provide a more specific model(s) about how new Mcm1 binding sites could have been selected for and/or maintained, even if this model(s) should be tested in more detail by future studies.
A model for how Mcm1 binding sites could be selected for requires two things: that the Mcm1 site is functional and that there is an evolutionary force that tends to preserve the site. We have focused primarily on a mechanism (intrinsic cooperativity) that causes weak Mcm1 sites to be functional in RPGs as they arise by mutation.
We can think of two plausible mechanisms explaining the preservation of Mcm1 sites once they arise. One mechanism would be direct selection for increased expression of an individual RPG, or of the RPGs as a whole. This is plausible because the selection that acts on RPG expression as a whole changes significantly between species (as evidenced by our interspecies hybrid experiment as well as others). The selection acting on individual RPGs also probably changes on short timescales, potentially resulting in selection for a subset of RPGs to have increased expression levels. However, we have no direct evidence that selection for increased expression of RPGs led to the gain of Mcm1 sites.
We have direct evidence for a second model: a compensatory model in which the Mcm1 sites counteract losses of other regulatory elements. This is supported by the hybrid experiment results and cis-regulatory site analysis. The compensatory model is not incompatible with the first model (for example if selection for increased expression of RPGs alternated with relaxed selection or selection to reduce expression of RPGs). However, it could also be effected by neutral forces: if Mcm1 sites are allowed to be weak, they could present a smaller “mutational target” than the strong sites for another transcription regulator, resulting over time in the increase in Mcm1 sites at the expense of the other regulator due to mutational processes.
Compensatory changes could plausibly occur in multiple different orders on the scale of individual mutations, cis-regulatory sites, and genes. For example, mutations affecting the cis-regulatory sites could be fixed simultaneously (on the same haplotype) or alternate between strengthening the Mcm1 sites and weakening a second site. Alternatively, the Mcm1 site could become fixed some period of time before the loss of a second site (or vice-versa). We suspect that the precise order may differ for each gene as the Mcm1 sites were gained over millions of years.
We have done our best to indicate these scenarios in the text of the manuscript while making it clear what we have direct evidence for. We have modified Figure 7 to indicate the types of selection that could maintain Mcm1 sites, but because there are multiple plausible scenarios we have not directly illustrated them.
2) Clearly document the sources of all genome sequences analyzed and, for unpublished data, verify that the present analyses meet any terms of use the authors may have agreed to.
We have added Supplementary file 3, which includes the sources and versions of the genome assemblies and annotations used. The principal investigators of all of the unpublished genomes used in this study were contacted and shown an example analysis (or the final analysis) and consented to the use of the genomic data for the purpose of this publication. This consent was obtained prior to the preprint and initial submission of the manuscript. In all, 24 genomes of interest were excluded from the analysis because investigators did not consent to their use. Consent from a subset of the investigators was confirmed during revision and all gave the same response.
3) Provide sufficient detail about the phylogenetic analyses that the results can be replicated.
Additional detail was added to the methods to facilitate replication of the phylogenetic analyses. Specifically, we have added details about the tree construction, scoring of intergenic regions for cis-regulatory sites, and the newly added stochastic character mapping analysis to more rigorously estimate gains and losses. In addition to the scripts included in the initial submission, we have included Supplementary file 3 with the genome versions and Supplementary file 4 with the position weight matrices used for identifying the cis-regulatory sequences. We have also included a list of the ribosomal protein genes in each species and the presence of Mcm1 sites in Supplementary file 5.
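As an illustration of what scoring an intergenic region with a position weight matrix involves, the following is a minimal Python sketch. It is not the authors' script: the count matrix, the uniform background frequency, the pseudocount, and the function names are all hypothetical, and the matrix is not one of those in Supplementary file 4. For brevity it scans only the given strand.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration only; they are NOT the Mcm1 or
# Rap1 matrices from Supplementary file 4.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 8, "G": 1, "T": 1},
    {"A": 1, "C": 0, "G": 8, "T": 1},
    {"A": 0, "C": 1, "G": 1, "T": 8},
]


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert base counts to log2 odds against a uniform background (assumed)."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append(
            {
                base: math.log2((column[base] + pseudocount) / total / background)
                for base in "ACGT"
            }
        )
    return matrix


def best_site(sequence, matrix):
    """Return (score, start, window) for the highest-scoring window, or None.

    Windows containing characters other than A, C, G, T are skipped.
    Only the given strand is scanned.
    """
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start, window)
    return best


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    print(best_site("TTACGTTT", pwm))
```

In practice a score cutoff would decide whether a region counts as containing a site; the choice of cutoff is not specified here and would have to be taken from the methods.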
Reviewer #1:
[…]
The phylogenetic and ancestral state reconstruction analyses are not sufficiently documented, do not appear to be up to modern standards, and do not discuss conflicts with published phylogenomic analyses that are likely more rigorous.
We have provided a reference and additional details in the methods on how the phylogenetic tree was created. Because we focus on the mechanism of parallel evolution and transcription activation, a detailed discussion of the tree topology is beyond the scope of the present study. Several differences in topology when compared to published studies may be ascribed to the use of unpublished genomes, but discussion of this is reserved for the principal investigators of these genome sequencing projects.
In response to this comment we have also implemented a new estimation of gains and losses to better quantify the uncertainty in this analysis. The model for evolution of binding sites is one of the most challenging questions we faced. In the end, rather than choosing a model specific to cis-regulatory sites whose assumptions were violated by the data we had, we chose a simple model of character evolution. (To be specific, the only sequence-independent model of binding site evolution we are aware of is CRETO from Otto et al. (2009) and to give an accurate estimate of evolutionary rates, it requires that the half-life of binding sites is on the order of the length of the tree from root to tip, whereas in our data Mcm1 site turnover is happening many times from root to tip.) Because of the large evolutionary distance, we decided to categorize species into presence or absence of Mcm1 sites and treat this as a discrete character under two different models. We strongly encourage the development of better computational models for cis-regulatory site evolution by experts in that area.
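To make the idea of treating presence or absence of Mcm1 sites as a discrete character concrete, the sketch below computes transition probabilities along a branch for a generic two-state continuous-time Markov model. This is an assumption-laden illustration, not the authors' analysis: the two models used are not named in this response, so the equal-rates and unequal-rates cases mentioned in the comments are only a guess at what they might be, and the function name and rate values are hypothetical.

```python
import math


def transition_probabilities(gain_rate, loss_rate, branch_length):
    """Two-state Markov model: state 0 = sites absent, state 1 = sites present.

    Returns a 2x2 matrix P where P[i][j] is the probability of ending in
    state j after `branch_length` given a start in state i.
    Setting gain_rate == loss_rate gives an equal-rates model; allowing
    them to differ gives an unequal-rates model (both assumed here).
    """
    total = gain_rate + loss_rate
    if total == 0:
        # No change is possible when both rates are zero.
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-total * branch_length)
    p_gain = gain_rate / total * (1.0 - decay)  # 0 -> 1
    p_loss = loss_rate / total * (1.0 - decay)  # 1 -> 0
    return [[1.0 - p_gain, p_gain], [p_loss, 1.0 - p_loss]]


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show the shape of the output.
    for row in transition_probabilities(0.2, 0.1, 1.5):
        print(row)
```

Stochastic character mapping samples histories of gains and losses on the tree that are consistent with probabilities of this kind, which is how it yields a distribution of gain and loss counts rather than a single reconstruction.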
Subsection “Gains of functional Mcm1 cis-regulatory sites”, "data not shown" does not meet eLife reporting standards for any type of data or analyses.
We have further explained these results in the text of the manuscript and removed the assertion about phylogenetic branches because we did not rerun the analysis under multiple different tree topologies.
The trend of gene expression for K. lactis versus the other species is in the opposite direction from what one might have expected under a simple model where binding sites were adaptively added to increase expression. The authors propose a compensatory model in subsection “Gains of functional Mcm1 cis-regulatory sites”, but they do not explore or discuss much how this would work, which somewhat undermines an otherwise neat and tidy story.
Our intention was to present the results without trying to force them into a neat and tidy story. However, as suggested by all reviewers, we now describe in more detail how a compensatory model may work, although its specifics would need to be tested in future studies. Briefly, the timescale of divergence between the species in the interspecies hybrid may be much longer than the timescale of selection for increased expression of the genes, or the model could involve neutral mutational processes alone.
Were promoters analyzed separately that had TATA boxes versus those that were TATA-less? How would this impact conclusions?
We analyzed whether Rap1 or Mcm1 sites showed a relationship with the strength or location of TBP/Spt15 binding sites, but no clear pattern was observed. In general, yeast ribosomal protein genes lack strong TATA boxes which may explain their dependence on transcription regulators such as Rap1 and Mcm1 that can directly interact with TFIID.
Can the authors clarify that JGI is specifically allowing their unpublished genomes to be analyzed for this purpose? Downloading genomes from MycoCosm requires users to agree to terms of use. The list of investigators acknowledged suggests the authors have sought the necessary permissions, but it would be best to be certain, especially since JGI often considers phylogenomic analyses to be reserved for the publication of the genome.
No table of genome versions and citations is provided.
See our response to the essential revisions.
Reviewer #2:
[…]
1) A false dichotomy is set up in subsection “Gains of functional Mcm1 cis-regulatory sites”. It may be that Mcm1 doesn't change the overall expression, but instead changes regulation. This is addressed slightly in the next section, but the given impression is too clean given the lack of data that directly addresses this question.
We agree that this seems simplistic because we do not discuss broad vs. condition-specific expression of Mcm1 until a later section. We have modified the text to specify that these hypotheses apply only to the function of Mcm1 during the rapid growth conditions used for the experiments in Figure 1 and Figure 2.
2) Regardless of this concern, I like the experimental design. However, the species aren't that different in Mcm1 binding gains, i.e. both are in the full gain group. An alternative explanation for the results is that the organisms chosen simply aren't that different in their RPG regulation compared to what would happen upon the initial gain of Mcm1 binding sites. I believe that this needs to be acknowledged.
We were indeed disappointed that we were unable to recover hybrids between species with larger differences in the number of Mcm1 binding sites (attempts are described in the methods). Unfortunately, we aren’t aware of any intermating species with a difference greater than that between Kl. lactis and Kl. wickerhamii (23% more RPGs have Mcm1 sites in Kl. lactis). Reports of distantly related species mating are probably due to misidentification of the species before molecular techniques existed.
We have modified the text to indicate our experiments address the question of evolution over the timescale of the divergence between each pair of species. This is a scale in which a significant fraction of RPGs can gain Mcm1 binding sites, suggesting that the gains aren’t completed by this time. One approach to capture the “initial gain” of Mcm1 sites as you suggest would be to measure allele-specific expression in hybrids between species such as S. cerevisiae and S. mikatae which show a slight difference between the levels of Mcm1 sites at the RPGs. However, we feel this is less informative because there is no guarantee that S. mikatae will go on to gain Mcm1 sites at a large fraction of its RPGs as we know happened in Kluyveromyces.
3) Compensation seems to imply an order, i.e. that the Mcm1 gain was after the loss of some other TF binding site. It seems more likely it was the other way around, that an Mcm1 gain permitted the loss of other binding sites. However, the extent to which the addition of Mcm1 binding site alone directly influenced expression isn't tested (i.e. adding a Mcm1 binding site to a promoter from a species without the Mcm1 gain). If the simple addition of Mcm1 alters expression, it becomes unclear how the authors think evolution proceeded given the strong purifying selection on RPG expression.
The best way to do this experiment would be to perform ancestral reconstruction on all intergenic sequences to see the effect of each gain of an Mcm1 binding site along with other changes that happened in close evolutionary proximity, but this is unfortunately not possible because of the fast rate of evolution of intergenic regions. We suspect, however, that the order may differ for each gene depending on where the actual expression of the gene falls in the range of acceptable expression levels. See also our other responses regarding more details about the compensatory model.
4) The figures don't show that Rap1AS7Ala doesn't act cooperatively as stated.
Thank you for pointing this out—we have changed the conclusion to “these mutations strongly reduced 3-AT resistant growth, indicating that efficient expression requires an intact Rap1 activation domain.”
5) The experiments only address the mediator binding hypothesis from the Rap1 perspective, not Mcm1 (which is the thing that is new), nor TFIID. While these experiments aren't necessary, I think this omission needs to be acknowledged.
The experiments that identified the point mutations disrupting the Rap1-TFIID interaction comprised an entire publication (Johnson and Weil, 2017), so doing so with Mcm1 and TFIID is beyond the scope of our current study. Because the intrinsic cooperativity model requires interactions between TFIID and both Rap1 and Mcm1, we feel that disrupting either interaction is a sufficient test of the model.
Reviewer #3:
This is a convincing and enjoyable manuscript that presents evidence that multiple independent co-options of the transcription factor Mcm1 to regulate ribosomal protein genes (RPGs) occurred in different yeast clades (about 7-9 independent occurrences), building on the authors' previous work in Tuch et al. (2008a). The proposed mechanism is a pre-existing ability of Mcm1 to cooperate with Rap1, through their separate interactions with TFIID. The authors' evolutionary model is convincing and provides an explanation for several otherwise puzzling aspects such as the recurrence of particular spacings between Mcm1 and Rap1 sites. As always with Johnson lab papers, this one is very well written, the figures are excellent, and the experiments are rigorous. The work is of broad significance, because it provides a general rationale for how a cis-regulatory motif can abruptly emerge in a large set of genes such as the RPGs.
The one question that the manuscript did not seem to answer adequately is what is the selective pressure to maintain Mcm1 sites in RPG genes after they arise by chance (model in Figure 7)? Since these sites do not contribute to increased transcription of RPGs, I cannot see what prevents them from decaying again. The text mentions this issue: the Discussion section says that new Mcm1 sites at RPGs "would fall under selection as other cis-regulatory sites deteriorated by mutation" (citing Figure 7, though Figure 7 doesn't illustrate this situation) but leaves me wondering what TFs bind to these other sites, or if there is evidence to support this idea. In principle, if the idea is correct, the deterioration of the other sites should be evident in Figure 1—figure supplement 1. For example, Sfp1 sites seem to disappear in the Kluyveromyces clade around the same time as Mcm1 sites appear, so is Sfp1 a candidate?
Your hypothesis is exactly what we think is happening: in some clades where Mcm1 sites are gained you can find some other regulator site that seems to be at lower levels in that clade (e.g. Fhl1, Sfp1, and Rrn7 in Kluyveromyces, Cbf1 in Pachysolen tannophilus, Tbf1 in Yarrowia lipolytica, Hmo1 in Hanseniaspora uvarum). However, there are also species and clades where there is no clear loss associated with Mcm1 site gains. A study in Saccharomyces found that differences in the overall nucleotide composition of the promoter contribute greatly to the divergence of RPG expression (Zeevi, 2014), so this may explain these additional species. We have now described this idea more explicitly in several places in the manuscript.
https://doi.org/10.7554/eLife.37563.041
Article and author information
Author details
Funding
National Institutes of Health (GM115892)
- Amanda N Johnson
- Jordan T Feigerle
- P Anthony Weil
National Institutes of Health (GM037049)
- Trevor R Sorrells
- Conor J Howard
- Candace S Britton
- Kyle R Fowler
- Alexander D Johnson
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank K Verba, L Pack, S Liang, C Zhou, K Pollard, and E Chow for experimental and analysis guidance. We thank M Lohse, I Nocedal, S Singh-Babak, V Hanson-Smith and other members of the Johnson lab for technical guidance and comments on the manuscript. We thank S Åstrom for use of the Kl. dobzhanskii genome prior to publication. We thank I Grigoriev, J K Magnuson, P Inderbitzin, M Nowrousian, A Grum-Grzhimaylo, K O’Donnell, G Bonito and the Metatranscriptomics of Forest Soil Ecosystems project, D Greenshields, J Crouch, F Martin and the Mycorrhizal Genomics Initiative, J Spatafora, and R Gazis for providing access to unpublished genome data produced by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research was supported by the grants R01 GM115892 and R01 GM037049. ANJ was supported by T32NS00749 from the National Institutes of Health. TRS was supported by a Graduate Research Fellowship from the National Science Foundation.
Publication history
- Received: April 16, 2018
- Accepted: September 9, 2018
- Accepted Manuscript published: September 10, 2018 (version 1)
- Version of Record published: October 5, 2018 (version 2)
Copyright
© 2018, Sorrells et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.