Chromosomes and Gene Expression; Genetics and Genomics

Comparison of transcriptional initiation by RNA polymerase II across eukaryotic species

Natalia Petrenko, Kevin Struhl (corresponding author)
Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, United States
Research Article
Cite this article as: eLife 2021;10:e67964 doi: 10.7554/eLife.67964

Abstract

The preinitiation complex (PIC) for transcriptional initiation by RNA polymerase (Pol) II is composed of general transcription factors that are highly conserved. However, analysis of ChIP-seq datasets reveals kinetic and compositional differences in the transcriptional initiation process among eukaryotic species. In yeast, Mediator associates strongly with activator proteins bound to enhancers, but it associates only transiently with promoters, in a form that lacks the kinase module. In contrast, in human, mouse, and fly cells, Mediator with its kinase module stably associates with promoters, but not with activator-binding sites. This suggests that yeast and metazoans differ in the nature of the dynamic bridge of Mediator between activators and Pol II and in the composition of a stable, inactive PIC-like entity. As in yeast, occupancies of TATA-binding protein (TBP) and TBP-associated factors (Tafs) at mammalian promoters are not strictly correlated. This suggests that within PICs, TFIID is not a monolithic entity, and that multiple forms of TBP affect initiation at different classes of genes. TFIID in flies, but not in yeast or mammals, interacts strongly with regions downstream of the initiation site, consistent with the importance of downstream promoter elements in that species. Lastly, Taf7 and the mammalian-specific Med26 subunit of Mediator also interact near the Pol II pause region downstream of the PIC, but only in subsets of genes and often not together. Species-specific differences in PIC structure and function are likely to influence how activators and repressors regulate transcriptional activity.

Introduction

Transcription begins with the assembly of a preinitiation complex (PIC) at the promoter, a concept defined in vitro as a stable entity that contains RNA polymerase and initiates transcription upon addition of nucleoside triphosphates. For eukaryotic RNA polymerase (Pol) II, the PIC includes general transcription factors (GTFs), originally defined as being necessary and sufficient for ‘basal’ transcription from promoters in vitro (Conaway and Conaway, 1993; Buratowski, 1994; Orphanides et al., 1996; Roeder, 1996). GTFs are highly conserved among eukaryotic organisms, and they include the TATA-binding protein (TBP), TFIIA, TFIIB, TFIIE, TFIIF, TFIIH, and Pol II itself. However, PIC composition can vary in metazoan tissues; for example, TBP-related factors (TRF1, -2, -3) function primarily in reproductive organs (D’Alessio et al., 2009; Vo Ngoc et al., 2017).

Although the requirement for the GTFs for transcription in vitro can vary depending on reaction conditions, depletion experiments in yeast cells indicate that all GTFs are essential for Pol II transcription in vivo (Petrenko et al., 2019). Furthermore, the relative occupancies of GTFs are consistent across all yeast promoters and quantitatively linked to transcriptional activity (Kuras et al., 2000; Pokholok et al., 2002; Rhee and Pugh, 2012; Petrenko et al., 2019), indicating that a structurally similar PIC mediates a given level of transcription. Similar experiments in mammalian cells also suggest that a structurally similar PIC is responsible for most transcription (Koch et al., 2011; Pugh and Venters, 2016).

TBP-associated factors (TAFs), which together with TBP constitute the TFIID complex, are not required for transcription in vitro, but they are often components of the PIC in vivo. In yeast cells, TAF occupancies are strongly correlated with each other, but they do not strictly correlate with TBP occupancy (Kuras et al., 2000; Li et al., 2000; Rhee and Pugh, 2012). Thus, transcription in yeast cells can be mediated by PICs with TAF-containing (TFIID) or TAF-lacking forms of transcriptionally active TBP. The relative usage of these two PIC forms depends on the promoter and the quality of the TATA sequence (Struhl, 1986; Iyer and Struhl, 1995; Moqtaderi et al., 1996; Basehoar et al., 2004; Huisinga and Pugh, 2004; Petrenko et al., 2019) as well as the activator protein that stimulates PIC formation (Li et al., 2002; Mencia et al., 2002). The TAF-lacking form of TBP may be associated with the SAGA complex via the Spt3 subunit (Eisenmann et al., 1992; Bhaumik and Green, 2001; Larschan and Winston, 2001; Basehoar et al., 2004; Papai et al., 2020; Wang et al., 2020), but there is no direct evidence that SAGA associates with the promoter in vivo.

Some metazoan tissues possess tissue-specific TAFs and tissue-specific TFIID subunit composition (D’Alessio et al., 2009; Hart et al., 2009; Maston et al., 2012). There is no evidence for the existence of free TBP in metazoan cells, and it is unclear whether there are TAF-lacking, or perhaps SAGA-containing, forms of transcriptionally active TBP. However, fluorescence studies suggest that activity of Drosophila histone gene promoters might rely only on TBP and TFIIA in the absence of TFIIB and TAFs (Guglielmi et al., 2013).

Mediator is a large complex (21–30 proteins depending on the species) that interacts with Pol II, and it can stimulate PIC assembly, phosphorylation of the Pol II C-terminal domain (CTD) by TFIIH, and basal transcription in vitro (Thompson et al., 1993; Kim et al., 1994; Guidi et al., 2004; Takagi and Kornberg, 2006; Esnault et al., 2008). Mediator is organized structurally into head, middle, tail, and kinase modules (Guglielmi et al., 2004; Lariviere et al., 2012; Allen and Taatjes, 2015; Plaschka et al., 2015; Robinson et al., 2015). The evolutionarily divergent tail module interacts with activator proteins bound at enhancers, whereas the highly conserved head and middle modules interact with Pol II. In yeast, the complete Mediator complex is essential for transcription, but sub-modules can support transcription at reduced levels (Petrenko et al., 2017). The kinase module sterically blocks the interaction of Mediator with Pol II (Elmlund et al., 2006; Knuesel et al., 2009), but it has a very modest effect on transcription.

In wild-type yeast cells, Mediator is readily detected at enhancers but is not detected at promoters (Fan et al., 2006), even though it is a required component of the PIC (Jeronimo and Robert, 2014; Wong et al., 2014). In contrast, Mediator in mammalian cells associates with both promoters and enhancers (Kagey et al., 2010). Nevertheless, in yeast, Mediator associates strongly with promoters upon depletion or inhibition of TFIIH kinase (Kin28 subunit), and the level of Mediator association strongly correlates with GTF occupancy and Pol II transcription (Jeronimo and Robert, 2014; Wong et al., 2014). Although the complete Mediator complex is recruited by activator proteins bound to enhancers upstream of the promoter, the form of Mediator associated with the promoter lacks the kinase module. This observation is in accord with structural and biochemical studies showing that the kinase module inhibits Mediator contacts with Pol II. Thus, a single Mediator complex acts as a dynamic bridge between enhancers and promoters, and it undergoes a compositional change in which the kinase module dissociates to permit association with Pol II and the PIC (Jeronimo et al., 2016; Petrenko et al., 2016).

These observations also indicate that, in wild-type yeast cells, Mediator association with the promoter is transient, and hence the PIC is a very short-lived entity (estimated at ~1/8 s) (Wong et al., 2014). Upon PIC formation, TFIIH kinase rapidly phosphorylates the Pol II CTD at serine 5 with concomitant dissociation of Mediator and Pol II escape from the promoter (Jeronimo and Robert, 2014; Wong et al., 2014). As the PIC is transient, measurements of GTF occupancy in wild-type cells largely reflect a complex in which Pol II has escaped but which is suitable for re-association of Mediator and a new Pol II molecule (Wong et al., 2014). It has been suggested that this post-escape complex is sufficiently stable to permit multiple rounds of reinitiation (also termed bursting) without having to form a completely new PIC from scratch (Wong et al., 2014).

In metazoans, while some aspects of transcription have received intense scrutiny (e.g., Pol II pausing downstream of the promoter, divergent transcription, enhancer RNAs), issues related to basic initiation mechanisms in vivo are largely unaddressed. Here, we analyze publicly available (as of September 2021) genome-wide data for PIC components in metazoans, focusing on cell lines with ChIP-seq datasets for multiple factors to allow direct comparison. We show that yeast and metazoans differ considerably with respect to Mediator function, PIC stability, and transcriptional initiation. We also show that TFIID is not a monolithic entity at all promoters, extend previous studies indicating that TAFs, and particularly Taf7, behave differently in yeast and metazoans, and investigate the complicated relationship between Taf7 and Med26.

Results

Promoter definition

The original, and still the best, definition of a promoter is the genetic element necessary for expression of a structural gene (or operon) that is distinct from elements that regulate the expression level of the gene (Jacob et al., 1964; Scaife and Beckwith, 1966). In molecular terms, promoters are recognized by basic transcription machineries that initiate RNA synthesis from nearby sites. However, over the past few decades, this fundamental concept of promoters has become muddled.

A common but confusing definition, particularly for metazoan promoters, combines what is termed the ‘core promoter’ with nearby proximal DNA sequences that are recognized by gene-specific transcription factors. Another source of confusion involves the term ‘enhancer’, originally defined as a genetic element that interacts with specific activator proteins that stimulate transcription from a separable promoter located far away (Banerji et al., 1981; Moreau et al., 1981; Fromm and Berg, 1982). However, enhancers are bound by the same sequence-specific activator proteins that bind to promoter-proximal regions, and they often express ‘enhancer RNAs’, leading to the confusing idea that enhancers and promoters are very similar (Core et al., 2014).

In this paper, which relies exclusively on ChIP-seq datasets, we define a promoter by occupancy of GTFs (typically TBP) and Pol II (typically observed at the pause site, not the promoter itself). Many promoters are located immediately upstream of mRNA coding regions, and these mRNA promoters are the primary focus of the analyses. However, other promoters are in the vicinity of what are termed enhancer regions. As we will show, these two classes of promoters behave indistinguishably from the perspective of PIC function. Promoters defined by the classic concept are distinct from activator-binding sites, irrespective of their distance from a given promoter.

Mammalian Mediator associates stably with promoters, not activator-binding sites at proximal or distal locations

In wild-type Saccharomyces cerevisiae cells, Mediator associates strongly with active enhancers but not with promoters (Fan et al., 2006; Fan and Struhl, 2009; Figure 1A). However, Mediator association with promoters does occur in strains depleted for the TFIIH kinase activity (Jeronimo and Robert, 2014; Wong et al., 2014; Jeronimo et al., 2016; Petrenko et al., 2016; Figure 1A). Furthermore, unlike the case at enhancers, the form of Mediator at promoters lacks the kinase module (Jeronimo et al., 2016; Petrenko et al., 2016; Figure 1A).

Figure 1 (with 1 supplement). Association of general transcription factors (GTFs) at mRNA promoters and distal regulatory regions.

(A) Mean occupancies (metagene analyses) of the indicated proteins at the 600 most active yeast mRNA promoters with respect to the transcription start site (TSS) in cells depleted (+rap) or not depleted (-rap) for the Kin28 subunit of TFIIH. (B) Mean occupancies of the indicated proteins at the 10,000 most active mRNA promoters (by TATA-binding protein [TBP] occupancy) in the indicated human cell lines. (C) Mean occupancies of the indicated proteins at the 10,000 most active mRNA promoters in the indicated mouse cell lines. (D) Correlation of Med1 and Cdk8 occupancies at individual mRNA promoters (dots) in MM1S and HCT116 cells; red dots indicate promoters with relatively high Med1 levels with respect to Cdk8 levels.

By contrast, in wild-type human and mouse cells, Mediator (defined by the middle module subunit Med1) binds to promoters of protein-coding genes (Kagey et al., 2010; Figure 1B and C). Med1 peaks overlap closely with those of TBP, TFIIB, and TFIIF, but all these sites are about 80 bp downstream of peaks of transcription factors SP1 and NF-Y, which are markers of promoter-proximal enhancers just upstream of promoters (Figure 1B and Figure 1—figure supplement 1A). Med1 and GTF occupancy sites are also about 50 bp upstream of Pol II, which is located at the post-initiation pause position as expected (Figure 1B and Figure 1—figure supplement 1A). The distinct peak locations of Pol II, GTFs, and transcription factors in the same samples indicate that Mediator is associated with the promoter and not with the promoter-proximal elements bound by activator proteins that stimulate basal transcription.
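Comparing factors by the positions of their metagene peaks, as above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-promoter occupancy profiles that are already aligned on the TSS and binned at 10 bp, and the helper names (`metagene`, `summit_position`, `triangle_peak`) and toy profiles are hypothetical.

```python
# Hypothetical sketch: summit positions of TSS-aligned metagene profiles and the
# offset between two factors. Toy data only; not the published analysis.


def metagene(profiles):
    """Mean occupancy per bin across promoters (profiles already TSS-aligned)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def summit_position(mean_profile, start_bp, bin_bp):
    """Position (bp relative to the TSS) of the highest bin of a metagene profile."""
    best = max(range(len(mean_profile)), key=lambda i: mean_profile[i])
    return start_bp + best * bin_bp


def triangle_peak(center, n_bins=21, height=5):
    """Toy per-promoter profile with one triangular peak centered on bin `center`."""
    return [max(0, height - abs(i - center)) for i in range(n_bins)]


START_BP, BIN_BP = -100, 10  # 21 bins of 10 bp covering -100..+100 around the TSS

# Slightly jittered toy peaks at three promoters for each factor.
tbp_profiles = [triangle_peak(c) for c in (7, 8, 9)]
pol2_profiles = [triangle_peak(c) for c in (12, 13, 14)]

tbp_summit = summit_position(metagene(tbp_profiles), START_BP, BIN_BP)
pol2_summit = summit_position(metagene(pol2_profiles), START_BP, BIN_BP)
print(tbp_summit, pol2_summit, pol2_summit - tbp_summit)  # -20 30 50
```

Because positions are read off bins, offsets obtained this way are only as precise as the bin width.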

Distal regulatory regions, typically termed enhancers, can support a low level of Pol II transcription that generates non-coding ‘enhancer RNAs’ (Kim et al., 2010; Core et al., 2014; Jin et al., 2017). Such enhancer regions are often defined by chromatin that is accessible (DNase-sensitive) and flanked by acetylated histones (Heintzman et al., 2007; Ernst et al., 2011), and mammalian cells typically have ~150,000 such enhancers (Shen et al., 2012; ENCODE, 2020). These chromatin features at enhancers are mediated by sequence-specific activator proteins that cooperatively recruit nucleosome remodeling and histone acetylase complexes.

The level of Mediator association (median values of Med1 peak heights) at a small subset of such enhancer regions (top 10,000 Mediator peaks out of ~150,000 enhancers) is roughly comparable to the level of Mediator association at mRNA promoters (top 10,000 or all Mediator peaks at ~20,000 mRNA promoters) (Figure 2A). In striking contrast, the median level of Mediator association at all enhancers is much lower (Figure 2A). Moreover, most of the apparent Mediator occupancy at all enhancers is due to contributions from the top 10,000 enhancers. Thus, the vast majority of enhancers have low or non-detectable levels of Mediator.
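The argument that a minority of strong sites dominates the aggregate Mediator signal at enhancers amounts to comparing medians and the share of summed signal. A minimal sketch, assuming peak heights have already been called for every region; the function name and the toy numbers are hypothetical.

```python
import statistics


def top_n_summary(peak_heights, n):
    """Compare the n strongest peaks with all peaks.

    Returns the median of the top n, the median of all peaks, and the
    fraction of the summed signal contributed by the top n.
    """
    ranked = sorted(peak_heights, reverse=True)
    top = ranked[:n]
    return {
        "median_top": statistics.median(top),
        "median_all": statistics.median(ranked),
        "top_share": sum(top) / sum(ranked),
    }


# Hypothetical Med1 peak heights: a few strong sites and many weak ones.
heights = [100, 90, 80] + [2] * 17
summary = top_n_summary(heights, n=3)
print(summary)
```

With these toy values the top three sites have a median of 90 while the median over all sites is 2, yet the top three contribute roughly 89% of the summed signal, which is the kind of pattern described for the top 10,000 enhancers.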

Figure 2. Mediator association and distance between factors at promoters and enhancers.

(A) Median occupancy values at and surrounding the preinitiation complex (PIC) peak center, defined by general transcription factor (GTF) occupancy, at the top 10,000 promoters and enhancers as well as at all promoters and enhancers in MM1S cells. ‘All enhancers’ includes the ~60,000 genomic regions where any Mediator signal is detected, including the top 10,000; it does not include the ~100,000 enhancers where no Mediator signal is detected. (B) Absolute values of the distances in base pairs between the indicated factors (or replicates) at the top 10,000 promoters and enhancers (Loven et al., 2013) in MM1S cells. The boxplot indicates the median value (also written as a number), as well as the 25th and 75th percentiles, represented as the edges of the box.

Distinguishing whether Mediator binding within the so-called enhancer regions coincides with activator-binding sites or with the PIC is complicated for several reasons. First, unlike promoters, which can be aligned by their transcription start sites, enhancers are difficult to align. Second, transcription from enhancer regions is largely bidirectional, making it difficult to distinguish upstream from downstream relationships between associated proteins. Third, the PICs responsible for the divergent enhancer RNAs are often too close together to be resolved separately. In these cases, the GTF peaks appear to map to the center of the enhancer, which coincides with the activator peaks, thereby making it impossible to address whether Mediator is recruited by the activator or the PIC/promoter.

For these reasons, we measured the distance between the peak summits of Mediator (Med1 subunit), the GTF TFIIH (Cdk7 subunit), Pol II, and sequence-specific activators (E2F, DP1) at enhancers in MM1S cells (Figure 2B). As a control, the median distance for an individual factor in biological replicates is ~45 bp, reflecting experimental variation in peak summits. The median distance between Mediator and Cdk7 is similar, confirming that Mediator coincides with the PIC. The median distance between Cdk7 and Pol II is ~90 bp, indicating that Pol II is at the paused position even at the so-called enhancers. Importantly, the median distance between Mediator and the E2F and DP1 activators is ~90 bp, indicating that Mediator is not localized at the same position as the activators. As expected, similar results are observed at mRNA promoters (Figure 2B). Thus, even at the so-called enhancers, Mediator is stably associated with promoters, not activator-binding sites.
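The replicate control above works because summit-to-summit distances are computed the same way for every pair of datasets. A minimal sketch of that computation, assuming peaks from the two datasets have already been matched one-to-one (the matching step is not shown, and the coordinates and function name are hypothetical); the quartiles are analogous to the box edges in Figure 2B, although the exact percentile method used there is not stated.

```python
import statistics


def summit_distance_stats(summits_a, summits_b):
    """Median and quartiles of |a - b| (bp) for paired peak summits of two factors."""
    distances = [abs(a - b) for a, b in zip(summits_a, summits_b)]
    q1, _, q3 = statistics.quantiles(distances, n=4)  # 25th, 50th, 75th percentiles
    return {"median": statistics.median(distances), "q1": q1, "q3": q3}


# Hypothetical summit coordinates (bp) for Med1 and an activator at five regions.
med1 = [1000, 5000, 9000, 12000, 20000]
activator = [1090, 4910, 9100, 12080, 20095]
stats = summit_distance_stats(med1, activator)
print(stats)  # {'median': 90, 'q1': 85.0, 'q3': 97.5}
```

Running the same function on two replicates of one factor gives the noise floor against which factor-to-factor distances are judged.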

The kinase module of mammalian Mediator associates with promoters

Unlike the situation at yeast promoters, the kinase module (represented by the Cdk8 and Med12 subunits) associates with human and mouse promoters (Figure 1B and C). Association of the kinase module at the promoter is strongly correlated with Med1 for most genes, although the kinase module is depleted at certain classes of genes in some cell lines (translation, ribosome biogenesis and protein targeting to the endoplasmic reticulum in HCT116 cells, chromatin organization in MOLM-14 cells, and histone genes in mouse V6.5 cells; Figure 1D and Figure 1—figure supplement 1B). Thus, mammalian Mediator stably associates with the promoter in a manner that can include the kinase module. The presence of the kinase module at mammalian promoters is surprising in light of numerous lines of evidence indicating that the kinase module and Pol II interact with the core Mediator in a mutually exclusive manner (see Discussion).

Mediator and PIC levels are strongly, but not strictly, correlated at promoters

To address whether Mediator and PIC components co-associate with the promoter, we compared the levels of Med1, Pol II, and TBP at the 10,000 most active mRNA promoters (those with the highest TBP levels). Strong Pearson correlations (range 0.65–0.74) suggest a relatively constant ratio of Mediator and GTFs at the promoter for many genes (Figure 3A and Figure 3—figure supplement 1), as is the case in yeast cells. However, at certain categories of genes that differ among cell lines, Mediator levels are higher than those of PIC components (Figure 3A); because these promoters are enriched for specific functional categories, the relatively high Mediator levels at them are unlikely to reflect experimental variance. Promoters with relatively increased Med1 and Cdk8 are enriched for chromatin organization and immunity-related genes in the MM1S blood cancer cell line, for differentiation and apoptosis genes in human embryonic stem cells (hESCs), and for stress-response genes in HCT116 cells (Figure 3A). Thus, unlike the case in yeast cells depleted of Kin28, mammalian Mediator association with promoters is not strictly correlated with association of other PIC components.
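The correlation analysis and the flagging of promoters with relatively high Mediator (red dots in Figure 3A) can be sketched as follows. This is an illustration under assumptions, not the published procedure: the source does not state the cutoff used to define ‘relatively high’, so the median-centered log2-ratio rule, the 2-fold threshold (`min_log2_excess=1.0`), the pseudocount, and the function names are all hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_ratio_promoters(names, numerator, denominator, min_log2_excess=1.0, pseudocount=1.0):
    """Promoters whose log2(numerator/denominator) exceeds the median log2 ratio
    by at least `min_log2_excess` (hypothetical cutoff; pseudocount avoids /0)."""
    log_ratios = [
        math.log2((a + pseudocount) / (b + pseudocount))
        for a, b in zip(numerator, denominator)
    ]
    center = statistics.median(log_ratios)
    return [name for name, r in zip(names, log_ratios) if r - center >= min_log2_excess]


# Toy occupancies at five promoters; promoter "E" carries extra Mediator.
names = ["A", "B", "C", "D", "E"]
med1 = [10, 20, 30, 40, 200]
pol2 = [10, 20, 30, 40, 50]
print(round(pearson(med1, pol2), 3))            # 0.8
print(high_ratio_promoters(names, med1, pol2))  # ['E']
```

A single outlier is enough to pull the correlation below 1 here, which illustrates why a strong but imperfect correlation can coexist with a distinct subset of Mediator-rich promoters.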

Figure 3 (with 1 supplement). Relationship of Mediator, TATA-binding protein (TBP), and Pol II levels at promoters.

(A) Pairwise correlations of Med1, TBP, and Pol II occupancies at the 10,000 most active (by TBP occupancy) mRNA promoters (dots) in the indicated human cell lines; red dots indicate promoters with relatively high Mediator levels with respect to Pol II levels. Enriched gene categories for these promoters in the indicated cell lines are given in Supplementary file 2. (B) Pairwise correlations of Med1, Cdk7 (TFIIH), and Pol II occupancies at ~8000 enhancers (Loven et al., 2013) in MM1S cells.

We also examined the relationship between Mediator, Cdk7, and Pol II at distal enhancers in MM1S cells (Figure 3B). The correlation of Mediator occupancy with either Cdk7 or Pol II occupancy is strong (R = 0.6) and only slightly lower than the correlation between the PIC components Cdk7 and Pol II (R = 0.7) at distal enhancers and the correlations at mRNA promoters (Figure 3A). In accord with the median Mediator occupancy at enhancers (Figure 2A), but in marked contrast to the situation in yeast, there are few enhancers in which Mediator is associated but GTFs are not (Figure 3B).

Fly Mediator associates stably with promoters and may vary in composition

Drosophila melanogaster Med1 (middle module) and Med30 (part of a tail segment contiguous with the head) (El Khattabi et al., 2019) co-localize with TBP at the promoter (Figure 4A). As in mouse and human cells, the Mediator peak maps ~60 bp upstream of the Pol II peak and ~60 bp downstream of the peak for the GAGA transcription factor, indicating association with the PIC, not promoter-proximal elements for sequence-specific DNA-binding proteins. Unexpectedly, some genes show high levels of Med1 but nearly absent Med30, or vice versa (Figure 4B and Figure 4—figure supplement 1). This discordance between Med1 and Med30 binding is not due to random errors, because each set of genes shows enrichment in gene ontology (GO) categories (Figure 4B). Thus, while Mediator appears to behave as a complete complex at most fly promoters, some promoters appear to be bound by distinct Mediator subcomplexes and/or different conformations of Mediator with altered crosslinking properties. These distinct forms of Mediator might be involved in tissue and gene specificity of essential Mediator subunits in other organisms. For example, mouse Med9 (middle module) is essential in T cells but not in B cells or ESCs, whereas Med26 (middle module) is essential in T cells and ESCs but not in B cells (El Khattabi et al., 2019).
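The argument that Med1/Med30 discordance is not random rests on enrichment of gene ontology categories among the discordant genes. Enrichment is commonly scored with a one-sided hypergeometric test; the sketch below assumes that test (the source does not say which statistic or GO tool was used), and the function name and all counts are hypothetical. In practice, p-values across many GO categories also need multiple-testing correction.

```python
from math import comb


def enrichment_p_value(k, n_selected, k_category, n_total):
    """One-sided hypergeometric P(X >= k).

    X counts how many of `n_selected` genes drawn at random from `n_total`
    genes fall in a category containing `k_category` genes.
    """
    upper = min(n_selected, k_category)
    hits = sum(
        comb(k_category, i) * comb(n_total - k_category, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_selected)


# Hypothetical counts: 40 discordant promoters out of 1000 active ones;
# 50 of the 1000 belong to a GO category, and 12 of the 40 discordant ones do.
p = enrichment_p_value(k=12, n_selected=40, k_category=50, n_total=1000)
print(p)
```

About 2 category members would be expected among 40 random promoters here, so observing 12 yields a very small p-value; random measurement error would not be expected to concentrate genes of one category in this way.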

Figure 4 (with 1 supplement). Association of Mediator, TATA-binding protein (TBP), Pol II, and GAGA transcription factor in Drosophila cell lines.

(A) Mean occupancies of TBP, Pol II, Mediator (Med1 and Med30 subunits), and GAGA at the 1000 most active (by TBP occupancy) promoters in the indicated Drosophila cell lines with respect to the transcription start site (TSS). (B) Relative occupancy levels of Med30 and Med1 at individual promoters. Genes with relatively high Med30:Med1 ratios (blue dots) and relatively low Med30:Med1 ratios (green dots) are indicated along with enriched gene categories.

Pol II occupancy at 5’ ends of genes, but not in the coding region, is strongly correlated with PIC occupancy

In yeast, Pol II occupancy in the coding regions directly corresponds to the occupancy of PIC components at the promoter, as Pol II rapidly escapes the promoter into active elongation. By contrast, in mammalian and fly cells, Pol II occupancy peaks at the post-initiation pause site ~50 bp downstream of the transcription start site (TSS), and it is considerably lower throughout the coding region (Adelman and Lis, 2012; Core and Adelman, 2019). Paused Pol II is released into elongation via phosphorylation of NELF and Spt5 by the Cdk9 subunit of the P-TEFb complex. In addition, paused Pol II can be removed through early termination, in a mechanism involving the Integrator complex (Erickson et al., 2018; Elrod et al., 2019; Huang et al., 2020). Paused Pol II inhibits new initiation because it sterically blocks Pol II association at the promoter and hence assembly of a functional PIC (Gressel et al., 2017; Shao and Zeitlinger, 2017).

Resolving Pol II occupancy at the promoter versus the pause site requires high-resolution (e.g., ChIP-exo) or nucleotide-resolution (e.g., PRO-seq) data, which are not available for the cell lines in which other initiation factors have been profiled. Thus, we used the total Pol II occupancy level at peaks around the promoters to determine its relationship to TBP levels. If the level of paused Pol II relative to the initiation rate varied across the genome, the ratio of Pol II occupancy to that of other PIC components should deviate appreciably among promoters. However, Pol II levels correlate very strongly with TBP levels (Pearson = 0.8 in K562 and hESC; Figure 5A), albeit slightly below the correlation (0.9–0.96) of biological replicates (Figure 5—figure supplement 1). This indicates that Pol II occupancy near promoters (which is primarily at the pause site) is strongly linked to PIC levels. In addition, these observations argue that premature termination at the pause site either happens in concert with initiation or is not a major mechanism regulating Pol II activity at most genes.

Figure 5 (with 1 supplement). Relationship of TATA-binding protein (TBP) occupancy with various forms of Pol II.

(A) Relative occupancy levels of TBP and total Pol II in the vicinity of individual promoters (10,000 most active by TBP occupancy) in the indicated human cell lines. (B) Relative occupancy levels of total Pol II and the form of Pol II phosphorylated at serine 5 in the C-terminal domain in the vicinity of individual promoters in K562 cells. (C) Relative occupancy levels of total Pol II and the form of Pol II phosphorylated at serine 2 in the C-terminal domain in the vicinity of individual promoters in K562 cells. (D) Relationship between TBP occupancy, subdivided into deciles (higher deciles indicate higher occupancy), and the Pol II pausing index (ratio of Pol II in the promoter-proximal peak relative to the coding region).

As expected from the fact that phosphorylation of the Pol II CTD at serine 5 is mediated by the Cdk7 subunit of TFIIH, there is a very strong correlation between the levels of total Pol II and CTD-S5-phosphorylated Pol II (Pearson = 0.87; Figure 5B). However, the correlation between total Pol II and S2-phosphorylated Pol II is lower (Pearson = 0.67; Figure 5C), possibly reflecting the competition between initiation mechanisms and the activities of pausing factors and Integrator. In accord with the observation that very highly transcribed mouse genes often have lower relative levels of pausing (Min et al., 2011), there is a tendency for a decreased pausing index (the ratio of Pol II in the promoter-proximal peak relative to the coding region) to be linked to increased TBP occupancy (Figure 5D).
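The pausing-index comparison in Figure 5D can be sketched as follows, assuming Pol II signal has already been summed over a promoter-proximal window and over the gene body. Normalizing each sum by window length is a common convention but an assumption here, since the source defines the index only as the ratio of promoter-proximal to coding-region Pol II; the function names and toy values are hypothetical.

```python
import statistics


def pausing_index(peak_signal, peak_len_bp, body_signal, body_len_bp):
    """Pol II density in the promoter-proximal peak divided by gene-body density.
    Length normalization is an assumed convention, not taken from the source."""
    return (peak_signal / peak_len_bp) / (body_signal / body_len_bp)


def median_by_quantile_bin(values, scores, n_bins=10):
    """Sort promoters by `values` (e.g. TBP occupancy), split them into n_bins
    equal-sized groups (deciles by default), and return the median score per group.
    Requires at least n_bins promoters."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    size = len(order) / n_bins
    groups = [order[int(b * size):int((b + 1) * size)] for b in range(n_bins)]
    return [statistics.median([scores[i] for i in g]) for g in groups]


print(pausing_index(peak_signal=100, peak_len_bp=100, body_signal=100, body_len_bp=1000))  # 10.0

# Toy (hypothetical) data: the pausing index falls as TBP occupancy rises.
tbp = list(range(20))
pause_idx = [20 - t for t in tbp]
deciles = median_by_quantile_bin(tbp, pause_idx)
print(deciles)  # [19.5, 17.5, 15.5, 13.5, 11.5, 9.5, 7.5, 5.5, 3.5, 1.5]
```

Summarizing each decile by its median rather than its mean keeps a few extreme promoters from dominating the trend.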

TFIID dependency varies across metazoan promoters

In yeast cells, TBP and the TAF subunits of TFIID have indistinguishable binding profiles at promoters. However, levels of TAF occupancies do not strictly correlate with TBP or GTF occupancies, leading to the concept of TAF-containing (TFIID) and TAF-lacking forms of transcriptionally active TBP in the PIC (Kuras et al., 2000; Li et al., 2000; Rhee and Pugh, 2012). Promoters favored by TFIID-containing PICs tend to have lower quality TATA motifs, consistent with the idea that TAF interactions with promoter DNA can reduce the requirement for a strong TBP:TATA interaction. In TAF-depleted cells, transcriptionally active PICs lacking TAFs have been directly observed (Petrenko et al., 2019).

In flies, unlike in yeast and humans, TAF1 and TAF2 show well-defined binding peaks ~60 bp downstream of TBP (Baumann and Gilmour, 2017; Shao and Zeitlinger, 2017; Figure 6A). This observation is consistent with the location of downstream promoter elements (e.g., DPE and MTE) that are important for transcription and directly contacted by TAF1 and TAF2 (Louder et al., 2016; Baumann and Gilmour, 2017; Vo Ngoc et al., 2019). In agreement with a pared-down PIC lacking TAFs and several GTFs at the histone gene clusters (Guglielmi et al., 2013), TAF1 and TAF2 are completely absent from these clusters despite considerable association of TBP and Pol II (Figure 6—figure supplement 1A).

Figure 6 (with 2 supplements). Relative occupancy of TBP-associated factors (Tafs) and TATA-binding protein (TBP) at promoters.

(A) Relative occupancy levels of TBP, Pol II, and the indicated Tafs at the 1000 most active Drosophila and 10,000 most active human mRNA promoters with respect to the transcription start site (TSS). (B) Number of Drosophila mRNA promoters having the indicated TBP:TBP replicate or Taf2:TBP ratios (among the 900 most active promoters); median ratio set to 1.0. Promoters with low (orange) or high (green) Taf2:TBP ratios are indicated along with enriched gene categories. (C) Number of human promoters having the indicated TFIIB:TBP or Taf1:TBP ratios (among the 3500 most active promoters) in the indicated cell lines; median ratio set to 1.0. Promoters with low (orange) or high (green) Taf1:TBP ratios are indicated along with enriched gene categories.

Yeast genes involved in translation and housekeeping functions tend to show a greater reliance on TFIID, whereas highly regulated genes tend to be TAF-independent and dependent on the SAGA complex (Moqtaderi et al., 1996; Kuras et al., 2000; Li et al., 2000; Huisinga and Pugh, 2004; de Jonge et al., 2017; Petrenko et al., 2019). Similarly, in fly cells, the Taf2:TBP ratio varies significantly more than the TBP:TBP ratio of replicates (Figure 6B and Figure 6—figure supplement 1B). Moreover, genes with high or low Taf2:TBP ratios are enriched for certain gene classes (Figure 6B and Figure 6—figure supplement 1B), indicating that variations in this ratio are not due to chance. Genes with a high Taf2:TBP ratio tend to be involved in translation, protein targeting to membrane, and nonsense-mediated decay, whereas those with a low Taf2:TBP ratio are involved in metabolism and organ morphogenesis (Figure 6B). Thus, in fly cells, as in yeast, high and low TAF:TBP occupancy ratios are enriched for housekeeping and developmental genes, respectively.
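The comparison of Taf2:TBP ratios with TBP:TBP replicate ratios (Figure 6B) can be made concrete with a minimal sketch. It assumes matched occupancy values per promoter and, as in the figure, sets the median ratio to 1.0; the 2-fold cutoff used to count ‘high’ and ‘low’ promoters, the function names, and the toy numbers are hypothetical.

```python
import statistics


def median_normalized_ratios(numerator, denominator):
    """Per-promoter numerator/denominator ratios rescaled so the median is 1.0."""
    ratios = [a / b for a, b in zip(numerator, denominator)]
    median = statistics.median(ratios)
    return [r / median for r in ratios]


def fraction_beyond_fold(ratios, fold=2.0):
    """Fraction of promoters whose normalized ratio is above `fold` or below 1/fold
    (the 2-fold default is a hypothetical cutoff)."""
    return sum(r > fold or r < 1 / fold for r in ratios) / len(ratios)


# Toy occupancies at six promoters (hypothetical values).
tbp = [10, 20, 30, 40, 50, 60]
tbp_replicate = [11, 19, 31, 39, 52, 58]
taf2 = [30, 10, 30, 90, 15, 60]

replicate_spread = fraction_beyond_fold(median_normalized_ratios(tbp_replicate, tbp))
taf2_spread = fraction_beyond_fold(median_normalized_ratios(taf2, tbp))
print(replicate_spread, taf2_spread)  # 0.0 0.5
```

Using replicate ratios as the reference distribution separates biological variation in TAF association from measurement noise.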

In human cells, the binding profile of TAF1 is indistinguishable from that of TBP (Figure 6A). Taf1 occupancy correlates well with TBP occupancy (Pearson = 0.64 in K562 and 0.78 in hESC; Figure 6—figure supplement 2A), but some promoters have low Taf1 occupancy relative to TBP occupancy (Figure 6C). Promoters with relatively low Taf1 occupancy constitute ~20% of those with detectable TBP peaks, and they are enriched for nucleosome organization (including histones), chromatin assembly, and certain stress responses in two different cell lines, while those with the highest Taf1:TBP occupancy ratio are enriched for translation, ribosome biogenesis, and cell cycle-related processes (Figure 6C and Figure 6—figure supplement 2B). In contrast and as expected, TFIIB:TBP occupancy ratios show less variation, and the few promoters with high or low ratios are not enriched for any functional categories and are likely due to experimental variation (Figure 6C and Figure 6—figure supplement 2A).

Discordant behavior of Taf1 and Taf7 at distinct sets of human genes

In S. cerevisiae, all TAFs tested maintain a constant ratio to one another across various genes (Kuras et al., 2000; Li et al., 2000; Venters et al., 2011), suggesting that TFIID functions as a monolithic complex. Although TAF7 has not been investigated in S. cerevisiae, TBP and Taf7 occupancies in Schizosaccharomyces pombe are strongly correlated (Pearson = 0.7) at promoters of genes encoding proteins (Figure 7A) and non-coding RNAs (Figure 7B). Interestingly, S. pombe genes with relatively high Taf7 levels are enriched for translation and ribosome biogenesis (Figure 7A), resembling the relatively high TAF levels and TFIID dependency at such genes in S. cerevisiae.

Figure 7 (with 3 supplements). Location and relative occupancy of Taf7, Taf1, and TATA-binding protein (TBP).

(A, B) Relative occupancy levels of TBP and Taf7 at individual mRNA and non-coding RNA promoters in Schizosaccharomyces pombe. Genes with relatively high Taf7:TBP ratios (red dots) are indicated along with enriched gene categories. (C) Relative Taf7:TBP and Taf1:TBP occupancy ratios at the 10,000 most active promoters (by TBP occupancy) in the indicated human cell lines. Genes with relatively high Taf7:Taf1 ratios (red dots) are indicated along with enriched gene categories. (D) Relative occupancy levels of Taf7 and Taf1 at the 300 most active (by TBP occupancy) non-coding RNA promoters in the indicated cell lines. (E) Mean occupancies of TBP, Taf1, Taf7, and Pol II with respect to the transcription start site (TSS) at K562 promoters where the Taf7 peak lies downstream of the TSS. (F) Mean occupancies of TBP, Taf1, Taf7, and Pol II with respect to the TSS at K562 promoters where the Taf7 peak lies at the TSS. The genes in each category for E and F are listed in Supplementary file 2.

In contrast, human Taf7 is not only present at the PIC but also appears to associate downstream of human MHC class I promoters independently of other TFIID components (Gegonne et al., 2008). Taf7 has been suggested to negatively regulate Taf1 (Gegonne et al., 2001), to suppress the TFIIH and P-TEFb kinases (Gegonne et al., 2008), to dissociate from the PIC, and to act as a checkpoint that prevents premature initiation and elongation (Gegonne et al., 2006; Gegonne et al., 2013). It is unknown whether these observations are limited to the few genes tested or are more general.

Genome-scale analysis of Taf1 and Taf7 occupancy in K562 and hESC cells reveals a complex picture. Like Taf1, Taf7 occupancy correlates reasonably well with TBP occupancy (Pearson = 0.57 in K562 and 0.72 in hESC; Figure 7—figure supplement 1A). However, some promoters display higher than expected Taf7 occupancy relative to TBP occupancy (Figure 7C; compare with Taf1 replicates in Figure 7—figure supplement 1B). Interestingly, these high-Taf7 promoters are enriched for genes involved in nucleosome organization, chromatin assembly, and DNA replication (Figure 7C), the same categories associated with unusually low Taf1 occupancy (Figure 6—figure supplement 2B). Thus, although Taf1 and Taf7 occupancies are also moderately correlated (Pearson = 0.5; Figure 7—figure supplement 1C), they are reciprocally discordant at a subset of promoters. The difference between Taf1 and Taf7 binding is stark at non-coding RNAs: Taf7 occupancy is high at snRNAs but low at many microRNAs and long non-coding RNAs, whereas the Taf1 profile is the opposite (Figure 7D and Figure 7—figure supplement 1D).

Interestingly, whereas the Taf1 and TBP binding profiles are indistinguishable, the Taf7 peak summit in K562 cells is sometimes observed downstream of the PIC, at or near the location of paused Pol II (Figure 7E and Figure 7—figure supplement 2A). In contrast, Taf1 is located only at the TSS at most genes (Figure 7F and Figure 7—figure supplement 2B). The downstream-shifted peak is asymmetric, likely indicating two incompletely resolved peaks: a smaller one near the TSS and a larger one downstream. In addition, as the distance between the TBP and downstream Taf7 peaks increases, paused Pol II is located further downstream (Figure 7—figure supplement 3), consistent with a role for Taf7 in the transition to full elongation. In hESCs, the relative occupancy patterns of TBP, Taf1, and Taf7 resemble those in K562 cells (Figure 7—figure supplement 2C,D). Genes with downstream Taf7 peaks are enriched for the GO categories of chromatin organization, RNA splicing, and translation, whereas genes with Taf7 peaks at the TSS are enriched for other classes of genes (Figure 7—figure supplement 2E).
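The strand-aware bookkeeping behind sorting promoters by Taf7 summit position, and behind relating the TBP-to-Taf7 distance to the Pol II pause position, might look like the sketch below. It assumes that peak summit coordinates per gene are already available; the 50 bp cutoff separating "downstream" from "TSS" peaks is a hypothetical value (the text does not state one), and the gene records are invented toy values.

```python
import math
import statistics


def tss_offset(pos, tss, strand):
    """Signed distance of a genomic position from the TSS, measured in the
    direction of transcription (positive = downstream)."""
    return pos - tss if strand == "+" else tss - pos


def classify_taf7(genes, min_downstream=50):
    """Split genes into 'downstream' and 'tss' classes by Taf7 summit offset.
    min_downstream (bp) is a hypothetical cutoff, not taken from the article."""
    classes = {"downstream": [], "tss": []}
    for name, g in genes.items():
        offset = tss_offset(g["taf7"], g["tss"], g["strand"])
        classes["downstream" if offset >= min_downstream else "tss"].append(name)
    return classes


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Invented summit coordinates for four toy genes (one on the minus strand).
genes = {
    "g1": {"tss": 1000, "strand": "+", "tbp": 995, "taf7": 1120, "pol2": 1130},
    "g2": {"tss": 5000, "strand": "-", "tbp": 5010, "taf7": 4920, "pol2": 4910},
    "g3": {"tss": 8000, "strand": "+", "tbp": 8000, "taf7": 8010, "pol2": 8060},
    "g4": {"tss": 12000, "strand": "+", "tbp": 11998, "taf7": 12160, "pol2": 12170},
}

classes = classify_taf7(genes)
down = [genes[name] for name in classes["downstream"]]
# Distance from the TBP summit to the downstream Taf7 summit, and the pause position.
tbp_to_taf7 = [tss_offset(g["taf7"], g["tss"], g["strand"])
               - tss_offset(g["tbp"], g["tss"], g["strand"]) for g in down]
pause_pos = [tss_offset(g["pol2"], g["tss"], g["strand"]) for g in down]
r = pearson(tbp_to_taf7, pause_pos)
```

Here g3, whose Taf7 summit sits 10 bp from the TSS, falls in the "tss" class, and across the three downstream genes the TBP-to-Taf7 distance and the pause position rise together (r close to 1). Measuring every offset in the direction of transcription is what lets minus-strand genes such as g2 be pooled with plus-strand genes.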

The relationship between Med26 and Taf7 varies at different gene groups

Med26 is a metazoan-specific subunit of Mediator, but in mouse cells it generally binds downstream of the TSS at protein-coding genes (Huang et al., 2017). Med26 interacts with both Taf7 and the P-TEFb-containing super elongation complex (SEC), and at the human c-Myc and Hsp70 genes, it has been proposed to switch from binding Taf7-containing TFIID at the PIC to recruiting P-TEFb to paused Pol II (Takahashi et al., 2011; Lens et al., 2017). In addition, at a subset of human snRNA genes, Med26 exchanges Taf7 for the little elongation complex, which is involved in the transcription of Pol II-dependent snRNAs (Takahashi et al., 2011; Lens et al., 2017). However, as Med26 knockdown affects the expression of only ~10% of genes (Takahashi et al., 2011), it is unclear whether these functions are general.

In several human and mouse cell lines, Med26 localizes downstream of the TSS and other PIC components, roughly at the position of paused Pol II (Figure 8A). Unlike that of Med1, the Med26 peak is bimodal, suggesting two incompletely resolved peaks: a smaller one near the PIC and a larger one downstream. Interestingly, although Med26 associates with the downstream region at most genes, roughly one-third of genes show a Med26 peak only at the PIC (Figure 8B). Genes showing Med26 downstream of the PIC are enriched for the GO categories of ribosome biogenesis, mRNA splicing, and translation; genes with Med26 only at the PIC are enriched for chromatin organization (Figure 8—figure supplement 1A). Thus, like Taf7, Med26 appears both to be a component of the PIC and to act independently at a site around paused Pol II, with the relative occupancy at these locations being gene-specific.

Figure 8 with 2 supplements.
Relationship between Taf7 and Med26 occupancy in the vicinity of promoters.

(A) Mean occupancies of the indicated proteins in the indicated mouse and human cell lines at the 10,000 most active mRNA promoters (by TATA-binding protein [TBP] occupancy). (B) Mean occupancies of Med1, Med26, and Taf7 in the indicated mouse and human cell lines at mRNA genes where Med26 is located downstream of or at the transcription start site (TSS). (C) Venn diagram showing the intersection of promoters with downstream Taf7 (K562 cells) and/or downstream Med26 (HCT116 cells). (D) Relative occupancies of Taf7 and Med26 versus Taf1 levels at Taf1-enriched non-coding RNAs, and relative occupancies of Taf1 and Med26 versus Taf7 levels at Taf7-enriched non-coding RNAs.

As we did not find Med26 and Taf7 datasets from the same cell line, we compared their binding profiles across different cell lines. We assumed that the behavior of GTFs would generally be similar across cell lines and that analyzing multiple human and mouse cell lines per factor would control for cell line-specific features. The genomic pattern is complex: some genes have both downstream Med26 and Taf7 peaks, others have downstream peaks for only one of these proteins, and the rest have no downstream peaks for either factor (Figure 8C). Ribosomal protein genes in both human and mouse tend to have Med26, but not Taf7, at downstream locations (Figure 8—figure supplement 1B), whereas histone genes often have Taf7, but not Med26, at downstream locations (Figure 8—figure supplement 1C). Overall, Med26 and Taf7 associations are less well correlated than those of Med26 and Taf1 (Figure 8—figure supplement 2). At non-coding RNAs, Med26 is enriched in both the Taf1-containing and the Taf7-containing groups discussed above (Figure 8D), and its occupancy correlates strongly with that of both Taf1 and Taf7 (Pearson = 0.7 and 0.9, respectively). In sum, the relationship between Med26 and Taf7 does not appear to be generally co-dependent but rather gene-specific, suggesting potentially nuanced control of Pol II pause release.
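The four-way grouping summarized by the Venn diagram in Figure 8C reduces to set operations once each factor's downstream-peak genes are known. The sketch below is hypothetical: gene IDs are placeholders, and restricting both lists to a shared gene universe (for example, genes active in all cell lines being compared) is an assumption made here to limit cell line-specific effects, not a step the text describes.

```python
def venn_partition(common_genes, med26_downstream, taf7_downstream):
    """Partition a shared gene universe by presence of downstream Med26 and/or
    Taf7 peaks. Genes outside the shared universe are ignored."""
    universe = set(common_genes)
    med26 = set(med26_downstream) & universe
    taf7 = set(taf7_downstream) & universe
    return {
        "both": med26 & taf7,
        "med26_only": med26 - taf7,
        "taf7_only": taf7 - med26,
        "neither": universe - med26 - taf7,
    }


# Placeholder gene IDs; g7 is absent from the shared universe and is dropped.
parts = venn_partition(
    common_genes=["g1", "g2", "g3", "g4", "g5", "g6"],
    med26_downstream=["g1", "g2", "g3", "g7"],
    taf7_downstream=["g3", "g4"],
)
```

The four groups are disjoint and together cover the shared universe, so their sizes can be read directly as the Venn diagram's regions plus the "neither" count.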

Discussion

PIC stability and inactive PIC-like complexes differ at yeast and metazoan promoters

Although it is presumed that overall PIC structure and the functions of individual components are conserved, there are species-specific differences in the kinetic steps of transcriptional initiation as well as in the nature of stable complexes in vivo. In yeast, Mediator association with enhancers via recruitment by activator proteins is relatively stable. However, association with the promoter is transient because of near-immediate phosphorylation of the Pol II CTD by TFIIH, which results in Mediator dissociation and Pol II escape from the promoter (Jeronimo and Robert, 2014; Wong et al., 2014). Furthermore, when TFIIH-mediated phosphorylation of the Pol II CTD is inhibited, the Mediator associated with the promoter lacks the kinase module. As such, the PIC in wild-type yeast cells is very short-lived, and the stable PIC-like entity is a post-escape complex that contains GTFs but lacks Mediator and Pol II (Wong et al., 2014). In addition, unlike the situation in vitro, the PIC is highly unstable in yeast cells depleted of nucleotide precursors (Petrenko et al., 2019).

In contrast, in three metazoan species (human, mouse, fly), Mediator is not stably associated with activator-binding sites at either proximal or distal locations with respect to mRNA promoters. Instead, it is stably associated with promoters in a form that includes the kinase module. These observations are consistent with, and likely explain, (1) why Mediator associates with a much lower percentage of enhancers than of mRNA promoters, (2) why Mediator association at enhancers rarely, if ever, occurs in the absence of GTFs, and (3) why transcription from enhancers is less efficient than from mRNA promoters. Enhancers are often identified as nucleosome-depleted regions with acetylated histones (Heintzman et al., 2007; Ernst et al., 2011), which arise from activator proteins recruiting nucleosome remodeling complexes and histone acetylases. However, activator proteins at most distal enhancers do not stably recruit Mediator, even though they are able to recruit multiple chromatin-modifying activities. In contrast, yeast activators can efficiently recruit Mediator to enhancers even when PIC formation and transcription are blocked (Knoll et al., 2018; Nguyen et al., 2021). These considerations suggest that most mammalian enhancer regions lack an efficient promoter that permits formation of a stable PIC containing Mediator.

As Mediator containing the kinase module cannot interact with Pol II or support transcription in vitro (Elmlund et al., 2006; Knuesel et al., 2009), the presence of the kinase module at metazoan promoters suggests that this is a stable, non-functional PIC-like entity. In addition, as TFIIH-mediated phosphorylation of the Pol II CTD causes dissociation of Mediator from Pol II, the high level of Mediator at metazoan promoters suggests that TFIIH-mediated phosphorylation is relatively slow and/or inefficient compared with that in yeast. Thus, the active PIC in metazoans, which requires Pol II to displace the Mediator kinase module from the PIC, may also be short-lived, like PICs in yeast. However, PICs in yeast and metazoans are short-lived for different reasons, which are linked to the behavior of Mediator and to the nature of the inactive PIC-like complexes that are stable in vivo.

The molecular components of the presumed non-functional PIC containing the Mediator kinase module are unknown. Pol II is unlikely to be present in this non-functional PIC because it does not interact with the kinase-containing form of Mediator. In addition, most Pol II molecules are located at the pause site, which likely sterically inhibits Pol II association at the PIC. Because of this steric inhibition, it is also possible that the inactive PIC-like entity exists in two forms that differ with respect to the presence or absence of the kinase module. The absence of TFIIH from the non-functional PIC would nicely explain the presence of Mediator at the promoter, and TFIIH ChIP signals appear low when examined with multiple antibodies in different cell lines. However, it is unknown whether the low TFIIH ChIP signals reflect true occupancy or inefficient crosslinking.

Species-specific differences in TFIID

In yeast cells, considerable evidence suggests that there are two forms of transcriptionally active TBP, namely TFIID and a TAF-independent form (Kuras et al., 2000; Li et al., 2000; Petrenko et al., 2019). The relative utilization of these two forms, and hence the relative occupancy of TAFs and TBP, varies among promoters. The TAF-independent form could be TBP alone, especially because free TBP is present in cell-free extracts (Buratowski et al., 1988). In metazoans, free TBP has never been isolated from cell-free extracts and attempts to dissociate TBP from the TFIID complex have been unsuccessful, suggesting that free TBP does not exist in appreciable quantities in vivo.

Nevertheless, as in yeast, TAF:TBP occupancy ratios vary considerably among metazoan promoters, suggesting that two or more transcriptionally active forms of TBP exist. Furthermore, high and low TAF:TBP occupancy ratios are associated with different classes of genes, indicating that these ratios are not due to experimental error. Instead, differences in promoter sequences and/or activator proteins among these gene classes are presumably responsible. These multiple forms of transcriptionally active TBP could reflect TFIID-like complexes with different TAF subunit compositions, distinct conformations of TFIID that crosslink to the promoter with different efficiencies, or interactions with other complexes such as SAGA.

TFIID in flies differs from its yeast, human, and mouse counterparts in that it crosslinks downstream from TBP in addition to its more typical location that overlaps with TBP. Many fly promoters contain multiple promoter elements located downstream from the TATA and Initiator elements that make significant contributions to transcriptional activity (Louder et al., 2016; Baumann and Gilmour, 2017; Vo Ngoc et al., 2019). Such downstream promoter elements either do not exist or are less significant in yeast and mammalian cells. Presumably, the different pattern of TAF binding in flies reflects physical interactions of TAFs with the downstream promoter elements.

The behavior of Taf7 is perhaps the most dramatic species-specific difference in TFIID. In S. pombe (and presumably S. cerevisiae), Taf7 appears to behave simply as a subunit of a monolithic TFIID complex. In contrast, in at least two human cell lines, Taf7 and Taf1 occupancy is discordant at many promoters, most strikingly at those expressing non-coding RNAs.

These observations raise several unanswered questions. First, can TFIID associate with promoters in the absence of Taf7? This seems unlikely given the structure of TFIID (Patel et al., 2020), unless another protein, such as Taf7L, can take its place. Alternatively, low-Taf7 promoters might reflect occupancy by standard TFIID, whereas high-Taf7 promoters might reflect downstream Taf7 association in addition to standard TFIID. Second, how does Taf7 associate with downstream regions near the site of the Pol II pause? It is unknown which, if any, other Tafs behave like Taf7 and whether Taf7 associates on its own or as part of a different complex. It is also unknown which features of paused Pol II, or of some other entity, are required for downstream Taf7 association, although BRD4 and the P-TEFb kinase are plausible candidates. Third, what is the basis of the promoter specificity of Taf7 association, given that TFIID, Pol II, and P-TEFb have general roles in transcription? Differences in promoter or downstream sequences and/or gene-specific activator proteins must be involved, but the mechanism is unknown.

A complex relationship between Med26 and Taf7

Med26 has similar properties to Taf7 in that it can also associate with downstream regions near the Pol II pause site in the apparent absence of other subunits of the major complex to which it belongs. In addition, Med26 and Taf7 physically interact, leading to suggestions that they function together to control the transition between transcriptional initiation and elongation (Takahashi et al., 2011; Takahashi et al., 2015; Lens et al., 2017). However, the relationship between Med26 and Taf7 is enigmatic because their associations near the Pol II pause site appear to differ considerably among genes. While this conclusion is tempered by the different cell lines used, the locations of the PIC and paused Pol II are the same in these cell lines. Furthermore, the three questions raised above for Taf7 also apply independently to Med26, thereby making the connection between these two proteins even more complicated. Whatever the mechanisms involved, they can only occur in eukaryotic species that encode Med26 and that have this unusual non-TFIID-related function of Taf7. These observations provide yet another aspect of the Pol II transcription machinery that differs considerably among eukaryotic organisms.

Materials and methods

The list of datasets obtained from GEO (SRR numbers are the accession numbers of individual datasets) is given in Supplementary file 1. Sequence reads were mapped using Bowtie2, available through the Galaxy server (Penn State), with the following options: -s 0; -u 100000000; -5 11; -3 0; Phred+33 quality scores; --solexa-quals false; --int-quals false; -N 1; -L 22; -i S,1,1.15; --n-ceil L,0,0.15; --dpad 15; --gbar 4; --ignore-quals false; --no-1mm-upfront false; end-to-end (not --local) alignment mode; --score-min L,-0.6,-0.6; --ma 2; --mp 6,2; --np 1; --rdg 5; --rfg 5,3; -D 15; -R 2 (re-seeding); --seed 0; --non-deterministic false. Normalization was performed relative to the number of mapped reads. Mean occupancy curves were generated using Galaxy deepTools (Freiburg, Germany) as well as through the Penn State Galaxy server, scaled relative to the number of mapped reads and fragment size, and expressed as counts per million mapped reads. Occupancy peaks were called using MACS, available through the Penn State Galaxy server, with mfold bounds of 5 and 50, a bandwidth of 300–500 bp, and an FDR cutoff of 0.05. For the pausing index (the ratio of promoter-proximal Pol II occupancy to coding-region occupancy), activity in the coding region was calculated as the mean reads 2000–4000 bases downstream of the TSS (or over a shorter region for shorter genes). The most active genes were determined from TBP levels at the promoter or, when TBP data were unavailable, from the levels of another GTF.
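A minimal sketch of the pausing-index calculation described above, assuming per-base Pol II coverage for one chromosome held in a Python list. The 2000–4000 bp coding-region window follows the Methods; the promoter-proximal window (-50 to +300 bp relative to the TSS) and the short-gene fallback (averaging from the end of that window to the TTS) are hypothetical choices, because the text does not specify them. Because both windows come from the same sample, counts-per-million scaling cancels out of the ratio, as the toy example shows.

```python
import math


def to_cpm(coverage, total_mapped_reads):
    """Scale raw per-base counts to counts per million mapped reads."""
    return [c * 1e6 / total_mapped_reads for c in coverage]


def window_mean(coverage, tss, strand, rel_start, rel_end):
    """Mean coverage over offsets [rel_start, rel_end) from a 0-based TSS,
    measured in the direction of transcription."""
    if strand == "+":
        lo, hi = tss + rel_start, tss + rel_end
    else:
        lo, hi = tss - rel_end + 1, tss - rel_start + 1
    lo, hi = max(lo, 0), min(hi, len(coverage))
    if hi <= lo:
        return 0.0
    return sum(coverage[lo:hi]) / (hi - lo)


def pausing_index(coverage, tss, tts, strand, promoter=(-50, 300), body=(2000, 4000)):
    """Promoter-proximal mean divided by coding-region mean.
    The promoter window and short-gene fallback are assumptions."""
    gene_len = abs(tts - tss)
    body_end = min(body[1], gene_len)
    # Hypothetical fallback for short genes: start right after the promoter window.
    body_start = body[0] if body[0] < body_end else promoter[1]
    prom = window_mean(coverage, tss, strand, *promoter)
    coding = window_mean(coverage, tss, strand, body_start, body_end)
    return prom / coding if coding > 0 else math.inf


# Toy plus-strand gene: TSS at 1000, flat background with a promoter-proximal peak.
plus = [1.0] * 6000
plus[950:1300] = [10.0] * 350
pi_plus = pausing_index(to_cpm(plus, 2_000_000), tss=1000, tts=5500, strand="+")

# Mirror-image minus-strand gene: TSS at 4999, transcribed toward position 0.
minus = [1.0] * 6000
minus[4700:5050] = [10.0] * 350
pi_minus = pausing_index(minus, tss=4999, tts=0, strand="-")
```

Both toy genes give a pausing index of 10, whether or not the track was CPM-scaled, because the promoter-proximal signal is tenfold higher than the gene-body signal in each.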

Transcription start site (TSS) and transcription termination site (TTS) coordinates for human genes were obtained through the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables), using the hg38 assembly. Coordinates for mouse genes were obtained from Mouse Genome Informatics (http://www.informatics.jax.org/), using the GRCm38 (mm10) assembly. For D. melanogaster, dmel-all-r6.31 coordinates were retrieved from FlyBase (https://flybase.org/). For S. pombe, coordinates were obtained from the Schizosaccharomyces_pombe_all_chromosomes.gff3 file from PomBase (https://www.pombase.org/). GO analyses were performed via the Gene Ontology Resource (http://geneontology.org/). Human enhancer coordinates for the MM1S cell line were taken from Loven et al., 2013, which required mapping to hg18.
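Retrieving strand-aware TSS and TTS positions from the annotation files listed above is mostly a matter of reading the strand column correctly. The sketch below parses GFF3, the format of the PomBase file named above; it assumes genes appear as `gene` features carrying an `ID` attribute, which may not hold for every release, and its input lines are invented. UCSC Table Browser tables use 0-based start coordinates, whereas GFF3 is 1-based, so positions from different sources need a consistent convention before they are combined.

```python
def tss_tts_from_gff3(lines, feature="gene"):
    """Map gene ID -> (seqid, tss, tts, strand) from GFF3 lines.
    GFF3 coordinates are 1-based and inclusive, so the TSS is the start column
    for '+' genes and the end column for '-' genes."""
    records = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        gene_id = attrs.get("ID")
        if gene_id is None or strand not in ("+", "-"):
            continue
        if strand == "+":
            records[gene_id] = (seqid, start, end, strand)
        else:
            records[gene_id] = (seqid, end, start, strand)
    return records


# Invented example lines; real files may use different feature types and attributes.
example = [
    "##gff-version 3",
    "I\texample\tgene\t100\t900\t.\t+\t.\tID=geneA;Name=a1",
    "II\texample\tgene\t2000\t3500\t.\t-\t.\tID=geneB",
    "II\texample\tmRNA\t2000\t3500\t.\t-\t.\tID=geneB.1;Parent=geneB",
]
genes = tss_tts_from_gff3(example)
```

For the minus-strand toy gene, the TSS is reported as 3500 (the end column) and the TTS as 2000; the mRNA line is skipped because only `gene` features are requested.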

Boxplots were generated using Plotly Chart Studio (https://plot.ly/create/box-plot/). Venn diagrams were made using the Academo Venn diagram generator (https://academo.org/demos/venn-diagram-generator/). The ChIP-seq data were visualized using the Integrated Genome Browser with the assistance of a BED-to-SGR file converter kindly provided by Zarmik Moqtaderi.
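The internals of the BED-to-SGR converter mentioned above are not described, so the following is only a hypothetical stand-in. It assumes SGR is three tab-separated columns (chromosome, position, value), the simple graph format read by the Integrated Genome Browser, and it counts the midpoint of each BED interval in fixed 10 bp bins; the bin size and the midpoint rule are illustrative choices.

```python
from collections import Counter


def bed_to_sgr(bed_lines, bin_size=10):
    """Count BED interval midpoints in fixed-width bins and emit SGR rows
    (chromosome<TAB>position<TAB>value). Bin size and the midpoint rule are
    illustrative assumptions."""
    counts = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        midpoint = (int(start) + int(end)) // 2  # BED starts are 0-based
        counts[(chrom, midpoint // bin_size)] += 1
    rows = []
    for (chrom, b), n in sorted(counts.items()):  # lexicographic chromosome order
        rows.append(f"{chrom}\t{b * bin_size + bin_size // 2}\t{n}")
    return rows


reads = ["chr1\t100\t150", "chr1\t102\t148", "chr1\t300\t400", "chr2\t0\t20"]
sgr = bed_to_sgr(reads)
```

The two overlapping chr1 reads share a midpoint (125) and collapse into one row with a count of 2. Chromosomes are sorted as strings, so "chr10" would precede "chr2"; a natural-order sort could be substituted if the viewer requires it.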

Data availability

All datasets and their accession numbers are listed in Supplementary file 1.

References

    Fromm M, Berg P (1982) Deletion mapping of DNA regions required for SV40 early promoter function in vivo. Journal of Molecular and Applied Genetics 1:457–481.
    Jacob F, Ullman A, Monod J (1964) The promoter, a genetic element necessary for expression of an operon. Comptes Rendus Hebdomadaires Des Séances de l’Académie Des Sciences 258:3125–3128.
    Orphanides G, Lagrange T, Reinberg D (1996) The general transcription factors of RNA polymerase II. Genes & Development 10:2657–2683.
    Scaife J, Beckwith JR (1966) Mutational alteration of the maximal level of lac operon expression. Cold Spring Harbor Symposia on Quantitative Biology, pp. 403–408. https://doi.org/10.1101/sqb.1966.031.01.052

Decision letter

  1. Naama Barkai
    Senior and Reviewing Editor; Weizmann Institute of Science, Israel

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Thank you for submitting your article "Comparison of transcriptional initiation by RNA polymerase II across eukaryotic species" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Naama Barkai as the Senior and Reviewing Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another. They all agree that the paper is strong and interesting but that some revisions are required, in particular concerning Mediator association with enhancers. I also attach some notes from the communications between the reviewers, which may help in preparing the revision. In particular, some of the points may have several sides, and the revision appears quite open to discussion. Please address your point-by-point reply to all the detailed comments appended below and not to the informal discussion.

From the discussion:

Some specific comments:

1) Line 337, I suspect they mean asymmetric with respect to the TSS, but they need to clarify this. And it would definitely help to normalize the peaks so they are not so "flat"; the magnitudes for different protein ChIP signals can't be directly compared in any event.

2) I actually liked the presentation in Figures 5B-C; it is easier to discern the shape of the distribution with this than with the scatterplots. Might be helpful to add such graphs for the scatterplot data; it would clarify how outliers are defined, perhaps.

3) It would help to specify what needs fixing in Figure 6 supp1D (I agree it is not a great figure as shown). Does it need a white background or taller y-axes?

It sounds like the enhancer part should be revised substantially. That being said, this remains a quick and easy fix. They just need to use something better than CTCF, which means pretty much anything else. ESRRB in mouse ESC is a good example, but I think that they need to do more than just picking one activator. There are so many datasets available and so much information about their motifs. The data at enhancers should be plotted on the activator binding sites, if they want to make the point that Mediator is not there. Currently, they align on "…the midpoint of enhancer coordinates (Loven et al., 2013), which coincides with the location of GTFs". It is not clear what that means. Given the question they are after, it would make sense to map two ways:

1) On the activator binding sites and

2) On the TSS of the eRNA. If their conclusion holds true, Mediator should line up better on the eRNA coordinates than on the activator binding site coordinates. I also agree that in some cases (enhancers are good examples) showing heatmaps and browser views (in addition to metaplots) would be useful.

– I am not necessarily against the histogram representation (Fig5BC), although I think a scatter plot is more raw, and hence, less prone to hiding stuff. I am just concerned that they switch representation for this figure. It sounds like a uniform way of showing the data would be better.

– Regarding Fig6supp1D, I think it has several problems: Needs a y-axis that one can read (and for both panels); needs a scale bar that one can read; not clear where the genes start and end; why the grey background. The Figures, in general, are not very pleasant to the eye but it gets to another level with this one.

– I actually like Kevin's definition of promoters. It also has the merit of being clearly defined at the beginning of the paper. I am not sure what conclusion would change so drastically if using a different definition, but defining the promoter as the region where the PIC assembles, excluding the activator binding sites, allows discrimination between two functionally distinct elements (the promoter and the activator binding sites). Except at enhancers (where I think they are probably right but just need to show it better), the difference between yeast and metazoans is clear: in yeast, Mediator is stably associated at the UASs and resides within the PIC for a very short period; in metazoans, Mediator is not stably associated with the upstream activator binding sites (where things like SP1 or GAGA bind), but rather within the PIC (although this "PIC" most likely does not contain Pol II, based on the presence of CDK8). What bugs me a little is rather the fact that they call the yeast UASs "enhancers". I think that this is where the confusion may arise. UASs are not enhancers. They do not work at a distance. The UASs are more reminiscent of the promoter-proximal regulatory elements also found in higher eukaryotes (where things like SP1 or GAGA bind).

I also still think that the Med26/Taf7 part (last figure) should be removed to lean on the safe side.

The suggestion for mapping the enhancer data two ways is right on point; I think this is what should be communicated to the authors. Also his statement that yeast UASs are more like the promoter-proximal regulatory elements is on target.

With regard to the 5B-C presentation versus scatterplots, probably what should be presented to the authors is that either both formats be used or at least that scatterplots corresponding to the bar graphs also be shown, to allow cross-comparison of data.

I could go with removing the Med26-TAF7 section, although I found it interesting. But maybe I'm missing the point on the promoter definition. There are clearly PICs in enhancers. But Mediator is also found at some activator binding sites in enhancers without adjacent PICs. The problem is that when you average, these counterexamples blend in, but they are very obvious on browser plots. Thus, by including PICs/promoters at enhancers, the argument is not as solid. Additionally, there are typically two PICs at many mRNA-encoding genes, one oriented into the gene and one antisense. The antisense PIC varies in strength but is often similar in abundance to PICs at enhancers. These antisense PICs are often in the same locations as proximal promoters, so when you line these up, Mediator can align with some activators. So by using the actual gene core promoters, you can easily get the results shown. But once you add enhancers and divergent promoters, the results are not as clear. The reason I picked ESRRB is that the biochemistry strongly supports a direct Mediator binding mode, as for many other nuclear receptors. This relationship is not as strong with other activators. Bottom line: Mediator can bind to some activators independently of the PIC in mammalian cells.

The enhancer data needs more work. I think that if they align the data both ways as I suggested in my previous comments, this should do the trick. They will see better-defined aggregate peaks when aligning on the TF binding sites if your model is the prominent one. If, however, their model is correct, the peak will be better defined when aligning data on the enhancer TSSs. Showing heatmaps in addition to metaplots would help a lot, especially if you are both right and it depends on the actual enhancer (which is quite possible and would be interesting). This could then be exemplified with browser snapshots as suggested.

The fact that enhancers tend to have two relatively equal PICs should help in this analysis: aligning on the TF binding sites should lead to a valley in the center (where TFs bind) with peaks on each side (their model) or a peak in the middle (other model). Of course, the resolution may not be good enough to see the valley, but at the very least, the two methods should reveal which one gives the better-defined peak.
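A hedged sketch of the two-way alignment test proposed above: build an aggregate (metaplot) profile around each candidate anchor set, such as activator binding sites versus eRNA TSSs, and compare how much signal falls near the center. The `center_fraction` score and the ±50 bp window are hypothetical choices for illustration, and the coverage track is a synthetic toy, not data from the manuscript. A TF-site alignment flanked by two PICs would, as noted above, give two side peaks, which this score would rank as poorly centered.

```python
def aggregate_profile(coverage, anchors, flank=500):
    """Mean per-base coverage in windows [anchor - flank, anchor + flank],
    skipping anchors too close to the ends of the track (unstranded)."""
    width = 2 * flank + 1
    total = [0.0] * width
    used = 0
    for a in anchors:
        if a - flank < 0 or a + flank >= len(coverage):
            continue
        for i in range(width):
            total[i] += coverage[a - flank + i]
        used += 1
    return [t / used for t in total] if used else total


def center_fraction(profile, half_width=50):
    """Fraction of aggregate signal within +/- half_width of the window center;
    a higher value means the signal lines up more tightly on the anchors."""
    c = len(profile) // 2
    total = sum(profile)
    return sum(profile[c - half_width:c + half_width + 1]) / total if total else 0.0


# Synthetic track: two 41 bp "Mediator" peaks centered at 1000 and 3000.
coverage = [0.0] * 4000
for peak in (1000, 3000):
    for pos in range(peak - 20, peak + 21):
        coverage[pos] = 1.0

on_peaks = center_fraction(aggregate_profile(coverage, [1000, 3000]))
offset_150 = center_fraction(aggregate_profile(coverage, [1150, 2850]))
```

Anchors placed on the peaks capture all of the aggregate signal in the central window, whereas anchors displaced by 150 bp capture none of it; with real data, the anchor set giving the higher score would be the better-aligned one. The per-anchor windows built inside `aggregate_profile` are also the rows a heatmap would display.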

Alignment along with heat maps/browser scans should determine whether the model is correct and how generally it applies. It might help the authors if you could provide a reference or two regarding PICs sometimes being seen at enhancers and sometimes not. Or browser scans, or both.

Reviewer #1:

In this manuscript, Petrenko and Struhl used previously published ChIP-seq datasets for Mediator subunits and PIC components from different organisms to highlight similarities and differences in the composition of protein complexes assembled on promoters in different cells and organisms. Although purely descriptive (no perturbations are included in this study), the manuscript highlights several interesting differences between yeast (where Mediator and the PIC have been most studied in vivo) and higher eukaryotes. The manuscript proposes ideas that, if confirmed in subsequent studies, are potentially paradigm-changing. The manuscript would be improved by strengthening the analyses of enhancers and by removing some aspects that are not as convincing (due to the unavailability of key datasets from the same cells).

The authors make the provocative proposal that Mediator at enhancers is not found at transcription factor binding sites (where it is classically thought to be recruited) but rather at the "promoter" of the flanking eRNAs. If true, this represents a change in paradigm. This point, however, is weak, but it could probably be strengthened relatively easily. The conclusion is weak for two related reasons. First, the choice of the dataset. The authors used CTCF as a representative transcription factor and compared its occupancy to that of Mediator at enhancers. It is not clear why the authors chose CTCF here. Transcription factors have been abundantly profiled by ChIP-seq in mouse and human cells, so more "canonical" TF datasets could have been used. CTCF does have a TF role, but it is also very well known for its role as an architectural protein at insulators and chromatin domain boundaries. These non-TF functions of CTCF may confound the analysis here. Second, the CTCF peak is not well defined. The authors claim that its occupancy does not coincide with that of Mediator. While this is clear at promoters (Figure 1-supp1C), it is not obvious at enhancers (Fig1D), where CTCF has a very noisy signal (it does not generate a clear peak). Hence, for both conceptual and technical reasons, CTCF appears a bad choice. Given the importance (and somewhat provocative nature) of the point they are making in this analysis, the authors should solidify their claim. Adding ChIP-seq data for other TFs would most likely help a lot. The authors could also map the density of TF binding motifs from databases such as ORegAnno, JASPAR, or TRANSFAC at enhancers.

The interplay between Taf7 and Med26 (described in Fig7) is potentially of high interest, but the fact that no dataset exists for both these factors from the same cells makes their comparison quite hazardous, especially when scrutinizing differences between genes (because different genes are expressed in different cells). This potentially confounds the analysis shown in Fig7. In fact, no clear conclusion came from these analyses. So, in order to refocus the manuscript on its strengths, I suggest that the section related to the relationship between Med26 and Taf7 (essentially Fig7 and its supplements) be removed.

Reviewer #2:

Strengths: In the past several years, application of ChIP-seq together with novel approaches for depleting essential proteins, such as those involved in transcription, has yielded new insights into the fundamental dynamics of transcriptional activation, particularly in the model organism Saccharomyces cerevisiae. More limited data has been obtained for metazoan organisms, and comparison of mechanisms among species is rarely reported. Here, the authors make use of publicly available datasets to ask pointed questions regarding differences and similarities in fundamental mechanisms of transcription initiation in yeast, flies, mouse and human cells. Specifically, they address Mediator dynamics, the relationship between TBP and TFIID, interactions between Tafs and promoter sequences, and interaction between Mediator and TFIID subunits at sites of paused polymerase. These analyses provide interesting insights and, just as importantly, raise new questions and point to experimentally addressable gaps in our understanding.

Weaknesses: The authors were of course constrained by the data that is actually available, which for metazoans is limited. For example, ChIP-seq data was analyzed for only two TFIID subunits in human and flies, and none for mouse. Additionally, some of the datasets used did not include replicates. These limitations make it difficult to ascertain how robust the derived conclusions really are; nonetheless, most of the conclusions seem well-founded based on the data that is available.

One conclusion that could be argued is that in metazoan cells, "at so-called enhancers, Mediator is stably associated with promoters, not activator binding sites". This conclusion is based on Mediator and Cdk7, a component of TFIIH, binding at sites distinct from CTCF at enhancers in MM1S cells. However, CTCF functions mainly as an insulator, not an activator, so this conclusion is suspect.

Another weakness is the heavy reliance on line graphs representing average occupancies of factors, which obscures possible effects of small numbers of targets with very high occupancies. Inclusion of heat maps and more browser screenshots would be helpful. Gene ontology analyses are also presented rather superficially; such enrichments are not always as straightforwardly interpreted as simple p-values would suggest. Finally, it is not so easy to discern which datasets contributed to the individual figures.

1) In the abstract, it is stated that in human, mouse, and fly cells, Mediator with its kinase module stably associates with promoters, but not with activator-binding sites. Good evidence is presented to support this statement with regard to activator-binding sites that are close to promoter sites. However, enhancers are considered here as a second kind of promoter, rather than as primary activator-binding sites, for reasons given at the beginning of the Results section. But in comparing metazoan cells to yeast, is this the correct comparison? Given the evidence for enhancer-promoter loops in metazoans, I would think that metazoan enhancers should be considered more like UASs in yeast (i.e., the conventional view). In this case, yeast and metazoan genes still behave differently, inasmuch as Mediator occupancy is observed at metazoan promoters but not at yeast promoters. However, the difference is not quite as stated in the manuscript.

2) There is a strong reliance on line graphs showing average occupancy of the factors examined and on scatterplots of occupancies. Line graphs can obscure cases in which a small number of data points make a very strong contribution; heat maps can help in ascertaining whether this is the case. The heat maps in Figure 5-figure supplement 1 (the only ones shown) make this case, as line graphs based on these heat maps would not distinguish the different behavior of the histone cluster genes. The line graphs also appear heavily smoothed. I was unable to determine how this smoothing was done from the terse description of methods. It appears that Bowtie, which has been superseded by Bowtie2 on Galaxy, may have been used for mapping; I could not find most of the parameters indicated in the Methods in the description of Bowtie2 on Galaxy. I'm not objecting to use of Bowtie if that was used, but a better explanation of the parameters used should be provided.

It was also unclear to me how occupancy values used in the scatterplots were calculated. Were they based on simple normalized coverage (i.e. not relative to any control data)? Were they derived from MACS or calculated another way? Was anything done to ensure that aberrant data was removed? This is not always a problem but can be; as an example, TBP occupancy in yeast is very high at tRNA genes, so that Pol II transcribed genes that abut tRNA genes can give aberrantly high TBP occupancy values, depending on the details of how occupancy is calculated. Additional browser screenshots would be helpful to support conclusions in many of the figures. This is especially true when the data comes from multiple sources as it does here.

3) Gene ontology analyses should be examined more critically. Some of the FDR p-values reported are not really so impressive (> 10^-6, which for a hypergeometric test is not that great). The authors should report the expected (under a random distribution) and observed fractions of genes found in a given category. Are the enriched categories presented in the various figures and tables the only ones found, or are there others not shown? Browser scans would be especially nice here to visualize differences, for example, between genes having high vs. low ratios of Med1/Med30 in Figure 3B.

4) It is difficult to figure out where the data used in the individual figures is from. This should be stated explicitly (i.e. references given) in the figure legends.

5) Figure 5 uses replicates of TBP ChIP as a control for the distribution spread of Taf2/TBP ratios in Drosophila. A better control would be to compare the distribution of two different subunits of TFIID that are expected to always be present, as that could control for the use of different antibodies. Unfortunately, the data for Taf1 are from S2R cells and for Taf2 from Kc167 cells, so it is not clear that they should correlate perfectly. There may be no fix for this, but the limitations of the control should be recognized. The authors argue that GO enrichment is evidence for the varied Taf2/TBP ratios being meaningful, but there are artifacts that could also produce such enrichments without affecting the spread of replicate data, for example, effects due to nearby chromosomal locations of particular groups of genes, or to particular elements preferentially neighboring genes in a GO category.

Reviewer #3:

This manuscript employs publicly available genomic ChIP-seq data to compare and contrast the composition of RNA Polymerase II pre-initiation complexes (PICs) among different model organisms. The data reveal similarities and differences that investigators in the gene regulation field should find interesting, and that may impact the regulatory mechanisms of different classes of genes. Of particular interest are the claimed differences in Mediator co-activator binding in yeast versus metazoans, and an unusual mode of binding of the TAF7 subunit of TFIID. Most of the conclusions were supported by the data, which employed standard analyses of ChIP-seq data with solid statistical correlation approaches. The Mediator data were limited by the absence of a more extensive analysis correlating specific activator binding in the model organisms with Mediator binding and PIC assembly, particularly at enhancers where small levels of PICs are found. This weakened the general conclusion that Mediator only binds when PICs are present in higher eukaryotic cells.

In their manuscript, Petrenko and Struhl examined how Pol II preinitiation complex (PIC) components behave differently across eukaryotic species. By analyzing publicly available genomic data, the authors discovered how specific subunits correlate with each other and how distinct versions of Mediator/TFIID may be related to different gene categories in yeast, fly, human and mouse. For Mediator, the authors found that this co-activator complex is enriched at enhancers in yeast but mainly binds promoters in the other three species, although this is a debatable point (see below). The authors also argue that TFIID exists in different forms at different classes of genes. For example, the authors find Taf7 to be a special TFIID subunit in humans because it behaves differently from other TFIID subunits such as TAF1. The relationship between Taf7 and the mammalian-specific protein Med26 was also studied, based on the Conaway laboratory's Cell paper from a few years back.

Overall, this manuscript reports interesting discoveries that we feel will be of use to the field. For example, TFIID/Mediator forms may be gene-specific and PIC structures display differences between species. However, we are concerned with the definition of promoters used in the manuscript (see below). A different definition will change one of the main conclusions dramatically. Thus, this concern should be addressed more carefully.

1. Figure 1C. What does Med2 refer to in mammalian Mediator? Why are there 'Med2 (V6.5)' data in this figure? Is this Med26 or Med29? Also, we would like to see Pol II data on this graph.

2. Med1 data are shown in both Figure 1C and Figure 1 —figure supplement 1B. These data are from essentially the same cell line (E14 and E14tg2a) and plotted at the same genomic loci (10,000 most active mRNA promoters). Why is the Med1 summit located at the TSS in Figure 1C but in Figure 1 —figure supplement 1B, it is greater than 100 bp downstream of the TSS?

3. Paragraph starting on page 8, line 179. This section is confusing. First, on page 7, the authors define promoters as any genomic region bound by GTFs and Pol II (lines 149-150). According to this definition, the authors appear to claim that the distal enhancers are all promoters because of Cdk7 binding, as shown in Figure 1D. This led to the authors' conclusion that 'Thus, even at so-called enhancers, Mediator is stably associated with promoters' (lines 186-187). However, Figure 1D shows the average data at all enhancers. There may be a portion of enhancers with no Cdk7 binding. It is risky to make such absolute statements. Second, the authors also conclude that Mediator is not bound at activator binding sites at enhancers (line 187). However, what is the definition of 'activator binding sites'? We did not find any data relating to this point. There are mounds of activator binding data for enhancers in many cell lines, particularly ES cells. Shouldn't the authors analyze some of these data before making such strong statements? Note that ESRRB is an activator in mouse V6.5 ES cells. ESRRB binds Mediator directly, as shown by MS, in vitro biochemical experiments, and other approaches. A published analysis of ESRRB at annotated enhancers shows some examples where Mediator (Med1) is bound together with small amounts of PIC components, including TAF1, TFIIB, and Pol II, and numerous other examples where there is no evidence of a PIC. Third, what is the point of showing CTCF data? CTCF is neither a GTF nor an activator. Fourth, the subtitle in line 158 states 'mammalian Mediator with its kinase module … distal locations', but where are the ChIP-seq data for CKM subunits at the distal locations in Figure 1D? How can the authors draw the conclusion stated in the subtitle?

4. Page 9, lines 209-210. We are confused by the statement, 'Thus, unlike the case in yeast, mammalian Mediator association with promoters is not strictly correlated with association of other PIC components.' To us, this sentence implies that in yeast, Mediator and PIC occupancy should correlate at promoters. However, in wild-type yeast, Mediator does not stably associate with promoters, as shown on page 8, line 160. Please clarify.

5. Figure 5C and Figure 5—figure supplement 2A. The authors show data of TBP vs TFIIB but don't seem to describe them in the manuscript. Also, in Figure 5—figure supplement 2A, in panel with 'TBP vs TFIIB', the high and low ratio dots are not highlighted as in the other panels.

6. Page 14, lines 322-323. 'However, some promoters display stronger than expected Taf7 occupancy relative to TBP and Pol II occupancy.' Where are the Pol II data shown?

7. Figure 6E and F. What is the number of plotted genes for each panel?

https://doi.org/10.7554/eLife.67964.sa1

Author response

New enhancer analyses: The main concern about our enhancer analysis was valid, and we were aware of the issues raised. Analyses at enhancers are trickier than one might expect, for five reasons. First, there are few high-quality Mediator datasets with good signals at enhancers. In retrospect, and as we discuss below, this hints at the conclusion. Second, the high-quality Mediator datasets are from cell lines that typically lack datasets for PIC components. That limits the datasets we can use, and it is why we used TFIIH as the PIC component. Third, unlike promoters, where the TSS is used for alignment, it is not obvious how to align enhancers. Fourth, enhancers function bidirectionally and often generate bidirectional transcripts, so there is no way to distinguish upstream from downstream. Consequently, many analyses yield symmetric data around the alignment point, making it hard to see differences between activator binding sites and the PIC. Fifth, bidirectionality causes another problem: the two divergent promoters are often too close together to be resolved separately. As a result, the GTF peaks appear to map to the center of the enhancer, which coincides with the activator peaks. In these cases, we cannot address whether Mediator is recruited by the activator or by the PIC/promoter.

New Figure 2B. We circumvented the alignment and bidirectionality issues by measuring the distances between the peak summits of Mediator, the PIC component TFIIH, and the E2F/DP1 activator (2 different antibodies) at both enhancers and mRNA promoters. As a control, the median distance between summits of the same factor in biological replicates is ~45 bp, reflecting experimental variation in peak summits. The median distance between Mediator and the PIC components tested is 48 bp, confirming that Mediator coincides with the PIC. However, the median distance between Mediator and the activators tested is ~90 bp, indicating that Mediator is not associated with the activator. So, at both promoters and enhancers, Mediator localizes with the PIC, not with activators bound at their target sites.
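The pairwise summit comparison can be sketched in a few lines. The response does not state how summits of different factors were paired, so this sketch assumes each query summit is matched to the nearest summit of the other factor on the same chromosome. The function names, the input format (chromosome name mapped to summit coordinates), and the 500 bp pairing cutoff are hypothetical choices for illustration only.

```python
from bisect import bisect_left
from statistics import median


def nearest_distances(query, reference):
    """Distance (bp) from each query summit to the closest reference summit.

    query, reference: dict mapping chromosome name -> list of summit positions.
    Query chromosomes with no reference summits are skipped.
    """
    distances = []
    for chrom, positions in query.items():
        ref = sorted(reference.get(chrom, []))
        if not ref:
            continue
        for pos in positions:
            i = bisect_left(ref, pos)
            # The closest reference summit is ref[i] (at/after pos) or ref[i - 1] (before pos).
            candidates = []
            if i < len(ref):
                candidates.append(ref[i] - pos)
            if i > 0:
                candidates.append(pos - ref[i - 1])
            distances.append(min(candidates))
    return distances


def median_summit_distance(query, reference, max_dist=500):
    """Median nearest-summit distance, keeping only pairs within max_dist bp (hypothetical cutoff)."""
    paired = [d for d in nearest_distances(query, reference) if d <= max_dist]
    return median(paired) if paired else None
```

Using replicate summits of a single factor as the reference gives the replicate baseline (the ~45 bp figure above), against which the Mediator-PIC and Mediator-activator medians would be compared.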

New Figure 2A. Enhancers are DNase hypersensitive and carry acetylated histones because activators recruit nucleosome remodelers and histone acetylases; a typical cell line has ~150,000 such enhancers. We measured Mediator occupancy at mRNA promoters and enhancers. The top 10,000 mRNA promoters and the top 10,000 enhancers have similar occupancy values. However, the median Mediator occupancy at “all” enhancers is far lower. Moreover, even this low value is mostly due to contributions from the top 10,000 enhancers. In addition, our definition of “all” enhancers included only the ~60,000 with any detectable signal; hence it excluded the ~100,000 enhancers where no signal was detected. This observation is striking because the vast majority of the 150,000 enhancers presumably recruit the chromatin-modifying activities (i.e. the basis for the chromatin properties of enhancers), yet recruit Mediator at very low or undetectable levels. This is dramatically different from the situation in yeast, where activator proteins recruit normal levels of Mediator even when PIC formation and transcription are blocked.

New Figure 3B. We analyzed the relationship between Mediator, Cdk7, and Pol II at distal enhancers. The correlation of Mediator occupancy with either Cdk7 or Pol II occupancy is strong (R = 0.6) and only slightly lower than the correlation between the PIC components Cdk7 and Pol II at distal enhancers (R = 0.7) and the correlations at mRNA promoters (Figure 2A). In addition, and in direct contrast to the situation in yeast, there are few, if any, enhancers at which Mediator is associated but GTFs are not.

Conclusion. These three new analyses provide independent and conclusive evidence that mammalian Mediator is stably recruited by the PIC, but not by activator proteins bound to their target sites. Of course, we believe that activator proteins interact with Mediator, but the stable association of Mediator is with the PIC in mammalian cells and with activators in yeast. Our results strongly suggest that PIC levels, Mediator occupancy, and transcription within enhancers are usually low due to the absence of a good promoter. In this view, the small subset of enhancers with high Mediator and Pol II occupancy have good promoters in the vicinity of the activator binding sites.

Reviewer #1:

In this manuscript, Petrenko and Struhl used previously published ChIP-seq datasets for Mediator subunits and PIC components from different organisms to highlight similarities and differences in the composition of protein complexes assembled on promoters in different cells and organisms. Although purely descriptive (no perturbations are included in this study), the manuscript highlights several interesting differences between yeast (where Mediator and the PIC have been most studied in vivo) and higher eukaryotes. The manuscript proposes ideas that, if confirmed in subsequent studies, are potentially paradigm-changing. The manuscript would be improved by strengthening the analyses of enhancers and by removing some aspects that are not as convincing (due to the unavailability of key datasets from the same cells).

The authors make the provocative proposal that Mediator at enhancers is not found at transcription factor binding sites (where it is classically thought to be recruited) but rather at the "promoter" of the flanking eRNAs. If true, this represents a paradigm change. This point is currently weak, but it could probably be strengthened relatively easily. The conclusion is weak for two related reasons. The first is the choice of dataset. The authors used CTCF as a representative transcription factor and compared its occupancy to that of Mediator at enhancers. It is not clear why the authors chose CTCF here. Transcription factors have been abundantly profiled by ChIP-seq in mouse and human cells, so more "canonical" TF datasets could have been used. CTCF does have a TF role, but it is also very well known for its role as an architectural protein at insulators and chromatin domain boundaries. These non-TF functions of CTCF may confound the analysis here. The second is that the CTCF peak is not well defined. The authors claim that its occupancy does not coincide with that of Mediator. While this is clear at promoters (Figure 1-figure supplement 1C), it is not obvious at enhancers (Figure 1D), where the CTCF signal is very noisy and does not form a clear peak. Hence, for both conceptual and technical reasons, CTCF appears to be a poor choice. Given the importance (and somewhat provocative nature) of the point they are making in this analysis, the authors should solidify their claim. Adding ChIP-seq data for other TFs would most likely help a lot. The authors could also map the density of TF binding motifs at enhancers using databases such as ORegAnno, JASPAR, or TRANSFAC.

We agree that CTCF was a non-optimal choice and hence have removed it. Instead, we now analyze E2F and DP1, a heterodimeric activator.

The interplay between Taf7 and Med26 (described in Figure 7) is potentially of high interest, but because no dataset exists for both factors from the same cells, comparing them is quite hazardous, especially when scrutinizing differences between genes (different genes are expressed in different cells). This potentially confounds the analysis shown in Figure 7. In fact, no clear conclusion emerged from these analyses. So, to refocus the manuscript on its strengths, I suggest removing the section on the relationship between Med26 and Taf7 (essentially Figure 7 and its supplements).

The original paper recognized and discussed the problem that there are no datasets where Taf7 and Med26 are examined in the same cell line. Nevertheless, we would like to keep this analysis in the paper as it is interesting and worthy of public scrutiny. We have further softened the statement in the abstract. I note that the Taf7 and Med26 data is always compared to Pol II and TBP in the same cell line and that Pol II at most genes is in the identical location in both cell lines. Hence, although cell-type-specific effects could be involved in the observations, these are unlikely to be mediated through the PIC and paused Pol II per se, as these are the same in both cell lines. If the reviewers insist, we will delete this section from the paper, but we would certainly prefer that it remain.

We thank Reviewer 1 for the detailed list of suggestions in the text, almost all of which were heeded. Regarding the old Figure 6C (now 7C), we plotted TAF:TBP ratios in order to know not just the relation of one to the other but also when one of them was high relative to TBP (and thus PIC levels) and when one of them was low (i.e., depleted relative to TBP and thus PIC levels).

Reviewer #2:

1) In the abstract, it is stated that in human, mouse, and fly cells, Mediator with its kinase module stably associates with promoters, but not with activator-binding sites. Good evidence is presented to support this statement with regard to activator-binding sites that are close to promoter sites. However, enhancers are considered here as a second kind of promoter, rather than as primary activator-binding sites, for reasons given at the beginning of the Results section. But in comparing metazoan cells to yeast, is this the correct comparison? Given the evidence for enhancer-promoter loops in metazoans, I would think that metazoan enhancers should be considered more like UASs in yeast (i.e., the conventional view). In this case, yeast and metazoan genes still behave differently, inasmuch as Mediator occupancy is observed at metazoan promoters but not at yeast promoters. However, the difference is not quite as stated in the manuscript.

See the enhancer section above. The comment about yeast UASs vs. enhancers addresses an important issue that I’m planning to write about in a perspective. People often view yeast UASs and mammalian enhancers as different because yeast UASs do not work at long distances or when downstream of the promoter (Struhl, 1984; Guarente and Hoar, 1984; yes, this was a long time ago). However, an underappreciated paper (Petrascheck et al., 2015 NAR) shows that a UAS can function downstream and at long distances in yeast when it is connected near the promoter via an artificial loop. Other studies indicate that loops in mammalian cells do not occur between the enhancer and what is called the core promoter (what I call the promoter), but rather between the enhancer and activator sites near the promoter (Nolis et al., 2009 PNAS; Deng et al., 2012 Cell), and a very recent paper from Mike Carey (Sun et al., 2021 Genes Dev) indicates that loop formation is independent of the PIC. So, comparing yeast UASs and enhancers is appropriate, as the difference is related to the looping functions of activators, not to the PIC or activation per se.

2) There is a strong reliance on line graphs showing average occupancy of the factors examined and on scatterplots of occupancies. Line graphs can obscure cases in which a small number of data points make a very strong contribution; heat maps can help in ascertaining whether this is the case. The heat maps in Figure 5-figure supplement 1 (the only ones shown) make this case, as line graphs based on these heat maps would not distinguish the different behavior of the histone cluster genes. The line graphs also appear heavily smoothed. I was unable to determine how this smoothing was done from the terse description of methods. It appears that Bowtie, which has been superseded by Bowtie2 on Galaxy, may have been used for mapping; I could not find most of the parameters indicated in the Methods in the description of Bowtie2 on Galaxy. I'm not objecting to use of Bowtie if that was used, but a better explanation of the parameters used should be provided.

It was also unclear to me how occupancy values used in the scatterplots were calculated. Were they based on simple normalized coverage (i.e. not relative to any control data)? Were they derived from MACS or calculated another way? Was anything done to ensure that aberrant data was removed? This is not always a problem but can be; as an example, TBP occupancy in yeast is very high at tRNA genes, so that Pol II transcribed genes that abut tRNA genes can give aberrantly high TBP occupancy values, depending on the details of how occupancy is calculated. Additional browser screenshots would be helpful to support conclusions in many of the figures. This is especially true when the data comes from multiple sources as it does here.

We disagree with this comment about line graphs and scatterplots, but in fact many of the desired scatterplots are already in the Supplemental Figures. Line graphs are crucial for the experiments where we map the relative locations of the factors. The line graphs were generated with the deepTools plotting tool in Galaxy and are not smoothed. Heat maps (and scatterplots) are inappropriate for this purpose. The Reviewer is correct that we actually used Bowtie2, and we now provide more details in the Methods. As the Reviewer suspected, occupancy values were based simply on normalized read-count coverage. Our new pairwise peak summit analysis (see above and new Figure 2B) is an independent way to obtain the same kind of information. Scatterplots are extremely useful for comparing the relative occupancies of factors at a given locus because every locus is included, so one can obtain an overall correlation as well as identify “outliers” of interest, which is done throughout the paper. In fact, scatterplots would easily identify the different behavior of the histone genes. Heat maps are less useful for these purposes because the loci are ordered by some parameter, which can easily hide outliers. Of course, heat maps certainly are useful for other purposes.
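For readers who want to reproduce this kind of scatterplot comparison, here is a minimal sketch of occupancy as normalized read-count coverage, together with a correlation coefficient across loci. The response specifies only normalized coverage of the read counts; the window half-width, counts-per-million scaling, one coordinate per read, and the use of Pearson correlation are assumptions, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
import math


def window_cpm(read_positions, total_reads, center, half_width=250):
    """Occupancy at one locus: reads within center +/- half_width, per million mapped reads.

    read_positions must be sorted and hold one coordinate per read
    (e.g. an estimated fragment center); half_width is an assumed window size.
    """
    lo = bisect_left(read_positions, center - half_width)
    hi = bisect_right(read_positions, center + half_width)
    return (hi - lo) * 1e6 / total_reads


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length occupancy vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Computing `window_cpm` for two factors at the same set of loci yields the paired values plotted in a scatterplot; `pearson` summarizes their overall agreement, while loci far from the trend are the candidate outliers.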

Yeast tRNA genes were not considered in our analyses, and there are very few examples where the TBP signal at tRNA genes interferes with TBP at “nearby” Pol II genes (we published quite a bit on Pol III transcription in yeast and human cells). In addition, we did not identify Pol III genes at loci with low Taf:TBP occupancy ratios.

3) Gene ontology analyses should be examined more critically. Some of the FDR p-values reported are not really so impressive (> 10^-6, which for a hypergeometric test is not that great). The authors should report the expected (under a random distribution) and observed fractions of genes found in a given category. Are the enriched categories presented in the various figures and tables the only ones found, or are there others not shown? Browser scans would be especially nice here to visualize differences, for example, between genes having high vs. low ratios of Med1/Med30 in Figure 3B.

The GO analyses were corrected for multiple hypotheses, as is standard, and values are reported as FDR. How “impressive” 10^-6 is may be a matter of opinion, but such values are clearly far beyond chance. This matters because the main use of these GO analyses is to show that the “outliers” are not due to experimental variance. In addition, there are several examples in the paper where GO analyses did not yield enriched categories, and these effectively serve as controls. See also our response to comment 5.
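The reviewer's request for expected versus observed counts, and the multiple-testing correction mentioned in the response, can be illustrated with a short sketch. The GO tool used in the paper is not named here, so this assumes a one-sided hypergeometric test per category followed by Benjamini-Hochberg FDR adjustment, a common but unconfirmed choice; the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes total, K in the category, n in the selected set."""
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def go_enrichment(k, N, K, n):
    """Observed vs. expected count of category genes in the selected set, with a one-sided p-value."""
    expected = n * K / N
    return {"observed": k, "expected": expected, "p": hypergeom_sf(k, N, K, n)}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Reporting `observed` alongside `expected` for each category makes the effect size visible, which a small FDR value alone does not convey.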

4) It is difficult to figure out where the data used in the individual figures is from. This should be stated explicitly (i.e. references given) in the figure legends.

The figures all list the cell line and the factor, and the relevant datasets are all cited with GEO numbers.

5) Figure 5 uses replicates of TBP ChIP as a control for the distribution spread of Taf2/TBP ratios in Drosophila. A better control would be to compare the distribution of two different subunits of TFIID that are expected to always be present, as that could control for the use of different antibodies. Unfortunately, the data for Taf1 are from S2R cells and for Taf2 from Kc167 cells, so it is not clear that they should correlate perfectly. There may be no fix for this, but the limitations of the control should be recognized. The authors argue that GO enrichment is evidence for the varied Taf2/TBP ratios being meaningful, but there are artifacts that could also produce such enrichments without affecting the spread of replicate data, for example, effects due to nearby chromosomal locations of particular groups of genes, or to particular elements preferentially neighboring genes in a GO category.

The use of TBP replicates as a control for the distribution of Taf2:TBP ratios is appropriate, as these involve the same chromatin preparation and hence have identical size distributions of fragmented chromatin. Another control (unmentioned in the original paper, but now mentioned; see response to comment 5 of Reviewer 3) is the distribution of TFIIB:TBP ratios (old Figure 5C and Figure 5-supplement 2A; now Figure 6C and Figure 6-supplement 2A), which involve 2 different antibodies. Regarding the suggestion to analyze Taf1:Taf2 ratios (and aside from the lack of appropriate datasets), the use of different antibodies does not affect the relative Taf1:Taf2 ratios at promoters. Different antibodies will often have different IP efficiencies, so the absolute ratios have no meaning. However, the relative ratios among all the sites in the same samples are not affected by the different antibodies. We have used this “relative ratio” approach in many previous papers, and it is quite informative.
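The relative-ratio argument can be illustrated with a short sketch. It assumes that a difference in IP efficiency acts as a single multiplicative factor applied to every site in a sample; the log2 transform, median centering, and function name are illustrative choices, not necessarily those used in the paper.

```python
import math
from statistics import median


def relative_log2_ratios(numerator, denominator):
    """Per-site log2(numerator / denominator), centered on the median across all sites.

    Inputs are occupancies (all > 0) at the same sites from the same samples.
    A site-independent scale factor on either factor (e.g. a different IP
    efficiency) shifts every raw log-ratio equally, so it cancels after centering.
    """
    raw = [math.log2(a / b) for a, b in zip(numerator, denominator)]
    center = median(raw)
    return [r - center for r in raw]
```

Multiplying all Taf2 values by a constant changes every absolute Taf2:TBP ratio but leaves these centered values unchanged, which is why the spread of relative ratios, and not their absolute level, is the informative quantity.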

I think the Reviewer misunderstood our use of enriched GO categories for the varied Taf2:TBP ratios. The fact that genes in the low or high Taf2:TBP classes are enriched for certain GO categories excludes the possibility that such genes merely reflect experimental variation of ratios. As such, the results are meaningful. However, this does not (and we do not) imply any specific mechanism for why these classes exist, although I think the suggestions in this comment are unlikely. In this regard, as noted also by Reviewer 1, it is striking that some of the enriched categories in mammalian cells resemble those in yeast.

Reviewer #3:

1. Figure 1C. What does Med2 refer to in mammalian Mediator? Why are there 'Med2 (V6.5)' data in this figure? Is this Med26 or Med29? Also, we would like to see Pol II data on this graph.

Med2 is really Med1; this was a labeling mistake. Pol II data are now included.

2. Med1 data are shown in both Figure 1C and Figure 1 —figure supplement 1B. These data are from essentially the same cell line (E14 and E14tg2a) and plotted at the same genomic loci (10,000 most active mRNA promoters). Why is the Med1 summit located at the TSS in Figure 1C but in Figure 1 —figure supplement 1B, it is greater than 100 bp downstream of the TSS?

We thank the reviewer for catching this discrepancy between Figure 1C and old Figure 1-supplement 1B. Indeed, the result in old Figure 1-supplement 1B is bizarre and inconsistent with every other dataset indicating that Mediator is localized at the PIC. Instead, the 2 Mediator subunits tested associate at roughly the position of paused Pol II, which makes no sense. We don’t know the reason for this but suspect it is an artifact. In this regard, my lab has encountered a few cases where a protein (totally unrelated to Mediator or the Pol II machinery) has exactly the same unexpected pattern. Furthermore, our examples and the examples in old Figure 1-supplement 1B all involved ChIP-seq experiments with disuccinimidyl glutarate in addition to formaldehyde. So, we think the data in old Figure 1-supplement 1B are unreliable; hence this figure has been removed.

3. Paragraph starting on page 8, line 179. This section is confusing. First, on page 7, the authors define promoters as any genomic region bound by GTFs and Pol II (lines 149-150). According to this definition, the authors appear to claim that the distal enhancers are all promoters because of Cdk7 binding, as shown in Figure 1D. This led to the authors' conclusion that 'Thus, even at so-called enhancers, Mediator is stably associated with promoters' (lines 186-187). However, Figure 1D shows the average data at all enhancers. There may be a portion of enhancers with no Cdk7 binding. It is risky to make such absolute statements. Second, the authors also conclude that Mediator is not bound at activator binding sites at enhancers (line 187). However, what is the definition of 'activator binding sites'? We did not find any data relating to this point. There are mounds of activator binding data for enhancers in many cell lines, particularly ES cells. Shouldn't the authors analyze some of these data before making such strong statements? Note that ESRRB is an activator in mouse V6.5 ES cells. ESRRB binds Mediator directly, as shown by MS, in vitro biochemical experiments, and other approaches. A published analysis of ESRRB at annotated enhancers shows some examples where Mediator (Med1) is bound together with small amounts of PIC components, including TAF1, TFIIB, and Pol II, and numerous other examples where there is no evidence of a PIC. Third, what is the point of showing CTCF data? CTCF is neither a GTF nor an activator. Fourth, the subtitle in line 158 states 'mammalian Mediator with its kinase module … distal locations', but where are the ChIP-seq data for CKM subunits at the distal locations in Figure 1D? How can the authors draw the conclusion stated in the subtitle?

See initial section on enhancers. Cdk7 is a subunit of TFIIH and hence is a GTF that marks the location of the PIC. Hence, by our (and Jacob and Monod’s) definition, promoters are found at so-called enhancers; this is hardly surprising given the existence of “enhancer RNAs”. Mediator and Cdk7 locations coincide, indicating that Mediator is located at the PIC in so-called enhancers. In contrast, Mediator and Cdk7 locations do not coincide with the location of activator proteins. The objection to CTCF as an activator has some merit, so we removed it from the revised manuscript and instead analyzed the E2F/DP1 activator.

4. Page 9, lines 209-210. We are confused by the statement, 'Thus, unlike the case in yeast, mammalian Mediator association with promoters is not strictly correlated with association of other PIC components.' To us, this sentence implies that in yeast, Mediator and PIC occupancies should correlate at promoters. However, in wild-type yeast, Mediator does not stably associate with promoters, as shown on page 8, line 160. Please clarify.

We apologize for the confusion about the statement that Mediator and PIC components are highly correlated in yeast cells (also mentioned by Reviewer 1). This was meant to refer to conditions of Kin28 depletion/inactivation where Mediator can be seen at the promoter.

5. Figure 5C and Figure 5—figure supplement 2A. The authors show data of TBP vs TFIIB but don't seem to describe them in the manuscript. Also, in Figure 5—figure supplement 2A, in panel with 'TBP vs TFIIB', the high and low ratio dots are not highlighted as in the other panels.

We now include a sentence about the TBP:TFIIB ratio analysis, which serves as a control (see response to comment 5 of Reviewer 2). High- and low-ratio dots are not highlighted in Figure 5—figure supplement 2A because this is the control experiment and there aren’t any significant high or low ratios.
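To illustrate why a control comparison such as TBP vs TFIIB can leave nothing to highlight, here is a minimal Python sketch of a per-promoter occupancy-ratio screen. It is a hypothetical illustration, not the authors' analysis: it assumes occupancies are compared as log2 ratios with a pseudocount, centered on the median ratio, and that promoters deviating by more than a fixed fold change (`min_fold`) are flagged as high or low. The function name `flag_ratio_outliers` and the default parameters are illustrative assumptions.

```python
import math
from statistics import median


def flag_ratio_outliers(occ_a, occ_b, min_fold=2.0, pseudocount=1.0):
    """Hypothetical screen for promoters with unusually high or low
    occupancy ratios of factor A relative to factor B.

    occ_a, occ_b: dicts mapping promoter name -> occupancy value.
    Returns (high, low): sorted lists of promoter names whose log2(A/B)
    ratio lies more than log2(min_fold) above or below the median ratio.
    """
    # Only promoters measured for both factors can be compared.
    names = sorted(set(occ_a) & set(occ_b))
    if not names:
        return [], []
    # The pseudocount avoids division by zero at unbound promoters.
    ratios = {
        n: math.log2((occ_a[n] + pseudocount) / (occ_b[n] + pseudocount))
        for n in names
    }
    # Center on the median so a global scaling difference between the
    # two ChIP experiments does not by itself create outliers.
    center = median(ratios.values())
    cut = math.log2(min_fold)
    high = [n for n in names if ratios[n] - center > cut]
    low = [n for n in names if ratios[n] - center < -cut]
    return high, low
```

Because the cutoff is a fixed fold change relative to the median ratio, two factors whose occupancies scale together (as the response describes for the TBP vs TFIIB control) give empty lists, whereas a pair that is not strictly correlated yields promoters in one or both lists.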

6. Page 14, line 322-323. 'However, some promoters display stronger than expected Taf7 occupancy relative to TBP and Pol II occupancy.' Where are the Pol II data shown?

Correct. Pol II occupancy was not explicitly shown here, so we deleted the reference to Pol II from the text. Other analyses in the paper clearly show a strong correlation between TBP and Pol II occupancy.

7. Figure 6E and F. What is the number of plotted genes for each panel?

We now list the genes plotted in old Figure 6E, F (now Figure 7E, F) in Table 2.

https://doi.org/10.7554/eLife.67964.sa2

Article and author information

Author details

  1. Natalia Petrenko

    Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing - original draft, Writing - review and editing
    Competing interests
    none
  2. Kevin Struhl

    Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, United States
    Contribution
    Conceptualization, Funding acquisition, Project administration, Supervision, Writing - original draft, Writing - review and editing
    For correspondence
    kevin@hms.harvard.edu
    Competing interests
    Senior editor, eLife
    ORCID: 0000-0002-4181-7856

Funding

National Institutes of Health (GM 30186)

  • Kevin Struhl

National Institutes of Health (GM 131801)

  • Kevin Struhl

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Zarmik Moqtaderi for programming scripts and discussions. This work was supported by grants to KS from the National Institutes of Health (GM30186 and GM131801).

Senior and Reviewing Editor

  1. Naama Barkai, Weizmann Institute of Science, Israel

Publication history

  1. Received: March 1, 2021
  2. Accepted: September 10, 2021
  3. Accepted Manuscript published: September 13, 2021 (version 1)
  4. Version of Record published: September 24, 2021 (version 2)

Copyright

© 2021, Petrenko and Struhl

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
