Abstract
Genes are often regulated by multiple enhancers. It is poorly understood how the individual enhancer activities are combined to control promoter activity. Anecdotal evidence has shown that enhancers can combine sub-additively, additively, synergistically, or redundantly. However, it is not clear which of these modes are more frequent in mammalian genomes. Here, we systematically tested how pairs of enhancers activate promoters using a three-way combinatorial reporter assay in mouse cells. By assaying about 69,000 enhancer-enhancer-promoter combinations we found that enhancer pairs generally combine near-additively. This behaviour was conserved across seven developmental promoters tested. Surprisingly, these promoters scale the enhancer signals approximately following a power-law, but the exponent of this response varies between promoters. A housekeeping promoter showed an overall different response to enhancer pairs, and a smaller dynamic range. Thus, our data indicate that enhancers mostly act additively, but promoters transform their collective effect non-linearly.
Introduction
Enhancers: background
Enhancers are DNA elements, typically a few hundred base pairs long, that bind transcription factors (TFs) and boost the activity of genes in cis (Banerji et al., 1981). Over the past decades, many insights have been obtained into the molecular architecture of enhancers, the roles of TFs and their cofactors, the interplay of these factors with chromatin, and the diversity of mechanisms that govern the interactions of enhancers with promoters (S. Kim & Wysocka, 2023).
MPRAs to probe enhancers
In recent years, massively parallel reporter assays (MPRAs) have been developed to systematically dissect the sequence features that determine the functionality of regulatory elements by probing thousands or even millions of DNA elements in a multiplexed manner (Inoue & Ahituv, 2015; Klein et al., 2020; Gallego Romero & Lea, 2023). Among others, two-way combinatorial MPRA studies that tested a large number of enhancer - promoter pairs revealed a variable degree of intrinsic compatibility between enhancers and promoters (Zabidi et al., 2015; Bergman et al., 2022; Martinez-Ara et al., 2022), in line with earlier evidence from analyses of single genomic loci (Bertolino & Singh, 2002; Chang et al., 2004; Vakoc et al., 2005; Jing et al., 2008; Deng et al., 2012).
Modes of enhancer interplay are still poorly understood
Surveys of mammalian genomes suggest that there are many more enhancers than genes (ENCODE Project Consortium, 2012). Indeed, genes are frequently controlled by multiple enhancers (Shlyueva et al., 2014; Gasperini et al., 2020; S. Kim & Wysocka, 2023). Often, such enhancers are clustered and jointly regulate a nearby tissue-specific gene (Tuan et al., 1985; Grosveld et al., 1987; Tolhuis et al., 2002; Deng et al., 2012; Hnisz et al., 2013; Whyte et al., 2013; Hay et al., 2016; Schoenfelder & Fraser, 2019; Brosh et al., 2023). However, how multiple enhancers act together to control a single promoter is still poorly understood. Several studies have found that enhancers can combine either additively (Bothma et al., 2015; Lam et al., 2015; Dukler et al., 2016; Hay et al., 2016; Will et al., 2017; Lin et al., 2022), supra-additively/synergistically (Bothma et al., 2015; Lam et al., 2015; Joo et al., 2016; Shin et al., 2016; Carleton et al., 2017; Thomas et al., 2021; Lin et al., 2022; Brosh et al., 2023) or sub-additively/redundantly (Moorthy et al., 2017; Osterwalder et al., 2018). Nevertheless, all of these studies tested only small numbers of enhancers in one or a few genomic loci. Therefore, it is still unclear what the globally predominant mode of functional interaction is between enhancers, and how this may depend on the target promoter. For this, more scalable approaches are needed. Here, we report such an approach.
Systematic approach
To systematically assess enhancer-enhancer interplay and the collective effect on promoter activity, we designed a three-way combinatorial MPRA based on our previous combinatorial reporters (Martinez-Ara et al., 2022). This assay allowed us to test tens of thousands of enhancer-enhancer-promoter combinations in a uniform setting (Inoue & Ahituv, 2015; van Arensbergen et al., 2017; Martinez-Ara et al., 2022). In order to test how different promoters might affect the enhancer interplay and their output, we not only varied enhancer pairs, but also the identity of the promoter in the three-way combinations.
Results and significance
By testing about 69,000 enhancer-enhancer-promoter combinations in mouse embryonic stem cells (mESCs) we found that pairs of enhancers mostly combine near-additively. Furthermore, promoter choice affects this enhancer-enhancer interplay, and promoters integrate the enhancer effects in a non-linear manner. Our three-way combinatorial approach provides further insights into two key aspects of gene regulation: how enhancers combine their effects, and how promoters integrate enhancer signals.
Results
Construction of libraries/experimental design
Triple combinations approach
To systematically test how pairs of enhancers work together and in turn activate promoters, we implemented a three-way combinatorial approach in a massively parallel reporter assay (MPRA). We designed a cloning strategy that enabled us to construct libraries of tens of thousands of reporters that each contain a different enhancer-enhancer-promoter (EEP) combination (Figure 1A-B).
Model system and promoter selection
For these experiments we chose mESCs as a model. We selected eight promoters that are active in mESCs. Of these, seven are from tightly regulated genes that are involved in the control of the pluripotency state of the cells or their exit from this state (Klf2, Sox2, Nanog, Otx2, Lefty1, Ffg5, Tbx3) (Acampora et al., 2013; Dunn et al., 2014; D.-K. Kim et al., 2014; Thomas et al., 2021). In addition, we included the promoter of the housekeeping (Hounkpe et al., 2021) gene Ap1m1, which neighbours the Klf2 gene. From each promoter we approximately included the −350- to +50-bp region around the transcription start site (TSS), because this includes the most information-rich part of functional promoters (Martinez-Ara et al., 2022; van Arensbergen et al., 2017).
Clusters of enhancers selection
We selected clusters of enhancers that are either known to regulate the above promoters (Blinka et al., 2016; Thomas et al., 2021; Zuin et al., 2022; Brosh et al., 2023), or are candidates for such function because of their proximity to the respective promoters (See methods and Supplementary Table 1) (Hnisz et al., 2013; Whyte et al., 2013). We also included putative enhancers from two additional developmental genes from the core pluripotency network (Pou5f1 and Nodal) (Dunn et al., 2014) that are located elsewhere in the genome.
Definition and selection of single enhancers
We defined single enhancers as ∼400 bp elements overlapping with DNase hypersensitive sites (DHSs) in clusters of enhancers (Fig 1C-D), as previously described (Martinez-Ara et al., 2022). Some of these enhancer clusters could be considered as single large enhancers (Long et al., 2016). However, since DHSs have been shown to retain enhancer activity individually (Barakat et al., 2018; Agrawal et al., 2021; Bergman et al., 2022; Martinez-Ara et al., 2022) we decided to test the combinatorial contributions of these smaller elements of an enhancer cluster. In total we selected 59 of such single enhancers. We also included 20 random sequences as negative controls (See Methods, Supplementary Table 2).
Cloning of libraries
We cloned each promoter separately into the reporter assay vector. We then barcoded the 8 separate vectors (see Methods). Next, we randomly combined enhancers and controls into pairs and cloned them into the eight different barcoded vectors downstream of the promoters (Fig1A and Fig S1A). This generated eight different EEP combinatorial libraries, one per promoter, with a total combinatorial space of 49,928 (8 promoters x 79 enhancers and controls x 79 enhancers and controls) or 199,712 if we consider the four possible orientation/position combinations of each enhancer or control fragment (Fig1A). Of this space, the libraries covered 138,528 (69%) of the total combinatorial space, and after application of stringent quality filters (see Methods) we were able to determine activities of 110,180 (55%) combinations (Supplementary table 3).
Explanation of library design
The design of the libraries generates eight matrices, one per promoter. These contain three types of combinations (Figure 1B): control-control (CC), enhancer-control (EC, regardless of position), and enhancer-enhancer (EE), which can be from the same or a different enhancer cluster. Control-control combinations are necessary to measure the baseline activity of each promoter. Enhancer-control combinations are used to estimate the effects of single enhancers. In total, we measured 2,584 control-control combinations, 38,390 enhancer-control combinations, and 69,206 enhancer-enhancer combinations.
Library transfections
We transfected each library separately into mESCs, with three biological replicates performed on different days. RNA was extracted and barcodes were counted by sequencing as previously described (Martinez-Ara et al., 2022) both from RNA and from the plasmid libraries. We calculated the activity of each barcode as the ratio between barcode counts in cDNA and plasmid DNA. We then averaged barcode activities for each triple combination across a minimum of 5 barcodes. For these averaged activities, all developmental promoter libraries showed high correlations between replicates (Pearson’s R = 0.82 to 0.96) (Figure S2). The housekeeping promoter library showed somewhat lower correlations between replicates (Pearson’s R = 0.74 to 0.78), which is most likely due to a smaller dynamic range. For further analyses, we then averaged the biological replicates for each promoter.
Effects of single enhancers
Calculation of boost indices
In order to quantify the activation by each enhancer we first calculated the baseline activity of each promoter, defined as the median activity across all control-control combinations. Then, for each individual enhancer-promoter combination, we calculated the median activity across all enhancer-control combinations. For each single enhancer we then calculated a boost index, which is the log2 ratio of its median activity across enhancer-control combinations over the promoter baseline across control-control combinations Figure 2A). We observed that there was little position and orientation dependence of the enhancer effects (Figure S3). We therefore averaged the boost indices of single enhancers across positions and orientations for further analyses.
Selected DHSs are active enhancers
The single enhancer boost indices show that the majority of the sequences that we selected indeed act as enhancers, although the boost indices varied up to ∼5 log2 units (Figure 2A-B). All tested enhancers showed significant activation of at least one promoter (at an estimated false discovery rate (FDR) < 0.01) (Figure 2B, D). However, the dynamic ranges and effects of each enhancer vary per promoter, with the boost indices being particularly smaller for the housekeeping promoter (Ap1m1) (Figure 2A, C). We conclude that most DHSs that we selected from the enhancer clusters act individually as enhancers.
Promoter selectivity of enhancers
In our previous study of a large set of E-P pairs we found that the majority of tested enhancers exhibited significant preference for certain promoters (Martinez-Ara et al., 2022). Applying the same statistical analysis to the current enhancer set confirmed this finding: the variability of enhancer effects across promoters are generally larger than what is expected from experimental noise (Figure S4). This indicates that the single enhancers tested here show significant promoter selectivity.
Enhancers generally combine near-additively
Enhancer-enhancer combinations are generally stronger than single enhancers
We then focused on the effects of enhancer-enhancer (EE) combinations. In the boost index matrix that covers all combinations for a single promoter, we observed that generally EE combinations were more active than EC combinations (Figure 3A). Indeed, quantitative analysis showed that EE combinations tend to induce stronger activation than the single enhancers (EC) (Figure 3B-C), although some saturation appears to occur with the strongest enhancers (Figure 3C).
Additive model fits better than multiplicative model
We then asked how the effects of two single enhancers are integrated in the enhancer-enhancer (EE) combinations. We considered two simple models: additivity and multiplicativity. For each EE combination we calculated the expected effect according to each of these models based on the single enhancer measurements, and then compared the observed and expected activities (additive in Figure 3D-E, multiplicative in Figure 3F, see Methods). Interestingly, for most of the promoters the additive model consistently showed a better fit to the measured data than the multiplicative model (Figure 3E-F, Figure S5). In particular, the multiplicative model tended to strongly overestimate activities at the higher range of activities (most active enhancers) for most promoters. When focusing on EE combinations for which the multiplicative and additive expected values differ more than 0.5 log2 units (to limit random noise effects), we found that the multiplicative model matched the observed values better for only 0-17% EE combinations across the seven promoters. Only for the Ap1m1 promoter the multiplicative and additive models were nearly indistinguishable. This may be due to the low dynamic range of activities observed with this promoter. We note that for the Fgf5 promoter in the low activity ranges the expected activities were consistently underestimated for both models. It is likely that this is an inaccuracy due to the relative sparseness of the Fgf5 data (the baseline activity of this promoter was estimated from only 23 measured control-control combinations, compared to 219-477 combinations for the other promoters). In summary, we conclude that a simple additive model fits the observed data better than a multiplicative model.
Supra-additive activation is rare and promoter dependent
Low frequency of supra-additive enhancer-enhancer interactions
Most enhancer-enhancer combinations show only minor deviations from the additive model (Figure S5), as illustrated by the distributions of observed/expected ratios (Figure 4A). By applying a simple error model to estimate the noise in the expected EE activities according to the additive model (Figure 3D, see Methods) we found that 74.5% of the measured EE activities lies within one standard deviation (SD) of the predicted activities (Figure 4A) and 94% lies within two standard deviations. This underscores that most of the tested enhancers combine near-additively, and that both sub-additive and supra-additive effects are generally rare or weak.
Supra-additivity occurs more frequently with weak promoters
Nevertheless, Figure 4A suggests that deviations from the additive model may vary between promoters. The Sox2, Lefty1, Tbx3 and Fgf5 promoters showed a relatively large proportion of EE pairs with supra-additive effects, while the Ap1m1 promoter exhibited the lowest frequency of supra-additivity. Interestingly, the proportion of supra-additivity correlates inversely with promoter baseline expression (Figure 4B, left panel). Thus, lowly active promoters allow more supra-additive EE interactions, although these remain a minority (<40% of EE pairs). In contrast, the proportion of sub-additive effects (i.e., >1 SD below expected activity) is very similar across promoters (Figure 4B right). This latter observation suggests that there is no strong saturation effect that could explain the decrease in number of supra-additive interactions as promoter activity increases.
EE synergies are relatively infrequent and often promoter-dependent
Next, we investigated whether supra-additivity involves a specific subset of enhancers. Visual inspection identified a few individual enhancers that exhibited supra-additive interactions with most other enhancers, but often in the context of particular promoters. An example is Nodal_E160 in combination with the Fgf5 promoter (Figure 4D, left panel). Other enhancers showed a diversity of supra-additive activities. Out of the 58 tested enhancers 32.7% (19/58) showed consistent supra-additivity in at least one promoter context (>0.5 log2(observed/expected), mean over all enhancers in at least one promoter context), and only 10.3% (6/58) showed consistent supra-additivity in two or more promoter contexts (Figure 4C). However, most enhancers behave like Sox2_E180 (Figure 4D, right panel), showing small deviations from the additive behaviour with most other enhancers, and with all of the tested promoters (Figure 4C).
No evidence for locus-specific EE supra-additivity
We then asked whether enhancer pairs derived from the same genomic locus have a higher propensity towards supra-additivity than those not derived from the same locus. This could point towards co-evolution of enhancers that jointly control a gene. However, we found that EE combinations from the same locus showed on average not a higher observed/expected ratio (according to the additive model) than EE combinations from different loci; in fact, this ratio is even slightly lower (Figure 4E). Thus, generally there appears to be no preferential synergy among enhancers that are located in the same genomic locus.
Enhancer effects scale non-linearly across developmental promoters
Enhancer effects correlate well across developmental promoters
Prompted by our observations that some supra-additive EE interactions appear to be more pronounced in the context of specific promoters, we decided to systematically compare the boost indices of all EE pairs between the eight promoters. Strikingly, all seven developmental promoters showed strong pairwise correlations of their boost indices (Pearson’s R = 0.79 to 0.98), whereas the housekeeping promoter (Ap1m1) correlated much less with each of the developmental promoters (Pearson’s R = 0.19 to 0.60; Figure S6).
Slopes of promoter boost responses differ
Although the developmental promoters showed strong correlations, we were surprised to notice that the slopes of these relationships often deviated from 1. To further explore this, we plotted the boost indices of each promoter against the average boost indices across all eight promoters (Figure 5A). All of the developmental promoters correlated strongly with this average pattern (Pearson’s R= 0.93 to 0.99) but this was not the case for the housekeeping promoter (Pearson’s R= 0.42). The slopes relative to the averaged promoters ranged from 1.54-0.32, i.e., about 5-fold, across the developmental promoters. These results imply that the developmental promoters differ in the scaling (i.e., steepness) of their responses to the set of EE pairs, even though the pattern of their responses is highly similar. Interestingly, the relative slopes negatively correlate with the baseline activity of these promoters (Figure 5B). Thus, promoters with low intrinsic activity respond more steeply to changes in combined EE activity than promoters with high intrinsic activity.
Discussion
A large-scale survey of EEP combinations
Here, we employed a MPRA approach to measure three-way regulatory interactions in ∼69,000 EEP combinations involving 8 promoters and 59 enhancers. This is of a much larger scale than previous efforts to elucidate the combinatorial interplay among enhancers, which typically focused on a handful of enhancers in single genomic loci (Bothma et al., 2015; Hay et al., 2016; Shin et al., 2016; Dukler et al., 2016; Joo et al., 2016; Lam et al., 2015; Carleton et al., 2017; Moorthy et al., 2017; Osterwalder et al., 2018; Thomas et al., 2021; Lin et al., 2022; Brosh et al., 2023). These previous studies pointed to a diversity of functional enhancer-enhancer (EE) interactions, including redundancy, additivity and synergy. Our study also uncovered a broad spectrum of functional interactions, but simple near-additive effects of EE combinations were predominant, while supra-additive interactions occurred at a lower frequency and were dependent on the specific enhancer and promoter context.
Promoters "interpret" and scale EE interplay
An important finding is that promoters can have substantial impact on supra-additivity of enhancer-enhancer interactions. Promoters may thus be seen as "interpreters" of EE combinations. This underscores the importance of studying triple combinations of elements, and perhaps in the future even more complex settings. A salient finding is that the different promoters show different degrees of responsiveness to EE pairs. The responses of the promoters follow roughly a power-law function of which the exponent is dependent on the promoter (Figures 5A and S6, which are on log-log scale). Because it is not possible to define an absolute measure of enhancer strength, we cannot determine the absolute values of the exponents of the promoter power-law responses. This would be of interest because exponents <1 would lead to "diminishing returns" with higher EE activities, while exponents >1 would correspond to increasing non-linear amplification with higher EE activities. The apparent saturation observable at high enhancer activities in Figure 3C suggests that the exponents are most likely below 1.
Episomal MPRAs vs genomic context
The reporters in our MPRA were not integrated into the genome. On the one hand, this ensures that all EEP combinations are tested under similar conditions, without confounding effects of variable chromatin contexts. On the other hand, we cannot rule out that the results are not fully representative of a natural chromatin environment. However, other studies have indicated that transiently transfected reporters generally behave quite similar, although not identically, to integrated reporters (Inoue & Ahituv, 2015; Klein et al., 2020), and another study indicated that chromatin context can scale transcriptional activity of inserted reporters in a simple linear manner that is largely independent of the inserted sequence (Maricque et al., 2018).
Results may be biased by model system
We cannot rule out that our results are biased by our selection of enhancers and promoters, which was mostly focused on genes linked to pluripotency in mouse ESCs. We may have missed more prominent synergistic or other non-additive effects that may arise in different cell types or among different classes of enhancers and promoters. However, the protocol that we developed for three-way MPRAs should be generally applicable to any set of enhancers and promoters.
Spacing of elements
It should also be considered that in our MPRA the enhancer and promoter elements are placed in fixed positions and close to one another, which does not recapitulate their relative positioning in the genome. Genomic distance and chromatin contexts could act as thresholds for activation, or turn additive interactions into non-linear relationships. For instance, the ability of a Sox2 enhancer to activate its cognate promoter was found to depend on their distance, following approximately a power law (Zuin et al., 2022). The rules for EE interplay may also depend on the EE distance or genomic context, as suggested by a recent report indicating that distant pairs of enhancers are more often synergistic than closely spaced enhancers, which tend to act additively (Lin et al., 2022).
Enhancer - promoter compatibility re-interpreted
In our previous MPRA study (Martinez-Ara et al., 2022) we found evidence for selectivity of enhancer - promoter functional interactions. However, due to the limited number of enhancer-promoter combinations tested we could not fully disentangle what caused these differences in promoter responses. The data from the eight promoters indicate that there are two intermingled factors that account for differences between promoters. First, distinct classes of promoters show different specificities, as illustrated by the poor correlations between the Ap1m1 housekeeping promoter and the developmental promoters. Second, promoters scale the input they receive from enhancer pairs differently, as discussed above. It is probable that the compatibility differences we observed in our previous study (Martinez-Ara et al., 2022) are a mix of these two phenomena.
Relevance of findings and context
The findings presented here offer insights into two aspects of the gene regulatory process – interplay between enhancers, and promoter integration of the resulting signals – in a more quantitative manner. The near-additivity of enhancers will help us understand how different enhancers may be more or less relevant in different loci. The non-linearity of promoter responsiveness and the effects of promoters on enhancer-enhancer interactions illustrates how some promoters may be more sensitive to activating signals. Thus, it will be useful to quantify promoter responsiveness genome-wide to understand which genes may be more or less sensitive than others to environmental cues or to non-coding mutations. These aspects may be especially relevant for developmentally regulated transcription factors and their target genes, like the ones here studied, as they can be very sensitive to dosage changes (Naqvi et al., 2023).
Methods
Cell culture
We used E14tg2a male mouse embryonic stem cells (ATCC CRL-1821) cultured in 2i+LIF for all experiments. As described {Martinez-Ara,2022,35594855}, we prepared 2i+LIF media according to the 4DN nucleome protocol (https://data.4dnucleome.org/protocols/cb03c0c6-4ba6-4bbe-9210-c430ee4fdb2c/). The reagents used were Neurobasal medium (#21103-049, Gibco), DMEM-F12medium (#11320-033, Gibco), BSA (#15260-037; Gibco), N27 (#17504-044; Gibco), B2 (#17502-048; Gibco), LIF(#ESG1107; Sigma-Aldrich), CHIR-99021 (#HY-10182; MedChemExpress) and PD0325901 (#HY-10254; MedChemExpress), monothioglycerol(#M6145-25ML; Sigma) and glutamine (#25030-081, Gibco). Mycoplasma contamination was ruled out by monthly tests (#LT07-318; Lonza).
Selection of promoters, enhancers and controls
For the construction of the libraries we selected seven developmental genes from the mESC pluripotency regulatory network (Acampora et al., 2013; Dunn et al., 2014; D.-K. Kim et al., 2014; Thomas et al., 2021), Klf2, Sox2, Nanog, Otx2, Lefty1, Ffg5, Tbx3; and the housekeeping gene Ap1m1 (Hounkpe et al., 2021), which is neighbouring the Klf2 gene. The promoters of these genes were identified based on transcript and TSS annotation (Frankish et al., 2019) and overlaps with DNAse hypersensitivity sites (DHS) (Joshi et al., 2015) as previously described (Martinez-Ara et al., 2022). We then selected sequences that approximately cover -350 to 50bp around the TSS.
The clusters of enhancers were selected in as follows: 1. Clusters of enhancers that were known to regulate the developmental genes (Nanog, Sox2, Fgf5 clusters) (Blinka et al., 2016; Brosh et al., 2023; Thomas et al., 2021; Zuin et al., 2022); 2. Putative clusters of enhancers in proximity of one of the developmental genes and that overlapped with previously defined super-enhancers (Lefty1, Tbx3, Otx2 clusters) (Hnisz et al., 2013; Whyte et al., 2013); 3. A cluster of putative enhancers previously identified by us (Martinez-Ara et al., 2022) to act as enhancers in an MPRA (Klf2 cluster). 4. Clusters of enhancers identified as putative enhancers of other mESC pluripotency genes for which we did not include their promoters (Pou5f1 and Nodal clusters) and overlapping with previously defined super-enhancers (Hnisz et al., 2013, 2015; Whyte et al., 2013).
Clusters of enhancers were divided into single enhancers based on DNAse hypersensitivity sites (DHS) (Joshi et al., 2015). These single enhancers were then resized to ∼450bp from the center of DNAse hypersensitivity peaks as previously described (Martinez-Ara et al., 2022).
For the library construction we included 20 450 bp long randomized control sequences as previously described (Martinez-Ara et al., 2022). Fifteen of these were based on five DHS peak sequences that were randomly scrambled. The other five controls were randomly generated sequences with the similar GC content as the DHS peak sequences. These control sequences were ordered as synthetic DNA. Supplementary Table 2 contains the sequences of these controls.
For single enhancers and promoters we designed PCR primers against the 50 bp ends of each selected element using BatchPrimer3 v1.0 (You et al., 2008). This yielded PCR products of ∼400 bp for each element. Supplementary Table 1 lists the coordinates of all individual enhancers and promoters. Supplementary Table 4 lists primer sequences.
EEP Library generation
The vector for library construction was previously used for the downstream reporter assay in our previous study (Martinez-Ara et al., 2022). This vector is based on a pSMART backbone. It contains a green fluorescent protein (GFP) open reading frame followed by a barcode, and a psiCheck polyadenylation signal (PAS) introduced during barcoding, followed by the cloning site for inserts and a triple polyadenylation site (SV40+bGH+psiCheckPAS).
Each of the selected eight promoters were amplified by PCR and individually inserted by Gibson assembly (#E2611S; New England Biolabs) into the reporter vector. Each construct was transformed into standard 5-alpha competent bacteria (#C2987; NEB) grown overnight in in 500 ml of standard Luria Broth (LB) with 50 mg/ml of kanamycin and purified. Then each promoter sequence was verified by Sanger sequencing.
Each of the promoter vectors was then barcoded. Similar to what we described previously (Martinez-Ara et al., 2022), we digested 10 µg of each vector with AvrII (#ER1561; Thermo Fischer) and XcmI (#R0533; NEB) and performed a gel-purification. Barcodes were amplified in 10 separate 100ul PCR reactions using 5 µl of 10 mM primer 275JvA, 5 ml of 10 mM primer 465JvA and 1 µl of 0.1 mM template 274JvA (see Supplementary Table 4 for sequences). 14 PCR cycles were performed using MyTaq Red Mix (#BIO-25043; Bioline), yielding 30 µg of barcodes. We purified barcodes by phenol-chloroform extraction and isopropanol precipitation. Barcodes were digested overnight with 80 units of NheI (#R0131S; NEB) and purified by magnetic bead purification (#CPCR-0050; CleanNA). Each promoter vector and barcodes were then ligated in one 100 µl reaction. In each reaction we used 3 µg of digested vector and 2.7 µg digested barcodes, 20 units NheI (#R0131S; NEB), 20 units AvrII, 10 ml of 103 CutSmart buffer, 10 µl of 10 mM ATP, 10 units T4 DNA ligase (#10799009001 Roche). We performed a cycle-ligation of six cycles (10 min at 22 °C and 10 min at 37 °C), followed by 20 min heat-inactivation at 80 °C. Linear barcoded vectors were purified by magnetic beads and digested for 3h with 40 units of XcmI (#R0533S; NEB). Finally barcoded vectors were purified again by gel-purification.
Single enhancers were amplified by PCR from E14tg2a mESC genomic DNA, then purified with magnetic beads (#CPCR-0050, Clean NA) and quantified by Qubit using dsDNA High sensitivity Qubit kit (#Q33231; invitrogen). Single enhancers and random controls were combined in approximately equimolar manner. This pool of elements was end-repaired using End-It DNA End-Repair Kit (#ER0720; Epicentre) and self-ligated using Fast-link ligase (LK0750H; Lucigen). Duplets of 800bp-1000bp were excised from agarose gel, purified (BIO-52059; Bioline), and A-tailed using Klenow HC 3’ → 5’ exo (#M0212L; NEB).
The pool of duplets was ligated to each of the barcoded promoter vectors using Takara ligation kit version 2.1 (#6022; Takara). Ligation products were purified using magnetic bead purification and 2 µl of the ligation were electroporated into 20 µl of electrocompetent e. cloni 10G supreme (#60081-1; Lucigen). To estimate the approximate complexity of the resulting libraries, dilutions of the electroporated bacteria were plated for overnight culture and colonies were counted. We aimed at a minimum of 2 million complexity per library in order to have sufficient expected representation of all possible combinations.
When the estimated complexity of the libraries was not high enough, we amplified the ligation products with primers aligning to the vector. This generated PCR products consisting of the duplets pool and a vector overhang. These PCR products were used for Gibson assembly with the barcoded promoters (#E2611S; NEB). Gibson assembly reactions were purified with magnetic beads and 2 µl of the reaction were electroporated into 20 µl of electrocompetent e. cloni 10G supreme (#60081-1; Lucigen).
Each library was grown overnight in 500 ml of standard Luria Broth (LB) with 50 mg/ml of kanamycin and purified using a maxiprep kit (K210016, Invitrogen).
Characterization of libraries by iPCR and sequencing
Barcode to insert (enhancer and control duplets) combinations were identified by inverse PCR (iPCR) and Illumina sequencing as described before(van Arensbergen et al., 2017). In brief, each library was digested overnight with I-CeuI (#R0699S, NEB), barcode-insert fragments were then circularized, remaining linear fragments digested, and barcode-insert fragments were linearized again with I-SceI (#R0694S; NEB). Barcode-insert fragments were then amplified by PCR with Illumina adapters and sequenced on an Illumina NextSeq 550 platform using 150 bp paired-end sequencing. Each library was processed separately and mixed together for sequencing.
Transfection of libraries
Each EEP library was transfected separately into E14tg2a mESCs. Per library 20 million cells were nucleofected (5µgs of library and 5million cells per cuvette, 5 cuvettes) using Amaxa nucleofector II, program A-30, and Mouse Embryonic Stem Cell NucleofectorTM Kit (#VPH-1001, Lonza). Three biological replicates were performed per library on different days. Cells were collected 24 hours after transfection in 5 ml of TRIsure (#BIO-38032; Bioline) and frozen at -80 °C until further processing.
cDNA sequencing
RNA was extracted and prepared for sequencing as described for the Downstream Assay in our previous study (Martinez-Ara et al., 2022). From each sample, the aqueous phase containing the total RNA of the TRIsure solution was extracted and purified on an RNA extraction column (#K0732, Thermo Scientific). Total RNA was digested with 10 units of DNase I (#04716728001; Roche) for 30 minutes in 10-8 10 µl reactions containing 5 µg of RNA. DNase I was inactivated by addition of 1 µl of 25 mM EDTA and incubation at 70°C for 10 min.
cDNA was produced in 20 µl reactions (1 per 10 µDNAse reaction) using Maxima Reverse transcriptase (#EP0743; ThermoFisher Scientific) and a gene specific (targeting the GFP reporter) primer (JvA304, see Supplementary table 4 for sequence). dNTPs and the primer were mixed with the DNAse digested RNA and incubated at 65°C for 5 minutes. The RT buffer, enzyme and RNAse inhibitor were added and the reaction was incubated at 50 °C for 1 hour. cDNA was then amplified by two nested PCRs to make it strand specific. The first PCR reaction uses (index variants of 285JvA containing the S2, index and p7 adaptor) and 305JvA (targeting the adapter introduced by 304JvA). Each 20 µl RT reaction was amplified in a 100µl PCR reaction with MyTaq Red mix (#BIO-25043; Bioline). The second PCR reaction was performed using 10 µl of the product of the previous reaction in a 100 µl reaction using the same index variant primer and index variants of 437JvA (containing the S1, index and p5 adaptor) (see Supplementary table 4 for primer sequences). The first PCR was run for 10 cycles and the second one for 8 cycles using the recommended Mytaq Red mix conditions (for both PCRs: 1 min 96 °C, then each cycle 15 s 96 °C, 15 s 60 °C, 15 s 72 °C). PCR products were purified with magnetic beads and mixed for sequencing in an Illumina NovaSeq 6000 platform using 100bp single end reads.
Plasmid DNA (pDNA) sequencing
Plasmid libraries were processed for sequencing as described (Martinez-Ara et al., 2022) but using dual indexing for sequencing on a NovaSeq 6000 platform with 100 bp single end reads. 1 µg of each plasmid was digested with I-SceI in order to linearise the plasmid. Barcodes were amplified by PCR for 9 cycles from 50 ng of material. Primers and reaction conditions were the same as in the amplification of cDNA. PCR products were purified using magnetic beads and mixed for sequencing.
Computational methods
cDNA and pDNA Data pre-processing and linking of barcodes to duplets was performed using a custom snakemake pipeline (Köster & Rahmann, 2012). All other data analyses and quantifications were performed in R (R Core Team, 2022). Figures were generated using ggplot2 (Wickham, 2016). The snakemake pipeline and all the scripts used in this publication are available in a GitHub repository (https://github.com/vansteensellab/EEPCombinations).
Linking Barcodes to duplet inserts
The custom pipeline for linking barcodes to duplet inserts was previously described {Martinez-ara,2022, 35594855}. In brief, iPCR reads from each library were locally aligned using bowtie (version 2.3.4) (Langmead & Salzberg, 2012) with very sensitive parameters (–very-sensitive-local) on a custom bowtie genome. This custom Bowtie genome consists of virtual chromosomes that correspond to each of the enhancer and control sequences used to generate the combinatorial libraries. Bam files were processed by a custom Python script (Van Rossum & Drake, 2009). This script extracts from read 1 the barcode sequence, and the identity and orientation of the DNA fragment in position 1. From read 2 it extracts the identity and orientation of the DNA fragment in position 2. Finally, barcodes were clustered using Starcode (version 1.1) (Zorita et al., 2015) to remove PCR and sequencing errors.
cDNA and pDNA data pre-processing
For each cDNA and pDNA replicate of each library, barcodes were extracted from single-end reads using a custom Python script that matches the constant region after the barcode. Barcodes were clustered using Starcode (version 1.1) (Zorita et al., 2015) to remove errors from sequencing and counts were summarized.
cDNA and pDNA data post-processing
For each library cDNA and pDNA barcodes were matched to iPCR barcodes. Any barcode assigned to multiple fragment identities was removed from the data. Per cDNA/pDNA replicate barcode counts were normalized to the total number of barcode counts. Then activity was calculated per barcode and cDNA replicate as the cDNA:pDNA normalized counts ratio. Per individual duplet combination these normalized activity ratios were averaged across barcodes with a minimum requirement of five barcodes and eight pDNA counts per barcode. The mean activity across the three replicates was calculated as the geometric mean.
Calculation of boost indices
In our previous study (Martinez-Ara et al., 2022) we observed that it was critical to use control sequences to estimate the baseline activity of the promoters. For each promoter we used all of the control-control combinations to estimate its baseline activity. We calculated this as the median activity across all control-control combinations. Then the boost index of an enhancer-enhancer or enhancer-control (single enhancer boost index) combination was calculated as the log2 ratio between the observed activity of this combination and the baseline activity of the promoter. Except in Supplementary Figure 3, boost indices were averaged across positions and orientations. In Figure 2a boost indices were averaged across all enhancer-control combinations for each enhancer. In Figure 3a boost indices were averaged across orientations but not positions.
Calculation of additive and multiplicative expected activities
To calculate the expected additive and multiplicative activities we first had to determine three values. The baseline activity of the promoter (ACCP) was measured as the median activity across all control-control combinations for each promoter. The single enhancer activity estimates (AE1CP and ACE2P, enhancer in position1 and position 2 respectively) were calculated as the median activity of all enhancer-control combinations where the enhancer was found in either position 1 (AE1CP) or position 2 (ACE2P). Therefore
ACCP = median{AC1C2P, AC1C3P, …, ACiCjP}
AE1CP = median{AE1C1P, AE1C2P, …, AE1CiP}
ACE2P = median{AC1E2P, AC2E2P, …, ACiE2P}
Then for each enhancer-enhancer-promoter combination the expected additive activity was calculated as the sum of the estimated activities of the single enhancers in each position minus the promoter baseline activity (AE1CP + ACE2P – ACCP). The multiplicative expected activity was calculated as the product of the estimated activities of the single enhancers in each position divided by the promoter baseline activity ((AE1CP * ACE2P)/ACCP). For the expected additive activity, the standard deviation of the estimate was propagated as the square root of the sum of the variances of the single measurements. Therefore
Expected Additive Activity = (AE1CP + ACE2P - ACCP)
Expected Multiplicative Activity = (AE1CP * ACE2P)/ACCP
The additive expected activities were then used to calculate log2(observed/expected) ratios in order to find cooperative behaviours between enhancers. To find trends the log2 ratios were then averaged for each enhancer across enhancer-enhancer pairs.
Non-linear Promoter responses
For every enhancer-enhancer combination present in all eight libraries we averaged boost indices across the eight promoters. This average enhancer effect was then used to calculate the slope between the Average boost index and the observed boost index for each of the promoters. The slopes were calculated using the lm function in R.
Data availability
Raw sequencing data and processed data are available at Gene expression Omnibus (GEO): GSE240586. Publicly available datasets that were used are listed in and are available at https://osf.io/yuc4e/. Lab journal records are available at https://osf.io/yuc4e/. Code of data processing pipelines and analysis scripts are available at https://github.com/vansteensellab/EEPCombinations (DOI: ###).
Acknowledgements
We thank the NKI Genomics, and Research High Performing Computing facilities for technical support, and members of our laboratory for helpful discussions and comments. Funded by the European Union (European Research Council, RE_LOCATE, 101054449). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. Research at the Netherlands Cancer Institute is supported by an institutional grant of the Dutch Cancer Society and of the Dutch Ministry of Health, Welfare and Sport. The Oncode Institute is partially funded by the Dutch Cancer Society
Declaration of interests
F.C. is a co-founder of enGene Statistics GmbH.
References
- Otx2 is an intrinsic determinant of the embryonic stem cell state and is required for transition to a stable epiblast stem cell conditionDevelopment (Cambridge, England) 140:43–55https://doi.org/10.1242/dev.085290
- Genome editing demonstrates that the -5 kb Nanog enhancer regulates Nanog expression by modulating RNAPII initiation and/or recruitmentThe Journal of Biological Chemistry 296https://doi.org/10.1074/jbc.RA120.015152
- Expression of a beta-globin gene is enhanced by remote SV40 DNA sequencesCell 27:299–308https://doi.org/10.1016/0092-8674(81)90413-x
- Functional Dissection of the Enhancer Repertoire in Human Embryonic Stem CellsCell Stem Cell 23:276–288https://doi.org/10.1016/j.stem.2018.06.014
- Compatibility rules of human enhancer and promoter sequencesNature 607:176–184https://doi.org/10.1038/s41586-022-04877-w
- POU/TBP cooperativity: A mechanism for enhancer action from a distanceMolecular Cell 10:397–407https://doi.org/10.1016/s1097-2765(02)00597-x
- Super-Enhancers at the Nanog Locus Differentially Regulate Neighboring Pluripotency-Associated GenesCell Reports 17:19–28https://doi.org/10.1016/j.celrep.2016.09.002
- Enhancer additivity and non-additivity are determined by enhancer strength in the Drosophila embryoELife 4https://doi.org/10.7554/eLife.07956
- Synthetic regulatory genomics uncovers enhancer context dependence at the Sox2 locusMolecular Cell 83:1140–1152https://doi.org/10.1016/j.molcel.2023.02.027
- Multiplex Enhancer Interference Reveals Collaborative Control of Gene Regulation by Estrogen Receptor α- Bound EnhancersCell Systems 5:333–344https://doi.org/10.1016/j.cels.2017.08.011
- An enhancer directs differential expression of the linked Mrf4 and Myf5 myogenic regulatory genes in the mouseDevelopmental Biology 269:595–608https://doi.org/10.1016/j.ydbio.2004.02.013
- Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factorCell 149:1233–1244https://doi.org/10.1016/j.cell.2012.03.051
- Is a super-enhancer greater than the sum of its parts?Nature Genetics 49:2–3https://doi.org/10.1038/ng.3759
- Defining an essential transcription factor program for naïve pluripotency. Science (New YorkN.Y 344:1156–1160https://doi.org/10.1126/science.1248882
- An integrated encyclopedia of DNA elements in the human genomeNature 489:57–74https://doi.org/10.1038/nature11247
- GENCODE reference annotation for the human and mouse genomesNucleic Acids Research 47:D766–D773https://doi.org/10.1093/nar/gky955
- Leveraging massively parallel reporter assays for evolutionary questionsGenome Biology 24https://doi.org/10.1186/s13059-023-02856-6
- Towards a comprehensive catalogue of validated and target-linked human enhancersNature Reviews. Genetics 21:292–310https://doi.org/10.1038/s41576-019-0209-0
- Position-independent, high-level expression of the human beta-globin gene in transgenic miceCell 51:975–985https://doi.org/10.1016/0092-8674(87)90584-8
- Genetic dissection of the α-globin super-enhancer in vivoNature Genetics 48:895–903https://doi.org/10.1038/ng.3605
- Super-enhancers in the control of cell identity and diseaseCell 155:934–947https://doi.org/10.1016/j.cell.2013.09.053
- Convergence of developmental and oncogenic signaling pathways at transcriptional super-enhancersMolecular Cell 58:362–370https://doi.org/10.1016/j.molcel.2015.02.014
- HRT Atlas v1.0 database: Redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasetsNucleic Acids Research 49:D947–D955https://doi.org/10.1093/nar/gkaa609
- Decoding enhancers using massively parallel reporter assaysGenomics 106:159–164https://doi.org/10.1016/j.ygeno.2015.06.005
- Exchange of GATA factors mediates transitions in looped chromatin organization at a developmentally regulated gene locusMolecular Cell 29:232–242https://doi.org/10.1016/j.molcel.2007.11.020
- Stimulus-specific combinatorial functionality of neuronal c-fos enhancersNature Neuroscience 19:75–83https://doi.org/10.1038/nn.4170
- Dynamic Reorganization of Extremely Long-Range Promoter-Promoter Interactions between Two States of PluripotencyCell Stem Cell 17:748–757https://doi.org/10.1016/j.stem.2015.11.010
- Lefty1 and lefty2 control the balance between self-renewal and pluripotent differentiation of mouse embryonic stem cellsStem Cells and Development 23:457–466https://doi.org/10.1089/scd.2013.0220
- Deciphering the multi-scale, quantitative cis-regulatory codeMolecular Cell 83:373–392https://doi.org/10.1016/j.molcel.2022.12.032
- A systematic evaluation of the design and context dependencies of massively parallel reporter assaysNature Methods 17:1083–1091https://doi.org/10.1038/s41592-020-0965-y
- Snakemake—A scalable bioinformatics workflow engine. Bioinformatics (OxfordEngland 28:2520–2522https://doi.org/10.1093/bioinformatics/bts480
- Partially redundant enhancers cooperatively maintain Mammalian pomc expression above a critical functional thresholdPLoS Genetics 11https://doi.org/10.1371/journal.pgen.1004935
- Fast gapped-read alignment with Bowtie 2Nature Methods 9:357–359https://doi.org/10.1038/nmeth.1923
- Nested epistasis enhancer networks for robust genome regulation. Science (New YorkN.Y 377:1077–1085https://doi.org/10.1126/science.abk3512
- Ever-Changing Landscapes: Transcriptional Enhancers in Development and EvolutionCell 167:1170–1187https://doi.org/10.1016/j.cell.2016.09.018
- A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activityNature Biotechnology https://doi.org/10.1038/nbt.4285
- Systematic analysis of intrinsic enhancer-promoter compatibility in the mouse genomeMolecular Cell 82:2519–2531https://doi.org/10.1016/j.molcel.2022.04.009
- Enhancers and super-enhancers have an equivalent regulatory role in embryonic stem cells through regulation of single or multiple genesGenome Research 27:246–258https://doi.org/10.1101/gr.210930.116
- Precise modulation of transcription factor levels identifies features underlying dosage sensitivityNature Genetics 55:841–851https://doi.org/10.1038/s41588-023-01366-2
- Enhancer redundancy provides phenotypic robustness in mammalian developmentNature 554:239–243https://doi.org/10.1038/nature25461
- R: A Language and Environment for Statistical Computing [Computer software]R Foundation for Statistical Computing
- Long-range enhancer-promoter contacts in gene expression controlNature Reviews. Genetics 20:437–455https://doi.org/10.1038/s41576-019-0128-0
- Hierarchy within the mammary STAT5-driven Wap super-enhancerNature Genetics 48:904–911https://doi.org/10.1038/ng.3606
- Transcriptional enhancers: From properties to genome-wide predictionsNature Reviews. Genetics 15:272–286https://doi.org/10.1038/nrg3682
- Temporal dissection of an enhancer cluster reveals distinct temporal and functional contributions of individual elementsMolecular Cell 81:969–982https://doi.org/10.1016/j.molcel.2020.12.047
- Looping and interaction between hypersensitive sites in the active beta-globin locusMolecular Cell 10:1453–1465https://doi.org/10.1016/s1097-2765(02)00781-5
- The ‘beta-like-globin’ gene domain in human erythroid cellsProceedings of the National Academy of Sciences of the United States of America 82:6384–6388https://doi.org/10.1073/pnas.82.19.6384
- Proximity among distant regulatory elements at the beta-globin locus requires GATA-1 and FOG-1Molecular Cell 17:453–462https://doi.org/10.1016/j.molcel.2004.12.028
- Genome-wide mapping of autonomous promoter activity in human cellsNature Biotechnology 35:145–153https://doi.org/10.1038/nbt.3754
- Python 3 Reference ManualCreateSpace
- Master transcription factors and mediator establish super-enhancers at key cell identity genesCell 153:307–319https://doi.org/10.1016/j.cell.2013.03.035
- Ggplot2 Elegant Graphics for Data AnalysisNew York: Springer-Verlag
- Composition and dosage of a multipartite enhancer cluster control developmental expression of Ihh (Indian hedgehog)Nature Genetics 49:1539–1545https://doi.org/10.1038/ng.3939
- BatchPrimer3: A high throughput web application for PCR and sequencing primer designBMC Bioinformatics 9https://doi.org/10.1186/1471-2105-9-253
- Enhancer-core-promoter specificity separates developmental and housekeeping gene regulationNature 518:556–559https://doi.org/10.1038/nature13994
- Starcode: Sequence clustering based on all-pairs search. Bioinformatics (OxfordEngland 31:1913–1919https://doi.org/10.1093/bioinformatics/btv053
- Nonlinear control of transcription through enhancer-promoter interactionsNature 604:571–577https://doi.org/10.1038/s41586-022-04570-y
Article and author information
Author information
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
- Version of Record published:
Copyright
© 2023, Martinez-Ara et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- views
- 1,396
- downloads
- 82
- citations
- 3
Views, downloads and citations are aggregated across all versions of this paper published by eLife.