Research Article

Systematic analysis of naturally occurring insertions and deletions that alter transcription factor spacing identifies tolerant and sensitive transcription factor pairs

Department of Cellular and Molecular Medicine, School of Medicine, University of California San Diego, United States
Department of Bioengineering, Jacobs School of Engineering, University of California San Diego, United States
Department of Medicine, School of Medicine, University of California San Diego, United States
Department of Medical Biochemistry, Experimental Vascular Biology, Amsterdam Infection and Immunity, Amsterdam Cardiovascular Sciences, Amsterdam UMC, University of Amsterdam, Netherlands
Department of Medicine, McGill University, Canada
Division of Biological Sciences, University of California San Diego, United States
Department of Cellular and Molecular Medicine, College of Medicine, University of Arizona, United States
Department of Biochemistry and Molecular Biology, Nippon Medical School, Japan

Jan 20, 2022

Open access

Abstract

Regulation of gene expression requires the combinatorial binding of sequence-specific transcription factors (TFs) at promoters and enhancers. Prior studies showed that alterations in the spacing between TF binding sites can influence promoter and enhancer activity. However, the relative importance of TF spacing alterations resulting from naturally occurring insertions and deletions (InDels) has not been systematically analyzed. To address this question, we first characterized the genome-wide spacing relationships of 73 TFs in human K562 cells as determined by ChIP-seq (chromatin immunoprecipitation sequencing). We found a dominant pattern of a relaxed range of spacing between collaborative factors, including 45 TFs exclusively exhibiting relaxed spacing with their binding partners. Next, we exploited millions of InDels provided by genetically diverse mouse strains and human individuals to investigate the effects of altered spacing on TF binding and local histone acetylation. These analyses suggested that spacing alterations resulting from naturally occurring InDels are generally tolerated in comparison to genetic variants directly affecting TF binding sites. To experimentally validate this prediction, we introduced synthetic spacing alterations between PU.1 and C/EBPβ binding sites at six endogenous genomic loci in a macrophage cell line. Remarkably, collaborative binding of PU.1 and C/EBPβ at these locations tolerated changes in spacing ranging from 5 bp increase to >30 bp decrease. Collectively, these findings have implications for understanding mechanisms underlying enhancer selection and for the interpretation of non-coding genetic variation.

Editor's evaluation

Transcription factors (TFs) bind to DNA in a sequence-specific manner at TF binding sites (TFBSs) to control gene transcription. Hence, characterizing how TFs interact with DNA is key to uncovering how gene regulation occurs and how this process can be disrupted in disease. While the binding properties of a large portion of human TFs are well characterized, a remaining challenge lies in our knowledge of how TFs interact cooperatively at regulatory elements, either forming dimers or co-binding the same regions. In this manuscript, Shen et al. explored spacing patterns between TFBSs using previously published data sets and revealed that the dominant pattern is a relaxed range of spacing between collaborative factors and tolerance for InDels that change the TFBS spacing.

https://doi.org/10.7554/eLife.70878.sa0

Introduction

Genome-wide association studies (GWASs) have identified thousands of genetic variants associated with diseases and other traits (MacArthur et al., 2017; Visscher et al., 2017). Single nucleotide polymorphisms (SNPs) and short insertions and deletions (InDels) represent common forms of these variants. The majority of GWAS variants fall in non-protein-coding regions of the genome, suggesting that they act through effects on gene regulation (Farh et al., 2015; Ward and Kellis, 2012). Gene expression is regulated by transcription factors (TFs) in a cell-type-specific manner. A TF can bind to a specific set of short, degenerate DNA sequences at promoters and enhancers, often referred to as TF binding motifs. Active promoters and enhancers are selected by combinations of sequence-specific TFs that bind in an inter-dependent manner to closely spaced motifs. SNPs and InDels can create or disrupt TF binding sites by mutating motifs and are a well-established mechanism for altering gene expression and biological function (Behera et al., 2018; Deplancke et al., 2016; Grossman et al., 2017; Heinz et al., 2013). InDels can additionally change spacing between TF binding sites, but the extent to which altered spacing is relevant for interpreting genetic variation in human populations or between animal species remains unknown.

Previous studies reported two major categories of motif spacing between inter-dependent TFs (Slattery et al., 2014). One category corresponds to the enhanceosome model (Slattery et al., 2014), which requires specific or ‘constrained’ spacing. It is mainly exhibited by TFs that form ternary complexes recognizing composite binding motifs, exemplified by GATA, Ets, and E-box TFs in mouse hematopoietic cells (Ng et al., 2014), MyoD and other cell-type-specific factors in muscle cells (Nandi et al., 2013), and Sox2 and Oct4 in embryonic stem cells (Rodda et al., 2005). In vitro studies of the binding of pairwise combinations of ~100 TFs to a diverse library of DNA sequences identified 315 out of 9400 possible interactive TF pairs that select composite elements with constrained positions of the respective recognition motifs (Jolma et al., 2015). Constrained spacing required for the optimal binding and function of interacting TFs can also occur between independent motifs, as at the interferon-β enhanceosome (Panne, 2008). In contrast to constrained spacing, the second category of motif spacing allows TFs to interact over a relatively broad range (e.g., 100–200 bp); we call this ‘relaxed’ spacing, and it is equivalent to the billboard model (Slattery et al., 2014). This type of spacing relationship is observed for collaborative or co-occupying TFs that do not target promoters or enhancers as a ternary complex (Heinz et al., 2010; Jiang and Singh, 2014; Sönmezer et al., 2021).

Substantial evidence indicates that the two categories of spacing requirement can be affected to different degrees by genetic variation. Reporter assays examining synthetic alterations of motif spacing revealed examples of TFs that require constrained spacing and whose binding and downstream gene expression are highly sensitive to spacing (Farley et al., 2015; Ng et al., 2014; Panne, 2008). In contrast, flexibility in motif spacing has been demonstrated using reporter assays in Drosophila (Menoret et al., 2013) and HepG2 cells (Smith et al., 2013). However, these studies did not distinguish the impact of altered spacing on TF binding from its impact on the subsequent recruitment of co-activators required for gene activation. Moreover, the extent to which these findings are relevant to spacing alterations resulting from naturally occurring genetic variation remains unknown.

To investigate the effects of altered spacing on TF binding and function, we first characterized the genome-wide binding patterns of 73 TFs based on their binding sites determined by chromatin immunoprecipitation sequencing (ChIP-seq). We developed a computational framework that assigned each spacing relationship to a ‘constrained’ or ‘relaxed’ category, and we related these relationships to naturally occurring InDels observed in human populations to study the selective constraints on different spacing relationships. As specific case studies, we leveraged natural genetic variation in numerous human samples and in five strains of mice to study the effect size of spacing alterations on TF binding activity and local histone acetylation. These findings suggested that InDels altering spacing are generally less constrained and well tolerated when they occur between TF pairs with relaxed spacing relationships. Finally, we experimentally validated substantial tolerance in spacing for the macrophage lineage-determining TFs (LDTFs) PU.1 and C/EBPβ by introducing a wide range of InDels between their respective binding sites at representative endogenous genomic loci using CRISPR/Cas9 mutagenesis in mouse macrophages.

Results

TFs primarily co-bind with relaxed spacing

We characterized spacing relationships for 73 TFs in K562 cells covering diverse TF families (Hu et al., 2019) based on ChIP-seq data from the ENCODE data portal (Davis et al., 2018). After obtaining reproducible ChIP-seq peaks, we used the corresponding position weight matrix (PWM) of each TF (Supplementary file 1) to identify the locations of high-affinity binding sites that are less than 50 bp from peak centers (Figure 1A; Figure 1—figure supplement 1); on average, 42% of peaks contain at least one binding site of the corresponding TF (Supplementary file 1). The peaks of every pair of TFs were then merged, and at the overlapping peaks, which indicate co-binding events, the edge-to-edge spacings between TF binding sites were calculated and then aggregated to show a distribution within ±100 bp. To categorize spacing relationships, we used Monte Carlo procedures to obtain an empirical p-value for identifying significant spacing constraints and used the Kolmogorov–Smirnov (KS) test to test for a relaxed spacing relationship against a random distribution.
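As a rough illustration of the two quantities described above, the sketch below computes a signed edge-to-edge spacing between two binding sites and a KS statistic that compares observed spacings against a uniform distribution over ±100 bp. It is not the authors' implementation. The function names, the half-open interval convention, the choice of a uniform null, and the use of simulation to obtain the p-value for the KS statistic are all assumptions made for this example.

```python
import random


def edge_to_edge_spacing(site_a, site_b):
    """Signed gap in bp between two half-open intervals (start, end).

    Positive when site_b lies downstream of site_a, negative when it lies
    upstream, and 0 when the two sites touch or overlap (assumed convention).
    """
    a_start, a_end = site_a
    b_start, b_end = site_b
    if b_start >= a_end:
        return b_start - a_end
    if b_end <= a_start:
        return b_end - a_start
    return 0


def ks_statistic_uniform(values, low=-100, high=100):
    """One-sample KS statistic against a uniform distribution on [low, high]."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = (x - low) / (high - low)
        # Largest gap between the empirical CDF and the uniform CDF at x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def empirical_p_value(values, n_sim=1000, low=-100, high=100, seed=0):
    """Share of simulated uniform samples whose KS statistic is at least as
    large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = ks_statistic_uniform(values, low, high)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.uniform(low, high) for _ in range(len(values))]
        if ks_statistic_uniform(simulated, low, high) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

Under these assumptions, spacings that pile up near the reference site give a large KS statistic and a small empirical p-value, whereas spacings spread evenly across the window do not.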

Figure 1 (with 7 supplements). Characterization of spacing relationships for transcription factor (TF) pairs.

(A) Schematic of the data analysis pipeline for characterizing spacing relationships based on TF chromatin immunoprecipitation sequencing (ChIP-seq) data. (B) Dissection of TF binding sites for TFs in K562 cells based on spacing relationships with co-binding TFs. Each dot represents a TF pair. The bar heights indicate medians. (C) Circos plot summarizing spacing relationships for all the TF pairs analyzed. Orange and blue bands represent significant constrained and relaxed spacing relationships, respectively. Color opacity indicates the level of significance. TFs are grouped and colored by TF family. (D) The spacing distributions of example TF pairs with constrained or relaxed spacing relationships. Dashed lines indicate the significant constrained spacings. Since the TAL1 motif is completely palindromic, its orientation can be differentiated only by its co-binding partners.

Figure 1—source data 1. The numbers of co-binding sites for every pair of 73 transcription factors (TFs). Each number represents the chromatin immunoprecipitation sequencing (ChIP-seq) peaks of the TF in the row that overlap with at least one ChIP-seq peak of the TF in the column. Therefore, the number for (TF1, TF2) may not equal, but should be close to, the number for (TF2, TF1). https://cdn.elifesciences.org/articles/70878/elife-70878-fig1-data1-v2.csv
Figure 1—source data 2. Statistical test results for significant transcription factor (TF) pairs. https://cdn.elifesciences.org/articles/70878/elife-70878-fig1-data2-v2.txt

We applied this computational framework to all possible pairs of TFs. By dissecting each TF’s binding sites based on their spacing relationships with co-binding TFs, we found that 45 of the 73 TFs examined exclusively exhibited relaxed spacing relationships with other TFs (Figure 1B). Twenty-five factors could participate in either relaxed or constrained interactions, depending on the specific co-binding TFs. Only three TFs interacted exclusively with constrained spacing, and some of these might show additional relaxed spacing relationships if the current set of TFs were expanded. The significant pairwise patterns of relaxed and constrained spacing relationships are illustrated in Figure 1C. Among 29 TF pairs with constrained spacing relationships, most bind close to each other, within 15 bp (Figure 1—figure supplement 2). Some of these TF pairs have been reported to recognize composite motifs, such as GATA1-TAL1 and NFATC3-FOSL1 (Macián et al., 2001; Ng et al., 2014; Figure 1D; Figure 1—figure supplement 3), and some are novel constrained spacing patterns discovered by our analysis, such as MEF2A-JUND and CEBPB-TEAD4 (Figure 1—figure supplement 3). TFs exhibiting relaxed spacing are exemplified by ETV1-TAL1 and JUND-KLF16, in which the frequency of co-binding progressively declines with distance from the center of the reference TF (Figure 1D). We also saw frequent relaxed spacing between TFs in the same family. For instance, despite the similar motifs recognized by AP-1 factors, many of these TFs were found to co-localize at non-overlapping nearby positions. In addition, the same type of spacing relationship is usually observed in different motif orientations (Figure 1D), consistent with previous findings (Lis and Walther, 2016).

We downloaded the ChIP-seq data of HepG2 cells from ENCODE and processed them with the same pipeline as for K562 cells. The same TF pairs can have similar spacing relationships in different cell types, exemplified by CEBPB and JUND in K562 and HepG2 cells (Figure 1—figure supplement 4). Although binding events occurred more frequently at specific spacings for constrained TF pairs or at closer spacings for relaxed TF pairs, the binding activities quantified by ChIP-seq tags did not differ across spacings, suggesting that the spacing preference is not a determinant of TF binding activity (Figure 1—figure supplement 5).

Since repetitive DNA regions such as transposable elements are known to harbor TF binding sites and specific TF co-binding (Bourque et al., 2008; Kunarso et al., 2010), we further examined whether the spacing relationships of TFs differ between repetitive and nonrepetitive regions. To study this, we applied the same pipeline to the subsets of TF ChIP-seq peaks in repetitive and nonrepetitive regions. Most of the relaxed spacing relationships were retained in both repetitive and nonrepetitive regions (Figure 1—figure supplement 6). Some constrained TF pairs, however, showed constrained spacing only in repetitive regions and not in nonrepetitive regions (Figure 1—source data 2). For example, EGR1 and JUND exhibited a constrained spacing at 29 bp (Figure 1D), but this relationship is observed specifically in SINEs (Figure 1—figure supplement 7). This observation is consistent with previous studies that discovered specific motif pairs in repetitive regions (Wang et al., 2012).

Natural genetic variants altering spacing between relaxed TFs are associated with less deleteriousness in human populations

Based on a global view of the TF spacing relationships, we then studied whether these relationships are associated with different levels of sensitivity to spacing alterations. Here, we leveraged more than 60 million InDels from gnomAD data (Karczewski et al., 2020), which were based on more than 75,000 genomes from unrelated individuals. We mapped these InDels to the TF binding sites of representative TF pairs with constrained and relaxed spacing identified in K562 cells. We found that InDels between TF binding sites have sizes similar to those at binding sites and those in background regions, the majority being shorter than 5 bp (Figure 2A). Next, we divided these InDels based on their allele frequency (AF) and allele count (AC) into high-frequency variants (AF>0.01%), rare variants (AF<0.01%, AC>1), and singletons (AC = 1). Most of the InDels are singletons or rare variants (Figure 2—figure supplement 1; Figure 2—source data 1). We compared the enrichment of different categories of InDels between or at TF binding sites (Figure 2B; Figure 2—figure supplement 2). The InDel compositions at TF binding sites were not significantly different between constrained and relaxed spacing groups. In contrast, between the binding sites of TFs with constrained spacing, singletons were significantly more enriched and high-frequency variants significantly more depleted than between those of TFs with relaxed spacing. As negative controls, we also computed the enrichments for several TF pairs with random spacing relationships and found them similar to those for pairs with relaxed spacing. Since common variants tend to be less deleterious and rare variants more deleterious (Lek et al., 2016), these findings suggest that InDels between TF binding sites with constrained spacing could be just as damaging as those at binding sites, whereas InDels between TF binding sites with relaxed spacing might have a much weaker effect. This observation is consistent with prior studies that validated significant effects of spacing alterations between TFs with constrained spacing relationships (Ng et al., 2014). However, few studies have discussed the effects of InDels on TFs with relaxed spacing, so we specifically focused on relaxed spacing relationships in the rest of the current study.
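The frequency classes and the enrichment measure used above can be illustrated with a short sketch. The thresholds follow the text (AF of 0.01%, AC of 1). The exact odds ratio definition is not given in this section, so the 2x2 form below (one InDel category versus all others, in a region of interest versus background) is an assumption, as are the function names and the treatment of an AF exactly equal to 0.01% as rare.

```python
import math


def classify_indel(allele_frequency, allele_count):
    """Assign an InDel to one of the three frequency classes in the text.

    allele_frequency is a fraction, so 0.01% corresponds to 1e-4.
    """
    if allele_count == 1:
        return "singleton"
    if allele_frequency > 1e-4:
        return "high_frequency"
    # The source does not say how AF == 0.01% is handled; treated as rare here.
    return "rare"


def log2_odds_ratio(region_counts, background_counts, category):
    """Log2 odds ratio for one InDel category in a region versus background.

    Both arguments map category names to InDel counts. This 2x2 definition
    is an assumed form, not necessarily the one used in the study.
    """
    in_region = region_counts[category]
    other_region = sum(region_counts.values()) - in_region
    in_background = background_counts[category]
    other_background = sum(background_counts.values()) - in_background
    return math.log2((in_region * other_background) / (other_region * in_background))
```

With this definition, a positive value means the category is over-represented in the region relative to background and a negative value means it is depleted.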

Figure 2 (with 2 supplements). Naturally occurring insertions and deletions (InDels) in human populations.

(A) Size distributions of human InDels within different regions. (B) Log2 odds ratios for different categories of InDels. Each dot represents a transcription factor (TF) pair with corresponding spacing relationship. Mann–Whitney U test was used to compare the odds ratios between different spacing relationships. Non-significant (n.s.) if p-value is larger than 0.01.

Figure 2—source data 1. The numbers and odds ratios of different categories of insertions and deletions (InDels) at or between transcription factor (TF) binding sites. https://cdn.elifesciences.org/articles/70878/elife-70878-fig2-data1-v2.txt

Spacing alterations across mouse strains are generally tolerated by relaxed TF binding and promoter and enhancer function

To investigate the regulatory effects of naturally occurring InDels that alter spacing between TFs with relaxed spacing relationships, we leveraged more than 50 million SNPs and 5 million InDels from five genetically diverse mouse strains: C57BL/6J (C57), BALB/cJ (BALB), NOD/ShiLtJ (NOD), PWK/PhJ (PWK), and SPRET/EiJ (SPRET). ChIP-seq data for key TFs and histone acetylation, as well as genome-wide transcriptional run-on (GRO-seq) data, are available for bone marrow-derived macrophages (BMDMs) from every mouse strain (Link et al., 2018a). We first characterized the spacing relationship between the macrophage LDTFs PU.1 (encoded by Spi1) and C/EBPβ (encoded by Cebpb), which have been found to bind in a collaborative manner at the regulatory regions of macrophage-specific genes (Heinz et al., 2010). Based on our computational framework for characterizing spacing relationships (Figure 1A), these two TFs follow a relaxed spacing relationship independent of their motif orientations (Figure 3A; KS p-value < 1e-6). Moreover, both PU.1 and C/EBPβ binding activities, quantified by ChIP-seq tags, did not vary with spacing (Figure 3B).

Figure 3 (with 8 supplements). Effects of spacing alterations resulting from natural genetic variation across mouse strains.

(A) Spacing distributions of PU.1 and C/EBPβ binding sites at co-binding sites. (B) Density plots showing the relationship between transcription factor (TF) binding activity and motif spacing for the co-binding sites. Log2 chromatin immunoprecipitation sequencing (ChIP-seq) tags were calculated within 300 bp to quantify the binding activity of PU.1 and C/EBPβ. The color gradients represent the number of sites. Spearman’s correlation coefficients together with p-values are displayed. (C, E, G) Absolute log2 fold changes of ChIP-seq tags between C57 and another strain for (C) PU.1 binding, (E) C/EBPβ binding, or (G) nascent transcripts measured by GRO-seq. Boxplots show the median and quartiles of every distribution. Cohen’s d effect sizes comparing against variant-free regions are displayed on top. (D, F, H) Correlations between change of spacing or position weight matrix (PWM) score and change of (D) PU.1 binding, (F) C/EBPβ binding, or (H) nascent transcript level. Spearman’s correlation coefficients together with p-values are displayed.

Figure 3—source data 1. Tag fold changes at individual sites for PU.1 chromatin immunoprecipitation sequencing (ChIP-seq). https://cdn.elifesciences.org/articles/70878/elife-70878-fig3-data1-v2.csv
Figure 3—source data 2. Tag fold changes at individual sites for C/EBPβ chromatin immunoprecipitation sequencing (ChIP-seq). https://cdn.elifesciences.org/articles/70878/elife-70878-fig3-data2-v2.csv
Figure 3—source data 3. Tag fold changes at individual sites for GRO-seq. https://cdn.elifesciences.org/articles/70878/elife-70878-fig3-data3-v2.csv
Figure 3—source data 4. Tag fold changes at individual sites for H3K27ac chromatin immunoprecipitation sequencing (ChIP-seq). https://cdn.elifesciences.org/articles/70878/elife-70878-fig3-data4-v2.csv

We then conducted independent comparisons between C57 and each of the other four strains to investigate the effects of spacing alterations caused by natural genetic variation. Most of the natural InDels are shorter than 5 bp, similar to those found in the human population (Figure 3—figure supplement 1). We first identified the co-binding sites of PU.1 and C/EBPβ for every strain and then, for each pairwise analysis, pooled the co-binding sites of C57 and a comparison strain to obtain the testing set of regions. Based on the impacts of SNPs and InDels on binding affinity quantified by PWM score or the impacts of InDels on spacing, we categorized the testing regions into the following independent groups: (1) mutated PU.1 motif, (2) mutated C/EBPβ motif, (3) mutated other potentially functional motifs, (4) altered spacing, (5) no motif/spacing effect, and (6) variant free. Potentially functional motifs were identified from PU.1 and C/EBPβ binding sites using MAGGIE (Shen et al., 2020), a computational tool that finds motifs associated with changes in TF binding (Figure 3—figure supplement 2). Considering that PU.1 and C/EBPβ binding could change due to genetic variation mutating other motifs, we grouped these variants to examine their overall effects and, at the same time, to obtain a cleaner group of spacing alterations. The effect of genetic variation was quantified by the log2 fold difference of ChIP-seq tag counts between strains at orthogonal sites (Figure 3C). All four independent comparisons showed that PU.1 binding is most strongly affected by PU.1 motif mutation, followed by C/EBPβ motif mutation and other motif mutation. Spacing alterations have a smaller effect size than any of these motif mutations, but still a larger effect than variants affecting neither binding affinity nor spacing. Despite the moderate effect size of spacing alterations, we found that this effect was independent of the size or direction of InDels (Figure 3D). In contrast, changes in PU.1 ChIP-seq tags are strongly correlated with changes in binding affinity measured by changes in PWM scores (Figure 3D). In addition, the effects of motif mutation and spacing alteration do not vary with the initial spacing between PU.1 and C/EBPβ motifs (Figure 3—figure supplement 3). Similar findings were observed for C/EBPβ binding, except that, as expected, C/EBPβ motif mutation had the largest effect size and the strongest correlation with C/EBPβ binding activity (Figure 3E and F; Figure 3—figure supplement 3). Although most of the informative genetic variants are located at enhancers and relatively few within promoters, we saw consistent relationships in promoters and enhancers (Figure 3—figure supplement 4).
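The effect measures used in this comparison, an absolute log2 fold change in tag counts and Cohen's d against the variant-free group, can be written out as a minimal sketch. The pseudocount and the pooled-standard-deviation form of Cohen's d are assumptions made for this example, because this section does not specify how zero counts or unequal group sizes were handled, and the function names are hypothetical.

```python
import math
import statistics


def abs_log2_fold_change(tags_a, tags_b, pseudocount=1.0):
    """Absolute log2 fold difference in tag counts between two strains.

    The pseudocount avoids division by zero and is an assumption here.
    """
    return abs(math.log2((tags_a + pseudocount) / (tags_b + pseudocount)))


def cohens_d(group, reference):
    """Cohen's d of `group` relative to `reference`, using the pooled
    sample standard deviation of the two groups."""
    n1, n2 = len(group), len(reference)
    var1 = statistics.variance(group)
    var2 = statistics.variance(reference)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group) - statistics.mean(reference)) / pooled_sd
```

Here `group` would hold the absolute log2 fold changes for one category of regions (for example, altered spacing) and `reference` those for variant-free regions.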

To investigate whether the effects of altered spacing on PU.1 and C/EBPβ binding can be generalized to hierarchical interactions with signal-dependent TFs (SDTFs), we leveraged the ChIP-seq data of PU.1, the NFκB subunit p65 (encoded by Rela), and the AP-1 subunit c-Jun (encoded by Jun) for BMDMs treated with the TLR4-specific ligand Kdo2 lipid A (KLA) in the same five strains of mice (Link et al., 2018a). Upon macrophage activation with KLA, p65 enters the nucleus and primarily binds to poised enhancer elements that are selected by LDTFs including PU.1 and AP-1 factors (Heinz et al., 2015). We observed a relaxed spacing relationship between PU.1 and p65 and between c-Jun and p65 (Figure 3—figure supplement 5). In addition, InDels altering motif spacing had a much smaller effect size on TF binding than motif mutations (Figure 3—figure supplement 6), consistent with our findings from PU.1 and C/EBPβ.

Although alterations in motif spacing had generally weak effects at the level of DNA binding, it remained possible that changes in motif spacing could influence subsequent steps in enhancer and promoter activation. To examine this, we extended our analysis to nascent transcription measured by GRO-seq (Core et al., 2008). Importantly, nascent transcription occurs at both active promoters and enhancers, with enhancer transcription serving as an indicator of enhancer activity (De Santa et al., 2010; Kim et al., 2019). We leveraged GRO-seq data of untreated BMDMs from the five strains of mice (Link et al., 2018a) and calculated the log2 fold changes of tags at the PU.1 and C/EBPβ co-binding sites for the same pairwise comparisons of strains. As with TF binding, altered spacing had weaker effects on nascent transcription than motif mutations (Figure 3G), which is consistent with the significant correlations between changes in TF binding and changes in the level of nascent transcripts (Figure 3—figure supplement 7). The relative tolerance of spacing alteration was further supported by a weak correlation between changes in GRO-seq tags and the size of InDels, in contrast with a much stronger correlation with changes in binding affinity (Figure 3H). Thus, these findings extend the concept of spatial tolerance to the entire ensemble of factors that must be assembled to mediate nascent transcription. Similar relationships were observed for effects of InDels on local acetylation of histone H3 lysine 27 (H3K27ac) (Figure 3—figure supplement 7; Figure 3—figure supplement 8), which provides an alternative surrogate for enhancer and promoter activity (Creyghton et al., 2010).

Human quantitative trait loci altering spacing between relaxed TFs have small effect sizes

To study the effects of spacing alteration on TF binding and local histone acetylation in human cells, we leveraged the ChIP-seq data of ERG, p65, and H3K27ac in endothelial cells from dozens of individuals (Stolze et al., 2020). ERG is an ETS factor that functions as an LDTF in endothelial cells, selecting poised enhancers where p65 binds in a hierarchical manner upon interleukin-1β (IL-1β) stimulation (Hogan et al., 2017). ERG and p65 follow a relaxed spacing relationship according to our method (Figure 4A). Next, we obtained 557 TF binding quantitative trait loci (bQTLs) for ERG, 5,791 bQTLs for p65, 25,621 histone modification QTLs (hQTLs) for H3K27ac in untreated cells, and 21,635 hQTLs for H3K27ac in IL-1β-treated cells (Stolze et al., 2020). We further classified bQTLs and hQTLs based on their impacts on binding affinity or spacing: (1) mutated both ERG and p65 (i.e., RELA) motif, (2) mutated ERG motif only, (3) mutated p65 motif only, (4) mutated other potentially functional motifs identified by MAGGIE (Shen et al., 2020), (5) altered spacing between ERG and p65 motif, (6) none of the above. To find potentially functional motifs, we supplied MAGGIE with 100 bp sequences around QTLs before and after swapping alleles at the center (Figure 4—figure supplement 1). We found that only a small portion of bQTLs and hQTLs directly mutate an ERG or p65 motif (Figure 4B; Figure 4—figure supplement 2). However, such motif mutations are enriched in bQTLs compared to non-QTLs (Fisher’s exact p < 1e-4). In contrast, InDels that alter motif spacing are significantly depleted in p65 bQTLs (Fisher’s exact p = 1.3e-15). These InDels from the dozens of individuals are predominantly shorter than 5 bp, following a size distribution similar to that of InDels in human populations (Figure 4—figure supplement 3). A large proportion of QTLs affect other motifs, reflecting the complexity of TF interactions. More than a quarter of the QTLs affect neither binding affinity nor spacing, which can be explained by the high correlation of non-functional variants with functional variants due to linkage disequilibrium.
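The enrichment and depletion statements above rest on Fisher's exact test applied to a 2x2 table, for example variants that do or do not fall in a given category, among QTLs versus non-QTLs. The sketch below is a standard-library version of the two-sided test, written only to make the computation concrete. The function name and the table layout are assumptions, and a statistics library would normally be used instead.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be QTL / non-QTL and columns in-category / not-in-category
    (assumed layout). The p-value sums the hypergeometric probabilities of
    all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are kept.
    return sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
```

A small p-value indicates only that the category is unevenly distributed between the two rows. Whether that reflects enrichment or depletion is read from the direction of the odds ratio.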

Figure 4 (with 4 supplements). Effects of chromatin quantitative trait loci (QTLs) in human endothelial cells.

(A) Spacing distributions of ERG and p65 binding sites at co-binding sites. (B) Classification of chromatin QTLs based on the impacts on motif and spacing. (C) Absolute correlation coefficients of different QTLs. Cohen’s d and Mann–Whitney U test p-values comparing against the ‘other’ group are displayed on top. *p < 0.01, **p < 0.001, ***p < 0.0001. (D) Example QTLs for large effect size due to ERG motif mutation (upper) and trivial effect due to spacing alteration (lower).

Figure 4—source data 1. Effect sizes and categorization of p65 binding quantitative trait loci (bQTLs). https://cdn.elifesciences.org/articles/70878/elife-70878-fig4-data1-v2.csv
Figure 4—source data 2. Effect sizes and categorization of H3K27ac histone modification quantitative trait loci (hQTLs) at IL-1β. https://cdn.elifesciences.org/articles/70878/elife-70878-fig4-data2-v2.csv
Figure 4—source data 3. Effect sizes and categorization of ERG binding quantitative trait loci (bQTLs). https://cdn.elifesciences.org/articles/70878/elife-70878-fig4-data3-v2.csv
Figure 4—source data 4. Effect sizes and categorization of H3K27ac histone modification quantitative trait loci (hQTLs) at basal. https://cdn.elifesciences.org/articles/70878/elife-70878-fig4-data4-v2.csv

We further compared the effect sizes of different categories of QTLs. Despite being a minority among QTLs, variants that mutate both ERG and p65 motifs have the strongest effects on both p65 binding and histone acetylation in IL-1β-treated endothelial cells (Figure 4C). In comparison, ERG binding and the basal level of histone acetylation in untreated endothelial cells are significantly affected by ERG motif mutations and not by p65 motif mutations, consistent with the hierarchical interaction of p65 only upon IL-1β stimulation (Figure 4—figure supplement 4). In both conditions, spacing alterations have a smaller effect size than any of the motif mutation categories and are not significantly different from likely non-functional variants in the ‘other’ group. The examples show a variant that is both a p65 bQTL and an H3K27ac hQTL in the IL-1β-treated state due to its impact on an ERG motif, and a 4 bp insertion between ERG and p65 motifs that is associated with no change in p65 binding or H3K27ac (Figure 4D).

Relaxed TF binding is highly tolerant to synthetic spacing alterations

The generally small effects of InDels occurring between TF pairs exhibiting relaxed spacing relationships raised the question of how robust and how extensive such tolerance is at genomic locations lacking such variation. We addressed this question by using CRISPR/Cas9 editing to introduce synthetic InDels between binding sites identified for the LDTFs PU.1 and C/EBPβ in mouse macrophages (Figure 5A). We used lentiviral transduction in Cas9-expressing ER-HoxB8 cells, which are conditionally immortalized monocyte progenitors, to introduce gRNAs targeting genomic sequences between the PU.1 and C/EBPβ binding sites at co-binding locations. The successfully transduced ER-HoxB8 cells were then sorted and differentiated into macrophages. Since non-homologous DNA repair resulting from the Cas9 nuclease activity would generate a spectrum of InDels in a population of transduced cells, we first sequenced input DNA to obtain the distribution of InDels and then compared it with the distribution in TF ChIP-seq tags from deep sequencing, reporting the effect of an InDel as the odds ratio of ChIP tags to input tags. Importantly, the ChIP-seq libraries were prepared by selective amplification of ChIP tags containing the targeted region of interest. Thus, for each region-specific sequence tag that was immunoprecipitated, we could simultaneously determine whether an InDel had been created and its specific length. Each tag is thus cell- and allele-specific.

Figure 5. Effects of variable sizes of synthetic spacing alterations.

(A) Schematic for generating and analyzing synthetic spacing alterations. (B) The distributions of valid read counts from the input sample based on the InDel sizes of the reads. Negative InDel size indicates deletion, and positive size indicates insertion. (C) Log2 odds ratios computed by comparing C/EBPβ chromatin immunoprecipitation sequencing (ChIP-seq) reads with input sample reads. A log2 odds ratio of 0 (y = 0) indicates that transcription factor (TF) binding is at the level expected from the input. p-Values were based on two-sample t-tests comparing the InDel groups of each test region. (D) Sequencing data of ER-HoxB8 cells at a co-binding site of PU.1 and C/EBPβ. Highlighted is test region #6, whose DNA sequence from the PU.1 binding site to the C/EBPβ binding site is shown. (E) Log2 odds ratios of test region #6 as a function of InDel size.

Figure 5—source data 1. Raw chromatin immunoprecipitation sequencing (ChIP-seq) tag counts associated with different sizes of insertions and deletions (InDels): https://cdn.elifesciences.org/articles/70878/elife-70878-fig5-data1-v2.txt

We tested six PU.1 and C/EBPβ co-binding sites with original spacing ranging from 26 to 55 bp (Supplementary file 1) and quantified the effects of InDels on C/EBPβ binding. Of the six test regions, three have supporting evidence from naturally occurring InDels in mouse strains (regions #1, #3, #5) and three do not (regions #2, #4, #6). Bioinformatic analysis of the ultra-deep sequencing reads from the input DNA samples showed that the CRISPR/Cas9 system generated a wide range of InDels, with most deletions being <30 bp and insertions usually shorter than 5 bp (Figure 5B). This approach yielded longer deletions than the natural genetic variation found across mouse strains (Figure 3—figure supplement 1) and in human populations (Figure 2A). After classifying ChIP-seq reads by InDel size and by whether the InDel overlaps any of the PU.1 and C/EBP binding sites, we estimated the effect size of InDels on C/EBPβ binding by calculating the odds ratio between C/EBPβ ChIP-seq reads and input DNA sample reads for every InDel group. We found that InDels altering spacing have significantly weaker effects on C/EBPβ binding than those overlapping at least one of the binding sites (Figure 5C). For some test regions, the effects of pure spacing alterations are almost negligible, as exemplified by test region #6 (Figure 5D and E) and test region #1 (Figure 5—figure supplement 1). Test region #6 is located near the highly expressed gene Prdx1 and shows strong binding of PU.1 and C/EBPβ as well as strong H3K27ac and chromatin accessibility (ATAC-seq) signals in ER-HoxB8 cells, all of which support its potential regulatory function (Figure 5D). The PU.1 and C/EBPβ binding sites at this region are 26 bp apart. In general, spacing alterations ranging from a 5 bp increase to a 22 bp decrease did not have a strong effect on TF binding, as indicated by a log2 odds ratio close to 0 (Figure 5E).
A small number of outliers were observed at each region where specific InDels resulted in substantial loss of binding (e.g., –20 bp, Figure 5E). The loss of C/EBPβ binding at these specific InDels was generally not shared by InDels differing in size by 1 bp (e.g., –19 and –21 bp, Figure 5E). The basis for these highly localized changes in the odds ratio in a small fraction of InDels that alter spacing is unclear. In contrast, deletions overlapping the TF binding sites resulted in a general decrease in TF binding activity. Similar results were found at test region #1, where the PU.1 and C/EBPβ binding sites are 41 bp apart (Figure 5—figure supplement 1A). This Ly9 enhancer also has a 5 bp insertion between the PU.1 and C/EBPβ binding sites in BALB, NOD, and PWK mice, and shows unaffected binding of PU.1 and C/EBPβ in the BMDMs of these strains (Figure 5—figure supplement 1B). With synthetic InDels, C/EBPβ binding was generally unaffected by alterations of spacing alone, whereas deletions overlapping TF binding sites substantially diminished TF binding (Figure 5—figure supplement 1C). We further measured PU.1 binding using ChIP-seq at three of the six test regions and saw general tolerance of synthetic spacing alterations, in contrast with the significantly weaker PU.1 binding resulting from motif alterations (Figure 5—figure supplement 2).
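The per-InDel effect size described above can be sketched as follows. This is a hedged illustration rather than the exact computation used here: it assumes that the odds of a tag carrying a given InDel size are taken relative to all other tags in the same library, and it adds a pseudocount (a hypothetical choice) so that sizes absent from the ChIP library do not produce a division by zero. The function name `log2_odds_ratios` is illustrative.

```python
import math
from collections import Counter

def log2_odds_ratios(chip_sizes, input_sizes, pseudocount=0.5):
    """Log2 odds ratio of ChIP tags to input tags for every InDel size seen in the input.

    chip_sizes / input_sizes: one net InDel size per sequenced tag
    (0 for unedited tags, negative for deletions, positive for insertions).
    """
    chip = Counter(chip_sizes)
    inp = Counter(input_sizes)
    chip_total = sum(chip.values())
    inp_total = sum(inp.values())

    result = {}
    for size in inp:
        # Odds of carrying this InDel size among ChIP tags.
        chip_odds = (chip[size] + pseudocount) / (chip_total - chip[size] + pseudocount)
        # Odds of carrying this InDel size among input tags.
        input_odds = (inp[size] + pseudocount) / (inp_total - inp[size] + pseudocount)
        result[size] = math.log2(chip_odds / input_odds)
    return result

# Toy data: the 20 bp deletion is depleted in ChIP relative to input.
chip_tags = [0] * 90 + [-20] * 10
input_tags = [0] * 50 + [-20] * 50
ratios = log2_odds_ratios(chip_tags, input_tags)
```

In this sketch a value near 0 means that the InDel is as frequent among immunoprecipitated tags as in the input, which corresponds to tolerated spacing, whereas a strongly negative value means that tags carrying the InDel are depleted from the ChIP library.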

Discussion

By classifying the genome-wide spacing relationships of 73 co-binding TFs as ‘constrained’ or ‘relaxed’, we revealed that relaxed spacing relationships were the dominant pattern of interaction for the majority of these factors. Among these factors, approximately half could also participate in constrained spacing relationships with specific TF partners. We confirmed TF pairs known to exhibit constrained relationships (e.g., GATA1-TAL1) and identified previously unreported constrained relationships for additional pairs, including EGR1 and JUND. Overall, this finding of a subset of constrained TF interactions on a genome-wide level is consistent with the locus-specific examples provided by functional and structural studies of the interferon-β enhanceosome (Panne, 2008) and in vivo studies of synthetically modified enhancer elements in Ciona (Farley et al., 2015). Each of these examples represents genomic regulatory elements in which key TF binding sites are tightly spaced in their native contexts (i.e., 0–9 bp between binding sites). Direct protein-protein interactions are observed between bound TFs at the interferon-β enhanceosome, analogous to interactions defined for cooperative TFs that form ternary complexes (Morgunova and Taipale, 2017; Reményi et al., 2003). However, unlike the previous in vitro study that identified over 300 TF-TF interactions (Jolma et al., 2015), the spacing analyses in our study did not directly consider the possible overlap between TF binding sites. Thus, we are not able to discover constrained TFs that recognize overlapping motifs or to distinguish effects of spacing alterations from effects of InDels on overlapping composite motifs.

Our findings based on ChIP-seq data were consistent with the recent in vivo profiling of TF co-occupancy on single DNA molecules, which discovered a lack of association between TF co-occupancy and precise spacing or orientation of motifs (Sönmezer et al., 2021). The observation that most TF pairs exhibited relaxed spacing relationships has intriguing implications for the mechanisms by which functional enhancers and promoters are selected from chromatinized DNA. In contrast to ternary complexes of TFs that cooperatively bind to composite elements as a unit, relaxed spacing relationships appear not to require specific protein-protein interactions between TFs for collaborative binding at most genomic locations. Although pioneering TFs necessary for selection of cell-specific enhancers have been reported to recognize their motifs within the context of nucleosomal DNA (Zaret and Carroll, 2011), the basis for collaborative binding interactions between TFs with relaxed spacing remains poorly understood.

While the current studies relying on natural genetic variation and mutagenesis experiments indicate clear tolerance of spacing alterations between binding sites of TFs with relaxed spacing, the extent to which this set of binding sites is representative of all regulatory elements is unclear. For example, we observed outliers in which significant differences in TF binding between mouse strains were associated with InDels occurring between TF binding sites. However, the proportion of outliers was generally similar to that observed at genomic regions lacking such InDels, and such strain differences may be driven by distal effects of genetic variation on interacting enhancer or promoter regions (Hoeksema et al., 2021; Link et al., 2018a). The remarkable tolerance of synthetic InDels at six independent endogenous genomic locations between PU.1 and C/EBPβ binding sites strongly supports the generality of relaxed binding interactions for these two proteins. Intriguingly, while the densities of C/EBP binding sites increase with decreasing distance to PU.1 binding sites over a 100 bp range (Figure 3A), deletions from 1 to >30 bp between PU.1-C/EBPβ pairs did not result in improved binding. Instead, relatively constant binding was observed with progressive deletions bringing the two binding sites closer together until the deletions started to cause mutations in one or both motifs. The lack of a requirement for exact spacing and the remarkable tolerance of spacing alterations by TFs with relaxed spacing could potentially be associated with the high turnover of TF binding sites found by previous studies (Vierstra et al., 2014), although further investigation would be needed to establish this association. A limitation of our studies is that few and relatively short insertions were obtained, preventing conclusions as to the extent to which increases in spacing are tolerated.

In concert, the present studies provide a basis for estimation of the potential phenotypic consequences of naturally occurring InDels in non-coding regions of the genome. The majority of naturally occurring InDels are less than 5 bp in length. In nearly all cases, InDels of this size range between binding sites for TFs that have relaxed binding relationships are unlikely to alter TF binding and function, and InDels of much greater length are frequently tolerated. In contrast, InDels between binding sites for TFs that have constrained binding relationships have the potential to result in biological consequences. Application of these findings to the interpretation of non-coding InDels that are associated with disease risk will require knowledge of the relevant cell type in which the InDel exerts its phenotypic effect and the types of TF interactions driving the selection and function of the affected regulatory elements.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Strain, strain background (Mus musculus, male)	B6(C)-Gt(ROSA)26Sor^{em1.1(CAG-cas9*,-EGFP)Rsky}/J	Jackson Laboratory	Stock No: 028555; RRID:IMSR_JAX:028555
Cell line (Mus musculus)	Cas9-expressing ER-HoxB8 cells	This paper		Gifted from Dr David Sykes
Cell line (human)	Lenti-X 293T cells	Clontech	Cat#: 632180; RRID:CVCL_4401
Transfected construct (retrovirus)	Murine stem cell virus-based vector for ER-HoxB8	Massachusetts General Hospital, Boston, MA		Gifted from Dr David Sykes
Transfected construct (retrovirus)	lentiGuide-puro	Addgene	Cat#: 52963
Transfected construct (retrovirus)	psPAX2	Addgene	Cat#: 12260
Transfected construct (retrovirus)	pVSVG	Addgene	Cat#: 138479
Antibody	PU.1/Spi1 (rabbit polyclonal)	Santa Cruz	Cat#: sc-352X; RRID:AB_632289	(1 µL)
Antibody	C/EBPβ (rabbit polyclonal)	Santa Cruz	Cat#: sc-150; RRID:AB_2260363	(10 µL)
Antibody	H3K27ac (rabbit polyclonal)	Active Motif	Cat#: 39135; RRID:AB_2614979	(2 µL)
Recombinant DNA reagent	NEBNext 2× High Fidelity PCR Master Mix	NEB	Cat#: M0541
Sequence-based reagent	Locus-specific Nextera hybrid primer	This paper	PCR primers	Sequences included in Supplementary file 1
Sequence-based reagent	Nextera index primer	This paper	PCR primers	Sequences included in Supplementary file 1
Peptide, recombinant protein	Recombinant Mouse IL-3	Peprotech	Cat#: 213–13
Peptide, recombinant protein	Recombinant Mouse IL-6	Peprotech	Cat#: 216–16
Peptide, recombinant protein	Recombinant Mouse SCF	Peprotech	Cat#: 250–03
Peptide, recombinant protein	Recombinant Mouse GM-CSF	Peprotech	Cat#: 315–03
Peptide, recombinant protein	Mouse M-CSF	Shenandoah Biotech	Cat#: 200–08
Commercial assay or kit	Direct-zol RNA MicroPrep kit	Zymo Research	Cat#: R2062
Commercial assay or kit	Qubit dsDNA HS Assay Kit	Thermo Fisher Scientific	Cat#: Q32851
Commercial assay or kit	Nextera DNA Library Preparation Kit	Illumina	Cat#: 15028212
Commercial assay or kit	ChIP DNA Clean & Concentrator	Zymo Research	Cat#: D5205
Commercial assay or kit	NEBNext Ultra II Library Preparation Kit	NEB	Cat#: E7645L
Chemical compound, drug	LentiBlast Transduction Reagent	OZ Biosciences	Cat#: LB00500
Chemical compound, drug	Ficoll-Paque-Plus	Sigma-Aldrich	Cat#: GE17-1440-02
Chemical compound, drug	RPMI-1640	Corning	Cat#: 10–014-CV
Chemical compound, drug	DMEM high glucose	Corning	Cat#: 10–013-CV
Chemical compound, drug	FBS	Omega Biosciences	Cat#: FB-12
Chemical compound, drug	100× Penicillin/Streptomycin + L-glutamine	Gibco	Cat#: 10378–016
Chemical compound, drug	β-Estradiol	Sigma-Aldrich	Cat#: E2758
Chemical compound, drug	G418	Thermo Fisher	Cat#: 10131035
Chemical compound, drug	Polybrene	Sigma-Aldrich	Cat#: H9268
Chemical compound, drug	Fibronectin	Sigma-Aldrich	Cat#: F0895
Chemical compound, drug	Poly-D-lysine	Sigma-Aldrich	Cat#: DLW354210
Chemical compound, drug	X-tremeGENE HP DNA Transfection Reagent	Sigma-Aldrich	Cat#: 6366546001
Chemical compound, drug	Formaldehyde	Thermo Fisher Scientific	Cat#: BP531-500
Chemical compound, drug	Dynabeads Protein A	Invitrogen	Cat#: 10002D
Chemical compound, drug	SpeedBeads magnetic carboxylate modified particles	Sigma-Aldrich	Cat#: GE65152 105050250
Chemical compound, drug	Dynabeads MyOne Streptavidin T1	Invitrogen	Cat#: 65602
Software, algorithm	CHOPCHOP	CHOPCHOP (https://chopchop.cbu.uib.no/)	RRID:SCR_015723
Software, algorithm	Bowtie2	Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)	RRID:SCR_016368	Version 2.3.5.1
Software, algorithm	STAR	STAR (https://github.com/alexdobin/STAR)	RRID:SCR_004463	Version 2.5.3
Software, algorithm	HOMER	HOMER (https://homer.ucsd.edu/homer/)	RRID:SCR_010881	Version 4.9.1
Software, algorithm	MAGGIE	MAGGIE (https://github.com/zeyang-shen/maggie)	RRID:SCR_021903	Version 1.1
Software, algorithm	IDR	IDR (https://www.encodeproject.org/software/idr/)	RRID:SCR_017237	Version 2.0.3
Software, algorithm	MMARGE	MMARGE (https://github.com/vlink/marge)	RRID:SCR_021902	Version 1.0


Author details

Zeyang Shen, Rick Z Li, Thomas A Prohaska, Marten A Hoeksema, Nathan J Spann, Jenhan Tao, Gregory J Fonseca, Thomas Le, Lindsey K Stolze, Mashito Sakai, Casey E Romanoski, Christopher K Glass (for correspondence)
