Chromatin immunoprecipitation (ChIP) and its derivatives are the main techniques used to determine transcription factor binding sites. However, conventional ChIP with sequencing (ChIP-seq) has problems with poor resolution, and newer techniques require significant experimental alterations and complex bioinformatics. Previously, we have used a new crosslinking ChIP-seq protocol (X-ChIP-seq) to perform high-resolution mapping of RNA Polymerase II (Skene et al., 2014). Here, we build upon this work and compare X-ChIP-seq to existing methodologies. By using micrococcal nuclease, which has both endo- and exo-nuclease activity, to fragment the chromatin and thereby generate precise protein–DNA footprints, high-resolution X-ChIP-seq achieves single base-pair resolution of transcription factor binding. A significant advantage of this protocol is the minimal alteration to the conventional ChIP-seq workflow and simple bioinformatic processing.
https://doi.org/10.7554/eLife.09225.001
The diverse gene expression programs that allow for differentiation and response to environmental stimuli result from the regulated binding of transcription factors to DNA. The prevalent technique used in chromatin biology for mapping protein–DNA interactions is chromatin immunoprecipitation (ChIP), but little has changed since it was first described 27 years ago (Solomon et al., 1988). Despite recent advances in read-out technologies for ChIP, such as high-throughput sequencing (ChIP-seq), the basic chromatin preparation protocol remains the same and has a number of limitations. For example, sonication is typically used to fragment the chromatin. This, however, has been shown to be non-random, with heterochromatic regions showing increased resistance to fragmentation, leading to bias in the experiment (Teytelman et al., 2009). In addition, sonication typically produces chromatin fragments between 200 and 500 bp, whereas the footprint of a typical chromatin-associated protein is ∼10-fold smaller, so the resolution currently obtained by ChIP-seq is limited. Even extensive sonication only yields fragments with an average length of 200 bp, suggesting that sonication is of limited use in generating high-resolution maps of genome-wide protein binding (Fan et al., 2008). In a previous study, we were interested in how RNA Polymerase II transcribes through nucleosomes at the promoter (Skene et al., 2014). Answering this question required the precise mapping of PolII with respect to the position of nucleosomes, but conventional ChIP-seq, which uses sonication to fragment the chromatin, yields fragments approximately twice the size of a nucleosome. Additionally, it has been shown that PolII can crosslink to nearby nucleosomes, and therefore mapping the immunoprecipitated DNA fragments from these composite PolII:nucleosome:DNA complexes fails to precisely map the position of PolII on the DNA (Koerber et al., 2009; Skene et al., 2014).
By using micrococcal nuclease (MNase) to digest unprotected DNA, we were able to achieve high resolution in a ChIP experiment, mapping the precise location of PolII and chromatin remodelers on the DNA (Figure 1A; a detailed protocol is provided as Supplementary file 1) (Skene et al., 2014). Optimization of this simple protocol for the high-resolution mapping of protein–DNA interactions has the potential to revolutionize our understanding of genome-wide protein binding. High resolution is especially important at closely spaced transcription factor binding sites, such as locus control regions and super-enhancers, which have been shown to be vital to cell fate decisions and human diseases (Hnisz et al., 2013; Pott and Lieb, 2014).
To evaluate high-resolution crosslinking ChIP-seq (X-ChIP-seq), we compared it to existing methodologies. Initially, we focused on mapping PolII at the transcriptional start site (TSS) in Drosophila S2 cells, where there are existing data sets at both low and high resolution. We performed high-resolution X-ChIP-seq using the same antibody against total PolII (Rpb3 subunit) as previously used by a conventional sonication ChIP experiment (Figure 1B) (Core et al., 2012). Using this cell line also allowed a comparison with the single base-pair resolution technique that maps the last ribonucleotide incorporated into the nascent RNA chain (3′NT), thereby mapping the exact position of the PolII active site (Weber et al., 2014). By using paired-end sequencing, we can selectively study specific lengths of immunoprecipitated fragments. Analyzing sequenced fragments with lengths 20–70 bp, which more closely represent the footprint of PolII (Samkurashvili and Luse, 1996), avoids the complication of mapping fragments consisting of PolII crosslinked to adjacent nucleosomes. Using this technique, we find that the maximal peak of PolII signal coincides with the position of the polymerase's active site at ∼+35 bp, as measured by 3′NT. This is consistent with evidence suggesting that the vast majority of Drosophila genes have a productively engaged PolII enzyme stalled just downstream of the promoter rather than PolII stably bound at the pre-initiation complex (Core et al., 2012). In contrast, the PolII distribution measured by conventional ChIP, with the chromatin fragmented by sonication, shows a distinct distribution at the promoter: a broader peak centered at the TSS with maximal density at −5 bp.
This discrepancy likely comes from biases in the probability of sonication breaking the DNA at the nucleosome-depleted region of the promoter, as accessible regions such as DNase I sites and promoters of active genes have been shown to be sonicated at higher probability than inactive genomic regions (Teytelman et al., 2009). Analysis of a published sonicated input chromatin sample indicates a strong sonication bias at the promoter region (Figure 1—figure supplement 1). In contrast, by predominantly fragmenting the chromatin with MNase, it is possible to generate footprints corresponding to nucleosomes and other DNA-bound factors (Henikoff et al., 2011; Skene et al., 2014). Overall, this shows that using a high-resolution ChIP technique to map the protected footprint of PolII achieves resolution comparable to the single base-pair resolution achieved by mapping the position of the active site of PolII via nascent chain mapping. In comparison to conventional ChIP-seq, high-resolution X-ChIP-seq achieves both higher resolution, as indicated by the width of the ChIP peak, and higher accuracy by avoiding sonication bias, as shown by high similarity to 3′NT. Moreover, the depth of sequencing indicates the cost-effectiveness of this high-resolution ChIP approach, with the 3′NT profile based on 150 million reads (Weber et al., 2014), whereas our method required only 7 million paired-end reads with a fragment length of 20–70 bp. For comparison, the PolII profile generated by conventional ChIP was based on 13 million mapped reads (Core et al., 2012).
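The fragment-length selection used for the PolII analysis can be illustrated with a minimal Python sketch. It assumes that paired-end alignments have already been reduced to fragments given as (chromosome, start, end) in 0-based, half-open coordinates; the function name `select_by_length` and the toy coordinates are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Iterable, List, Tuple

# Hypothetical fragment representation: (chromosome, start, end),
# 0-based half-open, so fragment length is end - start.
Fragment = Tuple[str, int, int]


def select_by_length(
    fragments: Iterable[Fragment], min_len: int = 20, max_len: int = 70
) -> List[Fragment]:
    """Keep fragments whose length lies in [min_len, max_len] bp."""
    return [f for f in fragments if min_len <= f[2] - f[1] <= max_len]


# Toy fragments (illustrative values only, not real data).
fragments = [
    ("chr2L", 1000, 1045),  # 45 bp: kept
    ("chr2L", 2000, 2150),  # 150 bp: nucleosome-sized, dropped
    ("chr2L", 3000, 3020),  # 20 bp: kept (lower bound is inclusive)
    ("chr2L", 4000, 4071),  # 71 bp: dropped
]
short = select_by_length(fragments)
print(short)
```

Treating both bounds as inclusive is an assumption; the text gives the range as 20–70 bp without stating how the endpoints are handled.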
A limitation of high-resolution X-ChIP-seq is that a minority of the immunoprecipitated fragments represent the footprint of PolII on DNA, likely as a consequence of formaldehyde readily forming protein–protein crosslinks generating complexes such as PolII crosslinked to nucleosomes (Koerber et al., 2009; Skene et al., 2014). In our previous study, mapping murine PolII, only 10% of the fragments were 20–70 bp in length and less than 3% were under 50 bp (Figure 1C) (Skene et al., 2014). Therefore, to improve the cost-effectiveness of this technique and make it more applicable to transcription factors, which typically have a <50-bp footprint, we have further optimized the method to enrich for short fragments prior to sequencing. Previously, Agencourt AMpure beads have been used to select for short fragments prior to linker ligation (Orsi et al., 2015). In agreement, initial attempts indicated that Agencourt AMpure beads could enrich for DNA fragments below 100 bp from a complex mixture, but were unable to selectively purify fragments of ∼50 bp. However, by adjusting the volumetric ratio of beads to DNA, we could reproducibly control the selection within the 100–200 bp range with a ratio of 1.1×, leaving fragments of ∼170 bp in the unbound fraction (Figure 2A). Given that the ligation of the Illumina adapters to the immunoprecipitated DNA adds ∼125 bp, by using this ratio of AMpure beads, we could selectively enrich for ligated products containing short inserts (Figure 2B). Using this approach on input DNA from an MNase ChIP experiment, where the vast majority of the DNA fragments are from mono-nucleosomes, we find a 25-fold enrichment of fragments below 50 bp (Figure 2C). Therefore, combining this modification to the existing library preparation protocol with the MNase X-ChIP approach yields cost-effective high-resolution data.
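The arithmetic behind selecting after adapter ligation can be made explicit with a short sketch. The two constants come from the approximate values quoted above (∼125 bp added by the adapters, ∼170 bp left unbound at a 1.1× bead ratio). Treating the bead cutoff as a sharp threshold is a simplifying assumption, since real bead selection is gradual, and the function names are hypothetical.

```python
ADAPTER_BP = 125  # approximate length added by adapter ligation (from the text)
CUTOFF_BP = 170   # approximate size left unbound at a 1.1x ratio (from the text)


def ligated_length(insert_bp: int, adapter_bp: int = ADAPTER_BP) -> int:
    """Length of a library molecule after adapters are ligated to an insert."""
    return insert_bp + adapter_bp


def stays_unbound(insert_bp: int, cutoff_bp: int = CUTOFF_BP) -> bool:
    """Assumed hard threshold: ligated products at or below the cutoff
    remain in the unbound fraction and are carried forward."""
    return ligated_length(insert_bp) <= cutoff_bp


# Largest insert expected to stay unbound under this simplification.
max_insert = CUTOFF_BP - ADAPTER_BP

for insert in (30, 45, 50, 147):  # 147 bp approximates a mono-nucleosome
    print(insert, ligated_length(insert), stays_unbound(insert))
```

Under these assumptions an insert of about 45 bp sits at the cutoff, while a mono-nucleosomal insert of about 147 bp becomes a product of about 272 bp and is captured by the beads, which is consistent with the enrichment for sub-50-bp fragments reported in Figure 2C.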
To assess the resolution of this method, we chose the well-characterized transcription factor CCCTC-binding factor (CTCF). We performed high-resolution X-ChIP-seq in K562 cells and analyzed 20–50 bp fragments, and compared this to conventional ChIP as used in the ENCODE project (Figure 3A). To avoid the complexities of peak-calling algorithms that might be biased for differing data types, we used an unbiased approach of centering the data on CTCF motifs that were found within DNase I sites and therefore most likely bound by CTCF. High-resolution X-ChIP-seq of CTCF yielded a more focused distribution of reads centered over the CTCF motif. To quantify this, we measured the width of the ChIP peak at its half-height for each individual CTCF site (Figure 3B). A conventional ChIP approach using sonication gave a half-height width of 200 bp. In contrast, analysis of 20–50 bp fragments from MNase ChIP gave much higher resolution, with a half-height width of only 50 bp, suggesting that genome-wide MNase is chewing back to a minimal footprint of CTCF bound to the DNA. By analyzing different ranges of fragment lengths, it was possible to see that shorter fragments gave the highest resolution and smallest range in peak widths (Figure 3B and Figure 3—figure supplement 1).
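The half-height width measurement can be sketched as follows. The Methods describe the half height in terms of the maximum signal within ±1000 bp of the motif and a background taken from the outermost 100 bp on each side. This sketch assumes the half height is the midpoint between background and maximum, and that the width is the contiguous run containing the maximum. Both are one reading of the Methods and may not match the authors' exact calculation; `half_height_width` is a hypothetical name.

```python
from statistics import median
from typing import Sequence


def half_height_width(signal: Sequence[float], flank: int = 100) -> int:
    """Width (bp) of the contiguous run around the peak maximum whose
    signal is >= the half height.

    signal: per-base ChIP signal over a window centered on the motif.
    flank:  number of positions at each window edge used for background.
    Assumption: half height = background + (max - background) / 2.
    """
    background = median(list(signal[:flank]) + list(signal[-flank:]))
    peak_idx = max(range(len(signal)), key=lambda i: signal[i])
    half = background + (signal[peak_idx] - background) / 2.0

    # Extend left and right from the maximum while signal stays >= half.
    left = peak_idx
    while left > 0 and signal[left - 1] >= half:
        left -= 1
    right = peak_idx
    while right < len(signal) - 1 and signal[right + 1] >= half:
        right += 1
    return right - left + 1


# Toy window of 2001 positions (-1000..+1000): flat background of 10 with a
# triangular peak of height 110 at the center, falling by 2 per bp.
center = 1000
signal = [10 + max(0, 100 - 2 * abs(i - center)) for i in range(2001)]
width = half_height_width(signal)
print(width)
```

In the toy window the half height is 60, which the signal reaches or exceeds within 25 bp of the center on either side, giving a width of 51 bp.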
We also compared our CTCF high-resolution X-ChIP-seq results to CTCF profiles obtained using ChIP-exo, which is based upon the sonication of crosslinked chromatin, followed by exonuclease digestion of the immunoprecipitated complexes (Rhee and Pugh, 2011). In ChIP-exo, sequential ligation reactions allow the demarcation of 5′ and 3′ ends and bioinformatic analysis is used to identify ‘peak pairs’ that flank the transcription factor binding site. Figure 3C shows profiles based on ENCODE X-ChIP-seq data as processed by the ENCODE uniform processing pipelines and downloaded as ‘raw signal’, our high-resolution X-ChIP-seq stacked read data, and raw ChIP-exo data around a representative CTCF motif. Due to the low amounts of noise, high-resolution X-ChIP-seq is amenable to a very simple thresholding algorithm to identify peaks (Kasinathan et al., 2014), requiring only 13 million paired-end reads to obtain a crisp peak feature. This is in contrast to ChIP-exo, where more complex analysis is required, including dedicated software (Rhee and Pugh, 2011; Starick et al., 2015). Based on 82 million reads, the ChIP-exo raw data show significant signal at a distance from the CTCF motif. This might be a consequence of immunoprecipitating sonicated 200–300 bp chromatin fragments containing more than one protein, which would block the subsequent exonuclease cleavage. However, by using MNase to fragment the chromatin, which has both endo- and exo-nuclease activity, high-resolution X-ChIP-seq should be able to discriminate between nearby proteins. An additional limitation of ChIP-exo is that the input chromatin is not subjected to the same exonuclease treatment and therefore subsequent analyses cannot be normalized to input. With high-resolution X-ChIP-seq, however, all the steps in chromatin fragmentation are prepared prior to immunoprecipitation, thereby allowing input normalization. 
Moreover, high-resolution X-ChIP-seq requires only minimal alteration to the existing conventional ChIP workflow and library preparation, whereas other techniques require more extensive changes (Starick et al., 2015). To more directly compare to ChIP-exo, we plotted the end positions of each of our 20–50 bp paired-end reads (Figure 3D). We find two predominant sharp peaks on either side of the 19-bp CTCF motif that are separated by 19 bp, indicating that on average for each of our immunoprecipitated fragments, MNase has chewed back to one side of the minimal sequence motif. In contrast, the signal for ChIP-exo is relatively diffuse when centered around the CTCF motif, with an average distance of 52 bp between peak pairs for the peak-called sites (Rhee and Pugh, 2011). DNase I footprinting is often used to generate maps of global transcription factor binding at nucleotide resolution, with the drawback that the technique is not targeted to a specific transcription factor (Hesselberth et al., 2009; Neph et al., 2012). By comparing the ends of the DNA fragments released by DNase I footprinting and that of high-resolution X-ChIP-seq, we find that both techniques identify protected fragments with ends separated by 19 bp at the 19 bp consensus CTCF motifs (Figure 3—figure supplement 2). This therefore suggests that high-resolution X-ChIP-seq can achieve single nucleotide resolution, and by using immunoprecipitation, has the advantage that it can be used to interrogate individual transcription factors.
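Counting fragment ends relative to a motif, as plotted in Figure 3D, can be sketched in a few lines. This sketch assumes fragments on a single chromosome given as (start, end) in 0-based, half-open coordinates and a single motif center; it ignores motif strand, which a real analysis would need to handle. The function name and the toy coordinates are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def end_position_profile(
    fragments: Iterable[Tuple[int, int]], motif_center: int, max_dist: int = 100
) -> Counter:
    """Count fragment ends at each offset (bp) from the motif center.

    Both the first base (start) and the last base (end - 1) of every
    fragment are counted; offsets beyond max_dist are ignored.
    """
    profile: Counter = Counter()
    for start, end in fragments:
        for pos in (start, end - 1):
            offset = pos - motif_center
            if abs(offset) <= max_dist:
                profile[offset] += 1
    return profile


# Toy fragments around a motif centered at position 500 (illustrative only).
fragments = [(475, 520), (476, 526), (475, 526)]
profile = end_position_profile(fragments, motif_center=500)
for offset in sorted(profile):
    print(offset, profile[offset])
```

Summing such profiles over many motif sites would give an aggregate end-position plot in which the distance between the two dominant peaks estimates the size of the protected footprint.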
By harnessing the endo- and exo-nuclease activity of MNase to fragment chromatin, high-resolution X-ChIP-seq has key advantages over conventional ChIP-seq and ChIP-exo in terms of the resolution obtained (Figure 4). Overall, the combination of the improvements to enrich for short immunoprecipitated fragments and the unparalleled ChIP resolution for PolII and transcription factor binding indicate that high-resolution X-ChIP-seq is a cost-effective and simple approach that easily fits within existing ChIP-seq pipelines for determining precise genome-wide maps of protein–DNA binding.
Drosophila S2 cells and K562 cells were cultured under standard conditions.
High-resolution X-ChIP-seq was performed as described in Supplementary file 1. Libraries were prepared from the isolated DNA and, following cluster generation, 25 rounds of paired-end sequencing were performed by the FHCRC Genomics Shared Resource on the Illumina HiSeq 2500 platform (Henikoff et al., 2011). Details of the library protocol have previously been published (Orsi et al., 2015). After processing and base-calling by Illumina software, paired-end sequencing data were aligned to the hg19 or dmel_r5_51 genome build using Bowtie or Novoalign, respectively. Counts per base pair were normalized as previously described with the fraction of mapped reads spanning each base-pair position multiplied by the genome size (Kasinathan et al., 2014). To analyze reads by length, we divided paired-end fragments into distinct size classes, as indicated in the figure legends. V-plot construction has been previously described (Henikoff et al., 2011). Half-height width for each individual site was calculated as follows: the half height was calculated by dividing the maximum ChIP signal within ±1000 bp of each 19 bp CTCF motif by the background signal, which was defined as the median ChIP signal from −1000 to −900 bp and from +900 to +1000 bp relative to the motif. The half-height width for each motif was calculated by counting the number of contiguous base pairs that had ChIP signal greater than or equal to the half-height.
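The per-base normalization can be sketched on a toy genome. The sketch assumes that the "fraction of mapped reads spanning each base-pair position" is the per-base fragment count divided by the sum of per-base counts across the genome, so that the genome-wide mean of the normalized signal is 1. That denominator is an assumption and not a detail confirmed by the text; `normalized_counts` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def normalized_counts(
    fragments: Iterable[Tuple[int, int]], genome_size: int
) -> List[float]:
    """Per-base fragment coverage scaled so the genome-wide mean is 1.

    fragments: (start, end) in 0-based half-open coordinates.
    Assumption: the denominator is the total per-base count over the genome.
    """
    counts = [0] * genome_size
    for start, end in fragments:
        for pos in range(start, end):
            counts[pos] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * genome_size
    return [c / total * genome_size for c in counts]


# Toy 10-bp "genome" with two overlapping fragments (illustrative only).
norm = normalized_counts([(0, 4), (2, 6)], genome_size=10)
print(norm)
```

With this scaling, values above 1 indicate coverage above the genome-wide average, which makes profiles from libraries of different sequencing depth directly comparable.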
Defining the status of RNA polymerase at promoters. Cell Reports 2:1025–1035. https://doi.org/10.1016/j.celrep.2012.08.034
Epigenome characterization at single base-pair resolution. Proceedings of the National Academy of Sciences of USA 108:18318–18323. https://doi.org/10.1073/pnas.1110731108
Mapping regulatory factors by immunoprecipitation from native chromatin. Current Protocols in Molecular Biology 110:21.31.1–21.31.25. https://doi.org/10.1002/0471142727.mb2131s110
Translocation and transcriptional arrest during transcript elongation by RNA polymerase II. The Journal of Biological Chemistry 271:23495–23505. https://doi.org/10.1074/jbc.271.38.23495
Joaquín M Espinosa, Reviewing Editor; University of Colorado at Boulder, United States
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for submitting your work entitled “A simple method for generating high‐resolution maps of genome wide protein binding” for peer review at eLife. Your submission has been reviewed by James Manley (Senior editor) and a Reviewing editor. We feel the work as it stands is not yet fully developed to a level that we could consider for publication as a Research Advance in eLife. However, if you are able to address these concerns with additional data and significant revisions of the text to include more detail, we would be prepared to consider a resubmission on this topic that would be evaluated by the same editors.
The authors present modifications to the widely used ChIP-seq technique as a new method entitled X-ChIP-seq. These modifications simply involve the addition of a micrococcal nuclease digestion step to reduce background DNA fragments and a size-selection step to enrich for short fragments prior to sequencing. It is claimed that these modifications both greatly reduce background and increase resolution, relative to conventional ChIP-seq.
The authors highlight the simplicity of their modifications and bioinformatic analysis relative to other recent modifications to ChIP-seq such as ChIP-exo and ChIP-nexus, making this of interest to the field. The authors aim to show the superiority of X-ChIP-seq through limited comparisons to other datasets (such as conventional ChIP, ChIP-exo, 3'NT, and DNaseI footprinting).
The major areas of concern are:
1) The claim that X-ChIP-seq provides “near base-pair resolution” is not convincingly supported and several of the comparisons with other datasets are incomplete (as detailed below), making it difficult to directly compare the X-ChIP-seq data with other techniques.
2) There is a severe lack of information provided on methods used for the processing and analysis of sequencing data. This makes it difficult to verify the validity of methods used and does not support the claim of “simple bioinformatic processing”.
Main text, first paragraph: Clarify the reasoning/evidence that this is “near base-pair resolution”? This is not clearly established in Skene et al. 2014 and is not readily apparent from Figure 1; Although the positioning in Figure 1B looks different from conventional ChIPseq, the width (resolution) of the peak looks very similar; Are the authors arguing that it is higher resolution because it is a closer match to the 3'NT data which is nucleotide resolution? (This should actually be called accuracy).
Related to Figure 1: Would it not be better to demonstrate the resolution of this technique at a single locus (rather than averaging all TSS) and/or with a DNA-binding protein that has a more defined position on DNA? See Figure 3, CTCF.
What do ChIP-exo and ChIP-nexus look like for PolII?
Figure 1–figure supplement 1: It would be very helpful to show end positions for 1st reads of X-ChIP input for comparison. This would allow for clear demonstration of any reduction in bias. Also, it is unclear if this analysis uses reads aligned to both strands; this could change interpretation of lower half of figure; left shift of + strand reads would correspond to a right shift of - strand reads. How would this affect the plots?
Main text, third paragraph: How would adding this step alter conventional ChIPseq? I.e. what fraction of reads from conventional ChIPseq are within these size ranges and what happens to resolution/accuracy when only these are analyzed? Again there is insufficient comparison between new and old technique. This is important for demonstrating an improvement and/or less bias.
Main text, fourth paragraph: clarify “unbiased approach” (at least in Methods). How many sites were used? What was their average size, etc.? (Some of this information is in Figure 3 legend.) Give source for DNase data.
Main text, fourth paragraph: clarify or show data to support the claim that “shorter fragments gave the highest resolution and smallest range in peak widths”; Again this would be better supported with a comparison of both X-ChIP-seq and conventional ChIP-seq (at least by bioinformatically selecting shorter insert sizes).
Main text, fifth paragraph: Clarify “achieve single nucleotide resolution” and comment on offset between the two techniques.
Main text, last paragraph: It would be helpful to see more analysis of low background as in many cases (i.e. transcription-related proteins that do not directly bind DNA) this could be as important as the claimed increase in resolution.
https://doi.org/10.7554/eLife.09225.010
- Steven Henikoff
- Peter J Skene
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
© 2015, Skene and Henikoff
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.