Improved CUT&RUN chromatin profiling tools

  1. Michael P Meers
  2. Terri D Bryson
  3. Jorja G Henikoff
  4. Steven Henikoff (corresponding author)
  1. Fred Hutchinson Cancer Research Center, United States
  2. Howard Hughes Medical Institute, United States
Research Advance
Cite this article as: eLife 2019;8:e46314 doi: 10.7554/eLife.46314

Abstract

Previously, we described a novel alternative to chromatin immunoprecipitation, CUT&RUN, in which unfixed permeabilized cells are incubated with antibody, followed by binding of a protein A-Micrococcal Nuclease (pA/MNase) fusion protein (Skene and Henikoff, 2017). Here we introduce three enhancements to CUT&RUN: A hybrid protein A-Protein G-MNase construct that expands antibody compatibility and simplifies purification, a modified digestion protocol that inhibits premature release of the nuclease-bound complex, and a calibration strategy based on carry-over of E. coli DNA introduced with the fusion protein. These new features, coupled with the previously described low-cost, high efficiency, high reproducibility and high-throughput capability of CUT&RUN make it the method of choice for routine epigenomic profiling.

https://doi.org/10.7554/eLife.46314.001

Introduction

Profiling the chromatin landscape for specific components is one of the most widely used methods in biology, and over the past decade, chromatin immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) has become practically synonymous with genome-wide chromatin profiling (Landt et al., 2012; Schubert, 2018). However, the most widely used ChIP-seq protocols have limitations and are subject to artifacts (Jain et al., 2015; Park et al., 2013; Teves et al., 2016; Teytelman et al., 2013), of which only some have been addressed by methodological improvements (Brind'Amour et al., 2015; Kasinathan et al., 2014; Rhee and Pugh, 2011; Rossi et al., 2018; van Galen et al., 2016). An inherent limitation to ChIP is that solubilization of chromatin, whether by sonication or enzymatic digestion, results in sampling from the entire solubilized genome, and this requires very deep sequencing so that the sites of targeted protein binding can be resolved above background (Landt et al., 2012). To overcome this limitation, we introduced Cleavage Under Targets and Release Using Nuclease (CUT&RUN) (Skene and Henikoff, 2017), which is based on the chromatin immunocleavage (ChIC)-targeted nuclease strategy (Schmid et al., 2004): Successive incubation of unfixed cells or nuclei with an antibody and a Protein A-Micrococcal Nuclease (pA/MNase) fusion protein is followed by activation of MNase with calcium. In CUT&RUN, cells or nuclei remain intact throughout the procedure and only the targeted sites of binding are released into solution. Our CUT&RUN method dramatically reduced non-specific backgrounds, such that ~10 fold lower sequencing depth was required to obtain similar peak-calling performance (Skene and Henikoff, 2017). In addition, CUT&RUN provides near base-pair resolution, and our most recently published benchtop protocol is capable of profiling ~100 human cells for an abundant histone modification and ~1000 cells for a transcription factor (Skene et al., 2018). 
The simplicity of CUT&RUN has also resulted in a fully automated robotic version (AutoCUT&RUN) in which the high reproducibility and low cost make it ideally suited for high-throughput epigenomic profiling of clinical samples (Janssens et al., 2018). Other advances based on our original CUT&RUN publication include CUT&RUN.Salt for fractionation of chromatin based on solubility (Thakur and Henikoff, 2018) and CUT&RUN.ChIP for profiling specific protein components within complexes released by CUT&RUN digestion (Brahma and Henikoff, 2019). CUT&RUN has also been adopted by others (Ahmad and Spens, 2018; Daneshvar et al., 2019; de Bock et al., 2018; Ernst et al., 2019; Federation et al., 2018; Hainer et al., 2019; Hainer and Fazzio, 2019; Hyle et al., 2019; Inoue et al., 2018; Liu et al., 2018; Menon et al., 2019; Oomen et al., 2019; Park et al., 2019; Roth et al., 2018; Uyehara and McKay, 2019; Zhang et al., 2019; Zheng and Gehring, 2019), and since publication of our eLife paper we have distributed materials to >600 laboratories worldwide, with user questions and answers fielded interactively on our open-access Protocols.io site (dx.doi.org/10.17504/protocols.io.zcpf2vn).

Broad implementation of CUT&RUN requires reagent standardization, and the rapid adoption of CUT&RUN by the larger community of researchers motivates the enhancements described here. First, the method requires a fusion protein that is not at this writing commercially available, and the published pA/MNase purification protocol is cumbersome, which effectively restricts dissemination of the method. Therefore, we have produced an improved construct with a 6-His-Tag that can be easily purified using a commercial kit, and by using a Protein A-Protein G hybrid, the fusion protein binds avidly to mouse antibodies, which bind only weakly to Protein A. Second, the original protocols are sensitive to digestion time, in that under-digestion results in low yield and over-digestion can result in premature release of pA/MNase-bound complexes that can digest accessible DNA sites. To address this limitation, we have modified the protocol such that premature release is reduced, allowing digestion to near-completion for high yields with less background. Third, the current CUT&RUN protocol recommends a spike-in of heterologous DNA at the release step to compare samples in a series. Here we demonstrate that adding a spike-in is unnecessary, because the carry-over of E. coli DNA from purification of pA/MNase or pAG/MNase is sufficient to calibrate samples in a series.

Results and discussion

An improved CUT&RUN vector

The pA/MNase fusion protein produced by the pK19-pA-MN plasmid (Schmid et al., 2004) requires purification from lysates of Escherichia coli overexpressing cells using an immunoglobulin G (IgG) column, and elution with low pH followed by neutralization has resulted in variations between batches. To improve the purification protocol, we added a 6-His tag (Bornhorst and Falke, 2000) into the pK19-pA-MN fusion protein (Figure 1A and Figure 1—figure supplement 1A). This allowed for simple and gentle purification on a nickel resin column (Figure 1—figure supplement 1B). In addition, we found that a commercial 6-His-cobalt resin kit also yielded pure highly active enzyme from a 20 ml culture, enough for ~10,000 reactions. Even when used in excess, there is no increase in release of background fragments (Figure 1—figure supplement 2), which indicates that the washes are effective in removing unbound fusion protein.

Figure 1 (with 2 supplements)
An improved fusion protein for CUT&RUN.

(A) Schematic diagram (not to scale) showing improvements to the pA-MNase fusion protein, which include addition of the C2 Protein G IgG binding domain, a 6-histidine tag for purification and a hemagglutinin tag (HA) for immunoprecipitation. (B) The Protein A/G hybrid fusion results in high-efficiency CUT&RUN for both rabbit and mouse primary antibodies. CUT&RUN was performed with both rabbit and mouse RNAPII-Ser5phosphate antibodies using pAG/MNase, and DNA was extracted from either the supernatant or the total cellular extract. Tracks are shown for the histone gene cluster at Chr6:26,000,000–26,300,000, where NPAT is a transcription factor that co-activates histone genes. Tracks for 2’ and 10’ time points are displayed at the same scale for each antibody and for both supernatant (supn) and total DNA extraction protocols.

https://doi.org/10.7554/eLife.46314.002

In principle an epitope-tagged pAG/MNase could be used for chromatin pull-down from a CUT&RUN supernatant in sequential strategies like CUT&RUN.ChIP (Brahma and Henikoff, 2019). However, in practice use of the 6-His tag is complicated by the requirement for a chelating agent to release the protein from the nickel resin. Therefore, we also added an HA (hemagglutinin) tag, which could be used to affinity-purify the complex of a directly bound chromatin particle with a primary antibody and the fusion protein.

Protein A binds well to rabbit, goat, donkey and guinea pig IgG antibodies, but poorly to mouse IgG1, and so for most mouse antibodies, Protein G is generally used (Fishman and Berg, 2019). To further improve the versatility of the MNase fusion protein, we encoded a single Protein G domain adjacent to the Protein A domain in the pK19-pA-MN plasmid (Eliasson et al., 1988). In addition, we mutated three residues in the Protein G coding sequence to further increase binding for rabbit antibodies (Jha et al., 2014). This resulted in a fusion protein that binds strongly to most commercial antibodies without requiring a secondary antibody. We found that for ordinary CUT&RUN applications pAG/MNase behaves very similarly to pA/MNase, but is more easily purified and is more versatile, for example allowing us to perform CUT&RUN without requiring a secondary antibody for mouse primary monoclonal antibodies (Figure 1B).

Preventing premature release during CUT&RUN digestion

When fragments are released by cleavage in the presence of Ca++ ions, the associated pA/MNase complex can digest accessible DNA (Skene and Henikoff, 2017). Although performing digestion at 0°C minimizes this artifact, eliminating premature release during digestion would allow for more complete release of target-specific fragments. Based on the observation that nucleosome core particles aggregate in high-divalent-cation and low-salt conditions (de Frutos et al., 2001), we wondered whether these conditions would prevent premature release of chromatin particles in situ. Therefore, we performed digestions in 10 mM CaCl2 and 3.5 mM HEPES pH 7.5. Under these high-calcium/low-salt conditions, chromatin is digested with no detectable release of fragments into the supernatant (Figure 2). Reactions are halted by transferring the tube to a magnet, removing the liquid, and adding elution buffer containing 150 mM NaCl, 20 mM EGTA and 25 µg/ml RNase A, which releases the small DNA fragments into the supernatant. These conditions are compatible with direct end-polishing and ligation used for AutoCUT&RUN (Janssens et al., 2018). Furthermore, retention of the cleaved fragments within the nucleus under high-divalent-cation/low-salt conditions could facilitate single-cell application of CUT&RUN.

Targeted fragments are not released during digestion using high-calcium/low-salt conditions.

CUT&RUN was performed using either the high-Ca++/low-salt (Ca++) or the standard (Std) method with antibodies to three different epitopes. DNA was extracted from supernatants, where no elution was carried out for the Ca++ samples. Although high yields of nucleosomal ladder DNA eluted from the supernatants using the standard method, no DNA was detectable in the supernatant using the high-Ca++/low salt method when the elution step was omitted. Left, Tapestation images from indicated lanes; Right, Densitometry of the same lanes.

https://doi.org/10.7554/eLife.46314.005

The high-calcium/low-salt protocol provided similar results using either pA/MNase or pAG/MNase (Figure 3). We also obtained similar results with either protocol for digestion time points over a ~30-fold range and for both supernatant and total DNA extraction (Figure 4—figure supplement 1). For antibodies to H3K27ac, libraries produced using the high-calcium/low-salt protocol showed improved consistency relative to the standard protocol when digested over an extended time-course (Figure 4), presumably because retaining particles during digestion prevents released pA/MNase-bound complexes from artifactually digesting accessible DNA. The close correlations between high-calcium/low-salt H3K27ac datasets for time points over a ~100-fold range occur with corresponding increases in the yield of fragments released into the supernatant during subsequent elution (Figure 4—figure supplement 2). This indicates that longer digestion times result in higher yields, with high signal-to-noise throughout the digestion series (Figure 4). Thus, this modification of CUT&RUN can reduce the risk of overdigestion for abundant epitopes such as H3K27ac, where premature release of pA/MNase-bound chromatin particles can increase background.

Similar performance using pA/MNase and pAG/MNase.

(A) CUT&RUN was performed with an antibody to H3K27ac (Millipore MABE647) and to CTCF (Millipore 07–729) with digestion over a 1 to 27 min range as indicated using pAG/MNase with the high-Ca++/low-salt protocol. Correlation matrix comparing peak overlaps for time points and fusion constructs. The datasets were pooled and MACS2 was used with default parameters to call peaks, excluding those in repeat-masked intervals and those where peaks overlapped with the top 1% of IgG occupancies, for a total of 52,425 peaks. Peak positions were scored for each dataset and correlations (R2 values shown along the diagonal and displayed with Java TreeView v.1.16r2, contrast = 1.25) were calculated between peak vectors. IgG and H3K27me3 (me3) negative controls were similarly scored. (B) Same as A, except the antibody was to CTCF. A set of 9403 sites with a CTCF motif within a hypersensitive site was used (Skene and Henikoff, 2017). High correlations between all time points demonstrate the uniformity of digestion over a 27-fold range. (C) Representative tracks from datasets used for panels A and B showing a 100 kb region that includes a histone locus cluster (chr6:25,972,600–26,072,600).

https://doi.org/10.7554/eLife.46314.006
Figure 4 (with 2 supplements)
Consistent peak definition with high-Ca++/low salt digestion.

(A) H3K27ac CUT&RUN time-course experiments were performed with an Abcam 4729 rabbit polyclonal antibody, following either the standard protocol or the low-salt/high-calcium (High-Ca++) protocol. Samples of 5 million fragments from the 10 H3K27ac datasets were pooled and MACS2 called 36,529 peaks. Peak positions were scored for each dataset and correlations (R² values shown along the diagonal) were calculated between peak vectors. IgG and H3K27me3 (me3) negative controls were similarly scored. Higher correlations between the High-Ca++ time points than between the Standard time points indicate improved uniformity of digestion over the ~100-fold range of digestion times. (B) Tracks from a representative 200 kb region around the HoxB locus. (C) Fraction of reads in peaks (FRiP) plots for each time point after down-sampling (5 million, 2.5 million, 1.25 million, 625,000 and 312,500), showing consistently higher FRiP values for Ca++ (red) than Std (blue).

https://doi.org/10.7554/eLife.46314.007
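The fraction of reads in peaks (FRiP) plotted in Figure 4C can be illustrated with a minimal sketch. It assumes fragments and peaks are given as (chromosome, start, end) tuples with half-open coordinates, and that a fragment counts as "in a peak" if it overlaps any peak by at least one base; the function names, the overlap rule, the toy coordinates and the random seed are illustrative assumptions, not the procedure used for the figure.

```python
import random


def frip(fragments, peaks):
    """Fraction of fragments that overlap at least one peak.

    fragments, peaks: lists of (chrom, start, end), half-open intervals.
    """
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    in_peaks = 0
    for chrom, start, end in fragments:
        candidates = peaks_by_chrom.get(chrom, [])
        if any(start < p_end and end > p_start for p_start, p_end in candidates):
            in_peaks += 1
    return in_peaks / len(fragments) if fragments else 0.0


def downsample(fragments, n, seed=0):
    """Randomly draw n fragments without replacement (seed is an arbitrary choice)."""
    rng = random.Random(seed)
    return rng.sample(fragments, min(n, len(fragments)))


# Toy data (hypothetical coordinates)
peaks = [("chr1", 100, 200)]
fragments = [
    ("chr1", 150, 300),  # overlaps the peak
    ("chr1", 250, 400),  # no overlap
    ("chr2", 100, 200),  # different chromosome
    ("chr1", 90, 110),   # overlaps the peak
]
full_frip = frip(fragments, peaks)
half_frip = frip(downsample(fragments, 2), peaks)
```

Computing FRiP at several down-sampled depths, as in the figure, amounts to calling `downsample` with each target depth before `frip`.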

We previously showed that CUT&RUN can be performed on insoluble protein complexes by extracting total DNA (Skene and Henikoff, 2017) or by performing salt fractionation of the bead-bound cells and extracting DNA from the residual pellet (Thakur and Henikoff, 2018). In either case, large DNA fragments were depleted using SPRI (AMPure XP) beads before library preparation. RNA polymerase II (RNAPII) from animal cells is insoluble when engaged (Mayer et al., 2015; Weber et al., 2014), and requires harsh treatments for quantitative profiling using ChIP (Skene and Henikoff, 2015). To determine whether CUT&RUN can be used for insoluble chromatin complexes, we profiled Serine-5-phosphate on the C-terminal domain (CTD) of the Rpb1 subunit of RNAPII using both extraction of supernatant and of total DNA. This CTD phosphorylation is enriched in the initiating form of RNAPII, and we observed similar genic profiles for supernatant and total DNA (Figure 1B).

Calibration using E. coli carry-over DNA

Comparing samples in a series typically requires calibration for experimental quality and sequencing read depth. It is common to use background levels to calibrate ChIP-seq samples in a series and to define and compare peaks for peak-calling (Landt et al., 2012). However, the extremely low backgrounds of CUT&RUN led us to a calibration strategy based on spike-in of heterologous DNA, which has been generally recommended for all situations in which samples in a series are to be compared (Chen et al., 2015; Hu et al., 2015). In our current spike-in protocol, the heterologous DNA, which is typically DNA purified from an MNase digest of yeast Saccharomyces cerevisiae or Drosophila melanogaster chromatin, is added when stopping a reaction, and we adopted this spike-in procedure for the high-calcium/low-salt protocol described in the previous section. Interestingly, we noticed that mapping reads to both the spike-in genome and the E. coli genome resulted in almost perfect correlation (R² = 0.97) between S. cerevisiae and E. coli in an experiment using pA/MNase in which the number of cells was varied over several orders of magnitude (Figure 5A). Near-perfect correlations (R² = 0.96–0.99) between yeast spike-in and carry-over E. coli DNA were also seen in series using the same batch of pAG/MNase with high-calcium/low-salt digestion conditions (Figure 5B), and for both supernatant release and extraction and for total DNA extraction (Figure 5C–D). These strong positive correlations are not accounted for by cross-mapping of the yeast spike-in to the E. coli genome, because omitting the spike-in for a low-abundance epitope resulted in very few yeast counts with high levels of E. coli counts (blue symbol in Figure 5C–D panels). As the E. coli DNA is carried over from purification of pA/MNase and pAG/MNase, the close correspondence provides confirmation of the accuracy of our heterologous spike-in procedure (Skene and Henikoff, 2017). Moreover, as carry-over E. coli DNA is introduced at an earlier step, and is cleaved to small mappable fragments that are released during digestion and elution, it provides a more desirable calibration standard than using heterologous DNA (Chen et al., 2015; Hu et al., 2015). High correlations were also seen between S. cerevisiae spike-in and E. coli carry-over DNA for pA-MNase in batches that we have distributed (Table 1). Therefore, data for nearly all CUT&RUN experiments performed thus far can be recalibrated post-hoc whether or not a spike-in calibration standard had been added.
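As a rough illustration of how carry-over E. coli reads could stand in for a spike-in, the sketch below derives a per-sample scale factor that is inversely proportional to the number of fragments mapped to the E. coli genome, which is the usual logic of spike-in normalization. The function names, sample names, counts and the constant of 10,000 are hypothetical and chosen only for illustration; they are not values or code from this study.

```python
def calibration_factors(ecoli_counts, constant=10_000):
    """Return a scale factor per sample, inversely proportional to E. coli reads.

    ecoli_counts: dict mapping sample name -> fragments mapped to E. coli.
    constant: arbitrary multiplier (an assumption) that keeps factors in a handy range.
    """
    factors = {}
    for sample, count in ecoli_counts.items():
        if count <= 0:
            raise ValueError(f"{sample}: no E. coli reads, cannot calibrate")
        factors[sample] = constant / count
    return factors


def calibrate(coverage, factor):
    """Multiply raw coverage values by a sample's scale factor."""
    return [value * factor for value in coverage]


# Hypothetical E. coli fragment counts for two samples in a series
ecoli_counts = {"sample_A": 500, "sample_B": 2000}
factors = calibration_factors(ecoli_counts)
calibrated_A = calibrate([10, 20, 0], factors["sample_A"])
```

Because only the E. coli count enters the factor, the same calculation can be applied to existing datasets after remapping, provided all samples in the series used the same number of beads and the same batch of fusion protein.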

E. coli carry-over DNA of pA/MNase and pAG/MNase can substitute for spike-in calibration.

(A) Fragments from a CUT&RUN K562 cell experiment (GSE104550 20170426) using antibodies against H3K27me3 (100–8,000 cells) and CTCF (1,000–100,000 cells) were mapped to the repeat-masked genome of S. cerevisiae and the full genome of E. coli. Standard digestion was followed by supernatant release and extraction. (B) Same as A using antibodies against multiple epitopes of varying abundances, with high-calcium/low-salt digestion followed by supernatant release and extraction. (C) Same as B except using standard digestion conditions and total DNA extraction. The S. cerevisiae spike-in DNA was left out for one sample (blue square). From top to bottom, antibodies are: NPAT Thermo PA5-66839, Myc: CST Rabbit Mab #13987, CTD: PolII CTD Abcam 8WG16, RNAPII-Ser5: Abcam 5408 (mouse), RNAPII-Ser2: CST E1Z3G, CTCF Millipore 07–729, RNAPII-Ser5: CST D9N5I (rabbit), H3K4me2: Upstate 07–030. (D) Same as C except using high-calcium/low-salt digestion and total DNA extraction. From top to bottom, antibodies are: CTCF Millipore 07–729, NPAT Thermo PA5-66839, Myc: CST Rabbit Mab #13987, CTD: PolII CTD Abcam 8WG16, RNAPII-Ser5: Abcam 5408 (mouse), RNAPII-Ser5: CST D9N5I (rabbit), RNAPII-Ser2: CST E1Z3G, H3K4me2: Upstate 07–030.

https://doi.org/10.7554/eLife.46314.010
Table 1
Carry-over E. coli DNA correlates closely with the heterologous spike-in for both fusion proteins and both low-salt/high-calcium and standard digestion conditions.

CUT&RUN was performed for H3K27me3 in parallel for pA/MNase Batch #6 (pA) and pAG/MNase (pAG) using both low-salt/high-calcium (lo-hi) and standard (std) CUT&RUN digestion conditions. Each sample started with ~700,000 cells and 10 µL of bead slurry. Also varied in this experiment were addition of antibody followed by bead addition (Ab first) and addition of 0.1% BSA in the antibody buffer (BSA). Adding antibody first led to increased recovery of both yeast and E. coli DNA relative to human DNA, indicative of loss of cells prior to addition of fusion protein, possibly caused by loss of digitonin solubilization of membrane sugars.

https://doi.org/10.7554/eLife.46314.011
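H3K27me3 | Ab first | BSA | Human | Yeast | E. coli | Corr (Sc:Ec)
pA lo-hi | - | - | 5913983 | 743 | 3455 | 0.92
pA lo-hi | - | + | 7748003 | 858 | 4988 |
pA lo-hi | + | - | 5202278 | 2288 | 16110 |
pA lo-hi | + | + | 5178086 | 1804 | 18759 |
pA std | - | - | 6013347 | 595 | 2462 | 0.99
pA std | - | + | 6005080 | 859 | 2295 |
pA std | + | - | 4104736 | 2624 | 21236 |
pA std | + | + | 3972820 | 2328 | 19245 |
pAG lo-hi | - | - | 6999802 | 789 | 404 | 0.94
pAG lo-hi | - | + | 6374939 | 642 | 467 |
pAG lo-hi | + | - | 4140407 | 1565 | 1291 |
pAG lo-hi | + | + | 4058693 | 2382 | 5289 |
pAG std | - | - | 7514127 | 308 | 567 | 0.90
pAG std | - | + | 5935592 | 355 | 125 |
pAG std | + | - | 4594153 | 1271 | 555 |
pAG std | + | + | 5379610 | 2509 | 1353 |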
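The Corr (Sc:Ec) column can be checked with a short calculation. The sketch below computes a Pearson correlation between yeast and E. coli counts within each group of four samples; the counts are transcribed from Table 1, and both the transcription and the assumption that the column reports a Pearson coefficient over the four samples of each group should be treated as our reading of the table rather than a statement of the authors' procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (yeast counts, E. coli counts) per group, as transcribed from Table 1
table1 = {
    "pA lo-hi": ([743, 858, 2288, 1804], [3455, 4988, 16110, 18759]),
    "pA std": ([595, 859, 2624, 2328], [2462, 2295, 21236, 19245]),
    "pAG lo-hi": ([789, 642, 1565, 2382], [404, 467, 1291, 5289]),
    "pAG std": ([308, 355, 1271, 2509], [567, 125, 555, 1353]),
}

correlations = {group: pearson(yeast, ecoli) for group, (yeast, ecoli) in table1.items()}
for group, r in correlations.items():
    print(f"{group}: r = {r:.2f}")
```

If the transcription is right, the printed values should land close to the entries in the Corr column.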

To explain the presence of carry-over E. coli DNA in proportion to the amount of yeast spike-in DNA, which is constant between samples in a series, we can exclude intracellular binding, because we observe proportionality between E. coli and yeast reads despite varying human cell numbers over two orders of magnitude (Figure 5A). Rather, we note that Concanavalin A binds to glycosylated immunoglobulins, and so the successive treatments of Con A bead-bound cells with excess antibody and Protein A(G)/MNase fusion protein will affix an amount of carry-over E. coli DNA to beads in proportion to the number of beads. Our use of a constant number of beads for all samples in a series to be compared would then have resulted in a constant amount of carry-over E. coli DNA. A similar inference of E. coli carry-over DNA suitable for calibration was noted for CUT&Tag (Kaya-Okur et al., 2019), which suggests successive binding of antibodies and Protein A-Tn5 to the Con A beads used to immobilize cells. Thus, our calibration strategy might serve as a more general replacement for conventional spike-ins.

Conclusions

Since its introduction in our original eLife paper (Skene and Henikoff, 2017), the advantages of CUT&RUN over ChIP-seq have led to its rapid adoption, including publication of new CUT&RUN protocols for low cell numbers (Hainer and Fazzio, 2019; Skene et al., 2018), for plant tissues (Zheng and Gehring, 2019) and for high-throughput (Janssens et al., 2018). The new CUT&RUN advances that we describe here are likely to be useful when applied in all of these protocols. Our improved CUT&RUN fusion construct simplifies reagent purification and eliminates the requirement for a secondary antibody against mouse primary antibodies. Our high-calcium/low-salt protocol minimizes time-dependent variability. Our discovery that carry-over E. coli DNA almost perfectly correlates with an added spike-in upgrades a contaminant to a resource that can be used as a spike-in calibration proxy, even post-hoc simply by counting reads mapping to the E. coli genome in existing CUT&RUN datasets.

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Cell line (Human) | K562 | ATCC | #CCL-243 | RRID: CVCL_0004
Biological sample (Escherichia coli) | JM101 cells | Agilent | #200234 |
Antibody | rabbit polyclonal anti-NPAT | Thermo | PA5-66839 | Concentration: 1:100; RRID: AB_2663287
Antibody | guinea pig polyclonal anti-rabbit IgG | Antibodies Online | ABIN101961 | Concentration: 1:100; RRID: AB_10775589
Antibody | rabbit polyclonal anti-mouse IgG | Abcam | 46540 | Concentration: 1:100; RRID: AB_2614925
Antibody | rabbit monoclonal anti-RNAPII-Ser5 | Cell Signaling | D9N51 | Concentration: 1:100
Antibody | mouse monoclonal anti-RNAPII-Ser5 | Abcam | 5408 | Concentration: 1:100; RRID: AB_304868
Antibody | rabbit monoclonal anti-H3K27me3 | Cell Signaling | 9733 | Concentration: 1:100; RRID: AB_2616029
Antibody | rabbit polyclonal anti-H3K4me2 | Upstate | 07–730 | Concentration: 1:100; RRID: AB_11213050
Antibody | rabbit monoclonal anti-H3K27ac | Millipore | MABE647 | Concentration: 1:100
Antibody | rabbit polyclonal anti-H3K27ac | Abcam | 4729 | Concentration: 1:100; RRID: AB_2118291
Antibody | rabbit polyclonal anti-CTCF | Millipore | 07–729 | Concentration: 1:100; RRID: AB_441965
Recombinant DNA reagent | pAG-ERH-MNase-6xHIS-HA (plasmid) | | | Progenitors: pK19-pA-MN; gBlocks
Recombinant DNA reagent | pK19-pA-MN | Schmid et al., 2004 | | Gift from author
Sequence-based reagent | gBlock Hemagglutinin and 6-histidine tags; gattacaGAAGACAACGCTGATTCAGGTCAAGGCGGtGGTGGcTCTGGgGGcGGgGGcTCGGGtGGtGGgGGcTCAcaccatcaccatcaccatGGCGGtGGTGGcTCTTACCCATACGATGTTCCAGATTACGCTtaatgaGGATCCgattaca | Integrated DNA Technologies (IDT) | |
Sequence-based reagent | gBLOCK PrtG_ERH Codon optimized; AGCAGAAGCTAAAAAGCTAAACGATGCTCAAGCACCAAAAACAACTTATAAATTAGTCATCAACGGGAAAACGCTGAAGGGTGAAACCACGACAGAGGCCGTAGATGCGGAGACAGCGGAGCGCCACTTTAAGCAATACGCGAATGATAACGGTGTAGACGGCGAGTGGACCTACGACGACGCGACAAAGACCTTTACCGTCACGGAGAAACCTGAGGTTATCGACGCGTCTGAGTTGACGC CAGCCGTAGATGACGATAAAGAATTCGCAACTTCAACTAAAAAATTAC | Integrated DNA Technologies (IDT) | |
Peptide, recombinant protein | pA/MNase | Schmid et al., 2004 | | Purified as described in Schmid et al., 2004 and supplementary
Peptide, recombinant protein | pAG/MNase | This paper | | Purified from modified plasmid pAG-ERH-MNase-6xHIS-HA in S Henikoff Lab
Commercial assay or kit | Pull-Down PolyHis Protein:Protein Interaction Kit | Thermo | #21277 |
Other | Concanavalin A coated magnetic beads | Bangs Laboratories | #BP-531 |
Other | Gibson Assembly | New England Biolabs | #E2611 |
Other | Chicken egg white lysozyme | EMD Millipore | #71412 |
Other | Zwittergent 3–10 detergent (0.03%) | EMD Millipore | #693021 |
Chemical compound, drug | Digitonin | EMD Millipore | #300410 |
Chemical compound, drug | Roche Complete Protease Inhibitor EDTA-free tablets | Sigma Aldrich | 5056489001 |
Chemical compound, drug | RNase A, DNase- and protease-free | Thermo | ENO531 | 10 mg/ml
Chemical compound, drug | Proteinase K | Thermo | EO0492 |
Chemical compound, drug | Glycogen | Sigma-Aldrich | 10930193001 |
Chemical compound, drug | Spermidine | Sigma-Aldrich | #S0266 |

Cell culture

K562 cells were purchased from ATCC (#CCL-243) and cultured as previously described (Janssens et al., 2018). Cells tested negative for mycoplasma contamination using a MycoProbe kit.

Construction and purification of an improved IgG-affinity/MNase fusion protein

Hemagglutinin and 6-histidine tags were added to the carboxyl-terminus of pA-MNase (Schmid et al., 2004) using a commercially synthesized dsDNA fragment (gBlock) from Integrated DNA Technologies (IDT), which contains the coding sequence for both tags and glycine-rich flexible linkers, and includes restriction sites for cloning. Another IDT gBlock, containing the optimized protein-G coding sequence and flanking regions homologous to the site of insertion, was introduced via PCR overlap extension using Gibson Assembly Master Mix (New England Biolabs cat. #E2611), following the manufacturer’s instructions. The sequence-verified construct was transformed into JM101 cells (Agilent Technologies cat. #200234) for expression, cultured in NZCYM-Kanamycin (50 µg/ml) and induced with 2 mM Isopropyl β-D-1-thiogalactopyranoside following standard protein expression and purification protocols. The cell pellet was resuspended in 10 ml Lysis Buffer, consisting of 10 mM Tris-HCl pH 7.5, 300 mM NaCl, 10 mM Imidazole, 5 mM beta-mercaptoethanol, and EDTA-free protease inhibitor tablets at the recommended concentration (Sigma-Aldrich cat. #5056489001). Lysis using chicken egg white lysozyme (10 mg/mL solution, EMD cat. #71412) was followed by sonication with a Branson Sonifier blunt-end adapter at output level 4, 45 s intervals for 5–10 rounds or until turbidity was reduced. The lysate was cleared by high-speed centrifugation and purified over a nickel-agarose column, taking advantage of the poly-histidine tag for efficient purification via immobilized metal affinity chromatography. A 20 ml disposable gravity-flow column was loaded with 1.5 ml (0.75 ml bed volume) of Ni-NTA agarose (Qiagen cat. #30210) and washed twice in three bed volumes of Lysis Buffer. Cleared lysate was applied, followed by two washes at five bed volumes of 10 mM Tris-HCl pH 7.5, 300 mM NaCl, 20 mM Imidazole, 0.03% ZWITTERGENT 3–10 Detergent (EMD Millipore cat. #693021) and EDTA-free protease inhibitor tablets. Elution was performed with 1 ml 10 mM Tris-HCl pH 7.5, 300 mM NaCl, 250 mM Imidazole and EDTA-free protease inhibitor tablets. Eluate was dialyzed twice against a 750 ml volume of 10 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM EDTA, 1 mM PMSF to remove imidazole. Glycerol was then added to 50%, and aliquots were stored at –80°C for long-term storage and at –20°C for working stocks.

For purification, we used either the nickel-based protocol or the Pierce Cobalt kit (Pull-Down PolyHis Protein:Protein Interaction Kit cat. #21277 from Thermo Fisher). Similar results were obtained using either the nickel or cobalt protocol, although the cobalt kit alleviated the need for a sonicator, using a fifth of the starting material from either fresh culture or a cell pellet frozen in lysis buffer, and yielded more protein per volume of starting material. With the cobalt kit, 20 ml of culture yielded ~100 µg of fusion protein.

CUT&RUN using high-calcium/low-salt digestion conditions

Log-phase cultures of K562 cells were harvested, washed, and bound to activated Concanavalin A-coated magnetic beads, then permeabilized with Wash buffer (20 mM HEPES, pH7.5, 150 mM NaCl, 0.5 mM spermidine and a Roche complete tablet per 50 ml) containing 0.05% Digitonin (Dig-Wash) as described (Skene et al., 2018). The bead-cell slurry was incubated with antibody in a 50–100 µL volume for 2 hr at room temperature or at 4°C overnight on a nutator or rotator essentially as described (Skene et al., 2018). In some experiments, cells were permeabilized and antibody was added and incubated 2 hr to 3 days prior to addition of ConA beads with gentle vortexing; similar results were obtained (e.g. Figure 2B–D), although with lower yields. After 2–3 washes in 1 ml Dig-wash, beads were resuspended in 50–100 µL pA/MNase or pAG/MNase and incubated for 1 hr at room temperature. After two washes in Dig-wash, beads were resuspended in low-salt rinse buffer (20 mM HEPES, pH7.5, 0.5 mM spermidine, a Roche mini-complete tablet per 10 ml and 0.05% Digitonin). Tubes were chilled to 0°C, the liquid was removed on a magnet stand, and ice-cold calcium incubation buffer (3.5 mM HEPES pH 7.5, 10 mM CaCl2, 0.05% Digitonin) was added while gently vortexing. Tubes were replaced on ice during the incubation for times indicated in each experiment, and within 30 s of the end of the incubation period the tubes were replaced on the magnet, and upon clearing, the liquid was removed, followed by immediate addition of EGTA-STOP buffer (170 mM NaCl, 20 mM EGTA, 0.05% Digitonin, 20 µg/ml glycogen, 25 µg/ml RNase A, 2 pg/ml S. cerevisiae fragmented nucleosomal DNA). Beads were incubated at 37°C for 30 min, replaced on a magnet stand and the liquid was removed to a fresh tube and DNA was extracted as described (Skene et al., 2018). A detailed step-by-step protocol is available at https://www.protocols.io/view/cut-amp-run-targeted-in-situ-genome-wide-profiling-zcpf2vn. 
Extraction of pellet and total DNA was performed essentially as described (Skene and Henikoff, 2017; Thakur and Henikoff, 2018).

DNA sequencing and data processing

The size distribution of libraries was determined by Agilent 4200 TapeStation analysis, and libraries were mixed to achieve equal representation as desired, aiming for a final concentration as recommended by the manufacturer. Paired-end Illumina sequencing was performed on the barcoded libraries following the manufacturer’s instructions. Paired-end reads were aligned using Bowtie2 version 2.2.5 with options: --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700. For MACS2 peak calling, parameters used were macs2 callpeak -t input_file -p 1e-5 -f BEDPE/BED (paired-end vs. single-end sequencing data) --keep-dup all -n out_name. Some datasets showed contamination by sequences of undetermined origin consisting of the sequence (TA)n. To avoid cross-mapping, we searched blastn for TATATATATATATATATATATATAT against hg19, collapsed the overlapping hits into 34,832 regions and intersected with sequencing datasets, keeping only the fragments that did not overlap any of these regions.
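The (TA)n filtering described above keeps only fragments that overlap none of the masked regions. A minimal sketch of that exclusion logic follows, assuming fragments and masked regions are (chromosome, start, end) tuples with half-open coordinates; the function name and the toy intervals are hypothetical, and a linear scan is used for clarity rather than the indexed search a real interval tool would use.

```python
def remove_overlapping(fragments, masked):
    """Keep fragments that overlap no masked interval.

    fragments, masked: lists of (chrom, start, end), half-open intervals.
    """
    masked_by_chrom = {}
    for chrom, start, end in masked:
        masked_by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in fragments:
        hits = masked_by_chrom.get(chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if not any(start < m_end and end > m_start for m_start, m_end in hits):
            kept.append((chrom, start, end))
    return kept


# Toy example (hypothetical coordinates)
ta_regions = [("chr1", 40, 60)]
fragments = [("chr1", 10, 50), ("chr1", 100, 150), ("chr2", 10, 50)]
clean = remove_overlapping(fragments, ta_regions)
```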

Evaluating time-course data

If digestion and fragment release into the supernatant occur linearly with time of digestion until all fragments within a population are released, then we expect that CUT&RUN features will be linearly correlated within a time-course series. For CTCF, features were significant CTCF motifs intersecting with DNase I hypersensitive sites (Skene and Henikoff, 2017). For H3K27Ac and H3K4me2, we called peaks using MACS2 and calculated the Pearson correlation coefficients between time points, displayed as a matrix of R² values, using the following procedure:

  1. Aligned fastq files to the unmasked UCSC hg19 genomic sequence using Bowtie2 version 2.2.5 with parameters: --end-to-end --very-sensitive --no-mixed --no-discordant -q --phred33 -I 10 -X 700.

  2. Extracted properly paired read fragments from the alignments and pooled fragments from multiple samples.

  3. Compared pooled fragments with (TA)n regions of hg19 and kept those fragments that did NOT overlap any (TA)n region using bedtools 2.21.0 with parameters: intersect -v -a fragments.bed -b TATA_regions.bed > fragments_not_TATA.bed.

  4. Found peaks using macs2 2.1.1.20160309 with parameters: callpeak -t fragments_not_mask.bed -f BED -g hs --keep-dup all -p 1e-5 -n not_mask --SPMR.

  5. Made scaled fractional count bedgraph files for each sample from bed files made in step 2. The value at each base pair is the fraction of counts times the size of hg19 so if the counts were uniformly distributed the value would be one at each bp.

  6. Extracted bedgraph values for ±150 bps around peak summits for IgG sample and computed their means, which resulted in one mean score per peak.

  7. Removed peaks from macs2 results in step four if the mean score was greater than the 99th percentile of all IgG scores to make a subset of the peaks lacking the most extreme outliers.

  8. Extracted bedgraph values for ±150 bps around the subset of peak summits from step seven for all samples and computed their means, which resulted in a matrix with columns corresponding to samples and one row per peak.

  9. Computed correlations of the matrix from step eight using the R 3.2.2 command cor(matrix, use='complete.obs').
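Steps five through nine can be illustrated with a small, self-contained Python sketch. It is a simplified stand-in for the bedgraph and R workflow described above, not the authors' code: tracks are plain per-base lists over a toy genome, the function names are hypothetical, and the nearest-rank percentile is an assumed definition (the procedure does not state which percentile method was used).

```python
import math

def scaled_fraction(coverage):
    """Step 5: fraction of counts at each bp times genome size (uniform -> 1.0)."""
    total = sum(coverage)
    size = len(coverage)
    return [count / total * size for count in coverage]

def summit_mean(track, summit, half_width=150):
    """Steps 6 and 8: mean track value within +/- half_width bp of a summit."""
    lo = max(0, summit - half_width)
    hi = min(len(track), summit + half_width + 1)
    window = track[lo:hi]
    return sum(window) / len(window)

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]; an assumed definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(tracks, igg_track, summits, half_width=150, q=99):
    """Steps 6-9: drop peaks with extreme IgG signal, then correlate samples."""
    igg_scores = [summit_mean(igg_track, s, half_width) for s in summits]
    cutoff = percentile(igg_scores, q)
    # Step 7: remove peaks whose IgG mean exceeds the q-th percentile
    kept = [s for s, score in zip(summits, igg_scores) if score <= cutoff]
    names = list(tracks)
    # Step 8: one column per sample, one row per retained peak
    columns = {n: [summit_mean(tracks[n], s, half_width) for s in kept] for n in names}
    # Step 9: pairwise Pearson correlations between sample columns
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return kept, matrix
```

With real data the tracks would be the scaled bedgraph values from step five and the summits would come from the MACS2 output; the toy list representation here only keeps the arithmetic visible.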

References

  9. Eliasson M, Olsson A, Palmcrantz E, Wiberg K, Inganäs M, Guss B, Lindberg M, Uhlén M (1988) Chimeric IgG-binding receptors engineered from staphylococcal protein A and streptococcal protein G. The Journal of Biological Chemistry 263:4323–4327.

Decision letter

  1. Stephen Parker
    Reviewing Editor; University of Michigan, United States
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Developmental Biology, Germany

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "A streamlined protocol and analysis pipeline for CUT&RUN chromatin profiling" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Detlef Weigel as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The main objective of the manuscript is to introduce an improved protocol for CUT&RUN and a peak calling algorithm. The authors optimized the pA-MNase enzyme (now pAG-MNase), for easier purification and recognition of both mouse and rabbit primary antibodies. Furthermore, the authors suggest an improved high Ca2+/low salt CUT&RUN protocol that prevents overdigestion/nonspecific digestion. The authors also find that E. coli DNA carried over in the pAG-MNase purification is still present in CUT&RUN sequencing samples and can therefore be used to normalize CUT&RUN data. Lastly, a new peak calling algorithm is proposed for calling peaks in CUT&RUN data as it typically has low read number and high signal to noise ratio. Although this manuscript does not contain any biological findings or major changes to the current CUT&RUN protocol, it does communicate important improvements to a technique that many labs are interested in using.

Essential revisions:

1) All reviewers uniformly shared major concerns about the peak calling algorithm, and we summarize these here:

a) There are several different modes and the description how they differ and when each is appropriate wasn't clear.

b) The sudden drop-off of SEACR with more data (going from 25M to 30M reads, Figure 5D) reveals a very concerning flaw in the model or a bug in the implementation. Performance should improve with more data in a downsampling experiment. The overall drop-off and poor performance present significant questions to its claim of robustness and general usability.

c) Could MACS2 and HOMER algorithms perform comparably to SEACR simply by tuning their parameters (wasn't clear how much of an effort was made to do this, and we think that is important for algorithm benchmarking)? More comparisons to other factors would be helpful.

d) SEACR does not provide any estimate of statistical significance to the assigned peaks compared to other methods. How are users to interpret confidence in peak calls?

e) Why were the target blocks defined by contiguous regions of nonzero coverage rather than tiling windows?

f) Figure 5 – label axes of (A)-(C) more clearly. These appear to be TPR vs. FPR using the encode logFDR<-10 peaks as a truth set; is that right? But should each peak caller at a given sampling depth have a summary value (auROC or auPR) rather than a point?

g) Figure 6B – SEACR is more aggressive in aggregating long peaks but this seems sort of trivial (i.e. one could do something similar by padding and merging peaks)

Overall, we felt that the other aspects of the manuscript were strong enough to warrant a path forward in the Research Advances format even if all the above items about the peak caller are not addressable.

2) A high Calcium/Low salt procedure is included that reduces diffusion of the released complex. The data to support this lower diffusion is that signal to noise appears higher. However, direct evidence for lower diffusion within the nucleus is not provided. Background seems to be lower in specific example loci (e.g. shown in Figure 1B) – this should be quantified genome-wide (e.g.,% signal in peaks). The improvement appears to be more profound in some examples (H3K27ac, Figure 2—figure supplement 1A) than others (H3K4me2, Figure 2—figure supplement 1A). Also, are inter-sample correlations the best way to show signal:noise improvement? It would be more convincing with precision and recall (or similar) vs. high-confidence peaks.

3) The authors claim that DNA carry over from E. coli in the pAG-MN preparation is a good substitute for the yeast genomic DNA spike in that is normally used. My concern about this is that this is not a well-controlled spike in, as the amount of E. coli DNA may well vary between batches of the purified PAG-MNase.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Improved CUT&RUN chromatin profiling and analysis tool" for further consideration at eLife. Your revised article has been favorably evaluated by Detlef Weigel (Senior Editor), a Reviewing Editor, and three reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

All three reviewers have examined the revised manuscript and feel that all points were addressed with the exception of the issues raised in items 1a-g regarding the peak caller. Importantly, all reviewers remain uniformly concerned about this aspect of the work and do not believe the revisions adequately addressed the points that were outlined. All reviewers believe that other aspects of this manuscript warrant publication and advocate for moving forward without the peak caller, which would expedite publication.

However, if the authors would like to make additional revisions to the peak caller sections, we will re-review those and recommend directly addressing the items we initially raised:

1b) The response to this point was to add an ad hoc "genome coverage" filter to effectively throw out reads and decrease noise. This is not a reasonable solution for a robust peak caller and only emphasizes the sensitivity of the method to read depth and noise.

1c) Insufficient parameter exploration was provided to convincingly demonstrate that the other peak callers are inferior to SEACR. For example, in addition to changing MACS2 FDR, why have the authors not attempted to change the local λ smoothing parameter, which would make MACS2 use a genome-wide Poisson threshold that is more equivalent to the single genome-wide threshold that SEACR uses? More informed parameter exploration beyond the single example provided here would make a stronger case.

1d) The authors state that statistical model based FDRs are inferior to their empirical threshold, but the performance results only support such a statement under a limited and author-selected (yellow highlights in plots) range of read depths. Further, performance on additional antibodies (TFs, histone mods, etc.) should be presented before such a strong claim is made.

1e) If "contiguous signal blocks reflect real patterns of protein protection that should be incorporated into the peak calls." then why do the authors need to implement a "genome coverage" filter at high read depths? This statement is contradictory to the performance results and methods in the manuscript.

1f) See point 1c above.

Overall, the SEACR algorithm seems very sensitive to background noise levels, which may be highly variable across diverse labs that will implement the revised CUT&RUN technique. This could be a recipe for confusion and misinterpretation across the user base. For the collective reasons outlined above, we remain skeptical about including the peak caller in this manuscript. However, we would welcome immediate forward movement if this section is removed.

https://doi.org/10.7554/eLife.46314.017

Author response

Summary:

The main objective of the manuscript is to introduce an improved protocol for CUT&RUN and a peak calling algorithm. The authors optimized the pA-MNase enzyme (now pAG-MNase), for easier purification and recognition of both mouse and rabbit primary antibodies. Furthermore, the authors suggest an improved high Ca2+/low salt CUT&RUN protocol that prevents overdigestion/nonspecific digestion. The authors also find that E. coli DNA carried over in the pAG-MNase purification is still present in CUT&RUN sequencing samples and can therefore be used to normalize CUT&RUN data. Lastly, a new peak calling algorithm is proposed for calling peaks in CUT&RUN data as it typically has low read number and high signal to noise ratio. Although this manuscript does not contain any biological findings or major changes to the current CUT&RUN protocol, it does communicate important improvements to a technique that many labs are interested in using.

We thank the reviewers and editors for their appreciation of the importance of our improvements to the CUT&RUN method and for their very helpful comments and suggestions, which we address below.

Essential revisions:

1) All reviewers uniformly shared major concerns about the peak calling algorithm, and we summarize these here: a) There are several different modes and the description how they differ and when each is appropriate wasn't clear.

We agree that the utility of the different modes was not made explicitly clear in the text, especially between the “AUC” and the “union” modes. As described when testing AUC and union mode on Sox2 and FoxA2 data, union mode was introduced in order to improve the recall of SEACR for narrow, tall peaks that do not meet the total signal threshold assigned in AUC mode. Therefore, AUC mode defines a highly stringent set of peaks, whereas union mode generates more peaks with a small precision penalty. To clarify the distinction, we have changed the names of the modes from “AUC” and “union” to “stringent” and “relaxed” and have more extensively described their differences in the text (subsection “Peak-calling based on fragment block aggregation”). Moreover, we are in the process of implementing a SEACR web server to facilitate its broad use by non-technical users, and we intend to report outputs for both modes so that users can get a sense of the precision-recall tradeoffs for each, and use their preferred mode accordingly.

b) The sudden drop-off of SEACR with more data (going from 25M to 30M reads, Figure 5D) reveals a very concerning flaw in the model or a bug in the implementation. Performance should improve with more data in a downsampling experiment. The overall drop-off and poor performance present significant questions to its claim of robustness and general usability.

With respect to the possibility that the “model” is flawed, it should be understood that SEACR does not use a model at all—the total signal threshold is derived from empirical distributions of signal, which is the source of its simplicity and relative speed (SEACR with input files representing ~10M fragments typically completes in less than two minutes). SEACR was designed to deal with CUT&RUN data that is sequenced to relatively low read depths as compared to ChIP-seq (hence Sparse Enrichment Analysis). Our original eLife manuscript demonstrated that CUT&RUN can generate signal-to-noise that is far superior to ChIP-seq with fewer than 10 million fragments from a human experiment, and in the examples in the current manuscript we sequence only 3-5 million fragments, because deeper sequencing provides little if any gain. SEACR’s performance declines once the sequence depth is high enough that background signal accumulates, and this is responsible for the relatively poor SEACR performance above 30 million fragments in Figure 5D (new Figure 7D).

To address this, we have implemented a “genome coverage” filter that requires that greater than 50% of the reference genome lacks read coverage, therefore ensuring that the distribution of background signal is sparse enough for SEACR. We accomplish this by finding a minimum bedgraph signal threshold n for which converting regions of less than n signal to 0 results in satisfying the aforementioned 50% threshold. In the new Figure 7—figure supplement 1A, we show that when we calculate F1 scores for this amended implementation of SEACR similarly to Figure 5D (new Figure 7D), SEACR outperforms MACS2 and HOMER at 30M or more fragments, mirroring its better performance at lower fragment depths. This is a general solution, since no transcription factor or chromatin feature is expected to span this much of the genome. We should emphasize that SEACR is effective at avoiding false negatives in both transcription factor and histone modification CUT&RUN data while providing sufficient recall to outperform MACS2 and HOMER at low sequencing depth. We believe that retaining the core code and presenting its flaws in the extreme cases serves as a warning not to waste money by unnecessary sequencing.
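The threshold search described in this response can be sketched as follows. This is an illustration under stated assumptions rather than the SEACR implementation: the function names are hypothetical, bedgraph rows are assumed to be non-overlapping `(start, end, signal)` tuples on a single toy genome, and candidate thresholds are restricted to signal values that actually occur in the data.

```python
def min_signal_threshold(intervals, genome_size, max_covered_fraction=0.5):
    """Smallest observed signal value n such that, after setting regions with
    signal < n to 0, less than max_covered_fraction of the genome is covered.
    Returns 0 if the data are already sparse enough, None if no observed
    value satisfies the requirement."""
    lengths = {}
    for start, end, signal in intervals:
        if signal > 0:
            lengths[signal] = lengths.get(signal, 0) + (end - start)
    covered = sum(lengths.values())
    limit = max_covered_fraction * genome_size
    if covered < limit:
        return 0
    for value in sorted(lengths):
        # here `covered` is the number of bp with signal >= value
        if covered < limit:
            return value
        covered -= lengths[value]
    return None

def apply_threshold(intervals, threshold):
    """Convert regions with signal below the threshold to 0."""
    return [(s, e, v if v >= threshold else 0) for s, e, v in intervals]

# Toy example: 80 of 100 bp covered, so low-signal regions must be zeroed
rows = [(0, 40, 1), (40, 70, 2), (70, 80, 5)]
n = min_signal_threshold(rows, genome_size=100)
print(n, apply_threshold(rows, n))
```

In the toy example, a threshold of 2 leaves 40 of 100 bp covered, so more than half of the genome lacks coverage after filtering.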

c) Could MACS2 and HOMER algorithms perform comparably to SEACR simply by tuning their parameters (wasn't clear how much of an effort was made to do this, and we think that is important for algorithm benchmarking)? More comparisons to other factors would be helpful.

It is possible to improve the performance of MACS2 and HOMER by tuning parameters, and we have added analysis and discussion of this to the manuscript (new Figure 7—figure supplement 1B). However, the point of SEACR is that it uses a completely different design for calling peaks, and this algorithm works best at low sequencing depths. In comparing MACS2 and HOMER to SEACR, we made sure for each to use the “mode” that was appropriate for the data being analyzed according to the user guides for each algorithm. For example, we used MACS2 “narrow peak” mode and HOMER “factor” mode to call peaks from H3K4me2 data, and used “broad” and “histone” mode to analyze H3K27me3 data. This is newly emphasized in the text (subsection “Peak-calling based on fragment block aggregation”).

To the extent that parameters such as MACS2 FDR can be tuned to modulate the precision-recall balance, this burdens users with having to make arbitrary decisions about their peak calls, while SEACR is designed to provide a single empirical threshold for the user. Nevertheless, in this revision we provide an example of how selecting specific parameters for MACS2 changes its performance relative to SEACR in new Figure 7—figure supplement 1B. We took the MACS2 peak calls for H3K4me2 that we originally presented, and selected only peaks that met a minimum –log10(FDR) threshold of 10, in order to tilt the precision-recall balance in favor of precision similar to SEACR. In doing so and calculating F1 scores as previously described, we found that although the new MACS2 peak calls performed similarly to SEACR across the upper end of the optimal range of fragment depths that we originally defined in Figure 5D (new Figure 7D), its performance suffered dramatically at low fragment depths, making SEACR newly superior at those subsampling levels. Therefore, we conclude that, even when purposefully selecting parameters for a different peak caller to tune its performance, SEACR performs competitively in the absence of any arbitrary user input.
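For readers unfamiliar with the metric, the F1 score referred to here is the harmonic mean of precision and recall. The short Python sketch below shows the standard definition; how true positives, false positives and false negatives were counted against the truth set is not specified in this response, so the counts passed in are purely illustrative.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from raw counts."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: a strict caller (few false positives, more misses)
# versus a permissive caller (more false positives, fewer misses)
print(f1_score(true_pos=60, false_pos=5, false_neg=40))
print(f1_score(true_pos=90, false_pos=60, false_neg=10))
```

Raising a stringency parameter such as an FDR cutoff typically trades recall for precision, which is why a single F1 value per setting can shift as the parameter is tuned.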

d) SEACR does not provide any estimate of statistical significance to the assigned peaks compared to other methods. How are users to interpret confidence in peak calls?

Since SEACR is model-free, we cannot assign statistics such as false discovery rate to each peak based on a comparison to a statistical model, and we consider model statistics inferior to the threshold we derive from actual empirical data. However, SEACR already calculates the threshold at which the fraction of remaining signal blocks from the target dataset is maximized by finding the following value:

Max(1-(IgG blocks/total blocks)) = 1-Min(IgG blocks/total blocks)

Since the IgG blocks/total blocks term is analogous to an empirically calculated false discovery rate (FDR), we now report the minimum value for this term in the standard output of SEACR in order to convey the confidence inherent to the global threshold. Users can interpret this value as they would a standard FDR and use it to determine the general quality of the peak calls as a whole.
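As a rough illustration of the expression above, the sketch below scans candidate thresholds and reports the one that minimizes the IgG blocks/total blocks ratio. It is not SEACR's code and rests on several assumptions: each block is represented only by its total signal, "total blocks" is taken to mean target plus IgG blocks at or above the threshold, candidates are the observed signal values, and ties are broken in favor of the lowest threshold. The function name is hypothetical.

```python
def empirical_threshold(target_signals, igg_signals):
    """Return (threshold, ratio) minimizing IgG blocks / total blocks,
    counting blocks whose total signal is >= threshold."""
    best = None
    for threshold in sorted(set(target_signals) | set(igg_signals)):
        n_target = sum(1 for s in target_signals if s >= threshold)
        n_igg = sum(1 for s in igg_signals if s >= threshold)
        ratio = n_igg / (n_target + n_igg)
        # strict '<' keeps the lowest threshold among ties
        if best is None or ratio < best[1]:
            best = (threshold, ratio)
    return best

# Toy block signals (made-up numbers)
target = [1, 2, 8, 9, 10]
igg = [1, 2, 3]
print(empirical_threshold(target, igg))
```

The reported ratio plays the role described in the text: a single dataset-wide value that can be read like an empirical false discovery rate for the peak set as a whole.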

Although we can report an FDR, we are uncomfortable with the inclusion of confidence statistics in peak-caller output, because this risks inspiring false confidence on the part of a non-technical user. Therefore, in our planned public web server we will make it an option with the following warning: “Assigning confidence measures for individual peaks often falsely gives the impression that the peak is in fact a true positive, in the absence of a “gold standard” to verify this impression.” Our demonstration that SEACR calls ~10,000 true positives (Sox2 in ESCs and FoxA2 in Endoderm where they are expressed) but only 1-2 false positives (Sox2 in Endoderm and FoxA2 in ESCs where they are not expressed) achieves gold-standard validation for CUT&RUN. Such validation has never been achieved for ChIP-seq, where for example reports of “Phantom Peaks” in modENCODE data imply that upwards of 25% of peaks are called regardless of the presence or absence of the targeted factor (Jain et al., 2015). We now emphasize this point in the text (subsection “Conclusions”).

e) Why were the target blocks defined by contiguous regions of nonzero coverage rather than tiling windows?

CUT&RUN digestion patterns are informative in a way that ChIP-seq random fragmentation is not, and therefore contiguous signal blocks reflect real patterns of protein protection that should be incorporated into the peak calls. There are additional benefits. CUT&RUN datasets typically have very sparse background signal, meaning we can take advantage of this characteristic by understanding that contiguous signal blocks with the highest total signal contained within them are most likely to be true positive peaks, rendering tiling unnecessary. From a design perspective, we were interested in using the empirical data rather than abstracting from it. From a practical perspective, window tiling is much more computationally intensive than our data-driven approach. For example, window tiling requires that one discard data unless the tiling is done per-base, which incurs a large computational cost, and therefore we would sacrifice speed where it isn’t clear that window tiling would even be a preferred approach.

f) Figure 5 – label axes of (A)-(C) more clearly. These appear to be TPR vs. FPR using the encode logFDR<-10 peaks as a truth set; is that right? But should each peak caller at a given sampling depth have a summary value (auROC or auPR) rather than a point?

The reviewers are correct that ENCODE peaks of -log10(FDR) > 10 are used as the “truth set”; we make this more explicit in the text (subsection “Peak-calling based on fragment block aggregation”) and have made the axis labels clearer. If we understand the final question correctly, the reviewers are proposing that some parameter of the peak calls (e.g. FDR from MACS2) be varied such that a curve can be derived, rather than a single point from the full peak call set. As was outlined above in response to point (c), we feel this would unfairly disadvantage SEACR since it is designed to find a single threshold that reflects high precision in peak calling. However, we also point the reviewers to the new Figure 7—figure supplement 1B produced in response to point (c), in which MACS2 FDR is varied and F1 scores calculated, which partially addresses this proposal by showing that SEACR remains competitive in comparison to MACS2 across multiple parameter selection strategies for MACS2.

g) Figure 6B – SEACR is more aggressive in aggregating long peaks but this seems sort of trivial (i.e. one could do something similar by padding and merging peaks)

As is the case with other points raised above, parameters could be selected from other peak callers or other manual means used to achieve a desired degree of “peak-stitching”. However, the ability to generate domains without arbitrary user input remains valuable. Moreover, SEACR does not require manual tuning for data with vastly different patterns (e.g. H3K4me2 and H3K27me3).

Overall, we felt that the other aspects of the manuscript were strong enough to warrant a path forward in the Research Advances format even if all the above items about the peak caller are not addressable.

Our revisions to the implementation and description of SEACR should address the reviewers’ concerns.

2) A high Calcium/Low salt procedure is included that reduces diffusion of the released complex. The data to support this lower diffusion is that signal to noise appears higher. However, direct evidence for lower diffusion within the nucleus is not provided.

The direct evidence for reduced diffusion is presented in Figure 2, where we show that incubation at 37°C for 30 minutes results in extraction of the nucleosome ladder for H3K27me3 and H3K27ac under standard conditions, but no detectable release under high-calcium/low-salt conditions. The 10 mM Ca++, 3.5 mM HEPES pH 7.5 condition that we used was based on the observation that mononucleosomes that are freely soluble without divalent cations are insoluble in 10 mM Ca++ or Mg++. To avoid any misunderstanding of the physical basis for the observation shown in Figure 2, we have removed the term “diffusion” where it was used in the text, and now describe this phenomenon consistently as “premature release”.

Background seems to be lower in specific example loci (e.g. shown in Figure 1B) – this should be quantified genome-wide (e.g.,% signal in peaks).

Figure 1B was intended to show that pAG/MNase works equally well for both a rabbit monoclonal and a mouse monoclonal antibody using RNAPII-Ser5P as an example, and we now show representative tracks that make this point. We show tracks for a histone gene cluster, because the genes are small and close together so that the distinction between signal and background is readily apparent. However, it is problematic to call peaks on RNAPII-Ser5P genome-wide, because this modification is enriched over 5’ ends of genes but also present throughout gene bodies at a low level. Instead, we now show identically scaled tracks for an H3K4me2 CUT&RUN experiment done in parallel with the RNAPII-Ser5P experiment shown in Figure 1B (new Figure 4—figure supplement 1B). It is apparent from the tracks that the backgrounds are extremely low for both the standard protocol and the high-calcium/low-salt protocol. As requested, we also provide a Fraction of Reads in Peaks (FRiP) plot, the ENCODE-recommended standard for evaluating relative data quality (new Figure 4 and Figure 4—figure supplement 1C).
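For reference, the Fraction of Reads in Peaks metric is simply the share of fragments that overlap a called peak. The Python sketch below is a minimal illustration, not the code behind the figures: the function name is hypothetical, intervals are half-open `(chrom, start, end)` tuples, any overlap of at least 1 bp is assumed to count, and the quadratic scan is kept for clarity rather than speed.

```python
def frip(fragments, peaks):
    """Fraction of fragments overlapping at least one peak by >= 1 bp."""
    if not fragments:
        return 0.0

    def in_peak(fragment):
        chrom, start, end = fragment
        return any(c == chrom and s < end and e > start for c, s, e in peaks)

    return sum(1 for f in fragments if in_peak(f)) / len(fragments)

# Toy example with made-up coordinates
peaks = [("chr1", 100, 200), ("chr2", 500, 600)]
fragments = [
    ("chr1", 90, 110),    # overlaps first peak
    ("chr1", 200, 250),   # starts where the peak ends: no overlap
    ("chr2", 550, 560),   # inside second peak
    ("chr3", 100, 200),   # no peaks on chr3
]
print(frip(fragments, peaks))
```

A higher value indicates that more of the sequenced fragments fall in called peaks, which is why the metric is used as a proxy for signal-to-noise.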

The improvement appears to be more profound in some examples (H3K27ac, Figure 2—figure supplement 1A) than others (H3K4me2, Figure 2—figure supplement 1A). Also, are inter-sample correlations the best way to show signal:noise improvement? It would be more convincing with precision and recall (or similar) vs high-confidence peaks.

We have added representative tracks and FRiP plots for the time points, which show improved signal-to-noise (new Figure 4C). However, there seems to be a misunderstanding of what we were intending to show with the correlation matrices. Our conclusion was that the new protocol provides higher yields by preventing premature release, which can increase background with longer digestion times, and we have made textual changes in the revision to make this point clearer (subsection “Preventing premature release during CUT&RUN digestion”). The correlation matrices were meant to show that over the course of digestion there is improved uniformity, and this improvement is confirmed in the FRiP plots. As we first documented in our original eLife paper and confirmed in subsequent publications (PMID: 29651053; 30577869; 30554944), backgrounds for CUT&RUN using the standard protocol are much lower than for ChIP-seq (e.g. Figure 1—figure supplement 2). Since then, we have distributed CUT&RUN materials to >600 laboratories, and although feedback has been mostly positive, a common problem that users ask us about is relatively high background with some antibodies, which we have traced to overdigestion (we recommend 30 minutes at 0°C to maximize yields). We chose to do most of our testing on H3K27ac antibodies from Abcam and Millipore because these are the antibodies that we and others have had the most background problems with, perhaps because H3K27 acetylation is so abundant that release during digestion happens more rapidly than for transcription factors and most other epitopes. In fact, we see no consistent signal-to-noise differences between the standard and high-calcium/low-salt conditions for H3K4me2, now documented in the new Figure 4—figure supplement 1 described above.
Because there are already hundreds of satisfied CUT&RUN users who follow the standard protocol described on Protocols.io (version 3, https://www.protocols.io/view/cut-amp-run-targeted-in-situ-genome-wide-profiling-zcpf2vn/guidelines), we are only recommending high-calcium/low-salt conditions as an option “for targets that are enriched at active chromatin (e.g. H3K27ac)”.

3) The authors claim that DNA carry over from E. coli in the pAG-MN preparation is a good substitute for the yeast genomic DNA spike in that is normally used. My concern about this is that this is not a well-controlled spike in, as the amount of E. coli DNA may well vary between batches of the purified PAG-MNase.

The goal of calibration is to allow for comparisons to be made in a series, so as long as a user follows the protocol, this issue will never arise, especially insofar as a single batch of enzyme using the Pierce kit yields enough for ~10,000 samples. Using two different lots of any critical reagent (e.g. an antibody) for samples in a series would be contrary to accepted laboratory practice. Even if a user runs short of pAG/MNase, mixing batches is OK, as long as the mixing is done before adding to samples in a series. We now state in the text that the E. coli carry-over DNA can be used for samples in a series using the same batch of pAG-MNase (subsection “Calibration using E. coli carry-over DNA”).
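To make the idea of calibration within a series concrete, the sketch below scales each sample by the inverse of its E. coli carry-over fragment count. This is an assumed spike-in style normalization, not a formula given in this response: the function name, the arbitrary `constant`, and the example counts are all hypothetical, and the approach presumes every sample in the series received the same batch (or pre-mixed batches) of pAG/MNase.

```python
def calibration_factors(ecoli_fragments, constant=10_000):
    """Per-sample scale factor = constant / fragments mapped to E. coli.
    `ecoli_fragments` maps sample name -> E. coli-mapped fragment count."""
    factors = {}
    for sample, count in ecoli_fragments.items():
        if count <= 0:
            raise ValueError(f"no E. coli fragments for sample {sample!r}")
        factors[sample] = constant / count
    return factors

# Made-up counts for a two-sample series processed with one enzyme batch
series = {"timepoint_5min": 1_000, "timepoint_30min": 500}
print(calibration_factors(series))
```

Multiplying each sample's coverage by its factor puts the series on a common scale; the absolute value of the constant does not matter as long as it is the same for every sample being compared.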

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

All three reviewers have examined the revised manuscript and feel that all points were addressed with the exception of the issues raised in items 1a-g regarding the peak caller. Importantly, all reviewers remain uniformly concerned about this aspect of the work and do not believe the revisions adequately addressed the points that were outlined. All reviewers believe that other aspects of this manuscript warrant publication and advocate for moving forward without the peak caller, which would expedite publication.

We are glad that the reviewers found our responses to be satisfactory for most of the manuscript, and we agree that the peak-caller needs additional work. Therefore, as requested by the reviewers, we have removed the peak-caller in its entirety from the manuscript to expedite publication. This required removal of a sentence from the Abstract, a sentence from the Introduction, a sub-section from the Results section and Discussion section, a paragraph from the Conclusions, a sub-section from the Materials and methods section and Figure 6, Figure 7, Figure 6—figure supplement 1 and Figure 7—figure supplement 1. We have made minor textual changes in accordance with the removal of SEACR and have changed the title to “Improved CUT&RUN chromatin profiling tools”.

However, if the authors would like to make additional revisions to the peak caller sections, we will re-review those and recommend directly addressing the items we initially raised:

As described above, the peak caller section and associated figures and text have been removed.

Overall, the SEACR algorithm seems very sensitive to background noise levels, which may be highly variable across diverse labs that will implement the revised CUT&RUN technique. This could be a recipe for confusion and misinterpretation across the user base. For the collective reasons outlined above, we remain skeptical about including the peak caller in this manuscript. However, we would welcome immediate forward movement if this section is removed.

We agree that our manuscript without SEACR is ready to move forward, as all requirements for publication have been met.

https://doi.org/10.7554/eLife.46314.018

Article and author information

Author details

  1. Michael P Meers

    Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
    Contribution
    Validation, Writing—review and editing
    Competing interests
    No competing interests declared
  2. Terri D Bryson

    1. Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
    2. Howard Hughes Medical Institute, United States
    Contribution
    Investigation, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Jorja G Henikoff

    Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
    Contribution
    Data curation, Software, Formal analysis, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Steven Henikoff

    1. Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
    2. Howard Hughes Medical Institute, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    steveh@fhcrc.org
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7621-8685

Funding

Howard Hughes Medical Institute

  • Steven Henikoff

National Institutes of Health (4DN TCPA A093)

  • Steven Henikoff

Chan-Zuckerberg Initiative

  • Steven Henikoff

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Christine Codomo and Tayler Hentges for technical support. We also thank all members of the Henikoff lab for valuable discussions and Kami Ahmad, Brian Freie and Bob Eisenman for comments on the manuscript. This work was supported by the Howard Hughes Medical Institute, a grant from the National Institutes of Health (4DN TCPA A093), and the Chan-Zuckerberg Initiative.

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Developmental Biology, Germany

Reviewing Editor

  1. Stephen Parker, University of Michigan, United States

Publication history

  1. Received: February 22, 2019
  2. Accepted: June 22, 2019
  3. Accepted Manuscript published: June 24, 2019 (version 1)
  4. Version of Record published: June 28, 2019 (version 2)

Copyright

© 2019, Meers et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 4,054
  • Downloads: 758
  • Citations: 0

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
