Mcm binding sites identified by MCM-ChEC are consistent with those identified by Mcm-ChIP, and their occupancy varies over approximately 3 orders of magnitude. (A) The MCM-ChEC strategy relies on Mcm-MNase fusion proteins that, when activated with exogenously added calcium, generate short DNA fragments corresponding to Mcm binding sites. These short fragments, which are conducive to PCR amplification, are identified through deep sequencing. (B) Fragment sizes in the MCM-ChEC library exhibit major and minor peaks at approximately 60 and 180 base pairs, corresponding to Mcm double-helicases and nucleosomes, respectively. (C) Mcm binding sites identified by MCM-ChEC are consistent with results from Mcm-ChIP, but higher in resolution. Four panels show the same region of chrIV at various levels of magnification, from coordinates 417,855-516,737 (upper left) to coordinates 460,851-464,146 (lower right). The prominent peak in the middle corresponds to ARS416/ARS1. Yellow shading in the lower right panel marks the region within ARS1 (red rectangle) that corresponds to the Mcm complex. From top to bottom, rows in each panel show MCM-ChEC (this study), Mcm-ChIP (Belsky et al. 2015), Mcm-ChIP (Dukaj and Rhind 2021), Mcm-ChIP-exo (Rossi et al. 2021), and free MNase (Foss et al. 2019). (D) Signal intensities at CMBSs vary over approximately 3 orders of magnitude. The vertical portion of the plot at the far left represents ARS1200-1/ARS1200-2 in the repetitive rDNA. Blue vertical lines indicate the 142 peaks of MCM-ChEC signal that are within 100 base pairs of one of the 187 ACSs reported in SGD. Inset shows an 11 kb region from chrIV, with MCM-ChEC signal in the top row, nucleosome dyads in the middle row, and gene locations in the bottom row. Vertical black and red arrows point to CMBSs corresponding to intergenic Mcm helicases and low-level background signal from nucleosomes, respectively, with the ranks of these sites depicted by corresponding arrows on the main curve.

Replication activity at known origins and Mcm binding sites. Replication activity was assessed by generation of ssDNA in rad53 mutants released from G1 arrest into medium containing HU. Plots show the ratio of ssDNA in S phase to that in G1, as described (Feng et al. 2006). (A) Average replication profile centered on 181 ACSs (all ACSs listed in SGD that are at least 10 kb from a chromosome end and for which data were available in Feng et al. 2006). (B) Average replication profiles for groups of 200 of the 2000 most prominent peaks of MCM-ChEC signal, centered on peak midpoints. Numbers in parentheses indicate the median abundance of MCM-ChEC signal in each group, normalized such that the median signal in the highest group is 100.
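The grouping and normalization described for panel (B) can be sketched as follows. This is only an illustrative sketch: the function name `normalized_group_medians` and the input format (a flat list of per-peak abundances) are assumptions made here, not the pipeline used in this study.

```python
from statistics import median


def normalized_group_medians(abundances, group_size=200):
    """Median abundance per rank-ordered group, scaled so the top group is 100.

    `abundances` is a hypothetical list of per-peak MCM-ChEC signal values.
    """
    # Rank peaks from most to least abundant.
    ranked = sorted(abundances, reverse=True)
    # Split the ranked list into consecutive groups of `group_size` peaks.
    groups = [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]
    medians = [median(group) for group in groups]
    # Scale every median so that the highest-abundance group equals 100.
    return [100.0 * m / medians[0] for m in medians]
```

With 2000 peaks and `group_size=200`, this would return ten values, the first of which is 100 by construction.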

Most abundant 5500 peaks of MCM-ChEC signal are located predominantly in nucleosome-free regions. (A) Left heat map shows 46-56 base pair fragments marking nucleosome dyads that were generated by ex-vivo nuclease activity of H3Q85C coupled to phenanthroline (Chereji et al. 2018). Right heat map shows 51-100 base pair library fragments from MCM-ChEC. Both heat maps are centered on midpoints of the top 8,000 MCM-ChEC peaks, ranked by abundance. Each row corresponds to a single peak, with the most abundant peak at the top. A portion of Figure 4A is shown rotated by 90° (right) to indicate average locations of 50 CMBSs relative to gene bodies (main curve), transcription levels of genes in those cases when CMBSs are genic rather than intergenic (horizontal bars), and CMBSs within 100 base pairs of an ACS listed in SGD (light blue). (B) Composite signals of nucleosome dyads for groups of 500 CMBSs. The x axis is identical to that in the heat maps in (A). Each row shows 7 groups of 500. Moving from the top row to the bottom row corresponds to moving from high abundance CMBSs to low abundance CMBSs. Within each row, the order from high abundance to low abundance is indicated by color, as noted in the figure. The y axes are linear scales showing the relative abundance of nucleosomal signal, and the scales of these axes are the same for each row.

Locations of CMBSs relative to adjacent genes, and the levels and directions of transcription of those genes. (A) Continuous curve shows the number of CMBSs, within a sliding window of 50, that are located within gene bodies. Vertical lines show levels of transcription of genes in those instances in which the corresponding CMBS is genic. Light blue area indicates CMBSs within 100 base pairs of an ACS listed in SGD. X axis is arranged by CMBS abundance rank, from high abundance (left) to low abundance (right). (B) Orientation of transcription of flanking genes in those instances in which CMBSs are intergenic. Plots show the number of CMBSs, within a sliding window of 50, whose flanking genes are transcribed convergently (blue), divergently (orange), or in tandem (black), counting only intergenic CMBSs. X axis is limited to CMBSs with ranks <= 5500, because CMBSs with higher ranks are mostly genic and therefore cannot be classified according to the relative transcription directions of flanking genes.
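The sliding-window count in panel (A) can be illustrated with a short sketch. It assumes a hypothetical list `is_genic` of booleans ordered by CMBS abundance rank; the function name and input format are illustrative and are not taken from this study.

```python
def genic_counts(is_genic, window=50):
    """Count genic CMBSs in each sliding window of `window` consecutive ranks.

    `is_genic` is a hypothetical list of booleans, ordered by abundance rank,
    where True means the CMBS lies within a gene body.
    """
    if len(is_genic) < window:
        return []
    # Count for the first window, then update incrementally.
    running = sum(is_genic[:window])
    counts = [running]
    for i in range(window, len(is_genic)):
        # Add the site entering the window and drop the one leaving it.
        running += is_genic[i] - is_genic[i - window]
        counts.append(running)
    return counts
```

The same routine could be applied to panel (B) by passing one boolean list per orientation class (convergent, divergent, tandem) after restricting the input to intergenic CMBSs.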

EACS PWM scores indicate that peaks of MCM signal are flanked by ACSs, and that Mcm helicases are loaded downstream of the ACS (maximum possible score is 12.742). (A) Composite scores for Watson (red) and Crick (blue) strands for CMBSs with ranks <= 5500. (B) Composite scores for Watson (red) and Crick (blue) strands for CMBSs with ranks <= 7700, shown in groups of 1100. (C) Average EACS-PWM scores on the Crick strand for peaks with the top 10% of EACS-PWM scores on the Watson strand are as high as average scores on the Crick strand for all 5,500 peaks of MCM-ChEC signal. The 10% of peaks with the highest EACS-PWM scores on the Watson strand are shown in black and all peaks are shown in red. This result supports a model in which peaks of Mcm signal are flanked on both sides by ACS signals, because even peaks with a very high EACS-PWM score on the left side still maintain a significant score on the right side. (D) Cumulative MCM-ChEC signal for 187 ACS sequences indicates that Mcm helicases are loaded downstream of the ACS. 1 kb windows of MCM-ChEC signal centered on each of the 187 ACS sequences listed in SGD were oriented according to the ACS, normalized to 100, and then added to create a cumulative sum vector showing the location of MCM-ChEC signal relative to the ACS. MCM-ChEC signal was quantified according to the midpoints of library fragments with inserts of 51-100 base pairs.
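The orientation, normalization, and summation described for panel (D) can be sketched as follows. Two interpretations are assumed here and are not confirmed by the legend: "normalized to 100" is taken to mean that each window is scaled so its values sum to 100, and the "cumulative sum vector" is taken to be the position-by-position sum across sites. The function name and the "+"/"-" strand labels are also illustrative.

```python
def oriented_normalized_sum(windows, strands, total=100.0):
    """Sum ACS-centered signal windows after orienting and scaling each one.

    `windows` is a hypothetical list of equal-length signal vectors, one per
    ACS; `strands` gives "+" or "-" for the orientation of each ACS.
    """
    summed = None
    for signal, strand in zip(windows, strands):
        # Flip windows on the "-" strand so every ACS points the same way.
        oriented = list(signal) if strand == "+" else list(signal)[::-1]
        window_total = sum(oriented)
        if window_total == 0:
            continue  # Assumption: windows with no signal are skipped.
        # Assumption: scale each window so that its values sum to `total`.
        scaled = [value * total / window_total for value in oriented]
        if summed is None:
            summed = scaled
        else:
            summed = [a + b for a, b in zip(summed, scaled)]
    return summed
```

Scaling each window before summing keeps a few very strong origins from dominating the composite, so the result reflects where signal sits relative to the ACS rather than how much signal each site carries.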

Average GC skew increases across the midpoints of the most abundant 5,500 peaks of MCM-ChEC signal. GC skew was calculated according to the numbers of Gs and Cs in the Watson strand along a 51 base pair moving window, as (G-C)/(G+C); the skew value plotted corresponds to the score at the midpoint of that window. (A) Average EACS PWM scores for the most abundant 5,500 peaks of MCM-ChEC signal. (B) Averages of GC skew for groups of 1100, arranged from high (left) to low (right) abundance.
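The GC skew calculation described above, (G-C)/(G+C) over a 51 base pair moving window with the value assigned to the window midpoint, can be sketched as follows. The function name and the handling of windows that contain no G or C (assigned 0.0 here) are assumptions, since the legend does not specify them.

```python
def gc_skew_profile(seq, window=51):
    """Return (midpoint_index, skew) pairs for each full window along `seq`.

    `seq` is the Watson-strand sequence; indices are 0-based.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a single midpoint")
    seq = seq.upper()
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Assumption: a window without any G or C is given a skew of 0.0.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        profile.append((start + half, skew))
    return profile
```

Averaging these profiles position by position across peak-centered sequences would give composite curves of the kind plotted in the figure.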

Estimates of total number of origins by different criteria (estimates from all but first 2 rows rely on CMBSs from this report)

MCM-ChEC measurements are reproducible. Fourteen MCM-ChEC experiments show high reproducibility, with mean and median values of r² of 0.95 and 0.98, respectively. All plots show the same sample plotted on the x axes with each of the 13 remaining samples plotted on the y axes. 100 base pair windows centered on each of the most abundant 5,500 peaks of MCM-ChEC signal were quantified as the sum of per-base pair read depths, with all samples normalized to the same number of total counts for the entire genome. r² values are noted on each plot.
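The quantification behind these comparisons can be sketched as follows. The function names, the representation of read depth as a flat per-base list, and the choice of target total are illustrative assumptions. The r² value is computed as the squared Pearson correlation, which is one common reading of r² and is assumed here.

```python
def normalize_depth(depth, target_total=1_000_000):
    """Scale per-base read depths so that they sum to `target_total`."""
    total = sum(depth)
    return [d * target_total / total for d in depth]


def window_sum(depth, center, width=100):
    """Sum per-base read depth over `width` bp centered on `center`."""
    start = max(0, center - width // 2)
    return sum(depth[start:start + width])


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sample has no variance.
    return sxy * sxy / (sxx * syy)
```

For each pair of samples, one would normalize both depth vectors, compute `window_sum` at every peak midpoint, and pass the two resulting lists to `r_squared`.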

Replication activity and licensing at Mcm binding sites, assessed by ssDNA generation, BrdU incorporation, and qPCR. (A) Average replication profiles, as determined by generation of ssDNA, for groups of 200 from the 1600 most prominent peaks of MCM-ChEC signal, centered on peak midpoints. WT (top row) and rad53 (bottom row) cultures were arrested in G1 and released into medium containing HU. Samples were collected at 30 minutes, 1 hour, 2 hours and 3 hours and ssDNA was quantified, as described (Feng et al. 2006). (B) Average replication profiles, as determined by incorporation of BrdU, for groups of 200 from the 1600 most prominent peaks of MCM-ChEC signal, centered on peak midpoints. rad53 cells were arrested in G1 and released into medium containing HU and BrdU. BrdU was quantified on Affymetrix chips. The y axis shows the average of log2 of measurement/mean. (C) qPCR across origins using MCM-ChEC-cut templates can be used to assess licensing, because increased cutting by Mcm-MNase is reflected in the requirement for increased qPCR cycles to generate a diagnostic product (Foss et al. 2021). Linear regression using qPCR at 7 origins allowed us to determine that the CMBS with rank 100 is licensed at 69% (green lines), and thereby to infer that the CMBSs in the 8th cohort, whose median CMBS abundance is 1.5% of that of the first cohort, are licensed at approximately 1% (69% x 0.015 = 1.0%). ARSs analyzed correspond to the following CMBS ranks, and were licensed at the indicated percentages: ARS111: rank 5, 76%; ARS1235: rank 75, 63%; ARS802: rank 183, 70%; ARS1427: rank 51, 76%; ARS1406: rank 42, 70%; ARS224: rank 9, 84%; ARS1103: rank 3, 90%.
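The extrapolation in panel (C) is a single proportional scaling, written out here as a short sketch. It reproduces only the arithmetic stated in the legend and assumes, as the legend does, that licensing scales linearly with MCM-ChEC abundance. The function name is illustrative.

```python
def inferred_licensing(reference_pct, relative_abundance):
    """Scale a reference licensing percentage by relative MCM-ChEC abundance.

    Assumes licensing is proportional to abundance, as in the legend.
    """
    return reference_pct * relative_abundance


# Rank-100 CMBS licensed at 69%; the 8th cohort has 1.5% of the median
# abundance of the first cohort.
estimate = inferred_licensing(69.0, 0.015)
print(f"{estimate:.1f}%")  # prints 1.0%
```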

Figure identical to Figure 3, except that nucleosomes were identified by MNase treatment of chromatin. Heat map showing nucleosomes flanking CMBSs is derived from library fragments in the 151-200 base pair range.

Comparison of pattern of GC skew when centered on ACSs versus CMBSs. (A) GC skew for 400 base pair sequences centered on 187 ACSs, oriented according to directionality specified in SGD. (B) GC skew for 400 base pair sequences centered on 5,500 most abundant CMBSs. GC skew is measured as in Figure 6.