A total of 7,459,709 origins defined by four types of techniques show different genomic features.

(a) Data processing pipeline. 113 publicly available origin profiles are processed following the pipeline. (b) Number of samples collected for each technique. In total, 7,459,709 union origins were identified. (c) PCA shows the clustering of origin datasets from different techniques. (d) Genomic annotation (TSS, exon, intron and intergenic regions) of different groups of origins. Background is the percentage of the whole genome covered by each annotation. (e) Overlap with TF hotspots for different groups of origins and promoters. (f) Overlap with constitutive CTCF binding sites for different groups of origins and promoters. (g) GC content of different groups of origins and promoters. The grey line marks the average GC content of the human genome. (h) G-quadruplex overlapping rates of different groups of origins and promoters.
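Panel (g) compares GC content across region sets. As a minimal sketch (not the authors' pipeline), the GC content of a region can be computed as (G + C) / (A + C + G + T) over its sequence. Excluding N and other ambiguity codes from the denominator is an assumption, and the helper names gc_content and mean_gc are hypothetical.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases; N and other codes are ignored (assumption)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def mean_gc(sequences):
    """Average per-region GC content, skipping regions with no A/C/G/T bases."""
    values = [gc_content(s) for s in sequences]
    values = [v for v in values if v == v]  # NaN != NaN, so this drops empty regions
    return sum(values) / len(values) if values else float("nan")


print(gc_content("ATGCGC"))  # 4 of 6 bases are G/C
print(mean_gc(["GGCC", "ATAT", "NNNN"]))  # 0.5
```

Averaging per-region values (rather than pooling all bases) weights every region equally regardless of length; which convention the figure uses is not stated here.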

The shared origins are enriched for certain transcription factors and active histone marks.

(a) Fitting the distribution of SNS-seq origins to an exponential model shows that an occupancy score ≥20 selects reproducible SNS-seq origins. (b) Conceptual model of how shared origins are defined. Any SNS-seq shared origin that overlaps simultaneously with a Bubble-seq IZ, an OK-seq IZ and a Repli-seq origin is considered an origin identified by all four techniques (shared origin). (c) Genomic annotation of union origins and shared origins. (d) Overlap with TF hotspots of union origins and shared origins. (e) Overlap with constitutive CTCF binding sites of union origins and shared origins. (f) GC content of union origins and shared origins. (g) G-quadruplex overlapping rates of union origins and shared origins. (h) BART prediction of TFs associated with shared origins. (i) Enrichment of histone marks at shared origins, using all origins as control.
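The shared-origin rule in panel (b) is an interval intersection: an SNS-seq origin is kept only if it overlaps at least one interval from each of the other three techniques. The sketch below is a hypothetical illustration, assuming BED-style half-open (chrom, start, end) tuples; it is not the authors' code, and bedtools intersect would be the usual command-line equivalent.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(intervals):
    """Per-chromosome sorted, merged (starts, ends) lists from BED-style (chrom, start, end)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        starts, ends = [], []
        for start, end in sorted(ivs):
            if starts and start <= ends[-1]:  # overlapping or touching: extend the block
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def overlaps(index, chrom, start, end):
    """True if [start, end) overlaps any indexed interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, end) - 1  # last merged block that starts before `end`
    return i >= 0 and ends[i] > start


def shared_origins(sns_origins, bubble_iz, okseq_iz, repli_origins):
    """Keep SNS-seq origins that overlap a Bubble-seq IZ, an OK-seq IZ and a Repli-seq origin."""
    others = [build_index(s) for s in (bubble_iz, okseq_iz, repli_origins)]
    return [o for o in sns_origins if all(overlaps(idx, *o) for idx in others)]


sns = [("chr1", 100, 400), ("chr1", 1000, 1300), ("chr2", 50, 350)]
bubble = [("chr1", 0, 500), ("chr1", 900, 2000)]
okseq = [("chr1", 1200, 5000)]
repli = [("chr1", 0, 10000), ("chr2", 0, 1000)]
print(shared_origins(sns, bubble, okseq, repli))  # [('chr1', 1000, 1300)]
```

Merging each reference set into disjoint blocks means a single binary search per query suffices, because the last block starting before the query end is also the one reaching furthest right.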

Genomic features of shared ORC binding sites and their co-localization with shared origins.

(a) Genomic annotation of union ORC and shared ORC binding sites. (b) Overlap with TF hotspots of union ORC and shared ORC binding sites. (c) Overlap with constitutive CTCF binding sites of union ORC and shared ORC binding sites. (d) GC content of union ORC and shared ORC binding sites. (e) Overlap with G-quadruplex of union ORC and shared ORC binding sites. (f) The percentage of high-confidence origins (shared origins in human and confirmed origins in yeast) that overlap with (left) or are proximate (≤1 kb) to (right) two types of ORC binding sites (union or shared). (g) Distribution of the distance between ORC binding sites and the nearest shared origin.
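Panels (f) and (g) rest on two quantities: whether an origin overlaps a binding site, and the gap to the nearest one. A minimal sketch, assuming half-open coordinates, a distance of 0 for overlapping intervals, and that "proximate (≤1 kb)" includes overlapping origins (all assumptions; the function names are hypothetical):

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(intervals):
    """Per-chromosome sorted, merged (starts, ends) lists from BED-style (chrom, start, end)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        starts, ends = [], []
        for start, end in sorted(ivs):
            if starts and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def nearest_distance(index, chrom, start, end):
    """Gap in bp from [start, end) to the nearest indexed interval (0 if overlapping); None if chrom absent."""
    if chrom not in index:
        return None
    starts, ends = index[chrom]
    i = bisect_left(starts, end) - 1
    best = None
    if i >= 0:  # closest block on the left, or an overlapping one
        best = max(0, start - ends[i])
    if i + 1 < len(starts):  # closest block on the right, starting at or after `end`
        right = starts[i + 1] - end
        best = right if best is None else min(best, right)
    return best


def percent_overlap_and_near(origins, sites, window=1000):
    """% of origins overlapping a site, and % within `window` bp of a site (overlap counts as near)."""
    index = build_index(sites)
    dists = [nearest_distance(index, *o) for o in origins]
    n_overlap = sum(1 for d in dists if d == 0)
    n_near = sum(1 for d in dists if d is not None and d <= window)
    return 100 * n_overlap / len(origins), 100 * n_near / len(origins)


origins = [("chr1", 100, 200), ("chr1", 5000, 5100), ("chr1", 20000, 20100), ("chr3", 0, 100)]
sites = [("chr1", 150, 300), ("chr1", 5600, 5800)]
print(percent_overlap_and_near(origins, sites))  # (25.0, 50.0)
```

The list of per-origin distances computed inside percent_overlap_and_near is also what a distance histogram like panel (g) would be drawn from.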

Shared origins near shared ORC binding sites are more correlated with active transcription.

(a) Genomic annotation of shared origins and shared origins near (≤1 kb) ORC binding sites. (b) Overlap with TF hotspots of shared origins and shared origins near ORC binding sites. (c) Overlap with constitutive CTCF binding sites of shared origins and shared origins near ORC binding sites. (d) GC content of shared origins and shared origins near ORC binding sites. (e) Overlap with G-quadruplex sites of shared origins and shared origins near ORC binding sites. (f) Replication timing score (Y-axis; from Navarro, 2021) for the indicated classes of origins. (g) Annotation of the expression levels of genes that overlap with different groups of origins. (h) BART prediction of TFs associated with the highest-confidence origins.

Genomic features of shared MCM binding sites and their co-localization with shared origins.

(a) Genomic annotation of union MCM and shared MCM binding sites. (b) Overlap with TF hotspots of union MCM and shared MCM binding sites. (c) Overlap with constitutive CTCF binding sites of union MCM and shared MCM binding sites. (d) GC content of union MCM and shared MCM binding sites. (e) Overlap with G-quadruplex of union MCM and shared MCM binding sites. (f) Overlap with TF hotspots of shared origins and shared origins near MCM binding sites. (g) The percentage of high-confidence origins (shared origins in human and confirmed origins in yeast) that overlap with (left) or are proximate (≤1 kb) to (right) two types of MCM binding sites (union or shared). (h) Venn diagram of shared origins that are near ORC, MCM2 or MCM3-7 binding sites.

Genome browser screenshot for three of the 74 origins from Fig. 5h.

The numbers below the SNS-seq shared origins track are the occupancy scores of the origins along the length of the indicated track.

Overlap of origins, TSS and ORC binding sites with indicated features.

Permutation test of the overlap of shared origins or promoters (TSS) with regions around promoters, shared ORC peaks (in > 2 datasets), R-loops, G-quadruplexes and CTCF binding sites. The fold enrichment of the observed overlap relative to the mean overlap seen in 1,000 randomizations of set A (the shared origins or TSS) is indicated together with the p-value of the enrichment.
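The permutation test can be sketched as follows, assuming set A is randomized by placing each region uniformly on its own chromosome while keeping its length, with no masking of gaps or blacklisted regions (real analyses usually mask them). The p-value uses the common (k + 1) / (n + 1) empirical convention. This is a hypothetical illustration, not the tool used for the figure; chrom_sizes and all function names are assumptions.

```python
import random
from bisect import bisect_left
from collections import defaultdict


def build_index(regions):
    """Per-chromosome sorted, merged (starts, ends) lists for fast overlap queries."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        starts, ends = [], []
        for start, end in sorted(ivs):
            if starts and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def count_overlaps(a_regions, b_index):
    """Number of regions in A that overlap at least one region in B."""
    n = 0
    for chrom, start, end in a_regions:
        if chrom in b_index:
            starts, ends = b_index[chrom]
            i = bisect_left(starts, end) - 1
            if i >= 0 and ends[i] > start:
                n += 1
    return n


def randomize(a_regions, chrom_sizes, rng):
    """Move each region to a uniform random position on the same chromosome, keeping its length."""
    shuffled = []
    for chrom, start, end in a_regions:
        length = end - start
        new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def permutation_test(a_regions, b_regions, chrom_sizes, n_perm=1000, seed=0):
    """Return (observed overlaps, fold enrichment over the null mean, empirical p-value)."""
    rng = random.Random(seed)
    b_index = build_index(b_regions)
    observed = count_overlaps(a_regions, b_index)
    null = [count_overlaps(randomize(a_regions, chrom_sizes, rng), b_index) for _ in range(n_perm)]
    mean_null = sum(null) / n_perm
    fold = observed / mean_null if mean_null > 0 else float("inf")
    p_value = (sum(1 for x in null if x >= observed) + 1) / (n_perm + 1)
    return observed, fold, p_value


a = [("chr1", i * 10_000, i * 10_000 + 100) for i in range(20)]
observed, fold, p_value = permutation_test(a, a, {"chr1": 1_000_000})
print(observed, fold, p_value)
```

Adding 1 to both numerator and denominator keeps the p-value above zero, so with 1,000 randomizations the smallest reportable value is 1/1001.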

Distribution of origins defined by four types of techniques.

(a) Number of samples collected for each cell type. (b) Distribution of peak length of origins from each technique. (c) PCA results of all samples, marked by cell type. (d) PCA results of SNS-seq samples, marked by cell type. (e) PCA results of SNS-seq samples, marked by the year the data were uploaded. (f) BART2 results of union origins. (g) Enrichment of histone marks at re-replicated union origins using total union origins as control.

Correlation between origins from different samples.

Pairwise correlation of samples from different techniques.

Background model for identification of shared origins.

(a) Conceptual model of how the occupancy score is defined to represent the number of samples in which each origin occurs. (b) Distribution of occupancy scores of SNS-seq union origins (300 bp).
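The occupancy score can be sketched as the number of distinct samples contributing a peak to each union origin. The sketch below assumes union origins are formed by merging overlapping peaks across all samples; the 300 bp union origins mentioned in the legend may be built differently (for example, as fixed-width regions), so treat this as an illustration only. The function names are hypothetical, and min_score=20 reuses the SNS-seq cutoff stated in these legends.

```python
from collections import defaultdict


def occupancy_scores(samples):
    """Map each union origin (chrom, start, end) to the number of samples with a peak inside it.

    samples: dict of sample_id -> list of BED-style (chrom, start, end) peaks.
    Union origins are formed by merging overlapping peaks across all samples (assumption).
    """
    by_chrom = defaultdict(list)
    for sample_id, peaks in samples.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, sample_id))
    scores = {}
    for chrom, peaks in by_chrom.items():
        peaks.sort()
        starts, ends, members = [], [], []
        for start, end, sample_id in peaks:
            if starts and start < ends[-1]:  # overlaps the current union origin: extend it
                ends[-1] = max(ends[-1], end)
                members[-1].add(sample_id)
            else:  # begin a new union origin
                starts.append(start)
                ends.append(end)
                members.append({sample_id})
        for start, end, member_set in zip(starts, ends, members):
            scores[(chrom, start, end)] = len(member_set)  # each sample counted once
    return scores


def reproducible(scores, min_score=20):
    """Union origins whose occupancy score meets the cutoff."""
    return {origin: s for origin, s in scores.items() if s >= min_score}


samples = {
    "s1": [("chr1", 100, 400)],
    "s2": [("chr1", 300, 600), ("chr1", 2000, 2300)],
    "s3": [("chr1", 350, 650)],
}
scores = occupancy_scores(samples)
print(scores)  # {('chr1', 100, 650): 3, ('chr1', 2000, 2300): 1}
print(reproducible(scores, min_score=2))  # {('chr1', 100, 650): 3}
```

Storing sample IDs in a set ensures that a sample with several peaks inside one union origin still contributes only 1 to its score.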

Origins defined by different techniques in K562 cell line and their overlap with shared origins.

Shared origins are defined from all samples. The number of shared origins covered by each file is calculated and marked in the figure. Numbers in the parentheses are the numbers of peaks in the other dataset that overlap with the shared origins.

Analysis of overlap between shared ORC binding sites and origins.

(a) 12,712 ORC binding sites in the human genome were shared by at least 2 ORC ChIP-seq datasets; their overlapping rate with shared origins is shown. (b) Overlap of union origins, MCM3-7 and ORC2 in the K562 cell line. (c) Overlap of union origins and MCM2 in the HCT116 cell line. (d) Overlap of union origins and ORC1 in the HeLa cell line. (e) Overlap of shared origins seen in K562 cells with ORC and MCM binding sites in K562 cells. Shared origins seen in K562 cells are generated from SNS-seq files that overlapped with K562 IZs (defined by OK-seq and Repli-seq). (f) Overlap of shared origins seen in HeLa cells with ORC binding sites in HeLa cells. Shared origins seen in HeLa cells are generated from 3 HeLa-derived SNS-seq samples using the intersected peaks from: NS_GSM3983205_hela_siNC.bed, NS_GSM3983206_hela_siNC.bed, NS_GSM3983210_hela_siH2A.Z.bed.

ORC subunits do not co-bind to DNA as much as expected.

(a) Overlap of ChIP-seq peaks of different co-factors. (b) Distance distribution of ORC1 and ORC2 binding sites. (c) Number of shared cell types of ChIP-seq data used in (a). (d) Overlap of ChIP-seq peaks of co-factors from different cell types.

Shared origins overlap with phosphorylated MCM2.

(a) Percentage of shared origins that overlap with phosphorylated MCM2 binding sites. (b) Percentage of shared origins that are near phosphorylated MCM2 binding sites.

Analyses of a few selected origin sets suggested by reviewers.

(a) Overlapping number of shared origins and core origins (Akerman et al., Nature Communications, 2020). (b) Percentage of shared origins that overlap with phosphorylated MCM2 binding sites. (c) Overlapping number of shared origins and Ini-seq2 defined origins. (d) Percentage of shared origins that are near phosphorylated MCM2 binding sites.

Selecting fewer but even more reproducible origins with a more stringent cutoff to determine their overlap with ORC and MCM binding sites.

(a) The percentage of high-confidence origins (defined by the occupancy score cutoff indicated on the X-axis) that overlap with union or shared ORC binding sites. (b) Similar to (a), except for the % of origins that are near (<1 kb) ORC binding sites. (c) Similar to (a), except for the % of origins that overlap with union or shared MCM binding sites. (d) Similar to (c), except for the % of origins near (<1 kb) MCM binding sites.