A total of 7,459,709 origins defined by four types of techniques show different genomic features.

(a) Data processing pipeline. A total of 113 publicly available origin profiles were processed following this pipeline. (b) Number of samples collected for each technique. In total, 7,459,709 union origins were identified. (c) PCA showing the clustering of origin datasets from different techniques. (d) Genomic annotation (TSS, exon, intron and intergenic regions) of different groups of origins. Background is the percentage of each annotation across the whole genome. (e) Overlap with TF hotspots for different groups of origins. (f) Overlap with constitutive CTCF binding sites for different groups of origins. (g) GC content of different groups of origins. The grey line marks the average GC content of the human genome. (h) G-quadruplex overlap rates of different groups of origins.
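As an illustration of the GC content metric in panel (g), the following is a minimal sketch that assumes GC content is the fraction of G and C bases among unambiguous bases in a region's sequence. The function name `gc_content` and the handling of ambiguous bases (N is ignored) are assumptions for illustration, not the authors' stated procedure.

```python
import math


def gc_content(sequence: str) -> float:
    """Fraction of G/C among A/C/G/T bases; ambiguous bases such as N are ignored.

    Returns NaN when the sequence contains no unambiguous bases.
    (Hypothetical helper; the handling of N is an assumption.)
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else math.nan
```

Applied to the sequence of each origin, the resulting per-region values could then be summarised per group and compared with the genome-wide average.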

The shared origins are enriched with certain transcription factors and active histone marks.

(a) Fit of the distribution of SNS-seq origins to an exponential model. (b) Conceptual model of how shared origins are calculated. Every SNS-seq shared origin that overlaps with at least one Bubble-seq IZ, one OK-seq IZ and one Repli-seq origin is considered an origin shared by all techniques (shared origins). (c) Genomic annotation of union origins and shared origins. (d) Overlap with TF hotspots of union origins and shared origins. (e) Overlap with constitutive CTCF binding sites of union origins and shared origins. (f) GC content of union origins and shared origins. (g) G-quadruplex overlap rates of union origins and shared origins. (h) BART prediction of TFs associated with shared origins. (i) Enrichment of histone marks at shared origins, using all origins as control.
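The rule in panel (b) can be sketched as an interval-overlap filter. This is a minimal illustration, assuming intervals are half-open `(chromosome, start, end)` tuples and that any overlap of at least 1 bp counts; the names `overlaps` and `shared_origins` are hypothetical, and the quadratic scan stands in for whatever interval tooling was actually used.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals are on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_origins(
    sns_origins: List[Interval],
    other_techniques: Dict[str, List[Interval]],
) -> List[Interval]:
    """Keep SNS-seq origins that overlap at least one region from every other technique."""
    return [
        origin
        for origin in sns_origins
        if all(
            any(overlaps(origin, region) for region in regions)
            for regions in other_techniques.values()
        )
    ]
```

Here `other_techniques` would hold the Bubble-seq IZs, OK-seq IZs and Repli-seq origins under separate keys. For genome-scale inputs a sorted sweep or an interval tree would be needed in place of the nested scan.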

Genomic features of shared ORC binding sites.

(a) Genomic annotation of union ORC and shared ORC binding sites. (b) Overlap with TF hotspots of union ORC and shared ORC binding sites. (c) Overlap with constitutive CTCF binding sites of union ORC and shared ORC binding sites. (d) GC content of union ORC and shared ORC binding sites. (e) Overlap with G-quadruplexes of union ORC and shared ORC binding sites. (f) Distribution of the distance between ORC binding sites and the nearest shared origin. (g) Percentage of high-confidence origins (shared origins in human and confirmed origins in yeast) that overlap with or lie near (region boundaries less than 1 kb apart) the corresponding ORC binding sites.
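The overlap/near classification in panel (g) can be sketched as follows. This is an illustrative reading of the legend, assuming half-open `(chromosome, start, end)` intervals and that "near" means the gap between the closest region boundaries is below 1 kb; `boundary_distance`, `classify` and `percentages` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def boundary_distance(a: Interval, b: Interval) -> int:
    """Signed gap between two same-chromosome intervals; negative means overlap."""
    return max(a[1], b[1]) - min(a[2], b[2])


def classify(origin: Interval, sites: List[Interval], max_gap: int = 1000) -> str:
    """Label an origin 'overlap', 'near' (gap < max_gap) or 'far' relative to binding sites."""
    gaps = [boundary_distance(origin, s) for s in sites if s[0] == origin[0]]
    if not gaps:
        return "far"  # no binding site on this chromosome
    best = min(gaps)
    if best < 0:
        return "overlap"
    return "near" if best < max_gap else "far"


def percentages(origins: List[Interval], sites: List[Interval]) -> Dict[str, float]:
    """Percentage of origins in each class."""
    counts = Counter(classify(o, sites) for o in origins)
    return {k: 100.0 * counts[k] / len(origins) for k in ("overlap", "near", "far")}
```

With half-open coordinates, two regions that touch without sharing a base have a gap of 0 and are counted as near rather than overlapping; that convention is an assumption.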

Shared ORC binding sites are more correlated with active transcription.

(a) Genomic annotation of shared origins and shared origins that are near (less than 1 kb from) ORC binding sites. (b) Overlap with TF hotspots of shared origins and shared origins that are near ORC binding sites. (c) Overlap with constitutive CTCF binding sites of shared origins and shared origins that are near ORC binding sites. (d) GC content of shared origins and shared origins that are near ORC binding sites. (e) Overlap with G-quadruplex sites of shared origins and shared origins that are near ORC binding sites. (f) Replication timing score of shared origins and shared origins that are near ORC binding sites. (g) Annotation of the expression level of genes that overlap with different groups of origins. (h) BART prediction of TFs associated with the highest-confidence origins.

Genomic features of shared MCM binding sites.

(a) Genomic annotation of union MCM and shared MCM binding sites. (b) Overlap with TF hotspots of union MCM and shared MCM binding sites. (c) Overlap with constitutive CTCF binding sites of union MCM and shared MCM binding sites. (d) GC content of union MCM and shared MCM binding sites. (e) Overlap with G-quadruplexes of union MCM and shared MCM binding sites. (f) Overlap with TF hotspots of shared origins and shared origins that are near MCM binding sites. (g) Percentage of high-confidence origins (shared origins in human and confirmed origins in yeast) that overlap with or lie near (region boundaries less than 1 kb apart) the corresponding MCM binding sites. (h) Several confident origins defined by different methods. Each number is calculated from the overlap of the total peaks from each method with the total peaks of all methods.

Genome browser screenshot for three of the 74 origins from Fig. 5h.

The number on the SNS-seq shared origins track is the occupancy score of the origin.

Summary of data showing the number of origins of each type, the extent of overlap of each with different genomic features, and a comparison with transcription start sites (TSSs, or promoters).

Permutation test to ascertain the significance of the overlaps reported in this paper relative to random expectation.
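A permutation test of this kind can be sketched as below. The shuffling scheme is an assumption, since the legend does not specify it: each query region is moved to a uniformly random position on the same chromosome with its length preserved, and the empirical p-value is the share of permutations whose overlap count reaches the observed count (with the usual +1 correction). All function names and the `chrom_sizes` parameter are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def count_overlapping(query: List[Interval], target: List[Interval]) -> int:
    """Number of query intervals that overlap at least one target interval."""
    return sum(
        any(q[0] == t[0] and q[1] < t[2] and t[1] < q[2] for t in target)
        for q in query
    )


def shuffle_intervals(
    query: List[Interval], chrom_sizes: Dict[str, int], rng: random.Random
) -> List[Interval]:
    """Move each interval to a random position on its chromosome, keeping its length."""
    shuffled = []
    for chrom, start, end in query:
        length = end - start
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def permutation_p_value(
    query: List[Interval],
    target: List[Interval],
    chrom_sizes: Dict[str, int],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Return (observed overlap count, empirical one-sided p-value)."""
    rng = random.Random(seed)
    observed = count_overlapping(query, target)
    hits = sum(
        count_overlapping(shuffle_intervals(query, chrom_sizes, rng), target) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

A uniform shuffle ignores mappability, gaps and GC composition, so a real analysis would likely restrict the random placements to a matched background.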