HOT loci are prevalent in the genome. A) Distribution of the number of loci by the number of overlapping peaks in 400bp loci. Loci are binned on a logarithmic scale (Table S1, Methods). The shaded region represents the HOT loci. B) Prevalence of DAPs in HOT loci. Each dot is a DAP. X-axis: percentage of HOT loci in which the DAP is present. Y-axis: percentage of the total peaks of the DAP that are located in HOT loci. Dot color and size are proportional to the total number of ChIP-seq peaks of the DAP. C) Breakdown of HepG2 HOT loci into promoter, intronic, and intergenic regions. D) Fractions of HOT enhancer and promoter loci located in ATAC-seq peaks. E) Overlaps between the HOT enhancer, HOT promoter, super-enhancer, regular enhancer, H3K27ac, and H3K4me1 regions. All of the visualized data is generated from the HepG2 cell line.
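
The two axes of panel B can be illustrated with a minimal sketch. It assumes that each peak of a DAP has already been assigned to a locus identifier; the function name `dap_prevalence`, the input layout, and the toy data are hypothetical and are not taken from the authors' pipeline.

```python
def dap_prevalence(dap_peak_loci, hot_loci):
    """Return {dap: (x, y)} with the two percentages plotted in panel B.

    dap_peak_loci: dict mapping a DAP name to a list of locus IDs, one
                   entry per ChIP-seq peak (hypothetical input layout).
    hot_loci:      set of locus IDs classified as HOT.
    """
    result = {}
    for dap, loci in dap_peak_loci.items():
        hot_hits = [locus for locus in loci if locus in hot_loci]
        # X-axis: percentage of HOT loci in which the DAP is present.
        x = 100.0 * len(set(hot_hits)) / len(hot_loci)
        # Y-axis: percentage of the DAP's peaks located in HOT loci.
        y = 100.0 * len(hot_hits) / len(loci)
        result[dap] = (x, y)
    return result


# Toy data (invented for illustration only).
hot = {"locus1", "locus2", "locus3", "locus4"}
peaks = {
    "DAP_A": ["locus1", "locus2", "locus9"],
    "DAP_B": ["locus3", "locus7", "locus8", "locus9"],
}
prevalence = dap_prevalence(peaks, hot)
```

In this toy case `DAP_A` is present in half of the HOT loci while two of its three peaks fall in HOT loci, so the two percentages need not agree.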

PCA plots of HOT loci based on the DAP presence vectors. Each dot represents a HOT locus. A) PC1 and PC2, with promoters and enhancers marked. B) PC1 and PC2, with p300-bound HOT loci marked. C) PC1 and PC4, with CTCF-bound HOT loci marked. The dashed lines in A, B, and C are logistic regression lines. auROC values are indicated on the x-axes. D) DAPs hierarchically clustered by their involvement in HOT promoters and HOT enhancers. Heatmap colors indicate the % of HOT enhancers or promoters that a given DAP overlaps with. All of the visualized data is generated from the HepG2 cell line.

A) Densities of long-range Hi-C chromatin contacts between the DAP-bound loci. Each horizontal and vertical bin represents the loci with the number of bound DAPs between the edge values. The density values of each cell are normalized by the maximum value across all pairwise bins. Green boxes represent HOT loci. B) Distribution of HOT loci in Hi-C contact regions. X-axis is the number of Hi-C contacts. Numbers in the top row indicate the total number of genomic loci engaging in the given number of Hi-C contacts. Bars indicate the % of loci that contain at least one HOT locus. C) Distribution of the number of HOT loci in regions with a given number of Hi-C contacts. X-axis is the same as B. All of the visualized data is generated from the HepG2 cell line.
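
The normalization described for panel A (each cell divided by the maximum over all pairwise bins) can be sketched as follows. The function name `normalize_by_max` and the example matrix are hypothetical; how the raw densities themselves are computed is not covered here.

```python
def normalize_by_max(density):
    """Scale a 2D list of non-negative densities so the largest cell is 1."""
    peak = max(max(row) for row in density)
    if peak == 0:
        # Nothing to scale if every bin is empty.
        return [[0.0 for _ in row] for row in density]
    return [[value / peak for value in row] for row in density]


# Toy 2x2 contact-density matrix (invented values).
raw = [[2.0, 4.0], [4.0, 8.0]]
normalized = normalize_by_max(raw)
```

After this step the values are comparable across bins on a 0 to 1 scale, with the densest bin pair equal to 1.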

HOT regions induce strong ChIP-seq signals. A) Distribution of the signal values of the ChIP-seq peaks by the number of bound DAPs. The shaded region represents the HOT loci. B,C) DAPs sorted by the ratio of ChIP-seq signal strength of the peaks located in HOT loci and non-HOT loci. The 20 most HOT-specific (red bars) and the 20 most non-HOT-specific (blue bars) DAPs are depicted. B) Fold change (log2) of the HOT and non-HOT loci ChIP-seq signals. C) Distribution of the average ChIP-seq signal in the loci binned by the number of bound DAPs. Rows represent the loci with the number of bound DAPs indicated by the edge values (y-axis). Green boxes demarcate the HOT regions. D) Signal values of ssDAPs, nssDAPs (see the text for description), H3K27ac, CTCF, and p300 peaks in HOT promoters and enhancers. All of the visualized data is generated from the HepG2 cell line.
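
A log2 fold change of the kind shown in panel B can be sketched as below. This is only an illustration: it assumes the per-DAP signal is summarized by the arithmetic mean of peak signal values, which the caption does not specify, and the function name `log2_fold_change` and the numbers are hypothetical.

```python
import math


def log2_fold_change(hot_signals, non_hot_signals):
    """log2 of the ratio of mean peak signal in HOT vs. non-HOT loci.

    Assumption: the mean is used as the summary statistic; a median or
    another summary would change the values but not the idea.
    """
    hot_mean = sum(hot_signals) / len(hot_signals)
    non_hot_mean = sum(non_hot_signals) / len(non_hot_signals)
    return math.log2(hot_mean / non_hot_mean)


# Toy signal values for one DAP (invented for illustration).
fc = log2_fold_change([8.0, 12.0, 10.0], [2.0, 3.0])
```

Positive values indicate stronger signal in HOT loci and negative values indicate stronger signal in non-HOT loci; sorting DAPs by this value gives an ordering like the one described in the caption.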

Sequence features of HOT loci. A) Distribution of conservation scores in loci bound by DAPs in HepG2 and K562. The logarithmic part of the bins is expressed in terms of the percentages of loci that each bin covers, averaged over the two cell lines. The correlation value is Pearson. The shaded region represents HOT loci. B) phastCons conservation scores of regular enhancer, HOT loci, and exon regions. The values are normalized by the average scores of regular enhancers. C) Classification performances (auROC and auPRC values) of HOT loci against the backgrounds of DHS, promoter, and regular enhancer regions. The X-axis values are the methods used for classification. Methods starting with “seq -” are based on sequences (CNNs and gkmSVM, refer to Methods and main text). Methods starting with “feat -” use all of the sequence features (GC, CpG, GpC, CpG island). The depicted values for feature-based SVMs were obtained using linear kernels.

HOT promoters are ubiquitous and HOT enhancers are tissue-specific. A) Fractions of housekeeping genes regulated by the given category of loci (blue) and fractions of the loci that regulate housekeeping genes (orange). B) Tissue-specificity (tau) scores of the target genes of different types of regulatory regions. C) Enriched GO terms of HOT promoters and enhancers of HepG2. Values of 0 in the p-value columns indicate that the GO term was not present in the top 50 enriched terms as reported by the GREAT tool. All of the visualized data is generated from the HepG2 cell line.
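
For readers unfamiliar with the tau metric in panel B, the sketch below implements the commonly used definition, tau = sum(1 - x_i / x_max) / (n - 1), where x_i is a gene's expression in tissue i and n is the number of tissues. Whether the values in the figure were computed with exactly this formula, or with any preprocessing such as log transformation, is not stated in this caption; the function name `tau_score` and the example profiles are hypothetical.

```python
def tau_score(expression):
    """Tissue-specificity index: 0 = uniformly expressed, 1 = one tissue only.

    expression: non-negative expression values of one gene, one per tissue.
    """
    n = len(expression)
    peak = max(expression)
    if n < 2 or peak <= 0:
        # Undefined for a single tissue or an unexpressed gene; return 0 here.
        return 0.0
    return sum(1.0 - x / peak for x in expression) / (n - 1)


# Toy expression profiles (invented for illustration).
housekeeping_like = tau_score([5.0, 5.0, 5.0, 5.0])
tissue_specific = tau_score([0.0, 0.0, 0.0, 10.0])
intermediate = tau_score([10.0, 5.0, 0.0])
```

Under this definition, low tau for the target genes of a class of loci is consistent with ubiquitous expression, and high tau is consistent with tissue-specific expression.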

H1-hESC HOT loci. A) Overlaps between the HOT loci of three cell lines. B) Overlaps between the HOT loci of cell lines defined using the set of DAPs available in all three cell lines. C) Fractions of H1 HOT loci overlapping those of HepG2 and K562 using the complete set of DAPs, common DAPs, and DAPs randomly subsampled in HepG2/K562 to match the size of the H1 DAP set. D) phastCons scores of HOT loci in HepG2, K562, and H1. The ratio of the average conservation score of HOT promoters to that of HOT enhancers is shown at the top of each cell line’s group.

Densities of variants: A) common INDELs (MAF>5%), B) common SNPs (MAF>5%), C) eQTLs, D) caQTLs, E) raQTLs, and F) GWAS and LD (r2>0.8) variants in HOT loci and regular promoters and enhancers. G) Enriched GWAS traits in HOT enhancers and promoters. All of the visualized data is generated from the HepG2 cell line.