Allele-specific expression across diverse human-chimpanzee hybrid cell types.

a) Six cell types were differentiated from human-chimpanzee hybrid induced pluripotent stem cells. These six cell types represent diverse body systems, including motor neurons for the central nervous system, retinal pigment epithelium for eye, skeletal myocytes for skeletal muscle, cardiomyocytes for the heart, hepatocyte progenitors for the liver, and pancreatic progenitors for the pancreas. b) Heatmap showing the result of hierarchical clustering performed on genes with highly variable normalized allele counts. c) Result of running PCA on normalized allelic counts for all samples and cell types. d) Result of PCA performed on normalized allele counts for each individual cell type separately. Cardiomyocytes and motor neurons are shown here. e) Expression of marker genes for each cell type.

Human-chimpanzee ASE is largely cell type-specific.

a) Outline of measurement of allele-specific expression. Reads from the human and chimpanzee alleles are counted and differences in read counts identified. b) Thousands of genes with ASE were identified for each cell type, with many genes only showing ASE in a single cell type (Conway et al., 2017). c) The neurotrophins and their receptors as examples of genes showing cell type-specific ASE patterns. DESeq2 estimate of log2 fold-change (human/chimpanzee) are shown in the heatmap and significance is indicated by asterisks where *** indicates FDR < 0.005, ** indicates FDR < 0.01, and * indicates FDR < 0.05. Zero asterisks (i.e. a blank box) indicates FDR > 0.05. d) Plot showing that genes with ASE are enriched for genes showing cell type-specific expression patterns across all cell types. Asterisks indicate p-values rather than FDR using the same system as in 2c. e) Top gene sets with evidence for lineage-specific selection in cardiomyocytes and motor neurons are shown. The length of the bars indicates the number of genes in a category with biased expression in each cell type.

Allele-specific chromatin accessibility across diverse human-chimpanzee hybrid cell types.

a) Schematic outlining the ATAC-seq protocol. A hyperactive transposase cleaves accessible DNA and adds adapters enabling measurement of chromatin accessibility. b) PCA on normalized allelic counts from ATAC-seq. c) ASCA in the promoter of CTSF, and ASE for the CTSF gene. d) Differences in ASCA were quantified and plotted separately based on chromHMM annotation. The order is based on the median of z-score transformed absolute log fold-change between human and chimpanzee across all cell types, with higher z-scores indicating greater divergence in accessibility. e) Pearson correlation between ASE and ASCA for all cell types with all peaks (left) or only differentially accessible peaks (right, defined as peaks with nominal binomial p-value less than 0.05). Pearson’s r values are shown in the heatmap and significance is indicated by asterisks where *** indicates p < 0.005, ** indicates p < 0.01, and * indicates p < 0.05. Zero asterisks (i.e. a blank box) indicates p > 0.05.

Motor neuron-specific human-biased ASE and ASCA for FABP7 and the promoter of FABP7.

a) Allelic ATAC-seq tracks are shown in the peak containing the annotated FABP7 promoter (highlighted in yellow). b) The top panel shows allelic CPM of the FABP7 promoter across cell types and the bottom panel shows the log fold-change across cell types. c) The top panel shows allelic CPM of the FABP7 gene across cell types and the bottom panel shows the log fold-change across cell types. d) Outline of the process for variant effect prediction with Sei for FABP7. All sequences input to Sei were centered at the FABP7 promoter. The human sequence, chimpanzee sequence, and partially “chimpanized” human sequences (modified by systematically switching the human allele to the chimpanzee allele separately for each human-chimpanzee difference) were fed into Sei to predict the effects of these variants on chromatin state. e) Histogram of the probabilities of various chromatin states and transcription factor binding predicted by Sei was plotted for the human sequence, the chimpanzee sequence, and the human sequence with one human-chimpanzee difference swapped to match the chimpanzee sequence. The human sequence with SNV26 changed to the chimpanzee allele and the human sequence switched at both indels are shown as examples. A histone modification (H3K4me3) predicted in two cell types was labeled to illustrate how the predictions depend on both input sequence and cell type. f) Plot of the predicted effects of all single nucleotide differences and indels between the human and chimpanzee genomes in the Sei input window (the FABP7 promoter is highlighted in yellow). g) Scatterplots showing the correlation of the effects of both indels (left panel), each individual indel (middle two panels), and a representative SNV (right panel) on Sei predictions with the difference in Sei predictions between the human and chimpanzee sequences. The root mean square error (RMSE) was computed and shown in each figure.

Motor neuron-specific chimpanzee-biased ASE for GAD1 and ASCA for a HAR near the GAD1 TSS.

a) Allelic ATAC-seq tracks are shown for the peak near the GAD1 TSS that contains a HAR (highlighted in yellow). b) The top panel shows allelic CPM of the CRE near the GAD1 TSS across cell types and the bottom panel shows the log fold-change across cell types. c) The top panel shows allelic CPM of the GAD1 gene across cell types and the bottom panel shows the log fold-change across cell types. d) Outline of the process for variant effect prediction with Sei for GAD1. All input sequences to Sei were centered at the HAR. The human sequence, the chimpanzee sequence, and modified sequences with the human sequence altered at each substitution to match the chimpanzee sequence were fed into the Sei sequence model to predict the effects of these variants on the chromatin state. e) Histogram of the probabilities of various chromatin states and transcription factor binding predicted by Sei was plotted for the human sequence, the chimpanzee sequence, and two examples in which the human sequence with only one SNV “chimpanized” (human w/SNV) was input to Sei. The histogram of the probability of the sequence having a particular epigenomic annotation (predicted by Sei) was plotted for human, chimpanzee, human w/SNV1 changed to match the chimpanzee sequence, and human w/SNV21 changed to match the chimpanzee sequence. Two epigenomic annotations were labeled as examples that show the different values output by Sei with these two different sequence inputs. f) Plot of the predicted effects of all single nucleotide differences between human and chimpanzee in the Sei input window centered at the HAR (highlighted in yellow). Positions were switched to the chimpanzee allele individually. g) Scatterplots showing the correlation of the effects of four SNVs on Sei predictions with the difference in Sei predictions between the human and chimpanzee sequences. The root mean square error (RMSE) was computed and shown in each figure.