Figures and data

Analysis workflow and quality control of H3K4me3 and H3K27ac peaks in the fat-tailed dunnart (Sminthopsis crassicaudata).
(A) Drawing of a D0 dunnart pouch young with dissected orofacial tissue shown in gray. Short-read alignment and peak calling workflow, with the numbers of reproducible peaks identified for H3K27ac (orange) and H3K4me3 (blue) in craniofacial tissue. (B) Log10 distance to the nearest TSS for putative enhancers (orange) and promoters (blue). (C) Log10 of peak intensity and peak length, represented as boxplots and violin plots for putative enhancers (orange) and promoters (blue). Peak intensities correspond to average fold enrichment values over total input DNA across biological replicates. (D) Dunnart pouch young on the day of birth. Scale bar = 1 mm. (E) Adult female dunnart carrying four young.
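As a minimal illustration of the peak intensity measure in (C), the sketch below averages per-replicate fold enrichment values for one peak and then takes the log10. The function name and the example values are hypothetical, and taking the arithmetic mean before the log transform is an assumption based on the wording of the legend.

```python
import math


def log10_peak_intensity(fold_enrichments):
    """Return log10 of the mean fold enrichment across replicates.

    fold_enrichments: fold enrichment over input for one peak,
    one value per biological replicate (hypothetical input format).
    """
    if not fold_enrichments:
        raise ValueError("need at least one replicate value")
    mean_fe = sum(fold_enrichments) / len(fold_enrichments)
    if mean_fe <= 0:
        raise ValueError("mean fold enrichment must be positive")
    return math.log10(mean_fe)


# Hypothetical fold enrichment values for one peak in two replicates.
example_intensity = log10_peak_intensity([8.0, 12.0])
```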

Predicted functional enrichment for dunnart peaks.
(A) 304 significantly enriched GO terms clustered based on similarity of the terms. The functions of the terms in each group are summarized by word clouds of the keywords. Rows marked by P were driven by genes linked to putative promoters; rows marked by E were driven by genes linked to putative enhancers. (B) Enriched TF motifs for transcription factor families (HOMER). PWM logos for preferred binding motifs of TFs are shown. The letter size indicates the probability of a TF binding given the nucleotide composition.

Genes linked to craniofacial enhancers and promoters in the dunnart are reproducibly expressed and involved in embryonic vasculature, muscle, skin and sensory system development.
(A) The majority of nearest genes assigned to candidate enhancers and promoters were reproducibly expressed in dunnart face tissue, and (B) reproducibly expressed genes in dunnart were associated with both a promoter and an enhancer region. (C) Biological processes enriched for genes with medium to high expression (> 10 TPM) that are linked to both a promoter and an enhancer region (FDR-corrected p < 0.01).

Analysis workflow and features of H3K4me3 and H3K27ac enhancers and promoters in dunnart and mouse.
(A) Alignment filtering and peak calling workflow and (B) number of reproducible predicted regulatory elements identified in the dunnart and mouse embryonic stages. Log10 of distance to the nearest TSS for putative (C) enhancers and (D) promoters.

Genes near enhancers highly expressed only in dunnart are involved in development of the skin, muscle and mechanosensory systems.
Gene set intersections across mouse (E10.5-E15.5) and dunnart (P0) for (A) genes near promoters and (B) genes near enhancers. (C) Gene ontology term enrichment for top 500 highly expressed dunnart genes. (D) Top 50 highly expressed genes (TPM) in dunnart compared with mouse embryonic stages. Scale bar in E. (E) Expression levels (TPM) for keratin genes across dunnart and mouse.

Dunnart expressed genes are associated with two gene clusters with distinct temporal expression patterns in the mouse.
(A) Genes in cluster 2 and cluster 3 plotted with their z-scaled temporal expression (logCPM). Color-coding represents membership value (degree to which data points of a gene belong to the cluster). Gene Ontology enrichment for biological processes enriched in (B) cluster 2 and (C) cluster 3 (FDR-corrected p < 0.01).

HOMER motif enrichment for dunnart promoters and enhancers for the top 20 enriched TF families.
Log10 FDR-adjusted p-value scores are shown as a heatmap for enhancers and promoters. Individual TFs are labeled on the right-hand side of the heatmap. Distinct colors represent TF families.

Features of predicted promoters and enhancers in the dunnart and mouse.
(A) Log10 peak intensity (measure of enrichment) for enhancers (orange) and promoters (blue) prior to filtering. (B) Log10 distance to the nearest TSS for enhancers (orange) and promoters (blue). After clustering and filtering for high-confidence promoters, we observed consistent patterns for features including (D) CpG and GC content and (E) log10 peak intensity and log10 peak length.

Top enriched GO terms in the dunnart and mouse embryonic stages
for (A) genes near enhancers and (B) genes near high-confidence promoters.

Heatmaps showing gene expression values (TPM) for common developmental genes across dunnart and mouse.

Temporal gene expression dynamics.
(A) Z-scaled temporal expression (logCPM) plotted across embryonic timepoints for five clusters. Membership values indicate the degree to which a data point belongs to each cluster. (B) Top enriched biological processes for each cluster (FDR-corrected p < 0.01; background gene set is all differentially expressed genes used for clustering). (C) Odds ratio estimates with 95% confidence intervals for genes expressed in dunnart in each cluster. (D) Number of genes in each cluster for mouse and dunnart.
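For readers unfamiliar with the quantity in (C), the sketch below shows one common way to obtain an odds ratio with a 95% confidence interval from a 2x2 table of gene counts. The legend does not state which interval method was used, so the Wald interval on the log scale is an assumption, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: genes in the cluster and expressed in dunnart
    b: genes in the cluster and not expressed in dunnart
    c: genes outside the cluster and expressed in dunnart
    d: genes outside the cluster and not expressed in dunnart
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_est, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
```

An interval that excludes 1 suggests that dunnart-expressed genes are over- or under-represented in that cluster.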

Validation of antibodies in dunnart craniofacial tissue with immunofluorescence and qPCR.
Localisation of (A) H3K27ac (pink, scale bar = 250 µm) and (B) H3K4me3 (pink, scale bar = 80 µm) in D0 dunnart head sections with nuclei stained with DAPI (blue). (C) Enhancer regions expected to be enriched in the IP samples, presented as the percentage of the input control sample as measured by qPCR.
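The percent-of-input values in (C) are typically derived from qPCR Ct values as sketched below. This follows the widely used percent input calculation and is not taken from the legend: the function name, the 1% input fraction, the Ct values and the assumption of 100% amplification efficiency are all illustrative.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input for a ChIP-qPCR target.

    ct_ip: Ct of the immunoprecipitated sample
    ct_input: Ct of the input control
    input_fraction: fraction of chromatin saved as input (assumed 1%)
    Assumes 100% amplification efficiency (a doubling per cycle).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Adjust the input Ct to what it would be for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values, for illustration only.
example_percent = percent_input(ct_ip=22.0, ct_input=25.0)
```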

deepTools quality control plots for dunnart subsampled aligned BAM files.
(A) Overall similarity between BAM files based on read coverage within genomic regions with Pearson correlation coefficients plotted for H3K27ac, H3K4me3 and input control. (B) Fingerprint plot showing a profile of cumulative read coverages for each BAM file. All reads overlapping a window (bin) of the specified length are counted, sorted and plotted for H3K27ac, H3K4me3 and input control.
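The count, sort and plot procedure described for the fingerprint in (B) can be sketched as follows. This is a simplified illustration of the idea rather than the deepTools implementation; the function name and the per-bin counts are hypothetical.

```python
def fingerprint_curve(bin_counts):
    """Cumulative coverage curve from per-bin read counts.

    Returns a list of (fraction_of_bins, fraction_of_reads) points,
    with bins ranked from lowest to highest read count.
    """
    counts = sorted(bin_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    n_bins = len(counts)
    curve = []
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        curve.append((rank / n_bins, running / total))
    return curve


# Hypothetical counts: uniform coverage (input-like) versus
# coverage concentrated in a single bin (strongly enriched ChIP).
uniform_curve = fingerprint_curve([5, 5, 5, 5])
enriched_curve = fingerprint_curve([0, 0, 0, 20])
```

Uniform coverage gives a diagonal line, whereas a sample with focal enrichment stays near zero and rises sharply at the highest-ranked bins.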

Number of peaks per gene for enhancer- and promoter-associated peaks.
(A) Log10 number of peaks per gene. Genes with more than 50 peaks are noted. (B) Scatter plot with distance to the closest gene on the x-axis and number of peaks per gene on the y-axis. There is a weak but significant correlation between the number of peaks per gene and the distance to the next closest gene for both enhancers and promoters.

Subsetting high-confidence promoter-associated peaks with k-means clustering.
(A) Barplot for the number of promoter-associated peaks per cluster and histogram showing the distribution of peak distance from the nearest TSS in each cluster. (B) Genomic annotations for promoter-associated peaks in each cluster. (C) GC content, (D) Log10 peak length, (E) CpG content, and (F) Log10 of clustered promoter-associated peaks, enhancer-associated peaks, and unclustered promoter-associated peaks. Statistical significance (Wilcoxon, FDR-adjusted, p < 0.00001) compared to cluster 1 promoter-associated peaks is denoted by ****.

deepTools fingerprint plot for mouse subsampled aligned BAM files.
Cumulative read coverages for each BAM file. All reads overlapping a window (bin) of the specified length are counted, sorted and plotted for (A) H3K27ac, (B) H3K4me3.

deepTools correlation plots for mouse subsampled aligned BAM files.
Overall similarity between BAM files based on read coverage within genomic regions with Pearson correlation coefficients plotted for (A) H3K27ac, (B) H3K4me3.

Genome alignment between the fat-tailed dunnart (Sminthopsis crassicaudata) and mouse (Mus musculus).
(A) Genome alignment workflow. (B) Genome coverage and exon coverage (bp) between mouse and dunnart. Coverage of exons recovered after LiftOver. (C) Block size distribution for the dunnart including number of blocks and total size of the blocks.

Volcano plot for both upregulated and downregulated differentially expressed genes from mouse (any stage) and dunnart.
Stringent log2 fold change and p-value cutoffs were used because the comparison is between datasets from two different species.