Identification of GPR1-AS orthologs from public placental transcriptomes

UCSC Genome Browser screenshots of the GPR1-ZDBF2 locus in humans (A), rhesus macaques (B), and mice (C). Predicted transcripts were generated from public directional placental RNA-seq datasets (accession numbers: SRR12363247 for humans, SRR1236168 for rhesus macaques, and SRR943345 for mice) with the Hisat2-StringTie2 programs. Genes annotated in the GENCODE or RefSeq databases and LTR retrotransposon positions from the UCSC Genome Browser RepeatMasker tracks are also displayed. Among the gene annotations, only that of the human reference genome includes GPR1-AS (highlighted in green). GPR1-AS-like transcripts and MER21C retrotransposons are highlighted in red. Animal silhouettes were obtained from PhyloPic (mouse silhouette by Katy Lawler, available under a CC BY 4.0 license).

Identification of GPR1-AS orthologs from original placental and extra-embryonic transcriptomes

Predicted transcripts were generated from placental and extra-embryonic directional RNA-seq datasets of chimpanzee (A), rabbit (B), pig (C), cow (D), and opossum (E) with the Hisat2-StringTie2 programs. Genes annotated from RefSeq or Ensembl databases and their LTR positions are also shown. GPR1-AS-like transcripts and MER21C retrotransposons are highlighted in red. Animal silhouettes were obtained from PhyloPic (opossum silhouette by Sarah Werning, available under a CC BY 3.0 license).

Allele-specific RT-PCR sequencing of ZDBF2 in various mammals

Heterozygous genotypes were used to distinguish between parental alleles in adult tissues from tammar wallabies (A), fetal/embryonic tissues from cattle (B), and blood samples from rhesus macaques (C) and rabbits (D). Primers were designed to amplify the 3’-UTR regions of ZDBF2 orthologs and detect SNPs. Each SNP position is highlighted in red. Reverse primers were also used for Sanger sequencing. Animal silhouettes were obtained from PhyloPic.

Multi-species comparison of LTR retrotransposon locations at the GPR1 locus

A total of 24 mammalian genomes were compared, including six primates (human, chimpanzee, rhesus macaque, marmoset, tarsier, and gray mouse lemur), one colugo (flying lemur), one treeshrew (Chinese treeshrew), two lagomorphs (rabbit and pika), eight rodents (squirrel, guinea pig, lesser jerboa, blind mole rat, giant pouched rat, mouse, rat, and golden hamster), and six other eutherians (pig, cow, horse, dog, elephant, and armadillo). Among the selected genomes, LTRs that can be considered homologous to MER21C, which corresponds to the first exon of GPR1-AS, are marked in red. In tarsier, treeshrew, lesser jerboa, and giant pouched rat, the orthologous LTRs were annotated as MER21B, whose consensus sequence shows 88% similarity to that of MER21C in pairwise alignment. MER21B elements are marked in purple. According to Dfam, the MER21C and MER21B subfamilies are specific to the genomes of Boreoeutheria and Euarchontoglires, respectively. The copy numbers of MER21C and MER21B in selected species are shown in red and purple, respectively (LTRs likely matching the GPR1-AS exon are underlined). There are 5,418 and 2,529 copies of MER21C and 2,894 and 1,535 copies of MER21B in the human and mouse genomes, respectively.

Comparison of MER21C-derived sequences overlapping the first exon of GPR1-AS orthologs

(A) Phylogenetic tree of MER21C-derived sequences estimated from a multiple sequence alignment (MSA) generated with the multiple sequence comparison by log-expectation (MUSCLE) program. (B) Positions of common and unique cis-acting elements in each sequence. (C) Motif structures of the common region that contains E74-like factor 1 and 2 (ELF1 and ELF2) binding motifs. (D) Motif structures of transcription factor AP-2 gamma (TFAP2C) and zinc finger and SCAN domain containing 4 (ZSCAN4).

Initiation of GPR1-AS transcription before implantation

Genome browser screenshots of the GPR1-ZDBF2 locus in humans at preimplantation stages, including the MII oocyte, zygote, 2-cell, 4-cell, 8-cell, inner cell mass (ICM), and trophectoderm (TE) from the blastocyst. Predicted transcripts were generated from publicly available full-length RNA-seq datasets, with detected GPR1-AS-like transcripts and their FPKM and TPM values highlighted in red. Silhouette was obtained from PhyloPic.
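Because both FPKM and TPM values are reported for the detected transcripts, the standard relationship between the two units can be sketched as follows. This is a minimal illustration and not the pipeline used to produce the figure; the function name and the example values are hypothetical.

```python
def fpkm_to_tpm(fpkm_by_transcript):
    """Convert FPKM values from a single sample into TPM values.

    TPM rescales FPKM so that the values of one sample sum to one million:
    TPM_i = FPKM_i / sum(FPKM) * 1e6.
    """
    total = sum(fpkm_by_transcript.values())
    if total <= 0:
        raise ValueError("total FPKM must be positive")
    return {name: value / total * 1_000_000
            for name, value in fpkm_by_transcript.items()}


# Hypothetical FPKM values for illustration only (not taken from the study).
example_fpkm = {"GPR1-AS-like": 2.0, "GPR1": 1.0, "ZDBF2": 7.0}
example_tpm = fpkm_to_tpm(example_fpkm)
```

Since TPM values always sum to one million within a sample, they are somewhat easier to compare across stages than FPKM values, whose per-sample totals can differ.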

Establishment of ZDBF2 imprinted domain in evolution and genome biology

(A) Scheme of epigenetic and transcriptional changes at the first exon of mouse Liz and human GPR1-AS. (B) Timescale of the evolution of ZDBF2 imprinting and LTR (MER21C) insertion. Animal silhouettes were obtained from PhyloPic (mouse silhouette by Katy Lawler, available under a CC BY 4.0 license; opossum silhouette by Sarah Werning, available under a CC BY 3.0 license).

© 2018 Geoff Shaw. Wallaby silhouette by Geoff Shaw, available under a CC BY-NC 3.0 license.

Identification of GPR1-AS orthologs using public and non-directional RNA-seq data

(A) Heat map showing the expression levels of GPR1, GPR1-AS, and ZDBF2 in different human tissues, including the placenta. Genome browser screenshots of the GPR1-ZDBF2 locus in humans (B) and baboons (C). Predicted transcripts were generated using public non-directional placental RNA-seq datasets (accession numbers: SRR1850957 for humans, GSM4696517 for baboons). Transcript/gene information and LTR retrotransposon positions are shown. GPR1-AS-like transcripts and MER21C retrotransposons are shown in red. Animal silhouettes were obtained from PhyloPic.

Search for GPR1-AS orthologs from embryonic transcriptomes

Predicted transcripts were generated using directional RNA-seq datasets of embryo proper tissues from rabbit (A), pig (B), bovine (C), and opossum (D) embryos. Transcript/gene information and LTR retrotransposon positions are displayed, and the annotated MER21C retrotransposon (present only in rabbit) is highlighted in red. Animal silhouettes were obtained from PhyloPic (opossum silhouette by Sarah Werning, available under a CC BY 3.0 license).

Search for germline DMRs from oocyte and sperm DNA methylomes

The DNA methylation (DNAme) levels of individual CpG sites in oocytes and sperm from rhesus macaque (A), pig (B), and bovine (C) whole-genome bisulfite sequencing datasets are shown. Oocyte-methylated and sperm-methylated DMRs are highlighted in red and blue, respectively. Predicted transcripts from placental and extra-embryonic directional RNA-seq datasets (shown in Figures 1 and 2), genes annotated in the RefSeq databases, and LTR positions from UCSC/RepeatMasker are included, with a MER21C retrotransposon overlapping rhesus macaque GPR1-AS highlighted in red. Animal silhouettes were obtained from PhyloPic.

Reanalysis of repeat positions using RepeatMasker

Repetitive elements were re-identified in five mammalian species: mouse, rat, and hamster—where MER21C, which overlaps the first exon of human GPR1-AS, was not found in the homologous region—and rabbit and human, where it was detected. The Percent Identity Plot (PIP, showing a conservation scale between sequences from 50% to 100% on the y-axis) illustrates the order and alignment of the 20 kb region surrounding the GPR1-AS (Liz) transcription start site in each mammalian chromosome. Detected repeat elements are displayed above each plot. RepeatMasking was performed under less stringent settings, including switching search engines from RMblast to HMMER and adjusting speed/sensitivity settings from default to slow. Despite these adjustments, MER21C insertion was not detected in the three rodent species.

Multiple genome alignments at the first exon of GPR1-AS locus

Cactus generates reference-free, whole-genome multiple alignments (Armstrong et al. 2020). The Cactus track from the UCSC Genome Browser displays multiple alignments across vertebrate species and evolutionary conservation metrics from the Zoonomia Project (Zoonomia 2020). Green square brackets indicate shorter alignments where DNA from one genomic context in the aligned species is nested within a larger alignment chain from a different genomic context. The alignment within these brackets may represent a short misalignment or a lineage-specific insertion of a retrotransposon in the human genome that aligns to a paralogous copy in another species. SINE and LTR retrotransposon positions from the UCSC Genome Browser are also displayed. Silhouette was obtained from PhyloPic.

Pairwise alignment between consensus sequences of retrotransposons and GPR1-AS-exonic MER21 sequences

(A) MER21C (or MER21B) sequences located in the GPR1 intron of eutherian genomes and the first exon of mouse Liz were compared with the consensus MER21C sequence.

(B) Human and rabbit MER21C sequences overlapping the first exon of GPR1-AS and the first exon of mouse Liz were compared with the consensus sequences of ERV3/ERVL solo-LTRs present in human and mouse (n = 182). Each graph displays the identity percentages and alignment scores for the top five LTRs with the highest scores. In humans and rabbits, MER21C showed the highest identity with the exonic sequences.

(C) The first exon of mouse Liz was compared with the consensus sequences of all retrotransposons present in mice (n = 1,361). The graph represents the top 10 retrotransposons with the highest scores. In mice, MER21C does not show sufficient sequence identity to the first exon of Liz to distinguish it from other retrotransposons. Pairwise alignment scores and percent identity values for each sequence pair were calculated using Genetyx software.
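As an illustration of the percent identity values plotted in these panels, the sketch below computes identity from two sequences that have already been aligned. It uses one common definition (identical columns divided by alignment columns, gaps included) and is an assumption for illustration; the exact definition applied by Genetyx is not stated here, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed definition: matching non-gap columns divided by all alignment
    columns, ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments for illustration (not real MER21C sequence).
identity = percent_identity("ACGT-ACGT", "ACGTTACCT")
```

In this toy alignment, seven of the nine columns are identical, so the identity is roughly 77.8%. Other definitions (for example, excluding gap columns from the denominator) would give different values, so identities are only comparable when computed the same way.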

Promoter activities of first exons of mouse Liz and human GPR1-AS

(A) Constructs (inserted sequences) used for dual luciferase reporter assays in HEK293T cells. A promoter-less vector served as the negative control. (B) Results of dual luciferase reporter assays. Relative fold changes in Firefly luciferase activity (Firefly/Renilla) were normalized to the Firefly/Renilla ratio of the negative control. Error bars indicate mean ± s.e.m. Statistical significance was determined using unpaired t-tests: *P < 0.05, **P < 0.01. Data represent four biological replicates.
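The normalization described in (B) can be sketched as follows. This is a minimal illustration that assumes each replicate's Firefly/Renilla ratio is divided by the mean ratio of the promoter-less control, and that the s.e.m. is the sample standard deviation divided by the square root of the number of replicates. All readings and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def firefly_renilla_ratios(firefly, renilla):
    """Per-replicate Firefly/Renilla ratios."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_fold_changes(construct_ratios, control_ratios):
    """Normalize construct ratios to the mean ratio of the negative control."""
    control_mean = mean(control_ratios)
    return [ratio / control_mean for ratio in construct_ratios]


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical luminescence readings from four replicates (not study data).
control = firefly_renilla_ratios([100, 110, 90, 100], [1000, 1000, 1000, 1000])
construct = firefly_renilla_ratios([500, 600, 400, 500], [1000, 1000, 1000, 1000])

fold_changes = relative_fold_changes(construct, control)
fold_mean, fold_sem = mean_and_sem(fold_changes)
```

With these made-up readings the construct shows about a fivefold increase over the control. Dividing by the Renilla signal first corrects for differences in transfection efficiency between wells before the comparison with the control is made.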

Expression patterns of transcription factors and imprinted genes during human preimplantation development

A heat map displaying the average expression of four transcription factors associated with human GPR1-AS or mouse Liz transcription, a primary KRAB-ZFP that binds to MER21C, and three imprinted genes surrounding the ZDBF2 locus.

Human LTR reactivation during preimplantation development

Heat map displaying the average expression of selected LTR retrotransposon families in human oocytes and early embryos. MLT2A1/MLT2A2 and HERVK are reactivated between the 4- and 8-cell stages and after the 8-cell stage, respectively (Grow et al. 2015; Hashimoto et al. 2021).

Interspecies epigenomic comparisons between human GPR1-AS and mouse Liz

IGV screenshots of the first exon of GPR1-AS/Liz in human (A) and mouse (B) showing DNA methylation, enrichment of post-translational histone modifications (H3K4me3, H3K9me3, and H3K27me3) and transcription factor binding sites (TFAP2C and ZSCAN4C) from ChIP-Atlas in various tissues. DNA methylomes from oocyte and sperm from mouse and human were published previously (Brind’Amour et al. 2018). Animal silhouettes were obtained from PhyloPic (mouse silhouette by Katy Lawler, available under a CC BY 4.0 license).