Derivation of human naïve cells in the ground state of pluripotency provides promising avenues for developmental biology studies and therapeutic manipulations. However, the molecular mechanisms involved in the establishment and maintenance of human naïve pluripotency remain poorly understood. Using the human inducible reprogramming system together with the 5iLAF naïve induction strategy, integrative analysis of transcriptional and epigenetic dynamics across the transition from human fibroblasts to naïve iPSCs revealed ordered waves of gene network activation sharing signatures with those found during embryonic development from late embryogenesis to pre-implantation stages. More importantly, Transcriptional analysis showed a significant transient reactivation of transcripts with 8-cell-stage-like characteristics in the late stage of reprogramming, suggesting transient activation of gene network with human zygotic genome activation (ZGA)-like signatures during the establishment of naïve pluripotency. Together, Dissecting the naïve reprogramming dynamics by integrative analysis improves the understanding of the molecular features involved in the generation of naïve pluripotency directly from somatic cells.https://doi.org/10.7554/eLife.29518.001
The pluripotent state, emerging during the development of embryos from the totipotent zygote stage into the blastocyst stage in vivo, can be captured indefinitely in vitro as embryonic stem cells (ESCs). Moreover, the generation of induced pluripotent stem cells (iPSCs) further demonstrates that pluripotency can be re-captured directly from somatic cells. Previous studies have demonstrated distinct properties in human ESCs/iPSCs compared with mouse ESCs/iPSCs with regard to cell morphology, transcriptional profiles, signaling requirements and epigenetic modifications(Hackett and Surani, 2014; Nichols and Smith, 2009). Human ESCs/iPSCs represent a primed pluripotent state corresponding to the post-implantation epiblast; while mouse ESCs/iPSCs represent a naïve state of pluripotency found in the pre-implantation blastocyst stage (Brons et al., 2007; Huang et al., 2014; Tesar et al., 2007; Ware, 2017). Recent advances have allowed the development of several strategies to achieve human naïve pluripotency through genetic or chemical manipulations (Chan et al., 2013; Gafni et al., 2013; Hanna et al., 2010; Takashima et al., 2014; Theunissen et al., 2014; Ware et al., 2014). Rigorous molecular criteria for the evaluation of human naïve pluripotency have also been defined through systematic comparisons among naïve pluripotent cell lines derived via different strategies, primed pluripotent cell lines and human pre-implantation embryos(Davidson et al., 2015; Dodsworth et al., 2015; Huang et al., 2014; Theunissen et al., 2016).
The widely used 5iLAF culture system allows the establishment of naïve pluripotency in different types of human cells, including cells from pre-implantation embryos, primed pluripotent stem cells, and even somatic cells (Pastor et al., 2016; Theunissen et al., 2014; Yang et al., 2016). However, in-depth mechanistic studies exploring naïve pluripotency establishment during these reprogramming processes are still lacking, owing to the low efficiency of reprogramming and the high heterogeneity of the reprogramming cells. The human inducible reprogramming system recently developed by Cacchiarelli and colleagues (Cacchiarelli et al., 2015) enables high-resolution analysis throughout the reprogramming process, thus providing a powerful tool for dissecting the molecular roadmap of the gain of naïve pluripotency directly from somatic cells.
In this study, we utilized the human secondary reprogramming system together with the 5iLAF naïve induction system to systematically characterize the transcriptional and epigenetic dynamics involved in the transition from fibroblast cells to naïve pluripotent cells at base resolution. Integrative analysis revealed ordered gene network activation that shared signatures with embryogenesis from the post-implantation to pre-implantation stages, with a transient wave of 8-cell-specific transcripts expression in the late stage of naïve reprogramming, suggesting the activation of gene networks with human zygotic genome activation (ZGA)-like characteristics during naïve pluripotency establishment. Altogether, dissecting the dynamics during naïve induction process provides a comprehensive analysis of the naïve pluripotency reprogramming roadmap, which improves the understating of molecular networks in human naïve pluripotency.
To establish the secondary naïve iPSCs induction system in human, we first reprogrammed primary human embryonic fibroblasts (1° hEF) to primary primed iPSCs (1° piPSCs) by using the doxycycline (dox)-inducible polycistronic human OSKM cassette, as previously reported (Park et al., 2008). After the differentiation of clonal 1° piPSCs, the resultant secondary human inducible fibroblast-like cells (2° hiF) were reprogrammed into iPSCs at naïve pluripotent state by 5iLAF with dox treatment (Figure 1—figure supplement 1A). Consistent with previous observations (Cacchiarelli et al., 2015), the 2° hiFs showed significantly decreased proliferation ability and naïve reprogramming efficiency, as well as increased senescence-associated beta-galactosidase activity with long-term culturing and passaging (Figure 1—figure supplement 1B–D); these effects were rescued by immortalizing the 2° hiF cell lines (2° hiF-T) with constitutive human TERT expression (Figure 1—figure supplement 1B–D).
Upon dox treatment, morphological changes occurred in hiF-T cells at approximately day 2, and small cell aggregates were observed as early as day 6 (Figure 1A). Dome-shaped colonies emerged and expanded after the culture conditions were changed from conventional hESC medium (hESM) to 5iLAF medium on day 6 (Figure 1A). After 20 days of induction, the cells were further cultured under dox-withdrawal conditions for 4 days before clonal expansion, to establish the secondary naïve iPSC lines (niPSC-Ts) (Figure 1A; Figure 1—figure supplement 1A). We also used the OCT4-ΔPE-GFP reporter system to monitor the activation of the OCT4 distal enhancer (DE), the molecular signature of ground state pluripotency, during human naïve reprogramming. GFP+ colonies were observed at day 20 of induction (Figure 1B) and were increased during the reprogramming process (Figure 1B), reaching ~91.4% in clonally derived niPSC-Ts, as detected by FACS analysis (Figure 1C). Immunostaining results of the derived niPSC-Ts exhibited robust expression of core pluripotency markers (OCT4, SOX2, NANOG and TRA-1–60) and naïve pluripotency markers (DPPA3 and UTF1) (Figure 1D). In agreement with recent reports (Pastor et al., 2016), almost all niPSC-Ts were negative for SSEA-3 and SSEA-4 expression (Figure 1D). Thus, using the dox-inducible system and 5iLAF naïve reprogramming strategy, we established a stable and reliable system for the integrative study of the transcriptional and epigenetic roadmap to human naïve pluripotency.
Next, we collected the mRNAs of cells at different time points throughout the naïve reprogramming process and performed RNA-seq analysis (Figure 2A). The Pearson correlation distance analysis of mRNAs segregated the cell samples into three distinct categories including hiF-T/0d/2d/6d, 8d/12d and 14d/20d/24d+dox/24d-dox/niPSC-T (Figure 2—figure supplement 1A). On the basis of the dynamics of the differentially expressed (DE) genes during naïve reprogramming (Figure 2—figure supplement 1B), multi-dimensional scaling (MDS) analysis exhibited a continuous trajectory of transcriptional dynamics from hiF-Ts to established niPSC-Ts (Figure 2B). However, distinct from the results for the primed reprogramming system (Cacchiarelli et al., 2015), the cellular states at days 20–24 during naïve reprogramming were similar, and dox withdrawal did not result in dramatic transcriptional changes in the cells on day 24 (Figure 2B), thus further suggesting the intrinsic differences between naïve and primed pluripotency. Compared with those in the primed reprogramming system, the epiblast-specific markers representing naïve pluripotency were gradually up-regulated (one-tailed t-test p-value=5.99e-22), whereas the primed-specific genes were gradually down-regulated (one-tailed t-test p-value=1.67e-3) during the naïve pluripotency induction process (Figure 2—figure supplement 1C). We also observed transient up-regulation followed by marked down-regulation of transcriptional factors OTX2 and ZIC2 during naïve reprogramming, which were known to direct OCT4 to primed state-specific enhancer sites (Buecker et al., 2014) and exhibited robust activation during primed reprogramming (Figure 2—figure supplement 1D).
Next, we characterized the transcriptome dynamics during naïve reprogramming in detail. On the basis of the DE genes between two adjacent time points of RNA-seq data (Figure 2—figure supplement 1B), we identified several dynamic expression clusters and focused on seven major patterns in three categories (down-regulated, up-regulated and transiently up-regulated), which were defined according to gene ontology (GO) enrichment analysis, developmental cell identity, and expression pattern of marker genes (Figure 2C; Figure 2—figure supplement 2) (Edgar et al., 2013). The activation of OSKM down-regulated genes involved in cell junction and extracellular matrix (ECM) organization occurred in two waves, at day 12 and day 20 of naïve reprogramming (Figure 2C; Figure 2—figure supplement 2). More importantly, pluripotency-associated gene networks were also activated in two waves; the earlier one consisted of genes with an early embryogenesis signature such as gradual up-regulation of CDH1 and NANOG from day 2, whereas the later one comprised genes with pre-implantation characteristics, such as DPPA3 and TFCP2L1 (Figure 2C; Figure 2—figure supplement 2). For the transiently up-regulated genes during naïve reprogramming, the first transient wave peaked at day eight and was enriched in genes characteristic of late embryogenesis and pattern specification, such as LHX9 and the HOX cluster genes (Figure 2C; Figure 2—figure supplement 2). The second wave peaked around day 12–14 and included metabolism-associated genes, such as IGF2 and AFP (Figure 2C; Figure 2—figure supplement 2). The third wave, which peaked during the late reprogramming process at approximately day 24, was enriched in genes important for gamete generation (Fisher’s exact test p-value=1.934e-3) and sexual reproduction (Fisher’s exact test p-value=3.517e-3), such as OVOL1 and CGB5 (Figure 2C; Figure 2—figure supplement 2). Notably, the gene expression program with pre-implantation-like signatures exhibited significantly different dynamics in naïve compared with primed reprogramming systems, which exhibited robust up-regulation along the naïve reprogramming process, peaking in niPSC-Ts (Figure 2C, D). However, such expression was lost upon dox withdrawal and iPSC-T derivation during primed reprogramming (Figure 2D) (Cacchiarelli et al., 2015), results consistent with the MDS analysis results in both reprogramming systems, respectively (Figure 2B) (Cacchiarelli et al., 2015). In addition, in comparing the transcriptional profiles between the naïve reprogramming process of hiF-Ts to niPSC-Ts and human early embryo development (Yan et al., 2013), we found that the reprogramming cells at day 20 and 24 and niPSC-Ts most closely resembled human embryos at the late blastocyst stage (Figure 2E). Immunostaining of the pre-implantation marker DPPA3 and early embryogenesis marker UTF1 in the reprogramming cells at day 20 and day 24 in both induction systems also confirmed our observations in the transcriptional profiles (Figure 2F). Therefore, in contrast to the primed reprogramming system, in which the transcriptional profile finally dropped to the post-implantation-like stage (Cacchiarelli et al., 2015), the derivation of naïve iPSCs was accompanied by ordered waves of gene network activation, with the gene program finally stabilizing at the pre-implantation-like stage.
During human embryo development, the major wave of embryonic genome activation (EGA) occurs at approximately day three at ~8 cell (8C) stage (Niakan et al., 2012; Vassena et al., 2011), times corresponding to the major wave of zygotic genome activation (ZGA) at the 2 cell stage in mice, which is a key transcriptional feature of totipotency (Latham and Schultz, 2001; Macfarlan et al., 2012; Schultz, 2002). By analyzing single cell RNA-seq datasets of human early embryos (Yan et al., 2013), we identified 538 genes with expression levels peaking at the 8C stage during human embryonic development (Figure 3A), including the previously reported EGA regulators with PRD-like homeodomains such as DUXA, OTX2 and LEUTX (Töhönen et al., 2015). Further investigation revealed that these 8C-genes were significantly enriched in the cluster including genes important for gamete generation (Fisher’s exact test p-value=3.784322e-08) (Figure 3A; Figure 3—figure supplement 1A, Figure 2—figure supplement 2) and exhibited similar transiently up-regulated expression patterns during the late stages of naïve reprogramming (Figure 3B). In addition to ZSCAN4 (Figure 3C; Figure 3—figure supplement 1B), whose mouse homologs are 2C-stage restricted and are important for telomere stability in mouse ESCs and iPSCs (Ko, 2016; Zalzman et al., 2010); we also identified KLF17, TBX20 and some PRAMEF family genes including PRAMEF15, PRAMEF5 etc. in the 8C-stage category with transiently enhanced transcription activity during late stages of naïve reprogramming but not in the primed reprogramming system (Figure 3—figure supplement 1C). More importantly, we identified MBD3L2/3/4/5 genes as 8C-genes during human embryo development (Figure 3—figure supplement 1B); these genes are the homologs of the mouse Mbd3l2 gene, which is specifically expressed at the 2C stage during mouse development (Jiang et al., 2002). Similar to the genes listed above, the transcriptional dynamics of MBD3L2/3/4/5 also showed dramatically increased expression during the late stages of naïve reprogramming from day 20 to 24 even after dox withdrawal, then decreased sharply after niPSC-T derivation (Figure 3C, D; Figure 3—figure supplement 1C, D). However, during the primed reprogramming, we observed only a small transient wave of ZSCAN4 expression during the late stages from day 20 to day 24 (Figure 3C), and only marginal MBD3L2/3/4/5 expression throughout the induction process (Figure 3D). Interestingly, the MBD3L2/3/4/5 genes are loci-clustered in the human genome and are far from MBD3L1. Although these genes exhibited differential mRNA expression levels, the amino acid sequences of each gene in the cluster are identical, thus suggesting that the gene cluster might have evolved from mouse Mbd3l2 via the copy-paste mode, similar to the situation observed in mouse Zscan4 retrotransposons and ERV repeats. Using specific human ZSCAN4 and MBD3L2-5 antibodies, we performed immunostaining in cells during reprogramming. While only weak expression of ZSCAN4 as well as no expression of MBD3L2-5 could be detected around day 14, we could observe robust expression of ZSCAN4 and MBD3L2-5 around day 24 during naïve reprogramming (Figure 3E). During primed reprogramming, we could observe weak expression of ZSCAN4 and DPPA3 at 24d + dox, which diminished after dox withdrawal (Figure 3—figure supplement 1E). However, the expression of MBD3L2-5 could not be detected during primed reprogramming process (Figure 3—figure supplement 1E). Western blot analysis of MBD3L2-5 also confirmed our observations in both immunostaining and transcriptional profiles (Figure 3F). Although the transcriptional profiles of MBD3L cluster genes exhibited sharply decreased mRNA expression in the derived niPSC-T lines, we could still observe MBD3L2-5+ cells in naïve iPSC clones compared to their absence in primed iPSCs by immunostaining (Figure 3—figure supplement 1F), suggesting the expression of MBD3L2/3/4/5 as an indicator for naïve pluripotency. In summary, the transient expression of 8C-genes, which occurs specifically during the late stages of naïve reprogramming, suggests the emergence of gene network activation with 8C-stage-like characteristics. Compared with the primed reprogramming that finally stabilized at the post-implantation-like state (Cacchiarelli et al., 2015), naïve reprogramming not only reached the pre-implantation-like status but more importantly, underwent reactivation of gene programs with human EGA-like characteristics.
Several lines of evidence in mouse pre-implantation development revealed activation of many retrotransposons, including ERVs, LINE-1 elements and SINE elements during ZGA at the 2C stage (Peaston et al., 2004). According to our observation of the 8C-gene network wave during naïve reprogramming, we next assessed the dynamics of transposon elements (TEs) during this process. We identified TEs that were specifically and highly expressed at the 8C stage (8C-TEs) during human embryo development (Figure 3G); these TEs also showed transient reactivation during the late stages of naïve reprogramming (Figure 3H). Further investigation of these 8C-TEs showed a significant enrichment (Fisher’s exact test p-value=0.00408) in LTR (Figure 3—figure supplement 2A). Notably, integrants of the HERVK-family that were activated upon EGA during development (Grow et al., 2015), especially MER9a3-HERVK9, were significantly up-regulated at day 12 of naïve reprogramming and down-regulated upon niPSC-T derivation (Figure 3I; J). These results, together with the recent observations of the transient activation of 2C-specific MERVL/ZSCAN4 transcriptional network in the intermediate-late stages of mouse iPSC reprogramming (Eckersley-Maslin et al., 2016), suggest that the transient wave of 8C-transcripts is an important feature of human naïve reprogramming and that the reactivation of a gene network with the human EGA-like signature occurs during this process.
Recent studies in naïve pluripotency evaluation have indicated that naïve human ESCs display a unique transposon signature of cleavage-stage embryos with significant overexpression of the SINE-VNTR-Alu (SVA) family of transposon elements (Theunissen et al., 2016). In our system, reprogramming to naïve pluripotency induced significant up-regulation of several subgroups of the SVA family highly expressed during the morula-stage of human early embryo development, with the highest expression levels in niPSC-Ts (Figure 3—figure supplement 2B). Similar to previous observations (Theunissen et al., 2016), LTR7 and HERVH-int in the LTR7-HERVH family that were highly expressed in hESC0 and hESC10 that simulated the post-implantation stages, with no enrichment in 8C or morula stages during development (Figure 3—figure supplement 2C), exhibited higher expression levels across primed reprogramming than across naïve induction (Figure 3—figure supplement 2D). Hence, the transcriptional dynamics of TEs also revealed an ordered reactivation of transcripts with 8C- and morula-stage signatures in the intermediate-late stages of naïve reprogramming.
We also assessed the transcriptional profiles of KRAB-ZNF genes during reprogramming, which have been reported to play central roles in repressing TEs during early embryogenesis (Huntley et al., 2006; Quenneville et al., 2011). Transcriptional dynamics analyses divided the KRAB-ZNFs in to eight distinct clusters, among which the expression pattern of genes in cluster v was remarkably up-regulated in the naïve reprogramming system with robust expression in niPSC-Ts (Figure 3K). These genes included ZNF534, the repressor of LTR7-HERVH (Figure 3—figure supplement 2E). These results suggested that the naïve-specific KRAB-ZNF genes were activated to repress the TE network with a post-implantation-like signature, accompanied by the reactivation of TEs with characteristics reminiscent of morula-stage embryos during reprogramming to the naïve pluripotent state.
Transcriptional profiling across the naïve reprogramming process revealed an ordered reactivation of diverse developmental pathways from late embryogenesis to the pre-implantation stages. Next we examined the dynamics of epigenetic modifications during this process. In contrast to the continuous hypermethylation during primed reprogramming, a significant decrease in global DNA methylation was observed throughout the naïve pluripotency induction process (Figure 4A; Figure 4—figure supplement 1A), including an average methylation level resembling that in ICM by the end of reprogramming in niPSC-Ts, as previously demonstrated (Figure 4A) (Leitch et al., 2013; Okae et al., 2014; Pastor et al., 2016; Smith et al., 2014; Theunissen et al., 2016). Quantitative analysis via HPLC-MS also revealed a dramatic decrease in 5mC levels throughout naïve reprogramming but not the primed reprogramming process (Figure 4—figure supplement 1B). Dynamic changes in differentially methylated C sites (DMCs) throughout naïve reprogramming also showed an increasing trend of hypomethylated C site ratios (Figure 4—figure supplement 1C), a result consistent with the global demethylation trend (Figure 4A; Figure 4—figure supplement 1A, B). For detailed analysis, we characterized naïve-specific differentially methylated Regions (DMRs) by comparing the DMRs of niPSC-Ts with those of hiF-Ts and piPSC-Ts (Figure 4—figure supplement 1D); the results showed a dramatic trend of down-regulation in the average methylation ratios during naïve reprogramming, but no significant changes during primed reprogramming (Figure 4B). We also found that the identified naïve-specific DMRs were enriched in genes related to naïve pluripotency, which exhibited significant DNA de-methylation at the promoter regions on approximately day 24 during reprogramming (Figure 4C).
Growing evidence indicates that naïve PSCs lose DNA methylation at primary imprints, which are retained throughout pre-implantation development (Pastor et al., 2016). We examined DNA methylation dynamics in 31 stable primary imprints in both the naïve and primed reprogramming processes. The DNA methylation ratios markedly decreased during naïve reprogramming, and methylation was nearly completely lost in niPSC-Ts (Figure 4D), results similar to previous observations (Pastor et al., 2016). However, in primed reprogramming, the decrease was more moderate, and the DNA methylation ratio finally stabilized at ~50% in piPSC-Ts, as previously reported (Figure 4D) (Pastor et al., 2016). The failure to reactivate DNMT3A and DNMT3B, as well as the over-activation of TET2 (Figure 4—figure supplement 1E), might be causative of the genome-wide hypomethylation and loss of imprinting observed in the naïve pluripotency induction process.
To assess the genome-wide landscape of histone modifications, we mapped two active modifications (H3K4me2 and H3K4me3) and two repressive modifications (H3K27me3 and H3K9me3) during naïve reprogramming. We first classified genes to ‘H3K4me2-only’, ‘H3K4me3-only’ and ‘both H3K4me2/H3K4me3’ catalogs based on the H3K4me3 and H3K4me2 modification signals detected on the promoter region of each gene. A transiently decreased pattern of ‘both H3K4me2/H3K4me3’ gene numbers was observed during this process, with an increasing pattern of ‘H3K4me3-only’ gene numbers as well as a decreasing pattern of ‘H3K4me2-only’ signals, thus suggesting that there was a transient transition from H3K4me2 to H3K4me3 on the promoter regions during the naïve reprogramming process (Figure 5—figure supplement 1A).
Next, we focused on H3K4me3 and H3K27me3 dynamics during reprogramming. We observed increasing numbers of bivalent genes during the reprogramming process (Figure 5A,B), however, the up-regulation of bivalent signals was much slower with less gene numbers in niPSC-Ts during naïve reprogramming compared to primed reprogramming (Figure 5A,B). For further comparison, we clustered the genes with similar patterns of bivalency on their promoters in both naïve and primed reprogramming process, and found that there were two clusters of genes highly related to embryonic development showing increasing bivalent signals on their promoter regions during primed reprogramming, while exhibiting almost no H3K4me3/H3K27me3 signals during the whole reprogramming process to naïve pluripotency (Figure 5C; Figure 5—figure supplement 1B), results consistent with the previous report (Pastor et al., 2016; Theunissen et al., 2014; Yang et al., 2016).
To address the effect of histone modification on transcriptional activity, we also examined the H3K4me3 and H3K27me3 modification dynamics around the transcription start sites (TSSs) of gene clusters with different expression patterns. The average H3K4me3 signals were highly correlated with the transcriptional dynamics and increased around the TSSs of genes with pre-implantation and early embryogenesis signatures, while the average H3K27me3 signals decreased around the TSSs of these genes during reprogramming (Figure 5D).
We also checked the dynamics of H3K9me3 modification and found a decreasing pattern of H3K9me3 signals during naïve reprogramming, including integrants of SVA family, the naïve specific TEs (Figure 5E; Figure 5—figure supplement 1C). Moreover, the H3K9me3 signals around the 8C-TEs, especially MER9a3-HERVK9, exhibited a transiently decreased pattern at day 14, strongly correlated with the transiently increasing transcriptional activity of these 8C-TEs during naïve reprogramming (Figure 5F).
Next, we focused on the down- and up-regulated patterns of genes with early/late somatic, early embryogenesis- and pre-implantation-like signatures (Figure 2C; Figure 2—figure supplement 2) and analyzed the relationships among the gene expression patterns, DNA methylation status and histone modifications. Despite their distinct expression dynamics, the transcriptional levels of these genes were closely correlated with the changes in epigenetic modifications at the promoter regions during naïve reprogramming, regulated by DNA methylation, histone modifications, or both (Figure 6A). Deep investigations at the base level also confirmed our observations above (Figure 6—figure supplement 1A). For detailed analysis, we divided the up-regulated genes during reprogramming into three groups: high-CpG-density promoters (HCPs), intermediate-CpG-density promoters (ICPs) and low-CpG-density promoters (LCPs) on the basis of the CpG ratios and the GC contents of their promoters (Figure 6—figure supplement 1B). We observed that, the two clusters with up-regulation trend showed different CpG-density patterns on these promoters (Figure 6A). Furthermore, compared with the genes with lower CpG-densities in their promoters, genes with higher CpG-densities tended to undergo more rapid DNA demethylation and re-establishment of active histone modifications (H3K4me2/3), as well as earlier up-regulation in transcription during naïve reprogramming (Figure 6B; Figure 6—figure supplement 1B). We examined the genes classified by different CpG-densities at promoter regions and found that the genes associated with core naïve pluripotency were enriched primarily in the LCP and ICP groups (Figure 6—figure supplement 1C). Together, these results indicated that the transcriptional dynamics strongly correlated with epigenetic changes during naïve reprogramming, which may be affected by the CpG density at gene promoter regions.
The newly discovered ‘naive’ state of human pluripotency holds great promise for early embryo development studies and therapeutic manipulations, overcoming the application bottlenecks of pluripotent stem cells at primed state. However, recent studies have primarily focused on naïve pluripotency derivation and identification (Dodsworth et al., 2015; Gafni et al., 2013; Hanna et al., 2010; Huang et al., 2014; Takashima et al., 2014; Theunissen et al., 2016; Theunissen et al., 2014), in lack of in-depth mechanical studies of naïve pluripotency establishment during reprogramming. Here, by using the secondary human naïve reprogramming system, we monitored the dynamics of transcriptome and epigenome during naïve pluripotency induction process. We observed ordered reactivations of transcriptional networks sharing signatures reminiscent of reversed embryonic development, including a significant transient wave of 8C-stage-specific transcripts expression in the late reprogramming stages followed by the stabilization of the transcriptome with pre-implantation-like characteristics, thus suggesting that the activation of a network with human EGA-like characteristics occurs during naïve pluripotency induction.
Recent advances have demonstrated that during the primed iPSCs induction, the pre-implantation-like state is reached quickly but is lost upon dox withdrawal and iPSC line derivation (Cacchiarelli et al., 2015). Distinctly from that phenomenon, the reprogramming cells in the naïve pluripotency induction process undergo a gradual reactivation of networks with pre-implantation signatures (Figure 2). Interestingly, the expression wave of 8C transcripts observed specifically in naïve reprogramming (Figure 3) suggests that the 8C-stage-like state is transiently established during the naïve pluripotency induction process. We also observed a significant genome-wide DNA demethylation as early as 12 days after naïve induction, which eventually led to hypomethylation in naïve iPSCs with similar methylation ratios of ICM in vivo, as previously reported (Hanna et al., 2010; Pastor et al., 2016; Theunissen et al., 2016) (Figure 4A; Figure 4—figure supplement 1A). The failure to maintain DNA methylation might be correlated with the lower expression levels of de novo DNA methyltransferase DNMT3A/3B, as well as the higher expression level of 5mC oxidase TET2, across naïve reprogramming compared with the primed system (Figure 4—figure supplement 1E).
Taken together, dissecting and analyzing the dynamics of naïve induction process provide the first molecular roadmap of the reprogramming of human somatic cells into naïve pluripotent state, which improve the understating of the molecular networks in the establishment and maintenance of naïve pluripotency and provide a theoretical basis for further applications.
Human skin specimens from abortive fetus were obtained from the Clinical and Translational Research Center of Shanghai First Maternity and Infant Hospital, Tongji University to make human embryonic fibroblasts (HEFs). The identity of HEFs has been authenticated by STR profiling and the cells are cultured with no mycoplasma contamination.
HEFs were cultured in DMEM (Invitrogen) supplemented with 10% FBS (Invitrogen). Primed iPSCs were cultured in hESM containing DMEM/F12 with 20% knockout serum replacement (KSR) (Invitrogen) and 4 ng/ml bFGF (Peprotech). Naïve iPSCs were cultured in 5iLAF medium containing DMEM/F12: Neurobasal (1:1) (Invitrogen), 1% N2 supplement (Invitrogen), 2% B27 supplement (Invitrogen), 0.5% KSR (Invitrogen), 20 ng/ml human LIF (Peprotech), 8 ng/ml bFGF (Peprotech), 50 μg/ml BSA (Sigma) and the following cytokines and small molecules: PD0325901 (Stemgent, 1 μM), IM-12 (Enzo, 1 μM), SB590885 (R and D systems, 0.5 μM), WH-4–023 (A Chemtek, 1 μM), Y-27632 (Stemgent, 10 μM), and Activin A (Peprotech, 20 ng/ml), and passaged by Accutase (Sigma) every 4–5 days as previously reported (Theunissen et al., 2014; Yang et al., 2016).
For reprogramming, hEFs were infected with the dox-inducible, polycistronic OKMS lentiviral vector (addgene) and cultured in conventional human embryonic stem cell medium(hESM) to generate primary primed iPSCs, which were further differentiated into hiFs and then infected with pBabe-TERT retroviral vector and screened by 1.6 ng/ml puromycin to generate hiF-Ts. Secondary naïve iPSC-Ts were derived by culturing hiF-Ts in hESM with dox for 6 days followed by changing to 5iLAF medium with dox for 14 days, which were then maintained in 5iLAF medium without dox for additional 4 days. After 24 days of infection, mESC-like colonies were picked and expanded wihout dox on irradiated feeder cells in 5iLAF medium.
For differentiation from naïve to primed state, niPSC-T cells were first digested into single cells and plated onto irradiated feeder cells, which were then cultured in conventional hESM supplemented with Y-27632 (Stemgent, 10 μM) for 8–10 days.
5 × 104 cells (early hiF, late hiF, early hiF-T and late hiF-T) were passaged onto 12-well plate in three replicates. Calculation of cell growth rate was performed by counting of cell numbers at 24 hr, 48 hr, 72 hr and 96 hr respectively.
For immunostaining, the primary antibodies used included those against OCT3/4 (1:500, Santa Cruz), SOX2 (1:500, Santa Cruz), NANOG (1:500, Abcam), SSEA3 (1:50, Millipore), SSEA4 (1:50, Millipore), TRA-1–60 (1:50, Millipore), UTF1 (1: 200, Abcam), DPPA3 (1:50, Santa Cruz), ZSCAN4 (1:100, Millipore) and MBD3L2 (1:100, Abcam). The following secondary antibodies were used: Alexa Fluor 594-conjugated donkey anti-mouse IgG (1:500; Invitrogen), fluorescein isothiocyanate (FITC) 488-conjugated donkey anti-rabbit IgG (1:500; Invitrogen), and FITC 488-conjugated donkey anti-goat IgG (1:500; Invitrogen). Anti-H3K4me2 (Millipore), Anti-H3K4me3 (Millipore), Anti-H3K27me3 (Abcam) and Anti-H3K9me3 (Millipore) antibodies were used for ChIP experiments.
For flow cytometry, OCT4-ΔPE-niPSC-Ts were collected, washed and re-suspended in FACS buffer containing PBS (Invitrogen) supplemented with 2% FBS (Invitrogen). All analyses were performed on a MoFloXDP cell sorter (Beckman Coulter). Flow cytometry data were processed using Flow Jo software.
Immunostaining was performed according to standard protocols. In brief, cells were fixed with PBS containing 4% paraformaldehyde (Sigma-Aldrich) overnight at 4°C and permeabilized for 15 min in PBS containing 0.5% Triton X-100. After incubation with blocking buffer (PBS containing 4% BSA) for 30 min at room temperature, cells were incubated with primary antibodies followed by secondary antibodies. Nuclei were stained with 4’,6-diamidino-2-phenylindole (1:10,000; Sigma-Aldrich). Images were taken using A1 Nikon confocal microscope.
Total RNAs were isolated from naïve iPSCs and reprogramming cells using TRizol (Invitrogen). To generate RNA sequencing libraries, KAPA Stranded mRNA-Seq Kit (KAPA) was used following the manufacturer’s instructions. Adapters were offered by TruSeq Library Prep Pooling kit (Illumina). Single-end 50 bp sequencing was further performed on a HiSeq 2500 or 2000 (Illumina) at Berry Genomics Corporation.
Gene expression Analysis
All RNA-seq reads were aligned to the human genome (hg19) using TopHat (v2.0.12) with default parameters (Trapnell et al., 2009). Gene expression level was measured as FPKM using Cufflinks(v2.2.1) to eliminate the effects of sequencing depth and transcript length (Trapnell et al., 2010). MDS clustering analysis was based on expression profile of all genes using R function ‘cmdscale’. For each comparison, differential expressed (DE) genes were founded using GFOLD(v1.1.3) with the GFOLD value >0.58 (fold change >1.5)(Feng et al., 2012). For following analysis, FPKM were log2 transformed after adding a pseudo-count of 1. K-means clustering was performed on combined DE genes of each nearby time points with k = 14 using R library ‘amap’. Distance between genes was measured based on their correlation. Batch effects of samples from different systems are removed using removeBatchEffect function of the R library ‘edgeR’. The negative values in the normalized data are considered as zero. Pearson correlation coefficient was calculated between each two samples on common genes using R function cor().
All RNA-seq reads were aligned to the human genome (hg19) using bowtie2(v 2.2.3) with default parameters. Then FPKM of repeat classes were calculated as the sum of the number of reads that align to each class divided by the genome coverage of the class in kilobases. FPKM of each repeat elements were calculated same as repeat classes. Repeats that expressed in specific stages were identified by comparing the repeat expression in the specific stage with the average expression. The repeats with average expression lower than 0.0001 were discarded.
A time-point specific expressed score was calculated as the expression of given time point divided by the average expression for each candidate gene, which should have a max expressed value in the given stage. The top 100 8C-TEs are the top 100 TE ranked by the time-point specific expressed score.
Functional annotation was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resource (Huang et al., 2009). Gene ontology terms for each function cluster were summarized to a representative term and P-values were plotted to show the significance. The genes are classified according to expression pattern of marker genes and developmental cell identity using LifeMap Discovery (Edgar et al., 2013).
Reprogramming cells and iPSCs were cross-linked and lysed to release chromatin, which were then sonicated by SonicsVibraCell Sonicator (Covaris) and immunoprecipitated with pretreated antibody-coupled ProteinG Dynabeads (Invitrogen) at 4°C for 8–12 hr. The ChIPed DNA was reverse-crosslinked, eluted, purified and quantified by Qubit dsDNA HS assay kit (Life Technologies). For ChIP sequencing, ChIP-seq libraries were prepared according to the protocols described in the KAPA DNA Library Preparation Kits (KAPA). Paired-end 125 bp sequencing was further performed on a HiSeq 2500 or 2000 (Illumina) at Berry Genomics Corporation.
All ChIP-seq reads were aligned to the human genome (hg19) using bowtie2(v 2.2.3) with default parameters. Then reads signal for each sample were generated using the MACS2 (v184.108.40.20640616) pileup function and were normalized to 1 million reads for visualization. Chromatin states were identified and characterized using ChromHMM (v1.11) (Ernst and Kellis, 2012). The reads alignment files of H3K4me2, H3K4me3 and H3K27me3 modifications across 4-time stages during reprogramming path were binned into 200 bp bins using the Binarize Bam command. Next, a model was trained using the Learn Model command with 200 bp resolution and default parameters. Finally, for active marker, the whole genome was classified into four states: H3K4me2-only (without H3K4me3) region, H3K4me3-only (without H3K4me2) region, H3K4me2 and H3K4me3 region and non-marked region at each stage. For bivalency analysis, the whole genome was then classified into four states: H3K4me3-only (without H3K27me3) region, H3K27me3-only (without H3K4me3) region, H3K4me3 and H3K27me3 region and non-marked region at each stage. Genes were classified based on their promoters overlapping with ChromHMM segments in each stage. Promoters were defined as ±2 kb around the TSS. Genes overlapped with multiply segments in one stage were discarded in following analysis.
Alluvial diagrams of reprogramming lineages were plotted using the alluvial function in R to show the transitions of different histone mark covered gene number. The gene number was the counter of genes in each state and the percentages of the specified intervals in each stage were plotted to show the global trend of that specific chromatin state. The alluvial diagrams showed the percentage changes of different chromatin states covered gene during each transition; the lines from the present stage to next stage cannot be traced, as they represent different genes.
For Reduced Representation Bisulfite Sequencing (RRBS), genomic DNAs were extracted from reprogramming cells and iPSCs using DNeasy Blood and Tissue Kits (QIAGEN), digested by (NEB) and inactivated by heating. The sequencing libraries were constructed as previously described (Gu et al., 2011). Paired-end 125 bp sequencing was further performed on a HiSeq 2500 or 2000 (Illumina) at Berry Genomics Corporation. All Reduced Representation Bisulfite Sequencing (RRBS) reads were aligned to human genome (hg19).
The correlation between DNA methylation ratio and gene expression was calculated using function cor() in R on each gene.
Local CpG ratio was calculated for 500 bp bins with 50 bp steps, as previously defined (Weber et al., 2007). The CpG ratio for each transcript was calculated as the max local CpG ratio around ±2 kb of the TSS. The transcripts were then separated into high-CpG-density promoters (HCPs), intermediate-CpG-density promoters (ICPs) and low-CpG-density promoters (LCPs) based on the CpG ratio and the GC content cut-off previously defined (Weber et al., 2007).
The accession number for all the sequencing-derived data in this paper is GEO: GSE89072.
The accession number for RNA-seq data of embryo development used in this paper is GEO: GSE36552.
The accession number for RNA-seq data of primed cell reprogramming used in this paper is GEO: GSE62777.
The accession number for DNA methylation data of human early embryos used in this paper is GEO: GSE49828
Zygotic genome activation revisited: Looking through the expression and function of Zscan4Current Topics in Developmental Biology 120:103–124.https://doi.org/10.1016/bs.ctdb.2016.04.004
Naive pluripotency is associated with global DNA hypomethylationNature Structural & Molecular Biology 20:311–316.https://doi.org/10.1038/nsmb.2510
The molecular foundations of the maternal to zygotic transition in the preimplantation embryoHuman Reproduction Update 8:323–331.https://doi.org/10.1093/humupd/8.4.323
Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cellsNature Structural & Molecular Biology 20:1131–1139.https://doi.org/10.1038/nsmb.2660
Martin PeraReviewing Editor; University of Melbourne, Australia
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Unique molecular events during reprogramming of human somatic cells to induced pluripotent stem cells at naïve state" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Fiona Watt as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Qi-Long Ying (Reviewer #2); Christine Wells (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
This study provides novel information on the time course of reprogramming of human fibroblasts to the naive pluripotent state.
While two reviewers agree that your study will be of interest to the field, there are some serious concerns about the comparative analysis of the gene expression data and the bioinformatics approaches used in these analyses. The reviewers also note a number of instances in which the presentation of results is unclear. You should address these major criticisms concerning data analysis and clear up the areas where the presentation lacks clarity.
A key question relates to the analysis of the data: the authors need to clarify precisely what embryo and primed cell reprogramming datasets they are comparing their data to.
1) Subsection “Transcriptional profiling of naïve reprogramming cells” Figure 2C - how were genes classified as early embryogenesis, pre-implantation, and late embryogenesis? These do not turn up in the GO analysis in Figure 2—figure supplement 2.
2) Subsection “Transcriptional profiling of naïve reprogramming cells” - what is the significance of the activation of genes related to placenta development and gamete generation?
3) Subsection “Transcriptional profiling of naïve reprogramming cells” Figure 2E - what is the source of the human embryo data? Is there a distinction in these data for blastocyst between inner cell mass and trophectoderm? What about the epiblast proper, which the naïve cells should resemble?
4) Subsection “Transcriptional profiling of naïve reprogramming cells” – where do the authors' data show that primed cell reprogramming drops to a post-implantation stage? What post-implantation data in what species?
5) Subsection “8-cell-stage-like state was transiently established during naïve reprogramming” and elsewhere-the statement that an eight-cell stage was transiently established probably claims too much from the data. Unless I misunderstand Figure 3A-B, the induction of a selected set of 8C genes is considerably more modest in naïve reprogrammed cells compared to that in the embryo, and the reprogramming cells may be expressing many genes that are not typical of this stage. Some other form of analysis would be required to prove that the transient cells are really equivalent to an 8C stage.
6) Subsection “8-cell-stage-like state was transiently established during naïve reprogramming” - again the significance of enrichment for genes involved in gamete generation is unclear here.
7) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and Figure 5C - what genes are chosen for study here and why?
8) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and following-the logic of comparison of the H3K4me3 with those of the mouse blastocysts is not clear. What cells of the mouse blastocyst? How about epiblast?
9) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and Figure 5E,F - what about global patterns of H3K9me3?
10) Discussion section – what is the basis for equating these patterns of gene expression with the epiblast? What human embryo epiblast data are used for comparison?
Figure 2 primed heatmap is the first place that we see a comparison between the 2i reprogrammed cells and cells reprogrammed in 5iLAF. This is repeated in Figure 3B. I can't find details of the primed timecourse. Was this generated in parallel with the 5iLAF 2i reprogramming? Is it a 2i timecourse or derived from the first reprogramming stage? If so, this should be included in the over-view cartoons describing the experimental design and detailed in the Materials and methods section.
Or has the data been taken from an external publication (e.g. Cacchiarelli et al., 2015 which is referenced heavily in the Results section describing the transcriptome time series)? If so, I have some concerns about the integration of these two data sets. The GEO accessions for any external data series used in comparisons should be provided in the methods. Details on normalization used to address technical batch, as would be expected from integrating disparate data sources, should be provided. These integration issues are exacerbated in comparisons of single cell RNA-seq and population RNA-seq for the correlation table with the early embryo series.
The authors benchmark the 2i 5LAF transcriptome series against an external early human embryogenesis series using a Pearson correlation. Why this choice of embryo time points? If the 2i iPSC equivalent is late epiblast, is correlation with zygote-blastocyst sufficient or informative? There are relevant later stage human and monkey embryogenesis datasets that would provide a more appropriate developmental window.
How was the correlation analysis performed? There are no substantive details of any of the bioinformatics analyses beyond the RNA-seq processing.
The comparison of 'primed'-iPSC and 5iLAF-2i-iPSC in Figure 3 to the '8-cell' signature that the authors claim is reminiscent of the reactivation of the zygote transcriptome rests heavily on comparative higher expression of this signature in the 5iLAF series and not in the 'primed' iPSC series. As far as I can tell from the methods described, only 1 replicate is provided per time point in the 5iLAF-2i-iPSC timecourse. This spike could represent an oversampling issue or some other technical issue with a small number of libraries – Panel B in figure 6 particularly concerns us with regard to potential over-sampling. I have insufficient information to evaluate the quality of the primed and 5iLAF series without knowing the source of the former. The transposon mining comes from the same RNA-seq libraries, and so the same question about library quality, source of comparison and suspicion in a 'spike' seen in a small number of data points in a time series applies to the transposon analysis.
The authors have not evaluated whether the 8-cell signature is expressed later in human embryonic development. Indeed, many of the individual genes are more broadly expressed. This needs further exploration before concluding that the 5iLAF-iPSC have an equivalent zygote-reactivation wave of chromatin remodeling and gene expression. Of the individual genes highlighted in the RNA-seq, and then confirmed by IF, Figure 3 lacks comparison of the 5iLAF-iPSC with the primed iPSC series or with other pluripotent cell lines, so the claim that these genes are hall-marks of the naïve state remains somewhat anecdotal.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for submitting your article "Unique molecular events during reprogramming of human somatic cells to induced pluripotent stem cells (iPSCs) at naïve state" for consideration by eLife. Your article has been re-reviewed by one peer reviewer, and the evaluation has been overseen by a Reviewing Editor and Fiona Watt as the Senior Editor. The following individual involved in review of your submission has agreed to reveal her identity: Christine Wells (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
The revised manuscript and rebuttal address many concerns raised in the first round of review. However, one reviewer who is highly experienced in this field continues to have several important concerns about the analysis and interpretation of the data. These are summarized below and relate to your analysis of the repeat RNA-seq time course, the specificity of expression of key markers used to define cell state and the reliance on one embryo dataset only, and the degree to which the final cell state of the naïve and primed protocols really differ.
We are therefore offering you the opportunity to address these issues. As a matter of policy, we rarely offer more than one round of revision so please take the following concerns carefully as we may not be able to consider any further revisions.
1) The reanalysis, including repeat RNA-seq of the time course, have been superficially addressed. In fact, it is not clear that this repeat data has been incorporated into any of the analyses or figures. The libraries appear genuinely different, sufficiently so that the numbers of DE genes, and scales on plots should be different between the manuscript versions. They are not. One would assume that this would address potential library artefacts association with the earlier version of the manuscript, however no changes in relevant figures have arisen from inclusion of additional data, suggesting that no additional data has been included in the analysis.
Please clarify whether or not the repeat time course has been incorporated into the revised figures.
2) Figure 2D: I'm concerned that genes I observe to be highly variable in any pluripotent cell culture, including 2i-cultures, are being used as hallmarks of naïve vs primed states, especially given the shallow replication of the data generated here, and the single comparison to one primed dataset generated in another lab. In particular, the claims that dox-withdrawal in primed iPSC (Cacchiarelli) lead to loss of naïve markers such as DPP2,3,5, DNMT3L, NODAL etc. exemplifies the superficial nature of the comparison being made in this manuscript. These are genes that are highly variable in hESC and iPSC cultures, regardless of reprogramming method or 2i-culture condition (see for example https://www.stemformatics.org/expressions/result?graphType=box&datasetID=7253&gene=ENSG00000156574&db_id=56 as well as https://www.stemformatics.org/expressions/result?graphType=box&datasetID=7045&gene=ENSG00000187569&db_id=56). There is not sufficient depth of characterization in this manuscript to demonstrate that gain or loss of these markers, in either reprogramming study (present or Cacchiarelli), are indicative of a higher-order difference in primed or naïve state. The authors can verify the variable expression of markers in the HipSc catalogue, and in expression atlases such as GEO, or stemformatics, which include a large number of iPSC reprogramming datasets, including datasets linked above.
The authors have not adequately addressed the question of comparison to late epiblast stages in their correlation analyses – the Yan dataset is small, and Gao et al., seem to be using hESC as their equivalent to a post-implantation stage. So my original request for more appropriate embryo comparisons have not been addressed. The Yan dataset used post-hoc clustering to predict which cells were PE vs EPI vs Polar TE. The correlations are interesting observations but overstated because they are looking at a limited range of embryonic stages.
Please address the issue of variable expression of markers used to define cell states and refer to other studies of human embryo gene expression to validate markers for particular cell states.
3) The behaviour of primed and naive time courses are very similar up to the derivation of the final iPSC line. I'm not convinced that differences in the final iPSC state are not the result of normal line variation.
Please highlight more clearly the basis for claiming that the end stages of these protocols are genuinely different rather than a function of normal variation between cell lines. Is it possible for instance that the differences in the final cell lines reflect long term adaptation to particular culture conditions rather than inherent differences in cell state? If 2 above is answered adequately, you should be able to address this point. We apologise if the significance of point 2 above was not clear enough in the previous review.
It is important to address these concerns by demonstrating (1) that your analysis of the repeat data is adequately reflected in the revision, (2) by showing that the markers used clearly define what they are claimed to (by reference to databases or more recent human embryo data), and (3) to define clearly the differences in the reprogramming endpoints. These revisions do not entail new experimentation and should not consume too much time.https://doi.org/10.7554/eLife.29518.035
- Yixuan Wang
- Yixuan Wang
- Yixuan Wang
- Yixuan Wang
- Yixuan Wang
- Shaorong Gao
- Shaorong Gao
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank our colleagues in the laboratory for their assistance with the experiments and comments on the manuscript. This work was supported by the Ministry of Science and Technology of China (2016YFA0100400, and 2014CB964601), the National Natural Science Foundation of China (NSFC) (31671530, 31325019 and 31471392), and the Science and Technology Commission of Shanghai Municipality (Grant 14YF1403900, 15XD1503500).
Human subjects: Human subjects: Human skin specimens from abortive fetus were obtained from the Clinical and Translational Research Center of Shanghai First Maternity and Infant Hospital, Tongji University to make human embryonic fibroblasts (HEFs). The patients provided informed consent for tissue donations, and the Biological Research Ethics Committee of Tongji University approved the study.
- Martin Pera, Reviewing Editor, University of Melbourne, Australia
© 2018, Wang et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.