An integrative transcriptomic atlas of organogenesis in human embryos
Abstract
Human organogenesis is when severe developmental abnormalities commonly originate. However, understanding this critical embryonic phase has relied upon inference from patient phenotypes and assumptions from in vitro stem cell models and non-human vertebrates. We report an integrated transcriptomic atlas of human organogenesis. By lineage-guided principal components analysis, we uncover novel relatedness of particular developmental genes across different organs and tissues and identified unique transcriptional codes which correctly predicted the cause of many congenital disorders. By inference, our model pinpoints co-enriched genes as new causes of developmental disorders such as cleft palate and congenital heart disease. The data revealed more than 6000 novel transcripts, over 90% of which fulfil criteria as long non-coding RNAs correlated with the protein-coding genome over megabase distances. Taken together, we have uncovered cryptic transcriptional programs used by the human embryo and established a new resource for the molecular understanding of human organogenesis and its associated disorders.
https://doi.org/10.7554/eLife.15657.001eLife digest
Individual organs and tissues form in human embryos during the first two months of pregnancy. Any errors during this crucial stage of human development can result in miscarriage or serious birth defects. Yet remarkably little is known about how this process works. What is known has been inferred from studies of how other animals develop, human stem cells grown in a laboratory, and babies born with genetic conditions that cause developmental problems.
Genes control the way that organs and tissues form, and are switched on or off in complex patterns during development to ensure that particular cells develop into one type of organ and not another. When genes are switched on, their DNA is copied into molecules called RNA. Many RNA molecules are used as templates to make proteins, which then perform critical roles in cell processes. One way to find out which genes are activated during development is to identify which RNAs are made by cells in the embryo.
Here, Gerrard, Berry et al. used a technique called RNA-sequencing to identify the RNAs that human embryos make while their organs and tissues form. The RNA came from many different tissues including the heart, limbs and the roof of the mouth. Gerrard, Berry et al. developed a new computational model that used the identity of the RNAs to decode the precise patterns of gene activity in the tissues. The model correctly identified many genes that were already known to cause developmental problems when faulty, and identified numerous others that are now predicted to cause developmental defects in humans.
Gerrard, Berry et al. also discovered over 6,000 RNAs in the human embryos that are unlikely to code for proteins. These “non-coding” RNAs may have other roles in cells, such as switching off genes, and many of them appear to be specific to human embryos. Together, these findings have uncovered new patterns of gene activity that drive development in human embryos and provide a resource for studying how organs and tissues form. Future challenges are to understand what controls these patterns of gene activity, and how the patterns change over time.
https://doi.org/10.7554/eLife.15657.002Introduction
Embryogenesis encompasses the progression from fertilized zygote to blastocyst and through gastrulation to establish the three germ layers of ectoderm, mesoderm and endoderm, from which all organs and tissues subsequently arise during organogenesis. Remarkably little is known about this latter phase of assembling organs and tissues in human due to the restricted availability of human embryonic tissue and its tiny size. Previous transcriptomics post-implantation have sampled either the whole embryo by expression microarray (Fang et al., 2010), thus lacking organ-specific resolution and the vast majority of long non-coding (lnc) transcription; or included lnc expression by massively parallel short-read RNA sequencing (RNA-seq) but focussed on single sites such as limb bud (Cotney et al., 2013) or pancreas (Cebola et al., 2015). RNA-seq from NIH Roadmap and other studies during or after the end of the first trimester of pregnancy falls after the embryonic period (which ends at 56–58 days post-conception (Carnegie Stage 23)) and commonly reflects near terminal differentiation within heterogeneous fetal organs and tissues (Jaffe et al., 2015; Roadmap Epigenomics Consortium, 2015; Roost et al., 2015). As a consequence of these combined deficiencies, we set about compiling global transcriptomic data during the critical phase of human organogenesis, sampling each germ layer and including sites of mixed origin that are subject to major developmental disorders such as cleft palate and limb abnormalities (Figure 1a).

Profiling the transcriptomes underlying organogenesis in human embryos.
(a) Human embryo showing the 15 tissues and organs subjected to RNA-seq. (b) High dynamic range of human embryonic RNA-seq. The combined dataset (black) included expression of >90% of annotated protein-coding genes (GENCODE18 [Harrow et al., 2012]). (c) Human embryogenesis possesses a distinctive transcriptome. Human embryonic read counts were compared with equivalent fetal datasets from NIH Roadmap (Roadmap Epigenomics Consortium, 2015) using edgeR (Robinson et al., 2010) and a false discovery rate (FDR) of 0.05 (see Materials and methods, Supplementary file 1B). Negative log10 p-values are shown for selected biological process Gene Ontology (GO) terms with significant enrichment in either the embryonic or fetal gene sets following Fisher's exact test applied using the elimination algorithm (Alexa and Rahnenfuhrer, 2010) (Supplementary file 1C contains the full list of enriched terms). (d) Selected sites illustrate the highly specific expression of HOX genes within the human embryo.
Organs and tissues from fifteen human embryonic sites were sequenced in two sets of biological replicates (except pancreas and tongue) to generate 28 strand-specific RNA-seq datasets with 44–90 million uniquely mapped reads per replicate (Figure 1a; Supplementary file 1A, which contains information on embryonic stages). Global transcription rates across all organs and tissues were comparable over a high dynamic range; approximately 70% of protein-coding genes contained 100–10,000 mapped reads (Figure 1b; Supplementary file 2). We assessed whether our human embryo datasets identified earlier developmental processes than currently available fetal data (Roadmap Epigenomics Consortium, 2015). There were three-fold the number of differentially expressed genes in the fetal datasets but equivalent enrichment of gene ontology (GO) terms in the embryo, including many early developmental processes such as morphogenesis of an epithelial bud, anterior/posterior pattern specification and embryonic morphogenesis. These were in contrast to homeostatic processes enriched in the fetal dataset (Figure 1c; Supplementary file 1B–C).
Sampling gene expression across multiple sites allowed us to set about deciphering the precise transcriptomic codes responsible for the development of the different human embryonic organs and tissues. While ZNF and ZSCAN family members were broadly expressed discrete site-specific expression was more apparent for individual members of other transcription factor families (Figure 1—figure supplement 1) exemplified by the HOX gene clusters (Figure 1d). User-defined sets of up to five developmental transcription factors characteristic for a particular organ or tissue displayed very high levels of tissue specificity (Figure 1—figure supplement 2). However, while principal components (PC) analysis (PCA) or clustering grouped biological replicates, relationships between different organs and tissues other than the distinctiveness of brain and liver were not resolved (Figure 1—figure supplements 3–4). Non-negative matrix factorisation (NMF) also allows unbiased clustering of gene expression (Gaujoux and Seoighe, 2010). By setting the parameters such that representative genes were only extracted once, we identified eleven non-overlapping ‘metagenes’ from the complete expression dataset with clear tissue-specific signals for thyroid, liver, RPE, brain, heart and adrenal gland (Figure 1—figure supplement 5; Supplementary file 1D). We hypothesized that these new signals might allow benchmarking to assess the fidelity of in vitro differentiated stem cells, similar to a previous report (Roost et al., 2015). As an exemplar, we chose hepatocyte differentiation for which RNA-seq data are available including positive (primary adult hepatocytes) and negative (human embryonic fibroblasts) control data (Du et al., 2014). Clear enrichment for the stem cell-derived hepatocytes and the primary hepatocytes (but not the fibroblasts) was apparent in metagene 2, the cluster of 39 genes indicative of human embryonic liver. From this starting point, we wanted to move beyond the unique organ-specific signatures to study how patterns of gene expression co-varied across tissues. While relaxing NMF parameters would allow non-exclusive gene selection across metagenes, we also wanted to capture differences in gene expression between organs (e.g. aspects of what is not expressed as a contributing factor to an organ’s identity). Moreover, different embryonic organs are related according to developmental lineage. We reasoned that being able to apply a lineage structure would create natural assemblies of co-regulated genes (Figure 2a). Accordingly, we adapted a method from spatial ecology and phylogenetics (Jombart et al., 2010a, 2010b) to constrain PCA by imposing a hierarchical developmental lineage and termed this approach LgPCA. We also included RNA-seq from undifferentiated human embryonic stem cells (Roadmap Epigenomics Consortium, 2015) to represent pre-gastrulation human biology. Of the total thirty-one principal components (PCs) arising from LgPCA the first fifteen now identified patterns of gene expression across groups of related tissues in addition to unique organ-specific signatures (Figure 2b) while PCs 16–31 sampled heterogeneity within individual organs and tissues (Figure 2—figure supplement 1). In keeping with this transition PCs 1–4 ordered samples reminiscent of very early developmental events: pluripotency (extreme positive loadings in PC1; ‘PC1 high’), early brain formation (extreme negative loadings in PC2; ‘PC2 low’), foregut endoderm (PC4 low) and intermediate mesoderm (PC4 high). PCs 5–15 resolved the individual organs and tissues; for instance low PC5 loadings discriminated liver from the other foregut endoderm derivatives. Heatmaps illustrated the composite or tissue-specific signals emanating from the genes with the most extreme PC loadings which also underlay appropriate developmental gene ontology (GO) terms (Figure 2c–d and Supplementary file 1E–F).

Lineage-guided PCA discovers unique transcriptional signatures regulating human organogenesis.
(a) Interpreting gene expression profiles is dependent upon the underlying developmental lineage. Similar expression profiles in closely related tissues imply fewer regulatory events. (b) Lineage-guided principal components analysis (LgPCA) constrains PCA by imposing a developmental lineage on the different organs and tissues. The first 15 PCs are shown including biological replicates for the human embryonic organs and tissues integrated with human embryonic stem cell data (Roadmap Epigenomics Consortium, 2015). PC scores for the 15 different dimensions are shown in black (positive/high) or white (negative/low) with scale (extremeness) indicated by circle size (sign/direction is arbitrary). Unique transcriptional signatures were resolved for broad organ groupings (e.g. foregut endoderm derivatives, low scores in PC4), single organs or tissues (e.g. palate, high scores in PC13) or across tissues unrelated by germ layer but connected by multisystem congenital disorders (e.g. heart and limb, low scores in PC13). (c) Heatmaps of quantile normalised expression values of the most extreme 50 genes for selected PCs from the LgPCA. (d) Gene Ontology (GO) terms and their underlying genes illustrate the specific signatures from the LgPCA (further examples in Supplementary file 1F).
Identifying the master regulators that differentially orchestrate organogenesis across the body has not previously been possible directly in human embryos. We undertook this in two different ways, both based on studying the 1000 genes with the most extreme loadings in PCs 1–15 that identified gene co-expression patterns across tissues and within individual organs (Figure 2b; Supplementary file 1E). We interrogated these gene sets for regulatory networks based on the enrichment of transcription factor motifs (Janky et al., 2014). Numerous well known master regulators were recovered alongside previously unappreciated factors for either broad tissue groups (e.g. foregut endoderm derivatives) or individual organs (Figure 3a). As proof-of-principle, this also included proven regulators of human pluripotency, NANOG, OCT4 and MYC, at an extreme of PC1. Remarkably, in several instances approaching half of the 1000 genes with the most extreme PC loadings imputed co-regulation by a single transcription factor, such as HNF4A in the liver or SRF in the heart. Alongside NR5A1, the data predicted RUNX and BAD as novel regulators of human adrenal and gonadal development (Figure 3a). As a second approach to study gene regulation, we extracted the transcription factors (typically <5%) from amongst the 1000 most extreme genes in PCs 1–15 (Supplementary file 1G). We searched the Mouse Genome Informatics database (MGI) and in 309/594 instances there was a relevant mouse mutation phenotype supporting the notion that the transcription factors identified by LgPCA are key regulators of human organogenesis. At the lowest extreme of PC5 (liver) the twenty-two transcription factors contained all three of those required for reprogramming fibroblasts directly to hepatocytes (Huang et al., 2014). This suggests novel fate programming roles for transcription factors at the extremes of other PCs (including new potential regulators of pluripotency amongst the sixteen factors containing zinc fingers in PC1). In keeping with these regulatory roles, the extreme PC loadings in the LgPCA data also prioritized those transcription factors responsible for major congenital disorders (Supplementary file 1G). Because LgPCA is not limited to individual organs this included a novel ability to predict multisystem abnormalities such as the combined heart and limb defects of Holt-Oram syndrome (OMIM 142900, TBX5, PC13 low) or the palate and limb abnormalities associated with mutations in TP63 (OMIM 603543, PC3 high).

LgPCA points to master regulators of human organogenesis and the causes of human congenital disorders.
(a) Predicted regulation by iRegulon (Janky et al., 2014) of the most extreme 1000 genes for different PCs identifies known and unexpected transcription factors regulating human organogenesis. In several examples, individual transcription factors (e.g. REST, NR5A1, HNF4A, FOXA1 and SRF) were predicted to regulate nearly half of the most extreme 1000 genes. (b) Transcription factors at the extremes of individual PCs in the LgPCA are responsible for a diverse range of congenital disorders (red names in the ovals for heart and testis; full details in Supplementary file 1G). To validate the utility of these data, we conservatively selected some of the earliest critical regions for these disorders (two ‘Proven’ examples on the left; all 53 listed in Supplementary file 1H). LgPCA frequently isolated the correct transcription factor from an average of 111 genes across >10 Mb, shown for NKX2-5 in congenital heart disease and SOX9 in campomelic dysplasia. Beyond this validation LgPCA similarly predicts causative transcription factors (blue) for many unresolved congenital disorders such as developmental heart abnormalities in Chr1p36 deletion syndrome and sex reversal / disorders of sex differentiation (DSD) (all 13 examples in Supplementary file 1H).
Mutations in genes encoding transcription factors are over-represented causes of congenital disorders, most likely due to their critical function during organogenesis and inadequacy when haploinsufficient. The enrichment of transcription factors with specific disease-associations at the extremes of the LgPCA implicates the co-enriched genes as leading candidates for unanswered clinical syndromes. To test this model we identified some of the earliest chromosomal mapping or patient deletion data for the known disease-associated transcription factors from Supplementary file 1G. 53 disorders were suitable for assessment with an average critical region of 13.7 Mb each containing an average of 111 protein-coding genes (Supplementary file 1H). Strikingly, in 37 instances (73%) LgPCA uniquely selected the correct transcription factor and in 48 instances (91%) narrowed the field down to three or fewer transcription factors. When applied to 13 syndromes (mostly deletion disorders) where the causative gene remains unresolved clear predictions of causality emerge, for instance in cleft palate (DLX5, DLX6, LHX8 and FOXF2) or cerebellar disorders (ZIC1 and ZIC4) (Supplementary file 1H). Frequently, there is an appropriate mutant mouse phenotype such as CASZ1 in cardiac malformations, part of Chr1p36 deletion syndrome, or SOX10 in the 46,XX disorder of sex differentiation (DSD) linked to duplication on Chr22 (Figure 3b and Supplementary file 1H).
Non-coding transcription has emerged as a critical regulator of cell and developmental biology (Goff and Rinn, 2015). A dedicated programme operating during human organogenesis seemed likely as 81 out of the 1571 genes enriched in embryogenesis compared to the fetal datasets were annotated long intergenic non-coding (LINC) transcripts (Supplementary file 1B). To look beyond this we assembled strand-specific transcripts not recognized by current genome annotation [GENCODE 18 (Harrow et al., 2012)] and systematically named them individually according to recommended criteria (Mattick and Rinn, 2015). 6251 unique loci accounted for in excess of 9 Mb of novel polyadenylated transcription from the human genome (Figure 4a and Supplementary file 1I). The vast majority of transcripts fulfilled criteria as lnc RNAs by assessment of coding potential (CPAT score <0.2) (Figure 4b), length over 200 base pairs (bp) and an absence of reads spanning splice junctions to currently annotated genes (Mattick and Rinn, 2015). These lncRNAs were classified as either bidirectional, antisense or overlapping, or by exclusion intergenic, according to orientation and position in relation to the annotated genome (Mattick and Rinn, 2015). Transcripts were most commonly 500–1,500 bp but could extend to over 600 Kb (Figure 4c) and showed high tissue-specificity with the median Tau value (Yanai et al., 2005) of 0.86, much higher than for protein-coding genes (0.63) but consistent with previously annotated non-coding genes (0.89). We investigated the association between this novel human embryonic transcriptome and the annotated genome. Reduced physical distance to expressed annotated genes markedly increased the likelihood of novel transcript co-expression (Figure 4e), although the best correlations were by no means always with the closest gene (Figure 4f–g). The median distance to the closest annotated gene was 7.7 Kb (Figure 4—figure supplement 1) while on average the best correlation was at 188 Kb (random prediction was 476 Kb). Over half (3634) of the lnc transcripts were classified positionally as LINC RNAs. While LINC RNAs can harbour important regulatory function, how to forecast their relationship(s) with the protein-coding genome and prioritize the investigation of thousands of new transcripts is immensely challenging (Goff and Rinn, 2015). As a first step, the multi-tissue nature of our dataset allowed intricate correlative patterns to be deciphered implying putative relationships; for instance over a 2 Mb window and across numerous genes on chromosome 7 between HE-LINC-C7T121 and TBX20, which encodes a developmental cardiac transcription factor mutated in a wide range of congenital heart disease (Figure 4h).

6251 novel transcripts identified during human organogenesis show low coding probability and high tissue-specificity.
(a) Novel transcript models were merged across tissues (n = 9180; Supplementary file 4), assessed for coding potential using CPAT and classified (Mattick and Rinn, 2015) as overlapping (OT), antisense (AS), bidirectional (BI), intergenic noncoding (LINC) and/or transcripts of uncertain coding potential (TUCP, if CPAT >0.2). LINC or TUCP transcripts were numbered sequentially (T number) along each chromosome (C, either X, Y or 1–22) whereas BI, AS and OT transcripts were named by association with the annotated gene (‘Z’). A small proportion of transcripts fulfilled dual criteria as BI/AS/OT and TUCP. 6251 unique, non-overlapping, filtered transcript models were identified (the longest from each locus, >200 bp; Supplementary file 1I). (b) Histogram of coding probability determined using CPAT (Wang et al., 2013). 9% of transcripts were classed as TUCP. The small proportion with clear open reading frames (CPAT score = 1.0) were predominantly OT transcripts. (c) Distribution by size of transcript. 114 transcripts were >10 Kb. (d) Tissue specificity was calculated using Tau (Yanai et al., 2005) based on the mean normalized read counts for each tissue or organ site. 80% of transcripts showed Tau values >0.7 indicating high tissue specificity. Details on exon and read counts, and proximity to surrounding genes are shown in Figure 4—figure supplement 1. (e) Box and whisker plots show the correlation between expression of the novel transcripts and surrounding annotated genes within set chromosomal distances of the novel transcriptional start site. Mean correlation was near zero beyond 1 Mb. (f) Histogram showing the correlation (r) between expression of each novel transcript and its closest annotated gene. One quarter of novel transcripts show a correlation (r > 0.71) with the nearest gene; another quarter shows minimal correlation (r = ±0.14). There was no strong anticorrelation. g-h, Expression of the novel transcript is not always correlated with the immediately adjacent gene, illustrated by heatmaps across the 15 organs and tissues. (g) Expression of the novel transcript, HE-LINC-C6T24, located just over 2 Kb from FOXQ1, correlates strongly with FOXF2, approximately 65 Kb distant. (h) Heatmap demonstrates the poor correlation of expression between HE-LINC-C7T121 and most of the nine genes within 1 Mb on Chr7 but near perfect correlation with TBX20 located ~0.7 Mb away beyond two intervening genes.
Taken together, this study reports the first comprehensive transcriptomic atlas during human organogenesis to complement parallel initiatives from later development and adulthood (Jaffe et al., 2015; Roadmap Epigenomics Consortium, 2015; Roost et al., 2015). Subjecting transcription from many sites to a method of analysis that incorporated developmental lineage deciphered novel genetic signatures, predicted causality in many human developmental disorders and associated novel non-coding transcription with expression from the surrounding protein-coding genome. At present, the data arise from a relatively narrow window of embryonic development but set the stage for future longitudinal studies for individual organs over time. The tiny amounts and scarcity of human embryonic tissue also necessitated aspects of pooling across different Carnegie stages for some sites but it is striking that this had no impact on ascertaining organ and tissue-specific transcriptomic signatures by LgPCA. The integrated data are expected to be particularly valuable to stem cell researchers examining the fidelity of PSC differentiation in vitro or searching for transcription factors for direct reprogramming of chosen cell lineages. Finally, the discovery of a major new programme of non-coding transcription adds a fresh layer of detail on the spatiotemporal regulation of the human genome.
Materials and methods
Human material
Request a detailed protocolHuman embryonic material was collected under ethical approval, informed consent and according to the Codes of Practice of the Human Tissue Authority and staged by the Carnegie classification as described previously (Jennings et al., 2013). This clinical material was collected on site overseen by our research team with immediate transfer to the laboratory. Individually identified tissues and organs (details in Supplementary file 1A) were immediately dissected. The adrenal gland, whole brain, heart, kidney, liver, entire limb buds, lung, stomach, testis, thyroid and anterior two-thirds of the tongue were readily identifiable as discrete organs and tissues. All visible adherent mesenchyme was removed from organs and tissues under a dissecting microscope. For the adrenal gland, this includes the capsule which allowed separation from the kidney. The ureter, emerging from the renal pelvis, was removed separately from the kidney. For the heart, a window of tissue was removed from the lateral wall of the left ventricle. A segment of the liver was dissected from each embryo that avoided the developing gall bladder. The trachea was removed from the lung at its entry point into the lung parenchyma. The stomach was identified between the gastro-oesophageal and pyloric junctions. The testis was dissected free from the attached mesonephros. While the thyroid was readily visualized as a discrete organ in the neck, it unavoidably contained the developing parathyroids and thus this tissue type was referred to throughout as ‘thyroid/PTH’. The palatal shelves were dissected on either side of the midline. The eye was dissected and the RPE peeled off mechanically from its posterior surface with validation possible under the dissecting microscope because the RPE is very darkly pigmented compared to the other ocular structures. All samples were collected into Trizol (Thermofischer) or Tri reagent (Sigma-Aldrich) for total RNA isolation as individual tissue or organ types followed by treatment with DNaseI (Sigma-Aldrich). Once the quality of each RNA sample had been confirmed, samples were pooled in order to obtain sufficient RNA for each biological replicate (Supplementary file 1A). The pancreas dataset derived from the same tissue collection was re-used from a previous study (Cebola et al., 2015).
RNA-seq and transcriptome
Request a detailed protocolQuality and integrity of total RNA samples were assessed using a 2100 Bioanalyzer or a 2200 TapeStation (Agilent Technologies) according to the manufacturer’s instructions. RNA sequencing (RNA-seq) libraries were generated using the TruSeq Stranded mRNA assay (Illumina, Inc.) according to the manufacturer’s protocol. Briefly, total RNA (0.1–4 µg) was used as input material from which polyadenylated mRNA was purified using poly-T, oligo-attached, magnetic beads. The mRNA was then fragmented using divalent cations under elevated temperature and then reverse transcribed into first strand cDNA using random primers. Second strand cDNA was then synthesized using DNA Polymerase I and RNase H. Following a single 'A' base addition, adapters were ligated to the cDNA fragments, and the products purified and enriched by PCR to create the final cDNA library. Adapter indices were used to multiplex libraries, which were pooled prior to cluster generation using a cBot instrument. The loaded flow-cell was then sequenced (paired-end; 101 + 101 cycles, plus indices) on an Illumina HiSeq2000 or HiSeq2500 instrument. Demultiplexing of the output data (allowing one mismatch) and BCL-to-Fastq conversion was performed with CASAVA 1.8.3. The RNA-seq was conducted in three batches at different times as a necessity of how human embryonic tissue was collected over time (Supplementary file 1A). Where organs were sequenced across batches (palate, RPE, kidney, testis, adrenal gland, heart / left ventricle and liver) biological replicates clustered together (Figure 1—figure supplement 4).
RNA-seq reads from the Illumina platform were mapped to the human genome (hg19) strand-specifically using TopHat 2.0.9 (Trapnell et al., 2012) and the GENCODE 18 gene annotation set (Harrow et al., 2012). We also remapped the published pancreas RNA-seq dataset (Cebola et al., 2015) obtained from material isolated previously in our laboratory. Additionally, a dataset of hepatocyte differentiation RNA-seq (Du et al., 2014 GEO: GSE54066) was downloaded, re-mapped and quantified as per our own data. Commonly applied RNA-seq normalisation methods such as TMM assume a small proportion of differentially expressed genes in any one dataset (Dillies et al., 2013). Because the highly distinct tissues surveyed here differed strongly on the scale of thousands of genes (for instance liver versus brain) we used quantile normalisation which gave a lower median coefficient of variation than either no or TMM normalization. Read counts from the different datasets were quantile normalized using the R package preprocessCore (Bolstad, 2007). Tissue-specificity was scored per gene using Tau (Yanai et al., 2005) on normalized read counts across all samples. Initial genome-wide relationships were assessed using PCA (Figure 1—figure supplement 3) and hierarchical clustering (heatmap, Figure 1—figure supplement 4).
To compare our samples with RNA-seq from the NIH Roadmap project (Roadmap Epigenomics Consortium, 2015) uniquely mapped strand-specific RNA-seq reads were counted into a set of non-redundant exon annotations (custom made from GENCODE 18 annotations) using bedtools intersect (Quinlan and Hall, 2010). Exon level counts were then summed into a single total per gene per sample. Counts were quantile normalized across samples. NIH roadmap samples (Roadmap Epigenomics Consortium, 2015) used in this study are listed in Supplementary file 1J. For the analysis of human embryonic RNA-seq with comparable Roadmap fetal data (adrenal gland, heart, kidney, lung, limbs, stomach and testis) a single pairwise differential expression test was undertaken using the R package edgeR (Robinson et al., 2010) and an FDR < 0.05.
NMF
Request a detailed protocolNon-negative matrix factorisation (NMF) searches complex expression data, comprising thousands of genes, for a small number of characteristic ‘metagenes’ (Gaujoux and Seoighe, 2010). NMF was performed using the nmf R package (version NMF_0.20.5) (Gaujoux and Seoighe, 2010) to extract tissue-specific metagenes. Non-normalised read counts were filtered to remove all Y-linked genes, the X-inactivation gene XIST and genes with fewer than 100 reads across all samples. Initially 50 runs each of ranks 11–18 and using the default ‘Brunet’ algorithm (Brunet et al., 2004) were performed to find an optimal factorisation ‘rank’ (r). The maximal cophenetic distance was used to select the value of r. Subsequently, 200 runs using the optimal rank were performed to assess consistency of sample groupings between runs. Non-overlapping (i.e. tissue-specific) gene sets were extracted from each metagene by filtering on basis contribution >0.8.
LgPCA
Request a detailed protocolThe LgPCA approach was adapted from established phylogenetic PCA methodology (Jombart et al., 2010b) and performed using quantile-normalized, gene-level read counts, a high memory (512 Gb) compute node and the ppca function from the adephylo R package (Jombart et al., 2010a). A broad user-defined guide tree (Figure 2b) based on well-established knowledge of mammalian gastrulation and downstream lineage relationships was imposed on the different organ and tissue types following which the adephylo R package weighted the principal components by the lineage auto-correlation between samples; increased if related samples were similar and lessened if related samples were more different. As in the description from Jombart and colleagues the resulting components represented ‘global’ structures (where similarity is high between related samples) and ‘local’ structures (where related samples are dissimilar) (Jombart et al., 2010b). We used the LgPCA to extract all the global patterns from the data (PCs 1–15). These patterns were not apparent if lineage relationships were not included nor were they altered if any one tissue, such as palate, was altered within the broad lineage structure (data not shown). The global patterns in PCs 1–15 infer co-regulatory patterns of gene expression across human organogenesis. The ‘local’ patterns thereafter captured heterogeneity between tissue replicates (Figure 2—figure supplement 1) (while PC7 separated the two PSC populations these RNA-seq datasets represent separate cell lines from NIH Roadmap). We used the Abouheif distance as implemented in adephylo (Jombart et al., 2010a), which takes into account the topology of the specified tree but does not use branch lengths.
Gene set enrichment
Request a detailed protocolFor the comparison of the embryonic versus fetal datasets Gene Ontology term enrichment was performed on upregulated genes (FDR < 0.05) using Fisher’s exact test with the elimination algorithm of the R package topGO (Alexa and Rahnenfuhrer, 2010). For the LgPCA, annotated ontology nodes (>10 genes) were tested for each loadings vector for each PC against background using the Wilcoxon test. Tests were performed sequentially moving up the separate GO ontologies (Biological Process (BP), Molecular Function (MF) and Cellular Component (CC)), excluding significant scoring genes from later tests (the topGO ‘elim’ method).
iRegulon analysis of regulation in the extremes of the LgPCA
Request a detailed protocoliRegulon is a computational method which tests for enrichment amongst precomputed motif datasets to decipher transcriptional regulatory networks in a set of co-expressed genes. The 1000 genes with the most extreme loadings at either end of each PC (‘high’ and ‘low’) from the LgPCA were loaded into Cytoscape (version 3.2.1) (Shannon et al., 2003) and used as queries to the iRegulon plug-in (version 1.3, build 1024) (Janky et al., 2014). 20 Kb was examined centred on the transcriptional start site (TSS) under default settings.
Novel transcripts
Request a detailed protocolSample-specific transcriptomes were assembled with Cufflinks (version 2.2.0) (Trapnell et al., 2010). Transcriptomes were combined (‘cuffmerge’; -min-isoform-fraction = 0.1) and compared with the original GENCODE 18 reference (‘cuffcompare’). We filtered out known transcripts using the ‘Transfrag class codes’ (http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/#transfrag-class-codes) to retain only wholly intronic (‘i’, of which there were none), unknown (‘u’), antisense (x) and overlapping (‘o’) transcripts. We discarded all other classes including pre-mRNA (class ‘e’), novel-isoforms spliced to known exons (class ‘j’), and 3’ run-ons within 2 kb of the end of the transcript annotation (class ‘p’). In addition, some remaining non-spliced transcripts may theoretically represent first or last exon (UTR) extensions; to delimit these, we calculated the distance on the same strand to the closest downstream transcription start site (to consider potential 5’ UTR extension) and upstream transcription termination site (to consider potential 3’ UTR extension). Names were assigned to these novel transcripts following suggested criteria (Mattick and Rinn, 2015) (Figure 4a) with the sole adaptation that bidirectional (BI) transcripts were defined as having their TSS within 1 Kb of the TSS of the associated annotated gene. No transcripts mapped to the same strand within the introns of any annotated gene excluding the possibility of unspliced transcripts from annotated genes being erroneously defined as novel transcripts. All transcript sequences (annotated and unannotated) were scored for protein-coding potential using CPAT (based on human training data included with CPAT) (Wang et al., 2013). A threshold of >0.2 was used to define ‘Transcripts of Uncertain Coding Potential’ (TUCP). Where there were multiple transcripts from a single locus, the longest transcript was retained in assembling the final dataset of 6251 novel transcripts. Transcript level read counts for the embryonic samples and NIH Roadmap samples (Supplementary file 1J) were generated for the merged transcriptome using bedtools multicov (vers 2.21.0) (Quinlan and Hall, 2010). The correlations between each of the 6251 transcripts and all annotated genes within 1 Mb were calculated using only the human embryonic data from this study.
Data availability
Request a detailed protocolMapping coordinates against multiple genome versions using a range of common pipelines and summary count data are hosted at www.manchester.ac.uk/human-developmental-biology. To view data in the UCSC genome browser, a trackhub is available: http://www.humandevelopmentalbiology.manchester.ac.uk/data/hub_manchester_hdb/hub.txt.
Data availability
-
An integrative transcriptomic atlas of organogenesis in human embryosPublicly available at EBI ArrayExpress (accession no: E-MTAB-3928).
-
TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitorsPublicly available at EBI ArrayExpress (accession no: E-MTAB-3061).
References
-
TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitorsNature Cell Biology 17:615–626.https://doi.org/10.1038/ncb3160
-
A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysisBriefings in Bioinformatics 14:671–683.https://doi.org/10.1093/bib/bbs046
-
Transcriptome analysis of early organogenesis in human embryosDevelopmental Cell 19:174–184.https://doi.org/10.1016/j.devcel.2010.06.014
-
A flexible R package for nonnegative matrix factorizationBMC Bioinformatics 11:367.https://doi.org/10.1186/1471-2105-11-367
-
GENCODE: the reference human genome annotation for The ENCODE projectGenome Research 22:1760–1774.https://doi.org/10.1101/gr.135350.111
-
iRegulon: from a gene list to a gene regulatory network using large motif and track collectionsPLoS Computational Biology 10:e1003731.https://doi.org/10.1371/journal.pcbi.1003731
-
Putting phylogeny into the analysis of biological traits: a methodological approachJournal of Theoretical Biology 264:693–701.https://doi.org/10.1016/j.jtbi.2010.03.038
-
Discovery and annotation of long noncoding RNAsNature Structural & Molecular Biology 22:5–7.https://doi.org/10.1038/nsmb.2942
Decision letter
-
Janet RossantReviewing Editor; University of Toronto, Canada
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "An integrative transcriptomic atlas of organogenesis in human embryos" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Janet Rossant as the Senior Editor and Reviewing Editor. One of the three reviewers has agreed to reveal their identity: Majlinda Lako.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. Given the nature of this contribution, we would like to consider it in the category of a Tools and Resources paper.
This paper provides transcriptional profiles from different tissues during early human embryogenesis and as such represents a significant and novel contribution. To date most of our knowledge on human embryogenesis is based on extrapolation from model animals. These new data will provide a resource to the community enabling more understanding of tissue differentiation and organogenesis. It will also help with identification of new disease genes and understanding developmental defects.
Essential revisions:
1) There are some important issues raised by two of the reviewers with regard to the statistical analysis of the data. These concerns are provided in detail below and must be addressed in the revised manuscript.
2) The linkage of the transcriptional data to developmental disorders is not well validated. There is considerable mouse developmental and phenotype data available in databases that could be mined to enhance the value of your findings and help determine whether the genes identified are truly key regulators of tissue development.
3) The claim that novel transcripts are likely lncRNAs was questioned and needs to be analyzed more carefully.
Statistical issues:
There are concerns of the methodology that would require further clarification:
For the NMF, while the implementation of the methods is likely to be appropriate, there may be a potential flaw in the interpretation of what the analysis may achieved. Regarding "non-overlapping" metagenes, by default, genes can be represented in multiple metagenes, based on the Brunet algorithm in the NMF R package (also the default algorithm). The Methods state that this algorithm has been used, so following on from that, the assertion that "non-overlapping metagenes" could be extracted from the complete dataset and the "potential of coordinated deployment of overlapping genes" is ignored may be incorrect. Furthermore, the critique that "NMF failed to discriminate transcriptional signatures for a number or organs or tissues, or discern the relationships between them… " may also apply to the outcome of lgPCA. Therefore, this is not a compelling argument for discarding the NMF.
The analysis could have been taken further to examine the enrichment of all the metagenes, similarly to the one for liver (Figure 1—figure supplement 3B) for all the specific tissue metagenes (“clear tissue-specific signals for thyroid, liver, RPE, brain, heart and adrenal gland”), and a similar approach could be taken for the downstream functional analysis of lgPCA.
For lgPCA, while the interpretation of the results would be correct, there is confusion in the nomenclature of loading and scores, which are fundamental concepts in PCA. PC scores refer to the eigenvalues (magnitude of variance) which determine the separation between the samples (as can be gleaned from Figure 2B). PC loadings are eigenvectors and describe the contribution of the variables (in this case genes) that causes the separation of the samples. Therefore, PC scores should rightly be PCA loadings (main text, third and fourth paragraphs). This confusion should be attended to. Additionally, it would be helpful to mention at least once in the main text that when referring to "PC1 low" for example, this is meant to be low/negative PC1 scores.
Both NMF and lgPCA are doing similar things, i.e. clustering samples based on gene expression to find which genes drive sample separation. The differences are that NMF can reveal the similarity between gene expression patterns for samples, which are clustered with the same metagenes, in this case kidney and testis to metagene 3, (Figure 2—figure supplement 3B). LgPCA can discern the differences in gene expression between certain samples (for example Brain and liver in PC2 of Figure 2B). If it was intended to include the NMF analysis in this study, a point should be made as the rationale of which samples are to be clustered together and why, before the results may be compared to those from the lgPCA. Alternatively, the NMF analysis can be omitted since the lgPCA method is sufficient to accomplish the data analysis.
The analysis presented in Figure 1C compares the authors' data with that presented in the fetal datasets from the NIH Roadmap. The authors claim substantial up-regulation of sets of genes in their dataset but this is not quantified statistically (instead an arbitrary cut-off of >2-fold enrichment was employed). While there is relatively little data from which to perform a rigorous analysis it should be possible to use appropriate models (gaining power across genes such as GSEA) to find enriched categories in a more robust manner than presented here (i.e., this would allow appropriate correction for multiple testing).
https://doi.org/10.7554/eLife.15657.023Author response
Essential revisions:
1) There are some important issues raised by two of the reviewers with regard to the statistical analysis of the data. These concerns are provided in detail below and must be addressed in the revised manuscript.
Thank you. We have addressed these issues in turn in the relevant later section.
2) The linkage of the transcriptional data to developmental disorders is not well validated. There is considerable mouse developmental and phenotype data available in databases that could be mined to enhance the value of your findings and help determine whether the genes identified are truly key regulators of tissue development.
We agree that this is important. The signposting in the original submission was weak. We have addressed this by re-writing the main text to reference clearly Supplementary file 1G (was F), which details 594 transcription factors at the extremes of principal components in the LgPCA. For each of these genes we mined the OMIM database for human developmental disorders (Column G of Supplementary file 1G). To link to mouse developmental biology we mined the Mouse Genome Informatics (MGI) database for each of the 594 genes providing each MGI ID in Column I and consistent mouse phenotypic features (identifiable in 309 instances) in Column J. In addition, each gene was mined in Pubmed; where additional information from mouse models was particularly insightful over and above the MGI data PMIDs are given in Column K. These data reinforce the message that our identified transcription factors are indeed developmental regulators. In addition, it is relevant that the LgPCA led to consistent GO terms (e.g. Figure 2D, ‘metanephros development’ for the embryonic kidney signature). These GO terms and the genes underlying them relate to regulatory roles discovered during mouse development.
3) The claim that novel transcripts are likely lncRNAs was questioned and needs to be analyzed more carefully.
We have debated this fast-moving area amongst ourselves throughout the course of the study. We considered it best to draw on up-to-date criteria and used the commentary by Mattick & Rinn in Nature Structural & Molecular Biology last year.
An important point is scale: we have discovered a huge amount of unannotated transcription and catalogued it as systematically as possible. This approach and publicising its existence is important because its scope (>6,000 transcripts) is beyond the capacity for functional investigation by any one lab. Thus, we view our job as to make the data open access for others to interrogate. This point was made by Mattick & Rinn: ‘Although these transcripts may be collectively referred to as long ‘noncoding’ RNAs (i.e. lncRNAs), this is a necessarily vague catch all term that is useful only as a moniker until they are better characterized’. We recognise, agree with and have adopted this initial starting point of using the ‘catch all term’, lncRNA. In addition, we have gone to great length to build on their suggested nomenclature (Figure 4A and Supplementary file 1I) rather than pursue the easier option of simple ‘in-house’ numbering (prevalent in other manuscripts) which lacks reference to genomic structure and location and is less helpful to the community (‘Given the enormous numbers of long apparently noncoding transcripts, it is important to have a logical and flexible structure for cataloguing and parsing newly found transcripts, in a way that will foster rather than impede their functional and mechanistic exploration.’).
With this in mind: ‘The broad term lncRNA refers to a transcript >200 nt in length that does not appear to contain a protein-coding sequence.’ The 200bp threshold is regarded as a practical lower limit for standard RNA-seq thus excluding smaller RNAs, such as tRNAs, snRNAs and MIRs. We assessed coding potential using CPAT (Figure 4B; Wang et al., NAR, cited in the manuscript), a method recommended by Mattick & Rinn. Our cut-off was <0.2, stricter than the 0.364 limit (determined by two-graph user operator characteristic) from the CPAT User Guide to increase our confidence that these transcripts are truly lncRNAs. Of note, over 90% of our novel transcripts had this very low coding potential.
We recognize that histone modifications can be useful for increasing the confidence with which transcriptional start sites of lncRNAs can be ascribed. We have started these studies (e.g. H3K4me3 ChIP-seq) for a number of organs as a part of a separate analysis of genome regulation (e.g. organ-specific enhancer usage during human embryogenesis). While too preliminary for inclusion here and well beyond the 2-month requested turnaround for this revision, it is reassuring that our early data on H3K4me3 marks do indeed correlate with the start of our lncRNAs.
Taken together, we feel that ‘lncRNA’ is the best descriptor for our novel transcripts filtered for length >200 bp and CPAT scores <0.2.
Statistical issues:
We note that the end-point of the Referees’ discussion on the NMF data is that we could simply remove it (which is a very helpful option—thank you!). We would like to reserve that opportunity; however, for now, we would like to try and preserve the NMF subject to the clarifications below for the following reasons. In previous advice we were encouraged to include an alternative approach that specifically targets organ-specific signatures. This seems reasonable. In addition, we feel that our manuscript will interest a broad audience not immediately familiar with the wide range of possible computational analyses. Therefore, it seems fair and reasonable to publicise the potential of NMF alongside our own approach of LgPCA. Clearly, if our clarifications and revisions are not convincing, the NMF analysis can be removed.
There are concerns of the methodology that would require further clarification:
For the NMF, while the implementation of the methods is likely to be appropriate, there may be a potential flaw in the interpretation of what the analysis may achieved. Regarding "non-overlapping" metagenes, by default, genes can be represented in multiple metagenes, based on the Brunet algorithm in the NMF R package (also the default algorithm). The Methods state that this algorithm has been used, so following on from that, the assertion that "non-overlapping metagenes" could be extracted from the complete dataset and the "potential of coordinated deployment of overlapping genes" is ignored may be incorrect.
Thank you for this point. Our original submission was not clear, some of the text needed moderation and we failed to reference Supplementary file 1D. We applied the algorithm specifically to decipher tissue-defining gene-sets. Therefore, we used only those genes with ‘basis contribution’ >80% across metagenes meaning that each of the genes featured in only one metagene. This is depicted in Figure 1—figure supplement 5 and now referenced properly as Supplementary file 1D. In not describing this adequately, we gave the false perception that this was the limit of the NMF approach when, as the reviewers correctly highlight, we could have reduced the threshold to allow overlapping sets of genes. We have amended to text to discuss this point and clarify our rationale.
Furthermore, the critique that "NMF failed to discriminate transcriptional signatures for a number or organs or tissues, or discern the relationships between them… " may also apply to the outcome of lgPCA. Therefore, this is not a compelling argument for discarding the NMF.
The original text was poorly chosen and we have tempered this section and the final paragraph. As a general aside, we feel that signatures were more comprehensively discerned in the LgPCA, in part by the opportunity to study profiles across PCs (e.g. adrenal gland across PC8 and PC9 as discussed in the text and as in Supplementary file 1F–G). Moreover, we wanted to explore a methodology that permitted a lineage to be imparted upon the global dataset to model human embryogenesis. We have tried to get these points across in revised text.
The analysis could have been taken further to examine the enrichment of all the metagenes, similarly to the one for liver (Figure 1—figure supplement 3B) for all the specific tissue metagenes (“clear tissue-specific signals for thyroid, liver, RPE, brain, heart and adrenal gland”), and a similar approach could be taken for the downstream functional analysis of lgPCA.
The inclusion of this figure panel and these data was in response to a helpful suggestion from a stem cell expert who commented on the importance of our data in allowing researchers to benchmark in vitro differentiated PSCs. In the example shown, we have taken the NMF metagenes (Figure 1—figure supplement 5A) and tested for gene enrichment in data from Du et al. (2014, Cell Stem Cell) for human PSCs differentiated to hepatocytes with primary human adult hepatocytes (positive control) and human embryonic fibroblasts (negative control). The hepatocytes (both native and PSC-derived) from Du et al. showed appropriate signals in Metagene 2 (liver) but not the other metagenes. (It is interesting to note some signal for metagene 1 (thyroid – another anterior endoderm derivative) in the in vitro PSC differentiated hepatocyte-like cells.) This analysis required robust differentiation protocols and suitable RNA-seq data from differentiated PSCs; hence why we chose the exemplar of liver were data are available from Du and colleagues and hence why we feel it would be premature to extend the analyses to other NMF metagene or LgPCA organ signatures (lack of in vitro protocols/RNA-seq data).
For lgPCA, while the interpretation of the results would be correct, there is confusion in the nomenclature of loading and scores, which are fundamental concepts in PCA. PC scores refer to the eigenvalues (magnitude of variance) which determine the separation between the samples (as can be gleaned from Figure 2B). PC loadings are eigenvectors and describe the contribution of the variables (in this case genes) that causes the separation of the samples. Therefore, PC scores should rightly be PCA loadings (main text, third and fourth paragraphs). This confusion should be attended to.
Thank you. We are very grateful for these important points. We have corrected this oversight.
Additionally, it would be helpful to mention at least once in the main text that when referring to "PC1 low" for example, this is meant to be low/negative PC1 scores.
We have added additional text.
Both NMF and lgPCA are doing similar things, i.e. clustering samples based on gene expression to find which genes drive sample separation. The differences are that NMF can reveal the similarity between gene expression patterns for samples, which are clustered with the same metagenes, in this case kidney and testis to metagene 3, (Figure 2—figure supplement 3B). LgPCA can discern the differences in gene expression between certain samples (for example Brain and liver in PC2 of Figure 2B). If it was intended to include the NMF analysis in this study, a point should be made as the rationale of which samples are to be clustered together and why, before the results may be compared to those from the lgPCA. Alternatively, the NMF analysis can be omitted since the lgPCA method is sufficient to accomplish the data analysis.
We have adapted the text. NMF can indeed generate metagenes across tissues (e.g. kidney and testis as pointed out) as can LgPCA (e.g. limbs and palate in PC3). As commented on, LgPCA can also discern differences in gene co-expression between organs (e.g. tongue and heart in PC15). To note, NMF does not start from a position of pre-determined sample clustering. Hoping to resolve this issue and recognising that the original text was unhelpful on NMF versus LgPCA, we have made revisions. As above, if these measures are inadequate, we can always remove the NMF.
The analysis presented in Figure 1C compares the authors' data with that presented in the fetal datasets from the NIH Roadmap. The authors claim substantial up-regulation of sets of genes in their dataset but this is not quantified statistically (instead an arbitrary cut-off of >2-fold enrichment was employed). While there is relatively little data from which to perform a rigorous analysis it should be possible to use appropriate models (gaining power across genes such as GSEA) to find enriched categories in a more robust manner than presented here (i.e., this would allow appropriate correction for multiple testing).
Thank you for this valid point. We are grateful for the Referees noting the relative paucity of data for this analysis. Nevertheless, we feel that it is a useful early component of the paper to show that there are indeed differences between human embryonic datasets and those from later development from NIH Roadmap. We have undertaken alternative analyses based on statistical cut-offs and with methodology that facilitates multiple testing. These data are now included in the manuscript in place of the previous Figure 1C.
https://doi.org/10.7554/eLife.15657.024Article and author information
Author details
Funding
Wellcome Trust (WT088566MA)
- Dave T Gerrard
- Andrew A Berry
- Neil A Hanley
British Council (MR/J003352/1)
- Karen Piper Hanley
Medical Research Council (MR/L009986/1)
- Nicoletta Bobola
- Neil A Hanley
Wellcome Trust (WT097820MF)
- Neil A Hanley
British Council (14BX15NHBG)
- Neil A Hanley
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We are very grateful to all women who consented to take part in our research programme and for the assistance of research nurses and clinical colleagues at Central Manchester University Hospitals NHS Foundation Trust. We thank Ian Donaldson, Peter Briggs and Andy Hayes of the Bioinformatics and Genomic Technologies Core Facilities at the University of Manchester for assistance with RNA-sequencing. REJ is a UK Medical Research Council (MRC) clinical research training fellow. NAH is a Wellcome Trust senior fellow in clinical science (WT088566MA). This project received support from the Wellcome Trust (WT097820) with additional support from MRC project grants MR/L009986/1 to NB and NAH, the British Council (14BX15NHBG) to NAH and MR/J003352/1 to KPH.
Ethics
Human subjects: Human embryonic material was collected under ethical approval, informed consent and according to the Codes of Practice of the Human Tissue Authority (protocols number 13/NW/0205).
Reviewing Editor
- Janet Rossant, University of Toronto, Canada
Version history
- Received: February 29, 2016
- Accepted: July 18, 2016
- Version of Record published: August 24, 2016 (version 1)
Copyright
© 2016, Gerrard et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 8,133
- Page views
-
- 1,238
- Downloads
-
- 47
- Citations
Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Developmental Biology
- Neuroscience
The hippocampus executes crucial functions from declarative memory to adaptive behaviors associated with cognition and emotion. However, the mechanisms of how morphogenesis and functions along the hippocampal dorsoventral axis are differentiated and integrated are still largely unclear. Here, we show that Nr2f1 and Nr2f2 genes are distinctively expressed in the dorsal and ventral hippocampus, respectively. The loss of Nr2f2 results in ectopic CA1/CA3 domains in the ventral hippocampus. The deficiency of Nr2f1 leads to the failed specification of dorsal CA1, among which there are place cells. The deletion of both Nr2f genes causes almost agenesis of the hippocampus with abnormalities of trisynaptic circuit and adult neurogenesis. Moreover, Nr2f1/2 may cooperate to guarantee appropriate morphogenesis and function of the hippocampus by regulating the Lhx5-Lhx2 axis. Our findings revealed a novel mechanism that Nr2f1 and Nr2f2 converge to govern the differentiation and integration of distinct characteristics of the hippocampus in mice.
-
- Developmental Biology
- Evolutionary Biology
Gene expression has been employed for homologizing body regions across bilateria. The molecular comparison of vertebrate and fly brains has led to a number of disputed homology hypotheses. Data from the fly Drosophila melanogaster have recently been complemented by extensive data from the red flour beetle Tribolium castaneum with its more insect-typical development. In this review, we revisit the molecular mapping of the neuroectoderm of insects and vertebrates to reconsider homology hypotheses. We claim that the protocerebrum is non-segmental and homologous to the vertebrate fore- and midbrain. The boundary between antennal and ocular regions correspond to the vertebrate mid-hindbrain boundary while the deutocerebrum represents the anterior-most ganglion with serial homology to the trunk. The insect head placode is shares common embryonic origin with the vertebrate adenohypophyseal placode. Intriguingly, vertebrate eyes develop from a different region compared to the insect compound eyes calling organ homology into question. Finally, we suggest a molecular re-definition of the classic concepts of archi- and prosocerebrum.