1. Developmental Biology and Stem Cells
  2. Genomics and Evolutionary Biology
Download icon

Constraint and divergence of global gene expression in the mammalian embryo

  1. Noah Spies
  2. Cheryl L Smith
  3. Jesse M Rodriguez
  4. Julie C Baker
  5. Serafim Batzoglou
  6. Arend Sidow Is a corresponding author
  1. Stanford University School of Medicine, United States
  2. Stanford University, United States
Research Article
Cited
2
Views
1,883
Comments
0
Cite as: eLife 2015;4:e05538 doi: 10.7554/eLife.05538

Abstract

The effects of genetic variation on gene regulation in the developing mammalian embryo remain largely unexplored. To globally quantify these effects, we crossed two divergent mouse strains and asked how genotype of the mother or of the embryo drives gene expression phenotype genomewide. Embryonic expression of 331 genes depends on the genotype of the mother. Embryonic genotype controls allele-specific expression of 1594 genes and a highly overlapping set of cis-expression quantitative trait loci (eQTL). A marked paucity of trans-eQTL suggests that the widespread expression differences do not propagate through the embryonic gene regulatory network. The cis-eQTL genes exhibit lower-than-average evolutionary conservation and are depleted for developmental regulators, consistent with purifying selection acting on expression phenotype of pattern formation genes. The widespread effect of maternal and embryonic genotype in conjunction with the purifying selection we uncovered suggests that embryogenesis is an important and understudied reservoir of phenotypic variation.

https://doi.org/10.7554/eLife.05538.001

eLife digest

The way that the embryo of a mammal, such as a mouse or a human, develops from a fertilized egg is a complicated process that relies on controlling: which genes are active; when these genes activate; and for how long they are active. In broad terms, there are four ways that this control can be achieved:

First, inside the sperm or egg, genes can be marked with small chemical tags that flag these genes to be activated (or remain inactive) after fertilization, depending on whether the modification was made by the father (in the sperm) or the mother (in the egg); this process is known as ‘imprinting’. Second, the mother can alter the gene activity in her offspring via the placenta; this process is known as ‘maternal effect’. Third, instructions encoded within the embryo's DNA can directly control if, and when, a nearby gene becomes activated; this is known as ‘cis-regulation’. Finally, similar instructions can also control genes that are situated elsewhere in the embryo's DNA through indirect mechanisms; this is known as ‘trans-regulation’.

Now, Spies, Smith et al. have investigated these four processes in the offspring of two different strains of mice, one originally from Europe and the other from Southeast Asia. The two strains were crossbred and the resulting embryos were analyzed to see which of the four processes affected gene activity. This analysis revealed 31 imprinted genes and 331 genes that exhibited a maternal effect—which sometimes changed gene activity by as much as 52%. Spies, Smith et al. also found over a thousand DNA instructions in the embryo's DNA that could directly influence the activity of nearby genes, but fewer instructions that could indirectly control genes that were further away.

These results suggest that direct control of genes, which affects only the genes closest to the DNA instruction, can vary a lot between individual embryos of the same species. However, indirect control of embryonically active genes, which affects many genes across the genome at the same time, appears much more tightly constrained by evolutionary forces. Which genes in the mother are responsible for the molecular signals that drive the maternal effect is an important question for future work, with implications for the genetic basis of embryonic development and disease.

https://doi.org/10.7554/eLife.05538.002

Introduction

Multicellular animals develop via complex pattern formation processes that unfold during embryogenesis, with gene regulation playing a central role. Surprisingly, the impact of genetic variation on gene regulation and the relative importance of the genetic mechanisms underlying regulatory variation in mammalian embryogenesis remain undefined. There are four broad categories of genetic mechanisms of gene expression regulation: imprinting, parental effects, cis-regulation, and trans-regulation. All of these are potentially subject to natural selection and provide genetic windows into the forces that shape gene regulatory evolution. Here, we quantify their impact and relevance in mammalian embryogenesis by asking which of these genetic mechanisms reveal evolutionary divergence or constraint since the last common ancestor of two distantly related mouse strains, C57BL/6 (B6) and Cast/Ei (Cast).

Genomic imprinting is a well-studied form of parent-of-origin-dependent gene expression that produces strongly biased allelic expression patterns in offspring according to the parental source of the inherited allele. The progeny's imprinted gene expression patterns result from epigenetic modifications put in place in the previous generation and reflect the particular genetic architecture at these specialized loci. A recent genome-wide survey estimated that fewer than 200 genes are imprinted in mice and humans (Babak et al., 2008). Imprinting is thought to play a critical role in early embryonic and placental development in mammals and has been shown to influence post-natal health and behavior [Reviewed in Peters (2014)].

Related to but distinct from imprinting is the concept of parental effects. Parental effects produce phenotypic variation that depends, directly or indirectly, on the genotype of either the mother or father, rather than that of the offspring. Therefore, the resulting phenotypic patterns lag a generation behind the genetic transmission of the causal variants. The most well-studied parental genetic effects are caused by deposition of maternal transcripts into the egg prior to fertilization, resulting in differences in early embryonic development depending on the genotype of the mother. Certain genes have also been shown to respond to maternal influence after birth through genetically defined maternal behaviors (Weaver et al., 2004). Because of the small contribution, through the sperm, of the paternal transcriptome to the fertilized zygote, and because of the stronger maternal contribution to child rearing in most model organisms, parental effects are typically thought of as synonymous with maternal effects, although true paternal effects are known to exist (Rando, 2012).

Maternal effects have been shown to be important during embryonic development, leading to differences in the birth weight of mice depending on the genotype of the mother (Cowley et al., 1989; Wolf et al., 2011). However, neither the causal molecular mechanism for these effects in the mother nor the responding genes in the embryo have been identified and few studies to date have discriminated between prenatal and postnatal effects. Maternal effects likely contribute to the heritability of complex traits but have been confounded with family structure in mouse as well as human studies (Mott et al., 2014).

Genetic variants affecting gene expression are classified into cis or trans, depending on whether they act on the gene copy on the same chromatid (cis) or on both chromatids equally (trans). Regulatory variants near their target genes usually act in cis, by modifying the activity of a promoter or enhancer, but very occasionally act in trans if they affect a gene product that regulates a nearby locus. One approach to quantifying cis and trans regulation in model organisms involves assaying gene expression in two parental lines compared to allele-specific expression (ASE) in the hybrids (Wittkopp et al., 2004) and asking whether alleles are regulated differently in the parentals compared to the hybrids. In adult flies and mice, these analyses have demonstrated that cis and trans regulatory variants frequently target the same genes, often with opposite, compensatory effects on the target's expression (McManus et al., 2010; Goncalves et al., 2012). Recent genome-wide studies of ASE patterns across many individuals have demonstrated the ability of high-throughput sequencing methods to directly identify genes with cis-regulatory variants (Montgomery et al., 2010; Pickrell et al., 2010).

Another method to quantify cis- and trans-regulatory variation is analysis of expression quantitative trait loci (eQTL) (Schadt et al., 2003). eQTL are identified as genetic loci whose genotypes correlate with gene expression changes across a number of genetically heterogeneous individuals. Genome-wide eQTL studies have demonstrated that the strongest genetic variation in expression regulation generally occurs in variants located proximally to their target genes, presumably acting in cis. While typically fewer in number, trans-regulatory variants affect both alleles of each target gene and can affect expression of many genes simultaneously, resulting in a large aggregate effect.

ASE and eQTL approaches have been applied widely in adult tissues in humans and model organisms. However, these approaches have not yet been applied to understand the global relationship between genetic variation and gene expression during mammalian embryogenesis, and while imprinting has been studied extensively, the contribution of maternal effects on mammalian embryonic gene expression has never been addressed. For several reasons, it is likely that patterns of regulatory divergence differ between embryos and adults. First, eQTL studies have demonstrated dynamic trans-regulatory variation during Caenorhabditis elegans larval development (Francesconi and Lehner, 2014). Second, recent studies in flies and fish have demonstrated higher expression of more ancient conserved genes as well as more conserved gene expression patterns during embryogenesis compared to the adult (Domazet-Lošo and Tautz, 2010; Kalinka et al., 2010). This suggests that the number of cis- as well as trans-eQTL should be lower in embryos compared to adult.

Here, we quantify evolutionary gene regulatory divergence between B6 and Cast during embryogenesis, addressing each of the four regulatory architectures. We focus on embryonic day 11.5 (E11.5), which is equivalent to roughly 40 days of gestation in human. It is a time characterized by organogenesis, rapid growth of the brain and limbs, increasingly complex and localized pattern formation, and consequently substantial constraints on allowable phenotypic variation. An ASE/eQTL hybrid approach facilitates quantifying the different types of regulation and their effects on expression variation, revealing unique and important features of gene regulatory architecture during mammalian embryonic development.

Results

For this study, we chose two inbred strains derived from geographically separate subspecies of Mus musculus: C57BL/6J (B6), the classic inbred mouse reference strain, which was derived from European mice, and CAST/EiJ (Cast), which was derived from the Southeast Asian Mus musculus castaneus. Inbreeding caused random subsets of alleles segregating in their ancestral populations to become fixed, and as a consequence the two strains exhibit one single nucleotide polymorphism (SNP) difference every 120 bp on average (Keane et al., 2011).

We collected 16 E11.5 F1 embryos obtained from reciprocal crosses between B6 and Cast (Figure 1). We profiled genome-wide gene expression using a 3′-biased RNA sequencing method, 3SEQ (Beck et al., 2010), in order to maximize SNP coverage in the 3′ UTRs, where the average SNP density is ∼1 SNP per 170 bp (compared to 1 SNP per 265 bp in the coding sequence). Mapping of reads separately against the Cast (Keane et al., 2011) and B6 genomes allowed us to eliminate reference mapping bias (Figure 2—figure supplement 1) as the average fraction of reads supporting the B6 allele (B6 allele fraction, ‘BAF’) was 0.502. Overall, out of a possible 6917 genes with sufficient reads covering SNPs, 1594 (23%) showed significant ASE (p < 0.01, paired t-test, Bonferroni corrected; Figure 2A,B). We next backcrossed F1 mice (derived from female B6 by male Cast crosses) with B6 parentals to obtain N2 embryos. We harvested 154 N2s at E11.5, performed 3SEQ, and quantified ASE in the approximately 50% of embryos that were heterozygous at each gene. There was excellent agreement between average BAF estimates from F1s and N2s (Figure 2C; rho = 0.77).

Experimental design.

(A) Reciprocal crosses of B6 and Cast parents were used to generate F1 hybrid embryos. (B) F1 hybrid adults were backcrossed to B6 to generate N2 embryos via reciprocal crosses of B6 females with F1 males and F1 females with B6 males. Chromosomal regions inherited from a B6 parent are black and Cast regions are red. (C) Example genotypes at a controlling genetic locus, its corresponding target gene, and resulting gene expression levels for different types of gene regulatory variation assayed. Allele-specific expression (ASE) and imprinting are measured by allele-specific read counts, whereas other types measure total gene expression levels. X: X chromosome; Y: Y chromosome; m: mitochondrial DNA.

https://doi.org/10.7554/eLife.05538.003
Figure 2 with 3 supplements see all
ASE.

B6 allele fraction (BAF) for (A) non-ASE gene Stmn4, (B) ASE gene Ascc1. Each point represents a single embryo, grouped by cross (F1 embryos on left, N2 embryos on right). Cross mother is listed first. Bars are mean BAF and 95% confidence intervals. (C) Correlation of average BAF between F1 and N2 embryos. Significant ASE genes (called using union of F1 and N2 data) are red. Point size corresponds to relative expression level. (D) Transition from fetal (Hbb-bh1, Hbb-y) to adult (Hbb-b2, Hbb-b1) beta-globins, shown as fraction of total expression of all beta-globins, averaged across individuals by time point (somite number). (E and F) Divergent temporal regulation for B6 and Cast alleles of Hbb-b1 (E) and Hbb-b2 (F) in heterozygous N2 mice. (G and H) BAF for imprinted genes Igf2 (G) and H13 (H). (I) Significantly imprinted genes (red, with 95% confidence intervals; n = 31) clearly separate from non-imprinted genes (gray, n = 8261; Figure 2—figure supplement 2).

https://doi.org/10.7554/eLife.05538.004

To quantify developmental progression for each embryo at the time of harvest, we counted the number of somites, which form every 90–120 min (Saga and Takeda, 2001) and ranged from 42 to 57 in our embryos. Of the 7712 genes with moderate expression of at least 5 reads per million (RPM), 1796 genes showed significant correlation between total expression and somite number (q < 0.05, Spearman correlation). Genes changing over this developmental time course were enriched for a number of gene ontology (GO) categories indicative of important developmental processes occurring at E11.5, including synaptogenesis (q = 2.2 × 10−8, GOrilla) and eye lens formation (q = 5.8 × 10−7). A third significant GO category, blood microparticles (q = 9.6 × 10−7), includes the globins, which are beginning to undergo the transition from fetal to adult form over the developmental time period we assayed (Noordermeer and de Laat, 2008) (Figure 2D).

By correlating ASE with somite number, we identified twelve genes that show significant differential regulation between B6 and Cast alleles as a function of developmental timing (q < 0.10, Spearman rho). Globins and functionally related genes were well represented in this set, which included both major (Hbb-b1) and minor (Hbb-b2) adult beta globins. Both genes increase in absolute level and relative expression compared to fetal beta globins (Figure 2D; Figure 2—figure supplement 2), but the B6 allele of the major form is delayed compared to the Cast allele (Figure 2E; rho = 0.86, q < 10−14), whereas the opposite occurs for the minor form (Figure 2F; rho = −0.51, q = 0.02). By the end of our time course, the alleles are equally expressed (Figure 2E,F). Transferrin also increases in absolute expression but the Cast allele is delayed compared to the B6 allele (rho = −0.46, q = 0.02; Figure 2—figure supplement 2). These results suggest that compensatory mechanisms have evolved in one or both strains to keep the overall regulatory dynamic of oxygen physiology stable during this important switching period.

Comparison of embryos from either side of the reciprocal cross allowed us to identify parent-of-origin dependent ASE patterns indicative of imprinting. Consistent with previous studies in mouse embryos (Babak et al., 2008), we found a total of 31 genes with significant imprinting (Figure 2G–I; p < 0.01, binomial test, Bonferroni corrected), all of which had been previously identified. We note that the clear separation of most of the imprinted loci from the noise of the bulk of all expressed genes, with few exceptions (Figure 2I; Figure 2—figure supplement 3), suggests that most imprinting generally acts uniformly across the embryo, as expected from the methylation-based mechanism that sets the expression state of the regulated allele pre-zygotically (Reik et al., 1987).

Imprinting is not the only molecular mechanism that results from the mother–offspring resource allocation conflict in mammals (Moore and Haig, 1991). To explore other genetically-encoded parental effects, we took advantage of the backcross design, in which N2 offspring have genotypically distinct parents. We identified a large number (n = 331) of embryonically expressed genes whose total expression levels differed significantly depending on the genetic background of the mother (Mann Whitney U test, p < 0.01, Bonferroni corrected), irrespective of the genotype of the embryo (Figure 3A,B; Supplementary file 1). For example, Angptl4 expression increases 52% from an average of 107 RPM in embryos with an F1 mother to an average of 163 RPM in embryos with a B6 mother (Figure 3C). As for the magnitude of the effect, these maternal effect target genes exhibited up to a 3.5-fold (e.g., Polm: 2.5–11-fold at a 95% confidence interval) difference in total expression level depending on genotype of the mother.

Maternal effect target genes.

(A) Quantile–quantile (qq) plot for maternal effect target genes (black circles), compared to cis-expression quantitative trait loci (eQTL) (gray circles). (B) Expression by embryo for the top 20 maternal effect target genes, separated by maternal genotype. For each gene, samples are grouped by F1 mother (left) and B6 mother (right). (C) Total expression for Angptl4 for embryos grouped by embryonic genotype and maternal genotype. Error bars indicate mean and 95% confidence intervals.

https://doi.org/10.7554/eLife.05538.008

We were able to exclude a number of alternative mechanisms to explain these maternal effects. All F1s used for the backcross were derived from B6 female by Cast male matings, and therefore all N2 embryos carry the B6 mitochondrial genome, ruling out a confounding mitochondrial effect. The maternal effects were unchanged when considering sex, X chromosome genotype, litter, and developmental stage. Furthermore, for the vast majority of genes with SNP coverage, we were able to verify that maternal effect target genes were not imprinted or the result of contamination by maternally expressed transcripts. We found that maternal genotype by itself was a better predictor of the maternal effects than maternal genotype in conjunction with embryonic genotype at any of the imprinted loci, indicating that the maternal effects were not the result of trans effects downstream of imprinted genes. To exclude the possibility of a batch effect in our 3SEQ data, we used quantitative RT-PCR (qPCR) to confirm 4 genes as maternal effect target genes.

The most significant of the maternal effect target genes, Angptl4 (Figure 3C; p = 8 × 10−13 by 3SEQ, p = 1 × 10−11 by qPCR), functions in lipid metabolism and energy homeostasis, suggestive of a role in responding to the placental nutrient supply (Yoshida et al., 2002). Similarly, Gpx3 (Figure 3B; p = 5 × 10−12 by 3SEQ, p = 6.6 × 10−4 by qPCR), a glutathione peroxidase, is an antioxidant enzyme that acts in fat and glucose metabolism and has been associated with obesity and type 2 diabetes in mouse and human (Lee et al., 2008; Chung et al., 2009). Both of these genes are transcriptionally regulated by the peroxisome proliferator-activated receptor PPARγ.

To examine how genetic variation carried by the embryo itself affects gene expression phenotype, we performed eQTL analyses by correlating expression levels of each measurably expressed gene (‘target’) with N2 embryo genotypes (either B6/Cast or B6/B6) at each recombination-defined locus (‘controlling locus’). Genotyping and recombination analysis based on the expressed SNPs (Figure 4A) divided the autosomes into 986 marker region blocks, each separated by one or more recombination events that define candidate controlling loci. We tested every controlling locus for correlation with gene expression of every gene. At a genome-wide significance level of q < 0.01, we identified a total of 1034 eQTL.

Figure 4 with 1 supplement see all
eQTL.

(A) Example N2 embryo data used to call recombination events for chromosome 2. Each point represents BAF for a single gene. The HMM calls green regions as heterozygous, blue regions as homozygous, with two recombination events occurring in the short intervening white regions. Only genes with sufficient read coverage of single nucleotide polymorphisms (SNPs) are shown as points; genes with under 20 SNP reads are still genotyped, based on neighboring genes. (B) Overview of all eQTL, arranged by genomic position, shaded by log10 p-value. cis-eQTL, along the diagonal, show the only appreciable signal. (C) Predicted BAF from transformed expression levels in heterozygotes and homozygotes (x-axis) correlate with BAF measured by ASE (y-axis).

https://doi.org/10.7554/eLife.05538.009

1017 of these eQTL (98%) mapped close to the target gene (Figure 4B), suggesting cis-regulation as the underlying mechanism. A gene's difference in expression between heterozygotes and homozygotes could accurately predict its allelic ratio as assayed by ASE (Figure 4C). This high correlation (rho = 0.58, n = 4821; rho = 0.70 when considering only the n = 1000 genes with the highest allele-specific read coverage) provides a systematic validation of the large effect of embryonic cis-regulatory polymorphisms on expression phenotype. Note that our study has sufficient power to identify substantial numbers of eQTL when performing this all–loci–by–all–genes analysis. If, instead, we were focused specifically on only studying cis-regulation, we could have increased our power to identify cis-eQTL by only comparing genes to their nearby loci, thereby reducing the number of statistical tests being performed.

In contrast to the large number of cis-eQTL, we found few trans-eQTL regardless of FDR cutoff (Figure 5A). After the top 3 (each comprising a controlling locus–target gene pair, e.g., Figure 5B), significance drops off rapidly compared to background, unlike for the cis-eQTL (Figure 5C). To explore this lack of trans effects further, we reasoned that a set of genes with a high prior probability of having downstream effects might be enriched for trans-controlling loci. Our cis-eQTL/ASE loci comprise a large set of such candidates because differences in expression levels of a gene product may propagate to downstream targets (Yvert et al., 2003). However, we found that neither strong cis-eQTL nor ASE loci are enriched for trans-eQTL. Finally, application of a random forest model (Michaelson et al., 2010; Francesconi and Lehner, 2014) that simultaneously considers developmental progression (number of somites) and genotype also did not enrich for trans-eQTL (Figure 4—figure supplement 1). We estimate that the number of comparable trans-eQTL is about 1% that of cis-eQTL.

Trans-eQTL.

(A) Number of cis-eQTL (black) and trans-eQTL (red) as a function of FDR. Dashed line indicates number of false positive eQTL expected. (B) eQTL for Pop1, the most significant trans-eQTL target (vertical blue line). Candidate controlling loci are shown across the x-axis in genome order; only the peak on Chr14 is significant. Chromosomes are shaded in alternating background colors. (C) qq plot for all gene–marker pairs (black) and gene–marker pairs only in trans (red). Gray straight lines are y = x, x axis is expected p-values based on permutations. (D) Trans-eQTL hotspots, shown as the distribution of significant target genes per marker region (p < 0.005) across all marker regions (red) or only marker regions containing the top 50% of ASE genes (blue). Gray shows the expected distribution if there were no hotspots. (E and F) Comparison of our study with that of eQTL in adult tissues as a function of FDR. (E), Trans-eQTL as a proportion of total eQTL. (F), total number of eQTL. GeneNetwork sample ID indicated in parentheses. (G) Evolutionary constraint (GERP score) in coding regions (CDS) for the most significant (red; n = 1347) and non-significant (blue; n = 3726) cis-eQTL genes.

https://doi.org/10.7554/eLife.05538.011

Because a trans-regulatory effect is potentially distributed over several target loci, we asked whether there was evidence for ‘hotspots’, or controlling loci whose genotypes correlate with small expression differences at numerous target genes (Brem et al., 2002; Schadt et al., 2003). Permutation tests revealed a significant signal (p < 10−24 KS test, Figure 5D) in support of hotspots, but no specific candidates. Reinforcing the above result that ASE genes do not act as trans-controlling loci, strong ASE genes were not substantially more likely to act as hotspots than weaker ASE genes (Figure 5D; p = 0.20, KS test). Therefore, at this stage of embryogenesis, we found no evidence for downstream propagation of genetically controlled expression level differences.

We hypothesized that the paucity of trans-eQTL was due to increased evolutionary constraint of regulatory networks during embryonic development. Since trans-eQTL have previously been found to have weaker effects than cis-eQTL (Montgomery and Dermitzakis, 2011), an alternative hypothesis is that we were simply underpowered to find trans-eQTL. To distinguish these possibilities, we obtained genotype and expression data from previously published eQTL studies performed in adult mouse tissues (Wang et al., 2006; Yang et al., 2006; Chen et al., 2008). After sampling individuals to match the same number of total eQTL found at an FDR of 0.05, we found that all previously published studies conducted in adult tissues showed a substantially higher proportion of trans-eQTL than did our embryos (Figure 5E,F). This result was robust to varying the sample size from the previously published data, to whether p-values or FDR was used to assess significance, and to varying the thresholds for comparing proportions of trans-eQTL across the ranked eQTL. Based on this result and our high power to detect cis-eQTL even at low FDR, we conclude that embryonic trans-eQTL are subject to elevated selective constraint compared to adult tissues.

If cis-regulatory divergence in embryos is strongly buffered and does not lead to downstream trans-regulatory effects, why does embryogenesis tolerate such a large number of cis-effects in the first place? We hypothesized that divergent cis-regulation between strains occurs preferentially in genes with lower evolutionary constraint. In support of this hypothesis, ASE genes were significantly depleted of GO categories related to organ development (p = 6 × 10−5, q = 0.05), as well as gene regulation (p = 1 × 10−4, q = 0.07), and transcription factors (p = 3 × 10−5, q = 0.03), demonstrating constraints on variation in core regulatory genes. Furthermore, we found that evolutionary constraint at the DNA sequence level (Cooper et al., 2005; Davydov et al., 2010) was lowest in the most significant cis-eQTL (Figure 5G; rho = −0.14, p = 6 × 10−13; n = 2788). Similarly, ASE magnitude of effect correlated negatively with evolutionary constraint (rho = −0.10, p = 9 × 10−8, n = 2612), whereas cis-eQTL in adult mouse tissues showed no correlation (|rho| < 0.05). Thus, the strong enrichment for cis-eQTL/ASE in less conserved genes is unique to embryos.

Discussion

We have performed the first, to our knowledge, comprehensive global analysis of the effects of genetic variation on mammalian gene expression during embryogenesis. Our experimental design allowed us to simultaneously assay the effects of genetic variation on imprinting, maternal effects, cis-regulation, and trans-regulation. Supporting the high quality of our data and analytical methods, we found good agreement between related measurements of cis-regulatory variation when comparing ASE in F1 embryos (sequenced at high coverage) with ASE in N2 embryos (sequencing many individuals), and when comparing ASE with cis-eQTL.

Our two mouse strains exhibit SNPs at 17.7 million sites, about sixfold more than the heterozygosity in African humans (1000 Genomes Project Consortium, 2012). The vast majority of these polymorphisms have accumulated as a result of genetic drift (Kimura, 1983) over the time since the present-day European and Southeast Asian M. musculus populations were separated from their common ancestral population. Of the small subset of polymorphisms that affect molecular function, some may produce an advantageous phenotype subject to positive selection, some are deleterious to organismal fitness and are therefore subject to purifying selection, and yet others are selectively neutral (subject to drift). Missense polymorphisms are severely underrepresented genome-wide and exhibit lower derived allele frequencies relative to other genomic sites (1000 Genomes Project Consortium, 2012), reflecting the global predominance of purifying selection. This signal of purifying selection is weaker but still substantial for noncoding gene regulatory sites (ENCODE Project Consortium, 2007).

Given these known dynamics, our study sheds light on the microevolution of gene regulation. Gene expression divergence of the type we quantified in this work occurs as a result of polymorphisms in regulatory regions or in expressed regulatory gene products. The former manifests as cis-effects, the latter as trans-effects. Because we focused on embryonic gene expression, two competing hypotheses as to what patterns we might find were plausible. Under the first hypothesis, the substantial DNA sequence divergence between mouse strains would be reflected in many expression differences. The alternative hypothesis is that few expression changes would be detected between strains. This hypothesis is suggested by the developmental hourglass model (Domazet-Lošo and Tautz, 2010; Kalinka et al., 2010), which posits that gene expression patterns are most highly conserved in the middle of embryonic development (the so-called ‘phylotypic stage’).

In support of the first hypothesis, we found substantial numbers of significant eQTL, demonstrating that a sizeable subset of all segregating polymorphisms causes expression changes. However, we found that nearly all acted in cis, rather than trans. This depletion of trans-regulatory variation is substantially stronger than that observed across several adult mouse tissues. In contrast to adult tissues, eQTL acting in our embryos were correlated with lower DNA sequence conservation, supporting the conclusion that evolutionary constraint on gene expression regulation is markedly stronger during mammalian embryogenesis than in the adult. In addition, the identified cis-expression changes do not themselves cause detectable trans-effects. These findings support the idea of the developmental hourglass model (Domazet-Lošo and Tautz, 2010; Kalinka et al., 2010) while suggesting that substantial neutral divergence of molecular phenotype (gene expression) can accumulate in the form of cis-regulatory variation even under strong evolutionary constraint at the organismal (embryonic) level.

Previous studies have shown enrichment for trans-regulation when using an eQTL model that considers both genotype and stage during C. elegans larval development. Perhaps because of the nearness of our embryos to the mouse phylotypic stage, or because of fundamental differences between mice and worms, we found that incorporating developmental timing (i.e., somite number) into our eQTL model did not alter the balance between cis- and trans-regulation. Somite number was, however, correlated with significant changes in ASE, indicating differential developmental timing of gene expression between the Cast and B6 strains. In particular, we found that alleles of several hemoglobin-related genes, including the adult beta globins and transferrin, were expressed at differing levels between the strains at early stages of the fetal-to-adult hemoglobin transition, but then equalize by the end of our developmental time course. These results demonstrate our ability to identify subtle tissue-specific expression differences between strains, despite sequencing from whole embryos containing a mix of tissues.

It is formally possible that some of the significant eQTL identified here act in trans, even though their target genes are located in genomic proximity. However, the good agreement between eQTL and ASE suggests that the majority of our local eQTL are indeed cis-eQTL. Furthermore, because there is no reason to believe that the proportion of locally-situated but trans-acting eQTL is higher in embryos than in adults, it does not materially alter our conclusions if a small number of our local eQTL do act in trans.

Because of the large number of embryos collected and individually sequenced for our study, we were able to accurately quantify ASE mean and standard deviation for thousands of genes. When stratifying embryos by side of cross, we found that nearly all imprinted genes showed very strong allelic bias, meaning that the epigenetically silenced allele is expressed at only low levels across the embryo. Consistent with previous studies that found few genes exhibiting tissue-specific imprinting [apart from those found in the extra-embryonic tissues; reviewed in Prickett and Oakey (2012)], we observed no weak allelic bias that might result from averaging effects across tissues.

In addition to the clear signature of parent-of-origin-dependent allelic expression, we also found strong maternal effects on total gene expression levels for hundreds of genes. Whereas most previous studies have employed F1 intercrosses in which parental genotypes are indistinguishable, we used a backcross design, enabling us to track the potentially important sources of genetic variance contributed by the parents. The observed maternal effects are restricted to pre-natal influences and separate from imprinting and mitochondrial effects.

Our results demonstrate that the genotype of the mother plays an important role in regulating gene expression, not just after birth, as has been previously suggested, but also during embryogenesis, affecting the expression of hundreds of genes (Bult and Lynch, 1997; Weaver et al., 2004; Wolf et al., 2011). The implications of this finding are significant for human genetic twin, familial, and genome-wide association studies. What is usually thought of as familial environment [e.g., Grundberg et al. (2012)] may have a substantial heritable genetic component exerting an effect not on the generation carrying it but on the offspring.

Parental genetic influences on offspring phenotype have been implicated in growth regulation and adult-onset obesity- and diabetes-related diseases but the gene targets of such effects have not been identified (Jarvis et al., 2005). While we cannot rule out an epigenetic or RNA-mediated mechanism underlying this regulation, it is more likely acting through the placenta's role of transmitting nutrients, hormones or other factors to the embryo. This hypothesis is supported by the known functions of the most significant target gene, Angptl4, which responds to lipid metabolism and energy homeostasis, and participates in angiogenesis (Koster et al., 2005; Scott et al., 2012). We hypothesize that these maternal effect target genes continue to be expressed differently depending on maternal genotype, and may be at least partly responsible for the previously observed effect of maternal genotype on pre-natal growth (Cowley et al., 1989; Wolf et al., 2011).

In summary, using our hybrid ASE/eQTL approach we were able to assay all four types of genetic variation in gene expression regulation. We found that cis-regulatory variation caused the largest divergence in overall gene expression in mouse embryos, affecting over 1000 genes with up to sevenfold differences in expression levels between the strains. These cis-regulatory variants did not translate into trans-regulatory divergence, and we identified only ∼10 trans-eQTL genomewide. Imprinting affected the allelic balance of dozens of genes, generally resulting in complete or nearly complete silencing of the inactive allele. Finally, we identified over 300 maternal effect target genes. The number of maternal effect targets, and their overall significance, are only slightly more modest than cis-regulatory effects, but nonetheless substantial, with approximately 1/3 as many genes involved, and maximal effect size of up to 3.5-fold difference or approximately 1/2 as large. Given this context, it is clear that maternal effects can play a substantial genetic role in determining embryonic gene expression patterns. We speculate that maternal effects contribute to evolutionary divergence, allowing expression heterogeneity in the embryo to build as a result of differences in the adult mother, where gene regulatory networks show less evolutionary constraint than in the developing embryo.

Materials and methods

Mouse strains and crosses

6-week old mice were purchased from The Jackson Laboratory. All ‘Materials and methods’ were carried out in accordance with the Administrative Panel on Laboratory Animal Care protocol and the institutional guidelines set by the Veterinary Service Center at Stanford University.

The inbred strain C57BL/6J and wild-derived inbred strain CAST/EiJ were used in reciprocal crosses (Figure 1). C57BL/6J females were crossed with CAST/EiJ males to generate (B6 × Cast)F1 hybrid embryos (n = 8), and CAST/EiJ females were crossed with C57BL/6J males to generate (Cast × B6)F1 hybrid embryos (n = 8). (B6 × Cast)F1 progeny were backcrossed to C57BL/6J in reciprocal crosses to generate 154 N2 embryos (F1 × B6 = 65 and B6 × F1 = 89). Individual embryos from 2 litters from each side of the F1 cross (2 B6 × Cast and 2 Cast × B6), and 12 F1 × B6 litters and 18 B6 × F1 litters from the N2 cross were analyzed.

For all crosses, following identification of a vaginal plug on the morning after mating (E0.5), males were removed; pregnant females were housed separately.

Whole embryos were harvested at E11.5 and dissected to remove all embryonic membrane and placental material. Somite numbers were recorded for each embryo at the time of dissection.

Library preparation and sequencing

To isolate total RNA, individual embryos were homogenized in Trizol reagent (GibcoBRL/Invitrogen, Grand Island, NY) and RNA was extracted per manufacturer's directions. Poly(A)+ RNA was isolated using the Qiagen Oligotex Mini kit and 3SEQ libraries were constructed as described (Beck et al., 2010). For the N2 embryos, oligo(dT) and PCR primers were barcoded with Illumina multiplexing sequences. F1 data were produced in several sequencing batches, first on an Illumina GAIIx sequencer producing 76 bp or 80 bp reads, then on an Illumina HiSeq 2000 sequencer producing 101 bp reads. N2 data were produced only on the HiSeq 2000. Libraries from 12 individually barcoded N2 samples were pooled in equimolar amounts and sequenced in a single flow-cell lane to produce single-end 101 bp reads.

Read preprocessing

Prior to read mapping, 3′ poly(A) sequences likely to be derived from the mRNA poly(A) tail were trimmed using a procedure that tolerated some amount of sequencing error: we computed a score si for each position i in the read which was equal to the sum of the quality scores of each A after position i minus three times the sum of the quality scores of each non-A after position i. All bases after the highest-scoring position were trimmed if this would result in trimming at least 3 bases.

The first 18 bases and the last 7 bases of the HiSeq F1 reads were trimmed to make them the same length as the GAIIx reads. This was done to avoid read length being a confounding factor when analyzing ASE in F1. Only poly(A) stretches were trimmed from the N2 data.

Read alignment and filtering

A M. m. castaneus genome was created in silico by applying the high-quality Cast SNPs (Keane et al., 2011) to the reference C57BL/6 genome mm9. For B6, we used the mm9 genome. For each of these two reference genomes, a transcriptome was computed using RefSeq annotations, including spliced transcripts of all genes separated by 200 nucleotides, and each spliced transcriptome was added to its reference genome sequence. Bowtie2 (Langmead and Salzberg, 2012) was used to map all reads to the composite genomes (Cast or B6 including spliced transcripts) with settings ‘-k 200 –score-min L,0,-0.10 –rdg 2,2 –rfg 2,2 –mp 1,1’. These settings recovered up to 200 sub-optimal genomic hits, allowing us to stringently filter reads derived from repetitive regions of the genome as follows: If the difference in edit distance (1 for each mismatch and 2 for each indel) between a read's best alignment and its second-best alignment was within 3, the read was discarded. Reads were also filtered out unless all best alignments in the composite genome corresponded to the same genomic position (e.g., if a read aligned equally well to an intergenic region and to a splice junction in the transcriptome). Each aligned read was then consolidated into a single genomic alignment, including intronic splice gaps.

After filtering, each F1 dataset had an average of 15.8 million mapped reads, and each N2 dataset had an average of 1.2 million reads.

Expression quantification

Gene expression was quantified using only reads mapping to exons. Read counts were typically normalized by the total number of sequenced reads per library, multiplied by 1 × 106, resulting in RPM. For the eQTL analyses, genes were removed if over 5% of the mapped reads were filtered based on the second-best hit criteria above. Genes were also discarded if they matched the retrotransposed genes list compiled by UCSC (Kent et al., 2002). In the F1s, there were 13,574 genes with detectable expression (13,469 across N2s), 7554 of which passed the analysis threshold of 5 RPM (7712 in N2s).

We found Xist expression followed a bimodal distribution, and used this to identify females (robust Xist expression) and males (little or no Xist expression). All 5 genes significantly different between males and females, including Xist and its antisense transcript Tsix, were located on the X or Y chromosomes. At E11.5, embryos are just beginning sexual differentiation, so the small number of sex-specifically expressed genes is unsurprising. Because of the small number of affected genes, as well as the repetitiveness of the sex chromosomes, subsequent analyses were restricted to genes and marker loci on autosomes.

Genotyping

Genes were assigned a preliminary genotype based on the proportion of B6-supporting reads, either ‘unknown’ if <20 SNP-covering reads were found, or homozygous or heterozygous otherwise. For each chromosome, preliminary genotypes were then fed through a Hidden Markov Model (HMM) that penalized transitions between the heterozygous and homozygous states. The HMM in effect smoothed the genotypes and allowed us to infer genotypes for unknown genes falling between confidently genotyped genes. A post-processing step assigned the ‘unknown’ genotype to genes with low read coverage falling near putative recombination sites, typically affecting only a small number of genes for each chromosome.

We combined adjacent genes with identical genotypes across all embryos into a total of 986 marker regions across the 19 autosomes. A more aggressive combining, allowing up to 5 recombination events across all N2 embryos, resulted in 286 marker regions. Downstream analyses used the consensus genotype in each embryo, assigning the unknown genotype where the number of homozygous and heterozygous embryos was close to equal.

GO

GO enrichment analyses were performed with GOrilla (Eden et al., 2009), which enables easy visual inspection of results, and depletion analyses were performed using a hypergeometric test. Only expressed genes (or those with allele-specific read coverage) were considered.

ASE

To avoid the possibility of small insertions and deletions causing misalignment around SNPs, ASE was quantified using only high quality annotated SNPs that were at least 50 bp from an annotated Cast indel of any quality, based on the published mouse polymorphism resource (Keane et al., 2011). For reads covering multiple SNPs, each SNP received 1/n votes for a given genotype (Cast, B6 or other), to account for the low percentage of SNP positions with sequencing errors. For example, a read covering 3 SNPs, 2 of which match B6 alleles and 1 that matches neither the B6 nor Cast allele, would contribute 0.67 read votes to the B6 expression total. A read covering only a single SNP matching the Cast allele, would contribute a full 1 read vote to the Cast expression.

Because F1 and N2 ASE estimates largely agreed, ASE significance was estimated from the 170 pooled samples. Significance was assessed by subtracting the B6 read votes from the Cast read votes, and then normalizing the differences based on library-size. This was necessary because the F1 samples were sequenced to substantially more depth than the N2 samples.

A paired t-test was then applied to test for significance of these differences against 0. This simple approach takes full advantage of the magnitude of difference between the alleles as well as the variance in allelic expression across the large number of samples sequenced in this study. We found that p-values calculated from the paired t-test correlated well with those derived by resampling, but were somewhat conservative. Hence, our estimates of ASE are likely to be slight underestimates. Additionally, genes were only called significant if their average BAF was less than 0.45 or greater than 0.55.

Maternal effect target genes

Maternal effect target genes were identified by comparing expression normalized by library size from embryos with B6/B6 mothers to expression values from embryos with B6/Cast F1 mothers. We used a Mann–Whitney U test to calculate the significance of this difference.

Because all F1 mothers used for the N2 backcross were derived from the B6 × Cast side of the cross, all N2 embryos had B6 mitochondria. Maternal effects are similar in significance and direction when restricting analyses to only female embryos with two B6 X chromosomes, or when averaging expression across individuals in each litter. Furthermore, the effects are consistent across somite numbers, even for genes that change expression during development.

qPCR

To exclude batch effects as a potential cause of maternal effects, we used qPCR to validate candidate maternal effect target genes on an orthogonal platform. 0.5 μg total RNA per sample was reverse-transcribed into cDNA using Oligo(dT)20 primer and Superscript III (Invitrogen) in 20 μl reactions, according to the manufacturer's instructions. Following reverse transcription, each sample was treated with RNase H. 48.48 microfluidic dynamic array IFC chips (Fluidigm) were used to analyze the expression of 26 candidate maternal effect target genes and 15 control reference genes in 93 N2 samples. 1 μl of cDNA was pre-amplified using 2× Taqman PreAmp Master Mix (Lifetech) and 50 nM of each primer pair in 5 μl reaction volume, according to the manufacturer's instructions. The cycling program was 10 min at 95°C followed by 10 cycles of 15 s at 95°C and 1 min at 60°C. Following pre-amplification, each reaction was diluted fivefold. RT-qPCR on the dynamic array chips was conducted on the BioMark system (Fluidigm). 5 μl sample pre-mix containing 2.5 μl of SsoFast EvaGreen Supermix with Low ROX (Bio-Rad), 0.25 μl of DNA Binding Dye Sample Loading Reagent (Fluidigm) and 2.25 μl of diluted pre-amplification samples, as well as 5 μl assay mix containing 2.5 μl of Assay Loading Reagent (Fluidigm), 2.25 μl EB Buffer (Qiagen) and 0.25 μl of 100 μM primer pairs (500 nM in the final reaction) were mixed on the chip using the IFC controller MX (Fluidigm). The thermal cycle was 60 s at 95°C followed by 30 cycles of 5 s at 96°C and 20 s at 60°C. A dissociation curve was also drawn for each primer pair.

The following eight control genes were chosen based on high consistency between Fluidigm chips: Smarcc2, Rnf216, Nxf1, Cnpy3, Pmvk, Trmt1, Fam149b, Farp1. These genes were used to normalize expression of the candidate maternal effect target genes using the standard curve method. Significance of the maternal effect was calculated using the Mann–Whitney U test on the normalized expression values using a Bonferroni-corrected p = 0.05 cutoff. Because of low consistency between Fluidigm chips, we were only able to validate four maternal effect candidate genes.

eQTL

Between-sample read count comparisons showed some evidence of batch effects across N2 embryos, primarily at the level of sequencing lane. To control for these batch effects, we used PEER (Stegle et al., 2012) to normalize for both unknown covariates as well as the following known batch effects: embryo sex, sequencing lane, and somite count. For each gene and for each marker region, we performed a Wilcoxon rank-sum test comparing PEER-normalized expression in heterozygous and homozygous embryos. We found that the total number of significant eQTL was substantially higher using PEER-normalized read counts compared to simple library-normalized read counts, although the fraction of trans-eQTL was not significantly different. Results were also consistent when replacing the non-parametric ranksum test with the parametric t-test.

To remove the effects of correlated markers, downstream analyses used the most significant marker region on each chromosome for each target gene. To estimate false discovery rates, marker–sample assignments were permuted and eQTL were called using an identical procedure to the unpermuted data. Cis-eQTL were defined as those eQTL for which the marker region resides on the same chromosome as the target gene, and trans-eQTL were thus defined as the eQTL for which the marker region is on a different chromosome as the target gene. While this is a conservative definition of trans-eQTL, we found no evidence of trans-eQTL residing on the same chromosome as the target gene.

ASE vs eQTL comparison

We estimated expected ASE (BAFpred) from total expression levels using the following transformation for each gene:

BAFpred=BAFexp2×(1BAFexp), where BAFexp=μB6/B6μB6/B6+μB6/BCast,

and μg are the mean total expression levels across all mice with genotype g at the gene of interest. The BAFpred transformation is required to estimate μCast/Cast from μB6/B6 and μB6/Cast. Because of variance in the estimation of μg, the predicted BAF can be larger than 1.0.

eQTL comparison with previous studies

We reanalyzed six previously published mouse eQTL datasets available from GeneNetwork (Wang et al., 2003). Array-normalized data were batch-normalized using PEER, then compared to genotype data provided by GeneNetwork. Because these previous studies used an intercross model with three possible genotypes, eQTL were called using a linear regression, analogous to the t-test in our backcross experiment.

Random forests

eQTL were also mapped using a random forest method (Michaelson et al., 2010). We used the normalized read counts as input to the random forest, and included marker genotype, sex, side of the cross, sequencing lane, and number of somites as predictors. For 5 N2 embryos, no somite count was recorded, and hence the total number of input samples is 149 instead of 154. Genotypes were coded as 0.5 for heterozygous B6/Cast, 1.0 for homozygous B6/B6, and 0.75 for missing genotypes. False discovery rates were calculated as above using permuted input data after identifying the best markers per chromosome. We found results were similar when using the variable selection frequency method (Michaelson et al., 2010) or the mean decrease in prediction accuracy method (Francesconi and Lehner, 2014).

Conservation

We found GERP scores broadly correlated among upstream, UTR, coding and downstream gene regions, and therefore show only the average coding sequence GERP score. We performed a Spearman correlation test to examine the hypothesis that GERP score is associated with cis-eQTL p-value and ASE effect size, although for clarity Figure 5G focuses only on the most extreme genes. Because the set of very lowly expressed genes is enriched for low GERP scores, we filtered out the lowest expressed genes in each tissue. We used RNA-seq data previously published (Merkin et al., 2012) for muscle, liver and brain to filter out lowly expressed genes in those tissue comparisons. The strength of the associations with GERP scores for our ASE and cis-eQTL as well as previously published cis-eQTL is robust to all but the most extreme changes in filtering cutoff.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
    The impact of maternal uterine genotype on postnatal growth and adult body size in mice
    1. DE Cowley
    2. D Pomp
    3. WR Atchley
    4. EJ Eisen
    5. D Hawkins-Brown
    (1989)
    Genetics 122:193–203.
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
    The neutral theory of molecular evolution
    1. M Kimura
    (1983)
    Cambridge University Press.
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49

Decision letter

  1. Anne C Ferguson-Smith
    Reviewing Editor; University of Cambridge, United Kingdom

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Constraint and divergence of global gene expression in the mammalian embryo” for consideration at eLife. Your article has been evaluated by Chris Ponting (Senior editor), a Reviewing editor and three reviewers.

As listed below, the reviewers have communicated quite detailed major concerns about the manuscript. We would like to provide the opportunity for you to address each of the points raised and to provide an extensively revised manuscript before making a final decision. The Reviewing editor and the reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.

Major concerns:

1) The title is provocative, without obvious follow-up. ‘Convergence’ and ‘divergence’ has implications about genetic selection and systems properties. The reviewers expected a definition of what these concepts mean in this context, how the study design would address these questions, and then interpretation of how the results provide insight. This was not evident in this manuscript. The primary reference seems to be in the Introduction, where two definitions of divergence are provided: first, it refers to changes in embryonic development resulting from genotype-specific differences in maternal transcripts present in mature oocytes at fertilization, and second, divergence here seems to be differences in regulatory control between embryos and adults. Both have been extensively studied, first to understand the maternal:zygotic transition in gene expression patterns, and second to characterize differentiation of cells, organs, and tissues. Resolving these and other forms of direct (genetic) and indirect (epigenetic, social factors is beyond the scope of the present study design, except with weak ad hoc arguments.

2A) No mention about whether sex of embryos was determined, and results analyzed separately for males and females. Considerable literature shows that sex affects expression of many genes. If the authors have evidence that embryo sex is not important in this context, evidence and arguments need to be presented (see also comment 10 below).

B) Apparently RNAs from whole embryos were analyzed. But as the authors acknowledge, organogenesis is active at E11.5. So why study whole embryos rather than specific organs and tissues?

C) Paternal stress is an established factor affecting offspring phenotypes and molecular features. Typically these paternal factors are controlled by removing males from mating cages immediately after fertilization. Alternatively, in vitro fertilization is sometimes used. Regardless, these kinds of factors needed to be controlled in order to make strong conclusions about the nature of maternal effects.

3) The aspect of the paper that we found disappointing was the analysis of gene expression dynamics. The reviewers considered that the authors could get much more from this dataset. For example, they don't test for interactions between genotype and time/stage (somite number, this could be done quite easily even using PEER) and there is no consideration of non-motonic time trends etc. In short, we think the authors could likely find much more about how genetic variation influences expression dynamics by implementing something similar to the ideas of Francesconi (Nat 2014).

4) The next recommendation would be to try and get more from the comparison of ASE and eQTLs. The correlation is weak (R^2 =∼34% variance explained). What is the explanation for this? Biological or technical? Likely there is something interesting to say biological here about the causes of the differences between the hybrids and the recombinants.

5) In general, the results are not too surprising; for example, the observation (Results, third paragraph) that the expression of key developmental genes changes through development is to be expected, we are not sure this warrants a main text figure. Moreover, a significant number of eQTL have been found in other tissues/cell lines in mammals (multiple papers from the Dermitzakis and Pritchard labs), the majority of which have been shown to regulate gene expression in cis, so the novelty of this part of the manuscript is small.

6) Care must be taken when defining cis-regulatory variants identified from previous RNA-seq studies (Montgomery et al.; Pickrell et al.), as you have done in the Introduction. In particular, these studies focused on correlating variants near the gene of interest with its expression across individuals. In a number of cases, it was shown that the effect size of the eQTL was correlated with allele-specific expression levels, strongly suggesting that many variants act in cis. However, in most previous eQTL studies, variants that act in cis have typically been defined as variants that are proximal to the gene of interest. Some clarification of this subtle, but important distinction, would be useful here.

7) When you mention, in the Introduction, that, at E11.5, gastrulation has already occurred, and the embryo is already composed of multiple different cell types, how much of a concern is it that, across N2 embryos, the rate of development may differ (perhaps because of the genetic background), which might lead to heterogeneity in the composition of cell types profiled across embryos?

8) Results, second paragraph: Why was a paired t-test as opposed to a binomial test (perhaps with a beta prior to model over-dispersion) employed here? The latter seems a much more natural approach given the data.

9) Results, fourth paragraph: It would be interesting to study the overall expression levels of the genes considered here in a homogeneous genetic background (i.e., in the F0s). In particular, this would allow one to study how the expression of Hbb-b1 and Hbb-b2 differed during development—is it also the case that, in BL6, Hbb-b1 increases in expression during development while Hbb-b2 decreases? This would provide additional mechanistic insight and provide support for the compensatory mechanism alluded to in the last sentence.

10) Figure 3 and Results: What explains the strange patterns in Figure 3B where there seems to be a bimodal pattern of expression (for both alleles) for a subset of the genes (e.g., Zkscan5, Polm, Eif2ak and Rhob)? Is this related to differences in developmental stage and/or sex? To explore the latter, could the expression of genes on the X or Y chromosome be used to sex the embryos? More generally, this is a strange pattern, which provides some concern about the robustness of the selection of these genes as those that display strong maternal effects.

11) Figure 5 and Results: The comparison with previous studies is somewhat troubling. In particular, how were the data normalized to ensure the results were comparable (e.g., batch effects/control for potential confounding factors)? Also, a p-value cutoff, rather than an FDR cutoff, should be used to ensure results are comparable across studies (since the null distribution will differ). Without more information, it is difficult to assess the significance of this result.

12) Results: What functional categories were enriched for in the ASE genes? Also, for the enrichment analysis, did the background control for gene expression levels (i.e., were only expressed genes considered)?

https://doi.org/10.7554/eLife.05538.013

Author response

1) The title is provocative, without obvious follow-up. ‘Convergence’ and ‘divergence’ has implications about genetic selection and systems properties. The reviewers expected a definition of what these concepts mean in this context, how the study design would address these questions, and then interpretation of how the results provide insight. This was not evident in this manuscript. The primary reference seems to be in the Introduction, where two definitions of divergence are provided: first, it refers to changes in embryonic development resulting from genotype-specific differences in maternal transcripts present in mature oocytes at fertilization, and second, divergence here seems to be differences in regulatory control between embryos and adults. Both have been extensively studied, first to understand the maternal:zygotic transition in gene expression patterns, and second to characterize differentiation of cells, organs, and tissues. Resolving these and other forms of direct (genetic) and indirect (epigenetic, social factors is beyond the scope of the present study design, except with weak ad hoc arguments.

We are so used to these terms in their evolutionary context that we did not ask ourselves whether it’s clear what we mean. We have changed the introductory paragraph to indicate that we use these terms in their standard evolutionary meaning and would be open to further clarification if necessary.

In evolutionary genetics, ‘constraint’ and ‘divergence’ are standard terms that describe the slowing of change, and the accumulation of change, respectively. Divergence is the evolutionary process that gives rise to differences; constraint is the brake purifying selection puts on divergence.

Specifically, in the context of our paper, the large number of genes with cis-eQTL and allele-specific expression, the change in the allelic ratios over developmental time (eg Hbb-b1 and Hbb-b2), and the maternal effect differences between the strains are indicative of gene regulatory divergence since the strains’ last common ancestor.

Similarly, ‘constraint’ refers to expression patterns that are more similar between strains than would be expected, indicating past purifying selection. We find evidence of constraint in the lack of trans-eQTL (fewer than would be expected, based on the comparison to adult mice), the higher sequence conservation and the depletion of regulatory gene ontology terms in the cis-eQTLs.

2A) No mention about whether sex of embryos was determined, and results analyzed separately for males and females. Considerable literature shows that sex affects expression of many genes. If the authors have evidence that embryo sex is not important in this context, evidence and arguments need to be presented (see also comment 10 below).

This is an important point, and we thank the reviewers for mentioning it. We did indeed determine the sex of each embryo. We performed the suggested analysis and found only 4 additional genes with significant sex-specific expression patterns, even at higher (more lenient) FDR cutoffs. All these genes are on the X or Y chromosomes.

Sex differentiation begins with differential expression of Sry and then Sox9 in the genital ridges at E10.5. Our embryos were harvested at E11.5, before development of the gonads, which happens a day later at approximately E12.5. We do not expect substantial sex-specific differences in gene expression at E11.5 because most male-specific development begins with testosterone synthesis in the gonads. We also asked whether we can detect differences in genes that are known to be expressed differentially between the sexes in the mouse brain prior to gonad differentiation, but again only found a handful of X and Y chromosome genes to show any sign of sex bias.

Because the X and Y chromosomes are highly repetitive, and because we found only a few sex-biased genes, we considered only autosomal genes and markers for all subsequent analyses. A new paragraph has been added to the Methods to explain these analyses.

We note that we used sex as a covariate when performing batch normalization with PEER, thus controlling for any smaller sex-specific effects on gene expression.

B) Apparently RNAs from whole embryos were analyzed. But as the authors acknowledge, organogenesis is active at E11.5. So why study whole embryos rather than specific organs and tissues?

An E11.5 embryo is, on average, approximately 5x4x3mm3. Attempting to dissect individual organs, which are very small and morphologically not particularly distinct at this stage of development, would introduce substantial confounding factors. The most important and the most difficult one to control is variation among individual dissections, such as in the precise cut separating the head from the body. This introduces unknown and uncontrollable variance that potentially results in apparent differences in gene expression. Also, a ‘tissue’ isolated in any practically feasible dissection, for example head vs. body vs. limbs, is still a heterogeneous mix of developing cells. Finally, dissection and subsequent sequencing of individual organs from hundreds of mice would have multiplied costs to the extent that it would have been prohibitively expensive.

We obtained definitive results with important conclusions despite the perceived shortcoming of using whole embryos. Future studies on individual embryonic tissues, if executed properly, may refine our findings.

C) Paternal stress is an established factor affecting offspring phenotypes and molecular features. Typically these paternal factors are controlled by removing males from mating cages immediately after fertilization. Alternatively, in vitro fertilization is sometimes used. Regardless, these kinds of factors needed to be controlled in order to make strong conclusions about the nature of maternal effects.

We used standard mating procedures. One male was placed into a cage with 2 females in the evening and removed promptly on the morning after mating (E0.5), minimizing any effect of paternal behavior on the mother’s pregnancy. We have added a note to this effect in the Methods.

3) The aspect of the paper that we found disappointing was the analysis of gene expression dynamics. The reviewers considered that the authors could get much more from this dataset. For example, they don't test for interactions between genotype and time/stage (somite number, this could be done quite easily even using PEER) and there is no consideration of non-motonic time trends etc. In short, we think the authors could likely find much more about how genetic variation influences expression dynamics by implementing something similar to the ideas of Francesconi (Nat 2014).

We thank the reviewers for pressing this point, which we had extensively explored during preparation of the manuscript. In fact, we were quite excited to find 12 genes with significant genotype x time interactions, which we found by correlating ASE and somite number. Thus, for these 12 genes, cis-regulatory polymorphisms cause the B6 and Cast alleles to be expressed differently over time. We highlighted in the text an interesting subset of these heterochronic genes, the beta-globins (also see point 9 below).

The time course we assayed is relatively short, ranging from approximately E11.0 to E12.0, or about 5% of total pre-term developmental time. We therefore expect only a very small fraction of genes to show non-monotonic trends in expression, and even fewer where that trend is polymorphic between the assayed strains. Thus, we do not have the same expectation as the reviewers regarding potential widespread, genetically controlled heterochrony.

In order to definitively explore this issue, we applied the Francesconi (2014) method to explicitly identify genotype x time interactions. This method uses an F-test to assess whether a spline-based genotype x time model (with interaction) fit the data better than a genotype + time model (additive effects only). We found some interactions, but all genome-wide significant results were the result of overfitting, a pitfall that is particularly prevalent with non-monotonic trends. In order to avoid overfitting, we modified the splines approach to use resampling (rather than an F-test) to identify significant genes. This approach yielded no significant genes. We also asked whether genotype x time interactions could be observed using simple linear models only, but again were unable to find any convincing significant results. This underscores the greater power of our straightforward correlation approach, which did yield 12 heterochronic genes.

We considered the reviewers’ suggestion to use PEER to test for interactions between genotype and time/stage. Typically, PEER is used to remove batch effects, including both known and unknown covariates. Since somite number is a known covariate in our experiment, it was unclear to us how PEER would be useful in identifying time x genotype interactions, since we can model time explicitly.

Based on these results, we conclude that the divergence between Cast and B6 produced a small number of significant time x genotype interactions that are active at E11.5. That we found any is a testament to the robustness of our data and a reflection of biological reality in the developing mammalian embryo. Future studies may refine or expand these results.

4) The next recommendation would be to try and get more from the comparison of ASE and eQTLs. The correlation is weak (R^2 =∼34% variance explained). What is the explanation for this? Biological or technical? Likely there is something interesting to say biological here about the causes of the differences between the hybrids and the recombinants.

The explanation is technical. Most genes do not exhibit ASE or eQTL effects, and therefore contribute noise around (0.5, 0.5). When considering only the genes with significant ASE or eQTL, agreement improves substantially and we find that only about 1% of genes are discordant between ASE and eQTL. The correlation improves from rho=0.58 when considering the top 5000 genes with best ASE read coverage to rho=0.70 when considering the top 1000 genes.

In comparison to previous studies the correlation we find between eQTL and ASE appears to be quite strong. For example, Lagarrigue et al. (2013) compared ASE and eQTL in mice, finding 90% of significant eQTL genes showed allelic ratios in the same direction as predicted by eQTL, but the overall quantitative correlation appeared to be poor. Similarly, Battle et al. (2014) and Li et al. (2014) show highly significant correlations between ASE and eQTL effect sizes, but the Pearson correlations are at most 0.68 when considering genes with deep allele-specific read coverage.

5) In general, the results are not too surprising; for example, the observation (Results, third paragraph) that the expression of key developmental genes changes through development is to be expected, we are not sure this warrants a main text figure.

The differential developmental regulation of the globins between B6 and Cast highlights fundamental physiological changes due to divergence in gene expression regulation between the strains. That the two adult globins show opposite patterns is suggestive of compensatory evolutionary dynamics within this gene family. We are not aware of any such finding in mammalian embryogenesis and while one might have expected such phenomena, our study is the first to show that they actually occur.

Figure 2D establishes that our data recapitulate the previously reported shift from fetal to adult hemoglobins. This is important context for Figure 2E and 2F, which demonstrate that the two strains initiate this transition using different globin family members.

Our study is also the first to show widespread, substantial effects of maternal genotype on embryonic expression. Is that ‘surprising’? We think it is, and it’s important too.

Moreover, a significant number of eQTL have been found in other tissues/cell lines in mammals (multiple papers from the Dermitzakis and Pritchard labs), the majority of which have been shown to regulate gene expression in cis, so the novelty of this part of the manuscript is small.

The novelty here is not the abundance of cis-eQTL; it is the paucity of trans-eQTL. Previous studies in model organisms, including adult mice, have typically shown that 10–20% of eQTL are in trans (e.g. Chow et al. 2015), with the rest acting locally and probably therefore in cis. We demonstrate that there is a highly significant quantitative difference between adults and embryos, with only approximately 1% of all eQTL acting in trans in embryos. Together with the sequence conservation analysis of eQTL/ASE genes, our results show that there is stronger evolutionary constraint on mammalian gene regulatory networks in embryos than in adults. This is a novel, non-obvious, conclusion.

6) Care must be taken when defining cis-regulatory variants identified from previous RNA-seq studies (Montgomery et al.; Pickrell et al.), as you have done in the Introduction. In particular, these studies focused on correlating variants near the gene of interest with its expression across individuals. In a number of cases, it was shown that the effect size of the eQTL was correlated with allele-specific expression levels, strongly suggesting that many variants act in cis. However, in most previous eQTL studies, variants that act in cis have typically been defined as variants that are proximal to the gene of interest. Some clarification of this subtle, but important distinction, would be useful here.

We agree, and thank the reviewers for pointing that out. The relevant sections have been edited to make clear that genomic proximity does not necessarily imply cis-regulation.

7) When you mention, in the Introduction, that, at E11.5, gastrulation has already occurred, and the embryo is already composed of multiple different cell types, how much of a concern is it that, across N2 embryos, the rate of development may differ (perhaps because of the genetic background), which might lead to heterogeneity in the composition of cell types profiled across embryos?

We think that variance in developmental rate would simply dilute our power to detect effects. The developmental ASE effects we uncover (Figure 2D, E, F) are inconsistent with high variance in developmental progress, at least in the tissues in which these genes are expressed.

8) Results, second paragraph: Why was a paired t-test as opposed to a binomial test (perhaps with a beta prior to model over-dispersion) employed here? The latter seems a much more natural approach given the data.

We carefully considered several statistical approaches before settling on the paired t-test, which has two key appealing features:

A) It considers the mean and variance across many individuals; since we considered only heterozygous sites in N2s, this number varies by individual but is on average ∼75.

B) It takes advantage of the within-individual comparison (i.e., for each individual, the B6 allele read count is subtracted by the Cast allele read count); this reduces the effects of biological and technical noise between samples because much of this noise should affect both alleles similarly.

Previous studies have used the beta-binomial distribution to model the count data underlying ASE measurements (e.g. Skelly et al. 2011). Such models are unlikely to provide improvements to our results while adding substantial complexity to the analysis as well as free parameters whose data-based estimation may result in overfitting. Importantly, we found that p-values obtained from the t-test correlated well with those calculated by resampling, but were slightly conservative.

We have expanded our note in the Methods explaining our decision to use the paired t-test.

9) Results, fourth paragraph: It would be interesting to study the overall expression levels of the genes considered here in a homogeneous genetic background (i.e., in the F0s). In particular, this would allow one to study how the expression of Hbb-b1 and Hbb-b2 differed during development—is it also the case that, in BL6, Hbb-b1 increases in expression during development while Hbb-b2 decreases? This would provide additional mechanistic insight and provide support for the compensatory mechanism alluded to in the last sentence.

Both Hbb-b1 and Hbb-b2, the adult beta-globins, are increasing in total expression during the time course we assayed (around E11.5), while the embryonic globins are slowly being downregulated. Our ASE data, which are particularly well-suited for such analyses because both alleles are assayed in the same embryo, demonstrate that the Cast allele of Hbb-b1 becomes activated earlier than the B6 allele, and that the Cast allele of Hbb-b2 becomes activated later than the B6 allele. In support of the ASE data, the regression slopes for total expression (a coarser metric than ASE) differ between heterozygotes and homozygotes for both genes, significantly so for Hbb-b2 (p=0.04).

Therefore, there are one or more cis-regulatory differences between Cast and B6 in the Hbb locus, which we detect across a large sample of N2 embryos. Only a violation of basic rules of inheritance would generate a different result in the F0s.

10) Figure 3 and Results: What explains the strange patterns in Figure 3B where there seems to be a bimodal pattern of expression (for both alleles) for a subset of the genes (e.g., Zkscan5, Polm, Eif2ak and Rhob)? Is this related to differences in developmental stage and/or sex? To explore the latter, could the expression of genes on the X or Y chromosome be used to sex the embryos? More generally, this is a strange pattern, which provides some concern about the robustness of the selection of these genes as those that display strong maternal effects.

We found that the genes with a bimodal pattern of expression were correlated in their expression patterns across embryos, suggestive of a concerted regulatory program. We know each embryo’s sex (as explained above) and found no evidence that these expression patterns were caused by sex of the embryo or developmental timing. Instead, we found these genes loosely clustered on the basis of litter, with most outliers comprised of embryos from a small number of litters. This clustering by litter suggests that there may be influences beyond genotype of the mother that result in maternal effects. This result does not impact our conclusions about the effects of maternal genotype on embryonic expression.

11) Figure 5 and Results: The comparison with previous studies is somewhat troubling. In particular, how were the data normalized to ensure the results were comparable (e.g., batch effects/control for potential confounding factors)? Also, a p-value cutoff, rather than an FDR cutoff, should be used to ensure results are comparable across studies (since the null distribution will differ). Without more information, it is difficult to assess the significance of this result.

We completely agree on the importance of making sure that results are comparable between the studies and have now gone to considerable additional lengths to do so. In response to the comment, we have batch normalized the previously published eQTL data using PEER, and have indicated this in the Methods section. In addition, we have performed two new analyses comparing the rates of trans-eQTL in our embryo study to the previous ones of adult mice.

First, as the reviewers suggest, we have compared the proportion of trans-eQTL at differing p-value cutoffs. Second, we have compared the proportion of trans-eQTL as the total numbers of eQTL increase. See Author response images 1A and 1B. Both of these alternative analyses support the original conclusion that adult mouse tissues show a substantially higher proportion of trans-eQTL than do embryos. Furthermore, the results hold when sampling smaller or larger subsets of the previously published datasets.

Author response image 1

Fraction of eQTL in trans, comparing current embryo study to previous adult studies. (A) eQTL were calculated at increasing p-value cutoffs (x-axis). (B) eQTL were compared at increasing ranks (e.g., top 500 eQTL from each study). See also Figure 5E and 5F.

https://doi.org/10.7554/eLife.05538.015

As the reviewers point out, the null distributions will be different between studies, due to differences in the number of correlated markers, differences in which genes are expressed, etc. Thus, we believe that the raw p-values should not be directly compared. In contrast, for each dataset, the false discovery rate is calculated relative to its permuted distribution, thereby controlling for differences in the null distribution between studies. Therefore, we are keeping the main figure using FDR, but have added a note in the text about the robustness of the result to variations of the analysis.

12) Results: What functional categories were enriched for in the ASE genes? Also, for the enrichment analysis, did the background control for gene expression levels (i.e., were only expressed genes considered)?

The top enriched categories for ASE genes were “cellular biosynthetic process” and “ribonucleoprotein complex”. We did not mention these results in the text, as we were unsure about how to interpret them biologically and, based on our other results, we did not expect substantial positive selection.

All Gene Ontology analyses were performed using background sets containing only expressed genes (and the ASE analyses using only genes with read coverage of SNPs, informative for the ASE analyses). We have noted this in the Methods.

https://doi.org/10.7554/eLife.05538.014

Article and author information

Author details

  1. Noah Spies

    1. Department of Pathology, Stanford University School of Medicine, Stanford, United States
    2. Department of Genetics, Stanford University School of Medicine, Stanford, United States
    Contribution
    NS, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Cheryl L Smith
    Competing interests
    The authors declare that no competing interests exist.
  2. Cheryl L Smith

    1. Department of Pathology, Stanford University School of Medicine, Stanford, United States
    2. Department of Genetics, Stanford University School of Medicine, Stanford, United States
    Contribution
    CLS, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Noah Spies
    Competing interests
    The authors declare that no competing interests exist.
  3. Jesse M Rodriguez

    1. Department of Computer Science, Stanford University, Stanford, United States
    2. Biomedical Informatics Program, Stanford University School of Medicine, Stanford, United States
    Present address
    Google Inc., Mountain View, United States
    Contribution
    JMR, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  4. Julie C Baker

    Department of Genetics, Stanford University School of Medicine, Stanford, United States
    Contribution
    JCB, Conception and design, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  5. Serafim Batzoglou

    Department of Computer Science, Stanford University, Stanford, United States
    Contribution
    SB, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  6. Arend Sidow

    1. Department of Pathology, Stanford University School of Medicine, Stanford, United States
    2. Department of Genetics, Stanford University School of Medicine, Stanford, United States
    Contribution
    AS, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    arend@stanford.edu
    Competing interests
    The authors declare that no competing interests exist.

Funding

National Institutes of Health (NIH) (R01GM103787)

  • Julie C Baker

Burroughs Wellcome Fund (BWF)

  • Julie C Baker

National Institutes of Health (NIH) (T32 HG000044)

  • Noah Spies

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

NS was supported in part by the Stanford Genome Training Program (NIH/NHGRI), JMB by the Biomedical Informatics Training Program (NIH/NLM), and JCB by the Burroughs Wellcome Fund Preterm Birth Initiative. We thank Elizabeth Finn for help with embryo dissections, Ghia Euskirchen and staff of the Stanford Center for Genomics and Personalized Medicine and Ziming Weng for help with sequencing, Cyril Ramathal and the Reijo-Pera lab for qPCR advice, and Casey Brown, Stephen Montgomery, and Eric Stone for discussions and comments on the manuscript.

All raw sequencing data are available in the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE62967.

Ethics

Animal experimentation: All experimental procedures were carried out in accordance with the Administrative Panel on Laboratory Animal Care protocols (#11799 and #13646) and the institutional guidelines set by the Veterinary Service Center at Stanford University.

Reviewing Editor

  1. Anne C Ferguson-Smith, Reviewing Editor, University of Cambridge, United Kingdom

Publication history

  1. Received: November 9, 2014
  2. Accepted: April 13, 2015
  3. Accepted Manuscript published: April 14, 2015 (version 1)
  4. Version of Record published: May 5, 2015 (version 2)

Copyright

© 2015, Spies et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,883
    Page views
  • 374
    Downloads
  • 2
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading