1. Genomics and Evolutionary Biology
  2. Neuroscience
Download icon

A genomic lifespan program that reorganises the young adult brain is targeted in schizophrenia

  1. Nathan G Skene
  2. Marcia Roy
  3. Seth GN Grant Is a corresponding author
  1. University of Edinburgh, United Kingdom
Research Article
Cited
0
Views
1,683
Comments
0
Cite as: eLife 2017;6:e17915 doi: 10.7554/eLife.17915

Abstract

The genetic mechanisms regulating the brain and behaviour across the lifespan are poorly understood. We found that lifespan transcriptome trajectories describe a calendar of gene regulatory events in the brain of humans and mice. Transcriptome trajectories defined a sequence of gene expression changes in neuronal, glial and endothelial cell-types, which enabled prediction of age from tissue samples. A major lifespan landmark was the peak change in trajectories occurring in humans at 26 years and in mice at 5 months of age. This species-conserved peak was delayed in females and marked a reorganization of expression of synaptic and schizophrenia-susceptibility genes. The lifespan calendar predicted the characteristic age of onset in young adults and sex differences in schizophrenia. We propose a genomic program generates a lifespan calendar of gene regulation that times age-dependent molecular organization of the brain and mutations that interrupt the program in young adults cause schizophrenia.

https://doi.org/10.7554/eLife.17915.001

eLife digest

In our lifetime, we go through many changes – physically and also intellectually. At certain ages, we are particularly vulnerable to develop psychiatric disorders, and the majority of mental conditions start to manifest in teenagers and young adults. The symptoms for schizophrenia, for example, a mental health disorder in which patients often experience hallucinations, delusion or changes in behavior, typically start in the mid-twenties.

Schizophrenia tends to run in families and it is likely that different combinations of faulty genes that affect the connections between nerve cells increase the chance of having the disease. Until now, scientists have assumed that certain situations and environmental factors trigger the condition, but it was unknown if genes could influence the age at which the disease will begin.

To explore whether genes in the brain change at certain time points, Skene et al. examined how the genes are turned on and off across the lifespan of healthy mice and humans. The results showed that in both mice and humans, a ‘genetic lifespan calendar’ controlled every cell type in the brain and directed the way they worked at different ages. The timing was so precise that it was possible tell the age of a mouse or a person simply by looking at the way the genes were expressed in a tissue sample.

Skene et al. then studied how the genetic lifespan calendar controlled the genes damaged in schizophrenia, and found that the calendar caused a major reorganization of the genes at the time when the symptoms started. This suggests that the genetic lifespan calendar is a crucial factor that can determine at what age the disease will start.

The next step will be to study how the genetic lifespan calendar programs changes throughout the brain and to explore if it could be manipulated to change how the brain ages. This could help to develop new types of treatments for schizophrenia and other conditions of the brain.

https://doi.org/10.7554/eLife.17915.002

Introduction

Identifying the genetic mechanisms that underpin brain ageing across the lifespan may provide explanations for the maturation of behaviours and age of onset of diseases. Longitudinal studies show cognition, emotion and personality emerge progressively during childhood and adolescence, with executive functions peaking in early adulthood (Craik and Bialystok, 2006; De Luca et al., 2003). This coincides with the onset of some of the most devastating psychiatric disorders, most of which arise during later stages of brain development in the teenage years and early twenties (Kessler et al., 2007). For example, impulse-control disorders arise in late childhood and early teen years, substance abuse peaks in the early twenties, and schizophrenia in the mid-twenties (with a delay of around two years in females) (Häfner et al., 1993). Some monogenic neurological disorders also have early adult onset, such as Inclusion Body Myopathy associated with Paget disease Frontotemporal Dementia (Watts et al., 2004) and rapid-onset dystonia Parkinsonism (Brashear et al., 2012).

Why some brain disorders with a strong genetic component have a late developmental onset is unknown. The prevailing hypothesis for schizophrenia proposes that an early (fetal) insult or mutation renders the brain vulnerable to a secondary environmental insult that occurs in young adults, which then triggers the onset of psychosis (Bayer et al., 1999). However, the finding that the age of onset for schizophrenia has a heritability estimated at 33% (Hare et al., 2010), suggests that the timing may have a genetic basis. In recent years, there has been major progress in understanding the genetic basis of schizophrenia with the identification of many mutations and variants contributing to disease susceptibility. It is widely accepted that many mutations directly impact on synapse proteins, particularly those involved with postsynaptic signalling mechanisms in excitatory synapses (Kirov et al., 2008; Pocklington et al., 2015; Fromer et al., 2014; Fernández et al., 2009a; Singh et al., 2017). The postsynaptic proteome of excitatory synapses is physically organised into multiprotein complexes of which the supercomplexes assembled by PSD95 (Husi et al., 2000; Frank et al., 2016a; Frank and Grant, 2017; Frank et al., 2017) play a major role in regulating cognitive functions (Migaud et al., 1998; Nithianantharajah et al., 2013) and are disrupted by schizophrenia mutations (Pocklington et al., 2015; Fromer et al., 2014; Fernández et al., 2009a; Singh et al., 2017; Kirov et al., 2012; Purcell et al., 2014). Together these observations suggest there may be genetic mechanisms that account for the convergence between the many schizophrenia susceptibility genes, the postsynaptic proteome and the young adult brain.

Transcriptome profiling of the brain at different ages has shown complex changes in expression levels across the lifespan (Colantuoni et al., 2011). In early onset brain disorders, such as intellectual disability and autism, gene expression levels in fetal and early postnatal development correlate with enrichment in autism susceptibility genes (Willsey et al., 2013; Parikshak et al., 2013). However, correlation based approaches (Willsey et al., 2013; Parikshak et al., 2013; Gulsuner et al., 2013) (e.g. weighted gene coexpression network analysis) are unable to account for the full complexity of transcriptional changes that occur with age. Furthermore, although there has been extensive characterisation of cellular and anatomical maturation in neuronal, glial and synaptic subtypes (Alexander and Goldman, 1978; Shaw et al., 2006; Gogtay et al., 2004, 2006; Zehr et al., 2006; Woo et al., 1997; Huttenlocher, 1979; Anderson et al., 1995; Bourgeois et al., 1994; Kalsbeek et al., 1988; Klingberg et al., 1999), the relevant transcriptome changes remain to be identified.

To further understand the transcriptional events underlying brain development and ageing, we developed new tools that identify age-dependent gene regulatory events. These methods detect when gene expression trajectories change direction or plateau: we refer to these events as Transcriptome Trajectory Turning Points (TTTPs) (Figure 1). The timing of TTTPs has been shown to be an important feature of transcriptome trajectories during maize embryo development and the yeast cell-cycle: in both systems, genes with linked biological functions were found to turn/plateau at similar time points (Lee et al., 2002; Spellman et al., 1998). We have characterised TTTPs in the neocortex of humans and hippocampus of mice across their respective postnatal lifespans. This revealed a previously unknown, species conserved, gene regulatory program. These methods were also used with single-cell transcriptomes to define the age-dependent sequence of changes in neuronal, glial and endothelial cell-types. Our data suggest that the late onset of some psychiatric and neurological disorders is timed by mutations in this genetically programmed developmental sequence. Our findings also indicate that misregulation in the molecular maturation of synapse proteomes during a critical window in young adults is important for the onset of schizophrenia. These methods and findings open new areas of investigation into the genetic regulation of brain age and highlight their importance in the adolescent and young adult.

Figure 1 with 1 supplement see all
Gene expression trajectories can be classified and quantified based on the characteristics of their turning points (TTTPs).

(a) Simple trajectories (upper panel) do not contain TTTPs (red dots) whereas complex trajectories (lower panel) contain a TTTP. Complex trajectories are further classified according into those that plateau or reverse direction after the TTTP. Upward and Downward classification refers to the initial direction of the trajectory. (b) and (c). Examples of the application of the DeGeT (b) and ALiGeT (c) methods applied to two trajectories (A, shown in red; turns at 23 years of age and has a greater fold change than B, shown in blue, which turns at 33 years). The change in expression (ΔE) prior to the turning point is denoted as α and β for trajectories A and B respectively. (b) The DeGeT scoring system generates a score for each of 10 groups of ages, where each group contains a similar number of TTTPs. Trajectory A receives a DeGeT score of α in the age group spanning 23 years and zero in all other ages. Similarly, trajectory B receives a score of β only in age group 30–35. (c) The ALiGeT scoring system generate a score for each year of age and decays the contribution of ΔE as a function of distance from the TTTP (see dotted ALiGeT scores that peak at α and β and rapidly decay at ages either side of this peak).

https://doi.org/10.7554/eLife.17915.003

Results

Quantifying expression trajectories

As shown in the schematic diagram (Figure 1a), genes can be categorised according to their age-dependent profile of expression into simple trajectories (monotonic, upward or downward gradients) or complex trajectories that contain a Trancriptome Trajectory Turning Point (TTTP), marking both the direction of change and the age at which it occurs. Following a TTTP, the expression may reverse direction or plateau. We systematically identified gene expression trajectories by fitting cubic splines and marking TTTPs where the first derivative dE/dA, (E = expression level and A = Age) of the interpolated trajectories equals zero and changes sign. Next, to take into account the extent of expression changes prior to the TTTP (ΔE), and thereby emphasise those genes for which the TTTP represents a significant regulatory event, we developed two complementary methods (DeGeT, Decile-based Gene Turning; ALiGeT, Age-Linked Gene Turning). The DeGeT method depicted in Figure 1b is conceptually the simplest: the lifespan is divided into ten age groups within which an approximately equal number of TTTPs occur; each gene then receives a score for each age group (ΔE if the gene turns within that age window, and zero otherwise). The ALiGeT method depicted in Figure 1c extends this to generate a score for each year of age, by decaying the contribution of ΔE the greater the distance between the TTTP and the scored year (example trajectories and associated ALiGeT scores are shown in Figure 1—figure supplement 1). The reason for using two scoring methods is that some age periods have many TTTPs and others have few: DeGeT controls for this by balancing the number of TTTPs within age groups (and thereby has greater power to detect enrichments in earlier/later life stages), while ALiGeT scoring allows for the possibility that small time windows will have distinct molecular associations. Together, the TTTP, DeGeT and ALiGeT methods provide general purpose tools for exploring age-dependent gene regulation.

Brain expression trajectories in human and mouse

We first applied these methods to the Braincloud dataset, which measured mRNA expression levels from 269 prefrontal cortex samples across the human lifespan (14th gestational week to 78 years). Although TTTPs were identified across the lifespan, they were sharply concentrated during early adulthood. Summing the number at each age shows a striking peak and a mean of 26.0 years in males and 27.5 years in females (Wilcoxon signed rank test, p=0, Figure 2a, example trajectories with different turning points shown in Figure 2—figure supplement 1). Prominent peaks in early adulthood were confirmed with three regression methods (cubic splines with three degrees of freedom, four degrees of freedom and Loess regression) (Figure 2b). Removing X-chromosome genes from the analysis had no effect on this sex difference (p=0). To determine whether this TTTP-peak was human specific, we performed an equivalent analysis using transcriptome data from the hippocampus of 186 mice of both sexes between 58 and 600 days of age (Figure 2—figure supplement 2): this also revealed a peak in the frequency of TTTPs with a mean of 156 days for male and 165 days for female mice (p<1×10−323, Figure 2c). Although there is a lack of previous research on the age equivalence of early adulthood between rodents and humans, our results are concordant with estimations made using the TranslatingTime species comparison model which suggests that p156 (5 months) is equivalent to human early adulthood based on equivalent levels of cortical synaptic maturation (Pinto et al., 2015; Pinto et al., 2013; Workman et al., 2013). Plotting the cumulative distribution of TTTPs across the lifespan further illustrates the TTTP-peak in young adults (Figure 2d,e), and shows that 90% of TTTPs occur by 40 years of age, corresponding to the last stages of human brain development (De Luca et al., 2003; Lebel et al., 2012; Wood et al., 2004) and 7 months of age in mice. These data indicate that despite greatly differing lifespans, these two mammalian species share a lifespan program of brain gene expression with conserved features.

Figure 2 with 3 supplements see all
Trajectories and turning points characterise brain age.

(a) Percentage of TTTPs at each age of human lifespan. Mean age is 26.0 years for males and 27.5 years for females. (b) Percentage of TTTPs at each age of human lifespan using three different methods for fitting splines. Mean age is 31.3 years using cubic splines with 3 degrees of freedom, 21.6 years using cubic splines with 4 degrees of freedom and 25.2 years using Loess regression. (c) Percentage of TTTPs at each age of mouse. Mean age is 156 days for males and 165 days for females. (d) Cumulative sum of TTTP scores for every year of life in the Braincloud dataset. (e) Cumulative sum of TTTP scores for every year of life in the mouse hippocampus dataset. (f) The TTTPs for genes with the greatest expression changes prior to the TTTP (ΔE) were concentrated around the late-twenties. (g) Percentage of probes associated with TTTPs that up- or down-regulate prior to turning. (h) Percentage of probes with TTTPs which plateau or reverse after turning. (i) and (j). Age of individual mice and humans can be accurately predicted using a Support Vector Machine trained on the expression data. Individual points represent each mRNA sample.

https://doi.org/10.7554/eLife.17915.005

To characterize the features of the human TTTPs in more detail, we focussed on those genes that showed the greatest changes. As shown in Figure 2f, the TTTPs for genes with the greatest expression changes prior to the TTTP were concentrated around the late-twenties, reinforcing the earlier finding that this is a significant period for switching in the trajectories. Next, we separated genes into those with upward or downward trajectories prior to the turning point: overall there were similar numbers in each category, although there was a skew toward upward inflecting genes being more common in young subjects (<25 years) and downward inflecting genes more common in older subjects (>40 years) (Figure 2g). Finally, we examined the direction of the trajectories after the TTTP by dividing genes into those that established a stable plateau (Post-Turn Plateau) or reversed their direction (Post-Turn Reversal) (Figure 2h). The vast majority of probes had plateaued by 30 years of age. While the exact ages at which TTTPs occur is sensitive to both the regression method and the dataset used, it is clear that the TTTP-peak reveals a major molecular reorganisation in young adults towards the end of development. Together these findings show that young adulthood is a crucial time for switching brain gene expression and establishing the set points of most genes for later life.

Even though there are major changes during brain maturation in young adults, the complex trajectories were found throughout the lifespan, suggesting they could be used to predict the biological age of the brain. To test this, we used radial basis support vector machines and demonstrated that classifiers trained on partitioned subsets of the gene expression data (training sets) predicted age in the test sets with an accuracy (defined as mean |AgeActual−AgePredicted|) in humans of 5.5 years and 28 days in mice. Remarkably, they showed accurate age predictions across the entire range of ages in both species (human, R2 = 0.88, mice, R2 = 0.94) (Figure 2i,j) using only 40 probes in humans and 100 in mice (Figure 2—figure supplement 3). Thus, TTTPs and trajectories are highly characteristic features defining brain age across the lifespan in mice and humans. This indicates a ‘genetic lifespan calendar’ of transcriptome events is a conserved feature of mammals.

Cell type changes throughout the lifespan

To ascertain the biological processes affected by the TTTPs, we first sought to identify the cell types affected at each age. We asked if the TTTPs were enriched in the transcriptome of specific cell types using brain single-cell RNA-seq data and the Expression Weighted Cell-type Enrichment (EWCE) method (Skene and Grant, 2016). TTTPs were binned into approximately equal sized age-groups (similar to the DeGeT method) and then tested for cellular enrichments (Figure 3a). We assumed that different biological processes could be associated with up/down-regulated genes and so performed enrichment analyses for each direction separately. To ensure the findings were robust, the analysis was performed on both the Braincloud (Colantuoni et al., 2011) dataset and an independent human prefrontal cortex transcriptome dataset (Somel et al., 2009) (see Materials and methods). Significant enrichments were found for each cell-type tested and these were strongly correlated between the two datasets (Figure 3a, median correlation of enrichments for each cell-type and direction with at least one significant change between the two datasets was 0.37). The majority of significant enrichments were found amongst the Upward (Figure 1a) gene sets relative to the Downward trajectory gene sets.

Genes associated with high levels of expression in particular cell types are enriched at different stages of the human and mouse lifespan.

(a) Expression Weighted Cell-type Enrichments (EWCE) for the top 10% of genes with largest DeGeT scores within each age window for the human prefrontal cortex. Enrichments were calculated separately for the Somel (Somel et al., 2009) and Braincloud (Colantuoni et al., 2011) datasets. Enrichment values represent how far the mean cellular expression level within the target list is from the mean value for the bootstrapped lists (in terms of standard deviations); these values have been scaled to have a maximal value of either 1 or −1 within each age window. X-axis labels mark the mean age for each of the age windows. Asterisks mark those enrichments that are significant after Bonferroni correction with p<0.05. (b) Analysis as in (a) on mouse hippocampus.

https://doi.org/10.7554/eLife.17915.009

A striking sequence of events was observed where each cell type was regulated within distinct age-windows (Figure 3a and summarised in Figure 7). Early postnatal life was marked by TTTPs in endothelial cells (pbraincloudBonferroni=0.025, psomelBonferonni<0.00001 where ‘Bonferroni’ indicates a Bonferroni-adjusted p-value) and oligodendrocytes (psomelBonferroni=0.015); followed by microglial genes (pbraincloudBonferroni<0.00001, psomelBonferroni<0.00001) throughout adolescence; then pyramidal neuron genes (pbraincloudBonferroni=0.0216, psomelBonferroni<0.00001) in early adulthood (corresponding to the peak in turning points); then interneuron (psomelBonferroni<0.00001) and oligodendrocyte genes through the late twenties and thirties (pbraincloudBonferroni<0.00001, psomelBonferroni<0.00001). Astrocytes were found to have two periods of enrichment, the first during early adulthood (psomelBonferroni=0.0028) and a second very late in life (pbraincloudBonferroni=0.00675, psomelBonferroni<0.00001). These enrichments reveal the sequential maturation of cellular processes in the brain across the human lifespan. Moreover, most of these changes occurred prior to 35 years of age with prominent neuronal changes in young adults.

We also performed this analysis on the mouse hippocampal dataset (Figure 3b). As in the human neocortex datasets we found an early adulthood enrichment for upward-turns in pyramidal neuron genes (pmouseBonferroni=0.008), followed by a later enrichment for downward-turns in genes associated with both pyramidal and interneurons (pmouseBonferroni=0.006 and, pmouseBonferroni=0.0086 respectively). Unlike in the human datasets, upward-turns in interneuron genes were found to be enriched at the same ages as those in pyramidal neurons (pmouseBonferroni=0.025). This may be because the earliest samples in the mouse ageing dataset were 56 days whereas the human datasets included fetal samples: correspondingly the early life enrichments for upward-turns in microglia and endothelial cells were not seen in mice. A later enrichment for upward-turns was seen in endothelial cells in mice (pmouseBonferroni=0.048). These findings show that the enrichment of neuronal genes in the early adult peak is conserved across species and brain regions.

Synaptic mechanisms in young adults

The young adult peak in TTTPs is a prominent landmark and we next sought to identify the key molecular mechanisms involved. We used ALiGeT to identify the top 10% of genes in both human and mouse in the year in which the largest number of TTTPs occurred (and refer to this as the Peak Gene Turning (PeGeT) score) (Supplementary files 1a and 1b). The relevant biological processes in the genes with high PeGeT scores, conserved between mouse and human (Supplementary file 1c), were first examined for enrichment in Gene Ontology terms: this revealed their role in synaptic transmission (pFDRadj=0.00024 where ‘FDR-adj’ indicates the p-value was adjusted for False Discovery Rate using the Benjamini-Hochberg method). Mammalian Phenotype Ontology annotations indicated their roles in behavioural (p=7.5×10−05) and nervous system (p=0.0002) phenotypes. To probe the synaptic role in more detail, we examined genes coding for proteins in the human postsynaptic density (hPSD) (Bayés et al., 2011) and the postsynaptic PSD95 supercomplexes (Husi et al., 2000; Frank et al., 2017; Fernández et al., 2009b; Frank et al., 2016b), which are crucial in controlling synaptic plasticity and behaviour: both sets showed significantly higher PeGeT scores than expected by chance (p<0.0009 and p=0.0019 respectively, Figure 4a,b, Supplementary file 1d).

Figure 4 with 1 supplement see all
Post-synaptic density (PSD) genes are associated with TTTPs during windows in adolescence and early adulthood.

(a) PeGeT scores for human PSD (hPSD) genes compared with scores from 100 randomly sampled gene lists. For each of the 100 random lists, the ith largest score is plotted against the ith largest hPSD score. If the distribution of PeGeT scores within the PSD gene list was normal, then an equal number of random scores would be expected on either side of the red line. P value from bootstrapping of 10,000 random lists is shown. (b) PeGeT scores for PSD95 supercomplex genes analysed as in (a). (c) Human PSD genes show a window of increased ALiGeT scores during young adulthood in two human prefrontal cortex datasets (Braincloud, blue; Somel, yellow). Beneath the red horizontal line marks the point of significance (Bonferroni corrected p=0.05). (d) DeGeT scores were significantly increased in hPSD genes in five consecutive age sets spanning 24–36 years. Boxplots show the mean bootstrapped scores (10,000 random lists) and the red cross marks the score of the hPSD list for that age. *, Bonferroni corrected p<0.05. (e) Human PSD genes show a window (126 to 151 days) of increased ALiGeT scores during young adulthood in a mouse hippocampal dataset. Beneath the red horizontal line marks the point of significance (Bonferroni corrected p=0.05). (f) The mouse synaptic proteome shows extensive changes during early adulthood, with particularly strong impact on ion channels. Expression profiles for a subset of the channels and receptors found to be differentially expressed with age. Values shown are the mean for each month, divided by the maximum mean expression value for that protein across all five months.

https://doi.org/10.7554/eLife.17915.010

To establish the age window when synaptic genes showed TTTPs, we used a bootstrapping approach to test whether synaptic genes have higher ALiGeT scores in particular years in the Braincloud dataset. Each year between 22 and 33 years of age showed a significant increase in synapse-associated ALiGeT scores and we refer to this as the ALiGeThPSD window. A replication study (Somel) using a smaller human transcriptome dataset (Somel et al., 2009) gave an overlapping estimate for the synaptic ALiGeThPSD window at 17–22 years (Figure 4c). We next applied the DeGeT method and found all five consecutive age sets from 24 to 36 years were significantly enriched in hPSD genes (Figure 4d). This result was confirmed in two independent human frontal cortex datasets (Somel et al., 2009; Kang et al., 2011) (see Figure 4—figure supplement 1). Similarly, we identified the ALiGeTmPSD window in mice at 126 to 151 days (Figure 4e). These data show that transcripts encoding postsynaptic proteins were significantly enriched in TTTPs during early adulthood in all four datasets tested, spanning two species and two brain regions.

Since these transcriptome results suggest that specific changes occur in the composition of components of the synapse proteome, we performed relative quantitation by mass spectrometry on 38 forebrain synaptosome samples between 1–5 months in mice. A total of 900 proteins were detected, of which 99 were found to show significant changes with age (Supplementary file 1e, pFDRadj<0.005). We explored which functional classes of synapse proteins were affected by ageing during this period, and found that ion channel proteins and receptors showed increased likelihood of being differentially expressed (pFDRadj=0.00034, expected = 7, actual = 17, Figure 4f). Amongst the affected channels were the majority of Ca2+- and Na+/K+-ATPases detected in our dataset, including three of the subunits of Atp2b (often referred to as PMCA). This finding recapitulates and extends the previous reports that aged rodents have decreased Ca2+- and Na+/K+- ATPase activity (Zaidi et al., 1998; Tanaka and Ando, 1990). These proteomic findings support the transcriptome findings that changes in synapse proteome composition occurs within the young adult age window.

Lifespan trajectories and schizophrenia

Prompted by previous results showing that postsynaptic proteins have been linked to the genetic susceptibility of schizophrenia (Fromer et al., 2014; Fernández et al., 2009a; Nithianantharajah et al., 2013; Kirov et al., 2012; Purcell et al., 2014), we hypothesized that TTTPs may be relevant to the age of onset of schizophrenia. For these analyses we examined multiple sets of publically available genetic data including de novo and GWAS data (see Materials and methods ‘Genes Lists’, Supplementary file 1f) and applied multiple analytical approaches.

As a first step, we applied the same bootstrapping approach used earlier for the PSD genes on the pooled set of genes which have been associated with Schizophrenia using either GWAS or de novo approaches. Strikingly, this analysis showed the TTTPs in susceptibility genes predicted the known age windows for the onset of schizophrenia (Figure 5). The ALiGeTscz enrichment window spanned 22–26 years (Figure 5a) corresponding to the clinically reported age of onset defined by the first episode of psychotic symptoms and window of maximum vulnerability (Kessler et al., 2007; Häfner et al., 1993). Since males are reported to have an earlier disease onset than females (Kessler et al., 2007), we tested males and females separately and found the ALiGeTscz enrichment window was significantly earlier in males than females (males 20–26 years, females 26–28 years; wilcoxon p=4.7×10−14, Figure 5b).

Figure 5 with 8 supplements see all
Schizophrenia susceptibility associated genes are associated with TTTPs during windows in adolescence and early adulthood.

(a) Schizophrenia susceptibility genes show a window (22–26 years) of increased ALiGeT scores during young adulthood in the Braincloud dataset. Beneath the red horizontal line marks the point of significance (Bonferroni corrected p=0.05). (b) Schizophrenia susceptibility genes show a later window of increased ALiGeT scores in females (26–28 years) than in males (20–26 years). (c) DeGeT scores were significantly increased in Schizophrenia susceptibility genes in two consecutive age sets spanning 24–26 years. Boxplots show the mean bootstrapped scores (10,000 random lists) and the red cross marks the score of the schizophrenia list for that age. *, Bonferroni corrected p<0.05. (d) PeGeT scores for the schizophrenia susceptibility genes (pooled from de novo, GWAS and integrative studies) compared with scores from 100 randomly sampled gene lists. For each of the 100 random lists, the ith largest score is plotted against the ith largest disease gene score. If the distribution of PeGeT scores within the disease gene list was normal, then an equal number of random scores would be expected on either side of the red line. P value from bootstrapping of 10,000 random lists is shown. (e) PeGeT scores for the schizophrenia genes from the integrative study analysed as in (g). (f) PeGeT scores for the schizophrenia genes from the de novo study analysed as in (g). (g) PeGeT scores for the schizophrenia genes from the GWAS study analysed as in (g). (h) DeGeT scores at 25—26 years are enriched for schizophrenia heritability (calculated with GWAS summary statistics instead of associated gene set). Bootstrapping was performed by shuffling gene level association z-scores which had been calculated using MAGMA. Boxplots show the mean bootstrapped scores and the red cross marks the score with the unshuffled association scores for that age. *, Bonferroni corrected p<0.05. (i) ALiGeT scores are enriched for schizophrenia heritability (calculated with GWAS summary statistics) in the Braincloud dataset between 22 and 24 years of age. The red horizontal line marks the point of significance (Bonferroni corrected p=0.05).

https://doi.org/10.7554/eLife.17915.012

To validate these results, we performed a series of technical control studies and biological replications. First, we used the DeGeT scoring system that showed the enrichment in schizophrenia peaked at 24–26 years (Figure 5c). Secondly, we performed the DeGeT enrichment test in the two independent human frontal cortex datasets (Somel et al., 2009; Kang et al., 2011) and confirmed that schizophrenia was significantly enriched (confirming the Braincloud result) (Figure 5—figure supplement 1). Thirdly, to confirm the results were not specific to a particular spline regression method, we demonstrate that enrichment window for schizophrenia was found using two alternate approaches to model fitting (Figure 5—figure supplement 2). While validating the occurrence of the disease enrichments, this analysis also revealed that the exact age at which the turning points/windows of enrichments occur depends on the regression model used (Figure 5—figure supplement 2). Fourth, we performed a down-sampling based sensitivity analysis to determine how the size of the gene set influences the enrichment: this indicated that the DeGeT enrichments are stronger for schizophrenia than for the hPSD (using subsets of 650 genes, 95% of schizophrenia subsets were significant at 24—25 years but only 50% of hPSD subsets, Figure 5—figure supplement 3). Fifth, we performed a variation of the bootstrapping analysis which accounts for transcript length and GC content (both of which are known to affect the rate of de novo mutations) and found no effect on the significance of the results (Figure 5—figure supplement 4). Sixth, we confirmed that varying the parameter in the ALiGeT scoring function, which controls the rate of decay with temporal distance, did not adversely affect the results (Figure 5—figure supplement 5). Finally, we dropped all fetal samples from the analysis, recalculated the TTTPs and confirmed that the major peak of TTTPs occurs in the early twenties, that it is delayed in females, that the later DEGET windows are enriched for PSD genes, and that schizophrenia shows discrete and significant enrichments using DEGET (Figure 5—figure supplement 6).

To further validate our findings, we next examined different genetic datasets that have been used to identify schizophrenia susceptibility genes. The ALiGeTscz analysis results described above used a combined schizophrenia gene set from three orthogonal genome-wide methods: (1) an integrative analysis of Genome Wide Association Studies (GWAS), expression analysis, copy number variants (CNV) and mouse models (Ayalew et al., 2012); (2) combined results of three exome-sequencing studies (Fromer et al., 2014; Xu et al., 2012; Girard et al., 2011), and (3) the most recent GWAS results from the Schizophrenia Working Group of the Psychiatric Disease Consortium (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) (Supplementary file 1f). We therefore separately tested sets of susceptibility genes discovered with all three methods: all showed significantly increased PeGeT scores (Integrative Analysis, pFDRadj<0.0009, de novo, pFDRadj=0.0057, GWAS, pFDRadj=0.027, Figure 5d–g). Interestingly, the stronger enrichment seen with de novo mutations may reflect that GWAS detects common variants that are assumed to have lower penetrance (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). We also validated the results using an additional set of de novo mutations (Gulsuner et al., 2013): this also confirmed the adult enrichments, either when analysed independently or when combined with the combined sets described above (Figure 5—figure supplement 7).

Much of the heritability for schizophrenia is associated with SNPs that have not reached genome-wide significance with current sample sizes (Loh et al., 2015) (and were not included in our analysis thus far) and the sample sizes for de novo studies are too small to determine whether any genes found are significantly associated with disease status. We therefore adapted our methods to include a greater fraction of the SNPs associated with schizophrenia heritability by using association statistics from all SNPs regardless of whether they are genome wide significant (as defined in GWAS summary statistics files) and explicitly modelling linkage disequilibrium (based on results of the 1000 genomes project), such that disease association scores can be ascertained for each gene. The ALiGeT and DEGET approaches were extended to directly utilise the gene association scores generated by MAGMA (Multi-marker Analysis of GenoMic Annotation) (de Leeuw et al., 2015) based on schizophrenia GWAS summary statistics. Using this approach, schizophrenia showed a DEGET enrichment at 26 years in the Braincloud dataset (pBonferroni=0.03135, Figure 5h) and at 15—16 years in the Somel dataset (pBonferroni=0.0115, Figure 5—figure supplement 8a) and corresponding enrichment windows were found using ALiGeT (Figure 5i, Figure 5—figure supplement 8b).

Finally, we performed two additional sets of analyses where we examined mouse proteome datasets and single-cell transcriptome data. Consistent with the transcriptome results, the mouse synapse proteomic dataset showed that schizophrenia-associated proteins were enriched in the synapse proteins that changed prior to the 5 month peak (p=0.018, expected = 6, actual = 11), and 91% of those that change were found to be down-regulated. Next, we examined the role of specific cell types in schizophrenia by restricting the ALiGeT analysis to the subsets of genes defined by single cell transcriptome data (see Materials and methods) (Zeisel et al., 2015). Schizophrenia associated genes showed a significant (pBonferroni<0.05) window of enrichment in neuronal genes (Figure 6a, Figure 6—figure supplement 1). We were however unable to confirm this result using the summary statistics based method (Figure 6b). We then examined the types of trajectories associated with schizophrenia and found genes were predominantly down-regulated prior to the TTTP (Figure 6c). This result was confirmed using the summary statistics based method (Figure 6d).

Figure 6 with 3 supplements see all
Trajectories of schizophrenia associated genes show further distinctions in their cell-type and trajectories.

(a) Schizophrenia susceptibility genes show a window of increased ALiGeT scores when the analysis is restricted to the 5000 most neuron specific genes, but not the 5000 most glial specific genes. The red horizontal line marks the point of significance (Bonferroni corrected p=0.05). (b) DeGeT scores of the 5000 most neuron enriched genes do not confirm a significant enrichment in schizophrenia heritability calculated with GWAS summary statistics. Bootstrapping was performed by shuffling gene level association z-scores which had been calculated using MAGMA. Boxplots show the mean bootstrapped scores and the red cross marks the score with the unshuffled association scores for that age. *, Bonferroni corrected p<0.05. (c) Schizophrenia susceptibility genes show a window of increased ALiGeT scores, when the analysis is restricted to genes that down-regulate prior to their TTTP, but not with genes that up-regulate prior to their TTTP. (d) DeGeT scores of genes which down-regulate prior to their TTTP are significantly enriched for Schizophrenia heritability calculated with GWAS summary statistics at 26—27 and 40–77 years.

https://doi.org/10.7554/eLife.17915.021

Together these analyses show robust replication across multiple datasets, including different types of genetic variants, transcriptomes and proteomes, and several complementary analytical approaches. These data strongly suggest that the window of onset and sex difference in schizophrenia is timed by the regulation of susceptibility genes in young adults.

Synapse proteome and adult-onset neurological disorders

In the mouse synapse proteome dataset, we noticed several proteins that cause adult onset monogenic disorders. We used the Human Phenotype Ontology age-of-onset annotations to identify these neurological disorders (excluding neoplasms, and peripheral/autonomic disorders) and from a total of 39 genes, six were detected in our samples, and we confirmed these were significantly enriched amongst the synapse proteins whose expression levels changed during maturation (p=0.00008, expected = 1, actual = 4, Figure 6—figure supplement 2, Supplementary file 1e). The affected proteins/disorders included Atp2a2 (Darier’s disease), Eef2 and Itpr1 (Spinocerebellar ataxia), Atp1a3 (Dystonia Parkinsonism). No equivalent enrichment was found for congenital/neonatal onset disorders (p=0.45, expected = 4, actual = 4). Further inspection (because HPO annotations were incomplete) found three additional, adult-onset haploinsufficiency disorders encoding by genes showing a 50–80% reduction in the maturing mouse synapse proteome (Figure 6—figure supplement 2): Vcp (Frontotemporal Dementia) (fold change, FC = 0.48; p=0.002); Atl1 (hereditary spastic paraplegia) (FC = 0.19, p=0.00004); Dmxl2, (Polyendocrine-polyneuropathy syndrome) (fold change: 0.28, p=0.00004). These age-dependent reductions in levels of haploinsufficient disease genes is consistent with the model that their respective lifespan trajectories are relevant to their age of onset.

Prefrontal cortex early adult turning points are not associated with other polygenic brain disorders

The possibility exists that the lifespan trajectories are also relevant to the age of onset of other polygenic diseases with onset at different ages. To address this we compiled gene lists for six other major brain disorders with different age-windows of onset: onset during infancy (autism and intellectual disabilities); early adulthood (multiple sclerosis); and late adulthood (Amyotrophic Lateral Sclerosis, Parkinson’s and Alzheimer’s) (corresponding gene sets shown in Supplementary file 1f). We applied the same bootstrapping approach used earlier for the PSD genes and none of the disorders showed significant results (Figure 6—figure supplement 3). Because these gene lists are not comparable (different size, population sample size, obtained using different technical approaches etc.) the relative importance of the genetic calendar to schizophrenia cannot be directly compared with these disorders (see Discussion). In addition, because the transcriptome data is from prefrontal cortex and the primary pathology of several of these diseases is in other parts of the nervous system it cannot be assumed that the transcriptome trajectories in one part of the brain are the same as in others. Hence, the lack of any detectable age window does not preclude a role for gene regulation in the onset of these diseases.

Discussion

Understanding gene expression in the human brain during the phases of childhood, adolescence, young adulthood, middle and old age, is a fundamentally important area of biology with medical significance. We focussed on identifying age-dependent gene regulatory events that were detected when the trajectory in the level of gene expression changed. Studying the transcriptome trajectories across the lifespan of the human neocortex and mouse hippocampus showed that TTTPs occurred at all ages. Moreover, because these events were a defining characteristic of every age, we found that actual age could be predicted by examination of an RNA sample from mouse and human brain tissue. These findings indicate there is a ‘genetic lifespan calendar’ that sets the date for gene expression changes in both species. This conclusion complements previous epigenomic studies that show DNA methylation correlates with chronological age (Horvath, 2013). The most striking and unexpected feature was the peak of TTTPs in young adult humans at 26 years of age and 5 months of age in mice. In both species, this peak was delayed in females and involved similar sets of genes. Moreover, in both species there was a similar sequential pattern of cell-type specific changes across the lifespan. Thus, we conclude that mammals with greatly differing lifespans share a conserved genomic program regulating the sequence of cellular and synaptic changes throughout the lifespan.

We discovered that the young adult brain undergoes a dramatic reorganisation of gene expression, as revealed by the TTTP-peak around 26 years, and this reorganisation is largely completed by 40 years of age when ninety percent of trajectories have plateaued. These findings correspond well with anatomical data showing that development and myelination in the frontal cortex is over (Lebel et al., 2012) by this age. The TTTP-peak was enriched in neuronal and synaptic genes expressed in pyramidal and interneurons including those encoding PSD proteins and the 1.5 MDa PSD95 supercomplexes (Figure 7). This indicates that young adults undergo a major reorganisation of their synapse proteomes, which was supported by proteomics results in mice. The level of expression of many PSD and PSD95 supercomplex proteins are known to be important for many innate and learned behaviours and we also found that the TTTP-peak was enriched in behaviourally important genes. Thus the genetic calendar can modify synapse proteome composition and potentially shape the behavioural repertoire across the lifespan.

Summary of age windows for synaptic and cellular processes and diseases across the human lifespan.

The intensity of the shaded boxes indicates the enrichment in relevant genes. The TTTP-peak in young adults (vertical dotted lines show approximate boundary) coincides with synaptic and neuronal changes and schizophrenia genetic susceptibility. Changes in glial cells overlap with the TTTP-peak and occur in distinct windows across the lifespan.

https://doi.org/10.7554/eLife.17915.025

The TTTP windows in non-neuronal cells also corresponded well with known changes in these cells. The early age-window in endothelial cells corresponds to the expansion and maturation of the brain’s vasculature (Caley and Maxwell, 1970; Engelhardt and Liebner, 2014; Azmitia et al., 2016) and post-natal shift in the transcriptome of endothelial cells (Daneman et al., 2010). There was also an enrichment in oligodendrocyte genes early in life, potentially corresponding to downregulation of genes specific to oligodendrocyte precursor cells (He et al., 2009). The next phase, which continues through the early teenage years, involves strong upregulation of microglial genes and coincides with synaptic pruning (Schafer et al., 2012) and preceded the young adult window of synaptic and neuronal reorganisation.

Transcriptome trajectories and schizophrenia

To date, there has not been a satisfactory mechanism or model that accounts for the following five central features of schizophrenia. First, the genetic susceptibility: the mechanism needs to account for the diverse sets of genes and the diverse types of mutations. Second, the age of onset: the mechanism needs to account for the age-window during which first-episode psychosis occurs, the heritability of this onset, as well as the earlier presence of prodromal cognitive symptoms (Koutsouleris et al., 2012). Third, the sex difference: females have a later onset than males. Fourth, the cell biological mechanisms: there needs to be a common subcellular mechanism that incorporates the diverse classes of disease-relevant proteins, which include channels, receptors, synaptic adhesion proteins, scaffold proteins and signalling molecules. Fifth, the cognitive deficits: the molecular and cellular mechanisms need to play a key role in the relevant cognitive processes. A model that satisfies these criteria would also be expected to have explanatory power for other characteristics of schizophrenia.

Our findings meet all five criteria and we propose the following model. A genetic program orchestrating transcriptome trajectories causes reorganisation of expression of synapse proteins in young adults. Mutations in these genes are functionally exposed in young adults because the reorganisation of expression produces inappropriate synapse signalling properties that result in abnormal behaviour. We refer to this as the genetic calendar model of schizophrenia. This model posits that schizophrenia is a genetic disorder targeting the mechanisms of brain aging during the young adulthood period of the lifespan. Our model offers a mechanistic explanation for the onset of first-episode psychosis and is consistent with prodromal cognitive impairments and the persistence of schizophrenia beyond the young adult years (when many genes reach their plateau). Thus, the genetic calendar model can explain the onset and progression of schizophrenia.

In addition to the robust association of schizophrenia genes with the TTTP-peak which was replicated across datasets including multiple types of genetic variants, transcriptomes and proteomes, and several complementary analytical approaches, our study identified specific molecules that further strengthens the mechanistic link between synaptic mechanisms and schizophrenia. PSD95 supercomplex proteins are enriched in schizophrenia genes (Fromer et al., 2014; Fernández et al., 2009a; Singh et al., 2017; Kirov et al., 2012; Purcell et al., 2014; Grant et al., 2005). Amongst the schizophrenia-susceptibility genes with the most prominent TTTPs were those with established synaptic functions, including Rgs4, Snap25, Kalrn, Htr2a and Nrg1. The expression level of each of these genes has previously been shown to either influence, or be influenced by psychiatric symptoms (Etain et al., 2010; Guillozet-Bongaarts et al., 2014; Yin et al., 2013; Hill et al., 2006). Furthermore, altered expression of Snap25, Htr2a and Nrg1 are noted to associate with earlier age of onset (Etain et al., 2010; Weickert et al., 2012; Abdolmaleky et al., 2011).

Neurological and early onset disorders

We found evidence that several mendelian neurological disorders with adolescent and young adult onset also involved proteins that were down-regulated in the TTTP-peak. Heterozygous mutations in Atp2a2 cause Darier’s disease and significantly increase the risk of many psychiatric disorders including mood disorders, depression, and schizophrenia (Gordon-Smith et al., 2010). Mutations in Eef2 and Itpr1 cause spinocerebellar ataxia type 26 and 15/29, respectively. Atp1a3 mutations cause rapid-onset dystonia Parkinsonism, which leads to Parkinson’s-like symptoms appearing during early adulthood, often with concurrent emergence of psychiatric symptoms (Brashear et al., 2012). Vcp mutations cause a form of frontotemporal dementia with a mean age of onset in the mid-thirties (Watts et al., 2004). One of the main causes of hereditary spastic paraplegia (HSP) are heterozygous mutations in Atl1, which has an age of onset ~21 years (McCorquodale et al., 2011). Interestingly, we found that the synaptic scaffold protein Dmxl2 (haploinsufficiency causes Polyendocrine-polyneuropathy syndrome [Tata et al., 2014]) was reduced by ~70% (fold change: 0.28, p=0.00004) and studies in mice show heterozygous deletion of Dmxl2 in central neurons delayed the onset of puberty (Tata et al., 2014). These findings support the view that the genetic lifespan calendar reduces expression below a critical threshold in young adults and is important for multiple neurological and psychiatric diseases.

Our studies relied on human prefrontal cortex transcriptome data, which may have limited our ability to detect a role for the genetic lifespan calendar in the age-dependent onset of those diseases that are known to have primary pathology in other brain regions (e.g. Parkinson’s disease). Given the evolutionary conservation between mouse hippocampus and human prefrontal cortex, we expect other human brain regions to show a calendar of transcriptome trajectories, but with different patterns and therefore different age windows of disease gene enrichments. Moreover, while our whole tissue transcriptome analysis appeared to be sensitive to neuronal changes, we expect that future single-cell transcriptome data will provide a more detailed insight into rarer cell types and potentially reveal mechanisms relevant to the age of onset of pathology in these cells. We do not expect that the age of onset of all brain diseases will be accounted for by the genetic calendar, as it will likely depend on the importance of cell autonomous processes and exogenous factors (e.g. inflammatory processes involving microglia). It is also likely that high penetrance severe mutations will show an earlier onset and are less likely to show a dependence on the TTTPs. Weaker alleles may manifest with undetectable or subtle phenotypes at early ages and be exposed at later ages by the changes in gene expression. Thus, TTTPs would not necessarily account for the age of onset of intellectual disability or autism even though some of the same genes are involved with schizophrenia.

Role of genetic lifespan calendar in behavioural adaptation

The genetic lifespan calendar is an innate mechanism that regulates the levels of postsynaptic proteins and hence changes the physiological and behavioural properties of brain circuits. This indicates that the brain is not ‘hard wired’ but is continuously being modified by an evolutionary ancient and conserved genetic program. The postsynaptic proteins control innate and learned behaviours and thus the genetic calendar will modulate innate behaviours and the capacity to learn across the lifespan. The overarching biological function of this program could be to equip the animal for the challenges it faces at different ages.

If there are mutations that interrupt this calendar of events, then the organism will not respond appropriately to environmental challenges. This may be relevant to schizophrenia where exogenous factors are thought to influence onset (e.g. cannabis) or behaviour (e.g. smoking) during young adulthood. Our model which posits a fundamental role for genetic and genomic mechanisms can therefore accommodate previous models that have considered exogenous factors in disease aetiology. Interestingly, environmental triggers of psychosis, such as cannabis and other drugs of abuse, are known to act on the synaptic signalling mechanisms (Camp et al., 2011; Abbas et al., 2009) that are being reorganised in the TTTP-peak. Our model also has implications for those schizophrenia models that posit a (non-genetic) fetal insult as the predisposing factor for later onset (Brown and Derkits, 2010): the enrichments in schizophrenia susceptibility genes in young adults was present even when fetal samples were not included in the analysis supporting the notion that it is a disorder of postnatal brain ageing.

Our genetic lifespan calendar model may also have implications for the development of pharmaceuticals for the adolescent and young adult onset psychiatric and neurological disorders. As an alternative to the ‘precision medicine’ approach which directs pharmacological treatments to each susceptible genotype, we suggest that therapeutics accommodating many genotypes might lie in drugs that modify the genetic lifespan calendar. The identification of the conserved TTTP-peak between mice and humans may also assist in refining animal models of adult-onset brain disorders. Our findings open a range of new approaches for understanding brain ageing and the mental disorders afflicting young adults.

Materials and methods

Mice

Hippocampi from 186 mice of both sexes and two background strains (C57Bl/6 and 129s5) aged 58—600 days were used for the mouse microarray dataset (Figure 2—figure supplement 2). All animal experiments conformed to the British Home Office Regulations (Animal Scientific Procedures Act 1986; Project License PPL80/2,337), local ethical approval, and NIH guidelines. Animals were born and raised within the Research Support Facility at the Wellcome Trust Sanger Institute, and exposed to the same light/dark cycles and feed supply. Mice were exported to a holding facility three days prior to collection of brain samples, to allow time for stress response genes induced through movement to subside. All animals were sacrificed by cervical dislocation, and the hippocampi were dissected on ice, and snap frozen in liquid nitrogen.

RNA preparation and microarray hybridization

Mouse brain samples were homogenised using the Kontes Cordless Pellet Pestle system. Total RNA was extracted using the Qiagen miRNeasy kit, snap frozen using liquid nitrogen and stored at −80°C. Microarray processing was performed by the Wellcome Trust Sanger Institute’s Microarray facility. The Illumina TotalPrep-96 RNA Amplification kit was used for reverse transcription, amplification, and biotinylation of the RNA prior to hybridisation. The microarrays used were the Illumina MouseWG-6 v2 series. Hybridization, washing, and staining were performed according to standard Illlumina protocol. The microarrays were imaged using the Illumina BeadArray Reader. Images from the scanner were processed using the BeadStudio software. The microarray data is available to download from ArrayExpress (accession number E-MTAB-3256).

Braincloud human prefrontal transcriptome dataset

The human samples were generated by the Braincloud project (braincloud.jhmi.edu) as described in their primary publication (Colantuoni et al., 2011). In brief they used post-mortem human brains from the NIMH Brain Tissue Collection, National Institute of Child Health and Human Development Brain and Tissue Bank for Developmental Disorders. RNA samples were described as being extracted from ~100 mg of tissue from BA46/9, the dorsolateral prefrontal cortex. Custom-spotted two-colour oligonucleotide microarrays constructed from a set of oligos referred to as HEEBO7 (Human Exonic Evidence Based Oligonucleotide) were used. The preprocessed/normalized Braincloud expression data was downloaded from GEO (GSE30272, data contained within the Series Matrix file). No further normalisation was done beyond that which was already performed as described in the Braincloud publication. Low intensity probes and outliers had already been removed and expression levels were adjusted for sex, ancestry and a single surrogate variable using the sva package (Leek et al., 2012).

Somel human prefrontal transcriptome dataset

The ‘Somel’ dataset used to validate the increased PeGeT/DeGeT/ ALiGeT scores of schizophrenia genes, and EWCE enrichments, was downloaded from the GEO repository (accession GSE11512) (Somel et al., 2009). The dataset comprised samples taken from dorsolateral prefrontal cortex, which had been hybridised to Affymetrix Human Genome U133 +arrays. The dataset was read into R using the bioconductor ‘affy’ package, and processed using RMA. Probes with detection probabilities of less than 0.05 in over ten of the arrays were dropped.

BrainSpan human frontal transcriptome dataset

The BrainSpan RNA-Seq gene level dataset (Kang et al., 2011) was downloaded from the BrainSpan website (brainspan.org) on the 21st Jan 2016. The data was read into R. Data from the following four regions were retained: dorsolateral prefrontal cortex, ventrolateral prefrontal cortex, orbital frontal cortex and anterior (rostral) cingulate (medial prefrontal) cortex. Ages were converted to numerical equivilents. Any transcripts whose summed expression over all retained samples was less than 0.1 were dropped. Because the oldest samples in the BrainSpan dataset are only 40 years of age we treated trajectories which did not turn as turning in final year of life.

Mouse hippocampal transcriptome dataset

The mouse microarray data was read into R using the Bioconductor package Lumi (Du et al., 2008) (RRID:SCR_012781) and re-annotated based on a published analysis (Barbosa-Morais et al., 2010). The detectionCall function from the lumi package was used to drop probes undetected by the microarrays. Probes rated as having a ‘bad’ quality (according to the re-annotation package) were dropped, as were those with the coding zone given as ‘Transcriptomic?’. A variance stabilizing transformation was applied followed by quantile normalization.

Spline fitting

Natural cubic splines with three degrees of freedom were fitted to the expression data using the R package splines. For the mouse expression data, age was modeled as the independent variable, with sex and background as covariates. For the human expression data, as variation attributed to the extraneous variables (sex and ancestry) had already been subtracted earlier using the sva package these terms were not included. The location of knots for the splines was determined by the quantiles of the data. For each species and gene, trajectories were then interpolated across the lifespan with extraneous variables (sex and ancestry/background) held constant. We refer to these interpolated spline trajectories as Brain Transcriptome Lifespan Trajectories (BTLTs).

Detecting turning points (TTTPs)

Prior to detecting TTTPs, spline models were fitted to the data. To calculate where a TTTP occurred in a gene’s expression trajectory (Ag), the derivative was approximated for each BTLT and the age of the TTTP was taken as the first point where dE/dA =0 and the derivative changed sign, where E is expression level in the spline model and A is age. If multiple turning points were found in a single spline, then only the first was used. Figure 2a–c show the number of TTTPs found across the transcriptome within a given year/month. To determine the TTTPs for each sex, all samples from the other sex were dropped, the splines remodeled (without including sex as a covariate) and TTTPs detected afresh.

The numbers presented for the age of the TTTP peak in paragraph six are the mean age of TTTPs across all transcripts. To test for a difference in the age of TTTPs between the sexes, the Wilcoxon signed rank test was used. To test whether X-chromosome genes caused the sex difference, TTTPs in genes from that chromosome were dropped the same statistical test performed on the remaining set, and a similarly significant probably for difference was found.

To analyse the properties of trajectories with TTTPs (for generation of Figure 2g,h), linear models were fitted to the expression data before and after each TTTP, and tested for whether there is a significant change. A small fraction of probes which were detected as having TTTPs were not found to show significant differential expression prior to the TTTP, after multiple hypothesis correction with the Benjamini-Hochberg correction (FDR = 0.05) and these were excluded from this analysis of TTTP characteristics. We classified genes as showing post-turn plateaus if there were not detected as differentially expressed after the TTTP with FDR = 0.05. Those that were detected as showing significant differential expression after the TTTP, in the opposite direction to before the TTTP, are denoted as showing post-turn reversals. To generate Figure 2f we calculated the deciles of the ages at which TTTPs occur, and used these to split the data into ten groups with as closely matching sizes as possible.

Scoring TTTPs

The Age-linked Gene Turning scores (ALiGeTs) were designed to represent two features: (1) proximity of the TTTP close to the target year, and (2) the changes in their expression level from the start of the dataset (at 14th gestational week) through to the age at which the TTTP occurred (this change is denoted as ΔE). A human genes ALiGeT scores for a given age (Sg,T) can be calculated as:

Sg,T=1.5|AgT||ΔE|

Wherein

g is the index of the gene of interest

T is the year of age being queried

ΔE is the change in expression level prior to the TTTP

Ag is the age of the first TTTP in gene g

For mice the equation is modified slightly to account for the difference in lifespan of the two species. To allow the ALiGeT window to have an approximately equal width across a given proportion of the mouse’s, we scaled the term | AgT | by 650/78 where 650 days represents old age for a mouse and 78 represents old age for a human. A mouse genes ALiGeT score for a given age can thus be calculated as:

Sg,T=1.5|AgT|(65078)|ΔE|

To penalize genes which are further from the peak of TTTPs, the distance between the age of turning point (Ag) and the target age (T) is used as the exponent in an exponential expression with base 1.5. This function gives greatest emphasis to genes with D = 0, with a rapid fall over neighbouring years such that a human gene with D = 6 years have scores 9% the size of those with equivalent pre-turn changes in expression level at D = 0. The scoring function is graphically depicted in Figure 1—figure supplement 1. We show in Figure 5—figure supplement 5 that the results are stable to variation in the exponent base.

The term PeGeT score is used to refer to the ALiGeT score for the year in which that species has the most TTTPs.

DeGeT scoring

To score genes using the DeGeT method (represented in Figure 1b and used to generate Figure 4d and Figure 5c) a similar scoring system was used. First ages were divided into ten groups based on the number of genes which turn/inflect within that age period, such that each age set has approximately equal size number of turning genes. Because there are many more TTTPs during early adulthood than in infancy/old age, the age windows span many years at start/end of the lifespan and as few as one year in during the twenties. For age set S, which covers ages { x,x+1,,y1,y }, the score was equal to the pre-turn expression change if the gene turns within the age window bounded by x and y. If the gene does not turn within within the age window bounded by x and y then the score for that gene is zero.

Cellular enrichment analysis with EWCE

Cell type enrichment analysis was performed on three datasets: Somel (Somel et al., 2009), Braincloud (Colantuoni et al., 2011) and the mouse hippocampus. The EWCE ('Expression Weighted Celltype Enrichment') package from Bioconductor (RRID:SCR_006442) was used to perform the enrichment with 100,000 bootstrap replicates used for each test. Age groups were assigned based on the frequency of TTTPs across the lifespan (0—10th, 11th—20th,…,90th—100th percentile of all TTTPs). The genes which turn within each age window were sorted based on the size of pre-turn expression change. For each age window, the 10% of genes which turn within that window with largest positive (upward) and negative (downward) expression changes were assigned into two groups: we refer to these lists as the 'target' lists. A previously published single cell transcriptome (SCT) dataset containing cells from cortex and hippocampus was then loaded (Zeisel et al., 2015). Any genes in the target lists which were not found to be expressed in the SCT dataset were dropped. For the analysis with Braincloud and Somel, the background gene set was taken as all genes with mouse orthologs that are also found in the SCT dataset for which a spline was fitted. For the mouse hippocampus analysis, the background set contained all genes for which a spline was fitted and which was also detected in the SCT dataset. For the directional analyses, the background set thus contains genes which change in both directions. Bonferroni correction was used to adjust for multiple testing. For both directions, for each cell type within each age group (indexed by r, c and aA respectively), we calculated the mean (μr,c,a) and standard deviation (μr,c,a) of the bootstrap distribution, and used this to determine the distance (in terms of standard deviations) that the target list falls from the expected mean—we refer to this value as dr,c,a. Values were calculated separately for each dataset. The plots shown in Figure 3 show normalised values derived from dr,c,a by dividing by the maximal absolute value over all age windows. As a result the maximal absolute enrichment is either 1 or −1. The values plotted on the x-axis each represent one of the age windows defined by the quantiles (specifically, the values shown are the mean of the upper and lower bounds of the window). The age windows for Somel and Braincloud are not fully overlapped but were annotated with the central points of the Braincloud age windows to enable both datasets to be plotted against each other.

Testing for increased turning point scores of PSD and disease gene sets

To determine whether gene sets show larger ALiGeT scores than expected, a boot strapping test was performed. Genes in the target list, which were not detected as expressed on the arrays, were dropped from the analysis. Where multiple probes target the same gene, duplicated probes with smaller PeGeT inflection scores were dropped. For each target list being tested, ten thousand random gene lists of the same length were generated. The mean ALiGeT/PeGeT score for the target list and the random gene lists was calculated, and the proportion of random lists with smaller mean scores than the target list is taken as the probability. Where multiple lists were tested for significance against PeGeT scores, these were corrected using the Benjamini-Hochberg method (FDR = 0.05). The same method was used for the DeGeT set based analysis method as was used for the ALiGeT analysis, with bootstrapping being done with 10000 samples and correction for multiple testing over age sets done with the Bonferroni method.

Figure 4a,b, Figure 5d–g, and Figure 6—figure supplement 3 represent the results of the PeGeT bootstrapping analysis in graph form. We sought to represent the extent to which enrichment results from high scores amongst either a small number of genes, or from a broad increase in scores throughout the list. As in the bootstrapping analysis we compare the score distribution in the target (disease/synapse) list to the scores in random lists of the same length. To keep the plot tidy we only represent data from 100, rather than 10000 random lists. PeGeT scores for the target and random lists were sorted by numerical size. For each of the n genes in the target list, 100 dots were positioned with the y-axis determined by ith largest score in the target list, and the x-axis given by the ith largest score in each of the 100 random lists. To interpret the graph, note that if the majority of the random list scores fall above the red line, then the random lists have lower scores than the target list. It should be noted that the scales of the graph does bias the view towards the genes with largest scores.

To perform a bootstrapping analysis which controls for gene size and GC content, we obtained those values from biomart. Where multiple transcript lengths were associated with a single HGNC gene we took the mean value. The deciles of gene size and GC content were calculated over the set of expressed genes. The two sets of decile values were used to define a grid, and each gene assigned to a position within the grid based on it’s transcript lengths and GC content. To run a bootstrap analysis on a particular target list, 10000 random lists were constructed with equal length to the target list. Gene i in each random list was selected from the same grid square as gene i in the target list.

Disease gene lists

The disease gene lists are provided in Supplementary file 1f. The combined schizophrenia gene list (results shown in Figure 5a–d, Figure 6a,b, Figure 5—figure supplement 15,7 and Figure 6—figure supplement 1) used genetic associations from three types of studies, which were also analysed individually: the integrative dataset contains 42 genes assembled using a translational convergent functional genomics approach (Ayalew et al., 2012); the de novo gene set comprised 609 genes pooled from three studies which used parent-child trios to detect de novo mutations (Fromer et al., 2014; Xu et al., 2012; Girard et al., 2011); the GWAS results are from the 2014 release of the Schizophrenia Working Group of the Psychiatric Genomics Consortium (Abbas et al., 2009). Many SNPs from the GWAS study results were associated with multiple genes (349 genes were associated with 108 loci, with numerous loci associated with over twenty genes), and these were dropped leaving 62 genes to be used in our analyses (not that this does not apply to the analyses which directly utilised the GWAS summary statistics files). The GWAS result remained significant if the alternative approach was taken and all genes associated with SNPs were used. The additional schizophrenia de novo gene set (the ‘Gulsuner’ set) came from exome sequencing of ‘quads and trios’ (patients, their parents and unaffected siblings) associated with 105 individuals with the disorder (Gulsuner et al., 2013). Two autism lists were used, both based on finding de novo mutations through exome sequencing: the first contained 172 mutations (Sanders et al., 2012), the second 358 (Iossifov et al., 2012). The 78 intellectual disability genes were discovered through de novo sequencing of family groups (de Ligt et al., 2012). The Multiple Sclerosis, Amyotrophic lateral sclerosis (Lill et al., 2011), Parkinsons (23andMe Genetic Epidemiology of Parkinson's Disease Consortium et al., 2012) and Alzheimer's (Bertram et al., 2007) lists come from the top results of the following websites: msgene.org (69 genes); alsgene.org (17 genes); pdgene.org (23 genes); alzgene.org (10 genes).

ALiGeT enrichment windows

Gene list enrichments were evaluated across the lifespan by generating Sg,T for each gene, for each value of T between 1 through 78 (Sg,T defined above). To test for disease enrichments the boot strapping method described above was applied. Enrichment of the gene sets were calculated at each age. The Bonferroni method was used to correct for the testing of the hypothesis at each of the 78 years of age. When running the ALiGeT for the eight diseases Bonferroni correction was applied across all the diseases as well as over each year of age. To obtain the sex specific datasets, the enrichments were calculated separately on the male subset of the data, and then again on the female subset. To test for sex differences in TTTP ages specific to disease gene sets, the age of TTTPs associated with those genes in the male and female data subsets were compared using the Wilcoxon signed-rank test.

To control for transcript length, the maximum transcript length was determined for each gene using Biomart. As the distribution of transcript lengths is sharply peaked, ten quantiles of transcript length were used to group the genes for display of TTTP score distributions. To control for neuron specificity, the data on single cell transcriptomes from the Linnarsson/Hjerling-Leffler labs (Zeisel et al., 2015) was utilized (data available at linnarssonlab.org/cortex/). The cell-type specificity matrix was used to produce a metric for neuron vs glia enrichment. This was done by first calculating neuronal expression as the sum of expression in all cells they label as ‘pyramidal’ or ‘interneurons’, whilst glial expression was the sum of expression in ‘astrocytes’, ‘endothelial’, ‘microglia’ or ‘oligodendrocytes’. The ratio of these values was taken, and classic markers checked to ensure they were as expected (Dlg4 = 15; Camk2a = 6; Map2 = 13; Gfap = 0.2; Mbp = 0.1; Aif1 = 0.2). The 5000 genes most enriched for neurons were then used to perform an ALiGeT analysis for the combined Schizophrenia list. To confirm that the threshold used for how neuron-specific the genes are does not influence this result, we also show below a figure with PeGeT results using different thresholds between 1000 and 8000 (Figure 6—figure supplement 1).

To perform the analyses restricted to up/down-regulating genes depicted in Figure 6c,d all probes which show a higher/lower expression level at the beginning of the spline, relative to the splines value at the TTTPs are dropped. Removal of duplicate probes targeting the same HGNC gene is performed after dropping the probes. Probes which do not have TTTPs are also dropped from from both analyses.

Functional analysis of genes with high PeGeT scores

For the comparisons of mouse and human genes with large PeGeT scores, orthologs were determined using the Biomart package, through querying which HGNC Genes are orthologs for a given MGI ID. To find GO terms enriched in this set of genes, the MGI symbols were analysed using DAVID (Huang et al., 2009) against a background of mouse genes, and the GO term for ‘synaptic transmission’ was found to be the most enriched (the Benjamini-Hochberg corrected p-value is stated in the text). For the analysis of phenotypes from the Mammalian Phenotype Ontology (Smith and Eppig, 2009) (MPO), a full database of phenotypes was downloaded from ftp.informatics.jax.org. Any genes which were not detected as present by the microarrays were dropped from the MPO database, and a hypergeometric test for significance of enrichment performed on the remainder. Enrichment for ‘behavioural’ and ‘nervous system’ phenotypes was specifically tested for, and hence probabilities presented are not adjusted for multiple hypothesis testing.

DEGET and ALIS using genome wide summary statistics

Gene association statistics were calculated from Genome Wide Association Study (GWAS) Summary Statistics using v1.05 of MAGMA (Multi-marker Analysis of GenoMic Annotation) (de Leeuw et al., 2015). MAGMA enables disease/phenotypes association scores to be calculated for each gene, while accounting for linkage disequilibrium and the contributions from multiple SNPs. MAGMA takes GWAS summary statistics as input: these do not contain data on the individuals from the study and simply list the association p-values and z-scores/odds ratios for each SNP included in the study. The 1000 Genomes European panel was provided to MAGMA as reference data for calculating Linkage Disequilibrium. The schizophrenia summary statistics are associated with the GWAS analysis performed by the Psychiatric Genomics Consortium (PGC) which found 108 genome wide significant loci (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014); the file was downloaded from the PGC website (https://www.med.unc.edu/pgc/results-and-downloads).

DEGET enrichment was calculated by first grouping the genes into deciles based on the age of the turning point. DEGET groups for the MAGMA method are different from those calculated for gene list approaches for several reasons: firstly, sorting is performed using entrez gene IDs rather than HGNC gene symbols, secondly, all genes within the extended MHC region are removed from these analysis (as their MAGMA gene associations cannot be properly assigned due to Linkage Disequilibrium in this region). DEGET scores were assigned to genes as they were done for the gene set based analysis. To determine whether a given GWAS trait is enriched at a particular age, the z-score calculated by MAGMA for each gene was multiplied by the DEGET score. Z-scores were then shuffled 20,000 times, multiplying at each iteration with the DEGET scores, and compared to the unshuffled value. The p-value is based on the frequency with which the sum of unshuffled value is greater than the shuffled values. The same approach (multiplying ALIS scores with MAGMA z-scores, following by perturbations of z-scores) is used to calculate ALIS enrichment probabilities.

Age prediction

All of the human (including fetal) and all mouse samples were included in the age prediction analysis. Age predictions were performed with radial basis function Support Vector Machines through the e1071 package that provides an interface to libsvm in R (Chang and Lin, 2011). Two rounds of 10-fold partitioning were used to form training, validation and test sets. An initial round of random partitions separated test data from training/validation data. The combined set of training/validation data was then passed to the tune.svm() function from the e1071 package which then uses 10-fold cross-validation to perform a parameter search. A first shallow grid search was performed for γ{15,13, ,1 3) and cost{5,3, ,13, 15), and the optimal pair of values selected as (γ1,cost1). A second finer grid search was then performed over γ{γ11.5,γ11.375, ,γ1+1. 375,γ1+1.5) and cost{cost11.5, cost11.375, ,cost1+1. 375,cost1+1.5).

Probes were included in the model by first determining which probes are associated with age, as determined using a linear model. The linear modeling of age-associated probes was repeated for each of the partitioned sets of training/validation data; as such, data in the test set did not influence the selection of probes through the linear model. The probes were then ranked based on the level of association, and the top Ncutoff probes were used for training and testing. The reported accuracies were obtained using values of Ncutoff 40 for humans and 100 for mice. To determine the appropriate level of Ncutoff, a range of values were tested between 10 and 400, and the optimal number manually selected on the basis of having optimal SSE without using an unnecessarily large number of probes.

PSD preparations for mass spectrometry

Crude PSD preparations were made from dissected mouse forebrain tissue from C57BL6/5J mice ages 1 month to 5 months. In brief, each forebrain was homogenized by performing 12 strokes with a Dounce homogenizer containing 5 mL of ice cold homogenization buffer (320 mM sucrose, 1 mM HEPES, pH = 7.4) containing 1X Complete EDTA-free protease inhibitor (Roche) and 1X Phosphatase inhibitor cocktail set II (Calbiochem). Insoluble material was pelleted by centrifugation at 1000 x g for 10 min at 4˚C. The supernatant (S1) was removed and the pellet was re-suspended in 2 mL of homogenization buffer and an additional six strokes were performed. Following a second centrifugation at 1000 x g for 10 min at 4˚C, the supernatant (S2) was removed and pooled with S1. The combined supernatants were then centrifuged at 18, 500 x g for 15 min at 4˚C. The pellet was re-suspended in 5 mL of extraction buffer (50 mM NaCl, 1% DOC, 25 mM Tris-HCl, pH 8.0) containing 1X Complete EDTA-free protease inhibitor (Roche) and 1X Phosphatase inhibitor cocktail set II (Calbiochem) and incubated on ice for 1 hr. The resulting crude PSD extracts were centrifuged at 10,000 x g for 20 min at 4 ˚C and the resulting supernatant filtered through a 0.2 µm syringe filter (Millipore).

Preparation of samples for LC-MS/MS

Protein concentration of PSD preparations was determined using 1X Quickstart Bradford assay (BioRad). Thirty micrograms of PSD protein was prepared to contain 1X Novex NuPAGE LDS sample loading buffer (Invitrogen) with 100 mM DTT (BioRad), boiled for 10 min at 100˚C and loaded into 1 well of a 10-well Novex NuPAGE 3–12% Bis-Tris gradient gel (Invitrogen). Electrophoresis was performed under reducing conditions using the Novex NuPAGE SDS-PAGE system (Invitrogen) for 5 min. Gels were then stained with SimplyBlue SafeStain (Invitrogen) following the manufacturer’s instructions. Gel bands were excised and subjected to tryptic digestion using standard methods.

Mass spectrometry and data analysis

Five microgram of tryptic digest was analysed by LC-MS/MS using a UPLC Dionex QExactive (Thermo-Fisher, Waltham, Massachussets, USA). Protein identification was performed with MASCOT (Matrix Sciences) using Uniprot Mouse database (downloaded on 2014 March 24th). Label-free quantitation analysis was then performed on all timepoints examined in the study using Progenesis (Nonlinear Dynamics). The normalized Mass Spectroscopy dataset was read into R. Swissprot/Trembl ID’s for each detected protein was matched to a gene symbol. Orthologs were determined using the Biomart package, through querying which HGNC Genes are orthologs for a given MGI ID. The data was log2 transformed. A linear model was fitted to estimate protein expression based on age and sex using the bioconductor ‘lumi’ package. Significance of differential expression was corrected using the Benjamini-Hochberg method.

Build #85 of the Human Phenotype Ontology was downloaded from http://compbio.charite.de/hudson/job/hpo.annotations.monthly/lastStableBuild. All diseases associated with the following HPO terms were extracted from the ontology using Phenexplorer (compbio.charite.de/phenexplorer): ‘Neoplasm’, ‘Abnormality of the Nervous System’, ‘Abnormal peripheral nervous system morphology’ and ‘Abnormality of the autonomic nervous system’. Using these lists all diseases associated with neoplasms, and peripheral/autonomic disorders, and which are not associated with nervous system disorders, were dropped from the HPO phenotype database. From the remaining dataset, congenital onset disorders were taken as those annotated with the following terms: ‘Congenital onset’, ‘Infantile onset’ or ‘Neonatal onset’. Adult onset disorders were annotated with either ‘Adult onset’, ‘Young adult onset’, ‘Schizophrenia’ or ‘Bipolar affective disorder’. Schizophrenia enrichment was tested using the combined list described earlier. Enrichment analyses were performed using hypergeometric tests. Functional analysis was done by manually assigning a single functional category to each of the proteins detected in the dataset. Functional annotations were based on those from a previous paper on the synaptic proteome (Emes et al., 2008) and expanded based on Panther protein class.

Code availability

R code was used for all TTTP and age prediction analyses described herein (see Bioconductor, ‘TurningPoints’).

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
    LIBSVM
    1. C-C Chang
    2. C-J Lin
    (2011)
    ACM Transactions on Intelligent Systems and Technology 2:1–27.
    https://doi.org/10.1145/1961189.1961199
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
    Heritability of age of onset of psychosis in schizophrenia
    1. E Hare
    2. DC Glahn
    3. A Dassori
    4. H Raventos
    5. H Nicolini
    6. A Ontiveros
    7. R Medina
    8. R Mendoza
    9. A Jerez
    10. R Muñoz
    11. L Almasy
    12. MA Escamilla
    (2010)
    American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics : The Official Publication of the International Society of Psychiatric Genetics 153B:298–302.
    https://doi.org/10.1002/ajmg.b.30959
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
    Myelination and organization of the frontal white matter in children: a diffusion tensor MRI study
    1. T Klingberg
    2. CJ Vaidya
    3. JD Gabrieli
    4. ME Moseley
    5. M Hedehus
    (1999)
    Neuroreport 10:2817–2821.
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
    Neurodevelopment and Schizophrenia
    1. SJ Wood
    2. CR De Luca
    3. V Anderson
    (2004)
    69–88, Cognitive development in adolescence: cerebral underpinnings, neural trajectories, and the impact of aberration, Neurodevelopment and Schizophrenia.
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92

Decision letter

  1. Jonathan Flint
    Reviewing Editor; University of California, Los Angeles, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "A genomic ageing program that reorganises the young adult brain is targeted in schizophrenia and anxiety disorders" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The authors create a new method to quantify transcriptome trajectory turning points (TTTP) and apply this to human BrainCloud dataset and a new mouse brain transcriptome dataset across ages 58-600 days. The authors identify a peak age of TTTP around 25-30 years old, which they link to psychiatric disorders (schizophrenia and autism) and synaptic development. Overall the reviewers felt that this was an ambitious analysis covering a novel and interesting topic (complex brain gene expression trajectories).

Essential revisions:

1) The text needs attention. The Introduction is long and meandering and would be more accessible if shortened significantly. The authors need to explain what they have done with greater clarity. For example the model the authors propose (subsection “Transcriptome trajectories and schizophrenia” paragraph three) is so vague as to be nearly meaningless.

2) The authors need to explain better which models they used and why. There are several algorithms presented in this work (TTTP, peget, DeGET, AliGET), of which only TTTP is straightforward. It is hard to understand why the authors choose to use one over the other. Some of the models (such as Aliget) include parameters that do not have a clear biological justification and seem somewhat arbitrary. Comparison of units across species is also unclear – is a human "year" equivalent to a mouse "day"?

3) Correction for multiple testing needs to be made explicit. For the cell type analysis a Bonferoni correction is used, but it is not clear that appropriate corrections were made for other analyses (the multiple different enrichment tests).

4) A list of >600 de novo mutation-containing genes is used. However, this is not a list of genes with any level of reasonable statistical support – what is the FDR? Are these all mutations in genes never observed in controls? The anxiety gene list is not well statistically supported and has no place in such an analysis.

5) Some of the models include a δ-E term, which "negates the contribution of genes that merely fluctuate away from their mean." The δ-E term also likely biases analysis toward genes with certain levels of baseline expression, as well as the dynamic range of microarrays. In addition, the gene fluctuation issue should be adequately accounted for by using spline regressions and biological replicates.

6) With regard to the age classifier, it is unclear if this is developed using a test set and validation set, which is necessary to truly evaluate accuracy that would be generalizable and meaningful. If it is not, they must find an independent validation set to make this of any meaning. In this regard, the cell type analysis does use a validation set for example.

7) All models are based on the TTTP point, which is defined as the earliest age in which the first derivative of expression changes signs. It is not clear why the authors only use the earliest of such points when genes can clearly have multiple inflections across the lifetime (see Figure 2—figure supplement 1). By taking the earliest inflection point, this makes the analysis somewhat contingent on youngest sample in a dataset (which is not balanced between mouse and human experiments), as well as the resolution of samples at the younger timepoints, which is sparse for BrainCloud (see below). Finally, all models are highly contingent on (1) the quality and resolution of the input datasets and (2) the accuracy of the spline regression model. As shown in the supplement, choice of regression parameters can have dramatic effects on downstream analyses. There is no justification for the cubic spline model used or any assessment of the accuracy with which it fits the underlying data. The choice of parameters seems arbitrary.

8) Gender differences – Using the Braincloud dataset, the peak age of TTTP is reported to be delayed in females compared to males and this is used as an explanation for why psychiatric diseases occur later in F. However, the Braincloud data that this result is based on has already had sex effects regressed out prior to analysis. It is inappropriate to make any comment on sex effect without using the non-regressed dataset.

9) Braincloud is used to define the human TTTP trajectory and includes human prenatal and postnatal samples. The regressed and normalized data is downloaded from GEO. However, it is unclear whether measures of RNA quality and other sources of technical variability are taken into account in any of the analyses. There will likely be an interaction between age and RNA quality and therefore potentially more variability in gene expression signal seen during certain timepoints. Replication in the Kang et al. (GSE25219) dataset will be important as this is similar to BrainCloud in scope.

10) The authors used the cubic splines with 3 DOF to depict the gene expression trajectories, and the TTTPs were detected by searching slope changing points along the smooth curves. How well does this actually fit the data? Why were 3 DOF chosen? There should be some rigorous assessment of accuracy of regression model (compared with Loess, etc)

11) The degree of smooth curve and the number of binned intervals could affect the number of identified TTTPs. But in subsection “Detecting turning points (TTTPs” the author said "If multiple turning points were found in a single spline, then only the first was used". This strategy may result in the imprecisely predicting TTTPs, e.g., the differences shown in Figure 4—figure supplement 2 when using different parameters and different methods.

12) Human-Mouse comparisons: the authors find a peak of TTTP around age 160 days in mice which they imply is equivalent to 26-30 years in human. However, mice were not assessed before 58 days postnatally whereas human time-points include prenatal brain samples. Since TTTP is calculated as the earliest transition point (if multiple are detected in the sample gene) it is likely sensitive to the age windows assessed.

13) Age differences – the peak age of TTTP is reported to be in the range of 25-30 years in human. However, there are a relatively few number of samples in the 2-20 age groups compared with age 0 and 20+. The ability to identity changes in transcriptomic trajectory during this crucial period of brain development is strongly unpowered and could confound the age trajectory of TTTP.

14) Psychiatric enrichments – subsection “Psychiatric susceptibility genes in young adults”, paragraph two: "TTTP.…accurately predicted the age windows for the onset of schizophrenia and anxiety disorders." As shown in Figure 5—figure supplement 3, this is highly contingent on parameter choice of the model used to fit transcriptome data. Loess regression or cubic spline with 4 dof predicts much earlier onset of schizophrenia (~16 year old). The choice of parameters does not seem to be justified either biologically or in terms of fit to the dataset. As such, this conclusion cannot be considered robust.

15) Cell-type enrichment analyses are likely on very small numbers of genes, which is susceptible to bias. The Zeisel dataset was used to define single cell transcriptomes. However, this dataset is based on single-cell RNAseq with <3000 genes expressed in each cell. Genes were dropped from analysis that were not expressed in Zeisel and therefore the input set of genes is likely very small.

16) The author claimed the accuracies of age prediction are 6.5 years in human and 28 days in mice. If this prediction included fetal samples, the prediction is imprecise. One similar age prediction from DNA methylation reported the predicting difference is close to zero for embryonic samples (Genome Biology 2013, 14:R115). If this prediction did not include fetal samples, the authors should not state that, "Remarkably, they showed accurate age predictions across the entire range of ages in both species," as Braincloud includes 38 fetal samples. Similarly, the authors should make clear that they excluded these samples, and, in light of the dramatically higher temporal dynamics observed in fetal and infant periods in these samples (Nature 478, 519-523, Figure 2B), why they excluded these samples.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "A genomic ageing program that reorganises the young adult brain is targeted in schizophrenia and anxiety disorders" for further consideration at eLife. Your revised article has been favorably evaluated by a Senior editor, a Reviewing editor, and one reviewer.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Reviewer #1:

In my original review, I was very enthusiastic about the innovative approach and ideas contained in this paper, but was concerned about Methods and potential confounders, which were hard to understand and evaluate in the original paper. The other reviewer had very similar concerns. The authors now provide a detailed response to the critiques and a revised manuscript, which is much improved. They have rewritten, shortened many sections and improved clarity. They have done a good job of clarifying parameter choice and for some of algorithms, and shown the robustness of the methods to these choices.

I should emphasize that I still have my original enthusiasm about this innovative study. But although some of my concerns are well addressed, others regarding the robustness of the method and interpretation of the data are not and still need additional work. Because this study could illuminate some areas of disease susceptibility that have been mysterious, it could have high impact, but it is critical that it be methodologically solid in every respect. I think it is now believable that young adult is a time of interesting changes in gene expression trajectories, although as the authors acknowledge choice of spline parameters effects the actual age considerably. This is not a minor concern that needs acknowledgement, but a key point of their analysis. Better if the authors presented an interval of TTTP ranges based on parameter choices just to show what we can be certain about from these data. The major question that still remains is how is this related to disease specific factors.

Other concerns are as follows:

1) The main focus on the paper is defining transcripome trajectory turning points, which they then relate to disease. They have clarified the use of algorithms and Methods. Now that some of these issues are clarified, one major question still remains. If the TTTP indeed does define SZ onset at late adolescence with a peak at 25 years (although with some range depending on parameters chosen as mentioned above), it should be specific to schizophrenia and not to other disorders. The choice of gene sets to be compared is critical here, and in no way are the genetic studies of the brain disorders that are compared comparable. Some identify mostly common variants of small effect size, and others, such as ID and autism, mostly rare variants with large effect size. These mutations are under considerably different selection pressures as their effects sizes differ.

So, a major problem that still remains is that the choice of the 600 SZ de novo genes is not well supported by statistical genetic analysis. The authors’ response to this in the revision is not satisfactory. There is strong support for de novo mutations as a class in SZ and other neurodevelopmental disorders, and some pathways, more equivocally. But, the individual genes in this list are mostly not supported by evidence yet. It is not a question of finding mutations that occur only in SZ patients and not in controls; it is standard population genetics. de novo mutations occur frequently and many of these genes are almost certainly not risk genes. The authors should absorb some of the analyses in the following papers and should filter the list by metrics that are now well accepted in human genetics (PMID:27899611; PMID:25086666; PMID: 27535533; PMID:27533299; PMID:26439716; PMID:25684150).

Also in this regard, genome-wide significant genes for anxiety disorders have not been identified. Again, the use of a hodge podge of gene lists, without strong statistical support or rationale plagues this major aspect of the paper. The choice of gene lists is so important that it should be presented up-front and clarified. And when comparisons are made with these lists, they should be of equal power, or they don't support disease specificity. In other words, the gene lists should be the same size approximately, or better yet, account for a similar proportion of the population attributable genetic liability to account for the different effect sizes of mutations and common variants. Further, anxiety should certainly not be used at all, unless the authors can provide genome-wide statistical support for the genes identified.

If the authors want to use this SZ list of 600 genes that lacks statistical support or OR for most of the genes, they should definitely compare to a similarly filtered of genes that cause autism and intellectual disability, or list of brain enriched genes (2/3 fold?) to see if the trajectory that they see in SZ is meaningful, as mentioned above. In the gene lists there are 2- 3x more SZ genes than ASD genes (one list has 170, the other about 380), and there are only about 80 ID genes listed, when there are >400 known. At least the sensitivity of the TTTP analyses to the size of the gene list should be provided. Another ground truth would be to show that a similar sized list of PSD genes, or highly synaptically enriched genes not chosen for association with SZ did not show the same pattern. It simply may be that certain classes of synaptic genes are enriched for this pattern of trajectory switching and this creates a vulnerable period, but it is not due to an enrichment of SZ risk genes themselves.

3) The human-mouse comparison is really a tough area as the authors recognize, since it has been demonstrated that there is not a linear, constant scaling of stage comparability between these species across development when specific developmental processes are compared. The authors should at least cite some of the work that demonstrates this and discuss it more thoroughly as a caveat. Currently it is somewhat glib. They have done a very good job of dealing with this issue analytically, but the readers should know that it is a biological conundrum that is tough to address with any method mapping human development to mouse, and what the limitations are.

https://doi.org/10.7554/eLife.17915.036

Author response

Essential revisions:

1) The text needs attention. The Introduction is long and meandering and would be more accessible if shortened significantly. The authors need to explain what they have done with greater clarity. For example the model the authors propose (subsection “Transcriptome trajectories and schizophrenia” paragraph three) is so vague as to be nearly meaningless.

We acknowledge that the text would benefit from improvement in clarity and brevity. We have edited the Abstract, and significantly shortened the Introduction and Discussion. We have also rewritten the model and introduced the phrase “genetic lifespan calendar” to help convey that the gene regulatory events occur at particular ages.

2) The authors need to explain better which models they used and why. There are several algorithms presented in this work (TTTP, peget, DeGET, AliGET), of which only TTTP is straightforward. It is hard to understand why the authors choose to use one over the other.

We have improved the clarity of the explanation of these approaches and how they complement one another in the first paragraph of the Results. Where we can do so without overwhelming the reader we have provided ALiGeT, DeGeT and PeGeT all together.

Some of the models (such as Aliget) include parameters that do not have a clear biological justification and seem somewhat arbitrary.

There is one parameter used in the human ALiGeT calculation: the base of the exponential is set as 1.5 (the one other parameter used for the mouse dataset is explained below in response 2.iii). This parameter sets the distance over which genes can have a strong influence on the ALiGeT score for a given year. It has a fairly direct biological interpretation: larger values imply that functionally related sets of genes will ‘turn’ in increasingly small windows of time and that the dataset is able to accurately detect the turning points. This parameter is explained in detail in subsection “Scoring TTTPs”. We show in Figure 5—figure supplement 5 that the choice of parameter does not have a particularly significant effect on the results. We now reference this figure supplement in the Materials and methods section.

Comparison of units across species is also unclear – is a human "year" equivalent to a mouse "day"?

We inadvertently had left these details out of the methods and thank the reviewer for bringing it to our attention. We have amended the Materials and Methods section to make this clear.

For the ALiGeT analysis 8.33 mouse days is equivalent to one human year. This was calculated on the basis that the species can be considered in old age at 650 days (for mice) and 78 years (for human). Then 650/78=8.33.The term in the ALiGeT function where length of time enters the equation is in the exponent. We thus scale the term by dividing by 8.33. The width of the ALiGeT decay function is then approximately equal (in proportion of lifespan) between the two species.

3) Correction for multiple testing needs to be made explicit. For the cell type analysis a Bonferoni correction is used, but it is not clear that appropriate corrections were made for other analyses (the multiple different enrichment tests).

We have made the use of multiple testing correction more explicit in the paper. Where a p-value is Bonferroni corrected we now denote it as. Where a p-value is Benjamini-Hochberg corrected with FDR=0.5 we now denote it as. The Materials and methods section now makes clear for every analysis performed whether and how multiple testing correction was performed.

4) A list of >600 de novo mutation-containing genes is used. However, this is not a list of genes with any level of reasonable statistical support – what is the FDR? Are these all mutations in genes never observed in controls?

To the best of our knowledge there is not a single mutation associated with schizophrenia—whether a genome wide significant SNP or de novo deletion—which is never found in controls. The schizophrenia associated mutations with highest penetrance are a set of copy number variations which generally occur as a result of de novo mutations. There is strong statistical support for this (see for instance PMID: 21855053). de novo mutations have been found to occur at a higher rate in schizophrenics relative to controls (PMID: 23911319). Those de novo mutations which do occur are found at very significantly enriched rates in particular groups of genes, such as those involved in NMDA receptor function (PMID: 24463507). The functional associations found for de novo mutations are the same as for those found through GWAS studies. We conclude that there is strong statistical support for de novo mutations playing a role in Schizophrenia, and that the set of genes found to bear mutations acts in schizophrenia related functional pathways.

We further note that we do already perform separate analyses for genes found through GWAS and de novo studies and find the same results (see Figures 5G-J). A result that was important for us is that we used two separate sets of de novo genes (treating those from Gulsuner et al. as separate) and again find the same result for both (Figure 5—figure supplement 6).

The anxiety gene list is not well statistically supported and has no place in such an analysis.

We agree that the anxiety associated genes are not ideal, however we believe that it is reasonable to use them with the appropriate caveats. Preferably we would have used separate genome wide significant genes lists for each anxiety disorder subtype, but it is likely to be many years before this data becomes available. The gene lists that we used combines knowledge from all available sources that could be brought to bear on the issue, including human and mouse genetics. We have made it clear that the human genetic basis for the anxiety gene set is weaker than that of others disorders. See subsection “Psychiatric susceptibility genes in young adults; “It should be noted that the human genetic evidence for the anxiety gene set is not as substantial as that for schizophrenia and included mouse genetic data (see Materials and methods). We expect these results to be further evaluated with larger datasets including those for specific anxiety disorders in the future.” In a discussion paragraph where we discuss the relationship between schizophrenia and anxiety, we again remind the reader of this point: “The genetic basis of anxiety disorders is not as well characterised as schizophrenia and we expect that future large-scale datasets for specific anxiety disorders with adolescent onset will enhance our understanding of the role of the genetic calendar.” We believe this to be reasonable and provides the readers with the appropriate information for them to judge.

5) Some of the models include a δ-E term, which "negates the contribution of genes that merely fluctuate away from their mean." The δ-E term also likely biases analysis toward genes with certain levels of baseline expression, as well as the dynamic range of microarrays.

We acknowledge that the phrasing of that sentence made it sound as though the primary purpose of the δ-E term was “to negates the contribution from genes that merely fluctuate away from their mean”. We have now rewritten that section (“Quantifying expression trajectoris”) to explain that the δ-E term is fundamental to this analysis as it focuses the turning point analysis on those genes with the largest role in brain development.

As the core results have been replicated using the BrainSpan RNA-Seq dataset (Figure 4—figure supplement 1 and Figure 5—figure supplement 2) it is clear that the dynamic range of microarrays does not influence our findings.

In addition, the gene fluctuation issue should be adequately accounted for by using spline regressions and biological replicates.

It is unclear how “spline regressions and biological replicates” could ever stop TTTPs with very small ΔE terms from occurring: if a gene has stable expression it will show (very) small changes in direction at some point during the lifespan due to random fluctuations. Only genes which consistently change in one direction will avoid this. This can be proved by taking random samples from a normal distribution and fitting splines to the sampled values—at some point the spline will change direction due to random effects. Thus small values of ΔE indicate real biological information: that the gene’s expression is stable.

6) With regard to the age classifier, it is unclear if this is developed using a test set and validation set, which is necessary to truly evaluate accuracy that would be generalizable and meaningful. If it is not, they must find an independent validation set to make this of any meaning. In this regard, the cell type analysis does use a validation set for example.

We can confirm that we partitioned the data into training, validation and test data sets. This was done through two separate rounds of 10-fold cross-validation. The grid search was performed using the tune.svm() function from the e1071 package—this performs 10-fold crossvalidation and was thus passed both the training and validation datasets (separate from the test data). The relevant part of the Materials and Methods section has been improved to make this clearer. We have made it more explicit in the Results section that the data is partitioned into separate datasets sets.

7) All models are based on the TTTP point, which is defined as the earliest age in which the first derivative of expression changes signs. It is not clear why the authors only use the earliest of such points when genes can clearly have multiple inflections across the lifetime (see Figure 2—figure supplement 1).

In Figure 2G we show that the majority of genes plateau after the first turning point, indicating that later turning points are not as pervasive and important as the first ones.

By taking the earliest inflection point, this makes the analysis somewhat contingent on youngest sample in a dataset (which is not balanced between mouse and human experiments), as well as the resolution of samples at the younger timepoints, which is sparse for BrainCloud (see below).

Acknowledging this point, we have now introduced a new figure (Figure 5—figure supplement 6) which shows that our core results obtained with the Braincloud dataset are stable to removal of all fetal samples. This shows that the analysis is not overly contingent on the age of the youngest samples. We do acknowledge that the difference of starting point between human and mice may introduce differences between the two analyses: this is noted in the paper.

Finally, all models are highly contingent on (1) the quality and resolution of the input datasets and (2) the accuracy of the spline regression model. As shown in the supplement, choice of regression parameters can have dramatic effects on downstream analyses. There is no justification for the cubic spline model used or any assessment of the accuracy with which it fits the underlying data. The choice of parameters seems arbitrary.

We have shown in Figure 5—figure supplement 3 that the core results remain stable when we vary the spline regression parameters.

The cubic spline with 3 degrees of freedom was selected as it balanced the competing priorities of: (a) reproducing the trajectories and (b) not overfitting to potentially outlying data points around the turning points. While using four degrees of freedom or loess regression necessarily result in better fits (in terms of regression residuals) this was at the cost of being more susceptible to extreme data points around the period at which the turn occurs.

If you consider the example trajectories shown for turning points at different ages shown in Figure 2—figure supplement 1 then we think you will agree that the cubic splines capture the timing of the turning points accurately.

8) Gender differences – Using the Braincloud dataset, the peak age of TTTP is reported to be delayed in females compared to males and this is used as an explanation for why psychiatric diseases occur later in F. However, the Braincloud data that this result is based on has already had sex effects regressed out prior to analysis. It is inappropriate to make any comment on sex effect without using the non-regressed dataset.

Gender was regressed out as a categorical factor in a linear model. As such, within the set of female samples, it is equivalent to adding or subtracting a value to all samples. It would have no influence on the shape of the trajectory. Hence our approach was perfectly valid.

9) Braincloud is used to define the human TTTP trajectory and includes human prenatal and postnatal samples. The regressed and normalized data is downloaded from GEO. However, it is unclear whether measures of RNA quality and other sources of technical variability are taken into account in any of the analyses. There will likely be an interaction between age and RNA quality and therefore potentially more variability in gene expression signal seen during certain timepoints. Replication in the Kang et al. (GSE25219) dataset will be important as this is similar to BrainCloud in scope.

We are happy to say that we already do perform replication with that dataset. We refer to this as the Brainspan dataset. You will note that we also use a third dataset, the ‘Somel’ dataset, for validation. See Figure 4—figure supplement 1 and Figure 5—figure supplement 2.

10) The authors used the cubic splines with 3 DOF to depict the gene expression trajectories, and the TTTPs were detected by searching slope changing points along the smooth curves. How well does this actually fit the data? Why were 3 DOF chosen? There should be some rigorous assessment of accuracy of regression model (compared with Loess, etc)

We have shown in Figure 5—figure supplement 3 that the core results remain stable when we vary the spline regression parameters.

The cubic spline with 3 degrees of freedom was selected as it balanced the competing priorities of: (a) reproducing the trajectories and (b) not overfitting to potentially outlying data points around the turning points. While using four degrees of freedom or loess regression necessarily result in better fits (in terms of regression residuals) this was at the cost of being more susceptible to extreme data points around the period at which the turn occurs.

If you consider the example trajectories shown for turning points at different ages shown in Figure 2—figure supplement 1 then we think you will agree that the cubic splines capture the timing of the turning points accurately.

11) The degree of smooth curve and the number of binned intervals could affect the number of identified TTTPs. But in subsection “Detecting turning points (TTTPs” the author said "If multiple turning points were found in a single spline, then only the first was used". This strategy may result in the imprecisely predicting TTTPs, e.g., the differences shown in Figure 4—figure supplement 2 when using different parameters and different methods.

As was shown in the original Braincloud paper, the majority of expression change occurs during the early period of life. The point of detecting TTTPs is essentially to determine when the ‘developmental’ trajectory for a given gene ends. TTTPs which occur later in life will necessarily not be relevant to this (as the developmental trajectory for that gene will have already ended). Furthermore, in Figure 2G we show that the majority of genes plateau after the first turning point, indicating that later turning points are not as pervasive and important as the first ones. As such we do not consider it necessary/valuable to consider them within this paper.

12) Human-Mouse comparisons: the authors find a peak of TTTP around age 160 days in mice which they imply is equivalent to 26-30 years in human. However, mice were not assessed before 58 days postnatally whereas human time-points include prenatal brain samples. Since TTTP is calculated as the earliest transition point (if multiple are detected in the sample gene) it is likely sensitive to the age windows assessed.

We have now introduced a new figure (Figure 5—figure supplement 6), which shows that our core results obtained with the Braincloud dataset are stable to removal of all fetal samples. This shows that the analysis is not overly contingent on the age of the youngest samples. We do acknowledge that the difference of starting point between human and mice may introduce differences between the two analyses: this is noted in the paper and does not diminish the results which are found.

13) Age differences – the peak age of TTTP is reported to be in the range of 25-30 years in human. However, there are a relatively few number of samples in the 2-20 age groups compared with age 0 and 20+. The ability to identity changes in transcriptomic trajectory during this crucial period of brain development is strongly unpowered and could confound the age trajectory of TTTP.

Unfortunately obtaining more brains of human children and adolescents is difficult and outside the scope of this study. The dataset used has far more coverage of that period than any other dataset available. We have added a cautionary note early in the Results section to ensure the readers note how this could have affected the analysis:

“We note that although the exact ages assigned to TTTPs are sensitive to both the regression method and the dataset used, and that if more data points available between childhood and early adulthood the ages assigned to particular genes would change, it is clear that the TTTP-peak reveals a major molecular reorganisation in young adults towards the end of development.”

14) Psychiatric enrichments – subsection “Psychiatric susceptibility genes in young adults”, paragraph two: "TTTP.…accurately predicted the age windows for the onset of schizophrenia and anxiety disorders." As shown in Figure 5—figure supplement 3, this is highly contingent on parameter choice of the model used to fit transcriptome data. Loess regression or cubic spline with 4 dof predicts much earlier onset of schizophrenia (~16 year old). The choice of parameters does not seem to be justified either biologically or in terms of fit to the dataset. As such, this conclusion cannot be considered robust.

While the precise age at which the window of enrichment occurs is dependent on the regression model used, the temporal ordering of TTTP’s is well conserved. There is a pearson’s correlation of 0.6 between TTTPs detected with cubic splines (3 df) and those detected using loess regression (when considering the 14544 probes which account for 90% of ∆E). To acknowledge your concerns, we have added a line to the Results section which brings this to the reader’s attention:

“While validating the occurrence of the disease enrichments, this analysis also revealed that the exact age at which the turning points/windows of enrichments occur depends on the regression model used (Figure 5—figure supplement 3).”

We have also noted early on in the Results:

“We note that although the exact ages assigned to TTTPs are sensitive to both the regression method and the dataset used, and that if more data points available between childhood and early adulthood the ages assigned to particular genes would change, it is clear that the TTTP-peak reveals a major molecular reorganisation in young adults towards the end of development.”

15) Cell-type enrichment analyses are likely on very small numbers of genes, which is susceptible to bias. The Zeisel dataset was used to define single cell transcriptomes. However, this dataset is based on single-cell RNAseq with <3000 genes expressed in each cell. Genes were dropped from analysis that were not expressed in Zeisel and therefore the input set of genes is likely very small.

The Zeisel dataset has reads from significantly more than 3000 genes per cell. The full dataset has reads from 19965 genes. A total of 16762 genes have reads assigned to cells classed as interneurons. Similarly, 15421 genes have reads assigned to cells classed as astrocytes/ependymal cells. The cell type enrichment analysis method which we used (and which we developed) was explicitly designed to utilize information from all the genes expressed in a celltype rather than just a “small number of genes”. As such, this criticism does not apply to the analysis we performed.

16) The author claimed the accuracies of age prediction are 6.5 years in human and 28 days in mice. If this prediction included fetal samples, the prediction is imprecise. One similar age prediction from DNA methylation reported the predicting difference is close to zero for embryonic samples (Genome Biology 2013, 14:R115). If this prediction did not include fetal samples, the authors should not state that, "Remarkably, they showed accurate age predictions across the entire range of ages in both species," as Braincloud includes 38 fetal samples. Similarly, the authors should make clear that they excluded these samples, and, in light of the dramatically higher temporal dynamics observed in fetal and infant periods in these samples (Nature 478, 519-523, Figure 2B), why they excluded these samples.

We thank the reviewer for leading us to note that we had actually excluded the younger human samples. With the younger samples (including fetal) included, the prediction accuracy was improved (R2=0.88 and mean absolute difference between predicted and actual age of 5.48). We have changed all relevant parts of the paper to reflect this (i.e. Figure 2H and Figure 2—figure supplement 3A).

We note that we do not find it surprising that it is harder to predict the age of adults than fetal samples. It has been shown that biological age predictions using DNA methylation profiles are affected by lifestyle (PMID: 25617346, PMID: 26673150) and hence we expect this reflects real biological variation.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Reviewer #1:

[…] I should emphasize that I still have my original enthusiasm about this innovative study. But although some of my concerns are well addressed, others regarding the robustness of the method and interpretation of the data are not and still need additional work. Because this study could illuminate some areas of disease susceptibility that have been mysterious, it could have high impact, but it is critical that it be methodologically solid in every respect. I think it is now believable that young adult is a time of interesting changes in gene expression trajectories, although as the authors acknowledge choice of spline parameters effects the actual age considerably. This is not a minor concern that needs acknowledgement, but a key point of their analysis. Better if the authors presented an interval of TTTP ranges based on parameter choices just to show what we can be certain about from these data. The major question that still remains is how is this related to disease specific factors.

We acknowledge that we did not make it clear enough that the exact age of TTTPs is dependent on choice of spline parameters. We have now amended Figure 2 such that when the distribution of TTTPs is shown, it is immediately clear in Figure 2B that the age distribution is affected by the method of spline fitting. We note that Figure 5—figure supplement 2 (which was included previously in the paper) makes it clear how the choice of spline affects individual trajectories and ALIGET enrichments, and thereby shows that the key results are robust across methods.

Other concerns are as follows:

1) The main focus on the paper is defining transcripome trajectory turning points, which they then relate to disease. […] So, a major problem that still remains is that the choice of the 600 SZ de novo genes is not well supported by statistical genetic analysis. The authors’ response to this in the revision is not satisfactory. There is strong support for de novo mutations as a class in SZ and other neurodevelopmental disorders, and some pathways, more equivocally. But, the individual genes in this list are mostly not supported by evidence yet. It is not a question of finding mutations that occur only in SZ patients and not in controls; it is standard population genetics. de novo mutations occur frequently and many of these genes are almost certainly not risk genes. The authors should absorb some of the analyses in the following papers and should filter the list by metrics that are now well accepted in human genetics (PMID:27899611; PMID:25086666; PMID: 27535533; PMID:27533299; PMID:26439716; PMID:25684150).

We thank the reviewer for this comment which inspired us to incorporate a new genetic analysis into the paper which directly utilizes SNP summary statistic data from the largest publically available Schizophrenia GWAS study while accounting for linkage disequilibrium and other confounds using Multi-marker Analysis of GenoMic Annotation (MAGMA).

We explicitly acknowledge the flaws with the de novo gene set when we introduce the key findings discovered using this method:

“Much of the heritability for schizophrenia is associated with SNPs that have not reached genome-wide significance with current sample sizes52 (and thus were not included in our analysis thus far) and the sample sizes for de novo studies are too small to determine whether any genes found are significantly associated with disease status. We therefore adapted our methods to include a greater fraction of the SNPs associated with schizophrenia heritability by using association statistics from all SNPs regardless of whether they are genome wide significant (as defined in GWAS summary statistics files) and explicitly modelling linkage disequilibrium (based on results of the 1000 genomes project), such that disease association scores can be ascertained for each gene. The ALiGeT and DEGET approaches were extended to directly utilise the gene association scores generated by MAGMA (Multi-marker Analysis of GenoMic Annotation)53 based on schizophrenia GWAS summary statistics. Using this approach, schizophrenia showed a DEGET enrichment at 26 years in the Braincloud dataset (=0.03135, Figure 5H) and at 15—16 years in the Somel dataset (=0.0115, Figure 5—figure supplement 8A) and corresponding enrichment windows were found using ALiGeT (Figure 5I, Figure 5—figure supplement 8B).”

The results generated using this method are shown in Figures 5H,I, 6B, D and Figure 5—figure supplement 8. We thereby reproduced all the key results of the paper which pertain to Schizophrenia in both in Braincloud and Somel datasets, with the exception of the ‘neuronal only’ enrichment which failed to replicate (negative result shown in Figure 6B).

Also in this regard, genome-wide significant genes for anxiety disorders have not been identified. Again, the use of a hodge podge of gene lists, without strong statistical support or rationale plagues this major aspect of the paper. The choice of gene lists is so important that it should be presented up-front and clarified. And when comparisons are made with these lists, they should be of equal power, or they don't support disease specificity. In other words, the gene lists should be the same size approximately, or better yet, account for a similar proportion of the population attributable genetic liability to account for the different effect sizes of mutations and common variants. Further, anxiety should certainly not be used at all, unless the authors can provide genome-wide statistical support for the genes identified.

We acknowledge that the anxiety gene set did not have a sufficient level of genetic support and have dropped all results pertaining to disorder and edited the text and figures accordingly.

Please note the improvements we have made to the genetic analysis of Schizophrenia (described in response 3). For all the other disorders we maintain that this is the most strongly supported set of genes available for this set of major brain disorders.

We agree with the reviewer that for the purposes of comparisons, the lists should be of same size etc. However, we want to emphasise that we are not comparing these diseases – we are asking if the set of genes relevant to an individual disease are enriched in an age window.

We also don’t believe it is right to interpret our results as a comparison of the relevance of the calendar to diseases for several reasons. These diseases are known to have a different cellular basis and the gene expression data is obtained from whole tissue and not from the relevant individual cell types. Moreover, the transcriptome data is from the human prefrontal cortex, which is known to be important in schizophrenia, but is not the primary region of pathology in either multiple sclerosis or Parkinsons disease. So, we cannot exclude a role of the genetic calendar in these disorders until the relevant tissue is examined.

We believe that the paper would benefit from a clear statement about these issues at the point in the Results when the disease gene lists are first mentioned and in the Discussion.

In the Results we have added:

“Because these gene lists are not comparable (different size, population sample size, obtained using different technical approaches etc.) the relative importance of the genetic calendar to schizophrenia cannot be directly compared with these disorders (see Discussion). In addition, because the transcriptome data is from prefrontal cortex and the primary pathology of several of these diseases is in other parts of the nervous system it cannot be assumed that the transcriptome trajectories in one part of the brain are the same as in others. Hence, the lack of any detectable age window does not preclude a role for gene regulation in the onset of these diseases.”

In the Discussion we have added:

“Our studies relied on human prefrontal cortex transcriptome data, which may have limited our ability to detect a role for the genetic lifespan calendar in the age-dependent onset of those diseases that are known to have primary pathology in other brain regions (e.g. Parkinson’s disease). Given the evolutionary conservation between mouse hippocampus and human prefrontal cortex, we expect other human brain regions to show a calendar of transcriptome trajectories, but with different patterns and therefore different age windows of disease gene enrichments. Moreover, while our whole tissue transcriptome analysis appeared to be sensitive to neuronal changes, we expect that future single-cell transcriptome data will provide a more detailed insight into rarer cell types and potentially reveal mechanisms relevant to the age of onset of pathology in these cells. We do not expect that the age of onset of all brain diseases will be accounted for by the genetic calendar, as it will likely depend on the importance of cell autonomous processes and exogenous factors (e.g. inflammatory processes involving microglia). It is also likely that high penetrance severe mutations will show an earlier onset and are less likely to show a dependence on the TTTPs. Weaker alleles may manifest with undetectable or subtle phenotypes at early ages and be exposed at later ages by the changes in gene expression. Thus, TTTPs would not necessarily account for intellectual disability or autism even though some of the same genes are involved with schizophrenia.”

If the authors want to use this SZ list of 600 genes that lacks statistical support or OR for most of the genes, they should definitely compare to a similarly filtered of genes that cause autism and intellectual disability, or list of brain enriched genes (2/3 fold?) to see if the trajectory that they see in SZ is meaningful, as mentioned above. In the gene lists there are 2- 3x more SZ genes than ASD genes (one list has 170, the other about 380), and there are only about 80 ID genes listed, when there are >400 known. At least the sensitivity of the TTTP analyses to the size of the gene list should be provided. Another ground truth would be to show that a similar sized list of PSD genes, or highly synaptically enriched genes not chosen for association with SZ did not show the same pattern. It simply may be that certain classes of synaptic genes are enriched for this pattern of trajectory switching and this creates a vulnerable period, but it is not due to an enrichment of SZ risk genes themselves.

Firstly, we now include a sensitivity analysis to the size of both the PSD and Schizophrenia gene lists in Figure 5—figure supplement 3. This reveals that for equivalently sized gene sets, the enrichment is stronger in earlier adulthood for Schizophrenia than for the PSD. We thank the reviewer for suggesting this informative control.

Secondly, the reviewer suggests we compare to a list of brain enriched genes. We note that we already test for enrichment of pyramidal neurons, interneurons and all other major cortical cell types in Figure 3. In Figure 4 we test for enrichment of the human post-synaptic density. The different cell types and proteomes show clearly distinct lifespan trajectories.

Thirdly, in further rebuttal of the point that the Schizophrenia enrichment may simply be a result of a ‘brain expressed’ enrichment, we note that Figure 6A shows that the enrichment is present even when the analysis is limited to a subset of the 5000 most neuron specific genes. Thus, even within neuronal genes, there is a clear enrichment of schizophrenia turning points during early adulthood.

Fourth, we note again that we now include a genetic analysis of the Schizophrenia enrichment performed in accordance with what we understand to be the highest standards in the field, directly utilizing GWAS summary statistics via the MAGMA analysis package (see response 3).

3) The human-mouse comparison is really a tough area as the authors recognize, since it has been demonstrated that there is not a linear, constant scaling of stage comparability between these species across development when specific developmental processes are compared. The authors should at least cite some of the work that demonstrates this and discuss it more thoroughly as a caveat. Currently it is somewhat glib. They have done a very good job of dealing with this issue analytically, but the readers should know that it is a biological conundrum that is tough to address with any method mapping human development to mouse, and what the limitations are.

We appreciate the reviewer’s acknowledgement of the difficultly of handling mouse/human age comparisons. We have now included a reference to the following paper, which we believe makes the best attempt to compare brain ages between species:

Workman, A. D., Charvet, C. J., Clancy, B., Darlington, R. B. & Finlay, B. L. Modeling transformations of neurodevelopmental sequences across mammalian species. Journal of Neuroscience 33, 7368-7383 (2013).

The authors of the paper referenced above created a website with a model (http://www.translatingtime.net/translate) for translating ages between species. Using their tool we obtained the following output:

“A synapse elimination and elimination event in the cortex of the Human at PC Day 7100 translates to PC Day 156 in the Mouse.”

We have added the following sentence to the Results:

“Although there is a lack of previous research on the age equivalence of early adulthood between rodents and humans, our results are concordant with estimations made using the TranslatingTime species comparison model which suggests that p156 (5 months) is equivalent to human early adulthood based on equivalent levels of cortical synaptic maturation.”

https://doi.org/10.7554/eLife.17915.037

Article and author information

Author details

  1. Nathan G Skene

    Genes to Cognition Programme, Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Data curation, Software, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon 0000-0002-6807-3180
  2. Marcia Roy

    Genes to Cognition Programme, Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Data curation, Formal analysis
    Competing interests
    No competing interests declared
    ORCID icon 0000-0002-4088-3958
  3. Seth GN Grant

    Genes to Cognition Programme, Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    seth.grant@ed.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon 0000-0001-8732-8735

Funding

Seventh Framework Programme (HEALTH-F2-2009-241995)

  • Seth GN Grant

Seventh Framework Programme (HEALTH-F2-2009-242167)

  • Seth GN Grant

Wellcome Trust (Graduate Student Fellowship)

  • Nathan G Skene

Wellcome Trust Sanger Institute (Core funding)

  • Seth GN Grant

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Support from the Medical Research Council, Wellcome Trust, European Union Seventh Framework Programme (FP7 grant agreement no. 242167). T Le Bihan and L Imrie at SynthSys, University of Edinburgh for mass spectrometry sample analysis. The LC-MS QExactive equipment was purchased by a Wellcome Trust Institutional Strategic Support Fund and a strategic award from the Wellcome Trust for the Centre for Immunity, Infection and Evolution (095831/Z/11/Z). D Maizels for artwork.

Ethics

Animal experimentation: All animal experiments conformed to the British Home Office Regulations (Animal Scientific Procedures Act 1986; Project License PPL80/2,337 to Prof Seth Grant), local ethical approval, and NIH guidelines.

Reviewing Editor

  1. Jonathan Flint, Reviewing Editor, University of California, Los Angeles, United States

Publication history

  1. Received: May 17, 2016
  2. Accepted: August 15, 2017
  3. Version of Record published: September 12, 2017 (version 1)

Copyright

© 2017, Skene et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,683
    Page views
  • 288
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Scopus, Crossref.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Neuroscience
    Chia-Wei Chang et al.
    Research Article Updated