The genetic mechanisms regulating the brain and behaviour across the lifespan are poorly understood. We found that lifespan transcriptome trajectories describe a calendar of gene regulatory events in the brain of humans and mice. Transcriptome trajectories defined a sequence of gene expression changes in neuronal, glial and endothelial cell-types, which enabled prediction of age from tissue samples. A major lifespan landmark was the peak change in trajectories occurring in humans at 26 years and in mice at 5 months of age. This species-conserved peak was delayed in females and marked a reorganization of expression of synaptic and schizophrenia-susceptibility genes. The lifespan calendar predicted the characteristic age of onset in young adults and sex differences in schizophrenia. We propose that a genomic program generates a lifespan calendar of gene regulation that times age-dependent molecular organization of the brain, and that mutations that interrupt the program in young adults cause schizophrenia. https://doi.org/10.7554/eLife.17915.001
In our lifetime, we go through many changes – physically and also intellectually. At certain ages, we are particularly vulnerable to developing psychiatric disorders, and the majority of mental conditions start to manifest in teenagers and young adults. The symptoms of schizophrenia, for example, a mental health disorder in which patients often experience hallucinations, delusions or changes in behavior, typically start in the mid-twenties.
Schizophrenia tends to run in families and it is likely that different combinations of faulty genes that affect the connections between nerve cells increase the chance of having the disease. Until now, scientists have assumed that certain situations and environmental factors trigger the condition, but it was unknown if genes could influence the age at which the disease will begin.
To explore whether genes in the brain change at certain time points, Skene et al. examined how the genes are turned on and off across the lifespan of healthy mice and humans. The results showed that in both mice and humans, a ‘genetic lifespan calendar’ controlled every cell type in the brain and directed the way they worked at different ages. The timing was so precise that it was possible to tell the age of a mouse or a person simply by looking at the way the genes were expressed in a tissue sample.
Skene et al. then studied how the genetic lifespan calendar controlled the genes damaged in schizophrenia, and found that the calendar caused a major reorganization of the genes at the time when the symptoms started. This suggests that the genetic lifespan calendar is a crucial factor that can determine at what age the disease will start.
The next step will be to study how the genetic lifespan calendar programs changes throughout the brain and to explore if it could be manipulated to change how the brain ages. This could help to develop new types of treatments for schizophrenia and other conditions of the brain. https://doi.org/10.7554/eLife.17915.002
Identifying the genetic mechanisms that underpin brain ageing across the lifespan may provide explanations for the maturation of behaviours and age of onset of diseases. Longitudinal studies show cognition, emotion and personality emerge progressively during childhood and adolescence, with executive functions peaking in early adulthood (Craik and Bialystok, 2006; De Luca et al., 2003). This coincides with the onset of some of the most devastating psychiatric disorders, most of which arise during later stages of brain development in the teenage years and early twenties (Kessler et al., 2007). For example, impulse-control disorders arise in late childhood and early teen years, substance abuse peaks in the early twenties, and schizophrenia in the mid-twenties (with a delay of around two years in females) (Häfner et al., 1993). Some monogenic neurological disorders also have early adult onset, such as Inclusion Body Myopathy associated with Paget disease Frontotemporal Dementia (Watts et al., 2004) and rapid-onset dystonia Parkinsonism (Brashear et al., 2012).
Why some brain disorders with a strong genetic component have a late developmental onset is unknown. The prevailing hypothesis for schizophrenia proposes that an early (fetal) insult or mutation renders the brain vulnerable to a secondary environmental insult that occurs in young adults, which then triggers the onset of psychosis (Bayer et al., 1999). However, the finding that the age of onset for schizophrenia has a heritability estimated at 33% (Hare et al., 2010), suggests that the timing may have a genetic basis. In recent years, there has been major progress in understanding the genetic basis of schizophrenia with the identification of many mutations and variants contributing to disease susceptibility. It is widely accepted that many mutations directly impact on synapse proteins, particularly those involved with postsynaptic signalling mechanisms in excitatory synapses (Kirov et al., 2008; Pocklington et al., 2015; Fromer et al., 2014; Fernández et al., 2009a; Singh et al., 2017). The postsynaptic proteome of excitatory synapses is physically organised into multiprotein complexes of which the supercomplexes assembled by PSD95 (Husi et al., 2000; Frank et al., 2016a; Frank and Grant, 2017; Frank et al., 2017) play a major role in regulating cognitive functions (Migaud et al., 1998; Nithianantharajah et al., 2013) and are disrupted by schizophrenia mutations (Pocklington et al., 2015; Fromer et al., 2014; Fernández et al., 2009a; Singh et al., 2017; Kirov et al., 2012; Purcell et al., 2014). Together these observations suggest there may be genetic mechanisms that account for the convergence between the many schizophrenia susceptibility genes, the postsynaptic proteome and the young adult brain.
Transcriptome profiling of the brain at different ages has shown complex changes in expression levels across the lifespan (Colantuoni et al., 2011). In early onset brain disorders, such as intellectual disability and autism, gene expression levels in fetal and early postnatal development correlate with enrichment in autism susceptibility genes (Willsey et al., 2013; Parikshak et al., 2013). However, correlation based approaches (Willsey et al., 2013; Parikshak et al., 2013; Gulsuner et al., 2013) (e.g. weighted gene coexpression network analysis) are unable to account for the full complexity of transcriptional changes that occur with age. Furthermore, although there has been extensive characterisation of cellular and anatomical maturation in neuronal, glial and synaptic subtypes (Alexander and Goldman, 1978; Shaw et al., 2006; Gogtay et al., 2004, 2006; Zehr et al., 2006; Woo et al., 1997; Huttenlocher, 1979; Anderson et al., 1995; Bourgeois et al., 1994; Kalsbeek et al., 1988; Klingberg et al., 1999), the relevant transcriptome changes remain to be identified.
To further understand the transcriptional events underlying brain development and ageing, we developed new tools that identify age-dependent gene regulatory events. These methods detect when gene expression trajectories change direction or plateau: we refer to these events as Transcriptome Trajectory Turning Points (TTTPs) (Figure 1). The timing of TTTPs has been shown to be an important feature of transcriptome trajectories during maize embryo development and the yeast cell-cycle: in both systems, genes with linked biological functions were found to turn/plateau at similar time points (Lee et al., 2002; Spellman et al., 1998). We have characterised TTTPs in the neocortex of humans and hippocampus of mice across their respective postnatal lifespans. This revealed a previously unknown, species conserved, gene regulatory program. These methods were also used with single-cell transcriptomes to define the age-dependent sequence of changes in neuronal, glial and endothelial cell-types. Our data suggest that the late onset of some psychiatric and neurological disorders is timed by mutations in this genetically programmed developmental sequence. Our findings also indicate that misregulation in the molecular maturation of synapse proteomes during a critical window in young adults is important for the onset of schizophrenia. These methods and findings open new areas of investigation into the genetic regulation of brain age and highlight their importance in the adolescent and young adult.
As shown in the schematic diagram (Figure 1a), genes can be categorised according to their age-dependent profile of expression into simple trajectories (monotonic, upward or downward gradients) or complex trajectories that contain a Transcriptome Trajectory Turning Point (TTTP), marking both the direction of change and the age at which it occurs. Following a TTTP, the expression may reverse direction or plateau. We systematically identified gene expression trajectories by fitting cubic splines and marking TTTPs where the first derivative dE/dA of the interpolated trajectories (E = expression level, A = age) equals zero and changes sign. Next, to take into account the extent of expression changes prior to the TTTP (ΔE), and thereby emphasise those genes for which the TTTP represents a significant regulatory event, we developed two complementary methods (DeGeT, Decile-based Gene Turning; ALiGeT, Age-Linked Gene Turning). The DeGeT method depicted in Figure 1b is conceptually the simplest: the lifespan is divided into ten age groups within which an approximately equal number of TTTPs occur; each gene then receives a score for each age group (ΔE if the gene turns within that age window, and zero otherwise). The ALiGeT method depicted in Figure 1c extends this to generate a score for each year of age, by decaying the contribution of ΔE the greater the distance between the TTTP and the scored year (example trajectories and associated ALiGeT scores are shown in Figure 1—figure supplement 1). The reason for using two scoring methods is that some age periods have many TTTPs and others have few: DeGeT controls for this by balancing the number of TTTPs within age groups (and thereby has greater power to detect enrichments in earlier/later life stages), while ALiGeT scoring allows for the possibility that small time windows will have distinct molecular associations.
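The turning-point rule described above can be illustrated with a minimal Python sketch. It assumes the trajectory has already been smoothed (for example by a cubic spline) and evaluated on a grid of ages; the function name `find_turning_points`, the finite-difference approximation of dE/dA, and the way ΔE is measured from the previous extreme are illustrative assumptions, not the authors' implementation.

```python
def find_turning_points(ages, fitted):
    """Return (age, direction, delta_e) for each turning point of a fitted trajectory.

    ages   -- strictly increasing ages at which the smoothed trajectory was evaluated
    fitted -- fitted expression values at those ages (hypothetical input)
    """
    # Finite-difference approximation of dE/dA between neighbouring grid points.
    slopes = [(fitted[i + 1] - fitted[i]) / (ages[i + 1] - ages[i])
              for i in range(len(ages) - 1)]
    turning_points = []
    last_extreme = fitted[0]  # expression at the start or at the previous turn
    for i in range(1, len(slopes)):
        before, after = slopes[i - 1], slopes[i]
        if before > 0 and after <= 0:
            direction = "up"    # rising before the turn, then plateau or reversal
        elif before < 0 and after >= 0:
            direction = "down"  # falling before the turn, then plateau or reversal
        else:
            continue
        # Assumption: delta E is the expression change since the previous extreme.
        delta_e = abs(fitted[i] - last_extreme)
        turning_points.append((ages[i], direction, delta_e))
        last_extreme = fitted[i]
    return turning_points


# Toy trajectory that rises until age 30 and then declines.
example = find_turning_points([0, 10, 20, 30, 40, 50], [1, 2, 4, 5, 4, 3])
```

In this toy case the single turning point is reported at age 30 with an upward prior direction and ΔE of 4. A trajectory that falls and then flattens is reported as a downward turn at the first grid point of the plateau.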
Together, the TTTP, DeGeT and ALiGeT methods provide general purpose tools for exploring age-dependent gene regulation.
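As a rough illustration of the two scoring schemes, the following Python sketch assigns DeGeT-style and ALiGeT-style scores to genes whose turning age and ΔE are already known. The text does not specify the decay function used by ALiGeT, so the exponential form and the `decay` parameter are assumptions made for illustration only, as are the function names and the rank-based way ties between bins are broken.

```python
import math


def deget_scores(turns, n_bins=10):
    """DeGeT-style scores.

    turns -- dict mapping gene -> (turn_age, delta_e)
    Genes are ranked by turning age and split into n_bins groups of roughly
    equal size, so each age window holds a similar number of TTTPs.
    Returns dict gene -> list of n_bins scores (delta_e in its own bin, else 0).
    """
    ranked = sorted(turns, key=lambda gene: turns[gene][0])
    scores = {}
    for rank, gene in enumerate(ranked):
        bin_index = rank * n_bins // len(ranked)
        row = [0.0] * n_bins
        row[bin_index] = turns[gene][1]
        scores[gene] = row
    return scores


def aliget_scores(turn_age, delta_e, years, decay=0.5):
    """ALiGeT-style scores for one gene across the given years.

    The contribution of delta_e shrinks with distance from the turning age.
    The exponential kernel and `decay` value are hypothetical choices.
    """
    return [delta_e * math.exp(-decay * abs(year - turn_age)) for year in years]


# Hypothetical genes with (turning age in years, delta E).
toy_turns = {"geneA": (5, 1.0), "geneB": (25, 2.0), "geneC": (26, 3.0), "geneD": (60, 0.5)}
toy_deget = deget_scores(toy_turns, n_bins=2)
toy_aliget = aliget_scores(26, 3.0, years=range(20, 33))
```

The rank-based binning mirrors the stated aim of DeGeT (balanced numbers of TTTPs per age group), while the per-year kernel mirrors the stated aim of ALiGeT (finer temporal resolution at the cost of unbalanced windows).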
We first applied these methods to the Braincloud dataset, which measured mRNA expression levels from 269 prefrontal cortex samples across the human lifespan (14th gestational week to 78 years). Although TTTPs were identified across the lifespan, they were sharply concentrated during early adulthood. Summing the number at each age shows a striking peak and a mean of 26.0 years in males and 27.5 years in females (Wilcoxon signed rank test, p=0, Figure 2a, example trajectories with different turning points shown in Figure 2—figure supplement 1). Prominent peaks in early adulthood were confirmed with three regression methods (cubic splines with three degrees of freedom, four degrees of freedom and Loess regression) (Figure 2b). Removing X-chromosome genes from the analysis had no effect on this sex difference (p=0). To determine whether this TTTP-peak was human specific, we performed an equivalent analysis using transcriptome data from the hippocampus of 186 mice of both sexes between 58 and 600 days of age (Figure 2—figure supplement 2): this also revealed a peak in the frequency of TTTPs with a mean of 156 days for male and 165 days for female mice (p<1×10−323, Figure 2c). Although there is a lack of previous research on the age equivalence of early adulthood between rodents and humans, our results are concordant with estimations made using the TranslatingTime species comparison model which suggests that p156 (5 months) is equivalent to human early adulthood based on equivalent levels of cortical synaptic maturation (Pinto et al., 2015; Pinto et al., 2013; Workman et al., 2013). Plotting the cumulative distribution of TTTPs across the lifespan further illustrates the TTTP-peak in young adults (Figure 2d,e), and shows that 90% of TTTPs occur by 40 years of age, corresponding to the last stages of human brain development (De Luca et al., 2003; Lebel et al., 2012; Wood et al., 2004) and 7 months of age in mice. 
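The summaries used in this paragraph (the number of TTTPs at each age and the age by which a given fraction of TTTPs has occurred) can be sketched in a few lines of Python. The function names and the toy ages below are hypothetical and are not taken from the datasets.

```python
import math
from collections import Counter


def tttp_counts_by_year(turn_ages):
    """Count turning points per whole year of age."""
    return Counter(int(age) for age in turn_ages)


def age_at_cumulative_fraction(turn_ages, fraction=0.9):
    """Smallest observed age by which at least `fraction` of TTTPs have occurred."""
    ordered = sorted(turn_ages)
    needed = math.ceil(fraction * len(ordered))
    return ordered[needed - 1]


# Hypothetical turning ages (years) for ten genes.
toy_ages = [2, 10, 22, 24, 25, 26, 26, 27, 30, 65]
peak_year, peak_count = tttp_counts_by_year(toy_ages).most_common(1)[0]
age_90 = age_at_cumulative_fraction(toy_ages, 0.9)
```

For these toy values the most frequent turning year is 26 and nine of the ten turning points have occurred by age 30.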
These data indicate that despite greatly differing lifespans, these two mammalian species share a lifespan program of brain gene expression with conserved features.
To characterize the features of the human TTTPs in more detail, we focussed on those genes that showed the greatest changes. As shown in Figure 2f, the TTTPs for genes with the greatest expression changes prior to the TTTP were concentrated around the late-twenties, reinforcing the earlier finding that this is a significant period for switching in the trajectories. Next, we separated genes into those with upward or downward trajectories prior to the turning point: overall there were similar numbers in each category, although there was a skew toward upward inflecting genes being more common in young subjects (<25 years) and downward inflecting genes more common in older subjects (>40 years) (Figure 2g). Finally, we examined the direction of the trajectories after the TTTP by dividing genes into those that established a stable plateau (Post-Turn Plateau) or reversed their direction (Post-Turn Reversal) (Figure 2h). The vast majority of probes had plateaued by 30 years of age. While the exact ages at which TTTPs occur are sensitive to both the regression method and the dataset used, it is clear that the TTTP-peak reveals a major molecular reorganisation in young adults towards the end of development. Together these findings show that young adulthood is a crucial time for switching brain gene expression and establishing the set points of most genes for later life.
Even though there are major changes during brain maturation in young adults, the complex trajectories were found throughout the lifespan, suggesting they could be used to predict the biological age of the brain. To test this, we used radial basis support vector machines and demonstrated that classifiers trained on partitioned subsets of the gene expression data (training sets) predicted age in the test sets with an accuracy (defined as mean |AgeActual−AgePredicted|) in humans of 5.5 years and 28 days in mice. Remarkably, they showed accurate age predictions across the entire range of ages in both species (human, R2 = 0.88, mice, R2 = 0.94) (Figure 2i,j) using only 40 probes in humans and 100 in mice (Figure 2—figure supplement 3). Thus, TTTPs and trajectories are highly characteristic features defining brain age across the lifespan in mice and humans. This indicates a ‘genetic lifespan calendar’ of transcriptome events is a conserved feature of mammals.
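The accuracy measure defined above, mean |AgeActual−AgePredicted|, together with an R2 value, can be computed as in the following Python sketch. The text does not state how R2 was calculated, so treating it as the squared Pearson correlation between actual and predicted ages is an assumption. The ages used are made-up illustrative values and the classifier itself is not reproduced here.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| across samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def squared_pearson(actual, predicted):
    """Squared Pearson correlation between actual and predicted ages (assumed R2)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


# Hypothetical test-set ages (years) and predictions.
toy_actual = [10, 20, 30, 40]
toy_predicted = [12, 18, 33, 39]
toy_mae = mean_absolute_error(toy_actual, toy_predicted)
toy_r2 = squared_pearson(toy_actual, toy_predicted)
```

Here the absolute errors are 2, 2, 3 and 1 years, giving a mean absolute error of 2.0 years.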
To ascertain the biological processes affected by the TTTPs, we first sought to identify the cell types affected at each age. We asked if the TTTPs were enriched in the transcriptome of specific cell types using brain single-cell RNA-seq data and the Expression Weighted Cell-type Enrichment (EWCE) method (Skene and Grant, 2016). TTTPs were binned into approximately equal sized age-groups (similar to the DeGeT method) and then tested for cellular enrichments (Figure 3a). We assumed that different biological processes could be associated with up/down-regulated genes and so performed enrichment analyses for each direction separately. To ensure the findings were robust, the analysis was performed on both the Braincloud (Colantuoni et al., 2011) dataset and an independent human prefrontal cortex transcriptome dataset (Somel et al., 2009) (see Materials and methods). Significant enrichments were found for each cell-type tested and these were strongly correlated between the two datasets (Figure 3a, median correlation of enrichments for each cell-type and direction with at least one significant change between the two datasets was 0.37). The majority of significant enrichments were found amongst the Upward (Figure 1a) gene sets relative to the Downward trajectory gene sets.
A striking sequence of events was observed where each cell type was regulated within distinct age-windows (Figure 3a and summarised in Figure 7). Early postnatal life was marked by TTTPs in endothelial cells (=0.025, <0.00001 where ‘Bonferroni’ indicates a Bonferroni-adjusted p-value) and oligodendrocytes (=0.015); followed by microglial genes (<0.00001, <0.00001) throughout adolescence; then pyramidal neuron genes (=0.0216, <0.00001) in early adulthood (corresponding to the peak in turning points); then interneuron (<0.00001) and oligodendrocyte genes through the late twenties and thirties (<0.00001, <0.00001). Astrocytes were found to have two periods of enrichment, the first during early adulthood (=0.0028) and a second very late in life (=0.00675, <0.00001). These enrichments reveal the sequential maturation of cellular processes in the brain across the human lifespan. Moreover, most of these changes occurred prior to 35 years of age with prominent neuronal changes in young adults.
We also performed this analysis on the mouse hippocampal dataset (Figure 3b). As in the human neocortex datasets we found an early adulthood enrichment for upward-turns in pyramidal neuron genes (=0.008), followed by a later enrichment for downward-turns in genes associated with both pyramidal and interneurons (=0.006 and, =0.0086 respectively). Unlike in the human datasets, upward-turns in interneuron genes were found to be enriched at the same ages as those in pyramidal neurons (=0.025). This may be because the earliest samples in the mouse ageing dataset were 56 days whereas the human datasets included fetal samples: correspondingly the early life enrichments for upward-turns in microglia and endothelial cells were not seen in mice. A later enrichment for upward-turns was seen in endothelial cells in mice (=0.048). These findings show that the enrichment of neuronal genes in the early adult peak is conserved across species and brain regions.
The young adult peak in TTTPs is a prominent landmark and we next sought to identify the key molecular mechanisms involved. We used ALiGeT to identify the top 10% of genes in both human and mouse in the year in which the largest number of TTTPs occurred (and refer to this as the Peak Gene Turning (PeGeT) score) (Supplementary files 1a and 1b). The relevant biological processes in the genes with high PeGeT scores, conserved between mouse and human (Supplementary file 1c), were first examined for enrichment in Gene Ontology terms: this revealed their role in synaptic transmission (FDR-adj p=0.00024, where ‘FDR-adj’ indicates the p-value was adjusted for False Discovery Rate using the Benjamini-Hochberg method). Mammalian Phenotype Ontology annotations indicated their roles in behavioural (p=7.5×10−05) and nervous system (p=0.0002) phenotypes. To probe the synaptic role in more detail, we examined genes coding for proteins in the human postsynaptic density (hPSD) (Bayés et al., 2011) and the postsynaptic PSD95 supercomplexes (Husi et al., 2000; Frank et al., 2017; Fernández et al., 2009b; Frank et al., 2016b), which are crucial in controlling synaptic plasticity and behaviour: both sets showed significantly higher PeGeT scores than expected by chance (p<0.0009 and p=0.0019 respectively, Figure 4a,b, Supplementary file 1d).
To establish the age window when synaptic genes showed TTTPs, we used a bootstrapping approach to test whether synaptic genes have higher ALiGeT scores in particular years in the Braincloud dataset. Each year between 22 and 33 years of age showed a significant increase in synapse-associated ALiGeT scores and we refer to this as the ALiGeThPSD window. A replication study (Somel) using a smaller human transcriptome dataset (Somel et al., 2009) gave an overlapping estimate for the synaptic ALiGeThPSD window at 17–22 years (Figure 4c). We next applied the DeGeT method and found all five consecutive age sets from 24 to 36 years were significantly enriched in hPSD genes (Figure 4d). This result was confirmed in two independent human frontal cortex datasets (Somel et al., 2009; Kang et al., 2011) (see Figure 4—figure supplement 1). Similarly, we identified the ALiGeTmPSD window in mice at 126 to 151 days (Figure 4e). These data show that transcripts encoding postsynaptic proteins were significantly enriched in TTTPs during early adulthood in all four datasets tested, spanning two species and two brain regions.
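A resampling test of the kind described here can be sketched as follows. The details of the authors' bootstrapping procedure are not given in this passage, so the sketch assumes the simplest version: the mean score of the gene set is compared with the mean scores of random gene sets of the same size drawn from all scored genes. The function name, the number of resamples and the toy scores are hypothetical.

```python
import random


def resampling_p_value(scores, gene_set, n_resamples=1000, seed=0):
    """Empirical p-value that gene_set has a higher mean score than random sets.

    scores   -- dict mapping gene -> score for one year (e.g. an ALiGeT score)
    gene_set -- iterable of genes of interest; genes without a score are ignored
    """
    rng = random.Random(seed)
    genes = list(scores)
    members = [gene for gene in gene_set if gene in scores]
    observed = sum(scores[gene] for gene in members) / len(members)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(genes, len(members))  # random set of equal size
        if sum(scores[gene] for gene in sample) / len(members) >= observed:
            hits += 1
    return hits / n_resamples


# Toy data: 100 genes, of which five "synaptic" genes carry all of the score.
toy_scores = {f"gene{i}": 0.0 for i in range(100)}
toy_synaptic = ["gene0", "gene1", "gene2", "gene3", "gene4"]
for gene in toy_synaptic:
    toy_scores[gene] = 1.0
toy_p = resampling_p_value(toy_scores, toy_synaptic)
```

Repeating such a test for each year of age, and recording the years that reach significance, would give an enrichment window of the kind reported above. With a finite number of resamples the smallest reportable p-value is bounded by the resample count.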
Since these transcriptome results suggest that specific changes occur in the composition of components of the synapse proteome, we performed relative quantitation by mass spectrometry on 38 forebrain synaptosome samples between 1–5 months in mice. A total of 900 proteins were detected, of which 99 were found to show significant changes with age (Supplementary file 1e, <0.005). We explored which functional classes of synapse proteins were affected by ageing during this period, and found that ion channel proteins and receptors showed increased likelihood of being differentially expressed (=0.00034, expected = 7, actual = 17, Figure 4f). Amongst the affected channels were the majority of Ca2+- and Na+/K+-ATPases detected in our dataset, including three of the subunits of Atp2b (often referred to as PMCA). This finding recapitulates and extends the previous reports that aged rodents have decreased Ca2+- and Na+/K+- ATPase activity (Zaidi et al., 1998; Tanaka and Ando, 1990). These proteomic findings support the transcriptome findings that changes in synapse proteome composition occur within the young adult age window.
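The "expected" and "actual" counts quoted above can be related through a simple over-representation calculation. The sketch below uses a hypergeometric upper tail, which is one common choice; the test actually used is not named in this passage, so this is an assumption. The totals of 900 detected and 99 changed proteins come from the text, whereas the class size of 64 is a hypothetical value chosen only so that the expected count is close to 7.

```python
from math import comb


def expected_overlap(population, changed, class_size):
    """Expected number of changed proteins in a class under random assignment."""
    return changed * class_size / population


def hypergeom_upper_tail(population, changed, class_size, observed):
    """P(X >= observed) when class_size proteins are drawn without replacement
    from `population` proteins, of which `changed` are differentially expressed."""
    total = comb(population, class_size)
    upper = min(changed, class_size)
    favourable = sum(comb(changed, k) * comb(population - changed, class_size - k)
                     for k in range(observed, upper + 1))
    return favourable / total


# 900 and 99 are from the text; class_size=64 is a hypothetical illustration.
toy_expected = expected_overlap(900, 99, 64)
toy_tail = hypergeom_upper_tail(900, 99, 64, 17)
```

With these inputs the expected overlap is 7.04, and observing 17 or more changed proteins in the class would be unlikely under random assignment.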
Prompted by previous results showing that postsynaptic proteins have been linked to the genetic susceptibility of schizophrenia (Fromer et al., 2014; Fernández et al., 2009a; Nithianantharajah et al., 2013; Kirov et al., 2012; Purcell et al., 2014), we hypothesized that TTTPs may be relevant to the age of onset of schizophrenia. For these analyses we examined multiple sets of publicly available genetic data including de novo and GWAS data (see Materials and methods ‘Genes Lists’, Supplementary file 1f) and applied multiple analytical approaches.
As a first step, we applied the same bootstrapping approach used earlier for the PSD genes on the pooled set of genes which have been associated with schizophrenia using either GWAS or de novo approaches. Strikingly, this analysis showed the TTTPs in susceptibility genes predicted the known age windows for the onset of schizophrenia (Figure 5). The ALiGeTscz enrichment window spanned 22–26 years (Figure 5a) corresponding to the clinically reported age of onset defined by the first episode of psychotic symptoms and window of maximum vulnerability (Kessler et al., 2007; Häfner et al., 1993). Since males are reported to have an earlier disease onset than females (Kessler et al., 2007), we tested males and females separately and found the ALiGeTscz enrichment window was significantly earlier in males than females (males 20–26 years, females 26–28 years; Wilcoxon p=4.7×10−14, Figure 5b).
To validate these results, we performed a series of technical control studies and biological replications. First, we used the DeGeT scoring system, which showed that the enrichment in schizophrenia peaked at 24–26 years (Figure 5c). Secondly, we performed the DeGeT enrichment test in the two independent human frontal cortex datasets (Somel et al., 2009; Kang et al., 2011) and confirmed that schizophrenia was significantly enriched (confirming the Braincloud result) (Figure 5—figure supplement 1). Thirdly, to confirm the results were not specific to a particular spline regression method, we demonstrated that the enrichment window for schizophrenia was found using two alternate approaches to model fitting (Figure 5—figure supplement 2). While validating the occurrence of the disease enrichments, this analysis also revealed that the exact age at which the turning points/windows of enrichments occur depends on the regression model used (Figure 5—figure supplement 2). Fourth, we performed a down-sampling based sensitivity analysis to determine how the size of the gene set influences the enrichment: this indicated that the DeGeT enrichments are stronger for schizophrenia than for the hPSD (using subsets of 650 genes, 95% of schizophrenia subsets were significant at 24–25 years but only 50% of hPSD subsets, Figure 5—figure supplement 3). Fifth, we performed a variation of the bootstrapping analysis which accounts for transcript length and GC content (both of which are known to affect the rate of de novo mutations) and found no effect on the significance of the results (Figure 5—figure supplement 4). Sixth, we confirmed that varying the parameter in the ALiGeT scoring function, which controls the rate of decay with temporal distance, did not adversely affect the results (Figure 5—figure supplement 5).
Finally, we dropped all fetal samples from the analysis, recalculated the TTTPs and confirmed that the major peak of TTTPs occurs in the early twenties, that it is delayed in females, that the later DeGeT windows are enriched for PSD genes, and that schizophrenia shows discrete and significant enrichments using DeGeT (Figure 5—figure supplement 6).
To further validate our findings, we next examined different genetic datasets that have been used to identify schizophrenia susceptibility genes. The ALiGeTscz analysis results described above used a combined schizophrenia gene set from three orthogonal genome-wide methods: (1) an integrative analysis of Genome Wide Association Studies (GWAS), expression analysis, copy number variants (CNV) and mouse models (Ayalew et al., 2012); (2) combined results of three exome-sequencing studies (Fromer et al., 2014; Xu et al., 2012; Girard et al., 2011), and (3) the most recent GWAS results from the Schizophrenia Working Group of the Psychiatric Genomics Consortium (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) (Supplementary file 1f). We therefore separately tested sets of susceptibility genes discovered with all three methods: all showed significantly increased PeGeT scores (Integrative Analysis, <0.0009, de novo, =0.0057, GWAS, =0.027, Figure 5d–g). Interestingly, the stronger enrichment seen with de novo mutations may reflect that GWAS detects common variants that are assumed to have lower penetrance (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). We also validated the results using an additional set of de novo mutations (Gulsuner et al., 2013): this also confirmed the adult enrichments, either when analysed independently or when pooled with the combined sets described above (Figure 5—figure supplement 7).
Much of the heritability for schizophrenia is associated with SNPs that have not reached genome-wide significance with current sample sizes (Loh et al., 2015) (and were not included in our analysis thus far) and the sample sizes for de novo studies are too small to determine whether any genes found are significantly associated with disease status. We therefore adapted our methods to include a greater fraction of the SNPs associated with schizophrenia heritability by using association statistics from all SNPs regardless of whether they are genome wide significant (as defined in GWAS summary statistics files) and explicitly modelling linkage disequilibrium (based on results of the 1000 genomes project), such that disease association scores can be ascertained for each gene. The ALiGeT and DeGeT approaches were extended to directly utilise the gene association scores generated by MAGMA (Multi-marker Analysis of GenoMic Annotation) (de Leeuw et al., 2015) based on schizophrenia GWAS summary statistics. Using this approach, schizophrenia showed a DeGeT enrichment at 26 years in the Braincloud dataset (=0.03135, Figure 5h) and at 15–16 years in the Somel dataset (=0.0115, Figure 5—figure supplement 8a) and corresponding enrichment windows were found using ALiGeT (Figure 5i, Figure 5—figure supplement 8b).
Finally, we performed two additional sets of analyses where we examined mouse proteome datasets and single-cell transcriptome data. Consistent with the transcriptome results, the mouse synapse proteomic dataset showed that schizophrenia-associated proteins were enriched in the synapse proteins that changed prior to the 5 month peak (p=0.018, expected = 6, actual = 11), and 91% of those that change were found to be down-regulated. Next, we examined the role of specific cell types in schizophrenia by restricting the ALiGeT analysis to the subsets of genes defined by single cell transcriptome data (see Materials and methods) (Zeisel et al., 2015). Schizophrenia associated genes showed a significant (<0.05) window of enrichment in neuronal genes (Figure 6a, Figure 6—figure supplement 1). We were however unable to confirm this result using the summary statistics based method (Figure 6b). We then examined the types of trajectories associated with schizophrenia and found genes were predominantly down-regulated prior to the TTTP (Figure 6c). This result was confirmed using the summary statistics based method (Figure 6d).
Together these analyses show robust replication across multiple datasets, including different types of genetic variants, transcriptomes and proteomes, and several complementary analytical approaches. These data strongly suggest that the window of onset and sex difference in schizophrenia is timed by the regulation of susceptibility genes in young adults.
In the mouse synapse proteome dataset, we noticed several proteins that cause adult onset monogenic disorders. We used the Human Phenotype Ontology age-of-onset annotations to identify these neurological disorders (excluding neoplasms, and peripheral/autonomic disorders) and from a total of 39 genes, six were detected in our samples, and we confirmed these were significantly enriched amongst the synapse proteins whose expression levels changed during maturation (p=0.00008, expected = 1, actual = 4, Figure 6—figure supplement 2, Supplementary file 1e). The affected proteins/disorders included Atp2a2 (Darier’s disease), Eef2 and Itpr1 (Spinocerebellar ataxia), Atp1a3 (Dystonia Parkinsonism). No equivalent enrichment was found for congenital/neonatal onset disorders (p=0.45, expected = 4, actual = 4). Further inspection (because HPO annotations were incomplete) found three additional adult-onset haploinsufficiency disorders whose genes encode proteins showing a 50–80% reduction in the maturing mouse synapse proteome (Figure 6—figure supplement 2): Vcp (Frontotemporal Dementia) (fold change, FC = 0.48; p=0.002); Atl1 (hereditary spastic paraplegia) (FC = 0.19, p=0.00004); Dmxl2 (Polyendocrine-polyneuropathy syndrome) (FC = 0.28, p=0.00004). These age-dependent reductions in levels of haploinsufficient disease genes are consistent with the model that their respective lifespan trajectories are relevant to their age of onset.
The possibility exists that the lifespan trajectories are also relevant to the age of onset of other polygenic diseases with onset at different ages. To address this we compiled gene lists for six other major brain disorders with different age-windows of onset: onset during infancy (autism and intellectual disabilities); early adulthood (multiple sclerosis); and late adulthood (Amyotrophic Lateral Sclerosis, Parkinson’s and Alzheimer’s) (corresponding gene sets shown in Supplementary file 1f). We applied the same bootstrapping approach used earlier for the PSD genes and none of the disorders showed significant results (Figure 6—figure supplement 3). Because these gene lists are not comparable (different size, population sample size, obtained using different technical approaches etc.) the relative importance of the genetic calendar to schizophrenia cannot be directly compared with these disorders (see Discussion). In addition, because the transcriptome data is from prefrontal cortex and the primary pathology of several of these diseases is in other parts of the nervous system it cannot be assumed that the transcriptome trajectories in one part of the brain are the same as in others. Hence, the lack of any detectable age window does not preclude a role for gene regulation in the onset of these diseases.
Understanding gene expression in the human brain during the phases of childhood, adolescence, young adulthood, middle and old age, is a fundamentally important area of biology with medical significance. We focussed on identifying age-dependent gene regulatory events that were detected when the trajectory in the level of gene expression changed. Studying the transcriptome trajectories across the lifespan of the human neocortex and mouse hippocampus showed that TTTPs occurred at all ages. Moreover, because these events were a defining characteristic of every age, we found that actual age could be predicted by examination of an RNA sample from mouse and human brain tissue. These findings indicate there is a ‘genetic lifespan calendar’ that sets the date for gene expression changes in both species. This conclusion complements previous epigenomic studies that show DNA methylation correlates with chronological age (Horvath, 2013). The most striking and unexpected feature was the peak of TTTPs in young adult humans at 26 years of age and 5 months of age in mice. In both species, this peak was delayed in females and involved similar sets of genes. Moreover, in both species there was a similar sequential pattern of cell-type specific changes across the lifespan. Thus, we conclude that mammals with greatly differing lifespans share a conserved genomic program regulating the sequence of cellular and synaptic changes throughout the lifespan.
We discovered that the young adult brain undergoes a dramatic reorganisation of gene expression, as revealed by the TTTP-peak around 26 years, and this reorganisation is largely completed by 40 years of age when ninety percent of trajectories have plateaued. These findings correspond well with anatomical data showing that development and myelination in the frontal cortex are complete by this age (Lebel et al., 2012). The TTTP-peak was enriched in neuronal and synaptic genes expressed in pyramidal neurons and interneurons, including those encoding PSD proteins and the 1.5 MDa PSD95 supercomplexes (Figure 7). This indicates that young adults undergo a major reorganisation of their synapse proteomes, which was supported by proteomics results in mice. The level of expression of many PSD and PSD95 supercomplex proteins is known to be important for many innate and learned behaviours and we also found that the TTTP-peak was enriched in behaviourally important genes. Thus the genetic calendar can modify synapse proteome composition and potentially shape the behavioural repertoire across the lifespan.
The TTTP windows in non-neuronal cells also corresponded well with known changes in these cells. The early age-window in endothelial cells corresponds to the expansion and maturation of the brain’s vasculature (Caley and Maxwell, 1970; Engelhardt and Liebner, 2014; Azmitia et al., 2016) and the post-natal shift in the transcriptome of endothelial cells (Daneman et al., 2010). There was also an enrichment in oligodendrocyte genes early in life, potentially corresponding to downregulation of genes specific to oligodendrocyte precursor cells (He et al., 2009). The next phase, which continues through the early teenage years, involves strong upregulation of microglial genes, coincides with synaptic pruning (Schafer et al., 2012) and precedes the young adult window of synaptic and neuronal reorganisation.
To date, there has not been a satisfactory mechanism or model that accounts for the following five central features of schizophrenia. First, the genetic susceptibility: the mechanism needs to account for the diverse sets of genes and the diverse types of mutations. Second, the age of onset: the mechanism needs to account for the age-window during which first-episode psychosis occurs, the heritability of this onset, as well as the earlier presence of prodromal cognitive symptoms (Koutsouleris et al., 2012). Third, the sex difference: females have a later onset than males. Fourth, the cell biological mechanisms: there needs to be a common subcellular mechanism that incorporates the diverse classes of disease-relevant proteins, which include channels, receptors, synaptic adhesion proteins, scaffold proteins and signalling molecules. Fifth, the cognitive deficits: the molecular and cellular mechanisms need to play a key role in the relevant cognitive processes. A model that satisfies these criteria would also be expected to have explanatory power for other characteristics of schizophrenia.
Our findings meet all five criteria and we propose the following model. A genetic program orchestrating transcriptome trajectories causes reorganisation of expression of synapse proteins in young adults. Mutations in these genes are functionally exposed in young adults because the reorganisation of expression produces inappropriate synapse signalling properties that result in abnormal behaviour. We refer to this as the genetic calendar model of schizophrenia. This model posits that schizophrenia is a genetic disorder targeting the mechanisms of brain aging during the young adulthood period of the lifespan. Our model offers a mechanistic explanation for the onset of first-episode psychosis and is consistent with prodromal cognitive impairments and the persistence of schizophrenia beyond the young adult years (when many genes reach their plateau). Thus, the genetic calendar model can explain the onset and progression of schizophrenia.
In addition to the robust association of schizophrenia genes with the TTTP-peak, which was replicated across datasets including multiple types of genetic variants, transcriptomes and proteomes, and several complementary analytical approaches, our study identified specific molecules that further strengthen the mechanistic link between synaptic mechanisms and schizophrenia. PSD95 supercomplex proteins are enriched in schizophrenia genes (Fromer et al., 2014; Fernández et al., 2009a; Singh et al., 2017; Kirov et al., 2012; Purcell et al., 2014; Grant et al., 2005). Amongst the schizophrenia-susceptibility genes with the most prominent TTTPs were those with established synaptic functions, including Rgs4, Snap25, Kalrn, Htr2a and Nrg1. The expression level of each of these genes has previously been shown to either influence, or be influenced by, psychiatric symptoms (Etain et al., 2010; Guillozet-Bongaarts et al., 2014; Yin et al., 2013; Hill et al., 2006). Furthermore, altered expression of Snap25, Htr2a and Nrg1 is noted to associate with earlier age of onset (Etain et al., 2010; Weickert et al., 2012; Abdolmaleky et al., 2011).
We found evidence that several mendelian neurological disorders with adolescent and young adult onset also involved proteins that were down-regulated in the TTTP-peak. Heterozygous mutations in Atp2a2 cause Darier’s disease and significantly increase the risk of many psychiatric disorders including mood disorders, depression, and schizophrenia (Gordon-Smith et al., 2010). Mutations in Eef2 and Itpr1 cause spinocerebellar ataxia type 26 and 15/29, respectively. Atp1a3 mutations cause rapid-onset dystonia Parkinsonism, which leads to Parkinson’s-like symptoms appearing during early adulthood, often with concurrent emergence of psychiatric symptoms (Brashear et al., 2012). Vcp mutations cause a form of frontotemporal dementia with a mean age of onset in the mid-thirties (Watts et al., 2004). Heterozygous mutations in Atl1, one of the main causes of hereditary spastic paraplegia (HSP), have an age of onset of ~21 years (McCorquodale et al., 2011). Interestingly, we found that the synaptic scaffold protein Dmxl2 (haploinsufficiency causes Polyendocrine-polyneuropathy syndrome [Tata et al., 2014]) was reduced by ~70% (fold change: 0.28, p=0.00004) and studies in mice show that heterozygous deletion of Dmxl2 in central neurons delayed the onset of puberty (Tata et al., 2014). These findings support the view that the genetic lifespan calendar reduces expression below a critical threshold in young adults and is important for multiple neurological and psychiatric diseases.
Our studies relied on human prefrontal cortex transcriptome data, which may have limited our ability to detect a role for the genetic lifespan calendar in the age-dependent onset of those diseases that are known to have primary pathology in other brain regions (e.g. Parkinson’s disease). Given the evolutionary conservation between mouse hippocampus and human prefrontal cortex, we expect other human brain regions to show a calendar of transcriptome trajectories, but with different patterns and therefore different age windows of disease gene enrichments. Moreover, while our whole tissue transcriptome analysis appeared to be sensitive to neuronal changes, we expect that future single-cell transcriptome data will provide a more detailed insight into rarer cell types and potentially reveal mechanisms relevant to the age of onset of pathology in these cells. We do not expect that the age of onset of all brain diseases will be accounted for by the genetic calendar, as it will likely depend on the importance of cell autonomous processes and exogenous factors (e.g. inflammatory processes involving microglia). It is also likely that high penetrance severe mutations will show an earlier onset and are less likely to show a dependence on the TTTPs. Weaker alleles may manifest with undetectable or subtle phenotypes at early ages and be exposed at later ages by the changes in gene expression. Thus, TTTPs would not necessarily account for the age of onset of intellectual disability or autism even though some of the same genes are involved with schizophrenia.
The genetic lifespan calendar is an innate mechanism that regulates the levels of postsynaptic proteins and hence changes the physiological and behavioural properties of brain circuits. This indicates that the brain is not ‘hard wired’ but is continuously being modified by an evolutionarily ancient and conserved genetic program. The postsynaptic proteins control innate and learned behaviours and thus the genetic calendar will modulate innate behaviours and the capacity to learn across the lifespan. The overarching biological function of this program could be to equip the animal for the challenges it faces at different ages.
If there are mutations that interrupt this calendar of events, then the organism will not respond appropriately to environmental challenges. This may be relevant to schizophrenia where exogenous factors are thought to influence onset (e.g. cannabis) or behaviour (e.g. smoking) during young adulthood. Our model, which posits a fundamental role for genetic and genomic mechanisms, can therefore accommodate previous models that have considered exogenous factors in disease aetiology. Interestingly, environmental triggers of psychosis, such as cannabis and other drugs of abuse, are known to act on the synaptic signalling mechanisms (Camp et al., 2011; Abbas et al., 2009) that are being reorganised in the TTTP-peak. Our model also has implications for those schizophrenia models that posit a (non-genetic) fetal insult as the predisposing factor for later onset (Brown and Derkits, 2010): the enrichment of schizophrenia susceptibility genes in young adults was present even when fetal samples were not included in the analysis, supporting the notion that it is a disorder of postnatal brain ageing.
Our genetic lifespan calendar model may also have implications for the development of pharmaceuticals for the adolescent and young adult onset psychiatric and neurological disorders. As an alternative to the ‘precision medicine’ approach which directs pharmacological treatments to each susceptible genotype, we suggest that therapeutics accommodating many genotypes might lie in drugs that modify the genetic lifespan calendar. The identification of the conserved TTTP-peak between mice and humans may also assist in refining animal models of adult-onset brain disorders. Our findings open a range of new approaches for understanding brain ageing and the mental disorders afflicting young adults.
Hippocampi from 186 mice of both sexes and two background strains (C57Bl/6 and 129s5) aged 58–600 days were used for the mouse microarray dataset (Figure 2—figure supplement 2). All animal experiments conformed to the British Home Office Regulations (Animal Scientific Procedures Act 1986; Project License PPL80/2,337), local ethical approval, and NIH guidelines. Animals were born and raised within the Research Support Facility at the Wellcome Trust Sanger Institute, and exposed to the same light/dark cycles and feed supply. Mice were exported to a holding facility three days prior to collection of brain samples, to allow time for stress response genes induced through movement to subside. All animals were sacrificed by cervical dislocation, and the hippocampi were dissected on ice, and snap frozen in liquid nitrogen.
Mouse brain samples were homogenised using the Kontes Cordless Pellet Pestle system. Total RNA was extracted using the Qiagen miRNeasy kit, snap frozen using liquid nitrogen and stored at −80°C. Microarray processing was performed by the Wellcome Trust Sanger Institute’s Microarray facility. The Illumina TotalPrep-96 RNA Amplification kit was used for reverse transcription, amplification, and biotinylation of the RNA prior to hybridisation. The microarrays used were the Illumina MouseWG-6 v2 series. Hybridization, washing, and staining were performed according to the standard Illumina protocol. The microarrays were imaged using the Illumina BeadArray Reader. Images from the scanner were processed using the BeadStudio software. The microarray data is available to download from ArrayExpress (accession number E-MTAB-3256).
The human samples were generated by the Braincloud project (braincloud.jhmi.edu) as described in their primary publication (Colantuoni et al., 2011). In brief they used post-mortem human brains from the NIMH Brain Tissue Collection, National Institute of Child Health and Human Development Brain and Tissue Bank for Developmental Disorders. RNA samples were described as being extracted from ~100 mg of tissue from BA46/9, the dorsolateral prefrontal cortex. Custom-spotted two-colour oligonucleotide microarrays constructed from a set of oligos referred to as HEEBO7 (Human Exonic Evidence Based Oligonucleotide) were used. The preprocessed/normalized Braincloud expression data was downloaded from GEO (GSE30272, data contained within the Series Matrix file). No further normalisation was done beyond that which was already performed as described in the Braincloud publication. Low intensity probes and outliers had already been removed and expression levels were adjusted for sex, ancestry and a single surrogate variable using the sva package (Leek et al., 2012).
The ‘Somel’ dataset used to validate the increased PeGeT/DeGeT/ALiGeT scores of schizophrenia genes, and EWCE enrichments, was downloaded from the GEO repository (accession GSE11512) (Somel et al., 2009). The dataset comprised samples taken from dorsolateral prefrontal cortex, which had been hybridised to Affymetrix Human Genome U133+ arrays. The dataset was read into R using the Bioconductor ‘affy’ package, and processed using RMA. Probes with detection probabilities of less than 0.05 in over ten of the arrays were dropped.
The BrainSpan RNA-Seq gene level dataset (Kang et al., 2011) was downloaded from the BrainSpan website (brainspan.org) on the 21st Jan 2016. The data was read into R. Data from the following four regions were retained: dorsolateral prefrontal cortex, ventrolateral prefrontal cortex, orbital frontal cortex and anterior (rostral) cingulate (medial prefrontal) cortex. Ages were converted to numerical equivalents. Any transcripts whose summed expression over all retained samples was less than 0.1 were dropped. Because the oldest samples in the BrainSpan dataset are only 40 years of age, we treated trajectories which did not turn as turning in the final year of life.
The mouse microarray data was read into R using the Bioconductor package Lumi (Du et al., 2008) (RRID:SCR_012781) and re-annotated based on a published analysis (Barbosa-Morais et al., 2010). The detectionCall function from the lumi package was used to drop probes undetected by the microarrays. Probes rated as having a ‘bad’ quality (according to the re-annotation package) were dropped, as were those with the coding zone given as ‘Transcriptomic?’. A variance stabilizing transformation was applied followed by quantile normalization.
Natural cubic splines with three degrees of freedom were fitted to the expression data using the R package splines. For the mouse expression data, age was modeled as the independent variable, with sex and background as covariates. For the human expression data, as variation attributed to the extraneous variables (sex and ancestry) had already been subtracted earlier using the sva package these terms were not included. The location of knots for the splines was determined by the quantiles of the data. For each species and gene, trajectories were then interpolated across the lifespan with extraneous variables (sex and ancestry/background) held constant. We refer to these interpolated spline trajectories as Brain Transcriptome Lifespan Trajectories (BTLTs).
Prior to detecting TTTPs, spline models were fitted to the data. To calculate where a TTTP occurred in a gene’s expression trajectory ($E(t)$), the derivative $dE/dt$ was approximated for each BTLT and the age of the TTTP was taken as the first point where $dE/dt = 0$ and the derivative changed sign, where $E$ is expression level in the spline model and $t$ is age. If multiple turning points were found in a single spline, then only the first was used. Figure 2a–c show the number of TTTPs found across the transcriptome within a given year/month. To determine the TTTPs for each sex, all samples from the other sex were dropped, the splines remodeled (without including sex as a covariate) and TTTPs detected afresh.
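The turning-point rule above can be illustrated with a minimal sketch. It assumes the interpolated trajectory is available as two parallel lists (ages and fitted expression values) and approximates the derivative by finite differences; the function name `first_turning_point` and the toy trajectory are hypothetical and are not taken from the original analysis code.

```python
def first_turning_point(ages, values):
    """Return the age at which the finite-difference slope first changes sign.

    Returns None when the trajectory is monotonic (no turning point).
    """
    slopes = [
        (values[i + 1] - values[i]) / (ages[i + 1] - ages[i])
        for i in range(len(ages) - 1)
    ]
    previous = 0.0
    for i, slope in enumerate(slopes):
        if slope == 0.0:
            continue  # flat step: wait for the next non-zero slope
        if previous != 0.0 and (slope > 0) != (previous > 0):
            # slope i covers ages[i]..ages[i + 1], so the sign flips at ages[i]
            return ages[i]
        previous = slope
    return None


# Toy trajectory that rises until age 4 and falls afterwards.
ages = list(range(11))
trajectory = [-(a - 4) ** 2 for a in ages]
tttp_age = first_turning_point(ages, trajectory)
```

Only the first sign change is reported, mirroring the rule that later turning points in the same spline are ignored.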
The numbers presented for the age of the TTTP peak in paragraph six are the mean age of TTTPs across all transcripts. To test for a difference in the age of TTTPs between the sexes, the Wilcoxon signed rank test was used. To test whether X-chromosome genes caused the sex difference, TTTPs in genes from that chromosome were dropped and the same statistical test performed on the remaining set; a similarly significant probability for the difference was found.
To analyse the properties of trajectories with TTTPs (for generation of Figure 2g,h), linear models were fitted to the expression data before and after each TTTP, and tested for whether there is a significant change. A small fraction of probes which were detected as having TTTPs were not found to show significant differential expression prior to the TTTP, after multiple hypothesis correction with the Benjamini-Hochberg correction (FDR = 0.05), and these were excluded from this analysis of TTTP characteristics. We classified genes as showing post-turn plateaus if they were not detected as differentially expressed after the TTTP with FDR = 0.05. Those that were detected as showing significant differential expression after the TTTP, in the opposite direction to before the TTTP, are denoted as showing post-turn reversals. To generate Figure 2f we calculated the deciles of the ages at which TTTPs occur, and used these to split the data into ten groups with as closely matching sizes as possible.
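The Benjamini-Hochberg step used for the pre-turn and post-turn tests can be sketched as follows. This is a generic implementation of the standard step-up procedure, not the authors' code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard step-up rule: find the largest rank k with p_(k) <= fdr * k / m
    and reject every hypothesis whose rank is at most k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for the post-turn slope of four probes.
post_turn_p = [0.001, 0.04, 0.03, 0.5]
changed_after_turn = benjamini_hochberg(post_turn_p, fdr=0.05)
# Probes not flagged here would be classed as post-turn plateaus.
```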
The Age-linked Gene Turning scores (ALiGeTs) were designed to represent two features: (1) the proximity of the TTTP to the target year, and (2) the change in expression level from the start of the dataset (at the 14th gestational week) through to the age at which the TTTP occurred (this change is denoted as $\Delta E_g$). A human gene’s ALiGeT score for a given age ($a$) can be calculated as:
$$\mathrm{ALiGeT}_{g}(a) = \Delta E_g \times 1.5^{-|T_g - a|}$$
where:
$g$ is the index of the gene of interest
$a$ is the year of age being queried
$\Delta E_g$ is the change in expression level prior to the TTTP
$T_g$ is the age of the first TTTP in gene $g$
For mice the equation is modified slightly to account for the difference in lifespan of the two species. To allow the ALiGeT window to have an approximately equal width across a given proportion of the mouse’s lifespan, we rescaled the distance term $|T_g - a|$ (in days) by 650/78, where 650 days represents old age for a mouse and 78 years represents old age for a human. A mouse gene’s ALiGeT score for a given age can thus be calculated as:
$$\mathrm{ALiGeT}_{g}(a) = \Delta E_g \times 1.5^{-|T_g - a| / (650/78)}$$
To penalize genes which are further from the peak of TTTPs, the distance between the age of the turning point ($T_g$) and the target age ($a$), $D = |T_g - a|$, is used as the negative exponent in an exponential expression with base 1.5. This function gives greatest emphasis to genes with D = 0, with a rapid fall over neighbouring years such that a human gene with D = 6 years has a score 9% the size of one with an equivalent pre-turn change in expression level at D = 0 (since $1.5^{-6} \approx 0.09$). The scoring function is graphically depicted in Figure 1—figure supplement 1. We show in Figure 5—figure supplement 5 that the results are stable to variation in the exponent base.
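As a worked check of the scoring function, the sketch below assumes the score is the pre-turn expression change multiplied by 1.5 raised to the negative distance D, which is the form implied by the 9% figure quoted above. The function name, the `age_scale` parameter and the reading of the mouse rescaling as a division of the distance (in days) by 650/78 are assumptions made for illustration.

```python
def aliget_score(pre_turn_change, turn_age, query_age, base=1.5, age_scale=1.0):
    """Score = pre-turn expression change * base ** (-distance).

    distance is |turn_age - query_age| divided by age_scale, so that
    age_scale=1.0 corresponds to human ages in years.
    """
    distance = abs(turn_age - query_age) / age_scale
    return pre_turn_change * base ** (-distance)


# Human gene turning exactly at the queried age keeps its full change.
score_at_peak = aliget_score(2.0, turn_age=26, query_age=26)

# Human gene turning 6 years away from the queried age: about 9% of the change.
score_six_years_off = aliget_score(1.0, turn_age=32, query_age=26)

# Mouse gene (ages in days), assuming the distance is divided by 650/78.
MOUSE_DAYS_PER_HUMAN_YEAR = 650 / 78
mouse_score = aliget_score(1.0, turn_age=200, query_age=150,
                           age_scale=MOUSE_DAYS_PER_HUMAN_YEAR)
```

Under these assumptions a mouse gene turning 50 days from the queried age is treated like a human gene turning 6 years away, because 50 / (650/78) = 6.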
The term PeGeT score is used to refer to the ALiGeT score for the year in which that species has the most TTTPs.
To score genes using the DeGeT method (represented in Figure 1b and used to generate Figure 4d and Figure 5c) a similar scoring system was used. First, ages were divided into ten groups based on the number of genes which turn/inflect within that age period, such that each age set contains an approximately equal number of turning genes. Because there are many more TTTPs during early adulthood than in infancy/old age, the age windows span many years at the start/end of the lifespan and as few as one year during the twenties. For age set S, which covers ages x to y, the score was equal to the pre-turn expression change if the gene turns within the age window bounded by x and y. If the gene does not turn within the age window bounded by x and y then the score for that gene is zero.
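A minimal sketch of the window construction and scoring described above. It assumes the turning ages are held in a plain list and that windows are cut at equal-count positions of the sorted ages; the function names and the toy data are hypothetical, and tied ages at a boundary are not handled specially.

```python
def decile_windows(turn_ages, n_groups=10):
    """Split sorted turning ages into n_groups windows of near-equal gene count.

    Returns a list of (lower_age, upper_age) tuples, one per window.
    Requires at least n_groups turning ages.
    """
    ordered = sorted(turn_ages)
    n = len(ordered)
    windows = []
    for k in range(n_groups):
        lower = ordered[k * n // n_groups]
        upper = ordered[(k + 1) * n // n_groups - 1]
        windows.append((lower, upper))
    return windows


def deget_score(pre_turn_change, turn_age, window):
    """Pre-turn change if the gene turns inside the window, otherwise zero."""
    lower, upper = window
    return pre_turn_change if lower <= turn_age <= upper else 0.0


# Toy example: twenty genes turning at ages 1..20 give two genes per window.
toy_windows = decile_windows(list(range(1, 21)))
inside = deget_score(0.8, turn_age=2, window=toy_windows[0])
outside = deget_score(0.8, turn_age=3, window=toy_windows[0])
```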
Cell type enrichment analysis was performed on three datasets: Somel (Somel et al., 2009), Braincloud (Colantuoni et al., 2011) and the mouse hippocampus. The EWCE ('Expression Weighted Celltype Enrichment') package from Bioconductor (RRID:SCR_006442) was used to perform the enrichment with 100,000 bootstrap replicates used for each test. Age groups were assigned based on the frequency of TTTPs across the lifespan (0–10th, 11th–20th, …, 90th–100th percentile of all TTTPs). The genes which turn within each age window were sorted based on the size of pre-turn expression change. For each age window, the 10% of genes which turn within that window with the largest positive (upward) and negative (downward) expression changes were assigned into two groups: we refer to these lists as the 'target' lists. A previously published single cell transcriptome (SCT) dataset containing cells from cortex and hippocampus was then loaded (Zeisel et al., 2015). Any genes in the target lists which were not found to be expressed in the SCT dataset were dropped. For the analysis with Braincloud and Somel, the background gene set was taken as all genes with mouse orthologs that are also found in the SCT dataset for which a spline was fitted. For the mouse hippocampus analysis, the background set contained all genes for which a spline was fitted and which were also detected in the SCT dataset. For the directional analyses, the background set thus contains genes which change in both directions. Bonferroni correction was used to adjust for multiple testing. For both directions, for each cell type within each age group (indexed by $r$, $c$ and $w$ respectively), we calculated the mean ($\mu_{r,c,w}$) and standard deviation ($\sigma_{r,c,w}$) of the bootstrap distribution, and used these to determine the distance (in terms of standard deviations) that the target list falls from the expected mean; we refer to this value as $z_{r,c,w}$. Values were calculated separately for each dataset.
The plots shown in Figure 3 show normalised values derived from $z_{r,c,w}$ by dividing by the maximal absolute value over all age windows. As a result the maximal absolute enrichment is either 1 or −1. The values plotted on the x-axis each represent one of the age windows defined by the quantiles (specifically, the values shown are the mean of the upper and lower bounds of the window). The age windows for Somel and Braincloud do not fully overlap but were annotated with the central points of the Braincloud age windows to enable both datasets to be plotted against each other.
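The standard-deviation distance and its normalisation for plotting can be sketched as below. This is an illustration only: it does not call the EWCE package, the function names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics


def enrichment_z(target_value, bootstrap_values):
    """Distance of the target list from the bootstrap mean, in standard deviations."""
    mu = statistics.mean(bootstrap_values)
    sigma = statistics.stdev(bootstrap_values)  # sample SD (assumption)
    return (target_value - mu) / sigma


def normalise_by_max_abs(z_by_window):
    """Divide each window's value by the largest absolute value across windows."""
    largest = max(abs(z) for z in z_by_window)
    return [z / largest for z in z_by_window]


# Toy numbers: one cell type, one direction, three age windows.
z_example = enrichment_z(5.0, [1.0, 2.0, 3.0])
normalised = normalise_by_max_abs([1.0, -4.0, 2.0])
```

After normalisation the most extreme window sits at exactly 1 or −1, which is why the plotted enrichments are bounded in that range.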
To determine whether gene sets show larger ALiGeT scores than expected, a bootstrapping test was performed. Genes in the target list which were not detected as expressed on the arrays were dropped from the analysis. Where multiple probes target the same gene, duplicated probes with smaller PeGeT scores were dropped. For each target list being tested, ten thousand random gene lists of the same length were generated. The mean ALiGeT/PeGeT score for the target list and the random gene lists was calculated, and the proportion of random lists with larger mean scores than the target list was taken as the probability. Where multiple lists were tested for significance against PeGeT scores, these were corrected using the Benjamini-Hochberg method (FDR = 0.05). The same method was used for the DeGeT set based analysis as was used for the ALiGeT analysis, with bootstrapping being done with 10000 samples and correction for multiple testing over age sets done with the Bonferroni method.
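A minimal sketch of this kind of bootstrapping test, assuming per-gene scores are stored in a dictionary and that the probability is the fraction of random lists whose mean score is at least as large as the target mean. The function name, the seed and the toy scores are hypothetical.

```python
import random


def bootstrap_p_value(scores, target_genes, n_random=10000, seed=0):
    """Fraction of random gene lists whose mean score matches or exceeds the target's.

    scores: dict mapping gene name -> score for every expressed gene.
    Target genes absent from `scores` (not expressed) are dropped first.
    """
    rng = random.Random(seed)
    present = [g for g in target_genes if g in scores]
    target_mean = sum(scores[g] for g in present) / len(present)
    background = list(scores)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(background, len(present))
        if sum(scores[g] for g in sample) / len(present) >= target_mean:
            hits += 1
    return hits / n_random


# Toy background of 100 genes with scores 0..99.
toy_scores = {f"g{i}": float(i) for i in range(100)}
p_high = bootstrap_p_value(toy_scores, ["g95", "g96", "g97", "g98", "g99", "absent"],
                           n_random=2000)
p_low = bootstrap_p_value(toy_scores, ["g0", "g1", "g2", "g3", "g4"], n_random=2000)
```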
Figure 4a,b, Figure 5d–g, and Figure 6—figure supplement 3 represent the results of the PeGeT bootstrapping analysis in graph form. We sought to represent the extent to which enrichment results from high scores amongst either a small number of genes, or from a broad increase in scores throughout the list. As in the bootstrapping analysis we compare the score distribution in the target (disease/synapse) list to the scores in random lists of the same length. To keep the plot tidy we only represent data from 100, rather than 10000, random lists. PeGeT scores for the target and random lists were sorted by numerical size. For each of the n genes in the target list, 100 dots were positioned with the y-axis value determined by the ith largest score in the target list, and the x-axis value given by the ith largest score in each of the 100 random lists. To interpret the graph, note that if the majority of the random list scores fall above the red line, then the random lists have lower scores than the target list. It should be noted that the scale of the graph biases the view towards the genes with the largest scores.
To perform a bootstrapping analysis which controls for gene size and GC content, we obtained those values from Biomart. Where multiple transcript lengths were associated with a single HGNC gene we took the mean value. The deciles of gene size and GC content were calculated over the set of expressed genes. The two sets of decile values were used to define a grid, and each gene was assigned to a position within the grid based on its transcript length and GC content. To run a bootstrap analysis on a particular target list, 10000 random lists were constructed with equal length to the target list. Gene i in each random list was selected from the same grid square as gene i in the target list.
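The grid-matched sampling can be sketched as follows. The sketch assumes each expressed gene has a (length, GC content) pair in a dictionary, and it samples independently per position, so a random list may repeat a gene or include the target gene itself; these simplifications, and all names below, are assumptions rather than details from the original analysis.

```python
import random
from bisect import bisect_right


def decile_edges(values):
    """Return the nine cut points that split the values into ten groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[k * n // 10] for k in range(1, 10)]


def grid_cell(length, gc, length_edges, gc_edges):
    """Map a gene to its (length decile, GC decile) grid square."""
    return (bisect_right(length_edges, length), bisect_right(gc_edges, gc))


def matched_random_list(target, genes, rng):
    """Draw one random list whose i-th gene shares a grid square with target[i]."""
    length_edges = decile_edges([v[0] for v in genes.values()])
    gc_edges = decile_edges([v[1] for v in genes.values()])
    cells = {}
    for name, (length, gc) in genes.items():
        cells.setdefault(grid_cell(length, gc, length_edges, gc_edges), []).append(name)
    return [
        rng.choice(cells[grid_cell(*genes[g], length_edges, gc_edges)])
        for g in target
    ]


# Toy data: 100 genes with increasing length and GC content.
toy_genes = {f"g{i}": (1000 + 10 * i, 0.3 + 0.004 * i) for i in range(100)}
toy_target = ["g5", "g55"]
picked = matched_random_list(toy_target, toy_genes, random.Random(1))
```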
The disease gene lists are provided in Supplementary file 1f. The combined schizophrenia gene list (results shown in Figure 5a–d, Figure 6a,b, Figure 5—figure supplement 1–5,7 and Figure 6—figure supplement 1) used genetic associations from three types of studies, which were also analysed individually: the integrative dataset contains 42 genes assembled using a translational convergent functional genomics approach (Ayalew et al., 2012); the de novo gene set comprised 609 genes pooled from three studies which used parent-child trios to detect de novo mutations (Fromer et al., 2014; Xu et al., 2012; Girard et al., 2011); the GWAS results are from the 2014 release of the Schizophrenia Working Group of the Psychiatric Genomics Consortium (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). Many SNPs from the GWAS study results were associated with multiple genes (349 genes were associated with 108 loci, with numerous loci associated with over twenty genes), and these were dropped, leaving 62 genes to be used in our analyses (note that this does not apply to the analyses which directly utilised the GWAS summary statistics files). The GWAS result remained significant if the alternative approach was taken and all genes associated with SNPs were used. The additional schizophrenia de novo gene set (the ‘Gulsuner’ set) came from exome sequencing of ‘quads and trios’ (patients, their parents and unaffected siblings) associated with 105 individuals with the disorder (Gulsuner et al., 2013). Two autism lists were used, both based on finding de novo mutations through exome sequencing: the first contained 172 mutations (Sanders et al., 2012), the second 358 (Iossifov et al., 2012). The 78 intellectual disability genes were discovered through de novo sequencing of family groups (de Ligt et al., 2012).
The Multiple Sclerosis, Amyotrophic lateral sclerosis (Lill et al., 2011), Parkinson's (23andMe Genetic Epidemiology of Parkinson's Disease Consortium et al., 2012) and Alzheimer's (Bertram et al., 2007) lists come from the top results of the following websites: msgene.org (69 genes); alsgene.org (17 genes); pdgene.org (23 genes); alzgene.org (10 genes).
Gene list enrichments were evaluated across the lifespan by generating ALiGeT scores for each gene, for each queried age from 1 through 78 years (as defined above). To test for disease enrichments the bootstrapping method described above was applied. Enrichment of the gene sets was calculated at each age. The Bonferroni method was used to correct for the testing of the hypothesis at each of the 78 years of age. When running the ALiGeT analysis for the eight diseases, Bonferroni correction was applied across all the diseases as well as over each year of age. To obtain the sex-specific datasets, the enrichments were calculated separately on the male subset of the data, and then again on the female subset. To test for sex differences in TTTP ages specific to disease gene sets, the ages of TTTPs associated with those genes in the male and female data subsets were compared using the Wilcoxon signed-rank test.
To control for transcript length, the maximum transcript length was determined for each gene using Biomart. As the distribution of transcript lengths is sharply peaked, ten quantiles of transcript length were used to group the genes for display of TTTP score distributions. To control for neuron specificity, the data on single cell transcriptomes from the Linnarsson/Hjerling-Leffler labs (Zeisel et al., 2015) was utilized (data available at linnarssonlab.org/cortex/). The cell-type specificity matrix was used to produce a metric for neuron vs glia enrichment. This was done by first calculating neuronal expression as the sum of expression in all cells they label as ‘pyramidal’ or ‘interneurons’, whilst glial expression was the sum of expression in ‘astrocytes’, ‘endothelial’, ‘microglia’ or ‘oligodendrocytes’. The ratio of these values was taken, and classic markers checked to ensure they were as expected (Dlg4 = 15; Camk2a = 6; Map2 = 13; Gfap = 0.2; Mbp = 0.1; Aif1 = 0.2). The 5000 genes most enriched for neurons were then used to perform an ALiGeT analysis for the combined Schizophrenia list. To confirm that the threshold used for how neuron-specific the genes are does not influence this result, we also show below a figure with PeGeT results using different thresholds between 1000 and 8000 (Figure 6—figure supplement 1).
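The neuron versus glia metric can be illustrated with a short sketch. The cell-type labels follow the groupings named above; the function name and the expression values in the example are hypothetical and do not come from the single cell dataset.

```python
NEURONAL = ("pyramidal", "interneurons")
GLIAL = ("astrocytes", "endothelial", "microglia", "oligodendrocytes")


def neuron_glia_ratio(expression_by_cell_type):
    """Ratio of summed neuronal expression to summed glial expression for one gene."""
    neuronal = sum(expression_by_cell_type.get(c, 0.0) for c in NEURONAL)
    glial = sum(expression_by_cell_type.get(c, 0.0) for c in GLIAL)
    # A gene with no glial expression is treated as maximally neuron-enriched.
    return neuronal / glial if glial > 0 else float("inf")


# Hypothetical expression values for a neuron-enriched gene.
example_gene = {
    "pyramidal": 6.0,
    "interneurons": 4.0,
    "astrocytes": 1.0,
    "endothelial": 0.5,
    "microglia": 0.25,
    "oligodendrocytes": 0.25,
}
ratio = neuron_glia_ratio(example_gene)
```

Ranking genes by this ratio and keeping the top N (5000 in the main analysis) gives the neuron-enriched subset.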
To perform the analyses restricted to up/down-regulating genes depicted in Figure 6c,d, all probes which show a higher/lower expression level at the beginning of the spline, relative to the spline’s value at the TTTP, are dropped. Removal of duplicate probes targeting the same HGNC gene is performed after dropping the probes. Probes which do not have TTTPs are also dropped from both analyses.
For the comparisons of mouse and human genes with large PeGeT scores, orthologs were determined using the Biomart package, through querying which HGNC Genes are orthologs for a given MGI ID. To find GO terms enriched in this set of genes, the MGI symbols were analysed using DAVID (Huang et al., 2009) against a background of mouse genes, and the GO term for ‘synaptic transmission’ was found to be the most enriched (the Benjamini-Hochberg corrected p-value is stated in the text). For the analysis of phenotypes from the Mammalian Phenotype Ontology (Smith and Eppig, 2009) (MPO), a full database of phenotypes was downloaded from ftp.informatics.jax.org. Any genes which were not detected as present by the microarrays were dropped from the MPO database, and a hypergeometric test for significance of enrichment performed on the remainder. Enrichment for ‘behavioural’ and ‘nervous system’ phenotypes was specifically tested for, and hence probabilities presented are not adjusted for multiple hypothesis testing.
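For reference, a hypergeometric enrichment probability of the kind used for the phenotype analysis can be computed as below. This is a generic upper-tail calculation, not the authors' script; the function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement from `population`
    genes, of which `successes` carry the phenotype annotation."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Toy check: 10 genes, 3 annotated, 3 drawn, all 3 annotated.
p_all_three = hypergeom_upper_tail(3, population=10, successes=3, draws=3)
```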
Gene association statistics were calculated from Genome Wide Association Study (GWAS) Summary Statistics using v1.05 of MAGMA (Multi-marker Analysis of GenoMic Annotation) (de Leeuw et al., 2015). MAGMA enables disease/phenotypes association scores to be calculated for each gene, while accounting for linkage disequilibrium and the contributions from multiple SNPs. MAGMA takes GWAS summary statistics as input: these do not contain data on the individuals from the study and simply list the association p-values and z-scores/odds ratios for each SNP included in the study. The 1000 Genomes European panel was provided to MAGMA as reference data for calculating Linkage Disequilibrium. The schizophrenia summary statistics are associated with the GWAS analysis performed by the Psychiatric Genomics Consortium (PGC) which found 108 genome wide significant loci (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014); the file was downloaded from the PGC website (https://www.med.unc.edu/pgc/results-and-downloads).
DEGET enrichment was calculated by first grouping the genes into deciles based on the age of the turning point. DEGET groups for the MAGMA method differ from those calculated for the gene list approaches for two reasons: firstly, sorting is performed using Entrez gene IDs rather than HGNC gene symbols; secondly, all genes within the extended MHC region are removed from these analyses (as their MAGMA gene associations cannot be properly assigned due to Linkage Disequilibrium in this region). DEGET scores were assigned to genes in the same way as for the gene set based analysis. To determine whether a given GWAS trait is enriched at a particular age, the z-score calculated by MAGMA for each gene was multiplied by the DEGET score. Z-scores were then shuffled 20,000 times, multiplying at each iteration with the DEGET scores, and the result compared to the unshuffled value. The p-value is based on the frequency with which the sum of the unshuffled values is greater than the sum of the shuffled values. The same approach (multiplying ALIS scores with MAGMA z-scores, followed by permutation of the z-scores) is used to calculate ALIS enrichment probabilities.
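The shuffling procedure can be sketched as a generic permutation test. This is not the authors' implementation: the function names are hypothetical, the per-gene weights stand in for DEGET (or ALIS) scores, and the p-value convention (fraction of shuffles whose weighted sum is at least the observed one) is an assumption about how the frequency is counted.

```python
import random


def weighted_sum(z_scores, weights):
    """Sum of per-gene z-score multiplied by per-gene score."""
    return sum(z * w for z, w in zip(z_scores, weights))


def permutation_p_value(z_scores, weights, n_shuffles=20000, seed=0):
    """Fraction of shuffles whose weighted sum reaches the observed value."""
    rng = random.Random(seed)
    observed = weighted_sum(z_scores, weights)
    shuffled = list(z_scores)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the gene-to-z-score pairing
        if weighted_sum(shuffled, weights) >= observed:
            at_least += 1
    return at_least / n_shuffles


# Toy data: one gene carries all of the association signal and all of the weight.
toy_z = [5, 0, 0, 0, 0, 0, 0, 0]
toy_weights = [1, 0, 0, 0, 0, 0, 0, 0]
p_toy = permutation_p_value(toy_z, toy_weights, n_shuffles=2000, seed=1)
```

Only the z-scores are shuffled, so the distribution of scores across genes stays fixed while their link to the association statistics is broken.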
All of the human (including fetal) and all mouse samples were included in the age prediction analysis. Age predictions were performed with radial basis function Support Vector Machines through the e1071 package, which provides an interface to libsvm in R (Chang and Lin, 2011). Two rounds of 10-fold partitioning were used to form training, validation and test sets. An initial round of random partitions separated test data from training/validation data. The combined set of training/validation data was then passed to the tune.svm() function from the e1071 package, which uses 10-fold cross-validation to perform a parameter search. A first shallow grid search was performed over the two hyperparameters of the radial basis function SVM (the kernel parameter gamma and the cost), and the optimal pair of values selected. A second, finer grid search was then performed over the same two parameters.
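The two rounds of partitioning can be illustrated with index bookkeeping alone. The sketch below is written in Python rather than R and does not call e1071 or libsvm; the function names are hypothetical, and it only shows how an outer 10-fold split can hold out test samples while an inner 10-fold split of the remainder supplies training and validation sets for a parameter search.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_partitions(n_samples, k_outer=10, k_inner=10, seed=0):
    """Yield (test, inner) pairs; inner is a list of (train, validation) splits
    drawn only from the samples outside the current test fold."""
    rng = random.Random(seed)
    for test in k_fold_indices(range(n_samples), k_outer, rng):
        held_out = set(test)
        train_val = [i for i in range(n_samples) if i not in held_out]
        inner = []
        for val in k_fold_indices(train_val, k_inner, rng):
            val_set = set(val)
            train = [i for i in train_val if i not in val_set]
            inner.append((train, val))
        yield test, inner


splits = list(nested_partitions(50))
```

Because the inner folds are built from `train_val` only, no test sample can influence the choice of hyperparameters.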
Probes were included in the model by first determining which probes are associated with age, as determined using a linear model. The linear modeling of age-associated probes was repeated for each of the partitioned sets of training/validation data; as such, data in the test set did not influence the selection of probes through the linear model. The probes were then ranked based on the level of association, and the top Ncutoff probes were used for training and testing. The reported accuracies were obtained using Ncutoff values of 40 for humans and 100 for mice. To determine the appropriate level of Ncutoff, a range of values between 10 and 400 was tested, and the optimal number manually selected on the basis of having optimal SSE without using an unnecessarily large number of probes.
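The key point of this step, that probes are ranked using training/validation samples only, can be sketched as below. This is an illustration under assumptions: the function names and data layout are hypothetical, and the absolute Pearson correlation with age is used as a stand-in for the strength of association from the linear model (for a single-predictor linear fit with the same samples for every probe, this gives the same ranking as the slope's p-value, but the authors' model may differ).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_probes(expression, ages, train_idx, n_cutoff):
    """Rank probes by |r| with age on the training samples; keep the top n_cutoff."""
    train_ages = [ages[i] for i in train_idx]

    def strength(probe):
        values = [expression[probe][i] for i in train_idx]
        return abs(pearson_r(train_ages, values))

    return sorted(expression, key=strength, reverse=True)[:n_cutoff]


# Toy data (made up): the last sample is held out as test data.
toy_ages = [1, 2, 3, 4, 5, 6]
toy_expression = {
    "up": [1, 2, 3, 4, 5, 60],
    "flat": [1, 1, 1, 1, 1, 1],
    "noisy": [2, 1, 2, 1, 2, 1],
    "down": [6, 5, 4, 3, 2, 1],
}
chosen = select_probes(toy_expression, toy_ages, train_idx=[0, 1, 2, 3, 4], n_cutoff=2)
```

The outlying value in the held-out sample never enters the ranking, which is the property the text describes.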
Crude PSD preparations were made from dissected mouse forebrain tissue from C57BL6/5J mice aged 1 month to 5 months. In brief, each forebrain was homogenized by performing 12 strokes with a Dounce homogenizer containing 5 mL of ice cold homogenization buffer (320 mM sucrose, 1 mM HEPES, pH = 7.4) containing 1X Complete EDTA-free protease inhibitor (Roche) and 1X Phosphatase inhibitor cocktail set II (Calbiochem). Insoluble material was pelleted by centrifugation at 1000 x g for 10 min at 4˚C. The supernatant (S1) was removed and the pellet was re-suspended in 2 mL of homogenization buffer and an additional six strokes were performed. Following a second centrifugation at 1000 x g for 10 min at 4˚C, the supernatant (S2) was removed and pooled with S1. The combined supernatants were then centrifuged at 18,500 x g for 15 min at 4˚C. The pellet was re-suspended in 5 mL of extraction buffer (50 mM NaCl, 1% DOC, 25 mM Tris-HCl, pH 8.0) containing 1X Complete EDTA-free protease inhibitor (Roche) and 1X Phosphatase inhibitor cocktail set II (Calbiochem) and incubated on ice for 1 hr. The resulting crude PSD extracts were centrifuged at 10,000 x g for 20 min at 4˚C and the resulting supernatant filtered through a 0.2 µm syringe filter (Millipore).
Protein concentration of PSD preparations was determined using 1X Quickstart Bradford assay (BioRad). Thirty micrograms of PSD protein was prepared to contain 1X Novex NuPAGE LDS sample loading buffer (Invitrogen) with 100 mM DTT (BioRad), boiled for 10 min at 100˚C and loaded into 1 well of a 10-well Novex NuPAGE 3–12% Bis-Tris gradient gel (Invitrogen). Electrophoresis was performed under reducing conditions using the Novex NuPAGE SDS-PAGE system (Invitrogen) for 5 min. Gels were then stained with SimplyBlue SafeStain (Invitrogen) following the manufacturer’s instructions. Gel bands were excised and subjected to tryptic digestion using standard methods.
Five micrograms of tryptic digest was analysed by LC-MS/MS using a UPLC Dionex QExactive (Thermo-Fisher, Waltham, Massachusetts, USA). Protein identification was performed with MASCOT (Matrix Sciences) using the Uniprot Mouse database (downloaded on 2014 March 24th). Label-free quantitation analysis was then performed on all timepoints examined in the study using Progenesis (Nonlinear Dynamics). The normalized mass spectrometry dataset was read into R. Swissprot/Trembl IDs for each detected protein were matched to a gene symbol. Orthologs were determined using the Biomart package, through querying which HGNC Genes are orthologs for a given MGI ID. The data were log2 transformed. A linear model was fitted to estimate protein expression based on age and sex using the bioconductor ‘lumi’ package. Significance of differential expression was corrected using the Benjamini-Hochberg method.
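The Benjamini-Hochberg correction applied to the differential expression p-values can be sketched as follows. This is a generic, standalone version of the standard step-up procedure and not the R code used in the study; the function name and the toy p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values, one per protein.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
```

Each adjusted value is the smallest false discovery rate at which the corresponding protein would be called significant.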
Build #85 of the Human Phenotype Ontology was downloaded from http://compbio.charite.de/hudson/job/hpo.annotations.monthly/lastStableBuild. All diseases associated with the following HPO terms were extracted from the ontology using Phenexplorer (compbio.charite.de/phenexplorer): ‘Neoplasm’, ‘Abnormality of the Nervous System’, ‘Abnormal peripheral nervous system morphology’ and ‘Abnormality of the autonomic nervous system’. Using these lists all diseases associated with neoplasms, and peripheral/autonomic disorders, and which are not associated with nervous system disorders, were dropped from the HPO phenotype database. From the remaining dataset, congenital onset disorders were taken as those annotated with the following terms: ‘Congenital onset’, ‘Infantile onset’ or ‘Neonatal onset’. Adult onset disorders were annotated with either ‘Adult onset’, ‘Young adult onset’, ‘Schizophrenia’ or ‘Bipolar affective disorder’. Schizophrenia enrichment was tested using the combined list described earlier. Enrichment analyses were performed using hypergeometric tests. Functional analysis was done by manually assigning a single functional category to each of the proteins detected in the dataset. Functional annotations were based on those from a previous paper on the synaptic proteome (Emes et al., 2008) and expanded based on Panther protein class.
R code was used for all TTTP and age prediction analyses described herein (see Bioconductor, ‘TurningPoints’).
Persistent angiogenesis in the autism brain: an immunocytochemical study of postmortem cortex, brainstem and cerebellum. Journal of Autism and Developmental Disorders 46:1307–1318. https://doi.org/10.1007/s10803-015-2672-6
Genetic and non-genetic vulnerability factors in schizophrenia: the basis of the "two hit hypothesis". Journal of Psychiatric Research 33:543–548. https://doi.org/10.1016/S0022-3956(99)00039-4
Prenatal infection and schizophrenia: a review of epidemiologic and translational studies. American Journal of Psychiatry 167:261–280. https://doi.org/10.1176/appi.ajp.2009.09030361
Development of the blood vessels and extracellular spaces during postnatal maturation of rat cerebral cortex. The Journal of Comparative Neurology 138:31–47. https://doi.org/10.1002/cne.901380104
Cognition through the lifespan: mechanisms of change. Trends in Cognitive Sciences 10:131–138. https://doi.org/10.1016/j.tics.2006.01.007
Diagnostic exome sequencing in persons with severe intellectual disability. New England Journal of Medicine 367:1921–1929. https://doi.org/10.1056/NEJMoa1206524
Normative data from the CANTAB. I: development of executive function over the lifespan. Journal of Clinical and Experimental Neuropsychology 25:242–254. https://doi.org/10.1076/jcen.18.104.22.16839
lumi: a pipeline for processing Illumina microarray. Bioinformatics 24:1547–1548. https://doi.org/10.1093/bioinformatics/btn224
Evolutionary expansion and anatomical specialization of synapse proteome complexity. Nature Neuroscience 11:799–806. https://doi.org/10.1038/nn.2135
Novel insights into the development and maintenance of the blood-brain barrier. Cell and Tissue Research 355:687–699. https://doi.org/10.1007/s00441-014-1811-2
Supramolecular organization of NMDA receptors and the postsynaptic density. Current Opinion in Neurobiology 45:139–147. https://doi.org/10.1016/j.conb.2017.05.019
Hierarchical organization and genetically separable subfamilies of PSD95 postsynaptic supercomplexes. Journal of Neurochemistry 142:504–511. https://doi.org/10.1111/jnc.14056
Increased exonic de novo mutation rate in individuals with schizophrenia. Nature Genetics 43:860–863. https://doi.org/10.1038/ng.886
Synapse proteomics of multiprotein complexes: en route from genes to nervous system diseases. Human Molecular Genetics 14 Spec No. 2:R225–R234. https://doi.org/10.1093/hmg/ddi330
Heritability of age of onset of psychosis in schizophrenia. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics 153B:298–302. https://doi.org/10.1002/ajmg.b.30959
The influence of age and sex on the onset and early course of schizophrenia. The British Journal of Psychiatry 162:80–86. https://doi.org/10.1192/bjp.162.1.80
Extracting biological meaning from large gene lists with DAVID. Current Protocols in Bioinformatics Chapter 13:Unit 13.11. https://doi.org/10.1002/0471250953.bi1311s27
Proteomic analysis of NMDA receptor-adhesion protein signaling complexes. Nature Neuroscience 3:661–669. https://doi.org/10.1038/76615
Development of the dopaminergic innervation in the prefrontal cortex of the rat. The Journal of Comparative Neurology 269:58–72. https://doi.org/10.1002/cne.902690105
Age of onset of mental disorders: a review of recent literature. Current Opinion in Psychiatry 20:359–364. https://doi.org/10.1097/YCO.0b013e32816ebc8c
Comparative genome hybridization suggests a role for NRXN1 and APBA2 in schizophrenia. Human Molecular Genetics 17:458–465. https://doi.org/10.1093/hmg/ddm323
DNA array profiling of gene expression changes during maize embryo development. Functional & Integrative Genomics 2:13–27. https://doi.org/10.1007/s10142-002-0046-6
Keeping up with genetic discoveries in amyotrophic lateral sclerosis: the ALSoD and ALSGene databases. Amyotrophic Lateral Sclerosis 12:238–249. https://doi.org/10.3109/17482968.2011.584629
Comparing development of synaptic proteins in rat visual, somatosensory, and frontal cortex. Frontiers in Neural Circuits 7:97. https://doi.org/10.3389/fncir.2013.00097
The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 1:390–399. https://doi.org/10.1002/wsbm.44
Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9:3273–3297. https://doi.org/10.1091/mbc.9.12.3273
Cognitive development in adolescence: cerebral underpinnings, neural trajectories, and the impact of aberration. In: Neurodevelopment and Schizophrenia, pp. 69–88.
Modeling transformations of neurodevelopmental sequences across mammalian species. Journal of Neuroscience 33:7368–7383. https://doi.org/10.1523/JNEUROSCI.5746-12.2013
Age-related decrease in brain synaptic membrane Ca2+-ATPase in F344/BNF1 rats. Neurobiology of Aging 19:487–495. https://doi.org/10.1016/S0197-4580(98)00078-5
Dendritic pruning of the medial amygdala during pubertal development of the male Syrian hamster. Journal of Neurobiology 66:578–590. https://doi.org/10.1002/neu.20251
Jonathan Flint, Reviewing Editor; University of California, Los Angeles, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "A genomic ageing program that reorganises the young adult brain is targeted in schizophrenia and anxiety disorders" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
The authors create a new method to quantify transcriptome trajectory turning points (TTTP) and apply this to the human BrainCloud dataset and a new mouse brain transcriptome dataset across ages 58-600 days. The authors identify a peak age of TTTP around 25-30 years old, which they link to psychiatric disorders (schizophrenia and autism) and synaptic development. Overall the reviewers felt that this was an ambitious analysis covering a novel and interesting topic (complex brain gene expression trajectories).
1) The text needs attention. The Introduction is long and meandering and would be more accessible if shortened significantly. The authors need to explain what they have done with greater clarity. For example the model the authors propose (subsection “Transcriptome trajectories and schizophrenia” paragraph three) is so vague as to be nearly meaningless.
2) The authors need to explain better which models they used and why. There are several algorithms presented in this work (TTTP, peget, DeGET, AliGET), of which only TTTP is straightforward. It is hard to understand why the authors choose to use one over the other. Some of the models (such as Aliget) include parameters that do not have a clear biological justification and seem somewhat arbitrary. Comparison of units across species is also unclear – is a human "year" equivalent to a mouse "day"?
3) Correction for multiple testing needs to be made explicit. For the cell type analysis a Bonferroni correction is used, but it is not clear that appropriate corrections were made for other analyses (the multiple different enrichment tests).
4) A list of >600 de novo mutation-containing genes is used. However, this is not a list of genes with any level of reasonable statistical support – what is the FDR? Are these all mutations in genes never observed in controls? The anxiety gene list is not well statistically supported and has no place in such an analysis.
5) Some of the models include a δ-E term, which "negates the contribution of genes that merely fluctuate away from their mean." The δ-E term also likely biases analysis toward genes with certain levels of baseline expression, as well as the dynamic range of microarrays. In addition, the gene fluctuation issue should be adequately accounted for by using spline regressions and biological replicates.
6) With regard to the age classifier, it is unclear if this is developed using a test set and validation set, which is necessary to truly evaluate accuracy that would be generalizable and meaningful. If it is not, they must find an independent validation set to make this of any meaning. In this regard, the cell type analysis does use a validation set for example.
7) All models are based on the TTTP point, which is defined as the earliest age in which the first derivative of expression changes signs. It is not clear why the authors only use the earliest of such points when genes can clearly have multiple inflections across the lifetime (see Figure 2—figure supplement 1). By taking the earliest inflection point, this makes the analysis somewhat contingent on the youngest sample in a dataset (which is not balanced between mouse and human experiments), as well as the resolution of samples at the younger timepoints, which is sparse for BrainCloud (see below). Finally, all models are highly contingent on (1) the quality and resolution of the input datasets and (2) the accuracy of the spline regression model. As shown in the supplement, choice of regression parameters can have dramatic effects on downstream analyses. There is no justification for the cubic spline model used or any assessment of the accuracy with which it fits the underlying data. The choice of parameters seems arbitrary.
8) Gender differences – Using the BrainCloud dataset, the peak age of TTTP is reported to be delayed in females compared to males and this is used as an explanation for why psychiatric diseases occur later in females. However, the BrainCloud data that this result is based on has already had sex effects regressed out prior to analysis. It is inappropriate to make any comment on sex effect without using the non-regressed dataset.
9) Braincloud is used to define the human TTTP trajectory and includes human prenatal and postnatal samples. The regressed and normalized data is downloaded from GEO. However, it is unclear whether measures of RNA quality and other sources of technical variability are taken into account in any of the analyses. There will likely be an interaction between age and RNA quality and therefore potentially more variability in gene expression signal seen during certain timepoints. Replication in the Kang et al. (GSE25219) dataset will be important as this is similar to BrainCloud in scope.
10) The authors used cubic splines with 3 DOF to depict the gene expression trajectories, and the TTTPs were detected by searching for slope changing points along the smooth curves. How well does this actually fit the data? Why were 3 DOF chosen? There should be some rigorous assessment of the accuracy of the regression model (compared with Loess, etc).
11) The degree of smoothing of the curve and the number of binned intervals could affect the number of identified TTTPs. But in subsection “Detecting turning points (TTTPs)” the authors state "If multiple turning points were found in a single spline, then only the first was used". This strategy may result in imprecise prediction of TTTPs, e.g., the differences shown in Figure 4—figure supplement 2 when using different parameters and different methods.
12) Human-Mouse comparisons: the authors find a peak of TTTP around age 160 days in mice which they imply is equivalent to 26-30 years in human. However, mice were not assessed before 58 days postnatally whereas human time-points include prenatal brain samples. Since TTTP is calculated as the earliest transition point (if multiple are detected in the sample gene) it is likely sensitive to the age windows assessed.
13) Age differences – the peak age of TTTP is reported to be in the range of 25-30 years in human. However, there are relatively few samples in the 2-20 age groups compared with age 0 and 20+. The ability to identify changes in transcriptomic trajectory during this crucial period of brain development is strongly underpowered and could confound the age trajectory of TTTP.
14) Psychiatric enrichments – subsection “Psychiatric susceptibility genes in young adults”, paragraph two: "TTTP.…accurately predicted the age windows for the onset of schizophrenia and anxiety disorders." As shown in Figure 5—figure supplement 3, this is highly contingent on parameter choice of the model used to fit transcriptome data. Loess regression or cubic spline with 4 dof predicts much earlier onset of schizophrenia (~16 year old). The choice of parameters does not seem to be justified either biologically or in terms of fit to the dataset. As such, this conclusion cannot be considered robust.
15) Cell-type enrichment analyses are likely on very small numbers of genes, which is susceptible to bias. The Zeisel dataset was used to define single cell transcriptomes. However, this dataset is based on single-cell RNAseq with <3000 genes expressed in each cell. Genes were dropped from analysis that were not expressed in Zeisel and therefore the input set of genes is likely very small.
16) The author claimed the accuracies of age prediction are 6.5 years in human and 28 days in mice. If this prediction included fetal samples, the prediction is imprecise. One similar age prediction from DNA methylation reported the predicting difference is close to zero for embryonic samples (Genome Biology 2013, 14:R115). If this prediction did not include fetal samples, the authors should not state that, "Remarkably, they showed accurate age predictions across the entire range of ages in both species," as Braincloud includes 38 fetal samples. Similarly, the authors should make clear that they excluded these samples, and, in light of the dramatically higher temporal dynamics observed in fetal and infant periods in these samples (Nature 478, 519-523, Figure 2B), why they excluded these samples.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "A genomic ageing program that reorganises the young adult brain is targeted in schizophrenia and anxiety disorders" for further consideration at eLife. Your revised article has been favorably evaluated by a Senior editor, a Reviewing editor, and one reviewer.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
In my original review, I was very enthusiastic about the innovative approach and ideas contained in this paper, but was concerned about Methods and potential confounders, which were hard to understand and evaluate in the original paper. The other reviewer had very similar concerns. The authors now provide a detailed response to the critiques and a revised manuscript, which is much improved. They have rewritten, shortened many sections and improved clarity. They have done a good job of clarifying parameter choice and for some of algorithms, and shown the robustness of the methods to these choices.
I should emphasize that I still have my original enthusiasm about this innovative study. But although some of my concerns are well addressed, others regarding the robustness of the method and interpretation of the data are not and still need additional work. Because this study could illuminate some areas of disease susceptibility that have been mysterious, it could have high impact, but it is critical that it be methodologically solid in every respect. I think it is now believable that young adulthood is a time of interesting changes in gene expression trajectories, although as the authors acknowledge, choice of spline parameters affects the actual age considerably. This is not a minor concern that needs acknowledgement, but a key point of their analysis. It would be better if the authors presented an interval of TTTP ranges based on parameter choices just to show what we can be certain about from these data. The major question that still remains is how this is related to disease specific factors.
Other concerns are as follows:
1) The main focus of the paper is defining transcriptome trajectory turning points, which they then relate to disease. They have clarified the use of algorithms and Methods. Now that some of these issues are clarified, one major question still remains. If the TTTP indeed does define SZ onset at late adolescence with a peak at 25 years (although with some range depending on parameters chosen as mentioned above), it should be specific to schizophrenia and not to other disorders. The choice of gene sets to be compared is critical here, and in no way are the genetic studies of the brain disorders that are compared comparable. Some identify mostly common variants of small effect size, and others, such as ID and autism, mostly rare variants with large effect size. These mutations are under considerably different selection pressures as their effect sizes differ.
So, a major problem that still remains is that the choice of the 600 SZ de novo genes is not well supported by statistical genetic analysis. The authors’ response to this in the revision is not satisfactory. There is strong support for de novo mutations as a class in SZ and other neurodevelopmental disorders, and some pathways, more equivocally. But, the individual genes in this list are mostly not supported by evidence yet. It is not a question of finding mutations that occur only in SZ patients and not in controls; it is standard population genetics. de novo mutations occur frequently and many of these genes are almost certainly not risk genes. The authors should absorb some of the analyses in the following papers and should filter the list by metrics that are now well accepted in human genetics (PMID:27899611; PMID:25086666; PMID: 27535533; PMID:27533299; PMID:26439716; PMID:25684150).
Also in this regard, genome-wide significant genes for anxiety disorders have not been identified. Again, the use of a hodge podge of gene lists, without strong statistical support or rationale plagues this major aspect of the paper. The choice of gene lists is so important that it should be presented up-front and clarified. And when comparisons are made with these lists, they should be of equal power, or they don't support disease specificity. In other words, the gene lists should be the same size approximately, or better yet, account for a similar proportion of the population attributable genetic liability to account for the different effect sizes of mutations and common variants. Further, anxiety should certainly not be used at all, unless the authors can provide genome-wide statistical support for the genes identified.
If the authors want to use this SZ list of 600 genes that lacks statistical support or OR for most of the genes, they should definitely compare to a similarly filtered list of genes that cause autism and intellectual disability, or a list of brain enriched genes (2/3 fold?) to see if the trajectory that they see in SZ is meaningful, as mentioned above. In the gene lists there are 2-3x more SZ genes than ASD genes (one list has 170, the other about 380), and there are only about 80 ID genes listed, when there are >400 known. At least the sensitivity of the TTTP analyses to the size of the gene list should be provided. Another ground truth would be to show that a similar sized list of PSD genes, or highly synaptically enriched genes not chosen for association with SZ, did not show the same pattern. It simply may be that certain classes of synaptic genes are enriched for this pattern of trajectory switching and this creates a vulnerable period, but it is not due to an enrichment of SZ risk genes themselves.
3) The human-mouse comparison is really a tough area as the authors recognize, since it has been demonstrated that there is not a linear, constant scaling of stage comparability between these species across development when specific developmental processes are compared. The authors should at least cite some of the work that demonstrates this and discuss it more thoroughly as a caveat. Currently it is somewhat glib. They have done a very good job of dealing with this issue analytically, but the readers should know that it is a biological conundrum that is tough to address with any method mapping human development to mouse, and what the limitations are.
https://doi.org/10.7554/eLife.17915.036
- Seth GN Grant
- Nathan G Skene
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Support from the Medical Research Council, Wellcome Trust, European Union Seventh Framework Programme (FP7 grant agreement no. 242167). T Le Bihan and L Imrie at SynthSys, University of Edinburgh for mass spectrometry sample analysis. The LC-MS QExactive equipment was purchased by a Wellcome Trust Institutional Strategic Support Fund and a strategic award from the Wellcome Trust for the Centre for Immunity, Infection and Evolution (095831/Z/11/Z). D Maizels for artwork.
Animal experimentation: All animal experiments conformed to the British Home Office Regulations (Animal Scientific Procedures Act 1986; Project License PPL80/2,337 to Prof Seth Grant), local ethical approval, and NIH guidelines.
- Jonathan Flint, Reviewing Editor, University of California, Los Angeles, United States
- Received: May 17, 2016
- Accepted: August 15, 2017
- Version of Record published: September 12, 2017 (version 1)
© 2017, Skene et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.