Inter-tissue convergence of gene expression during ageing suggests age-related loss of tissue and cellular identity
Abstract
Developmental trajectories of gene expression may reverse in their direction during ageing, a phenomenon previously linked to cellular identity loss. Our analysis of cerebral cortex, lung, liver, and muscle transcriptomes of 16 mice, covering development and ageing intervals, revealed widespread but tissue-specific ageing-associated expression reversals. Cumulatively, these reversals create a unique phenomenon: mammalian tissue transcriptomes diverge from each other during postnatal development, but during ageing, they tend to converge towards similar expression levels, a process we term Divergence followed by Convergence (DiCo). We found that DiCo was most prevalent among tissue-specific genes and associated with loss of tissue identity, which is confirmed using data from independent mouse and human datasets. Further, using publicly available single-cell transcriptome data, we showed that DiCo could be driven both by alterations in tissue cell-type composition and also by cell-autonomous expression changes within particular cell types.
Editor's evaluation
In this study, Izgi et al. investigated age-dependent gene expression pattern changes in male mice by analysing a new bulk RNA-seq data from four different tissues collected at different ages covering postnatal development and ageing. Gene expression patterns observed before sexual maturity show inter-tissue divergence, whereas convergence of gene expression profiles is observed after sexual maturity and during ageing, in a pattern that the authors call divergence-convergence or ‘DiCo.’ This observation may suggest that ageing results in at least a partial loss of tissue identity acquired developmentally.
https://doi.org/10.7554/eLife.68048.sa0Introduction
Development and ageing in multicellular organisms are highly intertwined processes. On the one hand, certain ageing-related phenotypes, such as presbyopia and osteoporosis (Luegmayr et al., 2004), are believed to represent the continuation of developmental processes into adulthood (Blagosklonny, 2006; de Magalhães and Church, 2005). Such cases of ‘runaway development’ or higher than optimal function during ageing (recognised as the hyperfunction theory of ageing; Gems and Partridge, 2013) may arise due to declined natural selection pressure failing to optimise expression regulation after sexual reproduction starts (Fisher, 1930; Medawar, 1953; Williams, 1957). Indeed, recent experimental studies in Caenorhabditis elegans show that senescence phenotypes promoted by insulin-IGF-1 signalling pathways support the hyperfunction theory (Lind et al., 2019; Ezcurra et al., 2018). On the other hand, molecular studies have also reported a reversal of the ageing transcriptome towards pre-adult levels in various contexts, including primate brain regions (Somel et al., 2010; Dönertaş et al., 2017; Colantuoni et al., 2011), and mouse liver and kidney (Anisimova et al., 2020). Studying the functional consequences of this reversal pattern in the ageing human brain, we previously interpreted it as an indication of loss of cellular identity in neurons, possibly exacerbated by a reduction in the relative frequencies of neurons (Dönertaş et al., 2017). Such changes, in turn, could be caused by the accumulation of stochastic damage at the genetic, epigenetic, and proteomic levels over an adult lifetime, causing deregulation of gene expression networks.
Several major questions remain. First, the prevalence of reversal phenotypes across tissues is unclear as most research has been conducted in the brain (Somel et al., 2010; Dönertaş et al., 2017). A second question pertains to the similarity of reversal-exhibiting genes and pathways across tissues. Ageing-related expression changes are partly shared among organs (Zahn et al., 2007), and reversal trends are also shared across different regions of the primate brain (Dönertaş et al., 2017). Distinct tissues might hence show parallel reversal patterns. Alternatively, as mammalian tissues diverge from each other during development in their transcriptome profiles (Cardoso-Moreira et al., 2019), one may hypothesise that during ageing tissues converge back towards similar transcriptome profiles. Such a putative late-age convergence phenomenon would be consistent with the notion of ageing-related cellular identity loss (Yang et al., 2019; Dönertaş et al., 2017). A final question concerns the mechanism behind the observed reversal trends at the bulk tissue level. Specifically, the contribution of cell-type composition and cell-autonomous changes to the reversals at the tissue level remains unexplored.
Documenting the reversal phenomenon is critical to better understand the proximate mechanisms of mammalian ageing, and its ultimate mechanisms, such as the stochastic disruption versus continued expression of developmental genes. However, such work has been limited by the scarcity of studies that include both development and ageing periods of the same organism and across different tissues. This work presents an age-series analysis of bulk transcriptome profiles of mice, including samples of four tissues across postnatal development and ageing periods covering the whole postnatal lifespan. Using this dataset, we study the prevalence, mechanisms, and functional consequences of the reversal phenomenon in different mouse tissues. We further test the related hypothesis of tissue convergence during ageing and investigate the contribution of cell-type composition and cell-autonomous changes.
Results
We generated bulk RNA-seq data from 63 samples covering the cerebral cortex (which we refer to as cortex), liver, lung, and skeletal muscle (which we refer to as muscle) of 16 male C57BL/6J mice, aged between 2 and 904 days of postnatal age (Materials and methods). As mice reach sexual maturity by around 2 months (Tacutu et al., 2018), we treated samples from individuals aged between 2 and 61 days (n = 7) as the development series, and those aged between 93 and 904 days (which roughly correspond to 80-year-old humans; Flurkey et al., 2007) (n = 9) as the ageing series (Figure 1—figure supplement 1). The final dataset contained n = 15,063 protein-coding genes expressed in at least 25% of the 63 samples (one 904-day-old mouse lacked cortex data).
Tissues diverge during postnatal development
Consistent with earlier work (Brawand et al., 2011; Cardoso-Moreira et al., 2019), we found that variation in gene expression is largely explained by tissue differences, such that the first three principal components (PCs) separate samples according to tissue (ANOVA p<10–20 for PC1–3, Figure 1—source data 1), with the cortex most distant from the others (Figure 1a). Meanwhile, PC4, which explains 8% of the total variance, displayed a shared age effect across tissues in development (Spearman’s correlation coefficient ρ = [-0.88, –0.99], nominal p<0.01 for each test; Figure 1b). Also, after the tissue effect was removed by standardisation, principal components analysis (PCA) showed a strong influence of age on the first two PCs, which explains 31% of the variance in total (Figure 1—figure supplement 2). We further observed higher similarity among tissues at the juvenile stage compared to the young-adult stage. In other words, distances between tissues increased with age (change in mean Euclidean distance among tissues with age during development in PC1–PC4 space ρdev = 0.99, pdev = 1.5 × 10–5, Figure 1—source data 1), which resonates with previous reports of inter-tissue transcriptome divergence during development (Cardoso-Moreira et al., 2019). This divergence pattern was also observed when PCA was performed with developmental samples only (days 2–61: change in mean Euclidean distance among tissues in PC1–PC4 space; ρ = 0.95, p=0.0008; Figure 1—figure supplement 3a and b).

Data summary and age-related expression patterns.
(a) Principal components analysis (PCA) of expression levels of 15,063 protein-coding genes across four tissues of 16 mice. Values in parentheses show the variation explained by each component. (b) Age trajectories of PC3 (left) and PC4 (right). Spearman’s correlation coefficients between PC4 and age in each tissue in development range between 0.88 and 0.99 (see Figure 1—source data 1 for all tests). The dashed vertical line indicates 90 days of age, separating development and ageing periods. Age distribution of samples are given in Figure 1—figure supplement 1. (c) Similarity between the age-related gene expression changes (Spearman’s correlation coefficient between expression and age without a significance cutoff) across tissues in development and ageing. Similarities were calculated using Spearman’s correlation coefficient between expression-age correlations across tissues. CTX, cortex; LV, liver; LNG, lung; MS, muscle. (d) The number of significant age-related genes in each tissue (false discovery rate [FDR]-corrected p-value<0.1). (e) Shared age-related genes among tissues identified without using a significance cutoff. The x-axis shows the number of tissues among which age-related genes are shared. Significant overlaps are indicated with an asterisk (Figure 1—figure supplement 4). (f) The proportion of age-related expression change trends (no significance cutoff was used) in each tissue across the lifetime. UpDown: upregulation in development and downregulation in the ageing; DownUp: downregulation in development and upregulation in the ageing; UpUp: upregulation in development and upregulation in the ageing; DownDown: downregulation in development and downregulation in the ageing. We confirmed the robustness of the results using variance stabilising transformation (VST normalisation in Figure 1—figure supplement 10).
-
Figure 1—source data 1
Data summary, age-related expression patterns, and reversal patterns.
- https://cdn.elifesciences.org/articles/68048/elife-68048-fig1-data1-v4.xlsx
Tissues involve common gene expression changes with age
We next characterised age-related changes in gene expression shared across tissues by (1) studying overall trends at the whole transcriptome level and testing their consistency using permutation tests, and (2) studying statistically significant changes at the single-gene level. First, we investigated similarities in overall trends of gene expression changes with age using the Spearman’s correlation coefficient (ρ) between expression levels and age for each gene in each tissue separately for the developmental and ageing periods (Materials and methods; tissue-specific age-related gene expression changes and functional enrichment test results are available in Supplementary file 1). We then examined transcriptome-wide similarities across tissues during development and ageing by comparing these gene-wise expression-age correlation coefficients (Figure 1c). Considering the whole transcriptome without a significance cutoff, we found a weak correlation of age-related expression changes in tissue pairs, both during development (ρ = [0.17, 0.39], permutation test p<0.05 for all the pairs, Figure 1—source data 1), and ageing (ρ = [0.23, 0.33], permutation test p<0.05 in 4/6 pairs, Figure 1—source data 1). We then tested whether developmental patterns among tissues may be shared more than ageing-associated patterns, but we did not find significant difference between inter-tissue similarities within the development and those within ageing (Wilcoxon signed-rank test, p=0.31). Moreover, the number of genes with the same direction of change (without applying a significance cutoff) across four tissues was consistently more than expected by chance (permutation test p<0.05), except for genes upregulated in ageing (Figure 1e, Figure 1—figure supplement 4). This attests to overall similarities across tissues both during postnatal development and during ageing, albeit of modest magnitudes. We obtained similar results using another normalisation approach, variance stabilising transformation (VST) from the DESeq2 package (Love et al., 2014), and confirmed that the observed patterns are not affected by the choice of normalisation method (Figure 1—figure supplements 10–11).
In the second approach, we focused on genes showing a significant age-related expression change, identified separately during development or during ageing (using Spearman’s correlation coefficient and false discovery rate [FDR]-corrected p-value<0.1, Figure 1d). We found that the developmental period was accompanied by a large number of significant changes (n = [1,941, 6,151], 13–41% across tissues), with the most manifest changes detected in the cortex. The genes displaying significant developmental changes across all four tissues also showed significant overlap (Figure 1—figure supplement 5a, Figure 1—figure supplement 6; permutation test: pshared_up = 0.027, pshared_down < 0.001). Using the Gene Ontology (GO), we found that shared developmentally upregulated genes were enriched in functions such as hormone signalling pathways and lipid metabolism (FDR-corrected p-value<0.1). Meanwhile, shared developmentally downregulated genes were enriched in functions such as cell cycle and cell division (FDR-corrected p-value<0.1; Supplementary file 2). Contrary to widespread expression change during development (13–41%), the proportion of genes undergoing significant expression change during ageing was between 0.013 and 15% (Figure 1d). This contrast between postnatal development and ageing was also observed in previous work on the primate brain (Somel et al., 2010; Işıldak et al., 2020). In terms of the number of genes with a significant ageing-related change, the most substantial effect we found was in the lung (n = 2319), while close to no genes showed a statistically significant change in the muscle (n = 2), a tissue previously noted for displaying a weak ageing transcriptome signature across multiple datasets (Turan et al., 2019). Not unexpectedly, we found no common significant ageing-related genes across tissues (Figure 1—figure supplement 5a). Considering the similarity between the ageing and development datasets (Figure 1c) and the similar sample sizes in development (n = 7) and ageing periods (n = 9), the lack of overlap in significant genes in ageing might be due to low signal-to-noise ratios in the ageing transcriptome as ageing-related changes are subtler compared to those in development (Figure 1—figure supplement 5b).
Gene expression reversal is a common phenomenon in multiple tissues
We then turned to investigate the prevalence of the reversal phenomenon (i.e. an opposite direction of change during development and ageing) across the four tissues. We first compared the trends of age-related expression changes between development and ageing periods in the same tissue, without a significance cutoff, to assess transcriptome-wide reversal patterns (Figure 1c). This revealed weak negative correlation trends in liver and muscle (though not in the lung and cortex), that is, genes up- or downregulated during development tended to be down- or upregulated during ageing, respectively. These reversal trends were comparable when the analysis was repeated with the genes showing relatively high levels of age-related expression change (|ρ| > 0.6 in both periods; Figure 1—figure supplement 7). We further studied the reversal phenomenon by classifying each gene expressed per tissue (n = 15,063) into those showing up- or downregulation during development and during ageing. Here, again, we did not use a statistically significance cutoff and summarised trends of continuous change versus reversal in each tissue. This approach follows Dönertaş et al., 2017 and focuses on global trends instead of single genes. In line with the above results, as well as earlier observations in the brain, kidney, and liver (Dönertaş et al., 2017; Anisimova et al., 2020), we found that ~50% (43–58%) of expressed genes showed reversal trends (Figure 1f), although these proportions were not significantly more than randomly expected in permutation tests (Figure 1—figure supplement 8, Materials and methods). Overall, we conclude that although the reversal pattern is not ubiquitous, the expression trajectories of the genes do not necessarily continue linearly into the ageing period.
Pathways related to development, metabolism, and inflammation are associated with the reversal pattern
We then asked whether genes displaying reversal patterns in each tissue may be enriched in functional categories. Our earlier study focusing on different brain regions had revealed that up-down genes, that is, genes showing developmental upregulation followed by downregulation during ageing, were enriched in tissue-specific pathways, such as neuronal functions (Dönertaş et al., 2017). Analysing up-down genes compared to all genes upregulated during development, we also found significant enrichment (FDR-corrected p-value<0.1) in functions such as ‘synaptic signalling’ in the cortex, as well as ‘tube development’ and ‘tissue morphogenesis’ in the lung, ‘protein catabolic process’ in the liver, and ‘cellular respiration’ pathways in the muscle (Supplementary file 3). Meanwhile, down-up genes (downregulation during development followed by upregulation during ageing) showed significant enrichment in functions such as ‘wound healing’ and ‘peptide metabolic process’ in the cortex, ‘translation’ and ‘nucleotide metabolic process’ in the lung, ‘inflammatory response’ in the liver, and ‘leukocyte activation’ in the muscle (Supplementary file 3).
Genes showing a reversal pattern are not shared among tissues
As tissues displayed modest positive correlations in their development- or ageing-related expression change trends (Figure 1c, Figure 1—figure supplement 7), and as we had previously observed that distinct brain regions show similarities in their reversal patterns (i.e. the same genes showing the same reversal type), different tissues might also be expected to show similarities in their reversal patterns. Interestingly, we found no overlap between gene sets with the reversal pattern (up-down or down-up genes) across tissues, relative to random expectation (permutation test, pup-down = 0.08, pdown-up = 0.53; Figure 1—figure supplement 9). Such a lack of overlap might be explained if genes showing reversal patterns in each tissue tend to be tissue-specific. It would also be consistent with the notion that reversals involve loss of cellular identities gained in development, during which tissue transcriptomes appear to diverge from each other (Figure 1a, Figure 1—figure supplement 3; Cardoso-Moreira et al., 2019). This result led us to ask whether, in accordance with the reversal phenomenon, inter-tissue transcriptome divergence may be followed by increasing inter-tissue similarity, or convergence, during ageing.
Inter-tissue divergence during development and convergence during ageing
We studied the inter-tissue divergence/convergence question using two approaches. In the first, we analysed how transcriptome-wide expression variation among tissues changes with age regardless of their age-related expression patterns in any particular tissue. To do this, for each individual, we calculated the coefficient of variation (CoV) across the four tissues for each commonly expressed gene (n = 15,063), which represents a measure of expression variation among tissues. Then, we assessed how such inter-tissue variation changes over the lifetime by calculating the Spearman’s correlation coefficient between CoV and age separately for development and ageing periods (correlation values for all genes are given in Figure 2—source data 1).
Using the CoV values calculated across all 15,063 genes (excluding one 904-day-old individual for which we lacked the cortex data), we observed a significant mean CoV increase in development (Spearman’s correlation coefficient ρ = 0.77, two-sided p=0.041), confirming that tissues diverge as development progresses (Figure 2a). Interestingly, during ageing, we observed a decrease in mean CoV with age, albeit not significant (ρ = −0.50, p=0.204, Figure 2a), suggesting that tissues may tend to converge during ageing. This was also supported by the PCA in which we observed a trend of ageing-associated decrease in mean Euclidean distance among tissues (using PC1–PC4 space with quantile-normalised data: ρ = −0.87, p=0.0026; with VST-normalised data ρ = −0.58, p=0.102, Figure 1—source data 1). We obtained the same divergence-convergence pattern by calculating the median CoV values for each individual instead of the mean (Figure 2—figure supplement 1). Figure 2b exemplifies this pattern of increasing and then decreasing CoV through lifetime for the gene displaying the strongest such signal.

Age-related change in gene expression variation among tissues estimated with coefficient of variation (CoV).
(a) Transcriptome-wide mean CoV trajectory with age. Each point represents the mean CoV value of all protein-coding genes (15,063) for each mouse (n = 15) except the one that lacks expression data in the cortex. (b) Age effect on CoV value of the Cd93 gene which has the highest rank for the divergence-convergence (DiCo) pattern in four tissues (Materials and methods). CoV increases during development and decreases during ageing, indicating expression levels show DiCo patterns among tissues. (c) Expression trajectories of the gene Cd93 in four tissues. (d) The number of significant CoV changes with age (false discovery rate [FDR]-corrected p-value<0.1) during development (left, nconverge = 772, ndiverge = 1809) and ageing (right, nconverge = 42, ndiverge = 20). Converge: genes showing a negative correlation (ρ) between CoV and age; diverge: genes showing a positive correlation between CoV and age. (e) Log2 ratio of convergent/divergent genes in development and in ageing. The graph represents only genes showing significant CoV changes (FDR-corrected p-value<0.1, given in panel d). Error bars represent the range of log2 ratios calculated from leave-one-out samples using the jackknife procedure (Materials and methods, values are given in Figure 2—source data 1).
-
Figure 2—source data 1
All the data related to divergence-convergence (DiCo) pattern: age-related coefficient of variation (CoV) change of genes, pairwise tissue expression correlations, analysis of independent datasets; GSE34378 (Jonker et al.), GSE132040 (Schaum et al.), and GTEx.
- https://cdn.elifesciences.org/articles/68048/elife-68048-fig2-data1-v4.xlsx
We identified n = 9058 genes showing divergent trends among tissues in development based on their CoV change with age (without using a significance cutoff per gene). Among these, n = 4802 showed convergent trends in ageing, which we refer to as DiCo genes. We next studied the transition points between divergence and convergence by clustering genes showing the DiCo pattern (n = 4802) based on their CoV values (Figure 2—figure supplement 2). Notably, cluster 1, which shows a slightly delayed divergence starting after 8 days and peaks around 3 months, was associated with metabolic and respiration-related processes (FDR-corrected p-value<0.1), and cluster 5, which shows a relatively delayed convergence after 4 months, was enriched in categories related to vascular development (FDR-corrected p-value<0.1) (Supplementary file 4). To assess the contribution of different tissues to the DiCo pattern, we further clustered DiCo-displaying genes (n = 4802) based on their expression levels (Figure 2—figure supplement 3). Not surprisingly, the clusters with relatively higher expression levels of a tissue (e.g. muscle in cluster 9) were enriched in functional categories (FDR-corrected p-value<0.1) related to that tissue (e.g. muscle cell development) (Supplementary file 5).
We then studied DiCo at the single-gene level. We tested each gene for a significant CoV change in their expression levels (i.e. divergence or convergence) in development and ageing (Spearman’s correlation test with FDR-corrected p-value<0.1). We found that the ratio of divergent and convergent genes differed significantly between development (70% divergence among 2581 significant genes) and ageing (68% convergence among 62 significant genes) (Figure 2d and e). The same pattern was also observed without using significance cutoff (Figure 2—figure supplement 4). We also confirmed that this pattern is also observed with VST-normalised data (Materials and methods), and is thus not affected by the data preprocessing approach (Figure 2—figure supplement 14).
To our knowledge, inter-tissue convergence during ageing is a novel phenomenon. We first considered the possibility that convergence during ageing could be explained by heteroscedasticity which could arise due to increased inter-individual variability in gene expression during ageing (Somel et al., 2006). To test this hypothesis, we compared expression-age heteroscedasticity levels between two gene sets: (1) genes with the DiCo pattern and (2) genes showing divergent patterns throughout lifetime (divergent-divergent [DiDi, n = 4182]) for each tissue separately (Materials and methods). We did not observe any significant difference in heteroscedasticity between DiCo and DiDi genes in any of the tissues (two-sided Kolmogorov–Smirnov [KS] test, p>0.05 in all tissues, Figure 2—figure supplement 15), which suggests that heteroscedasticity due to increased inter-individual variability probably does not drive the observed age-related convergence during ageing. Visual inspection of gene expression clusters also suggested that the DiCo pattern is not particularly associated with nonlinear changes in gene expression with age (Figure 1—figure supplements 12–15).
In order to further verify the DiCo pattern, we used a second approach to test it in our mouse dataset. For each individual, we calculated correlations between pairs of tissues across their gene expression profiles. Under the DiCo pattern, we would expect pairwise correlations to decrease during development and increase during ageing. Among all pairwise comparisons, we observed a strong negative correlation during development (ρ = [-0.61, –0.9], nominal p<0.05 in five out of six tests), while during ageing, four out of six comparisons showed a moderate positive correlation (ρ = [0.16, 0.69], nominal p<0.05 in one out of six comparisons, Figure 2—figure supplement 5). Calculating the mean of pairwise correlations among tissues for each individual, we observed the same DiCo pattern (nominal p<0.05 for both periods, Figure 2—figure supplement 6).
The DiCo pattern indicates loss of tissue specificity during ageing
Potential explanations of the DiCo pattern involve two scenarios consistent with the age-related loss of identity: (1) decreased expression of tissue-specific genes in their native tissues or (2) non-specific expression of tissue-specific genes in other tissues. To test these predictions, we first identified tissue-specific gene sets based on relatively high expression of that gene in a particular tissue (cortex: 1175; lung: 839; liver: 986; muscle: 766 genes). We noted that tissue-specific genes show clear up-down reversal patterns, being mostly upregulated during development, and downregulated during ageing (Figure 3, 57–89%). The up-down reversal pattern was particularly strong among tissue-specific genes for the three of four tissues tested (OR = [1.65, 6.52], p<0.05 for each tissue except in liver: OR = 0.87, p=0.09, Figure 3—source data 1). Tissue-specific genes were also enriched among DiCo genes (Figure 3—source data 1, OR = 1.56, Fisher’s exact test p<10–16).

Reversal patterns among tissue-specific genes.
Age-related expression changes of the tissue-specific genes. In each panel (a–d), the upper-left subpanels show effect size (ES) calculated with the Cohen’s d formula, using expression levels of each gene among tissues (Materials and methods). The IQR (line range) and median (point) ES for each tissue are shown. The number of tissue-specific genes is indicated inside each subpanel. The lower-left subpanels show violin plots of the distribution of age-related expression change values (Materials and methods) among tissue-specific genes in development and in ageing. Each quadrant represents the plots for each tissue-specific gene group. The red and blue lines connect gene expression changes for the same genes in development and ageing. DU: percentage of down-up reversal genes among downregulated, tissue-specific genes in development; UD: percentage of up-down reversal genes among upregulated, tissue-specific genes in development. Tissue-specific genes are enriched among UD reversal genes except in the liver (Fisher’s exact test; ORcortex = 1.65, ORlung = 6.52, ORliver = 0.87, ORmuscle = 1.26, p<0.05 for each test except in liver).
-
Figure 3—source data 1
Effect sizes for determination of tissue-specific genes, enrichment of divergence-convergence (DiCo), and reversal genes within tissue-specific genes.
- https://cdn.elifesciences.org/articles/68048/elife-68048-fig3-data1-v4.xlsx
We then tested our initial prediction that the DiCo pattern is related to tissue-specific genes losing their expression in their native tissue and/or gaining expression in non-native tissues during ageing. We first tested this hypothesis by considering all tissue-specific genes. We found a positive odds ratio between loss of expression in native tissue and gain in other tissues during ageing (OR = 5.50, Fisher’s exact test p=2.1 × 10–129, Figure 4a). The same analysis conducted with only the DiCo genes yielded a much stronger association (OR = 74.81, Fisher’s exact test p=5.9 × 10–203, Figure 4b). This suggests that loss of tissue-specific expression is observed across the transcriptome, with a particularly strong association among DiCo genes. Figure 4c–f exemplifies the expression trajectories of genes chosen from each group defined in Figure 4b.

The loss of tissue-specific expression during ageing and functional enrichment of divergence-convergence (DiCo) genes.
(a) Mosaic plot showing the association between maximal expression change in native vs. non-native tissues (x-axis) vs. down- (cyan) or upregulation (pink) during ageing across all tissue-specific genes (n = 3766). The highly significant odds ratio indicates that genes native to a tissue tend to be downregulated during ageing in that native tissue if they show maximal expression change during ageing in that tissue. Conversely, if they show maximal expression change during ageing in non-native tissue, those genes are upregulated during ageing. Consequently, tissue-specific expression patterns established during development will tend to be lost during ageing. (b) The same as (a) but using only the tissue-specific genes that show the DiCo pattern (n = 1287). (c) Summary of the association tests for ‘direction of maximal expression change in native vs. non-native tissues’ across all datasets analysed. The y-axis shows log2-transformed odds ratio (OR) for each dataset (x-axis) – Schaum4: using the same four tissues as our dataset. Schaum8: using eight tissues. GTEx4: using the same four tissues as our dataset. GTEx10: using 10 tissues. ***False discovery rate (FDR)-corrected p-value<10–87. p-Values are given in Table 2. The four groups are annotated as GR1–4 and gene expression changes for each group in our dataset is exemplified in (d–g). (h) Trends of expression change with age of genes (x-axis) in categories enriched in DiCo (Gene Set Enrichment Analysis [GSEA]). Enriched categories (n = 184) are summarised into representatives (y-axis) using hierarchical clustering and Jaccard similarities (Materials and methods). Categories are ordered by the number of genes they contain from highest (bottom, n = 290) to lowest (top, n = 26). The most distant cluster with low within-cluster similarity in the hierarchical clustering (other Gene Ontology [GO]) was clustered separately and given in Figure 4—figure supplement 1.
-
Figure 4—source data 1
Gene Set Enrichment Analysis (GSEA) result of divergence-convergence (DiCo) genes, DiCo enrichment with tissue-specific expression loss, age-related expression change correlations, and convergence overlaps among datasets.
- https://cdn.elifesciences.org/articles/68048/elife-68048-fig4-data1-v4.xlsx
We then asked whether genes displaying the DiCo pattern may be related to specific functional pathways or share specific regulators. Using GO, we searched for functional enrichment among convergent genes during ageing using developmentally divergent genes as the background (Materials and methods). We found enrichment for 184 GO Biological Process (BP) categories for the DiCo pattern (KS test, FDR-corrected p-value<0.1, Figure 4—source data 1) and summarised enriched categories by clustering them based on the number of genes they share. We then studied the trends of gene expression changes with age (without a significance cutoff) in each representative category for each tissue (Materials and methods) (Figure 4h; we provide detailed clustering for the categories in ‘Other GO’ Figure 4—figure supplement 1). On average, energy metabolism, mitochondria, and tissue function-related categories, as well as immune response-related categories, exhibit DiCo-type expression changes over time and across tissues, where temporal changes in different tissues occur in opposite directions. Notably, for the majority of representative GO categories, the lung had the most distinct expression patterns in both periods (Figure 4h, Figure 4—figure supplement 1).
Contrary to the functional enrichment results, we did not find any specific regulators (miRNA or transcription factors) associated with DiCo using the same background as above (at 235 tests for miRNA and 158 tests for TF, FDR-corrected p-value>0.1 for both tests) (Materials and methods), which suggests that DiCo pattern may not be driven by a limited number of specific regulators, but may instead be a transcriptome-wide phenomenon.
Additional mouse and human datasets confirm the association between loss of tissue specificity and inter-tissue convergence during ageing
We investigated inter-tissue convergence during ageing in three additional datasets where multiple tissue samples were available for the same individuals (Table 1). We conducted the analysis using a subset of the same four tissues in our dataset and also larger sets when additional samples were available. Age-related expression changes showed small to moderate correlations among all datasets analysed, with our dataset being most similar to the mouse dataset from Jonker et al., while the GTEx human dataset was the most distinct (Figure 4—figure supplement 2a).
Dataset characteristics summarising species, tissues, number of individuals, age range, sex, and platform used for measuring gene expression values.
Dataset | Species | Tissues | N | Age range | Sex | Method |
---|---|---|---|---|---|---|
Izgi et al.,four tissues | Mice | Brain, lung, liver, muscle | 8 | 3–30 months | Male | RNA-seq |
Jonker et al.,five tissues | Mice | Brain, lung, liver, kidney, spleen | 18 | 3–30 months | Female | Microarray |
Schaum et al.,four tissues | Mice | Brain, lung, liver, muscle | 37 | 3–27 months | Male (n = 26)Female (n = 11) | RNA-seq |
Schaum et al., eight tissues | Mice | Brain, lung, liver, muscle, subcutaneous fat, kidney, heart, spleen | 26 | 3–27 months | Male (n = 20)Female (n = 6) | RNA-seq |
GTEx,four tissues | Humans | Brain, lung, liver, muscle | 47 | 20–75 years | Male (n = 36)Female (n = 11) | RNA-seq |
GTEx,10 tissues | Humans | Adipose, tibial artery, cerebellum, lung, skeletal muscle, tibial nerve, pituitary, sun-exposed skin, thyroid, whole blood | 35 | 20–75 years | Male (n = 27)Female (n = 8) | RNA-seq |
First, using the Jonker et al. dataset (Jonker et al., 2013) comprising five tissues (Table 1), we observed transcriptome-wide convergence during ageing with a significant decline in mean Euclidean distance between PCs (ρ = –0.57, p=0.014, Figure 2—figure supplement 7a–c) and a strong decrease in mean CoV during ageing (ρ = –0.48, p=0.044, Figure 2—figure supplement 7d). Moreover, we found that 7/10 tissue pairs showed increased pairwise tissue correlations during ageing, although none of them was significant after multiple testing correction (Figure 2—figure supplement 7f). 66% of the genes with a significant change in CoV were convergent comparable to our dataset showing 68% convergence among significant changes. We also tested the association between the loss of identity and convergence pattern by repeating the same analysis as in Figure 4b with the Jonker et al. dataset using only the convergent genes in ageing as we lack developmental period. We again found strong association, consistent with convergent genes losing expression in their native tissue and gaining in other tissues during ageing (OR = 7.52, p<10–16, Figure 4c). The results are summarised in Table 2.
Result summary of the all datasets analysed.
First column shows the names of datasets analysed. Numbers in parentheses show the sample sizes. ‘Among all genes’ column refers to the analyses performed using all genes relevant to those analyses (subcolumns) without a significance cutoff. ‘Within significant CoV changes’: genes show significant CoV change with age with FDR-corrected p-value<0.1. In the ‘DiCo vs. tissue specificity (Di- as background)’ column, divergent genes in development (Di-) were chosen as background. ‘Co vs. expression change in native tissue association (Figure 4b)’ column refers to the analysis performed in Figure 4b for each dataset, and the results are presented in Figure 4c. The association tests were performed among convergent genes in ageing except in our dataset, which was performed with DiCo genes. Significant test results are indicated with italic fonts. Bold fonts show the results that support convergence or tissue-specific expression loss in ageing whether as a significant result or as a trend. Unsupportive test results and inapplicable tests are written in normal font.
Among all genes | Within significant CoV changes | ||||||
---|---|---|---|---|---|---|---|
PCA change in Euclidean distance | Mean CoV change | Median CoV change | Pairwise tissue correlations | DiCo vs. tissue specificity(Di- as background) | Co vs. expression change in native tissue association (Figure 4b) | Co vs. Di proportions | |
Izgi2022 | ρ = −0.87, p=0.0026 | ρ = −0.5, p=0.2 | rho = −0.48, P = 0.23 | 4/6 positive, none significant* | OR = 1.56, p=1.3 × 10–18 | OR = 74.81,p=5.9 × 10–203(among 1287 DiCo genes) | 68% convergence(among 62 significant genes*) |
Jonker 2013five tissues, two different than ours (n = 18) | ρ=−0.57, p=0.014 | ρ=−0.48, p=0.044 | ρ = −0.03, p=0.91 | 7/10 positive, none significant* | Di- background missing | OR = 7.52, p=6.5 × 10–109(among 2967 convergent genes) | 66% convergence(among 1735 significant genes*) |
Schaum 2020,same four tissues (n = 37) | ρ = 0.13, p=0.46 | ρ = 0.25, p=0.14 | ρ = 0.13, p=0.43 | 4/6 positive,two significant* | OR = 1.33, p=1.07 × 10–8 | OR = 58.03, p=1.5 × 10–197(among 2124 convergent genes) | 53% convergence(among 319 significant genes*) |
Schaum 2020,eight tissues (n = 26) | ρ = 0.1, p=0.62 | ρ = 0.16, p=0.43 | ρ = 0.04, p=0.86 | 16/28 positive,five significant* | Di- background missing | OR = 84.2, p=9.7 × 10–96(among 2380 convergent genes) | 54% convergence(among 244 significant genes*) |
GTEx,same four tissues | ρ = −0.23, p=0.12 | ρ = −0.12, p=0.42 | ρ = −0.18, p=0.23 | 5/6 positive, none significant* | Di- background missing | OR = 7.21, p=7 × 10–87(among 2407 convergent genes) | (no significant CoV changes) |
GTEx,10 tissues | ρ = −0.26, p=0.13 | ρ = −0.14, p=0.44 | ρ = −0.3, p=0.08 | 29/45 positive, none significant* | Di- background missing | OR = 13.01, p=5.7 × 10–114(among 2195 convergent genes) | (all three significant genes were convergent) |
-
ρ = Spearman’s correlation coefficient; OR = odds ratio; FDR = false discovery rate; CoV = coefficient of variation; DiCo = divergence-convergence; PCA = principal components analysis.
-
* FDR-corrected p-value<0.1.
Next, we used another mouse dataset by Schaum et al., 2020 (Table 1). Repeating the analysis on the same four tissues and also a larger set of eight tissues, we did not find support for transcriptome-wide convergence (Table 2, Figure 2—figure supplements 17 and 19). In the 4-tissue comparison, 4/6 tissue pairs, and in the 8-tissue comparison only 16/28 tissue pairs showed positive correlations, supporting the inter-tissue convergence during ageing (Figure 2—figure supplements 18c and 20c). Interestingly, 75% of the negative correlations involved muscle and subcutaneous fat. Convergence ratios among genes showing significant change in CoV (FDR-corrected p-value<0.1) were marginally above 50%. Although we did not observe widespread convergence during ageing in this dataset, we still detected strong associations between convergence in ageing and tissue specificity (OR4-tissue = 1.33, p=1.08 × 10–8) and identity loss (OR4-tissue = 58.3, p<10–16; OR8-tissue = 84.2, p < 10–16) (Figure 4c).
Lastly, we used the GTEx dataset to investigate inter-tissue convergence during ageing in humans. Calculating the change in mean Euclidean distance based on PCA and mean CoV values, we found a non-significant tendency towards convergence across the whole transcriptome in the same 4 tissues and a larger set of 10 tissues (Table 2, Figure 2—figure supplements 8 and 10). We also performed the four-tissue comparison with female and male individuals separately and observed relatively strong inter-tissue convergence among ageing females (ρfemale = –0.58, pfemale = 0.059) but less in males (ρmale = –0.052, pmale = 0.77) which lack individuals at the youngest and oldest age groups (Figure 2—figure supplement 16). Moreover, 5/6 and 29/45 tissue pairs showed increased correlation with age in 4-tissue and 10-tissue comparisons, consistent with inter-tissue convergence during ageing (Figure 2—figure supplements 9 and 11). Notably, 8 of 16 negative correlations in the 10-tissue comparison involved the skin tissue (Figure 2—figure supplement 11c). We also studied significant changes in CoV per gene, but found no significant gene in the 4-tissue comparison and only three genes in the 10-tissue comparison, all of which were convergent. Finally, we tested the association between the loss of expression in native tissue and gain in other tissues during ageing among convergent genes, confirming the association with the tissue identity (Figure 4c, Table 2).
Overall, analysis of these three additional datasets indicates that inter-tissue convergence during ageing is commonly, but not always, observed at the transcriptome-wide level in mice and in humans. Notably, the transcriptome-wide trend was weak in the Jonker et al. and GTEx datasets and not evident in the Schaum et al. dataset. The association between the loss of identity and convergence, on the other hand, was strong across all datasets (Table 2).
We further asked whether convergent gene sets identified in different datasets overlap. 11 of 15 comparisons were significant, but the effect sizes (ESs) were small (Figure 4—figure supplement 2b). We reason that the low overlap across datasets might reflect that transcriptome-wide convergence was weak and that we lack the developmental samples for the external datasets, that is, we can only compare convergence during ageing but not the DiCo pattern. Noteworthy, only 62% of convergent genes in ageing are divergent during development in our dataset, and low overlap between convergence does not rule out overlap across DiCo genes.
These results suggest that inter-tissue convergence in ageing may be a weak but widespread phenomenon and associated with the loss of tissue identity. Overall, while mouse and human tissues display divergence in development (Figures 1a and 2a, Cardoso-Moreira et al., 2019), this appears to be followed by a trend towards inter-tissue convergence in ageing (Figure 2a, Figure 2—figure supplements 1–20) and could be linked to loss of tissue identity.
Changes in cellular composition and cell-autonomous expression can both explain the DiCo pattern
Ageing-related transcriptome changes observed using bulk tissue samples may be explained by temporal changes in cell-type proportions within tissues, by cell-autonomous expression changes, or both. To explore whether the observed inter-tissue DiCo patterns may be attributed to changes in cell-type proportions, we used published data from a mouse single-cell RNA-sequencing experiment (Tabula Muris Consortium, 2020). For each of the four tissues in our original experiment, we collected cell-type-specific expression profiles from 3-month-old young adult mice in the Tabula Muris Senis dataset. We deconvoluted bulk tissue expression profiles in our mouse dataset using the corresponding tissue’s cell-type-specific expression profiles by regression analysis (Materials and methods) and studied the relative contributions of each cell type to tissue transcriptomes and how these change with age. The analysis was performed with three gene sets; all genes (n = [12,492, 12,849]), DiCo (n = [4007, 4106]), and non-DiCo genes (n = [8485, 8743]). Studying these deconvolution patterns, we observed a weak but consistent trend involving the most common cell types in different tissues. For instance, analysing DiCo genes in the liver and lung, we found that the most common cell type’s contribution (hepatocyte in the liver, and bronchial smooth muscle cell in the lung) tends to increase during development (Spearman’s correlation coefficient ρliver = 0.95, ρlung = 0.81, nominal p<0.05). This contribution then decreases during ageing (ρliver = –0.77, ρlung = –0.86, nominal p<0.05) (Figure 5a, Figure 5—figure supplement 1). This pattern was also observed in muscle and cortex, albeit not significantly (Figure 5a, Figure 5—figure supplement 1). These changes most likely reflect shifts in cellular composition, some of which were demonstrated directly in mice using in situ RNA staining (Schaum et al., 2020). Repeating the analysis with non-DiCo genes resulted in highly similar patterns considering the most common cell types in tissues, except in muscle ageing in which the age-related decrease was significantly higher with DiCo genes than the non-DiCo genes (permutation test with resampling all genes, pskeletal-muscle-satellite-cell = 0.04) (Figure 5a, Figure 5—figure supplement 1, Figure 5—figure supplements 2–5). These results indicate that the observed cellular composition changes may partly explain DiCo, although the influence of composition changes is not exclusive to genes displaying the DiCo pattern.

Contribution of tissue composition and cell-autonomous changes to the divergence-convergence (DiCo) pattern.
(a) Deconvolution analysis of our mouse dataset with the 3-month-old scRNA-seq data (Tabula Muris Senis) using DiCo (n = [4007, 4106]) and non-DiCo (n = [8485, 8743]) genes. Only the cell types with the highest relative contributions to each tissue bulk transcriptome are shown (cell-type names are given within each plot). Contributions of all cell types to bulk tissue transcriptomes are shown in Figure 5—figure supplement 1. (b) Distribution of correlations for minimally (left) and maximally (right) correlated cell-type pairs among tissues (n = 54 pairs). For each cell type of a given tissue, one minimally (or maximally) correlated cell type is chosen from other tissues among the 3-month age group of the Tabula Muris Senis dataset (density plots with solid line edges). Dashed lines show the correlation distributions in 24-month age of minimally or maximally correlated cell-type pairs identified in the 3-month age group. Bottom panel shows age-related expression similarity (ρ) changes of minimally (left) and maximally (right) correlated cell-type pairs. The correlation between age and tissue similarity (expression correlations) was calculated for each pair of cell types identified in the 3-month age group. All pairwise cell-type correlations and their age-related changes are given in Figure 5—source data 1.
-
Figure 5—source data 1
Cell-type proportion estimation and cell-autonomous changes using the Tabula Muris Senis dataset.
- https://cdn.elifesciences.org/articles/68048/elife-68048-fig5-data1-v4.xlsx
Next, we investigated the possible role of cell-autonomous changes in the DiCo pattern. Cell-autonomous changes could contribute to inter-tissue convergence during ageing in two ways. First, expression profiles of similar cell types shared across different tissues, such as immune cells, might converge with age. Another possible scenario, consistent with the notion of age-related cellular identity loss, is that the expression profiles of unrelated cell types, such as tissue-specific cell types in different tissues, converge with age. To test these scenarios, we first ordered the pairwise correlations between cell types in different tissues at the 3-month age group to determine the most similar and dissimilar cell types across tissues (Materials and methods). Then, we studied how these similarities (i.e. pairwise correlations) change with age (Figure 5b). Intriguingly, we found that pairs of similar cell types (i.e. those with the highest correlations) among tissues tend to become less similar with age (36/54 [67%] of pairwise comparisons, Figure 5—source data 1). On the contrary, the most distinct cell types (i.e. those with the lowest correlations) among tissues become more similar with age (45/54 [83%], Figure 5—source data 1). Repeating the analysis considering DiCo genes only yielded a similar trend (30/54 [56%] decrease in correlation among the most similar cell types, permutation test with resampling non-DiCo genes, p>0.1; and 47/54 [87%] increase in correlation among the most distinct cell types, permutation test, p>0.1). These trends are consistent with age-related cellular identity loss, and they suggest that cell-autonomous changes may also contribute to inter-tissue convergence during ageing, although further data and analyses would be needed to fully establish their validity.
Finally, we tested the possibility of intra-tissue convergence of cell types in the Tabula Muris Senis dataset by calculating expression variation among cell types using the CoV measure for each individual. However, we did not observe a consistent trend of increasing similarity among cell types within tissues from 3-month-old to 24-month-old mice (Figure 5—figure supplement 6).
Discussion
Our findings confirm a number of ageing-associated phenomena identified earlier, while also revealing new patterns. First, we report parallel age-related expression changes among the four tissues studied, during development, as well as in ageing. The inter-tissue correlation distributions were modest and also comparable between development and ageing (Figure 1c). This last point may appear surprising at first glance, given the stochastic nature of ageing relative to development (Bahar et al., 2006; Martinez-Jimenez et al., 2017; Angelidis et al., 2019; Somel et al., 2006; Feser et al., 2010; Kim et al., 1996; Enge et al., 2017), and also given earlier observations that developmental expression changes tend to be evolutionarily conserved, while ageing-related changes much less so (Zahn et al., 2007; Somel et al., 2010). At the same time, when we consider that tissues diverge during development, and also that ageing is characterised by parallel expression changes among tissues related to damage response, inflammation, and reduced energy metabolism (Zahn et al., 2007; Yang et al., 2015), similar magnitudes of correlations during development and ageing may be expected.
Second, we verify the generality of the reversal pattern, that is, up-down or down-up expression change patterns across the lifetime, among distinct mouse tissues that include both highly mitotic (lung and liver) and less mitotic ones (skeletal muscle and cortex). Consistent with earlier observations in fewer tissues (Anisimova et al., 2020; Dönertaş et al., 2017), we find that about half the expressed genes display reversal in all cases studied. Importantly, expression reversal is not ubiquitous across all genes and our findings do not necessarily contradict the hyperfunction theory. Instead, we suggest that reversal is a common phenomenon that influences a notable fraction of the transcriptome and is a likely contributor to mammalian ageing.
Two observations here are notable. One is that reversal-displaying genes, especially those displaying the up-down pattern in each tissue, can be associated with tissue-specialisation-related pathways (e.g. morphogenesis) and tissue-specific functions (e.g. synaptic activity). The second observation is the lack of significant overlap among reversal genes among tissues. We thus hypothesised that reversals might be reflecting tissue specialisation during development (hence lack of overlap among tissues) and loss of specialisation during ageing. These processes could manifest themselves as inter-tissue divergence and convergence patterns over lifetime. We indeed observed that the up-down reversal pattern is enriched in tissue-specific genes, except in the liver. Studying inter-tissue similarity across mouse lifespan, we further found that the four tissues’ transcriptomes diverged during postnatal development, and we further detected a trend towards inter-tissue convergence during ageing. We then further investigated this phenomenon through different approaches: (1) by studying overall trends using PCA, (2) by analysing transcriptome-wide trends of inter-tissue CoV without considering gene-wise significance cutoffs, (3) by focusing on genes with significant age-related changes in inter-tissue CoV, (4) by studying age-related changes in pairwise tissue correlations, (5) by analysing different cell types using scRNA-seq data, and (6) by repeating the same analysis using independent mouse and human ageing datasets. The patterns we found were mostly consistent with inter-tissue convergence, but the majority of transcriptome-wide results were associated with low ESs, and some were not statistically significant. Importantly, all significant results suggested convergence during ageing. We therefore conclude that (1) developmental inter-tissue divergence does not continue into ageing and (2) convergence during ageing may be common although possibly not ubiquitous.
The weakness of the inter-tissue convergence signal per dataset and the limited overlap between convergent gene sets among datasets could have multiple reasons. These include the low signal-to-noise ratios characterising ageing-related expression patterns, the lack of old-age individuals in our mouse dataset (>3-year-old mice) and the GTEx dataset (>90-year-old humans), limited overlap of tissues between our mouse dataset (cortex, liver, lung, and muscle) and the Jonker et al. dataset (cortex, liver, lung, spleen, kidney), as well as differences in ageing patterns between species or between sexes. Further research involving larger sample sizes and diverse species is needed to confirm the generalisability of the observations.
Finally, we report a number of interesting observations on DiCo. We determine that tissue-specific genes tend to be downregulated in the tissues that they belong to during ageing, while non-tissue-specific genes are upregulated, which was confirmed by all external datasets (Figure 4c). Second, using deconvolution, we infer that cell types most common in a tissue (e.g. hepatocytes in the liver) tend to increase in frequency during development, but then decrease in frequency during ageing, as also shown recently using immunohistochemistry in a number of mouse tissues (Schaum et al., 2020). Accordingly, the DiCo phenomenon may at least partly be explained by shifts in cellular composition. This is intriguing as both highly mitotic and low mitotic tissues share this trend, indicating that an explanation based on stem cell exhaustion may not be applicable here. Third, we find increased expression similarity between distinct cell types in different tissues during ageing, but decreased similarity between similar cell types. Cell-autonomous expression changes, therefore, likely also contribute to the DiCo phenomenon. We note that higher expression variability among cells at old age (Hernando-Herraez et al., 2019; Enge et al., 2017) could also lead to inter-tissue convergence during ageing. A fourth interesting observation was the absence of significant enrichment for specific transcription factor or microRNA targets among DiCo genes. This result may not be surprising if inter-tissue convergence is mostly driven by stochastic damage accumulation, such as loss of epigenetic marks. It is also possible that instead of specific regulators their interaction and cooperativity are associated with the DiCo. Future experimental studies could test both mechanistic aspects and functional link to tissue specificity.
We also note two major limitations of our study. One is related to the fact that our dataset represents bulk tissue samples, which may suffer from infiltration of foreign cell types into tissues. Indeed, one of the external datasets, Schaum et al., included samples from perfused mice (Schaum et al., 2020), and we did not find support for the transcriptome-wide convergence during ageing, even though the association between tissue identity loss and convergence was also evident. The scRNA-seq dataset we analysed further suggested that DiCo is associated with tissue-specific genes and not immune- or blood-related categories, but we still cannot rule out possible infiltration artefacts that may affect our results. A second limitation is related to ageing being highly sex-dimorphic in mammals (Yuan et al., 2012; Sampathkumar et al., 2020). Hence, in-depth analysis of sex specificity of the DiCo pattern could be relevant. Our mouse dataset included only male mice, while that of Jonker et al. was female-only. The fact that both revealed DiCo patterns suggest DiCo is not particular to one sex, but there could still exist sex-specific effects. In fact, when we analysed DiCo among human male and female individuals in the GTEx dataset separately, we observed slightly stronger inter-tissue convergence among ageing females than in males, although the GTEx male samples have also a drastically narrower age range (Figure 2—figure supplement 16). Accordingly, the prevalence of DiCo among humans and sexes waits to be determined.
Despite the open questions that remain, our results consistently support a model where ageing mammals suffer from loss of specialisation at the tissue level, and possibly also at the cellular level, which are observed as expression reversals and the newly discovered DiCo phenomenon we report here.
Materials and methods
Sample collection
Request a detailed protocolWe collected bulk tissue samples from 16 male C57BL/6J mice. The samples were snap frozen in liquid nitrogen and stored at –80°C. No perfusion was applied. The mice were of different ages covering the whole lifespan of Mus musculus, comprising both postnatal development and ageing periods. The samples included four different tissues; cerebral cortex, liver, lung, and skeletal muscle. One 904-day-old mouse had no cortex tissue sample and was thus excluded from the analysis. As a result, we generated 63 RNA-seq libraries in total.
Separation of development and ageing periods
Request a detailed protocolIn order to compare gene expression changes during postnatal development and ageing, we studied the samples before sexual maturation (covering 2–61 days of age, n = 7) as the postnatal development period, and samples covering 93–904 days (n = 9 in all tissues except in cortex where we had n = 8) as the ageing period.
RNA-seq library preparation
Request a detailed protocolRNA sequencing was performed as previously described (Liu et al., 2016) with slight modifications. Briefly, total RNA was extracted using the Trizol reagent (Invitrogen) from frozen tissue samples. For sequencing library construction, we randomised all samples to avoid batch effects and used the TruSeq RNA Sample Preparation Kit (Illumina) according to the manufacturer’s instruction. Libraries were then sequenced on the Illumina HiSeq 4000 system in three lanes within one flow cell using the 150 bp paired-end module.
RNA-seq data preprocessing
Request a detailed protocolThe quality assessment of the raw RNA-seq data was performed using FastQC v.0.11.5 (Andrews, 2010). Adapters were removed using Trimmomatic v.0.36 (Bolger et al., 2014). The low-quality reads were filtered using the parameters: ‘PE ILLUMINACLIP: TruSeq3-PE-2.fa:2:30:1:0:8:true, SLIDINGWINDOW:4:15, MINLEN:25’. The remaining high-quality reads were aligned to the mouse reference genome GRCm38 using STAR-2.5.3 (Dobin et al., 2013) with parameters: ‘--sjdbOverhang 99 --outSAMattrIHstart 0 --outSAMstrandfield intronMotif --sjdbGTFfile GRCm38.gtf’. The percentage of uniquely mapped reads in libraries ranged from 80% to 93%. We used cufflinks v.2.2.1 (Trapnell et al., 2010) to generate read counts for uniquely aligned reads (samtools ‘-q 255’ filter) and calculated expression levels as fragment per kilobase million (FPKM). In total, we quantified expression levels for 51,608 genes in the GRCm38.gtf file. We identified 50 duplicated genes with 1 > FPKM value assigned, and the sum of their FPKM values was used.
All the remaining analyses were performed in R v.4.1. We restricted the whole analysis to only protein-coding genes obtained by the ‘biotype’ feature of the biomaRt library v.2.48.2 (Durinck et al., 2009). We also excluded genes which were not detected (0 FPKM) in 25% or more of the samples (at least 15 of 63), resulting in 15,063 protein-coding genes in total. As FPKM normalisation does not effectively account for cross-library variability, we additionally performed two normalisation approaches:
Quantile normalisation: Using all the samples together (n = 63, regardless of their age or tissue), FPKM values were log2 transformed (after adding 1) and quantile normalised with ‘normalize.quantiles’ function from ‘preprocessCore’ library v.1.54 (Bolstad, 2020). This approach equalises the distributions of different libraries. The assumption is that any large-scale differences in expression-level distributions reflect technical factors.
VST: To assess the robustness of quantile normalisation on downstream analysis, we additionally implemented this approach, which ensures homoscedasticity, that is, variances of expression levels are independent of the mean (Anders and Huber, 2010). Uniquely aligned reads obtained from the STAR alignment were used to calculate read counts by HTSeq v.0.13.5 (Anders et al., 2015) with parameters: ‘--format= bam --order= pos --stranded= no --type= exon --mode= union --nonunique= none’. Read counts were then imported into R using the ‘DESeqDataSetFromHTSeqCount’ function in DESeq2 v.1.32.0 package (Love et al., 2014). The same filtration steps were applied as above, resulting in 14,973 protein-coding genes in total. Normalisation was performed with the ‘vst’ function and ‘blinded = T’ option in the DESeq2 package. The VST-normalised expression matrix was used to reproduce results of Figures 1 and 2, which are given in Figure 1—figure supplements 10 and 11 and Figure 2—figure supplement 14.
Principal components analysis
Request a detailed protocolWe studied the main sources of variation in the whole dataset using PCA on the scaled expression matrix with ‘prcomp’ function in the R base. The first four components, PC1–PC4, explained 31, 20, 17, and 8% of the total variance. We observed a clear separation of tissues in PC1 and PC2 and a strong age effect in PC4. To statistically confirm tissue differences, we performed ANOVA on individual PC scores with tissue as explanatory variable; this was run on each of the first four PCs (PC1–PC4) separately. The magnitude of the age effect on PCA was measured with Spearman’s correlation test between individual age and each individual’s PC score separately in each tissue. PCA was also repeated for development and ageing periods separately (Figure 1—figure supplement 3). We further calculated Euclidean distance in pairwise manner among tissues of each individual in PC1–4 space constructed in three different ways: (1) using all the samples together, (2) using only the developmental samples, and (3) using only the ageing samples. Then, we tested the effect of age on mean Euclidean distance among tissues using the Spearman’s correlation test. To study only the age effect on PC scores without the tissue effect, we performed the following: (1) we removed the tissue-specific effects from the data by scaling the expression levels of each gene to mean = 0 and sd = 1 in each tissue separately, (2) we combined the four scaled expression matrices, and (3) we conducted PCA on the combined dataset (Figure 1—figure supplement 2).
Age-related gene expression change
Request a detailed protocolTo identify genes showing age-related expression change in each tissue, we used Spearman’s correlation coefficient between individual age and expression level separately for development and ageing periods. To capture potential nonlinear but monotonic changes in expression, we chose the non-parametric two-sided Spearman’s correlation test for both periods. We have used two-sided tests for all statistical tests throughout the article except the permutation tests. Significance of age-related genes was assessed with the FDR (FDR-corrected p-value<0.1 cutoff, calculated with the Benjamini–Hochberg [BH] procedure; Benjamini and Hochberg, 1995) using the ‘p.adjust’ function in the R base library. Throughout the article, BH procedure with 0.1 cutoff was used for multiple test corrections of all statistical tests.
Functional associations
Request a detailed protocolWe tested the functional associations of age-related gene expression change in separate tissues for each period (development and ageing) separately, employing the gene set over-representation analysis (GORA) procedure with GO (Ashburner et al., 2000) BP categories using the ‘topGO’ package v.2.44 (Alexa and Rahnenfuhrer, 2019). We applied the ‘classical’ algorithm and performed Fisher’s exact test on categories that satisfy the criteria of a minimum 10 and maximum 500 number of genes. We used the whole set of expressed genes (n = 15,063) as the background. p-Values were corrected for multiple testing using the BH procedure. Categories with FDR-corrected p-value<0.1 were considered as significant.
Correlation between age-related gene expression changes in different tissues
Request a detailed protocolWe calculated Spearman’s correlation coefficients between age-related gene expression change ρgene values (i.e. correlation between gene expression levels and age) calculated per gene in each tissue pair (Figure 1c). In order to test the statistical significance of the correlations, we used a permutation scheme as the expression levels across tissues are not independent but belong to the same mice. In order to account for the dependence, the individual ages were permuted in each round, but the permuted values were kept constant across tissues (similar to permutation tests applied in Dönertaş et al., 2017; Işıldak et al., 2020; Dönertaş et al., 2018). Specifically, we performed 1000 permutation rounds. In each round, we randomised the individual ages using the ‘sample’ function in R, while keeping the permuted age labels constant for individuals across tissues. We calculated the age-related gene expression changes with permuted ages in development and ageing datasets separately, thus simulating the null distribution with no age effect in each period. We then calculated the Spearman’s correlation coefficient between the age-related expression levels from the permutations across tissues and assigned the p-value by calculating the proportion of permuted calculations with a more extreme correlation. All permutation tests in the article were performed as one-sided tests. The estimated false-positive proportion (eFPP; proportion of false positives among all true non-significant results (true negatives + false positives)) was calculated as the median value of expected values divided by the observed value (Figure 1—source data 1).
Shared gene expression changes across tissues
Request a detailed protocolWe summarised the number of shared age-related genes among tissues for up- and downregulated genes separately using FDR-corrected p-value<0.1 (Figure 1—figure supplement 5). The development and ageing datasets were tested separately. For each gene, we counted the number of tissues with the same direction of expression change with age. We calculated this overlap statistic among tissues (1) using genes with FDR-corrected p-value<0.1 and (2) with all genes without using any significance cutoff (Figure 1e, Figure 1—figure supplement 4).
Permutation test
Request a detailed protocolWe again used a permutation scheme to assess the significance of shared age-related genes to account for the dependence among tissues. We tested the significance of shared up- and downregulated genes, selected with or without an FDR cutoff, in development and in ageing periods separately. We used the age-related expression change values (ρ′gene) calculated by permuting individual ages, 1000 times. To test the significance of the overlap of significantly up- or downregulated genes (FDR-corrected p-value<0.1) among tissues, we used the following procedure: (1) for each permutation round, we ranked the ρ′gene values for each tissue in each period separately. (2) We chose the highest Nu (to test the upregulation) or lowest Nd (to test the downregulation) number of genes, where Nu and Nd are the number of significantly up- or downregulated genes, respectively, in a given tissue (FDR-corrected p-value<0.1). (3) For each permutation round, we calculated the number of overlaps across tissues using the chosen gene sets, that is, the number of tissues with the same direction of expression changes with age for those genes. Doing this for 1000 permutation results yielded a null distribution representing the expected overlaps if there were no age effects. (4) We calculated the p-value as the proportion of 1000 permutations where the number of overlaps was higher than the observed value. The eFPP was calculated as the median number of overlaps in permutations divided by the observed value.
Likewise, to test the significance of the overlap of shared up- and downregulated genes selected without FDR cutoff, we used the same permutation scheme explained above, but this time using all the age-related expression changes created using permutations (ρ′gene), without applying a significance cutoff for any tissue, and calculating the overlap across tissues in the same way.
Functional associations
Request a detailed protocolWe tested the functional associations of shared expression change trends among tissues in each period separately following the GORA procedure using the same criteria and algorithms explained in the previous section. To test shared upregulated (n = 45) or downregulated genes (n = 138) in development, we chose all significant age-related genes across tissues (n = 10,305) in the development period as background. Since we could not identify any shared ageing-related genes across tissues (Figure 1—figure supplement 5), we did not perform a functional test for the ageing period.
Analysis of gene expression reversals
Request a detailed protocolWe compared the direction of gene expression change during development and during ageing to identify reversal genes in each tissue separately. Genes showing upregulation (positive correlation with age) in development and downregulation (negative correlation with age) in ageing were assigned as up-down (UD) reversal genes, while the genes with the opposite trend (downregulation in development and upregulation in ageing) were assigned as down-up (DU) reversal genes. Without using any significance level for expression-age correlation values, we calculated the proportion of genes showing reversal by keeping the expression change direction in development the same, that is, UD% = UD/(UU + UD) and DU% = DU/(DD + DU).
Permutation test
Request a detailed protocolTo test the significance of reversal proportions, we kept the developmental changes constant and randomly permuted the individual ages only in the ageing period (as described earlier). Among developmental upregulated genes, we calculated the UD% in each permutation, simulating a null distribution for UD reversal. We applied the same principle for the DU genes. Thus, we created a null distribution with the expected reversal ratios and tested the significance of observed values for each tissue separately (Figure 1—figure supplement 8).
Functional associations
Request a detailed protocolWe used the GORA procedure as described earlier to test functional associations of reversal genes in each tissue but kept the developmental changes constant in the background. More specifically, we tested the functional enrichment of UD reversal genes against UU genes, and DU genes against DD genes. We thereby specifically test the functions associated with the reversal pattern, but not development-associated functions.
Overlap of reversal genes: Permutation test
Request a detailed protocolWe tested the significance of overlap using the same permutation scheme described above. Specifically, among developmental up- (or down-) regulated genes shared among tissues, we constructed null distributions by calculating the ratio of UD vs. UD + UU (or DU vs. DU + DD) genes shared among tissues, identified in 1000 random permutations of individual ages only in the ageing period (Figure 1—figure supplement 9). The number of shared upregulated genes was nup = 2255 (one gene excluded since it has constant expression in one tissue in ageing period), and the number of shared downregulated genes was ndown = 2209.
Tissue convergence and divergence calculations using CoV
For each individual mouse, for each gene (n = 15,063), we calculated the inter-tissue CoV estimate using normalised expression levels from the four tissues, dividing the standard deviation by the mean. We studied inter-tissue expression-variation change with age in development and ageing periods separately using two approaches: (1) using the change in mean or median CoV across genes and (2) studying significant CoV patterns at the single-gene level.
Mean/median CoV across all genes
Request a detailed protocolWe assessed transcriptome-wide variation among the tissues of each individual mouse by calculating the mean (or median) CoV of genes and then performing the Spearman’s correlation test between mean-CoV (or median-CoV) and individual age.
CoV at the single-gene level
Request a detailed protocolIn the second approach, we tested the correlation between the CoV value of a gene and individual age for each commonly expressed gene using the Spearman’s correlation test. p-Values were corrected for multiple testing using the ‘BH’ procedure. We used FDR-corrected p-value<0.1 as cutoff. The genes showing positive correlation between CoV and age were called ‘divergent,’ and the ones showing negative correlation were called ‘convergent’ (Figure 2b). Genes that display a divergent pattern during development and convergent pattern in ageing (without using a significance level) were called divergent-convergent (DiCo) genes (n = 4802).
Permutation test
Request a detailed protocolTo test the significance of DiCo genes (n = 4802), we kept the developmental divergent genes constant (n = 9058, without a significance cutoff) and randomly permuted the individual ages only in the ageing period (as described earlier). Among developmental divergent genes, we calculated the DiCo% for each permutation, simulating a null distribution for the DiCo pattern (Figure 2—figure supplement 12).
Clustering of DiCo genes
Request a detailed protocolWe used the k-means algorithm to cluster DiCo genes according to their CoV or expression changes with age separately (Figure 2—figure supplements 2–3). To find the optimum number of clusters for both procedures, we applied gap statistics using the ‘clusGap’ function in the ‘cluster’ package v.2.1.2 with 500 simulations (Tibshirani et al., 2001). We used the ‘kmeans’ function in base R with ‘iter.max = 20’ and ‘nstart = 50’ parameters to cluster CoV values or expression levels which were standardised to mean = 1 and sd = 0 across genes.
Effect of gene expression trajectories on DiCo
Request a detailed protocolTo identify potential non-monotonic expression changes with age that could not be detected with the Spearman’s correlation coefficient, we clustered all expressed genes (n = 15,063) in each tissue separately using the k-means algorithm following the same steps explained above (Figure 1—figure supplements 12–15). The list of genes belonging to each cluster is given in Figure 2—source data 1. Then, for each cluster, separately in each tissue, we performed a Fisher’s exact test to assess if a particular cluster pattern is enriched or depleted in DiCo genes relative to all other expressed genes (the background).
Functional association analysis
Request a detailed protocolTo test the functional associations of the genes showing the DiCo pattern among tissues, we performed GSEA using GO BPs. We retrieved developmental divergent genes (with ρCoV-age > 0, n = 9058) and multiplied these ρCoV-age values with the ones calculated in the ageing period. Therefore, the genes with a negative value represent a DiCo pattern, while the ones with a positive value represent a DiDi pattern. We then ranked the genes according to the calculated product values and sought enrichment for the upper and lower tail of the distribution using the KS test implemented in the ‘clusterProfiler’ package v.4.0.0 (Yu et al., 2012). The ‘gseGO’ function was used with parameters: ‘nPerm = 1000, minGSSize = 10, maxGSSize = 500 and pValueCutoff = 1’. Therefore, the enriched categories for the genes in the lower tail of the distribution would represent DiCo enrichment. Categories with FDR-corrected p-value<0.1 were considered as significant.
We summarised DiCo-enriched categories into representative ones following Dönertaş et al., 2021 and used hierarchical clustering on gene similarities among categories. The tree was cut into 25 clusters. For each cluster, we chose as representative the category that has the highest mean Jaccard similarity to the other categories in the same cluster. Then, we calculated the mean age-expression correlation across all the genes in each representative category in each tissue and in each period. As the unrelated categories, those with the low within-cluster similarity were grouped into one cluster, we denoted them ‘Other GO,’ and performed the same clustering steps to further summarise them (Figure 4—figure supplement 1).
We further sought functional enrichment among DiCo genes that were clustered with the k-means algorithm for both CoV and expression clusters separately (Figure 2—figure supplements 2–3). Genes in each cluster were tested among all DiCo genes using the same GORA procedure as described before.
Jackknife to test the Di/Co ratio between development and ageing
Request a detailed protocolWe tested the significance of divergent/convergent gene ratios using a jackknife resampling procedure in development and in ageing periods separately. Leaving out an individual in each iteration, we recalculated the number of significant divergent and convergent genes and their ratios. As we could not obtain any gene with significant CoV changes when the youngest adults were left out due to the decreased power, standard error and confidence interval calculation was not possible. Instead, we report the range of pseudovalues. We note that the range of ratios in leave-out samples do not contain the value 1 either in the development (0.41–0.49) or in the ageing (1.20–2.83) period (Figure 2e).
Pairwise tissue DiCo test
Request a detailed protocolIn order to further verify the inter-tissue DiCo pattern that we observed between development and ageing periods, we used a different approach based on expression correlations among tissues. We calculated pairwise Spearman’s correlation coefficients among tissues of the same individual mouse using all commonly expressed genes among the tissues (n = 15,063). For each tissue pair, we tested the correlation between age and inter-tissue expression correlations using the Spearman’s correlation test in development and in ageing periods separately. In addition, we calculated the mean (or median) of all six pairwise tissue correlations for each individual mouse and tested the correlation between age and average inter-tissue expression correlations using the Spearman’s correlation test (Figure 2—figure supplement 6).
Determination of tissue-specific genes
To identify which tissue(s) contribute to the reversal pattern, we assigned each gene to a tissue to identify tissue-specific expression patterns. First, we calculated an ES between the expression of a gene in a tissue versus other three tissues using the development samples only, and repeated this procedure for all tissues. Hence, we obtained ES for each commonly expressed gene in each tissue. ES was calculated using the ‘Cohen’s d’ formula defined as the difference between the two means divided by the pooled standard deviation. We then assigned each gene to a tissue in which the gene has the highest ES. Finally, we retrieved only the fourth quartile (>Q3) of genes assigned to a tissue to define tissue-specific expression. Using this approach, we identified 3766 tissue-specific genes in total (cortex: 1175; lung: 839; liver: 986; muscle: 766 genes).
Enrichment test with the direction of age-related change
Request a detailed protocolWe tested the association between tissue specificity and age-related expression change during ageing using Fisher’s exact test. Specifically, we constructed a contingency table with two categorical variables; the first variable defines the direction (either positive or negative) of maximum expression change during ageing identified in a tissue-specific gene, which is determined by the slope of the regression between log2 age and expression. The second variable defines whether this maximum expression change identified in a tissue-specific gene occurs in its native tissue or not (either yes or no). Hence, a positive odds ratio (OR) suggests that (1) either the expression of genes decreases the most in their native tissue and/or (2) the expression of genes increase the most in a non-native tissue during ageing.
Enrichment of tissue-specific genes in DiCo genes
Request a detailed protocolWe tested the association between tissue specificity (being either tissue-specific [n = 3766] or not [n = 11,297]) and the DiCo pattern (either showing DiCo [n = 4802] or not [n = 10,261]) using the Fisher’s exact test, calculating the enrichment of tissue-specific genes within DiCo genes.
Additional publicly available bulk tissue transcriptome datasets
Jonker
Request a detailed protocolWe downloaded the raw data from the GEO database with GSE34378 accession number (Jonker et al., 2013) and followed the same analysis pipeline described above using all the samples from five tissues (‘brain – cortex,’ ‘lung,’ ‘liver,’ ‘kidney,’ ‘spleen’) of 18 female mice comprising 90 samples in total. This dataset represents the ageing period of the mouse, ranging from 90 to 900 days. Using the oligo package v.1.56.0 (Carvalho and Irizarry, 2010), we retrieved the expression matrices and performed ‘rma’ normalisation followed by removing the probesets that were annotated to more than one gene. We confined the analysis to only the protein-coding genes expressed in at least 25% of all samples. The resulting 17,661 genes were log2 transformed (after adding 1) and quantile normalised using the preprocessCore library (Bolstad, 2020) across all samples. Downstream analysis was the same as described above.
Schaum
Request a detailed protocolWe downloaded the raw count matrix from the GEO database with GSE132040 accession number (Schaum et al., 2020) and performed the same filtrating steps as described above. We discarded the samples that have less than 4 million reads, which was the cutoff used in the article. We restricted the analysis to only protein-coding genes expressed in at least 25% of the samples that have expression in four tissues (‘brain,’ ‘lung,’ ‘liver,’ ‘muscle’). One individual was removed from the analysis due to being an outlier in PCA after visual inspection (mouse ID: ‘3m7,’ PCA plots before and after outlier removal are present in our GitHub repository; hmtzg, 2022). Final dataset contained 16,806 protein-coding genes from 37 mice that range from 3 to 27 months of age covering the ageing period. There were 11 female mice ranging from 3 to 21 months of age and 26 male mice ranging from 3 to 27 months of age. We performed the same normalisation method and downstream analyses described above. We extended the analysis to eight tissues (‘brain,’ ‘heart,’ ‘kidney,’ ‘liver,’ ‘lung,’ ‘muscle,’ ‘spleen,’ ‘subcutaneous fat’) which were chosen based on the highest number of individuals that have the same tissue samples and that cover the whole ageing period (3–27 months). For the fat tissue, ‘subcutaneous fat’ was chosen as representative tissue which has the highest number of samples among all minor fat tissues. After performing the same preprocessing steps explained above, the final dataset contained 17,619 genes from 26 mice. Downstream analysis was the same as above.
GTEx
Request a detailed protocolWe downloaded the processed GTEx v8 dataset (Battle et al., 2017) from the data portal and repeated the analysis in human tissues. We first confirmed our results in the same 4 tissues (‘brain – cortex,’ ‘lung,’ ‘liver,’ ‘muscle – skeletal’) and then expanded the analysis to 10 tissues (‘adipose – subcutaneous,’ ‘artery – tibial,’ ‘brain – cerebellum,’ ‘lung,’ ‘muscle – skeletal,’ ‘nerve – tibial,’ ‘pituitary,’ ‘skin – sun exposed [lower leg],’ ‘thyroid,’ ‘whole blood’). In order to choose which tissues to analyse, we first chose the minor tissues with the highest number of samples for each major tissue, which prevents the representation of the same tissue multiple times. We then performed hierarchical clustering of tissues based on the presence of samples from the same individuals (Figure 2—figure supplement 13) and cut the tree into three clusters based on visual inspection. We selected the cluster with the highest number of overlapping individuals to analyse. The same procedure was followed for both 4- and 10-tissue analyses. In particular, we restricted the analysis to the individuals with samples in all tissues analysed and with a death circumstance of 1 (violent and fast deaths due to an accident) and 2 (fast death of natural causes) on the Hardy scale (n = 47 for 4 tissues, n = 35 for 10 tissues). We removed duplicated genes from the analysis. Similar to our analysis with the mice data, we used only the protein-coding genes that are expressed in at least 25% of all samples, totalling 16,197 for 4 tissues and 16,305 for 10 tissues. The TPM values obtained from the GTEx data portal were log2 transformed (after adding 1) and quantile normalised using the preprocessCore library (Bolstad, 2020) in R. Downstream analysis was the same as other datasets. To study the sex-specific convergence patterns, we repeated the same analysis separating female (n = 11) and male (n = 36) individuals.
Comparison of datasets
Request a detailed protocolWe compared the age-related expression change patterns across tissues of all datasets analysed using Spearman’s correlation coefficient. We used the ‘pheatmap’ function from pheatmap package v1.0.12 (Raivo, 2019) using hierarchical clustering (Figure 4—figure supplement 2a).
We performed Fisher’s exact test to test the enrichment of convergent genes among datasets during ageing. We used only the convergent genes in ageing in our dataset (n = 7748) for comparison. For GTEx and Schaum et al. datasets, we performed enrichment for the same four tissues as our dataset and also for the larger sets, indicated as GTEx10 and Schaum8, respectively (Figure 4—figure supplement 2b).
Regulatory analysis
Request a detailed protocolWe used MiRTarBase (downloaded on 03/08/2021; Hsu et al., 2011, Hsu et al., 2014) and TRANSFAC (downloaded on 03/08/2021; Matys et al., 2003; Matys et al., 2006) resources from the Ma’ayan lab database (Rouillard et al., 2016) for miRNA and transcription factor binding site (TFBS) enrichment analyses, respectively. As the database contains target information only for human HGNC IDs, we first converted those IDs to human Ensembl IDs and then to mouse Ensembl IDs only for the one-to-one ortholog genes using ‘getBM’ and ‘getLDS’ functions from the biomaRt package. In total, we analysed 235 miRNAs associated with 5458 target genes and 158 TFs associated with 7427 target genes. We conducted the overrepresentation analysis in the same way as for the DiCo functional enrichment analysis: specifically, we tested the targets of each regulator for enrichment in -Co genes (convergent genes in ageing) among Di- genes (divergent genes in development) used as background to keep developmental patterns fixed. We restricted the analysis for miRNA and TFs that have at least five target genes. After multiple testing correction with the BH procedure, we found no enrichment among either of the regulator types. Enrichment results are given in Figure 4—source data 1.
Heteroscedasticity tests on the DiCo pattern
Request a detailed protocolTo test the hypothesis that the convergence pattern observed in the ageing period could be explained by the increased noise with age, thus regression towards the mean, we performed two distinct heteroscedasticity tests to compare DiCo genes against the lifelong-divergent genes (DiDi). In the first, we followed the method used to measure heteroscedasticity in Işıldak et al., 2020 and Kedlian et al., 2019. We first fit a linear model between log2-transformed age and expression level for each gene in each tissue (Kedlian et al., 2019; Işıldak et al., 2020; Somel et al., 2006). This represents the variability of error along the explanatory variable, age. Then, we calculated Spearman’s correlation coefficient between the absolute residual values and age, which can be used as an estimate of heterogeneity change with age. We compared the heterogeneity change values of DiCo and DiDi genes using a two-sided KS test in each tissue. In the second approach, we used the ‘ncvTest’ function from the ‘car’ package v.3.0.11 (Fox and Weisberg, 2018), which is a chi-squared test for heteroscedasticity estimated using a linear model. Again, we compared the heteroscedasticity measures of DiCo and DiDi genes using a two-sided KS test in each tissue.
Single-cell RNA-seq
Preprocessing
Request a detailed protocolWe used the Tabula Muris Senis dataset (Schaum et al., 2020) for scRNA-seq analysis as it is the only dataset to our knowledge that includes time-series samples covering old age and the tissues present in our dataset. Seurat-processed FACS data of the tissues lung, liver, skeletal muscle, and non-myeloid brain were downloaded from the figshare database (Pisco, 2020). The Seurat package v.4.0.0 (Stuart et al., 2019) was used to retrieve the expression matrix of the cells that are annotated to cell types in the original article. Each tissue contains samples from three time points: 90- (3 months), 540- (18 months), and 720-day-old (24 months) mice, totalling 14 samples each in lung, liver, and brain, and 9 samples in liver. We excluded cell types with less than 15 cells among all samples and excluded genes if the expression level is 0 for all cells at a given age. This resulted in a median number of 99–382 cells assigned to cell types, 6–24 cell types and 16,951–22,122 genes across tissues. Using 3-month-old mice, we calculated cell-type-specific expressions in each tissue. Specifically, we first calculated the mean expression levels among cells of an individual mouse for each cell type, and then calculated the mean among individuals to obtain an average expression value for each cell type. Uniprot gene symbols were converted to Ensembl gene IDs using the ‘biomaRt’ R package (Durinck et al., 2009).
Deconvolution
Request a detailed protocolWe used cell-type-specific expression profiles of 3-month-old mice to estimate relative contributions of cell types to the transcriptome profiles of tissues in our mouse dataset. For a given tissue in our mouse dataset, we used single-cell expression profiles of that tissue from the Tabula Muris Senis dataset. We used a linear regression-based deconvolution method for each tissue using three genesets: all genes (n = [12,492, 12,849]), DiCo genes (n = [4007, 4106]), and non-DiCo genes (n = [8485, 8743]). Regression coefficients were used as relative contributions of cell types according to the following linear model:
where i represents the tissue, Yi is the expression level of a sample in a tissue, bj1...jn represents the relative contributions of the n cell types in a tissue, and Xi1...in represents the expression levels of the n cell types in a tissue.
We then tested the effect of age on cell-type contributions (bj1,…bjn) using the Spearman’s correlation test in development and in ageing.
Cell-type similarities and their change during ageing
Request a detailed protocolTo investigate the contribution of cell-autonomous changes to inter-tissue convergence in ageing, we calculated pairwise cell-type expression correlations among tissues and studied how these correlations change with age. Based on pairwise correlations in the 3-month age group, we identified the maximally and minimally correlated cell-type pairs among tissues. Specifically, for each cell type in a given tissue, we chose the minimally correlated cell type in each of the other three tissues. For example, for each of the 10 cell types in the liver, we chose the minimally correlated cell type among the 15 cortex cell types, the minimally correlated cell type among the 24 lung cell types, and the minimally correlated cell type among the 6 muscle cell types. We repeated this procedure for all cell types in all four tissues, resulting in 54 cell-type pairs. Then, we calculated Spearman’s correlation coefficients between age and minimally correlated cell-type pairs identified in the 3-month-age group. Likewise, we repeated the same analysis for the maximally correlated cell-type pairs among tissues.
Permutation tests
Request a detailed protocolTo test whether DiCo genes are significantly more associated with cell-type proportion changes than non-DiCo genes, we performed a permutation test based on a resampling procedure. For each tissue, we took random samples among all genes (n = [12,492, 12,849]) with size N, where N is the number of DiCo genes in that tissue, and repeated the deconvolution analysis as explained above. By calculating cell-type proportion changes with age for each random sample repeated 1000 times, we created the null distribution for each cell type. Then, we calculated the p-values as the number of random samples having the same or higher cell-type proportion change values divided by the observed value (cell-type proportion changes with DiCo genes).
We applied a similar permutation scheme as explained above to test cell-type similarity change differences between DiCo and non-DiCo genes. For each random sample of non-DiCo genes with size N, we calculated the pairwise correlations among cell types of tissues and identified maximally and minimally correlated cell types in the 3-month-age group. Then, we calculated age-related changes of those correlations using Spearman’s correlation coefficient to construct the null distribution.
Analysis of within-tissue convergence of cell types
Request a detailed protocolAnalogous to inter-tissue convergence analysis, we also studied intra-tissue convergence of cell types in scRNA-seq data by calculating CoV among cell types within a tissue for each individual of ages 3 months, 18 months, and 24 months, separately. We filtered the data to obtain cell types present in at least two individual mice in every time point for each tissue which yielded 4, 7, 20, and 6 cell types in brain, liver, lung, and muscle, respectively. We then tested the mean CoV (or CoV per gene) change with age using Spearman’s correlation test.
Data availability
Sequencing data generated for this study have been deposited in GEO under accession code GSE167665. All data analysed during this study are included in the manuscript and supporting files. Source data files have been provided for all figures and figure supplements. Four additional and previously published datasets are used in this study: Jonker et al. 2013, GTEx Consortium et al. 2017, Schaum et al. 2020, and Tabula Muris Consortium 2020. All the code used to perform analyses is available in GitHub: https://github.com/hmtzg/geneexp_mouse (copy archived at swh:1:rev:1f2434f90404a79c87d545eca8723d99b123ac1c).
-
NCBI Gene Expression OmnibusID GSE167665. Bulk RNA-seq of mice covering the whole lifespan (2 days to 904 days) from four tissues.
-
NCBI Gene Expression OmnibusID GSE34378. Aging Experiment.
-
figshareOfficial data release for Tabula Muris Senis.https://doi.org/10.6084/m9.figshare.12654728.v1
-
NCBI Gene Expression OmnibusID GSE132040. Tabula Muris Senis: Bulk sequencing.
References
-
SoftwaretopGO: Enrichment Analysis for Gene OntologyTopGO.
-
HTSeq--a Python framework to work with high-throughput sequencing dataBioinformatics (Oxford, England) 31:166–169.https://doi.org/10.1093/bioinformatics/btu638
-
Gene Ontology: tool for the unification of biologyNature Genetics 25:25–29.https://doi.org/10.1038/75556
-
Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple TestingJournal of the Royal Statistical Society 57:289–300.https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
-
Aging and immortality: quasi-programmed senescence and its pharmacologic inhibitionCell Cycle (Georgetown, Tex.) 5:2087–2102.https://doi.org/10.4161/cc.5.18.3288
-
Trimmomatic: a flexible trimmer for Illumina sequence dataBioinformatics (Oxford, England) 30:2114–2120.https://doi.org/10.1093/bioinformatics/btu170
-
A framework for oligonucleotide microarray preprocessingBioinformatics (Oxford, England) 26:2363–2367.https://doi.org/10.1093/bioinformatics/btq431
-
Genomes optimize reproduction: aging as a consequence of the developmental programPhysiology (Bethesda, Md.) 20:252–259.https://doi.org/10.1152/physiol.00010.2005
-
STAR: ultrafast universal RNA-seq alignerBioinformatics (Oxford, England) 29:15–21.https://doi.org/10.1093/bioinformatics/bts635
-
Elevated histone expression promotes life span extensionMolecular Cell 39:724–735.https://doi.org/10.1016/j.molcel.2010.08.015
-
BookChapter 20 - Mouse Models in Aging ResearchIn: Fox James G, Davisson Muriel T, Quimby Fred W, Barthold Stephen W, Newcomer Christian E, Smith Abigail L, editors. The Mouse in Biomedical Research. Academic Press. pp. 637–672.
-
Genetics of longevity in model organisms: debates and paradigm shiftsAnnual Review of Physiology 75:621–644.https://doi.org/10.1146/annurev-physiol-030212-183712
-
Softwaregeneexp_mouse, version swh:1:rev:1f2434f90404a79c87d545eca8723d99b123ac1cSoftware Heritage.
-
miRTarBase: a database curates experimentally validated microRNA-target interactionsNucleic Acids Research 39:D163–D169.https://doi.org/10.1093/nar/gkq1107
-
Effect of replicative age on transcriptional silencing near telomeres in Saccharomyces cerevisiaeBiochemical and Biophysical Research Communications 219:370–376.https://doi.org/10.1006/bbrc.1996.0240
-
Osteoclast formation, survival and morphology are highly dependent on exogenous cholesterol/lipoproteinsCell Death and Differentiation 11 Suppl 1:S108–S118.https://doi.org/10.1038/sj.cdd.4401399
-
Aging increases cell-to-cell transcriptional variability upon immune stimulationScience (New York, N.Y.) 355:1433–1436.https://doi.org/10.1126/science.aah4115
-
TRANSFAC: transcriptional regulation, from patterns to profilesNucleic Acids Research 31:374–378.https://doi.org/10.1093/nar/gkg108
-
TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotesNucleic Acids Research 34:D108–D110.https://doi.org/10.1093/nar/gkj143
-
SoftwarePheatmap: Pretty HeatmapsR Package Version.
-
Widespread sex dimorphism in aging and age-related diseasesHuman Genetics 139:333–356.https://doi.org/10.1007/s00439-019-02082-w
-
Gene expression becomes heterogeneous with ageCurrent Biology 16:R359–R360.https://doi.org/10.1016/j.cub.2006.04.024
-
Human Ageing Genomic Resources: new and updated databasesNucleic Acids Research 46:D1083–D1090.https://doi.org/10.1093/nar/gkx1042
-
Estimating the number of clusters in a data set via the gap statisticJournal of the Royal Statistical Society 63:411–423.https://doi.org/10.1111/1467-9868.00293
-
BookErosion of the Epigenetic Landscape and Loss of Cellular Identity as a Cause of Aging in MammalsCold Spring Harbor Laboratory.
Decision letter
-
Bérénice A BenayounReviewing Editor; University of Southern California, United States
-
Kathryn Song Eng CheahSenior Editor; The University of Hong Kong, Hong Kong
Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Inter-tissue convergence of gene expression and loss of cellular identity during ageing" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Matt Kaeberlein as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this letter to help you prepare a revised submission.
Essential revisions:
1. The reviewers noted some potential issues with the normalization of the RNA-seq data that could affect the results, including the use of FPKM normalization, which may drive some of the observations due to lack of inter-sample-normalization (see specific comments by Reviewers #2 and 3). The authors would need to show that using appropriate count-based methods yields similar conclusions.
2. The authors often support their conclusions on low effect sizes and significance above usual thresholds. It would be important to explicitly discuss these as caveats, as well as strongly tone down these conclusions (including in the title.). More generally, the reviewers felt that potential alternative hypotheses needed to be clearly laid out, and potential ruled out, for the manuscript to stand more strongly (see specific comments by Reviewers #1, 2 and 3.)
3. In general, the reviewers felt that additional information on methods was required for reproducibility (see individual reviewer reports for specific points).
Reviewer #1 (Recommendations for the authors):
1. "Such "runaway development" phenomena may be due to insufficient natural selection pressure to optimise regulation after sexual reproduction starts." As stated, this sentence suggests that maladaptive traits that could lower an organism's total reproductive success would not be selected against as long as they only appear after the beginning of reproduction. I would perhaps ask that here the authors cite, not a general review of evolutionary theories of aging, but a publication that specifically proposes that alleles that are deleterious as early as the beginning of sexual reproduction nevertheless are not selected against.
2. Lines 84-91, 145-153, 229-240, 245-250, etc. Here the font size changes in the manuscript which I assume is not intended.
3. Figure 1 caption. "Similarities were calculated using Spearman correlations coefficient between expression-age correlations across tissues." Here I suspect that "Spearman correlations coefficient" was intended to be "Spearman's correlation coefficient".
Reviewer #2 (Recommendations for the authors):
Although the authors report an intriguing finding, there are major issues in the manuscript as it stands, notably concerning the clarity and rigor of the data analysis and manuscript.
1. For the analysis of their RNA-seq dataset across samples, the authors describe using the FPKM framework (methods, line 438). This is very problematic for a number of reasons – use of FPKM for differential gene analysis across samples is deprecated because it poorly controls for inter-sample variability [PMID: 22988256; PMID: 32284352]. FPKM are not comparable between samples because the normalization is sample-specific – thus, since no cross-library normalization step is performed, it should never be used for between sample comparison. Thanks to community benchmarking efforts, TMM or VST normalized counts are widely believed to be the superior choice for DGE analysis [PMID: 22988256; PMID: 32284352]. Thus, it is crucial that the authors correct their analysis to avoid the use of FPKM, and show that their conclusions would hold with an analysis on TMM or VST normalized counts.
2. There are issues regarding consistency in analytical choices throughout the paper.
a. Application of significance cutoffs vary widely in the manuscript when selecting specific gene sets. Reasoning should be provided for each case in which significance cutoff was (or not) applied (e.g. Line 111). For example, it is rather concerning that in Figure 1d, no gene was significantly up-regulated with age in the muscle (significance cutoff applied), but in Figure 1f, a large proportion of genes are shown to increase in expression with age in the muscle (significance cutoff not applied).
b. Authors are not consistent with the FDR threshold applied (e.g. FDR < 10% in Line 123, FDR 20% in Line 127). The reasoning for the chosen thresholds, as well as the different choices should be provided for cases in which higher than standard FDR is applied.
c. In Figure 1, authors show that the muscle presents with the least number of significantly changed gene expression patterns (up and down) with age. Additionally, in the manuscript, the authors explain that the lack of genes that show statistically significant change in the muscle is supported by a previous study that also showed "weak ageing transcriptome signature across multiple datasets (Line 137)." Thus, it is rather concerning that excluding the cortex, a tissue with a large number of differentially expressed genes, for CoV calculation resulted in greater significance.
d. The authors use a mix of multiple correction methods without clear explanations as to why a consistent measure isn't used (i.e. FDR/BH, BY) – BY test is justified for GO analysis line 472, but why BY was not used for DGE analysis is not clear (line 462).
e. A number of statistical analyses performed in the manuscript do not pass the standard significance thresholds (FDR < 5% or p < 0.05). The standard applied in the field is usually p-value or FDR < 5%, although relaxing this threshold can be done when dealing with noisier datasets. When using more relaxed FDR or p-value thresholds, the authors should also be careful to strongly tone down statements proportionally to the lower statistical support for their claims (eg. Line 240: "a SIGNIFICANT divergence-convergence pattern).
3. The authors use a dataset from exclusively male mice. Since aging is known to be extremely sex-dimorphic [PMID: 27304504], the authors need to explicitly discuss this caveat/limitation in the main text (i.e. these observations may not hold in female animals). If the authors wish to make a more general statement, a suggestion would be to reanalyze data from the recent Tabula Muris Senis Companion paper Schaum et al., [PMID: 32669715], which includes similar tissues, "developmental" time points (1 vs 3 months) and aging course (3 to 27 months), and both sexes to extend the findings. In addition, the Schaum dataset is also RNA-seq, and thus more directly comparable to the author's dataset compared to the microarray Jonker dataset used for partial validation (Figure 2 – supplement 8).
4. The authors should consider revising the title, since the current form seems to indicate that the work discovered s significant loss of cellular identity during aging. This statement is too strong based on the analyses of the paper. We suggest toning down the title, for instance to "Transcriptomic analyses suggest inter-tissue convergence of gene expression and loss of cellular identity during ageing".
Reviewer #3 (Recommendations for the authors):
Authors only analyze the gene sets that are correlated with age i.e. linearly go up or down with age. However, ageing trajectories for different genes can be very dynamic. This is the likely reason why only a few age-related changes were detected in some tissues. To assess the robustness of some of the key observations, authors should consider an orthogonal and unbiased clustering analysis similar to Figure 2 supplement to get the patterns for gene expression changes with age and to what extent DiCo is observed with different clusters.
Some of the key observations have either low effect size or are not statistically significant. For example, inter-tissue expression similarity during ageing and development (lines 115-119), the expression reversal (lines 158-161), changes in coefficient of variation with age (lines 202-207) etc. Authors clearly mention these in the manuscript, but it raises concerns on the extent to which some of these observations might be random, or due to idiosyncrasies of RNA-seq itself. These caveats should also be discussed.
In the global PCA based analysis for figure 1 and 2 combining both development and ageing data, the variance can be driven by one of the two states. For example, transcription divergence analysis in PCA in figure 1 based on development alone is not significant which suggests that the divergence of the developmental samples is driven in-fact by ageing. For PCA based analysis using separate PCAs for development and ageing will be more statistically sound.
Are there any known gene categories that have stronger expression reversal trends? For example, do known transcription factors or pathways that regulate development show DiCo if analyzed separately? Or is this trend only seen at the genome-wide level and completely stochastic?
For functional enrichment analyses authors only mention some selected functions in the text but GO terms have a lot of redundancy and it is unclear if there is any emerging functional trends looking beyond the redundancy or selected examples. It will be useful to thoroughly analyze and summarize the enriched GO terms to assess for any functional patterns that connect functional decline with ageing and convergent expression or DiCo.
Is there a significant overlap between mouse and human ageing related changes at the gene or pathway level?
Line 299 – linking the functional categories with cellular identity appears a bit hand-wavy. Stronger support would be needed to make this conclusion.
Some of the aspects need more methodological details:
For RNA-seq analysis, were all the samples normalized together, or were development and ageing samples normalized separately? It should be clarified in the methods.
Were the mouse perfused? The details should be added in methods. If the mouse were not perfused, a significant shared proportion of gene expression can be due to blood components. Also, the possibility that some of the shared changes could be due to infiltration of immune and other cell types in tissues should be mentioned.
More details for single cell deconvolution should be added in methods.
It is unclear to me how how the age related gene expression and PC correlation was performed. Was the loadings underlying PC axes used for correlation analysis? It should be elaborated in methods.
Some of the supplemental figures are missing.
https://doi.org/10.7554/eLife.68048.sa1Author response
Essential revisions:
1. The reviewers noted some potential issues with the normalization of the RNA-seq data that could affect the results, including the use of FPKM normalization, which may drive some of the observations due to lack of inter-sample-normalization (see specific comments by Reviewers #2 and 3). The authors would need to show that using appropriate count-based methods yields similar conclusions.
We believe this comment mainly stems from the lack of clarity of our methods, which we improved in this version. We had used quantile normalisation on FPKM normalised values, which is an inter-sample-normalisation and normalises for overall library sizes between samples. In this revised version, we further repeat the analysis with VST normalisation following the DESeq2 pipeline and confirm the robustness of our results. The details are provided under the relevant comments below.
2. The authors often support their conclusions on low effect sizes and significance above usual thresholds. It would be important to explicitly discuss these as caveats, as well as strongly tone down these conclusions (including in the title.). More generally, the reviewers felt that potential alternative hypotheses needed to be clearly laid out, and potential ruled out, for the manuscript to stand more strongly (see specific comments by Reviewers #1, 2 and 3.)
We thank the reviewers for raising this issue. We have now (1) explained the difference between FPR (now called eFPP to prevent confusion) and FDR in the methods, to clarify that we use the same significance cutoff across different analyses, and also provide an additional measure that estimates the false positive proportion (not a p-value) based on permutations; (2) toned down the manuscript, including the title; (3) included additional discussion to summarise all the limitations laid out throughout the manuscript; (4) specifically tested the alternative hypothesis suggested by the reviewer #1; (5) proposed ageing-related increase in inter-cellular noise (heteroskedasticity) as a possible explanation for the convergence observed during ageing.
3. In general, the reviewers felt that additional information on methods was required for reproducibility (see individual reviewer reports for specific points).
We have re-written a significant proportion of the methods and included detailed explanations in the figure legends where applicable to improve reproducibility.
Reviewer #1 (Recommendations for the authors):
1. "Such "runaway development" phenomena may be due to insufficient natural selection pressure to optimise regulation after sexual reproduction starts." As stated, this sentence suggests that maladaptive traits that could lower an organism's total reproductive success would not be selected against as long as they only appear after the beginning of reproduction. I would perhaps ask that here the authors cite, not a general review of evolutionary theories of aging, but a publication that specifically proposes that alleles that are deleterious as early as the beginning of sexual reproduction nevertheless are not selected against.
We thank the reviewer for the suggestion. The article by de Magalhaes and Church is the review article in which the authors revised the developmental theory of ageing especially in the light of genomic studies and thus is the original article where the ‘runaway development’ phenomenon was introduced in these exact terms. We now also included the original article by Mikhail V. Blagosklonny, where a related phenomenon, the hyperfunction theory was first proposed (Blagosklonny, 2006) – however, this is again a conceptual review where the theory was first proposed based on observations on many different grounds. Therefore, we have now also included papers with experimental support for this notion (Ezcurra et al., 2018; Lind et al., 2019). Moreover, we have now included a statement in the discussion to convey the message that we observe a reversal in the majority of the genes, but this does not necessarily negate the developmental or hyperfunction theory, which might be relevant for a limited number of pathways and may contribute to the ageing phenotype (lines 550-553). Finally, we now cite (Fisher, 1930; Medawar, 1952; Williams, 1957), which suggests the force of natural selection declines with age.
2. Lines 84-91, 145-153, 229-240, 245-250, etc. Here the font size changes in the manuscript which I assume is not intended.
Thank you for pointing this out – we have now corrected it.
3. Figure 1 caption. "Similarities were calculated using Spearman correlations coefficient between expression-age correlations across tissues." Here I suspect that "Spearman correlations coefficient" was intended to be "Spearman's correlation coefficient".
Thank you – we have changed it and carefully checked the manuscript one more time to prevent such typos.
Reviewer #2 (Recommendations for the authors):
Although the authors report an intriguing finding, there are major issues in the manuscript as it stands, notably concerning the clarity and rigor of the data analysis and manuscript.
1. For the analysis of their RNA-seq dataset across samples, the authors describe using the FPKM framework (methods, line 438). This is very problematic for a number of reasons – use of FPKM for differential gene analysis across samples is deprecated because it poorly controls for inter-sample variability [PMID: 22988256; PMID: 32284352]. FPKM are not comparable between samples because the normalization is sample-specific – thus, since no cross-library normalization step is performed, it should never be used for between sample comparison. Thanks to community benchmarking efforts, TMM or VST normalized counts are widely believed to be the superior choice for DGE analysis [PMID: 22988256; PMID: 32284352]. Thus, it is crucial that the authors correct their analysis to avoid the use of FPKM, and show that their conclusions would hold with an analysis on TMM or VST normalized counts.
We thank the reviewer for their comment. We had used FPKM followed by quantile normalisation, which performs an inter-sample normalisation. We had mentioned this at lines 443, 662, and 682 in the previous version, but had not explained this normalisation procedure in detail. Now the relevant section in the methods is updated to explain the quantile normalisation method (line 690). The main reason we choose quantile normalisation is that we analysed both RNA-seq and microarray data and this normalisation method can be applied to both, following other background normalisations and transformations (i.e. FPKM + log2 transformation for RNA-seq and RMA correction and log2 transformation for microarray). However, we agree with the reviewer that use of VST and TMM are more conventional as these are already implemented in the widely used analysis pipelines of DESeq2 and EdgeR.
We thus repeated the analysis of our dataset using VST and confirmed the main results.
We replicated our findings presented in Figure 1 using VST-normalised data. The results are presented in Figure 1—figure supplement 10 and 11:
i) We first compared VST- and QN-processed data in terms of age-related expression change trends in each tissue and period separately, without applying a significance cutoff (across n=14,973 genes). We found high correlations in all 8 comparisons (⍴dev=[0.81, 0.94], ⍴ageing=[0.83, 0.93], Figure 1—figure supplement 11).
ii) Repeating PCA analysis with VST-normalised data, we show that variation in gene expression (n=14,973 genes) is largely explained by tissue differences across PC1, 2, and 3 (ANOVA p<10-16 for PC1-3). In PC4, which explains 7% of the total variation (as opposed to 8% in Figure 1b, we observe an age-effect across tissues in development (Spearman’s correlation coefficient ⍴=[0.57, 0.99], nominal p<0.01 for each test except in muscle; Figure 1—figure supplement 10b). We also confirmed that tissues diverge during postnatal development, although the convergence during ageing was not significant (change in mean Euclidean distance among tissues with age in PC1-4 space, ⍴dev=0.937, pdev=0.00185; during ageing ⍴ageing=-0.58, pageing=0.102, Figure 1-source data).
iii) Similar to Figure 1c results, we find that tissue pairs show weak correlations in their age-related expression change trends measured with VST-normalised data (n=14,973 genes), both during development (⍴=[0.19, 0.39]) and during ageing (⍴=[0.25, 0.34]). Furthermore, we find that the number of genes with the same direction of change (with or without significance cutoff) across four tissues with VST-normalised data is comparable to Figure 1d and Figure 1e result (Figure 1—figure supplement 10d-e).
iv) Reversal analysis with VST-normalised data revealed that ~50% (42-59%) of expressed genes (n=14,973) showed reversal patterns (without significance cutoff) in tissues comparable to Figure 1f result (Figure 1—figure supplement 10f). Comparing VST- and QN-normalised data, we found that 70-79% of the genes showed similar reversal or continuous patterns across tissues.
We confirmed our findings from Figure 2 with VST normalisation and presented the results in Figure 2—figure supplement 14:
i) We calculated CoV values across all expressed genes (n=14,973) among tissues in VST-normalised data and confirmed a significant mean CoV increase in development (Spearman’s correlation coefficient ⍴=0.99, p=10-5), consistent with QN normalised data that yielded ⍴=0.77 (p=0.041). During ageing, the decrease in mean CoV with age (⍴=-0.48, p=0.23) was comparable to the results shown in Figure 2a (⍴=-0.50, p=0.204).
ii) We replicated the difference between development and ageing in the proportion of genes showing divergent versus convergent patterns (at gene-wise FDR<0.1) with the VSTnormalised data (Figure 2—figure supplement 14d-e). Specifically, VST-normalised data displayed 89% divergence in development (among 3,476 significant genes) as opposed to 70% divergence in QN normalised data (among 2,581 significant genes). Likewise, during ageing, VST-normalised data displayed 68% convergence (among 19 significant genes) similar to 68% convergence in QN normalised data (among 62 significant genes).
We conclude that the choice of the normalisation method does not affect our results. We now present the VST-based results briefly in the text (lines 143-146, 263-265) to prevent redundancy and provide detailed explanations in the supplementary figure legends (Figure 1figure supplement 10, Figure 2—figure supplement 14). We now also explain the VST normalisation steps in the Methods (line 675).
2. There are issues regarding consistency in analytical choices throughout the paper.
a. Application of significance cutoffs vary widely in the manuscript when selecting specific gene sets. Reasoning should be provided for each case in which significance cutoff was (or not) applied (e.g. Line 111). For example, it is rather concerning that in Figure 1d, no gene was significantly up-regulated with age in the muscle (significance cutoff applied), but in Figure 1f, a large proportion of genes are shown to increase in expression with age in the muscle (significance cutoff not applied).
We thank the reviewer for their comment. We actually analyse all the results in two ways: (1) using a consistent significance cutoff (multiple testing (FDR) adjusted p-value<0.1, or if no multiple testing was applied, p-value<0.05), and (2) without using a significance cutoff to assess transcriptome-wide trends. We emphasise this as soon as the first sentence of the Results section titled “Tissues involve common gene expression changes with age”. We now added a second sentence to emphasise this point:
“We next characterised age-related changes in gene expression shared across tissues by (i) studying overall trends at the whole transcriptome level and testing their consistency using permutation tests, and (ii) studying statistically significant changes at the single gene level.”
This has two motivations. First, given the low signal/noise ratios during mammalian ageing, identifying statistically significant patterns in individual genes is frequently difficult, but transcriptome-wide patterns can be more effectively identified. Second, observing the same trends using both approaches increases the robustness of our conclusions and may suggest that the observations are a genome-wide phenomenon, not restricted to a small number of genes. However, since transcriptome-wide trends are more prone to technical artefacts, we further test the genes that show statistically significant changes.
We now attempted to further clarify where a significance cutoff was and was not used in the analyses, by updating the text to emphasise the gene set of interest for each analysis (lines 131, 140).
We also thank the reviewer for pointing out the ambiguity about muscle genes. In Figure 1f, we study global reversal trends in each tissue for all the genes without any significance cutoff, i.e. only analysing expression change trends (following (Dönertaş et al., 2017)). Thus, although there are no statistically significant ageing-related genes in muscle tissue (Figure 1d), we observe a transcriptome-wide reversal trend (Figure 1f). We now updated the figure text to explain clearly what is presented (lines 183-187).
b. Authors are not consistent with the FDR threshold applied (e.g. FDR < 10% in Line 123, FDR 20% in Line 127). The reasoning for the chosen thresholds, as well as the different choices should be provided for cases in which higher than standard FDR is applied.
We thank the reviewer for pointing out that the difference between FDR (e.g. line 123 in the previous version) and FPR (e.g. line 127 in the previous version) was not clearly explained in the text. FDR is the multiple testing correction ‘false discovery rate’ applied using the ‘p.adjust’ function with BH procedure in R, which we did not clearly state in the first submission. We updated the text to explain that FDR is applied to correct the p-values for all statistical tests (functional association tests were corrected with BY procedure in the previous version but for consistency, we used FDR in all the tests in the current version – see point d.) (lines 711-714).
We have also updated the text to replace all “FDR<0.1” with “FDR corrected p-value<0.1”.
We had used the term FPR as an estimate of the false-positive rate calculated using random permutations (FPR = median number of positives observed across permutations / number of positives observed in the data). We provide this metric with permutation test results as additional information on the effect size, not as an alternative to FDR corrected p-values. However, thanks to the reviewer’s comment we realised that the phrase FPR could be confused with other metrics. We, therefore, redefined it as eFPP, ‘estimated false-positive proportion’ (lines 740-743). We now updated the relevant sections to make sure that these two definitions are not interchangeable. We also removed eFPP from the main text to avoid confusion and only provide this information only in the supplementary figures.
c. In Figure 1, authors show that the muscle presents with the least number of significantly changed gene expression patterns (up and down) with age. Additionally, in the manuscript, the authors explain that the lack of genes that show statistically significant change in the muscle is supported by a previous study that also showed "weak ageing transcriptome signature across multiple datasets (Line 137)." Thus, it is rather concerning that excluding the cortex, a tissue with a large number of differentially expressed genes, for CoV calculation resulted in greater significance.
We believe that the reviewer’s concern perhaps stems from our poor wording of CoV calculation in the text. For each individual, we calculated CoV across tissues using all expressed genes (n=15,063) regardless of their specific age-related expression changes (significant or not) in any tissue. Then, we summarised these gene-wise CoV values by calculating their mean to assess a genome-wide variation score among tissues of an individual and how these scores change with age (Figure 2a). We have now updated the relevant Results section to explain the CoV calculation in more detail (lines 221-223). We believe that the significant age-related expression changes in one tissue should not have a direct impact on the results as the number of significantly changed genes during ageing in the cortex was also very small (n=68).
Despite this, we repeated the same analysis, this time excluding the muscle tissue, which resulted in a nonsignificant but similar pattern during ageing (Spearman’s correlation test between mean CoV and age (n=15): ⍴dev=0.85, pdev=0.016; ⍴ageing=-0.29, pageing=0.49, Author response image 1a -right panel), and the lack of significance again might be driven by the lack of one cortex sample.
The exclusion of the cortex data was motivated by the fact that we were missing an aged individual in this tissue. However, estimating CoV using only three points may be suboptimal, and we, therefore, decided to be conservative and thus removed this analysis (i.e. results excluding cortex) from the manuscript.

Age-related change in CoV summarised across genes excluding cortex or muscle.
Change in CoV using 15 samples from three tissues (one oldest individual (904 days of age) excluded due to its missing data in the cortex), either excluding cortex (left column) or muscle (right column) tissue. Each point represents the (a) mean CoV, (b) median CoV value of all protein-coding genes (15,063) for each mouse. x-axis is in log2 scale. The dashed grey line shows the start of the ageing period. The Spearman’s correlation coefficients and p values for each period are indicated separately on the plots.
d. The authors use a mix of multiple correction methods without clear explanations as to why a consistent measure isn't used (i.e. FDR/BH, BY) – BY test is justified for GO analysis line 472, but why BY was not used for DGE analysis is not clear (line 462).
We thank the reviewer for this point. For consistency, we now use FDR (BH) for multiple testing correction for all statistical tests throughout the text. We now updated the relevant sections (lines 711, 713-714).
e. A number of statistical analyses performed in the manuscript do not pass the standard significance thresholds (FDR < 5% or p < 0.05). The standard applied in the field is usually p-value or FDR < 5%, although relaxing this threshold can be done when dealing with noisier datasets. When using more relaxed FDR or p-value thresholds, the authors should also be careful to strongly tone down statements proportionally to the lower statistical support for their claims (eg. Line 240: "a SIGNIFICANT divergence-convergence pattern).
We thank the reviewer for the comment. We believe that the 10% FDR cutoff allows a compromise between overall false positives and false negatives, and is commonly used as a threshold in the transcriptomics literature, including recent analyses on the association between gene expression and splicing (DOI: 10.7554/eLife.67077), expression and neurodegeneration (DOI: 10.7554/eLife.64564), or expression and psychosocial experiences (DOI: 10.7554/eLife.63852).
Using a more stringent threshold would reduce the false positive probability per gene. However, our emphasis in this manuscript is not individual genes but transcriptome-wide patterns, which we test using the whole transcriptome without significance threshold, and also with significant genes (with an FDR corrected p-value<0.1). The particular instance where we use ‘significant’ in line 240 refers to the result of Spearman’s correlation test between mean pairwise correlations among tissues and age in development and ageing periods.
That said, we did follow the reviewer’s suggestion and toned down our conclusions. Accordingly, we changed the referred line to ‘we observed the same divergence-convergence pattern’ (lines 286-287).
3. The authors use a dataset from exclusively male mice. Since aging is known to be extremely sex-dimorphic [PMID: 27304504], the authors need to explicitly discuss this caveat/limitation in the main text (i.e. these observations may not hold in female animals). If the authors wish to make a more general statement, a suggestion would be to reanalyze data from the recent Tabula Muris Senis Companion paper Schaum et al., [PMID: 32669715], which includes similar tissues, "developmental" time points (1 vs 3 months) and aging course (3 to 27 months), and both sexes to extend the findings. In addition, the Schaum dataset is also RNA-seq, and thus more directly comparable to the author's dataset compared to the microarray Jonker dataset used for partial validation (Figure 2 – supplement 8).
We thank the reviewer for this valuable comment. In fact, we identify convergent trends in ageing in both female and male animals, in our dataset (only males) and in the Jonker et al., dataset (only females), respectively. We further find that convergent genes overlap between the two datasets, although modestly (OR = 1.22, p = 1.67x10-8, Figure 4—figure supplement 2b). This suggests that DiCo is not unique to a single-sex in mice. We now explain this in lines 430-434, 614-616.
We would also have been interested in testing possible sex dimorphism in DiCo, but this is not possible given technical differences between the two datasets (i.e. Jonker et al.,'s and ours).
The GTEx dataset includes both male and female humans and displays patterns consistent with convergence during ageing, although not significantly (Figure 2—figure supplement 811). Following the reviewer's point, we repeated the GTEx dataset analysis separately for females and males. We observed a stronger convergence in females than in males, although none of these results were significant (⍴female=-0.58, pfemale=0.059; ⍴male= -0.052, pmale=0.77) (Figure 2—figure supplement 16). The female and male GTEx samples differ both in sample size (n=11 vs n=36, respectively) and age range (no male individuals within 20-29 and 70-79 age groups), which could explain the observed differences between sexes.
We now discuss possible sex-dimorphism in Discussion as a limitation of this study and suggest a more comprehensive analysis with different datasets in the future (line 612-620).
Finally, we thank the reviewer for suggesting including the Schaum et al., dataset. We now included this dataset, analysing both the same 4 tissues we have and a broader set of 8 tissues (Figure 2—figure supplement 17-20). However, we could not include the development period as the sample size was very small for the two time points; 1 and 3 months of age. We could not confirm the transcriptome-wide trend towards convergence during ageing in this dataset and found that the muscle and subcutaneous fat tissues showed the most divergent patterns.
However, we confirmed a strong association between the loss of identity and convergence during ageing. Not having the developmental period for the external datasets limited the complete replication of our original study, which focuses on the DiCo pattern, divergence followed by convergence. Nevertheless, a trend towards convergence during ageing was observed at the transcriptome level in Jonker et al., and GTEx and the association between convergence and loss of identity was evident across all datasets. We now updated the text to reflect that the support for the transcriptome-wide trend is weak and our most robust result is the association between identity loss and convergence.
We now add Figure 4—figure supplement 2 and Table 1 to summarise all results across datasets.
4. The authors should consider revising the title, since the current form seems to indicate that the work discovered s significant loss of cellular identity during aging. This statement is too strong based on the analyses of the paper. We suggest toning down the title, for instance to "Transcriptomic analyses suggest inter-tissue convergence of gene expression and loss of cellular identity during ageing".
Thank you for the suggestion – we have now toned down the title: ‘Inter-tissue convergence of gene expression during ageing suggests age-related loss of tissue and cellular identity’
Reviewer #3 (Recommendations for the authors):
Authors only analyze the gene sets that are correlated with age i.e. linearly go up or down with age. However, ageing trajectories for different genes can be very dynamic. This is the likely reason why only a few age-related changes were detected in some tissues. To assess the robustness of some of the key observations, authors should consider an orthogonal and unbiased clustering analysis similar to Figure 2 supplement to get the patterns for gene expression changes with age and to what extent DiCo is observed with different clusters.
We thank the reviewer for the comment. The reason we used Spearman’s correlation was to capture potential non-linear but monotonic changes. However, we agree with the reviewer that Spearman’s correlation cannot identify non-monotonic changes. As suggested, we now performed k-means clustering for each tissue to identify age-related expression patterns and then performed an enrichment test to assess if DiCo genes are enriched or depleted in any cluster. The results are now presented as supplementary figures (Figure 1—figure supplement 12-15). Notably, we did not observe a bias towards a cluster having a linear pattern with age or not in any of the tissues. For example, the cortex tissue displayed 15 clusters of expression changes, among which 5 are enriched and another 5 are depleted in DiCo (Figure 1—figure supplement 12). Two of the DiCo enriched clusters (2/5) do not have linear trajectories with age (clusters 12 and 14) whereas three of the DiCo depleted clusters (3/5) display linear patterns (clusters 6, 7 and 15). We have now presented this result in the relevant section (lines 276-277).
Some of the key observations have either low effect size or are not statistically significant. For example, inter-tissue expression similarity during ageing and development (lines 115-119), the expression reversal (lines 158-161), changes in coefficient of variation with age (lines 202-207) etc. Authors clearly mention these in the manuscript, but it raises concerns on the extent to which some of these observations might be random, or due to idiosyncrasies of RNA-seq itself. These caveats should also be discussed.
We now added a limitations paragraph in the Discussion, detailing both this aspect and other limitations raised by the other reviewers (lines 605-620).
In the global PCA based analysis for figure 1 and 2 combining both development and ageing data, the variance can be driven by one of the two states. For example, transcription divergence analysis in PCA in figure 1 based on development alone is not significant which suggests that the divergence of the developmental samples is driven in-fact by ageing. For PCA based analysis using separate PCAs for development and ageing will be more statistically sound.
We thank the reviewer for this suggestion. We now followed the reviewer’s suggestion and analysed age-related convergence using PCA performed only on ageing samples. In this way, we now analyse Euclidean distance in PC1-4 (i) using all samples together (across the lifetime – Figure 1a), (ii) using samples from developmental period only (Figure 1—figure supplement 3a-b), and (iii) using samples from ageing period only (Figure 1—figure supplement 3d-e). For the PCA analysis based on all the samples together, change in mean Euclidean distance with age among tissues was significant both during development and ageing (⍴dev=0.99, pdev=1.5x10-5; during ageing ⍴age=-0.87, page=0.0026). When we performed the PCA analysis separately for the two periods, we still observed a significant divergence among tissues during development (⍴=0.95, p=0.0008) and observed a marginally significant convergence during ageing (⍴=-0.64, p=0.0596). We hope that this analysis addresses the reviewer’s point. We also note that our earlier analysis had an error (we had used PC3-4 spaces to calculate the Euclidean distance for the mouse dataset, mistakenly excluding PC1-2), which we now corrected (Figure 1, Figure 1—figure supplement 3).
Are there any known gene categories that have stronger expression reversal trends? For example, do known transcription factors or pathways that regulate development show DiCo if analyzed separately? Or is this trend only seen at the genome-wide level and completely stochastic?
We thank the reviewer for the suggestion. We now used the MiRTarBase database for miRNAs and TRANSFAC for transcription factors and tested if any specific regulator is associated with DiCo pattern genes. We did not find any significant association (lines 353357). This may be consistent with DiCo being a transcriptome-wide and diffuse phenomenon caused by stochastic changes, as the reviewer indicates. It is also possible that not specific regulators but a change in their cooperative action influences the pattern, or we lack sufficient power with the current datasets to identify subtle effects of individual regulators. We now mention this also in the Discussion (line 601-603).
For functional enrichment analyses authors only mention some selected functions in the text but GO terms have a lot of redundancy and it is unclear if there is any emerging functional trends looking beyond the redundancy or selected examples. It will be useful to thoroughly analyze and summarize the enriched GO terms to assess for any functional patterns that connect functional decline with ageing and convergent expression or DiCo.
In the previous version, enriched GO terms were summarised according to their Jaccard similarities using the ‘emapplot’ function of the ‘enrichplot’ package. However, we agree with the reviewer that the enrichment network did not remove redundancy among the categories, and might have hidden unique and interesting functions.
We have now used another approach to summarise and interpret DiCo enriched categories in more detail. First, we clustered the categories based on the number of genes they share using hierarchical clustering and cut the tree into 25 clusters. For each cluster, we chose a representative category that has the highest mean-Jaccard similarity to the other categories in the same cluster (Dönertaş et al., 2021). Then, we calculated the mean expression changes (⍴ between expression and age) of the genes in the representative categories for each tissue, separately (Figure 4h). We believe that this new analysis provides better insight to our main findings such that tissue-related functions gained during development display opposing expression trends during ageing, i.e. reversal pattern, providing support to loss of tissuespecific expression and thus contributing to convergence during ageing. We have now updated the main text and methods (lines 343-351, 874-881).
Is there a significant overlap between mouse and human ageing related changes at the gene or pathway level?
We thank the reviewer for the suggestion. We now compare the correlations between ageing related expression changes and overlap between convergent genes during ageing (Figure 4 figure supplement 2). While mouse datasets, especially our data and Jonker et al., showed higher correlations among each other, the correlations between age-related expression changes in the human and mouse datasets were smaller but overall positive when we consider the same tissues in both datasets (average correlations between GTEx and mouse datasets are ⍴Jonker=0.12, ⍴Our_dat=0.13, ⍴Schaum=0.01). Convergent genes in ageing also showed weak but significant overlaps across datasets. Human dataset showed significant overlap with Jonker et al., and Schaum et al., datasets but not with ours. Nevertheless, the overlap for these datasets was also small. Importantly, apart from the differences in species, sex, age distribution and the technology used to generate the data, the small overlap between datasets may stem from the fact that the external datasets lack developmental samples. Thus, we can only compare convergence during ageing but not the DiCo pattern, the main focus of our study. Since only 62% of the convergent genes during ageing are divergent in development in our dataset, we should emphasise that low overlap for convergence does not rule out overlap across DiCo genes. Similarly, we use divergent genes during development as the background for our functional enrichment tests to find functional associations of DiCo. Since we lack the developmental background we performed functional enrichment for only the convergent genes but did not find any significant association for the GTEx dataset.
Line 299 – linking the functional categories with cellular identity appears a bit hand-wavy. Stronger support would be needed to make this conclusion.
Thank you for the suggestion. We have now toned down that particular line (347-350), and also revised the manuscript, in general, to make sure we do not overstate our findings.
Some of the aspects need more methodological details:
For RNA-seq analysis, were all the samples normalized together, or were development and ageing samples normalized separately? It should be clarified in the methods.
We thank the reviewer for pointing out this ambiguity. We have combined all the samples together regardless of their age (development and ageing together) or tissue (n=63) and performed quantile normalisation on log2 transformed (after adding 1) FPKM values. As FPKM normalisation does not account for inter-sample variability, we adopted this approach to control for cross-library variability between different tissues and ages. We have rewritten the relevant section in the methods to explain the normalisation in more detail (line 666-685). The same approach is used for the VST-normalisation in this version and development and ageing periods are normalised together.
Were the mouse perfused? The details should be added in methods. If the mouse were not perfused, a significant shared proportion of gene expression can be due to blood components. Also, the possibility that some of the shared changes could be due to infiltration of immune and other cell types in tissues should be mentioned.
We thank the reviewer for this question. Indeed, infiltration of other cell types is a serious consideration in most bulk tissue analyses. Since we observe a similar trend between different cell types using scRNAseq data and that DiCo is particularly associated with tissue-specific genes and not with any bood or immune-related signature, we believe the effect should be minimal in our analysis. However, the new dataset we included in the analysis, Schaum et al., did use perfused mice and although the association between the loss of tissue identity and convergence was particularly strong, the transcriptome-wide trend did not support convergence. We now included the potential influence of infiltration of other cell types as a limitation of our study in the Discussion (line 606-612).
More details for single cell deconvolution should be added in methods.
We have now updated the deconvolution section in the methods to explain how we conducted the analysis in more detail (lines 1054-1064).
It is unclear to me how how the age related gene expression and PC correlation was performed. Was the loadings underlying PC axes used for correlation analysis? It should be elaborated in methods.
We performed PCA on individuals’ gene expression values and used these PC scores of the individuals (not the loadings) and their ages to perform the correlation analysis. We have now updated the relevant section in the methods (lines 693-696).
Some of the supplemental figures are missing.
We are sorry for this confusion, but having checked our submission we could not detect any missing files. In case this might be an issue with the submission portal, for the second round we will update the bioRxiv version of the article where the reviewers can check all files separately if needed.
https://doi.org/10.7554/eLife.68048.sa2Article and author information
Author details
Funding
European Molecular Biology Laboratory
- Handan Melike Dönertaş
Scientific and Technological Council of Turkey (2232)
- Mehmet Somel
Science Academy (Turkey) BAGEP Awards
- Mehmet Somel
METU Internal Grant
- Mehmet Somel
Leibniz Institute on Aging – Fritz Lipmann Institute (FLI) (Open Access Fund)
- Handan Melike Dönertaş
Leibniz Association (Open Access Fund)
- Handan Melike Dönertaş
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Wolfgang Enard and Wulf Hevers for help with the mouse experiments and sharing samples, Nurcan Tuncbag, Nihal Terzi Çizmecioğlu, and the whole METU CompEvo team for helpful comments and fruitful discussions, and Zeliha Gözde Turan and Melih Yıldız for the critical reading of the manuscript and their suggestions. This work was supported by EMBL (HMD), the Scientific and Technological Research Council of Turkey (TÜBİTAK 2232, MS), the Science Academy (of Turkey) BAGEP Award (MS), and a METU Internal Grant (BAP, MS). The publication of this article was funded by the Open Access Fund of the Leibniz Association and the Leibniz Institute on Aging – Fritz Lipmann Institute (FLI), Jena, Germany. The FLI is a member of the Leibniz Association and is financially supported by the Federal Government of Germany and the State of Thuringia.
Ethics
Human subjects: Data involving human subjects were obtained from a published dataset, GTEx portal (https://www.gtexportal.org/home/datasets, with accession phs000424.v8.p2). Hence, no ethical statement is required.
Post-mortem samples were obtained from 16 C57BL/6J mice aged between 2 days and 904 days. All mouse experiments were overseen by the Institutional Animal Welfare Officer of the Max Planck Institute for Evolutionary Anthropology (MPI-EVA). They were performed according to the German Animal Welfare Legislation, ("Tierschutzgesetz") and registered with the Federal State Authority Landesdirektion Sachsen (No. 24-9162. 11-01 (T62/08)). The mice were sacrificed for reasons independent of this study, their tissues were harvested and frozen immediately, and stored at -80°C.
Senior Editor
- Kathryn Song Eng Cheah, The University of Hong Kong, Hong Kong
Reviewing Editor
- Bérénice A Benayoun, University of Southern California, United States
Publication history
- Preprint posted: March 2, 2021 (view preprint)
- Received: March 3, 2021
- Accepted: January 28, 2022
- Accepted Manuscript published: January 31, 2022 (version 1)
- Accepted Manuscript updated: February 1, 2022 (version 2)
- Version of Record published: February 25, 2022 (version 3)
- Version of Record updated: March 3, 2022 (version 4)
Copyright
© 2022, Izgi et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 3,294
- Page views
-
- 434
- Downloads
-
- 7
- Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Genetics and Genomics
The b-hemoglobinopathies, such as sickle cell disease and b-thalassemia, are one of the most common genetic diseases worldwide and are caused by mutations affecting the structure or production of β-globin subunits in adult hemoglobin. Many gene editing efforts to treat the β-hemoglobinopathies attempt to correct β-globin mutations or increase γ-globin for fetal hemoglobin production. δ-globin, the subunit of adult hemoglobin A2, has high homology to β-globin and is already pan-cellularly expressed at low levels in adult red blood cells. However, upregulation of δ-globin is a relatively unexplored avenue to increase the amount of functional hemoglobin. Here, we use CRISPR-Cas9 to repair non-functional transcriptional elements in the endogenous promoter region of δ-globin to increase overall expression of adult hemoglobin 2 (HbA2). We find that insertion of a KLF1 site alone is insufficient to upregulate δ-globin. Instead, multiple transcription factor elements are necessary for robust upregulation of δ-globin from the endogenous locus. Promoter edited HUDEP-2 immortalized erythroid progenitor cells exhibit striking increases of HBD transcript, from less than 5% to over 20% of total β-like globins in clonal populations. Edited CD34+ hematopoietic stem and progenitors (HSPCs) differentiated to primary human erythroblasts express up to 46% HBD in clonal populations. These findings add mechanistic insight to globin gene regulation and offer a new therapeutic avenue to treat β-hemoglobinopathies.
-
- Genetics and Genomics
Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results, unclear guidance, and have several limitations, including not varying the number of principal components (PCs), simulating simple population structures, and inconsistent use of real data and power evaluations. We evaluate PCA and LMM both varying number of PCs in realistic genotype and complex trait simulations including admixed families, subpopulation trees, and real multiethnic human datasets with simulated traits. We find that LMM without PCs usually performs best, with the largest effects in family simulations and real human datasets and traits without environment effects. Poor PCA performance on human datasets is driven by large numbers of distant relatives more than the smaller number of closer relatives. While PCA was known to fail on family data, we report strong effects of family relatedness in genetically diverse human datasets, not avoided by pruning close relatives. Environment effects driven by geography and ethnicity are better modeled with LMM including those labels instead of PCs. This work better characterizes the severe limitations of PCA compared to LMM in modeling the complex relatedness structures of multiethnic human data for association studies.