1. Computational and Systems Biology
Download icon

Mouse aging cell atlas analysis reveals global and cell type-specific aging signatures

  1. Martin Jinye Zhang  Is a corresponding author
  2. Angela Oliveira Pisco  Is a corresponding author
  3. Spyros Darmanis
  4. James Zou  Is a corresponding author
  1. Department of Electrical Engineering, Stanford University, United States
  2. Department of Epidemiology, Harvard T.H. Chan School of Public Health, United States
  3. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, United States
  4. Chan-Zuckerberg Biohub, United States
  5. Department of Biomedical Data Science, Stanford University, United States
Research Article
  • Cited 0
  • Views 3,770
  • Annotations
Cite this article as: eLife 2021;10:e62293 doi: 10.7554/eLife.62293

Abstract

Aging is associated with complex molecular and cellular processes that are poorly understood. Here we leveraged the Tabula Muris Senis single-cell RNA-seq data set to systematically characterize gene expression changes during aging across diverse cell types in the mouse. We identified aging-dependent genes in 76 tissue-cell types from 23 tissues and characterized both shared and tissue-cell-specific aging behaviors. We found that the aging-related genes shared by multiple tissue-cell types also change their expression congruently in the same direction during aging in most tissue-cell types, suggesting a coordinated global aging behavior at the organismal level. Scoring cells based on these shared aging genes allowed us to contrast the aging status of different tissues and cell types from a transcriptomic perspective. In addition, we identified genes that exhibit age-related expression changes specific to each functional category of tissue-cell types. Altogether, our analyses provide one of the most comprehensive and systematic characterizations of the molecular signatures of aging across diverse tissue-cell types in a mammalian system.

Introduction

Aging leads to the functional decline of major organs across the organism and is the main risk factor for many diseases, including cancer, cardiovascular disease, and neurodegeneration (Niccoli and Partridge, 2012; López-Otín et al., 2013). Past studies have highlighted different hallmarks of the aging process, including genomic instability, telomere attrition, epigenetic alterations, loss of proteostasis, deregulated nutrient sensing, mitochondrial dysfunction, cellular senescence, stem cell exhaustion, and altered intercellular communication (López-Otín et al., 2013; Campisi, 2013; Vijg and Suh, 2013; Nikolich-Žugich, 2018). However, the primary root of aging remains unclear, and the underlying molecular mechanisms are yet to be fully understood.

To gain a better insight into the mammalian aging process at the organismal level, the Tabula Muris Consortium, which we are members of, created the single-cell transcriptomic data set Tabula Muris Senis (TMS) (Tabula Muris Consortium, 2020). TMS is one of the largest expert-curated single-cell RNA sequencing (scRNA-seq) data sets, containing over 300,000 annotated cells from 23 tissues and organs of male and female mice (Mus musculus). The cells were collected from mice of diverse ages, making this data a tremendous opportunity to study the genetic basis of aging across different tissues and cell types. The TMS data is organized into scRNA-seq expression of different tissue-cell type combinations (e.g., B cells in spleen) via expert annotation and clustering.

The original TMS paper explored primarily the cell-centric effects of aging, aiming to characterize changes in cell type composition within different tissues. Here we provide a systematic gene-centric study of gene expression changes occurring during aging across different cell types. The cell-centric and gene-centric perspectives are complementary, as the gene expression can change within the same cell type during aging, even if the cell type composition in the tissue does not vary over time.

Our analysis focused on the TMS FACS data (acquired by cell sorting in microtiter well plates followed by Smart-seq2 library preparation Picelli et al., 2014) because it has more comprehensive coverage of tissues and cell types (Supplementary file 1) and is more sensitive at quantifying gene expression levels as compared to the TMS droplet data. As shown in Figure 1A, the FACS data was collected from 16 C57BL/6JN mice (10 males, 6 females) with ages ranging from 3 months (20-year-old human equivalent) to 24 months (70-year-old human equivalent). It contains 120 cell types from 23 tissues, totaling 110,096 cells. We also used the TMS droplet data (derived from microfluidic droplets) for those tissues where the data is available, to further validate our findings on an additional data set generated by a different method.

Figure 1 with 6 supplements see all
Analysis overview.

(A) Sample description. The TMS FACS data was collected from 16 C57BL/6JN mice (10 males, 6 females) with ages ranging from 3 months (20-year-old human equivalent) to 24 months (70-year-old human equivalent). (B) Significantly aging-dependent genes in all 76 tissue-cell types in the FACS data. The left panels show the number of aging-related genes (discoveries) for each tissue-cell type, broken down into the number of upregulated genes (orange) and the number of downregulated genes (blue), with the numbers on the left showing the ratio (up/down). Tissue-cell types with significantly more up/downregulated genes (ratio >1.5) are highlighted in solid color. Most tissue-cell types have significantly more downregulated aging genes. The right panel shows the number of cells sequenced for each tissue-cell type.

We investigated the comprehensive expression signatures of aging across tissues and cell types in the mouse. We performed systematic differential gene expression (DGE) analysis to identify aging-related genes in 76 tissue-cell type combinations across 23 tissues (Figure 1B, Supplementary file 1). Furthermore, we characterized both shared and tissue-cell-specific aging signatures. Our study identified global aging genes (GAGs), namely genes whose expression varies substantially with age in most (>50%) of the tissue-cell types. Interestingly, the changes in expression of these genes are highly concordant across tissue-cell types and exhibit strong bimodality. That is, these genes tend to be either downregulated during aging in most of the tissue-cell types or upregulated across the board. We leveraged this coordinated dynamic to construct an aging score based on the GAGs. We found that the aging score is significantly positively correlated with the chronological age, both in the FACS data and in multiple independent data sets. Moreover, the aging score contrasts the aging status of tissue-cell types with different functionalities and turnover rates, shedding light on the heterogeneous aging process across the 76 tissue-cell types. The score distinguished itself by its single-cell resolution and large data scale, as previous works either studied the biological age at an individual-level (Harries et al., 2011; Holly et al., 2013; Peters et al., 2015; Fleischer et al., 2018; Horvath, 2013; Jylhävä et al., 2017; Petkovich et al., 2017) or focused on a small number of organs (Ori et al., 2015; Arrojo E Drigo et al., 2019). Overall, our analysis highlights the power of scRNA-seq in studying aging and provides a comprehensive catalog of aging-related gene signatures across diverse tissue-cell types.

Results

Identification of aging-related genes

We considered 76 tissue-cell types in the TMS FACS data, 26 tissue-cell types in the TMS droplet data, and 17 tissues in an accompanying bulk RNA-Seq mouse aging study (Schaum et al., 2020) (referred to as the bulk data) with sufficient sample size. We performed DGE analysis for each tissue-cell type separately, treating all cells from the tissue-cell type as samples. We tested if the expression of each gene is significantly related to aging using a linear model treating age as a numerical variable while controlling for sex. We applied an FDR threshold of 0.01 (the number of comparisons corresponds to the number of genes in the tissue-cell type) and an age coefficient threshold of 0.005 (in the unit of log fold change per month, corresponding to 10% fold change from 3 m to 24 m). For details, please refer to the DGE analysis subsection in Materials and methods.

As shown in Figure 1B, the number of significantly age-dependent genes per tissue-cell type ranges from hundreds to thousands. Interestingly, most tissue-cell types have more downregulated aging-related genes than upregulated aging-related genes, suggesting a general decrease in gene expression over aging. This downregulation pattern is unlikely to be confounded by technical factors such as sequencing depth because the 18 m/24 m mice were sequenced at a higher depth (Figure 1—figure supplement 1C–E). By doing separate DGE analyses using mice from one sex or mice from a subset of age groups (3 m/18 m), we further found that such a downregulation pattern was mostly driven by 24 m mice and was not specific to one sex (Figure 1—figure supplement 2). We also observed a similar pattern in the droplet data (Figure 1—figure supplement 3).

In addition, we found that most aging-related genes identified in the analysis have monotonic aging trajectories, meaning that their expressions either increased or decreased monotonically during aging (Figure 1—figure supplements 4,5). However, a subset of genes in the FACS data (13%), while being upregulated during aging overall, increased from 3 m to 18 m and slightly decreased from 18 m to 24 m; those genes are enriched in brown adipose tissue (BAT) B cells, large intestine epithelial cells, and mesenchymal adipose tissue (MAT) mesenchymal stem cells of adipose (Figure 1—figure supplement 4C).

Bimodal effects of aging and global aging genes

We found that most genes are significantly related to aging in at least one tissue-cell type (13,376 in the TMS FACS data and 6233 in the TMS droplet data), consistent with the intuition that aging is a highly complex trait involving many biological processes. The aging-related genes discovered in the FACS data significantly overlap with other important gene sets, including both known human and mouse aging markers as recorded in the GenAge database (Tacutu et al., 2018), senescence genes (Campisi, 2013), transcription factors, eukaryotic initiation factors, and ribosomal protein genes (Figure 2—figure supplement 1). Some of the top overlapping genes, significantly related to aging in most tissue-cell types, include known mouse aging markers Jund, Apoe, and Gpx4 and known human aging markers Jund, Apoe, Fos, and Cdc42 from the GenAge database (Tacutu et al., 2018), and senescence genes Jund, Junb, Ctnnb1, App, and Mapk1. In addition, we found that each tissue-cell type has around 5% aging-related genes that are shared by the GenAge human aging markers. However, we did not find any tissue-cell types that are specifically enriched with these known human aging markers, suggesting that the conservation between mouse aging and human aging is relatively uniform across tissue-cell types (Figure 2—figure supplement 2).

We visualized all aging-related genes (significant in ≥1 tissue-cell type) in Figure 2A, where the color indicates the number of genes. The x-axis shows the weighted proportion of tissue-cell types (out of 76 tissue-cell types) where the gene is significantly related to aging, while the y-axis shows the weighted proportion of tissue-cell types where the gene is upregulated. The tissue-cell type weights used here are inversely proportional to the number of cell types in the tissue, in order to ensure equal representation of the tissues. The visualization makes it clear that there are more downregulated aging genes than upregulated ones, consistent with Figure 1B. Perhaps more strikingly, a bimodal pattern is apparent, in the sense that the aging-related genes tend to have a consistent direction of change during aging across most tissue-cell types. Interestingly, it was also recently reported in other studies that many shared aging-related genes exhibit consistent direction of change during aging across mouse tissues and cell types, including the brain (Ximerakis et al., 2019), kidney, lung, and spleen (Kimmel et al., 2019).

Figure 2 with 4 supplements see all
Tissue-cell level global aging genes (GAGs).

(A) Tissue-cell level aging-related genes with color indicating the number of the genes. The x-axis shows the weighted proportion of tissue-cell types (out of all 76 tissue-cell types) where the gene is significantly related to aging, while the y-axis shows the weighted proportion of tissue-cell types where the gene is upregulated. (B and C) Heatmap of age coefficients of the GAGs (panel B) and the top 20 GAGs (panel C). The age coefficients are in the unit of log fold change per month and blue/red represent down/upregulation. (D) Top GO biological pathways for the GAGs.

Genes that exhibit age-related expression changes across many cell types have particularly strong bimodal behavior. This motivated us to define global aging genes (GAGs), namely the genes that are significantly related to aging in more than 50% of weighted tissue-cell types (i.e., cell types after normalizing by cell type frequencies using the tissue-cell type weights as described above). We identified 330 GAGs in total, among which 93 are consistently upregulated and 190 consistently downregulated (>80% of weighted tissue-cell types); only 47 have an inconsistent directionality (upregulated in 20–80% of weighted tissue-cell types). We found that the GAGs significantly overlap with genes known in aging-related diseases, including strong overlap with genes related to Alzheimer’s disease (p=2.4e-11), neuroblastoma (p=1.4e-7), fibrosarcoma (p=3.3e-5), and osteoporosis (p=1.5e-4), and relatively weaker overlap with genes related to Huntington’s disease (p=2.5e-3), skin carcinoma (p=3.6e-3), kidney cancer (p=1.2e-3), acute promyelocytic leukemia (p=3.2e-3), acute myeloid leukemia (p=3e-3), endometrial cancer (p=1.6e-3), and hypertension (p=1.9e-3) (please see the Global aging genes subsection in Materials and methods for details). Our results are not sensitive to the specific choice of the 50% threshold for selecting GAGs; using different thresholds produced similar gene ontology (GO) enrichment analysis results (Figure 2—figure supplement 3C) or GAG scores (Figure 3—figure supplement 1A–D) as detailed below.

We visualized the age coefficients for 10 top up/downregulated GAGs (a consistent direction in >80% of weighted tissue-cell types) that are related to aging in the most number of tissue-cell types, as shown in Figure 2C. Many of these genes have been previously shown to be highly relevant to aging. For example, the downregulation of Lars2 has been shown to result in decreased mitochondrial activity and increase the lifespan for C. elegans (Lee et al., 2003). On the other hand, Jund is a proto-oncogene known to protect cells against oxidative stress and its knockout may cause a shortened lifespan in mice (Laurent et al., 2008). Moreover, Rpl13a was observed to be upregulated in almost all tissue-cell types. As a negative regulator of inflammatory proteins, Rpl13a contributes to the resolution phase of the inflammatory response, ensuring that the inflamed tissues are completely restored back to normal tissues. It also contributes to preventing cancerous growth of the injured cells caused by prolonged expression of the inflammatory genes (Zhou et al., 2015; Mazumder et al., 2003). Therefore, it is interesting to observe the upregulation of Rpl13a given that most old mice have severe inflammatory symptoms.

As shown in Figure 2D, Gene Ontology (GO) biological pathway enrichment analysis revealed that the 330 GAGs are associated with apoptosis, translation, biosynthesis, metabolism, and cellular organization. These biological processes are highly relevant to aging (Tower, 2015; Anisimova et al., 2018; Barzilai et al., 2012) and are shared across most cell types, consistent with the intuition that GAGs represent the global aging process across tissue-cell types. In addition, the KEGG pathways associated with the GAGs are consistent with the GO terms and additionally highlighted immune-related pathways and multiple aging-related diseases (Figure 2—figure supplement 4A). Moreover, the findings were supported by similar analyses on the set of 59 GAGs discovered in the droplet data (Figure 2—figure supplement 3A B). We also performed pathway enrichment analysis using the Ingenuity Pathway Analysis (IPA) software (Krämer et al., 2014), which confirmed our findings for the biological processes associated with the GAGs (Figure 2—figure supplement 4B). Of note is the finding that the mTOR pathway, a known aging-associated pathway, is predicted to be inhibited given the expression of the GAGs (Weichhart, 2018; Papadopoli et al., 2019; Johnson et al., 2013; Figure 2—figure supplement 4C). Interestingly, mTOR downregulation has been shown to promote longevity (Stallone et al., 2019; Laplante and Sabatini, 2012), a further indication that the GAGs are related to the aging process.

GAG score contrasts the heterogeneous aging status of tissue-cell types

Following the analysis of GAGs, we next leveraged these marker genes to characterize the holistic aging status of different tissue-cell types. We aggregated the expression of global aging markers into a single score for each cell, referred to as the GAG score (please see the GAG score subsection in Materials and methods for details). We used the FACS data to identify the GAGs because it has more comprehensive coverage of different tissue-cell types. Intuitively, the GAG score tags the global aging process, and it reflects both the chronological age of the organism and the tissue-cell type-specific aging effects. To formally dissect these components, we defined a fixed-effect model with the GAG score being the response variable and various other factors, including the chronological age, sex, and binary-coded tissue-cell types, being explanatory variables. As a sanity check, the chronological age effect on the GAG score is significantly positive (p<1e-100). The model explains 60.2% of the GAG score variance in the TMS FACS data while adding extra interaction terms between age and each tissue-cell type only slightly increased the model fit (explained variance 62.6%), indicating a reasonable fit for the current model (Figure 3—figure supplement 2C). Interestingly, while most young cells have smaller GAG scores and old cells have larger GAG scores, we also found four tissue-cell types whose GAG scores are similar between age groups and eight tissue-cell types that have a subpopulation of cells whose GAG scores are more similar to that of cells from a different age group (Figure 3—figure supplement 3), highlighting the heterogeneity of the aging process across tissue-cell types.

We next considered the tissue-cell type effects on the GAG score. Intuitively, a larger GAG score effect suggests that the corresponding cell type could be molecularly more sensitive to aging compared to other cells in the same animal. As shown in Figure 3A, immune cells and stem cells have higher GAG score effects, while most parenchymal cell types have lower GAG score effects; such a contrast is also statistically significant, as shown in Figure 3B. Indeed, immune cells and stem cells are known to undergo the most substantial changes with aging. Specifically, the aging of the immune system is commonly linked to the impaired capacity of elderly individuals to respond to new infections (Montecino-Rodriguez et al., 2013). Also, adult stem cells are critical for tissue maintenance and regeneration, and the increased incidence of aging-related diseases has been associated with a decline in the stem cell function (Ermolaeva et al., 2018). On the other hand, parenchymal cells like pancreatic cells, neurons, heart myocytes, and hepatocytes have lower aging scores. This could be an indication that these tissue-specialized cell types are more resilient to aging and are able to maintain their functions despite the changes in the animal.

Figure 3 with 8 supplements see all
GAG score.

(A) Tissue-cell GAG score effects with 95% confidence intervals. The color represents the functional category of the tissue-cell type. (B and C) Effects of cell functional categories (panel B) and binary cell lifespan (panel C) on the GAG score, meta-analyzed over all tissue-cell types within the group. A positive y-value means that the cells in the group (functional category group for panel B and binary cell lifespan group for panel C) have higher GAG score values than other cells of the same age and sex. 95% confidence intervals and nominal p-values are provided to quantify the differences between categories. In panel C, the average lifespan annotation is also provided for a subset of tissue-cell types where such information is available. (D) Comparison between the tissue-cell GAG score effects estimated from the FACS data (x-axis) and the droplet data (y-axis). Each dot corresponds to a tissue-cell type, and a linear fit is provided showing that the estimates are consistent.

We also found that the tissue-cell GAG score effects are in general positively correlated with the cell turnover rate. For example, short-lived cells like skin epidermal cells, monocytes, and T cells (Koster, 2009; Guilliams et al., 2018; Westera et al., 2013) have higher GAG score effects while long-lived cells like neurons, oligodendrocytes, pancreatic β-cells, liver hepatocytes, and heart atrial myocytes (Arrojo E Drigo et al., 2019; Teta et al., 2005; Magami et al., 2002; Bergmann et al., 2015) have very low GAG score effects. To quantify this observation, we assigned a binary cell lifespan label to a subset of cell types where such data is available from literature (Milo et al., 2010; Hobson and Denekamp, 1984; Stewart et al., 1980; Lawson et al., 1992; Arrojo E Drigo et al., 2019; Darwich et al., 2014; Lowry and Zehring, 2017; Geering et al., 2013; Cheshier et al., 1999; Fulcher and Basten, 1997; Scollay et al., 1980) (long for >180 days and short for <90 days, Supplementary file 1); using binary labels instead of the actual values allows us to incorporate more cell types whose exact lifespan information is not available but are known to be long-/short-lived. We found that short-lived cells have significantly higher GAG score effects than long-lived cells (p=1.68e-3, Figure 3C). One possible explanation is that the GAG score is associated with the biological processes related to cell proliferation, development, and death, which are more active in cell types that have a higher turnover rate. This striking difference is also consistent with the intuition that cells that have undergone more divisions (also with higher turnover rates) are ‘‘older’ and could have molecular memories.

Validating the GAG score on external data

We performed several analyses to validate the robustness of the GAG score. First, the GAG score is not sensitive to perturbations of the current scoring method, including using different criterion to select GAGs or not performing cell-wise background correction (Figure 3—figure supplement 1A–D). We also found that estimating tissue-cell GAG score effects using only old cells gave an almost identical result (Figure 3—figure supplement 1H).

Next, we performed a parallel analysis to identify GAGs on the TMS droplet data and found 59 such genes (due to smaller sample size and detection power). We found that 34 genes were shared between the droplet GAGs and the 330 FACS GAGs as described above (p=9e-48). Similar to the FACS GAGs, we also found that the droplet GAGs significantly overlap with genes known in many aging-related diseases, including Alzheimer’s disease (p=2.8e-4), neuroblastoma (p=1.2e-3), and fibrosarcoma (p=1.4e-3). In addition, we considered a set of 261 shared aging genes reported in Kimmel et al., 2019. This scRNA-seq study contains cells from the kidney, lung, and spleen in both young and old mice, and the 261 shared aging genes were defined as genes significantly related to aging in more than five cell types in the paper (Kimmel et al., 2019). We found that 90 genes were shared between Kimmel et al. genes and the FACS GAGs (p=2e-105) and 42 genes were shared between Kimmel et al. genes and the droplet GAGs (p=7e-69). All p-values reported here were computed via Fisher’s exact tests.

Using the GAGs identified from the TMS FACS data, we computed the GAG score and further estimated the GAG score effects for cells in the TMS droplet data, the bulk data (treating each mouse sample as a ‘‘cell’’), the Kimmel et al. data, and the data set from Kowalczyk et al., 2015. This last data set has only three subtypes of hematopoietic stem cells (HSCs) and was therefore omitted in other analyses. We found that the chronological age effect on the GAG score is significantly positive in all four validation data sets (p=7e-10 for the bulk data due to smaller sample size and detection power, and p<1e-100 for the other three data sets). Since the GAGs were selected based on the FACS data, the GAG score is agnostic of the age labels in the four validation data sets, confirming that the GAG score is truly indicative of the aging process.

The tissue-cell GAG score effects estimated from the other four data sets are also in line with those estimated from the FACS data (Figure 3—figure supplements 1,4E F). Specifically, in the droplet data and the Kimmel et al. data, immune cell types have higher GAG score effects while epithelial, endothelial, and parenchymal cell types have lower GAG score effects. In particular for the droplet data, short-lived cell types also have higher but non-significant GAG score effects, due to a smaller number of annotated cell types. While looking at the bulk data, we found that immune-related tissues and organs such as whole blood, spleen, and marrow have the highest GAG score effects. It is interesting to observe that in the Kowalczyk et al. data, the GAG score effects of MPPs (multipotent progenitors) is less than that of ST-HSCs (short-term HSCs) which is less than that of LT-HSCs (long-term HSCs). This is exactly aligned with the differentiation potentials of these three cell types, consistent with the hypothesis that more stem-like cells have higher GAG score effects as observed in the FACS data.

Finally, we found that the tissue-cell GAG score effects are highly consistent between data sets (correlation 0.76 with p=2e-19 between the FACS data and the droplet data in Figure 3D, correlation 0.75 with p=1e-3 between the FACS data and the Kimmel et al. data in Figure 3—figure supplement 1G). In summary, we showed that the GAG score is capable of describing the chronological age as well as the transcriptional changes during the aging process. Furthermore, we could use the tissue-cell GAG score effects to contrast the aging status of cell types with different biological properties, including functional categories and turnover rates. This provides a comprehensive analysis demonstrating how the GAG score captures the heterogeneous molecular effects of aging. Additionally, we repeated both the DGE analysis and the GAG score analysis at the tissue-level by combining all cell types from the same tissue. We observed qualitatively similar findings, supporting the robustness of the analyses. Please see Figure 3—figure supplements 58 for more details.

Category-specific aging genes

We next consider genes specific to a subset of tissue-cell types, including functional-category-specific genes, cell type-specific genes, tissue-specific-genes, and tissue-cell type-specific genes. Given a set of tissue-cell types, in order to have an overall meta age coefficient for cells in this tissue-cell type set, we first combined the age coefficients of all tissue-cel types within the set by meta-analysis; similarly, we also computed the outside-set meta age coefficient by meta-analyzing all outside-set tissue-cell types. Then we selected the genes that have significantly different within-set and outside-set meta age coefficients as the set-specific genes (please see the Category-specific aging genes subsection in Materials and methods for more details). Of note, almost no genes identified here are shared by GAGs.

In the original TMS paper (Tabula Muris Consortium, 2020), each tissue-cell type was assigned one of the six functional category labels, namely endothelial, epithelial, immune, stem/progenitor, stromal, and parenchymal cells. When examining the data by functional category, we found that the endothelial, immune, stem, and stromal cells exhibit highly category-specific aging behavior. Indeed, we found a higher number of specific aging genes for these categories (Figure 4B). Moreover, their age coefficients are specific to the respective functional category, as we can see from the clear block structure across tissue-cell types (Figure 4A). In addition, when performing GO biological pathway enrichment analysis on these six sets of genes separately, we only found significant pathways for these four categories (Figure 4C). Among them, endothelial-specific genes were associated with various processes related to angiogenesis and negative regulation of cell migration; the latter suggests decreased endothelial cell functionality during aging because endothelial cell migration is essential to angiogenesis (Lamalice et al., 2007). Also, immune-specific genes were associated with activation of various immune responses, in line with a strong link between the aging process and the immune system (Tabula Muris Consortium, 2020; Nikolich-Žugich, 2018). In addition, stem-specific genes were associated with ossification and diverse angiogenesis processes, and both stem-specific and stromal-specific genes were associated with extracellular matrix and structure organization.

Figure 4 with 3 supplements see all
Functional-category-specific genes.

(A) Age coefficients, in the unit of log fold change per month, for functional-category-specific genes. Both genes in the x-axis and tissue-cell types in the y-axis are ordered by functional categories. For example, the upper-left block corresponds to endothelial-specific genes. (B) Number of functional-category-specific genes for each category. (C) GO biological pathways for functional-category-specific genes, with color representing the negative log10 FDR.

Such an analysis also facilitates the discovery of interesting specific genes related to aging. For example, C2cd4b (Figure 4—figure supplements 1 and 2A), a parenchymal-specific gene, has large age coefficients in several pancreatic, mammary gland, large intestine cell types, and almost zero age coefficients in other cell types. The increased expression of C2cd4b has been associated with an increased risk of type 2 diabetes (Kycia et al., 2018) and increased expression of C2cd4b in old mice pancreatic cell types may suggest the increased risk of type 2 diabetes for these mice. In addition, C2cd4b has been shown to lead to sexually dimorphic changes in body weight and glucose homeostasis (Mousavy Gharavy et al., 2021), in line with the fact that the mammary gland is a well-known sexually dimorphic tissue. A second example is Gsn (Figure 4—figure supplements 1 and 2B), which is downregulated in stromal and stem cell types and upregulated in other cell types during aging. This can be explained by its function for making gelsolin, an important protein for cell movement. Not only is cell movement important to immune and endothelial cells, but Gsn has also been shown to be a potential biomarker to aging-related neurodegeneration (Manavalan et al., 2013).

Beyond functional-category-specific genes, we also identified genes specific to several cell types, including B cells, basal cells of epidermis, endothelial cells, macrophages, mesenchymal stem cells, mesenchymal stem cells of adipose, myeloid cells, and skeletal muscle satellite cells, corroborated by their association with related biological processes (Figure 4—figure supplement 3). The method also allowed us to identify genes specific to each tissue. However, we did not find any genes specific to a single tissue-cell type. All the gene sets are available in Supplementary file 3.

Discussion

This study provides a systematic and comprehensive analysis of aging-related transcriptomic signatures by analyzing 76 tissue-cell types in the TMS FACS data. Together with the analysis in the first publication of Tabula Muris Senis (Tabula Muris Consortium, 2020), it forms one of the largest analysis to date of the mammalian aging process at the single-cell resolution. Of particular interest are the 330 global aging genes (GAGs) identified in the study. These genes exhibit aging-dependent expressions in a majority of tissue-cell types in the mouse. The GAGs are enriched with many interesting genes, including known human and mouse aging markers, aging-related disease genes, senescence genes, transcription factors, eukaryotic initiation factors, and ribosomal protein genes. Interestingly, most of the GAGs are strongly bimodal as their expressions either decrease or increase during aging across almost all tissue-cell types, suggesting that these genes have a uniform response to aging, which is robust to the specific tissue or cellular context. Moreover, we find a systematic decrease in expression for most genes as well as a decrease in the number of actively expressed genes, suggesting a turning off of transcription activity as the animal ages. A recent study has observed that the number of expressed genes decreases during cellular differentiation in mouse (Gulati et al., 2020). It is interesting that we quantify a similar phenomenon for aging, despite the substantial longer time-scale of aging compared to differentiation.

While we focused on detecting genes whose changes with age are linear and non-sex-specific, it is also interesting to study aging-related changes that are non-linear or sex-specific. For example, as shown in Figure 1—figure supplement 1, the number of expressed genes changes with age in a non-linear and sex-specific manner, suggesting the existence of such genes. We have validated our findings using the TMS droplet data, the bulk RNA-seq data, and external data sets, and it is important to have further validations in future studies. In particular, the bimodal expression pattern is less apparent in the TMS droplet data, perhaps due to its limited tissue-cell type coverage and relatively shallower sequencing depth.

The remarkable bimodal consistency of the GAGs makes them useful as biomarkers to characterize the aging status of individual cells. We proposed a new aging score, namely the GAG score, based on the GAGs. The tissue-cell type-specific GAG score effects quantify how sensitive each tissue-cell type is to aging and are positively correlated with the cell division rate. For example, immune cells tend to have higher GAG scores than other cells of the same age and sex, which reflects the phenomenon that they undergo many cycles of cell division and also change substantially during the animal’s lifespan. One hypothesis is that the GAG score captures some aspects of the true biological age of the cells, which could be different from the birth age of the animal. An interesting direction of future work is to further investigate this model with functional experiments. In line with this, it would be important to study how some of the transcriptomic changes we quantify here, for example the downregulation of mTOR, point toward healthy aging or how can they inform experiments that can uncover the mechanism to ameliorate the aging effects. In addition, while the GAG score was proposed to capture the global aging status, there also exist cell type-specific aging programs; a combination of both would better characterize the overall aging status of a cell. The construction of cell type-specific aging scores would require a more comprehensive longitudinal catalog of diverse subtypes and states of cells within different cell types. While this is beyond the scope of the current work, it is an exciting direction to pursue when such data becomes available.

The GAG score is also related to the transcriptome age predictors developed in previous works (Harries et al., 2011; Holly et al., 2013; Peters et al., 2015; Fleischer et al., 2018), in the sense that they all use the gene expression information and are predictive of the animals’ chronological age. The commonly used approach in previous works is to train a model (e.g., linear/logistic regression model) to predict the individuals’ chronological age from their gene expression. Instead of a model-fitting algorithm, our GAG score uses the GAGs that were selected from a broad range of tissue-cell types in an unbiased manner, by meta-analyzing the DGE results of 76 tissue-cell types and putting each tissue-cell type on the same footing. This ensures that the genes used by the GAG score capture the shared aging process and are not biased toward certain tissue-cell types. Indeed, the GAGs were shown to be associated with biological processes that are highly relevant to aging (Figure 2), providing better interpretability of the score. In comparison, previous studies focused on only one specific tissue, such as the blood (Harries et al., 2011; Holly et al., 2013; Peters et al., 2015) or dermal fibroblasts (Fleischer et al., 2018), and hence may have selected genes that were biased toward that particular tissue.

Overall, our study provides a comprehensive characterization of aging genes across a wide range of tissue-cell types in mice. In addition to the biological insights, it also serves as a comprehensive reference for researchers working on related topics.

Materials and methods

Data preprocessing

Request a detailed protocol

We considered five data sets, namely the TMS FACS data, the TMS droplet data, the data in Schaum et al., 2020 (referred to as the bulk data), the data in Kimmel et al., 2019, and the data in Kowalczyk et al., 2015. For the TMS FACS data and the TMS droplet data, we filtered out genes expressed in fewer than 3 cells, filtered out cells expressing fewer than 250 genes, and discarded cells with a total number of counts fewer than 5000 for the FACS data and a total number of unique molecular identifiers (UMIs) fewer than 2500 for the droplet data. For the bulk data, we filtered out genes expressed in fewer than five samples, and filtered out samples expressing fewer than 500 genes. We did not filter cells for the other two data sets. For all five data sets, we normalized each sample to have 10,000 reads/UMIs per sample, followed by a log transformation (log(x + 1) where x is the read count). We note that such a procedure is the same as that in the original paper (Tabula Muris Consortium, 2020). We did not correct for batch effects as no substantial batch effects were identified in the original TMS paper (Tabula Muris Consortium, 2020).

DGE analysis

Request a detailed protocol

As shown in Supplementary file 1, we considered 76 tissue-cell types in 23 tissues with more than 100 cells in both young (3 m) and old (18 m, 24 m) age groups for the TMS FACS data; 26 tissue-cell types in 11 tissues with more than 500 cells in both young (1 m, 3 m) and old (18 m, 21 m, 24 m, 30 m) age groups for the TMS droplet data; and all 17 tissues for the bulk data (Schaum et al., 2020). We required more cells for the TMS droplet data than the TMS FACS data because the droplet data has a much lower sequencing depth (6000 UMIs per cell, as compared to 0.85 million reads per cell for the FACS data). Also, we did not focus on the TMS droplet data in the main results due to its limited tissue and cell type coverage.

We performed a DGE analysis for cells in each tissue-cell type separately. In the DGE analysis for a tissue-cell type, all cells in the tissue-cell type were treated as samples, and a separate test was performed for each gene with the observations being the expressions of the gene across the cells. We identified genes significantly related to aging using a linear model treating age as a continuous variable while controlling for sex, namely,

(1) geneexpressionage+sex.

Since the tests were performed at a cell level for each tissue-cell type, the cell numbers will only affect the detection power but will not bias the result. Therefore, it is not a confounding factor and was not controlled for. We used the MAST package (Finak et al., 2015) (version 1.12.0) in R to perform the DGE analysis. The zero counts in the scRNA-seq data were handled by the MAST package and we do not observe other types of missing data. We did not control for the cellular detection rate (Finak et al., 2015) (CDR, corresponding to the number of expressed genes in a cell) because we found that in our data, CDR is positively correlated with age and negatively correlated with technical covariates such as sequencing depth and the number of detected ERCC spike-ins, both when considering all the cells or when stratified by sex (Figure 1—figure supplement 1). As a result, controlling for CDR may remove genuine aging effects. We note that CDR is defined as the number of expressed genes in a cell, which is a fundamental quantity of the data set and is not specific to the MAST package. Therefore, such an observation does not imply a potential issue of MAST. Nonetheless, we found that the age coefficients, estimated with and without CDR correction, were highly correlated (0.89 for the FACS data and 0.93 for the droplet data, Figure 1—figure supplement 6), ruling out the possibility that CDR correction would significantly alter the result.

We used the Benjamini–Hochberg (FDR) procedure (Benjamini and Hochberg, 1995) to control for multiple comparisons, where the number of comparisons corresponds to the number of genes in the tissue-cell type. We applied an FDR threshold of 0.01 and an age coefficient threshold of 0.005 (in the unit of log fold change per month, corresponding to around 10% fold change from 3 m to 24 m) for detecting genes significantly related to aging.

Global aging genes

Request a detailed protocol

We selected a gene as a GAG if it is significantly related to aging in more than 50% of weighted tissue-cell types. Here, the tissue-cell type weights are inversely proportional to the number of cell types in the tissue, in order to ensure equal representation of tissues.

For the overlap between GAGs and the genes known in aging-related diseases, we considered the top 25 aging-related diseases and obtained their related genes from the Human Disease Database (MalaCards) (Rappaport et al., 2013; Rappaport et al., 2014; Rappaport et al., 2017). We then converted the human genes to the corresponding mouse orthologs using g:Profiler (Raudvere et al., 2019) (version 1.2.2). The p-values quantifying the significance of the overlap were computed using Fisher’s exact tests.

Pathway enrichment analysis

Request a detailed protocol

We used g:Profiler (Raudvere et al., 2019) to perform GO biological pathway enrichment analysis. We considered biological pathways with FDR smaller than 0.01. We used Gene Set Enrichment Analysis (GSEA MGSig Database) (Subramanian et al., 2005; Liberzon et al., 2011) to perform the KEGG pathway analysis. We filtered for mouse genes and considered biological pathways with FDR smaller than 0.05. We also used the IPA software (Krämer et al., 2014) to perform canonical pathway analysis (Figure 2—figure supplement 4). For Figure 2—figure supplement 4A, we used an FDR threshold of 1e-5 and a z-score threshold of 0.5.

GAG score

Request a detailed protocol

Given a set of GAGs (e.g., the FACS GAGs), the cell-wise GAG score for cell i is computed as:

  1. Compute the raw GAG score as the average expression of the upregulated GAGs (significantly related to aging in >50% of weighted tissue-cell types and upregulated in >80% of weighted tissue-cell types) minus the average expression of the downregulated GAGs (significantly related to aging in >50% of weighted tissue-cell types and downregulated in >80% of weighted tissue-cell types), That is,

    RawGAGscorei=mean(upregulatedGAGs)imean(downregulatedGAGs)i
  2. Following the recipe of DeTomaso et al., 2019, z-normalize the raw GAG score using the expected mean and variance of a random set of genes with the same number of up/downregulated genes:

    (3) GAGscorei=RawGAGscorei/[stdi1/nup+1/ndown],
  • where stdi is the standard deviation of the gene expression of cell i, nup is the number of upregulated GAGs, and ndown is the number of downregulated GAGs.

For robustness consideration, we checked the expression levels of the genes used for computing the GAG score and found that there are no extreme values. Therefore, the GAG score is unlikely to be dominated by a few highly expressed genes. We also considered another version of GAG score where each gene is weighted by its expression range. We found the two versions produce highly correlated results. Please see Figure 3—figure supplement 2A B for more details.

For estimating the GAG score effects, we model the GAG score of a cell i as being linearly dependent of the chronological age, sex, and tissue-cell type of the cell, namely,

(4) GAGscoreiagei+sexi+jtissue-cell typeij

Here, agei is the age of the animal (in months) that cell i comes from, sexi is one if cell i comes from a male and zero otherwise, and tissue-cell typeij is one if cell i belongs to tissue-cell type j and zero otherwise. We do not include the intercept term because all binary coded tissue-cell types sum up to one. Finally, we further center both response and explanatory variables and perform an ordinary least square regression to estimate the GAG score effects for the age, sex, and each tissue-cell type.

Meta-analyses in Figure 3B and C and Figure 3—figure supplement 1E F were performed assuming a random effect model (Riley et al., 2011). For comparisons of the tissue-cell GAG score effects between data sets, namely those in Figure 3D and Figure 3—figure supplement 1G, we used all tissue-cell types instead of restricting to the 76 TMS FACS tissue-cell types and 26 TMS droplet tissue-cell types. This increased the number of overlapping tissue-cell types between data sets.

Category-specific aging genes

Request a detailed protocol

We considered identifying functional-category-specific genes, cell type-specific genes, tissue-specific genes, and tissue-cell type-specific genes. For a set of tissue-cell types, for example the set of all immune tissue-cell types, the genes specific to the set (or set-specific genes) are selected as follows. For each gene, we first estimate its within-set meta age coefficient by meta-analyzing the age coefficients of all tissue-cell types in the set assuming a random effect model (Riley et al., 2011). Specifically, it is done by assuming that there is a meta age coefficient for the set of tissue-types, and the age coefficient for each tissue-cell type in the set is a random variable whose mean is equal to the meta age coefficient of the set. Similarly, we estimate the outside-set meta age coefficient by meta-analyzing all tissue-cell types outside the set. Then, we define the set-specific genes to be genes whose:

  1. within-set meta age coefficient is significantly different from its outside-set meta age coefficient (FDR < 0.01);

  2. within-set meta age coefficient is large enough (absolute value > 0.005);

  3. outside-set meta age coefficient is not significantly different from 0 (FDR > 0.01).

The p-values are computed based on the mean and the standard error assuming a normal distribution, and FDR is computed with respect to all genes.

Code availability

Request a detailed protocol

The code for reproducing all results is at https://github.com/czbiohub/tabula-muris-senis/tree/master/2_aging_signature (Zhang and Pisco, 2021; copy archived at swh:1:rev:0fd2ee501f4f3bd0e691b6071aee2c9286f1cf92).

Data availability

All data can be downloaded at https://figshare.com/articles/dataset/tms_gene_data_rv1/12827615.

The following previously published data sets were used

References

    1. Koster MI
    (2009) Making an epidermis
    Annals of the New York Academy of Sciences 1170:7–10.
    https://doi.org/10.1111/j.1749-6632.2009.04363.x

Decision letter

  1. Jing-Dong Jackie Han
    Reviewing Editor; Chinese Academy of Sciences, China
  2. Jessica K Tyler
    Senior Editor; Weill Cornell Medicine, United States
  3. Lei Hou
    Reviewer; Massachusetts Institute of Technology, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This work presents a highly refined and validated global aging (GAG) score, as well as establishing category-specific murine aging genes, which together provide a comprehensive angle to understand aging at different scales. The GAG score enables capture of tissue-cell-type specific effects, setting more focus on the tissue- and cell type-specific aging status, and enables assessment of the general aging status.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Mouse Aging Cell Atlas Analysis Reveals Global and Cell Type Specific Aging Signatures" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Lei Hou (Reviewer #1).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

Reviewer #1:

The proposed studies aims to in-depth analyses of aging transcriptomic signatures in mouse both at tissue-cell type and global level from single-cell data. It is an important perspective to understand aging not only for specific tissue-cell type but also the differences among them. One of the main contributions is a novel aging score metric, based on which aging process is compared across tissue-cell type pairs. However, I have several concerns about this aging score and would prefer a revision from the authors.

1. I am confused about the method you used to calculate the aging score.

In main text, the aging score is calculated for each cell, then averaged at tissue-cell type level, and regress out age and sex; while in method for each cell, after adjusting the background, age and sex are regressed out from the raw aging score, then the scores are averaged at tissue-cell type level.

2. For either case, the name of the aging score is misleading, as the residuals after regressing out age and sex, it is not a predictor of aging. Instead, it represents how a specific tissue-cell type is off from the average aging effect across all conditions.

3. If it is actually what authors would love to show, the random effect in each tissue-cell type should be taken into consideration: for cells from different tissue-cell type, their intercepts of the regression model exp~ age+sex+ age*sex could be different, and thus a linear mixed model should be more appropriate in this case. It may explain why in supplementary Figure 9, aging scores from immune cells compared to other cell types of this study show even weaker correlation with that based on blood samples from Peters' work (pvalue 5.4e-4, 0.08, compared to 1.8 e-5).

4. For comparing aging effects purpose, I don't understand why aging scores of cells from the young time point should be included in the final aging score for each tissue-cell type. Firstly, it doesn't represent the aging effect if you also include scores from young cells; Secondly, the scores for each tissue-cell type may be confounded by different percentages of young cells.

Reviewer #2:

In the manuscript titled "Mouse Aging Cell Atlas Analysis Reveals Global and Cell Type Specific Aging Signatures", Zhang and et al. systematically explored the aging-related genes in 76 tissue-cell types from 23 tissues in 10 male and 6 female mice from the Tabula Muris Senis single-cell transcriptomic dataset. The authors used a linear regression model to identify a set of up-/down-regulated aging-related genes, found a general down-regulation of gene expression in most tissue-cell types and revealed sets of aging regulated genes that shared by different tissue-cell types or that specifically enriched in some tissue-cell types. The authors further leveraged the average expression difference in the up- and down-regulated global aging genes, and proposed an aging score defined as a correlated factor that determines the change in different tissue-cell types without effects of age and sex. The manuscript is well organized, the results presented are impressing and the conclusions are pretty attractive. However, the data analyses need substantial revisions to draw any convincing conclusions, specifically, I have a list of concerns as below:

1. General concerns:

a. The sample size may be insufficient (10 males, 6 females) to build linear regression model to determine reliable correlations between certain gene expression with the two factors of age and sex in single-cell level.

b. Since the TMS datasets are curated from a collaboration program with different labs, confounding batch effects should be explicitly evaluated and resolved before any further analysis.

c. Though many platforms and technologies may have built-in strategies to deal with uncertainty of RNA copy number, normalization of single cell transcriptome are still highly recommended in the data processing to control technical inconsistency and calculation of DGE. The MAST package is aware of the problem and use the CDR calculation as alternatives of normalization, which, however, was omitted by the authors when using the package.

d. Beside the validation of the aging score, parallel analyses of droplets datasets should be performed in the same tissue-cell types to validate their findings.

2. Identification of aging-related genes:

a. In the linear model of DGE with age and sex, it is unclear what the observations were in the model, whether the statistical test performed in individual cells across different tissue-cell types; how to deal with missing data; and how to control the effect of cell numbers if the tests are performed in tissue-cell types level.

b. FDR calculation is highly depend on simultaneous consideration of comparisons in multiple tests. It is also unclear what the exact number of comparisons considered in each FDR test.

c. Figure 1B shows number comparison between up- and down-regulated genes related to age regardless of sex, however the authors should clarify whether the sex has effect on the result and a new figure of the comparison considering sex is highly recommended.

3. Tissue-cell level global aging markers:

a. The definition of global aging genes is based on an arbitrary cutoff of half of tissue-cell types without considering the data structure, tissue-cell similarities or proportions of cells determined in mice, which may be conceptually confusing if change happened as data accumulating and new tissue-cell types identified.

b. For those up- or down-regulated genes in specific tissue-cell types, a pickup top 20 genes is not convincing and it is necessary to evaluate the significance of the involvement of those genes in shared biological progresses or functions.

4. Aging score based on global aging genes:

a. The aging score is defined based simply on the potential determinants in the average expression changes between up- and down-regulated global aging genes without the effects from age and sex, which is not intuitively straightforward and confusing, given the ambiguous involvement in biological functions or regulations of the list of genes.

b. The authors claim that aging score is correlated with the cell turnover but failed to show any solid evidence.

5. Tissue-cell-specific aging genes:

a. The authors grouped tissue-cell types into 6 categories based on annotated functionality, and then identified over-represented genes in each category as tissue-cell-specific aging genes. I am wondering why not define the tissue-cell-specific aging genes in original tissue-cell types; the arbitrary grouping may be inaccurate and inevitably reduced the resolution of the tissue-cell specificity.

b. Again, the author failed to establish solid connection between these tissue-cell-specific aging genes and the tissue-cell specific functions or aging progress, which makes the result less reliable.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Mouse Aging Cell Atlas Analysis Reveals Global and Cell Type Specific Aging Signatures" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Jessica Tyler as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Lei Hou (Reviewer #1).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

You have made great efforts to refine and validate their global aging (GAG) score, as well as establish category-specific aging genes, which together provide a comprehensive angle to understand aging at different scales. Still, we have some concerns as below.

Essential changes:

1. GAG score aims to capture common aging states across all cell types. Though authors tried to validate it from different datasets, it would be more convincible if they could show how well GAG score is fitted for GAG score ~ chronological age + sex + tissue-cell-type with both scatter plot and proportion of explained. Alternatively, will it fit better for GAG score ~ chronological age + sex + tissue-cell-type + tissue-cell-type * chronological age, where tissue-cell-type could also affect the slope of age.

2. GAG score, due to its nature of capturing global pattern, may lose the power to identify the aging state of those cell types with the specific program associated with aging. It may be the reason for observations mentioned in line 135-138 and 146-148, specifically in Sup.Figure 11, for Brain.Non-Myeloid. neuron, liver.hepatocyte, mamary_glad.bascal cells, cells of 24 months even have a smaller GAG score than cells of 18 month. However, This may be potentially explained by catogery-specific aging genes later identified. Would tissue-cell type with more category-specific aging genes be more likely to show this heterogenous aging state pattern? Further, an accurate aging score for a specific cell type may be better consisting of two parts, global aging score, and specific aging score.

3. Last session of tissue-level section is not interesting to me, since differential genes at cell type are already accurately defined, and there's no reason to go back to bulk differential signal, which may be confounded by other factors such as cell-type proportions.

https://doi.org/10.7554/eLife.62293.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

The proposed studies aims to in-depth analyses of aging transcriptomic signatures in mouse both at tissue-cell type and global level from single-cell data. It is an important perspective to understand aging not only for specific tissue-cell type but also the differences among them. One of the main contributions is a novel aging score metric, based on which aging process is compared across tissue-cell type pairs. However, I have several concerns about this aging score and would prefer a revision from the authors.

1. I am confused about the method you used to calculate the aging score.

In main text, the aging score is calculated for each cell, then averaged at tissue-cell type level, and regress out age and sex; while in method for each cell, after adjusting the background, age and sex are regressed out from the raw aging score, then the scores are averaged at tissue-cell type level.

2. For either case, the name of the aging score is misleading, as the residuals after regressing out age and sex, it is not a predictor of aging. Instead, it represents how a specific tissue-cell type is off from the average aging effect across all conditions.

3. If it is actually what authors would love to show, the random effect in each tissue-cell type should be taken into consideration: for cells from different tissue-cell type, their intercepts of the regression model exp~ age+sex+ age*sex could be different, and thus a linear mixed model should be more appropriate in this case. It may explain why in supplementary Figure 9, aging scores from immune cells compared to other cell types of this study show even weaker correlation with that based on blood samples from Peters' work (pvalue 5.4e-4, 0.08, compared to 1.8 e-5).

Thank you for these suggestions. We agree with the reviewer and have substantially revised our aging score analysis. In the revised workflow, we first compute the aging score for each cell the same way as before, namely by taking the mean expression of the up-regulated global aging genes minus the mean expression of the down-regulated global aging genes. We refer to this quantity as the global aging gene (GAG) score as suggested by comment (2).

Next, as suggested by comment (3), instead of using the aging score residuals to study tissue-cell-specific aging effects, we formally define a fixed effect model to jointly model the effects of the chronological age, sex, and tissue-cell types on the GAG score:

GAG score ~ chronological age + sex + tissue-cell-type

Then we estimate the tissue-cell GAG score effects via regression (please see the GAG score subsection in Methods for more details). The tissue-cell GAG score effects capture how the GAG score of cells from the tissue-cell type deviates from the average GAG score of cells in the same age and sex group. Intuitively, a larger GAG score effect suggests that the corresponding cell type could be molecularly more sensitive to aging compared to other cells in the same animal.

We replicated all of our original findings using this estimated tissue-cell type GAG effects. First, we find that different tissue-cell types have heterogeneous effects on the GAG score, following the same ordering as in our original submission (Original Figure 3A, Revision Figure 3A). Second, stem cells and immune cells have higher effects while parenchymal cells have lower effects (Original Supp. Figure 8, Revision Figure 3B). Third, short-lived cells have higher GAG score effects (Original Figure 3B, Revision Figure 3C); and the estimates are consistent between different datasets (Original Figure 3C, Revision Figure 3D). These results demonstrate that we can reproduce and support our findings with this more principled statistical model of aging scores.

We removed the comparison to Peters et al. work since we do not focus on predicting the chronological age using the GAG score as was done in Peters et al. Instead, we focus on using the GAG score to contrast the tissue-cell specific aging effects. Moreover, Peters et al. primarily focused on blood cell types in their analyses, making it difficult to directly compare with our work which leverages a diverse range of tissue-cell types.

Instead, we included two additional data sets for comparison: Kimmel et al. GR 2019 on mouse aging in lung, kidney, and spleen, and Kowalczyk et al. GR 2015 on mouse aging in subtypes of HSCs. We further validated our GAG score analysis by showing:

1. The global aging genes identified in the Tabula Muris Senis (TMS) FACS data and the TMS droplet data significantly overlap with each other and both sets of genes significantly overlap with the 261 shared aging genes in Kimmel et al. (p=9e-48 between FACS and droplet, p=2e-105 between FACS and Kimmel, p=7e-69 between droplet and Kimmel).

2. Using GAGs identified in our FACS data, we compute the GAG score and further estimate the GAG score effects for all five data sets (TMS FACS data, TMS droplet, the bulk data from Schaum et al., Kimmel et al., Kowalczyk et al.). We found the chronological age effects on GAG score to be significantly positive in all five data sets, confirming that the GAG score captures the aging process.

3. The tissue-cell GAG score effects estimated from the five data sets are highly consistent.

4. Additionally, we showed that the GAG score is robust to different thresholds for selecting the global aging genes.

Please see the “Validating the GAG score on external data” section in the revised paper for more details.

4. For comparing aging effects purpose, I don't understand why aging scores of cells from the young time point should be included in the final aging score for each tissue-cell type. Firstly, it doesn't represent the aging effect if you also include scores from young cells; Secondly, the scores for each tissue-cell type may be confounded by different percentages of young cells.

Thank you for this question. To address the concerns raised by the reviewer, we performed the same analysis, namely the tissue-cell GAG effects estimation via regression, using only the old cells (18m/24m). We found that the result to be highly consistent. Please see Supp. Figure 13H. Here, each dot corresponds to a tissue-cell type, x-axis corresponds to the tissue-cell GAG score effects estimated using all cells, and y-axis corresponds to the GAG score effects estimated using only old cells.

Reviewer #2:

In the manuscript titled "Mouse Aging Cell Atlas Analysis Reveals Global and Cell Type Specific Aging Signatures", Zhang and et al. systematically explored the aging-related genes in 76 tissue-cell types from 23 tissues in 10 male and 6 female mice from the Tabula Muris Senis single-cell transcriptomic dataset. The authors used a linear regression model to identify a set of up-/down-regulated aging-related genes, found a general down-regulation of gene expression in most tissue-cell types and revealed sets of aging regulated genes that shared by different tissue-cell types or that specifically enriched in some tissue-cell types. The authors further leveraged the average expression difference in the up- and down-regulated global aging genes, and proposed an aging score defined as a correlated factor that determines the change in different tissue-cell types without effects of age and sex. The manuscript is well organized, the results presented are impressing and the conclusions are pretty attractive. However, the data analyses need substantial revisions to draw any convincing conclusions, specifically, I have a list of concerns as below:

1. General concerns:

a. The sample size may be insufficient (10 males, 6 females) to build linear regression model to determine reliable correlations between certain gene expression with the two factors of age and sex in single-cell level.

Thank you for the question. Our study is actually one of the largest single-cell RNA-seq aging studies, containing 110k annotated cells from 16 mice in the FACS data and 250k annotated cells from 23 mice in the droplet data. Other aging studies have relatively fewer cells or fewer animals, e.g., 55k cells from 7 mice in [Kimmel et al. Genome Research 2019] and 50k cells from 16 mice in [Ximerakis et al. Nat. Neurosci. 2019]. Moreover, the statistical power of our analyses is substantially enhanced due to the fact that we have a large number of sequenced cells from diverse tissue-cell types in each animal. In our study, we considered each cell as a sample, which enabled us to have sufficient samples for fitting the linear regression model. Because most single-cell RNA-seq studies have a limited number of animals, it is a common and accepted practice to treat each cell as a sample [Finak Genome Biology 2015]. We have added a clarification on this in the revision.

b. Since the TMS datasets are curated from a collaboration program with different labs, confounding batch effects should be explicitly evaluated and resolved before any further analysis.

Thank you for the question. The TMS data was centrally collected and processed at the CZ Biohub, and does not contain significant batch effects as verified in the original TMS data generation paper. Specifically, the original paper showed that cells from the same cell type would cluster together by unbiased whole-transcriptome Louvain clustering (Extended Data Figure 1g-h [The Tabula Muris Consortium Nature 2020], https://www.nature.com/articles/s41586-020-2496-1/figures/5).

c. Though many platforms and technologies may have built-in strategies to deal with uncertainty of RNA copy number, normalization of single cell transcriptome are still highly recommended in the data processing to control technical inconsistency and calculation of DGE. The MAST package is aware of the problem and use the CDR calculation as alternatives of normalization, which, however, was omitted by the authors when using the package.

Thank you for the question. We agree with the reviewer that it is important to control for technical factors. We performed the size factor normalization the same as the original TMS paper (please see the “Data preprocessing” subsection in Methods).

We decided that it is better to not adjust for CDR because our analysis shows that CDR is positively correlated with age and negatively correlated with technical covariates such as sequencing depth and the number of detected ERCC spike-ins, both when considering all cells or considering each sex separately (Supp. Figure 1). As a result, controlling for CDR may remove genuine aging effects. This is why we presented the main results without CDR adjustment.

We also performed parallel analyses after adjusting for CDR and found that the age coefficients estimated with and without CDR correction were highly correlated (0.89 for the TMS FACS data and 0.93 for the TMS Droplet data, Supp. Figure 2). This demonstrates that our results are robust to CDR correction.

d. Beside the validation of the aging score, parallel analyses of droplets datasets should be performed in the same tissue-cell types to validate their findings.

Thank you for this helpful suggestion. We have added parallel experiments for the droplet data as validations, including DGE analysis (Supp. Figures 1,2,4), aging trajectory analysis (Supp. Figure 6), pathway enrichment analysis (Supp. Figures 10A-B), all GAG score (global aging gene score) analysis (the “Validating the GAG score on external data” section in the main paper), and tissue-level validations (Supp. Figures 17-19). We found the results are consistent and added comments in the corresponding places in the paper.

2. Identification of aging-related genes:

a. In the linear model of DGE with age and sex, it is unclear what the observations were in the model, whether the statistical test performed in individual cells across different tissue-cell types; how to deal with missing data; and how to control the effect of cell numbers if the tests are performed in tissue-cell types level.

Thanks for the questions; we apologize that this was not clear in the original submission. We performed a DGE analysis for cells in each tissue-cell type separately. The samples correspond to all cells in the tissue-cell type and the observations correspond to the expressions of the gene across the cells. The zero counts in scRNA-seq were handled by the MAST package and we do not observe other types of missing data. Since the tests were performed at a cell level for each tissue-cell type, the cell numbers will only affect the detection power for each tissue-cell type.

Therefore, it is not a confounding factor and does not need to be controlled. We have clarified these in the revision.

b. FDR calculation is highly depend on simultaneous consideration of comparisons in multiple tests. It is also unclear what the exact number of comparisons considered in each FDR test.

The number of comparisons correspond to the number of genes in the tissue-cell types. We have clarified this in the revision.

c. Figure 1B shows number comparison between up- and down-regulated genes related to age regardless of sex, however the authors should clarify whether the sex has effect on the result and a new figure of the comparison considering sex is highly recommended.

Thanks for the good suggestion. We also performed the same DGE analysis using only cells from one sex or using cells from a subset of age groups (3m/18m). Please see Supp. Figures 3-4 for the new results.

Specifically, Figure 1B showed that most tissue-cell types have more down-regulated aging-related genes than up-regulated aging-related genes. We found that such a pattern was mostly driven by 24m mice and was not specific to one sex. We also observed a similar pattern on the droplet data.

3. Tissue-cell level global aging markers:

a. the definition of global aging genes is based on an arbitrary cutoff of half of tissue-cell types without considering the data structure, tissue-cell similarities or proportions of cells determined in mice, which may be conceptually confusing if change happened as data accumulating and new tissue-cell types identified.

We have improved the analyses to better account for the data structures that you mentioned. We instead use the weighted tissue-cell types to determine the global aging genes, where the tissue-cell weights are inversely proportional to the number of cell types in the tissue to ensure equal representation of different tissues. Please see the “Global aging genes” subsection in Methods for more details.

We also performed a set of perturbation analysis to show that our results are not sensitive to variations of the global aging gene selection rules. Specifically, we have showed that:

1. Global aging genes selected using different significance thresholds give similar GO enrichment analysis results (Supp. Figure 10C).

2. Global aging genes selected using different significance thresholds and directionality thresholds give similar aging scores (i.e., the GAG scores, Supp. Figures 13A-D).

b. For those up- or down-regulated genes in specific tissue-cell types, a pickup top 20 genes is not convincing and it is necessary to evaluate the significance of the involvement of those genes in shared biological progresses or functions.

In the initial submission, we performed a canonical pathway enrichment analysis for global aging genes using the IPA software (previous Figure 2D, revision Supp. Figure 9B-C). In the revision, we also added a GO biological process pathway enrichment analysis using g:Profiler (Figure 2D) and performed KEGG pathway analysis using GSEA (GSEA MGSig Database, Supp. Figure 9A). The GO analysis showed that the global aging genes are significantly associated with apoptosis, translation, biosynthesis, metabolism, and cellular organization (each corresponding to multiple pathways with FDR<1e-4). The result is supported by the canonical pathway analysis and the KEGG pathway analysis. These biological processes are highly relevant to the aging process and are shared across most cell types, consistent with the intuition that GAGs represent the global aging process across tissue-cell types. In addition, the IPA canonical pathway analysis also highlighted the mTOR pathway, which is known to be highly relevant to aging. Based on these analyses, we have improved the discussion of the shared biological processes and function of aging related genes in the “Bimodal effects of aging and global aging genes” section of the main paper.

4. Aging score based on global aging genes:

a. The aging score is defined based simply on the potential determinants in the average expression changes between up- and down-regulated global aging genes without the effects from age and sex, which is not intuitively straightforward and confusing, given the ambiguous involvement in biological functions or regulations of the list of genes.

Thank you for the question. As described in the response to the previous comment (3b), we have shown that the global aging genes are associated with aging-related biological processes through three sets of gene set enrichment analysis.

Moreover, to make the analysis more conceptually clear, we have defined a fixed effect model to jointly model the effects of the chronological age, sex, and tissue-cell types on the global aging gene (GAG) score:

GAG score ~ chronological age + sex + tissue-cell-type

Then we estimate the tissue-cell GAG score effects via regression (please see the GAG score subsection in Methods for more details). The tissue-cell GAG score effects capture how the GAG score of cells from the tissue-cell type deviates from the average GAG score of cells in the same age and sex group. Intuitively, a larger GAG score effect suggests that the corresponding cell type could be molecularly more sensitive to aging compared to other cells in the same animal.

We replicated all of our original findings using this estimated tissue-cell type GAG effects. First, we find that different tissue-cell types have heterogeneous effects on the GAG score, following the same ordering as in our original submission (Original Figure 3A, Revision Figure 3A). Second, stem cells and immune cells have higher effects while parenchymal cells have lower effects (Original Supp. Figure 8, Revision Figure 3B). Third, short-lived cells have higher GAG score effects (Original Figure 3B, Revision Figure 3C); and the estimates are consistent between different datasets (Original Figure 3C, Revision Figure 3D). These results demonstrate that we can reproduce and support our findings with this more principled statistical model of aging scores.

As validations, we added two additional data sets for comparison: Kimmel et al. GR 2019 on mouse aging in lung, kidney, and spleen, Kowalczyk et al. GR 2015 on mouse aging in subtypes of HSCs. We validate our GAG score analysis by showing:

1. The global aging genes identified in the Tabula Muris Senis (TMS) FACS data and the TMS droplet data significantly overlap with each other and both sets of genes significantly overlap with the 261 shared aging genes in Kimmel et al. (p=9e-48 between FACS and droplet, p=2e-105 between FACS and Kimmel, p=7e-69 between droplet and Kimmel).

2. Using GAGs identified in our FACS data, we compute the GAG score and further estimate the GAG score effects for all five data sets (TMS FACS data, TMS droplet, the bulk data from Schaum et al., Kimmel et al., Kowalczyk et al.). We found the chronological age effects on GAG score are significantly positive in all five data sets, confirming that the GAG score tags the aging process.

3. The tissue-cell GAG score effects estimated from the five data sets are highly consistent.

4. Additionally, we showed that the GAG score is robust to different thresholds for selecting the global aging genes.

Please see the “Validating the GAG score on external data” section in the revised paper for more details.

b. The authors claim that aging score is correlated with the cell turnover but failed to show any solid evidence.

Instead of correlating the tissue-cell GAG score effects (corresponding to the tissue-cell aging score residual in the previous submission) and the cell turnover rate as done in the initial submission, we assigned a binary cell lifespan label to each tissue-cell type and compared the tissue-cell GAG score effects between the long-lived cells and short-lived cells. This procedure allows us to incorporate more cell types where the exact lifespan information is not available but are commonly considered as long-/short- lived cells. We found short-lived cells have significantly higher GAG score effects as compared to long-lived cells (p=1.68e-3, Figure 3C)

5. Tissue-cell-specific aging genes:

a. The authors grouped tissue-cell types into 6 categories based on annotated functionality, and then identified over-represented genes in each category as tissue-cell-specific aging genes. I am wondering why not define the tissue-cell-specific aging genes in original tissue-cell types; the arbitrary grouping may be inaccurate and inevitably reduced the resolution of the tissue-cell specificity.

Thanks for this helpful comment. As suggested by the reviewer, we further investigated aging-related genes that are specific to (1) a functional category, (2) a cell type, (3) a tissue, (4) a tissue-cell type. In order to do this, we designed a new method to identify category-specific genes based on meta-analysis (please see the “Category-specific aging genes” subsection in Methods for details).

1. For functional categories, genes specific to endothelial cells, immune cells, stem cells and stromal cells show strong specificity to the category they belong to and are enriched in highly relevant biological pathways. Genes specific to epithelial cells and parenchymal cells show a less specific pattern.

2. For cell types, we found genes specific to B cells, basal cells of epidermis, endothelial cells, macrophages, mesenchymal stem cells, mesenchymal stem cells of adipose, myeloid cells, and skeletal muscle satellite cells. We found that they are associated with relevant biological processes.

3. For tissues, we found genes specific to each tissue, but their functionalities are less interpretable.

4. We did not find any gene that has an aging signature specific to just a single tissue-cell type.

Please in the “Category-specific aging genes” section, Figure 4 and Supp. Figures 14-16 for more details.

b. Again, the author failed to establish solid connection between these tissue-cell-specific aging genes and the tissue-cell specific functions or aging progress, which makes the result less reliable.

Similar to the validation for global aging genes, we also performed GO biological pathway enrichment analysis for category specific genes. We found that functional-category specific genes and cell-type specific genes are associated with highly relevant biological processes. Please see Figure 4C and Supp. Figure 16B for more details.

[Editors’ note: what follows is the authors’ response to the second round of review.]

Essential revisions:

1. GAG score aims to capture common aging states across all cell types. Though authors tried to validate it from different datasets, it would be more convincible if they could show how well GAG score is fitted for GAG score ~ chronological age + sex + tissue-cell-type with both scatter plot and proportion of explained. Alternatively, will it fit better for GAG score ~ chronological age + sex + tissue-cell-type + tissue-cell-type * chronological age, where tissue-cell-type could also affect the slope of age.

Thank you for this comment. As suggested by the reviewer, we added Figure 3—figure supplement 4C to report the proportion of variance explained by the original model and the alternative model with the interaction terms between age and each tissue-cell type. As shown in this new figure, the original model explains 60.2% of the GAG score variance in the TMS FACS data, indicating a reasonable fit. Adding the tissue-cell-type * chronological age interaction term in the model only slightly increased the model fit (explained variance 62.6%). In addition, we find the amount of improvement by the alternative model is similar across datasets. We added this result in the main paper.

It was difficult to visualize the multivariate model as scatter plots—for example the scatter plot of one of the covariates with the GAG score doesn’t capture the joint regression. Therefore, we did not include it.

2. GAG score, due to its nature of capturing global pattern, may lose the power to identify the aging state of those cell types with the specific program associated with aging. It may be the reason for observations mentioned in line 135-138 and 146-148, specifically in Sup.Figure 11, for Brain.Non-Myeloid. neuron, liver.hepatocyte, mamary_glad.bascal cells, cells of 24 months even have a smaller GAG score than cells of 18 month. However, This may be potentially explained by catogery-specific aging genes later identified. Would tissue-cell type with more category-specific aging genes be more likely to show this heterogenous aging state pattern? Further, an accurate aging score for a specific cell type may be better consisting of two parts, global aging score, and specific aging score.

Thank you for the comment. As you suggested, we investigated whether tissue-cell types with more category-specific aging genes are more likely to show heterogenous aging patterns, and the data does not show a consistent association. For example, parenchymal cells, including neurons and hepatocytes as mentioned in your comment, in general have a more heterogeneous aging process pattern. However, as shown in Figure 4B, there are relatively fewer genes specific to parenchymal cells as compared to other categories.

As a side point, the mamary_glad.basal cells as mentioned in your comment only have cells from two time points (3m and 18m), and do not exhibit a heterogeneous aging pattern, namely 24m cells having smaller GAG scores than 18m cells. We removed the 24m label in the corresponding figure legend (Figure 3—figure supplement 1B, mamary_glad.basal cell panel) to avoid confusion.

More generally, while the current data allows us to construct a global aging score (GAG score), it is more difficult to construct cell-type-specific aging scores, which requires cataloging cell types at a higher granularity. For example, an immune-specific aging score would require a comprehensive catalog of different types and states of immune cells in different tissues and organs. Overall, we identified much more global aging genes (330) compared to tissue-cell type specific aging genes (14 to 76). Comprehensive characterization of tissue-cell type specific aging scores may be out of the scope of our current data, which is why we focus on the global aging score. We agree with you that this is an exciting direction of future work and we added a discussion on this in the Discussion section.

3. Last session of tissue-level section is not interesting to me, since differential genes at cell type are already accurately defined, and there's no reason to go back to bulk differential signal, which may be confounded by other factors such as cell-type proportions.

Thanks for the comment. As suggested by the reviewer, we have moved the tissue-level validation section to the supplementary material and added the following remark in the main paper.

https://doi.org/10.7554/eLife.62293.sa2

Article and author information

Author details

  1. Martin Jinye Zhang

    1. Department of Electrical Engineering, Stanford University, Palo Alto, United States
    2. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States
    3. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    jinyezhang@hsph.harvard.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0006-2466
  2. Angela Oliveira Pisco

    Chan-Zuckerberg Biohub, San Francisco, United States
    Contribution
    Conceptualization, Data curation, Supervision, Investigation, Writing - original draft, Writing - review and editing
    For correspondence
    angela.pisco@czbiohub.org
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0142-2355
  3. Spyros Darmanis

    Chan-Zuckerberg Biohub, San Francisco, United States
    Contribution
    Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  4. James Zou

    1. Department of Electrical Engineering, Stanford University, Palo Alto, United States
    2. Chan-Zuckerberg Biohub, San Francisco, United States
    3. Department of Biomedical Data Science, Stanford University, Palo Alto, United States
    Contribution
    Supervision, Funding acquisition, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    jamesyzou@gmail.com
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8880-4764

Funding

Chan-Zuckberg Biohu

  • James Zou

National Science Foundation (CCF 1763191)

  • James Zou

National Institutes of Health (R21 MD012867-01)

  • James Zou

National Institutes of Health (P30AG059307)

  • James Zou

Silicon Valley Community Foundation

  • James Zou

National Institutes of Health (R01 MH115676)

  • Martin Jinye Zhang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to thank S Quake, R Sinha, R Sit, J Cool, B van de Geijn, H Shi, and X Xu for feedback. MJZ and JZ are supported by NSF CCF 1763191, NIH R21 MD012867-01, NIH P30AG059307, and grants from the Silicon Valley Foundation and the Chan-Zuckerberg Initiative. MJZ is also supported by NIH R01 MH115676.

Senior Editor

  1. Jessica K Tyler, Weill Cornell Medicine, United States

Reviewing Editor

  1. Jing-Dong Jackie Han, Chinese Academy of Sciences, China

Reviewer

  1. Lei Hou, Massachusetts Institute of Technology, United States

Publication history

  1. Received: August 20, 2020
  2. Accepted: March 29, 2021
  3. Accepted Manuscript published: April 13, 2021 (version 1)
  4. Version of Record published: April 14, 2021 (version 2)

Copyright

© 2021, Zhang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 3,770
    Page views
  • 514
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Chromosomes and Gene Expression
    2. Computational and Systems Biology
    Kazuhiro Maeshima, Shiori Iida
    Insight

    DNA loops can be formed by a mechanism in which the cohesin complex pulls DNA strands through its ring structure using biased Brownian motion.

    1. Chromosomes and Gene Expression
    2. Computational and Systems Biology
    Torahiko L Higashi et al.
    Research Article

    The cohesin complex topologically encircles DNA to promote sister chromatid cohesion. Alternatively, cohesin extrudes DNA loops, thought to reflect chromatin domain formation. Here, we propose a structure-based model explaining both activities. ATP and DNA binding promote cohesin conformational changes that guide DNA through a kleisin N-gate into a DNA gripping state. Two HEAT-repeat DNA binding modules, associated with cohesin’s heads and hinge, are now juxtaposed. Gripping state disassembly, following ATP hydrolysis, triggers unidirectional hinge module movement, which completes topological DNA entry by directing DNA through the ATPase head gate. If head gate passage fails, hinge module motion creates a Brownian ratchet that, instead, drives loop extrusion. Molecular-mechanical simulations of gripping state formation and resolution cycles recapitulate experimentally observed DNA loop extrusion characteristics. Our model extends to asymmetric and symmetric loop extrusion, as well as z-loop formation. Loop extrusion by biased Brownian motion has important implications for chromosomal cohesin function.