Macroscopic Analyses of RNA-Seq Data to Reveal Chromatin Modifications in Aging and Disease

Achal Mahajan; Francesca Ratti; Ban Wang; Hana El-Samad; James H Kaufman; Vishrawas Gopalakrishnan

doi:10.7554/eLife.107396.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Reviewing Editor
Yamini Dalal
National Cancer Institute, Bethesda, United States of America
Senior Editor
Yamini Dalal
National Cancer Institute, Bethesda, United States of America

Reviewer #1 (Public review):

Summary:

In this manuscript, Mahajan et al. introduce two innovative macroscopic measures, intrachromosomal gene correlation length (𝓁∗) and transition energy barrier, to investigate chromatin structural dynamics associated with aging and age-related syndromes such as Hutchinson-Gilford Progeria Syndrome (HGPS) and Werner Syndrome (WRN). The authors propose a compelling systems-level approach that complements traditional biomarker-driven analyses, offering a more holistic and quantitative framework to assess genome-wide dysregulation. The concept of 𝓁∗ as a spatial correlation metric to capture chromatin disorganization is novel and well-motivated. The use of autocorrelation on distance-binned gene expression adds depth to the interpretation of chromatin state shifts. The energy landscape framework for gene state transitions is an elegant abstraction, with the notion of "irreversibility" providing a thermodynamic interpretation of transcriptional dysregulation. The application to multiple datasets (Fleischer, LINE-1) and pathological states adds robustness to the analysis. The consistency of chromosome 6 (and to some extent chromosomes 16 and X) emerging as hotspots aligns well with known histone cluster localization and disease-relevant pathways. The manuscript does an excellent job of integrating transcriptomic trends with known epigenetic hallmarks of aging, and the proposed metrics can be used in place of traditional techniques like PCA in capturing structural transcriptome features. However, a direct correlation of the present analysis with ATAC-seq/Hi-C data would be more informative.

Strengths:

Novel inclusion of statistical metrics that can help in systems-level studies in aging and chromatin biology.

Weaknesses:

(1) In the manuscript, the authors mention "While it may be intuitive to assume that highly expressed genes originate from euchromatin, this cannot be conclusively stated as a complete representation of euchromatin genes, nor can LAT be definitively linked to heterochromatin". What percentage of LAT can be linked to heterochromatin? What is the distribution of LAT and HAT in the euchromatin?

(2) In Figure 2, the authors observe "that the signal from the HAT class is the stronger between two and the signal from the LAT class, being mostly uniform, can be constituted as background noise." Is this biologically relevant? Are low-abundance transcripts constitutively expressed? The authors should discuss this in the Results section.

(3) The authors make a very interesting observation from Figure 3: that ASO-treated LINE-1 appears to be more effective in restoring HGPS cell lines closer to wild-type compared to WRN. This can be explained by the difference in the basal activity of L1 elements in the HGPS vs WRN cell types. The authors should comment on this.

(4) The authors report that "from the results on Fleischer dataset is the magnitude of the difference in similarity distance is more pronounced in 𝓁∗ than in gene expression." Does this mean that the alterations in gene distance and chromatin organization do not result in gene expression change during aging?

(5) "In Fleischer dataset, as evident in Figure 4a, although changes in the heterochromatin are not identical for all chromosomes shown by the different degrees of variation of 𝓁∗ in each age group." The authors should present a comprehensive map of each chromosome change in gene distance to better explain the above statement.

(6) While trends in 𝓁∗ are discussed at both global and chromosome-specific levels, stronger statistical testing (e.g., permutation tests, bootstrapping) would lend greater confidence, especially when differences between age groups or treatment states are modest.

(7) While the transition energy barrier is an insightful conceptual addition, further clarification on the mathematical formulation and its physical assumptions (e.g., energy normalization, symmetry conditions) would improve interpretability. Also, in between Figures 7 and 8, the authors first compare the energy barrier of Chromosome 1 and then for all other chromosomes. What is the rationale for only analyzing chromosome 1? How many HAT or LAT are present there?

https://doi.org/10.7554/eLife.107396.1.sa2

Reviewer #2 (Public review):

The authors report that intra-chromosomal gene correlation length (spatial correlations in gene expressions along the chromosome) serves as a proxy of chromatin structure and hence gene expression. They further explore changes in these metrics with aging. These are interesting and important findings. However, there are fundamental problems at this time.

(1) The basic method lacks validation. There is no validation of the method by approaches that directly measure chromatin structure, for example ATAC-seq, ChIP-seq, or CUT&RUN.

(2) There is no validation by interventions that directly probe chromatin structure, such as HDAC inhibitors. The authors employ datasets with knockdown of LINE-1 for validation. However, this is not a specific chromatin intervention.

(3) There is no statistical analysis, e.g., in Figures 4 and 5.

(4) The authors state, "in Figure 4a changes in the heterochromatin are not identical for all chromosomes shown...." I do not see the data for individual chromosomes.

(5) In comparisons of WT vs HGPS NT or HGPS SCR (Figure S6), is this a fair comparison? The WT and HGPS are presumably from different human donors, so they have genetic and epigenetic differences unrelated to HGPS.

https://doi.org/10.7554/eLife.107396.1.sa1

Author response:

Reviewer #1 (Public review):

Summary:

In this manuscript, Mahajan et al. introduce two innovative macroscopic measures, intrachromosomal gene correlation length (𝓁∗) and transition energy barrier, to investigate chromatin structural dynamics associated with aging and age-related syndromes such as Hutchinson-Gilford Progeria Syndrome (HGPS) and Werner Syndrome (WRN). The authors propose a compelling systems-level approach that complements traditional biomarker-driven analyses, offering a more holistic and quantitative framework to assess genome-wide dysregulation. The concept of 𝓁∗ as a spatial correlation metric to capture chromatin disorganization is novel and well-motivated. The use of autocorrelation on distance-binned gene expression adds depth to the interpretation of chromatin state shifts. The energy landscape framework for gene state transitions is an elegant abstraction, with the notion of "irreversibility" providing a thermodynamic interpretation of transcriptional dysregulation. The application to multiple datasets (Fleischer, LINE-1) and pathological states adds robustness to the analysis. The consistency of chromosome 6 (and to some extent chromosomes 16 and X) emerging as hotspots aligns well with known histone cluster localization and disease-relevant pathways. The manuscript does an excellent job of integrating transcriptomic trends with known epigenetic hallmarks of aging, and the proposed metrics can be used in place of traditional techniques like PCA in capturing structural transcriptome features. However, a direct correlation of the present analysis with ATAC-seq/Hi-C data would be more informative.

(1) In the manuscript, the authors mention "While it may be intuitive to assume that highly expressed genes originate from euchromatin, this cannot be conclusively stated as a complete representation of euchromatin genes, nor can LAT be definitively linked to heterochromatin". What percentage of LAT can be linked to heterochromatin? What is the distribution of LAT and HAT in the euchromatin?

Thank you for this insightful question. In the revision we will add chromatin state annotations using ChromHMM to identify overlap between HAT/LAT and corresponding chromatin state. This should provide the specific percentages and distributions you requested.

We would like to take this opportunity to clarify that, based on the plot in Figure S1 and the differential gene expression results, HAT is most likely a subset of euchromatin, whereas LAT may contain both euchromatin and heterochromatin. The HAT/LAT cutoff occurs around the knee point in the log-log plot (Figure S1), where the linear portion indicates scale-invariant behavior with similar relative changes across expression ranks. The non-linear portion represents a departure from power-law scaling, where low-expression genes exhibit a sharper decline than expected. This suggests potential biological mechanisms such as chromatin silencing, or technical factors such as detection limits or artifacts related to sequencing depth.

We will provide detailed chromatin state analysis in the revision. For reference, HAT gene lists per chromosome are available in our GitHub repository at: https://github.com/altoslabs/papers-2025-rnaseq-chrom-aging/tree/main/data/Preprocessed_data under //chromosome_{}/data_hi.
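To make the knee-point criterion described in this response concrete, the following is a minimal sketch, not the authors' implementation. It assumes the knee is the point on the log-log rank-abundance curve that lies farthest above the straight chord joining the first and last ranks (a common heuristic). The function name `knee_rank` and the heuristic itself are illustrative assumptions, and the actual cutoff procedure behind Figure S1 may differ.

```python
import numpy as np

def knee_rank(expression):
    """Return the 1-based rank of the knee in a log-log rank-abundance curve.

    Hypothetical helper: assumes at least two distinct positive values.
    """
    expr = np.sort(np.asarray(expression, dtype=float))[::-1]  # descending
    expr = expr[expr > 0]                      # log is undefined for zeros
    x = np.log10(np.arange(1, expr.size + 1))  # log rank
    y = np.log10(expr)                         # log abundance
    # Rescale both axes to [0, 1] so neither dominates the comparison.
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y[-1]) / (y[0] - y[-1])
    # The chord from the first to the last point is yn = 1 - xn; the knee is
    # taken as the point where the curve lies farthest above that chord.
    gap = yn - (1.0 - xn)
    return int(np.argmax(gap)) + 1
```

Under this heuristic, genes ranked at or above the returned rank would fall into the HAT class and the remainder into LAT.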

(2) In Figure 2, the authors observe "that the signal from the HAT class is the stronger between two and the signal from the LAT class, being mostly uniform, can be constituted as background noise." Is this biologically relevant? Are low-abundance transcripts constitutively expressed? The authors should discuss this in the Results section.

We apologize for the confusion arising from the usage of the term “background noise”. We agree that the distinction between high-abundance transcripts (HATs) and low-abundance transcripts (LATs) deserves more explicit discussion in the Results.

Our intention is to say that HAT has a higher signal-to-noise ratio (SNR) than LAT, as follows from the power-law plot in Figure S1. The HAT class provides a strong, robust signal that is consistent across chromosomes, whereas the LAT class exhibits lower SNR and a more uniform, background-like distribution. This statement applies to the problem we are solving and is not a generic biological claim. The experimental result that led to this statement is presented in Figure S3. This does not imply that low-abundance transcripts lack biological relevance, but rather that they contribute less to the spatial organization patterns we measure.

(3) The authors make a very interesting observation from Figure 3: that ASO-treated LINE-1 appears to be more effective in restoring HGPS cell lines closer to wild-type compared to WRN. This can be explained by the difference in the basal activity of L1 elements in the HGPS vs WRN cell types. The authors should comment on this.

We thank the reviewer for this incisive biological observation. While the differential effectiveness of ASO-treated LINE-1 in HGPS versus WRN cell lines is indeed an interesting phenomenon that may relate to basal L1 activity differences, this biological mechanism falls outside the scope of our current study.

Our paper focuses on demonstrating that the 𝓁∗ metric can sensitively detect chromatin structural changes that have been independently validated. We utilize the Della Valle et al. (2022) dataset specifically because it provides experimentally confirmed chromatin structural differences (progeroid vs wild-type vs ASO-treated progeroid), allowing us to validate that 𝓁∗ correlates with these established changes.

For detailed discussion of the biological mechanisms underlying differential LINE-1 ASO effectiveness between progeroid syndromes, we would direct readers to Della Valle et al. (2022) and related LINE-1 biology literature. Our contribution lies in demonstrating that 𝓁∗ can capture these chromatin organizational changes with enhanced sensitivity compared to traditional expression-based approaches. We are reluctant, without further experimentation, to venture into over-interpreting these results from a biology perspective.

(4) The authors report that "from the results on Fleischer dataset is the magnitude of the difference in similarity distance is more pronounced in 𝓁∗ than in gene expression." Does this mean that the alterations in gene distance and chromatin organization do not result in gene expression change during aging?

Thank you for this important clarification request. This observation, illustrated in Figure 3, highlights two key points: (1) 𝓁∗ shows similar trends to PCA analysis, and (2) 𝓁∗ demonstrates higher sensitivity than traditional gene expression analysis.

This enhanced sensitivity enables better discrimination between aging states, particularly in the Fleischer dataset representing natural aging where changes are more gradual. The higher sensitivity stems from 𝓁∗'s ability to capture transcriptional spatial organization through spatial autocorrelation, which can detect subtle organizational changes that may precede or accompany expression changes rather than replacing them.

We will clarify in the revision that chromatin organizational changes and gene expression changes are complementary rather than mutually exclusive phenomena during aging.
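A correlation length derived from spatial autocorrelation can be sketched as follows. This is an illustrative assumption rather than the manuscript's definition of 𝓁∗: genes are averaged into fixed-width positional bins, the autocorrelation of the binned profile is computed as a function of lag, and the correlation length is taken as the first lag at which it falls below 1/e. The function names, the bin size, and the 1/e threshold are all hypothetical choices.

```python
import numpy as np

def binned_profile(positions, expression, bin_size):
    """Mean expression per positional bin; empty bins are set to 0."""
    positions = np.asarray(positions)
    expression = np.asarray(expression, dtype=float)
    idx = (positions // bin_size).astype(int)
    n_bins = idx.max() + 1
    sums = np.bincount(idx, weights=expression, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)

def correlation_length(profile, threshold=np.exp(-1)):
    """First lag (in bins) where the autocorrelation drops below threshold."""
    centered = np.asarray(profile, dtype=float)
    centered = centered - centered.mean()
    full = np.correlate(centered, centered, mode="full")
    acf = full[full.size // 2:]          # keep lags 0, 1, 2, ...
    if acf[0] == 0:                      # constant profile: no structure
        return 0
    acf = acf / acf[0]                   # normalize so that acf[0] == 1
    below = np.nonzero(acf < threshold)[0]
    return int(below[0]) if below.size else len(acf)
```

In this toy setting, a spatially smooth profile yields a longer correlation length than uncorrelated noise, which is the kind of contrast a spatial metric can register even when per-gene expression changes are small.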

(5) "In Fleischer dataset, as evident in Figure 4a, although changes in the heterochromatin are not identical for all chromosomes shown by the different degrees of variation of 𝓁∗ in each age group." The authors should present a comprehensive map of each chromosome change in gene distance to better explain the above statement.

Thank you for the feedback. If we understand your comment correctly, we need to provide a chromosome-wise distribution for Fig. 3c. We will update the paper and the supplementary material accordingly.

(6) While trends in 𝓁∗ are discussed at both global and chromosome-specific levels, stronger statistical testing (e.g., permutation tests, bootstrapping) would lend greater confidence, especially when differences between age groups or treatment states are modest.

Thank you for the helpful suggestion. In the revision, we will incorporate permutation-based significance testing by shuffling the gene annotation and count table to generate a null distribution for our 𝓁∗ calculation. This will allow us to more rigorously assess whether the observed differences across age groups or treatment states deviate from chance expectations and thereby lend greater statistical confidence to our findings.
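The proposed permutation test could look roughly like the sketch below. It is an illustration under stated assumptions: the lag-1 autocorrelation stands in for 𝓁∗ (whose computation is not reproduced here), and the null distribution is built by shuffling the order of per-gene values along the chromosome, which is one possible reading of shuffling the gene annotation. Function names and default values are hypothetical.

```python
import numpy as np

def lag1_autocorrelation(profile):
    """Stand-in spatial statistic: correlation between neighboring positions."""
    centered = np.asarray(profile, dtype=float)
    centered = centered - centered.mean()
    return float(np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered))

def permutation_pvalue(profile, statistic, n_perm=1000, seed=0):
    """One-sided p-value for the observed statistic against shuffled order.

    Counts how often a random reordering of the values gives a statistic
    at least as large as the one observed in the original order.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(profile)
    null = np.array([statistic(rng.permutation(profile)) for _ in range(n_perm)])
    # The add-one correction avoids reporting an impossible p-value of 0.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, float(p_value)
```

The smallest attainable p-value is 1 / (1 + n_perm), so the number of permutations should be chosen with the intended significance threshold in mind.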

(7) While the transition energy barrier is an insightful conceptual addition, further clarification on the mathematical formulation and its physical assumptions (e.g., energy normalization, symmetry conditions) would improve interpretability. Also, in between Figures 7 and 8, the authors first compare the energy barrier of Chromosome 1 and then for all other chromosomes.

What is the rationale for only analyzing chromosome 1? How many HAT or LAT are present there?

Regarding the focus on chromosome 1: we initially presented chromosome 1 as a representative example, but we will include energy landscape analysis for all chromosomes in the supplementary materials.

For the energy landscape, we use the same HATs that were extracted for the 𝓁∗ calculation. The HAT details are available in the GitHub repository linked in our response to comment (1).

The normalization of the energy barrier ensures comparability across chromosomes of different sizes and across samples with different absolute expression scales. Specifically, we normalize with respect to the total area under the two-dimensional energy landscape while using the thermal energy (k_B T) as a scaling factor to place transition energy barriers on the scale of thermal fluctuations. This is formally expressed as in Eq. (1).

The physical consequences of symmetry in the energy landscape are discussed in lines 472-491 of the manuscript, where we also introduce the concept of irreversibility. In brief, the chromatin energy landscape (Figure 8) is constructed by quantifying the energy contributions of genes that are upregulated (lower triangular matrix) and downregulated (upper triangular matrix) between two states. If the integrated energy contributions of upregulated and downregulated genes are equal, the landscape is symmetric, representing a thermodynamically reversible process, for example, nucleosome repositioning between euchromatic and heterochromatic regions without net gain or loss of nucleosomes. However, in cases where epigenetic modifications alter nucleosome density (e.g., disease states that reduce nucleosome numbers), the integrated energies are unequal, reflecting an irreversible energy cost. In this case, restoring chromatin requires additional energy input (e.g., to replace “missing” nucleosomes), which manifests as asymmetry in the landscape.
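The symmetry argument can be illustrated numerically. The sketch below assumes the energy landscape is available as a square matrix and follows the convention stated in this response (lower triangle for upregulated genes, upper triangle for downregulated genes). The normalized asymmetry index it returns is a hypothetical summary introduced here for illustration; it does not appear in the manuscript and is not the normalization of Eq. (1).

```python
import numpy as np

def landscape_asymmetry(energy):
    """Integrated triangular energies and a normalized asymmetry index.

    Returns (up, down, index), where index is 0 for a symmetric landscape
    and approaches +1 or -1 when one triangle dominates.
    """
    energy = np.asarray(energy, dtype=float)
    if energy.ndim != 2 or energy.shape[0] != energy.shape[1]:
        raise ValueError("energy must be a square matrix")
    up = np.tril(energy, k=-1).sum()    # lower triangle, diagonal excluded
    down = np.triu(energy, k=1).sum()   # upper triangle, diagonal excluded
    total = up + down
    index = 0.0 if total == 0 else (up - down) / total
    return float(up), float(down), float(index)
```

An index near zero corresponds to the reversible case described above, while a clearly nonzero value corresponds to unequal integrated energies, that is, the irreversible case.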

Reviewer #2 (Public review):

The authors report that intra-chromosomal gene correlation length (spatial correlations in gene expressions along the chromosome) serves as a proxy of chromatin structure and hence gene expression. They further explore changes in these metrics with aging. These are interesting and important findings. However, there are fundamental problems at this time.

(1) The basic method lacks validation. There is no validation of the method by approaches that directly measure chromatin structure, for example ATAC-seq, ChIP-seq, or CUT&RUN.

We appreciate the reviewer’s point that direct measurements such as ATAC-seq and ChIP-seq remain the gold standard for characterizing chromatin structure. Our method is designed to complement, not replace, these approaches by leveraging RNA-seq data to detect large-scale transcriptional patterns that correlate with chromatin dynamics.

We agree that integrating datasets with paired RNA-seq and chromatin accessibility assays would strengthen the manuscript and plan to include one such dataset in the revision.

Based on this feedback, we will also take the opportunity during revision to clarify and soften certain statements. Specifically, we will reposition ℓ∗ as a sensitive, computational proxy for detecting transcriptional signatures that are suggestive of chromatin structural changes. In other words, ℓ∗ provides an indirect window into chromatin dynamics through transcriptional spatial organization, allowing detection of patterns that may precede or accompany structural changes. Direct assays such as ATAC-seq or ChIP-seq remain essential for confirming the underlying physical modifications. To make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text.

We would like to take this opportunity to clarify why our initial version focused on the Della Valle and Fleischer datasets rather than including new paired datasets with direct chromatin measurements. The primary objective of our paper is to introduce two macroscopic RNA-seq–based measures, ℓ∗ and the energy landscape, that are designed to detect transcriptional signatures suggestive of chromatin structural changes in the context of aging and age-related diseases. These measures explicitly model transcriptional spatial organization and provide a sensitive, scalable way to analyze RNA-seq data in domains where direct chromatin assays may not be readily available.

The datasets we used (Della Valle et al., Fleischer et al.) have been rigorously validated and independently demonstrated differences in chromatin structure between conditions. Our goal was to show that ℓ∗ and the energy landscape align with and extend these established findings, offering a more sensitive measure of transcriptional spatial organization. Specifically, in the Della Valle dataset, chromatin structural differences between progeroid and healthy donors — and their partial rescue by LINE-1 ASO treatment — were experimentally confirmed, providing a strong foundation for testing whether our metrics reflect these known changes. Similarly, the Fleischer dataset captures natural, in vivo aging, which has also been linked to chromatin alterations in prior studies.

Thus, our approach builds on this well-established biological context rather than attempting to re-demonstrate these chromatin differences from scratch. Finally, we emphasize that our current focus is aging and age-related diseases. While the framework could potentially be applied to other chromatin modification contexts, we have not tested it outside this domain and do not claim general applicability at this stage.

(2) There is no validation by interventions that directly probe chromatin structure, such as HDAC inhibitors. The authors employ datasets with knockdown of LINE-1 for validation. However, this is not a specific chromatin intervention.

We refer the reviewer to our response to (1), which includes the rationale behind the selection of the LINE-1 and Fleischer datasets. We would also like to note that, while the focus of Della Valle et al. was LINE-1 ASO treatment to show rescue of progeroid samples, the dataset also contains data for untreated and healthy samples. Importantly, untreated progeroid samples show distinctly different chromatin structure compared to healthy samples, with substantial differences detectable by both PCA and our 𝓁∗ metric.

Our 𝓁∗ method provides additional interpretability by capturing transcriptional spatial organization, resulting in shorter correlation lengths for healthy patients and longer lengths for progeroid patients.

As mentioned in our response to (1), we will try to add an additional dataset with paired RNA-seq and one of ATAC-seq, ChIP-seq, or CUT&RUN in the revision.

(3) There is no statistical analysis, e.g., in Figures 4 and 5.

We have provided statistical analysis for Fig 4 (lines 237-241). We will do a similar analysis for Fig. 5.

(4) The authors state, "in Figure 4a changes in the heterochromatin are not identical for all chromosomes shown...." I do not see the data for individual chromosomes.

The data for individual chromosomes is available in supplementary Fig. S11, referenced at line 425. We will make this cross-reference clearer in the main text and consider whether some of this chromosome-specific information should be elevated to the main figures for better accessibility.

(5) In comparisons of WT vs HGPS NT or HGPS SCR (Figure S6), is this a fair comparison? The WT and HGPS are presumably from different human donors, so they have genetic and epigenetic differences unrelated to HGPS.

Figure S6 demonstrates that 𝓁∗ analysis identifies chromosome 6 as most affected, consistent with differential gene expression patterns.

Regarding donor differences in WT vs HGPS comparisons, we defer to the experimental design of Della Valle et al., which follows standard practices in progeroid research. Our review of the literature indicates that progeroid studies typically use either parent/child samples or different donor comparisons (as individuals cannot simultaneously represent both WT and HGPS states).

Importantly, the LINE-1 ASO treatment comparisons use the same cell lines, eliminating donor variability concerns. This experimental design allows us to validate that 𝓁∗ can detect rescue effects within genetically identical samples, supporting the method's sensitivity to chromatin structural changes.

Reviewing Editor Comments:

You'll note that both reviewers were very thoughtful in their comments, and in principle are supportive and excited by the work. However, their evaluation of the strength of evidence diverged substantially. I'm inclined to suggest that finding a way to support the novel method with an alternative approach would greatly improve the impact of this work. I encourage you to consider a revision that provides such data, in the context of technology currently available to the field.

We sincerely thank the editor for their thoughtful and encouraging assessment of our work. We are grateful for their recognition of the novelty of our macroscopic measures (ℓ∗ and the transition energy barrier) and their potential to provide a systems-level understanding of chromatin structural dynamics in aging and age-related syndromes. In response to the editor’s suggestion for direct validation with chromatin accessibility data, we plan to integrate an additional dataset containing paired RNA-seq and ATAC-seq or related measurements in our revision. This will help strengthen the link between our RNA-seq–based metrics and direct chromatin assays. We will also clarify and soften the manuscript text to ensure it is clear that ℓ∗ serves as a complementary, computational proxy, not a replacement, for direct experimental approaches. Specifically, to make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text. We have provided additional details in response to specific comments made by the reviewers.

https://doi.org/10.7554/eLife.107396.1.sa0
