voyAGEr: free web interface for the analysis of age-related gene expression alterations in human tissues

  1. Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
  2. Sia Partners, 4 Rue Voltaire, 44000 Nantes, France
  3. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Bérénice Benayoun
    University of Southern California, Los Angeles, United States of America
  • Senior Editor
    Carlos Isales
    Augusta University, Augusta, United States of America

Reviewer #1 (Public Review):

This fascinating paper by A.L. Schneider et al. describes voyAGEr, a Shiny-based interface that lets non-programmers and novice programmers easily explore the GTEx dataset. Importantly, voyAGEr is open source and available from GitHub, which could greatly accelerate additional development and further uses of this interesting tool.

The authors developed a pipeline called ShARP-LM for modeling age-related changes in gene expression in the GTEx data, fitting a linear model with age, sex, and age-by-sex interaction terms. This pipeline underlies the later analyses that can be applied within voyAGEr. These analyses are labeled by tissue so that users can easily begin a query based on a tissue or a gene of possible interest.
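
To make the model structure concrete, the following is a minimal sketch of an ordinary least squares fit with age, sex, and age-by-sex terms. It is not the authors' ShARP-LM code: the function names (`solve`, `fit_age_sex_model`), the 0/1 coding of sex, and the toy data are assumptions made for illustration, and the real pipeline may include further covariates and fits the model within age windows.

```python
# Sketch only: ordinary least squares for expression ~ age + sex + age:sex.
# Function and variable names are hypothetical, not taken from voyAGEr.

def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_age_sex_model(expression, age, sex):
    """Return [intercept, age, sex, age:sex] coefficients; sex is coded 0/1."""
    design = [[1.0, a, s, a * s] for a, s in zip(age, sex)]
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, expression)) for i in range(k)]
    return solve(xtx, xty)


# Toy data generated exactly from y = 2 + 0.1*age + 1*sex + 0.05*age*sex
age = [20, 30, 40, 50, 60, 70, 25, 35, 45, 55, 65, 75]
sex = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
expression = [2 + 0.1 * a + 1.0 * s + 0.05 * a * s for a, s in zip(age, sex)]
coef = fit_age_sex_model(expression, age, sex)
```

In this coding, the age coefficient is the slope for the reference sex, and the interaction coefficient is the difference in slope between the sexes.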

voyAGEr implements many kinds of interesting R-based tools such as pathway overrepresentation analysis and gene co-expression module analysis, in a way that makes these approaches accessible to non-bioinformaticist aging researchers.

As the tidal wave of publicly available large, high-dimensional datasets such as transcriptomes continues to grow exponentially, the usefulness of tools such as voyAGEr will only increase. While test users may be able to imagine features or refinements they wish were already present, due to the open source approach they or anyone else including but not limited to the present authors can implement additional features in the future. I look forward to using this tool and to staying abreast of its future development.

Overall, this study describes a new tool of interest to the field. The manuscript is clearly written overall. The figures and supplementary information are all clear and all add to the manuscript.

Reviewer #2 (Public Review):

The purpose of this study is to develop a tool that serves as a starting point for investigating and uncovering genes and pathways associated with aging. The tool utilizes information from the GTEx public database, which contains post-mortem human data. It focuses on identifying age-related gene expression changes across different age ranges, biological sexes, and medical histories, with a focus on specific tissues.

Additionally, the authors envision the platform as continuously evolving, with ongoing development and expansion to include new data and features, ensuring it remains a cutting-edge resource for researchers studying aging.

# Strengths
voyAGEr presents a tool for exploring gene expression changes across multiple tissues in the context of aging. One of the main strengths of the tool is its intuitive and user-friendly interface, which allows biologists to easily navigate and explore gene expression patterns. Users can explore changes in the expression of single genes across multiple tissues, enabling them to identify genes of interest that can be further investigated.

A particularly noteworthy strength of the tool is its ability to show tissue-specific gene expression patterns. This feature is essential for elucidating the paradigm of tissue-specific asynchronous aging and provides a unique and valuable resource for the aging community.

Overall, the tool offers an entry point for further investigation of genes involved in aging, and its ability to show tissue-specific gene expression patterns provides a unique and valuable resource for the scientific community.

Lastly, the tool is accompanied by a clear and thorough tutorial that explains each of its functionalities and provides examples. The authors also acknowledge the limitations of the statistical inference tests used in the tool, which adds to its overall transparency.

# Weaknesses

## Underlying data analysis
In this tool/resource paper, it is crucial that the data used is up-to-date to provide the most comprehensive and relevant information to users. However, the authors utilized GTEx v7, which is an outdated (2016) version of the dataset. It is worth noting that GTEx v8 includes over 940 individuals, representing a 35% increase in individuals, and a 50% increase in the total number of samples. The authors should check the newer versions of GTEx and update the data.

The authors did not address any correction for batch effects or RNA integrity numbers, which are known to affect transcriptome profiles. For instance, our analysis of GTEx v8 Cortex tissue revealed that, after filtering out lowly expressed genes as the authors did, PC1 (which accounts for 24% of the variation) had a Spearman's correlation of 0.48 (p<6.1e-16) with RNA integrity number.
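
A check of this kind can be sketched as follows. This is not the reviewers' actual analysis: the helper names (`ranks`, `pearson`, `spearman`, `pc1_scores`) and the toy matrix are assumptions, PC1 is approximated by power iteration on the sample-by-sample Gram matrix, and no p-value is computed. Because the sign of a principal component is arbitrary, only the absolute correlation is meaningful here.

```python
# Sketch only: Spearman correlation between PC1 sample scores and RIN.
# All names and data are hypothetical; rows are samples, columns are genes.

def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pc1_scores(matrix, iterations=200):
    """PC1 sample scores (up to sign and scale) via power iteration."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Gram matrix of centered samples; its leading eigenvector gives PC1 scores.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Toy data in which every gene depends linearly on RIN
rin = [5.0, 6.0, 7.0, 8.0, 9.0]
toy_expression = [[2 * r, 10 - r, 0.5 * r] for r in rin]
rho = spearman(pc1_scores(toy_expression), rin)
```

On real data, a large absolute `rho` would suggest that RNA integrity drives a major axis of variation and may need to be included as a covariate.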

The data analyzed in the GTEx dataset is not filtered or corrected for the cause of death, which can range from violent and sudden deaths to slow deaths or cases requiring a ventilator. As a result, the data may not accurately represent healthy aging profiles but rather reflect changes in the transcriptome specific to certain diseases due to the age-related increase in disease risk. While the authors do acknowledge this limitation in the discussion, stating that it is not a healthy cohort and disease-specific analysis is not feasible due to the limited number of samples, it would be useful for users to have the option to analyze only cases of fast death, excluding ventilator cases and deaths due to disease. This is typically how GTEx data is utilized in aging studies. Alternatively, the authors should consider including the "cause of death" variable in the model.

The age distribution varies across tissues, which may affect the results of the study. The authors' claim that age distribution does not affect the outcomes is not conclusively supported. Since the study aims to provide cross-tissue analysis, it is important to note that differing age distributions across tissues can influence the overall results. To address this, the authors should downsample the tissues to comparable age distributions and evaluate how many tissue-specific or common changes remain once the distributions are made similar.
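
One possible reading of such a downsampling step is sketched below: ages are binned, and within each bin every tissue is randomly reduced to the smallest sample count observed for that bin across tissues. The function name, the 10-year bin width, the fixed seed, and the toy sample identifiers are all assumptions for illustration, not a procedure described in the manuscript.

```python
# Sketch only: equalise age distributions across tissues by per-bin downsampling.
# Names, bin width, and data are hypothetical.
import random
from collections import defaultdict


def downsample_to_common_age_distribution(samples_by_tissue, bin_width=10, seed=0):
    """samples_by_tissue maps tissue -> list of (sample_id, age) pairs.

    Returns tissue -> list of retained sample ids, with the same number of
    samples per age bin in every tissue.
    """
    rng = random.Random(seed)
    binned = {}
    for tissue, samples in samples_by_tissue.items():
        bins = defaultdict(list)
        for sample_id, age in samples:
            bins[age // bin_width].append(sample_id)
        binned[tissue] = bins
    all_bins = set().union(*(b.keys() for b in binned.values()))
    selected = {tissue: [] for tissue in samples_by_tissue}
    for age_bin in sorted(all_bins):
        # The tissue with the fewest samples in this bin sets the quota.
        quota = min(len(binned[t].get(age_bin, [])) for t in binned)
        for tissue in binned:
            selected[tissue].extend(rng.sample(binned[tissue].get(age_bin, []), quota))
    return selected


toy = {
    "Lung": [("L1", 25), ("L2", 27), ("L3", 52), ("L4", 58), ("L5", 63)],
    "Liver": [("H1", 23), ("H2", 55), ("H3", 61), ("H4", 66), ("H5", 68)],
}
selected = downsample_to_common_age_distribution(toy)
```

Repeating the draw with several seeds and re-running the age models on each subset would indicate how stable the tissue-specific and shared signals are.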

The GTEx resource is extremely valuable; however, it comes with challenges. GTEx contains samples of different tissues taken from the same individuals, and because not all tissues are collected from every individual, the degree of donor overlap varies between tissues. This could affect the similar or different patterns observed across tissues. As this tool is meant for broader use by the community, it is crucial for the authors to either rule out this possibility by conducting a cross-tissue comparison using a non-parametric model that accounts for the dependency between samples from the same individual, or to provide information on the degree of similarity between samples so that users can keep this possibility in mind when using the tool for hypothesis generation.
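
As an illustration of the second option, donor overlap between tissues could be summarised with a simple pairwise index. The sketch below uses the Jaccard index, which is one possible choice and is not specified in the review; the function name and the donor identifiers are hypothetical.

```python
# Sketch only: pairwise donor overlap between tissues as a Jaccard index.
# Names and donor identifiers are hypothetical.
from itertools import combinations


def donor_overlap(donors_by_tissue):
    """Return {(tissue_a, tissue_b): shared donors / donors in either tissue}."""
    overlap = {}
    for a, b in combinations(sorted(donors_by_tissue), 2):
        sa, sb = set(donors_by_tissue[a]), set(donors_by_tissue[b])
        union = sa | sb
        overlap[(a, b)] = len(sa & sb) / len(union) if union else 0.0
    return overlap


toy_donors = {
    "Heart": ["D1", "D2", "D3"],
    "Lung": ["D2", "D3", "D4"],
    "Blood": ["D5"],
}
overlap = donor_overlap(toy_donors)
```

Tissue pairs with high overlap are the ones where shared donors, rather than shared biology, could contribute to similar age-related patterns.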

## Visualisation and analysis platform
The authors aimed to create an open-source and ever-evolving resource that could be adapted and improved with new functionality. However, this goal was only partially achieved. Although the code for the web app is open source, crucial components such as the statistical tests or the linear model are not included in the repository, limiting the tool's customizability and adaptability.

Furthermore, the authors' choice of visualization platform (R Shiny) may not be the best fit for extensibility and open-source collaboration, as it lacks modularity. A more suitable alternative could be production-oriented platforms such as Flask or FastAPI.

To facilitate collaboration and improve the tool's adaptability, data resulting from the pre-processing pipeline should be made publicly available. This would make it easier for others to contribute and extend the tool's functionality, ultimately enhancing its value for the scientific community.

Reviewer #3 (Public Review):

In their manuscript, Schneider et al. aim to develop voyAGEr, a web-based tool that enables the exploration of gene expression changes over age in a tissue- and sex-specific manner. The authors achieved this goal by calculating the significance of gene expression alterations within a sliding window, using their unique algorithm, Shifting Age Range Pipeline for Linear Modelling (ShARP-LM). They also provide tissue-level summaries that assess the significance of the proportion of differentially expressed genes in each window and compute pathway enrichments to show biological relevance. Furthermore, the authors examined the enrichment of cell types, pathways, and diseases in modules of co-expressed genes defined in four selected tissues. voyAGEr was developed as a discovery tool, providing researchers with easy access to the vast amount of transcriptome data from the GTEx project. Overall, the research design is unique and well executed, with interesting results that provide a useful resource for the field of human genetics of aging. I have a few questions and comments, which I hope the authors can address.

1. In the gene-centric analyses section of the Results, linear regression tests covering the entire age range should be added to improve the manuscript and database. The authors' algorithm, ShARP-LM, tests locally within a 16-year window, which gives it lower power than a linear regression over the full age range. I suspect that the power reduction is strongest in the younger age range, since GTEx donors are enriched for older ages. By adding the results from the lm tests, readers would gain more insight and evidence into how significantly their genes of interest change with age.
2. Regarding the ShARP-LM test results, it is not clear which criterion was used to define significant genes for the subsequent enrichment analyses. I assume that the criterion is P < 0.05, but it should be clearly noted. Additionally, the authors should apply adjusted p-values for multiple-test correction. The ideal criterion is an adjusted P < 0.05. However, if none or only a handful of genes were found to be significant, the authors could relax the criterion, for example by using an unadjusted P < 0.01 or 0.05.
3. In the gene-centric analyses section, the authors should provide a full list of donor conditions and a summary table of conditions as supplementary material.
4. The tissue-specific assessment section has uninformative subtitles. Every title should convey what the subsection contains.
5. I have difficulty understanding the meaning of the NES from GSEA in the tissue-specific assessment section. The authors performed GSEA for the DEGs against the background genes ordered by t-statistics (from positive to negative) calculated from the linear model. I understand the p-value was two-tailed, which means that both positive and negative NES are meaningful, as they represent up-regulated expression (positive coefficient) and down-regulated expression (negative coefficient) with age, respectively, within a window. However, in the GSEA section of the Methods, the authors did not fully elaborate on this directionality but stated, "The NES for each pathway was used in subsequent analyses as a metric of its over- or down-representation in the Peak". The authors should clearly elaborate on how to interpret the NES from their results.
6. In the Modules of co-expressed genes section, the authors did not explain how or why they selected the four tissues: brain, skeletal muscle, heart (left ventricle), and whole blood. This should be elaborated on.
7. In the modules of the co-expressed genes section, the authors did not provide an explanation of the "diseases-manual" sub-tab of the "Pathway" tab of the voyAGEr tool. It would be helpful for readers to understand how the candidate disease list was prepared and what the results represent.
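
The multiple-testing adjustment requested in comment 2 could, for example, use the Benjamini-Hochberg procedure. The review does not name a specific method, so this choice, the function name, and the toy p-values below are assumptions made for illustration.

```python
# Sketch only: Benjamini-Hochberg adjusted p-values (one common FDR method).
# The review does not specify which correction to use; names are hypothetical.

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(toy_p)
significant = [a < 0.05 for a in adj]
```

In this toy example all but the last gene pass an unadjusted P < 0.05, whereas only the first remains below 0.05 after adjustment, which is why the chosen criterion should be stated explicitly.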
