Author response:
The following is the authors’ response to the original reviews
Reviewer #1 (Recommendations for the authors):
(1) Storyline and Narrative Flow:
Consider revising the manuscript to create a more coherent and consistent narrative. Clarify how each section of the study-particularly the transition from multi-omics data integration to single-cell RNA-seq validation-contributes to the overall research question. This will help readers better understand the logical flow of the study.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
We have modified some text, including the connections between different sections in the results part and the objectives and roles of various analyses in each section, thus enhancing the coherence between the contexts and clarifying the objectives and functions of each analysis, We believe this will help readers better understand the main content of the entire text.
(2) Immune Cell Activity Analysis:
Reevaluate the methods used to assess immune cell activities within the context of the tumor microenvironment. Consider providing additional justification for the relevance of using the cancer cell model for this analysis. If necessary, explore alternative methods or models that might offer more meaningful insights into immune-tumor interactions.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
Using RNA-Bulk data, we evaluated the tumor immune microenvironment through various methods to assess immune infiltration levels and responses to immunotherapy. We found that the results were largely consistent with those presented in the manuscript, providing strong support for our viewpoints. We also acknowledge the limitations of findings from bioinformatics analysis. In our upcoming research, we plan to develop organoid models with gene expression patterns of both CS1 and CS2 subtypes, using these models as a foundation for studying the tumor immune microenvironment.
(3) Single-Cell RNA-Seq Validation:
Expand the validation of your findings using single-cell RNA-seq data. This could include more in-depth analyses that explore the heterogeneity within the subtypes and confirm the robustness of your classification method at the single-cell level. This would strengthen the support for your claims about the relevance of the identified subtypes.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
In this manuscript, we employed the NTP algorithm to classify malignant cells identified by the CopyKAT algorithm using characteristic genes of CS1 and CS2 subtypes. This approach is similar to previous method that analyzed patients in the ICGC cohort with the same subtype genes. We consider this classification method valid.
After classifying the malignant cells, we performed metabolic and cell communication analyses on the CS1 and CS2 subtype cells, revealing significant differences in biological pathways enriched by differential genes, metabolic levels, and cell signaling patterns. These differences align with variations observed in prior classifications and analyses based on RNA-Bulk data.
We also acknowledge that validating the classification method solely with the single-cell dataset from this study is insufficient. We analyzed GSE202642 using the same processes and methods as GSE229772, finding that the results were generally consistent, indicating that our classification method exhibits a degree of robustness at the single-cell level.
(4) Methodological Justification:
Provide a more detailed rationale for the selection of machine learning algorithms and integration strategies used in the study. Explain why the chosen methods are particularly well-suited for this research, and discuss any potential limitations they might have.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
We have updated the methodology section to enhance readers' understanding of the fundamental principles involved. This analysis has two key features: first, it combines 10 machine learning algorithms to generate 101 models and ultimately selects the prognostic prediction model with the highest C-index from these 101 algorithms; second, it utilizes the LOOCV method to analyze the training and validation sets. Compared to the conventional method of randomly dividing the training and validation sets by a fixed ratio, this approach significantly minimizes the bias and randomness introduced by the splitting process. Therefore, we believe this analysis can leverage the characteristic genes of the CS1 and CS2 subtypes, combined with existing clinical data from public databases, to yield results that are more accurate and reliable than the commonly used prognostic models in previous literature, such as COX regression and Lasso regression, as well as other individual algorithms. While this analysis presents advantages over some previous modeling methods, it is essential to recognize that it remains based on analyses conducted using public databases, which may obscure certain factors that might be clinically relevant to patient prognosis due to the mathematical logic of the algorithms.
(5) Figures and Visualizations:
Improve the clarity of your figures by addressing the following:
a) Figure 3A: Cluster the pathways to make the comparisons clearer and more meaningful.
b) Figure 4A: Clearly explain the significance of the blue bar.
c) Figure 4B: Ensure this figure is discussed in the main text to justify its inclusion.
d) Figure 7C: Enhance the figure legend to provide more informative details.
Additionally, ensure that figure descriptions go beyond the captions and provide detailed explanations that help the reader understand the significance of each figure.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
Figure 3A: We clustered the samples based on CS1 and CS2 subtypes and displayed the immune-related cell scores of each sample as a heatmap.
Figure 4A: The blue bars in the figure represent the average C-index of this algorithm combination in the training dataset TCGA and the validation dataset ICGC, which we have supplemented in the corresponding sections of the text.
Figure 4B: We described this figure in the results section, which primarily aims to validate whether our prognostic prediction model can predict patient outcomes in the TCGA cohort. The results showed that after performing prognostic risk scoring on patients based on the prediction model and categorizing them into high-risk and low-risk groups, the two groups exhibited significant prognostic differences, with the high-risk group showing worse outcomes compared to the low-risk group. This indicates that our prognostic prediction model can effectively distinguish the prognostic risk differences among patients in the TCGA-LIHC cohort. We also discussed these findings in the discussion section.
Figure 7C: We used both point color and size to visualize the levels of metabolic scores, resulting in two dimensions in the legend, which actually represent the same information. Therefore, we removed the results that used point size to indicate the levels of metabolic scores.
(6) Supplementary Materials:
Consider including more detailed supplementary materials that provide additional validation data, extended methodological descriptions, and any other information that would support the robustness of your findings.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
In the subsequent version of the record, we will upload the important results obtained during the research to GitHub, and in this revision, we have updated some figures that may better explain the results or the robustness of the findings as supplementary materials.
(7) Recent Literature:
a) Incorporate more recent studies in your discussion, especially those related to HCC subtypes and the application of machine learning in oncology. This will provide a more current context for your work and help position your findings within the broader field.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
We have reviewed several studies related to HCC subtype classification and the application of machine learning in this field. In the discussion section, we summarize the significance and limitations of these studies. Additionally, we discuss the characteristics of our study in comparison to previous research in this field.
(8) Data and Code Availability:
Ensure that all data, code, and materials used in your study are made available in line with eLife's policies. Provide clear links to repositories where readers can access the data and code used in your analyses.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
We have examined the relevant data, code, and materials. We confirm that we have indicated the sources of the data and tools used in the analysis within the manuscript. Moreover, these data and tools are accessible via the websites or references we have provided.
Reviewer #2 (Recommendations for the authors):
(1) While the computational findings are robust, further experimental validation of the two subtypes, particularly the role of the MIF signaling pathway, would strengthen the biological relevance of the findings. In vitro or in vivo validation could confirm the proposed mechanisms and their influence on patient prognosis.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
We intend to verify our findings in future studies using tumor cell line models and animal models. We aim to identify and intervene with key molecules in the MIF signaling pathway. We will investigate how the MIF signaling pathway affects tumor sensitivity to treatment in both cell line and animal models, along with the underlying mechanisms.
(2) Consider testing the model on additional independent cohorts beyond the TCGA and ICGC datasets to further demonstrate its generalizability and applicability across different patient populations.
We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:
We analyzed the GSE14520 study recorded in the GEO database, which uploaded a cohort consisting of 209 HCC patients and their corresponding RNA sequencing data. We validated the prognostic model obtained in this study using this cohort, and found that the model effectively distinguishes patients into high-risk and low-risk prognostic categories. Furthermore, there is a significant prognostic difference between the high-risk and low-risk patient groups. This is consistent with the results we obtained previously.
(3) Review the manuscript for long or complex sentences, which can be broken down into shorter, more readable parts.
We have made revisions to the long and complex sentences in the manuscript without compromising its academic integrity and rationality, with the hope that this will help readers better understand the content of this study.
During the revision process, in addition to addressing the reviewer comments, we conducted a thorough review of the analysis. In the course of this review, we identified a few errors in the data usage and have since corrected the relevant data and figures:
Figure 4: Due to space constraints, we adjusted the composition of the figures after incorporating the validation results from the GSE14520 dataset.
Figure 5A: We rechecked the regression coefficients included in the model, updated several more recent prognostic models, and calculated the C-index for 20 prognostic models in the TCGA and ICGC cohorts using a method consistent with previous studies.
Figure 5C-D: We adjusted the clarity of the figures.
Figure 8: We reclassified the selected malignant cells and updated the subtypes results. Subsequently, based on the repeatedly confirmed typing results, we comprehensively updated the analysis results of the subsequent cell communication network construction, ensuring that the entire analysis process remains consistent with previous findings. We also adjusted the composition of the figure and presented the images that could not be conveniently merged due to space constraints as Figure 9.