Author response:
The following is the authors’ response to the previous reviews
Public Reviews:
Reviewer #1 (Public review):
Weaknesses:
(1) Research scope
The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted, and why comparisons to previous literature were not included.
We have substantially strengthened the biological and clinical rationale for focusing on these three genes in the Introduction. Specifically, we now justify their selection based on their distinct functional roles: ZNF217 (an oncogene associated with 20q13 amplification), ZNF703 (an oncogenic driver of the luminal subtype), and ZNF750 (a tumor suppressor involved in differentiation). We have also explicitly defined the knowledge gap: the lack of mutation and expression data for these genes in African populations, particularly Kenyan cohorts.
Importantly, we have now incorporated a comparative analysis with TCGA data in the Results. This includes a new section, “Recurrent mutations and comparison with TCGA”; a new table (Table 6); and a curated dataset (Supplementary Table S4).
(2) Language and Style Issues
There are many typos and clear errors in the main text (e.g. (ref)).
Additionally, several statements read unnaturally. For example:
"Investigators uncovered 170 mutations ..." should instead be phrased as "We identified 170 mutations ...."
"The research team ..." should be rephrased as "Our team ...."
The revised manuscript has undergone comprehensive language editing.
(3) Methods and Data Analysis Details
The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:
(a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).
(b) Statistical methods for somatic mutation/SNP detection.
(c) Details of RNA purification and RNA-seq library preparation.
Without these details, the reproducibility of the study is limited.
We have fully revised and substantially expanded the Methods section to improve clarity, transparency, and reproducibility. In the revised manuscript, we now provide explicit details of all key analytical steps. These include quality control using FastQC and MultiQC, as well as the read trimming parameters used in Trimmomatic (removal of leading and trailing bases with quality below 3, a sliding window of 4:15, and a minimum read length of 36 bp). We also describe the alignment of reads to the hg38 reference genome using BWA-MEM, followed by somatic variant calling using MuTect2 in paired tumor–normal mode with a Panel of Normals (PON). The variant filtering criteria are now explicitly stated: minimum read depth (≥10), base quality (≥20), and variant allele fraction (≥0.05). Functional annotation was performed using VEP (v108).
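To make the filtering thresholds above concrete, the following is a minimal Python sketch of how they could be applied to a list of candidate calls. It is an illustration only, not the pipeline code used in the study: the `VariantCall` record, its field names, and the example values are hypothetical, and only the three cutoffs (depth ≥10, base quality ≥20, variant allele fraction ≥0.05) come from the revised Methods.

```python
from dataclasses import dataclass

# Cutoffs quoted in the response; the constant names are illustrative.
MIN_DEPTH = 10
MIN_BASE_QUALITY = 20
MIN_VAF = 0.05


@dataclass
class VariantCall:
    """Hypothetical per-variant summary (not a MuTect2 or VCF data structure)."""
    gene: str
    depth: int                # total reads covering the site in the tumor
    alt_reads: int            # reads supporting the alternate allele
    mean_base_quality: float  # mean base quality at the site

    @property
    def vaf(self) -> float:
        # Variant allele fraction; defined as 0 when there is no coverage.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(call: VariantCall) -> bool:
    """Return True only if the call meets all three thresholds."""
    return (
        call.depth >= MIN_DEPTH
        and call.mean_base_quality >= MIN_BASE_QUALITY
        and call.vaf >= MIN_VAF
    )


# Made-up example calls, for illustration only (not study data).
calls = [
    VariantCall("ZNF217", depth=48, alt_reads=12, mean_base_quality=34.0),
    VariantCall("ZNF703", depth=8, alt_reads=4, mean_base_quality=36.0),    # fails depth
    VariantCall("ZNF750", depth=120, alt_reads=3, mean_base_quality=31.0),  # fails VAF
]
kept = [c.gene for c in calls if passes_filters(c)]
print(kept)
```

In this sketch a call must satisfy all three criteria at once; whether the study applied them jointly or sequentially is not specified here, so the conjunction is an assumption.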
In addition, we have included details on variant validation through visualization in the Integrative Genomics Viewer (IGV), as well as RNA-seq processing steps using STAR for alignment, featureCounts for quantification, and DESeq2 for normalization and differential expression analysis. Statistical analyses are now clearly described, including the use of paired tests and Benjamini–Hochberg correction for multiple testing. Collectively, these additions directly address the reviewer’s concerns by ensuring that all analytical procedures are transparently reported and fully reproducible.
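For readers less familiar with the Benjamini–Hochberg correction mentioned above, a minimal pure-Python sketch of the standard step-up adjustment is given here. It is a generic illustration of the procedure, not the code used in the study (DESeq2 reports adjusted p-values itself); the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
print([round(p, 4) for p in adj], significant)
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped so that the adjusted values stay monotone; a gene is then called significant when its adjusted value falls at or below the chosen threshold (0.05 is used here purely as an example).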
(4) Data Reporting
This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:
(a) Deposit sequencing data in a public repository.
(b) Provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).
(c) Clarify whether raw or adjusted p-values were used for DEG analysis.
(d) Perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.
We have improved data transparency and reporting in the revised manuscript. All sequencing data are now publicly available, with whole-exome sequencing (WES) data deposited in the Sequence Read Archive (SRA; PRJNA913947) and RNA-seq data available in the Gene Expression Omnibus (GEO; GSE225846). In addition, we have provided comprehensive Supplementary Materials to support reproducibility and facilitate further analysis, including detailed mutation summaries (Table S1), mutation positions (Table S2), amino acid changes (Table S3), the curated TCGA comparison dataset (Table S4), protein domain annotations (Table S5), and the combined gene expression and clinical dataset (Table S6).
We have also clarified key aspects of the statistical analysis, including the use of Benjamini–Hochberg adjusted p-values and the thresholds applied for significance. Furthermore, in response to reviewer comments regarding subtype-specific analyses, we have explicitly addressed in the Discussion why subtype-stratified differential expression analysis was not performed, noting that the limited sample size would reduce statistical power and increase the risk of overinterpretation. Together, these revisions enhance the transparency, accessibility, and interpretability of the study.
(5) Mutation Analysis
Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.
We have substantially enhanced the mutation analysis by incorporating several new figures and complementary analyses that provide deeper biological interpretation. Specifically, we added Figure 1 to summarize mutation burden, coding consequences, and prevalence; Figure 2 to illustrate the nucleotide substitution spectrum; Figure 3 to map mutations across protein domains; Figure 4 to assess functional enrichment and mutation composition; and Figure 5 to highlight recurrent mutations.
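As an illustration of how a nucleotide substitution spectrum such as the one in Figure 2 can be tabulated, the sketch below collapses single-nucleotide variants into the six conventional pyrimidine-referenced classes (C>A, C>G, C>T, T>A, T>C, T>G). This is a generic convention shown under the assumption that it matches the figure; the function names and the example variants are hypothetical and are not taken from the study data.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Map a single-nucleotide change to one of six pyrimidine-referenced classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    # A purine reference (A or G) is reported on the complementary strand,
    # so that G>A is counted together with C>T, and so on.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(snvs):
    """Count substitution classes over an iterable of (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in snvs)


# Made-up variants, for illustration only.
example = [("G", "A"), ("C", "T"), ("A", "C")]
spectrum = substitution_spectrum(example)
print(dict(spectrum))
```

Reporting every change relative to the pyrimidine base keeps strand-equivalent substitutions in the same category, which is why the twelve possible changes reduce to six classes.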
Reviewer #2 (Public review):
Weaknesses:
The current cohort size is relatively small to reach significant findings, and targeted exploration on ZNF family without emphasizing the reason or clinical significance hinders the overall significance of the entire work.
We acknowledge the limitation posed by the relatively small cohort size and have addressed this concern in several ways in the revised manuscript. First, we have explicitly stated this limitation in the Discussion section. We have also reframed the study as a pilot and population-specific exploratory analysis to better reflect its scope. To strengthen the overall significance, we integrated both mutation and gene expression data, incorporated comparisons with TCGA datasets, and emphasized the importance of African-specific genomic insights. Importantly, we highlight that this study provides novel data from an underrepresented population, which represents a key contribution to the field.
Reviewer #3 (Public review):
Weaknesses:
The author has enhanced the descriptive depth of the study by adding details on mutations, expression subgroup analyses, and functional annotations but has not addressed the core weaknesses of small cohort size and lack of functional validation. While the revised version is more comprehensive in cataloging molecular alterations, it remains confined to descriptive analysis, with no substantial improvement in the reliability or generalizability of its conclusions.
We have addressed this concern by clearly acknowledging the key limitations of the study, including the absence of functional validation, the relatively small sample size, and the limited generalizability of the findings. In response, we have refined our interpretation to avoid causal claims and instead present the results as hypothesis-generating. We have also expanded the Discussion to include future research directions, recommending functional validation studies, multi-omics approaches, and validation in larger, more diverse cohorts.
In addition, we have strengthened the robustness of the study by incorporating comparisons with TCGA data, providing more detailed mutation classification, and integrating genomic and transcriptomic analyses. Beyond addressing reviewer comments, we have further improved the manuscript by reorganizing the Results section to follow a clear and logical flow, from mutation burden and spectrum to protein-level distribution, functional enrichment, recurrent mutations, and TCGA comparison. We have also improved figure quality and labeling to meet journal standards, added clear and consistent figure captions, and ensured alignment between the text, figures, and tables throughout the manuscript.
We sincerely thank the reviewers for their valuable feedback, which has significantly improved the quality and rigor of this work.