Mutational and Expression Profile of ZNF217, ZNF750, ZNF703 Zinc Finger Genes in Kenyan Women Diagnosed with Breast Cancer

  1. Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
  2. Directorate of Research and Innovation, Mount Kenya University, Thika, Kenya
  3. Département de Chimie, Université du Québec à Montréal, Montreal, Canada
  4. International Livestock Research Institute, Nairobi, Kenya

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.


Editors

  • Reviewing Editor
    Yongliang Yang
    Shanghai University of Medicine and Health Sciences, Shanghai, China
  • Senior Editor
    Caigang Liu
    Shengjing Hospital of China Medical University, Shenyang, China

Reviewer #2 (Public review):

Summary:

The authors sought to characterize the somatic mutation landscape and gene expression profiles of Kenyan breast cancer patients. By comparing Whole Exome Sequencing (WES) and RNA-seq data from 23 paired tumor-normal samples against The Cancer Genome Atlas (TCGA) cohorts, the study specifically aimed to highlight the role of the ZNF gene family.

Strengths:

The study addresses a critical gap in genomic research by focusing on an underrepresented African population, which is essential for achieving global health equity in oncology.

Weaknesses:

The cohort is relatively small for definitive landscape characterization. The study fails to explore the mechanistic link between identified somatic mutations and observed aberrant gene expression.

Impact and Utility:

The impact of this work is currently limited. While the data adds to the growing repository of African genomic samples, the lack of novelty and mechanistic insight reduces its utility for the broader scientific community. To be clinically valuable, the study would need to offer more robust, unbiased profiling that could eventually inform population-specific diagnostics or therapies.

Additional Context:

Breast cancer in African populations often presents with different clinical trajectories compared to Western cohorts. While any data from these regions is vital, "landscape" studies require high statistical power and unbiased analysis to differentiate true population-specific drivers from noise or small-sample variance. Without a clear regulatory mechanism linking mutations to phenotypes, the findings remain preliminary observations.

Reviewer #3 (Public review):

Summary:

This revised study analyzes the somatic mutational profiles and transcriptomic expression of three zinc-finger genes (ZNF217, ZNF703, ZNF750) in 23 Kenyan women with breast cancer, using whole-exome sequencing and RNA-sequencing of paired tumor-normal tissues. A total of 358 somatic mutations were detected, and all three genes were significantly upregulated in tumors compared to normal tissues (ZNF217 showing the most prominent difference). The findings provide preliminary evidence for the identification of diagnostic/prognostic biomarkers or therapeutic targets in sub-Saharan African populations.

Strengths:

The study's key strengths lie in its focus on an underrepresented Kenyan cohort, addressing a critical gap in sub-Saharan African breast cancer genomic research. It integrates DNA-level mutation analysis with RNA-level expression data, leveraging standardized bioinformatics pipelines and rigorous quality control to deliver detailed insights into mutation types, functional impacts, and amino acid changes.

Comments on revised version:

After careful revision by the authors, the manuscript has become more rigorous. The limitations, including small sample size and lack of functional validation, are properly acknowledged, and conclusions are prudently presented as hypothesis‑generating rather than causal claims. In addition, the strengthened multi‑omics analyses, TCGA validation, logical reorganization of results, and improved figure presentation further enhance the reliability of this work.

Author response:

The following is the authors’ response to the previous reviews

Public Reviews:

Reviewer #1 (Public review):

Weaknesses:

(1) Research scope

The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted, and why comparisons to previous literature were not included.

We have significantly strengthened the biological and clinical rationale for focusing on these three genes in the Introduction. Specifically, we now clearly justify their selection based on distinct functional roles: ZNF217 (oncogene, 20q13 amplification); ZNF703 (luminal subtype oncogenic driver); ZNF750 (tumor suppressor involved in differentiation). We have also explicitly defined the knowledge gap: the lack of mutation and expression data for these genes in African populations, particularly Kenyan cohorts.

Importantly, we have now incorporated a comparative analysis with TCGA data in the Results. This includes a new section, “Recurrent mutations and comparison with TCGA”; a new table, “Table 6”; and a curated dataset, “Supplementary Table S4”.

(2) Language and Style Issues

There are many typos and clear errors in the main text (e.g. (ref)).

Additionally, several statements read unnaturally. For example:

"Investigators uncovered 170 mutations ..." should instead be phrased as "We identified 170 mutations ...."

"The research team ..." should be rephrased as "Our team ...."

The manuscript has undergone comprehensive language editing throughout the revised draft.

(3) Methods and Data Analysis Details

The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

(a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

(b) Statistical methods for somatic mutation/SNP detection.

(c) Details of RNA purification and RNA-seq library preparation.

Without these details, the reproducibility of the study is limited.

We have fully revised and substantially expanded the Methods section to improve clarity, transparency, and reproducibility. In the revised manuscript, we now provide explicit details of all key analytical steps. These include quality control procedures using FastQC and MultiQC, as well as read trimming parameters implemented in Trimmomatic (leading and trailing quality <3, sliding window 4:15, and minimum read length of 36 bp). We also clearly describe alignment of reads to the hg38 reference genome using BWA-MEM, followed by somatic variant calling using MuTect2 in paired tumor–normal mode with incorporation of a Panel of Normals (PON). Variant filtering criteria are now explicitly stated, including minimum read depth (≥10), base quality (≥20), and variant allele fraction (≥0.05), and functional annotation was performed using VEP (v108).
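As a rough illustration of the variant filtering criteria stated above (read depth ≥10, base quality ≥20, variant allele fraction ≥0.05), a minimal Python sketch is given here. The `Variant` record, its field names, and the helper functions are hypothetical and are not taken from the manuscript or from MuTect2 output; in practice these cut-offs would be applied to VCF records rather than to in-memory objects.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as reported in the author response.
MIN_DEPTH = 10          # minimum total read depth at the site
MIN_BASE_QUALITY = 20   # minimum base quality
MIN_VAF = 0.05          # minimum variant allele fraction


@dataclass
class Variant:
    """Hypothetical, simplified somatic call (not a real VCF schema)."""
    gene: str
    depth: int            # total reads covering the site in the tumor
    base_quality: float   # summary base quality of supporting reads
    alt_reads: int        # reads supporting the alternate allele

    @property
    def vaf(self) -> float:
        # Variant allele fraction = alt-supporting reads / total depth.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(v: Variant) -> bool:
    """Return True only if the call meets all three thresholds."""
    return (
        v.depth >= MIN_DEPTH
        and v.base_quality >= MIN_BASE_QUALITY
        and v.vaf >= MIN_VAF
    )


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Keep calls that pass every filter, preserving input order."""
    return [v for v in variants if passes_filters(v)]
```

Applying all three thresholds jointly, rather than any one alone, is what removes low-coverage sites, low-quality bases, and very low-fraction calls in a single pass.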

In addition, we have included details on variant validation through visualization in the Integrative Genomics Viewer (IGV), as well as RNA-seq processing steps using STAR for alignment, featureCounts for quantification, and DESeq2 for normalization and differential expression analysis. Statistical analyses are now clearly described, including the use of paired tests and Benjamini–Hochberg correction for multiple testing. Collectively, these additions directly address the reviewer’s concerns by ensuring that all analytical procedures are transparently reported and fully reproducible.
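For readers unfamiliar with the Benjamini–Hochberg correction mentioned above, the following is a minimal pure-Python sketch of the standard step-up adjustment. It is an illustration only: the function name is hypothetical, and the manuscript's analysis relies on DESeq2 rather than on this code.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value of rank k (1 = smallest) among m tests, the raw
    adjustment is p * m / k. Walking from the largest rank down and
    taking a running minimum enforces monotonicity, so an adjusted
    value never exceeds that of a larger raw p-value.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene would then be called differentially expressed when its adjusted p-value falls below the chosen significance threshold, which controls the expected proportion of false discoveries across all genes tested.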

(4) Data Reporting

This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

(a) Deposit sequencing data in a public repository.

(b) Provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

(c) Clarify whether raw or adjusted p-values were used for DEG analysis.

(d) Perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

We have improved data transparency and reporting in the revised manuscript. All sequencing data are now publicly available, with whole-exome sequencing (WES) data deposited in the Sequence Read Archive (SRA; PRJNA913947) and RNA-seq data available in the Gene Expression Omnibus (GEO; GSE225846). In addition, we have provided comprehensive Supplementary Materials to support reproducibility and facilitate further analysis, including detailed mutation summaries (Table S1), mutation positions (Table S2), amino acid changes (Table S3), the curated TCGA comparison dataset (Table S4), protein domain annotations (Table S5), and the combined gene expression and clinical dataset (Table S6).

We have also clarified key aspects of the statistical analysis, including the use of Benjamini–Hochberg adjusted p-values and the thresholds applied for significance. Furthermore, in response to reviewer comments regarding subtype-specific analyses, we have explicitly addressed in the Discussion why subtype-stratified differential expression analysis was not performed, noting that the limited sample size would reduce statistical power and increase the risk of overinterpretation. Together, these revisions enhance the transparency, accessibility, and interpretability of the study.

(5) Mutation Analysis

Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

We have substantially enhanced the mutation analysis by incorporating several new figures and complementary analyses that provide deeper biological interpretation. Specifically, we added Figure 1 to summarize mutation burden, coding consequences, and prevalence; Figure 2 to illustrate the nucleotide substitution spectrum; Figure 3 to map mutations across protein domains; Figure 4 to assess functional enrichment and mutation composition; and Figure 5 to highlight recurrent mutations.

Reviewer #2 (Public review):

Weaknesses:

The current cohort is too small to support statistically significant findings, and the targeted exploration of the ZNF family, without a stated rationale or clinical significance, limits the overall significance of the work.

We acknowledge the limitation posed by the relatively small cohort size and have addressed this concern in several ways in the revised manuscript. First, we have explicitly stated this limitation in the Discussion section. We have also reframed the study as a pilot and population-specific exploratory analysis to better reflect its scope. To strengthen the overall significance, we integrated both mutation and gene expression data, incorporated comparisons with TCGA datasets, and emphasized the importance of African-specific genomic insights. Importantly, we highlight that this study provides novel data from an underrepresented population, which represents a key contribution to the field.

Reviewer #3 (Public review):

Weaknesses:

The authors have enhanced the descriptive depth of the study by adding details on mutations, expression subgroup analyses, and functional annotations, but have not addressed the core weaknesses of small cohort size and lack of functional validation. While the revised version is more comprehensive in cataloging molecular alterations, it remains confined to descriptive analysis, with no substantial improvement in the reliability or generalizability of its conclusions.

We have addressed this concern by clearly acknowledging the key limitations of the study, including the absence of functional validation, the relatively small sample size, and the limited generalizability of the findings. In response, we have refined our interpretation to avoid causal claims and instead present the results as hypothesis-generating. We have also expanded the Discussion to include future research directions, recommending functional validation studies, multi-omics approaches, and validation in larger, more diverse cohorts.

In addition, we have strengthened the robustness of the study by incorporating comparisons with TCGA data, providing more detailed mutation classification, and integrating genomic and transcriptomic analyses. Beyond addressing reviewer comments, we have further improved the manuscript by reorganizing the Results section to follow a clear and logical flow—from mutation burden and spectrum to protein-level distribution, functional enrichment, recurrent mutations, and TCGA comparison. We have also improved figure quality and labeling to meet journal standards, added clear and consistent figure captions, and ensured alignment between the text, figures, and tables throughout the manuscript.

We sincerely thank the reviewers for their valuable feedback, which has significantly improved the quality and rigor of this work.
