A novel high-throughput single-cell DNA sequencing method reveals hidden genomic heterogeneity in the unicellular eukaryote Leishmania

  1. Experimental Parasitology Unit, Institute of Tropical Medicine Antwerp, Antwerp, Belgium
  2. Molecular Parasitology Unit, Institute of Tropical Medicine Antwerp, Antwerp, Belgium

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    María Zambrano
    CorpoGen, Bogotá, Colombia
  • Senior Editor
    Dominique Soldati-Favre
    University of Geneva, Geneva, Switzerland

Reviewer #1 (Public review):

Summary:

Negreira et al. clearly presented the challenges of conducting genomic studies in unicellular pathogens and of addressing questions related to the balance between genome integrity and instability, which is pivotal for survival under the stressful conditions these organisms face and for their evolutionary success. This underscores the need for powerful single-cell DNA analysis approaches suited to the small and plastic Leishmania genome. Accordingly, their goal was to develop such a novel method and demonstrate its robustness.

In this study, the authors combined semi-permeable capsules (SPCs) with primary template-directed amplification (PTA) and adapted the system to the Leishmania genome, which is about 100 times smaller than the human genome and exhibits remarkable plasticity and mosaic aneuploidy. Given the size and organization of the Leishmania genome, the challenges were substantial; nevertheless, the authors successfully demonstrated that PTA not only works for Leishmania but also represents a significantly improved whole-genome amplification (WGA) method compared with standard approaches. They showed that SPCs provide a superior alternative for cell encapsulation, increasing throughput. The methodology enabled high-resolution karyotyping and the detection of fine-scale copy number variations (CNVs) at the single-cell level. Furthermore, it allowed discrimination between genotypically distinct cells within mixed populations.

Strengths:

This is a high-impact study that will likely contribute to our understanding of DNA replication and the genetic plasticity of Leishmania, including its well-documented aneuploidy, somy variations, CNVs, and SNPs, all key elements for elucidating various aspects of the parasite's biology, such as genome evolution, genetic exchange, and mechanisms of drug resistance.

Overall, the authors clearly achieved their objectives, providing a solid rationale for the study and demonstrating how this approach can advance the investigation of Leishmania's small, plastic genome and its frequent natural strain mixtures within hosts. This methodology may also prove valuable for genomic studies of other single-celled organisms.

Weaknesses:

The discussion section could be enriched to help readers understand the significance of the work, for instance, by more clearly pointing out the obstacles to a better understanding of DNA replication in Leishmania. Alternatively, when discussing the nucleotide-level results and the relevance of being able to compare the two strains, the authors could describe in more detail the implications of this level of precision for those studying clonal strains or field isolates, drug resistance, or virulence.

Reviewer #2 (Public review):

Summary:

Negreira et al. present an application of a novel single-cell genomics approach to investigate the genetic heterogeneity of Leishmania parasites. Leishmania, which causes a major global disease with hundreds of thousands of cases annually, also serves as a model to test the rigor of the sequencing strategy. Its complex karyotypic nature necessitates a method capable of resolving natural variation to better understand genome dynamics. Importantly, an earlier single-cell genomics platform (10x Chromium) is no longer available, and new methods need to be evaluated to fill this gap.

The study was designed to evaluate whether a capsule-based cell capture method combined with primary template-directed amplification (PTA) could preserve the genomic heterogeneity present in an equal mixture of two Leishmania strains. This was a high bar, given the relatively small protozoan genome and prior studies that showed limitations of single-cell genomics, especially for gene-level copy number changes. Overall, the study found that semi-permeable capsules (SPCs) are an effective way to isolate high-quality single cells. Additionally, short reads from amplified genomes effectively maintained the relative levels of variation in the two strains at the chromosome, gene-copy, and single-base levels. Thus, this method will be useful to evaluate adaptive strategies of Leishmania. Many researchers will also refer to these studies to set up SPC collection and PTA methods for their organism of choice.

Strengths:

(1) The use of SPC and PTA in a non-bacterial organism is novel. The study displays the utility of these methods to isolate and amplify single genomes to a level that can be sequenced, even though Leishmania is a motile organism with a GC-rich genome.

(2) The authors clearly outlined their optimization strategy and provided numerous quality-control metrics that inspire confidence in the success of achieving even chromosomal coverage relative to ploidy.

(3) The use of two distinct Leishmania strains with known clonal status provided strong evidence that PTA-based amplification could reflect genome differences and displayed the utility of the method for studies of rare genotypes.

(4) Evaluating the SPCs pre- and post-amplification with microscopy is a practical and robust way of determining the success of SPC formation and PTA.

(5) The authors show that the PTA-based approach easily resolved major genotypic ploidy in agreement with a prior 10x Chromium-based study. The new method had improved resolution of drug resistance genotypes in the form of both copy-number variations and single-nucleotide polymorphisms.

(6) In general, the authors are very thorough in describing the methods, including those used to optimize PTA lysis and amplification steps (fresh vs frozen cells, naked DNA vs sorted cells, etc.). This demonstrates a depth of knowledge about the procedure and leaves few unanswered questions.

(7) The custom, multifaceted, computational assessment of coverage evenness is a major strength of the study and demonstrates that the authors acknowledge potential computational factors that could impact the analysis.

Weaknesses:

(1) The rationale behind some experimental/analysis choices is not well-described. For example, the rationale behind methanol fixation and heat-lysis is unclear. Additionally, the choice of various methods to assess "evenness" is not justified (e.g. why are multiple methods needed? What is the strength of each method?). Also, there is no justification for using 100k reads for subsampling. Finally, what exactly constitutes a "confidently-called SNP"?

(2) In the methods, the STD protocol lists a 15-minute amplification at 45 °C, whereas the PTA protocol involves 10 h at 37 °C. This is a dramatic difference in incubation time and should be addressed when comparing results from the two methods. It is not really a fair comparison when you look at coverage levels; of course, a 10-hour incubation is going to yield more reads than a 15-minute incubation.

(3) There is a lack of quantitative evaluations of the SPCs. e.g. How many capsules were evaluated to assess doublets? How many capsules were detected as Syto5 positive in a successful vs an unsuccessful experiment?

(4) The authors do not address some of the amplification results obtained under various conditions. For example, why did temperature-based lysis of STD4 lead to amplification failure? Also, what is the reason for fewer "true" cells (higher background) in the PTA samples compared to the STD samples? Is this related to issues with barcoding or, alternatively, substandard amplification as indicated by lower read amounts in some capsules (knee plots in Figure 1C)?

(5) The paper presents limited biological relevance. Without it, the paper describes an improvement in genome amplification methods and some proof-of-concept analyses. Using a 1:1 mixture of parasites with different genotypes, the authors display the utility of the method to resolve genetic diversity, but they don't seek to understand the limits of detecting this diversity. For example, the authors do not comment on the mixed karyotypes of the HU3 cells (Figure 3F) other than to state that this line was not clonal. For CNVs, the two loci evaluated were detected at relatively high copy number (according to Figure 4C, between 4 and 20 copies). Thus, the sensitivity of CNV detection from these data remains unclear; can this approach detect lower-level CNVs such as duplications, or minor CNVs that do not show up in every cell?

(6) The authors state that Leishmania can carry extrachromosomal copies of important genes. There is no discussion of how the presence of these molecules would affect the amplification steps and CNV detection. For example, the phi29 polymerase is highly processive on circular molecules; does the presence of such molecules lead to over-amplification and overrepresentation in the data? Is this evident in the current study? This information would be useful for organisms that carry this type of genetic element.

(7) The manuscript is missing a comparison with other similar studies in the field. For example, how does this coverage level compare to those achieved for other genomes? Can this method achieve amplification levels needed to assess larger genomes? Has there been any evaluation of base composition effects since Leishmania is a GC-rich genome?

(8) Cost is mentioned as a benefit of the SPC platform, and savings are achieved when working in a plate format, but no details are included on how this was evaluated.

(9) The Zenodo link for custom scripts does not exist, and code cannot be evaluated.

Reviewer #3 (Public review):

In this manuscript, Negreira et al. propose a new scDNAseq method using semi-permeable capsules (SPCs) and primary template-directed amplification (PTA). The authors optimize several metrics to improve their predictions, such as GC bias, intra-chromosomal fluctuation (ICF, a metric to differentiate replicating and non-replicating cells) and the intra-chromosomal coefficient of variation (ICCV, which describes the read distribution along chromosomes). Coverage evenness was evaluated using the Gini index and the median absolute pairwise difference between the counts of two consecutive bins. They validate the proposed method using two Leishmania donovani strains isolated from different countries, BPK081 (low genomic variability) and HU3 (high genomic variability). They then show that the method outperforms standard WGA and has accuracy similar to the discontinued 10X-scDNA solution (10X Genomics), further improving short CNV identification. The authors also show that the method can identify somy variations, insertions/deletions, and SNP variations across cells. This is timely and very relevant work with wide applicability to copy number variation assessment using single-cell data.

I really appreciate this work. My congratulations to the authors. All my comments below only aim to improve an already solid manuscript.

(1) Data availability: Although the authors provide a Zenodo link, the data is restricted. I also could not access the GitHub link in the Zenodo website: https://github.com/gabrielnegreira/2025_scDNA_paper. The authors should make these files available.

(2) SPC-PTA and SPC-STD cell count comparison: The authors have consistently shown that the SPC-PTA method was superior to SPC-STD. However, a few points regarding the SPC-PTA results should be clarified. Is there an explanation for the lower proportion of SPCs yielding true cells in SPC-PTA compared with SPC-STD, which is reflected in the bimodal distribution of reads per cell in SPC-PTA2 and the three-to-multimodal distribution in SPC-PTA1 (Figure 1B)? Also, in Table 1, does the number of reads reflect reads from all sequenced SPCs or only from true cells? If the former, I suggest that the authors add a column with the "Number of reads in true cells" to account for this discrepancy.

(3) The authors should evaluate the results at higher coverage for SPC-PTA. I understand that the authors subsampled the total reads to 100,000 per cell to allow cross-sample comparisons, especially between SPC-STD and SPC-PTA. However, as they concluded that SPC-PTA was far superior, and the samples SPC-PTA1 and SPC-PTA2 had an "elbow" of 650,493 and 448,041 reads, respectively, it might be interesting to revisit some of the estimations using only SPC-PTA samples and a higher coverage cutoff, such as 400,000.

(4) Doublet detection: I suggest that the authors be a little more careful with their definition of doublets. Doublet detection was based on diagnostic SNPs from the two strains, BPK081 and HU3, which identify doublets between two very different and well-characterised strains. However, this method will probably not identify within-strain doublets. This is of minor importance for cloned, stable strains with few passages, such as BPK081, but might be more relevant for more heterogeneous strains, such as HU3. Within-strain doublets might also be relevant in other scenarios, such as multiclonal infections with different populations of the same strain in the same geographic area. One positive point is that the between-strain doublet count was low, so the within-strain doublet count is probably low too. The manuscript would benefit from a discussion in this regard.

(5) Nucleotide sequence variants and phylogeny: I believe that a more careful description of the phylogenetic analysis and some limitations of the sequence variant identification would benefit the manuscript.

(5.1) As described in the methods, the authors intentionally selected two fairly different Leishmania donovani strains, HU3 and BPK081, and confirmed that the sequence variant methodology can separate cells from each strain. It is a solid proof of concept. However, most multiclonal infections in natural settings would be caused by parasite populations that differ by fewer SNPs and will be significantly harder to detect. Hence, I suggest adding a short discussion of this point.

(5.2) The authors should expand on the description of the phylogenetic tree. In the HU3 tree in the left panel of Figure 5F, most of the variation is observed in ~8 cells, which span positions 0 to ~28,000.

Most of the other cells are on very short branches, from ~29,000 to ~30,400 (Figure 5F, right panel). Assuming that this representation is a phylogram, as the branches are short, these cells diverge by approximately 100-2,000 SNPs. It is unexpected (but not impossible) that ~8 such divergent cells would be maintained uniquely (or at very low counts) in the culture, unless this is a multiclonal infection. I would investigate these cells carefully: they might be doublets or have more missing data than other cells. I would also suggest adding a brief discussion of this point to the manuscript.

Author response:

Reviewer #1 (Public review):

Summary:

Negreira et al. clearly presented the challenges of conducting genomic studies in unicellular pathogens and of addressing questions related to the balance between genome integrity and instability, which is pivotal for survival under the stressful conditions these organisms face and for their evolutionary success. This underscores the need for powerful single-cell DNA analysis approaches suited to the small and plastic Leishmania genome. Accordingly, their goal was to develop such a novel method and demonstrate its robustness.

In this study, the authors combined semi-permeable capsules (SPCs) with primary template-directed amplification (PTA) and adapted the system to the Leishmania genome, which is about 100 times smaller than the human genome and exhibits remarkable plasticity and mosaic aneuploidy. Given the size and organization of the Leishmania genome, the challenges were substantial; nevertheless, the authors successfully demonstrated that PTA not only works for Leishmania but also represents a significantly improved whole-genome amplification (WGA) method compared with standard approaches. They showed that SPCs provide a superior alternative for cell encapsulation, increasing throughput. The methodology enabled high-resolution karyotyping and the detection of fine-scale copy number variations (CNVs) at the single-cell level. Furthermore, it allowed discrimination between genotypically distinct cells within mixed populations.

Strengths:

This is a high-impact study that will likely contribute to our understanding of DNA replication and the genetic plasticity of Leishmania, including its well-documented aneuploidy, somy variations, CNVs, and SNPs, all key elements for elucidating various aspects of the parasite's biology, such as genome evolution, genetic exchange, and mechanisms of drug resistance.

Overall, the authors clearly achieved their objectives, providing a solid rationale for the study and demonstrating how this approach can advance the investigation of Leishmania's small, plastic genome and its frequent natural strain mixtures within hosts. This methodology may also prove valuable for genomic studies of other single-celled organisms.

We thank the reviewer for the positive feedback and appreciation of the potential applications for the methodology we describe here.

Weaknesses:

The discussion section could be enriched to help readers understand the significance of the work, for instance, by more clearly pointing out the obstacles to a better understanding of DNA replication in Leishmania. Alternatively, when discussing the nucleotide-level results and the relevance of being able to compare the two strains, the authors could describe in more detail the implications of this level of precision for those studying clonal strains or field isolates, drug resistance, or virulence.

We thank the reviewer for the suggestions. Indeed, single-cell DNA sequencing has successfully revealed cell-to-cell variability in replication timing and fork progression in mammalian cells[1,2], and we believe that the SPC-PTA workflow could be used in similar studies in Leishmania to complement bulk-based observations[3,4]. Regarding nucleotide-level information, it is indeed highly relevant to detect minor circulating variants that may affect virulence and/or drug resistance and that could be missed by bulk sequencing. This includes the ability to detect co-occurring variants with potential epistatic effects. These topics will be further developed in the revised version. Finally, we will explicitly discuss how this methodology can be applied beyond Leishmania to investigate genome plasticity, adaptation, and evolutionary processes in other organisms.

Reviewer #2 (Public review):

Summary:

Negreira et al. present an application of a novel single-cell genomics approach to investigate the genetic heterogeneity of Leishmania parasites. Leishmania, which causes a major global disease with hundreds of thousands of cases annually, also serves as a model to test the rigor of the sequencing strategy. Its complex karyotypic nature necessitates a method capable of resolving natural variation to better understand genome dynamics. Importantly, an earlier single-cell genomics platform (10x Chromium) is no longer available, and new methods need to be evaluated to fill this gap.

The study was designed to evaluate whether a capsule-based cell capture method combined with primary template-directed amplification (PTA) could preserve the genomic heterogeneity present in an equal mixture of two Leishmania strains. This was a high bar, given the relatively small protozoan genome and prior studies that showed limitations of single-cell genomics, especially for gene-level copy number changes. Overall, the study found that semi-permeable capsules (SPCs) are an effective way to isolate high-quality single cells. Additionally, short reads from amplified genomes effectively maintained the relative levels of variation in the two strains at the chromosome, gene-copy, and single-base levels. Thus, this method will be useful to evaluate adaptive strategies of Leishmania. Many researchers will also refer to these studies to set up SPC collection and PTA methods for their organism of choice.

Strengths:

(1) The use of SPC and PTA in a non-bacterial organism is novel. The study displays the utility of these methods to isolate and amplify single genomes to a level that can be sequenced, even though Leishmania is a motile organism with a GC-rich genome.

(2) The authors clearly outlined their optimization strategy and provided numerous quality-control metrics that inspire confidence in the success of achieving even chromosomal coverage relative to ploidy.

(3) The use of two distinct Leishmania strains with known clonal status provided strong evidence that PTA-based amplification could reflect genome differences and displayed the utility of the method for studies of rare genotypes.

(4) Evaluating the SPCs pre- and post-amplification with microscopy is a practical and robust way of determining the success of SPC formation and PTA.

(5) The authors show that the PTA-based approach easily resolved major genotypic ploidy in agreement with a prior 10x Chromium-based study. The new method had improved resolution of drug resistance genotypes in the form of both copy-number variations and single-nucleotide polymorphisms.

(6) In general, the authors are very thorough in describing the methods, including those used to optimize PTA lysis and amplification steps (fresh vs frozen cells, naked DNA vs sorted cells, etc.). This demonstrates a depth of knowledge about the procedure and leaves few unanswered questions.

(7) The custom, multifaceted, computational assessment of coverage evenness is a major strength of the study and demonstrates that the authors acknowledge potential computational factors that could impact the analysis.

We deeply appreciate the positive and encouraging feedback on our manuscript.

Weaknesses:

(1) The rationale behind some experimental/analysis choices is not well-described. For example, the rationale behind methanol fixation and heat-lysis is unclear. Additionally, the choice of various methods to assess "evenness" is not justified (e.g. why are multiple methods needed? What is the strength of each method?). Also, there is no justification for using 100k reads for subsampling. Finally, what exactly constitutes a "confidently-called SNP"?

Methanol fixation prior to lysis is part of the original protocol described in the Single-Microbe Genome Barcoding Kit manual and is meant to facilitate lysis and DNA denaturation in bacterial cells (for which the kit was originally developed). However, in our preliminary tests with bulk samples (described in the supplementary material), we noticed a strong negative effect on lysis efficiency and DNA recovery when parasites were fixed with methanol. We therefore tested the effect of skipping this step in the single-cell DNA workflow, keeping sample SPC_STD1 as a safe control in which the full workflow described in the kit manual was followed.

As we were unsure whether the standard lysis (25 °C for 15 minutes) would work efficiently for Leishmania, we also included heat lysis (99 °C for 15 minutes) and a longer lysis incubation (25 °C for 1 h). Both modifications were listed as validated alternatives in the kit's manual.

The 100k reads threshold was chosen based on the number of reads found in the 'true cell' with the lowest read count.
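The subsampling step can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes reads are already grouped by cell barcode (the `subsample_cells` helper and its dictionary input are hypothetical) and draws, without replacement and with a fixed seed, the same number of reads from every true cell, with the target depth set by the shallowest true cell.

```python
import random

def subsample_cells(reads_per_cell, true_cells, seed=0):
    """Downsample every true cell to the read count of the shallowest true cell.

    reads_per_cell: dict mapping a cell barcode to a list of read IDs (hypothetical structure).
    true_cells: iterable of barcodes classified as 'true cells'.
    Returns (target_depth, dict mapping barcode -> subsampled list of read IDs).
    """
    true_cells = list(true_cells)
    # The target depth is set by the true cell with the lowest read count.
    target = min(len(reads_per_cell[bc]) for bc in true_cells)
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    subsampled = {
        bc: rng.sample(reads_per_cell[bc], target)  # sampling without replacement
        for bc in true_cells
    }
    return target, subsampled
```

Background barcodes are simply left out, so they never drag the common depth down; only barcodes already classified as true cells set the threshold.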

Regarding variant calling, a variant was considered confidently called if it was covered, at the single-cell level, by at least one deduplicated read with Phred quality above Q30 and mapping quality (MAPQ) also above 30.
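A minimal sketch of this per-cell filter is shown below. It assumes each cell's reads at a candidate site are available as simple records; the `ReadObs` record and `confident_alleles` helper are hypothetical, and in practice these fields would be read from the alignment file. "Above Q30" and "above 30" are interpreted here as strictly greater than 30; whether the original thresholds are inclusive is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadObs:
    """One read overlapping a candidate variant site in one cell (hypothetical record)."""
    base: str           # base observed at the site
    base_qual: int      # Phred base quality at the site
    mapq: int           # mapping quality of the read
    is_duplicate: bool  # True if the read was flagged as a duplicate

def confident_alleles(observations, min_baseq=30, min_mapq=30):
    """Alleles at a site supported by at least one deduplicated, high-quality read."""
    return {
        obs.base
        for obs in observations
        if not obs.is_duplicate
        and obs.base_qual > min_baseq  # "above Q30", read as strictly greater
        and obs.mapq > min_mapq        # mapping quality also above 30
    }
```

A site counts as confidently covered in a cell when the returned set is non-empty; comparing it with the reference and alternative alleles then gives that cell's call at the site.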

In the revised version, we will include these explanations and improve the explanation of the metrics used to estimate coverage quality.
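For readers unfamiliar with the evenness metrics named in the reviews (the Gini index, the median absolute pairwise difference between consecutive bins, and the intra-chromosomal coefficient of variation), the sketch below shows one plausible way to compute them from binned read counts. It is an illustration under stated assumptions, not the authors' code: scaling counts to a mean of 1 before taking differences and using the population standard deviation are choices made here for simplicity, and some tools compute the pairwise difference on log2-transformed ratios instead.

```python
import statistics

def gini_index(counts):
    """Gini index of binned read counts: 0 means perfectly even, values near 1 very uneven."""
    x = sorted(counts)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin with reads")
    # Standard formula on ascending values with 1-based ranks i.
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1))
    return weighted / (n * total)

def mapd(counts):
    """Median absolute difference between consecutive bins, after scaling counts to mean 1."""
    mean = statistics.fmean(counts)
    ratios = [c / mean for c in counts]
    return statistics.median(abs(b - a) for a, b in zip(ratios, ratios[1:]))

def iccv(bins_by_chrom):
    """Coefficient of variation of bin counts, computed separately for each chromosome."""
    return {
        chrom: statistics.pstdev(counts) / statistics.fmean(counts)
        for chrom, counts in bins_by_chrom.items()
    }
```

Running `mapd` per chromosome, rather than on the concatenated genome, avoids counting the coverage step between chromosomes of different somy as noise.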

(2) In the methods, the STD protocol lists a 15-minute amplification at 45 °C, whereas the PTA protocol involves 10 h at 37 °C. This is a dramatic difference in incubation time and should be addressed when comparing results from the two methods. It is not really a fair comparison when you look at coverage levels; of course, a 10-hour incubation is going to yield more reads than a 15-minute incubation.

We agree with the reviewer that the longer incubation period of PTA might explain the higher read count seen in the PTA samples, although the differences in amplification kinetics (linear in PTA, exponential in STD) and potential differences in amplification saturation points make it difficult to compare them. For instance, an updated version of PTA (ResolveDNA V2) uses a shorter amplification time (2.5 h) and achieves amplification levels similar to those of the 10 h incubation, suggesting that PTA amplification saturates well before 10 h. In any case, all quality-control metrics were computed on cells subsampled to 100k reads to mitigate the effect of read count differences on data quality.

(3) There is a lack of quantitative evaluations of the SPCs. e.g. How many capsules were evaluated to assess doublets? How many capsules were detected as Syto5 positive in a successful vs an unsuccessful experiment?

We agree with the reviewer, but during the experiments SPCs were assessed only qualitatively, by microscopy, following the Single-cell microbe DNA barcoding kit manual. No quantitative analysis was done, and therefore we do not have these data. Doublets were detected in silico, based on the detection of SPCs containing mixed genomes from the two strains used in the study, as described in the Materials and Methods. As pointed out by another reviewer, this only allows the detection of inter-strain doublets. In the revised version, we will explain this and add an estimate of total doublets based on the inter-strain doublet rate.
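One common way to extrapolate total doublets from inter-strain doublets is sketched below; the response does not specify the estimator, so this is an assumption, and `estimate_doublets` is a hypothetical helper. If a fraction p of cells belongs to one strain and q = 1 - p to the other, and co-encapsulation pairs cells at random, a doublet combines the two strains with probability 2pq, which is 0.5 for a 1:1 mix. The total doublet rate is then approximately the inter-strain doublet rate divided by 2pq.

```python
def estimate_doublets(n_cells, n_inter_strain, frac_strain_a=0.5):
    """Extrapolate the total doublet rate from doublets that mix two strains.

    Assumes two co-encapsulated cells are drawn independently from a mix in
    which a fraction `frac_strain_a` of cells belongs to strain A.
    """
    if not 0 < frac_strain_a < 1:
        raise ValueError("both strains must be present in the mix")
    p, q = frac_strain_a, 1.0 - frac_strain_a
    p_inter = 2 * p * q  # probability that a random pair of cells spans both strains
    inter_rate = n_inter_strain / n_cells
    total_rate = inter_rate / p_inter
    return {
        "inter_strain_rate": inter_rate,
        "total_doublet_rate": total_rate,
        "within_strain_rate": total_rate - inter_rate,
    }
```

For example, 20 inter-strain doublets among 1,000 barcoded cells (illustrative numbers, not data from the study) would imply a total doublet rate of about 4%, half of it within-strain doublets that diagnostic SNPs cannot detect.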

(4) The authors do not address some of the amplification results obtained under various conditions. For example, why did temperature-based lysis of STD4 lead to amplification failure? Also, what is the reason for fewer "true" cells (higher background) in the PTA samples compared to the STD samples? Is this related to issues with barcoding or, alternatively, substandard amplification as indicated by lower read amounts in some capsules (knee plots in Figure 1C)?

After discussion with the technical support team of the SPC generator kit, it became clear that the heat lysis in STD4 should have used a shorter incubation time (10 minutes instead of 15 minutes). We suspect that the longer incubation, combined with the high temperature and the harsh lysis conditions (0.8 M KOH), might have damaged the SPCs, so that DNA might have leaked out of them before WGA. In the microscopy images, SPCs in STD4 show a swollen appearance not seen in the other samples. In the revised version, we will explain this more clearly.

(5) The paper presents limited biological relevance. Without it, the paper describes an improvement in genome amplification methods and some proof-of-concept analyses. Using a 1:1 mixture of parasites with different genotypes, the authors display the utility of the method to resolve genetic diversity, but they don't seek to understand the limits of detecting this diversity. For example, the authors do not comment on the mixed karyotypes of the HU3 cells (Figure 3F) other than to state that this line was not clonal. For CNVs, the two loci evaluated were detected at relatively high copy number (according to Figure 4C, between 4 and 20 copies). Thus, the sensitivity of CNV detection from these data remains unclear; can this approach detect lower-level CNVs such as duplications, or minor CNVs that do not show up in every cell?

As described above, we will include more discussion of the potential biological relevance of the method in the revised version of the manuscript. We will also attempt to use dedicated bioinformatic tools to discover de novo CNVs, as suggested by the other reviewers. This might also allow us to determine the detection limit of the methodology for CNVs.
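One way to reason about CNV detection limits is to look at how copy number translates into coverage ratios. The sketch below assumes a mostly disomic genome, so that the genome-wide median bin count is taken to represent two copies; it estimates the somy of each chromosome and the copy number of a sub-chromosomal locus relative to its chromosome. The helper names are hypothetical, and this is not a substitute for the dedicated CNV callers mentioned above.

```python
import statistics

def somy_estimates(bins_by_chrom, baseline_copies=2):
    """Per-chromosome somy: each chromosome's median bin count scaled to the genome-wide median.

    Assumes most of the genome sits at `baseline_copies` copies (disomy), so the
    genome-wide median bin count is taken to represent that copy number.
    """
    all_bins = [c for counts in bins_by_chrom.values() for c in counts]
    genome_median = statistics.median(all_bins)
    return {
        chrom: baseline_copies * statistics.median(counts) / genome_median
        for chrom, counts in bins_by_chrom.items()
    }

def locus_copy_number(locus_bins, chrom_bins, chrom_somy):
    """Copy number of a sub-chromosomal locus, relative to the somy of its chromosome."""
    return chrom_somy * statistics.median(locus_bins) / statistics.median(chrom_bins)
```

Under these assumptions, a single-copy gain on a disomic chromosome changes local coverage only 1.5-fold, whereas an amplicon at ten copies changes it 5-fold, which is one reason small CNVs are harder to separate from per-bin noise in sparse single-cell data.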

(6) The authors state that Leishmania can carry extrachromosomal copies of important genes. There is no discussion of how the presence of these molecules would affect the amplification steps and CNV detection. For example, the phi29 polymerase is highly processive on circular molecules; does the presence of such molecules lead to over-amplification and overrepresentation in the data? Is this evident in the current study? This information would be useful for organisms that carry this type of genetic element.

Because our data consist of short reads, we believe they do not allow us to differentiate between intra-chromosomal CNVs and linear or circular episomal CNVs, so we cannot determine whether circular CNVs are over-amplified. Of note, we have previously demonstrated that the M-locus CNV in chromosome 36 is intrachromosomal, not circular (episomal)[5].

(7) The manuscript is missing a comparison with other similar studies in the field. For example, how does this coverage level compare to those achieved for other genomes? Can this method achieve amplification levels needed to assess larger genomes? Has there been any evaluation of base composition effects since Leishmania is a GC-rich genome?

We believe the SPC-PTA workflow can be applied to organisms with larger genomes as PTA was developed specifically for mammalian cells[6], and also because, in our hands, it outperformed the 10X scDNA solution, which was developed for mammals.

We believe a direct comparison of coverage levels with other studies is difficult because steps in the workflow other than WGA, such as library preparation (PCR-based in our case), as well as genome features such as GC content, size, and the presence of repetitive regions, can also affect coverage levels and evenness. One strength of our approach was the use of a single sample (the 50/50 mix of two L. donovani strains) for all conditions, thus removing potential parasite-specific biases. In addition, the multiplexing system used during barcoding allowed us to combine all samples prior to library preparation, thus removing potential differences introduced by this step.

Regarding the effect of GC content, we noticed a positive bias in regions with higher GC content in all samples, which had to be corrected in silico. This is the opposite of the negative bias observed in a previous study[7], likely due to differences in WGA and/or library preparation. In the revised version, we will include a supplementary figure showing the GC bias.
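A simple form of in silico GC correction is sketched below, assuming each bin is annotated with its GC fraction. It divides each bin's count by the median count of bins in the same GC stratum and rescales to the overall median. The stratified-median approach and the `gc_correct` name are assumptions for illustration (LOESS regression of counts on GC content is a common alternative), not necessarily the method used in the study.

```python
import statistics
from collections import defaultdict

def gc_correct(counts, gc_fractions, n_strata=20):
    """Divide each bin's count by the median count of bins with similar GC content."""
    if len(counts) != len(gc_fractions):
        raise ValueError("counts and gc_fractions must have the same length")
    # Assign each bin to one of n_strata equal-width GC strata on [0, 1].
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    by_stratum = defaultdict(list)
    for s, c in zip(strata, counts):
        by_stratum[s].append(c)
    stratum_median = {s: statistics.median(v) for s, v in by_stratum.items()}
    overall = statistics.median(counts)
    corrected = []
    for s, c in zip(strata, counts):
        m = stratum_median[s]
        # Bins whose stratum has a median of 0 carry no usable signal; mark them as None.
        corrected.append(c * overall / m if m > 0 else None)
    return corrected
```

Because every GC stratum is forced to the same median, a real CNV that coincides with unusual GC content could be partially corrected away; wide strata with many bins each reduce that risk.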

(8) Cost is mentioned as a benefit of the SPC platform, and savings are achieved when working in a plate format, but no details are included on how this was evaluated.

In the revised version we will provide precise cost estimates and the rationale for the estimation.

(9) The Zenodo link for custom scripts does not exist, and code cannot be evaluated.

The full Zenodo link (https://doi.org/10.5281/zenodo.17094083) will be included in the revised version.

Reviewer #3 (Public review):

Summary:

In this manuscript, Negreira et al. propose a new scDNAseq method using semi-permeable capsules (SPCs) and primary template-directed amplification (PTA). The authors optimize several metrics to improve their predictions, such as GC bias, intra-chromosomal fluctuation (ICF, a metric to differentiate replicating and non-replicating cells) and the intra-chromosomal coefficient of variation (ICCV, which describes the read distribution along chromosomes). Coverage evenness was evaluated using the Gini index and the median absolute pairwise difference between the counts of two consecutive bins. They validate the proposed method using two Leishmania donovani strains isolated from different countries, BPK081 (low genomic variability) and HU3 (high genomic variability). They then show that the method outperforms standard WGA and has accuracy similar to the discontinued 10X-scDNA solution (10X Genomics), further improving short CNV identification. The authors also show that the method can identify somy variations, insertions/deletions, and SNP variations across cells. This is timely and very relevant work with wide applicability to copy number variation assessment using single-cell data.

Strengths

I really appreciate this work. My congratulations to the authors. All my comments below only aim to improve an already solid manuscript.

We thank the reviewer for the enthusiasm and positive feedback.

Weaknesses

(1) Data availability: Although the authors provide a Zenodo link, the data are restricted. I also could not access the GitHub link on the Zenodo page: https://github.com/gabrielnegreira/2025_scDNA_paper. The authors should make these files available.

Both the Zenodo (https://doi.org/10.5281/zenodo.17094083) and the GitHub (https://github.com/gabrielnegreira/2025_scDNA_paper) repositories are now publicly available.

(2) SPC-PTA and SPC-STD cell count comparison: The authors have consistently shown that the SPC-PTA method is superior to SPC-STD. However, a few points regarding the SPC-PTA results should be clarified. Is there an explanation for the lower proportion of SPCs that contain true cells, compared with SPC-STD, which is reflected in the bimodal distribution of reads per cell in SPC-PTA2 and the three-to-multimodal distribution in SPC-PTA1 (Figure 1B)? Also, in Table 1, does the number of reads reflect the reads in all sequenced SPCs or only in the true cells? If it reflects all SPCs, I suggest that the authors add a column to the table with the "Number of reads in true cells" to account for this discrepancy.

The reason for the higher proportion of 'background' SPCs in the PTA samples is not clear, but we hypothesize that PTA may favor the amplification of small, free-floating DNA molecules trapped in cell-free SPCs, as PTA works with shorter amplicons. The longer incubation time used for PTA (10 h) may also have allowed low quantities of free-floating DNA to be amplified to detectable levels. Regarding Table 1, it indeed only shows the total number of reads per sample. In the revised version, we will add the suggested column to Table 1.
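Separating true cells from 'background' SPCs is often done by locating the elbow of the ranked reads-per-SPC curve. The sketch below assumes that approach and takes the elbow as the point farthest from the chord of the log-log rank plot; the function names `knee_threshold` and `count_true_cells` and the distance criterion are illustrative assumptions, not the procedure reported in the manuscript.

```python
import numpy as np


def knee_threshold(reads_per_spc):
    """Read count at the elbow of the ranked reads-per-SPC curve.

    Assumes at least two SPCs with distinct, non-zero read counts.
    """
    reads = np.sort(np.asarray(reads_per_spc, dtype=float))[::-1]
    reads = reads[reads > 0]
    # Log-log rank plot: x = log10(rank), y = log10(reads)
    x = np.log10(np.arange(1, reads.size + 1))
    y = np.log10(reads)
    # Unit vector along the chord joining the first and last points
    chord = np.array([x[-1] - x[0], y[-1] - y[0]])
    chord /= np.linalg.norm(chord)
    # Perpendicular distance of each point from the chord (2-D cross product)
    dist = np.abs((x - x[0]) * chord[1] - (y - y[0]) * chord[0])
    return float(reads[np.argmax(dist)])


def count_true_cells(reads_per_spc):
    """Number of SPCs at or above the elbow threshold."""
    threshold = knee_threshold(reads_per_spc)
    return int(np.sum(np.asarray(reads_per_spc) >= threshold))
```

A sharper second mode of background SPCs, as suggested for the PTA samples, would still leave the elbow at the drop between the two populations, but multimodal curves may need manual inspection.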

(3) The authors should evaluate the results at a higher coverage for SPC-PTA. I understand that the authors subsampled the reads to 100,000 per cell to allow cross-sample comparisons, especially between SPC-STD and SPC-PTA. However, as they concluded that SPC-PTA was far superior, and the SPC-PTA1 and SPC-PTA2 samples had "elbow" values of 650,493 and 448,041, respectively, it might be interesting to revisit some of the estimates using only the SPC-PTA samples and a higher coverage cutoff, such as 400,000 reads.

We believe the 100,000-read cutoff is already high for aneuploidy analysis, as we have previously reconstructed parasite karyotypes with 20,000 reads per cell[8]; a higher cutoff is therefore unlikely to improve it. For CNV analysis, in the revised version we will try to identify de novo CNVs using dedicated bioinformatic tools, as suggested by other reviewers. There, we will also test whether the suggested 400,000-read cutoff for the PTA samples increases CNV detection sensitivity.
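Subsampling every cell to a fixed read depth, as with the 100,000-read and suggested 400,000-read cutoffs, can be sketched at the level of binned counts by drawing reads without replacement. This assumes per-bin counts rather than aligned reads (tools such as samtools can also subsample BAM files directly); `downsample_bins` is a hypothetical helper built on NumPy's `Generator.multivariate_hypergeometric`, and the cell in the usage lines is made up.

```python
import numpy as np


def downsample_bins(bin_counts, target, rng):
    """Downsample one cell's per-bin read counts to exactly `target` reads.

    Drawing without replacement is equivalent to randomly keeping `target`
    of the cell's reads. Cells below the cutoff return None and are excluded.
    """
    counts = np.asarray(bin_counts, dtype=np.int64)
    if counts.sum() < target:
        return None
    return rng.multivariate_hypergeometric(counts, target)


# Usage with hypothetical data: compare a 100,000- and a 400,000-read cutoff
rng = np.random.default_rng(seed=1)
cell = [300_000, 200_000, 150_000]
for cutoff in (100_000, 400_000):
    sub = downsample_bins(cell, cutoff, rng)
    print(cutoff, None if sub is None else sub.tolist())
```

Raising the cutoff trades cell number for depth: every cell below it is dropped, so the comparison is restricted to the better-covered cells.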

(4) Doublet detection: I suggest that the authors be a little more careful with their definition of doublets. Doublet detection was based on diagnostic SNPs distinguishing the two strains, BPK081 and HU3, which identifies doublets formed by cells of two very different and well-characterised strains. However, this method will probably not identify within-strain doublets. This is of minor importance for cloned and stable strains with few passages, such as BPK081, but might be more relevant for more heterogeneous strains, such as HU3. Within-strain doublets might also matter in other scenarios, such as multiclonal infections with different populations of the same strain in the same geographic area. On a positive note, the between-strain doublet count was low, so the within-strain doublet count is probably low too. The manuscript would benefit from a discussion in this regard.

We fully agree with the reviewer. We will make it clear in the revised version that we quantify inter-strain doublets only, and we will also provide an estimation of total doublets based on the inter-strain doublet rate.
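One way to extrapolate total doublets from inter-strain doublets, assuming cells are co-encapsulated at random regardless of strain: if a fraction p of cells belongs to one strain and q = 1 - p to the other, a doublet is inter-strain with probability 2pq, so the total doublet rate is roughly the observed inter-strain rate divided by 2pq. The function `doublet_rates` and the numbers in its usage line are hypothetical, not results from this study.

```python
def doublet_rates(inter_strain_doublets, n_cells, frac_strain_a):
    """Return (total, within-strain) doublet rates extrapolated from inter-strain doublets.

    Assumes random co-encapsulation, so a doublet is inter-strain with probability 2pq.
    """
    p = frac_strain_a
    q = 1.0 - p
    inter_rate = inter_strain_doublets / n_cells
    total_rate = inter_rate / (2.0 * p * q)
    return total_rate, total_rate - inter_rate


# Hypothetical example: 20 inter-strain doublets among 1,000 cells, 50:50 mix
total, within = doublet_rates(20, 1000, 0.5)
print(f"total ~{total:.1%}, within-strain ~{within:.1%}")
```

With a 50:50 mix, 2pq = 0.5, so these hypothetical numbers would imply a total doublet rate of about 4%, half of it from within-strain doublets that diagnostic SNPs cannot detect. Skewed strain proportions lower 2pq and make the extrapolation less precise.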

(5) Nucleotide sequence variants and phylogeny: I believe that a more careful description of the phylogenetic analysis and a discussion of some limitations of the sequence variant identification would benefit the manuscript.

(5.1) As described in the methods, the authors intentionally selected two fairly different Leishmania donovani strains, HU3 and BPK081, and confirmed that the sequence variant methodology can separate cells from each strain. It is a solid proof of concept. However, most multiclonal infections in natural settings would be caused by parasite populations that differ by fewer SNPs, and these will be significantly harder to detect. Hence, I suggest adding a short discussion of this point.

We will add a short discussion clarifying the limitations, while noting that our data demonstrate the ability of the approach to resolve very closely related cells, as illustrated by the fine-scale genetic differences observed within the clonal BPK081 population and by the detection of rare variants at targeted loci. We will also emphasize that the sensitivity to detect closely related genotypes depends on sequencing depth and the genomic regions considered.
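The dependence on sequencing depth can be made concrete with a simple binomial model: if two genotypes differ at n sites and a single cell covers each site independently with probability c (its breadth of coverage), it is expected to cover n * c distinguishing sites and to cover none of them with probability (1 - c)^n. The independence assumption and the helper name `detection_stats` are illustrative simplifications; real coverage is neither uniform nor independent across sites, and a covered site may still fail variant calling.

```python
def detection_stats(n_diff_sites, breadth):
    """Expected distinguishing sites covered in one cell, and P(none covered).

    Assumes each of `n_diff_sites` is covered independently with probability
    `breadth`, the cell's breadth of coverage.
    """
    expected_covered = n_diff_sites * breadth
    p_none_covered = (1.0 - breadth) ** n_diff_sites
    return expected_covered, p_none_covered


# Hypothetical values: deeper sequencing (higher breadth) lowers the miss rate
for breadth in (0.2, 0.5):
    exp_sites, p_miss = detection_stats(10, breadth)
    print(f"breadth={breadth}: expected={exp_sites:.1f}, P(miss all)={p_miss:.3f}")
```

For instance, with 10 distinguishing sites and 20% breadth of coverage, a cell would cover 2 such sites on average and miss all of them about 11% of the time.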

(5.2) The authors should expand on the description of the phylogenetic tree. For HU3, in the left panel of Figure 5F, most of the variation is observed in ~8 cells, spanning from position 0 to position ~28,000. Most of the other cells are on very short branches, from ~29,000 to ~30,400 (right panel of Figure 5F). Assuming this representation is a phylogram, the short branches suggest that these cells diverge by approximately 100-2,000 SNPs. It is unexpected (but not impossible) that ~8 such divergent cells would be maintained uniquely (or in very low numbers) in the culture, unless this is a multiclonal infection. I would investigate these cells carefully: they might be doublets or have more missing data than other cells. I would also suggest adding a brief discussion of this to the manuscript.

In the revised version, we will improve the description of the phylogenetic analysis. We will also investigate the eight cells mentioned in more depth to determine whether confounding factors might explain their divergence. The possibility of a multiclonal infection in HU3 cannot be excluded, as this strain was not cloned after isolation.
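Missing data can distort distances between cells. A minimal sketch, assuming genotypes are coded numerically per site (for example 0 = reference, 1 = alternative) with NaN for sites without coverage: distances are computed only over sites called in both cells, and the number of shared sites and each cell's missing fraction are reported so that outlier cells relying on few sites can be flagged. The function names are hypothetical, and this is not the authors' phylogenetic pipeline.

```python
import numpy as np


def pairwise_distance(genotypes):
    """Proportion of differing sites for every pair of cells, over sites called in both.

    genotypes : cells x sites array of numeric calls, with np.nan for missing sites.
    Returns (distance matrix, matrix with the number of shared called sites).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_cells = g.shape[0]
    dist = np.full((n_cells, n_cells), np.nan)
    shared = np.zeros((n_cells, n_cells), dtype=int)
    for i in range(n_cells):
        for j in range(n_cells):
            both = called[i] & called[j]
            shared[i, j] = int(both.sum())
            # Pairs with no site in common keep a NaN distance
            if shared[i, j] > 0:
                dist[i, j] = np.mean(g[i, both] != g[j, both])
    return dist, shared


def missing_fraction(genotypes):
    """Fraction of sites without a call in each cell."""
    return np.isnan(np.asarray(genotypes, dtype=float)).mean(axis=1)
```

Cells whose distances rest on few shared sites, or whose missing fraction is far above the rest, would be natural candidates for closer inspection before interpreting their branch lengths.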

References:

(1) Dileep, V. & Gilbert, D. M. Single-cell replication profiling to measure stochastic variation in mammalian replication timing. Nat. Commun. 9, 427 (2018).

(2) Miura, H. et al. Single-cell DNA replication profiling identifies spatiotemporal developmental dynamics of chromosome organization. Nat. Genet. 51, 1356–1368 (2019).

(3) Marques, C. A. et al. Genome-wide mapping reveals single-origin chromosome replication in Leishmania, a eukaryotic microbe. Genome Biol. 16, 230 (2015).

(4) Damasceno, J. D. et al. Leishmania major chromosomes are replicated from a single high-efficiency locus supplemented by thousands of lower efficiency initiation events. Cell Rep. 44, 116094 (2025).

(5) Imamura, H. et al. Evolutionary genomics of epidemic visceral leishmaniasis in the Indian subcontinent. eLife 5, e12613 (2016).

(6) Gonzalez-Pena, V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc. Natl. Acad. Sci. 118, e2024176118 (2021).

(7) Imamura, H. et al. Evaluation of whole genome amplification and bioinformatic methods for the characterization of Leishmania genomes at a single cell level. Sci. Rep. 10, 15043 (2020).

(8) Negreira, G. H. et al. High throughput single-cell genome sequencing gives insights into the generation and evolution of mosaic aneuploidy in Leishmania donovani. Nucleic Acids Res. 50, 293–305 (2022).
