Single-cell RNA-seq reveals trans-sialidase-like superfamily gene expression heterogeneity in Trypanosoma cruzi populations

  1. Lucas Inchausti
  2. Lucia Bilbao
  3. Vanina A Campo
  4. Joaquín Garat
  5. José Sotelo-Silveira
  6. Gabriel Rinaldi
  7. Virginia M Howick
  8. Maria A Duhagon
  9. Javier G De Gaudenzi
  10. Pablo Smircich (corresponding author)
  1. Laboratorio de Bioinformática, Departamento de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, Uruguay
  2. Sección Genómica Funcional, Facultad de Ciencias, Universidad de la República, Uruguay
  3. Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín-Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
  4. Escuela de Bio y Nanotecnologías (EByN), Universidad Nacional de San Martín, Argentina
  5. Departamento de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, Uruguay
  6. Sección Biología Celular, Facultad de Ciencias, Universidad de la República, Uruguay
  7. Department of Biology, University of Oxford, United Kingdom
  8. School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, United Kingdom
9 figures, 1 table and 5 additional files

Figures

Identification of amastigote and trypomastigote cell populations.

(a) UMAP colored by detected clusters based on gene expression profiles. (b) Heatmap of the top 10 marker genes upregulated in each of the three cell populations identified (log2FC >1 and adjusted p-value <0.05). (c) Expression of a cluster 0 marker gene (C4B63-16g183) on the UMAP. (d) Expression of a cluster 1 marker gene (C4B63-16g155) on the UMAP.

Figure 2 with 1 supplement
Overview of expression patterns across amastigote and trypomastigote cells.

(a) Sum of expression level values across all multigene family genes for each cell in the amastigote (Cluster 1) and trypomastigote (Cluster 0) cell populations (**** p<0.0001, meanAma = 137.2, SDAma = 48.7, meanTrypo = 201.1, SDTrypo = 48.6, FCTrypo/Ama = 1.5). (b) UMAP visualization of the expression patterns of multigene family genes; num_multigene indicates the number of multigene family genes detected per cell (genes with >0 UMI counts). Z_multigene reflects the relative expression level of multigene family genes per cell, calculated as the z-score-standardized sum of their UMI counts, such that positive values reflect above-average multigene family expression and negative values reflect below-average levels. (c) Violin plots showing the number of cells expressing each gene within each gene group: subsampled single-copy genes, multigene family genes, ribosomal genes, and trans-sialidases. To avoid biases arising from the size difference between the single-copy and multigene family gene sets, we generated a subsampled single-copy gene list by randomly selecting as many genes as there are in the multigene family gene set. The expression distribution of the subsampled single-copy genes is similar to the distribution of the entire dataset (* p<0.05, **** p<0.0001; see Supplementary file 3). (d) Lorenz curves showing the cumulative proportion of gene expression relative to the cumulative proportion of genes for subsampled single-copy genes, multigene family genes, ribosomal protein-coding genes, and trans-sialidase-like genes. Genes were ordered by total expression, and the dashed line indicates perfect equality. Curves that deviate further from the diagonal reflect greater inequality, meaning that fewer genes account for most of the expression within each category. Statistically significant differences between groups for c and d are shown in Supplementary file 3.
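The two per-cell quantities described in panel (b) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `multigene_summary`, the nested-dictionary input layout, the toy gene names, and the use of the population standard deviation for the z-score are all assumptions made for the example.

```python
from statistics import mean, pstdev

def multigene_summary(counts, multigene_genes):
    """Return (num_multigene, z_multigene) per cell.

    counts: dict mapping cell -> dict mapping gene -> UMI count (hypothetical layout).
    multigene_genes: list of gene names treated as multigene family members.
    """
    num_multigene = {}
    totals = {}
    for cell, gene_counts in counts.items():
        umis = [gene_counts.get(g, 0) for g in multigene_genes]
        # Number of multigene family genes detected (>0 UMI counts) in this cell.
        num_multigene[cell] = sum(1 for u in umis if u > 0)
        # Summed UMI counts over all multigene family genes in this cell.
        totals[cell] = sum(umis)
    mu = mean(totals.values())
    sd = pstdev(totals.values())  # assumption: population SD; the source does not specify
    z_multigene = {
        cell: (total - mu) / sd if sd > 0 else 0.0
        for cell, total in totals.items()
    }
    return num_multigene, z_multigene

# Toy data (invented for illustration only).
toy = {
    "cell_a": {"TcS_1": 4, "MASP_1": 0, "core_1": 9},
    "cell_b": {"TcS_1": 1, "MASP_1": 1, "core_1": 7},
    "cell_c": {"TcS_1": 0, "MASP_1": 0, "core_1": 8},
}
num_multigene, z_multigene = multigene_summary(toy, ["TcS_1", "MASP_1"])
```

In this toy case, cell_a has an above-average multigene total (positive z), cell_b sits at the mean (z of zero), and cell_c is below average (negative z).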

Figure 2—figure supplement 1
Extended overview of expression patterns across amastigote and trypomastigote cells.

(a) Sum of expression level values across subsampled single-copy genes for each cell in the amastigote (Cluster 1) and trypomastigote (Cluster 0) cell populations (**** p<0.0001, meanAma = 394.9, SDAma = 101.7, meanTrypo = 353.1, SDTrypo = 62.1, FCAma/Trypo = 1.1). (b) UMAP projection for 2D visualization of core gene expression among cells; num_multigene indicates the number of multigene family genes detected in each cell, whereas z_multigene indicates the expression levels calculated by summing the UMI counts of all multigene family genes in each cell and then standardizing this value using a z-score transformation, such that positive values reflect above-average multigene family expression and negative values reflect below-average levels. (c) Violin plots showing the number of cells expressing each gene within each gene group: subsampled single-copy genes, multigene family genes, ribosomal genes, and the different multigene families. (d) Lorenz curves showing the cumulative proportion of gene expression relative to the cumulative proportion of genes for subsampled single-copy genes, multigene family genes (together or grouped by multigene family), and ribosomal protein-coding genes. Genes were ordered by total expression, and the dashed line indicates perfect equality. Curves that deviate further from the diagonal reflect greater inequality, meaning that fewer genes account for most of the expression within each category. Statistically significant differences between groups for c and d are shown in Supplementary file 3.
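A Lorenz curve of the kind described in panel (d) can be computed as sketched below. The function name `lorenz_curve` and the toy values are hypothetical, and ascending ordering is assumed because it is the usual Lorenz convention; the caption only states that genes were ordered by total expression.

```python
def lorenz_curve(values):
    """Return (xs, ys): cumulative share of genes vs. cumulative share of expression.

    values: total expression per gene within one gene group (hypothetical input).
    """
    ordered = sorted(values)  # assumption: ascending order, the standard Lorenz convention
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one gene with non-zero expression")
    xs, ys = [0.0], [0.0]
    running = 0
    for i, value in enumerate(ordered, start=1):
        running += value
        xs.append(i / n)            # cumulative proportion of genes
        ys.append(running / total)  # cumulative proportion of expression
    return xs, ys

# Perfect equality: every gene contributes the same amount, so the curve is the diagonal.
equal_x, equal_y = lorenz_curve([1, 1, 1, 1])
# Extreme inequality: a single gene accounts for all of the expression.
skew_x, skew_y = lorenz_curve([0, 0, 0, 10])
```

The further the `ys` values fall below the matching `xs` values, the more the expression is concentrated in a few genes.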

Figure 3 with 1 supplement
Trypomastigote subpopulations identified based on trans-sialidase expression profiles.

(a) UMAP visualization colored by detected clusters based on gene expression profiles, with trypomastigote subpopulations identified. (b) Violin plot displaying average expression levels of ribosomal protein-coding genes across subpopulations. (c) Violin plot showing combined trans-sialidase expression levels for each subpopulation. * p<0.05, **** p<0.0001.

Figure 3—figure supplement 1
Trypomastigote sub-clusters identified based on trans-sialidase expression profiles.

Violin plots displaying average expression levels across sub-clusters and associated fold changes (FCtrypo_0/trypo_1) of (a) transporter-coding genes, (b) DNA and RNA polymerase-associated protein-coding genes, (c) phosphatase-coding genes and (d) multigene family genes. **** p<0.0001.

Figure 4 with 1 supplement
Overview of TcS gene expression patterns in Trypo_0 cells.

(a) Heatmap displaying the expression of TcS genes in each cell that together account for 75% of total TcS gene expression within cluster Trypo_0. Cells are clustered by TcS expression profiles, with colors representing each gene’s percentage contribution to the cell’s total TcS expression. (b) Average expression of TcS genes grouped by the percentage of cells expressing each gene. The top 50 TcS genes with the highest average expression are highlighted in red. (c) Gini index distribution for cells in trypomastigote cluster 0 (Trypo_0), considering only the TcS genes detected in each cell. (d) Lorenz curves showing, for each cell in cluster Trypo_0, the cumulative proportion of total TcS expression as a function of the cumulative proportion of detected TcS genes. Genes were ordered by total expression, and the dashed line indicates perfect equality (i.e. all detected TcS genes contribute equally to the total TcS expression of a given cell). Green and orange curves correspond to cells with higher and lower expression equality, respectively.
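The per-cell Gini index in panel (c) can be illustrated with the sketch below, which restricts the calculation to genes detected in the cell, as the caption describes. The function name `gini_detected` and the toy counts are hypothetical, and the closed-form expression over sorted values is one common way to compute the index; the source does not state which implementation was used.

```python
def gini_detected(cell_counts):
    """Gini index of one cell's TcS expression, using only detected (>0) genes.

    cell_counts: iterable of per-gene counts for a single cell (hypothetical input).
    Returns 0.0 for perfect equality; larger values mean fewer genes dominate.
    """
    detected = sorted(c for c in cell_counts if c > 0)
    n = len(detected)
    if n == 0:
        raise ValueError("no TcS genes detected in this cell")
    total = sum(detected)
    # Standard closed form for ascending-sorted values x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(detected, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

even_cell = gini_detected([5, 5, 5])       # all detected genes contribute equally
skewed_cell = gini_detected([0, 1, 9, 0])  # zeros are ignored; one gene dominates
```

Dropping undetected genes means the index measures inequality among the genes a cell actually expresses, rather than being inflated by the large number of zeros typical of single-cell data.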

Figure 4—figure supplement 1
Overview of gene expression of high frequency TcS.

(a) Venn diagram showing the overlap between the top 100 most expressed TcS from bulk RNA-seq data and TcS expressed in more than 40% of cells from cluster trypo_0, and (b) Heatmap displaying the expression of TcS genes in each cell that together account for 75% of total TcS gene expression and are expressed in more than 40% of cells within cluster trypo_0. Cells are clustered by TcS expression profiles, with colors representing each gene’s percentage contribution to the cell’s expression.
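The selection behind panel (a), genes detected in more than 40% of cells compared against a bulk-derived gene list, can be sketched as below. The function names, the toy counts, and the placeholder gene identifiers are hypothetical and serve only to show the set logic.

```python
def detection_fraction(counts, gene):
    """Fraction of cells in which `gene` has >0 counts."""
    cells = list(counts)
    detected = sum(1 for cell in cells if counts[cell].get(gene, 0) > 0)
    return detected / len(cells)

def frequently_detected(counts, genes, min_fraction=0.40):
    """Genes detected in strictly more than `min_fraction` of cells."""
    return {g for g in genes if detection_fraction(counts, g) > min_fraction}

# Toy single-cell counts (invented): cell -> gene -> count.
toy_counts = {
    "cell_1": {"TcS_a": 3, "TcS_b": 0, "TcS_c": 1},
    "cell_2": {"TcS_a": 1, "TcS_b": 0, "TcS_c": 0},
    "cell_3": {"TcS_a": 2, "TcS_b": 5, "TcS_c": 0},
    "cell_4": {"TcS_a": 0, "TcS_b": 0, "TcS_c": 0},
    "cell_5": {"TcS_a": 4, "TcS_b": 0, "TcS_c": 2},
}
# Placeholder for a "top expressed in bulk RNA-seq" list (invented).
bulk_top = {"TcS_a", "TcS_b"}

frequent = frequently_detected(toy_counts, ["TcS_a", "TcS_b", "TcS_c"])
shared = frequent & bulk_top  # the overlap region of the Venn diagram
```

Here TcS_c is detected in exactly 40% of cells and is therefore excluded by the strict "more than 40%" threshold.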

Genomic context and neighborhood composition of frequently detected versus lowly detected TcS loci.

(a) Representative genomic loci of frequently detected (top, high abundance) and lowly detected (bottom, low abundance) TcS genes. Genes are shown as arrows, colored according to genomic compartment: core (dark green), disruptive (salmon), and TcS genes under analysis (yellow). Chromosomal coordinates are indicated below each locus. (b) Comparison of the percentage of multigene-family neighbors within polycistronic transcription units containing frequently detected and lowly detected TcS genes. Lowly detected TcS genes were subsampled to n=30. Wilcoxon rank-sum test: p=6.6 × 10⁻⁷. (c) Mean percentage of multigene-family neighbors in polycistrons calculated from 50 random subsets (n=30) of lowly detected TcS genes (mean = 37.76%). The orange dot indicates the corresponding mean for frequently detected TcS genes (9.27%).
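The repeated subsampling in panel (c) can be sketched as follows: draw 50 random subsets of 30 lowly detected genes and record the mean neighbor percentage of each subset. The function name `subset_means`, the seed, and the toy percentages are assumptions for illustration; the values below are not data from the study.

```python
import random
from statistics import mean

def subset_means(values, subset_size=30, n_subsets=50, seed=0):
    """Mean of `values` in each of `n_subsets` random subsets drawn without replacement.

    values: per-gene percentage of multigene-family neighbors (hypothetical input).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [mean(rng.sample(values, subset_size)) for _ in range(n_subsets)]

# Toy percentages for 200 hypothetical lowly detected genes (invented).
toy_percentages = [float(i % 101) for i in range(200)]

means = subset_means(toy_percentages)
overall = mean(means)  # analogous to the mean reported across the 50 subsets
```

Comparing a single reference value against the spread of `means` shows whether a difference survives regardless of which 30 genes happen to be sampled.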

Author response image 1

(A) Distribution of pairwise sequence identity values calculated among the 3′-end regions of all transcripts (defined as the 3′UTR plus 20% of the coding sequence). (B) Distribution of read mapping coordinates over all multigene family transcripts, normalized as a percentage of gene length. (C) Scatter plots showing the correlation between estimated transcript counts obtained using kallisto (red) and STAR + featureCounts (grey) versus the corresponding simulated ground-truth values.

Author response image 2
Correlation analysis of number of reads assigned to cells between technical replicate 1 and technical replicate 2.
Author response image 3
Per-gene number of expressing cells by TcS group and life-stage.

Boxplots show, for each TcS group (I–VIII), the distribution across genes of the number of cells in which the gene is detected. Each point represents a single TcS; Amastigote cells: green points/boxes, Trypomastigote cells: salmon points/boxes. The y-axis is on log10 scale. Asterisks indicate statistically significant differences from the comparison between Amastigote and Trypomastigote within each TcS group, assessed using a paired two-sided Wilcoxon signed-rank test: * p < 0.05, ** p < 0.01, *** p < 0.001.

Author response image 4
Percentage of mitochondrial and ribosomal rRNA derived reads.

Tables

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Trypanosoma cruzi) | Dm28c | Contreras et al., 1985 | - | -
Cell line (Rattus norvegicus) | H9c2 | ATCC | CRL-1446; RRID:CVCL_0286 | Provided and validated by ATCC (not verified in-house).
Commercial assay, kit | Chromium Next GEM Single Cell 3ʹ | 10x Genomics | v3.1 | Performed by service provider
Software, algorithm | Seurat (R) | Hao et al., 2024 | Version 5; RRID:SCR_016341 | -
Software, algorithm | kallisto bustools | Melsted et al., 2021 | Version 0.51.1; Version 0.45.1 |
Software, algorithm | peaks2UTR | Zhang et al., 2017 | Version 1.2.6 |

Additional files

Supplementary file 1

Marker genes for identified cell clusters by scRNA-seq.

(a) Genes significantly upregulated in cluster 0 (adjusted P-value <0.05), (b) Genes significantly upregulated in cluster 1 (adjusted P-value <0.05) and (c) Genes significantly upregulated in cluster 2 (adjusted P-value <0.05)

https://cdn.elifesciences.org/articles/105822/elife-105822-supp1-v1.xlsx
Supplementary file 2

Correlation of results from differential expression analysis using bulk RNA-seq data with the top 10 gene markers from each of the three identified clusters from scRNA-seq data analysis.

Positive log2FC corresponds to upregulated genes in the trypomastigote stage, whereas negative log2FC corresponds to upregulated genes in the amastigote stage. Gene C4B63_196g28 showed no statistically significant changes in expression. Genes without values were not identified as differentially expressed in the bulk RNA-seq data.

https://cdn.elifesciences.org/articles/105822/elife-105822-supp2-v1.xlsx
Supplementary file 3

Statistical analysis results from gene expression heterogeneity of gene groups.

(a) Summary of the number of cells expressing genes by gene group. Shown are the mean, relative standard deviation (rsd), and median of the number of cells expressing each gene within the indicated groups. (b) Pairwise statistical comparisons between gene groups based on the number of cells expressing each group. Differences between groups were assessed using the Wilcoxon rank-sum test. (c) Summary statistics of the Gini index across gene groups. The Gini index was computed for each gene as a quantitative measure of expression inequality across cells, derived from the corresponding Lorenz curves from Figure 2d. Reported values indicate the mean, relative standard deviation (rsd), and median Gini index for each gene group. (d) Pairwise statistical comparisons of Gini index distributions between gene groups. Differences between groups were assessed using the Wilcoxon rank-sum test. Reported values correspond to p-values for each pairwise comparison.
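The per-group summaries described in parts (a) and (c), mean, relative standard deviation (rsd), and median, can be sketched as below. The function name `summarize`, the toy group names and values, the use of the sample standard deviation, and the choice to express rsd as a percentage are all assumptions; the source does not specify these details.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Mean, relative standard deviation (as a percentage), and median of `values`."""
    m = mean(values)
    # Assumption: rsd = sample SD / mean * 100; undefined when the mean is zero.
    rsd = stdev(values) / m * 100 if m != 0 else float("nan")
    return {"mean": m, "rsd_percent": rsd, "median": median(values)}

# Toy per-gene values for two hypothetical gene groups (invented).
toy_groups = {
    "group_x": [2, 4, 6],
    "group_y": [5, 5, 5, 5],
}
summary = {name: summarize(values) for name, values in toy_groups.items()}
```

The same function applies whether the per-gene value is the number of expressing cells or a Gini index, since both summaries report the same three statistics.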

https://cdn.elifesciences.org/articles/105822/elife-105822-supp3-v1.xlsx
Supplementary file 4

Percentage of cells in which each TcS is expressed in the trypomastigote cell population.

(a) All trypomastigote cells, (b) Trypomastigote cells corresponding to the trypo_0 cluster and (c) Trypomastigote cells corresponding to the trypo_1 cluster.

https://cdn.elifesciences.org/articles/105822/elife-105822-supp4-v1.xlsx
MDAR checklist
https://cdn.elifesciences.org/articles/105822/elife-105822-mdarchecklist1-v1.pdf

Cite this article

  1. Lucas Inchausti
  2. Lucia Bilbao
  3. Vanina A Campo
  4. Joaquín Garat
  5. José Sotelo-Silveira
  6. Gabriel Rinaldi
  7. Virginia M Howick
  8. Maria A Duhagon
  9. Javier G De Gaudenzi
  10. Pablo Smircich
(2026)
Single-cell RNA-seq reveals trans-sialidase-like superfamily gene expression heterogeneity in Trypanosoma cruzi populations
eLife 14:RP105822.
https://doi.org/10.7554/eLife.105822.3