Abstract
Mass cytometry, also known as Cytometry by time-of-flight (CyTOF), is a cutting-edge high-dimensional technology for profiling marker expression at the single-cell level. This technology significantly advances clinical research in immune monitoring and the interrogation of immune cell populations. Nevertheless, the vast amount of data generated by CyTOF poses a daunting challenge for analysis. To address this, we describe ImmCellTyper (https://github.com/JingAnyaSun/ImmCellTyper), a novel and robust toolkit designed for CyTOF data analysis. The analytical framework incorporates an in-house developed semi-supervised clustering tool named BinaryClust, which first characterises main cell lineages, followed by in-depth interrogation for population of interest using unsupervised methods.
BinaryClust was benchmarked with existing clustering tools and demonstrated superior accuracy and speed across two datasets comprising around 4 million cells, performing as good as manual gating by human experts. Furthermore, this computational pipeline provides a variety of visualization and analytical tools spanning from quality control to differential analysis, which can be tailored to user’s specific needs, aiming to provide a one-stop solution for CyTOF data analysis. The general workflow consists of five key steps: 1) Batch effect evaluation and correction, 2) Data quality control and pre-processing, 3) Main cell lineage characterisation and quantification, 4) Extraction and in-depth investigation of cell type of interest; 5) Differential analysis of cell abundance and functional marker expression (supporting multiple study groups). Overall, ImmCellTyper integrates expert’s biological knowledge in a semi-supervised fashion to accurately deconvolute well-defined main cell lineages, while also preserving the potential of unsupervised approaches to discover novel cell subsets and providing a user-friendly toolset to remove the analytical barrier for high-dimensional immune profiling.
Introduction
Mass cytometry or cytometry by time-of-flight (CyTOF) is a powerful high-throughput single-cell technology which employs stable elemental isotopes, as the same manner of fluorophores, to detect cellular proteins of interest. This approach successfully tackles the panel multiplex challenges faced by traditional flow cytometry due to spectral overlap and permits simultaneous measurement of over 40 parameters on millions of cells. To date, CyTOF has been widely applied in basic and translational medical research, such as deep immunophenotyping and characterisation of novel refined cell subsets, immune monitoring of cell-adoptive therapy, and dissecting cell subpopulations from heterogeneous tumour samples 1. Nonetheless, the advantages of CyTOF come with the problem of handling high-dimensional dataset. Traditional gating strategy for flow cytometry, while still serving as the ‘gold standard’ for cell population identification in cytometry data, may not be an optimal option for CyTOF data due to its high-dimensional settings. The high-parametric resolution in CyTOF, aimed at revealing previously undiscovered cell subpopulations, leads to a significant increase in the complexity of gating schemas and hierarchical depth, which makes manual gating extremely labour intensive and time-consuming 2. Therefore, effective computational toolkits and pipelines are entailed for CyTOF data mining and analysis.
Efforts have been directed toward developing means of clustering algorithms to deconvolute a pool of live cell mixture into distinct cell populations, which facilitates CyTOF data analysis. For instance, unsupervised methods, which include flowSOM3, Phenograph4, X-shift5, spade6, and DensVM7 etc., often combine with dimension reduction techniques like t-SNE8, UMAP9, PCA etc., and require manual annotation of each cluster based on the marker expression patterns indicated by heatmaps. This approach works well to analyse populations in a data-driven manner, with all files concatenated and analysed all at once. Compared to manual gating, it has advantages in terms of convenience, efficiency, and relative unbiasedness from biological preconception, facilitating the detection of novel cell phenotypes for deep phenotyping. However, the unsupervised approaches are not always ideal and suitable for cytometry data. Mathematical clustering does not necessarily have biological meaning, leading to occasional inaccuracies. And several benchmarking studies suggested that the accuracy for these unsupervised tools may not be optimal10. Additionally, the technical uncertainty of results produced by different clustering approaches remains a conundrum. Even for the same unsupervised method, the discrepancy among different runs without setting a seed, reduces the reproducibility and may cause confusion, particularly for biologists with limited computational knowledge. Moreover, manual validation is also essential, as biological annotation is the inevitable step to provide biological relevant labels for the clusters. However, this process can be time-consuming and subjective, hindering the automation of pipelines. This problem is particularly pronounced for a large marker panel and samples with high heterogeneity, resulting in a higher number of clusters that need to be annotated.
Advances in artificial intelligence have accelerated the development of alternative clustering methods in a supervised manner for cell type inference. These methods consider the ‘ground truth’ or prior knowledge about the marker expression of each given cell types to automatically label each cell. Currently, a couple of semi-automatic methods have been developed, such as Linear Discriminant Analysis (LDA)11, DGCyTOF12, CyAnno13, DeepCyTOF14, Automated Cell-type Discovery and Classification (ACDC) 15and Semi-supervised Category Identification and Assignment (SCINA) 16 etc. ACDC15 and SCINA16 use a matrix of pre-defined markers for each cell type to annotate the clusters showing the same signature. These methods assume that markers are either expressed or not expressed (binary), which limits their ability to distinguish cell subtypes with similar phenotypes, particularly non-canonical cell types that cannot be easily separated linearly. Alternatively, DeepCyTOF and LDA use a marker expression matrix extracted from manually gated cell types as a training dataset to build a machine learning model for cell type prediction. This approach has higher precision and accuracy compared with aforementioned methods10. Nonetheless, it can be labour intensive for preparation of the training set manually. Also, these methods are limited in their ability to predict novel cell subsets beyond the pre-gated set of cell types and lack a systematic and comprehensive way to assign the cells which were not identifiable under any of the gated cell types. New solutions have emerged with algorithms such as DGCyTOF12 and CyAnno13, the former adopts a deep learning classification combined with hierarchical stable-clustering methods and an iteration calibration system to identify known cell types and assign novel subsets; while CyAnno is based on a machine-learning framework which allows the integrative modelling of both ‘gated’ and ‘ungated’ cells. Both methods have demonstrated high accuracy in their test datasets, but unfortunately are not widely used by the research community, possibly due to the issue for the hassle of training data preparation, the lack of user-friendliness and implementation challenges for bench researchers.
To address the common drawbacks of current semi-supervised and unsupervised clustering algorithms and preserve their strengths in discovering both canonical and non-canonical cell subsets respectively, we propose a strategy implemented in ImmCellTyper for cell classification named BinaryClust. By considering biologists’ prior biological knowledge and interpretation for canonical cell clusters in a semi-supervised manner, BinaryClust first automatically characterises the main cell lineages in a fast and accurate way. Subsequently, it extracts specific cell types of interest for further clustering using unsupervised algorithms to identify cell subsets including previously unreported non-conventional population. In addition, this R-implemented pipeline takes advantage of SingleCellExperiment class for data management, providing an easy-to-use and organized systematic workflow of CyTOF data handling. The whole pipeline includes quality control and batch effect correction, which helps to effectively pool datasets from different batches, and ensures the robustness for downstream analysis. Meanwhile, modules like dimension reduction, semi-supervised and unsupervised clustering (flowSOM and Phenograph), interactive data visualisation, and statistical testing for complex study design were also incorporated in this pipeline. Compared with existing integrated computational workflow, such as CATALYST17, CapX18, Cytofkit19 and ImmunoCluster20 which was developed and maintained by our team, this recently developed toolkit advanced further on coherence, functionality and user-friendliness. Overall, this approach has the potential to facilitate and smooth the investigation of CyTOF-based research.
Results
BinaryClust is comparable with manual gating in quantifying the abundances of main cell lineages
The core concept for BinaryClust is depicted in Figure1, where a simple user-designed cell type marker expression matrix is required and serves as a reference to characterise the main cell lineages, with positive markers indicated in ‘+’, negative markers in ‘-’, and irrelevant markers in ‘A’. K-means (k=2) will be applied to divide the positive and negative cell population of each marker, then align it to the reference table to infer main cell types. This is followed by the extraction of population-of-interest for downstream clustering using unsupervised methods for subpopulation discovery.
To assess the performance of the automated cell type classification and prediction, we generated a test CyTOF dataset using peripheral blood mononuclear cell (PBMC) samples from seven patients with myeloproliferative neoplasm (MPN) and two healthy donors, employing a 37-marker deep immunophenotyping panel (Supplementary Table 1). Manual gating was performed by two independent experts which identified seven main cell lineages. These results act as the reference for evaluating the computer-aided methods. The hierarchical sequential gating strategy was explicitly illustrated in Supplementary Figure 1. We evaluated the agreement between manual gating results and BinaryClust results regarding cell frequencies of each population. The mean value was calculated from manual gating results of two experts to compare with BinaryClust-generated results using Pearson correlation analysis. As shown in Figure2A, the two methods exhibit a strong correlation with coefficient (R2) equals to 1,1,0.99,0.96,0.96,0.99,0.99 for CD4 T cells, CD8 T cells, dendritic cells, NK cells, monocytes, and gamma delta T cells, respectively (all P <0.0001). And most of the data points remain close to the line of equality (red line, R2=1), indicating a high degree of agreement. Meanwhile, the Bland-Altman plots in Figure 2B also suggest no consistent bias of manual gating versus BinaryClust across all the identified cell types. The good performance of BinaryClust was further validated in the influenza dataset published by our group21, which contains FCS files from 11 individuals with 6 main immune cell types detected, the average F-measure and ARI reached 0.98 and 0.91 respectively (Supplementary Figure2A-B).
BinaryClust achieves high accuracy and speed compared with flowSOM and LDA
To further evaluate BinaryClust’s performance, we compared it with the well-performing unsupervised algorithm flowSOM and supervised classifier Linear discriminant analysis (LDA). FlowSOM was run on the same MPN dataset with k value set as 20, followed by manual annotation and cluster merging to identify the same cell populations as in manual gating and BinaryClust. The cell frequencies derived from BinaryClust, flowSOM, and manual gating (expert1 and expert2) were compared using interaction plot (Figure 3). We observed that BinaryClust remains consistent with manual gating (all P>0.05), whereas flowSOM identified significantly less Gamma Delta T cells and dendritic cells (P<0.001 and P=0.006, respectively) compared with the other three measurements. We also increased the initial k value to 40 to improve accuracy by over clustering then merging clusters 3. We found that the average frequency for Gamma Delta T cells and dendritic cells increased to 1.5% and 4.65% respectively but still remained significantly different (P<0.05) from the other two methods. It is also interesting to see that although the same gating strategy was applied, different experts obtained results with slight variation despite no statistical significance. The same findings were confirmed using boxplot as shown in Supplementary Figure 3.
Subsequently, LDA was also tested on the same dataset. Since this method requires a training dataset from manual gating to build a model, we exported the cell events and cluster assignment from Cytobank, then equally partitioned all cells into training and test dataset.
Due to the difference in method implementation, we did not include LDA for cell abundance quantification comparison with other methods, since only half of the cells were used for prediction. To ensure an equitable comparison and further evaluate the accuracy, F-measure, and Adjusted Rand Index (ARI) were calculated using manual gating cluster IDs as reference. As indicated in Table 1, for the MPN dataset, BinaryClust achieved high F-measure and ARI in all 7 cell types with an average of 0.94 and 0.91 respectively. The performance remains excellent in influenza dataset as well (average F-measure = 0.98, Supplementary Figure 2C). LDA has equivalent prediction accuracy as BinaryClust with an average F-measure of 0.93 and ARI of 0.90. Both supervised methods outperformed flowSOM (average F-measure=0.75, average ARI=0.66), which is in line with previous benchmarking study on evaluating supervised and unsupervised clustering algorithms10. Notably, for cell types that constitute a substantial proportion in the pool like CD4 T cells, CD8 T cells, NK cells, monocytes, and B cells, flowSOM can identify them with high precision and sensitivity, while this is not the case for dendritic cells and gamma delta T cells which account for less than 3% of the whole population (F-measure = 0.51 and 0.15, respectively). Even so, the overall F-measure and ARI for flowSOM considering cell proportion reached 0.81 and 0.80.
It has been demonstrated that flowSOM and LDA are among the fastest clustering algorithms without compromising their performance. Here, we compared the speed amongst the three approaches in MPN, Influenza and COVID-19 datasets22which contain 2,231,053, 210,933 and 3,862,628 cells respectively. Since LDA requires training data to be executed, we did not run it on COVID-19 dataset due to the absence of manual gating results. As shown in Figure 4, BinaryClust exhibited the highest speed in both MPN and Influenza dataset but fell behind flowSOM slightly in COVID-19 dataset.
ImmCellTyper pipeline supports interactive data visualisation and comparison among study groups
BinaryClust is an important component of ImmCellTyper, a comprehensive integrated pipeline designed for systematic CyTOF data mining. Hence, we utilised the visualisation functions to further prove the reliability and robustness of BinaryClust.
BinaryClust inherently considers CyTOF makers as binary distributed which is in most of the case, but not always. Therefore, it is crucial to check the marker behaviour before running the pipeline. As shown in Figure 5A, all markers selected for the classification matrix (Figure 5B) displayed a binary distribution implying the suitability for this pipeline. After clustering (Figure 5C), median marker expression heatmap (Figure 5D) can check the reliability of the results before proceeding for downstream analysis. In MPN dataset, we projected the cluster assignment resulted from BinaryClust and manual gating to UMAP (Figure 5E), along with the expression of the phenotypic markers (Figure 5F and supplementary Figure 4A). High similarity was observed between BinaryClust and manual gating, whereas slight difference was found on flowSOM results coloured UMAP on islands of CD8 T cells and CD4 T cells (Supplementary Figure 4A). FlowSOM appeared to classify cells that were close on spatial distance into the same cluster, in contrast to prior knowledge-based methods: BinaryClust and manual gating.
Application of ImmCellTyper pipeline to the COVID-19 dataset demonstrates its versatile functionalities for comprehensive data analysis
To showcase the analytical and visualisation functions of ImmCellTyper pipeline, we used the dataset published by Chevrier et al. 22, which described the immune signature of mild and severe COVID-19 patients in comparison with healthy individuals. There are a total of 82 FCS files with a 40-plex marker panel focusing on innate immunity in this dataset. An initial marker expression check was carried out based on the user-defined matrix (Figure 6B) and displayed in Figure 6A. BinaryClust was then performed and identified 12 cell populations as expected. We used t-distributed stochastic neighbour embedding (t-SNE) in this dataset for dimension reduction, coloured by cell types and faceted by disease conditions, which exhibited substantial immune alteration among healthy control, mild and severe COVID-19 patients (Figure 6C). From the heatmap in Figure 6D, we can have an overview of the marker expression of each population, which remains consistent with our initial definition indicated in the expression matrix.
The abundances of the identified cell types were quantified for each individual study participant in the format of stacked histogram (Figure 7A) and summarized in boxplot (Figure 7B). Compared with healthy volunteers, there are significant immune alterations in COVID-19 patients in B cells, Basophils, cDCs, Monocytes, NK cells, CD8 T cells, Neutrophils and pDCs (all P<0.05), which depicts a similar trend to the original paper, despite slight discrepancies caused by different statistical methods used. Kruskal Wallis test was conducted for this dataset followed by multiple testing correction (Benjamini-Hochberg procedure) and Dunn’s test for post hoc analysis, while Chevirer et al. used Mann-Whitney Wilcoxon test corrected by Holm method. We then selected 9 markers (IL-6, PD-L1, VISTA, IDO, TIM-3, TMEM173, GranzymeB, PPARg, Ki-67) as the state markers which can reflect the functional or proliferative status of the immune cell types. Notably, Granzyme B was observed to be significantly highly expressed in COVID-19 patients versus healthy control across all identified immune cell types (all P<0.05), TMEM173 was substantially up-regulated in COVID-19 patients particularly in monocytes, neutrophils, and CD4 T cells; and PD-L1 expression remained low for all cells, indicating the immune system of COVID-19 patients were highly activated without exhaustion regardless of disease severity, which was not explored previously. Since monocytes and neutrophils were of particular interest in the original paper, and the panel also included specific markers for in-depth interrogation, we then extracted the two population and carried out Phenograph (k=60) to investigate the subclusters. As shown in Figure 7D, Phenograph returned 16 and 14 subclusters respectively for monocytes and neutrophils, with heatmap provided in Supplementary Figure 5.
Reanalysis of the COVID-19 dataset demonstrated the concordance of ImmCellTyper pipeline with original reports with additional findings, as well as its versatile interactive data visualisation functionalities.
Discussion
In this study, we present an analytical pipeline named ImmCellTyper for systematic exploration of CyTOF data. This pipeline addresses a comprehensive range of analytical needs encompassing data quality check, batch effects examination/correction, cell type identification, and downstream differential analysis accompanied by high-quality, publishable data visualisations. Furthermore, ImmCellTyper includes an in-house developed, knowledge-based, semi-supervised classifier BinaryClust with high accuracy and speed, and also integrates the well-performing and state-of-the-art unsupervised algorithms (Phenograph and flowSOM). By adopting the strategy of first obtaining the main cell types and then select specific cell types for in-depth interrogation with higher clustering resolution, ImmCellTyper combines the advantages of both supervised and unsupervised clustering algorithms for discovery of both major populations and refined subpopulations. Designed for ease of use, this pipeline features a clear and user-friendly workflow, and is compatible to the widely used pipeline CATALYST, aiming to provide a one-stop solution for CyTOF users.
For validating the robustness of the automated cell characterization function of ImmCellTyper: BinaryClust, we involved two independent datasets (MPN dataset and Influenza dataset) and compared with existing clustering tools flowSOM and LDA. Previous Benchmarking studies indicated that flowSOM was one of the best-performing unsupervised tools in precision, speed and stability, with F-measure ranges from 0.58∼0.90 in various datasets10, and was widely used in high-impact research publications10,23. However, the results of flowSOM vary upon the k values set by the users which causes some uncertainty and impairs reproducibility of the results. In contrast, semi-supervised clustering methods like ACDC, DeepCyTOF14 and LDA outperformed the unsupervised methods in the same benchmarking study. The F-measure obtained by ACDC and LDA ranged from 0.78∼0.99, which was significantly higher than unsupervised approaches including Accense, PhenoGraph, Xshift, k-means, flowMeans24, FlowSOM and DEPECHE25 (F-measure: 0.28∼0.93)23. In this study, we chose LDA because it has proven to be superior to ACDC and DeepCyTOF in terms of precision and speed10,11. The excellent performance of LDA was demonstrated in MPN dataset of this study with F-measure reached 0.93 for around 4 million cells. BinaryClust was comparable in accuracy (F-measure=0.94) but much faster in speed. Both semi-supervised approaches outperformed flowSOM as expected. Although semi-supervised methods seem to perform better, the mainstream for CyTOF data analysis still relies on unsupervised methods. The reason for this might be that most of the semi-supervised methods rely on manual gating as reference and requires either the manually gated expression matrix or user-defined binary matrix as training data. This process takes extra time and efforts especially for defining subpopulations, and may introduce end-user bias. Additionally, the common restriction for all the supervised methods is that they have limited capability to reveal novel subsets as those are not predefined in the training pool, which poses a critical challenge for the pursuit of novel discoveries. Considering all these strength and limitations, as a semi-supervised method, BinaryClust possesses the advantages of incorporating prior biological knowledge, being reproducible across runs and accurate on cell type identification. Meanwhile, the input requirement is a simple matrix of cell population definition based on marker positivity, avoiding the hassle to manually gate example FCS files. With a deep understanding that certain CyTOF markers are expressed on a continuum rather than binary manner, this automated supervised approach will only be used for classification of main cell lineages. The test datasets in this study contained data from human PBMC samples which represented a heterogenous pool of immune cells, covering the major types of various immune populations, and demonstrated excellent performance of BinaryClust on classifying CD4 T cells, CD8 T cells, gamma delta T cells, NK cells, monocytes, dendritic cells, and B cells. We also recommend checking marker expression distribution before running the pipeline to ensure accurate results. After the above step, this pipeline supports cell population extraction for further dissection into subclusters using unsupervised approaches, which excel in detecting rare and refined subpopulations. Phenograph is a particularly effective tool for this purpose. By applying this strategy, our previous work with BinaryClust facilitated the evaluation of the impact of systematic anti-cancer agents on lymphocyte population in non-small cell lung cancer patients, as well as the characteristics of autologous T cell products after manufacturing 26.
Another purpose behind the development of the ImmCellTyper framework is to cater to general analysis needs, spanning from pre-processing to downstream differential analysis. There are existing well-established pipelines like CATALYST17, diffcyt27, Cytofkit19, and ImmunoCluster20 etc., each of them has its own strength and limitations. One of the challenges existed in CyTOF data pre-processing is effectively integrating multiple batches, given the technical differences arising from experiments and instrumental acquisition, which could result in the separation of clustering across batches and potentially confounding the signal of interest. In ImmCellTyper pipeline, we provide the functionality for batch effects examination and incorporated two well-performing batch correction algorithms CytoNorm28 and CytofRUV29 into our framework prior to downstream analysis, which ensures the quality of the data and, to the best of our knowledge, makes it the first integrated pipeline with this module. For the downstream analysis, tools like diffcyt and CATALYST group markers into phenotypic and state makers, enabling the detection of differentially abundant cell clusters and differential expression of functional markers within each cell population27. This well-recognised strategy has been extensively applied in various studies. But one limitation is that it directly classifies cells into high-resolution clusters, and cannot automatically merge subclusters into one major population with similar phenotypes27. Our pipeline addresses this problem via a precise and convenient solution: using the semi-supervised classifier (BinaryClust) of ImmCellTyper. The two pipelines (CATALYST and ImmCellTyper) are well compatible, allowing users to leverage their functions together and providing a broader range of analytical options. On the other hand, Cytofkit has the advantages of the graphical user interface (GUI) for non-specialists, integration of a variety of clustering (DensVM7, ClusterX, FlowSOM, Phenograph), dimension reduction (PCA, ISOMAP, t-SNE) methods, and inference of the relatedness among cell populations, but it does not support complex study design, group comparison and statistical testing19. Several methods have been developed to fit with the high-dimensional features of CyTOF data for differential analysis including diffcyt, CellCnn30, cydar31, citrus32, cyEMD33 etc. Nonetheless, none of them is applicable to compare more than two study groups at once which involves multiple testing. Therefore, to accommodate study designs involving multiple groups, ImmCellTyper does statistics via first applying Kruskal Wallis test, followed by Benjamin-Hochberg (BH) procedure for multiple testing correction and post hoc analysis via Dunn’s test or pairwise-Wilcoxon test. In cases where the comparison involves only two groups, Mann-Whitney test will be applied in terms of cell frequency, as the distribution of CyTOF data generally does not fit for normal distribution.
One limitation of this computational pipeline is that its semi-supervised cluster identification may not be well-suited for the direct identification of specific subpopulations which are defined by continuum markers. This is due to the method’s reliance on the presumption that makers are binary distributed. In addition, for users with general interest in charactering subclusters of each main cell lineage, it can be laborious to first perform BinaryClust then extract every population for unsupervised clustering. In such scenarios, BinaryClust can be used in parallel with other clustering methods or pipelines like CATALYST. This allows users to quickly obtain additional information about main cell types with the accuracy comparable to manual gating.
In summary, we introduce a novel open-source R-implemented strategy and a versatile toolbox for CyTOF data analysis. The future direction is to automate the data clean-up, compensation, and bead normalisation steps. Additionally, we also aim to implement the whole pipeline into Python, a more accessible programming language for bioinformatics novices and biologists who wish to perform high-dimensional data analysis independently.
Materials and methods
BinaryClust
Most of the CyTOF markers exhibit log-normal or bi-modal distribution with zero inflation after arcsinh transformation. Thus, we employ binary classification using k-means to group cells into negative and positive populations (k=2) for each marker indicated in the user-defined classification matrix. Here, k-means is an unsupervised clustering algorithm to cluster the data based on the Euclidean distance among points, which is calculated by the formula below:
Then assign each data point into a cluster centroid which is denoted by ci, and dist() is the Euclidean distance:
By aligning the k-means results with the user-designed classification matrix, cell populations can be subsequently classified and annotated.
R package ImmCellTyper
This computational pipeline is implemented in the R package ImmCellTyper and publicly available on Github (https://github.com/JingAnyaSun/ImmCellTyper). Instructions for package installation and function usage can be found in the README file on the Github page. It is recommended to first examine the unwanted non-biological variation across batches and perform additional batch normalisation if necessary. Afterwards, import the data into the second part of the pipeline for downstream analysis. Users are required to prepare all the FCS files, sample metadata containing the details and grouping information of each sample, panel metadata with the information of the antibody panel with metal tags used in the experiments, and cell type classification matrix with phenotypic marker expression in a binary manner of each cell lineage. All files need to be in the right format to use the pipeline.
ImmCellTyper workflow overview
ImmCellTyper pipeline is composed of seven steps corresponding to two separate sub-pipelines, as described in Figure 8 and Github vignettes (https://github.com/JingAnyaSun/ImmCellTyper/tree/main/vignettes):
Sub-pipeline1 (corresponds to Figure 8, workflow step 2): Batch effects evaluation and correction. Batch effects occur when samples were collected and measured at different sites or time points, especially for large-scale studies. It is crucial to remove the unwanted variation which might interfere the true biological differences. After bead normalisation and data clean-up (Figure 8, workflow step 1), which can be performed using third-party platforms, such as CyTOF v7.0 system control software and Cytobank, users can systematically examine batch effects on two levels including marker behaviours and clustering results based on the method introduced by Trussart et al.29If needed, CytoNorm and CytofRUV, which are well-performing correction algorithms, can be used to align the existing batch effects.
Sub-pipeline2 (corresponds to Figure 8, workflow step 3-7): semi-supervised classification, differential analysis, and in-depth investigation. When data are cleaned and normalised, they can be imported into the second part of the pipeline, constructed into a SingleCellExperiment (SCE) object, and undergo semi-supervised classification to identify the major cell types and test the differential frequencies or state marker expression among study groups. After that, if the users have certain interests of specific cell types and pre-design the panel for that, or in another circumstance, the initial statistics draw the user’s attention into a certain cell type, institutively, further clustering using unsupervised tools like flowSOM or Phenograph should be conducted with an increased cluster resolution and deeper investigation for cell subsets, after extracting the cell population of interest. ImmCellTyper pipeline has the same data storage and infrastructure as CATALYST, therefore all the functions in CATALYST can be seamlessly used in ImmCellTyper, tailored to user’s analytical needs. We do not elaborate on the basic functions of CATALYST, which can be found in the tutorial vignettes of the package (https://github.com/HelenaLC/CATALYST).
Batch correction algorithms
The batch effect correction algorithms embedded in the function of ‘batchNorm’ comprises CytoNorm, as described by Van Gassen et al.28 and CytofRUV by Trussart et al.29. Both algorithms rely on anchors (reference samples/technical replicates) across batches to perform normalisation. CytoNorm uses flowSOM clustering to first identify clusters prior to a population-specific transformation on the reference samples by computing the quantile values and aligning them with splines, whereas CytofRUV applies remove unwanted variation III (RUV-III) to CyTOF data by estimating and eliminating the non-biological variation of the pseudo-replicates.
Agreement evaluation
To assess the agreement among manual gating, BinaryClust classification and flowSOM clustering, we used correlation, interaction plot and Bland-Altman analysis. Correlation evaluates the relationship between two variables which doesn’t mean concordance, but if two methods agree, surely, they should be highly correlated. We compared the results of cell population frequency generated by experts and BinaryClust using Pearson correlation, calculated the correlation coefficient and P value, with the line of equality indicating perfect agreement (red solid line, R2=1) in the plots to help gauge the degree of agreement between the two methods; Bland-Altman plot refers to a dot plot of the difference between two variables (y-axis) against the mean of them (x-axis), as described by J.Martian Bland and Douglas G. Altman in 1986, which represents a graphical magnitude of bias (average of difference) with 95% confidence interval. The math formula for the limit of agreement is as below:
Limits of agreement = mean difference observed ± 1.96 × standard deviation
We also used Interaction plot to display the interaction effects of the three methods including manual gating, BinaryClust classification and flowSOM clustering on the measurement of cell frequencies to evaluate the agreement.
F-measure
F-measure is a method to assess the accuracy of the clustering method compared to gold standard, which is manual gating results in this study. It stands for the harmonic mean of the precision and recall values, which can be calculated using below formula:
Here, precision represents the positive predictive value: the proportion of true positive instances divided by the instances classified as positive by the clustering algorithm; recall evaluates the sensitivity, which is the number of true positive events correctly identified by the algorithm among all events that belong to the cluster. F-measure ranges from 0 to 1, where 1 suggests perfect performance and 0 indicates poor precision and recall.
Adjusted Rand Index
Adjusted rand index is a widely used method to measure the similarity between two clustering results. We used it to assess the agreement between test clustering algorithm and the gold standard labels which is derived from manual gating. ARI is defined as the following formula based on a contingency matrix (where nij, ai, bj are values from the contingency table):
ARI ranges from -1 to 1, with 1 indicates a perfect match, 0 means not relevant, and -1 indicates complete mismatch between two clustering results.
Differential analysis
In order to perform the differential analysis of cell abundances and state marker expression amongst study groups, we consider the number of study groups into two conditions in function ’StatTest’: 1) n=2, Mann-Whitney Wilcoxon analysis will be applied as cell frequency/marker expression doesn’t fit normal distribution; 2) n=3, Kruskal-Wallis test will be first performed followed by multiple testing correction using BH procedure and post-hoc analysis (Dunn’s test or pairwise Wilcoxon test).
Sample collection and preparation for CyTOF
PBMC samples from MPN patients were requested and obtained from biobank at Guy’s Hospital, under a protocol approved by the KCL Biobank Access Committee (REC18/EE/0025). Healthy volunteers were recruited at Guy’s hospital with informed consent and ethical approval by King’s College Research Ethics Committee (HR-17/18-5960 MOD-20/21-5960) in accordance with the Declaration of Helsinki. All identifiable information of study participants were securely stored in a trusted research environment managed by members of the team.
Venous blood samples from healthy volunteers were collected in BD vacutainer EDTA tubes, and PBMCs were isolated and purified using Ficoll-Hypaque density gradient centrifugation. In brief, blood was carefully layered onto Ficoll and spined at 460g for 20 minutes at room temperature (RT) without brake; Upon observing a clear separation of blood components, PBMCs were then carefully isolated and washed three times with RPMI 1640 medium to remove other contaminants. Subsequently, the purified PBMCs were cryopreserved in liquid nitrogen at a density of 1X107 cells per vial.
CyTOF Antibody staining
Cryopreserved cells were quickly thawed at 37°C in water bath, suspended in pre-warmed RPMI 1640 medium, washed three times, and underwent Fc-blocking using human TruStain FcX (Biolegend) for 10 mins at RT. Given that the antibody panel employs indirect detection of CD95 using anti-CD95-APC and anti-APC-106Cd, cells were initially stained with anti-CD95-APC for 30 minutes in a dark place, followed by two washes with 2ml cell staining buffer (CSB).
After the removal of supernatant, cells were resuspended in 300µl CSB and transferred into the dip tube containing lyophilized antibody mix of Maxpar direct immune profiling assay (MDIPA, Standard BioTools). Additional antibodies used to study T cell activation, migration, and exhaustion status (anti-TIM3, anti-PD-1, anti-ICOS, anti-TIGIT, anti-OX40) were also added.
Cells were incubated with the antibody mix for 30 mins at RT in compliance with manufacturer’s instructions, washed twice and fixed using a freshly prepared 1.6% paraformaldehyde (PFA, Thermo Fisher Scientific) solution for 10 mins at RT. In the end, cells were washed with CSB twice to remove residual PFA and stained with 125nM Cell-ID intercalator-Iridium in 1 ml Maxpar Fix and Perm Buffer (Standard BioTools) overnight at 4°C.
The stained cells were frozen down using freezing media (fetal bovine serum containing 10% DMSO) and stored at -80°C freezer before data acquisition.
On the day for CyTOF acquisition, cryopreserved samples were thawed, washed twice with 1ml CSB, followed by additional two washes with 1ml cell acquisition solution (CAS, Standard BioTools), and centrifuged at 800g for 5 minutes. Cells were then filtered through a 40 μm cell strainer to avoid blockage and cell count was determined by Countess automated cell counter (Invitrogen). EQ four-element calibration beads (Standard BioTools) were added to a final concentration of 0.5×106 cells/ml to adjust signal fluctuation of the instrument. CyTOF acquisition was performed on Helios mass cytometer system. For each batch of cell staining and run, technical replicates from the same healthy donor were included to evaluate and correct batch effects.
The full antibody panel including the metal tag, clone, and supplier is listed in Supplementary Table 1.
CyTOF data pre-processing
FCS files were first processed for bead normalisation using CyTOF v7.0 system control software (Standard BioTools) to correct signal drift during acquisition. Subsequently, the files were imported into Cytobank (Cytobank Inc.) for data cleaning, with a detailed procedure illustrated in Supplementary Figure 6. The aim is to remove non-events including debris, doublets, normalisation (EQ) beads and other undesired events like dead cells. The CD45+ cell population was pre-gated using Cytobank for down-stream analysis.
Manual gating
Manual gating for major cell populations was performed using Cytobank platform, and gatingML files were exported and converted into gatingSet object to extract labels of each cell via the R packages flowWorkspace, openCyto and CytoML. These data were used as reference for benchmarking computer-aided algorithms.
Dataset availability
In this study, we tested ImmCellTyper pipeline on the MPN cohort with 7 MPN patients and 2 healthy volunteers, influenza cohort with 11 patients and the COVID-19 cohort with 59 COVID-19 patients and 23 healthy volunteers. The FCS files (after clean-up and gating of CD45 population) and metadata of the MPN cohort were deposited in Zenodo (doi: 10.5281/zenodo.10076940); The influenza cohort was published by our lab21, and the data were stored in Zenodo (doi: 10.5281/zenodo.7982165); The COVID-19 dataset was previously published by Chevrier et al.22, and the FCS files can be retrieved from Mendeley Data (https://data.mendeley.com/datasets/vyy8ttw7n9/1).
Competing interests
The authors declare that there are no competing interests exist.
Acknowledgements
This work is supported in part by: Blood Cancer UK [Ref. 22009] and CRUK City of London Centre (CoL) Award [CTRQQR-2021/100004] to MM and SK; JS was supported by the K-CSC scholarship.
We thank BRC flowCore at Guy’s Hospital for technical support, Biobank at King’s College London for patient samples; We are thankful for Cynthia Bishop, Katrina Todd, Rosa Andres Ejarque, Elena Torre, Robert Page, Rita Antunes Dos Reis and Mishto lab members at King’s College London, Daniel Dancer, Thomas Adejumo at Standard Biotools for technical assistance and sample management, as well as all study participants for their kind support for this research. Special thanks to Eliezer Van Allen and Jihye Park at Dana Farber Cancer Institute to review and comment this work.
Supplementary Figures
References
- 1.Mass Cytometry: Single Cells, Many FeaturesCell 165:780–791https://doi.org/10.1016/j.cell.2016.04.019
- 2.A Beginner’s Guide to Analyzing and Visualizing Mass Cytometry DataJ Immunol 200:3–22https://doi.org/10.4049/jimmunol.1701494
- 3.FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry dataCytometry A 87:636–645https://doi.org/10.1002/cyto.a.22625
- 4.Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with PrognosisCell 162:184–197https://doi.org/10.1016/j.cell.2015.05.047
- 5.Automated mapping of phenotype space with single-cell dataNat Methods 13:493–496https://doi.org/10.1038/nmeth.3863
- 6.Extracting a cellular hierarchy from high-dimensional cytometry data with SPADENat Biotechnol 29:886–891https://doi.org/10.1038/nbt.1991
- 7.High-dimensional analysis of the murine myeloid cell systemNature Immunology 15:1181–1189https://doi.org/10.1038/ni.3006
- 8.Viualizing data using t-SNEJournal of Machine Learning Research 9:2579–2605
- 9.UMAP: Uniform Manifold Approximation and Projection for Dimension ReductionArXiv https://doi.org/10.48550/arXiv.1802.03426
- 10.A comparison framework and guideline of clustering methods for mass cytometry dataGenome Biology 20https://doi.org/10.1186/s13059-019-1917-7
- 11.Predicting Cell Populations in Single Cell Mass Cytometry DataCytometry A https://doi.org/10.1002/cyto.a.23738
- 12.DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry dataPLOS Computational Biology 18https://doi.org/10.1371/journal.pcbi.1008885
- 13.CyAnno: a semi-automated approach for cell type annotation of mass cytometry datasetsBioinformatics 37:4164–4171https://doi.org/10.1093/bioinformatics/btab409
- 14.Gating mass cytometry data by deep learningBioinformatics 33:3423–3430https://doi.org/10.1093/bioinformatics/btx448
- 15.Automated cell type discovery and classification through knowledge transferBioinformatics 33:1689–1695https://doi.org/10.1093/bioinformatics/btx054
- 16.SCINA: A Semi-Supervised Subtyping Algorithm of Single Cells and Bulk SamplesGenes (Basel 10https://doi.org/10.3390/genes10070531
- 17.CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasetsF1000Res 6https://doi.org/10.12688/f1000research.11622.3
- 18.Mass Cytometry Discovers Two Discrete Subsets of CD39(-)Treg Which Discriminate MGUS From Multiple MyelomaFront Immunol 10https://doi.org/10.3389/fimmu.2019.01596
- 19.Cytofkit: A Bioconductor Package for an Integrated Mass Cytometry Data Analysis PipelinePLoS Comput Biol 12https://doi.org/10.1371/journal.pcbi.1005112
- 20.ImmunoCluster provides a computational framework for the nonspecialist to profile high-dimensional cytometry dataElife 10https://doi.org/10.7554/eLife.62915
- 21.Altered immune response to the annual influenza A vaccine in patients with myeloproliferative neoplasmsBr J Haematol 193:150–154https://doi.org/10.1111/bjh.17096
- 22.A distinct innate immune signature marks progression from mild to severe COVID-19Cell Rep Med 2https://doi.org/10.1016/j.xcrm.2020.100166
- 23.Recent Advances in Computer-Assisted Algorithms for Cell Subtype Identification of Cytometry DataFront Cell Dev Biol 8https://doi.org/10.3389/fcell.2020.00234
- 24.Rapid cell population identification in flow cytometry dataCytometry A 79:6–13https://doi.org/10.1002/cyto.a.21007
- 25.Determination of essential phenotypic elements of clusters in high-dimensional entities-DEPECHEPLoS One 14https://doi.org/10.1371/journal.pone.0203247
- 26.Pre-treatment with systemic agents for advanced NSCLC elicits changes in the phenotype of autologous T cell therapy productsMol Ther Oncolytics 31https://doi.org/10.1016/j.omto.2023.100749
- 27.diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clusteringCommunications Biology 2https://doi.org/10.1038/s42003-019-0415-5
- 28.CytoNorm: A Normalization Algorithm for Cytometry DataCytometry A 97:268–278https://doi.org/10.1002/cyto.a.23904
- 29.Removing unwanted variation with CytofRUV to integrate multiple CyTOF datasetsElife 9https://doi.org/10.7554/eLife.59630
- 30.Sensitive detection of rare disease-associated cell subsets via representation learningNature Communications 8https://doi.org/10.1038/ncomms14825
- 31.Testing for differential abundance in mass cytometry dataNat Methods 14:707–709https://doi.org/10.1038/nmeth.4295
- 32.Automated identification of stratifying signatures in cellular subpopulationsProc Natl Acad Sci U S A 111:E2770–2777https://doi.org/10.1073/pnas.1408792111
- 33.A systematic comparison of novel and existing differential analysis methods for CyTOF dataBriefings in Bioinformatics 23https://doi.org/10.1093/bib/bbab471
Article and author information
Author information
Version history
- Sent for peer review:
- Preprint posted:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
- Version of Record published:
Copyright
© 2024, Sun et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.