1. Computational and Systems Biology
  2. Immunology and Inflammation
Download icon

ImmunoCluster provides a computational framework for the nonspecialist to profile high-dimensional cytometry data

  1. James W Opzoomer
  2. Jessica A Timms
  3. Kevin Blighe
  4. Thanos P Mourikis
  5. Nicolas Chapuis
  6. Richard Bekoe
  7. Sedigeh Kareemaghay
  8. Paola Nocerino
  9. Benedetta Apollonio
  10. Alan G Ramsay
  11. Mahvash Tavassoli
  12. Claire Harrison
  13. Francesca Ciccarelli
  14. Peter Parker
  15. Michaela Fontenay
  16. Paul R Barber
  17. James N Arnold  Is a corresponding author
  18. Shahram Kordasti  Is a corresponding author
  1. School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, United Kingdom
  2. Institut Cochin, Institut National de la Santé et de la Recherche Médicale U1016, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 8104, Université Paris Descartes, France
  3. UCL Cancer Institute, Paul O'Gorman Building, University College London, United Kingdom
  4. Centre for Host Microbiome Interaction, FoDOCS, King’s College, Guy’s Hospital, United Kingdom
  5. Haematology Department, Guy’s Hospital, United Kingdom
  6. Cancer Systems Biology Laboratory, The Francis Crick Institute, United Kingdom
  7. Francis Crick Institute, United Kingdom
Tools and Resources
  • Cited 0
  • Views 1,678
  • Annotations
Cite this article as: eLife 2021;10:e62915 doi: 10.7554/eLife.62915

Abstract

High-dimensional cytometry is an innovative tool for immune monitoring in health and disease, and it has provided novel insight into the underlying biology as well as biomarkers for a variety of diseases. However, the analysis of large multiparametric datasets usually requires specialist computational knowledge. Here, we describe ImmunoCluster (https://github.com/kordastilab/ImmunoCluster), an R package for immune profiling cellular heterogeneity in high-dimensional liquid and imaging mass cytometry, and flow cytometry data, designed to facilitate computational analysis by a nonspecialist. The analysis framework implemented within ImmunoCluster is readily scalable to millions of cells and provides a variety of visualization and analytical approaches, as well as a rich array of plotting tools that can be tailored to users’ needs. The protocol consists of three core computational stages: (1) data import and quality control; (2) dimensionality reduction and unsupervised clustering; and (3) annotation and differential testing, all contained within an R-based open-source framework.

Introduction

Systems immunology approaches aim to explore and understand the complexity of the immune system. However, with 350 cluster of differentiation (CD) antigens, over 100 cytokines and chemokines, and many different cell subsets, this is a challenging task (Davis et al., 2017). Liquid mass cytometry (LMC), imaging mass cytometry (IMC), and flow cytometry (FC) are powerful techniques applied to exploratory immunophenotyping and biomarker discovery. With the ability to profile over 40 markers on an individual cell, these techniques have rapidly expanded our understanding of the immune system and its perturbations throughout disease pathogenesis (Bertolo et al., 2020; Wang et al., 2020). However, such comprehensive analyses produce large amounts of data and increased dimensionality, resulting in a demanding computational task to analyze (Newell and Cheng, 2016). Often, a limiting factor to these analyses is the requirement of an in-depth knowledge of computational biology. There is an unmet need for an easy-to-use flexible open-source computational framework to explore a variety of high-dimensional single-cell cytometry datasets such as LMC, IMC, and FC within a single framework for nonspecialists. A number of previous computational cytometry workflows have been proposed, including CyTOF workflow (Nowicka, 2019), (Diggins et al., 2015) and CapX (Ashhurst, 2019). Subsequently, some of these workflows have been incorporated into graphical user interfaces (GUI) or code-based workflow packages such as CATALYST (Crowell et al., 2020), Cytofkit (Chen et al., 2016), diffcyt (Weber et al., 2019), and Spectre (Ashhurst, 2020). However, these approaches can be challenging to implement without significant computational knowledge. We have therefore developed ImmunoCluster, building on and extending the framework proposed in the CyTOF workflow (Nowicka, 2019), to create modular, flexible, and easy-to-use implementations of cytometry analysis pipelines. The ImmunoCluster package is a framework that provides appropriate data structures and methods to increase the utility of these high-dimensional methods and make data analysis and interpretation accessible to all researchers to facilitate immune phenotyping projects, such as those associated with preclinical and clinical studies.

Here, we present ImmunoCluster, an open-source computational framework for the analysis of high-dimensional LMC, IMC, and FC datasets. ImmunoCluster is an R package and framework that focuses on the organization and visualization of data to help define biological identity of cells, construct a hierarchy of biologically meaningful cell populations, and detect significant changes between conditions, timepoints, and groups, including serial samples from a clinical trial setting. The framework uses state-of-the-art computational techniques in an easily amendable and flexible format. Users can manipulate figures and outputs to suit their specific needs. The computational approaches used to analyze high-dimensional data are rapidly evolving, and ImmunoCluster has the flexibility to incorporate these novel computational methods in dimensionality reduction, clustering, and trajectory analysis within its framework. The computational framework is designed for ease of use and broad applicability so that it is simple enough to be implemented by users with only a basic knowledge of R, as well as possessing the flexibility for more advanced users to incorporate additional new methodology and build on the framework. The framework relies on the SingleCellExperiment class (Lun, 2017), which means that the flow cytometry standard (FCS) data is contained within a purpose-built object that stores all stages of analysis to permit multiple analysis paths to be performed in parallel. The framework incorporates methods for applying popular R packages (i.e., RPhenograph [Chen and Chen, 2015], Rtsne [Krijthe, 2018], and FlowSOM [Van Gassen, 2015a]) along with convenient ‘wrapper functions’ allowing for the use these popular dimensionality reduction and clustering algorithms in an easy format for nonspecialists to rapidly produce interpretable data that are extendable to advanced experimental designs. ImmunoCluster also contains functionality for adaptive downsampling at the import and dimensionality reduction stages of the pipeline to overcome significant increases in runtime associated with large datasets in dimensionality reduction algorithms like UMAP and tSNE. The outputs from ImmunoCluster are created using the ggplot package, which generates ggplot graphical objects that are flexible, can be further modified and utilized directly into figures for reports and publications. The core of the visualization tool was developed using scDataviz, a Bioconductor package (Blighe, 2020a) for visualizing single-cell data and influenced by the visual framework outlined in CyTOF workflow (Nowicka, 2019). ImmunoCluster offers an alternative to the currently used frameworks such as CATALYST (Crowell et al., 2020) and Cytofkit (Chen et al., 2016), with a focus on ease of use for biologists, as well as offering additional novel aspects for its users compared with other published frameworks. Importantly, and unique to ImmunoCluster, the scope of functionality in this package also enables the analysis of IMC data. Users are able to run IMC data in line with FC and LMC data analyses. Due to the lower resolution of IMC data (compared with FC and LMC), we specifically designed the ‘ranked expression heatmap’ function to work with IMC data, which allows users to rank markers measured to help identify cell populations. IMC provides a means to analyze the spatial dimension of the cells in situ within tissues, which can provide important insight when ascribing functionality to cell populations. Methods for analyzing large cytometry datasets, particularly IMC datasets, in an open-source computational environment are currently limited. ImmunoCluster has been designed for use by a nonspecialist; however, it would be useful to a range of users from wet-lab nonspecialist experimentalists to experienced computational biologists with potential utility in day-to-day research through to large-scale studies involving multiple longitudinal serial biopsies where samples need to be analyzed and compared.

Clinical immune monitoring is an area of increasing utilization (Hernandez et al., 2020). In the current work, we will present examples of ImmunoCluster’s ability to detect clinically relevant perturbations in immune cellular heterogeneity in LMC, IMC, and FC datasets in human patient and/or healthy samples. ImmunoCluster is easily extensible to several arbitrary experimental designs and can be utilized to aid both discovery of novel subpopulations of cells within heterogeneous samples or as a foundation for monitoring longitudinal changes or responses to treatment. As such, ImmunoCluster provides a resource that will permit unsupervised high-dimensional data analysis to be more widely adopted in immunobiology and many other disciplines.

Results

This section aims to convey the scope of ImmunoCluster, its versatility, and applicability to complex datasets across a range of high-dimensional approaches. We implemented the ImmunoCluster framework for three different types of high-dimensional single-cell data (LMC, IMC, and FC). ImmunoCluster also offers user-friendly flexibility throughout the framework, and researchers can easily adapt all figures and outputs to suit their specific needs, resulting in an abundance of tailored outputs for the user to assess and use in publications, reports, and presentations. An example detailing a variety of tailor-made figures produced by ImmunoCluster can be found in Figure 1—figure supplement 1.

We first used previously published and publicly available LMC data (Hartmann et al., 2019). This dataset was chosen as it is representative of an intricate immunophenotyping project and would test the ability of ImmunoCluster to reproduce published results. Secondly, we investigated two novel IMC datasets, from HNSCC and diffuse large B-cell lymphoma (DLBCL) patients with the aim of demonstrating the applicability of the ImmunoCluster tool for IMC data analysis and cell cluster identification and visualization. Finally, FC data from the BM of seven healthy donors (HDs) during hip surgery were used to show the ability of the framework to analyze conventional FC data and identify rare immune cell populations.

Liquid mass cytometry

The chosen dataset (Hartmann et al., 2019) includes mass cytometry data from 15 patients with leukemia who all received bone marrow transplant (BMT), three of whom developed acute graft versus host disease (GvHD). Samples were taken at 30 and 90 days post-BMT. Thirty-three cell markers, both lineage and functional, were analyzed using mass cytometry (Helios CyTOF system). Prior to uploading into the ImmunoCluster framework, the FCS files were normalized and gated as described in the Materials and methods section. We observed that a typical pipeline run on the full 2.3 million cell LMC GvHD dataset (with UMAP downsampling to 500k cells) would take ~110 min on a 2.9 GHz Intel Core i7 MacBook pro with 16 GB RAM (Figure 1—figure supplement 2).

An initial visualization of the data was first carried out using multidimensional scaling (MDS) plots, which were overlaid with metadata to identify the similarity of patients by condition and day of measurement (Figure 1A, B). A heatmap was also created to give an overview of marker expression for each patient and annotated with the metadata provided (Figure 1C). The UMAP algorithm was used for dimensionality reduction of data. There was a difference in the distribution of cells from GvHD and ‘none’ across the cell islands between both timepoints, 30 and 90 days after BMT (Figure 2A).

Figure 1 with 2 supplements see all
Initial exploration of liquid mass cytometry data from patients with leukemia who received bone marrow transplants (BMTs).

(A, B) Multidimensional scaling of data; median marker expression data from each sample were used to create plots, annotated with condition (graft versus host disease or none) (A), and time after BMT treatment (B). (C) Heatmap showing the median marker expression for each patient.

Figure 2 with 3 supplements see all
Dimensionality reduced liquid mass cytometry CyTOF data and marker expression.

(A) UMAP plots colored by graft versus host disease and none and split by timepoint (30 and 90 days after bone marrow transplant treatment). (B) Expression of eight selected lineage markers projected onto UMAP plot. Identifying cell types and abundance of clusters. (C) Heatmap showing median marker expression across all identified cell types. (D) UMAP annotated with cell types. (E) Distribution of immune cell frequencies and abundance (%) of each cell type across all samples measured.

The mass cytometry panel used in this experiment was specifically designed to identify all major human immune cell lineages (Hartmann et al., 2019); the marker expression across cell islands was visualized and indicated cell types present within the cell islands (Figure 2B) (expression of all markers measured; Figure 2—figure supplement 1). To further distinguish the cell types that were present in the cell islands and visible from the dimensionality reduced UMAP plots (Figure 2B), the FlowSOM clustering algorithm with consensus clustering metaclustering was applied, which has been demonstrated to scale well to large cytometry datasets (Weber and Robinson, 2016). The number of clusters input into the FlowSOM algorithm was slightly higher than the number of expected cell clusters (n = 56) (Figure 2—figure supplement 2). The SingleCellExperiment object stored the data for each K meta-clustering (K1-K56); therefore, further downstream data exploration was carried out looking at different numbers of K clusters. A heatmap showing the expression of the 33 markers measured was used to identify cluster cell types (Figure 2—figure supplement 2), and ImmunoCluster was successfully able to replicate the findings from Hartmann et al., 2019, with 24 of the cell populations being identified (Figure 2C, D). Selecting the correct number of clusters to input into the clustering algorithm can be a challenging aspect of these analyses, but the framework provides the user with multiple parameters to aid this decision. We recommend that users should always overestimate the number of cell populations as data for each K number of clusters will be stored in the SingleCellExperiment object and can therefore be explored and refined later using the different visualization techniques available within the framework. Additionally, the elbow plot generated by the ConsensusClusterPlus package can be examined alongside the clustering algorithm output to guide the decision on the best number of clusters for downstream analysis (Figure 8—figure supplement 6). Over-clustering can allow for the identification of rare cell types within a dataset at the expense of often generating several clusters of highly prevalent cell types that likely represent the same biological cell type. ImmunoCluster provides the tools to manually and reproducibly merge clusters of the same biological cell identity into one group after over-clustering. These clusters can additionally be merged into each other to use a less granular, higher-level population annotation (see Figure 2—figure supplement 3 for an example of higher-level clustering).

The abundance of all cell types across all samples was measured, and CD14+CD16- monocytes were identified as the most abundant population, correlating with the Hartmann et al. data (Hartmann et al., 2019Figure 2E). Individually displaying patient’s cell-type abundance means that we could identify variation within a group of interest; for example, we observed that although there is variation within the GvHD and none, overall, they appear to follow the same trends in cell-type abundance (Figure 3A). Additionally, significant differences between memory B-cells (false discovery rate [FDR] p=4.38×10−3), naïve B-cells (FDR p=1.35×10−2), and naïve CD4+ T-cells (FDR p=3.47×10−2) were identified between the GvHD and none (Figure 3B, C) using a t-test. A difference in number of naïve B-cells was previously reported by Hartmann et al., 2019 in this dataset and can be seen in Figure 2A (using Figure 2D for cell island identification), where there is a noticeable reduction of these cells in the GvHD patients. In addition, a reduction in naïve CD4+ T-cells can also be seen, which was also previously reported, but to a lesser extent. A volcano plot can be used to highlight the differentially abundant cell clusters between GvHD and none (GvHD logFC+ve and none logFC-ve), where cell types with FDR p<0.05 are shown in red (Figure 3D), which is also in line with the previously published data (Hartmann et al., 2019). We compared ImmunoCluster’s differential abundance testing output (stat_test_clust), run in the t-test mode, with the Hartmann et al. data (Hartmann et al., 2019) into the diffcyt computational framework (Weber et al., 2019), which is a state-of-the-art tool for differential discovery analyses. Diffcyt identified the same three cell clusters (memory B-cells, naïve B-cells, and naïve CD4+ T-cells) as differentially abundant (FDR p<0.05) between the GvHD and none conditions. This demonstrates concordance with ImmunoCluster’s statistical output that identified naïve B-cells and naïve CD4+ T-cells as one of the principal differentially abundant populations, which was also identified in the original analysis of the data (Supplementary file 4). Additionally, we explored the expression of checkpoint-related molecules, and their receptors, markers of proliferation, and invariant natural killer T (iNK T) cells in CD8+ T-cells and compared the expression of these in the patients with GvHD and none (Figure 3E). These markers help assess the functional states of cells and the checkpoint-related molecules and proliferative activity markers such as PD-1, PD-L1, TIM3, Ki-67, and TCRVa24-Ja18; the expression of these antigens has been proposed previously as candidate biomarkers for immunotherapy (Hartmann et al., 2019). Visually, GvHD patients had a higher expression of PD-1 and a noticeable difference in the Ki-67 proliferation marker as expected for these patients (Figure 3E).

Figure 3 with 1 supplement see all
Liquid mass cytometry (LMC) CyTOF data cluster and cell-type abundances.

(A) Percentage of each cell type shown for each patient numbered across the bottom of the plot. (B, C) Box plots portray interquartile range (IQR) with the horizontal line representing the median percentage of cell types in both graft versus host disease (GvHD) and none patients for B-cell and CD4+ T-cell populations, respectively. (D) Differential abundance analysis; volcano plot showing the significantly differentially expressed cell abundances (false discovery rate p<0.05) between GvHD and none (GvHD logFC+ve and none logFC-ve). (E) Comparison of checkpoint-related molecules (PD-1 and TIM3), receptors (PD-L1), proliferative (Ki-67), and iNK T-cells (TCRVa24-Ja18) marker expression between GvHD and none patients in the CD8+ T-cell cluster.

Imaging mass cytometry

We utilized two different IMC datasets to demonstrate how ImmunoCluster would handle IMC data from two different tumor microenvironments that possess (1) a biological image with clear boundaries, such as head and neck squamous cell carcinoma (HNSCC), where clear tumor and stroma regions were evident; (2) a biological image that was more heterogeneous and diffuse, such as the DLBCL lymph node section. ImmunoCluster’s ability to explore IMC data adds an important element to the framework and demonstrates the flexibility and applicability of the tool. Researchers can explore IMC data easily, and in line with LMC and FC data. After data preprocessing (see Materials and methods and Figure 8—figure supplement 2), the transformed gated FCS files were uploaded into the ImmunoCluster framework. Generally, the same workflow that was applied to the LMC and FC data was applied to the IMC data from a section of tissue from patients with HNSCC and DLBCL (lymph node) as a proof-of-principle application of the ImmunoCluster tool for IMC data analysis (Figures 4 and 5). The only difference with the IMC data was the use of a rank heatmap (Figure 4—figure supplement 1); this ranks the order of each marker for each sample (1 – total number of samples). The reason for this was clarity and ease of cell identification across clusters as IMC produces lower resolution data compared to the data output of LMC or FC, which makes manual assignments of populations more difficult to interpret. Therefore, a rank of expression was used as an easy means to facilitate the assignment of cell identity through defining the highest and lowest expression of markers between clusters.

Head and neck squamous cell carcinoma tissue section

Cell segmented regions are annotated in Figure 4A, where regions 1–3 were selected as tumor regions and 3–6 as stroma regions. The ImmunoCluster tool was able to distinguish between the stroma and tumor regions of the tissue section from the HNSCC tumor (Figure 4B), and with additional markers could provide a more in-depth analysis of the cell phenotypes in each respective region of the tumor. Dimensionality reduction of the data clearly separated the cell islands belonging to the stroma and tumor regions of the tissue (Figure 4C), suggesting different cell phenotypes between the regions. The FlowSOM algorithm was applied to cluster the cells and identify cell populations. The rank heatmap was used to show the ranked expression of each marker (ranked 1–10, with 10 being high) for each cluster (1–10) (Figure 4—figure supplement 1), and the identified cell types were then used to annotate the heatmap (Figure 4D). High expression of E-cadherin can be used as a marker of cancerous tissue in HNSCC, and its expression correlates with the cancer regions selected for analysis (Figure 4D). We focused on CD8+ T-cell, B-cell, and macrophage cell clusters (Hladíková et al., 2019; Evrard et al., 2019). Regions 2 and 3 of the tumor mostly consist of proliferating (Ki-67+) tumor cells and macrophages (tumor-associated and PD-L1+). Region 1 was a mix of immune cells, B-cells, and PD-L1+ CD8+ T-cells (Figure 4E). Stromal regions 5 and 6 are mostly PD-L1+ macrophages and CD8+ T-cells. Region 4 has a cluster of mixed immune cells, and from the IMC image (Figure 4A) we can see that this region looks like the tumor is encroaching into the stromal region. The stroma seems to be mostly nonproliferating cells (Ki-67compared to the tumor region) and CD8+ T-cells, including a CD20+ CD8+ cytotoxic subset (Gingele et al., 2020).

Figure 4 with 1 supplement see all
HNSCC imaging mass cytometry (IMC) data.

Immunophenotyping the tumor microenvironment with IMC data using ImmunoCluster. (A) IMC image showing an example of five channels: PD-L1 (green), CD4 (yellow), E-cadherin (red), CD20 (magenta), and CD8α (blue). Images with the segmented cell borders are highlighted. The tumor and stroma areas are clear to the eye, and three regions were selected from each as shown (regions 1–6). (B) Multidimensional scaling plot of stroma and tumor regions. (C) Dimensionality reduced data (UMAP algorithm applied) annotated with FlowSOM clusters and split by region type. (D) Rank heatmap: ranked expression (1–8, where 8 is high) of seven markers (CD20, CD8α, E-cadherin, Ki-67, PD-L1, CD68, and PD-1) and identified cell type. (E) Proportion of cell types for each tissue region.

Diffuse large B-cell lymphoma lymph node section

The IMC DLBCL lymph node section was split by high and low Ki-67 expression to identify the highly proliferative tumor cells (Figure 5A); Ki-67 is used as a prognostic marker in DLBCL (Tang et al., 2017). The FlowSOM algorithm was applied to the data to split the IMC data into clusters; these clusters were projected onto the UMAP plot to visualize how these clusters were split between the Ki-67 high and low (Figure 5B). The marker expression ranking tool was applied to the clustered data and used to identify cell types (Figure 5C, Figure 5—figure supplement 1). The majority of Ki-67 high population were proliferating tumor cells (84%), and the Ki-67 low population consisted of a heterogeneous collection of immune cell populations, such as CD4+ T-cells, CD8+ T-cells, B-cells, macrophages, and dendritic cells (DCs) that were successfully identified using the ImmunoCluster tool (Figure 5D).

Figure 5 with 1 supplement see all
Diffuse large B-cell lymphoma imaging mass cytometry (IMC) data.

Immunophenotyping the lymph node microenvironment with IMC data using ImmunoCluster. (A) IMC image showing an example of four channels: DNA (red), CD20 (green), CD8α (dark blue), and CD45RA (light blue). The image was split by high and low Ki-67 expression to identify highly proliferative cells (tumor). (B) Dimensionality reduced data (UMAP algorithm applied) annotated with FlowSOM clusters and split by Ki-67 high or low. (C) Rank heatmap: ranked expression (1–8, where 8 is high) of 12 markers (CD8, CD68, CD4, CD3, CD45, CD20, CD45RA, Ki-67, CD74, CD16, PD-L1, and CD11c) and identified cell type. (D) Proportion of cell types for Ki-67 high and Ki-67 low cell populations.

Flow cytometry

The HD BM taken during hip surgery was gated for the CD3+CD4+ population before analysis within the ImmunoCluster framework, with the aim of testing ImmunoCluster’s ability to identify minor populations such as Tregs and their subpopulations (Kordasti et al., 2016). The FlowSOM algorithm was applied to the data, resulting in 40 clusters (Figure 6—figure supplement 1), and a heatmap was created with median expression of markers to identify cell types (Figure 6B). Due to the markers used and the low prevalence of Tregs, the majority of cells were identified as CD4+ T-cells. Additionally, three populations of Tregs were identified by the ImmunoCluster tool: Tregs (CD25+CD127low) (3.6%, 1.3–5.5), Treg A (CD25+CD127low CD45RA+) (0.7%, 0.1–2.0), and Treg B-cells (CD25+CD127lowCCR4+CD95high) (0.9%, 0.2–2.4) (Figure 6C). The abundance of Treg A and B for all HDs is shown in Figure 6D, and their distribution was as expected (Kordasti et al., 2016).

Figure 6 with 1 supplement see all
Healthy donor bone marrow flow cytometry data: identification of rare CD4+ T-cell immune cell subsets.

(A) Multidimensional scaling plot of each healthy donor (1–7). (B) Heatmap showing marker expression across clusters of identified cell types. (C) Dimensionality reduced data (UMAP algorithm applied) annotated with cell type. (D) Proportion of Treg A and B from total cells from healthy donors.

Discussion

Here, we provide an overview of the flexibility and scope for utilizing ImmunoCluster, an open-access easy-to-use framework for LMC, IMC, and FC dataset analyses. The ImmunoCluster package emphasizes ease of use in the generation of flexible and modular cytometry analysis pipelines for applications such as clinical immune monitoring studies. Explanations of the functions and workflow have been provided, and an in-depth step-by-step protocol and tutorial are available on the GitHub site (https://github.com/kordastilab/ImmunoCluster).

The purpose for creating the ImmunoCluster framework was to support nonspecialist researchers to carry out complex immunophenotyping experiments. The dawn of ‘cytometry big data’ and its ever-growing utilization has surpassed the number of experimental parameters that a researcher can feasibly analyze as a collective. The ImmunoCluster framework was designed in collaboration between wet lab and computational biologists, with the purpose of creating a tool that enables state-of-the-art analysis, yet easy to use in its application. We provide sufficient detail and explanation for the nonspecialist researcher to understand the need for each analysis step as well as how to confidently execute the process, with an aim of performing transparent and reproducible analyses.

ImmunoCluster has been built with an emphasis on generalizability and scalability to facilitate a broad use and is an advance over other open-source tools as it provides an integrated framework to uniquely perform a complete computational pipeline on high-dimensional LMC, IMC, and FC data. The inclusion of IMC data analysis brings a unique element to the ImmunoCluster framework and differentiates it from other published protocols. ImmunoCluster was built to easily leverage many popular R packages for cytometry data (i.e., RPhenograph [Levine et al., 2015], Rtsne [van der Maaten and Hinton, 2008; Laurens van der, 2014], and FlowSOM [Van Gassen, 2015b]), selected because they are open-source and regularly maintained with extensive documentation. All methods prioritize customizable and attractive visualizations designed to be used by both dry- and wet-lab researchers/clinicians. The inclusion of adaptive downsampling of cells for the running of computationally intensive dimensionality reduction steps, using tools such as UMAP and tSNE, allows ImmunoCluster to generate cytometry analysis pipelines that are readily scalable to a dataset of millions of cells across several samples and a variety of experimental or phenotypic conditions. ImmunoCluster can also be run on a local desktop/laptop computer with standard configuration, depending on the number of cells, adaptive downsampling settings and selection of clustering algorithms can usually be run within a day end-to-end (Figure 1—figure supplement 2). Whilst the PhenoGraph clustering method does not readily scale, the FlowSOM-based clustering method can be implemented on much larger datasets.

The limitations of this computational framework are important to take into consideration when designing an experiment. Unsupervised clustering (stage 2) of samples is the most important stage of the framework, and its ability to accurately define populations across all samples is critical to all later stages of investigation. The number of k clusters to generate (cluster resolution) can significantly affect the ability to identify or determine a change in a biologically relevant population. Merging two subpopulations due to low resolution of clustering may mask an important experimental observation. Determining the precise number of clusters that are relevant for a given dataset is an important step. In our step-by-step guide, we provide complete documentation and clear user-friendly approaches to optimally define this.

Technical inter-sample marker signal variability due to batch effects may impact the ability of unsupervised analysis to reliably detect certain populations if significant batch effects are present. MDS plots created in stage 1 of the framework can be used to detect batch effects as well as other technical artifacts such as antibody staining anomalies. An approach commonly employed in mass cytometry to overcome this is to use sample barcoding, reducing the variability between each sample, allowing all samples to be exposed to the same antibody mixture (Schulz and Mei, 2019). Another regularly employed approach is the inclusion of a shared control sample in each independent batch, for example, the same cell type as the experimental samples but all from the same HD. Statistical methods for batch correction may also be applied; in recent years, a class of methods called remove unwanted variation (RUV) have been developed, and CytofRUV (Trussart et al., 2020) is a recently developed package specifically designed for CyTOF dataset batch correction. If there are significant differences in the total number of cells recovered between samples, samples with many more cells may bias the clustering. Samples with very few cells recovered may result in information loss and missing populations that are in fact present. The ImmunoCluster framework provides users with two opportunities (creating the SingeCellExperiment and running the dimensionality reduction) to downsample, which means the same number of samples will be used for each sample, rectifying the problem of varying cell numbers. If a particular sample has very few cells, they may need to be excluded from the analysis as they may not be a representative sample. Although channel spillover is diminished in mass cytometry, it still exists in fluorescence-based FC and should be considered when designing antibody panels to reduce the effects on introducing cell phenotype artifacts in downstream unsupervised analysis. As such, initial exploratory data analysis (e.g., MDS plots) is key to determining if any of these confounding factors might be present in the data. Additionally, users may want to introduce fluorescence‐minus‐one controls, where cells are stained with all fluorescently tagged antibodies except for the one of interest (Roederer, 2001).

Summary

The ImmunoCluster package increases the accessibility of advanced computational methods for users tasked with generating high-quality analysis of high-dimensional LMC, IMC, and FC data. More advanced users can leverage ImmunoCluster’s data structures, methods, and suitability for scripted analysis to integrate our work with other R analysis tools and build on the package’s foundation. In our clinically relevant immune monitoring case study setting, the ImmunoCluster framework successfully identified 24 phenotypically distinct cell clusters, and their abundance across all samples, and highlighted differentially represented cells between GvHD and none patients, all of which were consistent with the Hartmann et al. data (Hartmann et al., 2019). We also applied the framework to two sets of IMC data, showing the broad applicability of ImmunoCluster as well as the ease of being able to analyze the IMC data in line with LMC data, helping compare these types of data, which will be useful in future studies where multiple technologies are being applied within one study. Finally, we implemented FC data into ImmunoCluster, confirming that it was able to analyze conventional FC data and identify rare immune cell populations. Currently, the preprocessing of raw files is carried out prior to implementation into the ImmunoCluster framework, but due to the flexibility and design of the tool, future implementations may include IMC data preprocessing within the ImmunoCluster R package. This versatile framework provides an opportunity to include all researchers with varying knowledge and experience in computational biology to be involved in the experimental project from experimental design, wet lab/clinical trial, all the way through the data analysis process and visualization.

Materials and methods

Data file preprocessing and preparation for implementation into the ImmunoCluster framework

LMC data

Request a detailed protocol

Here, we describe an example of the ImmunoCluster package analyzing previously published LMC data (CyTOF) from 15 patients with leukemia 30 and 90 days after BMT, previously published by Hartmann et al., 2019. After BMT, three of these patients suffered acute GvHD and 12 had no evidence of GvHD (noted as ‘none’ herein). For systems-level biomarker discovery within this trial, a panel of 33 antibodies were incorporated into the immunophenotyping panel (Supplementary file 1). The panel was designed to cover all major immune cell lineages and several functional subsets including T-, B-, NK, and myeloid cells and granulocytes (detailed in Hartmann et al., 2019). The marker panel also included a variety of immune-regulatory proteins, such as PD-1, PD-L1, and TIM3. The publicly available dataset from the Hartmann et al. study was extracted for the current study from the FlowRepository (http://flowrepository.org/id/FR-FCM-Z244) (Spidlen et al., 2012). Raw FCS files for the LMC dataset that were collected on the Helios CyTOF system (Fluidigm, UK) machine were first normalized using the free CyTOF 6.7 system control software (files can also be concatenated using the same software if necessary). These files were gated using FlowJo (version 10.5.3) (Becton, Dickinson and Company, UK) to remove beads, doublets, dead cells, and non-CD45+ cells as well as erythrocytes (CD235αβ/CD61+) and neutrophils (CD16+) (Figure 8—figure supplement 1), as described by Hartmann et al., 2019, before implementation into the ImmunoCluster framework. There are various alternative open-access tools that can be used to gate these files, such as OpenCyto (Finak et al., 2014), CytoExploreR (Hammill, 2020), and Cytoverse (https://cytoverse.org/).

IMC data

Request a detailed protocol

Two different examples of IMC datasets were implemented into the framework. One IMC dataset was from a human tissue section taken from a patient with HNSCC. Consent was attained by the Guy’s and St Thomas’ Research Biobank, within King’s Health Partners Integrated Cancer Centre. The second IMC dataset was from a lymph node section from a patient with DLBCL. Formalin-fixed paraffin-embedded (FFPE) DLBCL tumor tissue was obtained from King’s College Hospital, in accordance with the Declaration of Helsinki and approved by the UK National Research Ethics Committee (reference 13/NW/0040). A detailed protocol is available for both datasets as a Supplementary file (Appendix 1 and 2, respectively). The raw HNSCC IMC data from 12 channels (10 markers + 2 intercalator-Ir channels; Supplementary file 2) and the raw DLBCL IMC data from 20 channels (19 markers + nuclei channel; Supplementary file 3) were processed with Python scripts and CellProfiler pipelines according to Schulz et al., 2018 to segment the image data into individual cells (Figure 8—figure supplement 2). The mean marker intensity for each segmented cell was exported (comma-separated values [CSV] file) and either divided into six regions of equal area (HNSCC, three tumor and three stroma regions) or divided by Ki-67 expression (DLBCL, Ki-67 high or low). The mean marker intensity for each segmented cell was multiplied by 65,535 to recover the image intensities (Fluidigm IMC machine = 16 bit, the dynamic range of the measurements is 0 to (216)−1) and asinh (co-factor 0.8) transformed in R Studio. These CSV files were converted into FCS files and gated in Cytobank (Chen and Kotecha, 2014); functions are also available within the ImmunoCluster package to convert SCE or CSV data files into FCS format.

FC data

Request a detailed protocol

FC data from the BM of seven HDs were used to demonstrate the ability of the framework to analyze conventional FC data and identify rare immune cell populations. BM samples from HDs were obtained by extraction of the bone marrow cells from the bone of the femoral head. This noninterventional study was approved by the ethical committee of Cochin-Port Royal Hospital (Paris, France) (CLEP Decision No.: AAA-2020-08039). Femoral heads were obtained after informed consent during hip replacement surgery. These were cut in half and collected in a conservation medium (Hanks balanced salt solution with NaHCO3, Eurobio), supplemented with heparin (7%) and then transported to the laboratory at room temperature (RT). These were scraped with a spatula, ground in a mortar, and washed with a PBS solution supplemented with 100 µg/ml DNAse (Sigma Aldrich). The FCS files were gated for singlets using a FSCint/FSCpeak dot plot, and dead cells were removed using a forward scatter (FSC) and side scatter (SSC) dot plot. Leukocytes were gated on a CD45/SSC dot plot. Finally, they were gated to identify the CD3+CD4+ population of cells for analysis in the ImmunoCluster framework (gated in Cytobank; Chen and Kotecha, 2014).

Workflow overview

Request a detailed protocol

The ImmunoCluster package is accessible via GitHub (https://github.com/kordastilab/ImmunoCluster), where a detailed step-by-step walkthrough of the tool and R Markdown files (RMD) for LMC/FC and IMC data analysis are available. The RMD files permit users to logically work through the framework in sections in a ‘click and play’ manner. Additionally, all scripts to replicate figures published here are available via GitHub. Users will need to have R downloaded to run the ImmunoCluster tool. The ImmunoCluster framework provides tools and support, allowing researchers to follow a workflow that guides them through experimental design, data analyses, interpretation, and statistical significance testing, to publishable graphics identifying differences in phenotype and abundance of cells between conditions analyzed (Figure 2 and Figure 7 and Figure 8). The framework comprises three core computational stages that are conducted by the ImmunoCluster tool:

ImmunoCluster workflow stages and SingleCellExperiment structure.

(A) Schematic outlining the typical cytometry workflow and its interactions with the SingleCellExperiement ImmunoCluster object. (B) SingleCellExperiment structure in ImmunoCluster. The SingleCellExperiment class is a data container, storing multiple layers of data to create the SingleCellExperiment object, which holds all relevant data for an experiment. Feature metadata: imported by the user in the form of a panel_metadata file, which is a table containing all markers measured, each annotated with either lineage or functional information for downstream analysis. Primary and transformed data: the imported expression data is stored in an assay; additionally, the scaled data (arcsinh transformed) is also stored in an assay, meaning both can be easily accessed. Cell metadata: the first metadata added to this element of the structure will be a sample_metadata file imported by the user, containing any relevant metadata for the experiment, that is, days after treatment and graft versus host disease or none. Throughout the ImmunoCluster tool, more layers of metadata are added to cell metadata, that is, cell cluster identification (FlowSOM and Phenograph). Dimension reductions: dimensionality reduction coordinates, such as UMAP and tSNE, are stored and can be easily accessed throughout the ImmunoCluster tool for downstream analyses.

Stage 1: Data import and quality control

Request a detailed protocol
  1. We provide tools designed primarily for computational preprocessing of FCS data files for downstream applications including parameter renaming and transformation. The generated high-dimensional data are imported into the computational framework as FCS file formats or summarized feature expression values in CSV tabular format.

  2. Associated metadata files need to be completed by the researcher in the experimental design stage (Figure 7). This includes a sample_metadata file, in which all metadata relating to samples will need to be input, such as timepoints, response to treatments, and patient status (Figure 8—figure supplement 3A). For simplicity in executing this step, the panel_metadata file allows users to rename parameters, such as the markers used in the analysis as well as select the markers that are to be used for different analyses stages (all markers may not need to be included in the following steps) (Figure 8—figure supplement 3B).

  3. All of the above-mentioned data is stored within a SingleCellExperiment object (Lun, 2017). The SingleCellExperiment is an S4 class object that is implemented into the ImmunoCluster framework and is in essence a data container in which you can store and retrieve information such as metal-barcoding for sample multiplexing, metadata, dimensionality reduction coordinates, and more (outlined in Figure 7). FCS files can also be exported from the SingleCellExperiment using the write_sce_to_fcs function.

  4. After the data and metadata files have been imported and stored in the SingleCellExperiment object, we provide several approaches for initial exploratory visualization of the data. One example of this is the MDS plots. The median marker expression data from each sample are used to create MDS plots, and therefore allow for an initial visualization of data points (labeled with metadata) and show the similarity and differences between samples in two-dimensions. Additionally, these plots can be used to detect batch effects and other technical artifacts such as antibody staining anomalies. Moreover, a heatmap of expression of markers measured for each patient can be created and metadata can be used to annotate the heatmap.

Stage 2: Dimensionality reduction and unsupervised clustering

Request a detailed protocol
  1. Dimensionality reduction is a key component of high-dimensional single-cell data analyses and enables researchers to view high-dimensional data. For example, the 33 protein markers analyzed by Hartmann et al., 2019 can be reduced to lower dimensional embedded coordinates per cell. Nonlinear dimensionality reduction techniques can avoid overcrowding and represent data in distinct cell islands (Figure 8).

  2. The framework offers users three nonlinear statistical methods for representing high-dimensional single-cell data in low-dimensional space (Figure 8—figure supplement 4). MDS (utilized in stage 1 for initial data exploration), uniform manifold approximation and projection (UMAP) (Becht et al., 2019), and t-Distributed Stochastic Neighbor Embedding (tSNE) (van der Maaten and Hinton, 2008) algorithms are available to generate a two-dimensional (2D) embedding of the data. However, the SingleCellExperiment object can store any other form of arbitrary dimensionality reduction, such as principal component analysis (Blighe, 2020b) or DiffusionMaps (Angerer et al., 2016), among others.

  3. The density of cells within the distinct cell islands produced by the 2D embedding of multidimensional data can be viewed using the density_plot function (Figure 2—figure supplement 1).

  4. After the data has been visualized in dimensionality reduced space, clustering algorithms can be used to identify cell communities within the data, which allows the identification of phenotype and abundance of cell clusters within populations/groups being analyzed. A number of clustering algorithms are available within the framework to define cell populations of interest in an unbiased manner. First, an ensemble (use of multiple algorithms) clustering method of FlowSOM (Van Gassen, 2015b) and Consensus clustering (ConsensusClusterPlus R package, Wilkerson and Waltman, 2013), which tests the stability of the clusters, leading to better results than applying a basic hierarchical clustering algorithm (Weber and Robinson, 2016). Additionally, the PhenoGraph clustering algorithm is available, which uses K-nearest neighbors and a Euclidean distance metric (DiGiuseppe et al., 2018). Lastly, k-means clustering, which classifies cells into clusters with high intra-class similarity and low inter-class similarity, where each cluster is represented by its center (centroid) that corresponds to the mean of points (e.g., markers) assigned to the cluster. Either or all clustering algorithms can be applied at the users’ discretion.

  5. The aim of the clustering algorithms is to assign all cells to K clusters (K1, K2, …), resulting in clusters corresponding to true cell types. Typically, a K larger than the number of expected cell types is chosen at this stage as the ImmunoCluster framework allows users to explore all K clusters (Figure 8—figure supplement 5), as well as collapsing clusters of the same cell type into one cluster if the user feels over-clustering has occurred. Over-clustering may enable the clustering identification of rare cell types, sometimes at the expense of creating several ‘artificial’ clusters of the more prevalent cell types. Additionally, the workflow allows the generation of an elbow plot that is created to help the user to select an appropriate number of clusters (Figure 8—figure supplement 6).

Figure 8 with 6 supplements see all
ImmunoCluster workflow overview.

(0) Samples are stained/treated and measured, for imaging mass cytometry the tissue is segmented, and regions selected for further downstream analysis. After measurement, the raw data is normalized, concatenated, (combining flow cytometry standard files from the same samples, which may have been split due to large sample volume or technical issues) and gated, before importing into ImmunoCluster. (1) Quality control of data is carried out before analysis. (2) Data is reduced to two dimensions using either UMAP or tSNE algorithm, and data are clustered using the FlowSOM or Phenograph algorithms (these algorithms were selected as they are both high-performing unsupervised clustering algorithms; an in-depth comparison has previously been described by Weber and Robinson, 2016). (3) Data is visualized, and metadata, clusters, and cell-type labels are used to explore differences between samples/conditions. *A detailed step-by-step guide for using the ImmunoCluster tool is available: https://github.com/kordastilab/ImmunoCluster.

Stage 3: Annotation and differential testing

Request a detailed protocol
  1. After dimensionality reduction and clustering, the next step is to annotate each cluster based on marker expression. If over-clustering has occurred, multiple clusters of the same cell type can be present, and either a lower number of K clusters can be used; additionally, clusters of the same cell type can be collapsed into each other to create one cluster.

  2. Clusters containing multiple cell types may also be identified; this could mean the user has under-clustered or occasionally a group of different cells are clustered together by the algorithms.

  3. Expression of each marker analyzed can be projected onto the dimensionality reduced data by the researcher, aiding with the identification of cell types or phenotypically distinct clusters.

  4. Outputs such as heatmaps showing the expression of all markers for each cluster can be used to identify the clusters of interest (Figure 2—figure supplement 2). These can be used to annotate figures produced in stage 3.

  5. Biaxial dot plots with contouring of the SCE data can be produced using the biaxial_plot function; the expression of markers can be explored in the entire dataset or a subset of the data, that is, a particular cell cluster. A clustering option within biaxial_plot allows users to see the spread of data across two selected markers for different conditions (any metadata from the sample_metadata file), as well as different clusters (e.g., K1-60) (Figure 3—figure supplement 1).

  6. Due to the type of data produced from IMC (lower resolution compared to LMC and FC), a novel function called imc_rank was specifically designed for the IMC data; this creates a heatmap in which the expression of each marker is ranked. This rank heatmap can be used to facilitate the identification of clusters that are ‘high’ or ‘low’ for particular markers to distinguish and logically assign cell populations manually.

  7. These tools help define biological identity of cell clusters from stage 2 and construct a hierarchy of biologically meaningful cell populations based on marker expression and similarity.

  8. The metadata added to the SingleCellExperiment can be used to annotate the dimensionality reduced (i.e., UMAP) and clustered (FlowSOM, Phenograph, or K-means) data, allowing for visualization of the distribution of cell islands between different condition/timepoints (Figure 8).

  9. The above-mentioned metadata is also used to annotate all differential testing outputs, meaning the user can make many comparisons and explore multiple parameters all within the ImmunoCluster framework.

  10. A variety of differential testing outputs can be created, such as median marker expression for each sample (using clustered data) can be used to produce a hierarchical clustered heatmap, and box plots of cell cluster abundance between different conditions at different timepoints (any metadata can be used to annotate these figures).

  11. Statistically significant differences in the annotated populations between conditions are identified and displayed throughout the framework to the user for ongoing investigation and interpretation. Data from the ImmunoCluster-generated SingleCellExperiment object can be easily extracted for downstream analysis with differential analysis packages such as diffcyt (Weber et al., 2019) and cydar (Lun et al., 2017) for differential discovery, which is based on a combination of high-resolution clustering and empirical Bayes moderated tests. In addition, the ImmunoCluster framework offers the stat_test_clust and stat_test_expression functions that use the Wilcox rank-sum or Welch’s T-test to identify which cell clusters are significantly different between conditions (any two-way comparison can be made using metadata input by user), and which markers are significantly different within cell clusters between different conditions, respectively.

Appendix 1

Staining protocol for a human tissue section taken from a patient with head and neck squamous cell carcinoma (HNSCC).

EquipmentConsumablesReagentsReagent initial concentrationReagent final
Concentration
Humidified slide chamber (Sigma Aldrich, #H6644-1EA)*Sterile 3 ml pastettes (Alpha Labs, #LW112)DPBS, Ca, and Mg free (Gibco, #14190-094)
Dedicated glassware, coplin jars (Fisher, #10264732) **Superfrost Plus slides (Fisher, #ES-325)Tween-20 (Sigma Aldrich, #P7949-100 ml)*100%0.1%
Orbital shakerKimberly-Clark Precision Wipers (Fisher, #10623111)*Antigen retrieval solution pH 9.0
(R&D Systems, #CTS013)
1:10 in milli-Q water
Water bath0.2 syringe filter (TripleRed, #FPE204030)Superblock (Thermo Fisher, #37515)**
Fridge50 ml syringes (Fisher, #10636531)Iridium (Fluidigm, #201192B)***500 µM0.5 µM in DPBS
Milli-Q water purification system (18.2 MΩ.cm)Filter pipette tipsMetal tagged antibodies (Fluidigm)Require optimization for your tissue type
Hard slide box for storage in −80°C (VWR #631-1504)PAP pen (Abcam, #ab2601)FcR block (Trustain, Biolegend, # 422302)100%5%
PipettesXylene (Acros, #180862500)
Plastic tweezers, to handle slides (Farlamedical, #400200)Ethanol (Sigma Aldrich, #32221-2.5L)
Plastic coplin (Thermo Fisher, #E9611)
****
BSA (Sigma Aldrich, #A9647)20%
To prepare on the same day freshly
10%
Kiovin Hu IgG
Provide by Hospital’s pharmacy
10%
100 mg/ml
5 mg/ml
* Alternatively, use a pipette tip box with moist towel at the bottom and piece of parafilm on the tray (this prevents evaporation and stops condensation gathering at the back of the slide, which can pool around edges and dilute antibodies).
**Rinse in water, do not clean with detergent or in laboratory dishwasher (source of barium contamination).
**** Plasticware is required for 96°C water bath, used when dedicated histological autoclave or decloaking chamber is not available; you can also use a 50 ml centrifuge tube, loosely capped (Falcon or Corning).
*Normal laboratory hand towels and blue roll are fine to tip excess buffer off slide surface or to place in the bottom of the humidified slide chamber. However, they should not be used to wipe the back of the slides or draw off excess buffer from front – as can transfer lint to the slides.*Permeabilization can also be conducted with Triton X100 or saponin
** Alternative blocking solutions include BSA, human albumin, human immunoglobulin (KIOVIG from Pharmacy), serum same species as antibodies (mouse serum, Sigma Aldrich, # M5905-10ML).
*** DNA stain use alternative histone 3
antibody #3176023D.

STAINING PROTOCOL DAY 1: SECTION CUTTING

  • Cut a 5 µm section from the block at and place in an incubation oven overnight (o/n) at 37°C.

  • Cut a 3 µm section for hematoxylin and eosin (H&E) staining.

STAINING PROTOCOL DAY 2:

1. PREPARATION

  • Prepare xylene and ethanol at 50:50 ratio ~ 50 ml of final volume.

  • Prepare the following ethanol dilutions, with milli-Q water: 96%, 90%, 80%, and 70% (~50 ml of final volume for each).

  • Dilute the 10X basic antigen retrieval buffer in milli-Q water in 50 ml falcon tube (leaving the cap loose) and heat to 96°C in a water bath.

  • *BSA (Sigma Aldrich, #A9647): prepare a 20% solution fresh using, 0.4 g BSA in a plastic tube. Add 1 ml of superblock solution and incubate in the water bath for 1 hr. Once it is fully dissolved, top up to 2 ml (final volume).

  • Prepare the wash buffer using Dulbecco’s phosphate buffered saline (DPBS) (calcium/magnesium free) and 0.1% Tween-20 (500 µl of Tween-20 [Sigma Aldrich, #P7949-100 ml] in the 500 ml DPBS bottle). Filter the wash buffer every time you use it with a 0.2 µm filter. Store at room temperature.

1. Notes for wash steps

Always conduct washes in ~50 ml of buffer in coplin jars on an orbital mixer.

Never wash in laboratory dishwasher or with detergent as this can represent a source of barium contamination. Also, do not use metal slide holders or forceps.

2. STAINING

  • Heat the 1X antigen retrieval solution (1:10 dilute in milli-Q water) to 96°C in a water bath.

  • Heat a coplin jar and a bottle of wash buffer (0.1% Tween-20 in DPBS) to 37°C in a water bath.

  • Prepare the humidified slide chamber (box with a damp tissue in the bottom and lid or piece of parafilm to cover).

  • Prepare blocking and antibody incubation buffer fresh: 10% BSA, 0.1% Tween-20, 1/20 dilution of FcR block (Trustain, Biolegend, #422302) and 1/20 dilution of KIOVIG (Kiovin Hu IgG) in Superblock (Thermo Fisher, #37515) solution. Use Superblock to top up to final volume. Avoid mixing, filtering, or centrifugation of the Superblock buffer prior to pipetting, let solution settle at 4°C and pipette from top of the solution to avoid any aggregate material (that will have sunk to the bottom).

  • Conduct xylene washes in a safety cabinet.

  • Use separate coplin jars.

  • The xylene should be changed regularly, especially the first coplin jar in the sequence, which will have the most wax dissolved in it (readily become cloudy and less efficient).

1. CLEAR WAX FROM SLIDES (you sequentially remove the wax, with each wash in fresh xylene)

  • Incubate slide 1 hr at 60°C.

  • Agitate slides (up and down motion) by hand, three washes at RT:

  • Xylene 100% for 5 min.

  • Xylene 100% for 5 min.

  • 50:50 xylene:ethanol mix for 10 min.

1. REHYDRATE TISSUE (outside hood, use plastic tweezers)

  • Remove slides from xylene:ethanol mix in the hood, transfer to 100% ethanol for 5 min, and remove from cabinet.

  • Transfer slides to 90% ethanol – mix on orbital shaker for 5 min.

  • Transfer slides to 80% ethanol – mix on orbital shaker for 5 min.

  • Transfer slides to 70% ethanol – mix on orbital shaker for 5 min.

  • Remove slides and transfer to DPBS – mix orbital shaker for 10 min.

  • Slide can stay in DPBS until the antigen retrieval solution is ready.

RETRIEVE ANTIGEN

Fluidigm have validated all their FFPE antibodies using 30 min antigen retrieval, with basic buffer, in a 96°C water bath.

  • Heat 1X antigen retrieval solution to 96°C in the water bath (in plastic coplin jar or closed falcon 50 ml centrifuge tube).

  • When the slides are added, the temperature may drop – do not start the timer until the probe indicates the temperature in the tube has returned to 96°C.

  • Incubate slides in 1X antigen retrieval solution to 96°C in the water bath (in closed 50 ml falcon centrifuge tube) and incubate for 30 min.

  • Cool down closed 50 ml falcon tube to around RT under cold running water.

  • Transfer slides to DPBS and wash on orbital shaker for 5 min.

  • Dry slides with tissue wipers (Thermo Fisher) and leave to dry for 10 s.

  • Draw a wax ring around the tissue.

  • Wash twice with filtered wash buffer (0.1% Tween-20) in DPBS on the orbital shaker for 8 min. Filter it every time you use the wash buffer with 0.2 µm filter and store at RT.

  • Leave slides in the wash buffer until the block solution is ready.

1. BLOCK

  • While sections are in permeabilization buffer (0.1% Tween-20), prepare sufficient blocking buffer to cover the tissue, within the wax ring.

  • For a very small diameter wax ring, 50 µl will suffice; however, a large piece of tissue (>1 cm diameter) will require ~250 µl.

Blocking buffer preparation
SolutionDilute inInitial concentrationFinal
concentration
Dilution
factor
Per 100 µlPer
300 µl
Tween-20 (Sigma Aldrich, #P7949-100 ml)Superblock solutionPrepared first at 10%
Then at 1%
0.1%101030
Kiovin Hu IgGSuperblock solution10%
100 mg/ml
5 mg/ml20515
FcR block (Trustain, Biolegend, # 422302)Superblock solution100%5% (1/20)20515
BSA (Sigma Aldrich, #A9647)Superblock solution20%
(10 g in 100 ml = 10%)
10%250150
Superblock (Thermo Fisher, #37515)To top up
Until your final volume
3090

When you add the buffer to the inside of the wax circle, it should be contained and fully cover the tissue.

  • Remove slides from DPBS and remove excess buffer by tipping slide.

  • Use a lint-free tissue to dry the back of the slide thoroughly and to gently draw excess buffer away from the front, without touching the tissue or wax ring.

  • Transfer to humidified box, cover tissue with blocking buffer.

  • Incubate at RT for 90 min.

1. PREPARE ANTIBODY MIX AND INCUBATE

Antibody mix
Cat#SupplierTagTarget/
solution
Dil factor
(1:X)
Initial concentrationFinal concentration
3170019DFluidigm170CD3 (polyclonal, C-terminal)400
3156033DFluidigm156CD4100
3152016DFluidigm152CD451000
3162035DFluidigm162CD8-alpha800
3161029DFluidigm161CD20250
MAB1561R&D Systems150PD-L150
3158029DFluidigm158E-cadherin1000
3159035DFluidigm159CD68400
3165039DFluidigm165PD-150
3168022DFluidigm168Ki-67400
#A9647Sigma AldrichBSA
Prepare fresh
220%10%
#422302BiolegendFCR20100%5%
Provided by hospital’s pharmacyIgG Hu Kiovig20100 mg/ml5 mg/ml
#P7949Sigma AldrichTween-20101%0.1%
#37515Thermo FisherSuperblockTo top up to the required volume

You will require the same volume of antibody mix used for blocking solution and tip off excess blocking buffer onto tissue paper.

Use a lint-free tissue to dry the back of the slide thoroughly and to gently draw excess buffer away from the front, without touching the tissue or wax ring.

Incubate slide o/n with antibody mix at 4°C in humidified chamber.

STAINING PROTOCOL DAY 3

1. WASH IN PERMEABLIZATION BUFFER

  • Tip off the antibody mix onto tissue paper.

  • Flush the slide surface gently but thoroughly with a pastette and permeabilization buffer (DPBS with added 0.1% Tween-20) at RT.

  • Wash in the same RT buffer ~50 ml in coplin jar on the mixer for 8 min.

  • Repeat wash step for another 8 min.

1. WASH IN DPBS

  • In the same jar, decant 0.1% Tween-20 DPBS wash buffer and add further wash buffer at RT and then place on the mixer for 8 min.

  • Repeat wash step for another 8 min.

1. DNA STAIN

For initial testing, prepare iridium (Fluidigm) in DPBS at a final concentration of 0.5 µM (may require some optimization for your tissue type).

Dilute 125 µM iridium stock (Fluidigm 201192A) 1/250 with DPBS.

  • Tip off excess DPBS onto tissue paper.

  • Use a lint-free tissue to dry the back of the slide thoroughly and to gently draw excess buffer away from the front, without touching the tissue or wax ring.

  • Place slide in humidified chamber and load iridium solution into the wax ring so that it covers the tissue.

  • Incubate at RT for 30 min.

  • Flush the slide surface thoroughly with RT DPBS gently using a pastette.

  • Wash in RT DPBS on the mixer for 5 min.

  • Wash in milli-Q water ~50 ml in a clean coplin jar on the mixer for 5 min.

  • Air dry.

Slides can be stored for several months at RT in a cool, dry, dust-free storage container.

Appendix 2

Protocol for sectioning and staining of diffuse large B-cell lymphoma lymph node section

  1. A 5 μm tissue section was deparaffinized with xylene and sequentially rehydrated in graded ethanol. Heat-induced antigen retrieval was performed in an automated pressure cooker (Menapath Antigen Access Unit, Menarini) at 125°C for 2 min in Antigen Retrieval Reagent-Basic (R&D Systems).

  2. The tissue was then permeabilized with DPBS, 0.1% Tween-20 for 15 min, and blocked with Superblock (Thermo Fisher Scientific), 0.1% Tween-20, Fc Receptor Blocking solution (1:20; Biolegend; blocking buffer) for 2 hr at RT.

  3. Antibodies were diluted together according to Supplementary file 3 in blocking buffer and applied to the tissue overnight at 4°C.

  4. After washing, the slide was incubated with the DNA intercalator (Cell-ID Intercalator-Ir, Fluidigm) for 30 min at RT.

  5. The slide was then briefly washed with milli-Q water and air dried.

  6. Acquisition of 1 mm2 tissue region was carried out using the Hyperion imaging system (Fluidigm) at 200 Hz, with a resolution of 1 μm/pixel.

Data availability

The liquid mass cytometry dataset is available from FlowRepository (http://flowrepository.org/id/FR-FCM-Z244), the additional three datasets: 'Imaging mass cytometry data: Head and neck squamous cellcarcinoma tissue section', 'Flow cytometry data: healthy donor bone marrow taken during hip surgery', and 'Imaging mass cytometry data: Head and neck squamous cell carcinoma tissue section', are available from Dryad (https://datadryad.org/stash).

The following data sets were generated
The following previously published data sets were used
    1. Hartmann FJ
    2. Babdor J
    3. Gherardini PF
    4. Amir ED
    5. Jones K
    6. Sahaf B
    7. Marquez DM
    8. Krutzik P
    9. O'Donnell E
    10. Sigal N
    11. Maecker HT
    12. Meyer E
    13. Spitzer MH
    14. Bendall SC
    (2019) Flow Repository
    ID FR-FCM-Z244. Identification of disease-associated immune signatures following bone marrow transplantation.

References

  1. Book
    1. Ashhurst TM
    (2019) Analysis of the Murine Bone Marrow Hematopoietic System Using Mass and Flow Cytometry
    In: McGuire H. M, Ashhurst T. M, editors. Mass Cytometry: Methods and Protocols. Springer. pp. 159–192.
    https://doi.org/10.1007/978-1-4939-9454-0
    1. Ashhurst TM
    (2020) Integration, exploration, and analysis of high-dimensional single-cell cytometry data using spectre
    Cytometry. Part a : The Journal of the International Society for Analytical Cytology 10:e24350.
    https://doi.org/10.1002/cyto.a.24350
  2. Software
    1. Blighe K
    (2020a) scDataviz: Single Cell Dataviz and Downstream Analyses
    scDataviz: Single Cell Dataviz and Downstream Analyses.
  3. Software
    1. Blighe AL
    (2020b) PCAtools: PCAtools: Everything Principal Components Analysis
    PCAtools: PCAtools: Everything Principal Components Analysis.
  4. Software
    1. Hammill D
    (2020)
    CytoExploreR: Interactive Analysis of Cytometry Data
    CytoExploreR: Interactive Analysis of Cytometry Data.
    1. Laurens van der M
    (2014)
    Accelerating t-SNE using Tree-Based algorithms
    Journal of Machine Learning Research 15:3221–3245.
  5. Software
    1. Lun A
    (2017)
    SingleCellExperiment: S4 Classes for Single Cell Data
    SingleCellExperiment: S4 Classes for Single Cell Data.
  6. Software
    1. Nowicka M
    (2019)
    CyTOF Workflow: Differential Discovery in High-Throughput High-Dimensional Cytometry Datasets
    CyTOF Workflow: Differential Discovery in High-Throughput High-Dimensional Cytometry Datasets.
    1. van der Maaten L
    2. Hinton GJ
    (2008)
    Visualizing data using t-SNE
    Journal of Machine Learning Research 9:2579–2605.
  7. Software
    1. Van Gassen S
    (2015a) FlowSOM. Bioconductor Version: Release (3.12)
    FlowSOM. Bioconductor Version: Release (3.12).
  8. Software
    1. Wilkerson M
    2. Waltman P
    (2013)
    ConsensusClusterPlus: ConsensusClusterPlus
    ConsensusClusterPlus: ConsensusClusterPlus.

Decision letter

  1. Simon Yona
    Reviewing Editor; The Hebrew University of Jerusalem, Israel
  2. Tadatsugu Taniguchi
    Senior Editor; Institute of Industrial Science, The University of Tokyo, Japan
  3. Evan W Newell
    Reviewer; Fred Hutchinson Cancer Research Center, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The revised manuscript now has a clearer analysis workflow. ImmunoCluster is a user-friendly pipeline R package for the analysis of flow and mass cytometry experiments and for imaging mass cytometry. We hope this provides non-computational immunologists with an opportunity to use dimensionality reduction, unsupervised clustering and differential expression/abundance analyses.

Decision letter after peer review:

Thank you for submitting your article "ImmunoCluster: A computational framework for the non-specialist to profile cellular heterogeneity in cytometry datasets" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Tadatsugu Taniguchi as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Evan W Newell (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

The authors propose ImmunoCluster, a user-friendly pipeline R package for the analysis of flow and mass cytometry experiments and for imaging mass cytometry. The claim is that this approach and package is easy to use for non-computational immunologists and facilitates dimensionality reduction, unsupervised clustering, and differential expression/abundance analyses.

While this software is useful, the examples provided are good, there is very little/no novelty in the approach provided. The reviewers main concern is that the paper in its current form does not propose any new methodology with regards to clustering, visualization, or differential analysis. It was felt that the paper in its current form combines existing methodology in a user-friendly way that could indeed be important for biologists. In addition, the paper should acknowledge similar packages that already exist (eg CytofKit https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005112).

The combined reviewers' comments can be found below. For this manuscript to merit further consideration at eLife, clearer descriptions of ease of use of Immunocluster compared to other methods and other distinguishing features should be better highlighted and additional benchmarking comparisons.

Summary:

This tool enables advances but other tools already available that are sufficient and therefore the authors will need to convince a large portion of the researcher to use this tool, better comparisons should be provided to help in answering this question and then also better convince readers to use this software.

Essential revisions:

The authors note that several other similar packages influenced the development of ImmunoCluster (Refs 13-16) but it is unclear what makes this package novel. For instance, how does this workflow compare to other packages such as CATALYST (referenced by the authors) for quantifying cell cluster abundances and assessment of differential composition, which has been used in various ways in several publications (Crowell et al. F1000Res. 2020 [this one is new so not yet possible to have referenced], Weber et al., Communications Biology 2019, Nowicka et al., F1000Res. 2017, and related Fonseka et al. Sci. Trans. Med. 2018)? With regards to the differential abundance testing, there has been some excellent work in recent years, including DiffCyt (https://www.nature.com/articles/s42003-019-0415-5) and Cydar (https://pubmed.ncbi.nlm.nih.gov/28504682/). These methods among others seem to be the state-of-the-art in differential abundance analysis. How do your differential analysis results compare to these others, for example? Can you implement these methods in your pipeline?

The authors mention that the pipeline can accommodate millions of cells. Is this the case for all the clustering and visualization steps, or do you require downsampling? It would be helpful to see an analysis of run-time/details of the computer used to analyse the data. It appears the clustering methods implemented here (phenograph for example) might not work well if there are millions of cells across multiple samples.

In the MDS plots of samples (see Figure 3A), it is unclear what features are used to project the samples. Could you please elaborate on this point?

Does this current approach have any means for assessing or correcting for batch effects within the datasets? The authors only mention sample barcoding as an experimental plan to reduce batch effects, but fail to mention of any strategy to assess or correct for batch effects, which is a very important consideration that should be addressed.

Similarly, although channel spillover and differences in cell numbers were mentioned, no means were provided to assess or rectify these effects using the approach described.

https://doi.org/10.7554/eLife.62915.sa1

Author response

Essential revisions:

The authors note that several other similar packages influenced the development of ImmunoCluster (Refs 13-16) but it is unclear what makes this package novel. For instance, how does this workflow compare to other packages such as CATALYST (referenced by the authors) for quantifying cell cluster abundances and assessment of differential composition, which has been used in various ways in several publications (Crowell et al. F1000Res. 2020 [this one is new so not yet possible to have referenced], Weber et al., Communications Biology 2019, Nowicka et al., F1000Res. 2017, and related Fonseka et al. Sci. Trans. Med. 2018)? With regards to the differential abundance testing, there has been some excellent work in recent years, including DiffCyt (https://www.nature.com/articles/s42003-019-0415-5) and Cydar (https://pubmed.ncbi.nlm.nih.gov/28504682/). These methods among others seem to be the state-of-the-art in differential abundance analysis. How do your differential analysis results compare to these others, for example? Can you implement these methods in your pipeline?

We thank the reviewers for their comments. We have highlighted and discussed below where we have made revisions to address the comments raised. In response, we have added a new paragraph to the introduction that describes frameworks which influenced the development of ImmunoCluster and provide comparisons with other available frameworks, e.g. CATALYST. We use this discussion to highlight and more clearly emphasize the unique aspects provided by the ImmunoCluster framework, such as the easily interpreted imaging mass cytometry data analysis, customizable figure outputs, and adaptive down-sampling. Each discussed in detail in the manuscript and below. We hope that these comparisons now more clearly define ImmunoCluster’s niche in this area.

Influential packages and comparison with other available frameworks

Lines 47-54: “A number of previous computational cytometry workflows have been proposed, including CyTOF workflow [1], Diggins et al.[2] and CapX[3]. […] We have therefore developed ImmunoCluster, building on and extending the framework proposed in the CyTOF workflow[1] to create modular, flexible and easy-to-use implementations of cytometry analysis pipelines.”

Lines 76-92: ImmunoCluster also contains functionality foradaptive down-sampling at the import and dimensionality reduction stages of the pipeline to overcome significant increases in runtime associated with large datasets in dimensionality reduction algorithms like UMAP and tSNE.[…] Methods for analyzing large cytometry datasets, in particular IMC datasets, in an open-source computational environment are currently limited.”

Additional advantages

We would also like to highlight below some of the key novel aspects of ImmunoCluster which are now emphasized within the manuscript.

Easily interpreted imaging mass cytometry data analysis: ImmunoCluster pipeline has the ability to carry out downstream imaging mass cytometry (IMC) data analysis. This allows users to run IMC data in line with flow cytometry (FC) and liquid mass cytometry (LMC) data analyses. Due to the lower resolution of IMC data (compared with FC and LMC) we specifically designed the ‘ranked expression heatmap’ function to work with IMC data, which allows users to rank markers measured to help identify cell populations. Presented in Figures 4 and 5.

lines 85-92: “Importantly, and unique to ImmunoCluster, the scope of functionality in this package also enables the analysis of IMC data. […] Methods for analyzing large cytometry datasets, in particular IMC datasets, in an open-source computational environment are currently limited.”

Customizable figure outputs: In addition to the section mentioned above we have provided a new Supplementary Figure (Figure 1—figure supplement 1) to showcase our user-friendly ‘customizable figure outputs ’, detailing a variety of tailor-made figures produced by ImmunoCluster. We offer flexibility throughout the framework; researchers can easily adapt all figures and outputs to suit their specific needs. The outputs from ImmunoCluster are created using the ggplot package which generates ggplot graphical objects that are flexible and easily modified, resulting in an abundance of tailored outputs for the user to assess and use in reports, presentations, and publications. One example is the outputs a user can create from the dimensionality reduced data (e.g. UMAP algorithm). Users can project marker expression onto UMAP plots to help visually identify the cell types represented in the islands. The density of cells in each island can also be visualized using a density plot function. After clustering (e.g. FlowSOM), clusters can be projected onto the UMAP plot, the number of clusters can be easily changed so users can visually inspect different total number of clusters. Once users have assigned cell types, these can also be projected onto the UMAP plot and help users visually inspect their cluster assignments. Furthermore, researchers can use the metadata to colour and split the UMAP plots, allowing users to identify which cell islands are present in certain conditions or at particular time points (Figure 1—figure supplement 1).

Lines 107-111: “ImmunoCluster also offers user-friendly flexibility throughout the framework, researchers can easily adapt all figures and outputs to suit their specific needs resulting in an abundance of tailored outputs for the user to assess and use in publications, reports and presentations. An example detailing a variety of tailor-made figures produced by ImmunoCluster can be found in Figure 1—figure supplement 1.”

Adaptive down-sampling: ImmunoCluster also contains functionality for adaptive down-sampling at the import and dimensionality reduction stages of the pipeline to overcome significant increases in runtime associated with large datasets in dimensionality reduction algorithms like UMAP and tSNE.

Lines 76-78: “ImmunoCluster also contains functionality for adaptive down-sampling at the import and dimensionality reduction stages of the pipeline to overcome significant increases in runtime associated with large datasets in dimensionality reduction algorithms like UMAP and tSNE.”

Differential abundance testing

The ImmunoClusterstat_test_clust function allows users to carry out t-test and Wilcoxon rank sum test and export results tables which we feel enrich the utility of the pipeline for the basic research scientist, who will now be able to more easily analyze and interrogate statistically significant biological changes within the data.

In response to the reviewers’ comment regarding differential abundance testing, we carried out a concurrent test using the diffCyt framework with the Hartmann et al.[9] data to benchmark the output of the pipeline with currently available ‘gold standard’ tools. Additionally, we highlight that users can easily extract the data from theImmunoCluster generated SingleCellExperiment object and carry out downstream analyses with other packages such as diffcyt. Text now reads:

Lines 172-179: “We compared ImmunoCluster’s differential abundance testing output (stat_test_clust), run in the t-test mode, with the Hartmann et al. data[9] into thediffcyt computational framework [6], which is a state-of-the-art tool fordifferential discovery analyses. Diffcyt identified the same three cell clusters (memory B-cells, naïve B-cells and naïve CD4+ T-cells) as differentially abundant (FDR p<0.05) between the GvHD and none conditions. This demonstrates concordance with ImmunoCluster’s statistical output that identified naïve B-cells and naïve CD4+ T-cells as one of the principle differentially abundant populations, which was also identified in the original analysis of the data (Supplementary file 4).”

Lines 488-495: “Data from the ImmunoCluster generated SingleCellExperiment object can be easily extracted for downstream analysis with differential analysis packages such as diffcyt [6] and cydar [10] for differential discovery, which is based on a combination of high-resolution clustering and empirical Bayes moderated tests. In addition, the ImmunoCluster framework offers the stat_test_clust and stat_test_expression functions which use the Wilcox rank sum or Welch’s T-test to identify which cell clusters are significantly different between conditions (any two-way comparison can be made using metadata input by user), and which markers are significantly different within cell clusters between different conditions, respectively.”

The authors mention that the pipeline can accommodate millions of cells. Is this the case for all the clustering and visualization steps, or do you require downsampling? It would be helpful to see an analysis of run-time/details of the computer used to analyse the data. It appears the clustering methods implemented here (phenograph for example) might not work well if there are millions of cells across multiple samples.

The ImmunoCluster package contains functionality to adaptively down-sample cell number at both the data import and dimensionality reduction stage. For computationally intensive dimensionality steps such as UMAP and tSNE down-sampling is likely required for large datasets with >1Million cells. Down-sampling is not required for clustering depending on the method selection, as suggested Phenograph will not scale effectively, but the FlowSOM method implemented will scale to large datasets. We do not require down-sampling for any of the additional visualization steps for the large 2.3 Million cell published LMC GvHD dataset analyzed. In response to the reviewer’s comment we have now provided a new Supplementary Figure with runtimes for a typical pipeline generated using ImmunoCluster and at different levels of down-sampling (Figure 1—figure supplement 2). We have also modified the text to read:

Lines 124-126: “We observed that a typical pipeline run on the full 2.3 million cell LMC GvHD dataset (with UMAP down-sampling to 500k cells) would take approximately 110 minutes on a 2.9 GHz Intel Core i7 Macbook pro with 16 GB RAM (Figure 1—figure supplement 2).”

Lines 504-514: “The inclusion of adaptive down-sampling of cells for the running of computationally intensive dimensionality reduction steps, using tools such as UMAP and tSNE, allows ImmunoCluster to generate cytometry analysis pipelines that readily scalable to a dataset of millions of cells across several samples and a variety of experimental or phenotypic conditions. […] Whilst the PhenoGraph clustering method does not readily scale, the FlowSOM-based clustering method can be implemented on much larger datasets.”

In the MDS plots of samples (see Figure 3A), it is unclear what features are used to project the samples. Could you please elaborate on this point?

Many thanks for spotting this important point, our apologies this wasn’t clear. In response, the median marker expression data in each sample were used to create the MDS plot, we have added this clarification in both the text and figure legend.

Lines 408-412: “One example of this is the Multidimensional scaling (MDS) plots.The median marker expression data from each sample are used to create MDS plots, and therefore allow for an initial visualization of data points (labelled with metadata) and show the similarity and differences between samples in two-dimensions.”

Lines 617-621: “Figure 1. Initial exploration of LMC data from patients with leukemia who received bone marrow transplants. (A-B) Multidimensional scaling (MDS) of data, median marker expression data from each sample were used to create plots, annotated with condition (GvHD or none) (A), and time after bone marrow transplant (BMT) treatment (B). C Heatmap showing the median marker expression for each patient.”

Does this current approach have any means for assessing or correcting for batch effects within the datasets? The authors only mention sample barcoding as an experimental plan to reduce batch effects, but fail to mention of any strategy to assess or correct for batch effects, which is a very important consideration that should be addressed.

We agree with the reviewer’s comment that assessing and correcting for batch effects is very important, therefore in response we have now added a specific paragraph discussing strategies used for batch correction to the manuscript which will further guide and support the utilization of ImmunoCluster. Text now reads:

Lines 282-292: “Technical inter-sample marker signal variability due to batch effects may impact the ability of unsupervised analysis to reliably detect certain populations, if significant batch effects are present. […] Statistical methods for batch correction may also be applied, in recent years a class of methods called Remove Unwanted Variation (RUV) have been developed, and CytofRUV [12] is a recently developed package specifically designed for CyTOF dataset batch correction.”

Similarly, although channel spillover and differences in cell numbers were mentioned, no means were provided to assess or rectify these effects using the approach described.

In response, we have also now expanded our description on how to rectify channel spillover within the text.

Lines 298-304: “Although channel spillover is diminished in mass cytometry, it still exists in fluorescence-based FC, and should be considered when designing antibody panels to reduce the effects on introducing cell phenotype artifacts in downstream unsupervised analysis. […] where cells are stained with all fluorescently-tagged antibodies except for the one of interest [13].”

Furthermore, we have also expanded our descriptions around the potential issues associated with cell number disparity between samples.

Lines 292-298: “If there are significant differences in the total number of cells recovered between samples, samples with many more cells may bias the clustering. […] If a particular sample has very few cells they may need to be excluded from the analysis as they may not be a representative sample.”

References:

https://github.com/HelenaLC/CATALYST

https://github.com/kevinblighe/scDataviz

1. Nowicka, M., et al., CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets [version 4; peer review: 2 approved]. 2019. 6(748).2. Diggins, K.E., P.B. Ferrell, Jr., and J.M. Irish, Methods for discovery and characterization of cell subsets in high dimensional mass cytometry data. Methods (San Diego, Calif.), 2015. 82: p. 55-63.3. Ashhurst, T.M., et al., Analysis of the Murine Bone Marrow Hematopoietic System Using Mass and Flow Cytometry, in Mass Cytometry: Methods and Protocols, H.M. McGuire and T.M. Ashhurst, Editors. 2019, Springer New York: New York, NY. p. 159-192.4. Crowell H, Z.V., Chevrier S, Robinson M, CATALYST: Cytometry dATa anALYSis Tools. R package version 1.12.2, . 2020.5. Chen, H., et al., Cytofkit: A Bioconductor Package for an Integrated Mass Cytometry Data Analysis Pipeline. PLOS Computational Biology, 2016. 12(9): p. e1005112.6. Weber, L.M., et al., diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering. Communications Biology, 2019. 2(1): p. 183.7. Ashhurst, T.M., et al., Integration, exploration, and analysis of high-dimensional single-cell cytometry data using Spectre. bioRxiv, 2020: p. 2020.10.22.349563.8. Blighe, K., “scDataviz: single cell dataviz and downstream analyses.” . 2020.9. Hartmann, F.J., et al., Comprehensive Immune Monitoring of Clinical Trials to Advance Human Immunotherapy. Cell Rep, 2019. 28(3): p. 819-831.e4.10. Lun, A.T.L., A.C. Richard, and J.C. Marioni, Testing for differential abundance in mass cytometry data. Nature Methods, 2017. 14(7): p. 707-709.11. Schulz, A.R. and H.E. Mei, Surface Barcoding of Live PBMC for Multiplexed Mass Cytometry. Methods Mol Biol, 2019. 1989: p. 93-108.12. Trussart, M., et al., Removing unwanted variation with CytofRUV to integrate multiple CyTOF datasets. eLife, 2020. 9: p. e59630.13. Roederer, M., Spectral compensation for flow cytometry: Visualization artifacts, limitations, and caveats. 2001. 45(3): p. 194-205.

https://doi.org/10.7554/eLife.62915.sa2

Article and author information

Author details

  1. James W Opzoomer

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Jessica A Timms and Kevin Blighe
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6842-756X
  2. Jessica A Timms

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    James W Opzoomer and Kevin Blighe
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3687-9312
  3. Kevin Blighe

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    James W Opzoomer and Jessica A Timms
    Competing interests
    No competing interests declared
  4. Thanos P Mourikis

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Software, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Nicolas Chapuis

    Institut Cochin, Institut National de la Santé et de la Recherche Médicale U1016, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 8104, Université Paris Descartes, Paris, France
    Contribution
    Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Richard Bekoe

    UCL Cancer Institute, Paul O'Gorman Building, University College London, London, United Kingdom
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  7. Sedigeh Kareemaghay

    Centre for Host Microbiome Interaction, FoDOCS, King’s College, Guy’s Hospital, London, United Kingdom
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  8. Paola Nocerino

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  9. Benedetta Apollonio

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  10. Alan G Ramsay

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0452-0420
  11. Mahvash Tavassoli

    Centre for Host Microbiome Interaction, FoDOCS, King’s College, Guy’s Hospital, London, United Kingdom
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  12. Claire Harrison

    1. School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    2. Haematology Department, Guy’s Hospital, London, United Kingdom
    Contribution
    Funding acquisition, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  13. Francesca Ciccarelli

    1. School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    2. Cancer Systems Biology Laboratory, The Francis Crick Institute, London, United Kingdom
    Contribution
    Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  14. Peter Parker

    1. School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    2. Francis Crick Institute, London, United Kingdom
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  15. Michaela Fontenay

    Institut Cochin, Institut National de la Santé et de la Recherche Médicale U1016, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 8104, Université Paris Descartes, Paris, France
    Contribution
    Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5492-6349
  16. Paul R Barber

    1. School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    2. UCL Cancer Institute, Paul O'Gorman Building, University College London, London, United Kingdom
    Contribution
    Resources, Software, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8595-1141
  17. James N Arnold

    School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    Contribution
    Conceptualization, Resources, Software, Supervision, Funding acquisition, Investigation, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Shahram Kordasti
    For correspondence
    james.n.arnold@kcl.ac.uk
    Competing interests
    No competing interests declared
  18. Shahram Kordasti

    1. School of Cancer and Pharmaceutical Sciences, King’s College London, Faculty of Life Sciences and Medicine, Guy’s Hospital, London, United Kingdom
    2. Haematology Department, Guy’s Hospital, London, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    James N Arnold
    For correspondence
    shahram.kordasti@kcl.ac.uk
    Competing interests
    Honoraria: Beckman Coulter, GWT-TUD, Alexion. Consulting or Advisory Role: Syneos Health. Research Funding: Celgene, Novartis pharmaceutical
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0347-4207

Funding

Cancer Research UK (C604/A25135)

  • Jessica A Timms
  • Shahram Kordasti

Aplastic Anemia and MDS International Foundation

  • Shahram Kordasti

H2020 European Research Council (335326)

  • James W Opzoomer
  • James N Arnold

Medical Research Council (MR/N013700/1)

  • James W Opzoomer

Medical Research Council

  • James W Opzoomer

Rosetrees Trust (M117-F2)

  • Sedigeh Kareemaghay
  • Mahvash Tavassoli

Guy's and St Thomas' NHS Foundation Trust (G170701)

  • Jessica Timms
  • Shahram Kordasti

LifeArc (1118406)

  • Jessica Timms
  • Shahram Kordasti

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to thank the following funding bodies for their support:

Guy's and St Thomas Hospital Charity: grant number: G170701, Jessica A Timms, Shahram Kordasti.

Cancer Research UK (CRUK), King's Health Partners Centre : Jessica A Timms, Shahram Kordasti C604/A25135.

Aplastic Anemia and MDS International Foundation (AAMDSIF): Shahram Kordasti.

Blood Cancer UK: Shahram Kordasti.

European Research Council (ERC): James N Arnold Start up grant 335326.

Medical Research Council (MRC): James W Opzoomer MR/N013700/1.

Medical Research Council (MRC): James W Opzoomer Doctoral Training Partnership in Biomedical Sciences.

Rosetrees Trust (Rosetrees): Sedigeh Kareemaghay, Mahvash Tavassoli M117-F2.

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. The research was supported by the Cancer Research UK King’s Health Partners Centre and Experimental Cancer Medicine Centre at King’s College London, and the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.

Ethics

Human subjects: Formalin-fixed paraffin-embedded (FFPE) DLBCL tumor tissue was obtained from King's College Hospital, in accordance with the Declaration of Helsinki and approved by the UK National Research Ethics Committee (reference 13/NW/0040). Head and neck squamous cell carcinoma (HNSCC) tissue was obtained from King's College Hospital, consent was attained by the King Guy's & St Thomas' Research Biobank, within King's Health Partners Integrated Cancer Centre. The non-interventional study which collected bone marrow samples from elderly healthy donors was approved by the ethical committee of Cochin-Port Royal Hospital (Paris, France) (CLEP Decision N°: AAA-2020-08039).

Senior Editor

  1. Tadatsugu Taniguchi, Institute of Industrial Science, The University of Tokyo, Japan

Reviewing Editor

  1. Simon Yona, The Hebrew University of Jerusalem, Israel

Reviewer

  1. Evan W Newell, Fred Hutchinson Cancer Research Center, United States

Publication history

  1. Received: September 8, 2020
  2. Accepted: April 22, 2021
  3. Accepted Manuscript published: April 30, 2021 (version 1)
  4. Version of Record published: May 11, 2021 (version 2)

Copyright

© 2021, Opzoomer et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,678
    Page views
  • 199
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

  1. Further reading

Further reading

    1. Chromosomes and Gene Expression
    2. Computational and Systems Biology
    Torahiko L Higashi et al.
    Research Article

    The cohesin complex topologically encircles DNA to promote sister chromatid cohesion. Alternatively, cohesin extrudes DNA loops, thought to reflect chromatin domain formation. Here, we propose a structure-based model explaining both activities. ATP and DNA binding promote cohesin conformational changes that guide DNA through a kleisin N-gate into a DNA gripping state. Two HEAT-repeat DNA binding modules, associated with cohesin’s heads and hinge, are now juxtaposed. Gripping state disassembly, following ATP hydrolysis, triggers unidirectional hinge module movement, which completes topological DNA entry by directing DNA through the ATPase head gate. If head gate passage fails, hinge module motion creates a Brownian ratchet that, instead, drives loop extrusion. Molecular-mechanical simulations of gripping state formation and resolution cycles recapitulate experimentally observed DNA loop extrusion characteristics. Our model extends to asymmetric and symmetric loop extrusion, as well as z-loop formation. Loop extrusion by biased Brownian motion has important implications for chromosomal cohesin function.

    1. Computational and Systems Biology
    Pierpaolo Sorrentino et al.
    Short Report Updated

    Brain activity during rest displays complex, rapidly evolving patterns in space and time. Structural connections comprising the human connectome are hypothesized to impose constraints on the dynamics of this activity. Here, we use magnetoencephalography (MEG) to quantify the extent to which fast neural dynamics in the human brain are constrained by structural connections inferred from diffusion MRI tractography. We characterize the spatio-temporal unfolding of whole-brain activity at the millisecond scale from source-reconstructed MEG data, estimating the probability that any two brain regions will significantly deviate from baseline activity in consecutive time epochs. We find that the structural connectome relates to, and likely affects, the rapid spreading of neuronal avalanches, evidenced by a significant association between these transition probabilities and structural connectivity strengths (r = 0.37, p<0.0001). This finding opens new avenues to study the relationship between brain structure and neural dynamics.