ImmunoCluster provides a computational framework for the nonspecialist to profile high-dimensional cytometry data
Abstract
High-dimensional cytometry is an innovative tool for immune monitoring in health and disease, and it has provided novel insight into the underlying biology as well as biomarkers for a variety of diseases. However, the analysis of large multiparametric datasets usually requires specialist computational knowledge. Here, we describe ImmunoCluster (https://github.com/kordastilab/ImmunoCluster), an R package for immune profiling cellular heterogeneity in high-dimensional liquid and imaging mass cytometry, and flow cytometry data, designed to facilitate computational analysis by a nonspecialist. The analysis framework implemented within ImmunoCluster is readily scalable to millions of cells and provides a variety of visualization and analytical approaches, as well as a rich array of plotting tools that can be tailored to users’ needs. The protocol consists of three core computational stages: (1) data import and quality control; (2) dimensionality reduction and unsupervised clustering; and (3) annotation and differential testing, all contained within an R-based open-source framework.
Introduction
Systems immunology approaches aim to explore and understand the complexity of the immune system. However, with 350 cluster of differentiation (CD) antigens, over 100 cytokines and chemokines, and many different cell subsets, this is a challenging task (Davis et al., 2017). Liquid mass cytometry (LMC), imaging mass cytometry (IMC), and flow cytometry (FC) are powerful techniques applied to exploratory immunophenotyping and biomarker discovery. With the ability to profile over 40 markers on an individual cell, these techniques have rapidly expanded our understanding of the immune system and its perturbations throughout disease pathogenesis (Bertolo et al., 2020; Wang et al., 2020). However, such comprehensive analyses produce large amounts of data and increased dimensionality, resulting in a demanding computational task to analyze (Newell and Cheng, 2016). Often, a limiting factor to these analyses is the requirement of an in-depth knowledge of computational biology. There is an unmet need for an easy-to-use flexible open-source computational framework to explore a variety of high-dimensional single-cell cytometry datasets such as LMC, IMC, and FC within a single framework for nonspecialists. A number of previous computational cytometry workflows have been proposed, including CyTOF workflow (Nowicka, 2019), (Diggins et al., 2015) and CapX (Ashhurst, 2019). Subsequently, some of these workflows have been incorporated into graphical user interfaces (GUI) or code-based workflow packages such as CATALYST (Crowell et al., 2020), Cytofkit (Chen et al., 2016), diffcyt (Weber et al., 2019), and Spectre (Ashhurst, 2020). However, these approaches can be challenging to implement without significant computational knowledge. We have therefore developed ImmunoCluster, building on and extending the framework proposed in the CyTOF workflow (Nowicka, 2019), to create modular, flexible, and easy-to-use implementations of cytometry analysis pipelines. The ImmunoCluster package is a framework that provides appropriate data structures and methods to increase the utility of these high-dimensional methods and make data analysis and interpretation accessible to all researchers to facilitate immune phenotyping projects, such as those associated with preclinical and clinical studies.
Here, we present ImmunoCluster, an open-source computational framework for the analysis of high-dimensional LMC, IMC, and FC datasets. ImmunoCluster is an R package and framework that focuses on the organization and visualization of data to help define biological identity of cells, construct a hierarchy of biologically meaningful cell populations, and detect significant changes between conditions, timepoints, and groups, including serial samples from a clinical trial setting. The framework uses state-of-the-art computational techniques in an easily amendable and flexible format. Users can manipulate figures and outputs to suit their specific needs. The computational approaches used to analyze high-dimensional data are rapidly evolving, and ImmunoCluster has the flexibility to incorporate these novel computational methods in dimensionality reduction, clustering, and trajectory analysis within its framework. The computational framework is designed for ease of use and broad applicability so that it is simple enough to be implemented by users with only a basic knowledge of R, as well as possessing the flexibility for more advanced users to incorporate additional new methodology and build on the framework. The framework relies on the SingleCellExperiment class (Lun, 2017), which means that the flow cytometry standard (FCS) data is contained within a purpose-built object that stores all stages of analysis to permit multiple analysis paths to be performed in parallel. The framework incorporates methods for applying popular R packages (i.e., RPhenograph [Chen and Chen, 2015], Rtsne [Krijthe, 2018], and FlowSOM [Van Gassen, 2015a]) along with convenient ‘wrapper functions’ allowing for the use these popular dimensionality reduction and clustering algorithms in an easy format for nonspecialists to rapidly produce interpretable data that are extendable to advanced experimental designs. ImmunoCluster also contains functionality for adaptive downsampling at the import and dimensionality reduction stages of the pipeline to overcome significant increases in runtime associated with large datasets in dimensionality reduction algorithms like UMAP and tSNE. The outputs from ImmunoCluster are created using the ggplot package, which generates ggplot graphical objects that are flexible, can be further modified and utilized directly into figures for reports and publications. The core of the visualization tool was developed using scDataviz, a Bioconductor package (Blighe, 2020a) for visualizing single-cell data and influenced by the visual framework outlined in CyTOF workflow (Nowicka, 2019). ImmunoCluster offers an alternative to the currently used frameworks such as CATALYST (Crowell et al., 2020) and Cytofkit (Chen et al., 2016), with a focus on ease of use for biologists, as well as offering additional novel aspects for its users compared with other published frameworks. Importantly, and unique to ImmunoCluster, the scope of functionality in this package also enables the analysis of IMC data. Users are able to run IMC data in line with FC and LMC data analyses. Due to the lower resolution of IMC data (compared with FC and LMC), we specifically designed the ‘ranked expression heatmap’ function to work with IMC data, which allows users to rank markers measured to help identify cell populations. IMC provides a means to analyze the spatial dimension of the cells in situ within tissues, which can provide important insight when ascribing functionality to cell populations. Methods for analyzing large cytometry datasets, particularly IMC datasets, in an open-source computational environment are currently limited. ImmunoCluster has been designed for use by a nonspecialist; however, it would be useful to a range of users from wet-lab nonspecialist experimentalists to experienced computational biologists with potential utility in day-to-day research through to large-scale studies involving multiple longitudinal serial biopsies where samples need to be analyzed and compared.
Clinical immune monitoring is an area of increasing utilization (Hernandez et al., 2020). In the current work, we will present examples of ImmunoCluster’s ability to detect clinically relevant perturbations in immune cellular heterogeneity in LMC, IMC, and FC datasets in human patient and/or healthy samples. ImmunoCluster is easily extensible to several arbitrary experimental designs and can be utilized to aid both discovery of novel subpopulations of cells within heterogeneous samples or as a foundation for monitoring longitudinal changes or responses to treatment. As such, ImmunoCluster provides a resource that will permit unsupervised high-dimensional data analysis to be more widely adopted in immunobiology and many other disciplines.
Results
This section aims to convey the scope of ImmunoCluster, its versatility, and applicability to complex datasets across a range of high-dimensional approaches. We implemented the ImmunoCluster framework for three different types of high-dimensional single-cell data (LMC, IMC, and FC). ImmunoCluster also offers user-friendly flexibility throughout the framework, and researchers can easily adapt all figures and outputs to suit their specific needs, resulting in an abundance of tailored outputs for the user to assess and use in publications, reports, and presentations. An example detailing a variety of tailor-made figures produced by ImmunoCluster can be found in Figure 1—figure supplement 1.
We first used previously published and publicly available LMC data (Hartmann et al., 2019). This dataset was chosen as it is representative of an intricate immunophenotyping project and would test the ability of ImmunoCluster to reproduce published results. Secondly, we investigated two novel IMC datasets, from HNSCC and diffuse large B-cell lymphoma (DLBCL) patients with the aim of demonstrating the applicability of the ImmunoCluster tool for IMC data analysis and cell cluster identification and visualization. Finally, FC data from the BM of seven healthy donors (HDs) during hip surgery were used to show the ability of the framework to analyze conventional FC data and identify rare immune cell populations.
Liquid mass cytometry
The chosen dataset (Hartmann et al., 2019) includes mass cytometry data from 15 patients with leukemia who all received bone marrow transplant (BMT), three of whom developed acute graft versus host disease (GvHD). Samples were taken at 30 and 90 days post-BMT. Thirty-three cell markers, both lineage and functional, were analyzed using mass cytometry (Helios CyTOF system). Prior to uploading into the ImmunoCluster framework, the FCS files were normalized and gated as described in the Materials and methods section. We observed that a typical pipeline run on the full 2.3 million cell LMC GvHD dataset (with UMAP downsampling to 500k cells) would take ~110 min on a 2.9 GHz Intel Core i7 MacBook pro with 16 GB RAM (Figure 1—figure supplement 2).
An initial visualization of the data was first carried out using multidimensional scaling (MDS) plots, which were overlaid with metadata to identify the similarity of patients by condition and day of measurement (Figure 1A, B). A heatmap was also created to give an overview of marker expression for each patient and annotated with the metadata provided (Figure 1C). The UMAP algorithm was used for dimensionality reduction of data. There was a difference in the distribution of cells from GvHD and ‘none’ across the cell islands between both timepoints, 30 and 90 days after BMT (Figure 2A).
The mass cytometry panel used in this experiment was specifically designed to identify all major human immune cell lineages (Hartmann et al., 2019); the marker expression across cell islands was visualized and indicated cell types present within the cell islands (Figure 2B) (expression of all markers measured; Figure 2—figure supplement 1). To further distinguish the cell types that were present in the cell islands and visible from the dimensionality reduced UMAP plots (Figure 2B), the FlowSOM clustering algorithm with consensus clustering metaclustering was applied, which has been demonstrated to scale well to large cytometry datasets (Weber and Robinson, 2016). The number of clusters input into the FlowSOM algorithm was slightly higher than the number of expected cell clusters (n = 56) (Figure 2—figure supplement 2). The SingleCellExperiment object stored the data for each K meta-clustering (K1-K56); therefore, further downstream data exploration was carried out looking at different numbers of K clusters. A heatmap showing the expression of the 33 markers measured was used to identify cluster cell types (Figure 2—figure supplement 2), and ImmunoCluster was successfully able to replicate the findings from Hartmann et al., 2019, with 24 of the cell populations being identified (Figure 2C, D). Selecting the correct number of clusters to input into the clustering algorithm can be a challenging aspect of these analyses, but the framework provides the user with multiple parameters to aid this decision. We recommend that users should always overestimate the number of cell populations as data for each K number of clusters will be stored in the SingleCellExperiment object and can therefore be explored and refined later using the different visualization techniques available within the framework. Additionally, the elbow plot generated by the ConsensusClusterPlus package can be examined alongside the clustering algorithm output to guide the decision on the best number of clusters for downstream analysis (Figure 8—figure supplement 6). Over-clustering can allow for the identification of rare cell types within a dataset at the expense of often generating several clusters of highly prevalent cell types that likely represent the same biological cell type. ImmunoCluster provides the tools to manually and reproducibly merge clusters of the same biological cell identity into one group after over-clustering. These clusters can additionally be merged into each other to use a less granular, higher-level population annotation (see Figure 2—figure supplement 3 for an example of higher-level clustering).
The abundance of all cell types across all samples was measured, and CD14+CD16- monocytes were identified as the most abundant population, correlating with the Hartmann et al. data (Hartmann et al., 2019; Figure 2E). Individually displaying patient’s cell-type abundance means that we could identify variation within a group of interest; for example, we observed that although there is variation within the GvHD and none, overall, they appear to follow the same trends in cell-type abundance (Figure 3A). Additionally, significant differences between memory B-cells (false discovery rate [FDR] p=4.38×10−3), naïve B-cells (FDR p=1.35×10−2), and naïve CD4+ T-cells (FDR p=3.47×10−2) were identified between the GvHD and none (Figure 3B, C) using a t-test. A difference in number of naïve B-cells was previously reported by Hartmann et al., 2019 in this dataset and can be seen in Figure 2A (using Figure 2D for cell island identification), where there is a noticeable reduction of these cells in the GvHD patients. In addition, a reduction in naïve CD4+ T-cells can also be seen, which was also previously reported, but to a lesser extent. A volcano plot can be used to highlight the differentially abundant cell clusters between GvHD and none (GvHD logFC+ve and none logFC-ve), where cell types with FDR p<0.05 are shown in red (Figure 3D), which is also in line with the previously published data (Hartmann et al., 2019). We compared ImmunoCluster’s differential abundance testing output (stat_test_clust), run in the t-test mode, with the Hartmann et al. data (Hartmann et al., 2019) into the diffcyt computational framework (Weber et al., 2019), which is a state-of-the-art tool for differential discovery analyses. Diffcyt identified the same three cell clusters (memory B-cells, naïve B-cells, and naïve CD4+ T-cells) as differentially abundant (FDR p<0.05) between the GvHD and none conditions. This demonstrates concordance with ImmunoCluster’s statistical output that identified naïve B-cells and naïve CD4+ T-cells as one of the principal differentially abundant populations, which was also identified in the original analysis of the data (Supplementary file 4). Additionally, we explored the expression of checkpoint-related molecules, and their receptors, markers of proliferation, and invariant natural killer T (iNK T) cells in CD8+ T-cells and compared the expression of these in the patients with GvHD and none (Figure 3E). These markers help assess the functional states of cells and the checkpoint-related molecules and proliferative activity markers such as PD-1, PD-L1, TIM3, Ki-67, and TCRVa24-Ja18; the expression of these antigens has been proposed previously as candidate biomarkers for immunotherapy (Hartmann et al., 2019). Visually, GvHD patients had a higher expression of PD-1 and a noticeable difference in the Ki-67 proliferation marker as expected for these patients (Figure 3E).
Imaging mass cytometry
We utilized two different IMC datasets to demonstrate how ImmunoCluster would handle IMC data from two different tumor microenvironments that possess (1) a biological image with clear boundaries, such as head and neck squamous cell carcinoma (HNSCC), where clear tumor and stroma regions were evident; (2) a biological image that was more heterogeneous and diffuse, such as the DLBCL lymph node section. ImmunoCluster’s ability to explore IMC data adds an important element to the framework and demonstrates the flexibility and applicability of the tool. Researchers can explore IMC data easily, and in line with LMC and FC data. After data preprocessing (see Materials and methods and Figure 8—figure supplement 2), the transformed gated FCS files were uploaded into the ImmunoCluster framework. Generally, the same workflow that was applied to the LMC and FC data was applied to the IMC data from a section of tissue from patients with HNSCC and DLBCL (lymph node) as a proof-of-principle application of the ImmunoCluster tool for IMC data analysis (Figures 4 and 5). The only difference with the IMC data was the use of a rank heatmap (Figure 4—figure supplement 1); this ranks the order of each marker for each sample (1 – total number of samples). The reason for this was clarity and ease of cell identification across clusters as IMC produces lower resolution data compared to the data output of LMC or FC, which makes manual assignments of populations more difficult to interpret. Therefore, a rank of expression was used as an easy means to facilitate the assignment of cell identity through defining the highest and lowest expression of markers between clusters.
Head and neck squamous cell carcinoma tissue section
Cell segmented regions are annotated in Figure 4A, where regions 1–3 were selected as tumor regions and 3–6 as stroma regions. The ImmunoCluster tool was able to distinguish between the stroma and tumor regions of the tissue section from the HNSCC tumor (Figure 4B), and with additional markers could provide a more in-depth analysis of the cell phenotypes in each respective region of the tumor. Dimensionality reduction of the data clearly separated the cell islands belonging to the stroma and tumor regions of the tissue (Figure 4C), suggesting different cell phenotypes between the regions. The FlowSOM algorithm was applied to cluster the cells and identify cell populations. The rank heatmap was used to show the ranked expression of each marker (ranked 1–10, with 10 being high) for each cluster (1–10) (Figure 4—figure supplement 1), and the identified cell types were then used to annotate the heatmap (Figure 4D). High expression of E-cadherin can be used as a marker of cancerous tissue in HNSCC, and its expression correlates with the cancer regions selected for analysis (Figure 4D). We focused on CD8+ T-cell, B-cell, and macrophage cell clusters (Hladíková et al., 2019; Evrard et al., 2019). Regions 2 and 3 of the tumor mostly consist of proliferating (Ki-67+) tumor cells and macrophages (tumor-associated and PD-L1+). Region 1 was a mix of immune cells, B-cells, and PD-L1+ CD8+ T-cells (Figure 4E). Stromal regions 5 and 6 are mostly PD-L1+ macrophages and CD8+ T-cells. Region 4 has a cluster of mixed immune cells, and from the IMC image (Figure 4A) we can see that this region looks like the tumor is encroaching into the stromal region. The stroma seems to be mostly nonproliferating cells (Ki-67- compared to the tumor region) and CD8+ T-cells, including a CD20+ CD8+ cytotoxic subset (Gingele et al., 2020).
Diffuse large B-cell lymphoma lymph node section
The IMC DLBCL lymph node section was split by high and low Ki-67 expression to identify the highly proliferative tumor cells (Figure 5A); Ki-67 is used as a prognostic marker in DLBCL (Tang et al., 2017). The FlowSOM algorithm was applied to the data to split the IMC data into clusters; these clusters were projected onto the UMAP plot to visualize how these clusters were split between the Ki-67 high and low (Figure 5B). The marker expression ranking tool was applied to the clustered data and used to identify cell types (Figure 5C, Figure 5—figure supplement 1). The majority of Ki-67 high population were proliferating tumor cells (84%), and the Ki-67 low population consisted of a heterogeneous collection of immune cell populations, such as CD4+ T-cells, CD8+ T-cells, B-cells, macrophages, and dendritic cells (DCs) that were successfully identified using the ImmunoCluster tool (Figure 5D).
Flow cytometry
The HD BM taken during hip surgery was gated for the CD3+CD4+ population before analysis within the ImmunoCluster framework, with the aim of testing ImmunoCluster’s ability to identify minor populations such as Tregs and their subpopulations (Kordasti et al., 2016). The FlowSOM algorithm was applied to the data, resulting in 40 clusters (Figure 6—figure supplement 1), and a heatmap was created with median expression of markers to identify cell types (Figure 6B). Due to the markers used and the low prevalence of Tregs, the majority of cells were identified as CD4+ T-cells. Additionally, three populations of Tregs were identified by the ImmunoCluster tool: Tregs (CD25+CD127low) (3.6%, 1.3–5.5), Treg A (CD25+CD127low CD45RA+) (0.7%, 0.1–2.0), and Treg B-cells (CD25+CD127lowCCR4+CD95high) (0.9%, 0.2–2.4) (Figure 6C). The abundance of Treg A and B for all HDs is shown in Figure 6D, and their distribution was as expected (Kordasti et al., 2016).
Discussion
Here, we provide an overview of the flexibility and scope for utilizing ImmunoCluster, an open-access easy-to-use framework for LMC, IMC, and FC dataset analyses. The ImmunoCluster package emphasizes ease of use in the generation of flexible and modular cytometry analysis pipelines for applications such as clinical immune monitoring studies. Explanations of the functions and workflow have been provided, and an in-depth step-by-step protocol and tutorial are available on the GitHub site (https://github.com/kordastilab/ImmunoCluster).
The purpose for creating the ImmunoCluster framework was to support nonspecialist researchers to carry out complex immunophenotyping experiments. The dawn of ‘cytometry big data’ and its ever-growing utilization has surpassed the number of experimental parameters that a researcher can feasibly analyze as a collective. The ImmunoCluster framework was designed in collaboration between wet lab and computational biologists, with the purpose of creating a tool that enables state-of-the-art analysis, yet easy to use in its application. We provide sufficient detail and explanation for the nonspecialist researcher to understand the need for each analysis step as well as how to confidently execute the process, with an aim of performing transparent and reproducible analyses.
ImmunoCluster has been built with an emphasis on generalizability and scalability to facilitate a broad use and is an advance over other open-source tools as it provides an integrated framework to uniquely perform a complete computational pipeline on high-dimensional LMC, IMC, and FC data. The inclusion of IMC data analysis brings a unique element to the ImmunoCluster framework and differentiates it from other published protocols. ImmunoCluster was built to easily leverage many popular R packages for cytometry data (i.e., RPhenograph [Levine et al., 2015], Rtsne [van der Maaten and Hinton, 2008; Laurens van der, 2014], and FlowSOM [Van Gassen, 2015b]), selected because they are open-source and regularly maintained with extensive documentation. All methods prioritize customizable and attractive visualizations designed to be used by both dry- and wet-lab researchers/clinicians. The inclusion of adaptive downsampling of cells for the running of computationally intensive dimensionality reduction steps, using tools such as UMAP and tSNE, allows ImmunoCluster to generate cytometry analysis pipelines that are readily scalable to a dataset of millions of cells across several samples and a variety of experimental or phenotypic conditions. ImmunoCluster can also be run on a local desktop/laptop computer with standard configuration, depending on the number of cells, adaptive downsampling settings and selection of clustering algorithms can usually be run within a day end-to-end (Figure 1—figure supplement 2). Whilst the PhenoGraph clustering method does not readily scale, the FlowSOM-based clustering method can be implemented on much larger datasets.
The limitations of this computational framework are important to take into consideration when designing an experiment. Unsupervised clustering (stage 2) of samples is the most important stage of the framework, and its ability to accurately define populations across all samples is critical to all later stages of investigation. The number of k clusters to generate (cluster resolution) can significantly affect the ability to identify or determine a change in a biologically relevant population. Merging two subpopulations due to low resolution of clustering may mask an important experimental observation. Determining the precise number of clusters that are relevant for a given dataset is an important step. In our step-by-step guide, we provide complete documentation and clear user-friendly approaches to optimally define this.
Technical inter-sample marker signal variability due to batch effects may impact the ability of unsupervised analysis to reliably detect certain populations if significant batch effects are present. MDS plots created in stage 1 of the framework can be used to detect batch effects as well as other technical artifacts such as antibody staining anomalies. An approach commonly employed in mass cytometry to overcome this is to use sample barcoding, reducing the variability between each sample, allowing all samples to be exposed to the same antibody mixture (Schulz and Mei, 2019). Another regularly employed approach is the inclusion of a shared control sample in each independent batch, for example, the same cell type as the experimental samples but all from the same HD. Statistical methods for batch correction may also be applied; in recent years, a class of methods called remove unwanted variation (RUV) have been developed, and CytofRUV (Trussart et al., 2020) is a recently developed package specifically designed for CyTOF dataset batch correction. If there are significant differences in the total number of cells recovered between samples, samples with many more cells may bias the clustering. Samples with very few cells recovered may result in information loss and missing populations that are in fact present. The ImmunoCluster framework provides users with two opportunities (creating the SingeCellExperiment and running the dimensionality reduction) to downsample, which means the same number of samples will be used for each sample, rectifying the problem of varying cell numbers. If a particular sample has very few cells, they may need to be excluded from the analysis as they may not be a representative sample. Although channel spillover is diminished in mass cytometry, it still exists in fluorescence-based FC and should be considered when designing antibody panels to reduce the effects on introducing cell phenotype artifacts in downstream unsupervised analysis. As such, initial exploratory data analysis (e.g., MDS plots) is key to determining if any of these confounding factors might be present in the data. Additionally, users may want to introduce fluorescence‐minus‐one controls, where cells are stained with all fluorescently tagged antibodies except for the one of interest (Roederer, 2001).
Summary
The ImmunoCluster package increases the accessibility of advanced computational methods for users tasked with generating high-quality analysis of high-dimensional LMC, IMC, and FC data. More advanced users can leverage ImmunoCluster’s data structures, methods, and suitability for scripted analysis to integrate our work with other R analysis tools and build on the package’s foundation. In our clinically relevant immune monitoring case study setting, the ImmunoCluster framework successfully identified 24 phenotypically distinct cell clusters, and their abundance across all samples, and highlighted differentially represented cells between GvHD and none patients, all of which were consistent with the Hartmann et al. data (Hartmann et al., 2019). We also applied the framework to two sets of IMC data, showing the broad applicability of ImmunoCluster as well as the ease of being able to analyze the IMC data in line with LMC data, helping compare these types of data, which will be useful in future studies where multiple technologies are being applied within one study. Finally, we implemented FC data into ImmunoCluster, confirming that it was able to analyze conventional FC data and identify rare immune cell populations. Currently, the preprocessing of raw files is carried out prior to implementation into the ImmunoCluster framework, but due to the flexibility and design of the tool, future implementations may include IMC data preprocessing within the ImmunoCluster R package. This versatile framework provides an opportunity to include all researchers with varying knowledge and experience in computational biology to be involved in the experimental project from experimental design, wet lab/clinical trial, all the way through the data analysis process and visualization.
Materials and methods
Data file preprocessing and preparation for implementation into the ImmunoCluster framework
LMC data
Request a detailed protocolHere, we describe an example of the ImmunoCluster package analyzing previously published LMC data (CyTOF) from 15 patients with leukemia 30 and 90 days after BMT, previously published by Hartmann et al., 2019. After BMT, three of these patients suffered acute GvHD and 12 had no evidence of GvHD (noted as ‘none’ herein). For systems-level biomarker discovery within this trial, a panel of 33 antibodies were incorporated into the immunophenotyping panel (Supplementary file 1). The panel was designed to cover all major immune cell lineages and several functional subsets including T-, B-, NK, and myeloid cells and granulocytes (detailed in Hartmann et al., 2019). The marker panel also included a variety of immune-regulatory proteins, such as PD-1, PD-L1, and TIM3. The publicly available dataset from the Hartmann et al. study was extracted for the current study from the FlowRepository (http://flowrepository.org/id/FR-FCM-Z244) (Spidlen et al., 2012). Raw FCS files for the LMC dataset that were collected on the Helios CyTOF system (Fluidigm, UK) machine were first normalized using the free CyTOF 6.7 system control software (files can also be concatenated using the same software if necessary). These files were gated using FlowJo (version 10.5.3) (Becton, Dickinson and Company, UK) to remove beads, doublets, dead cells, and non-CD45+ cells as well as erythrocytes (CD235αβ/CD61+) and neutrophils (CD16+) (Figure 8—figure supplement 1), as described by Hartmann et al., 2019, before implementation into the ImmunoCluster framework. There are various alternative open-access tools that can be used to gate these files, such as OpenCyto (Finak et al., 2014), CytoExploreR (Hammill, 2020), and Cytoverse (https://cytoverse.org/).
IMC data
View detailed protocolTwo different examples of IMC datasets were implemented into the framework. One IMC dataset was from a human tissue section taken from a patient with HNSCC. Consent was attained by the Guy’s and St Thomas’ Research Biobank, within King’s Health Partners Integrated Cancer Centre. The second IMC dataset was from a lymph node section from a patient with DLBCL. Formalin-fixed paraffin-embedded (FFPE) DLBCL tumor tissue was obtained from King’s College Hospital, in accordance with the Declaration of Helsinki and approved by the UK National Research Ethics Committee (reference 13/NW/0040). A detailed protocol is available for both datasets as a Supplementary file (Appendix 1 and 2, respectively). The raw HNSCC IMC data from 12 channels (10 markers + 2 intercalator-Ir channels; Supplementary file 2) and the raw DLBCL IMC data from 20 channels (19 markers + nuclei channel; Supplementary file 3) were processed with Python scripts and CellProfiler pipelines according to Schulz et al., 2018 to segment the image data into individual cells (Figure 8—figure supplement 2). The mean marker intensity for each segmented cell was exported (comma-separated values [CSV] file) and either divided into six regions of equal area (HNSCC, three tumor and three stroma regions) or divided by Ki-67 expression (DLBCL, Ki-67 high or low). The mean marker intensity for each segmented cell was multiplied by 65,535 to recover the image intensities (Fluidigm IMC machine = 16 bit, the dynamic range of the measurements is 0 to (216)−1) and asinh (co-factor 0.8) transformed in R Studio. These CSV files were converted into FCS files and gated in Cytobank (Chen and Kotecha, 2014); functions are also available within the ImmunoCluster package to convert SCE or CSV data files into FCS format.
FC data
Request a detailed protocolFC data from the BM of seven HDs were used to demonstrate the ability of the framework to analyze conventional FC data and identify rare immune cell populations. BM samples from HDs were obtained by extraction of the bone marrow cells from the bone of the femoral head. This noninterventional study was approved by the ethical committee of Cochin-Port Royal Hospital (Paris, France) (CLEP Decision No.: AAA-2020-08039). Femoral heads were obtained after informed consent during hip replacement surgery. These were cut in half and collected in a conservation medium (Hanks balanced salt solution with NaHCO3, Eurobio), supplemented with heparin (7%) and then transported to the laboratory at room temperature (RT). These were scraped with a spatula, ground in a mortar, and washed with a PBS solution supplemented with 100 µg/ml DNAse (Sigma Aldrich). The FCS files were gated for singlets using a FSCint/FSCpeak dot plot, and dead cells were removed using a forward scatter (FSC) and side scatter (SSC) dot plot. Leukocytes were gated on a CD45/SSC dot plot. Finally, they were gated to identify the CD3+CD4+ population of cells for analysis in the ImmunoCluster framework (gated in Cytobank; Chen and Kotecha, 2014).
Workflow overview
Request a detailed protocolThe ImmunoCluster package is accessible via GitHub (https://github.com/kordastilab/ImmunoCluster), where a detailed step-by-step walkthrough of the tool and R Markdown files (RMD) for LMC/FC and IMC data analysis are available. The RMD files permit users to logically work through the framework in sections in a ‘click and play’ manner. Additionally, all scripts to replicate figures published here are available via GitHub. Users will need to have R downloaded to run the ImmunoCluster tool. The ImmunoCluster framework provides tools and support, allowing researchers to follow a workflow that guides them through experimental design, data analyses, interpretation, and statistical significance testing, to publishable graphics identifying differences in phenotype and abundance of cells between conditions analyzed (Figure 2 and Figure 7 and Figure 8). The framework comprises three core computational stages that are conducted by the ImmunoCluster tool:
Stage 1: Data import and quality control
Request a detailed protocolWe provide tools designed primarily for computational preprocessing of FCS data files for downstream applications including parameter renaming and transformation. The generated high-dimensional data are imported into the computational framework as FCS file formats or summarized feature expression values in CSV tabular format.
Associated metadata files need to be completed by the researcher in the experimental design stage (Figure 7). This includes a sample_metadata file, in which all metadata relating to samples will need to be input, such as timepoints, response to treatments, and patient status (Figure 8—figure supplement 3A). For simplicity in executing this step, the panel_metadata file allows users to rename parameters, such as the markers used in the analysis as well as select the markers that are to be used for different analyses stages (all markers may not need to be included in the following steps) (Figure 8—figure supplement 3B).
All of the above-mentioned data is stored within a SingleCellExperiment object (Lun, 2017). The SingleCellExperiment is an S4 class object that is implemented into the ImmunoCluster framework and is in essence a data container in which you can store and retrieve information such as metal-barcoding for sample multiplexing, metadata, dimensionality reduction coordinates, and more (outlined in Figure 7). FCS files can also be exported from the SingleCellExperiment using the write_sce_to_fcs function.
After the data and metadata files have been imported and stored in the SingleCellExperiment object, we provide several approaches for initial exploratory visualization of the data. One example of this is the MDS plots. The median marker expression data from each sample are used to create MDS plots, and therefore allow for an initial visualization of data points (labeled with metadata) and show the similarity and differences between samples in two-dimensions. Additionally, these plots can be used to detect batch effects and other technical artifacts such as antibody staining anomalies. Moreover, a heatmap of expression of markers measured for each patient can be created and metadata can be used to annotate the heatmap.
Stage 2: Dimensionality reduction and unsupervised clustering
Request a detailed protocolDimensionality reduction is a key component of high-dimensional single-cell data analyses and enables researchers to view high-dimensional data. For example, the 33 protein markers analyzed by Hartmann et al., 2019 can be reduced to lower dimensional embedded coordinates per cell. Nonlinear dimensionality reduction techniques can avoid overcrowding and represent data in distinct cell islands (Figure 8).
The framework offers users three nonlinear statistical methods for representing high-dimensional single-cell data in low-dimensional space (Figure 8—figure supplement 4). MDS (utilized in stage 1 for initial data exploration), uniform manifold approximation and projection (UMAP) (Becht et al., 2019), and t-Distributed Stochastic Neighbor Embedding (tSNE) (van der Maaten and Hinton, 2008) algorithms are available to generate a two-dimensional (2D) embedding of the data. However, the SingleCellExperiment object can store any other form of arbitrary dimensionality reduction, such as principal component analysis (Blighe, 2020b) or DiffusionMaps (Angerer et al., 2016), among others.
The density of cells within the distinct cell islands produced by the 2D embedding of multidimensional data can be viewed using the density_plot function (Figure 2—figure supplement 1).
After the data has been visualized in dimensionality reduced space, clustering algorithms can be used to identify cell communities within the data, which allows the identification of phenotype and abundance of cell clusters within populations/groups being analyzed. A number of clustering algorithms are available within the framework to define cell populations of interest in an unbiased manner. First, an ensemble (use of multiple algorithms) clustering method of FlowSOM (Van Gassen, 2015b) and Consensus clustering (ConsensusClusterPlus R package, Wilkerson and Waltman, 2013), which tests the stability of the clusters, leading to better results than applying a basic hierarchical clustering algorithm (Weber and Robinson, 2016). Additionally, the PhenoGraph clustering algorithm is available, which uses K-nearest neighbors and a Euclidean distance metric (DiGiuseppe et al., 2018). Lastly, k-means clustering, which classifies cells into clusters with high intra-class similarity and low inter-class similarity, where each cluster is represented by its center (centroid) that corresponds to the mean of points (e.g., markers) assigned to the cluster. Either or all clustering algorithms can be applied at the users’ discretion.
The aim of the clustering algorithms is to assign all cells to K clusters (K1, K2, …), resulting in clusters corresponding to true cell types. Typically, a K larger than the number of expected cell types is chosen at this stage as the ImmunoCluster framework allows users to explore all K clusters (Figure 8—figure supplement 5), as well as collapsing clusters of the same cell type into one cluster if the user feels over-clustering has occurred. Over-clustering may enable the clustering identification of rare cell types, sometimes at the expense of creating several ‘artificial’ clusters of the more prevalent cell types. Additionally, the workflow allows the generation of an elbow plot that is created to help the user to select an appropriate number of clusters (Figure 8—figure supplement 6).
Stage 3: Annotation and differential testing
Request a detailed protocolAfter dimensionality reduction and clustering, the next step is to annotate each cluster based on marker expression. If over-clustering has occurred, multiple clusters of the same cell type can be present, and either a lower number of K clusters can be used; additionally, clusters of the same cell type can be collapsed into each other to create one cluster.
Clusters containing multiple cell types may also be identified; this could mean the user has under-clustered or occasionally a group of different cells are clustered together by the algorithms.
Expression of each marker analyzed can be projected onto the dimensionality reduced data by the researcher, aiding with the identification of cell types or phenotypically distinct clusters.
Outputs such as heatmaps showing the expression of all markers for each cluster can be used to identify the clusters of interest (Figure 2—figure supplement 2). These can be used to annotate figures produced in stage 3.
Biaxial dot plots with contouring of the SCE data can be produced using the biaxial_plot function; the expression of markers can be explored in the entire dataset or a subset of the data, that is, a particular cell cluster. A clustering option within biaxial_plot allows users to see the spread of data across two selected markers for different conditions (any metadata from the sample_metadata file), as well as different clusters (e.g., K1-60) (Figure 3—figure supplement 1).
Due to the type of data produced from IMC (lower resolution compared to LMC and FC), a novel function called imc_rank was specifically designed for the IMC data; this creates a heatmap in which the expression of each marker is ranked. This rank heatmap can be used to facilitate the identification of clusters that are ‘high’ or ‘low’ for particular markers to distinguish and logically assign cell populations manually.
These tools help define biological identity of cell clusters from stage 2 and construct a hierarchy of biologically meaningful cell populations based on marker expression and similarity.
The metadata added to the SingleCellExperiment can be used to annotate the dimensionality reduced (i.e., UMAP) and clustered (FlowSOM, Phenograph, or K-means) data, allowing for visualization of the distribution of cell islands between different condition/timepoints (Figure 8).
The above-mentioned metadata is also used to annotate all differential testing outputs, meaning the user can make many comparisons and explore multiple parameters all within the ImmunoCluster framework.
A variety of differential testing outputs can be created, such as median marker expression for each sample (using clustered data) can be used to produce a hierarchical clustered heatmap, and box plots of cell cluster abundance between different conditions at different timepoints (any metadata can be used to annotate these figures).
Statistically significant differences in the annotated populations between conditions are identified and displayed throughout the framework to the user for ongoing investigation and interpretation. Data from the ImmunoCluster-generated SingleCellExperiment object can be easily extracted for downstream analysis with differential analysis packages such as diffcyt (Weber et al., 2019) and cydar (Lun et al., 2017) for differential discovery, which is based on a combination of high-resolution clustering and empirical Bayes moderated tests. In addition, the ImmunoCluster framework offers the stat_test_clust and stat_test_expression functions that use the Wilcox rank-sum or Welch’s T-test to identify which cell clusters are significantly different between conditions (any two-way comparison can be made using metadata input by user), and which markers are significantly different within cell clusters between different conditions, respectively.
Appendix 1
Staining protocol for a human tissue section taken from a patient with head and neck squamous cell carcinoma (HNSCC).
Equipment | Consumables | Reagents | Reagent initial concentration | Reagent final Concentration |
Humidified slide chamber (Sigma Aldrich, #H6644-1EA)* | Sterile 3 ml pastettes (Alpha Labs, #LW112) | DPBS, Ca, and Mg free (Gibco, #14190-094) | ||
Dedicated glassware, coplin jars (Fisher, #10264732) ** | Superfrost Plus slides (Fisher, #ES-325) | Tween-20 (Sigma Aldrich, #P7949-100 ml)* | 100% | 0.1% |
Orbital shaker | Kimberly-Clark Precision Wipers (Fisher, #10623111)* | Antigen retrieval solution pH 9.0 (R&D Systems, #CTS013) | 1:10 in milli-Q water | |
Water bath | 0.2 syringe filter (TripleRed, #FPE204030) | Superblock (Thermo Fisher, #37515)** | ||
Fridge | 50 ml syringes (Fisher, #10636531) | Iridium (Fluidigm, #201192B)*** | 500 µM | 0.5 µM in DPBS |
Milli-Q water purification system (18.2 MΩ.cm) | Filter pipette tips | Metal tagged antibodies (Fluidigm) | Require optimization for your tissue type | |
Hard slide box for storage in −80°C (VWR #631-1504) | PAP pen (Abcam, #ab2601) | FcR block (Trustain, Biolegend, # 422302) | 100% | 5% |
Pipettes | Xylene (Acros, #180862500) | |||
Plastic tweezers, to handle slides (Farlamedical, #400200) | Ethanol (Sigma Aldrich, #32221-2.5L) | |||
Plastic coplin (Thermo Fisher, #E9611) **** | BSA (Sigma Aldrich, #A9647) | 20% To prepare on the same day freshly | 10% | |
Kiovin Hu IgG Provide by Hospital’s pharmacy | 10% 100 mg/ml | 5 mg/ml | ||
* Alternatively, use a pipette tip box with moist towel at the bottom and piece of parafilm on the tray (this prevents evaporation and stops condensation gathering at the back of the slide, which can pool around edges and dilute antibodies). **Rinse in water, do not clean with detergent or in laboratory dishwasher (source of barium contamination). **** Plasticware is required for 96°C water bath, used when dedicated histological autoclave or decloaking chamber is not available; you can also use a 50 ml centrifuge tube, loosely capped (Falcon or Corning). | *Normal laboratory hand towels and blue roll are fine to tip excess buffer off slide surface or to place in the bottom of the humidified slide chamber. However, they should not be used to wipe the back of the slides or draw off excess buffer from front – as can transfer lint to the slides. | *Permeabilization can also be conducted with Triton X100 or saponin ** Alternative blocking solutions include BSA, human albumin, human immunoglobulin (KIOVIG from Pharmacy), serum same species as antibodies (mouse serum, Sigma Aldrich, # M5905-10ML). *** DNA stain use alternative histone 3 antibody #3176023D. |
STAINING PROTOCOL DAY 1: SECTION CUTTING
Cut a 5 µm section from the block at and place in an incubation oven overnight (o/n) at 37°C.
Cut a 3 µm section for hematoxylin and eosin (H&E) staining.
STAINING PROTOCOL DAY 2:
1. PREPARATION
Prepare xylene and ethanol at 50:50 ratio ~ 50 ml of final volume.
Prepare the following ethanol dilutions, with milli-Q water: 96%, 90%, 80%, and 70% (~50 ml of final volume for each).
Dilute the 10X basic antigen retrieval buffer in milli-Q water in 50 ml falcon tube (leaving the cap loose) and heat to 96°C in a water bath.
*BSA (Sigma Aldrich, #A9647): prepare a 20% solution fresh using, 0.4 g BSA in a plastic tube. Add 1 ml of superblock solution and incubate in the water bath for 1 hr. Once it is fully dissolved, top up to 2 ml (final volume).
Prepare the wash buffer using Dulbecco’s phosphate buffered saline (DPBS) (calcium/magnesium free) and 0.1% Tween-20 (500 µl of Tween-20 [Sigma Aldrich, #P7949-100 ml] in the 500 ml DPBS bottle). Filter the wash buffer every time you use it with a 0.2 µm filter. Store at room temperature.
1. Notes for wash steps
Always conduct washes in ~50 ml of buffer in coplin jars on an orbital mixer.
Never wash in laboratory dishwasher or with detergent as this can represent a source of barium contamination. Also, do not use metal slide holders or forceps.
2. STAINING
Heat the 1X antigen retrieval solution (1:10 dilute in milli-Q water) to 96°C in a water bath.
Heat a coplin jar and a bottle of wash buffer (0.1% Tween-20 in DPBS) to 37°C in a water bath.
Prepare the humidified slide chamber (box with a damp tissue in the bottom and lid or piece of parafilm to cover).
Prepare blocking and antibody incubation buffer fresh: 10% BSA, 0.1% Tween-20, 1/20 dilution of FcR block (Trustain, Biolegend, #422302) and 1/20 dilution of KIOVIG (Kiovin Hu IgG) in Superblock (Thermo Fisher, #37515) solution. Use Superblock to top up to final volume. Avoid mixing, filtering, or centrifugation of the Superblock buffer prior to pipetting, let solution settle at 4°C and pipette from top of the solution to avoid any aggregate material (that will have sunk to the bottom).
Conduct xylene washes in a safety cabinet.
Use separate coplin jars.
The xylene should be changed regularly, especially the first coplin jar in the sequence, which will have the most wax dissolved in it (readily become cloudy and less efficient).
1. CLEAR WAX FROM SLIDES (you sequentially remove the wax, with each wash in fresh xylene)
Incubate slide 1 hr at 60°C.
Agitate slides (up and down motion) by hand, three washes at RT:
Xylene 100% for 5 min.
Xylene 100% for 5 min.
50:50 xylene:ethanol mix for 10 min.
1. REHYDRATE TISSUE (outside hood, use plastic tweezers)
Remove slides from xylene:ethanol mix in the hood, transfer to 100% ethanol for 5 min, and remove from cabinet.
Transfer slides to 90% ethanol – mix on orbital shaker for 5 min.
Transfer slides to 80% ethanol – mix on orbital shaker for 5 min.
Transfer slides to 70% ethanol – mix on orbital shaker for 5 min.
Remove slides and transfer to DPBS – mix orbital shaker for 10 min.
Slide can stay in DPBS until the antigen retrieval solution is ready.
RETRIEVE ANTIGEN
Fluidigm have validated all their FFPE antibodies using 30 min antigen retrieval, with basic buffer, in a 96°C water bath.
Heat 1X antigen retrieval solution to 96°C in the water bath (in plastic coplin jar or closed falcon 50 ml centrifuge tube).
When the slides are added, the temperature may drop – do not start the timer until the probe indicates the temperature in the tube has returned to 96°C.
Incubate slides in 1X antigen retrieval solution to 96°C in the water bath (in closed 50 ml falcon centrifuge tube) and incubate for 30 min.
Cool down closed 50 ml falcon tube to around RT under cold running water.
Transfer slides to DPBS and wash on orbital shaker for 5 min.
Dry slides with tissue wipers (Thermo Fisher) and leave to dry for 10 s.
Draw a wax ring around the tissue.
Wash twice with filtered wash buffer (0.1% Tween-20) in DPBS on the orbital shaker for 8 min. Filter it every time you use the wash buffer with 0.2 µm filter and store at RT.
Leave slides in the wash buffer until the block solution is ready.
1. BLOCK
While sections are in permeabilization buffer (0.1% Tween-20), prepare sufficient blocking buffer to cover the tissue, within the wax ring.
For a very small diameter wax ring, 50 µl will suffice; however, a large piece of tissue (>1 cm diameter) will require ~250 µl.
Blocking buffer preparation | ||||||
---|---|---|---|---|---|---|
Solution | Dilute in | Initial concentration | Final concentration | Dilution factor | Per 100 µl | Per 300 µl |
Tween-20 (Sigma Aldrich, #P7949-100 ml) | Superblock solution | Prepared first at 10% Then at 1% | 0.1% | 10 | 10 | 30 |
Kiovin Hu IgG | Superblock solution | 10% 100 mg/ml | 5 mg/ml | 20 | 5 | 15 |
FcR block (Trustain, Biolegend, # 422302) | Superblock solution | 100% | 5% (1/20) | 20 | 5 | 15 |
BSA (Sigma Aldrich, #A9647) | Superblock solution | 20% (10 g in 100 ml = 10%) | 10% | 2 | 50 | 150 |
Superblock (Thermo Fisher, #37515) | To top up Until your final volume | 30 | 90 |
When you add the buffer to the inside of the wax circle, it should be contained and fully cover the tissue.
Remove slides from DPBS and remove excess buffer by tipping slide.
Use a lint-free tissue to dry the back of the slide thoroughly and to gently draw excess buffer away from the front, without touching the tissue or wax ring.
Transfer to humidified box, cover tissue with blocking buffer.
Incubate at RT for 90 min.
1. PREPARE ANTIBODY MIX AND INCUBATE
Antibody mix | ||||||
---|---|---|---|---|---|---|
Cat# | Supplier | Tag | Target/ solution | Dil factor (1:X) | Initial concentration | Final concentration |
3170019D | Fluidigm | 170 | CD3 (polyclonal, C-terminal) | 400 | ||
3156033D | Fluidigm | 156 | CD4 | 100 | ||
3152016D | Fluidigm | 152 | CD45 | 1000 | ||
3162035D | Fluidigm | 162 | CD8-alpha | 800 | ||
3161029D | Fluidigm | 161 | CD20 | 250 | ||
MAB1561 | R&D Systems | 150 | PD-L1 | 50 | ||
3158029D | Fluidigm | 158 | E-cadherin | 1000 | ||
3159035D | Fluidigm | 159 | CD68 | 400 | ||
3165039D | Fluidigm | 165 | PD-1 | 50 | ||
3168022D | Fluidigm | 168 | Ki-67 | 400 | ||
#A9647 | Sigma Aldrich | BSA Prepare fresh | 2 | 20% | 10% | |
#422302 | Biolegend | FCR | 20 | 100% | 5% | |
Provided by hospital’s pharmacy | IgG Hu Kiovig | 20 | 100 mg/ml | 5 mg/ml | ||
#P7949 | Sigma Aldrich | Tween-20 | 10 | 1% | 0.1% | |
#37515 | Thermo Fisher | Superblock | To top up to the required volume |
You will require the same volume of antibody mix used for blocking solution and tip off excess blocking buffer onto tissue paper.
Use a lint-free tissue to dry the back of the slide thoroughly and to gently draw excess buffer away from the front, without touching the tissue or wax ring.
Incubate slide o/n with antibody mix at 4°C in humidified chamber.
STAINING PROTOCOL DAY 3
1. WASH IN PERMEABLIZATION BUFFER
Tip off the antibody mix onto tissue paper.
Flush the slide surface gently but thoroughly with a pastette and permeabilization buffer (DPBS with added 0.1% Tween-20) at RT.
Wash in the same RT buffer ~50 ml in coplin jar on the mixer for 8 min.
Repeat wash step for another 8 min.
1. WASH IN DPBS
In the same jar, decant 0.1% Tween-20 DPBS wash buffer and add further wash buffer at RT and then place on the mixer for 8 min.
Repeat wash step for another 8 min.
1. DNA STAIN
For initial testing, prepare iridium (Fluidigm) in DPBS at a final concentration of 0.5 µM (may require some optimization for your tissue type).
Dilute 125 µM iridium stock (Fluidigm 201192A) 1/250 with DPBS.
Tip off excess DPBS onto tissue paper.
Use a lint-free tissue to dry the back of the slide thoroughly and to gently draw excess buffer away from the front, without touching the tissue or wax ring.
Place slide in humidified chamber and load iridium solution into the wax ring so that it covers the tissue.
Incubate at RT for 30 min.
Flush the slide surface thoroughly with RT DPBS gently using a pastette.
Wash in RT DPBS on the mixer for 5 min.
Wash in milli-Q water ~50 ml in a clean coplin jar on the mixer for 5 min.
Air dry.
Slides can be stored for several months at RT in a cool, dry, dust-free storage container.
Appendix 2
Protocol for sectioning and staining of diffuse large B-cell lymphoma lymph node section
A 5 μm tissue section was deparaffinized with xylene and sequentially rehydrated in graded ethanol. Heat-induced antigen retrieval was performed in an automated pressure cooker (Menapath Antigen Access Unit, Menarini) at 125°C for 2 min in Antigen Retrieval Reagent-Basic (R&D Systems).
The tissue was then permeabilized with DPBS, 0.1% Tween-20 for 15 min, and blocked with Superblock (Thermo Fisher Scientific), 0.1% Tween-20, Fc Receptor Blocking solution (1:20; Biolegend; blocking buffer) for 2 hr at RT.
Antibodies were diluted together according to Supplementary file 3 in blocking buffer and applied to the tissue overnight at 4°C.
After washing, the slide was incubated with the DNA intercalator (Cell-ID Intercalator-Ir, Fluidigm) for 30 min at RT.
The slide was then briefly washed with milli-Q water and air dried.
Acquisition of 1 mm2 tissue region was carried out using the Hyperion imaging system (Fluidigm) at 200 Hz, with a resolution of 1 μm/pixel.
Data availability
The liquid mass cytometry dataset is available from FlowRepository (http://flowrepository.org/id/FR-FCM-Z244), the additional three datasets: 'Imaging mass cytometry data: Head and neck squamous cellcarcinoma tissue section', 'Flow cytometry data: healthy donor bone marrow taken during hip surgery', and 'Imaging mass cytometry data: Head and neck squamous cell carcinoma tissue section', are available from Dryad (https://datadryad.org/stash).
-
DryadImagingmass cytometry data: Head and neck squamous cell carcinoma tissuesection.https://doi.org/10.5061/dryad.gf1vhhmpr
-
DryadFlow cytometry data: healthy donor bone marrow taken during hipsurgery.https://doi.org/10.5061/dryad.4b8gthtcf
-
DryadImaging mass cytometry data: Diffuse large B-cell lymphoma lymph node section.https://doi.org/10.5061/dryad.3n5tb2rhd
-
Flow RepositoryID FR-FCM-Z244. Identification of disease-associated immune signatures following bone marrow transplantation.
References
-
Destiny: diffusion maps for large-scale single-cell data in RBioinformatics 32:1241–1243.https://doi.org/10.1093/bioinformatics/btv715
-
BookAnalysis of the Murine Bone Marrow Hematopoietic System Using Mass and Flow CytometryIn: McGuire H. M, Ashhurst T. M, editors. Mass Cytometry: Methods and Protocols. Springer. pp. 159–192.https://doi.org/10.1007/978-1-4939-9454-0
-
Integration, exploration, and analysis of high-dimensional single-cell cytometry data using spectreCytometry. Part a : The Journal of the International Society for Analytical Cytology 10:e24350.https://doi.org/10.1002/cyto.a.24350
-
Dimensionality reduction for visualizing single-cell data using UMAPNature Biotechnology 37:38–44.https://doi.org/10.1038/nbt.4314
-
SoftwarescDataviz: Single Cell Dataviz and Downstream AnalysesscDataviz: Single Cell Dataviz and Downstream Analyses.
-
SoftwarePCAtools: PCAtools: Everything Principal Components AnalysisPCAtools: PCAtools: Everything Principal Components Analysis.
-
Cytofkit: a bioconductor package for an integrated mass cytometry data analysis pipelinePLOS Computational Biology 12:e1005112.https://doi.org/10.1371/journal.pcbi.1005112
-
Cytobank: providing an analytics platform for community cytometry data analysis and collaborationCurrent Topics in Microbiology and Immunology 377:127–157.https://doi.org/10.1007/82_2014_364
-
Systems immunology: just getting startedNature Immunology 18:725–732.https://doi.org/10.1038/ni.3768
-
PhenoGraph and viSNE facilitate the identification of abnormal T-cell populations in routine clinical flow cytometric dataCytometry Part B: Clinical Cytometry 94:744–757.https://doi.org/10.1002/cyto.b.21588
-
Role of CD20(+) T cells in multiple sclerosis: implications for treatment with ocrelizumabNeural Regeneration Research 15:663–664.https://doi.org/10.4103/1673-5374.266913
-
SoftwareCytoExploreR: Interactive Analysis of Cytometry DataCytoExploreR: Interactive Analysis of Cytometry Data.
-
Systemic blood immune cell populations as biomarkers for the outcome of immune checkpoint inhibitor therapiesInternational Journal of Molecular Sciences 21:2411.https://doi.org/10.3390/ijms21072411
-
Accelerating t-SNE using Tree-Based algorithmsJournal of Machine Learning Research 15:3221–3245.
-
SoftwareSingleCellExperiment: S4 Classes for Single Cell DataSingleCellExperiment: S4 Classes for Single Cell Data.
-
Testing for differential abundance in mass cytometry dataNature Methods 14:707–709.https://doi.org/10.1038/nmeth.4295
-
Mass cytometry: blessed with the curse of dimensionalityNature Immunology 17:890–895.https://doi.org/10.1038/ni.3485
-
SoftwareCyTOF Workflow: Differential Discovery in High-Throughput High-Dimensional Cytometry DatasetsCyTOF Workflow: Differential Discovery in High-Throughput High-Dimensional Cytometry Datasets.
-
Surface barcoding of live PBMC for multiplexed mass cytometryMethods in Molecular Biology 1989:93–108.https://doi.org/10.1007/978-1-4939-9454-0_7
-
SoftwareFlowSOM. Bioconductor Version: Release (3.12)FlowSOM. Bioconductor Version: Release (3.12).
-
Immunophenotypic characterization of reactive and neoplastic plasmacytoid dendritic cells permits establishment of a 10-color flow cytometric panel for initial workup and residual disease evaluation of blastic plasmacytoid dendritic cell neoplasmHaematologica 106:1047–1055.https://doi.org/10.3324/haematol.2020.247569
-
SoftwareConsensusClusterPlus: ConsensusClusterPlusConsensusClusterPlus: ConsensusClusterPlus.
Article and author information
Author details
Funding
Cancer Research UK (C604/A25135)
- Jessica A Timms
- Shahram Kordasti
Aplastic Anemia and MDS International Foundation
- Shahram Kordasti
H2020 European Research Council (335326)
- James W Opzoomer
- James N Arnold
Medical Research Council (MR/N013700/1)
- James W Opzoomer
Medical Research Council
- James W Opzoomer
Rosetrees Trust (M117-F2)
- Sedigeh Kareemaghay
- Mahvash Tavassoli
Guy's and St Thomas' NHS Foundation Trust (G170701)
- Jessica Timms
- Shahram Kordasti
LifeArc (1118406)
- Jessica Timms
- Shahram Kordasti
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We would like to thank the following funding bodies for their support:
Guy's and St Thomas Hospital Charity: grant number: G170701, Jessica A Timms, Shahram Kordasti.
Cancer Research UK (CRUK), King's Health Partners Centre : Jessica A Timms, Shahram Kordasti C604/A25135.
Aplastic Anemia and MDS International Foundation (AAMDSIF): Shahram Kordasti.
Blood Cancer UK: Shahram Kordasti.
European Research Council (ERC): James N Arnold Start up grant 335326.
Medical Research Council (MRC): James W Opzoomer MR/N013700/1.
Medical Research Council (MRC): James W Opzoomer Doctoral Training Partnership in Biomedical Sciences.
Rosetrees Trust (Rosetrees): Sedigeh Kareemaghay, Mahvash Tavassoli M117-F2.
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. The research was supported by the Cancer Research UK King’s Health Partners Centre and Experimental Cancer Medicine Centre at King’s College London, and the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.
Ethics
Human subjects: Formalin-fixed paraffin-embedded (FFPE) DLBCL tumor tissue was obtained from King's College Hospital, in accordance with the Declaration of Helsinki and approved by the UK National Research Ethics Committee (reference 13/NW/0040). Head and neck squamous cell carcinoma (HNSCC) tissue was obtained from King's College Hospital, consent was attained by the King Guy's & St Thomas' Research Biobank, within King's Health Partners Integrated Cancer Centre. The non-interventional study which collected bone marrow samples from elderly healthy donors was approved by the ethical committee of Cochin-Port Royal Hospital (Paris, France) (CLEP Decision N°: AAA-2020-08039).
Copyright
© 2021, Opzoomer et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 4,131
- views
-
- 426
- downloads
-
- 19
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Cell Biology
- Computational and Systems Biology
Induced pluripotent stem cell (iPSC) technology is revolutionizing cell biology. However, the variability between individual iPSC lines and the lack of efficient technology to comprehensively characterize iPSC-derived cell types hinder its adoption in routine preclinical screening settings. To facilitate the validation of iPSC-derived cell culture composition, we have implemented an imaging assay based on cell painting and convolutional neural networks to recognize cell types in dense and mixed cultures with high fidelity. We have benchmarked our approach using pure and mixed cultures of neuroblastoma and astrocytoma cell lines and attained a classification accuracy above 96%. Through iterative data erosion, we found that inputs containing the nuclear region of interest and its close environment, allow achieving equally high classification accuracy as inputs containing the whole cell for semi-confluent cultures and preserved prediction accuracy even in very dense cultures. We then applied this regionally restricted cell profiling approach to evaluate the differentiation status of iPSC-derived neural cultures, by determining the ratio of postmitotic neurons and neural progenitors. We found that the cell-based prediction significantly outperformed an approach in which the population-level time in culture was used as a classification criterion (96% vs 86%, respectively). In mixed iPSC-derived neuronal cultures, microglia could be unequivocally discriminated from neurons, regardless of their reactivity state, and a tiered strategy allowed for further distinguishing activated from non-activated cell states, albeit with lower accuracy. Thus, morphological single-cell profiling provides a means to quantify cell composition in complex mixed neural cultures and holds promise for use in the quality control of iPSC-derived cell culture models.
-
- Computational and Systems Biology
- Structural Biology and Molecular Biophysics
Viral adhesion to host cells is a critical step in infection for many viruses, including monkeypox virus (MPXV). In MPXV, the H3 protein mediates viral adhesion through its interaction with heparan sulfate (HS), yet the structural details of this interaction have remained elusive. Using AI-based structural prediction tools and molecular dynamics (MD) simulations, we identified a novel, positively charged α-helical domain in H3 that is essential for HS binding. This conserved domain, found across orthopoxviruses, was experimentally validated and shown to be critical for viral adhesion, making it an ideal target for antiviral drug development. Targeting this domain, we designed a protein inhibitor, which disrupted the H3-HS interaction, inhibited viral infection in vitro and viral replication in vivo, offering a promising antiviral candidate. Our findings reveal a novel therapeutic target of MPXV, demonstrating the potential of combination of AI-driven methods and MD simulations to accelerate antiviral drug discovery.