1. Computational and Systems Biology
Download icon

Removing unwanted variation with CytofRUV to integrate multiple CyTOF datasets

  1. Marie Trussart  Is a corresponding author
  2. Charis E Teh
  3. Tania Tan
  4. Lawrence Leong
  5. Daniel HD Gray
  6. Terence P Speed
  1. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Australia
  2. School of Mathematics and Statistics, The University of Melbourne, Australia
  3. The Walter and Eliza Hall Institute of Medical Research, Australia
  4. Department of Medical Biology, The University of Melbourne, Australia
Research Article
  • Cited 6
  • Views 1,715
  • Annotations
Cite this article as: eLife 2020;9:e59630 doi: 10.7554/eLife.59630

Abstract

Mass cytometry (CyTOF) is a technology that has revolutionised single-cell biology. By detecting over 40 proteins on millions of single cells, CyTOF allows the characterisation of cell subpopulations in unprecedented detail. However, most CyTOF studies require the integration of data from multiple CyTOF batches usually acquired on different days and possibly at different sites. To date, the integration of CyTOF datasets remains a challenge due to technical differences arising in multiple batches. To overcome this limitation, we developed an approach called CytofRUV for analysing multiple CyTOF batches, which includes an R-Shiny application with diagnostic plots. CytofRUV can correct for batch effects and integrate data from large numbers of patients and conditions across batches, to confidently compare cellular changes and correlate these with clinically relevant outcomes.

Introduction

Mass cytometry or Cytometry by Time-Of-Flight (CyTOF) (Bandura et al., 2009) is a high-throughput technology that permits the simultaneous measurement of the expression level of more than 40 proteins in millions of single cells. It uses antibodies, which are labelled with heavy metal ion tags to target the proteins of interest, and are in turn detected by time-of-flight mass spectrometry. CyTOF has been a powerful tool for delineating cell subsets in heterogeneous tissues such as blood and tumour, and for correlating single-cell differences with biologically relevant outcomes (Levine et al., 2015; Qiu et al., 2011). This capability has been useful in understanding the mechanisms of resistance that develop in certain blood cancers to a new class of anti-cancer drugs, termed BH3 mimetics, in early stage clinical trials of several blood cancers (Agarwal et al., 2019; Blombery et al., 2019). Yet, a major challenge in the field is the high variation in the performance observed in the CyTOF instrument, caused by both differences in instrument calibration and fluctuations in signal strength.

To overcome this challenge, a normalisation method was created to improve the comparability between measurements (Finck et al., 2013). Briefly, the method involves the addition of five types of control beads mixed in with cells, each type tagged with a different heavy metal element, the collection of the control beads throughout the run, and the application of a multiplicative correction at the end of the run. For the multiplicative correction, the algorithm calculates smoothed intensities from each control bead element, estimates a coefficient at each control bead acquisition time-point, and corrects the instrument sensitivity at that specific time-point by computing a unique slope for all the control bead elements, assuming that they vary at similar rates. To extend from control bead events to all cells, the value of the coefficient is linearly interpolated over all time-points of the experiment by assuming that all cells (non-control bead events) have similar slopes to the closest bead time-point. The final normalisation step involves applying the interpolated correction coefficient to all the protein measurements. The use of this normalisation to account for intra-instrument time-drift variation has become common practice, but novel correction procedures are still needed to address all other types of variation between samples between research centres, see for example (Leipold et al., 2018) and within laboratories run on different days (see below).

In this study, a single CyTOF dataset, barcoded sample set, or run is referred to as a CyTOF batch. We call the batch effects we seek to remove ‘unwanted variation’, and their causes include: differential antibody staining across samples within a batch, different batches of reagents, different machines or the inevitable lab differences found in multicentre studies. The ability to accurately distinguish true biological changes from technical artefacts like those just mentioned is critical, and has already been done to an extent for flow cytometry assays (Finak et al., 2016; Maecker et al., 2012).

Two methods have recently been published that aim to achieve consistency between samples across batches which make use of shared reference samples across batches. BatchAdjust (Schuyler et al., 2019) offers methods analogous to the control bead normalisation described above, which include scaling all measurements by ratios of means or medians. With the scaling methods, a factor is computed for each protein and each batch to adjust the measurements on the reference samples replicated across batches. Similar adjustments are then applied to the samples within a batch to achieve consistency with their reference samples. However, technical variation can impact specific cell types differently (Van Gassen et al., 2019). To address this, CytoNorm (Van Gassen et al., 2019) uses the clustering algorithm FlowSOM (Van Gassen et al., 2015) to identify clusters prior to normalisation, and defines a cluster-specific goal distribution for the values of each protein measurement using the means of quantiles. Their approach then uses splines to transform the original protein values into new values, which have the goal distribution. This method relies on the strong assumption that batch effects do not affect the clusters, and this is examined using coefficients of variation. Both methods were shown to be effective in removing batch-to-batch variation in the datasets analysed. The observation that batch differences can affect clusters differently (Van Gassen et al., 2019) suggests that it will not be sufficient to apply a single batch adjustment to the measurements on all cells, as it is being done in BatchAdjust (Schuyler et al., 2019). However, a comparison of the two methods assessing the performance and limitations of each method has not been performed. Additionally, there are no tools or metrics to assess whether the post normalised CyTOF data are more or less consistent across batches, not only at the protein expression level but also at the cluster level.

In recent years, a class of methods called Remove Unwanted Variation (RUV) has been developed to remove unwanted variation such as batch effects, from high-dimensional genetic and genomic data. They have been applied to microarray (Gagnon-Bartsch and Speed, 2012), RNA-seq (Risso et al., 2014), Nanostring nCounter gene expression (Molania et al., 2019) and single-cell RNA-seq data (Lin et al., 2019). Here, for the first time, we adopt the approach and develop a computational algorithm which permits the integration of data across CyTOF batches. Our method is based on the RUV-III method (Molania et al., 2019), which uses technical replicates and negative control genes to estimate unwanted variation. We applied RUV-III to CyTOF data by exploiting pseudo-replicates to estimate the unwanted variation and remove it. It is implemented in the R package ‘CytofRUV’, which is available at the following link: www.github.com/mtrussart/CytofRUV. We begin by examining the batch effects found when comparing CyTOF data from samples replicated across batches. To do so, we built an R-Shiny application that exhibits any batch effects present in such samples using four different diagnostic plots and their associated numerical metrics based on protein expression distributions and clustering results. Then, we compare the unadjusted data with the normalised data using either CytofRUV, BatchAdjust or CytoNorm, all on three different datasets. Our results suggest that not only does CytofRUV do better at removing unwanted variation from measured protein expression, it also makes the distributions of these quantities more uniform across batches, and enhances the detection of biologically important changes embodied in the data across batches.

Results

Batch effects include protein expression differences

A known source of unwanted variation in CyTOF datasets is the time-drift in signal intensity (Finck et al., 2013). However, there can also be variation due to differences across batches in antibody conjugates or other reagents, as well as operators, machines and laboratories. To exhibit some of the batch effects that can arise with CyTOF within one lab, we conducted an experiment using samples replicated across batches. Replicated samples allow us to assess intra-site reproducibility and systematic differences due to technical variation.

We created a dataset based on 24 samples in total, consisting of peripheral blood mononuclear cell (PBMC) samples from three patients with chronic lymphocytic leukaemia (CLL) and PBMC from nine healthy controls (HC), each replicated across two batches of 12 samples (Table 1). All samples were stained with a 31-antibody panel targeting 19 lineage (Table 2) and 12 functional proteins (Table 3) that were previously validated (Teh et al., 2020). After processing the data (Methods), we applied an arcsinh-transformation defined as arcsinh (intensity/5) in all that follows.

Table 1
Samples descriptions.

The first column indicates the sample id, the second the patient condition, either healthy controls (HC) or chronic lymphocytic leukaemia (CLL), the third column indicates the patient id and the last indicates the batch number, 1 or 2.

Sample IdConditionPatient IdBatch
HC1_B1HCVBDR9961
HC2_B1HCVBDR10891
HC3_B1HCVBDR10901
HC4_B1HCVDBR10981
HC5_B1HCVDBR11081
HC6_B1HCVDBR11031
HC7_B1HCVDBR11051
HC8_B1HCVDBR11071
HC9_B1HCVBDR11111
CLL1_B1CLLDG33-011
CLL2_B1CLLDG23-011
CLL3_B1CLLDG27-011
HC1_B2HCVBDR9962
HC2_B2HCVBDR10892
HC3_B2HCVBDR10902
HC4_B2HCVDBR10982
HC5_B2HCVDBR11082
HC6_B2HCVDBR11032
HC7_B2HCVDBR11052
HC8_B2HCVDBR11072
HC9_B2HCVBDR11112
CLL1_B2CLLDG33-012
CLL2_B2CLLDG23-012
CLL3_B2CLLDG27-012
Table 2
Lineage surface proteins selected.

The first column indicates the transition element isotope (mass number, element name), the second column indicates the antigen selected, and the last two columns indicate the clone name and vendor.

MetalLineage (surface) protein antibodyCloneVendor
189 YCD45HI30BioLegend
2115 InHLA-DRL243BioLegend
3140 CeCD27M-T271BioLegend
4141 PrCD235a/bHIR2BioLegend
5142 NdCD19HIB19BioLegend
6143 NdCD5UCHT2BioLegend
7144 NdCD38HIT2BioLegend
8145 NdCD4RPA-T4BioLegend
9146 NdCD8RPA-T8BioLegend
10147 SmCD20H1BD
11148 NdCD163G8BioLegend
12151 EuCD1236H6BioLegend
13155 GdCD56B159BioLegend
14156 GdCD14HCD56BioLegend
15159 TbCD11cBu15BioLegend
16169 TmCD45RAHI100BioLegend
17170 ErCD3UCHT1BioLegend
18171 YbCD66CD66a-B1.1DVS
19209 BiCD61VI-PL2DVS
Table 3
Set of intracellular functional proteins selected.

The first column transition element isotope (mass number, element name), the second column indicates the antigen selected, and the last two columns indicate the clone name and vendor.

MetalFunctional (intracellular) protein antibodyCloneVendor
1140 CeBAK7D10WEHI
2153 EuBcl-xLE18Abcam
3154 SmBax1B4WEHI
4157 GdBcl-2100WEHI
5160 GdMcl-1Y37Abcam
6161 DycMycD84C12CST
7163 DyBFL-1SP435Abcam
8165 HoBim3C5WEHI
9166 ErpRb [S807/811]J112-906BD
10172 YbBCLW16H12WEHI
11173 YbcCaspase3C92-605BD
12174 Ybp537F5CST

The first class of diagnostic plots we use is based on median protein expression. The multi-dimensional scaling (MDS) plot (Figure 1A) computed using median protein expression from all cells in each sample as described in Crowell HL et al., 2017 and Nowicka et al., 2017, shows the dissimilarities between samples. The first dimension (MDS1) separates the CLL from the HC samples well. The second dimension (MDS2) shows the batch differences, with samples that originate from batch one placed at the bottom of the plot and samples from batch two at the top of the plot. This distinction clearly reveals that the protein expression measured in the samples is affected by batch. We also carried out hierarchical clustering on the median expression across all cells in the samples of the 19 lineage proteins and 12 functional proteins detected, to highlight the proteins driving the observed clustering of samples (columns) and proteins (rows) in the heatmap (Figure 1B). As with the MDS plot, a grouping of samples by condition and by batch is observed.

Visualisation of batch effects on the median protein expression across batches .

(A) Multi-dimensional scaling plot of the 24 samples computed using median protein expression. (B) Heatmap of the median protein expression of 19 lineage proteins and 12 functional proteins across all cells measured for each sample in the dataset.

We next examined the magnitude of the batch-to-batch differences in the distributions of the protein expression across replicates, as our second class of diagnostic plots. We observed that batch effects not only affect each protein differently (Figure 2—figure supplement 1) but also each sample differently (Figure 2—figure supplement 1). To assess the importance of the variation found between samples replicated across batches, we compared them to expected, biologically relevant differences within a single batch. To do this, we used another dataset of a replicated sample from one index patient with CLL, and samples from six other patients with CLL. The one sample from the index patient with CLL was replicated across eight different CyTOF batches, resulting in eight replicated CLL datasets. We compared the variation in these datasets with that from six other patients with CLL processed in a single CyTOF batch. We focussed on expression of BCL-2, an archetypal pro-survival protein that is greatly upregulated in CLL cells compared to their normal B cell counterparts, yet still exhibits variation in CLL cells among patients (Majid et al., 2008). We found that the variation in the distributions of the BCL-2 expression from a single sample across batches (Figure 2A) is comparable to that observed in the BCL-2 expression for the six patients at screening from a single CyTOF batch (Figure 2B). We conclude that batch effects due to unwanted variation can occur over a range comparable to that due to actual biological changes among patients (Figure 2A–B). An important repercussion in this example is that the impacts of treatments on distinct CLL cellular phenotypes would not be confidently detected without correcting for batch effects.

Figure 2 with 2 supplements see all
Distribution of BCL-2 expression.

(A) Distributions of BCL-2 expression in one sample from one treated CLL cancer patient, replicated across 8 CyTOF batches, coloured by batch. (B) Distributions of BCL-2 expression in one sample from each of 6 different CLL cancer patients at screening, processed in a single CyTOF batch, coloured by patient.

Batch effects affect cell subpopulations differently

To assess how batch effects affect cell subpopulations, we used our first dataset of samples from 3 patients with CLL and 9 PBMC healthy controls (HC). We partitioned the entire collection of cells into clusters using FlowSOM as described in Nowicka et al., 2017, which appeared among the fastest and best performing clustering approaches for CyTOF data (Weber and Robinson, 2016). In all 24 samples, we carried out this clustering using the 19 lineage proteins to identify 20 cell clusters (Figure 3A and Figure 3—figure supplement 1). We performed the clustering using different numbers of clusters, and the choice of 20 clusters was determined based on the biological interpretation of the cell subpopulations found. Here we refer to clusters as cell subpopulations, although in some cases the clustering method might produce clusters that do not necessarily correspond to homogeneous cell subpopulations. The third class of diagnostic plots we display are based on the clustering results. We used t-Distributed Stochastic Neighbor Embedding (t-SNE) as described in Crowell HL et al., 2017 and Nowicka et al., 2017 to give a 2D representation of the single-cell data, with the positions of cells reflecting their proximity in high-dimensional space. It was then possible to visualise the impact of batch differences on cell subtypes identification in the datasets by examining the t-SNE plot coloured by cluster (Figure 3A), which assumed different distributions across the replicate CyTOF runs. Additionally, we highlighted the batch effect by overlaying the predominant CLL cell subpopulation (cluster 9) coloured by batch (Figure 3B). The distinct positions of these clusters on the second dimension (t-SNE2) suggested that substantial unwanted variation was altering cell population measures across the two batches.

Figure 3 with 4 supplements see all
Cell clustering plots show batch effects in cells from the same cancer patient CLL1 sample replicated across 2 CyTOF runs.

(A) Cell clustering identification. t-SNE plot based on the arcsinh-transformed expression of the 19 lineage proteins in the cells. For display purposes, 2000 cells were randomly selected from each of the samples. Cells are coloured according to the 20 clusters obtained using FlowSOM clustering stratified by batch 1 (left) or 2 (right) of the corresponding replicated sample. (B) Same as in (A) selecting only cluster 9 cells but coloured by the batch 1 or 2 of the corresponding replicated sample. (C) Same as in (A) but after CytofRUV normalisation with k = 5. (D) Same as in (B) but after CytofRUV normalisation with k = 5. (E) Linear discriminant analysis applied to data on two cell types from the same sample replicated across two batches, with shape indicating cell type and colour indicating batch. (F) Cluster proportions. Barplot of the relative abundance (percentage) of the cells in clusters 2, 6 and 7 by batch.

To optimise the dimensionality reduction and visualise the extent to which discrete subsets of cells are separated from each other in the 31-dimensional space, we performed a linear discriminant analysis (LDA) on four subsets of cells: the predominant CLL cell subpopulation (cluster 9) and CD8 killer cell subpopulation (cluster 2) identified in the study from batches 1 and batch 2. The first dimension (LD1) separates the cell types well, while the second dimension (LD2) embodies batch differences: cells that originate from batch one are located at the top of the plot while cells from batch two are at the bottom of the plot (Figure 3E).

CyTOF has the unique ability to provide a deeper understanding of the molecular changes induced by targeted therapies at the single-cell level and how their efficacy is influenced by cancer heterogeneity. However, the assessment of patient heterogeneity and its correlation with the clinical outcome necessitates a reliable detection of changes in protein expression across patients. As such, we compared the median expression of BCL-2 for the 3 patients with CCL in the main CLL cluster nine and found that the variation from a single sample across batches is higher than that observed in the BCL-2 median expression across patients (Figure 3—figure supplement 2A). To overcome this limitation, the assessment of heterogeneity of patients with CLL requires a correction for batch effects that would remove this unwanted variation.

Batch effects can induce differences in cell subpopulation abundances

In CyTOF studies, the analysis of cell subpopulation abundances as well as that of protein expression can be used to identify sets of proteins that are associated with response to a treatment. Comparing the proportions of inferred cell types across different drug treatments highlights the subpopulations that change across experimental conditions. To assess whether batch differences affect cell subpopulation abundances, we compared the cluster frequencies across replicates. We detected a noticeable difference in the proportions of CLL cancer cells (cluster 2, cluster 6 and cluster 7) among cells that originate from the batch one compared to those from batch 2 (Figure 3F). Our R-Shiny application can also be used to visualise the cluster proportions across samples, and is our fourth class of diagnostic plots (Figure 4). Such variation in cell subpopulation abundance is important when batches have markedly different proportions. We need to be able to identify changes in subpopulation abundance which are due to biology and not to unwanted variation between samples.

CytofRUV’s R-Shiny application for the identification of batch effects in cluster proportions across batches.

All diagnostic plots can be obtained by the user selecting an option at the top left corner by from: Median Protein Expression, Protein Expression Distributions, Clustering Results and Cluster Proportions. The selected option displays barplots of cluster proportions across samples before normalisation and by conditions CLL or HC on a subsample of the whole dataset. Vertical black boxes contain the same replicated sample across batches one and batch2.

In summary, we examined the reproducibility of samples replicated across batches. To facilitate the identification of batch effects across replicates, we developed an R-Shiny application that produces the four diagnostic plots previously mentioned: Median Protein Expression, Protein Expression Distributions, Clustering Results and Cluster Proportions. We found that batch differences affect not only the protein expression levels, but also the cluster proportions. Such differences are problematic in large-scale studies with multiple patients, cell types and treatments, as they compromise the detection of biologically important changes. The integration of datasets from multiple CyTOF batches is therefore an important challenge to be addressed.

Using CytofRUV to remove batch effects in CyTOF data

To integrate data from multiple CyTOF batches, we have developed a normalisation method that removes batch effects in CyTOF data. In order to estimate and adjust for such unwanted variation, CytofRUV exploits the concept of pseudo-replicates (here cells) in the RUV-III method that has been successfully applied to the Nanostring nCounter gene expression platform (Molania et al., 2019) and to single-cell RNA-seq data (Lin et al., 2019). We cluster using FlowSOM, and we assume that at least one cluster in the replicated samples is shared across the batches. We then consider the cells of clusters shared across the batches to be pseudo-replicates (see Methods). To adjust for batch effects, CytofRUV begins by averaging protein values across pseudo-replicates, and then forms residuals. This leads to an estimate of one aspect of the unwanted variation (α) on each protein, which in turn is used to estimate the other aspect (W) of the unwanted variation on each cell. Finally, those estimates are combined into an estimate of the unwanted variation (Wα), and that is subtracted from the data. The dimension (k) of the unwanted variation also needs to be determined. To find a good value for k, we repeat the analysis with different values of k and then evaluate the quality of each result using our diagnostic plots and the corresponding summary statistics.

We first used CytofRUV on data from the 12 samples replicated across two batches, using two samples as our known replicated reference samples. Specifically, we used HC1 for the HC samples and CLL2 for the CLL samples, defining all cells from those samples in any given cluster to be pseudo-replicates. Assuming that all 20 clusters have cells from HC1 and CLL2 in both batches, any differences in protein expression between cells within the cluster but in different batches will represent unwanted variation. We summarized the performance of our CytofRUV normalisation method on all the CLL samples using three metrics and compare them with the corresponding ones for BatchAdjust (Schuyler et al., 2019) and CytoNorm (Van Gassen et al., 2019).

CytofRUV reduces batch effects from protein expression

To assess the quality of the our CytofRUV normalisation, we first compared the distributions of proteins across the two batches for the designated replicated reference samples. We found that, for all the proteins, these distributions become more similar across batches (Figure 2—figure supplement 1, Figure 2—figure supplement 2). Also, we observed that the variation in the median expression of BCL-2 for the 3 patients with CLL in the main CLL cluster is reduced compared to that observed in the BCL-2 median expression across patients (Figure 3—figure supplement 2B). We also observed a decrease in the batch effects both on the t-SNE plots (Figure 3C and D) and in the linear discriminant analysis (Figure 3—figure supplement 3). To quantify the batch differences between these pairs of distributions, we computed the Earth Movers Distance (EMD) as described in Van Gassen et al., 2019 for all the proteins and all CLL and HC samples for both the original dataset and the normalised datasets (Figure 5A and D). For only the CytofRUV normalisation, we also computed the EMD by cluster for all proteins to assess the reduction in batch differences within cluster compared to the raw data, where batch effects are affecting protein distributions within cluster differently (Figure 5—figure supplement 1). For all CLL samples, CytofRUV gave the lowest EMD for all values of k = 5,10 and 15, not only compared to the raw data but also compared to both BatchAdjust and CytoNorm (Figure 5A). Similarly, CytofRUV also gave the lowest EMD for 7 out of 9 HC samples compared to both BatchAdjust and CytoNorm (Figure 5D).

Figure 5 with 1 supplement see all
Metrics to assess the effectiveness of the normalisation methods.

In all panels, the colour indicates either the raw data or the method used for normalisation. (A) Boxplots of the Earth Movers Distances (EMD) between paired protein expression distributions across batches for each CLL sample. (B) Hellinger distances between paired cluster proportions across batches for each CLL sample. (C) Mean Silhouette scores computed for all CLL samples on the cluster types (bio) on the x-axis and on batch (batch) on the y-axis. (D) Same as (A) for the HC samples. (E) Same as (B) for the HC samples. (F) Silhouette scores computed for all HC samples on the cluster types (bio) on the x-axis and on batch (batch) on the y-axis.

Altogether EMD revealed that CytofRUV is the method with the optimal reduction in batch differences from protein expression for 11 out of 12 samples. CytoNorm gave the lower EMD when compared with BatchAdjust for 7 out of 12 samples (Figure 5A and D).

CytofRUV makes the cell subpopulation proportions more consistent across batches

The proportions of the different cell subpopulations in replicates in different batches should be consistent. This consistency among the replicates ensures that a differential analysis of their abundances will confidently detect robust cellular changes across experimental conditions. To assess how well CytofRUV corrects for differences in cluster proportions across batches, we computed the Hellinger distance (see Methods) between the paired cluster proportions of all replicated samples. For the HC samples, CytofNorm gave the lowest Hellinger distances for 4 samples out of 9 compared to both CytofRUV and BatchAdjust. Two remaining samples are the ones where CytofRUV is able to make those proportions more consistent across replicates (Figure 5E). For all the CLL samples, the lowest Hellinger distances are found for the normalisation with CytofRUV methods (Figure 5B). Overall CytofRUV is able to adjust the cluster proportions and make them consistent across replicates to a greater extent than is achieved by the two others methods. CytoNorm also performs well in adjusting those proportions, while BatchAjust generally leads to Hellinger distances similar to those in the raw data (Figure 5B and E).

Effective removal of unwanted variation leads to a better separation of cell subpopulations

To evaluate whether CytofRUV not only removes batch effects but preserves biology, we compare the cell-to-cell variation within clusters in relation to between clusters. We do this by computing Silhouette scores (see Methods), which are a combined measure of the degree of separation of cells within clusters and that between clusters. We compute two Silhouette scores sbiology and sbatch, based on the cell subpopulations defined by the clusters, and batch grouping, respectively. A normalisation method that successfully removes batch effects should lead to adjusted data with a small sbatch score, while one which preserves or enhances the biology should have high value of sbiology, at least as high as that before adjustment. To assess this aspect of the performance of normalisation methods, we combine the sbiology for clusters with sbatch for the batches in a single plot (Figure 5C and F). When the raw data are found at the bottom left corner of the plot and a normalisation method is found at the top right corner of the plot, this indicates that the method has successfully removed batch effects and preserved or enhanced the biology. According to the mean Silhouette scores of the CLL samples (Figure 5C), CytofRUV with k = 5 gives the best results, not only enhancing the biology i.e. cell sub-type definition across replicates (also on t-SNE plots Figure 3C and D), but also removing the batch effects. For most of the HC samples, sbatch is similar for all methods and in a few samples sbatch is higher than in the raw data for all methods (Figure 5F). Using CytofRUV with k = 5 for the HC samples, we also obtained a higher sbiology for 8 out of the 9 HC samples (Figure 5F) compared to the two other methods.

CytofRUV removes batch effects across multiple batches in two other datasets

To expand our analysis, we also tested all methods on two other datasets containing replicate reference samples across multiple batches, taken from those used in Schuyler et al., 2019 and in CytoNorm (Van Gassen et al., 2019).

The first dataset (Schuyler et al., 2019) is from peripheral whole blood samples from a single healthy donor, processed at one time to include unstimulated and stimulated conditions (Lipopolysaccharide LPS+ and Resiquimod R848-), and aliquoted into 12 batches. We carried out the normalisation of all samples using BatchAdjust as explained in Schuyler et al., 2019. For CytofRUV and CytoNorm, we used all the stimulated samples as replicated reference samples and identified 20 clusters with FlowSOM. CytofRUV used all clusters to define pseudo-replicates.

Our R-Shiny application can also be used to assess the ability of a normalisation method to correct for batch effects. Using our fourth diagnostic plot, we can visualise the ability of CytofRUV with k = 10 to remove batch effects in cluster proportions across 12 batches (Figure 6C) compared to that in the raw data (Figure 6A). We also summarized the performance of all three methods using the same three metrics (Figure 6—figure supplement 1). Again, we observed the ability of CytoNorm to correct for differences in cluster proportions in stimulated samples where the lowest Hellinger distances are found (Figure 6—figure supplement 1B), and in unstimulated samples (Figure 6—figure supplement 1E), followed by CytofRUV with overall lower distances than BatchAdjust. Likewise, we confirmed that, overall, CytofRUV gave lower EMD for both the stimulated (Figure 6—figure supplement 1A) and unstimulated samples (Figure 6—figure supplement 1D) compared to the two other methods, which in some cases have some cases higher EMD than those of the raw data. Again we observed improved Silhouette scores for CytofRUV compared with BatchAdjust and CytoNorm, which have higher sbatch (Figure 6—figure supplement 1C).

Figure 6 with 3 supplements see all
CytofRUV performance on two other datasets with multiple batches.

(A) Barplot of proportions of clusters across 28 samples from the BatchAdjust dataset (Schuyler et al., 2019) before normalisation, by samples and coloured by cluster. Vertical black boxes contain the same sample (Stimulated or Unstimulated) replicated across 14 batches. (B) Protein expression distribution from the CytoNorm dataset (Van Gassen et al., 2019) before normalisation of all cells from the stimulated samples across 10 batches and coloured by batch. (C) Same as (A) but after CytofRUV normalisation with k = 10. (D) Same as (B) but after CytofRUV normalisation with k = 5.

The second dataset (Van Gassen et al., 2019) comes from the FlowRepository FR-FCM-Z247 which is the validation cohort of an immunoprofiling study of women during pregnancy used in CytoNorm (Van Gassen et al., 2019). The samples we analyse come from the blood of one healthy donor which was replicated across 10 separately barcoded plates (i.e. batches). Each batch contained unstimulated cells and cells stimulated with both Interferon α and LPS. Each stimulated and unstimulated sample was duplicated (referred to as sample 1 and sample 2), giving four samples per plate. To assess the limitations of the three normalisation methods, we carried out two analyses. The first normalisation was only carried out on samples two using the stimulated samples two as replicate reference samples, while the second normalisation was done on samples 1 and 2 using both the stimulated and unstimulated samples one as reference samples. We performed CytoNorm as explained in Van Gassen et al., 2019 and identified 25 clusters with FlowSOM and we used all clusters to define pseudo-replicates for CytofRUV.

For the first normalisation using only the stimulated samples as replicate reference samples, CytofRUV gives lower EMD for all samples, followed by CytoNorm which overall gives lower EMD when compared with BatchAdjust (Figure 6—figure supplement 2A, Figure 6—figure supplement 2C). CytoNorm corrects some differences in cluster proportions only in stimulated samples compared to the two other methods that give EMD generally similar to those in the raw data (Figure 6—figure supplement 2B). However, according to the Hellinger distance and Silhouette scores with sbatch higher to that in the raw data, the unstimulated samples still have batch effects after normalisation by all three methods (Figure 6—figure supplement 2C, Figure 6—figure supplement 2E). Similarly, the Silhouette scores of the stimulated samples indicate that none of the normalisation methods is able to successfully remove batch effects (Figure 6—figure supplement 2C).

In our second normalisation, when using both stimulated and unstimulated samples one as reference samples, lower Hellinger distances are found on both unstimulated and stimulated samples two for CytofRUV compared to BatchAdjust and CytoNorm, which have higher distances than those in the raw data for most samples. Better Silhouette scores are found for both samples for CytoNorm (Figure 6—figure supplement 3B, Figure 6—figure supplement 3E, Figure 6—figure supplement 3C) compared with the two other methods. Using our second class of diagnostic plots, we observe that CytofRUV with k = 5 effectively eliminates batch differences from protein expression across the 10 batches (Figure 6B) compared to that in the raw data (Figure 6D). It also gives the lowest EMD compared to both CytoNorm and BatchAdjust (Figure 6—figure supplement 3A, Figure 6—figure supplement 3D).

Discussion

We have developed a new computational approach for analysing multiple CyTOF batches implemented in the CytofRUV R package. We showed how CytofRUV can reduce batch effects in three different datasets across multiple batches. Our method allows pooling of data from a large number of patients and conditions run across multiple batches, thereby enabling the integration of multiple CyTOF datasets.

Our approach adapts the RUV-III procedure to CyTOF by exploiting pseudo-replicates. We began by examining the batch effects found in CyTOF by comparing data from samples replicated across batches. To do this, we build an R-Shiny interface that highlights the presence of any batch effects in replicated samples using four different diagnostic plots: Median Protein Expression, Protein Expression Distributions, Clustering Results and Cluster Proportions. Finally, we compare CytofRUV with the two recently developed methods, BatchAdjust and CytoNorm, using three different datasets. Our results suggest that not only does CytofRUV frequently do better at removing unwanted variation in measured protein expression, making the distributions of these quantities more uniform across batches, but it also enhances the detection of biologically important differences embodied in the data across batches.

Using replicated samples to assess intra-site reproducibility and differences due to technical variation, we first showed how batch differences in protein expression in CyTOF datasets consisting of several CyTOF batches are comparable to biologically relevant differences within a single batch. In that context, confidently detecting of the impact of diverse treatments on distinct CLL cellular phenotypes would not be possible without correcting for such batch effects.

We developed an interactive R-Shiny application that exhibits batch effects in CyTOF studies. Different diagnostic plots can be selected by the user to display any batch effects on CyTOF data before and after normalisation.

We showed that not only are batch effects found in individual protein expression values which was also found in Schuyler et al., 2019, but also that batch differences affect samples differently. We observed the impact of batch differences on cell subtypes identification and how protein distributions are affected differently within cluster, something that was also found by Van Gassen et al., 2019. This suggests that it will not be sufficient to apply a single batch adjustment to the measurements on all cells, as is performed in BatchAdjust (Schuyler et al., 2019). We also noticed changes in the cluster abundances across batches.

RUV-III has been successfully applied to the Nanostring nCounter gene expression platform (Molania et al., 2019) and to single-cell RNA-seq data (Lin et al., 2019). Here, we adapted RUV-III to CyTOF data using as pseudo-replicates the cells of clusters in the different batches, and taking the collection of all protein expression values as ‘negative controls’. We refer to the section ‘Selection of negative control genes’ (Molania et al., 2019) for this last point. Here, as there and in other contexts, before/after normalisation comparisons indicate that this can be effective. Our method is adaptable and flexible in that it allows the user to select different normalisations: it can be implemented for one or several replicated reference samples, it can also normalise specific clusters or all clusters, and the dimension (k) of the unwanted variation can vary. In all cases, the user can visualise the diagnostic plots to assess the effectiveness of their normalisation. An important step of our method is the identification of biological subpopulations prior to normalisation, as this guides the selection of some or all clusters to define pseudo-replicates across batches.

To assess the abilities of CytofRUV, BatchAdjust and CytoNorm to remove batch effects, we computed three statistical metrics. We first compared the distributions of proteins across batches for replicated samples using EMD. Second, to assess how well each method corrects for differences in cluster proportions across batches, we computed the Hellinger distance between the paired cluster proportions of all replicated samples. Finally, to assess whether each method not only removes batch effects but preserves the biology, we computed Silhouette scores that quantify the cell-to-cell variation within cluster in relation to that between clusters.

Overall, CytofRUV had the best capability to make protein expression distributions more similar across batches. When compared with BatchAdjust and CytoNorm on different datasets it has the lowest EMD for all proteins for most samples in all three datasets and all four analyses performed, followed by CytoNorm. We further saw that batch differences are reduced within clusters by computing EMD by cluster for all proteins. We also saw CytoNorm’s ability to correct for differences in cluster proportions as well as CytofRUV, both having the lowest Hellinger distances in two of the four analyses performed. Silhouette scores also indicate that in most of our analyses (two out of the three datasets) CytofRUV not only removes the batch effects but also improves biological accuracy of the clusters compared to the two other methods. Likewise, CytoNorm is performing well according to Silhouette scores in the second analysis of the third dataset. We also conclude that BatchAdjust has the least ability to remove batch effects in the three datasets we have explored here, according to those metrics. This conclusion might be a result the single batch adjustment performed by BatchAdjust to the measurements on all cells, whereas CytoNorm and CytofRUV both take into account the impact of batch differences at the cluster level.

We envisage that in some cases removing batch effects might lead to removal of clusters or to the identification of new biologically relevant clusters. Some clusters could be artefacts of the batch effects. We use sbiology to determine the tightness and separation of the clustering, as it provides an evaluation of the clustering validity.

We observed that the B cell cluster (Figure 3A, cluster 16) is only present in the batch 1 of the CLL1 sample, and CytofRUV normalisation leads to the presence of this cluster (Figure 3C, cluster 2) in both batches. In practise, the user should examine and identify any new clusters post-normalisation, as some clusters might not be conserved after normalisation, and relate those to the clusters that were selected as pseudo-replicates. We also recommend the exploration of different clustering resolutions and how they change as the number of clusters increases, using clustering tree visualisation (Zappia and Oshlack, 2018) to relate the clusters pre and post-normalisation.

To further quantify the extend of batch effects in CyTOF datasets, it might be interesting to perform differential analyses of replicates across batches. Recent methods like diffcyt (Weber et al., 2019) have been developed that also uses FlowSOM for clustering to define cell populations, and empirical Bayes moderated tests adapted from transcriptomics for differential analysis. This method could be adapted to test for differential abundances of cell subpopulations and differential expression of proteins in the replicated samples across batches. In our first dataset containing two batches, differential analysis of protein expression is not suitable as we have too few replicates to permit a useful statistical test. However, we computed the differences in median expression for all proteins across batches before compared to that after CytofRUV normalisation for the CLL samples (ΔΔ, Figure 3—figure supplement 4.A) and the HC samples (ΔΔ, Figure 3—figure supplement 4.B). We observed that ΔΔ > 0 for most proteins especially for the CLL samples indicating reduction in batch differences. We did perform a differential analysis of the cluster abundances across batches using diffcyt and compare the results before and after normalisation. We also observed a decrease in the number of clusters that were found to have a significant difference in abundance across batches before normalisation (13 out of the 20 clusters) compared to that in the data after CytofNorm (7 out of 20 clusters).

One limitation of our method is that CytofRUV does rely on the assumption that at least one cell subpopulation is shared across the batches that would be used as our pseudo-replicates.

In experiments where large batch effects occur and no cell subpopulation is shared between batches, our method would not be applicable.

In this study, we considered three different datasets containing up to 24 samples and up to 12 batches. RUV methods have already been applied to hundreds and thousands of samples with Nanostring technology (Molania et al., 2019) and hundreds of samples with RNAseq. CytofRUV can also handle larger studies, and no memory limitation has been reached so far on other datasets containing hundreds of samples across several batches with up to 43 milllion cells. We ran the current analysis on a rstudio-server version 1.3.959–1 Professional for CentOS 6, taking about an hour per dataset, and we also tested it on other rstudio versions. Future work will involve collecting data from larger studies with more than dozens of batches to conduct further testing of memory limitations.

A requirement of the current CytofRUV method is the availability of enough material to have replicated reference samples in each batch. As previously mentioned for CytoNorm, and as we also observed for CytofRUV, in order to remove batch effects from all samples, it might be necessary to include more than one set of replicated references samples in the batch, in particular including samples that are similar to each of the main types of samples. In future work, we plan to explore estimating the unwanted variation using different replicated reference samples, each present in only some of the batches, to avoid the need for replicated reference samples in every batch. For example, the use of a carefully designed ‘bridging’ set of replicated reference samples analogous to long term reference samples in metabolomics (Broadhurst et al., 2018) should lead to the normalisation of large studies that is just as effective as that achieved using the same replicated reference samples throughout.

In summary, we proposed here a computational algorithm called CytofRUV that effectively enables batch effect reduction in mass cytometry with an adaptable normalisation method that detects heterogeneity of cellular responses across large-scale studies with multiple patients, cell types and conditions (e.g. treatment).

Materials and methods

CytofRUV

View detailed protocol

To remove the unwanted variation across multiple datasets and batches, we used fastRUV-III as previously described (Lin et al., 2019; Molania et al., 2019). Briefly, the data are first standardised before being fitted to the RUV-III model underlying all RUV analyses:

Yij=Xiβj+Wiαj+ εij

Here, Yij is the standardised expression value (arcsinh(intensity/5)) for protein j in a cell i with i=1,, m and j=1,, n, of m cells and n proteins. The standardisation is to have zero mean and unit standard deviation across all cells for all protein measurements.

Xi represents the factors of interest for the sample giving cell i. Wi represents the unwanted factors for that cell. The dimension of the unwanted factors is being denoted by k, αj represents the coefficient of Wi for protein j in a cell i and εij is noise, typically εij ~ N(0, σj2).

The data are normalised in six steps. First, all the data are clustered, typically using FlowSOM, although other clustering methods can be used. Second, group of cells are defined to be pseudo-replicates if they belong to the same subpopulation (i.e. cluster) but are in different batches, which can be done either on specific clusters or on all clusters. Third, cell-level residuals are computed by averaging all the protein measurements across those pseudo-replicates, and subtracting these averages from each measurement on each cell. In essence, differences of measurements on pseudo-replicates are considered as unwanted variation. The quantities αj are then estimated from the singular value decomposition (SVD) of the m x n matrix of these residuals. Next the k-vectors of Wi for the cells are then estimated using all proteins as 'negative controls'. Finally, the estimates of Wi and αj are multiplied to get an estimate of Wiαj, which is then subtracted from the Yij to get the final adjusted data. Full details and a discussion of key issues can be found in Lin et al., 2019 and Molania et al., 2019.

BatchAdjust

Request a detailed protocol

For all the datasets, BatchAdjust was performed using the R code and usage instructions as described in Schuyler et al., 2019. We used the option of scaling the 95th percentiles with no transformation applied to the data before adjustment.

CytoNorm

Request a detailed protocol

CytoNorm was performed as described in Van Gassen et al., 2019 following the two steps provided in the R library CytoNorm to normalise the data.

Earth Movers Distance

Request a detailed protocol

To quantify the similarity of protein expression distributions across batches, we computed the Earth Movers Distance, also called the Wasserstein metric, as described in Van Gassen et al., 2019. Briefly, the data are binned and we computed the pairwise EMDs across batches for the distribution of every protein over all cells as well as over the cells in each cluster. This was done for both the original dataset and the normalised datasets.

Hellinger distance

Request a detailed protocol

To quantify the differences between cluster proportions across batches, we computed the Hellinger distances between the proportions found in samples replicated across different batches. This distance is defined for two probability p=(pi) and q=(qi) by H(p,q), where H(p,q)2=12i=1n(pi- qi)2.

We computed these distances for both the original and the normalised datasets.

Silhouette scores

Request a detailed protocol

To assess the extent to which the data are grouped based on the batch effects as opposed to biological signals, we computed batch and biology Silhouette scores. Given a partitioning of all cells into groups, if ai denote the average Euclidean distance of the protein expression between the cell i and all other cells in the group to which cell i is assigned, and bi is the minimum of the average distance between the cell i and any cells in other groups not containing cell i, then the silhouette coefficient of cell i is calculated as

si=bi-aimax{bi,ai}

The average of the silhouette values across cells using a particular grouping is called the silhouette score for that grouping. Silhouette score ranges from −1 to +one where positive values (bi is high and ai is low) indicate that cells are well matched to their own group. In this way, we computed the silhouette score sbatch based on the batches as groups and the silhouette score sbiology based on the grouping of the cells by subpopulation (i.e. clusters).

sbiology is used to quantify the cell-to-cell variation within cell subpopulations compared to other subpopulations. Negative values mean that the data might be mis-clustered as it is more similar to a neighbouring cluster. For example, if two different biological relevant clusters would be merged into a single cluster, sbiology will reflect this merging with a lower sbiology value.

Differential analysis

Request a detailed protocol

To perform a differential analysis of the cluster abundances across batches, we used diffcyt with the default method 'diffcyt-DA-edgeR' and default parameters with batch as the factor of interest for the differential tests. We performed this analysis both on the data before and after normalisation and calculated the number of significant detected clusters at 10% false discovery rate as described in the diffcyt workflow.

As the differential expression of proteins could not be performed with too few replicates, as with a t-test, we computed the medians Mm,b for each marker m and each batch b across the 12 paired samples in a given cluster. We then computed the difference of medians between batches Δ=Mm,2  Mm,1 across the 12 paired samples. We computed those differences raw on the data before and Δnorm after normalisation and then plotted the difference of those values, that is ΔΔ=ΔrawΔnorm in Figure 3—figure supplement 4.

R package CytofRUV

Request a detailed protocol

Our algorithm is implemented in the R package ‘CytofRUV’ and is available at:(www.github.com/mtrussart/CytofRUV) (Trussart, 2020). Installation and R code usage instructions for both the R package and the R-Shiny application can be found on the GitHub page. Users are required to provide: the FCS files from all samples in the study, a metadata file containing the details of each sample and a panel file containing the details of all proteins used in the study. The R-Shiny application allows the user to explore the data and identify batch effects across replicates using the diagnostic plots previously mentioned: Median Protein Expression, Protein Expression Distributions, Clustering Results and Cluster Proportions using samples replicated across batches. It can be explored on all the data or on a subsample of the data. The normalize_data function allow the user to adjust for batch effects with parameter settings for the CytofRUV algorithm, such as which replicated samples to use and the value of k. The pipeline and scripts used to generate the results described in this manuscript is also available in the supplementary data.

PBMC samples from patient and healthy donor

Request a detailed protocol

Blood was obtained from healthy donors (via the Victorian Donor Blood Registry) and patients with CLL (via the Royal Melbourne Hospital, Australia). All patients consented under Melbourne Health HREC 2016.305 and samples were analysed under HREC 2016.066. PBMCs were isolated using standard Ficoll-based methods and cryopreserved.

Mass cytometry

Request a detailed protocol

Cells were thawed and stained for viability with cisplatin. Cells were then fixed with paraformaldehyde (PFA: Electron Microscopy Sciences, Hatfield, PA, USA) to a final concentration of 1.6% for 10 min at room temperature. Cells were pelleted and washed once with cell staining medium (CSM, PBS with 0.5% BSA and 0.02% sodium azide) to remove residual PFA and stored at −80°C.

Cells were barcoded using 20-plex palladium barcoding according to manufacturer’s instructions (Fluidigm, South San Francisco, CA, USA). Following barcoding, cells were pelleted and washed once with cell staining medium (PBS with 0.5% BSA and 0.02% sodium azide) to remove residual PFA. Cells were stained with CD16/CD32 for 10 min and surface antibodies (Table 2) for 30 min at room temperature. Cells were permeabilized with 4°C methanol for 10 min. Cells were washed three times with CSM and stained with intracellular antibodies (Table 2) for 30 min at room temperature. Cells were washed with CSM, then stained with 125 nm 191Ir/193Ir DNA intercalator (Fluidigm, South San Francisco, CA, USA) in PBS with 1.6% PFA at 4°C overnight. Cells were washed once with CSM, washed three times with double-distilled water and filtered to remove aggregates and resuspended with EQ normalisation beads immediately before analysis using a Helios mass cytometer (Fluidigm, South San Francisco, CA, USA). Throughout the analysis, cells were maintained at 4°C and introduced at a constant rate of ~300 cells/sec.

Processing data for mass cytometry

Request a detailed protocol

Data concatenation, normalisation and debarcoding are done using the R Catalyst package merging the two batches (RUV_1B and RUV3B) when applying the Finck normalisation. The R script used to generate this preprocessing (CytofRUV_preprocessing_dataset.R) is also available in the supplementary data.

Flow repository

Request a detailed protocol

The FCS files from this study are available at flow repository ID FR-FCM-Z2L2.

Data availability

The fcs files from this study are available at Flow Repository, ID FR-FCM-Z2L2.

The following data sets were generated
The following previously published data sets were used
    1. Gassen S
    (2019) FlowRepository
    ID FR-FCM-Z247. Immune Clock of Pregnancy Validation - Controls.

References

Decision letter

  1. Detlef Weigel
    Senior Editor; Max Planck Institute for Developmental Biology, Germany
  2. Greg Finak
    Reviewing Editor; Fred Hutchinson Cancer Research Center, United States
  3. Nima Aghaeepour
    Reviewer
  4. Anna Belkina
    Reviewer; Boston University School of Medicine, United States
  5. Sofie Van Gassen
    Reviewer; Ugent Center for Inflammation Research, Belgium

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The work is an important contribution to the field of cytometry. It provides an objective and well thought out normalization procedure for mass cytometry and potentially even fluorescence flow cytometry.

Decision letter after peer review:

Thank you for submitting your article "CytofRUV: Removing unwanted variation to integrate multiple CyTOF datasets" for consideration by eLife. Your article has been reviewed by four peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Detlef Weigel as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Nima Aghaeepour (Reviewer #1); Anna Belkina (Reviewer #3); Sofie Van Gassen (Reviewer #4).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

The authors present a new Cytof normalization approach based on RUV III that has proven useful for other technologies including RNASeq, single-cell RNAseq and nanostring. The reviewers all agreed that this was a strong manuscript that makes an important contribution to an area of the field that remains under-served, and they unanimously recommended publication.

Essential revisions:

Several concerns arose during review that the authors should address to strengthen the results and improve the presentation.

1) Given the size of typical cytometry data sets (millions of cells and hundreds of samples), how does this approach scale? What are the limitations? The Discussion mentions large-scale studies, but how large in practice?

2) Most importantly, reviewers raised questions around the evaluation of the normalization procedures. Specifically, since the data are re-clustered after normalization, and performance of normalization was assessed against the re-clustered data (and while the reviewers agreed this made sense), they were concerned about how a negative impact of normalization could be assessed. Specifically, if normalization failed, leading to fusion of biologically relevant clusters post-normalization, how could that be detected? It was not clear that this would be captured by Sbiology (as pre- and post-normalization Sbiology measures are not compared), nor how and if the proposed figures and evaluation measures could be interpreted to detect this situation. The reviewers felt this aspect needed further exploration and discussion.

3) Finally the reviewers were looking for more guidance from the presentation around the relative ranking of the different methods. The authors should more clearly present their conclusions about the relative performance of the methods and state which methods are performing well or poorly and why, and if the results are not conclusive, this should be stated and explained.

https://doi.org/10.7554/eLife.59630.sa1

Author response

Essential revisions:

Several concerns arose during review that the authors should address to strengthen the results and improve the presentation.

1) Given the size of typical cytometry data sets (millions of cells and hundreds of samples), how does this approach scale? What are the limitations? The Discussion mentions large-scale studies, but how large in practice?

We thank the reviewers for their question and we updated the Discussion of the manuscript with more details regarding large-scale studies and their limitations in the revised manuscript.

“In this study we considered three different datasets containing up to 24 samples and up to 12 batches. […] For example, the use of a carefully designed “bridging” set of replicated reference samples analogous to long term reference samples in metabolomics (Broadhurst et al., 2018) should lead to the normalization of large studies that is just as effective as that achieved using the same replicated reference samples throughout.”

2) Most importantly, reviewers raised questions around the evaluation of the normalization procedures. Specifically, since the data are re-clustered after normalization, and performance of normalization was assessed against the re-clustered data (and while the reviewers agreed this made sense), they were concerned about how a negative impact of normalization could be assessed. Specifically, if normalization failed, leading to fusion of biologically relevant clusters post-normalization, how could that be detected? It was not clear that this would be captured by Sbiology (as pre- and post-normalization Sbiology measures are not compared), nor how and if the proposed figures and evaluation measures could be interpreted to detect this situation. The reviewers felt this aspect needed further exploration and discussion.

The reviewers are raising an important aspect to take into account in the evaluation of the normalization. We acknowledge that in some cases there might be loss of clusters or the identification of new biologically relevant clusters. An important aspect to take into account in this discussion is that the same number of clusters is conserved before and after normalization. To assess the performance of the normalization, we clarify out use of Sbiology and provide an example of how users can validate their clusters. We expanded the exploration and discussions of the questions raised here in the Discussion in the revised manuscript.

3) Finally the reviewers were looking for more guidance from the presentation around the relative ranking of the different methods. The authors should more clearly present their conclusions about the relative performance of the methods and state which methods are performing well or poorly and why, and if the results are not conclusive, this should be stated and explained.

We added more descriptions in the relative performance of each method for all three datasets (Results) and clearly stated the conclusions from those analysis in the Discussion in the revised manuscript.

https://doi.org/10.7554/eLife.59630.sa2

Article and author information

Author details

  1. Marie Trussart

    1. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
    2. School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Supervision, Validation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    trussart.m@wehi.edu.au
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7258-7272
  2. Charis E Teh

    1. The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
    2. Department of Medical Biology, The University of Melbourne, Parkville, Australia
    Contribution
    Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9745-2876
  3. Tania Tan

    1. The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
    2. Department of Medical Biology, The University of Melbourne, Parkville, Australia
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  4. Lawrence Leong

    1. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
    2. School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
    Contribution
    Data curation, Software
    Competing interests
    No competing interests declared
  5. Daniel HD Gray

    1. The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
    2. Department of Medical Biology, The University of Melbourne, Parkville, Australia
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8457-8242
  6. Terence P Speed

    1. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
    2. School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Validation, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5403-7998

Funding

National Health and Medical Research Council (1054618)

  • Marie Trussart
  • Terence P Speed

National Health and Medical Research Council (1158024)

  • Daniel HD Gray

Cancer Council Victoria (1146518)

  • Tania Tan
  • Daniel HD Gray

National Health and Medical Research Council (1089072)

  • Charis E Teh

Perpetual Impact Philanthropy (IPAP2019/1437)

  • Charis E Teh

UROP Fellowship

  • Lawrence Leong

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

MT and TPS were supported by NHMRC Program Grant 1054618. DHDG is supported by a NHMRC Senior Research Fellowship (1158024), a Cancer Council of Victoria Grant-in-aid (1146518) and support from the Victorian Comprehensive Cancer Centre. CET is supported by an Australian NHMRC Early Career Fellowship (1089072).

We are grateful to Dr MaryAnn Anderson and Prof Andrew Roberts for the provision of patient material.

This work was made possible through Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIISS. The authors thank Dr Andrew Mitchell (University of Melbourne) for assistance with mass cytometry maintenance/operation. This work was performed in part at the Materials Characterisation and Fabrication Platform (MCFP) at the University of Melbourne and the Victorian Node of the Australian National Fabrication Facility (ANFF).

We thank Prof. Damien Hicks, Dr Chris Woodruff and Dr Ramyar Molina for their suggestions during this study. We thank Dr Anna Quaglieri for her suggestions concerning the CytofRUV R package. We also thank Prof. Ronald P Schuyler, Prof. Debashis Ghosh, Prof. Elena Hsieh, Ms Helena Crowell, Prof. Jean Yang, Yingxin Lin and Dr Greg Finak as well as the reviewers for their helpful comments on the manuscript.

Ethics

Human subjects: All patients provided written informed consent and the study was approved by Human Research Ethics Committees/Institutional Review Boards: RMH (2005.008, 2012.244, 2016.305,2016.066) and the Walter and Eliza Hall Institute (G15/05).

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Developmental Biology, Germany

Reviewing Editor

  1. Greg Finak, Fred Hutchinson Cancer Research Center, United States

Reviewers

  1. Nima Aghaeepour
  2. Anna Belkina, Boston University School of Medicine, United States
  3. Sofie Van Gassen, Ugent Center for Inflammation Research, Belgium

Publication history

  1. Received: June 3, 2020
  2. Accepted: September 5, 2020
  3. Accepted Manuscript published: September 7, 2020 (version 1)
  4. Version of Record published: September 18, 2020 (version 2)
  5. Version of Record updated: November 5, 2020 (version 3)

Copyright

© 2020, Trussart et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,715
    Page views
  • 167
    Downloads
  • 6
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Computational and Systems Biology
    Christopher P Mancuso et al.
    Research Article Updated

    Environmental disturbances have long been theorized to play a significant role in shaping the diversity and composition of ecosystems. However, an inability to specify the characteristics of a disturbance experimentally has produced an inconsistent picture of diversity-disturbance relationships (DDRs). Here, using a high-throughput programmable culture system, we subjected a soil-derived bacterial community to dilution disturbance profiles with different intensities (mean dilution rates), applied either constantly or with fluctuations of different frequencies. We observed an unexpected U-shaped relationship between community diversity and disturbance intensity in the absence of fluctuations. Adding fluctuations increased community diversity and erased the U-shape. All our results are well-captured by a Monod consumer resource model, which also explains how U-shaped DDRs emerge via a novel ‘niche flip’ mechanism. Broadly, our combined experimental and modeling framework demonstrates how distinct features of an environmental disturbance can interact in complex ways to govern ecosystem assembly and offers strategies for reshaping the composition of microbiomes.

    1. Computational and Systems Biology
    Michael S Lauer, Deepshikha Roychowdhury
    Research Article Updated

    Previous reports have described worsening inequalities of National Institutes of Health (NIH) funding. We analyzed Research Project Grant data through the end of Fiscal Year 2020, confirming worsening inequalities beginning at the time of the NIH budget doubling (1998–2003), while finding that trends in recent years have reversed for both investigators and institutions, but only to a modest degree. We also find that career-stage trends have stabilized, with equivalent proportions of early-, mid-, and late-career investigators funded from 2017 to 2020. The fraction of women among funded PIs continues to increase, but they are still not at parity. Analyses of funding inequalities show that inequalities for investigators, and to a lesser degree for institutions, have consistently been greater within groups (i.e. within groups by career stage, gender, race, and degree) than between groups.