1. Immunology and Inflammation
  2. Microbiology and Infectious Disease
Download icon

Single-cell RNA-seq reveals transcriptomic heterogeneity mediated by host–pathogen dynamics in lymphoblastoid cell lines

  1. Elliott D SoRelle
  2. Joanne Dai
  3. Emmanuela N Bonglack
  4. Emma M Heckenberg
  5. Jeffrey Y Zhou
  6. Stephanie N Giamberardino
  7. Jeffrey A Bailey
  8. Simon G Gregory
  9. Cliburn Chan
  10. Micah A Luftig  Is a corresponding author
  1. Department of Molecular Genetics and Microbiology, Center for Virology, Duke University School of Medicine, United States
  2. Department of Biostatistics and Bioinformatics, Duke University School of Medicine, United States
  3. Department of Pharmacology and Cancer Biology, Duke University School of Medicine, United States
  4. Department of Medicine, University of Massachusetts Medical School, United States
  5. Duke Molecular Physiology Institute and Department of Neurology, Duke University School of Medicine, United States
  6. Department of Pathology and Laboratory Medicine, Warren Alpert Medical School, Brown University, United States
Research Article
  • Cited 3
  • Views 2,546
  • Annotations
Cite this article as: eLife 2021;10:e62586 doi: 10.7554/eLife.62586

Abstract

Lymphoblastoid cell lines (LCLs) are generated by transforming primary B cells with Epstein–Barr virus (EBV) and are used extensively as model systems in viral oncology, immunology, and human genetics research. In this study, we characterized single-cell transcriptomic profiles of five LCLs and present a simple discrete-time simulation to explore the influence of stochasticity on LCL clonal evolution. Single-cell RNA sequencing (scRNA-seq) revealed substantial phenotypic heterogeneity within and across LCLs with respect to immunoglobulin isotype; virus-modulated host pathways involved in survival, activation, and differentiation; viral replication state; and oxidative stress. This heterogeneity is likely attributable to intrinsic variance in primary B cells and host–pathogen dynamics. Stochastic simulations demonstrate that initial primary cell heterogeneity, random sampling, time in culture, and even mild differences in phenotype-specific fitness can contribute substantially to dynamic diversity in populations of nominally clonal cells.

Introduction

Lymphoblastoid cell lines (LCLs) are immortalized cells prepared by in vitro transformation of resting primary B cells from peripheral blood with Epstein–Barr virus (EBV) (Bird et al., 1981; Anderson and Gusella, 1984). LCLs are used extensively in research as a model for EBV-associated malignancies including diffuse large B cell lymphoma (Nichele et al., 2012; Tazzari et al., 1999) and post-transplant lymphoproliferative disorder (Markasz et al., 2009; Rea et al., 1994). Because EBV is a non-mutagenic transformant in this context, LCLs constitute an important renewable source of human cells and genomic DNA that are used in immunological, genetic, and virology research (Çalışkan et al., 2014; Choy et al., 2008; Oh et al., 2013; Stark et al., 2010; Volkova et al., 2019).

EBV is a double-stranded oncogenic gammaherpesvirus infecting over 90% of humans (Rickinson and Kieff, 2007). In vivo, the virus typically establishes an asymptomatic persistent latent infection in episomal form (Lindahl et al., 1976; Nonoyama and Pagano, 1972) within resting memory B cells (Longnecker et al., 2013). Latent infection can take one of several forms, each characterized by distinct programs of viral gene expression initiated from different promoters (Price and Luftig, 2015). For example, classical EBV infection within resting memory B cells in vivo is characterized by the Latency I program in which expression from the Q promoter yields a single viral protein, EBV Nuclear Antigen 1 (EBNA1), which functions to maintain the viral episome (Hung et al., 2001). Latency I, termed ‘true latency’, is established only after a complex progression of infection through pre-latency, Latency IIb, Latency III, and restricted forms of latency (e.g., Latency IIa), each occurring in distinct tissues within the body (Price and Luftig, 2015). EBV can undergo lytic reactivation as a replication strategy, which is relatively infrequent in cell culture despite being essential for transmission in vivo (Bhende et al., 2004).

In vitro, the process of LCL production also necessarily involves multiple transitions in viral transcriptional programs. In the immediate-early stage of infection (the pre-latent phase), expression from the W promoter yields EBNA-LP, EBNA2, and several noncoding RNAs (EBERs, BHRF1 miRNAs, and BART miRNAs). A brief burst of lytic gene transcription (without lytic replication) is also observed during pre-latency (Woisetschlaeger et al., 1989). EBNA-LP and EBNA2 protein levels increase gradually within these early-infected cells, eventually leading to Latency IIb in which EBNA2 activation of the C promoter upregulates expression of EBNA1, EBNA3A, EBNA3B, EBNA3C, and additional EBNA-LP and EBNA2 proteins as well as noncoding RNAs (Alfieri et al., 1991). Latency IIb gene products induce hyperproliferation, a period of several days during which infected B cells divide every 10–12 hr (Nikitin et al., 2010). During hyperproliferation, EBNA1 mediates viral genome replication while EBNA3 proteins inhibit host cell antiviral and tumor-suppression responses. Variance in virally mediated rates of proliferation ensures that some infected cells undergo DNA damage-induced growth arrest (Nikitin et al., 2010; Nikitin et al., 2014) while others continue to proliferate, eventually outgrowing as immortalized LCLs. LCLs largely exhibit the Latency III transcriptional profile, characterized by expression of all six EBNAs (EBNA-LP, EBNA1, EBNA2, and EBNAs 3A–3C) in addition to latent membrane proteins 1 and 2 (LMP-1, LMP-2A/B) and noncoding RNAs (Young and Rickinson, 2004). In Latency III, EBNA2 stimulates expression of LMP-1, a constitutively active tumor necrosis factor receptor (TNFR) homolog (Mosialos et al., 1995). LMP-1 signaling drives proliferation and survival via NFκB pathway activation (Devergne et al., 1996), which has been shown to be essential for LCL outgrowth (Cahir-McFarland et al., 2000).

Although studied extensively, complete characterization of the viral and host determinants of growth arrest versus immortalization of early-infected cells remains elusive (Mrozek-Gorska et al., 2019). As one consequence, it is unclear whether or to what extent viral transformation may influence the resulting LCL cell populations. The possibility of significant phenotypic diversity within and across LCL samples warrants consideration, given the intrinsic variance of the human primary B cell repertoire (Morbach et al., 2010; Perez-Andres et al., 2010) and the multiplicity of viral transcription programs active in the journey to immortalization. Indeed, we recently described a gene expression program having low expression of LMP1 and NFκB targets which was unique to early infection (Latency IIb) relative to an otherwise identical population of LCLs (Messinger et al., 2019). The wide distribution in LMP1 and NFκB target expression levels within an LCL has been characterized and ascribed to the dynamic sampling of a distribution of immune evasive states, at the fringes of which growth and survival can be compromised (Brooks et al., 2009; Lam et al., 2004; Lee and Sugden, 2008).

In this study, we characterize the transcriptomic profiles of five different LCLs with single-cell resolution to assess inter- and intra-sample heterogeneity. Four of the sampled LCLs (two in-house and two commercial cell lines) were transformed with the prototypical B95-8 strain of EBV derived from an infectious mononucleosis patient (Miller and Lipman, 1973), while a fifth sample (in-house) was prepared from cells transformed with the M81 strain isolated from a human nasopharyngeal carcinoma sample (Desgranges et al., 1976; Tsai et al., 2013). Primary cells used in establishing the five LCLs were isolated and transformed from a total of four donors; cells from one donor were transformed concomitantly to establish LCLs with each of the tested EBV strains. We observe B cell genetic heterogeneity in the form of differential heavy chain isotype expression across LCLs and, in three instances, within a sample. Further, comparable patterns of phenotypic variance with respect to NFκB pathway and plasma cell-like differentiation genes are evident in each LCL. Expression of host and viral genes indicate that individual cells within LCLs occupy a continuum of infection states. We also present an initial stochastic model to explore factors beyond the nuances of host–pathogen interactions that may generate profound phenotypic diversity within cultured cell lines. Our findings highlight some of the underappreciated complexity inherent within LCLs and broadly underscore the importance of understanding and accounting for sources of heterogeneity within presumptive cell lines.

Results

LCL generation and data provenance

Three LCLs were prepared in-house by infection of PBMCs from two donors (sample numbers 461 and 777) with one of two different EBV strains (B95-8 or M81). Each of these three samples (LCL 461 B95-8, LCL 777 B95-8, and LCL 777 M81) was prepared and processed using standard single-cell RNA sequencing workflows (see Materials and methods). Two additional, publicly available data sets were obtained for commercially available samples of the GM12878 and GM18502 LCLs, which were generated as previously reported by Osorio and colleagues (Osorio et al., 2019). These five samples yielded single-cell RNA count matrices for subsequent analysis.

LCL sample quality control (QC)

Count matrices for the five samples exhibited similar feature, total RNA count, and mitochondrial gene distributions (Figure 1—figure supplements 1 and 2) and were subjected to standardized QC thresholding (see Materials and methods). Cell cycle marker expression (Figure 1—figure supplement 3) was scored and regressed out during selection of highly variable genes as features to avoid clusters arising solely from cell cycle phase. Selected features were used to derive principal components which were evaluated (Figure 1—figure supplement 4) and subsequently used for dimensional reduction (see Materials and methods). Separate analysis of the merged sample data set indicated that inter-donor variability is the predominant source of heterogeneity (Figure 1—figure supplement 5).

Immunoglobulin isotype heterogeneity within and across LCL samples

The five LCL populations exhibit distinct immunoglobulin (Ig) profiles with respect to both gene expression levels and isotype frequencies (Figure 1). Three of the five samples (LCL 777 B95-8, LCL 777 M81, and GM12878) contain IgM+ and class-switched IgA+ and IgG+ subpopulations, whereas two samples (LCL 461 B95-8 and GM18502) almost exclusively expressed IgG (IGHG1-4; Figure 1A). Additionally, cells within each isotype class exhibit a wide range of Ig transcript levels across all samples in an apparent class-independent fashion. No significant expression of IGHE was observed in any of the five samples, consistent with the isotype’s rarity in the peripheral blood (He et al., 2017; Saunders et al., 2019). The immunoglobulin compositions observed for each LCL were confirmed subsequently by RT-PCR and sequencing, which revealed that each isotype represents a distinct clone within the culture (Figure 1—figure supplement 6). Significant IGHD transcript levels were observed in one sample (LCL 777 B95-8), where the gene’s expression was constrained to (and varied inversely with expression levels of) IgM+ cells (Figure 1—figure supplement 7).

Figure 1 with 16 supplements see all
Immunoglobulin isotype heterogeneity within and across lymphoblastoid cell lines (LCLs).

(A) Relative expression of immunoglobulin heavy chain genes (IGHM, IGHA1, and IGHG1) in five LCLs analyzed by single-cell RNA sequencing. Data are represented by dimensional reduction (t-distributed stochastic neighbor embedding) of principal components generated from feature selection following out-regression of cell cycle markers (see Experimental methods). (B) Percentage of cells in LCL population within each isotype class. Null classification represents cells exhibiting negligible immunoglobulin heavy chain expression.

The proportion of cells expressing each isotype varied substantially among LCLs (Figure 1B). IgG was the only isotype observed in LCL 461 B95-8. Cells in the GM18502 sample were also homogenous for IGHG1, although low levels of IGHM transcripts are observed in up to half of the population. The proportion of IgM+, IgA+, and IgG+ subpopulations in LCL 777 B95-8 were 69%, 7%, and 24%; in LCL 777 M81 were 1%, 35%, and 64%; and in GM12878 were 6%, 73%, and 18%. Abundance of Ig light chain gene (kappa or lambda) and heavy chain isoform expression are generally correlated with variable heavy chain expression in each of the five samples (Figure 1—figure supplements 716). The isotype and clonal frequency differences between LCL 777 B95-8 and LCL 777 M81 are notable, given that these samples originated from the same donor and were transformed at the same time with different viral strains.

Differential Ig isotype expression is a significant source of variation in LCLs, as captured by the loadings from principal component analysis (PCA), typically within the first four PCs. Consequently, differences in Ig isotype are effectively captured in dimensionally reduced data sets generated from PCs using t-distributed stochastic neighbor embedding (tSNE) even at low clustering resolution. In samples with more homogenous isotype expression (LCL 461 B95-8 and GM18502), the relative Ig expression level is a significant factor in distinguishing clusters.

Genes involved in B cell activation and differentiation exhibit inverse expression gradients

Across all samples, LCL populations display variable mRNA transcript levels for genes involved in cell activation, inhibition of apoptosis, response to oxidative stress, and differentiation (Figure 2). Gradients in Ig expression exhibit strong anticorrelation with expression of NFκB pathway transcripts (e.g., NFKB2, NFKBIA, and EBI3) central to B cell activation and survival (Figure 2A, Figure 2—figure supplement 1A). Similar gradients are observed for metabolic and oxidative stress response transcripts (e.g., TXN, PRDX1, PKM, LDHA, ENO1, and HSP90AB1); however, these transcripts are present more broadly (>80% of cells) and at higher levels than NFκB-related genes (20–30% of cells) in each sample (Figure 2—figure supplement 2). While NFκB family gene expression is consistently anticorrelated with that of B cell differentiation factors, significant diversity exists in NFκB-high cells with respect to specific subunits including REL, RELA, and RELB (Figure 2—figure supplement 3). This implies differential intercellular NFκB dimer composition and, consequently, intra-sample variation in NFκB-mediated transcriptional programs. Expression of NFκB regulated BCL2 family members (e.g., BCL2L1/Bcl-xL and BCL2A1/BFL1) displays strong anticorrelation with Ig expression level. However, MCL1 and BCL2 mRNAs are more broadly expressed across cells within each LCL, while BCL2L2/BCL-W is only modestly expressed in LCLs (Figure 2—figure supplement 4).

Figure 2 with 8 supplements see all
Lymphoblastoid cell lines (LCLs) exhibit anticorrelated expression gradients of activation and differentiation genes.

(A) Inverse expression gradients of immunoglobulin genes (IGHM, IGHA1, and IGHG1) in magenta and NFκB targets (NFKB2, NFKBIA, EBI3, ICAM1, and BCL2A1) and TXN in green. (B) Similar inverse gradients of NFκB targets in green and B cell differentiation markers (TNFRSF17, XBP1, MZB1, CD27, and CD38) in orange. (C) Pearson correlation maps and hierarchical clustering reveal negative correlation of differentiation (orange) and activation (green) gene sets and positive correlations between genes within each set. (D) In LCLs comprising multiple immunoglobulin isotypes, heavy chain class and differentiation/activation gradients constitute orthogonal (independent) axes of phenotypic variance.

Ig gradients are closely related to expression of differentiation and maturation markers (e.g., CD27, TNFRSF17/BCMA, XBP1, MZB1, and PRDM1) (Bhende et al., 2007; Hatzoglou et al., 2000; Rosenbaum et al., 2014), which are likewise anticorrelated with NFκB pathway markers (Figure 2B and C, Figure 2—figure supplement 1B). The apparent inverse relationship between these gene sets defines a major axis of phenotypic variance within LCL samples comprising multiple Ig isotypes (Figure 2D). The orthogonality of the pro-survival/differentiation and isotype class diversity axes implies that these two aspects of phenotypic variance are decoupled. Continuity between phenotypes resembling activated B cells (ABC) and antibody-secreting cells (ASC) is also captured in the expression profiles of key genes involved in the mutually antagonistic control of B cell state (Figure 2—figure supplement 5; Nutt et al., 2015). In this model, genes including PAX5 and IRF8 promote the ABC state; IRF4 and MKI67 (a G2/M cell cycle marker) are markers of a transitional phenotype; and PRDM1 (BLIMP1) and XBP1 promote the ASC state. As cell cycle marker expression was regressed out, mitotic phase has negligible influence on the observed trends.

Whereas distinctions in Ig isotype class expression tend toward discrete partitioning, intra-isotype expression of differentiation and maturation genes reflects a continuum of transcriptomic states and cellular functions. Thus, within a given isotype, elevated Ig heavy chain expression is negatively correlated with activation/anti-apoptotic gene expression and positively correlated with maturation/differentiation gene expression. These relationships are most readily evident in LCL samples consisting of a single class-switched population, such as GM18502 (Figure 2—figure supplement 1C).

Finally, the viral EBNA2 and EBNA3 proteins are responsible for transcriptional regulation that we specifically interrogated within the single cell data. The direct EBNA2 targets RUNX3 and FCER2/CD23 correlated with NFκB expression (Figure 2—figure supplement 6Spender et al., 2002). Indeed, the expression of RUNX3 and FCER2/CD23 was anticorrelated with Ig expression consistent with the known role of EBNA2 in suppressing heavy chain transcription (Jochner et al., 1996). In contrast, the EBNA3 repressed targets including CXCL9, CXCL10, BCL2L11/BIM, and ADAMDEC1 were uniformly repressed (Figure 2—figure supplement 7) consistent with the role of histone and DNA methylation in maintaining gene repression of EBNA3 targets (Harth-Hertle et al., 2013; McClellan et al., 2013; Paschos et al., 2009).

Differential expression of genes involved in cell activation could affect rates of cell proliferation within an LCL population. To explore this possibility, we sorted three additional LCLs by ICAM-1 expression and evaluated the growth and metabolic profiles of the sorted fractions. On average, ICAM-1hi cells (consistent with the ABC phenotype) exhibited modestly faster growth in culture than ICAM-1lo cells (ASC phenotype) between 1 and 4 days post-sorting. Notably, metabolic activity was elevated in ICAM-1hi cells than ICAM-1lo cells across all three LCLs, as indicated by higher rates of glycolysis and oxygen consumption (Figure 2—figure supplement 8).

Viral state heterogeneity affects host expression profile distributions in LCLs

Clusters with high EBV lytic gene expression are observed in two of the three data sets (LCL 777 B95-8 and LCL 777 M81) aligned against the human reference genome containing the viral genome as an extra chromosome (see Materials and methods; Figure 3). Lytic cluster cells are small, accounting for 2.2% and 0.9% of the LCL 777 B95-8 and LCL 777 M81 cell populations, respectively (Figure 3A). The higher rate of lytic cell capture in the B95-8 sample relative to the M81 sample is somewhat surprising, as the M81 strain is known for increased frequency of lytic reactivation; however, this disparity may originate from the nature of single-cell sample preparation method (see Discussion; Zheng et al., 2017).

Figure 3 with 1 supplement see all
Viral and host gene expression in lytic cell subpopulations.

(A) Clustering of dimensionally reduced data sets for LCL 777 B95-8 and LCL 777 M81. (B) Grouping of cell clusters into latent (red) and lytic (cyan) cells based on viral and host gene expression signatures of principal components. (C) Relative expression of four representative Epstein–Barr virus (EBV) lytic genes (BHRF1, BLRF1, BALF1, and BARF1) is elevated in lytic cell subpopulations. (D) Lytic cell clusters exhibit elevated expression of several host cell genes (SGK1, NHLH1, NFATC1, MIER2, and SFN) relative to latently infected cells. While under-sampled due to subpopulation size, immunoglobulin class frequencies in lytic cells roughly reflect the population-wide frequencies.

The presence or absence of viral lytic transcripts is a significant source of phenotypic variance in these samples, as reflected in population groupings by viral state (Figure 3B) and principal component loadings (Figure 1—figure supplements 15 and 16, PC_3 and PC_7, respectively). Lytic cells can be identified confidently from high expression of EBV genes including BLRF1, BALF1, and BARF1, among others (Figure 3C). BHRF1 expression is also elevated in lytic cells, although BHRF1 transcripts are ubiquitous at low levels sample wide. This is likely because BHRF1 can be expressed during both latent and lytic phases of EBV infection from different promoters (Xing and Kieff, 2007). Cells identified as lytic exhibit lytic gene expression ranging from approximately 3–15% of total measured transcripts per cell (Figure 3—figure supplement 1). Thus, it is possible that this cluster represents both truly lytic cells (>10% lytic transcripts) and abortive lytic cells (Chiu and Sugden, 2016). Alternatively, the cells with lower lytic transcript expression may have been at earlier stages of lytic reactivation at the time of sample preparation.

While the absolute number of lytic cells in each sample is low, the data indicate that the lytic cells are polyclonal with respect to Ig heavy chain expression, display upregulation of several host genes including NFATC1, MIER2, SFN, and SGK1, and exhibit heterogeneous NFκB expression (Figure 3D, Figure 1—figure supplements 5 and 6). Ig isotype distributions in lytic cell clusters appear roughly proportional to the whole-sample distributions. NFATC1, MIER2, SFN, and SGK1 transcript levels were queried for GM12878 and GM18502 samples to test whether the presence of lytic cell subpopulations might be inferred from host gene expression. A sub-cluster representing a small percentage of cells in GM12878 (<0.5%) were found to co-express MIER2 and NFATC1. Negligible expression of either gene was observed in GM18502 (Figure 1—figure supplements 8 and 9).

Loss of mitochondrial and Ig expression in subpopulations under oxidative stress

Three of the five samples (LCL 461 B95-8, GM12878, and GM18502) contain clusters that exhibit metabolic transcriptional profiles in stark contrast with typical expression in each population (Figure 4). Cells within these clusters account for 1–4% of the three samples after QC (Figure 4A) and are most notable for their low expression of mitochondrial genes (Figure 4B). In the case of LCL 461 B95-8 and GM18502, these cells are the first to partition from the rest of the sample at low clustering resolution (Figure 4—figure supplements 15).

Figure 4 with 7 supplements see all
Lymphoblastoid cell line (LCL) subpopulations exhibiting reduced mitochondrial gene expression and elevated metabolic and oxidative stress genes.

(A) Clustering of dimensionally reduced data sets for LCL 461 B95-8, GM12878, and GM18502. (B) Distinct clusters within each of these samples are defined by uncharacteristically low mitochondrial gene expression. (C) Grouping of cell clusters to partition ‘mito-low’ cells (cyan) for differential expression comparison. (D) Mito-low cells exhibit reduced expression of cytochrome oxidase (MT-CO1 and MT-CO2), NADH-ubiquinone oxidoreductase (MT-ND1 and MT-ND2), MALAT1, and numerous lymphoid and B-cell lineage markers (CD19, MS4A1/CD20, PTPRC/CD45, CD74, and HLA-A). Mito-low cells exhibit increased expression of genes associated with cytoskeletal rearrangements (ACTB and TUBB), metabolic stress (PKM, ENO1, and LDHA), protein folding/degradation (HSP90AB1, PSMA1, and PPIA), and oxidative stress (TXN and PRDX1).

Compared to the rest of each sample, these atypical cells exhibit significantly depleted levels of cytochrome c oxidase (MT-CO1 and 2, complex IV) and NADH-ubiquinone oxidoreductase subunits (MT-ND1 and 2, complex I) as well as a lack of canonical markers of lymphoid (e.g., PTPRC/CD45, CD74), B cell-specific lineage (e.g., CD19, MS4A1/CD20), and in some cases, MHC class I and II antigen presentation (e.g., HLA-A, HLA-B, HLA-C, and HLA-DRFigure 4C and D, Figure 1—figure supplements 7 and 9, Figure 4—figure supplement 6).

Expression of genes involved in oxidative stress (TXN and PRDX1), unfolded protein responses (PPIA and HSP90AB1), metabolic shunt pathways (PKM, ENO1, and LDHA), and cytoskeletal rearrangements (ACTB and TUBB) is enriched consistently in this subset relative to the bulk population in each of the three LCLs (Figure 4D, Figure 2—figure supplement 2). Ig heavy chain transcripts are notably absent from these subpopulations, although some degree of light-chain expression is observed (Figure 1—figure supplements 79). While these cells are on the low end of the population distribution with respect to total RNA counts and unique feature RNAs (Figure 4—figure supplement 7), the measured values are consistent with intact, viable cells.

A stochastic model for LCL phenotypic heterogeneity

A simple stochastic simulation based on a discrete-time Markov chain model (Škulj, 2006) was developed to understand better the factors that may influence phenotypic heterogeneity observed in LCLs, using Ig isotype frequencies as an example (Figure 5). In principle, the simulation may be adapted to any set of phenotypes within a sample. For additional details regarding model parameters and assumptions, please see the Materials and methods (Stochastic simulations) and refer to the source code (Source code 2).

Stochastic simulation of heterogeneous lymphoblastoid cell line (LCL) evolution.

(A) Stochastic immunoglobulin isotype frequency evolution. Three random single-trial simulations initiated from the same starting class frequencies are presented, assuming equal likelihood of proliferation across isotype classes (n = 1000 cells). The last panel shows mean and standard deviation for outcomes from 100 trials simulated from the same parameters. (B) Simulation of a founder effect. Population under-sampling (modeled by comparing results from 25 trials using n = 100, 500, 1000, and 5000 cells, left-to-right panels) increases outcome variance and accelerates convergence to a single isotype. (C) Effect of phenotype-specific fitness advantages. Simulation results are presented for scenarios in which class-switched isotypes (IgA, IgG, and IgE) have a 1% (left panel) or 2% (right panel) fitness advantage over IgM cells. (D) Four random single-trial simulations over long periods of time (1000 division rounds) with a 1% fitness advantage for class-switched cells (left panels) compared to 10 trials over the same period with equal fitness across classes. (E) Single-trial isotype frequency evolution and corresponding simulated clustering (see Materials and methods) in the case of equal proliferation probability. Starting frequencies of IgM, IgA, IgG, and IgE cells are 89%, 5%, 5%, and 1%, respectively. (F) As in E, with a 1% fitness advantage for class-switched cells. (G) As in E, with a 2% advantage for class-switched cells.

In the present implementation, changes in Ig isotype frequency can be simulated in discrete steps (rounds of cell division) as a function of initial phenotype frequencies, population sampling (with replacement), and potential differences in phenotypic fitness captured as fixed, (un)equal isotype-specific proliferation probabilities. The model assumes a fixed cell death rate across all isotypes in any given division round. The number of simulated trials can be adjusted to capture individual stochastic realizations or probabilistic outcome distributions. Each parameter and assumption can be adjusted by the user for tailored applications.

Three randomly selected realizations and averaged outcomes (trials = 100) of the model for a fixed sample size (n = 1000 cells) demonstrate the effects of intrinsic stochasticity on the evolution of phenotype proportions over many rounds of cell division (rounds = 300), even when each phenotype confers equivalent fitness (Figure 5A). In the case of equal fitness and sufficient sample size, initial phenotype frequencies are a key determinant of whether the most prevalent phenotype will change over time because of stochasticity.

The effect of sample size on inter-trial variance can be substantial, even when cell populations are sampled with replacement to maintain phenotype proportions in each round (Figure 5B). Mean phenotype proportions are generally conserved, whereas trial standard deviation decreases as the sample size increases (trials = 25, rounds = 300, n = 100, 500, 1000, or 5000 cells). This is generally expected, since undersampling increases the likelihood that phenotype frequencies in the drawn sample will deviate from those of the population, even in the case of replacement.

It is notable that minor differences in relative fitness (1–2%) can lead to dramatic changes in isotype distributions over time (Figure 5C). The rate of such change is proportional to the magnitude(s) of fitness differences (n = 1000 cells, rounds = 300). Four randomly selected clonal evolution trajectories realized with a modest fitness advantage (2%) for class-switched cells (IgA, IgE, and IgG) reveal the potential for drastic variations when multiple rare phenotypes with a fitness advantage exist (n = 2500 cells, trials = 10, rounds = 1000). Thus, rare cells may become prevalent or even dominant over time if they exhibit only slightly greater fitness relative to other cells in some environmental context (e.g., cell culture). In such cases, observed phenotype frequencies can deviate wildly from expectations of equal fitness over time (Figure 5D).

Cluster simulation was implemented by random sampling from four arbitrary, isotype-specific 2D normal distributions based on empirical observations that Ig isotypes yield distinct clusters in dimensionally reduced single-cell RNA-seq data (Figure 5E–G). Simulated clusters were generated from randomly selected trials initiated from the same initial phenotype distribution (IgM = 89%; IgA = 5%; IgG = 5%; IgE = 1%) at three different relative fitness advantages (0%, 1%, and 2%) for class-switched isotypes. In all cases, the proportion of observed cells in each cluster fluctuates over time. As expected, the presence or absence of observed phenotypic heterogeneity (in this example, isotype polyclonality) in a cell population is a complex function of relative frequency, fitness, sampling (i.e., bottlenecks), stochasticity, and time (Ewens, 2012; Nowak, 2006).

Discussion

Ig isotype heterogeneity in LCLs

LCL clonality is known to change over time, although the factors involved in this evolution are not fully characterized (Ryan et al., 2006). PBMC derivation from multiple donors is an obvious source of cellular heterogeneity in the analyzed samples presented herein. B cells from peripheral blood (≈5–10% of all lymphocytes) comprise wide ranges of naïve (≈50–80%, mean ≈ 65%) and memory (≈15–45%, mean ≈ 30%) cells, with immature/transitional and plasmablasts accounting for smaller proportions (≈1–10%, mean ≈ 5% and ≈0.5–4.5%, mean ≈ 2%, respectively; Perez-Andres et al., 2010). Within the memory cell compartment, proportions of non-switched (IgM) and switched memory (IgA, IgG, and technically, IgE) are also likely donor-specific. The negligible number of IgE+ cells present across the samples can be explained by the isotype’s low frequency in the peripheral blood (He et al., 2017). A notable limitation of this study is the lack of access to (GM12878 and GM18502) or retention of (LCL_461 and LCL_777) original donor primary B cells and longitudinal sampling, which would have provided direct insights into donor-dependent cellular heterogeneity.

It is evident from LCL 777 B95-8 and LCL 777 M81 samples that inter-donor differences cannot fully explain the observed isotype heterogeneity in LCLs. While it may be tempting to attribute the observed differences to infection with different viral strains, there is ample experimental evidence that EBV infection does not induce class-switching (Miyawaki et al., 1991). The disparity in isotype frequencies is notable since these samples were transformed, cultured, prepared, and sequenced in parallel (i.e., under equivalent conditions and within the same interval).

The polyclonality exhibited within LCL 777 B95-8 and LCL 777 M81 contrast with the dominance of a single isotype in LCL 461 B95-8 and GM18502 samples (in each case, IgG). The only notable difference between LCL 461 B95-8 and LCL 777 B95-8 is that the former sample was in culture substantially longer prior to single-cell library preparation. Given that the GM18502 line was derived more than a decade ago, these observations implicate the influence of culture period in significantly altering the isotype proportions present within LCLs, which is altogether consistent with known (and profound) challenges associated with cell culture (Hughes et al., 2007; Briske-Anderson et al., 1997; O’Driscoll et al., 2006). In this regard, the data from GM12878 merit remark. The finding of polyclonality in this sample is surprising, given that GM12878 has been in culture over a timescale comparable to GM18502 (Anders and Huber, 2010). Forgoing the possibility of errors in sample handling or procurement, the persistence of genetic heterogeneity in this line is both intriguing and potentially confounding. Whether or to what extent cellular diversity may influence observed results will inevitably vary on a study-specific basis, but sample-intrinsic variance should be considered even when homogeneity is presumed (Choy et al., 2008; Morley et al., 2004).

Multiple isotypes within an LCL sample guarantee clonal diversity, but the presence of a single isotype does not necessarily ensure the inverse (intra-sample homogeneity). While not in the scope of the present study, B cell receptor (BCR) 5’ single cell sequencing of LCL samples could provide insights into variable regions and whether subpopulations of a given isotype are the progeny of one or multiple founder cells (and whether this changes over time).

Viral origins of LCL phenotypic variance

NFκB pathway signaling is constitutively activated by viral LMP-1 in EBV-transformed B cells (Devergne et al., 1996). LMP-1 induction of the NFκB pathway is necessary for LCL survival (Cahir-McFarland et al., 2000; Kaye et al., 1993; Dirmeier et al., 2003); however, the observed intra- and inter-LCL variance in transcript levels of NFκB and several of its transcriptional targets add nuance to this picture. Similar profiles of NFκB pathway transcript levels across samples may constitute a snapshot of the most probable distribution arising from stochastic NFκB target expression induced by EBV infection. This may arise from a transcriptional bursting mechanism in which mRNA transcript levels in each cell fluctuate over time (as a Poisson process) while the proportion of cells containing n transcripts in a population at any given time is roughly constant (Raj et al., 2006; Raj and van Oudenaarden, 2008; Weinberger et al., 2005; Behar and Hoffmann, 2010; O'Dea et al., 2007). Alternatively, or perhaps additionally, variation in NFκB pathway activity may be a manifestation of the different viral latency states present within each sample, as indicated by correlation with host markers of latency IIb and III.

The distinct anticorrelation between NFκB/viral latency program and B lymphocyte differentiation genes is noteworthy. While a mechanism imparting causality to this relationship is not yet fully clear, recent time-resolved bulk transcriptomic data revealed that EBV-induced plasma cell phenotypes (including upregulation of XBP1) developed as early as the pre-latent phase of infection (1–14 days; Mrozek-Gorska et al., 2019). Correlated expression of MZB1 with XBP1, TNFRSF17, CD27, and CD38 support the model that the development of plasma cell characteristics is reminiscent of germinal center differentiation. Single-cell data adds complexity to this finding and its consequences for LCL heterogeneity even after long-term outgrowth. Specifically, EBV transformation in vitro appears to maintain B cells along a continuum of differentiation states, each with varying degrees of similarity to phenotypes observed in vivo (Price and Luftig, 2015). In the case of LCL generation, the multiple transcriptional programs of the transformant likely constitute an inescapable source of phenotypic heterogeneity.

The low number of observed lytic cells is likely a consequence of EBV’s predominant latency and the fact that lytic reactivation is by nature somewhat incompatible with single-cell RNA-seq methods. However, these small subpopulations provide an interesting case for examination. The spatial proximity of lytic clusters in LCL 777 B95-8 and LCL 777 M81 to plasma-like clusters resulting from tSNE dimensional reduction implied phenotypic similarity; however, we found that this is likely an artifact of the tSNE algorithm since UMAP dimensional reduction did not preserve this proximity. Notwithstanding, XBP1 upregulation in plasma cells has been shown to transactivate the viral BZLF1 promoter and induce lytic reactivation (Sun and Thorley-Lawson, 2007; Laichalk and Thorley-Lawson, 2005). Lytic cells also display relatively high and polyclonal Ig heavy chain expression in addition to other shared characteristics with plasma-like cells (reduced expression of NFκB subunits and its targets). By contrast, lytic cells exhibit notably reduced levels of B cell differentiation transcripts. Thus, viral transcription changes in dynamic response to host cell programs (and vice versa) contribute to the observed LCL diversity. Prior work has shown that the viral proteins EBNA3A and EBNA3C suppress plasma-like phenotypes during EBV latency establishment (Styles et al., 2017). The possibility that EBV may undergo lytic reactivation in response to plasma cell differentiation as a means of maintaining persistent latent infection is a topic of future interest.

Host genes upregulated within lytic cluster cells (e.g., NFATC1, MIER2, SFN, and SGK1) represent a limited subset of transcription factors associated with B (and T) lymphocyte activation (Peng et al., 2001; Tsitsikov et al., 2001), several of which have been recently identified at various degrees of enrichment within lytic cells (Frey et al., 2020). The presence of NFATC1 is particularly notable considering the recent report of this factor contributing to the spontaneous lytic phenotype of type 2 EBV by upregulating expression of BZLF1 to promote the lytic gene expression cascade (Romero-Masters et al., 2020).

Although PC loadings reveal substantial upregulation of more than a dozen EBV lytic genes, cells within the lytic clusters curiously lack expression of BZLF1, which plays a role in the latent-to-lytic transition (Bhende et al., 2004). The absence of BZLF1 reads (and low mRNA counts generally) ostensibly may result from factors including naturally low transcript abundance, reduced transcript capture efficiency, and/or reduced efficiency of reverse transcription to cDNA owing to RNA secondary structural motifs (Ozsolak and Milos, 2011).

‘Marker-less’ subpopulations

The small populations of cells in LCL 461 B95-8, GM12878, and GM18502 characterized by low mitochondrial gene expression and a dearth of canonical B cell markers are curiosities. These cells share similarities with exhausted plasma cells, most notably an apparent loss of Ig heavy chain expression while retaining moderate kappa and light chain expression (Köhler, 1980; Haas and Wabl, 1984), and hallmarks of oxidative stress including upregulated thioredoxin expression (Fernando et al., 1992; Lu and Holmgren, 2014; Muri et al., 2018; Muri et al., 2020). Low levels of NFκB pathway transcripts in these clusters most closely resemble expression profiles of cells with a plasma-like phenotype in the same samples. It is unlikely that these cells are immature, naïve, or transitional B cells, given that neither IGHM nor IGHD expression is observed. Loss of lineage marker expression is suggestive of a tumor-like phenotype (Schwering et al., 2003).

Factors in the evolution of subclonal heterogeneity

Cellular diversity abounds even within presumptive clonal lines. For LCLs generated from EBV-transformed primary B cells, the list of parameters affecting the cell population’s phenotypic profile includes donor-specific frequencies of non-switched and switched memory B cells, heterogeneous states of viral infection, phenotype-specific differential fitness in culture, stochasticity, and time. By definition, some degree of differential fitness exists among cells in each sample as a consequence of the variability in pro-survival, proliferation, and anti-apoptotic genes. Mechanistically, a portion of this variance is expected to arise from heritable yet transient epigenetic signatures (Shaffer et al., 2020). Indeed, epigenetic diversity affecting chromatic architecture across LCL subclones from a single donor was recently demonstrated through ChIP-Seq analysis (Ozgyin et al., 2019). Lastly, as a principle of evolution, phenotypic differences do not necessarily have to be selected directly; they may simply be carried over in cells possessing other selected features. With respect to the stochastic model presented herein, the simulated phenotype advantage of class-switched memory vs. non-switched memory cells need not be construed as originating from heavy chain isotype expression.

Experimental procedures including cell passaging and the initial transformation itself may contribute to variance among LCLs. As an illustration, consider that 1 million PBMC has around 25,000 B cells, of which 7500 (30%) on average are memory cells of various classes. If the rate of transformation leading to LCL outgrowth is 10%, then ≈750 memory cells out of 1 million PBMCs define the initial isotype frequency of the eventual LCL. This sample size is small relative to the donor’s total memory B cell compartment and may lead to founder cell effects. Consequently, B cell population undersampling may be a foregone conclusion in the context of LCL preparation.

Additional studies that utilize time-resolved single-cell sampling from original B cells through early infection and long-term LCL outgrowth in culture will be essential to explore further the factors contributing to longitudinal stability and variation in transcriptional profiles of B cells immortalized by EBV infection. Moreover, while the transcriptomic profiles we report provide a valuable resource, additional molecular layers must be interrogated through parallel -omics techniques (e.g., ATAC-seq and DNA methylation) across individual cells to understand deeply the mechanistic underpinnings of transcriptional heterogeneity.

Conclusion

Single-cell RNA sequencing reveals that LCLs including widely used commercial lines exhibit substantial phenotypic diversity. During the early stages of LCL generation, EBV infection drives cell proliferation by mimicking the process of B cell activation. After successful LCL outgrowth, infected B cells occupy a range of phenotypic states along a continuum between activation and plasma cell differentiation and, in some cases, exhibit signs of lytic reactivation. The diversity observed within LCLs (and cultured lines generally) can originate from intrinsic heterogeneity within primary cells, transcriptional programs of the viral transformant, and the realization of inherently stochastic processes (including certain gene expression programs) over time. The data reported herein enable extensive hypothesis generation and interrogation of aspects of B cell biology, EBV pathogenesis, and host–virus interactions. Moreover, this work highlights the importance of considering the possible sources and experimental consequences of cell population heterogeneity when using cultured cell lines.

Materials and methods

Key resources table
Reagent type (species) or resourceDesignationSource or referenceIdentifiersAdditional information
Biological sample (Homo sapiens)Whole bloodGulf Coast Regional Blood CenterMultiple donors; sources of PBMCs for LCL_461 and LCL_777 preparation
Cell line (Homo sapiens)B95-8 Z-HTThis paper; Price et al., 2017Stimulated to obtain B95-8 strain (Type 1 EBV) viral supernatants
Cell line (Homo sapiens)M81This paper; Tsai et al., 2013Stimulated to obtain M81 strain (Type 2 EBV) viral supernatants
Cell line (Homo sapiens)LCL_461This paper;
Price et al., 2017
Prepared from donor PBMCs
Cell line (Homo sapiens)LCL_777This paperPrepared from donor PBMCs
Cell line (Homo sapiens)GM12878Coriell InstituteRRID:CVCL_7526White female donor
Cell line (Homo sapiens)GM18502Coriell InstituteRRID:CVCL_P459Yoruba female donor
AntibodyAnti-human CD54 (ICAM-1), PE-conjugated (mouse monoclonal)BiolegendCat #353106Clone #HA58
Sequence-based reagent5’ L-VH 1This paper; Tiller et al., 2008PCR primersACAGGTGCCCACTCCCAGGTGCAG
Sequence-based reagent5’ L-VH 3This paper; Tiller et al., 2008PCR primersAAGGTGTCCAGTGTGATGTGCAG
Sequence-based reagent5’ L-VH 4/6This paper; Tiller et al., 2008PCR primersCCCAGATGGGTCCTGTCCCAGGTGCAG
Sequence-based reagent5’ L-VH 5This paper; Tiller et al., 2008PCR primersCAAGGAGTCTGTTCCGAGGTGCAG
Sequence-based reagent5’ L-Vκ 1/2This paper; Tiller et al., 2008PCR primersATGAGGATCCCTGCTCAGCTGCTGG
Sequence-based reagent5’ L-Vκ 3This paper; Tiller et al., 2008PCR primersCTCTTCCTCCTGCTACTCTGGCTCCCAG
Sequence-based reagent5’ L-Vκ 4This paper; Tiller et al., 2008PCR primersATTTCTCTGTTGCTCTGGATCTCTG
Sequence-based reagent5’ L-Vλ 1This paper; Tiller et al., 2008PCR primersGGTCCTGGGCCCAGTCTGTGCTG
Sequence-based reagent5’ L-Vλ 2This paper; Tiller et al., 2008PCR primersGGTCCTGGGCCCAGTCTGCCCTG
Sequence-based reagent5’ L-Vλ 3This paper; Tiller et al., 2008PCR primersGCTCTGTGACCTCCTATGAGCTG
Sequence-based reagent5’ L-Vλ 4/5This paper; Tiller et al., 2008PCR primersGGTCTCTCTCACAGCTTGTGCTG
Sequence-based reagent5’ L-Vλ 6This paper; Tiller et al., 2008PCR primersGTTCTTGGGCCAATTTTATGCTG
Sequence-based reagent5’ L-Vλ 7This paper; Tiller et al., 2008PCR primersGGTCCAATTCTCAGGCTGTGGTG
Sequence-based reagent5’ L-Vλ 8This paper; Tiller et al., 2008PCR primersGAGTGGATTCTCAGACTGTGGTG
Sequence-based reagent3′ Cγ CH1 (IgG)This paper; Tiller et al., 2008PCR primersGGAAGGTGTGCACGCCGCTGGTC
Sequence-based reagent3′ Cμ CH1 (IgM)This paper; Tiller et al., 2008PCR primersGGGAATTCTCACAGGAGACGA
Sequence-based reagent3′Cα CH1 (IgA)This paper; Tiller et al., 2008PCR primersTGGGAAGTTTCTGGCGGTCACG
Sequence-based reagent3′ Cκ 543 (Kappa Light Chain)This paper; Tiller et al., 2008PCR primersGTTTCTCGTAGTCTGCTTTGCTCA
Sequence-based reagent3′ Cλ (Lambda Light Chain)This paper; Tiller et al., 2008PCR primersCACCAGTGTGGCCTTGTTGGCTTG
Commercial assay or kitSV96 Total RNA Isolation KitPromegaCat #Z3500
Commercial assay or kitHigh-Capacity cDNA Reverse Transcription KitThermoCat #4368814
Commercial assay or kitSingle Cell 3’ Reagent Kit Protocol, v2 chemistry10× GenomicsCat #CG00052
Commercial assay or kitiMag Negative Isolation KitBD BiosciencesCat #558007CD19+ B cell isolation from PBMCs
Software, algorithmCellRanger10× Genomicsv.2.0.0
Software, algorithmSeurat (R package)Satija et al., 2015; Stuart et al., 2019v.3.1.5

PBMC isolation and transformation with EBV

Request a detailed protocol

Whole blood samples from two normal donors (777 and 461) were obtained from the Gulf Coast Regional Blood Center. PBMCs were isolated from each sample by Ficoll gradient (Sigma, # H8889). CD19+ B cells were extracted from each PBMC sample through magnetic separation (BD iMag Negative Isolation Kit, BD, # 558007). Purified B cells were cultured in RPMI 1640 media supplemented with 15% fetal calf serum (FCS, vol./vol., Corning), 2 mM L-glutamine, penicillin (100 units/mL), streptomycin (100 μg/mL, Invitrogen), and cyclosporine A (0.5 μg/mL).

B95-8 and M81 strains of EBV were generated from the B95-8 Z-HT and M81 cell lines, respectively, as described previously (Johannsen et al., 2004). Separate bulk infections of B cells were performed by incubating donor B cells with B95-8 Z-HT or M81 supernatants for 1 hr at 37°C, 5% CO2 to produce the following cultures: 777_B95-8, 777_M81, and 461_B95-8. After virus incubation, cells were rinsed in 1× PBS and resuspended in R15 media. LCL outgrowth was achieved from each of these three samples, resulting in LCL_777_B95-8, LCL_777_M81, and LCL_461_B95-8.

Cell lines and culture

Request a detailed protocol

LCL 777_B95-8, 777_M81, and 461_B95-8 were generated in our laboratory by infection of primary human B cells obtained from the Gulf Coast Regional Blood Center with EBV strains B95-8 and M81. These lines were confirmed to be mycoplasma negative using the Sigma Lookout PCR kit.

All three in-house LCL samples were cultured in supplemented RPMI media as described above, substituting 10% FCS instead of 15% FCS. Prior to single-cell sample preparation, LCL_777_B95-8 and LCL_777_M81 were maintained in culture for approximately 1 month, whereas LCL_461_B95-8 was cultured for longer than 6 months. Immediately prior to single-cell sample preparation, LCLs were resuspended and disaggregated.

LCL samples and data

Request a detailed protocol

LCL_777_B95-8, LCL_777_M81, and LCL_461_B95-8 were created as described above. LCLs GM_12878 and GM_18502 were obtained, prepared, sequenced, and aligned as described by Osorio and colleagues (Osorio et al., 2019). Briefly, these samples were obtained from the Coriell Institute for Medical Research, cultured for several days, and then prepared as single-cell GEMs (Gel bead in Emulsions) with the 10× Genomics Chromium system using version 2 chemistry for total RNA. Single-cell sequencing libraries were generated using established 10× Genomics protocols, and sequencing was performed with a Novaseq 6000 (Illumina, San Diego). Unique Molecular Identifier (UMI) count matrices were generated from these samples by using CellRanger v.2.1.0 with alignment to the hg38 version of the human reference genome. Additional information about the experimental handling and acquisition of data for GM12878 and GM18502 is provided in the original reference (Osorio et al., 2019). Gene-barcode matrix files for each sample were downloaded from the Gene Expression Omnibus (accession ID: GSE126321) and subsequently analyzed along with data from LCL_777_B95-8, LCL_777_M81, and LCL_461_B95-8 samples, while the LCL_461_B95-8 sample was run in a separate experimental batch.

Single-cell RNA sample preparation and sequencing

Request a detailed protocol

Single-cell RNA samples for LCL_777_B95-8, LCL_777_M81, and LCL_461_B95-8 were prepared using the General Sample Preparation demonstrated protocol from 10× Genomics (10×, Manual Part #CG00053) adapted from the original published methods (Zheng et al., 2017). Briefly, disaggregated LCLs were resuspended in fresh 1× PBS supplemented with 0.04% BSA, stained with trypan blue to assess viability, and counted using a hemocytometer for preparation to target concentration.

Single-cell libraries for sequencing were prepared from each sample using the methods described in the 10× Genomics Single Cell 3’ Reagent Kit Protocol (v2 chemistry, Manual Part #CG00052). In brief, GEMs were prepared using the 10× Chromium Controller, after which cDNA synthesis and feature barcoding were performed and sequencing libraries for each sample were constructed. Sequencing runs were performed on an Illumina HiSeq 3000/4000 (Illumina, San Diego). Samples for LCL_777_B95-8 and LCL_777_M81 were sequenced in a pooled run in a single HiSeq lane.

Raw base call files (*bcl.gz) from sequencing runs were processed using CellRanger v.2.0.0 to generate fastq files (*fastq.gz) via CellRanger’s ‘mkfastq’ command. CellRanger’s ‘count’ command was then used to align reads from the three in-house LCL samples to the human reference genome (hg38) with the Type 1 EBV reference genome (NC_007605) concatenated as an extra chromosome (reflecting the episomal nature of the EBV genome within infected B cells). This process yielded gene-barcode matrices (UMI count matrices) for subsequent analysis.

Sample QC, analysis, and visualization

Request a detailed protocol

UMI count matrices for all five LCL samples were analyzed using the Seurat single-cell analysis package for R (Seurat v.3.1.5; Satija et al., 2015; Stuart et al., 2019). Filtered barcode matrices were loaded into Seurat, after which genes present in fewer than three cells and cells expressing fewer than 200 unique RNA molecules (features) and more than 65,000 unique features were filtered out. Additionally, cells in which mitochondrial genes accounted for greater than 5% of all transcripts were excluded from analysis. Beyond the uniform application of QC steps, we did not investigate the potential for batch-specific effects across the five samples run in four experiments. After QC thresholding, feature data were normalized and scored for cell cycle markers. Cell cycle scoring was used to regress out S and G2M gene features to remove variance (and unwanted effects on clustering) in the data sets arising from cell cycle phase. Cell cycle-corrected data were then scaled, and selection was performed to find the highest-variance features. PCA was performed on selected (n = 2000) variable features, and PCs were subsequently used to define distinct subpopulations within each of the five samples. For visualization, PCs were used to generate clusters at various resolutions and dimensionally reduced using tSNE. The R code used to process data and produce figures presented in this manuscript is provided as a supporting file (Source code 1), and the Python code used for simulations is provided as a supporting file (Source code 2) and is also available on GitHub (https://github.com/esorelle/ig-evo-sim; copy archived at https://archive.softwareheritage.org/swh:1:dir:8c47b2c0202aa8f255380c742a3cda3ff777abc7/).

PCR validation experiments

Request a detailed protocol

Cell pellets were collected for each of the five LCLs, and total mRNA was extracted from each pellet using the Promega SV96 Total RNA Isolation Kit (Promega, cat # Z3500) and quantified using a NanoDrop 2000 spectrophotometer (Thermo). Total mRNA was then used to create cDNA pools for each sample using a High-Capacity cDNA Reverse Transcription Kit (Thermo, cat # 4368814). Previously reported primer sequences flanking each heavy (IgM, IgA, and IgG) and light chain (Ig kappa and Ig lambda) gene of interest (Tiller et al., 2008; listed below) were purchased from Integrated DNA Technologies (IDT) and used to amplify each cDNA pool using standard procedures across a temperature gradient. PCR products and loading dye (Gel Loading Dye, Purple [6×], NEB, cat # B7025) were run on 2% agarose gels with SYBR Safe at 120 V for 45 min with a 100 base pair ladder (NEB, N3231S) and subsequently visualized using a LI-COR Odyssey Fc Imaging System (LI-COR Biosciences). For LCL_777_B95-8 and GM12878, PCR products were sequenced (GeneWiz) and aligned to assess clonality (imgt.org).

ICAM-1 cell sorting, proliferation, and metabolic assays

Request a detailed protocol

LCLs were stained with CD54 (ICAM-1) antibody (PE, Biolegend #HA58) according to the supplier’s manual. Then, cells were sorted on a Beckman Coulter Astrios cell sorter by anti-CD54 fluorescence, with ICAM-1-high and ICAM-1-low being defined as the top 15% and bottom 15%, respectively. 24 hr after sorting of ICAM-1-high and ICAM-1-low LCLs, extracellular acidification rate (ECAR) and oxygen consumption rate (OCR) were measured using the Seahorse XF24 extracellular flux analyzer (Agilent Technologies) Cell Energy Phenotype Test. Suspension LCLs were attached to culture plates by using Cell-Tak (BD Bioscience). ECAR and OCR were measured in Seahorse XF Base Medium supplemented with 1 mM pyruvate, 2 mM glutamine, and 10 mM glucose (Sigma Aldrich). ECAR and OCR values were normalized to cell number. For stress measurements, ECAR and OCR were measured over time after injection of oligomycin and FCCP. Metabolic potential measures the ability of cells to meet energetic demands under conditions of stress and is the percentage increase of stressed over baseline ECAR or OCR.

Stochastic simulations

Request a detailed protocol

The concept of a discrete-time Markov chain was adapted to simulate the evolution of phenotype frequencies, using immunoglobulin heavy chain isotype distributions within LCLs as an example. Briefly, the simulation takes as input a cell population of size n comprising B cells of different Ig heavy chain isotype classes at user-defined initial frequencies, fixed probabilities of proliferation in synchronous rounds of cell division, and a constant cell death rate assumption (also user-defined). Within the scope of computational feasibility, users can specify the number of rounds of cell division to simulate and the number of simulation trials to run. Additionally, users may choose to generate simulated cluster data modeled from distinct 2D normal distributions for each isotype for a specified number of trials at fixed intervals (i.e., every nth cell division round). The simulation was implemented in Python, and the code used to generate the simulated data is provided as a supporting file. The code is also available at (add as public GitHub repo) and may be freely implemented and modified.

Source data files

Request a detailed protocol

Raw sequencing data for the three previously unpublished samples (LCL_777_B95-8, LCL_777_M81, and LCL_461_B95-8) are deposited in the NCBI Sequence Read Archive (SRA) and can be accessed along with processed data from the NCBI Gene Expression Omnibus (GEO, Series Accession: GSE158275).

Data availability

Raw sequencing data for the three previously unpublished samples (LCL_777_B958, LCL_777_M81, and LCL_461_B958) are deposited in the NCBI Sequence Read Archive (SRA) and can be accessed along with processed data from the NCBI Gene Expression Omnibus (GEO, Series Accession: GSE158275).

The following data sets were generated
    1. SoRelle ED
    2. Dai J
    3. Zhou JY
    4. Giamberardino SN
    5. Bailey JA
    6. Gregory SG
    7. Chan C
    8. Luftig MA
    (2020) NCBI Gene Expression Omnibus
    ID GSE158275. Single-cell characterization of transcriptomic heterogeneity in lymphoblastoid cell lines.
The following previously published data sets were used
    1. Osorio D
    2. Yu X
    3. Yu P
    4. Serepedin E
    5. Cai JJ
    (2019) NCBI Gene Expression Omnibus
    ID GSE126321. Single cell RNA sequencing of lymphoblastoid cell lines of European and African ancestries.

References

    1. Briske-Anderson MJ
    2. Finley JW
    3. Newman SM
    (1997) The influence of culture time and passage number on the morphological and physiological development of Caco-2 cells
    Proceedings of the Society for Experimental Biology and Medicine. Society for Experimental Biology and Medicine 214:248–257.
    https://doi.org/10.3181/00379727-214-44093
    1. Desgranges C
    2. Lenoir G
    3. de-Thé G
    4. Seigneurin JM
    5. Hilgers J
    6. Dubouch P
    (1976)
    In vitro transforming activity of EBV. I-Establishment and properties of two EBV strains (M81 and M72) produced by immortalized callithrix jacchus lymphocytes
    Biomedicine 25:349.
    1. Dirmeier U
    2. Neuhierl B
    3. Kilger E
    4. Reisbach G
    5. Sandberg ML
    6. Hammerschmidt W
    (2003)
    Latent membrane protein 1 is critical for efficient growth transformation of human B cells by epstein-barr virus
    Cancer Research 63:2982–2989.
  1. Book
    1. Longnecker RM
    2. Kieff E
    3. Cohen J
    (2013)
    Fields Virology: Sixth Edition
    Wolters Kluwer Health Adis (ESP).
  2. Book
    1. Nowak MA
    (2006)
    Evolutionary Dynamics: Exploring the Equations of Life
    Harvard University Press.
    1. Rickinson A
    2. Kieff E
    (2007)
    Epstein-Barr virus
    Fields Virology 2:2655–2700.
    1. Tazzari PL
    2. de Totero D
    3. Bolognesi A
    4. Testoni N
    5. Pileri S
    6. Roncella S
    7. Reato G
    8. Stein H
    9. Gobbi M
    10. Stirpe F
    (1999)
    An Epstein-Barr virus-infected lymphoblastoid cell line (D430B) that grows in SCID-mice with the morphologic features of a CD30+ anaplastic large cell lymphoma, and is sensitive to anti-CD30 immunotoxins
    Haematologica 84:988–995.

Decision letter

  1. Melanie M Brinkmann
    Reviewing Editor; Technische Universität Braunschweig, Germany
  2. Päivi M Ojala
    Senior Editor; University of Helsinki, Finland
  3. Erik K Flemington
    Reviewer; Tulane University School of Medicine, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Single-cell characterization of transcriptomic heterogeneity in lymphoblastoid cell lines" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Erik K Flemington (Reviewer #2).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

The reviewers clearly appreciated your work which, by including five LCL cell lines from different donors and two different virus strains, generates a valuable resource to understand the dynamics between latent and lytic infection in EBV-infected B cells and B cell differentiation versus activation. However, the results and conclusions were not really validated and the work remains quite descriptive. We came to the conclusion, especially taking into regard the comments of reviewer #3, that this work in its current state is not of profound general interest to the readers in the non-EBV field and therefore not suited for publication in eLife.

Reviewer #1:

The authors report single cell RNA sequencing of five LCL lines. Three of them are recently derived, two with the B95-8 and one with the M81 EBV, and two have been in culture for prolonged periods of time. Two LCLs are from the same donor, one B95-8 and one M81 transformed. Three are predominated by IgG switched memory, one by IgA switched memory and only one by IgM, originating from naïve or unswitched memory B cells. Interestingly, B cell activation and NF-kappaB expression correlates inversely with both antibody isotype class switching and plasma cell differentiation. Lytic EBV gene expression can only be observed in a small subset, primarily in the LCLs from the same donor transformed with B95-8 and M81 viruses but is not higher in the M81 transformed LCL. Finally, the authors model how heterogenous LCL composition might evolve, but the GM12878 cell line demonstrates that this does not necessarily drive to clonality in all instances. The authors suggest that LCLs go through a founder bottleneck that renders possibly every LCL different and that this should be taken into account in using these cellular models.

The study described the comprehensive analysis of five LCLs and generates a valuable resource to understand the dynamics in EBV infected B cells between latent and lytic infection and B cell differentiation versus activation. However, some more information on the correlation of LCL proliferation with gene expression, latent EBV gene expression and antibody isotype composition from uninfected to LCL to lytic reactivation should be provided.

1) The gene expression analysis suggests inverse correlation of B cell activation and differentiation. Is this reflected by LCL proliferation in vitro? Do the LCLs with higher frequencies of differentiated LCLs proliferate slower than for example LCL777 B95-8.

2) The authors report lytic EBV gene expression, but presumably latent transcripts were rarely sequenced due to their low transcript number per cell. Nevertheless, it would be interesting if LMP1 expression frequency is elevated in LCLs with higher frequencies of activated cells and diminished in IgA or IgG expressing cells. The authors should attempt to address these questions by alternative means like flow cytometry or immune fluorescence microscopy.

3) Does lytic EBV reactivation occur in all antibody isotype carrying subpopulations similarly, or is it enriched in XBP1 positive IgG and IgA carrying B cells? Is there any preference between IgA and IgG expressing differentiated B cells?

4) Do the LCLs reflect peripheral B cell composition of the donors at all? Does for example donor 777 have a higher percentage of IgA positive B cells in the peripheral blood than 461.

5) GM12878 might argue that LCL composition could be stable over time in some LCLs and not necessarily drift towards monoclonality. Do the authors have any longitudinal information on BCR isotype composition in their investigated LCLs?

Reviewer #2:

This paper demonstrates fairly wide diversity of cell transcriptomes within EBV derived LCL populations. This intra-cell population diversity extends even to cell lines that have been in culture for many years. This likely speaks to the principles driving transcriptional activation which derives from chance intermolecular interactions that are albeit favored or disfavored based on changes in chromatin and chromatin domain structures. The relevance is that the certain level of randomness of these principles can lead to dynamic changes in entire transcription and differentiation programs even within a single cell population. While there is already evidence for these kinds of issues in cell populations, this work brings out these principles in the context of LCLs/EBV (particularly striking are the findings of the presence of plasma blast-like populations and marker-less subpopulations. Overall, this paper provides important insights into the transcriptional and phenotypic diversity that exists in what might have previously been perceived as mostly uniform cell populations of tissue culture LCLs.

The authors have also been able to identify unique transcriptome signatures for reactivating cells which is potentially interesting from the standpoint of informing us on the nature of apparently stochastic events that trigger this transition to the EBV lytic phase. Nevertheless, given the lack of detection of BZLF1 (and possibly other lytic transcripts?), it would be helpful if the authors could provide additional evidence that these are true lytic cells vs abortive lytic cells (the latter of which would itself would be an interesting finding). It would be helpful if the authors could plot distributions of the percentage of viral lytic transcript reads to cell transcript reads in these populations of cells (this is hard to gauge from Figure 3C). Since herpesviral lytic infection typically results in a substantial proportion of lytic transcripts (minimum of 10% of all reads), this would help determine whether most of these cells are truly lytic or abortive lytic (perhaps through some epigenetic changes that lead to a higher level of transcriptional bursting of the EBV genome).

Reviewer #3:

Summary: Lymphoblastoid Cell Lines (LCLs) are induced by infecting primary B cells with Epstein-Barr Virus (EBV) and constitute a widely used cell line model in molecular biology, oncology and immunology. Understanding the intra- and inter-heterogeneity of these cell lines using single-cell analysis is key to design research and interpret experimental results. The authors induced three cell lines using two EBV strains and they performed single cell RNA-seq using the 10x platform and conducted a very classical analysis using Seurat. They added two datasets from the literature (Osorio et al., 2019). They describe that each cell line has a certain level of intra-heterogeneity with different Ig expression patterns, different maturation stages (Figure 1 and 2), as well as exhibit different viral lytic/latent stages (Figure 3) and mitochondrial gene expression (Figure 4).

General comments: This study is needed and interesting as a resource for the community of scientists using these cell lines but I see major problems in the design of the study and the analysis. Beyond doing scRNA-seq and displaying clustering analysis, the final aim and the novelty of the study are not clear to me. It is neither clear how one can use such analysis to guide his research. As presented, the data analysis is very preliminary (Figure 1—figure supplement 15-19 and Figure 4—figure supplement 1-4) and the study needs substantial improvements before publication in a top-tier journal such as eLife. No functional analysis is provided to indeed demonstrate that intra- and inter-variability has implications in study design.

1) Design of the experiments: the overall scope of the study is very limited with only five different donors and it is not clear what is the contribution of the different viral strains. The rational of the study design is not outlined. The authors have chosen to look at the transcriptome only; we would have expected to have more -omics for a resource paper such as ATAC-seq and methylome to document the underlying heterogeneity of the cells.

2) The description of the variability of LCLs is not novel. Ozgyin et al., 2019 have already provided a genotype-independent functional genomic variability of the LCLs. This study is not quoted.

3) The variability between the five LCLs is not clearly delineated. Each cell line is heterogenous (Figure 1 to 4) but the authors have studied the cell lines independently without attempting to merge the data. The inter-heterogeneity is very poorly documented. The data analysis to perform a merge would require substantial computational work that goes beyond the scope of a two-month revision. It would be very important to explain the contribution of the inter-individual genomic variability between donors.

4) The authors claim in the Abstract that "This heterogeneity is likely attributable to intrinsic variance in primary B cells and host-pathogen dynamics." In a publication in a top journal such as eLife, we expect that such a claim is documented with experiments. The authors claim that "primary cell heterogeneity, random sampling, time in culture, and even mild differences in phenotype-specific fitness" can contribute to such heterogeneity. These are very general claims that do not help and are not supported with experiments.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Single-cell characterization of transcriptomic heterogeneity in lymphoblastoid cell lines" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Päivi Ojala as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Erik K Flemington (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

Lymphoblastoid Cell Lines (LCLs) are induced when primary B cells are infected with the human herpesvirus Epstein-Barr Virus (EBV) and are a widely used model in molecular biology, oncology, and immunology. The authors have used single cell transcriptomics to demonstrate substantial phenotypic heterogeneity within and across LCLs. Hence, this work is important for researchers working with LCLs for the design and interpretation of experiments.

Revisions:

Please include in your discussion the points raised by reviewers #1 and #3 about the limitations of this study:

Reviewer #1: Due to lack of donor material prior to transformation by EBV the authors could not assess if the heterogeneity of their cell lines was donor dependent. Furthermore, they could not provide any information on the longitudinal stability of the reported heterogeneity.

Reviewer #3: It remains to uncover the origin of this variability using other layers of -omics and longitudinal sampling of the cells along the transformation process.

https://doi.org/10.7554/eLife.62586.sa1

Author response

[Editors’ note: The authors appealed the original decision. What follows is the authors’ response to the first round of review.]

The reviewers clearly appreciated your work which, by including five LCL cell lines from different donors and two different virus strains, generates a valuable resource to understand the dynamics between latent and lytic infection in EBV-infected B cells and B cell differentiation versus activation. However, the results and conclusions were not really validated and the work remains quite descriptive. We came to the conclusion, especially taking into regard the comments of reviewer #3, that this work in its current state is not of profound general interest to the readers in the non-EBV field and therefore not suited for publication in eLife.

We thank the reviewers and editors for their consideration and recognition of our work. Based on the unanimous conclusion by the reviewers and editors that the work “generates a valuable resource,” we request that the submission be reconsidered as a Tools and Resources article rather than a Research Article.

We have revised the manuscript with data from three additional experiments that directly address reviewer requests for validation and functional implications of our findings. Further, we have provided responses and references to address other questions raised during the review. These are described further in the point-by-point responses to each review comment. We believe that these revisions have strengthened our manuscript substantially and adequately resolve the reviewers’ concerns.

We strongly believe this work will be of interest to readers outside of the EBV field because of the widespread use of LCLs in genetic and immunological research (as we noted in the manuscript and in our cover letter accompanying the original submission). The diversity in viral transcription observed within the datasets will certainly be relevant to EBV researchers; the effects of viral transformation on host cell phenotypic heterogeneity for these widely used models are likewise relevant to a more general readership.

Reviewer #1:

[…]

1) The gene expression analysis suggests inverse correlation of B cell activation and differentiation. Is this reflected by LCL proliferation in vitro? Do the LCLs with higher frequencies of differentiated LCLs proliferate slower than for example LCL777 B95-8.

We have provided new data to address this question. We sorted LCLs by ICAM (a marker for activated cells) expression and found that ICAM-high populations exhibit more rapid proliferation than ICAM-low populations in the days immediately after sorting (Figure 2—figure supplement 8 in the revised manuscript). This highlights a functional implication of the observed transcriptomic heterogeneity with respect to cell population growth dynamics.

2) The authors report lytic EBV gene expression, but presumably latent transcripts were rarely sequenced due to their low transcript number per cell. Nevertheless, it would be interesting if LMP1 expression frequency is elevated in LCLs with higher frequencies of activated cells and diminished in IgA or IgG expressing cells. The authors should attempt to address these questions by alternative means like flow cytometry or immune fluorescence microscopy.

As recently reported by our lab (Messinger et al., 2019), FACS sorting by ICAM-1 expression serves as a reliable proxy for viral LMP-1 expression due to tight correlation (via EBV induction of NF-κB pathways) [see Figures 1B-D, 2C in Messinger et al., (2019)] and correlates directly to the extent of DNA replication in LCLs [see Figure 3D]. Additional RNA-FiSH data from the cited paper [see Figure 6] confirms that only a fraction of cells within LCL populations express LMP1. These findings are consistent with the newly provided cell proliferation and metabolic data sorted by ICAM status, which collectively indicate the functional role of elevated LMP1 in activated cell subpopulations.

3) Does lytic EBV reactivation occur in all antibody isotype carrying subpopulations similarly, or is it enriched in XBP1 positive IgG and IgA carrying B cells? Is there any preference between IgA and IgG expressing differentiated B cells?

The limited number of lytic cells preclude a definitive answer to this question; however, we note that cells expressing each of the three heavy chain isotypes are observed within the lytic cluster in LCL 777 B95-8 at isotype frequencies roughly corresponding to those of the whole cell population (Figure 1—figure supplement 5). This suggests similarity in lytic reactivation rates across isotypes [Note: there are even fewer lytic cells in LCL 777 M81, however the presence of IgA- and IgG-expressing cells is likewise observed (Figure 1—figure supplement 6)].

4) Do the LCLs reflect peripheral B cell composition of the donors at all? Does for example donor 777 have a higher percentage of IgA positive B cells in the peripheral blood than 461.

The diversity among LCLs almost certainly reflects different donor peripheral B cell compositions (as well as other factors including passaging and stochasticity – see Figure 5 simulations and corresponding results/discussion). However, we did not retain (and in the case of GM12878 and GM18502, do not have access to) primary cell samples prior to transformation, so it is not possible to answer this question dispositively for the reported lines. As noted by Heath and Rickinson [Heath et al., PLoS Pathogens (2012), https://journals.plos.org/plospathogens/article/comments?id=10.1371/journal.ppat.1002697], we would expect that this does reflect B cell composition of donor, however such composition can change over time.

5) GM12878 might argue that LCL composition could be stable over time in some LCLs and not necessarily drift towards monoclonality. Do the authors have any longitudinal information on BCR isotype composition in their investigated LCLs?

We did not acquire longitudinal data in this work.

Reviewer #2:

[…]

The authors have also been able to identify unique transcriptome signatures for reactivating cells which is potentially interesting from the standpoint of informing us on the nature of apparently stochastic events that trigger this transition to the EBV lytic phase. Nevertheless, given the lack of detection of BZLF1 (and possibly other lytic transcripts?), it would be helpful if the authors could provide additional evidence that these are true lytic cells vs abortive lytic cells (the latter of which would itself would be an interesting finding). It would be helpful if the authors could plot distributions of the percentage of viral lytic transcript reads to cell transcript reads in these populations of cells (this is hard to gauge from Figure 3C). Since herpesviral lytic infection typically results in a substantial proportion of lytic transcripts (minimum of 10% of all reads), this would help determine whether most of these cells are truly lytic or abortive lytic (perhaps through some epigenetic changes that lead to a higher level of transcriptional bursting of the EBV genome).

Thank you to reviewer #2 for making this important suggestion. We have re-analyzed the level of host and viral gene expression in the lytic cell clusters and have found that the percentage of lytic transcripts in cells varies across the lytic cluster (LCL 777 B95-8, Figure 3—figure supplement 1 in the revised manuscript). In the highest-expressing cells, the 20 most prevalent lytic genes account for 10-15% of all transcripts, which is consistent with truly lytic cells. Interestingly, about half of the cells in the cluster have less than 10% lytic transcript content. As the reviewer has noted, this could indicate abortive lytic replication; alternatively, these could be cells that had only recently initiated lytic replication at the time of sample preparation. We have added this analysis and corresponding discussion to the manuscript.

Reviewer #3:

Summary: Lymphoblastoid Cell Lines (LCLs) are induced by infecting primary B cells with Epstein-Barr Virus (EBV) and constitute a widely used cell line model in molecular biology, oncology and immunology. Understanding the intra- and inter-heterogeneity of these cell lines using single-cell analysis is key to design research and interpret experimental results. The authors induced three cell lines using two EBV strains and they performed single cell RNA-seq using the 10x platform and conducted a very classical analysis using Seurat. They added two datasets from the literature (Osorio et al., 2019). They descibe that each cell line has a certain level of intra-heterogeneity with different Ig expression patterns, different maturation stages (Figure 1 and 2), as well as exhibit different viral lytic/latent stages (Figure 3) and mitochondrial gene expression (Figure 4).

General comments: This study is needed and interesting as a resource for the community of scientists using these cell lines but I see major problems in the design of the study and the analysis. Beyond doing scRNA-seq and displaying clustering analysis, the final aim and the novelty of the study are not clear to me. It is neither clear how one can use such analysis to guide his research. As presented, the data analysis is very preliminary (Figure 1—figure supplement 15-19 and Figure 4—figure supplement 1-4) and the study needs substantial improvements before publication in a top-tier journal such as eLife. No functional analysis is provided to indeed demonstrate that intra- and inter-variability has implications in study design.

We thank reviewer #3 for acknowledging the utility of our reported work for a broad community of researchers beyond the EBV community. We believe that the newly added data and analysis signify substantial improvements to the manuscript with respect to validation and functional implications for LCLs.

Respectfully, it is unclear why the reviewer cites 9 out of 38 total figures and figure supplements to broadly assert that the presented data is “very preliminary” – we provided these specific supplements (principal component heatmaps and clustering resolution screens) for completeness and analytical transparency (for example, different clustering resolutions may be required depending on the desired analysis). This assertion does not substantively challenge the key data and findings presented in the main figures.

Regarding aim and novelty, we note several key aspects of the work including: 1) the first (to our knowledge) evidence of multiclonality in LCLs at single-cell resolution, 2) single-cell characterization of inverse expression patterns for B cell activation and differentiation programs (now supplemented with additional functional and validation data), 3) the first single-cell dataset revealing EBV lytic replication at appreciable levels, and 4) the first report of LCL subpopulations with distinct metabolic profiles.

With respect to how such analysis might allow one to guide her own research, the datasets clearly identify numerous genes of interest and correlations that can inform (to name just a few areas):

1) Subsequent targeted studies in B cell activation and metabolism; modeling of host-pathogen dynamics

2) Guides to developing molecular reporters and other tools for live-cell studies

3) Transcriptome-wide evaluation of the effects of chemical perturbations including potential therapeutics (for example, the class of drugs that aims to treat viral infection by inducing higher rates of lytic reactivation).

This work clearly opens new avenues of research, including refinement of a model for B cell functional states and dynamics in the context of EBV infection and associated lymphomas.

We provide further detailed addresses to reviewer #3’s concerns below, which we believe have resulted in a substantially improved manuscript.

1) Design of the experiments: the overall scope of the study is very limited with only five different donors and it is not clear what is the contribution of the different viral strains. The rational of the study design is not outlined. The authors have chosen to look at the transcriptome only; we would have expected to have more -omics for a resource paper such as ATAC-seq and methylome to document the underlying heterogeneity of the cells.

A previously uncharacterized inverse relationship between activation and differentiation transcriptional signatures at the single cell level is consistently shown in 5 LCLs from different donors prepared and analyzed in different labs. At the same time, notable and meaningful clonal differences are found to exist within and across the LCLs, demonstrating that nominally identical cell populations should not be assumed to be monolithic with respect to identity or fitness. We cite data linking some of these differences to variation in viral transcriptional states and demonstrate the effects on cell proliferation. In and of themselves, these are valuable findings. While inclusion of additional samples may offer further confirmation or refinement of the generalizability of either finding, it is not immediately clear to us that further samples are necessary in this case. The reviewer contends that five samples constitute a limited study but provides no rationale or statistical basis for what defines a study of sufficient size – this is necessarily dependent on the nature and magnitude of what is being investigated/tested. Because the findings of the study were not known a priori, defining such a standard would have proven challenging. Whether 100% of LCLs uniformly exhibit the B cell activation / differentiation continuum or what percentage of LCLs display polyclonality are questions best addressed by future studies.

The contributions from testing different viral strains (B95-8 and M81, the most widely used Type 1 and Type 2 strains, respectively) in cells from the same donor are articulated throughout the manuscript.

We believe the rationale and aim of the study is clearly defined in the manuscript Introduction: “Although studied extensively, complete characterization of the viral and host determinants of growth arrest versus immortalization of early-infected cells remains elusive.27 As one consequence, it is unclear whether or to what extent viral transformation may influence the resulting LCL cell populations. The possibility of significant phenotypic diversity within and across LCL samples warrants consideration, given the intrinsic variance of the human primary B cell repertoire28,29 and the multiplicity of viral transcription programs active in the journey to immortalization.”

Reviewer #3 is correct that additional -omics data would add layers of functional detail such as epigenetic profiling to the study. While we appreciate the reviewer’s focus and expertise in multi-omics analyses and the importance of these techniques, this work was conceived and executed with a different scope and focus. We feel that the work in its present form provides a rich resource and compelling findings that will be sufficiently valuable to the readership.

2) The description of the variability of LCLs is not novel. Ozgyin et al., 2019 have already provided a genotype-independent functional genomic variability of the LCLs. This study is not quoted.

We thank reviewer #3 for providing this reference. We have cited and discussed it within our revised manuscript. The statement that “variability of LCLs is not novel” is rather vague and reductive – the work by Ozgyin characterizes certain aspects of variability within LCLs, and our work characterizes aspects of variability that are distinct from that paper’s scope. Below, we highlight several key differences of our manuscript relative to the work by Ozgyin and colleagues:

1) Ozgyin et al. provide a thorough exploration of bulk transcriptomic and epigenetic heterogeneity in multiple LCLs cultured from a single donor. By contrast, we report a conserved continuum of transcriptomic heterogeneity across LCLs from multiple donors.

2) Ozgyin et al. show that heritable epigenetic signatures persist across different samples from the same donor and that these differences may have pharmacological implications. Our work shows that transcriptional differences within LCLs are correlated to different functional states intrinsic to B cell biology and are influenced by heterogeneity in viral programs.

3) Ozgyin et al. present FACS data showing light chain variability across samples from the same donor (a valuable finding). Our work presents single-cell sequencing data showing heavy chain clonality within and across donors, as well as light chain data.

4) Ozgyin et al. focus on host cell transcription as influenced by chromatin accessibility. Our work focuses on the effects of host-pathogen interactions by reporting both host cell and viral reads and their relative expression in individual cells within LCLs.

5) Our work describes a stochastic modeling framework to explore the effects of differential phenotypic fitness and other experimental considerations on LCL evolution.

6) Notably, both works highlight the broad utility of and interest in LCLs across research disciplines

3) The variability between the five LCLs is not clearly delineated. Each cell line is heterogenous (Figure 1 to 4) but the authors have studied the cell lines independently without attempting to merge the data. The inter-heterogeneity is very poorly documented. The data analysis to perform a merge would require substantial computational work that goes beyond the scope of a two-month revision. It would be very important to explain the contribution of the inter-individual genomic variability between donors.

The revised manuscript provides a characterization of the merged datasets (Figure 1—figure supplement 5 in the revised manuscript). It shows that donor-specific differences are a dominant source of LCL heterogeneity. The fact that donor-specific genomic differences dominate relative to conserved intra-sample trends is unsurprising. This analysis reinforced discussion related to the stochastic model of cell phenotype evolution – that the contribution of inter-individual genomic variability is significant.

4) The authors claim in the Abstract that "This heterogeneity is likely attributable to intrinsic variance in primary B cells and host-pathogen dynamics." In a publication in a top journal such as eLife, we expect that such a claim is documented with experiments. The authors claim that "primary cell heterogeneity, random sampling, time in culture, and even mild differences in phenotype-specific fitness" can contribute to such heterogeneity. These are very general claims that do not help and are not supported with experiments.

These are general and foundational principles. In the case of the effects of host-pathogen dynamics, please refer to the substantial body of research describing EBV LMP1-mediated upregulation of NF-κB signaling pathways and the newly-provided data showing higher proliferation of ICAM-1 – high (an LMP1 proxy) cells within LCLs. In the case of the effects of random sampling, time in culture, and phenotype-specific fitness, we were referring to results of stochastic simulations. While not so dispositive as clean results from wet lab experiments, models and simulations can be valuable tools for research and guide subsequent studies.

[Editors’ note: what follows is the authors’ response to the second round of review.]

Revisions:

Please include in your discussion the points raised by reviewers #1 and #3 about the limitations of this study:

Reviewer #1: Due to lack of donor material prior to transformation by EBV the authors could not assess if the heterogeneity of their cell lines was donor dependent. Furthermore, they could not provide any information on the longitudinal stability of the reported heterogeneity.

We have added the following sentences to the Discussion section to address these points:

“A notable limitation of this study is the lack of access to (GM12878 and GM18502) or retention of (LCL_461 and LCL_777) original donor primary B cells and longitudinal sampling, which would have provided direct insight into donor-dependent cellular heterogeneity.”

“Additional studies that utilize time-resolved single-cell sampling from original B cells through early infection and long-term LCL outgrowth in culture will be essential to explore further the longitudinal stability and variation in transcriptional profiles following EBV infection.”

Reviewer #3: It remains to uncover the origin of this variability using other layers of -omics and longitudinal sampling of the cells along the transformation process.

We have added the following sentences to the Discussion section to address these points:

“Additional studies that utilize time-resolved single-cell sampling from original B cells through early infection and long-term LCL outgrowth in culture will be essential to explore further the longitudinal stability and variation in transcriptional profiles following EBV infection. Moreover, while the transcriptomic profiles we report provide a valuable resource, additional molecular layers must be interrogated through parallel -omics techniques (e.g., ATAC-seq, DNA methylation) across individual cells to understand deeply the mechanistic underpinnings of transcriptional heterogeneity.”

https://doi.org/10.7554/eLife.62586.sa2

Article and author information

Author details

  1. Elliott D SoRelle

    1. Department of Molecular Genetics and Microbiology, Center for Virology, Duke University School of Medicine, Durham, United States
    2. Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Writing - original draft, Writing - review and editing, EDS performed Seurat analysis of single-cell sequencing datasets for the five LCLs, developed clonal evolution simulations, helped design validation experiments, prepared figures, and wrote the manuscript
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3362-1028
  2. Joanne Dai

    Department of Molecular Genetics and Microbiology, Center for Virology, Duke University School of Medicine, Durham, United States
    Present address
    Amgen Inc, San Francisco, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Methodology, Writing - review and editing, JD performed the initial cell isolation (461_B95-8, 777_B95-8, and 777_M81) for the 10X platform as well as the initial CellRanger and Seurat analysis of those three LCLs
    Competing interests
    Joanne Dai is affiliated with Amgen Inc, The author has no financial interests to declare.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9879-4704
  3. Emmanuela N Bonglack

    1. Department of Molecular Genetics and Microbiology, Center for Virology, Duke University School of Medicine, Durham, United States
    2. Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, United States
    Contribution
    Data curation, Formal analysis, Validation, Investigation, Writing - review and editing, ENB performed ICAM-1 sorting experiments, growth curves, and Seahorse experiments to assess metabolic differences
    Competing interests
    No competing interests declared
  4. Emma M Heckenberg

    Department of Molecular Genetics and Microbiology, Center for Virology, Duke University School of Medicine, Durham, United States
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Writing - review and editing, EMH performed IgH and IgL RT-PCR and sequencing analysis of LCLs
    Competing interests
    No competing interests declared
  5. Jeffrey Y Zhou

    Department of Medicine, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Data curation, Formal analysis, Supervision, Validation, Investigation, Writing - review and editing, JYZ performed an independent analysis of the three scRNA-seq datasets derived in house providing important initial insight into the heterogeneity and quality of these samples
    Competing interests
    No competing interests declared
  6. Stephanie N Giamberardino

    Duke Molecular Physiology Institute and Department of Neurology, Duke University School of Medicine, Durham, United States
    Contribution
    Data curation, Formal analysis, Supervision, Investigation, Methodology, Writing - review and editing, SG performed initial CellRanger and Seurat analysis and training of JD in these suites towards our initial analysis of the in-house derived LCL scRNA-seq data sets
    Competing interests
    No competing interests declared
  7. Jeffrey A Bailey

    Department of Pathology and Laboratory Medicine, Warren Alpert Medical School, Brown University, Providence, United States
    Contribution
    Supervision, Writing - review and editing, JAB provided oversight and guidance to JYZ in the independent analysis of LCL scRNA-seq data of the 461 and 777 LCLs
    Competing interests
    No competing interests declared
  8. Simon G Gregory

    Duke Molecular Physiology Institute and Department of Neurology, Duke University School of Medicine, Durham, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Methodology, Writing - review and editing, SGG provided oversight of SG and guidance to JD and EDS in developing the analysis pipeline for all scRNA-seq data
    Competing interests
    No competing interests declared
  9. Cliburn Chan

    Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, United States
    Contribution
    Data curation, Formal analysis, Supervision, Investigation, Writing - review and editing, CC supervised EDS along with MAL and provided oversight and review of scRNA-seq analysis as well as the development of the cell proliferation simulations
    Competing interests
    No competing interests declared
  10. Micah A Luftig

    Department of Molecular Genetics and Microbiology, Center for Virology, Duke University School of Medicine, Durham, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Writing - review and editing, MAL conceived experiments, acquired funding for the project, analyzed data to provided key interpretations regarding the viral/host interactions and activation/differentiation continuum. He provided essential oversight for the conceptual framework of the study and edited the manuscript
    For correspondence
    micah.luftig@duke.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2964-1907

Funding

National Institute of Dental and Craniofacial Research (R01-DE025994)

  • Micah A Luftig

National Cancer Institute (T32-CA009111)

  • Elliott D SoRelle
  • Joanne Dai
  • Micah A Luftig

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to acknowledge the assistance of the Duke Molecular Physiology Institute Molecular Genomics core (Karen Abramson), the Duke Flow Cytometry Shared Resource (Mike Cook, Lynne and Nancy Martin, and Lynn Martinek), and the Duke School of Medicine Cellular Metabolism Analysis Core Facility (Nancie MacIver and Amanda Nichols) for the generation of key data in this manuscript. We also wish to thank Drs. Eric Johannsen and JJ Miranda for thoughtful discussion and feedback on this work.

Senior Editor

  1. Päivi M Ojala, University of Helsinki, Finland

Reviewing Editor

  1. Melanie M Brinkmann, Technische Universität Braunschweig, Germany

Reviewer

  1. Erik K Flemington, Tulane University School of Medicine, United States

Publication history

  1. Received: August 29, 2020
  2. Accepted: January 26, 2021
  3. Accepted Manuscript published: January 27, 2021 (version 1)
  4. Version of Record published: February 6, 2021 (version 2)

Copyright

© 2021, SoRelle et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,546
    Page views
  • 254
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Evolutionary Biology
    2. Immunology and Inflammation
    Srijan Seal et al.
    Review Article

    Researchers worldwide are repeatedly warning us against future zoonotic diseases resulting from humankind’s insurgence into natural ecosystems. The same zoonotic pathogens that cause severe infections in a human host frequently fail to produce any disease outcome in their natural hosts. What precise features of the immune system enable natural reservoirs to carry these pathogens so efficiently? To understand these effects, we highlight the importance of tracing the evolutionary basis of pathogen tolerance in reservoir hosts, while drawing implications from their diverse physiological and life-history traits, and ecological contexts of host-pathogen interactions. Long-term co-evolution might allow reservoir hosts to modulate immunity and evolve tolerance to zoonotic pathogens, increasing their circulation and infectious period. Such processes can also create a genetically diverse pathogen pool by allowing more mutations and genetic exchanges between circulating strains, thereby harboring rare alive-on-arrival variants with extended infectivity to new hosts (i.e., spillover). Finally, we end by underscoring the indispensability of a large multidisciplinary empirical framework to explore the proposed link between evolved tolerance, pathogen prevalence, and spillover in the wild.

    1. Immunology and Inflammation
    2. Microbiology and Infectious Disease
    Hao Gu et al.
    Research Article

    Vaccination strategies for rapid protection against multidrug-resistant bacterial infection are very important, especially for hospitalized patients who have high risk of exposure to these bacteria. However, few such vaccination strategies exist due to a shortage of knowledge supporting their rapid effect. Here, we demonstrated that a single intranasal immunization of inactivated whole cell of Acinetobacter baumannii elicits rapid protection against broad A. baumannii-infected pneumonia via training of innate immune response in Rag1-/- mice. Immunization-trained alveolar macrophages (AMs) showed enhanced TNF-α production upon restimulation. Adoptive transfer of immunization-trained AMs into naive mice mediated rapid protection against infection. Elevated TLR4 expression on vaccination-trained AMs contributed to rapid protection. Moreover, immunization-induced rapid protection was also seen in Pseudomonas aeruginosa and Klebsiella pneumoniae pneumonia models, but not in Staphylococcus aureus and Streptococcus pneumoniae model. Our data reveal that a single intranasal immunization induces rapid and efficient protection against certain Gram-negative bacterial pneumonia via training AMs response, which highlights the importance and the possibility of harnessing trained immunity of AMs to design rapid-effecting vaccine.