A CD4+ T cell reference map delineates subtype-specific adaptation during acute and chronic viral infections

  1. Massimo Andreatta
  2. Ariel Tjitropranoto
  3. Zachary Sherman
  4. Michael C Kelly
  5. Thomas Ciucci (corresponding author)
  6. Santiago J Carmona (corresponding author)
  1. Department of Oncology, UNIL CHUV and Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Switzerland
  2. Agora Cancer Research Center, Switzerland
  3. Swiss Institute of Bioinformatics, Switzerland
  4. David H. Smith Center for Vaccine Biology and Immunology, Department of Microbiology and Immunology, University of Rochester, United States
  5. Single Cell Analysis Facility, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, United States
  6. Laboratory of Immune Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, United States

Abstract

CD4+ T cells are critical orchestrators of immune responses against a large variety of pathogens, including viruses. While multiple CD4+ T cell subtypes and their key transcriptional regulators have been identified, there is a lack of consistent definition for CD4+ T cell transcriptional states. In addition, the progressive changes affecting CD4+ T cell subtypes during and after immune responses remain poorly defined. Using single-cell transcriptomics, we characterized the diversity of CD4+ T cells responding to self-resolving and chronic viral infections in mice. We built a comprehensive map of virus-specific CD4+ T cells and their evolution over time, and identified six major cell states consistently observed in acute and chronic infections. During the course of acute infections, T cell composition progressively changed from effector to memory states, with subtype-specific gene modules and kinetics. Conversely, in persistent infections T cells acquired distinct, chronicity-associated programs. By single-cell T cell receptor (TCR) analysis, we characterized the clonal structure of virus-specific CD4+ T cells across individuals. Virus-specific CD4+ T cell responses were essentially private across individuals and most T cells differentiated into both Tfh and Th1 subtypes irrespective of their TCR. Finally, we showed that our CD4+ T cell map can be used as a reference to accurately interpret cell states in external single-cell datasets across tissues and disease models. Overall, this study describes a previously unappreciated level of adaptation of the transcriptional states of CD4+ T cells responding to viruses and provides a new computational resource for CD4+ T cell analysis.

Editor's evaluation

This paper uses single-cell genomics to examine the heterogeneity of virus-specific CD4 T cells over time in both acute and chronic viral infection. Further, the authors build a comprehensive atlas of the transcriptional evolution of virus-specific CD4 T cell responses that could be used as a reference tool to interpret other datasets. This work characterizes how the antiviral CD4 T cell transcriptional landscape changes with time and will be of broad interest to those that study acute and chronic CD4 T cell responses.

https://doi.org/10.7554/eLife.76339.sa0

Introduction

CD4+ T cells play a critical role in shaping immune responses against pathogens through the secretion of soluble mediators and direct cell interactions with other immune cell populations. The multifaceted ability of CD4+ T cells to orchestrate multiple layers of protection relies on their unique capacity to adopt diverse functional fates upon antigen encounters (Nguyen et al., 2019; Swain et al., 2012). Following viral infections, naïve CD4+ T cells clonally expand and differentiate into effector populations supporting both cellular and humoral responses. This functional diversification of CD4+ T cell populations is under tight transcriptional control, ensuring the appropriate positioning and deployment of effector functions (Zhu et al., 2010). While Th1 cells, supported by the transcription factors Blimp1 and T-bet, regulate cellular responses by helping CD8+ T cells and innate populations through the secretion of IFN-γ, follicular-helper CD4+ T cells (Tfh), which depend on Bcl6, promote antibody responses via direct cell contact with B cells and the production of cytokines like IL-21 (Crotty, 2011; Laidlaw et al., 2016; Sheikh and Groom, 2021).

During the contraction phase following its initial amplification, the pool of virus-specific CD4+ T cells declines in both self-resolving and persistent infections. However, the nature of the infection greatly impacts the evolution of CD4+ T cell functions (Brooks et al., 2005; Crawford et al., 2014; Fahey et al., 2011). After acute viral infections, pathogen clearance is followed by the persistence of memory CD4+ T cell populations that acquire distinct phenotypes, gene expression and functional properties (Crawford et al., 2014; Hale et al., 2013; Marshall et al., 2011). Because memory populations are heterogeneous and include many subsets, including Th1- and Tfh-like subsets, measuring transcriptional changes occurring between effector and memory populations remains challenging. In fact, in addition to Th1 and Tfh cells, memory populations comprise a less differentiated subset of Central Memory (Tcm) cells that contribute to long-term protective functions of CD4+ T cells (Pepper and Jenkins, 2011). Tcm cells phenotypically resemble cells that are present during the early anti-viral response and are referred to as Central Memory precursors (Tcmp), raising the possibility of an early transcriptional imprinting that favors the emergence of long-lived memory CD4+ T cells (Ciucci et al., 2019; Marshall et al., 2011; Pepper et al., 2011). Yet, the nature of such a program, as well as its overlap with that involved in Th1 and Tfh subsets, has not been elucidated. More broadly, it remains unclear whether a shared transcriptional module regulates memory differentiation, or whether diverse gene programs allowing long-term maintenance are imprinted in a subset-specific manner. In sharp contrast to acute settings, chronic infections do not result in such a phenotypic memory transition of persisting cells (Brooks et al., 2005; Fahey et al., 2011).
Instead, in response to sustained antigenic stimulation, CD4+ T cells acquire dysfunctional features, including the expression of inhibitory receptors and reduced cytokine production (Brooks et al., 2005; Crawford et al., 2014). In this context, questions remain about how persistent infections alter the functional and transcriptional landscape of CD4+ T cell populations. However, because of the lack of a consistent definition of virus-specific CD4+ T cell states across conditions and over time, the subtype-specific adaptations during infections are currently poorly characterized.

Although there is evidence that cell fate decisions are stochastically imprinted on T cells (Buchholz et al., 2016; Buchholz et al., 2013; Soon et al., 2020), other studies have shown that cell-intrinsic factors as well as environmental cues affect the differentiation of single naïve T cells (Cho et al., 2017; Tubo et al., 2016; Tubo et al., 2013). Among these factors, the interaction between the T cell receptor (TCR) and its cognate antigen has been shown to influence the diversification and maintenance of CD4+ T cells (Cho et al., 2017; Snook et al., 2018; Tubo et al., 2013). For instance, recent studies showed that Th1 and Tfh differentiation are influenced by TCR usage, affinity to cognate peptides and the type of infection (Khatun et al., 2021; Künzli et al., 2021; Snook et al., 2018). Yet, we do not fully understand the extent to which the TCR repertoire impacts the early fate decision of CD4+ T cells responding to acute and chronic viral infections. In particular, it remains to be addressed whether naïve T cells with particular TCR chains are preferentially recruited during the effector phase and adopt specific transcriptional profiles that could skew the overall immune response.

Here, we employed single-cell RNA sequencing (scRNA-seq) coupled with single-cell TCR sequencing to explore the landscape of virus-specific CD4+ T cell states at different timepoints during acute and chronic infections. We provide evidence of both shared and subtype-specific transcriptional changes occurring dynamically in both types of infection. Analysis of paired scRNA-seq and scTCR-seq data of antigen-specific polyclonal T cells revealed that, although a fraction of the clonotypes in both acute and chronic settings were significantly biased towards specific subtypes, T cell functional diversification appears to be mostly independent of the expression of particular TCR chain pairs. Based on these results, we make available a new reference map describing virus-specific CD4+ T cell states – including Th1, Tfh, and Tcmp/Tcm – and their dynamic evolution over time during acute and chronic infection. By combining this map with a reference-projection algorithm, we provide a new computational framework that enables automated and accurate interpretation of CD4+ T cell states across models, conditions, and experiments.

Results

Differential phenotypic adaptation of CD4+ T cells in acute and chronic viral infection

To characterize the diversification of T cell populations during acute and chronic infections, we used two variants of the lymphocytic choriomeningitis virus (LCMV): the Armstrong and Clone 13 strains. Both viruses induce a strong T cell amplification early in the response, followed by the persistence of a small pool of virus-specific T cells at later timepoints. While the Armstrong strain results in an acute infection cleared within 6–8 days post infection (dpi), Clone 13 persists, leading to chronic infection (Ahmed et al., 1984; Crawford et al., 2014). Using these models, we sought to measure the phenotypic changes of virus-specific T cells following infection, both at early and late timepoints. Virus-specific CD4+ and CD8+ T cells were identified using MHC tetramers loaded with the LCMV-derived GP66 and GP33 peptides, respectively. Cells were analyzed using spectral flow cytometry with a panel of 21 parameters allowing high-dimensional analyses based on 14 surface markers expressed by virus-specific T cells (Figure 1A, Figure 1—figure supplement 1A-C).

Figure 1 (with 1 supplement)
Phenotypic characterization of virus-specific CD4+ T cells by spectral flow cytometry.

Spleen GP66:I-Ab+ CD4+ T cells were analyzed 7 and 21 days after infection with LCMV Armstrong and Clone 13. (A) Schematic of experimental procedures. Uniform Manifold Approximation and Projection (UMAP) visualization was calculated based on the expression of 14 markers on virus-specific CD4+ T cells pooled from 5 animals. (B) Expression of selected markers shown on the UMAP as in (A). (C) CD4+ T cells from each condition were highlighted as contour lines on the UMAP. Experiment with 4–5 mice per group, representative of two independent experiments. See also Figure 1—figure supplement 1 for spectral cytometry analysis including GP33:H2Db+ CD8+ T cells, and for surface marker panels used to characterize T cell populations.

Cytometric readouts showed that splenic virus-specific CD4+ T cells are highly heterogeneous and largely differ phenotypically from CD8+ T cells, both at early and late timepoints after acute and chronic infections (Figure 1B, Figure 1—figure supplement 1D). Interestingly, while CD4+ T cells responding to acute and chronic infections at early timepoints showed partial similarities, minimal overlap was observed at late timepoints. Additionally, CD4+ T cells differentiating in chronic settings appear to change less drastically over time compared to the sharp transition occurring between early and late timepoints after acute infection (Figure 1C). Overall, these analyses revealed a profound and fast-adapting phenotypic heterogeneity of CD4+ T cell populations in response to different infection settings.

Defining the landscape of CD4+ T cell states in acute and chronic viral infection

To characterize the transcriptional landscape of virus-specific CD4+ T cells and gain further insight into their heterogeneity and transcriptional adaptation, we conducted single-cell RNA sequencing (scRNA-seq) of virus-specific CD4+ T cells isolated from LCMV-infected animals at different timepoints. Virus-specific GP66:I-Ab+ cells were purified either 7 or 21 days after Clone 13 infection – conditions referred to as Early and Late Chronic, respectively. In addition, similar populations were isolated 7, 21 and >60 days after LCMV Armstrong infection – conditions referred to as Acute, Early and Late Memory, respectively. A total of 11 samples, including two or three biological replicates per condition, were processed with droplet-based scRNA-seq, resulting in over 35,000 high-quality virus-specific CD4+ T cell transcriptomes (Figure 2A and Supplementary file 1A-B). In addition, selected samples were used to simultaneously measure TCR usage and transcriptome at single-cell resolution. To generate a unified map of virus-specific CD4+ T cell states in acute and chronic infections, all datasets were integrated with STACAS, a computational tool allowing correction of batch effects while preserving relevant biological variability across datasets (Andreatta and Carmona, 2021a) (see Materials and methods) (Figure 2B). While different timepoints and types of infection (i.e. acute vs chronic) occupied different areas of the integrated space, biological replicates largely covered overlapping areas of the map (Figure 2—figure supplement 1A), suggesting a successful data integration. By clustering the high-dimensional space of the integrated map, we defined six major and three minor CD4+ T cell clusters that were annotated based on the expression of canonical markers previously described in this model.
Among the six major clusters, representing >93% of the cells in the map, we identified: (i) Th1 effector cells, expressing the highest levels of Cxcr6 and Ly6c2 (encoding Ly6C); (ii) Tfh effector cells, preferentially expressing Cxcr5 and Izumo1r (encoding FR4); (iii) Central Memory precursors (Tcmp) expressing the highest levels of Ccr7 (Figure 2C). The remaining three major clusters were identified as putative memory populations based on their higher expression of memory-associated genes Tcf7 (encoding TCF1) and Il7r, corresponding to: (iv) Th1 memory (co-expressing Tcf7, Il7r, Cxcr6 and Ly6c2), (v) Tfh memory (co-expressing Tcf7, Il7r and Izumo1r) and (vi) Central Memory cells (Tcm), with the highest levels of Tcf7 and Il7r but limited expression of Th1/Tfh marker genes (Figure 2C). Consistent with these annotations, Th1 memory, Tfh memory and Central Memory (Tcm) populations were predominantly derived from virus-specific CD4+ T cells isolated at late timepoints after acute infection (Figure 2—figure supplement 1A, see next section). In addition, these clusters of virus-specific CD4+ T cells and their annotations were independently validated using gene signatures that were previously identified on CD4+ T cells after acute viral infection (Ciucci et al., 2019; Figure 2—figure supplement 1B). Finally, three minor clusters corresponded to (i) Foxp3-expressing regulatory T cells (Treg), (ii) a Tfh-like state expressing high levels of type I interferon-stimulated genes (IFNI-stimulated) and (iii) a population characterized by high levels of Eomes (Eomes-HI) (Figure 2C). These minor states were largely associated with chronic infection and will be described in a later section. All subtypes were present in similar proportions across biological replicates, and samples clustered by condition rather than by batch, further confirming a successful data integration (Figure 2—figure supplement 1D-E).
Similar subtype proportions were confirmed by spectral cytometry (Figure 1—figure supplement 1E).

Figure 2 (with 1 supplement)
Transcriptional landscape of CD4+ T cell states during infections.

(A) Schematic experimental design to assess virus-specific T cell transcriptomes at different timepoints in acute (Armstrong) and chronic (Clone 13) LCMV infections (additional information in Supplementary file 1). (B) UMAP visualization of single-cell data before and after dataset integration, highlighting the samples and batches on the left and the 9 CD4+ T cell subtypes of the reference map on the right. (C) Expression levels of key marker genes in the 9 subtypes of the reference map. (D) Single-cell expression visualized in the UMAP space for key marker genes of Th1, Tfh and Tcmp/Tcm subtypes. (E) Average expression of differentially expressed genes in the six major subtypes of the reference map; selected genes are highlighted.

We next explored the expression of genes involved in the function of CD4+ T cell subsets across major clusters. As expected, Th1 cells, both effector and memory, expressed the highest levels of the Th1-defining transcription factor Prdm1 (encoding Blimp1) together with Gzmb – encoding the cytotoxic effector molecule Granzyme B. Similarly, the Tfh-specific transcription factor Bcl6 and effector molecule Il21 were almost exclusively expressed in Tfh cells (Figure 2D). Interestingly, Tcmp and Tcm populations, which expressed the highest levels of Ccr7, Il7r and S1pr1, were characterized by the promiscuous expression of both Th1-associated genes such as Nkg7, Ifngr1, or Runx3, and Tfh-specific markers like Slamf6, Tox, or Bcl2 (Figure 2C–D, Figure 2—figure supplement 1C).

Differential gene expression analysis between the major CD4+ T cell states revealed additional subtype-specific genes, and showed that Th1 states (both Effector and Memory subsets) share a common gene module (e.g. Id2, Runx3, Gzmb), distinct from that of Tfh cells, which is characterized by the expression of Tox, Maf, and Izumo1r (Figure 2E). Although Tcmp and Tcm states are largely defined by their lack of Th1 and Tfh gene signatures, consistent with a more quiescent, undifferentiated state, they are characterized by the shared expression of genes such as Ccr7, Klf2, and S1pr1. The full list of CD4+ T cell subtype-specific signatures is available in Supplementary file 2, and gene expression in this dataset can be explored online at https://spica.unil.ch/refs/viral-CD4-T.

Subtype-specific evolution of CD4+ T cell states during acute infection

To further describe CD4+ T cell states as they adapt during the course of an acute, self-resolving infection that generates protective memory T cells, we investigated subtype composition and subtype-specific transcriptional changes over time. First, from the effector phase (7 dpi) to the early memory phase (21 dpi), we observed a dramatic shift from Th1 effector, Tcmp and Tfh effector to Th1 memory, Tcm and Tfh memory (Figure 3A), accompanied by a reduction in the absolute number of T cells (Figure 2—figure supplement 1F-G). This was consistent with the fact that major functional and phenotypic changes occur during and after the contraction phase 12–20 days following infection (Marshall et al., 2011; Pepper and Jenkins, 2011).

Figure 3 (with 1 supplement)
Subset-specific adaptation of CD4+ T cell states in chronic and acute infections.

(A,C) Distribution of T cell states at different timepoints after acute and chronic infections. In the UMAP plots, contour lines indicate the density of T cells for each type of infection and timepoint; the barplots in the bottom row indicate the percentage of cells in each subtype in the indicated condition. (B,D) Normalized average expression during acute and chronic infections among Th1, Tfh and Tcm(p) subtypes. Selected genes from differentially expressed genes shown in Figure 3—figure supplement 1A,C. (E,F) Animals were infected with LCMV Armstrong (Acute) or Clone 13 (Chronic) and analyzed at the indicated timepoints (Early: 7 dpi; Late: 21 dpi). (E) Graph shows the percentage of Eomes+ cells among spleen GP66:I-Ab+ T cells. (F) Plot (left) shows the intracellular expression of Eomes and Thpok on spleen GP66:I-Ab+ T cells analyzed 21 dpi after LCMV Clone 13 infection. Graph (right) shows Thpok mean fluorescence intensity (gMFI) in the indicated population. (E,F) are from one experiment with >5 mice per group, representative of 2 independent experiments.

To further investigate the transcriptional changes underlying this transition, we sought to identify gene expression differences among matching cell states during the acute response and early memory phase. We also interrogated potential changes between the early and late memory timepoints, resulting in the identification of 196 genes differentially expressed in a time- and state-specific manner in response to acute infection (Figure 3—figure supplement 1A, Supplementary file 2). Our analyses revealed that the transition of Th1 and Tfh subtypes from effector phase to memory phase was accompanied by the dampening of effector function molecules such as Gzmb (Th1) and Il21 (Tfh), and by the acquisition of Il7r expression at the memory phase (Figure 3B). However, the downregulation of effector programs, especially for the Tfh state, was more pronounced in late memory phase compared to early memory, suggesting that memory CD4+ T cells undergo continued transcriptional remodeling after the contraction phase. In contrast, the central memory-type cells [Tcmp and Tcm clusters, referred to as Tcm(p)] readily downregulated most effector-associated genes at the early memory phase. Similarly, the Tcm(p) state more quickly upregulated genes associated with the function, survival or trafficking of memory cells like Ccr7, Il7r, Bcl2, or Klf6 compared to Th1 and Tfh states. This early divergence and stable expression of memory genes is compatible with the possibility that Tcmp represent a pool of circulatory cells with an increased fitness to develop into long-lived memory cells.

These analyses highlight subtype-specific transcriptional changes from effector to memory states. In particular, they suggest that, while Tcmp are poised to transition to memory, Th1 and Tfh states do so with different kinetics and using divergent transcriptional modules.

Subtype-specific adaptation of CD4+ T cell states to chronic infection

We next investigated the adaptation of CD4+ T cell transcriptional states to chronic infection. Compared to the acute setting, we did not observe a sharp transition to memory states at day 21, and most virus-specific CD4+ T cells matched effector subtypes (Figure 3C). Although the proportion of the Th1 subtype remained similar at early timepoints between acute and chronic conditions, there was a reduction in the Th1 effector subset at late chronic stages. In addition, we observed a larger pool of Tfh cells both at early and late chronic timepoints compared to acute settings (Figure 3C), consistent with previous studies highlighting a Tfh bias during Clone 13 infection (Brooks et al., 2005; Fahey et al., 2011). We also noted that the fraction of Tcmp cells was lower in chronic infection, both at early and late stages, compared to the CD4+ T cells in acute infection. Indeed, the proportion of Tcmp and Tcm cells, which can be identified by flow cytometry based on CCR7 expression, was greatly reduced among virus-specific CD4+ T cells in chronic settings compared to acute infection (Figure 3—figure supplement 1B).

Next, we aimed to assess the transcriptional changes affecting the differentiation and persistence of each subtype in response to chronicity. To this end, we measured the differences across subtypes at an early (7 dpi) and late phase (21 dpi) of the chronic response. We identified 214 genes differentially expressed between Acute vs. Early Chronic, Early vs. Late Chronic and Early memory vs. Late Chronic timepoints (Figure 3—figure supplement 1C, Supplementary file 2). Importantly, most changes observed at the late chronic phase were not present at the early chronic stage, suggesting that they are not merely attributed to changes in viral replication or host responses to viral variants. We observed that late chronicity was associated with the upregulation of a shared gene module, including Nr4a family members (Nr4a1, Nr4a2, Nr4a3) and Tox in all subtypes (Figure 3D, Figure 3—figure supplement 1C), indicative of the strong TCR engagement in response to persistent antigen (Seo et al., 2019). Similarly, the expression of inhibitory receptors such as Pdcd1 (encoding PD-1) and Lag3 was also detected across states in late chronic samples. Late timepoints were also characterized by the downregulation of effector modules, including genes associated with cytotoxic function, such as Gzmb and Ctsw (encoding Cathepsin W) in Th1 clusters. In contrast, the cytokine Il21 as well as the transcription factor Maf remained highly expressed in Tfh clusters, suggesting that, unlike in Th1, effector functions in Tfh cells are not dampened at late chronic phase. In fact, Il21 expression appears to increase at late timepoints in both Tfh and Th1 subtypes (Figure 3D).
In contrast to what was observed in response to acute infection, Tcm(p) minimally diverged from other states, as inhibitory receptors and transcription factors associated with T cell dysfunction, such as Ikzf2 (encoding Helios), Bhlhe40, or Gata3 (Crawford et al., 2014; Doering et al., 2012; Singer et al., 2016), were equally upregulated in all states in chronic settings. However, unlike other subtypes, Tcm(p) maintained expression of Il7r, Ccr7, and S1pr1 (Figure 3D, Figure 3—figure supplement 1C).

In addition to the six main CD4+ T cell states, we detected two distinct states that were almost exclusively present in response to chronic infection: the IFNI-stimulated state and the Eomes-HI state (Figure 3A,C). The Eomes-HI state was specifically observed at late stages of chronic infection, consistent with previous studies (Crawford et al., 2014; Lewis et al., 2016) and flow cytometry analyses (Figure 3E and Figure 3—figure supplement 1D). Because this cluster was characterized by the co-expression of Eomes, Lag3 and Xcl1 (Figure 3—figure supplement 1E), three functional targets repressed by the CD4+ T cell-defining transcription factor ThPOK (Ciucci et al., 2019; Taniuchi, 2018), we sought to determine whether ThPOK expression was altered in this subset. Indeed, we found that ThPOK protein expression was reduced specifically in EOMES+ virus-specific CD4+ T cells late during chronic infection (Figure 3F). In addition, this subset displays specific expression of Crtam and Gzmk (Figure 3—figure supplement 1E) and is compatible with the CD4+ T cell subtype with high cytotoxic potential (Cenerenti et al., 2022).

In summary, these analyses showed that persistent antigen exposure during chronic infections deeply alters CD4+ T cell differentiation by imprinting both common and subtype-specific transcriptional changes associated with chronicity.

Clonotype-fate relationships of virus-specific CD4+ T cells in acute and chronic infections

Because cell-intrinsic factors, notably the expression of TCR, have been shown to impact the differentiation of virus-specific T cells (Khatun et al., 2021; Künzli et al., 2021; Snook et al., 2018), we sought to determine whether CD4+ T cell states are influenced by the expression of particular sets of TCR chains. To describe the clonal relationship between CD4+ T cell states, we analyzed the TCR usage of all T cells for which a productive pair of TCRα/TCRβ sequences was detected (65% of all single cells). To limit potential confounding factors related to the survival or expansion of clonotypes over time, we restricted our analyses to early timepoints (i.e. 7 dpi) after acute or chronic infections. We observed a consistent pattern of clonal expansion across different animals, with large clonotypes (13–31 clones with >20 cells per animal) occupying roughly half of the clonal space in all samples, both for acute and early chronic settings (Figure 4A). Next, we interrogated potential repertoire overlaps between animals, considering the paired nucleotide or amino-acid sequences of the CDR3 regions. Strikingly, this analysis showed that less than 3% of the clonotypes (7 out of 795 CDR3 nucleotide pairs; 20 out of 779 CDR3 protein pairs) were observed in two or more animals, when considering clones with 3 cells or more (Figure 4B). Similar results were obtained when all clones, including singletons, were included (Figure 4—figure supplement 1A). This suggests that the TCR repertoire of virus-specific CD4+ T cells is largely private, that is, subject-specific, even when considering a single peptide specificity and animals with the same genetic background.
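
The repertoire-overlap count described above can be sketched as follows. This is a minimal illustration in Python, not the pipeline used in the study: the per-cell record layout (animal ID, CDR3α, CDR3β), the function name shared_clonotypes, the choice to apply the size threshold within each animal, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def shared_clonotypes(cells, min_cells=3):
    """Count clonotypes overall and those found in two or more animals.

    cells: iterable of (animal_id, cdr3_alpha, cdr3_beta) tuples, one per cell.
    A clonotype is a (cdr3_alpha, cdr3_beta) pair. ASSUMPTION: a clonotype is
    counted for an animal only if that animal holds at least `min_cells`
    cells carrying it.
    Returns (n_clonotypes, n_shared).
    """
    sizes = Counter(cells)  # number of cells per (animal, alpha, beta)
    animals_per_clone = defaultdict(set)
    for (animal, alpha, beta), n in sizes.items():
        if n >= min_cells:
            animals_per_clone[(alpha, beta)].add(animal)
    n_shared = sum(1 for a in animals_per_clone.values() if len(a) >= 2)
    return len(animals_per_clone), n_shared

# Toy data with hypothetical sequences (not taken from the study)
toy = (
    [("m1", "CAASA", "CASSL")] * 4
    + [("m2", "CAASA", "CASSL")] * 3   # same pair in a second animal: shared
    + [("m1", "CAVRD", "CASSQ")] * 5   # private to animal m1
    + [("m2", "CALGD", "CASRT")] * 2   # below the size threshold, ignored
)
total, shared = shared_clonotypes(toy)
print(f"{shared} of {total} clonotypes shared ({100 * shared / total:.1f}%)")
```

Running the same function once on nucleotide CDR3 pairs and once on amino-acid CDR3 pairs would give the two counts compared in the text; setting min_cells=1 corresponds to including singletons.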

Figure 4 (with 1 supplement)
Clonal structure of virus-specific CD4+ T cells and clonotype-fate relationship.

(A) Fraction of clonal space occupied by clonotypes with different levels of expansion. For each sample (three replicates for Acute, two replicates for Early Chronic), the plot indicates the number of single cells that belong to clonotypes in one of the five classes of expansion. (B) Number of clones with identical CDR3 nucleotides or amino acid sequence pairs between individual samples among clonotypes with ≥3 cells. (C) Clonotype bias analysis for acute and chronic infection samples. Plots show clonotype bias vs. clonal size for all clones with >10 cells. Clonotypes are colored by predominant T cell subtype (left) or by Z-score of the clonotype functional bias (right). Red line highlights the null distribution (background distribution) for each condition. (D, E) Distribution of cells over the reference map for the (D) four most expanded and (E) four most biased clonotypes in acute and chronic infections. The CDR3 alpha and beta sequences and the clonotype size are indicated for each clonotype.

Next, we explored the potential relationship between TCR usage and the emergence of specific T cell states at early infection timepoints. To this end, we defined a ‘clonotype bias’ metric to quantify how individual clones are skewed toward a specific subtype (see Materials and methods). At its extreme values, a clonotype bias of 1 indicates that a clonotype is composed uniquely of cells from the same subtype, and a clonotype bias of zero corresponds to a clonotype that matches exactly the background subtype distribution of the whole sample. Because small clonotypes are statistically more likely to show high clonotype bias compared to large clonotypes, we only considered expanded clonotypes with 10 or more cells, and corrected for clonotype size by generating expected background distributions by random permutation (Figure 4—figure supplement 1B, see Materials and methods). This analysis revealed that most clonotypes were largely unbiased, both in acute and chronic infections, giving rise to multiple CD4+ T cell states (Figure 4C–D), consistent with previous studies in acute infection (Khatun et al., 2021). However, we observed that a small number of clonotypes exhibited significant functional bias (Z-score >5), that is, they were preferentially enriched in one specific subtype (Figure 4C). In the case of acute infection, 9% of expanded clonotypes (15 out of 165) exhibited a functional bias toward one subtype. Similarly, 13% of expanded clonotypes in chronic infection (11 out of 85) showed a significant clonotype bias (Figure 4C). Interestingly, in acute infection 14 out of 15 biased clonotypes were skewed toward either the Tcmp or Th1 states (8 and 6, respectively), while in chronic infection all 11 biased clonotypes showed functional bias toward either Th1 or Tfh states (8 and 3, respectively) (Figure 4C–E).
We did not observe any robust CDR3 motif associated with biased clonotypes, and previously reported fate-biased CDR3 motifs (Khatun et al., 2021) were not predictive of clonotype lineage on our data (Figure 4—figure supplement 1C-D and Materials and methods).
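
To make the size correction concrete, the following Python sketch computes a bias score and a permutation-based Z-score for one clonotype. The exact formula is given in Materials and methods and is not reproduced here; the expression below is an assumed stand-in chosen only because it satisfies the two properties stated above (0 when the clonotype matches the background, 1 when all cells share one subtype). The function names and toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def clonotype_bias(clone_labels, background):
    """Bias of one clonotype toward its most over-represented subtype.

    ASSUMED formula (a stand-in, not taken from the source):
        max over subtypes s of (f_s - b_s) / (1 - b_s)
    f_s: fraction of the clonotype in subtype s; b_s: fraction of the whole
    sample in subtype s. Equals 0 for a clonotype matching the background
    and 1 for a clonotype made of a single subtype.
    """
    n = len(clone_labels)
    counts = Counter(clone_labels)
    return max(
        ((counts.get(s, 0) / n - b) / (1 - b)
         for s, b in background.items() if b < 1),
        default=0.0,
    )

def bias_z_score(clone_labels, sample_labels, n_perm=500, seed=0):
    """Compare observed bias with a null built from size-matched random draws."""
    rng = random.Random(seed)
    total = len(sample_labels)
    background = {s: c / total for s, c in Counter(sample_labels).items()}
    observed = clonotype_bias(clone_labels, background)
    size = len(clone_labels)
    # Null distribution: bias of random cell sets of the same size
    null = [
        clonotype_bias(rng.sample(sample_labels, size), background)
        for _ in range(n_perm)
    ]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else 0.0  # degenerate null
    return observed, z

# Toy sample (hypothetical subtype labels, one per cell)
sample = ["Th1"] * 400 + ["Tfh"] * 400 + ["Tcmp"] * 200
pure_clone = ["Th1"] * 12
mixed_clone = ["Th1"] * 5 + ["Tfh"] * 5 + ["Tcmp"] * 2

bias_pure, z_pure = bias_z_score(pure_clone, sample)
bias_mixed, z_mixed = bias_z_score(mixed_clone, sample)
print(f"pure clone:  bias={bias_pure:.2f}, Z={z_pure:.1f}")
print(f"mixed clone: bias={bias_mixed:.2f}, Z={z_mixed:.1f}")
```

Drawing the null from random cell sets of the same size is what corrects for clonotype size: a 10-cell set drawn at random already shows appreciable bias by chance, so the same observed bias yields a lower Z-score for a small clonotype than for a large one.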

These combined analyses of the TCR repertoire and transcriptional landscape reveal that the vast majority of the clonotypes can differentiate into multiple states with minimal functional skewing. However, a minority of these clonotypes are significantly biased toward a particular functional state, and this bias appears to be influenced by the type of infection.

Reference map projection to dissect the effect of genetic and therapeutic perturbations

A cell atlas is particularly useful when it serves as a ‘reference’ to compare and interpret new data. We have recently proposed a computational method, ProjecTILs, that enables the analysis of single-cell datasets by projection into a reference atlas (Andreatta et al., 2021c). Using this approach, we sought to apply our CD4+ T cell reference map to the interpretation of CD4+ T cell states in external datasets (Figure 5A).

Figure 5. Interpretation of CD4+ T cell states in external datasets by projection into the reference map.

(A) Independent scRNA-seq datasets can be projected into the CD4+ T cell reference map using the ProjecTILs algorithm and interpreted in the context of the space and cell states of the reference. (B) UMAP embeddings for the external data by Ciucci et al., 2019 at 7 and 30 days after acute LCMV infection projected into the reference map. Contour lines indicate the density of projected cells. (C) Radar plots showing average expression profiles of a panel of CD4+ T cell marker genes, for the projected cells from Ciucci et al., 2019 (blue and red) compared to the reference map profiles (black) grouped by predicted subtype. (D) UMAP embeddings for projected scRNA-seq data of Bcl6- and Prdm1-deficient virus-specific CD4+ T cells isolated 7 days after acute infection with LCMV Armstrong (data from Ciucci et al., 2022). (E) UMAP embeddings for projected scRNA-seq data of virus-specific CD4+ T cells isolated 33 days after chronic infection with LCMV Clone13, in animals treated with anti-PDL1 or isotype control (data from Snell et al., 2021). Barplots in the bottom row indicate the percentage of cells projected in each reference subtype.

To verify the accuracy of data projection into our reference map, we analyzed an independent scRNA-seq dataset of LCMV-specific CD4+ T cells isolated at 7 and 30 days post-infection with LCMV Armstrong (Ciucci et al., 2019). Validating the accuracy of both the map and the projection algorithm, cells from day 7 were projected into the Tcmp and effector states, while cells from day 30 were largely projected into the memory states (Figure 5B). Importantly, the expression profile of key marker genes for the projected samples closely matched the expression profile of the reference map in all major cell subtypes (Figure 5C).

For additional validation, we re-analyzed LCMV-specific CD4+ T cells isolated 10 days post-infection with LCMV Armstrong (Khatun et al., 2021). Dataset projection into our CD4+ T cell reference map revealed that the majority of these virus-specific T cells were found in the Th1 Effector, Tfh Effector, or Tcmp states (Figure 5—figure supplement 1A), similar to the subtype compositions we observed at day 7 in acute infection (Figure 3A). This is consistent with the notion that transition to memory phenotypes occurs later, at day 12–20 post-infection (Marshall et al., 2011). Importantly, very similar subtype distributions were observed across different mice, highlighting the robustness of the projection algorithm across multiple biological replicates (Figure 5—figure supplement 1B). Based on the same dataset, we verified that, while cycling cells tend to cluster together in unsupervised analyses irrespective of their subtype, they are correctly classified by reference projection (Figure 5—figure supplement 1C-G and Materials and methods). We also confirmed that data projection is robust to sequencing depth, with consistent subtype annotation at as little as one third of typical sequencing depths (Figure 5—figure supplement 2 and Materials and methods).

We also tested the robustness of our projection approach in detecting subtle variations across biologically similar samples. To this end, we took advantage of transcriptomic datasets of LCMV-specific CD4+ T cells isolated at memory timepoints (day 35) after in vivo administration of an inhibitor blocking NAD-induced cell death (NICD) (Künzli et al., 2020). As expected for a late timepoint, in both the control and treated conditions most cells were projected into the memory clusters (Tcm, Th1 memory, and Tfh memory) (Figure 5—figure supplement 1H). However, the NICD-protector-treated sample showed an approximately twofold increase of Tfh effector and Tfh memory cells compared to control (Figure 5—figure supplement 1I). This is consistent with Tfh cells being more susceptible to NICD than other CD4+ T cell subtypes, and with the finding that in vivo NICD blockade can enhance the persistence of Tfh populations after infection (Künzli et al., 2020).

Having validated the accuracy and robustness of reference map projections, we sought to apply the reference map to interpret the effect of genetic and pharmacological perturbations. Transcription factors Bcl6 and Blimp1 (encoded by the Prdm1 gene) have antagonistic roles in driving CD4+ T cell differentiation into Tfh or Th1 lineages, respectively (Ciucci et al., 2022). Projection of scRNA-seq data from genetically altered virus-specific CD4+ T cells isolated 7 days after LCMV acute infection (Ciucci et al., 2022) showed that Bcl6-deficient cells almost exclusively acquired a Th1 state (Th1 Effector or Th1 Memory), while Blimp1-deficient cells were dominated by Tfh and Tcmp states (Figure 5D). These results are in line with the known role of Bcl6 in driving Tfh and memory differentiation and with the role of Blimp1 in promoting Th1 functions (Crotty, 2011), and highlight the utility of our tool to describe the effect of genetic perturbations.

Next, we aimed to use our reference map to interpret the effect of immunotherapies. To this end, we projected scRNA-seq data of virus-specific CD4+ T cells isolated from mice chronically infected with LCMV, after treatment with an anti-PD-L1 antibody (Snell et al., 2021). While control samples showed a similar subtype distribution to our late chronic samples (Figure 5E, Figure 3C), anti-PD-L1 treatment increased the relative proportion of Th1 effectors (Figure 5E). Expression profiles for all major subtypes in this dataset largely matched those of our reference (Figure 5—figure supplement 3B), including the expression of exhaustion markers, which was similar to chronic infection samples of the reference map (Figure 5—figure supplement 3C). Notably, Th1 effector cells after anti-PD-L1 treatment upregulated a Th1-associated gene module that includes Klrd1, Plac8, Ctla2a, and Ly6c2 (Figure 5—figure supplement 3D), confirming the findings by Snell et al., 2021.

Overall, these analyses demonstrate that our map and projection method can successfully describe biologically relevant alterations in the subtype distribution and transcriptional programs following diverse perturbations.

Diversity of virus-specific CD4+ T cells across tissues

To investigate whether our reference map can be useful to describe the subtype composition and transcriptional landscape of CD4+ T cells from other biological models and tissues, we projected influenza-specific CD4+ T cells isolated from lungs and draining lymph nodes (LN) at different timepoints after infection (Swarnalekha et al., 2021). Data projection revealed that, while lymphoid tissue was largely dominated by Tfh subtypes, lung samples were enriched in Th1 cells, with a progressive accumulation of Tfh cells at later timepoints (Figure 6A–C). These results recapitulated recent observations of the delayed development of tissue-resident CD4+ T cells with Tfh characteristics in the lungs after influenza infection (Son et al., 2021; Swarnalekha et al., 2021). In addition to changes in cell subtype composition, we evaluated transcriptional differences between tissues for each subtype. Consistent with the findings by Swarnalekha et al., we observed that, compared to LNs, all the most abundant T cell subtypes in the lung display a tissue-specific gene module, which includes Crem, Tnfrsf4, H3f3b, Fth1, Ifngr1, Vps37b, Sub1, and Arpc3 (Figure 6D).

Figure 6. Diversity of virus-specific CD4+ T cells across tissues.

(A) Reference projection of influenza-specific CD4+ T cells isolated from draining lymph nodes and lungs at different timepoints after infection (Swarnalekha et al., 2021). Black contour lines represent density of cells over the reference UMAP embeddings. (B) Summary of subtype composition (percentage of total cells) for each of the six samples in this study. (C) Fold-change of cell subtype proportions between lung and lymph node at day 30 p.i. (D) Differentially expressed genes for select subtypes between lung and lymph node at day 30 p.i. The genes consistently found in all three comparisons (hereafter ‘lung residency signature’) are: Crem, Tnfrsf4, H3f3b, Fth1, Ifngr1, Vps37b, Sub1, and Arpc3 (p-values from Wilcoxon test). (E) Reference projection of LCMV-specific CD4+ T cells from liver and spleen at day 37 p.i. (Künzli et al., 2020). (F) UCell scores on spleen and liver samples for the ‘lung residency signature’ learned from CD4+ T cells in influenza infection (p-values from Wilcoxon test).

We next projected LCMV-specific CD4+ T cells isolated from liver and spleen at day 37 after acute infection from the study by Künzli et al., 2020. As previously reported, liver-resident CD4+ T cells were strongly enriched in Th1 subtypes compared to spleen (Künzli et al., 2020; Figure 6E). Interestingly, the tissue-specific gene module derived from lung in the context of influenza infection was also significantly upregulated by all subtypes of LCMV-specific CD4+ T cells in the liver, compared to those from spleen (Figure 6F). Altogether, these results suggest that subtype-defining transcriptional programs are preserved across tissues, and that these can be exploited for classification by reference map projection. Moreover, they indicate that different subtypes can use the same gene programs to adapt to different tissues, for instance to acquire residency capacity in non-lymphoid tissue.

Reference map projection to explore CD4+ T cell diversity beyond viral infections

Lastly, we investigated how a transcriptional map of CD4+ T cells that developed in response to viral infections could help interpret the heterogeneity of CD4+ T cells differentiating in a non-infectious context. To this end, we isolated tumor-specific CD4+ T cells from the tumor and draining lymph nodes (dLN) of animals inoculated with a colon carcinoma expressing the LCMV-derived GP protein and profiled them by scRNA-seq (Magen et al., 2019; Supplementary file 1). Projection of these data into the reference map showed that most tumor-specific CD4+ T cells in the dLN corresponded to Tfh states (Figure 7A–B), similar to virus-specific cells in the LN of infected animals (Figure 6A–B). In contrast to viral infection, in tumor-draining LN we observed a sizable fraction (~10%) of Treg among antigen-specific CD4+ T cells (Figure 7A–B). Moreover, tumor-infiltrating lymphocytes (TILs) were largely projected into the Th1 effector (40–50%) and Treg (~30%) reference map states (Figure 7A–B).

Figure 7. Projection of tumor-specific CD4+ T cells into the reference viral map.

(A) Reference projection of tumor-specific (GP66:I-Ab+) CD4+ T cells isolated from the tumors (TIL) or draining lymph nodes (LN) of animals inoculated with MC38-GP tumor. Black points and contour lines represent projected cells and their density over the reference UMAP embeddings. (B) Subtype composition as percentage of total cells for each sample. (C) Radar plots showing average expression profiles of a panel of CD4+ T cell marker genes, for the projected tumor-specific CD4+ T cells from the indicated organs compared to the reference map profiles (black). (D) Re-calculated UMAP plot generated after merging virus-specific T cell data (reference map) with tumor-specific T cell data (projected data). Non-Treg, non-Tfh tumor-infiltrating CD4+ T lymphocytes (TILs) emerge as a distinct cluster (‘Tumoral_Th’). (E) Differentially expressed genes between virus-specific T helpers (Th1 Effector) and tumor-infiltrating T helpers (‘Tumoral_Th’); p-values from Wilcoxon test. (F) Expression of genes associated with Th1 or Th2 functions in the indicated cell subtypes (log-normalized UMI counts).

Next, we examined each subtype separately. While the gene profiles of tumor-specific Tfh cells were consistent with those of the viral reference, tumor-specific T cells projected into the Th1 effector state seemed to diverge (Figure 7C). Indeed, re-clustering and re-calculation of UMAP embeddings (now including both the virus-specific and the tumor-specific T cells) revealed that the T helper cells from the tumor formed a separate cluster (Figure 7D). Compared to virus-specific Th1 cells, this tumor-specific CD4+ T cell state differentially expressed genes associated with Th2 cells, including Igfbp7, Ccl1, Ccr8, and Il13, and downregulated Th1-associated genes, including Ccl5 and Ly6c2 (Figure 7E–F). This suggests that CD4+ T cells acquire distinct effector programs in cancer and infection.

While reference maps aim to be as comprehensive as possible, new datasets may contain novel states that are not represented in the reference, especially when the map is applied to different disease models. In these cases, the user is encouraged to make use of all the analytic tools we provide with the ProjecTILs package (see Materials and methods) to evaluate the degree of correspondence between reference and query, as we illustrated in the case of tumor-specific T cells (Figure 7C–E). These analyses demonstrate the feasibility of using a reference map to describe cell diversity beyond the states already present in the map, and as a strategy to expand references to incorporate novel, unrepresented cell states.

Discussion

CD4+ T cells orchestrate immune responses to pathogens and critically support protection conferred by vaccination. However, the phenotypic and functional plasticity of CD4+ T cells has hindered a robust, unbiased delineation of pathogen-specific T cell subtypes. Although the precise characterization of T cell transcriptional states is fundamental to understanding the dynamics of immune responses, the subtype-specific changes occurring over time in response to acute and chronic infections remain poorly understood. In recent years, scRNA-seq has enabled unbiased interrogation of T cell transcriptional diversity at the single-cell level. However, the definition of T cell subtypes and states remains subjective and inconsistent across studies. We have previously demonstrated that reference projection is a scalable computational approach for robust and consistent single-cell data analysis, provided that a reference cell map is available (Andreatta et al., 2021c).

In this work, we aimed to provide a reference map of the transcriptional and clonal landscape of virus-specific CD4+ T cells in acute and chronic infections. To this end, we generated a scRNA-seq dataset of >35,000 high-quality virus-specific polyclonal CD4+ T cells from infected mice at multiple timepoints throughout LCMV chronic or acute infections. One key advantage of this model is that it allowed us to explore T cell transcriptional diversity and clonality in both acute and chronic settings using a single antigen specificity (GP66:I-Ab+) and to include biological replicates for every condition and timepoint. Combined with the projection algorithm ProjecTILs, our new reference map enabled robust and consistent interpretation of external CD4+ T cell single-cell transcriptomics data from multiple tissues, conditions, and biological models, providing a powerful new resource for the community.

We also presented new insights into the transcriptional adaptation of virus-specific CD4+ T cell populations over time and across conditions. Our analyses during acute infection highlighted that, although the major gene expression changes occur at the end of the initial proliferative burst, early memory CD4+ T cells that survive the contraction phase undergo continued transcriptional remodeling at later timepoints, similar to the late changes at play in CD8+ T cell and NK cell memory development (Chang et al., 2014; Lau et al., 2018; Milner et al., 2020). However, CD4+ T cell subsets undergo memory transition in a divergent manner, where each subtype acquires memory features with different kinetics and using non-overlapping transcriptional modules. Similar to the acquisition of the memory program in CD8+ T cells, Th1 memory differentiation is characterized by a dampening of effector functions accompanied by the upregulation of molecules associated with long-term survival. In contrast, the transition of effector Tfh cells to memory states appears to be delayed, as molecules associated with Tfh function such as IL-21 or ICOS remain expressed in early memory Tfh cells, and their expression only decreases at later timepoints. This delayed transition into resting memory could be associated with differential and prolonged antigen exposure of Tfh within the germinal centers compared to Th1 cells (Künzli et al., 2020). Interestingly, the transition to the Tcm state diverges from both Th1 and Tfh. In fact, most memory-associated features like the expression of Ccr7, Il7r, or Bcl2 appear quickly at the early memory phase or are already present in Tcmp cells at the acute phase of the response. This observation is in line with the concept that Tcmp cells already express a large fraction of memory-associated genes allowing for their survival and homing to lymphoid organs.
Thus, it is possible that Tcm cells derive mainly from Tcmp cells through the acquisition of a transcriptional state poised for memory differentiation, similar to memory precursors in CD8+ T cell populations (Joshi et al., 2007; Kaech et al., 2003).

During chronic infection, similar state-specific changes occur in CD4+ T cell subtypes. In addition to a shared transcriptional module upregulated in all subsets, Th1 and Tfh states differ in their adaptation to chronic antigen stimulation. Imprinting of persistent antigen exposure on Th1 cells results in a reduction of effector function characterized by the repression of effector molecules like granzymes, reminiscent of CD8+ T cell function dampening in chronic infection and cancer (Crawford et al., 2014; Singer et al., 2016). In contrast, Tfh effector functions remain unaffected at the chronic phase of the response. While this could be related to the strong Tfh bias observed in chronic infections (Fahey et al., 2011), we also observed in Th1 subsets features typically associated with Tfh cells, including the expression of IL-21. Because IL-21 is critical to limit T cell dysfunction during chronic infections (Elsaesser et al., 2009; Fröhlich et al., 2009; Yi et al., 2009), it is possible that compensatory mechanisms enforce its expression in non-Tfh subsets. Alternatively, the ‘boundaries’ between CD4+ T cell states may be less easily delineated during chronic infection. In line with this idea, our analyses reveal that during chronic infection, the accumulation of Eomes+ virus-specific CD4+ T cells is accompanied by the downregulation of the CD4+ T cell-defining factor Thpok. Similar to ‘redirected’ CD4+ T cells in both human and mouse (Mucida et al., 2013; Serroukh et al., 2018), this CD4+ T cell subset is characterized by the upregulation of targets actively repressed by Thpok, including the transcription factor Eomes and genes associated with cytotoxic activity (Ciucci et al., 2019; Vacchio et al., 2019).
While the downregulation of Thpok in CD4+ T cells can be influenced by the cytokine milieu (Cervantes-Barragan et al., 2017; Reis et al., 2014), it is important to note that the Eomes+ Thpoklow CD4+ T cell population appears to be limited to chronic settings, including in responses to the gut microbiota (Cervantes-Barragan et al., 2017; Mucida et al., 2013).

Although we have focused on LCMV infection, this study represents a first step toward building a more comprehensive reference map of the transcriptional landscape supporting the functional heterogeneity of CD4+ T cells across tissues and beyond viral infections. In fact, we have shown that unrepresented transcriptional states (e.g. tumoral Th cells) and programs (e.g. adaptation to non-lymphoid tissue) can be interpreted in the context of the reference map, suggesting a strategy where reference maps can evolve to incorporate novel cell states. We provide user-friendly computational resources for investigators to explore the new CD4+ T cell map and to analyze external datasets in the context of this reference. The virus-specific CD4+ T cell reference map developed in this study can be explored within the SPICA portal at https://spica.unil.ch/refs/viral-CD4-T, where users can compare the expression of genes of interest in individual cell subtypes and across the reference space. SPICA also hosts interactive analyses for the datasets described above and for several others (https://spica.unil.ch/projects). Finally, researchers can project their own scRNA-seq data through the SPICA web interface, or by using our R package ProjecTILs available at https://github.com/carmonalab/ProjecTILs (Carmona and Andreatta, 2022).

Materials and methods

Mice, virus, and infections

C57BL/6Ncr mice were infected by intra-peritoneal injection of 2 × 10^5 pfu of LCMV Armstrong or intravenously with 2 × 10^6 pfu of LCMV Clone 13. Viral stocks were prepared and titrated as previously described (Ciucci et al., 2019; Dangi et al., 2020).

Antibodies

Antibodies for the following specificities were purchased from Becton-Dickinson Pharmingen, BioLegend, or ThermoFisher: CD4 (GK1.5), CD8α (53-6.7), CD5 (53-7.3), B220 (RA3-6B2), CD44 (IM7), IL-7Ra (A7R34), CCR7 (4B12), CXCR5 (SPRCL5), CXCR6 (SA051D1), PSGL1 (2PH1), Ly6C (HK1.4), CD27 (LG/3A10), FR4 (12A5), Thpok (T43-94), Eomes (Dan11mag), CD69 (H1.2F3), LAG3 (C9B7W), Tim3 (RMT3-23), KLRG1 (2F1), PD1 (29F.1A12), CX3CR1 (SA011F11). MHC tetramers loaded with the LCMV GP66 or GP33 peptides were obtained from the NIH Tetramer Core Facility.

Spleen cell preparation and staining

Spleen cells were prepared and stained as previously described (Ciucci et al., 2019). Surface staining with GP66:I-Ab tetramer and for CCR7 or CXCR5 was performed at 37 °C for 1 hr prior to staining with antibodies for other cell surface markers. Intracellular staining was performed as previously described using the Transcription Factor Staining Buffer (ThermoFisher) (Chopp et al., 2020). Data were acquired on an Aurora spectral flow cytometer (Cytek) and analyzed with FlowJo V10.8 software (TreeStar). Dead cells and doublets were excluded by DAPI or LiveDead staining (Invitrogen) and forward scatter height by width gating. Purification of lymphocytes by cell sorting was performed on a FACS Fusion and FACS Aria (BD Biosciences).

Tumor model and cell preparation

The MC38 colon carcinoma cell line expressing the LCMV-derived GP protein was kindly provided by R. Bosselut at the National Cancer Institute, NIH, and cultured and inoculated as previously described (Magen et al., 2019). The cell line tested negative for Mycoplasma by PCR. The cell line is not part of the commonly misidentified cell lines of the International Cell Line Authentication Committee. Two weeks after subcutaneous injection, tumor and draining lymph nodes were harvested and processed as previously described (Magen et al., 2019).

Single-cell RNA sequencing

GP66:I-Ab+ T cells were sorted from LCMV-infected or tumor-bearing animals and loaded onto the Chromium platform (10x Genomics) to generate cDNAs carrying cell- and transcript-specific barcodes that were used to construct sequencing libraries using the Chromium Single Cell 5′ or 3′ Library & Gel Bead Kit according to the manufacturer's instructions. For pooled captures, two cell populations were sorted and barcoded separately with TotalSeq antibodies (BioLegend) before mixing and cell capture (Supplementary file 1). Libraries were sequenced on multiple runs of Illumina NextSeq or NovaSeq using paired-end reads to reach a sequencing saturation of 60–90%, resulting in at least 2–9 × 10^4 reads/cell. Single-cell sequencing files were processed, and count matrices extracted, using the Cell Ranger Single Cell Software Suite (10x Genomics).

scRNA-seq and scTCR-seq data processing and quality control

Single-cell transcriptomes and single-cell TCR sequences were mapped and combined using the combineTCR function from scRepertoire (Borcherding et al., 2020). We performed quality control on the single-cell data using the following criteria: number of detected genes >700; number of UMIs >1500 and <15,000; percentage of ribosomal genes <50%; and percentage of mitochondrial genes <10%. For all these parameters, we additionally removed all extreme outlier cells outside the 1st and 99th percentiles in each sample. In order to filter out potential contaminants and experimental artifacts, we applied the UCell package (Andreatta and Carmona, 2021b) to evaluate a panel of signatures for several common immune and non-immune cell types. This resulted in high-quality transcriptomes for 35,488 single cells from 11 samples, covering acute and chronic infections at three different timepoints.
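
The threshold and percentile filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the dictionary keys (`n_genes`, `n_umis`, `pct_ribo`, `pct_mito`), the linear-interpolation quantile, and the choice to compute percentiles after applying the fixed thresholds are all assumptions made for illustration.

```python
# Illustrative sketch only; field names and the order of operations are assumptions.
METRICS = ("n_genes", "n_umis", "pct_ribo", "pct_mito")


def passes_thresholds(cell):
    """Fixed QC thresholds quoted in the text."""
    return (
        cell["n_genes"] > 700
        and 1500 < cell["n_umis"] < 15000
        and cell["pct_ribo"] < 50
        and cell["pct_mito"] < 10
    )


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def filter_sample(cells):
    """Apply fixed thresholds, then drop cells outside the 1st-99th percentile
    of any QC metric, computed within this one sample."""
    kept = [c for c in cells if passes_thresholds(c)]
    if not kept:
        return kept
    bounds = {
        m: (quantile([c[m] for c in kept], 0.01), quantile([c[m] for c in kept], 0.99))
        for m in METRICS
    }
    return [
        c for c in kept
        if all(bounds[m][0] <= c[m] <= bounds[m][1] for m in METRICS)
    ]
```

Calling `filter_sample` once per sample keeps the percentile cut-offs sample-specific, as the text describes.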

scRNA-seq data integration

For the construction of the CD4+ T cell reference map, datasets were downsampled to balance the contribution from different types of infection and timepoints. To this end, a maximum of 5000 cells were randomly selected for each of the 5 subsets: acute day 7, acute day 21, acute day 60, chronic day 7, and chronic day 21. To mitigate batch effects between samples, we integrated the 11 samples using STACAS (Andreatta and Carmona, 2021a) with the following parameters: number of variable genes = 800, dist.thr = 0.6, dims = 20. For the selection of variable genes for data integration, we did not consider mitochondrial genes, ribosomal genes, heat-shock proteins, interferon-stimulated genes (ISGs), cell cycle genes, or T cell receptor genes (gene sets available in Supplementary file 3). On the integrated data, unsupervised clusters were calculated using the FindNeighbors and FindClusters functions from Seurat (Hao et al., 2021) with parameters: k.parameter = 10, resolution = 0.4. Finally, unsupervised clusters were manually annotated guided by differential expression analysis between clusters, merging clusters where appropriate, to obtain nine ‘functional clusters’ that summarize the diversity of CD4+ T cells in acute and chronic infections.
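
The balancing step (at most 5000 randomly selected cells per infection/timepoint subset) could be sketched as follows. This is an illustration in Python rather than the code used for the study; the function name `downsample_by_group`, the `group_of` callback, and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def downsample_by_group(cells, group_of, max_per_group=5000, seed=0):
    """Randomly keep at most `max_per_group` cells from each group.

    `group_of` maps a cell to its subset label (for example 'acute_d7').
    Groups that are already below the cap are kept in full.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for cell in cells:
        groups[group_of(cell)].append(cell)

    kept = []
    for name in sorted(groups):  # sorted for a reproducible output order
        members = groups[name]
        if len(members) > max_per_group:
            members = rng.sample(members, max_per_group)  # without replacement
        kept.extend(members)
    return kept
```

Capping rather than subsampling by a fixed fraction means small subsets are not reduced further, which is what balances their contribution against the larger ones.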

Dataset projection and reference-based analysis of scRNA-seq data

In order to avoid large imbalances between subtypes, and to limit its disk size, the CD4+ T cell reference map is a downsampled version of all available data. In order to obtain low-dimensional embeddings and subtype annotations for all cells generated in this study, we projected them onto the map using ProjecTILs with default parameters (Andreatta et al., 2021c). The same method was applied to project and interpret scRNA-seq data from several additional studies not included in the reference map (Ciucci et al., 2022; Ciucci et al., 2019; Khatun et al., 2021; Künzli et al., 2020; Snell et al., 2021; Swarnalekha et al., 2021). For all re-analyses of public data, we retrieved the gene expression matrices from Gene Expression Omnibus (GEO) under the following accession numbers: LCMV-specific CD4+ T cells at 7 and 30 days p.i. (GSE121002); LCMV-specific CD4+ T cells at day 10 p.i. (GSE158896); LCMV-specific CD4+ T cells at day 35 p.i. (GSE139198); CD4+ T cells from lung and lymph nodes of influenza-infected mice (multiple timepoints), and liver cells at day 37 post-LCMV infection (GSE146626); Bcl6-deficient, Blimp1-deficient, and WT CD4+ T cells 7 days post-acute LCMV infection (GSE149912); anti-PD-L1-treated and isotype-treated virus-specific CD4+ T cells at day 33 p.i. (GSE163345). Prior to reference projection, CD4+ T cells were purified from each dataset using scGate (Andreatta et al., 2022) and default T cell and CD4+ T cell gating models.

Detection of novel/unrepresented states

Several utilities are available in the ProjecTILs package to evaluate query projection accuracy and to detect the presence of novel cell states. First, on a qualitative level, one can apply the plot.states.radar() function to compare the expression of panels of key genes between query and reference. Second, the user can recalculate the UMAP embeddings of the combined reference and query space (function recalculate.embeddings()) to assess whether part of the projected data form a novel, separate cluster (e.g. Figure 7D). Third, per-subtype differential expression analysis (function find.discriminant.genes()) can reveal which and how many genes are differentially expressed between reference and query in a given subtype (e.g. Figure 7E). Fourth, the compute_silhouette() function calculates an average silhouette coefficient per subtype, which aims at measuring the average distance of query cells from their own assigned cluster compared to all other clusters of the reference. A case study highlighting the application of these metrics can be found at: https://carmonalab.github.io/ProjecTILs_CaseStudies/novelstate.html.
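
To make the fourth metric concrete, the following Python sketch computes a silhouette-like score for query cells relative to reference clusters. It is not the ProjecTILs implementation: it assumes Euclidean distances in a low-dimensional embedding, uses the mean distance to the reference cells of each subtype, and the function name `query_silhouette` is hypothetical.

```python
import math
from collections import defaultdict


def query_silhouette(query_points, query_labels, ref_points, ref_labels):
    """Average silhouette-like coefficient per assigned subtype.

    For each query cell: a = mean distance to reference cells of its assigned
    subtype, b = smallest mean distance to reference cells of any other subtype,
    score = (b - a) / max(a, b). Scores near 1 indicate a good fit; negative
    scores indicate the cell sits closer to another reference cluster.
    Requires at least two subtypes in the reference.
    """
    ref_by_label = defaultdict(list)
    for point, label in zip(ref_points, ref_labels):
        ref_by_label[label].append(point)

    scores = defaultdict(list)
    for point, label in zip(query_points, query_labels):
        mean_dist = {
            lab: sum(math.dist(point, r) for r in pts) / len(pts)
            for lab, pts in ref_by_label.items()
        }
        a = mean_dist[label]
        b = min(d for lab, d in mean_dist.items() if lab != label)
        denom = max(a, b)
        scores[label].append((b - a) / denom if denom > 0 else 0.0)

    return {lab: sum(vals) / len(vals) for lab, vals in scores.items()}
```

A subtype with a low or negative average score is a candidate for closer inspection, for example with the radar plots and differential expression analysis listed above.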

Gene signatures of adaptation to acute and chronic infections

CD4+ T cell state signatures were calculated by comparing each of the 3 main effector and 3 main memory clusters to the two other clusters in the same state (i.e. effector or memory) using the FindMarkers function implemented in Seurat. To summarize subtype-specific transcriptional changes at different stages of infection, we first identified differentially expressed genes between timepoints and subtypes. For this analysis, Th1 effector and Th1 memory cells were grouped together to identify Th1-type cells. Similarly, we combined the Tfh effector and Tfh memory clusters (Tfh type), as well as Tcmp and Tcm clusters (Tcm(p) type). For significantly differentially expressed genes, we calculated average expression profiles for the Th1, Tfh, and Tcm(p) types at individual timepoints using the find.discriminant.genes function of ProjecTILs (Andreatta et al., 2021c). The UCell algorithm (Andreatta and Carmona, 2021b) was applied on the CD4+ T cell map subtypes to evaluate gene signatures for subtypes identified in a previous study (Ciucci et al., 2019). To exclude potential confounding factors, ribosomal-associated, sex-specific and TCR transcripts (Magen et al., 2019) were removed from signatures related to acute and chronic adaptation (Supplementary file 3).

T cell clonal analysis

The CDR3 amino acid sequences for productive alpha-beta VDJ rearrangements obtained by scTCR-seq were used as unique ‘barcodes’ to identify individual T cell clones. The expansion level of a clone was calculated as the absolute number of cells with identical TCR sequence in a given sample, either in terms of CDR3 sequence or full nucleotide sequence. Expanded clones that were unique to a sample were denoted as private clones; expanded clones found in at least three samples were denoted as public clones. Merging and visualization of scTCR-seq data were performed using the scRepertoire package (Borcherding et al., 2020).
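
The definitions of clone expansion, private clones, and public clones can be sketched in Python as below. Several details are assumptions for illustration only: the minimum size for an 'expanded' clone (`min_cells`, default 2), treating 'found in a sample' as presence of at least one cell, and the 'unclassified' label for clones seen in exactly two samples, a case the text does not name.

```python
from collections import Counter, defaultdict


def clone_sizes(cells):
    """cells: iterable of (sample, clone_barcode) pairs, one per cell.
    Returns {clone_barcode: Counter({sample: number_of_cells})}."""
    sizes = defaultdict(Counter)
    for sample, barcode in cells:
        sizes[barcode][sample] += 1
    return sizes


def classify_clones(cells, min_cells=2):
    """Label expanded clones as private (one sample) or public (>= 3 samples)."""
    labels = {}
    for barcode, per_sample in clone_sizes(cells).items():
        if max(per_sample.values()) < min_cells:
            continue  # not expanded in any sample (threshold is an assumption)
        n_samples = len(per_sample)
        if n_samples == 1:
            labels[barcode] = "private"
        elif n_samples >= 3:
            labels[barcode] = "public"
        else:
            labels[barcode] = "unclassified"  # two samples: not defined in the text
    return labels
```

Here the clone barcode would be the paired CDR3 amino acid sequence (or the full nucleotide sequence), as described above.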

To measure the relationship between gene expression and TCR, we defined a ‘clonotype bias’ metric to quantify whether a given clonotype was preferentially composed of one of the T cell subtypes: $c = \max_i \left[ (f_i - q_i) / (1 - q_i) \right]$, where $f_i$ is the observed frequency for subtype $i$ in the clonotype, and $q_i$ is the background frequency of subtype $i$ in the whole sample. To assess statistical significance of measured clonotype bias scores, we generated N random permutations of the observed clonotype data, preserving clonal size and global subtype background frequencies. On the distribution of permuted clonotype bias scores, binned by clone size, we determined expected mean and standard deviation for each clone size bin. Z scores for observed clonotype bias scores were then calculated as the number of standard deviations from the background mean (a Z score of 5 corresponds to a p-value of ~6 × 10^-7). PWMs for the fate-biased TCR alpha CDR3 motifs identified by Khatun et al., 2021 were kindly provided by the authors. We applied glam2scan (Frith et al., 2008) as in the original study to score these motifs on our data and rank clones based on individual motifs. We found that the motifs by Khatun et al. did not correlate with biased clones from our datasets (Figure 4—figure supplement 1C). Selecting clonotypes based on the 65th percentile of each of these motifs (as in the original study) did not enrich biased clones compared to their expected frequency. In particular, Tfh motifs were not predictive of Tfh bias and Th1 motifs were not predictive of Th1 bias (Figure 4—figure supplement 1D).
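
The clonotype bias metric and its permutation-based Z score can be illustrated with a short Python sketch. It follows the formula given above, but the null model is a simplification chosen for illustration: random clones of the same size are drawn without replacement from the subtype labels of the sample, instead of pooling permuted scores into clone-size bins. The function names are hypothetical, and the p-value helper assumes a two-sided normal approximation.

```python
import math
import random
import statistics
from collections import Counter


def clonotype_bias(clone_subtypes, background):
    """c = max_i (f_i - q_i) / (1 - q_i) for one clonotype.

    clone_subtypes: list of subtype labels, one per cell of the clone.
    background: {subtype: q_i}, frequencies in the whole sample.
    """
    counts = Counter(clone_subtypes)
    n = len(clone_subtypes)
    terms = [
        (counts.get(subtype, 0) / n - q) / (1.0 - q)
        for subtype, q in background.items()
        if q < 1.0  # a sample made of a single subtype carries no bias information
    ]
    return max(terms) if terms else 0.0


def bias_z_score(clone_subtypes, sample_subtypes, n_perm=1000, seed=0):
    """Observed bias and its Z score against size-matched random clones."""
    rng = random.Random(seed)
    total = len(sample_subtypes)
    background = {s: c / total for s, c in Counter(sample_subtypes).items()}
    observed = clonotype_bias(clone_subtypes, background)
    size = len(clone_subtypes)
    null = [
        clonotype_bias(rng.sample(sample_subtypes, size), background)
        for _ in range(n_perm)
    ]
    sd = statistics.pstdev(null)
    z = (observed - statistics.mean(null)) / sd if sd > 0 else 0.0
    return observed, z


def two_sided_p(z):
    """Two-sided p-value of a Z score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Under this normal approximation, `two_sided_p(5)` is about 5.7 × 10^-7, in line with the value quoted in the text. Matching the clone size in the null draws is what corrects for the tendency of small clones to show high bias by chance.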

Effect of cell cycling in defining reference spaces

To investigate the effect of cycling cells on the definition of low-dimensional embeddings, we re-analyzed the dataset by Khatun et al., 2021 (LCMV acute day 10). Unsupervised analysis of two replicates from this study (replicates 1 and 2) showed that cycling cells clustered together and that cells in this cluster expressed a mixture of markers for Th1, Tfh, and Tregs (Figure 5—figure supplement 1C-E). Next, we asked whether projecting these cycling cells into our reference map allowed discriminating the different subsets. Reference projection of cycling cells revealed that, while the majority of cells were assigned to Th1, more than 30% of cycling cells were predicted to be Tfh, Tcmp, or Treg (Figure 5—figure supplement 1F). The expression profile of a panel of marker genes for these cell subtypes corresponded closely with the reference profiles (with additional high expression of cycling markers such as Mki67, as these are all cycling cells), confirming that the cycling cluster was composed of a mixture of different cell types (Figure 5—figure supplement 1G). These analyses showed that cell cycling can mask differences between cell subtypes, and that failing to account for cell cycling signals leads to mixing of multiple cell types in low-dimensional spaces.

Effect of sequencing depth on reference-based annotation

Starting from the data generated in this study, as well as from an external dataset (Khatun et al., 2021), we applied the ‘downsampleMatrix’ function from the scuttle package (McCarthy et al., 2017) to generate downsampled scRNA-seq count matrices with increasingly lower sequencing depth (99%–10% of measured depth) for each sample in the study. These reduced-depth datasets were then systematically projected into the reference map, and we evaluated the agreement of the cell subtype annotation with the annotation of the full-depth dataset. We measured the classification agreement as a function of the minimal (1% quantile) or median (50% quantile) number of genes, and of the minimal and median number of UMIs. We observed that subtype classification was robust to sequencing depths down to 30% of the original sequencing depth, corresponding roughly to a median of 500 detected genes and around 1000 median UMIs per cell (Figure 5—figure supplement 2B,E). Moreover, disagreement in classification between full depth and downsampled depth mostly affected related cell subtypes, such as the effector and memory states of Tfh or Th1 cells, or Tcm cells and other memory subtypes (Figure 5—figure supplement 2C,F). On the whole, these experiments show that reference-based annotation is robust to sequencing depth for transcript counts and numbers of detected genes currently yielded by standard scRNA-seq technologies.
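The logic of this experiment can be sketched in Python. This is an illustration only, not the scuttle algorithm. Downsampling is approximated here by keeping each UMI independently with probability `prop` (binomial thinning, an assumption), and the subtype labels are taken as given rather than produced by reference projection. All function names are hypothetical.

```python
import random


def downsample_counts(counts, prop, rng):
    """Keep each UMI of each gene independently with probability `prop`."""
    return [sum(rng.random() < prop for _ in range(c)) for c in counts]


def depth_stats(counts):
    """Total UMIs and number of detected genes for one cell."""
    return {"umis": sum(counts), "genes": sum(1 for c in counts if c > 0)}


def agreement(full_labels, down_labels):
    """Fraction of cells whose annotation matches the full-depth annotation."""
    if len(full_labels) != len(down_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(a == b for a, b in zip(full_labels, down_labels))
    return matches / len(full_labels)
```

In a full analysis, each downsampled matrix would be annotated again and `agreement` would be plotted against the median or minimal `depth_stats` values across cells.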

Statistical analyses for flow cytometry data

Statistical significance was calculated with Prism software. Except where otherwise indicated in figure legends, error bars in graphs indicate standard deviation and statistical comparisons were done by one-way ANOVA test.

Data availability

Sequence data are deposited in the NCBI Gene Expression Omnibus under accession numbers GSE182320 and GSE200635. The new reference atlas can be downloaded (DOI: 10.6084/m9.figshare.16592693) or accessed via the web portal (https://spica.unil.ch/refs/viral-CD4-T). All source code is available at https://github.com/carmonalab/ProjecTILs (copy archived at swh:1:rev:b8bb396674697a3e6ca53967ca768f2e2fb7e61c) and https://github.com/carmonalab/ProjecTILs_CaseStudies.

The following data sets were generated
    1. Ciucci T
    2. Carmona S
    (2021) NCBI Gene Expression Omnibus
    ID GSE182320. Single-cell gene expression of virus-specific CD4 T cells in response to acute and chronic infection.
    1. Ciucci T
    2. Carmona S
    (2022) NCBI Gene Expression Omnibus
    ID GSE200635. Single-cell gene expression of tumor-specific CD4 T cells.
The following previously published data sets were used
    1. Cui W
    2. Khatun A
    (2020) NCBI Gene Expression Omnibus
    ID GSE158896. Single-cell lineage mapping of a diverse virus-specific naïve CD4 T cell repertoire.
    1. Kuenzli M
    2. Schreiner D
    3. Roux J
    4. King C
    (2019) NCBI Gene Expression Omnibus
    ID GSE134157. Single-cell RNA-sequencing of spleen memory CD4+ T cells.
    1. Bosselut R
    2. Ciucci T
    (2018) NCBI Gene Expression Omnibus
    ID GSE121002. Single-cell gene expression of anti-viral WT and Thpok-deficient effector and memory T cells.

Decision letter

  1. Juan Carlos Zúñiga-Pflücker
    Reviewing Editor; University of Toronto, Sunnybrook Research Institute, Canada
  2. Tadatsugu Taniguchi
    Senior Editor; Institute of Industrial Science, The University of Tokyo, Japan
  3. Laura M Snell
    Reviewer; Indiana University School of Medicine, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "A CD4+ T cell reference atlas delineates subtype-specific adaptation during acute and chronic viral infections" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Tadatsugu Taniguchi as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Laura M Snell (Reviewer #1).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The reviewers agreed that additional details, clarity, validation, and broader integration of the proposed atlas would strengthen the conclusions and usefulness of the study. However, if these additional analyses lead to altered interpretation or utility of the atlas then the authors should revise the work accordingly.

2) Please pay close attention to the detailed recommendations provided by Rev #2.

Reviewer #1 (Recommendations for the authors):

1) Viral titers should be paired to the different antiviral CD4 T cell transcriptional outcomes. Although LCMV Armstrong is quickly cleared, the clearance kinetics of LCMV Clone 13 can vary dramatically from laboratory to laboratory. The paper simply says that LCMV Clone 13 persists, but it is important to demonstrate how close the virus is to being cleared at the late chronic timepoint, as viral titer will impact transcriptional phenotypes. It will be relevant to have this information when using the reference atlas on other LCMV datasets which may clear with slightly altered kinetics.

2) Gene expression in Th1, Tfh and Tcm(p) is well characterized across time. Does gene expression in the Th1 memory, Tfh memory and Tcm change across early memory to late memory during acute infection? How does gene expression in the Tfh memory in late chronic infection compare to that of early memory (day 21 in both models)?

3) While the changes in proportions of the cell clusters across time and in acute versus chronic viral infection are demonstrated, a big contraction of virus-specific CD4 T cell numbers would be expected between Day 7 and Day 21 in both acute and chronic infection. As such, it would be relevant to also show the absolute number of cells in each cluster across the timepoints to get an accurate depiction of whether the enhancement in proportions of memory clusters also translated to an enhancement in absolute numbers of these clusters, or whether the numbers of memory cells in each cluster are simply maintained across the timepoints.

4) The text says that the atlas with the reference projection algorithm can enable interpretation of CD4 states across models, although all the examples given were based on LCMV datasets. Can the reference atlas accurately determine Th1/Tfh phenotypes from non-LCMV CD4 datasets? Many other models also drive Th1/Tfh differentiation. Single-cell analysis has been done on the discrimination of Th1/Tfh in malaria for instance: Lonnberg T et al. Sci Immunol. 2017 etc. and new data is emerging characterizing CD4s in various cancer models. Does the reference atlas hold up when determining CD4 subsets from data that is not LCMV-based?

5) The figure legends could benefit from more detail. In figure 1 for instance it is unclear if the UMAPs are based on a representative sample or the merged data of all samples. Also, the tissue of origin where the cells were sorted from should be mentioned for the reader's clarity.

Reviewer #2 (Recommendations for the authors):

1) The sequencing batches used to construct the 'atlas' contain biologically distinct samples (Figure 1A-B). Therefore, prior to integration, both technical and biological differences will drive cell separation. In such instances it is useful to have at least one cell population present in all batches to verify integration performance – cells from equivalent populations should produce a joint overlapping cluster whereas biologically distinct populations such as central memory T cells and exhausted T cells should produce distinct clusters. By difficult to understand experimental design, this paper does not seem to have any such populations so the performance of the integration is difficult to assess. Even so, the authors could quantify the degree of alignment between clusters in the d21 Clone 13 samples present in batches 2 and 3, and the d7 Arm samples present in Batch 1 and 2. Based on Figure S2A which is the only data related to integration performance, there is significant heterogeneity between biological replicates. For example, Tregs are virtually absent from the second Late Chronic biological replicate whereas the 'Tfh memory' subset is highly abundant compared to the first replicate. Similarly, the cluster frequencies of the low-frequency clusters look very different between replicates in the Early Memory (d21 Arm) group. Given this uncertainty about integration performance, it is difficult to interpret the subsequent data as it could be partially explained by technical variation between batches.

2) The TCR analysis does not address prior work by Khatun et al. (JEM 2020) which showed that the Tfh bias of certain TCR sequences could be predicted in independent mice. The authors' analysis is limited to stating the degree of bias in each clonotype frequency group. Did the authors attempt to replicate the observation by Khatun et al.? What was the overlap between CDR3 motifs? What was the overlap in motifs between Khatun et al. and this study?

3) The 'atlas' functionality is limited to a superficial demonstration of projecting several LCMV CD4 T cell datasets onto the authors' dataset. There is no data quantifying the performance of this integration in absolute terms or relative to other methods. For example, given that the 9 clusters defined by the authors are previously known CD4 T cell subsets, what is the advantage of using this method compared to quantifying the expression of existing marker gene sets in the primary datasets? What is the performance of this method compared to manual integration of individual datasets?

4) What is the effect of sequencing depth on integration performance? Would low-depth datasets produce annotation results with the most central clusters dominant due to lack of specific, cluster-defining lowly expressed genes? What is the minimum depth at which technical effects would not drive integration? This type of information is essential if the 'atlas' is to be used as a tool, otherwise the resulting misannotations could do more harm than good to the users.

5) It is unclear how the removal of cell cycle genes from the initial dataset affects interpretation and integration. Given that cell cycle state and cell fate are causally linked in T cells, would the removal of cell cycle genes not obscure some meaningful transcriptomic differences between populations? Are the cell cycle genes in dividing effector cells the same as in dividing early memory cells?

6) The experimental validation of this dataset is limited to showing that CD4 T cells in persistent infection express more EOMES than T cells in acutely infected mice and that they express lower levels of THPOK. However, what is the global alignment between flow cytometry data presented in Figure 1 and the scRNAseq data? Were any of the cluster frequencies predicted by the scRNAseq data validated using a protein panel?

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "A CD4+ T cell reference map delineates subtype-specific adaptation during acute and chronic viral infections" for further consideration by eLife. Your revised article has been evaluated by Tadatsugu Taniguchi (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there is one remaining issue that needs to be addressed, as outlined below:

Reviewer #1 (Recommendations for the authors):

In general this reviewer is satisfied with the revisions to the manuscript, however, one point needs to be better clarified. When the tumor-specific CD4 T cells from TILs were projected into the reference map 40-50% of them mapped into Th1 effectors. Yet upon further analysis and reclustering, these cells ended up being a completely distinct population of cells from the viral effector Th1. Thus, this reviewer is worried this could lead to misinterpretation and incorrect identification of subsets when using the reference map on other systems with unrepresented subsets not in the reference map. Could the authors comment on/clarify this point? It would be helpful to discuss the additional steps needed to verify that the corresponding states determined from projecting one's data into the reference map have similar gene profiles, and if they do not, how to address and identify these novel populations not represented in the map.

Reviewer #2 (Recommendations for the authors):

The authors have addressed my concerns. I can now recommend publication.

https://doi.org/10.7554/eLife.76339.sa1

Author response

Essential revisions:

1) The reviewers agreed that additional details, clarity, validation, and broader integration of the proposed atlas would strengthen the conclusions and usefulness of the study. However, if these additional analyses lead to altered interpretation or utility of the atlas then the authors should revise the work accordingly.

2) Please pay close attention to the detailed recommendations provided by Rev #2.

We thank reviewers and editors for their interest in the manuscript and especially for their valuable feedback. We have addressed all the reviewers' comments, supported by new data and several new analyses. In particular, we demonstrate the robustness and generalizability of our reference map and projection method by analyzing data from multiple tissues and viral infection models, as well as following genetic and therapeutic perturbations. In addition, we generated and analyzed a novel dataset derived from tumor-specific CD4+ T cells, showing the utility of our framework to interpret cell diversity even beyond the cell states currently present in the reference map.

The new results are shown in Figure 5 (new panels D and E), in two new main Figures 6 and 7, in new panels in supplemental figures Figure 1—figure supplement 1, Figure 2—figure supplement 1, Figure 4—figure supplement 1, and new supplemental figures Figure 5—figure supplement 1, Figure 5—figure supplement 2, and Figure 5—figure supplement 3. Please note that names of the supplemental figures have been changed and associated to main figures per request of the journal editors.

The manuscript has been modified to accommodate the new data and address the comments by the referees.

Reviewer #1 (Recommendations for the authors):

1) Viral titers should be paired to the different antiviral CD4 T cell transcriptional outcomes. Although LCMV Armstrong is quickly cleared, the clearance kinetics of LCMV Clone 13 can vary dramatically from laboratory to laboratory. The paper simply says that LCMV Clone 13 persists, but it is important to demonstrate how close the virus is to being cleared at the late chronic timepoint, as viral titer will impact transcriptional phenotypes. It will be relevant to have this information when using the reference atlas on other LCMV datasets which may clear with slightly altered kinetics.

This is an important technical point since viral replication can vary at later timepoints, especially in the presence of CD4+ T cells. We agree with the reviewer that, ideally, viral titers could be determined before proceeding to the single-cell capture and RT-PCR. However, our experience with single-cell transcriptomic technology showed that optimal capture, i.e. cell number and gene coverage, sharply decreases ~4h after animals are sacrificed. Therefore, determining viral titers, even by qPCR, represents a logistical challenge that could have technically jeopardized the study or compromised data quality.

However, we performed a phenotypic characterization of T cell populations by flow cytometry for each capture, in particular focusing on the persistence of the “exhaustion” marker PD1 and the decrease in IL7R expression by the virus-specific CD4+ T cell populations. As shown in Author response image 1, we confirmed that, for both “Late chronic” captures, CD4+ T cells were PD1high IL7Rintermediate/low, unlike their “Early Memory” counterparts, which were PD1low IL7Rhigh.

Author response image 1
Flow cytometry expression of PD1 and IL7R on naïve and GP66 CD4+ T cells for each of the indicated samples processed for scRNAseq capture.

Moreover, to put our results in the context of known LCMV Cl13 viral titers, we compared our transcriptional profiles to those of a recent study of LCMV-specific CD4+ T cells at day 33 of chronic infection, in which plasma viral titers have been determined (in the order of $10^{5}$ PFU per ml of plasma) (Snell et al., Nature Immunology 2021, PMID: 34795443). Although this study used (transgenic-TCR) SMARTA cells and our data consist of polyclonal GP66-specific tetramer-sorted cells, projection of the data by Snell et al. into our reference map showed a remarkable similarity to our Late Chronic samples, both in terms of subtype composition and their transcriptional profiles. Our automated analysis confirmed that, compared to isotype control, anti-PD-L1 treatment induced amplification of Th1 SMARTA cells, and that these upregulated Th1-associated genes including Klrd1, Ly6c2, Ctla2a, Plac8 and Lgals3, in agreement with the findings by Snell et al. These results are shown in the new panel E of Figure 5 and new Figure 5—figure supplement 3.

This analysis is described in the manuscript:

“Next, we aimed at using our reference map to interpret the effect of immunotherapies. To this end, we projected scRNA-seq data of virus-specific CD4+ T cells isolated from mice with LCMV chronic infection, after treatment with an anti-PD-L1 antibody (Snell et al. 2021). While control samples showed a similar subtype distribution to our late chronic samples (Figure 3C, Figure 5E), anti-PD-L1 treatment increased the relative proportion of Th1 effectors (Figure 5E). Expression profiles for all major subtypes in this dataset largely matched those of our reference (Figure 5—figure supplement 3A), including the expression of exhaustion markers that was similar to that of the chronic infection samples of the reference map (Figure 5—figure supplement 3B). Notably, Th1 effector cells after anti-PD-L1 treatment upregulated a Th1-associated gene module that includes Klrd1, Plac8, Ctla2a, and Ly6c2 (Figure 5—figure supplement 3C), confirming the findings by Snell et al. 2021.”

These two independent validations demonstrate that our late chronic datasets were isolated at a timepoint when CD4+ T cells were actively responding to viral antigens.

2) Gene expression in Th1, Tfh and Tcm(p) is well characterized across time. Does gene expression in the Th1 memory, Tfh memory and Tcm change across early memory to late memory during acute infection? How does gene expression in the Tfh memory in late chronic infection compare to that of early memory (day 21 in both models)?

This is an interesting point that we briefly discussed for Tfh memory (lines 174 to 180), since we observed that this subtype undergoes substantial changes in late memory compared to the early memory phase. These changes include the downregulation of the Tfh effector program (including Icos and Il21) and are accompanied by the upregulation of memory genes like Il7r. We did note a modest increase in “memory-associated” genes for Th1 and Tcm(p) subtypes from early to late memory, though not to the same extent as Tfh memory cells (Figure 3B). Additional comparisons across Acute timepoints and subsets are included in Supplementary File 2 – Acute Adaptation.

Concerning the differences between Acute day 21 (Early Memory) and Chronic day 21 (Late Chronic), we have now included these comparisons in Supplementary File 2 – Chronic Adaptation. As expected, many of the genes upregulated in Late Chronic are associated with T cell activation and T cell dysfunction in response to persistent TCR signaling and response to viral antigens. In particular, Tfh cells in late chronic infection upregulated inhibitory receptors such as Lag3, Pdcd1 and Ctla4, and transcription factors associated with exhaustion such as Tox and Bhlhe40 (Supplementary File 2 – Chronic Adaptation).

3) While the changes in proportions of the cell clusters across time and in acute versus chronic viral infection are demonstrated, a big contraction of virus-specific CD4 T cell numbers would be expected between Day 7 and Day 21 in both acute and chronic infection. As such, it would be relevant to also show the absolute number of cells in each cluster across the timepoints to get an accurate depiction of whether the enhancement in proportions of memory clusters also translated to an enhancement in absolute numbers of these clusters, or whether the numbers of memory cells in each cluster are simply maintained across the timepoints.

We agree that including absolute T cell numbers will provide a clearer view of the immune response. To address this point, we estimated the absolute number of T cells for all subtypes at early and late timepoints by multiplying total cell numbers in spleen determined by flow cytometry across multiple experiments (new panel Figure 2—figure supplement 1F) by cluster proportions (shown in Figure 3A,C and Figure 2—figure supplement 1D). We observed that for both types of infection the number of memory cells at later timepoints exceeds the number of the same cells at day 7 (new panel Figure 2—figure supplement 1G). Since there is no evidence that these memory subsets are actively proliferating at early timepoints (see reply to Reviewer #2), one possibility is that at least part of the memory pool is derived from effector subsets generated at earlier timepoints. This is consistent with previous studies showing that effector subtypes can transition, although to a lower extent than “memory precursors”, to memory subtypes (Harrington et al., Nature 2008, PMID: 18322463; Marshall et al., Immunity 2011, PMID: 22018471). We integrated these results in Figure 2—figure supplement 1F-G and lines 163-164.
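The estimate described above is a simple product, which can be written as a short Python sketch. The function name is hypothetical, and any numbers passed to it in an example are placeholders rather than values measured in the study.

```python
def absolute_counts(total_cells, proportions, tol=1e-6):
    """Estimate cells per subtype as total cell number times subtype proportion.

    total_cells: total virus-specific CD4+ T cells (for example from flow cytometry).
    proportions: mapping of subtype -> fraction of cells, expected to sum to 1.
    """
    if abs(sum(proportions.values()) - 1.0) > tol:
        raise ValueError("subtype proportions should sum to 1")
    return {subtype: total_cells * p for subtype, p in proportions.items()}
```

Comparing the output for an early and a late timepoint shows whether a rising proportion of a memory subtype also reflects a rising absolute number, which is the question raised by the reviewer.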

4) The text says that the atlas with the reference projection algorithm can enable interpretation of CD4 states across models, although all the examples given were based on LCMV datasets. Can the reference atlas accurately determine Th1/Tfh phenotypes from non-LCMV CD4 datasets? Many other models also drive Th1/Tfh differentiation. Single-cell analysis has been done on the discrimination of Th1/Tfh in malaria for instance: Lonnberg T et al. Sci Immunol. 2017 etc. and new data is emerging characterizing CD4s in various cancer models. Does the reference atlas hold up when determining CD4 subsets from data that is not LCMV-based?

This is a great suggestion. Indeed, our method can be used to annotate and interpret datasets from other models and tissues. To illustrate this, we analyzed T cells isolated from different tissues and infection models. In particular, our method accurately interpreted data from LCMV-specific CD4+ T cells in the liver (Künzli et al., Science Immunology, PMID: 32144185) and from influenza infection in the lung and lymph nodes (Swarnalekha et al., Science Immunology 2021, PMID: 33419790). These results are presented in the new Figure 6 and discussed in a new section of the manuscript: “Diversity of virus-specific CD4+ T cells across tissues”.

Briefly, we recapitulate the findings by Swarnalekha et al. that (a) Th1 subtypes dominate in lung populations at early time points, unlike their counterparts in the lymph nodes after influenza infection, (b) lung CD4+ T cell populations become enriched in Tfh-like states over time, (c) cells in the lungs display a tissue-resident gene module expressed across Tfh memory, Tfh effector and Th1 effector subtypes (Figure 6A-D). Interestingly, this gene module was also found to be significantly higher in liver compared to spleen in the context of LCMV infection (Figure 6F), suggesting a tissue residency program that is partly conserved across infections and cell states, and highlighting the power of our framework for discovering such programs.

In addition, we included a new analysis showing that our tool is useful to interpret the heterogeneity of CD4+ T cells beyond viral infections. These additional analyses are presented in the new Figure 7 and discussed in a new section of the manuscript: “Reference map projection to explore CD4+ T cell diversity beyond viral infections”.

Briefly, we generated an entirely new scRNA-seq dataset derived from tumor-specific CD4+ T cells (deposited in GEO under identifier GSE200635). Projection of these data into our viral infection-derived reference map revealed that tumor-specific CD4+ T cell populations in the tumor-draining lymph nodes were dominated by Tfh cells, while tumors were enriched in Tregs as well as in a distinct non-Treg, non-Tfh state (Figure 7A-B). Further investigation showed that, compared to Th1 effectors in response to viruses, these effector cells in the tumor differentially expressed Th2-associated genes (e.g. Ccr8, Tgfb1, Ccl1, Il5, Il13, Igfbp7) (Figure 7E-F).

5) The figure legends could benefit from more detail. In figure 1 for instance it is unclear if the UMAPs are based on a representative sample or the merged data of all samples. Also, the tissue of origin where the cells were sorted from should be mentioned for the reader's clarity.

More details have been added to clarify the figure legends throughout the manuscript.

Reviewer #2 (Recommendations for the authors):

1) The sequencing batches used to construct the 'atlas' contain biologically distinct samples (Figure 1A-B). Therefore, prior to integration, both technical and biological differences will drive cell separation. In such instances it is useful to have at least one cell population present in all batches to verify integration performance – cells from equivalent populations should produce a joint overlapping cluster whereas biologically distinct populations such as central memory T cells and exhausted T cells should produce distinct clusters. By difficult to understand experimental design, this paper does not seem to have any such populations so the performance of the integration is difficult to assess. Even so, the authors could quantify the degree of alignment between clusters in the d21 Clone 13 samples present in batches 2 and 3, and the d7 Arm samples present in Batch 1 and 2. Based on Figure S2A which is the only data related to integration performance, there is significant heterogeneity between biological replicates. For example, Tregs are virtually absent from the second Late Chronic biological replicate whereas the 'Tfh memory' subset is highly abundant compared to the first replicate. Similarly, the cluster frequencies of the low-frequency clusters look very different between replicates in the Early Memory (d21 Arm) group. Given this uncertainty about integration performance, it is difficult to interpret the subsequent data as it could be partially explained by technical variation between batches.

We thank the reviewer for these comments, which allow us to clarify a few points about batch effects, biological variability, and the power of our approach.

In their comment, we believe the reviewer refers to Figure 2A-B and not Figure 1, since Figure 1A-B do not show sequencing data but spectral flow cytometry data, for which the cells from all 4 conditions were analyzed at the same time. For additional clarification, the latter consisted of infecting animals at different timepoints so that cell processing and analyses could be performed on the same day. This represents, for each experiment, 16-20 mice for which splenocytes were isolated, stained and acquired simultaneously to limit batch effects.

Regarding the integration of scRNA-seq data (Figure 2A-B), we agree with the reviewer that data presented in Figure 2—figure supplement 1A were not sufficient to convincingly assess integration quality and reproducibility of the many replicates, which we have now addressed extensively (see below).

The reviewer commented that: “In such instances it is useful to have at least one cell population present in all batches to verify integration performance – cells from equivalent populations should produce a joint overlapping cluster whereas biologically distinct populations such as central memory T cells and exhausted T cells should produce distinct clusters.” While we agree that it can be useful to have one cell population present in all batches, we have previously shown (Andreatta and Carmona, 2021, PMID: 32845323) that our integration method STACAS does not require that every batch shares a population with every other batch. Instead, it is sufficient that every batch shares one cell population with at least one other batch to enable the generation of an integration guide tree. This is indeed the case, as now highlighted in a new table (Supplementary File 1B).

For instance, as the reviewer mentioned, cell populations present in d21 in Clone 13 and Arm samples (Early Memory and Late Chronic) are present in both batches 2 and 3, and cell populations of d7 Armstrong samples were present in both Batch 1 and 2, allowing for a pairwise batch integration strategy (e.g. B2-B3; (B2-B3)-B1). We acknowledge that this information, which is important for understanding the experimental design, was not clear in the first version.
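The batch-sharing argument can be checked with a small Python sketch. This is not part of STACAS. It assumes the reading that a guide tree spanning all batches requires the batches to be linked, directly or through intermediate batches, by shared conditions. The function name is hypothetical, and in the usage below the batch contents include only the conditions named in this response.

```python
from collections import deque


def batches_form_connected_graph(batches):
    """Return True if all batches are linked through shared conditions.

    batches: mapping of batch name -> set of conditions present in that batch.
    Two batches are linked when they have at least one condition in common.
    """
    names = list(batches)
    if not names:
        return True
    seen = {names[0]}
    queue = deque([names[0]])
    while queue:
        current = queue.popleft()
        for other in names:
            if other not in seen and batches[current] & batches[other]:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(names)


# Conditions named in the response above (partial listing).
batches = {
    "B1": {"Arm d7"},
    "B2": {"Arm d7", "Arm d21", "Cl13 d21"},
    "B3": {"Arm d21", "Cl13 d21"},
}
linked = batches_form_connected_graph(batches)
```

With this layout, B2 shares conditions with both B1 and B3, so a pairwise integration order such as B2-B3 followed by B1 is possible even though B1 and B3 share no condition directly.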

Regarding heterogeneity between biological replicates (“Based on Figure S2A which is the only data related to integration performance, there is significant heterogeneity between biological replicates.”), we provide evidence that variability between biological replicates is within an expected range (by flow cytometry as well as by re-analysis of data by Snell et al. Nature Immunology 2021), and much lower than variability between conditions. Importantly, unlike the vast majority of scRNA-seq studies, we provide at least two biological replicates for every condition, allowing us to assess similarity between replicates (as shown in Figure 2—figure supplement 1A).

To better illustrate the proportion of each cell population across biological replicates, we show in the new Figure 2—figure supplement 1D-E the strong similarities in the distribution of all cell states across samples. In particular, despite minimal variations in the percentages of each state, all replicates from each condition have a similar subtype composition. Critically, samples clustered by condition rather than by batch (i.e. biological replicates bear the largest similarity to each other, despite batch effects). Validating the proportions of these populations by spectral cytometry showed that they are comparable to the subtypes defined from scRNA-seq (new Figure 1—figure supplement 1E). These results are presented as new Figure 2—figure supplement 1D-E and discussed in lines 129-132:

“All subtypes were present in similar proportions across biological replicates, and samples clustered by condition rather than by batch, further confirming a successful data integration (Figure 2—figure supplement 1D-E). Similar subtype proportions were confirmed by spectral cytometry (Figure 1—figure supplement 1E)”

Regarding the 3 minor states, the IFNI-stimulated population was mostly found in chronic samples, where it consistently represented 6-10% of the cells, and the EomesHI state was mostly seen in the two late chronic replicates, with a frequency of 3-4%. Finally, Tregs represented a very small fraction (consistently below 1% in all samples), preventing us from drawing any robust conclusion about the variability of their proportions.

As the reviewer correctly points out, we indeed observed that the Tfh_memory subtype was relatively more variable between Late Memory replicates (i.e. 24% vs 10%; ~17% on average). However, these proportions are (a) very similar to the Tfh_memory percentage confirmed by flow cytometry (~15%, Figure 1—figure supplement 1E), and (b) very similar to the Tfh_memory proportion determined in an independent scRNA-seq dataset of chronic infection (Figure 5E; ~15% Tfh memory in the untreated mice).

With all that said, precisely determining the degree of biological variability in the proportion of Tfh_memory is beyond the scope of this manuscript, and this variability does not affect the conclusions of our study, given that gene expression profiles in each subtype were consistent across replicates.

2) The TCR analysis does not address prior work by Khatun et al. (JEM 2020) which showed that the Tfh bias of certain TCR sequences could be predicted in independent mice. The authors' analysis is limited to stating the degree of bias in each clonotype frequency group. Did the authors attempt to replicate the observation by Khatun et al.? What was the overlap between CDR3 motifs? What was the overlap in motifs between Khatun et al. and this study?

In line with our results, the study by Khatun et al. found that while the majority of clonotypes were not biased towards a specific fate, a subset of clonotypes were seen to preferentially develop into one particular lineage. In addition, they found that certain TCR α chains, but not β chains, were enriched in Tfh cells compared to other fates. Based on this observation, they derived CDR3 motifs of the TCR α chains that distinguished biased clones in their dataset, obtaining 27 (11 Th1-specific and 16 Tfh-specific) motifs. When tested on left-out mice not used to compute the motifs, they found that no Th1 motifs were predictive of Th1 bias, 2 Tfh motifs were predictive of Tfh bias, and one Tfh motif showed in fact significant preference towards the opposite lineage. The remaining 24 “fate-biased” motifs were not reproduced in the independent mice. We believe this is very weak evidence for the predictability of fate based on TCR sequence.

Nevertheless, we evaluated the PWMs for the motifs described by Khatun et al. (kindly provided by the authors) on our data to determine whether they were predictive of clonotype fate. As shown in the new Figure 4—figure supplement 1C, motif scores for the 7 motifs from Figure 4 of Khatun et al. (calculated using glam2scan [Frith et al. (2008)] as in the original paper) do not correlate with biased clones from our datasets. Selecting clonotypes based on the 65th percentile of each of these motifs (the threshold used by Khatun et al.) does not enrich biased clones compared to their expected frequency (column “All”). In particular, Tfh motifs are not predictive of Tfh bias and Th1 motifs are not predictive of Th1 bias (Figure 4—figure supplement 1D). These analyses were included as part of Figure 4—figure supplement 1 (panels C and D), and discussed on lines 281-283 and 710-717.

“We did not observe any robust CDR3 motif associated with biased clonotypes, and previously reported fate-biased CDR3 motifs (Khatun et al., 2021) were not predictive of clonotype lineage on our data (Figure 4—figure supplement 1C-D).”

“PWMs for the fate-biased TCR α CDR3 motifs identified by Khatun et al. (2021) were kindly provided by the authors. We applied glam2scan [PMID:18437229] as in the original study to score these motifs on our data and rank clones based on individual motifs. We found that the motifs by Khatun et al. did not correlate with biased clones from our datasets (Figure 4—figure supplement 1C). Selecting clonotypes based on the 65% percentile of each of these motifs (as in the original study) did not enrich biased clones compared to their expected frequency. In particular, Tfh motifs were not predictive of Tfh bias and Th1 motifs were not predictive of Th1 bias (Figure 4—figure supplement 1D).”
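The percentile-based selection described above can be sketched as follows. This is only an illustration under stated assumptions: clonotypes are taken to be "selected" when their motif score lies above the 65th percentile (computed here with the nearest-rank rule), and the scores and bias flags are hypothetical toy values, not data from either study. The motif scoring itself (glam2scan) is not reproduced.

```python
import math

def percentile_threshold(scores, pct):
    """Nearest-rank percentile: smallest score with at least pct% of values at or below it."""
    ranked = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    return ranked[rank - 1]

def enrichment(scores, biased, pct=65):
    """Fraction of biased clonotypes above the cutoff, and among all clonotypes."""
    cutoff = percentile_threshold(scores, pct)
    selected = [b for s, b in zip(scores, biased) if s > cutoff]
    frac_selected = sum(selected) / len(selected) if selected else float("nan")
    frac_all = sum(biased) / len(biased)
    return frac_selected, frac_all

# Hypothetical toy data: one motif score and one biased/unbiased flag per clonotype.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.7, 0.05, 0.5]
biased = [True, False, False, False, True, False, False, False, True, False]
frac_selected, frac_all = enrichment(scores, biased)
print(frac_selected, frac_all)
```

A motif would be considered predictive only if the first fraction were clearly higher than the second, which plays the role of the expected frequency in the column “All”.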

3) The 'atlas' functionality is limited to a superficial demonstration of projecting several LCMV CD4 T cell dataset onto the authors' dataset. There is no data quantifying the performance of this integration in absolute terms or relative to other methods. For example, given that the 9 clusters defined by the authors are previously known CD4 T cell subsets, what is the advantage of using this method compared to quantifying the expression of existing marker gene sets in the primary datasets? What is the performance of this method compared to manual integration of individual datasets?

Given the lack of robust methods or validated gene signatures for CD4+ T cell subtype classification, the only alternative is, as the reviewer suggests, manual annotation. Since there is no standardized way of performing a manual annotation, this process is usually both highly time-demanding and subjective. As an increasingly large body of scRNA-seq data becomes available, unsupervised analysis of individual datasets will become untenable. An automated and scalable system provides a unified framework to bring multiple datasets into the same space, to assign consistent labels, and to perform meta-analyses with robust criteria. We have previously demonstrated the usefulness and accuracy of such an approach in other contexts, and compared its performance with alternative methods (Andreatta et al. Nature Communications 2021).

In this revised version, we perform several additional analyses showing that our tool accurately predicts CD4+ T cell subtype composition changes across tissues in two different viral infections (LCMV infection spleen vs. liver; and flu in lymph node vs. lung; new Figure 6), upon anti-PD-L1 therapy (new Figure 5E, new Figure 5—figure supplement 3), and in the context of cancer (new Figure 7). Please see the detailed answer to reviewer #1 for these analyses. Using a fully automated method, we were able to recapitulate non-trivial conclusions from previous studies, including (a) a large proportion of Tfh-like cells in lungs at late timepoints after influenza infection, (b) expression of a conserved non-lymphoid tissue residency gene module that was predictive across infection types, and (c) amplification of Th1 cells following anti-PD-L1 therapy. This shows that a reference-based analysis framework can greatly reduce both the effort of data analysis and the subjectivity of manual annotation. Finally, we showed that even such an ‘incomplete’ CD4+ T cell map (built from viral infection data) provides a scaffold to identify novel states, such as that of tumor-specific CD4+ T cells (new Figure 7).

To address the reviewer’s concern about a potential misunderstanding of the word ‘atlas’ (i.e. collection of maps) in this context, we have replaced it with ‘map’ throughout the manuscript.

4) What is the effect of sequencing depth on integration performance? Would low-depth datasets produce annotation results with the most central clusters dominant due to lack of specific, cluster-defining lowly expressed genes? What is the minimum depth at which technical effects would not drive integration? This type of information is essential if the 'atlas' is to be used as a tool, otherwise the resulting misannotations could do more harm than good to the users.

We agree this is an important technical point that was not sufficiently addressed. We performed new analyses to assess the effect of sequencing depth on reference projection accuracy. Starting from the data generated in this study, as well as from an external dataset (Khatun et al. 2021), we applied the ‘downsampleMatrix’ function from the scuttle package (McCarthy et al. 2017) to generate downsampled scRNA-seq count matrices with increasingly lower sequencing depth (99% to 10% of measured depth) for each sample in the study. These reduced-depth datasets were then systematically projected into the reference map, and we evaluated the agreement of the cell subtype annotation with the annotation of the full-depth dataset (Figure 5—figure supplement 2A,D). We were also able to evaluate how the classification agreement was maintained as a function of the minimal (1% quantile) or median (50% quantile) number of genes, and of the minimal and median number of UMIs (Figure 5—figure supplement 2B,E). We observed that subtype classification was robust to sequencing depths down to 30% of the original sequencing depth, corresponding roughly to a median of 500 detected genes and around 1000 median UMIs per cell. Moreover, disagreement in classification between full depth and downsampled depth mostly affected related cell subtypes, such as the effector and memory states of Tfh or Th1 cells, or Tcm(p) cells and other memory subtypes (results for downsampling to 30% of select samples are shown in Figure 5—figure supplement 2 C,F). On the whole, these experiments show that reference-based annotation is robust to sequencing depth for transcript counts and numbers of detected genes currently yielded by standard scRNA-seq technologies. Thus, users can confidently use our tool without significant risks of misannotation. These results are presented in new Figure 5—figure supplement 2 and discussed in the new section: “Effect of sequencing depth on reference-based annotation” (lines 734-749).
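The logic of this experiment can be sketched in a few lines. The sketch below is a simplified stand-in, not the scuttle implementation: it assumes binomial thinning, where each counted molecule is kept independently with probability `prop`, and it measures agreement as the fraction of cells whose label is unchanged. The function names, the toy count matrix and the toy labels are all hypothetical, and the projection step that would produce the labels is not reproduced.

```python
import random

def downsample_counts(matrix, prop, seed=0):
    """Binomial thinning: keep each counted molecule with probability `prop`."""
    rng = random.Random(seed)
    return [[sum(rng.random() < prop for _ in range(count)) for count in row]
            for row in matrix]

def agreement(full_labels, down_labels):
    """Fraction of cells whose label is unchanged after downsampling."""
    same = sum(a == b for a, b in zip(full_labels, down_labels))
    return same / len(full_labels)

# Hypothetical genes x cells UMI counts.
counts = [[5, 0, 12], [3, 7, 0]]
thinned = downsample_counts(counts, prop=0.3)
print(thinned)

# Hypothetical labels from projecting full-depth and reduced-depth data.
print(agreement(["Th1", "Tfh", "Tcm"], ["Th1", "Tfh", "Tfh"]))
```

Repeating the two steps over a range of `prop` values gives an agreement curve as a function of depth, which is the kind of summary described in the paragraph above.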

5) It is unclear how the removal of cell cycle genes from the initial dataset affects interpretation and integration. Given that cell cycle state and cell fate are causally linked in T cells, would the removal of cell cycle genes not obscure some meaningful transcriptomic differences between populations? Are the cell cycle genes in dividing effector cells the same as in dividing early memory cells?

We thank the reviewer for allowing us to clarify this point. First, we would like to point out that cell cycle genes are not removed from the datasets. The misunderstanding might stem from the fact that, as part of our data integration procedure, cell cycle genes (as well as TCR genes and others) are excluded from variable gene selection (this has been clarified in lines 651-653). This procedure is done to mitigate the effect of cell cycling, which causes dramatic transcriptional changes irrespective of cell subtype or basal transcriptional programs. Instead, we aim at clustering cells based on functionally distinct subtypes (i.e. the subtype-defining core transcriptional programs) rather than based on proliferation status, or other more transient signals. In practice, this only leads to excluding a limited number of genes that fell within the set of most highly variable genes (800 for this map). Of note, the exclusion of cell cycle genes from the variable features has much less impact than the typical ‘cell cycle regression’ procedure used to mitigate cell cycle effects, which affects all gene measurements (reviewed in Lueken et al. (2019) Mol Syst Biol).
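The distinction between removing genes from the data and excluding them from variable gene selection can be made concrete with a small sketch. It assumes a plain variance ranking, which is a simplification of the variance-stabilized rankings typically used in scRNA-seq pipelines; the function name `select_variable_genes` and the expression values are hypothetical.

```python
from statistics import pvariance

def select_variable_genes(expr, n_top, exclude=()):
    """Rank genes by variance across cells, skipping any gene in `exclude`."""
    blocked = set(exclude)
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return [g for g in ranked if g not in blocked][:n_top]

# Hypothetical normalized expression values (gene -> one value per cell).
expr = {
    "Mki67": [0.0, 5.0, 0.0, 6.0],   # cycling marker, highly variable
    "Cxcr5": [3.0, 0.0, 4.0, 0.0],
    "Ccl5":  [0.0, 2.0, 0.0, 3.0],
    "Actb":  [5.0, 5.0, 5.1, 5.0],
}
hvg = select_variable_genes(expr, n_top=2, exclude={"Mki67"})
print(hvg)             # features used to build the integrated space
print("Mki67" in expr) # the gene itself is still present in the data
```

In this toy case Mki67 would have ranked first, so excluding it changes which features drive the low-dimensional space, while its expression remains available for downstream inspection.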

To address more directly the reviewer’s concern about the possibility that we might be missing important biological information encoded in cell cycle genes with our procedure, we re-analyzed the dataset by Khatun et al. (LCMV acute day 10) (Figure 5—figure supplement 1C-G). Unsupervised analysis shows that cycling cells cluster together (Figure 5—figure supplement 1C-D), and that cells in this cluster express a mixture of markers for Th1 and Tfh (Figure 5—figure supplement 1D-E). A fraction of cells in the cycling cluster also express high levels of Foxp3, a marker of Tregs (Figure 5—figure supplement 1E). Thus, unsupervised clustering practically groups cells solely based on their proliferation status, regardless of their functional phenotype.

Next, we asked whether projecting these cycling cells into our reference map allows discriminating the different subsets. Projection of all cells from the cycling cluster reveals that, while the majority of cells were assigned to Th1, more than 30% of cycling cells were predicted to be Tfh, Tcmp and Treg (Figure 5—figure supplement 1F). The expression profile of a panel of marker genes for these cell subtypes corresponded closely with the reference profiles (with additional high expression of cycling markers e.g. Mki67, as these are all cycling cells), confirming that the cycling cluster was composed of a mixture of different cell types (Figure 5—figure supplement 1G).

These results show that cell cycling can indeed mask differences between cell subtypes, and that failing to account for cell cycling signals leads to mixing of multiple cell types in low dimensional spaces; thus validating our choices. Cell cycling, which is an essential characteristic to consider in a biological system, can then be evaluated as a separate quantity that is orthogonal to cell subtypes, and cycling cells can be annotated by projection into the reference space.

These results are presented in new Figure 5—figure supplement 1 and described in section: “Effect of cell cycling in defining reference spaces” (lines 719-732)

6) The experimental validation of this dataset is limited to showing that CD4 T cells in persistent infection express more EOMES than T cells in acutely infected mice and that they express lower levels of THPOK. However, what is the global alignment between flow cytometry data presented in Figure 1 and the scRNAseq data? Were any of the cluster frequencies predicted by the scRNAseq data validated using a protein panel?

We thank the reviewer for giving us the opportunity to explain in more detail the steps we take to validate the robustness of our results. The manuscript described the validation of both the presence of the Thpoklow Eomeshigh population (Figure 2E-F and Figure 3—figure supplement 1D) and the decrease of CCR7-expressing cells in chronic settings (Figure 3—figure supplement 1B). In addition, we confirmed by flow cytometry the presence and proportions of major cell populations described in our study (see validation below). It is however important to note that identifying robust T cell states by flow cytometry is difficult and often fails to discriminate T cell subsets from plastic T cell populations. For example, defining Tcm populations by flow cytometry remains challenging as this cell population can only be distinguished from Tfh based on CCR7 expression. With these limitations in mind, our study was precisely designed to overcome these difficulties and define T cell states more robustly by relying on global transcriptional features instead of expression of a few proteins by flow cytometry, which is often highly variable across laboratories, models and tissues.

To directly address the reviewer’s comment, we included a flow cytometry validation based on the expression of known markers on virus-specific T cells (new Figure 1—figure supplement 1E). Overall, this analysis shows a very good correspondence between scRNA-seq and flow cytometry in the presence and distribution of the major clusters across conditions. These results are also consistent with previous studies during acute (Ciucci et al., Immunity 2019, PMID: 30638736, Künzli et al., Science Immunology 2020, PMID: 32144185, Marshall et al., Immunity 2011; PMID: 22018471) and chronic (Snell et al., Nature Immunology 2022, PMID: 34795443, Zander et al., Immunity 2022, PMID: 35216666) LCMV infections.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there is one remaining issue that needs to be addressed, as outlined below:

Reviewer #1 (Recommendations for the authors):

In general this reviewer is satisfied with the revisions to the manuscript, however, one point needs to be better clarified. When the tumor-specific CD4 T cells from TILs were projected into the reference map 40-50% of them mapped into Th1 effectors. Yet upon further analysis and reclustering, these cells ended up being a completely distinct population of cells from the viral effector Th1. Thus, this reviewer is worried this could lead to misinterpretation and incorrect identification of subsets when using the reference map on other systems with unrepresented subsets not in the reference map. Could the authors comment on/clarify this point? It would be helpful to discuss the additional steps needed to verify that the corresponding states determined from projecting one's data into the reference map have similar gene profiles, and if they do not, how to address and identify these novel populations not represented in the map.

This is an important point. In fact, the analysis of T cells from tumor was included in the revised manuscript to illustrate how to detect novel populations in a query dataset compared to the reference. We have at least four possible strategies, all implemented as functions in the ProjecTILs analysis package, to aid the user in detecting new/unrepresented states. First, on a qualitative level, one can inspect the radar plots (function plot.states.radar()) to compare the expression of panels of key genes between query and reference. While for all other well-represented cell types the profiles displayed a good correspondence, Th1_Effector cells in the tumor lacked the expression of key Th1 markers (Ccl5, Ly6c2) (Figure 7C). Second, the user can recalculate the UMAP embeddings of the combined reference and query space (function recalculate.embeddings()) to assess whether part of the projected data form a novel, separate cluster (e.g. Figure 7D). Third, per-subtype differential gene expression analysis (function find.discriminant.genes()) can reveal which and how many genes are differentially expressed between reference and query in a given subtype. In the case of the TIL data, tumoral T helpers showed increased expression of Th2 markers, guiding the users towards interpreting the nature of the novel cluster (Figure 7E). Finally, to provide a more quantitative measure of deviation from the reference, we implemented an average silhouette coefficient per subtype (function compute_silhouette()), which aims at measuring the average distance of query cells from their own assigned cluster/state compared to all other clusters/states of the reference. We report below the silhouette scores, normalized by the silhouettes of the reference, for the T cells from tumor and lymph node from Figure 7, as well as from the dataset from Khatun et al. JEM (2020), which comes from the same viral model and should act as a control.

The outlier value in this analysis is the normalized silhouette score of tumor-specific cells classified as Th1 effector, indicating that they match poorly with the Th1 effectors of the reference. Taken together with the other three diagnostic analyses outlined above and described in the manuscript, these results should warn the user of the presence of a novel state in the query dataset.
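To make the silhouette-based diagnostic concrete, the sketch below applies the textbook silhouette definition to query cells measured against reference clusters. It is not the ProjecTILs compute_silhouette() implementation: the normalization by the reference silhouettes is omitted, Euclidean distance in a toy 2D space is assumed, and the function name `query_silhouette` and all coordinates are hypothetical.

```python
from math import dist
from statistics import mean

def query_silhouette(query, query_labels, reference, ref_labels):
    """Average silhouette per assigned label, with query cells scored against reference clusters."""
    clusters = {}
    for point, label in zip(reference, ref_labels):
        clusters.setdefault(label, []).append(point)
    per_label = {}
    for point, label in zip(query, query_labels):
        # a: mean distance to reference cells of the assigned cluster
        a = mean(dist(point, r) for r in clusters[label])
        # b: mean distance to the closest other reference cluster
        b = min(mean(dist(point, r) for r in pts)
                for other, pts in clusters.items() if other != label)
        per_label.setdefault(label, []).append((b - a) / max(a, b))
    return {label: mean(vals) for label, vals in per_label.items()}

# Hypothetical 2D embedding: two reference clusters and two projected query cells.
reference = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
ref_labels = ["Th1", "Th1", "Tfh", "Tfh"]
query = [(5.0, 5.5), (3.0, 3.5)]
query_labels = ["Tfh", "Th1"]
scores = query_silhouette(query, query_labels, reference, ref_labels)
print(scores)
```

In this toy example the query cell labeled Tfh sits inside its reference cluster and scores close to 1, whereas the cell labeled Th1 lies closer to the other cluster and scores below 0, the kind of low value that would flag a poorly matched state.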

To better convey this message in the manuscript, we added the following text to the section “Reference map projection to explore CD4+ T cell diversity beyond viral infections”:

“While reference maps aim at being as comprehensive as possible, it is possible that new datasets contain novel states that are not represented in the reference, especially when used in different diseases models. In these cases, the user is encouraged to make use of all the analytic tools we provide with the ProjecTILs package (see Methods) to evaluate the degree of correspondence between reference and query, as we illustrated in the case of tumor-specific T cells (Figure 7C-E). These analyses demonstrate the feasibility of using a reference map to describe cell diversity beyond the states already present in the map, and as a strategy to expand references to incorporate novel, unrepresented cell states.”

And a more detailed description of the available analytical functions in a new section in the Methods:

“Detection of novel/unrepresented states

Several utilities are available in the ProjecTILs package to evaluate query projection accuracy and to detect the presence of novel cell states. First, on a qualitative level, one can apply the plot.states.radar() function to compare the expression of panels of key genes between query and reference. Second, the user can recalculate the UMAP embeddings of the combined reference and query space (function recalculate.embeddings()) to assess whether part of the projected data form a new, separate cluster (e.g. Figure 7D). Third, per-subtype differential expression analysis (function find.discriminant.genes()) can reveal which and how many genes are differentially expressed between reference and query in a given subtype (e.g. Figure 7E). Fourth, the compute_silhouette() function calculates an average silhouette coefficient per subtype, which aims at measuring the average distance of query cells from their own assigned cluster compared to all other clusters of the reference. A case study highlighting the application of these metrics can be found at: https://carmonalab.github.io/ProjecTILs_CaseStudies/novelstate.html”

https://doi.org/10.7554/eLife.76339.sa2

Article and author information

Author details

  1. Massimo Andreatta

    1. Department of Oncology, UNIL CHUV and Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
    2. Agora Cancer Research Center, Lausanne, Switzerland
    3. Swiss Institute of Bioinformatics, Lausanne, Switzerland
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8036-2647
  2. Ariel Tjitropranoto

    David H. Smith Center for Vaccine Biology and Immunology, Department of Microbiology and Immunology, University of Rochester, Rochester, United States
    Contribution
    Validation, Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5525-5236
  3. Zachary Sherman

    David H. Smith Center for Vaccine Biology and Immunology, Department of Microbiology and Immunology, University of Rochester, Rochester, United States
    Contribution
    Validation, Methodology
    Competing interests
    No competing interests declared
  4. Michael C Kelly

    Single Cell Analysis Facility, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Frederick, United States
    Contribution
    Resources, Methodology, Project administration
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-0654-2778
  5. Thomas Ciucci

    1. David H. Smith Center for Vaccine Biology and Immunology, Department of Microbiology and Immunology, University of Rochester, Rochester, United States
    2. Laboratory of Immune Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    Contributed equally with
    Santiago J Carmona
    For correspondence
    thomasciucci@icloud.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5828-0207
  6. Santiago J Carmona

    1. Department of Oncology, UNIL CHUV and Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
    2. Agora Cancer Research Center, Lausanne, Switzerland
    3. Swiss Institute of Bioinformatics, Lausanne, Switzerland
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    Contributed equally with
    Thomas Ciucci
    For correspondence
    Santiago.Carmona@unil.ch
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2495-0671

Funding

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (PZ00P3_180010)

  • Santiago J Carmona

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank the NIH tetramer facility for reagents; the CCR and University of Rochester Flow Cytometry Core, the University of Rochester Genomics Research Center and the NIH High performance computing cluster for assistance; D McGavern, T Mosmann and F David for technical assistance; R Bosselut for supporting the research; D Goldstein, M Malik and the NCI Office of Science and Technology Resources for their support. We would also like to thank Prof. Carolyn King and David Schreiner at the University of Basel for critical reading of the manuscript. This work was supported by the University of Rochester, and Intramural Research Program of the National Cancer Institute, Center for Cancer Research (CCR), National Institutes of Health, and by the Swiss National Science Foundation (SNF project 180010). The CCR Single Cell Analysis Facility is funded by the Frederick National Laboratory for Cancer Research, Contract HHSN261200800001E. Sequencing was performed with the CCR Genomics Core and the University of Rochester Genomics Research Center.

Ethics

This study was performed under the protocol UCAR 2020-003 approved by the University of Rochester Committee on Animal Resources.

Senior Editor

  1. Tadatsugu Taniguchi, Institute of Industrial Science, The University of Tokyo, Japan

Reviewing Editor

  1. Juan Carlos Zúñiga-Pflücker, University of Toronto, Sunnybrook Research Institute, Canada

Reviewer

  1. Laura M Snell, Indiana University School of Medicine, United States

Publication history

  1. Preprint posted: September 20, 2021
  2. Received: December 13, 2021
  3. Accepted: July 12, 2022
  4. Accepted Manuscript published: July 13, 2022 (version 1)
  5. Version of Record published: July 26, 2022 (version 2)

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.


Cite this article

  1. Massimo Andreatta
  2. Ariel Tjitropranoto
  3. Zachary Sherman
  4. Michael C Kelly
  5. Thomas Ciucci
  6. Santiago J Carmona
(2022)
A CD4+ T cell reference map delineates subtype-specific adaptation during acute and chronic viral infections
eLife 11:e76339.
https://doi.org/10.7554/eLife.76339