Early life imprints the hierarchy of T cell clone sizes
Abstract
The adaptive immune system responds to pathogens by selecting clones of cells with specific receptors. While clonal selection in response to particular antigens has been studied in detail, it is unknown how a lifetime of exposures to many antigens collectively shape the immune repertoire. Here, using mathematical modeling and statistical analyses of T cell receptor sequencing data, we develop a quantitative theory of human T cell dynamics compatible with the statistical laws of repertoire organization. We find that clonal expansions during a perinatal time window leave a longlasting imprint on the human T cell repertoire, which is only slowly reshaped by fluctuating clonal selection during adult life. Our work provides a mechanism for how early clonal dynamics imprint the hierarchy of T cell clone sizes with implications for pathogen defense and autoimmunity.
eLife digest
The human immune system develops a memory of pathogens that it encounters over its lifetime, allowing it to respond quickly to future infections. It does this partly through T cells, white blood cells that can recognize different pathogens. During an infection, the T cells that recognize the specific pathogen attacking the body will divide until a large number of clones of these T cells is available to help in the fight. After the infection clears, the immune system ‘keeps’ some of these cells so it can recognize the pathogen in the future, and respond quicker to an infection.
Over the course of their lives, people will be infected by many different pathogens, leading to a wide variety of T cells that each respond to one of these pathogens. However, it is not well understood how various infections throughout the human lifespan shape the overall population of different T cells.
Gaimann et al. used mathematical modelling to study how the composition of the immune system changes in people of different ages. Different populations of T cells – each specialized against a specific antigen – had been previously identified through genetic sequencing. Gaimann et al. analyzed their dynamics to show that many of the largest populations originate around birth, during the formation of the immune system.
These findings suggest a potential mechanism for how exposure to pathogens in infancy can influence the immune system much later in life. The results may also explain variations in how people respond to infections and in their risk of developing autoimmune conditions. This understanding could help develop new treatments or interventions to guide the immune system as it develops.
Introduction
The hallmark of adaptive immunity is the generation of diversity through genetic recombination and clonal selection. Their interplay balances the breadth and specificity of the ~10^{12} T cells in the human body (Figure 1A; Arstila et al., 1999; Farber et al., 2014): The genetic recombination of the T cell receptor (TCR) locus, termed VDJ recombination, generates an enormous potential diversity of receptors ranging from early estimates of ~10^{15} (Davis and Bjorkman, 1988) to more recent estimates of ~10^{61} (Mora and Walczak, 2016) different possible receptor TCRαβ heterodimers. Clonal selection expands the number of specific cells during an infection for effector functions, a fraction of which is retained over prolonged periods of time as immune memory (Ahmed and Gray, 1996; Farber et al., 2014).
Much progress has been made deciphering the mechanisms of regulation and control of T cell dynamics over the last few decades (Antia et al., 2005; Sallusto et al., 2010; Farber et al., 2014). However, much of that progress has focused on the dynamics of subsets of T cells specific to a particular antigen and has come from experiments in mice. An important open question is how exposures to many antigens over a human lifetime collectively shape our T cell repertoire (Farber et al., 2014; Davis and Brodin, 2018).
Highthroughput repertoire sequencing enables direct surveys of the diversity and clonal composition of T cells from human blood or tissue samples and thus promises to provide quantitative answers to this question (Robins et al., 2009; Thomas et al., 2014; Britanova et al., 2016; Emerson et al., 2017; Oakes et al., 2017; Thome et al., 2016; Robins et al., 2010; Qi et al., 2014; Lindau et al., 2019; TRACERx consortium et al., 2019). However, while the TCR locus provides a natural barcode for clonal lineages due to its large diversity, this same diversity also makes inferring past clonal dynamics a challenging inverse problem, in particular given practical limitations on sequencing depth and temporal resolution in longitudinal studies. Mathematical modeling can help address this challenge by solving the forward problem of linking clonal dynamics to emergent statistical patterns (Desponds et al., 2016; Lythe et al., 2016; Dessalles et al., 2019; AltanBonnet et al., 2020; de Greef et al., 2020). Comparing patterns to data can provide insights about dynamics from static snapshots of repertoire organization in different individuals. A particularly striking such pattern has been the observation of powerlaw scaling of clone sizes spanning several orders of magnitude (Robins et al., 2009; Thomas et al., 2014; Britanova et al., 2016; Emerson et al., 2017; Oakes et al., 2017). In a typical sample of T cells from peripheral blood, a large fraction of clones is only seen once within 10^{5}–10^{7} sampled sequences, while the most abundant clones account for more than 1% of all sequencing reads. Such powerlaw scaling of clone sizes has been shown to arise at steady state in models of fluctuating clonal selection driven by different antigen encounters (Desponds et al., 2016). However, it is unclear whether this mechanism alone is sufficient to explain how clone size scaling is established, and more broadly how variable the clonal hierarchy is over time.
Here, we develop a theory of T cell dynamics throughout the human lifespan based on the statistical laws of repertoire organization and dynamics revealed by cohort and longitudinal human TCR repertoire sequencing (Britanova et al., 2016; Emerson et al., 2017; Chu et al., 2019; Lindau et al., 2019). We find that clonal expansions during repertoire formation play an important role in establishing the clone size hierarchy, which is only slowly reshaped by fluctuating clonal selection during adult life.
Results
A scaling law of human T cell repertoire organization
An important statistic to summarize repertoire organization is the clone size distribution, which tabulates the number of clones found at different multiplicities within a repertoire or sample. Multiple previous studies have shown that these distributions are heavytailed (Robins et al., 2009; Thomas et al., 2014; Britanova et al., 2016; Emerson et al., 2017; Oakes et al., 2017). However, potential confounding by noise introduced during the sequencing process has remain debated (AltanBonnet et al., 2020) and systematic analyses of how variable these distributions are across healthy individuals have been lacking. To fill these gaps, we reanalyzed data from two largescale cohort repertoire sequencing studies of human blood samples, which used fundamentally different sequencing pipelines and thus have different sources of noise (Materials and methods – Experimental data sources). Both studies sequenced the locus coding for the hypervariable TCR CDR3 βchain from peripheral blood T cells of healthy human volunteers spanning a large range of ages (Figure 1—figure supplement 1). In both studies, T cells were sequenced without regard to their phenotypic characteristics. A clone thus represents the full lineage of cells derived from a common ancestral cell irrespective of differentiation status (see Appendix 5 – Relation between clone size and cellular phenotypes and Figure 1—figure supplement 2 for a phenotypically resolved analysis).
After normalizing clone sizes to account for variations in sampling depth and for the increasing fraction of T cells of memory phenotype with age (Figure 1—figure supplement 3), we found that the tails of the clone size distributions collapsed to the same statistical law across individuals and cohorts (Figure 1B,C): Ranking clones by decreasing size, the rank of the largest clones approximately scales with their size C as a power law,
where α is a scaling exponent. To quantify the apparent similarity of the scaling relationship, we determined α for each sample by maximum likelihood estimation. Only a small fraction of all T cells are sampled, which poses a challenge because subsampling a power law leads to deviations from scaling at small clone sizes (Stumpf et al., 2005). To overcome this challenge, we used a trimming procedure and excluded clones smaller than a minimal size from the fitting, which decreases bias arising from subsampling (Appendix 1). Determined in this subsamplingrobust manner the fitted powerlaw exponents agree remarkably well within the range of ages covered by both cohorts (Figure 1D): with α = 1.17 ± 0.03 (mean ± standard error [SE]) and α = 1.18 ± 0.01 in the Britanova cohort and Emerson cohort, respectively. Moreover, the fitted exponents varied little between individuals in both cohorts; with a sample standard deviation of fitted exponents of 0.14 and 0.21, respectively. The agreement of the mean exponents is noteworthy given the different sequencing pipelines and provides strong evidence that the scaling relationship (Equation 1) is a true feature of the clone size distribution and not of the measurement process.
What drives the emergence of a powerlaw distributed hierarchy of clone sizes? Given the reproducibility of the scaling law across individuals, we might hope for a statistical explanation independent of the precise antigenic history that has driven the expansion of specific cells in an individual. To test hypotheses about mechanisms underlying scaling, we describe repertoire dynamics using a general mathematical framework based on effective stochastic rate equations for the recruitment of new clones, and the proliferation and death of already existing clones within a T cell compartment (Materials and methods – Mathematical framework). In macroecology, where such reductionist approaches have a long history, simple neutral models within this framework have had surprising success in describing species abundance distributions only accounting for demographic stochasticity (Volkov et al., 2003), but this source of variability is insufficient to account for the observed breadth of T cell clone sizes (Desponds et al., 2016; de Greef et al., 2020; for a detailed discussion see Appendix 2). The failure of this null model has prompted a search for other mechanisms that explain scaling.
To constrain this search, we analyzed how fitted exponents varied with age. In particular, based on a finite time solution we derived for a previously proposed model (Desponds et al., 2016) of how powerlaw scaling can emerge from the cumulative effect of temporal fluctuations in clonal growth rates (Materials and methods – Slow convergence to steady–state scaling), we expected a substantially steeper tail in young individuals. While exponents overall decreased slightly with age, the dependence on age accounted for surprisingly little variation in both cohorts (Figure 1D), including when controlling for sex and cytomegalovirus (CMV) exposure status (Figure 1—figure supplement 4). Notably, scaling is established within the first decade of life, with significant clone size variability existing as early as at birth (Figure 1—figure supplement 5), defying previous model predictions.
A mechanism for the emergence of scaling during repertoire formation
We hypothesized that scaling might result from clonal expansions during repertoire formation, which would naturally explain the early onset of scaling. Our hypothesis is based on experimental evidence in mice (Le Campion et al., 2002; Min et al., 2003; Haluszczak et al., 2009; Kawabe et al., 2017) and human (Rufer et al., 1999; Schönland et al., 2003) that repertoire formation is driven not only by increased thymic output, but also by large proliferative expansion of some T cell clones. Additionally, multiple studies Hammarlund et al., 2003; Pogorelyy et al., 2017; Tanno et al., 2020 have shown that some T cell clones can persist over multiple decades, which suggests that clonal turnover might be sufficiently slow for transient expansionary dynamics early in life to shape repertoire organization over a prolonged time period (see also Appendix 2 –Relaxation time scale in a neutral model).
To test our hypothesis, we constructed a minimal model of repertoire formation based on known T cell biology (Figure 2A). Following previous work (Bains et al., 2009; Lythe et al., 2016), we assume that T cells proliferate at a rate inversely proportional to the total number of cells N already present in the repertoire. This assumption leads to increased proliferation early in life before the repertoire has reached its homeostatic size, for which there is experimental evidence Le Campion et al., 2002; Min et al., 2003; Haluszczak et al., 2009; Kawabe et al., 2017; Rufer et al., 1999; Schönland et al., 2003. This assumption is also compatible with a simple mechanistic model of T cell competition (Materials and methods – Mechanistic motivation for the competition function). We further assume that cells die at a rate d, and that new clones are recruited at rate θ with an initial size C_{0}. For simplicity, we set C_{0} equal to one in the following and we assume constant rates d and θ. Importantly, recruitment of new clones and total expansion of already existing clones maintain a constant ratio throughout development under these assumptions in line with findings that the fraction of cells with TCR excision circles, which are diluted during peripheral division, is constant during fetal development (Schönland et al., 2003) and infancy (Douek et al., 2001; Bains et al., 2009).
We simulated the model starting from an empty repertoire and found that large clones displayed powerlaw scaling (Figure 2B blue lines). The simulation results contrast with steadystate predictions (Figure 2B black line), where the model effectively reduces to the neutral null model introduced earlier (Materials and methods – Steadystate distribution). The powerlaw tail persisted over multiple decades of aging, much beyond the timescale of cellular turnover, 1/d = 5 years, assumed in the simulations. A mathematical analysis shows that relative timescales of clonal and cellular turnover are controlled by the control parameter $\gamma =\theta {C}_{0}/{b}_{0}$, which is the ratio of the contribution of recruitment and proliferation to overall compartment maintenance (Appendix 2). The long timescale of clonal turnover emerges because the chosen parameters are in the biological parameter regime $\gamma <1$, where most cell death is balanced by proliferation (den Braber et al., 2012; Macallan et al., 2017). Thus, we find that repertoire formation can produce transient but longlasting powerlaw scaling of clone sizes.
To obtain intuitive insight into how scaling is established, we developed a continuum theory of clonal dynamics during repertoire growth (Materials and methods – Continuum theory of clonal growth). We find that the clone size C_{i} of the ith clone recruited at time t_{i} follows a subexponential growth law $C}_{i}(t)={C}_{0}(t/{t}_{i}{)}^{1/(1+\gamma )$. Clones recruited early grow large deterministically until competition lowers proliferation rates below the death rate (Figure 2C, lower panel). Different clones are recruited at different times and thus have more or less time to grow (Figure 2C, upper panel), which leads to a clone size distribution that follows powerlaw scaling with an exponent $\alpha =1+\gamma$. We note that this origin of the powerlaw scaling is closely related to a wellknown generative mechanism for powerlaws first studied by Yule, 1924 (for a detailed discussion see Appendix 4 – A unified view on mechanisms generating power laws in different growth processes).
The predicted exponent closely matches simulation results for different values of γ (Figure 2D dashed lines). Intuitively, when recruitment rates are higher, clones founded early have less time to outgrow later competitors, and thus the power law is steeper (α is larger). Importantly, in the biological parameter regime in which proliferation dominates, $\gamma <1$, the exponent is compatible with experiments (Figure 1B–D). We thus find, that the model – without fine tuning of parameters – reproduces the observed scaling exponent.
To expose a basic mechanism capable of producing broad clone size distributions, we have kept the model deliberately simple. More detailed models demonstrate the conditions and limits on the generalizability of this mechanism (Appendix 3). Variable recruitment sizes only affect the distribution of small clones (Figure 2—figure supplement 1); while a saturation of proliferation rates, or competition between subsets of T cells for specific resources maintain distributions at small and intermediate sizes while leading to cutoffs for the largest clones (Figure 2—figure supplement 2 and Figure 2—figure supplement 3, respectively).
Longlived incumbency advantage shows early expansions imprint clone size hierarchy
Our proposed theory for the rapid emergence of scaling predicts that large clones have expanded massively during repertoire formation. To test this prediction, we need to trace the dynamics of early founded clones. To this end, we exploit a change in the recombination statistics taking place during fetal development (Feeney, 1991; Rechavi et al., 2015; Park et al., 2020; Figure 3A). While T cells are produced by the thymus from the late first trimester the enzyme terminal deoxynucleotidyl transferase (TdT), which inserts nontemplated nucleotides during VDJ recombination, is not expressed until the mid second trimester (Park et al., 2020). Therefore, many more T cells in fetal and neonatal blood have zero insertions than expected from the adult recombination statistics (Rechavi et al., 2015). This enables a statistical dating of individual clones in a repertoire based on their sequence (Sethna et al., 2017; Pogorelyy et al., 2017).
If our model is correct, we expect abundant clones to be more likely to have zero insertions than smaller clones. Analyzing data from the Emerson cohort, we find that zero insertion clones are indeed highly enriched within the most abundant clones (Figure 3B). This generalizes a previous report of such an enrichment within the naive compartment (Pogorelyy et al., 2017). The large cohort size allows us to perform a finegrained analysis of how the fraction of zeroinsertion clones depends on clonal abundance and age. We find that enrichment is particularly pronounced in the young and decreases with age at different speeds depending on clone size. Among the largest clones many more still have zero insertions than expected from the adult recombination statistics even multiple decades after repertoire formation. This suggests that the incumbent large clones created during repertoire formation are only slowly replaced by clones expanding later in life, similarly to what has been observed in mice (Gossel et al., 2017; Hogan et al., 2019).
Additional analyses rule out other potential explanations for the relation between insertion statistics and clonal abundance. First, sequences with zero insertions are similarly enriched among the largest clones in productive and unproductive sequences (Figure 3—figure supplement 1) demonstrating that convergent selection pressures during adult life are not a primary source of the higher abundance of these clones. Second, while abundant clones are also enriched for sequences with known antigen specificity (Figure 3—figure supplement 2) and sequences likely to be convergently recombined (Figure 3—figure supplement 3), these enrichments do not show the same striking dependence on age. Furthermore, we find that zero insertion clones are consistently less enriched in individuals infected by CMV (Figure 3—figure supplement 4), in contrast to the hypothesis that this infection might drive their expansion (Pogorelyy et al., 2017). Taken together, these analyses support the conclusion that dynamics during the perinatal time window of repertoire formation leave a longlasting imprint on the T cell clonal hierarchy well into adulthood.
Longitudinal clone size fluctuations predict the dynamics of the clone size hierarchy with aging
Building on this successful validation of a core prediction of our theory, we asked whether we could leverage the detailed pattern of enrichments at different ranks and ages to quantify how much being part of the wave of early expansions determines the fate of a clone relative to other sources of clone size variability. To this end, we extended our model beyond repertoire formation and allowed clonal proliferation rates to fluctuate over time to model the net effect of clonal selection by changing antigenic stimuli during adult life (Desponds et al., 2016). Specifically, we added a stochastic term to the growth rate of each clone to provide an effective Langevin description of the longterm dynamics induced by fastchanging antigen stimuli,
where $\u27e8{\eta}_{i}(t){\eta}_{j}({t}^{\prime})\u27e9={\delta}_{ij}\delta (t{t}^{\prime})$.
To determine a biologically plausible fluctuation strength $\sigma $, we analyzed the variability of clone sizes over time in a longitudinal repertoire sequencing study (Chu et al., 2019). We first analyzed how much recently expanded clones contribute to the tail of the clone sizes, and found that only a small fraction of the largest clones in any sample were not already large at the earliest time point (Figure 4A and Figure 4—figure supplement 1). To minimize confounding by transient dynamics affecting these clones, we excluded them from further analysis. We found that large clones had remarkably stable abundances over time, which we quantified by calculating the variance of logfoldchanges in clone size between the second and every subsequent time point (Figure 4B and Figure 4—figure supplement 2). The variability of clone sizes increased linearly over time as expected theoretically, from which we determined a magnitude of net growth rate fluctuations $\sigma $ compatible with the slope of increase (Materials and methods – Modeling longterm repertoire dynamics with fluctuating clonal growth rates).
Using the fitted fluctuation strength, we constructed an in silico cohort of individuals of different ages according to the extended model (Materials and methods – Simulated cohort). In short, we computationally assigned each newly recruited clone to have zero insertions in a way that mimics the change in fetal recombination statistics, and we simulated memory repertoire dynamics based on the combined effect of early expansion and fluctuating clonal selection (Equation 2). The enrichment of zero insertion clones in the simulated cohort (Figure 4C) closely recapitulated the empirical findings using plausible parameter values. Notably, the longer lasting enrichment of zero insertion clones among the very largest clones is also found in the simulated cohort, and the timescales over which the enrichment decays agree remarkably well.
To obtain analytical insight into the enrichment dynamics, we studied how fluctuating growth rates reorder the clone size hierarchy established during repertoire formation (Materials and methods – Relaxation of the zero insertion distribution). The early clone size hierarchy in our model is dominated by the time of recruitment, which leads to a steep gradient in the zero insertion probabilities as a function of rank in the first decade of life (Figure 4C). We thus calculated how the zero insertion probabilities ${P}_{0}(r,t)$ for clones of rank r at time t change due to fluctuating clonal growth starting from an idealized initial condition resembling the early clone size hierarchy, in which the clone sizes follow powerlaw scaling and the ${r}^{\star}$ largest clones have a zero insertion probability ${p}_{0,}$ and all others a probability ${p}_{0,+}$. We find that the probability of zero insertion clones then follows a sigmoidal shape as a function of log clone size rank, which changes with age as follows,
where $\mathrm{\Delta}{p}_{0}={p}_{0,}{p}_{0,+}$ is the difference of zero insertion probabilities, ${\tau}_{d}$ a characteristic timescale, and $\mathrm{e}\mathrm{r}\mathrm{f}\mathrm{c}(x)$ the complementary error function. These analytical results suggest a twoparameter rescaling of the enrichment of zero insertion clones as a general test of our theory. To demonstrate the feasibility of fitting these parameters from the enrichment dynamics, we determined them from the simulated data using weighted least squares fits setting r and t in Equation 3 to the midvalue of each bin. Rescaling the data with the fitted ${r}^{\star}$ and ${\tau}_{d}$ lead to a collapse of all simulated datapoints onto the predicted sigmoidal curve (Figure 4—figure supplement 3). We then applied the same fitting and rescaling procedure to the experimental data, and found that it also leads to a remarkably good data collapse (Figure 3C).
The fitted parameters quantify key features of longterm repertoire dynamics, with ${\tau}_{d}$ characterizing the timescale over which fluctuations change the clone size hierarchy, and ${r}^{\star}$ being related to the number of clones recruited during early repertoire growth. In line with the longlived enrichment of zero insertion clones, the fitting reveals a remarkably slow timescale of about a decade over which the clone size hierarchy is reordered during healthy aging. The fitted ${r}^{\star}$ indicates that early repertoire formation involves the expansion of a large number of different clones. Overall, the agreement between theory and data demonstrates that our model quantitatively captures how early expansions and ongoing fluctuating selection together shape the clone size hierarchy.
Discussion
The evolution of the adaptive immune system has endowed vertebrates with the ability to adapt to pathogens that evolve on a timescale faster than host reproduction (Mayer et al., 2016). However, this ability comes with a cost: every generation needs to rebuild immune memory anew. As the organism first comes into contact with the outside world, it quickly needs to train its adaptive immune system to tolerate innocuous antigens and build up immune memory against pathogens. Here, we have shown that this process of rapid adaptation leaves a longlasting imprint on the organization of the human T cell repertoire. More broadly, we propose a theory of repertoire dynamics that quantitatively describes how early expansions during repertoire formation combine with a lifetime of exposures to cumulatively shape the T cell hierarchy. Notably, we find that the T cell repertoire is remarkably stable over time in adult individuals outside of the punctuated expansions and contractions of specific clones in acute responses. Our study demonstrates that despite its vast complexity, repertoire dynamics is partially predictable by quantitative models. The model predictions can help guide future longitudinal studies, which in turn will allow refinements of modeling assumptions. The current work thus provides a stepping stone toward a detailed quantitative understanding of T cell dynamics that we hope will ultimately power the rational development of immunodiagnostics and therapeutics.
The general mechanism we describe for imprinting in the adaptive immune system provides a unified lens through which to view a number of converging lines of evidence about how a developmental time window shapes adaptive immunity (GueraudeArellano et al., 2009; Farber et al., 2014; Gostic et al., 2016; Constantinides et al., 2019; Li et al., 2019; Davenport et al., 2020; Hong et al., 2020). In our model, overall repertoire growth early in life amplifies the effect of any early exposures, as the responding clones continue to proliferate as memory cells and thus are much larger than memory produced from similar exposures after the homeostatic repertoire size is reached. We thus expect early pathogen exposures to be particularly potent, as has been observed in influenza, where disease severity across age cohorts for different strains depends on the first exposure (Gostic et al., 2016). Conversely, we expect the presence of tolerizing factors early in life to be particularly crucial during repertoire formation to avoid autoimmunity, as has been observed for the autoimmune regulator gene AIRE, for which expression is only essential during a perinatal time window (GueraudeArellano et al., 2009).
A limitation of datasets used in this study is that they do not provide direct information about the phenotypic characteristics of cells belonging to different clones. Repertoire sequencing of phenotypically sorted blood samples shows that the largest clones predominantly consist of cells with memory phenotype (Appendix 5). This indirectly suggests that the clonal expansions during repertoire formation produce memory cells as we have assumed in our simulated cohort (Figure 4C). Supporting this interpretation, a substantial number of memory cells circulate in the blood quickly following birth (Pediatric AIDS Clinical Trials Group et al., 2003) and recent evidence suggests that memorylike T cells are already generated in particular tissue sites such as the intestine even before birth (Zhang et al., 2014; Li et al., 2019). However alternatively, early expansions could also set up a broad distribution of naive T cell clone sizes (de Greef et al., 2020), whose hierarchy would then need to be roughly maintained during the transition into memory to be compatible with the observed impact of early expansions on the hierarchy of the most abundant clones. Advances in repertoire sequencing of T cells sorted with increasing granularity using cell surface markers (Oakes et al., 2017; Soto et al., 2020) along with advances in singlecell technologies linking TCR sequencing and cellular phenotyping could help differentiate between these scenarios in the future.
An important question raised by our work is which antigens drive the expansion of early T cell clones. To address this question, it will be necessary to determine the exposures that imprint the abundance of these clones, as has been done recently for mucosalassociated invariant T cells (Constantinides et al., 2019), a subset of nonconventional T cells. Going forward, the highly abundant clones with sequences close to the genetically inherited gene templates resulting from the absence of TdT expression during early fetal development are a particularly interesting target of study. They might constitute an evolutionarily controlled set of innatelike defenses within the adaptive immune system. Determining what imprints their abundances will help resolve the question of whether their large abundances are simply a byproduct of rapid repertoire formation or whether these clones serve particular functions.
Materials and methods
Experimental data sources
Request a detailed protocolWe analyzed T cell repertoire sequencing data from two large published cohort studies of healthy human volunteers by Britanova et al., 2016 and Emerson et al., 2017 and from a longitudinal study by Chu et al., 2019, detailed descriptions of which we provide in the following.
In short, Britanova et al., 2016 sequenced reverse transcribed mRNA with added unique molecular identifiers (UMIs), while Emerson et al., 2017 and Chu et al., 2019 sequenced genomic DNA coding for this region without the addition of UMIs. These approaches have complementary strengths: The addition of UMIs allows to correct for stochasticity during PCR amplification and sequencing artifacts, while DNA sequencing removes the influence of celltocell gene expression heterogeneity.
The Britanova cohort comprises 71 individuals spanning ages 6–103 years, as well as eight cord blood samples. The Emerson cohort spans ages 1–74 years and consists of a training and validation set of 666 and 120 individuals, respectively. From the training set, we excluded 111 samples with missing age information and 62 samples with a conflicting data format. We used only samples from the training set to analyze how the scaling law of repertoire organization changes with age (Figure 1). For the zero insertion enrichment analyses (Figure 3B,C), we combined both the training and validation set together with separately published repertoire sequencing data from eight elderly individuals Lindau et al., 2019 generated using the same experimental pipeline (immunoSEQ, Adaptive Biotechnologies, Seattle) to achieve the broadest possible coverage of all age groups.
The longitudinal study by Chu et al., 2019 performed repertoire sequencing of peripheral blood from three healthy female volunteers (using the immunoSEQ pipeline) over eight time points spanning a ~1 year time frame. One individual in the study was in midadulthood (24–45 years, Subject 3 in the original study), while two were in early adulthood (18–24 years, Subjects 1 and 2 in the original study). In Figure 4A,B, we display data from the older individual as we expect dynamics of large clones to be masked less by measurement noise as the large clones increase in relative abundance with age.
All studies from which we analyzed data sequenced the locus coding for the TCR CDR3 βchain only, and we thus define clones as collections of cells sharing the same CDR3 βchain. Clone sizes are defined as the number of distinct unique molecular identifiers (UMIs) sequenced (Britanova cohort), or based on sequencing reads (Emerson cohort). The definition of a clone solely based on the CDR3 βchain neglects convergent recombination of the most easily produced receptors with different CDR3 αchains, but we expect convergent recombination to be sufficiently rare overall for this distinction not to qualitatively affect clone size distributions.
For all studies we used data, which was preprocessed as described in the original study. This data is publicly available using the links provided in Table 1.
We also used flow cytometry data on the fraction of naive cells from Britanova et al., 2014 (available at https://doi.org/10.1371/journal.pcbi.1005572.s016) and from Pediatric AIDS Clinical Trials Group et al., 2003.
Data analysis
Fitting powerlaw exponents
Request a detailed protocolWe estimate the powerlaw exponent from sampled clone sizes $\{{C}_{i}\}$, $i=1,\mathrm{\dots},M$, which exceed a minimal size ${C}_{min}$ by numerically maximizing the loglikelihood of the data (Clauset et al., 2009),
where $\zeta (x,k)$ is the incomplete Riemann zeta function. We use ${C}_{min}=16$ for both cohorts, which provides a balance between minimizing bias of the estimated exponents induced by subsampling while not overly increasing the variance of the estimator by excluding most of the data (see Appendix 1—figure 2).
Fitting the zero insertion profiles
Request a detailed protocolTo fit the zero insertion fractions to the theory prediction (Equation 3) we determine the values for ${r}^{\star}$ and ${\tau}_{d}$ by a weighted least squares fit. We set r and t to the midvalue of each bin for the data. We weight each value by the sum of its empirical standard error and a fixed model specification error of $2\cdot {10}^{3}$ chosen based on fits to the simulated cohort (Figure 4—figure supplement 3). To demonstrate the feasibility of the parameter inference, we reinferred the parameters from the simulated data and recovered those used as parameter values for the simulation. We also fitted the values of ${p}_{0,}$ and ${p}_{0,+}$, but we note that they are not used in the rescaling and are only needed to display the theoretical curve (Equation 3).
Normalization of clone sizes
Request a detailed protocolVariations in sampling depth can confound comparisons of clone sizes (Appendix 1). Intuitively, if we sample more cells overall we also expect to sample proportionally more cells belonging to each given clone. This suggests to use the frequency with which cells are sampled from a given clone as a more robust measure, which can be empirically estimated by normalizing each clone size by the total sample size. We further normalize clone sizes by the fraction of memory T cells found in people of different ages to account for the increase in memory cell fraction in peripheral blood with age (Appendix 5). Together these two normalization steps lead to a large degree of data collapse as compared to unnormalized clone sizes (Figure 1—figure supplement 3).
Regression analyses
Request a detailed protocolWe determine 95% confidence intervals on regression lines by bootstrapping using case resampling (Efron and Hastie, 2016).
Mathematical framework
Request a detailed protocolWe describe T cell dynamics using the following general set of stochastic rate equations. The class of models we consider are known in the mathematical literature as birthdeathimmigration models. The number of cells ${C}_{i}$, $i=1,\mathrm{\dots},M$ of each of the M clones in the repertoire changes according to
where the rate of proliferation ${b}_{i}(\mathit{\bm{C}},\mathit{\bm{X}},t)$ or cell death ${d}_{i}(\mathit{\bm{C}},\mathit{\bm{X}},t)$ generally can depend on the repertoire composition $\mathit{\bm{C}}$, on the time t, and on the state of the environment $\mathit{\bm{X}}(t)$ representing for example the levels of different antigens and cytokines in the organism at a given time. We furthermore consider that new clones are added at rate $\theta (\mathit{\bm{X}},t)$ at a size C_{0},
This recruitment represents thymic output and antigendriven differentiation of naive cells for the naive and memory compartment, respectively.
where $N(t)={\sum}_{j=1}^{M(t)}{C}_{j}(t)$ is the total repertoire size. In the results section 'Longlived incumbency advantage shows early expansions imprint clone size hierarchy' w modify this model by adding a noise term that describes the effective influence of environmental variations $\mathit{\bm{X}}(t)$ on clonal proliferation,
where $\u27e8{\eta}_{i}(t){\eta}_{j}({t}^{\prime})\u27e9={\delta}_{ij}\delta (t{t}^{\prime})$.
Modeling repertoire formation
Mechanistic motivation for the competition function
Request a detailed protocolWe consider a population of N T cells that proliferate at a rate proportional to the concentration S of a set of stimuli (stimulatory cytokines), $b\propto S$. We assume that the cytokines are produced by other cells at some fixed rate p and degraded at a basal rate q. We further assume that competition between T cells is mediated by their consumption of cytokines. The dynamics of S is then described by
where $kSN$ is a mass action term describing how T cells lower cytokine levels. Assuming a separation of timescales in which cytokine concentrations change quickly we obtain the quasi steady state approximation
When the consumption term dominates relative to basal decay, $kN\gg q$, we obtain $b\propto S\propto 1/N$.
Meanfield competition approximation
Request a detailed protocolWe simplify the full stochastic model (Equation 5–Equation 7) using a meanfield approximation for the competition, which decouples the dynamics of individual clones while retaining the full stochasticity on the clonal level. This approximation replaces the dependence of the proliferation rate on N by a dependence on its continuum theory average given by Equation 14. We exactly simulated a system of reduced size to validate the meanfield approximation. The distributions of the exact and meanfield simulations agree to within stochasticity (Figure 5), with the exception of the largest clone, which is larger in the exact simulations as has been discussed elsewhere (Dodds et al., 2017).
Continuum theory of clonal growth
Request a detailed protocolTo obtain insight into why the model produces powerlaw scaling we present a simple continuum theory of early clonal dynamics. We approximate the clone size dynamics of the ith clone ${C}_{i}$ as
with ${C}_{i}({t}_{i})={C}_{0}$ at the time of recruitment t_{i}. The total repertoire size $N={\sum}_{i}{C}_{i}$ evolves according to
whose solution is given by
For times large compared to $1/d$ the total repertoire size given in Equation 14 reaches a steadystate,
because competition for proliferation signals acts as a homeostatic regulator. By combining Equation 14 and Equation 12 we derive the clonal growth law
where $\gamma $ as in Appendix 2 is the recruitment to proliferation ratio which in this model is given by $\gamma =\theta {C}_{0}/{b}_{0}$. To simplify we expand the growth law at leading order for small times, ${t}_{i}<t\ll 1/d$, to obtain
This expression can also be derived directly by noting that early repertoire growth is linear $N(t)\approx ({b}_{0}+\theta {C}_{0})t$, and that the early dynamics is dominated by proliferation and not death such that
which is solved by Equation 17. Given the constant recruitment of new clones the distribution of the t_{i}’s is uniform, which with Equation 17 implies a clone size distribution
that follows powerlaw scaling with an adjustable exponent that depends on $\gamma $. Note that the exponent for $P(C)$ differs by one from the exponent for the rank (Clauset et al., 2009), which is a complementary cumulative distribution, and thus $\alpha =1+\gamma $.
Steadystate distribution
Request a detailed protocolTo derive the powerlaw scaling, we have expanded the total repertoire size for small times (or death rates). How does the clone size distribution change later in life? At large times, the division rate ${b}_{0}/N(t)$ falls below the constant death rate d as the steadystate repertoire size ${N}_{\mathrm{\infty}}$ is approached following Equation 14. In this model this happens at a time ${t}^{\star}\simeq \mathrm{log}(1+1/\gamma )/d$, after which the large clones experience a deterministic force toward extinction. For times $t\gg {t}^{\star}$ the model effectively reduces to the neutral birthdeath dynamics considered in Appendix 2. (The growth rate fluctuations produced by variations of the total population size around steady state asymptotically vanish for large ${N}_{\mathrm{\infty}}$.) We thus expect the steadystate clone size distribution to be equivalent to that of the neutral model (Equation 48). Indeed this distribution accurately describes the distribution of small clones in old age (Figure 2B). The neutral distribution is not compatible with data as discussed before. However, the timescale over which large early founded clones vanish is long (Appendix 2) such that a tail of large clones resulting from the early growth dynamics can be maintained much beyond ${t}^{\star}$ until $t\gg {\tau}_{c}$ (see also Appendix 2 –Relaxation time scale in a neutral model).
Modeling longterm repertoire dynamics with fluctuating clonal growth rates
Slow convergence to steadystate scaling
Request a detailed protocolMultiplicative stochastic processes are a classical generative mechanisms for heavytailed distributions (Sornette and Cont, 1997; Gabaix, 1999; Newman, 2005). In the context of lymphocyte dynamics this mechanism has first been proposed by Desponds et al., 2016, who argued that fluctuations in antigen availability can lead to multiplicative stochastic dynamics producing powerlaw scaling at steady state. Here, we expand on this earlier work by analyzing a simple fluctuating fitness model outofsteadystate. Our analytical results show that the emergence of scaling can be slow when the fluctuation amplitude is small.
We opted to treat proliferation rate fluctuations as temporally uncorrelated for computational tractability (Equation 2). Correlations in proliferation rate fluctuations are clearly an important feature of shortterm dynamics – for example to describe the quick expansion and contraction during and following acute infection over a timescales of days and weeks, respectively (Mayer et al., 2019). However, given finite correlation times we expect to be able to capture dynamics over the long timescales which we are interested in here, with uncorrelated noise with an effective net fluctuation strength that averages over the shortterm dynamics.
In this limit clone sizes follow a geometric Brownian motion, that is $x=\mathrm{log}C/{C}_{0}$ follows the Langevin equation
with initial condition $x({t}_{i})=0$, where $\sigma $ sets the fluctuation strength and where $\u27e8{\eta}_{i}(t){\eta}_{j}({t}^{\prime})\u27e9={\delta}_{ij}\delta (t{t}^{\prime})$. A negative mean fitness ${f}_{0}<0$ balances the recruitment of new clones and the net expansion induced by the fluctuating term. In general, we might want to include also demographic noise and the extinction of clones as an absorbing boundary condition (Desponds et al., 2016), but here for simplicity we will neglect those effects. Equation 20 is a diffusion equation for the logarithmic clone size x and has the wellknown Green’s function
which describes how the distribution spreads out from an initial $\delta $distribution centered at size y. The clone size distribution at time T is given by
where t is the clonal age. For a constant immigration rate t is uniformly distributed and we obtain by integration
where $\theta (x)$ is the Heaviside step function, $\theta (x)=0$ for $x<0$ and $\theta (x)=1$ otherwise. For large T and $x>0$ this reduces to
which implies
recovering the steadystate result from Desponds et al., 2016.
Setting ${f}_{0}=\alpha {\sigma}^{2}$ and rescaling age as $\tau =T{\sigma}^{2}$, we can rewrite the finite time solution as
Plotting the cumulative distribution of clone sizes at different effective ages (Figure 6) we observe that the convergence of clone size distributions is slow when ${\sigma}^{2}$ is small. Based on estimates for the fluctuation strength from longitudinal data (Figure 4B), we would expect significant deviations from the steady state powerlaw scaling that persist into adulthood. Thus, this mechanism alone is unable to account for the observed powerlaw scaling in data.
A note on the scaling exponent
Request a detailed protocolA minimal requirement for the existence of a steady state is ${f}_{0}<0$ ensuring that clones eventually die to balance the recruitment of new clones. This condition still allows such multiplicative processes to produce powerlaws with arbitrary exponents as noted before (Desponds et al., 2016). Here, we propose that the parameters should fulfill a stronger condition. In particular, it seems reasonable to require that the large clones do not deterministically take up a larger fraction of the overall repertoire, or equivalently that their expected change in clone size should not exceed one. The mean of the lognormal distribution of clone size change is given by ${e}^{{f}_{0}+{\sigma}^{2}}$, and thus we find the stronger condition
Importantly, it follows that exponents in the vicinity of $\alpha =1$ arise without finetuning as long as the timescale of expected net clonal decay is large compared to the diffusion timescale.
Another perspective on the parameterization is provided by noting that the Langevin equation for C (not $x=\mathrm{log}C$) in the Stratonovich convention includes an extra drift term ${\sigma}^{2}$, to keep $\u27e8\mathrm{\Delta}C\u27e9$ independent of the choice of $\sigma $. Alternatively, in the Ito convention the extra drift term arises by Ito’s lemma when transforming the equation from C to x.
Predictions for longitudinal fluctuations in clone sizes
Request a detailed protocolTo quantify longitudinal fluctuations, we calculate the mean and variance of logclonesize changes with respect to a reference time t_{0}. From the model we, according to Equation 21, expect
The variance of logclonesize changes in empirical data involves an additional term ${\sigma}_{S}^{2}$ accounting for sampletosample variability. This term is expected not to depend on the time difference, and we can thus determine ${\sigma}^{2}$ by linear regression with an intercept that captures the sampling variability ${\sigma}_{S}^{2}$ (Figure 4B).
We note that a similar approach has been independently proposed in unpublished work by Ferri, 2018.
Relaxation of the zero insertion distribution
Request a detailed protocolHere, we solve for the relaxation dynamics of the zero insertion distribution in a simplified setting. Throughout we use log clone sizes $x=\mathrm{log}C$ for notational convenience. We posit that at time 0 the powerlaw distribution $P(x,0)=\alpha {e}^{\alpha x}$ is already established and we further assume that the ${r}^{\star}$ largest clones have zero insertion probability ${p}_{0,}$ and all smaller or later added clones have probability ${p}_{0,+}$. Then the probability that a clone of a given size x has zero insertions is given by
where $\mathrm{\Delta}{p}_{0}={p}_{0,}{p}_{0,+}$ and ${f}_{early}(x,t)$ is the fraction of clones of size x and time t that derive from the ${r}^{\star}$ largest clones at time 0.
In the following, we determine an analytical formula for ${f}_{early}(x,t)$ under the assumption that the dynamics leaves the distribution unchanged $P(x,t)=P(x,0)$. We then have
where $G(x,y,t)$ as before is the Green’s function of the fluctuating proliferation rate dynamics and ${x}_{min}$ is defined such that the total number of clones times $P(x>{x}_{min})$ equals ${r}^{\star}$. By integration one obtains
which after setting ${f}_{0}=\alpha {\sigma}^{2}$ reduces to
To convert clone size into ranks, we note that $\mathrm{rank}\sim {e}^{\alpha x}$ and thus ${x}_{min}x\sim \frac{1}{\alpha}\mathrm{log}\left(\frac{r}{{r}^{\star}}\right)$. In combination with Equation 33 and Equation 30 we thus obtain
Defining a characteristic timescale for the diffusive dynamics as ${\tau}_{d}=1/{(\alpha \sigma )}^{2}$, we can simplify this expression to
Simulation procedures
Repertoire formation
Request a detailed protocolTo simulate the model efficiently at large scales, we use a meanfield competition approximation (Materials and methods – Mean–field competition approximation). We verified the validity of the meanfield assumption by comparing them to full stochastic simulations of the coupled birthdeathimmigration equations, which we simulated using the Gillespie algorithm (Press et al., 2007; Figure 5). In the meanfield approximation, the proliferation rate is timedependent, which requires a specific procedure for sampling event times. The time interval until the next event depends on the total rate for all possible processes $\lambda (t)=\theta +b(t)+d.$ To sample an interval of time $\mathrm{\Delta}t$ between two events from an inhomogeneous Poisson process of rate $\lambda (t)$ one can sample from a Poisson process with a rate function ${\lambda}^{\star}(t)$ fulfilling the majoration condition ${\lambda}^{\star}\ge \lambda (t)\forall t$ and then reject a proposed time interval $\mathrm{\Delta}{t}^{\star}$ with a probability of $1\lambda (t+\mathrm{\Delta}{t}^{\star})/{\lambda}^{\star}(t+\mathrm{\Delta}{t}^{\star})$ (Lewis and Shedler, 1979). The thinned set of event times follows the statistics of the Poisson process with rate $\lambda (t)$. Here, because competition is increasing with time, $\lambda (t)$ decreases monotonically. Therefore, the homogeneous Poisson process with a constant rate function ${\lambda}^{\star}(t)=\lambda ({t}_{0})$, satisfies the majoration condition. Using this thinning technique, we are able to efficiently sample the next event time while accounting for the timedependence of the proliferation rate. The source code that allows reproduction of the statistical analyses and numerical results reported in the manuscript is published (Gaimann et al., 2020).
Simulated cohort
Request a detailed protocolAs empirical evidence shows that the tail of the clone size distribution is almost exclusively driven by cells with memory phenotype (Appendix 5), we focused on the clone size dynamics within the memory compartment. We assumed that the recruitment size for memory cells is independent of the prior naive cell dynamics, and we thus did not explicitly model the clone size dynamics within the naive compartment. Within the memory compartment, we modeled clone size dynamics under the combined effect of early deterministic expansions during repertoire formation and fluctuating clonal growth rates according to Equation 2. Given the large sizes of memory clones, we expect demographic stochasticity to be negligible relative to clone size variability introduced by fluctuating selection. For tractability, we thus ignored demographic fluctuations, which allowed us to combine the continuum solution to the deterministic clonal growth (Equation 16) with the stochastic propagator for the fluctuating dynamics (Equation 21) to efficiently simulate the dynamics. To study the enrichment of zero insertion clones in silico, we assigned newly recruited memory clones as having zero insertions with a probability equal to the fraction ${p}_{0}(t)$ of zero insertion clones within the naive compartment. We assumed ${p}_{0}(t)={p}_{0,}$ before TdT expression turnon at time ${t}^{\u2020}$ and ${p}_{0}(t)={p}_{0,}t/{t}^{\u2020}+{p}_{0,+}(1t/{t}^{\u2020})$ for $t>{t}^{\u2020}$, where $t/{t}^{\u2020}$ is the fraction of naive clones produced since the switch to the adult recombination statistics. Taken together, these simplifications lead to the following direct sampling scheme:
Sample the age T of an individual uniformly from the range $[0,80]$ years.
Set the number of clones equal to $\theta T$ (rounded to the nearest integer), where $\theta $ is the rate of recruitment of new clones to the memory compartment.
For each clone determine its recruitment time t_{i} by drawing uniformly from the range $[0,T]$.
Assign each clone as having zero insertions with a probability
$p}_{0}(t)=\{\begin{array}{ll}{p}_{0,}& t<{t}^{\u2020}\\ {p}_{0,}t/{t}^{\u2020}+{p}_{0,+}(1t/{t}^{\u2020})& \mathrm{o}\mathrm{t}\mathrm{h}\mathrm{e}\mathrm{r}\mathrm{w}\mathrm{i}\mathrm{s}\mathrm{e}\end{array$Sample the size ${C}_{i}(T)$ of each clone as follows (Equation 16 and Equation 21),
${C}_{i}=\mathrm{exp}({x}_{i}),\phantom{\rule{1em}{0ex}}{x}_{i}\sim \mathcal{\mathcal{N}}\left(d(T{t}_{i})+\frac{1}{1+\gamma}\mathrm{log}\left(\frac{{e}^{dT}1}{{e}^{d{t}_{i}}1}\right){\sigma}^{2}(T{t}_{i}),2{\sigma}^{2}(T{t}_{i})\right),\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}(36)$
where $d,\gamma ,{\sigma}^{2}$ are model parameters and $y\sim N(\mu ,{\sigma}^{2})$ indicates x being drawn from a normal distribution of mean µ and variance ${\sigma}^{2}$.
Finally to mimic the experimental sampling depth of ${N}_{sample}$ reads we determine sampled clone sizes ${\stackrel{~}{C}}_{i}$ by Poisson sampling,
${\stackrel{~}{C}}_{i}\sim \mathsf{\text{Pois}}({N}_{sample}\cdot {C}_{i}/N),\phantom{\rule{1em}{0ex}}\text{with}\phantom{\rule{thickmathspace}{0ex}}N=\sum _{i}{C}_{i},\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}(37)$
where $x\sim \mathrm{P}\mathrm{o}\mathrm{i}\mathrm{s}(\lambda )$ indicates x being drawn from a Poisson distribution of parameter $\lambda $.
Parameter choices
Request a detailed protocolIn the following, we summarize the parameters we used to simulate repertoire dynamics and provide additional motivation for our parameter choices.
Lifetimes of several years and several months have been measured by deuterium labeling for naive and memory T cells, respectively (De Boer and Perelson, 2013; Borghans et al., 2018). Clonal turnover can be substantially slower than cellular turnover when proliferation balances most death (Appendix 2). This has been shown to be the case for the maintenance of naive cells in human (Zhang et al., 2014), where the agingassociated decline of the fraction of T cells with TCR excision circles (TRECs) suggests $\gamma \sim 0.1$. Similarly, memory T cell numbers decline much more slowly overall than suggested by the deuterium labeling literature, which is thought to be driven by homeostatic proliferation in the absence of reinfection (Macallan et al., 2017). For example, T cell memory has been observed to decline with halflives of 8–15 years by following titers after small pox vaccination (Hammarlund et al., 2003). Additionally, the relatively short average lifetime of memory T cells likely masks substantial heterogeneity with a subset of more longlived cells also contributing to the slower longterm decline of memory cells (Akondy et al., 2017). Another line of direct evidence for long clonal persistence has come from two studies of identical twins (Pogorelyy et al., 2017; Tanno et al., 2020), which have shown an excess sharing of identical clones decades after in utero blood exchange in monochorionic twins.
To simulate repertoire formation (Figure 2B), we used a set of parameters summarized in Table 2. In choosing these parameters, we were not trying to reproduce exact values of any particular T cell subset, but rather illustrate a plausible biological parameter regime that characterizes T cell dynamics. We note that qualitatively the scaling presented in Figure 2B only depends on $\gamma $ as shown by our theoretical analysis, in a way that is shown in Figure 2D. For the recruitment rate $\theta $, we used a rate intermediate between those suggested by estimates of thymic output and the rate of recruitment of memory clones suggested by estimates of the diversity of the memory compartment (see below). We note that under meanfield competition the rate of recruitment $\theta $ only determines the overall number of clones, but does not influence the dynamics of individual clones. While $\theta $ decreases during adulthood for both the naive compartment (as thymic production wanes with age) and the memory compartment (as new primary responses are rarer at advanced age), we used a constant $\theta $ for simplicity. We expect this simplifying assumption not to qualitatively affect the results in Figure 2 due to a separation of timescales: The clone size scaling emerging from the expansionary dynamics only depends on the rate $\theta $ during infancy and not on slower changes in $\theta $ happening during adulthood. We chose a recruitment size of one to numerically investigate potential deviations from the continuum theory predictions due to demographic stochasticity. The number of recruited clones is of course much larger in practice in the memory compartment, and even naive cells undergo a few rounds of division before thymic export and thus have ${C}_{0}>1$. Assuming a larger C_{0} will further decrease the small effects of demographic stochasticity relative to the continuum theory.
To study the enrichment of zero insertion clones in a simulated cohort (Figure 4C), we used the same recruitment to proliferation ratio and death rate as in the previous simulation of repertoire formation. To determine the absolute number of large clones that have zero insertions in these simulations, the choice of the recruitment rate $\theta $ is important. Based on orderofmagnitude estimates of the clonal diversity of the memory compartment (Robins et al., 2009; Qi et al., 2014), we chose a value of $\theta ={10}^{5}$/year. Additionally, we chose a fraction of zero insertion clones within the early naive compartment of ${p}_{0,}=0.07$ (roughly equal to their overall fraction in cord blood [Pogorelyy et al., 2017]) and in the late naive compartment equal to ${p}_{0,+}=0.02$ (roughly equal to their overall fraction in adult blood). Finally, we used ${t}^{\u2020}=0.05$ years for the time of the recombination switch, which together with the choice of $\theta $ produces ~10^{4} excess zero insertion clones recruited during repertoire formation in line with the empirical enrichment data in the <10 years age group (Figure 3B). All parameters are summarized in Table 3.
Appendix 1
Subsampling scaling
Only a small fraction of the ~10^{12} T cells in the human body are sampled by repertoire sequencing. What effect does subsampling have on the clone size distribution? In the following, we discuss how subsampling affects the distribution of sampled clone sizes and we discuss analysis techniques for robust inferences and data visualization despite variations in sampling depth.
Inference of scaling exponent
Given a clone of size C in the repertoire, the number of reads from that clone $\stackrel{~}{C}$ follows a distribution $P(\stackrel{~}{C}C)$. The form of $P(\stackrel{~}{C}C)$ depends on the sampling process. To build intuition let us consider the simplest case, in which every cell is sampled independently with a probability $\eta $, the subsampling fraction. Then the sampling distribution is binomial
The mean of this distribution is
which implies that sampled clone sizes are on average smaller by a factor $\eta $ than the actual clone size. In the practically relevant limit where the sampling fraction is small, $\eta \ll 1$, we can further simplify and assume that the counts from the large clones follow a Poisson distribution. In the Poisson limit, the sampled clone size varies around its mean value with a coefficient of variation that scales as an inverse of the square root of the mean sampled count,
Importantly, the stochastic sampling introduces a subsampling scale, $\stackrel{~}{C}=\eta C\sim 1$, at the clone size $C=1/\eta $, from which on average we expect a single sampled cell. Due to the existence of this scale subsampling breaks scaleinvariance: even if $P(C)$ follows a perfect power law, the distribution of sampled counts
deviates from powerlaw scaling close to $\stackrel{~}{C}=1$. This intuition can be made rigorous using a generating function formalism (Stumpf et al., 2005): for example for $P(C)={C}^{2}/\zeta (2)$ one obtains for $\stackrel{~}{C}>1$
As expected the scaling with an exponent −2 is recovered asymptotically, but subsampling leads to a deviation from scaling when $\stackrel{~}{C}$ is close to 1.
The deviation from scaling due to subsampling leads to biases in naive estimates of the scaling exponent. How can we determine a powerlaw exponent in a way that is robust to subsampling? When the sampling distribution is known or can be inferred from replicate sequencing the exponent can be determined using maximum likelihood estimation of a model with an underlying power law distribution of clone sizes convolved with the sampling probability (Puelma Touzel et al., 2020). Here, we propose a simpler approach that does not require precise knowledge of the sampling process. We exploit the fact that the deviations from scaling vanish asymptotically for large $\stackrel{~}{C}$ (Equation 42), by excluding small clones below some minimal size ${C}_{min}$ from the fitting. The powerlaw exponent is expected to converge as we increase ${C}_{min}$, which we confirm using simulated data (Appendix 1—figure 1, blue dots). We can also consider more realistic models for the sampling process that account for overdispersion, that is their coefficient of variation exceeds the minimal value of one set by Poisson sampling. Mechanistically, such overdispersion arises for a number of reasons, most importantly because in practice we are not actually directly counting cells: in the DNAbased sequencing pipeline every cell can give rise to multiple sequencing reads due to the PCR amplification step, and in the mRNAbased sequencing pipeline despite the addition of unique molecular identifiers several of them can originate from different mRNA molecules from the same cell. As long as the number of reads from each cell is independently and identically distributed the law of large numbers ensures that the relative frequencies of large clones converge. We thus expect that the trimming method of fitting only to counts greater than ${C}_{min}$ also works for overdispersed sampling. We test the trimming method on simulated data, in which the sampling follows a negative binomial distribution with mean µ and variance $\mu +a{\mu}^{2}$ (which reduces to Poisson sampling for $a=0$). We find that trimming allows a correct estimate of $\alpha $ (Appendix 1—figure 1, orange and green dots).
Applying the same method to the empirical data we find that the fitted exponents also depend on ${C}_{min}$ (Appendix 1—figure 2). In practice, we chose ${C}_{min}=16$ to balance a tradeoff between minimizing bias and variance, which increases as more of the data is excluded from the fit Appendix 1—figure 2.
Graphical display of subsampled distributions
The intuition we have built about how subsampling affects clone size distributions can help us choose an appropriate method for displaying subsampled data (Appendix 1—figure 3). Which graphical representation of the clone size distribution minimizes the influence of variations in sampling depth?
The shift of the mean clone size (Equation 39) suggests that we should normalize sampled clone sizes by the sampling fraction $\eta $, as has been noted elsewhere (Levina and Priesemann, 2017). While experimentally we do not know the sampling fraction, we can instead simply divide the clone sizes by the total sample size (Appendix 1—figure 3C,D). This normalization is particularly intuitive as it corresponds to using the relative frequencies of cells in different clones. While the absolute number of cells in a large clone increases with more sampling, the fraction of all sampled cells that are part of a particular clone remains constant on average.
Plots of the cumulative distribution of clone sizes make it easier to visually assess the tail behavior of the distribution (Appendix 1—figure 3B) than plots of the probability density (Appendix 1—figure 3A). However, even after normalizing clone sizes by the sample size there remains a very visible shift between the cumulative distributions at different sampling depths (Appendix 1—figure 3C). This shift arises because the implicit normalization by the total number of unique clones makes the sampled cumulative distribution depend heavily on sampling depth. As sampling increases so does the total number of unique clones that will be sequenced. This suggests that we might do better by simply omitting the normalization. Ranking clones by their normalized size yields precisely such an unnormalized cumulative distribution. Taken together, by both scaling clone sizes by the sample size and resisting the temptation to normalize the ranks, we can collapse distributions sampled at different depths (Appendix 1—figure 3D).
Appendix 2
Modeling neutral repertoire dynamics
In the following, we review results on the neutral dynamics of clone sizes in which the continuous recruitment of new clones is balanced by a net negative growth of already established clones $b<d$. These models have a long history in ecology (Volkov et al., 2003), and have also been proposed as null models in the context of T cell dynamics previously (Desponds et al., 2017; Desponds et al., 2016; de Greef et al., 2020). While the results can be found in the literature their inclusion serves to introduce a parametrization highlighting the recruitment to proliferation ratio $\gamma $ as a key quantity governing clonal dynamics.
Steadystate clone size distribution
At steady state, the probability distribution $P(C)$ of clone sizes C needs to fulfill the balance condition
for all $C>{C}_{0}$ which yields
This distribution is characterized by powerlaw scaling with an exponent of 1 for small clone sizes, and, importantly, has an exponential cutoff at ${C}^{\star}=1/\mathrm{log}(d/b)$. In contradiction with this model experiments point toward powerlaw scaling with an exponent ~2 (Note that $P(C)\sim {C}^{\alpha 1}$ when $\mathrm{rank}\sim {C}^{\alpha}$). Additionally, the size of large clones seen experimentally is incompatible with the predicted exponential cutoff as we discuss below.
The total repertoire size follows the following continuum equation
such that at steadystate, $\frac{\mathrm{d}N}{\mathrm{d}t}=0$, the repertoire has a total size
For a more interpretable alternative parametrization, we introduce the recruitment to proliferation ratio for the maintenance of cells at steady state
Using this relation to rewrite Equation 44 we obtain
implying a cutoff clone size of ${C}^{\star}=1/\mathrm{log}(1+\gamma )$. The largest clones represent on the order of one percent of the repertoire, which assuming independent sampling from the underlying repertoire would correspond to ~10^{10} cells in the complete repertoire. For small $\gamma $, we can expand ${C}^{\star}\approx 1/\gamma $, so in order to have a cutoff clone size ${C}^{\star}$ of this order of magnitude one would need to have an unreasonably small $\gamma \sim {10}^{10}$.
Relaxation time scale in a neutral model
Over what timescale do transiently expanded clones disappear in a neutral model? The time scale ${\tau}_{c}=\frac{1}{db}$ for deterministic clonal decay can be much larger than the lifespan $1/d$ of a single cell when birth and death are closely balanced. Rewriting the birth rate in terms of $\gamma $ and d we obtain
demonstrating that for $\gamma \ll 1$ clonal dynamics is a factor of $1/\gamma $ slower than cellular dynamics.
Appendix 3
Influence of model assumptions on repertoire formation process
For tractability and interpretability, we have kept the model presented for repertoire formation in the main text deliberately simple. Here, we explore how a saturation of the proliferation rate, competition for specific resources, or variations in the recruited number of cells impact the clone size distributions.
Saturation of proliferation rate
Cellular growth is not arbitrarily fast, which is not accounted for in the simple model in which cells proliferate very rapidly early in life. To understand how such a saturation effect influences clone size distributions, we introduce an upper limit on birth rate that limits proliferation in the absence of competition. Following De Boer and Perelson, 1995, we set the clonal birth rate to $b(t)={b}_{0}/(K+N)$ for some constant K, which sets the repertoire size below which competition is negligible. Given this choice the birth rate remains limited to a value ${b}_{0}/K$ even in the absence of any competitors. Increasing K leads to deviation in the scaling of the largest clones, but the same scaling remains at intermediate clone sizes (Figure 2—figure supplement 2). In the model early clonal growth is exponential until the total repertoire has reached size $N(t)\sim K$, which explains the different distribution of the largest clones. However, the number of clones that are recruited during this phase grows only logarithmically with K due to the exponential increase in total repertoire size.
Competition for specific resources
T cells respond to stimuli from peptideMHC complexes, which could also act as limiting resources. T cells then compete only with those cells specific to the same antigens in contrast to the global competition considered previously. To assess how assumptions about the mechanisms of competition influence, our results we simulated the repertoire formation process using a classical description of competition for antigens (De Boer and Perelson, 1994; De Boer et al., 2001; Mayer et al., 2015). We consider a fixed number of antigens ${N}_{a}$ and encode the specificity of the M clones in a matrix K of size $M\times {N}_{a}$, where ${K}_{ij}=1$ if clone i recognizes the antigen j and ${K}_{ij}=0$ otherwise. We draw the entries of K independently with a fixed binding probability p_{b}. We assume that the proliferation rate of a cell of the ith clone is proportional to the amount of antigenic stimulation:
where the availability of antigen j is given by
The normalization of Equation 50 ensures that total proliferation is comparable to a global resource model with the same parameters independent of ${N}_{a}$. For computational tractability, we simulated the clone size dynamics without taking into account demographic stochasticity in proliferation and death of cells. While more specific competition (smaller p_{b}) leads to a deviation in the distribution of the largest clones, we find that clone size distributions are heavy tailed independently of the choice of p_{b} and all display the same scaling at intermediate clone sizes (Figure 2—figure supplement 3).
Variations of the recruitment size
The numbers of cells C_{0} that are recruited might also be variable. In particular, this will be the case when modeling the memory compartment, in which C_{0} represents the number of cells from a clone recruited into memory following infection. To understand how such variations modify the dynamics of repertoire formation, we derive an analytical prediction in the case where the distribution of recruitment sizes, $P({C}_{0})$, is lognormal. Given a lognormal distribution with parameters ${\mu}_{0}$ and ${\sigma}_{0}$ the mean introduction size is given by $\overline{{C}_{0}}={e}^{{\mu}_{0}+{\sigma}_{0}^{2}/2}$. To keep the mean introduction size constant while changing the variability of clone sizes, we use a parametrization in terms of $\overline{{C}_{0}}$ and ${\sigma}_{0}$ and set ${\mu}_{0}=\mathrm{log}(\overline{{C}_{0}}){\sigma}_{0}^{2}/2$. To determine the clone size distribution resulting from early repertoire growth, we integrate the continuum theory prediction, $P(C/{C}_{0})\propto {(C/{C}_{0})}^{2\gamma}$ over the distribution of C_{0}:
The complementary error function $erfc(x)$ saturates for $x\ll 1$. Thus, the distribution follows the same powerlaw $P(C)\sim {C}^{2\gamma}$ for large clones, $\mathrm{log}C/\overline{{C}_{0}}\gg {\sigma}_{0}\left(\sqrt{2}+\frac{3+2\gamma}{2}{\sigma}_{0}\right)$, while it deviates for smaller clones within the range of recruitment sizes (Figure 2—figure supplement 1).
Appendix 4
A unified view on mechanisms generating power laws in different growth processes
The origin of powerlaw scaling during repertoire formation is reminiscent of a class of stochastic processes widely studied in the literature as a mechanism underlying powerlaw distributions found in diverse contexts (Yule, 1924; Luria and Delbrück, 1943; Barabasi and Albert, 1999), which has been rediscovered multiple times since the pioneering work of Yule on speciation (Yule, 1924). Common to these processes is that the distribution of types at a given point is the result of a balance between the growth of existing types and the addition of new types. The different models depending on their context differ in (i) the growth rate $r(t)$ of the number of units of each already existing type and (ii) the rate function $\theta (t)$ at which new types are introduced. They all share the same basic mathematical mechanism that produces a power law distribution of types as we review below. The three maybe most wellknown instances of this class of processes are the following:
The Yule model of speciation (Yule, 1924), in which (i) species within a genus speciate at some constant rate, and (ii) new genus is created at a rate proportional to the number of already existing genera.
The LuriaDelbrück model of population genetics during exponential growth (Luria and Delbrück, 1943) in which (i) each cell divides at a constant rate, and (ii) new alleles arise through random mutation at a constant rate per cell division.
The BarabásiAlbert model of network growth (Barabasi and Albert, 1999), in which at every time step (ii) a new node is added, and is (i) linked to m already existing nodes with a probability proportional to the number links that a chosen node already has.
Despite differing in their assumptions about the functional form of the growth and innovation rates we show in the following that these different models all share a common mathematical basis. To provide a common terminology, we will use the language of urn models and refer to different types as urns and to the different number of units of each types as balls in each urn. In an attempt to unify the different models, we develop a continuum theory for these growthinnovation processes. To do so, we rescale time to
such that new urns are added at unit rate, $\theta (\tau )=1$. The number of balls in each urn then grows according to
The key to the powerlaw scaling in all these models is the existence of a regime in which
that is the growth rate scales inversely with rescaled time with a proportionality factor $1/\alpha $. Equation 55 has the same form as Equation 18 that we derived for our model of repertoire formation. Thus following the derivation of Equation 19 within Materials and methods – Continuum theory of clonal growth we obtain a subexponential growth of balls in already existing urns, which leads to a power law scaling of the distribution of balls per urn,
with an adjustable exponent that depends on $\alpha $.
Before deriving how Equation 55 arises in specific contexts let us first remark on a general consequence of this form of effective growth law: The total number of balls added to all existing urns per rescaled time unit is constant. To derive this let us assume that each new urn is populated by C_{0} balls, then the total number of balls $N(t)={\sum}_{i}{C}_{i}$ grows according to
which is solved by
Multiplying by the growth rate Equation 55 yields a constant,
thus showing the equivalency between the assumed growth rate dependency on rescaled time and the constancy of how many balls are added per rescaled time unit.
In the Yule process, we have $r(t)=r$ and recruitment is proportional to the number of genera, $\theta (t)=sG(t)$, which grow at a (generally different) rate s, $G(t)={G}_{0}{e}^{st}$. By integration we obtain $\tau (t)={G}_{0}\left({e}^{st}1\right)$, which leads to $\zeta (\tau )=\frac{r}{s\tau +s{G}_{0}}$. Thus $\zeta (\tau )\approx \frac{r}{s\tau}$ when the number of newly created genera exceeds the initial number $\tau \gg {G}_{0}$. The exponent of the powerlaw is determined by the ratio of the growth of genera and species, $\alpha =s/r$.
In the LuriaDelbrück model, we have $r(t)=r$ assuming mutations do not change the growth rate. Recruitment is proportional to the total population size $\theta (t)=\mu rN(t)$, where µ is the mutation probability per replication and where $N(t)={N}_{0}{e}^{rt}$. By integration we obtain $\tau (t)=\mu {N}_{0}\left({e}^{rt}1\right)$, which leads to $\zeta (\tau )=\frac{1}{\tau +\mu {N}_{0}}$. Thus $\zeta (\tau )=\frac{1}{\tau}$ when $\tau \gg \mu {N}_{0}$. In contrast to Yule’s model, the powerlaw exponent is fixed at $\alpha =1$ under our assumption that mutations are neutral, because the same growth process governs the increase in $\theta (t)$ and in cell numbers.
In the BarabasiAlbert model, the introduction rate $\theta (t)=1$ is constant, but $r(t)$ decreases with time. The m newly added links attach preferentially to those nodes that already have a large degree. The growth rate $r(t)=m/N(t)$ of a node thus decreases proportionally to the total degree $N=2mt$ of all present nodes. We have $r(t)=1/(2t)$, which implies $\zeta (\tau )=\frac{1}{2\tau}$ and $\alpha =2$.
Appendix 5
Relation between clone size and cellular phenotypes
In both cohorts, all T cells from peripheral blood were sequenced irrespective of their phenotypes. Antigenic challenges drive large clonal expansions and we thus expect clones with effector or memory cells to be larger than naive clones all else being equal (Farber et al., 2014; Mayer et al., 2019). This has generally been confirmed by TCR repertoire sequencing studies (Oakes et al., 2017), but there have also been some reports (Qi et al., 2014; Pogorelyy et al., 2017) of expanded naive clones with similar sizes to the largest memory clones. Given this unclear picture from the literature, we analyzed the relative contribution of naive and memory cells to clones of different sizes.
Overall, we might expect that naive clones dominate the clone size distribution at the smallest sizes. To test this idea, we compared sequencing and flow cytometry data from the Britanova cohort and found that the fraction of naive cells in different individuals explains a remarkably high 88% of variability in the number of clones sequenced only once after subsampling all repertoires to the same size (Figure 1—figure supplement 2A). To further determine how cells from clones of different sizes partition phenotypically, we analyzed data from a study in which T cells were sequenced both in unsorted blood as well as after sorting into naive and memory cells (Chu et al., 2019). We find that the sizes of large clones follow the same scaling in unsorted blood and in the memory compartment (Figure 1—figure supplement 2B) Within the naive compartment most clones are small, in particular when excluding clones from which cells are also found in the memory compartment (Figure 1—figure supplement 2B, red line). We note from the plot that all the largest 200 clones in unsorted blood have memory phenotype cells, and less than 1% of the top 1000 clones are not found within the memory compartment. This rules out that the enrichment of zero insertion clones among the most abundant clones found in Figure 3 is driven by naive clones as has been suggested in a previous study (Pogorelyy et al., 2017). The relative frequency of a clone within the memory compartment is larger by a constant foldfactor (Figure 1—figure supplement 2C), likely reflecting an increased relative frequency of the large clones when excluding naive cells from the denominator.
Our analysis of the clone size distributions in unsorted blood shows that the largest clones take up an increasing fraction of all sampled reads with age (Figure 1—figure supplement 3B,E). Flow cytometry data shows that the overall fraction of memory T cells in blood also increases with age (Figure 1—figure supplement 2D; Pediatric AIDS Clinical Trials Group et al., 2003; Britanova et al., 2016). How much is the former increase explained by the latter? In the following, we show how to answer this question in the absence of direct data on sorted memory T cell repertoires. Let us define ${C}_{i}^{+}$ as the number of memory cells of the ith clone and ${C}_{i}^{}$ as the number of naive cells. What we are measuring in unsorted blood is the overall fractional size f_{i} of the clone,
What we are interested in instead is the fraction of the memory compartment taken up by cells belonging to the same clone,
We have shown above that empirically for the largest clones ${C}_{i}^{+}\gg {C}_{i}^{}$. Thus
where ${r}^{+}$ is the fraction of memory cells within the repertoire. Finally, we approximate ${r}^{+}$ by its mean value expected at each given age as fitted from the flow cytometry data (Figure 1—figure supplement 2D). We find that this normalization mostly collapses the tails of empirical clone size distributions (Figure 1—figure supplement 3C,F). This implies that most of the increase in the sizes of the largest clones is explained by the overall expansion of the memory compartment, while the relative clone sizes within the memory compartment are more stable.
Data availability
No new data was generated in this study. All source codes associated with this manuscript are available online at https://github.com/andim/papertcellimprint (copy archived at https://archive.softwareheritage.org/swh:1:rev:9029ffeeb645d02f1fc880a89e136448c6430f49/).

ZenodoDynamics of individual T cell repertoires: from cord blood to centenarians.https://doi.org/10.5281/zenodo.826447

immunoACCESSImmunosequencing identifies signatures of cytomegalovirus exposure history and HLAmediated effects on the T cell repertoire.https://doi.org/10.21417/B7001Z

immunoACCESSLongitudinal immunosequencing in healthy people reveals persistent T cell receptors rich in highly public receptors.https://doi.org/10.21417/B7J01X

immunoACCESSCytomegalovirus Exposure in the Elderly Does Not Reduce CD8 T Cell Repertoire Diversity.https://doi.org/10.21417/PL2018JI
References

Quantitative immunology for physicistsPhysics Reports 849:1–83.https://doi.org/10.1016/j.physrep.2020.01.001

The role of models in understanding CD8+ Tcell memoryNature Reviews Immunology 5:101–111.https://doi.org/10.1038/nri1550

Agerelated decrease in TCR repertoire diversity measured with deep and normalized sequence profilingThe Journal of Immunology 192:2689–2698.https://doi.org/10.4049/jimmunol.1302064

Dynamics of individual T cell repertoires: from cord blood to centenariansThe Journal of Immunology 196:5005–5013.https://doi.org/10.4049/jimmunol.1600005

Building a T cell compartment: how immune cell development shapes functionNature Reviews Immunology 20:499–506.https://doi.org/10.1038/s4157702003323

Rebooting human immunologyAnnual Review of Immunology 36:843–864.https://doi.org/10.1146/annurevimmunol042617053206

Resource competition determines selection of B cell repertoiresJournal of Theoretical Biology 212:333–343.https://doi.org/10.1006/jtbi.2001.2379

T cell repertoires and competitive exclusionJournal of Theoretical Biology 169:375–390.https://doi.org/10.1006/jtbi.1994.1160

Towards a general function describing T cell proliferationJournal of Theoretical Biology 175:567–576.https://doi.org/10.1006/jtbi.1995.0165

Quantifying T lymphocyte turnoverJournal of Theoretical Biology 327:45–87.https://doi.org/10.1016/j.jtbi.2012.12.025

Evidence for increased T cell turnover and decreased thymic output in HIV infectionThe Journal of Immunology 167:6663–6668.https://doi.org/10.4049/jimmunol.167.11.6663

BookComputer Age Statistical InferenceCambridge University Press.https://doi.org/10.1017/CBO9781316576533

Human memory T cells: generation, compartmentalization and homeostasisNature Reviews Immunology 14:24–35.https://doi.org/10.1038/nri3567

Junctional sequences of fetal T cell receptor beta chains have few N regionsJournal of Experimental Medicine 174:115–124.https://doi.org/10.1084/jem.174.1.115

ThesisStochastic processes for natural evolutionary dynamics of Tcell repertoiresUniversità di Bologna (Master thesis).

Zipf's Law for Cities: An ExplanationThe Quarterly Journal of Economics 114:739–767.https://doi.org/10.1162/003355399556133

Softwarepapertcellimprint, version swh:1:rev:9029ffeeb645d02f1fc880a89e136448c6430f49/Software Heritage.

Neonatal tolerance revisited: a perinatal window for aire control of autoimmunityJournal of Experimental Medicine 206:1245–1252.https://doi.org/10.1084/jem.20090300

The antigenspecific CD8+ T cell repertoire in unimmunized mice includes memory phenotype cells bearing markers of homeostatic expansionJournal of Experimental Medicine 206:435–448.https://doi.org/10.1084/jem.20081829

Duration of antiviral immunity after smallpox vaccinationNature Medicine 9:1131–1137.https://doi.org/10.1038/nm917

Sex differences in immune responsesNature Reviews Immunology 16:626–638.https://doi.org/10.1038/nri.2016.90

Subsampling scaling: a theory about inference from partly observed systemsNature Communications 8:15140.https://doi.org/10.1038/ncomms15140

Simulation of nonhomogeneous poisson processes by thinningNaval Research Logistics Quarterly 26:403–413.https://doi.org/10.1002/nav.3800260304

Memory CD4^{+} T cells are generated in the human fetal intestineNature Immunology 20:301–312.https://doi.org/10.1038/s4159001802949

Cytomegalovirus exposure in the elderly does not reduce CD8 T cell repertoire diversityThe Journal of Immunology 202:476–483.https://doi.org/10.4049/jimmunol.1800217

Mutations of Bacteria from virus sensitivity to virus resistanceGenetics 28:491–511.

How many TCR clonotypes does a body maintain?Journal of Theoretical Biology 389:214–224.https://doi.org/10.1016/j.jtbi.2015.10.016

Power laws, pareto distributions and Zipf's lawContemporary Physics 46:323–351.https://doi.org/10.1080/00107510500052444

Lymphocyte subsets in healthy children from birth through 18 years of age: the pediatric AIDS clinical trials group P1009 studyThe Journal of Allergy and Clinical Immunology 112:973–980.https://doi.org/10.1016/j.jaci.2003.07.003

Persisting fetal clonotypes influence the structure and overlap of adult human T cell receptor repertoiresPLOS Computational Biology 13:e1005572.https://doi.org/10.1371/journal.pcbi.1005572

Inferring the immune response from repertoire sequencingPLOS Computational Biology 16:e1007873.https://doi.org/10.1371/journal.pcbi.1007873

Timely and spatially regulated maturation of B and T cell repertoire during human fetal developmentScience Translational Medicine 7:276ra25.https://doi.org/10.1126/scitranslmed.aaa0072

Overlap and effective size of the human CD8+ T cell receptor repertoireScience Translational Medicine 2:47ra64.https://doi.org/10.1126/scitranslmed.3001442

VDJdb: a curated database of Tcell receptor sequences with known antigen specificityNucleic Acids Research 46:D419–D427.https://doi.org/10.1093/nar/gkx760

Broadly targeted human cytomegalovirusspecific CD4+ and CD8+ T cells dominate the memory compartments of exposed subjectsJournal of Experimental Medicine 202:673–685.https://doi.org/10.1084/jem.20050882

A mathematical theory of evolution, based on the conclusions of Dr J C willis, frspPhilosophical Transactions of the Royal Society of London 213:21–87.

CD4 T cells with effector memory phenotype and function develop in the sterile environment of the fetusScience Translational Medicine 6:238ra72.https://doi.org/10.1126/scitranslmed.3008748
Decision letter

Naama BarkaiSenior Editor; Weizmann Institute of Science, Israel

Armita NourmohammadReviewing Editor; University of Washington, United States

Andrew J YatesReviewer; Columbia University, United States

Herbert LevineReviewer; Northeastern University, United States

Rob J de BoerReviewer; Utrecht University, Netherlands
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This paper provides a mechanistic model to explain the generation and maintenance of the "powerlaw" clone size distribution in the memory T cell repertoire. In particular, the authors show that clonal expansions during a perinatal time window are a dominant determinant of T cell clonal abundances a mechanistic explanation for a longstanding puzzle in quantitative immunology.
Decision letter after peer review:
Thank you for submitting your article "Early life imprints the hierarchy of T cell clone sizes" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Naama Barkai as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Andrew J Yates (Reviewer #1); Herbert Levine (Reviewer #2); Rob J de Boer (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
This manuscript provides potential mechanistic explanation for the distribution of memory T cell clone sizes in humans, which follow the same power law in two independent studies. The reviewers all agree that this is an excellent paper, and it presents a thorough explanation of the concepts and the analysis. However, there are still a few comments that we would like to see addressed in this manuscript.
Essential revisions:
1) The basic mechanism described in the manuscript relies on the existence of elevated proliferation early in life, which the authors invoke by postulating that proliferation rates depend inversely on pool size through a competitive mechanism. As far as I am aware there is no strong evidence for such competition, at least under normal (lymphoreplete) conditions. As evidence you cite the Douek data showing constant TREC frequencies in blood in children as thymic export wanes, implying reduced proliferation over time (which was analyzed by Bains et al. 2009) – but this was from naive T cells only. Can you find stronger evidence (e.g. Ki67 expression in memory subsets with age) to support this (key) assumption for memory?
In mice, Ki67 across all thymocyte subsets is elevated early in life and settles by about 3mo of age – suggesting that naive clones exported early in life are larger. If such a mechanism operates in humans it could add to the zeroinsertion effect that boosts naive T cell clone sizes early in life.
2) On similar lines, you assume that theta (the rate of recruitment of new clones into memory) is independent of age. But thymic involution means the rate of generation of new naive clones wanes with age – might this lead to a reduction in theta? Can you achieve the same results with agedependent generation of new naive clones and agedependent recruitment into memory, rather than variation in homeostatic division?
3) The reviewers suggest that the authors provide more details (both immunological and mathematical) in the main text. Make clear whether or not CD4^{+} and CD8^{+} T cells are distinguished by sorting. There is no such thing as "the human T cell repertoire", there are naive and memory, CD4 and CD8 T cell repertoires, and there is a Treg repertoire with a potentially very different clone size distribution. I do agree, however, that your interesting results could be general, i.e., relevant for several of these repertoires, but please make clear for which ones.
Examples from the main text that are confusing on their own include:
i) It is not clear from the main text if are you describing naive or memory T cell repertoires? For an immunologist this makes an enormous difference. For instance, when I read in the Introduction that "At the same time the most abundant clones typically account for more than 1% of all sequencing reads, equivalent to a clone size of ∼10^{10} cells when extrapolating to the full repertoire. It is unknown when these large clonal expansions happen, and more broadly what determines the hierarchy of clone sizes.", I immediately think of the several papers showing that CMV specific memory T cell comprising more than 10% of the repertoire. So, how can I appreciate this "It is unknown when these large clonal expansions happen"? First show that these are not simply CMV specific expansions occurring after CMV infection.
ii) When it is written in the legend of Figure 1 that "Clone sizes were normalized by the […] the memory cell fraction […] in the subset composition of peripheral blood", I read something that cannot be understood without reading the supplementary information. Even after reading the supplementary information I remain confused, because naive and memory repertoires probably have different clone size distributions, so how can one normalize by just knowing the fraction of memory cells?
iii) In the legend of Figure 2 parameter names and values are provided, without giving these parameters interpretation. Please add these interpretations to the main text, because a reader may want to question these values (now it is only explained in the supplementary information). For instance, a cell death rate of 0.2 per year assumes that memory T cells have an expected life span of 5 years. This could be fair given all he uncertainties (e.g., circulating but memory T cells have an expected life span of about half a year, tissue resident memory T cells probably many years, and which subset go memory cells are you sampling from). What is γ? Generating 10^{6} clones per year could be a questionable number (some will say 10^{6} per day, and you confirm this in the supplementary information).
https://doi.org/10.7554/eLife.61639.sa1Author response
Essential revisions:
1) The basic mechanism described in the manuscript relies on the existence of elevated proliferation early in life, which the authors invoke by postulating that proliferation rates depend inversely on pool size through a competitive mechanism. As far as I am aware there is no strong evidence for such competition, at least under normal (lymphoreplete) conditions. As evidence you cite the Douek data showing constant TREC frequencies in blood in children as thymic export wanes, implying reduced proliferation over time (which was analyzed by Bains et al. 2009) – but this was from naive T cells only. Can you find stronger evidence (e.g. Ki67 expression in memory subsets with age) to support this (key) assumption for memory?
In mice, Ki67 across all thymocyte subsets is elevated early in life and settles by about 3mo of age – suggesting that naive clones exported early in life are larger. If such a mechanism operates in humans it could add to the zeroinsertion effect that boosts naive T cell clone sizes early in life.
We thank the reviewers for raising this important question. The general existence of competition between T cells for homeostatic proliferation triggers has been relatively well accepted within the field for both the naive and memory compartment, see e.g. the reviews by Freitas and Rochas (1) or by Surh and Sprent (2) for a discussion of experimental evidence and possible mechanisms. Here, we thus focus on the more specific question of whether reduced competition early in life leads to elevated proliferation of T cells. While unfortunately we did not find any Ki67 expression data on human memory T cells in infancy in the literature, there are a number of other lines of evidence from both mice and human consistent with the hypothesis of elevated proliferation of T cells early in life. For example, Rufer et al. (3) have shown rapid telomere shortening among memory T cells early in life consistent with large proliferative turnover. Furthermore, early studies of homeostatic proliferation of T cells in neonatal mice have shown that the dividing T cells eventually express memorylike markers (4, 5). Additionally, homeostatic proliferation was shown in these studies to be suppressed more readily by the transfer of memory than naive cells, consistent with competition limiting proliferation specifically within the memory compartment. Following on these early studies there have been a number of reports demonstrating the existence of memoryphenotype antigenspecific cells in unimmmunized or germfree mice (6, 7), which are thought to result from proliferation of T cells during neonatal lymphopenic conditions stimulated by other antigens. (We have added citations to these studies to our revised manuscript.) Unexposed humans similarly have preexisting memoryphenotype cells against many viral antigens (8), which might plausibly have the same origin. It seems likely that similar proliferation happens in normal pathogenreplete environments for memory cells created in response to foreign antigens.
Finally, we note that while the specific model we present in the Results section assumes that early life proliferation happens within the memory compartment, the mechanism we present is more general. As we note in the discussion an alternative scenario would be that homeostatic proliferation early in life sets up a broad distribution of naive T cell clone sizes, which is translated upon antigen exposure into a broad distribution of memory T cell clone sizes. Distinguishing these two scenarios is an important direction for future studies.
2) On similar lines, you assume that theta (the rate of recruitment of new clones into memory) is independent of age. But thymic involution means the rate of generation of new naive clones wanes with age – might this lead to a reduction in theta? Can you achieve the same results with agedependent generation of new naive clones and agedependent recruitment into memory, rather than variation in homeostatic division?
A temporally changing recruitment rate alone cannot explain the early emergence of powerlaw scaling of clone sizes, as the distribution of clone sizes is independent of θ in a neutral model without homeostatic expansion. We agree that θ is likely a declining function of age, both for the naive compartment (because of thymic involution) as well as for the memory compartment (because of a decreasing rate of novel primary infections). Our assumption of a constant θ is motivated by a separation of timescales argument: The emergence of scaling during early life only depends on the value of θ during roughly the first year of life, and it thus seemed justified not to model the slower dynamics of θ due to thymic involution during adulthood.
We have provided an extended justification of this simplifying assumption in the Materials and methods:
"While θ decreases during adulthood for both the naive compartment (as thymic production wanes with age) and the memory compartment (as new primary responses are rarer at advanced age), we used a constant θ for simplicity. We expect this simplifying assumption not to qualitatively affect the results in Figure 2 due to a separation of timescale: The clone size scaling emerging from the expansionary dynamics only depends on the rate θ during infancy and not on slower changes in θ happening during adulthood. "
3) The reviewers suggest that the authors provide more details (both immunological and mathematical) in the main text.
We thank the reviewers for their valuable suggestions and have updated the main text to provide more details both on the biology and the mathematics. In addition to the specific suggested changes we have expanded the mathematical derivations underlying the last section of the results, which were previously relegated to the supplementary information. We hope that with these changes our paper strikes a better balance between conciseness and providing necessary context directly in the main text.
Make clear whether or not CD4^{+} and CD8^{+} T cells are distinguished by sorting. There is no such thing as "the human T cell repertoire", there are naive and memory, CD4 and CD8 T cell repertoires, and there is a Treg repertoire with a potentially very different clone size distribution. I do agree, however, that your interesting results could be general, i.e., relevant for several of these repertoires, but please make clear for which ones.
We have updated the caption of Figure 1 to reflect that both studies do not distinguish T cells by sorting (neither by CD4/CD8, naive/memory, or Treg/Tconv).
"Each line shows the size distribution of all T cell clones in an individual in an unsorted blood sample, this is independently of the phenotypes of the cells making up the different clones (see Appendix 5 for analyses of phenotypically resolved data). "
We have also now make more explicit reference to our analyses of phenotypically sorted data in the main text:
"In both studies T cells were sequenced without regard to their phenotypic characteristics. The clone sizes thus represent the full lineage of cells with a common origin irrespective of differentiation status (see Appendix 5 – Relation between clone size and cellular phenotypes and Figure 1—figure supplement 2 for a phenotypically resolved analysis). "
In short, these analyses show that the tail behavior in the unsorted data most closely reflects scaling within the memory compartment. This observation motivates the focus on memory dynamics in the Results section. Nevertheless, we agree that our results might be more general, which is why we present our initial conceptual model without a specific reference to a particular T cell subset. We hope that recent technological advances in coupled single cell phenotyping and T cell sequencing, as well as in bulk repertoire sequencing of sorted T cell populations (9, 10) will help further refine our understanding of how the broad patterns in unsorted blood reflect the composition of various T cell compartments.
Examples from the main text that are confusing on their own include:
i) It is not clear from the main text if are you describing naive or memory T cell repertoires? For an immunologist this makes an enormous difference. For instance, when I read in the Introduction that "At the same time the most abundant clones typically account for more than 1% of all sequencing reads, equivalent to a clone size of ∼10^{10} cells when extrapolating to the full repertoire. It is unknown when these large clonal expansions happen, and more broadly what determines the hierarchy of clone sizes.", I immediately think of the several papers showing that CMV specific memory T cell comprising more than 10% of the repertoire. So, how can I appreciate this "It is unknown when these large clonal expansions happen"? First show that these are not simply CMV specific expansions occurring after CMV infection.
We have updated the paragraph in the Introduction to give a more precise motivation of what we aim to explain in our paper:
"In a typical sample of T cells from peripheral blood a large fraction of clones are only seen once within 10^{5}−10^{7} sampled sequences, while the most abundant clones account for more than 1% of all sequencing reads. Such powerlaw scaling of clone sizes has been shown to arise at steady state in models of fluctuating clonal selection as driven by different antigen encounters (Desponds et al., 2016). However, it is unclear whether this mechanism alone is sufficient to explain how clone size scaling is established, and more broadly how variable the clonal hierarchy is over time."
Indeed, what is surprising about the empirical data from a theory perspective is not the existence of some large clones, but rather the reproducible scaling law linking clonal rank and size across individuals with varied history of exposure to different pathogens.
We have also included additional details in the main text to highlight that the cohort studies sequence unsorted T cells from blood samples:
"[…] we reanalyzed data from two largescale cohort repertoire sequencing studies of human blood samples, which used fundamentally different sequencing pipelines and thus have different sources of noise (Materials and methods). Both studies sequenced the locus coding for the hypervariable TCR CDR3−β chain from unsorted T cells from peripheral blood of healthy human volunteers spanning a large range of ages (Figure 1—figure supplement 1)."
Finally, to address the role of CMV specific expansions in setting up the broad hierarchy, we now more explicitly refer in the main text to a supplementary figure analyzing how scaling exponents depend on CMV status.
"While exponents overall decreased slightly with age, the dependence on age accounted for surprisingly little variation in both cohorts (Figure 1D), including when controlling for sex and cytomegalovirus exposure status (Figure 1—figure supplement 5)."
Our analysis shows that while CMV positive individuals do have a more skewed repertoire on average, CMV negative donors also have a broad distributions of clone sizes.
ii) when it is written in the legend of Figure 1 that "Clone sizes were normalized by the […] the memory cell fraction […] in the subset composition of peripheral blood", I read something that cannot be understood without reading the supplementary information. Even after reading the supplementary information I remain confused, because naive and memory repertoires probably have different clone size distributions, so how can one normalize by just knowing the fraction of memory cells?
In short, this normalization is possible because the tail of the distribution in unsorted blood is completely due to clones with a majority of memory phenotype cells. We have provided extended explanation in the main text as detailed below.
We have updated the caption,
"Normalized clone sizes were defined as the number of reads of a given receptor’s sequence divided by the total number of reads within a sample and a factor equal to the average fraction of T cells with memory phenotype at different ages to account for variations in sampling depth and in the subset composition of peripheral blood, respectively (Figure 1—figure supplement 3)."
and the text
"After normalizing clone sizes to account for variations in sampling depth and for the increasing fraction of T cells of memory phenotype with age (Figure 1—figure supplement 3), we found that.…" to more precisely define how this normalization step is implemented. We have also substantially expanded the rationale for this normalization described in Appendix 5.
"Our analysis of the clone size distributions in unsorted blood shows that the largest clones take up an increasing fraction of all sampled reads with age (Figure 1—figure supplement 3B,E). […] This implies that most of the increase in the raw counts of the largest clones is explained by the overall expansion of the memory compartment. "
iii) In the legend of Figure 2 parameter names and values are provided, without giving these parameters interpretation. Please add these interpretations to the main text, because a reader may want to question these values (now it is only explained in the supplementary information). For instance, a cell death rate of 0.2 per year assumes that memory T cells have an expected life span of 5 years. This could be fair given all he uncertainties (e.g., circulating but memory T cells have an expected life span of about half a year, tissue resident memory T cells probably many years, and which subset go memory cells are you sampling from). What is γ? Generating 10^{6} clones per year could be a questionable number (some will say 10^{6} per day, and you confirm this in the supplementary information).
We have restructured our exposition in the main text for greater clarity. We have expanded the description of the model parameters as follows:
"Following previous work (Bains et al., 2009; Lythe et al., 2016) we assume that T cells proliferate at a rate b = b_{0}/N inversely proportional to the total number of cells N already present in the repertoire. This assumption leads to increased proliferation early in life before the repertoire has reached its homeostatic size, and it is compatible with a simple mechanistic model of T cell competition (Material and methods subsection “Mechanistic motivation for the competition function”). We further assume that cells die at a rate d, and that new clones are recruited at rate θ with an initial size C_{0}. For simplicity, we set C_{0} equal to one in the following and we assume constant rates d and θ."
For easier reference we have repeated the parameter names in the caption of Figure 2. We have also included the formula that relates γ to the other model parameters in the main text, as well as given an explicit equation for the dependence of the clonal proliferation rate on repertoire size. We note that the parameters are also defined in the model sketch (panel A of the same figure). To highlight the most relevant parameter choices and how they relate to the observed results, we have added the following paragraph to the main text:
"The powerlaw tail persisted over multiple decades of aging, much beyond the timescale of cellular turnover, 1/d = 5 years, assumed in the simulations. A mathematical analysis shows that the relative timescales of clonal and cellular turnover are controlled by the control parameter γ = θC_{0}/b_{0}, which is the ratio of the contribution of recruitment and proliferation to overall compartment maintenance (Appendix 2). The long timescale of clonal turnover emerges because we have chosen parameters compatible with the biological parameter regime γ < 1 (den Braber et al., 2012; Macallan et al., 2017), where most cell death is balanced by proliferation."
We have also extended the parameter discussion in Materials and Methods to highlight that these parameters are not meant to exactly reproduce those for any particular T cell subset, but rather illustrate the dynamical mechanism in a generally relevant parameter regime.
References
1) A. A. Freitas and B. Rocha. Population biology of lymphocytes: the flight for survival. Annual review of immunology, 18(1):83 111, 2000.
2) C. D. Surh and J. Sprent. Homeostasis of Naive and Memory T Cells. Immunity, 29(6):848 862, 2008.
3) N. Rufer, T. H. Brummendorf, S. Kolvraa, C. Bischoff, K. Christensen, L. Wadsworth, M. Schulzer, and P. M. Lansdorp. Telomere Fluorescence Measurements in Granulocytes and T Lymphocyte Subsets Point to a High Turnover of Hematopoietic Stem Cells and Memory T Cells in Early Childhood. Journal of Experimental Medicine, 190(2):157 167, 1999.
4) A. Le Campion, C. Bourgeois, F. Lambolez, B. Martin, S. Léaument, N. Dautigny, C. Tanchot, C. Pénit, and B. Lucas. Naive T cells proliferate strongly in neonatal mice in response to selfpeptide/selfMHC complexes. Proceedings of the National Academy of Sciences, 99(7):4538 4543, 2002.
5) B. Min, R. McHugh, G. D. Sempowski, C. Mackall, G. Foucras, and W. E. Paul. Neonates support lymphopeniainduced proliferation. Immunity, 18(1):131 140, 2003.
6) C. Haluszczak, A. D. Akue, S. E. Hamilton, L. D. S. Johnson, L. Pujanauski, L. Teodorovic, S. C. Jameson, and R. M. Kedl. The antigenspecific CD8 + T cell repertoire in unimmunized mice includes memory phenotype cells bearing markers of homeostatic expansion. Journal of Experimental Medicine, 206(2):435 448, 2009.
7) T. Kawabe, D. Jankovic, S. Kawabe, Y. Huang, P.h. Lee, H. Yamane, J. Zhu, A. Sher, R. N. Germain, and W. E. Paul. Memoryphenotype CD4 + T cells spontaneously generated under steadystate conditions exert innate TH1lik effector function. Science Immunology, 9304(June), 2017.
8) L. F. Su, B. A. Kidd, A. Han, J. J. Kotzin, and M. M. Davis. VirusSpecific CD4^{+} MemoryPhenotype T Cells Are Abundant in Unexposed Adults. Immunity, 38(2):373 383, 2013.
9) T. Oakes, J. M. Heather, K. Best, R. ByngMaddick, C. Husovsky, M. Ismail, K. Joshi, G. Maxwell, M. Noursadeghi, N. Riddell, T. Ruehl, C. T. Turner, I. Uddin, and B. Chain. Quantitative characterization of the T cell receptor repertoire of naive and memory subsets using an integrated experimental and computational pipeline which is robust, economical, and versatile. Frontiers in Immunology, 8(OCT):1 17, 2017.
10) C. Soto, R. G. Bombardi, M. Kozhevnikov, M. Gujral, S. Mallal, and J. E. Crowe. High Frequency of Shared Clonotypes in Human T Cell Receptor Repertoires. Cell Reports, 32(2):107882, 2020.
https://doi.org/10.7554/eLife.61639.sa2Article and author information
Author details
Funding
Princeton University (LewisSigler fellowship)
 Andreas Mayer
Deutscher Akademischer Austauschdienst (RISE fellowship)
 Mario U Gaimann
Simons Foundation (SFARI/597491RWC)
 Jonathan Desponds
National Science Foundation (1764421)
 Jonathan Desponds
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank William Bialek, Curtis Callan, Ivana Cvijovic, Yuval Elhanati, Simone Mayer, Mikhail Pogorelyy, and Ned Wingreen for discussions and comments on the manuscript. This work was supported by a DAAD RISE Worldwide fellowship (MUG), the NSFSimons Center for Quantitative Biology under grants Simons Foundation SFARI/597491RWC and National Science Foundation 17764421 (JD), and a Lewis–Sigler fellowship (AM).
Senior Editor
 Naama Barkai, Weizmann Institute of Science, Israel
Reviewing Editor
 Armita Nourmohammad, University of Washington, United States
Reviewers
 Andrew J Yates, Columbia University, United States
 Herbert Levine, Northeastern University, United States
 Rob J de Boer, Utrecht University, Netherlands
Publication history
 Received: July 31, 2020
 Accepted: December 20, 2020
 Accepted Manuscript published: December 21, 2020 (version 1)
 Accepted Manuscript updated: January 5, 2021 (version 2)
 Version of Record published: February 8, 2021 (version 3)
 Version of Record updated: February 9, 2021 (version 4)
Copyright
© 2020, Gaimann et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,759
 Page views

 284
 Downloads

 13
 Citations
Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Physics of Living Systems
Schooling in fish is linked to a number of factors such as increased foraging success, predator avoidance, and social interactions. In addition, a prevailing hypothesis is that swimming in groups provides energetic benefits through hydrodynamic interactions. Thrust wakes are frequently occurring flow structures in fish schools as they are shed behind swimming fish. Despite increased flow speeds in these wakes, recent modeling work has suggested that swimming directly inline behind an individual may lead to increased efficiency. However, only limited data are available on live fish interacting with thrust wakes. Here we designed a controlled experiment in which brook trout, Salvelinus fontinalis, interact with thrust wakes generated by a robotic mechanism that produces a fishlike wake. We show that trout swim in thrust wakes, reduce their tailbeat frequencies, and synchronize with the robotic flapping mechanism. Our flow and pressure field analysis revealed that the trout are interacting with oncoming vortices and that they exhibit reduced pressure drag at the head compared to swimming in isolation. Together, these experiments suggest that trout swim energetically more efficiently in thrust wakes and support the hypothesis that swimming in the wake of one another is an advantageous strategy to save energy in a school.

 Physics of Living Systems
When a fish beats its tail, it produces vortices in the water that other fish could take advantage of to save energy while swimming.