Human B cell lineages associated with germinal centers following influenza vaccination are measurably evolving

  1. Kenneth B Hoehn
  2. Jackson S Turner
  3. Frederick I Miller
  4. Ruoyi Jiang
  5. Oliver G Pybus
  6. Ali H Ellebedy
  7. Steven H Kleinstein (corresponding author)
  1. Department of Pathology, Yale School of Medicine, United States
  2. Department of Pathology and Immunology, Washington University School of Medicine, United States
  3. Worcester Polytechnic Institute, United States
  4. Department of Immunobiology, Yale School of Medicine, United States
  5. Department of Zoology, University of Oxford, United Kingdom
  6. The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, United States
  7. Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, United States

Abstract

The poor efficacy of seasonal influenza virus vaccines is often attributed to pre-existing immunity interfering with the persistence and maturation of vaccine-induced B cell responses. We previously showed that a subset of vaccine-induced B cell lineages are recruited into germinal centers (GCs) following vaccination, suggesting that affinity maturation of these lineages against vaccine antigens can occur. However, it remains to be determined whether seasonal influenza vaccination stimulates additional evolution of vaccine-specific lineages, and previous work has found no significant increase in somatic hypermutation among influenza-binding lineages sampled from the blood following seasonal vaccination in humans. Here, we investigate this issue using a phylogenetic test of measurable immunoglobulin sequence evolution. We first validate this test through simulations and survey measurable evolution across multiple conditions. We find significant heterogeneity in measurable B cell evolution across conditions, with enrichment in primary response conditions such as HIV infection and early childhood development. We then show that measurable evolution following influenza vaccination is highly compartmentalized: while lineages in the blood are rarely measurably evolving following influenza vaccination, lineages containing GC B cells are frequently measurably evolving. Many of these lineages appear to derive from memory B cells. We conclude from these findings that seasonal influenza virus vaccination can stimulate additional evolution of responding B cell lineages, which implies that the poor efficacy of seasonal influenza vaccination is not due to a complete inhibition of vaccine-specific B cell evolution.

Editor's evaluation

The manuscript by Hoehn et al. introduces a novel approach to measure evolution in B cell responses and applies it to a wide variety of data sets. The work provides significant new insight into which stimuli induce effective immune responses, which has the potential to improve vaccine design. This will be of interest to those studying B cell responses, especially in the case of vaccinations that induce poor immune responses.

https://doi.org/10.7554/eLife.70873.sa0

eLife digest

When the immune system encounters a disease-causing pathogen, it releases antibodies that can bind to specific regions of the bacterium or virus and help to clear the infection. These proteins are generated by B cells which, upon detecting the pathogen, can begin to mutate and alter the structure of the antibody they produce: the better the antibody is at binding to the pathogen, the more likely the B cell is to survive. This process of evolution produces B cells that make more effective antibodies. After the infection, some of these cells become ‘memory B cells’ which can be stimulated into action when the pathogen invades again.

Many vaccines also depend on this process to trigger the production of memory B cells that can fight off a specific disease-causing agent. However, it is unclear to what extent memory B cells that already exist are able to continue to evolve and modify their antibodies. This is particularly important for the flu vaccine, as the virus that causes influenza rapidly mutates. To provide high levels of protection, the memory B cells formed following the vaccine may therefore need to evolve to make different antibodies that recognize mutated forms of the virus.

It is thought that the low effectiveness of the flu vaccine is partially because the response it triggers does not stimulate additional evolution of memory B cells. To test this theory, Hoehn et al. developed a computational method that can detect the evolution of B cells over time. The tool was applied to samples collected from the blood and lymph nodes (organ where immune cells reside) of people who recently received the flu vaccine. The results were then compared to B cells taken from people after different infections, vaccinations, and other conditions.

Hoehn et al. found the degree to which B cells evolve varies significantly between conditions. For example, B cells produced during chronic HIV infections frequently evolved over time, while such evolution was rarely observed during the autoimmune disease myasthenia gravis. The analysis also showed that memory B cells produced by the flu vaccine were able to evolve if recruited to the lymph nodes, but this was rarely detected in B cells in the blood.

These findings suggest the low efficacy of the flu vaccine is not due to a complete lack of B cell evolution, but likely due to other factors. For instance, it is possible the evolutionary process it stimulates is not as robust as in other conditions, or is less likely to produce long-lived B cells that release antibodies. More research is needed to explore these ideas and could lead to the development of more effective flu vaccines.

Introduction

Measurably evolving populations are systems that undergo evolution rapidly enough for significant genetic differences to be detected in longitudinally sampled timepoints (Drummond et al., 2003). While this concept is frequently applied to viruses such as HIV (Rambaut et al., 2004) and SARS-CoV-2 (e.g., du Plessis et al., 2021), B cells experience similarly rapid evolution during affinity maturation. B cell affinity maturation is critical for developing high-affinity antibodies in response to infection and vaccination (Shlomchik and Weisel, 2012; Victora and Nussenzweig, 2012). During affinity maturation, somatic hypermutation (SHM) introduces mutations into the B cell receptor (BCR) loci at a rate orders of magnitude higher than the background rate of somatic mutations (McKean et al., 1984; Murphy et al., 2008). These modified BCRs are selected based on their binding affinity, and the process repeats cyclically within germinal centers (GCs; Teng and Papavasiliou, 2007; Victora and Nussenzweig, 2012). Infection or vaccination can also stimulate pre-existing memory B cells that rapidly differentiate into antibody-secreting plasmablasts or possibly re-enter GCs to undergo additional affinity maturation (Ellebedy, 2018; Mesin et al., 2020). A lack of vaccine-specific affinity maturation is thought to underlie the poor efficacy of seasonal influenza virus vaccination (Arevalo et al., 2020; Ellebedy, 2018). While recent work has shown that antigen-specific B cell lineages can be recruited into GCs following influenza vaccination (Turner et al., 2020), other work has been unable to detect significant increases in SHM frequency among circulating influenza-binding antibody lineages following vaccination (Ellebedy et al., 2016).

Whether seasonal influenza vaccination stimulates an increase in SHM frequency can be answered by determining whether influenza-binding B cell lineages found in GCs are measurably evolving following vaccination. This is distinct from simply quantifying SHM frequency. While influenza vaccination stimulates memory B cell lineages with high SHM frequency (Laserson et al., 2014; Wrammert et al., 2008), these lineages are only measurably evolving if their level of SHM detectably increases during the sampling interval surrounding vaccination. In this study, we show how a phylogenetic test of measurable evolution can be a powerful tool to detect increasing SHM frequency in longitudinally sampled BCR sequence datasets (Duchêne et al., 2015; Murray et al., 2016). We validate this approach through simulations and a survey of measurable evolution in B cell repertoires across a wide range of infections and vaccinations. We document significant heterogeneity among conditions, with some like HIV infection and primary hepatitis B vaccination enriched for measurably evolving lineages in the blood. We further show that while most circulating lineages following influenza virus vaccination are not measurably evolving, a subset of memory B cell lineages re-enter GCs and increase in SHM frequency.

Results

Detecting measurable evolution in longitudinally sampled BCR repertoires

We develop a framework to test for measurable evolution in B cells based on longitudinally sampled sequence data from the BCR variable region. After preprocessing the sequencing data, we first identify clonal lineages – B cells that descend from a common V(D)J rearrangement – using clustering based on nucleotide sequence similarity, which we have previously shown detects clonal relationships with high confidence (Gupta et al., 2017; Zhou and Kleinstein, 2019). The pattern of shared SHM among BCR sequences within a lineage is then used to build a B cell lineage tree, which represents a lineage’s history of SHM. Branch lengths within these trees represent SHM per site. The divergence of each tip is the sum of branch lengths leading back to the lineage’s most recent common ancestor. In evolving lineages, sequences sampled at later timepoints are expected to have higher divergence than those from earlier timepoints (Figure 1A). To estimate the rate of evolution over time, we calculate the slope of the regression line between timepoint (weeks) and divergence (SHM/site) for each tip (Figure 1B, E; Rambaut et al., 2016). Because tips are not independent, standard linear regression p values are improper. We instead quantify significance using a modified phylogenetic date randomization test (Duchêne et al., 2015; Murray et al., 2016). This tests whether the Pearson’s correlation between divergence and time is significantly greater than that observed in the same tree with timepoints randomized among tips (Figure 1C, F). To account for population structure and sequencing error, we permute timepoints among single-timepoint monophyletic clusters of tips rather than individual tips (Figure 1—figure supplements 1 and 2; Duchêne et al., 2015; Murray et al., 2016). Further, it is possible that the combined effects of PCR and sequencing error will generate tree structures with multiple spurious tips radiating from a single node. 
Because trees are strictly binary, this would produce clusters of zero-length branches (soft polytomies) that could increase the error rate of the date randomization test. To limit potential effects of this source of error, we resolve polytomies into the fewest number of single-timepoint monophyletic clades possible (Figure 1—figure supplements 1 and 2). We refer to lineages with a date randomization test p < 0.05 as ‘measurably evolving’. To limit our analyses to lineages with adequate statistical power, we include only lineages that contain ≥15 total sequences, were sampled over at least 3 weeks, and have a minimum possible p value <0.05 based on the number of distinct permutations. Because we use a p value cutoff of 0.05, we expect a false positive rate of approximately 5% if no measurable evolution is occurring. We therefore refer to datasets with >5% measurably evolving lineages as ‘enriched’ for measurable evolution. This test is implemented within the Immcantation.org framework in the R package dowser (Hoehn et al., 2020).
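As a rough illustration of the clustered date randomization test described above, the following Python sketch permutes sample times among single-timepoint clusters and compares the permuted Pearson correlations with the observed one. It is not the dowser implementation. The function names, the toy data, and the exact definitions of the p value (fraction of permutations with a correlation at least as large as observed) and of the minimum possible p value (one over the number of distinct assignments of timepoints to clusters) are assumptions made for illustration. Tree building and polytomy resolution are omitted.

```python
import random
from collections import Counter
from math import factorial, sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def date_randomization_test(divergence, cluster, cluster_time, n_perm=1000, seed=0):
    """One-sided permutation p value for the divergence-time correlation.

    divergence   : root-to-tip divergence of each tip (SHM/site)
    cluster      : cluster label of each tip (a single-timepoint clade)
    cluster_time : dict mapping cluster label -> sample time (weeks)
    """
    rng = random.Random(seed)
    labels = sorted(cluster_time)
    observed = pearson([cluster_time[c] for c in cluster], divergence)
    times = [cluster_time[c] for c in labels]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(times)  # permute timepoints among clusters, not tips
        shuffled = dict(zip(labels, times))
        r = pearson([shuffled[c] for c in cluster], divergence)
        if r >= observed:
            at_least += 1
    return observed, at_least / n_perm


def min_possible_p(cluster_time):
    """Assumed lower bound on p: 1 / number of distinct timepoint assignments."""
    distinct = factorial(len(cluster_time))
    for count in Counter(cluster_time.values()).values():
        distinct //= factorial(count)
    return 1.0 / distinct


# Hypothetical lineage: 8 tips in 4 single-timepoint clusters.
toy_divergence = [0.010, 0.012, 0.030, 0.034, 0.050, 0.055, 0.070, 0.074]
toy_cluster = ["a", "a", "b", "b", "c", "c", "d", "d"]
toy_cluster_time = {"a": 0, "b": 1, "c": 4, "d": 8}

r_obs, p_value = date_randomization_test(toy_divergence, toy_cluster, toy_cluster_time)
p_floor = min_possible_p(toy_cluster_time)
```

With only four clusters at four distinct timepoints there are 24 possible assignments, so under this counting the smallest attainable p value is 1/24. This illustrates why lineages with few clusters can fail the power filter regardless of how strongly divergence tracks time.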

Figure 1 (with 5 supplements)
Detecting measurable evolution in B cell lineages.

(A) Example B cell lineage tree from Liao et al., 2013 showing increasing divergence with sample time. Branch lengths show somatic hypermutation (SHM)/site according to scale bar in (D). (B) Rate of SHM accumulation over time estimated using a regression of divergence vs time in tree (A). (C) Significance of the relationship between divergence and time estimated using a date randomization test comparing the Pearson’s correlation (r) between divergence and time in tree (A). (D–F) Same plots as (A–C) but on a tree that is not measurably evolving. (G) Simulation-based power analysis shows the permutation test has high power over an interval of at least 10–30 GC cycles (generations). Lineages were sampled once at generation 10, and a second time after the specified number of additional generations have elapsed. Percentage of lineages with p < 0.05 are listed above, rounded to three significant digits. The dotted line corresponds to p = 0.05. (H) Simulation-based analysis reproducing the sampling of Laserson et al., 2014 shows the test has high power even at slow (24 hr) GC cycle times.

To determine the necessary sampling interval to detect B cell evolution, we benchmarked the date randomization test using affinity maturation simulations performed with the package bcr-phylo (Davidsen and Matsen, 2018; Ralph and Matsen, 2020). This simulates alternating GC cycles of B cell proliferation, SHM, and selection based on amino acid similarity to a target sequence. Within these simulations, each lineage was first sampled after 10 simulated GC cycles, and then sampled a second time after a variable number of additional cycles. Using this framework in which all lineages are evolving, the date randomization test detected measurable evolution in 47% of lineages after 10 additional GC cycles, and 77% after 15 additional cycles (Figure 1G). Given a GC cycle time of 6–24 hr, 15 cycles corresponds to 4–15 days, within the timeframe of many longitudinal B cell repertoire studies (Ellebedy et al., 2016; Laserson et al., 2014). Interestingly, the date randomization test had higher power to detect measurable evolution in simulations of neutral evolution than those that included selection (Figure 1—figure supplements 3 and 4). This is likely because selection can reduce the rate of divergence within lineages compared to neutral evolution (Figure 1—figure supplement 5). To quantify the false positive rate, we repeated these calculations on the same simulations but with randomized sample time associations. Here, the date randomization test found measurable evolution in <4% in each case, indicating a low false positive rate (Figure 1—figure supplements 3 and 4). These analyses demonstrate that the date randomization test has sufficient sensitivity and specificity to detect ongoing B cell evolution from longitudinally sampled BCR data.
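The conversion from GC cycles to calendar time quoted above is simple arithmetic, sketched here in Python. The helper name is hypothetical. The cycle counts and the 6–24 hr cycle times are taken from the text.

```python
def cycles_to_days(n_cycles, hours_per_cycle):
    """Elapsed time in days for a number of GC cycles of fixed duration."""
    return n_cycles * hours_per_cycle / 24.0


fast_days = cycles_to_days(15, 6)    # 15 cycles at 6 hr per cycle
slow_days = cycles_to_days(15, 24)   # 15 cycles at 24 hr per cycle
```

Fifteen cycles therefore span 3.75 days at the fast end and 15 days at the slow end, which the text rounds to 4–15 days.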

Primary immune responses are enriched for measurably evolving lineages

To further validate our approach, we tested for measurable evolution in cases of known or suspected affinity maturation in humans. We hypothesized that primary immune responses would be enriched for measurably evolving lineages. To test this, we used publicly available data primarily from the Observed Antibody Space (OAS) database (Kovaltsuk et al., 2018) to survey measurable evolution in BCR datasets from 99 human subjects in 21 studies spanning 10 conditions including HIV infection, Ebola virus infection, and healthy controls (Table 1). We observed considerable heterogeneity in measurable evolution among conditions. Confirming our hypothesis, we observed an enrichment of measurably evolving lineages (>5% of tested lineages) in primary immune responses including HIV infection, meningococcus vaccination, primary but not secondary hepatitis B vaccination, and early childhood development (Table 1, Figure 2A, and Figure 2—figure supplement 1).

Figure 2 (with 4 supplements)
Measurable evolution in B cell lineages across time and conditions.

(A) Percentage of lineages that are measurably evolving within each study (Table 1, Figure 1C). The dotted line indicates 5%, the percent expected under the null hypothesis that there is no measurable evolution occurring in a given dataset. Orange triangles indicate HIV datasets truncated to only include data within the first 60-week sampling interval. Note that three HIV studies were not truncated because they contained <2 sample timepoints within the first 60 weeks of sampling (Huang et al., 2016; Schanz et al., 2014; Wu et al., 2015). (B) Mean initial germline divergence (sum of branch lengths) from germline to sequences from each adjusted measurably evolving lineage’s first timepoint. Note that ‘Early/Late’ HIV in (B) separates studies by time since initial infection, while ‘HIV, first 60 weeks’ in (A) includes only samples taken over the first 60 weeks of sampling. Each point is a measurably evolving lineage with a Benjamini–Hochberg adjusted p value <0.1. Wilcoxon tests were used to compare divergence levels among datasets.

Table 1
Summary of datasets.

N shows number of subjects with at least one powered lineage. Mean range shows mean total sampling interval across subjects. Powered lineages shows the number of lineages that: (1) contained at least 15 sequences, (2) were sampled over at least 3 weeks, and (3) had a minimum possible p value <0.05. The rightmost column shows the percentage of these lineages with p < 0.05, rounded to two significant digits. Studies with at least 5% positive lineages are shown in bold. Turner et al., 2020 in this table and Figure 2 included only blood samples. Data from studies marked with an asterisk (*) were obtained from Observed Antibody Space (Kovaltsuk et al., 2018).

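| Study | Condition | N | Mean range (weeks) | Mean sample count | Multi-timepoint lineages | Powered lineages | % lineages p < 0.05 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Levin et al., 2016* | Allergy + SIT | 9 | 52 | 2.7 | 42 | 31 | 6.5 |
| Davis et al., 2019* | Ebola virus | 4 | 36 | 3.6 | 1,549 | 877 | 5 |
| Wang et al., 2014 | Healthy adults | 7 | 52 | 2 | 18 | 10 | 0 |
| Nielsen et al., 2019 | Healthy children | 20 | 69 | 2.7 | 262 | 71 | 14 |
| Galson et al., 2015b* | Hep. B vaccine (boost) | 9 | 4 | 4.8 | 4,923 | 3,422 | 2.9 |
| Galson et al., 2016* | Hep. B vaccine (primary) | 9 | 23 | 6.9 | 4,426 | 2,529 | 7.2 |
| Doria-Rose et al., 2014* | HIV | 1 | 190 | 8 | 65 | 48 | 44 |
| Huang et al., 2016* | HIV | 1 | 120 | 12 | 388 | 221 | 5.9 |
| Johnson et al., 2018* | HIV | 1 | 170 | 5 | 561 | 330 | 23 |
| Landais et al., 2017* | HIV | 1 | 160 | 7 | 1,084 | 743 | 48 |
| Liao et al., 2013* | HIV | 1 | 140 | 5 | 205 | 151 | 53 |
| Schanz et al., 2014* | HIV | 1 | 120 | 3 | 147 | 54 | 11 |
| Setliff et al., 2018* | HIV | 6 | 170 | 3 | 787 | 173 | 9.8 |
| Wu et al., 2011, Wu et al., 2015* | HIV | 1 | 730 | 7 | 393 | 305 | 26 |
| Ellebedy et al., 2016* | Influenza vaccine | 8 | 13 | 5 | 1,966 | 1,479 | 5.2 |
| Laserson et al., 2014* | Influenza vaccine | 3 | 4 | 9 | 1,182 | 639 | 4.9 |
| Turner et al., 2020 | Influenza vaccine | 1 | 8.6 | 5 | 168 | 104 | 2.9 |
| Galson et al., 2015a* | Meningococcus vaccine | 7 | 4 | 3 | 483 | 80 | 10 |
| Jiang et al., 2020a | Myasthenia gravis | 3 | 260 | 3.6 | 110 | 62 | 3.2 |
| Jiang et al., 2020b | Myasthenia gravis | 1 | 52 | 2 | 46 | 33 | 3 |
| Tsioris et al., 2015 | West Nile virus | 6 | 5.2 | 2 | 151 | 65 | 1.5 |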

Chronic HIV infection stimulates ongoing affinity maturation as B cells evolve to contain viral escape mutants (Liao et al., 2013; Wendel et al., 2020). Consistent with this arms race, HIV infection was more enriched for measurably evolving lineages than other conditions surveyed, with each study having between 5.9% and 53% of lineages measurably evolving (Figure 2A). Lineages from subjects with broadly neutralizing anti-HIV lineages sampled over multiple years (Doria-Rose et al., 2014; Landais et al., 2017; Liao et al., 2013; Wu et al., 2015) were particularly enriched (26–53% measurably evolving). Importantly, the HIV studies included were sampled over longer time periods than studies of other conditions (mean = 225 vs 45 weeks, Table 1). To determine whether these results were simply due to longer sampling intervals, we repeated our analysis of subjects with HIV using only samples within the first 60 weeks of the study. These truncated datasets were still highly enriched for measurably evolving lineages (6.9–64%) compared to other non-HIV datasets with similar sampling intervals (0–7.2%, Figure 2A). This indicates that the observed high frequency of measurably evolving lineages is not simply due to long sampling intervals.
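The quoted mean sampling intervals (225 vs 45 weeks) can be checked against the "Mean range (weeks)" column of Table 1. The short Python sketch below assumes the means are simple averages over studies rather than over subjects. That assumption is not stated in the text, but it appears to reproduce both figures. Variable names are illustrative.

```python
# Mean range (weeks) per study, transcribed from Table 1.
hiv_ranges = {
    "Doria-Rose et al., 2014": 190, "Huang et al., 2016": 120,
    "Johnson et al., 2018": 170, "Landais et al., 2017": 160,
    "Liao et al., 2013": 140, "Schanz et al., 2014": 120,
    "Setliff et al., 2018": 170, "Wu et al., 2011/2015": 730,
}
other_ranges = {
    "Levin et al., 2016": 52, "Davis et al., 2019": 36,
    "Wang et al., 2014": 52, "Nielsen et al., 2019": 69,
    "Galson et al., 2015b": 4, "Galson et al., 2016": 23,
    "Ellebedy et al., 2016": 13, "Laserson et al., 2014": 4,
    "Turner et al., 2020": 8.6, "Galson et al., 2015a": 4,
    "Jiang et al., 2020a": 260, "Jiang et al., 2020b": 52,
    "Tsioris et al., 2015": 5.2,
}

# Unweighted mean over studies (an assumption about how the means were formed).
mean_hiv = sum(hiv_ranges.values()) / len(hiv_ranges)
mean_other = sum(other_ranges.values()) / len(other_ranges)
```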

Other primary immune responses were also enriched for measurably evolving lineages (Table 1, Figure 2A). B cell lineages from healthy children sampled during the first 3 years of life were enriched for measurable evolution (14%), possibly reflecting continual exposure to novel antigens (Nielsen et al., 2019). We also observed an enrichment of measurably evolving lineages following primary meningococcus vaccination (10%; Galson et al., 2015a) and primary but not secondary hepatitis B vaccination (7.2% vs 2.9%, respectively; Galson et al., 2016; Galson et al., 2015b). Primary hepatitis B vaccinees were sampled over a longer time period than secondary vaccinees, so this difference may also be due to different sampling intervals (Figure 2—figure supplement 2). Further, allergen-specific immunotherapy, which stimulates tolerance of allergy-causing antigens through exposure, was also enriched for measurable evolution (6.5%; Levin et al., 2016). Interestingly, Ebola virus infection showed a borderline (5%) percentage of measurably evolving lineages (Table 1) despite likely being a primary infection. Overall, however, these results confirm that the date randomization test can detect ongoing SHM in empirical datasets where it is expected to be occurring.

We next investigated whether measurable evolution was associated with antigen-binding lineages. While antigen-binding information was not available for most B cell lineages surveyed, some studies included experimentally validated monoclonal antibody sequences (mAbs). Lineages containing these sequences thus contain B cells that bind to the antigen under study. Experimentally validated mAbs were included from six studies: four in HIV (Doria-Rose et al., 2014; Landais et al., 2017; Liao et al., 2013; Wu et al., 2015), one in Ebola virus infection (Davis et al., 2019), and one in influenza vaccine response (Turner et al., 2020). We found that across these studies measurably evolving lineages were more likely to contain mAbs than nonmeasurably evolving lineages (p = 0.031, Wilcoxon test, Figure 2—figure supplement 3). This is consistent with the hypothesis that measurably evolving lineages are actively responding to antigens relevant to the condition being studied.

Measurably evolving lineages are rare in peripheral blood following influenza vaccination

Seasonal influenza vaccination is believed to trigger a memory B cell response in adults. If memory B cells rarely re-enter GCs to undergo additional affinity maturation (Mesin et al., 2020), and there is little evolution of naive B cell lineages, we expect little measurable evolution in the blood following vaccination. To test this, we applied the date randomization test to three longitudinally sampled adult influenza vaccine datasets. The first comprised three adults sampled seven times between 1 hr and 28 days postvaccination (Gupta et al., 2017; Laserson et al., 2014); the second contained eight adults sampled five times between 0 and 90 days postvaccination (Ellebedy et al., 2016); and the third used blood samples from a single individual sampled five times between 0 and 60 days postvaccination (Turner et al., 2020). Across subjects in each study, between only 2.9% and 5.2% of lineages were measurably evolving (Table 1). These values are approximately as expected under the null hypothesis of no measurable evolution, and histograms of p values from these datasets are roughly uniform, suggesting the measurably evolving lineages identified are mostly false positives from multiple testing (Figure 2—figure supplement 1). Distributions of p values for all datasets are also available in Figure 2—figure supplement 1. To verify the 4- to 13-week sampling range of these studies was sufficient to detect measurable evolution, we performed simulation analyses replicating the sampling strategy of the influenza dataset with the shortest sampling range (Figure 1—figure supplement 4; Laserson et al., 2014). These simulations show this timescale was sufficiently long to detect ongoing affinity maturation with high sensitivity (>90%, Figure 1H). 
Overall, these results indicate B cell lineages present in blood infrequently undergo additional evolution within 13 weeks following influenza vaccination, consistent with a primarily GC-independent memory B cell response and/or rarity of antigen-specific lineages in the peripheral blood (Wrammert et al., 2008).

Measurably evolving lineages following influenza vaccination include memory B cell origin

While measurably evolving lineages do not occur at high frequency in the blood following influenza vaccination, we checked if any could be identified after adjustment for multiple testing. To adjust for multiple hypothesis tests, we pooled lineages across all studies and adjusted their p values using the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995). We identified 15 lineages in influenza datasets, and 354 lineages in other conditions, with adjusted date randomization p values < 0.1. We investigated if these ‘adjusted’ measurably evolving lineages were derived from naive or pre-existing memory B cells. Because memory B cell lineages have already undergone affinity maturation, they are expected to have higher initial SHM levels compared to naive B cell lineages. To test this, we compared germline sequence divergence in adjusted measurably evolving lineages from influenza vaccination to other conditions. Consistent with memory B cell reactivation, lineages from influenza vaccination had significantly higher initial divergence (median = 8.6%) than those from primary responses such as early HIV infection (median = 5%, p = 0.0012) and primary hepatitis B vaccination (median = 2.8%, p = 0.0019) (Figure 2B). Further, these influenza lineages had initial divergence levels similar to lineages from subjects with HIV first sampled >5 years after infection (Huang et al., 2016; Wu et al., 2015), and hepatitis B booster vaccination subjects (Figure 2B; Galson et al., 2015b). Ebola virus infection, meningococcus vaccination, and early childhood development had median initial divergence levels of 0.4%, 6.6%, and 2.0%, respectively, but contained less than three adjusted measurably evolving lineages each. To understand the effect of multiple hypothesis correction on these results, we repeated the comparisons in Figure 2B using all measurably evolving lineages (unadjusted p < 0.05) from the same datasets. 
Considering this larger set of lineages, initial divergence of lineages from influenza vaccination studies was significantly higher than those in all other conditions except late HIV infection (Figure 2—figure supplement 4). The same pattern from Figure 2B was also found when repeating these comparisons with a stricter cutoff (adjusted p < 0.05, Figure 2—figure supplement 4). Overall, these results are consistent with measurably evolving lineages from influenza vaccination arising mainly from pre-existing memory B cells.
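For readers unfamiliar with the Benjamini–Hochberg procedure used to obtain the ‘adjusted’ p values, a minimal pure-Python sketch follows. It shows the standard step-up adjustment and is not the code used in the study. The function name and the example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical date randomization p values for five lineages.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_p = benjamini_hochberg(raw_p)
n_adjusted_evolving = sum(p < 0.1 for p in adj_p)  # cutoff used in the text
```

In this toy example, four of the five lineages would pass the adjusted p < 0.1 cutoff applied in the text.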

Measurably evolving lineages show signs of purifying selection

We next investigated the type and degree of selection operating on measurably evolving B cell lineages. One way to detect natural selection in DNA sequences is to estimate the ratio of nonsynonymous (amino acid replacement) to synonymous (silent) mutation rates. This ratio is often called ω (Nielsen and Yang, 1998). Neutral evolution, where amino acid replacements are not selected for or against, should result in ω = 1. Purifying selection, where amino acid replacements are disfavored, should result in ω < 1. Diversifying selection, where amino acid replacements are favored, should result in ω > 1. In B cell lineages, ω is often estimated separately for complementarity-determining regions (CDRs) involved in antigen binding, and framework regions (FWRs), which are more structural. Further, it is important to estimate ω or similar metrics using models that account for intrinsic hot- and cold-spot biases of SHM (Hoehn et al., 2017; Uduman et al., 2011; Yaari et al., 2012). To understand what kind of selection operated on measurably evolving lineages, we estimated separate ω values for CDR and FWR regions (ωCDR and ωFWR) of the adjusted measurably evolving lineages (Figure 2B) using the HLP19 model in IgPhyML (Hoehn et al., 2019). Model parameters were shared among lineages within the same subject, and only subjects with at least two adjusted measurably evolving lineages were included to reduce noise. Across all conditions surveyed, we found evidence of purifying selection operating on adjusted measurably evolving lineages (mean ωCDR = 0.58, mean ωFWR = 0.48, Table 2). We estimated the significance of these results using a phylogenetic likelihood ratio test (Huelsenbeck and Rannala, 1997). We found that ωCDR was significantly <1 in 10/13 subjects (significantly >1 in none) and ωFWR was significantly <1 in 13/13 subjects (Table 2). This signal of purifying selection was particularly strong in both early and late HIV. 
Influenza vaccination showed higher ω values, comparable to primary hepatitis B vaccination.

Table 2
Analysis of selection on adjusted measurably evolving lineages.

Repertoire-wide estimates of ω for CDRs (ωCDR) and FWRs (ωFWR) for adjusted measurably evolving lineages within different subjects are shown. N indicates the number of adjusted measurably evolving lineages per subject. L indicates the maximum log-likelihood obtained when both ωCDR and ωFWR were estimated by maximum likelihood. LCDR=1 indicates the maximum log-likelihood obtained when ωFWR was estimated by maximum likelihood but ωCDR was fixed at 1. LFWR=1 indicates the maximum log-likelihood obtained when ωCDR was estimated by maximum likelihood but ωFWR was fixed at 1. The likelihood ratio statistic (LRS) was calculated as either 2×(L − LCDR=1) for CDRs or 2×(L − LFWR=1) for FWRs, and p values were calculated using a likelihood ratio test with one degree of freedom (see Methods). L values were rounded to two decimal places, and LRS values are reported to three significant digits. Significant p values are in bold. p values below the numerical limit for double values are reported as <2E−16.

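| Condition | Study | Subject | N | Region | ω | L | LCDR=1 | LFWR=1 | LRS | p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Influenza vaccine | Ellebedy et al., 2016 | Donor-4 | 4 | CDR | 0.624 | −5753.04 | −5759.65 | | 13.2 | 2.80E−04 |
| Influenza vaccine | Ellebedy et al., 2016 | Donor-4 | 4 | FWR | 0.503 | −5753.04 | | −5783.9 | 61.7 | 4.00E−15 |
| Influenza vaccine | Ellebedy et al., 2016 | Donor-5 | 5 | CDR | 0.979 | −5074.01 | −5074.02 | | 0.0212 | 8.84E−01 |
| Influenza vaccine | Ellebedy et al., 2016 | Donor-5 | 5 | FWR | 0.584 | −5074.01 | | −5090.55 | 33.1 | 8.75E−09 |
| Influenza vaccine | Laserson et al., 2014 | FV | 4 | CDR | 0.583 | −9162.11 | −9173.39 | | 22.6 | 1.99E−06 |
| Influenza vaccine | Laserson et al., 2014 | FV | 4 | FWR | 0.508 | −9162.11 | | −9217.38 | 111 | <2E−16 |
| Early HIV | Doria-Rose et al., 2014 | CAP256 | 9 | CDR | 0.424 | −18,976.96 | −19,045.96 | | 138 | <2E−16 |
| Early HIV | Doria-Rose et al., 2014 | CAP256 | 9 | FWR | 0.451 | −18,976.96 | | −19,151.29 | 349 | <2E−16 |
| Early HIV | Johnson et al., 2018 | CAP256 | 26 | CDR | 0.423 | −46,720.28 | −46,858.67 | | 277 | <2E−16 |
| Early HIV | Johnson et al., 2018 | CAP256 | 26 | FWR | 0.408 | −46,720.28 | | −47,243.96 | 1,050 | <2E−16 |
| Early HIV | Landais et al., 2017 | PC064 | 188 | CDR | 0.39 | −416,489.06 | −418,035.9 | | 3,090 | <2E−16 |
| Early HIV | Landais et al., 2017 | PC064 | 188 | FWR | 0.386 | −416,489.06 | | −422,043.68 | 11,100 | <2E−16 |
| Early HIV | Liao et al., 2013 | CH505 | 51 | CDR | 0.527 | −102,131.66 | −102,307.3 | | 351 | <2E−16 |
| Early HIV | Liao et al., 2013 | CH505 | 51 | FWR | 0.417 | −102,131.66 | | −103,240.86 | 2,220 | <2E−16 |
| Early HIV | Schanz et al., 2014 | ZA159 | 2 | CDR | 0.657 | −5573.47 | −5577.12 | | 7.29 | 6.93E−03 |
| Early HIV | Schanz et al., 2014 | ZA159 | 2 | FWR | 0.466 | −5573.47 | | −5617.91 | 88.9 | <2E−16 |
| Late HIV | Huang et al., 2016 | Donor-Z258 | 2 | CDR | 0.345 | −3162.12 | −3172.87 | | 21.5 | 3.54E−06 |
| Late HIV | Huang et al., 2016 | Donor-Z258 | 2 | FWR | 0.401 | −3162.12 | | −3196.5 | 68.7 | 1.11E−16 |
| Late HIV | Wu et al., 2015 | Donor-45 | 32 | CDR | 0.451 | −63,527.81 | −63,687 | | 318 | <2E−16 |
| Late HIV | Wu et al., 2015 | Donor-45 | 32 | FWR | 0.378 | −63,527.81 | | −64,444.73 | 1,830 | <2E−16 |
| Hep B. vaccine, primary | Galson et al., 2016 | Subject-2277 | 6 | CDR | 0.885 | −5498.77 | −5499.15 | | 0.766 | 3.81E−01 |
| Hep B. vaccine, primary | Galson et al., 2016 | Subject-2277 | 6 | FWR | 0.668 | −5498.77 | | −5508.36 | 19.2 | 1.18E−05 |
| Hep B. vaccine, primary | Galson et al., 2016 | Subject-2752 | 2 | CDR | 0.533 | −1112.1 | −1113.82 | | 3.44 | 6.36E−02 |
| Hep B. vaccine, primary | Galson et al., 2016 | Subject-2752 | 2 | FWR | 0.461 | −1112.1 | | −1120.06 | 15.9 | 6.68E−05 |
| Hep B. vaccine, primary | Galson et al., 2016 | Subject-2954 | 9 | CDR | 0.711 | −6023.13 | −6026.4 | | 6.56 | 1.04E−02 |
| Hep B. vaccine, primary | Galson et al., 2016 | Subject-2954 | 9 | FWR | 0.545 | −6023.13 | | −6048.08 | 49.9 | 1.62E−12 |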
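The likelihood ratio calculation described in the Table 2 legend can be sketched in a few lines of Python. This is an illustration rather than the IgPhyML code. The function name is hypothetical, and the p value uses the closed form of the chi-squared survival function with one degree of freedom, erfc(sqrt(LRS/2)). Because the inputs below are the rounded log-likelihoods from Table 2, the results only approximately match the tabulated values.

```python
from math import erfc, sqrt


def lrt_one_df(loglik_full, loglik_constrained):
    """Likelihood ratio statistic and chi-squared (1 df) p value."""
    lrs = 2.0 * (loglik_full - loglik_constrained)
    # Survival function of chi-squared with 1 df: P(X > lrs) = erfc(sqrt(lrs / 2)).
    p = erfc(sqrt(max(lrs, 0.0) / 2.0))
    return lrs, p


# Donor-4 CDR row (Ellebedy et al., 2016): L and L with omega_CDR fixed at 1.
lrs_donor4, p_donor4 = lrt_one_df(-5753.04, -5759.65)

# Donor-5 CDR row: omega_CDR close to 1, so the constraint barely changes the fit.
lrs_donor5, p_donor5 = lrt_one_df(-5074.01, -5074.02)
```

The first call gives an LRS of about 13.2 and a p value near the tabulated 2.80E−04, while the second gives a large p value in line with the nonsignificant result reported for Donor-5.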

Influenza-binding lineages associated with GCs are measurably evolving

While we found little measurable evolution in the blood following seasonal influenza vaccination, influenza vaccination has been shown to stimulate both naive and memory B cells to enter GCs (Turner et al., 2020). This raises the possibility that additional affinity maturation could be occurring in GCs, but its products are not enriched in the blood. Data from Turner et al., 2020 provided both blood samples and fine-needle aspirations of lymph nodes (including GCs) from the same subject. By combining these samples, we identified 53 powered B cell lineages containing at least one GC B cell following influenza vaccination, and 100 powered lineages that contained none. We refer to lineages containing one or more GC B cells as ‘GC-associated’. To determine whether GC-associated lineages were undergoing additional SHM, we tested whether they were enriched for measurable evolution. We found that 7.5% of lineages containing sequences from GC B cells were measurably evolving, compared to only 3.0% of lineages with no identified GC sequences. This signal of measurable evolution increased with the fraction of GC sequences. For instance, while 10% of lineages containing ≥10% GC sequences were measurably evolving, 38% (3/8) of those with ≥25% GC sequences were measurably evolving (Figure 3A). Lineages with higher proportions of GC sequences also had a higher correlation between divergence and time (linear regression slope = 1.1, p = 8.9 × 10⁻¹³, Figure 3—figure supplement 1). We further estimated the significance of this positive relationship by bootstrapping our data using 10,000 resampling repetitions with replacement. We found that in all 10,000 resampling repetitions, the slope of the linear regression between GC sequence proportion and the correlation between divergence and time was positive, with 95% of repetitions having a slope between 0.81 and 1.3 (Figure 3—figure supplement 1). 
Measurably evolving lineages in this dataset did not contain significantly more sequences than other lineages, indicating these results were not significantly confounded by lineage size (Figure 3—figure supplement 2). Finally, the measurably evolving lineages with the highest proportion of GC sequences contained mAbs that bound to vaccine antigens (Figure 3B, C). These lineages show signs of origin from memory B cells, such as clonal relatedness to blood plasmablasts sampled 5 days postvaccination, and high mean germline divergence at their first sampled timepoint (6.3%, 7.2%, Figure 3B, C, respectively). To test whether GC-associated lineages accumulated new amino acid replacement mutations rather than just silent mutations, we repeated the date randomization test but calculated the divergence of each tip as the number of amino acid differences between that tip’s sequence and the unmutated germline ancestor. This amino acid-based correlation analysis also showed a strong positive relationship between the proportion of lineages that were measurably evolving and the percentage of sequences derived from GC B cells (Figure 3A). This indicates that these GC-associated lineages accumulated new amino acid mutations as well as nucleotide mutations over the study interval. Overall, these analyses demonstrate that influenza-binding, GC-associated B cell lineages undergo additional, measurable evolution following vaccination.
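
The bootstrap of the regression slope described above can be illustrated with a short Python sketch. This is not the code used in the study: the function names are hypothetical, the regression is ordinary least squares, and the interval is a simple percentile interval, all of which are assumptions made for illustration. Here x would be the proportion of GC sequences in each lineage and y the correlation between divergence and time for that lineage.

```python
import random


def ols_slope(xs, ys):
    """Slope of the ordinary least squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_reps=10000, seed=0):
    """Resample (x, y) pairs with replacement and refit the slope each time."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_reps:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        slopes.append(ols_slope(bx, by))
    return slopes


def percentile_interval(values, lower=0.025, upper=0.975):
    """Approximate central interval from the sorted bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]
```

The fraction of bootstrap slopes that are positive, together with the percentile interval, corresponds to the two summaries reported in the text.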

Figure 3 with 3 supplements
Germinal center (GC) association is positively related to measurable evolution following influenza vaccination.

(A) Percent of lineages that are measurably evolving given a minimum percentage of GC sequences. The minimum (inclusive) percent of GC sequences within a clone is shown on the x axis. The origin shows the percentage of measurably evolving lineages across all lineages. The left-most point shows lineages without any GC sequences. The total number of lineages in each category is listed above each point. The dashed line shows 5%, the expected false positive rate under the null hypothesis. Results are shown for clustered date randomization tests using divergence values calculated either as the sum of nucleotide-based phylogenetic branch lengths (nucleotide) or as the amino acid Hamming distance of each sequence to the germline (amino acid). (B, C) Lineage trees showing measurably evolving lineages with the highest proportion of GC sequences. Tips are labeled by cell type if available. ABC, activated B cell; GC, germinal center; PB, plasmablast; RMB, resting memory B; and unlabeled tips are from bulk PBMC sequencing. mAb = influenza-binding monoclonal antibody sequence (2018/2019 quadrivalent inactivated influenza virus vaccine). Branch lengths represent somatic hypermutation (SHM)/site, as shown by the shared scale bar.

A possible alternative explanation for measurable evolution following influenza vaccination is that SHM is not occurring over the sampled time interval, but that highly mutated B cells were preferentially recalled due to their higher binding affinity. Preferential recall of highly mutated B cells would likely result in a positive correlation between divergence and sample time. While difficult to directly test, we believe this explanation is unlikely to be the sole source of measurable evolution in our data. First, blood samples taken 5 days postvaccination represent the breadth of the pool of memory B cells. If measurable evolution was simply due to expansion of mutated memory B cells, we would expect divergence of later-sampled B cells to be within the range of day 5 plasmablasts. Instead, many later-sampled GC sequences are clearly more diverged than earlier-sampled sequences within the measurably evolving, influenza-binding lineages we observed (Figure 3B, C). Second, if measurable evolution were due simply to preferential expansion of more mutated B cells, we would expect to observe measurable evolution within influenza-binding lineages in both the blood and GC. This is not tested in Figure 3A because that analysis includes all lineages, not just those that bind to influenza. To adjust for this, we repeated the analysis in Figure 3A while only including lineages that contained influenza-binding mAbs. Even among influenza-binding lineages, we still observed an association between GC cells and measurable evolution. While 1/10 of influenza-binding lineages found only in the blood were measurably evolving, 2/5 lineages with >25% GC sequences were measurably evolving (Figure 3—figure supplement 3). Overall, while we cannot rule out preferential expansion of highly mutated memory B cells, these results are more easily interpretable as the result of ongoing SHM in GCs.

Discussion

The extent to which seasonal influenza vaccination stimulates affinity maturation against vaccine antigens is unclear, and poor efficacy of seasonal influenza virus vaccination is often attributed to stimulation of pre-existing memory B cells interfering with novel responses to vaccine antigens (Ellebedy, 2018). While a prior study has shown that influenza-binding B cell lineages are found in GCs following seasonal influenza vaccination (Turner et al., 2020), other work has suggested that circulating influenza-binding B cell lineages do not accumulate additional SHM following vaccination (Ellebedy et al., 2016). To determine whether seasonal influenza vaccination stimulates additional evolution in B cell lineages, we developed and validated a framework to detect measurable evolution using longitudinally sampled BCR sequencing data. This phylogenetic test can be a powerful tool to detect ongoing B cell evolution using longitudinally sampled BCR datasets across a wide array of immunological conditions, including influenza virus vaccine responses. Our results confirm prior findings that there is little evidence of B cell evolution among lineages sampled in the peripheral blood following seasonal influenza vaccination (Ellebedy et al., 2016). However, we also show that seasonal influenza vaccination is capable of stimulating measurable evolution in influenza-binding, GC-associated B cell lineages.

To place our analyses of the influenza vaccination response in a broader context, we surveyed measurable evolution across a broad range of infections and vaccinations. Prior work has shown that chronic HIV infection induces long-term affinity maturation of broadly neutralizing antibody lineages in response to viral escape mutants (Liao et al., 2013; Vieira et al., 2018; Wu et al., 2015). Our results show that HIV infection is associated with an exceptionally strong signature of B cell evolution over time. This signature is not limited to single lineages. Rather, a substantial fraction of longitudinally sampled B cell lineages within the repertoires of subjects with HIV are measurably evolving, consistent with clonal competition among B cell lineages during HIV infection (Nourmohammad et al., 2019). Early childhood development during the first 3 years of life (Nielsen et al., 2019) showed the second-highest enrichment of measurably evolving lineages among surveyed conditions. This possibly reflects continual exposure to novel antigens during childhood. Further, primary vaccinations (meningococcus, primary hepatitis B)(Galson et al., 2016; Galson et al., 2015a) showed stronger signatures of measurable evolution than secondary vaccinations (adult seasonal influenza, hepatitis B booster) (Ellebedy et al., 2016; Galson et al., 2015b; Laserson et al., 2014; Turner et al., 2020). Overall, our results are consistent with the hypothesis that GC responses are stronger in response to novel antigens.

In addition to detecting measurable evolution, we also characterized selection operating on measurably evolving B cell lineages. We found that measurably evolving lineages showed evidence of purifying selection (ωCDR < 1) (Table 2). Though perhaps counterintuitive, a strong signal of purifying selection is a straightforward prediction of evolution toward an adaptive peak (Hoehn et al., 2019). Similar evidence of purifying selection during affinity maturation has been observed in other studies including influenza vaccination, HIV infection, and healthy controls (Cizmeci et al., 2021; Hoehn et al., 2019; Sheng et al., 2016; Yaari et al., 2015). Importantly, ω estimates are an average across all codons within CDRs or FWRs. It is possible that positive selection operated on a small number of codon sites, but that this signal was outweighed by the larger number of sites under purifying selection. Codon-specific models may be useful in future analyses to identify these sites under positive selection (e.g., Yang et al., 2000). When interpreting these results, it is also important to note that parameters were estimated using all mutations represented by each lineage tree, including those that potentially occurred before the first sampled timepoint. In all, these results are consistent with typical forces of selection having operated on measurably evolving lineages.

There are several limitations to this study. Data from different studies were sampled according to different schedules and time intervals. Because the power to detect measurable evolution should increase over time (Figure 1G), this could confound comparisons among datasets. We note, however, that multiple influenza and HIV datasets were surveyed, and enrichment of measurable evolution within these conditions was not strongly related to the sampling range (Figure 2A). This suggests immunological condition, rather than sample range, was the primary determinant of observed differences. By including monoclonal antibody (mAb) sequences with experimentally validated binding in several datasets, we were able to show that measurably evolving lineages are more likely than nonmeasurably evolving lineages to contain mAb sequences. This is consistent with the idea that measurably evolving lineages are actively responding to antigen. These results should be interpreted cautiously, however. With the exception of Turner et al., 2020, only a small number of mAb sequences were found in the lineages analyzed (mean = 3.6 per study). Further, lineages containing GC sequences were preferentially selected for mAb generation in Turner et al., 2020, which may artificially increase the likelihood that measurably evolving lineages contain mAbs. While intriguing and biologically plausible, conclusively determining whether measurable evolution predicts antigen binding is beyond the scope of this study. Another limitation is that maximum parsimony was used to estimate lineage tree topologies and branch lengths. While more sophisticated methods are available for inferring B cell lineage trees (e.g., Hoehn et al., 2019), maximum parsimony often has competitive performance for topology estimation (Davidsen and Matsen, 2018) and is faster than more complex maximum likelihood models designed for B cell lineages.
Computational efficiency was particularly important as our analyses required constructing more than 20,000 lineage trees spanning approximately 1,100,000 BCR sequences. Finally, our analysis of GC-associated B cell lineages was limited to data sampled from a single subject. Thus, while the results here demonstrate that influenza vaccination is capable of inducing measurable evolution, it remains unclear whether this is a general feature of influenza vaccination.

Our analyses of measurable evolution involved a series of hypothesis tests, and the definition for ‘enrichment’ of measurable evolution (>5% of lineages) was chosen based on the expected false positive rate under the null hypothesis. This enrichment measure was chosen to compare the relative frequency of measurably evolving lineages among datasets. A lack of enrichment does not indicate a complete lack of measurably evolving lineages. Conversely, slight enrichment of measurably evolving lineages (~5%) should not be interpreted as proof of ongoing affinity maturation in a set of lineages. The analysis in Figure 2A was not intended to test the null hypothesis that no lineages are measurably evolving in a particular dataset. Because multiple studies were surveyed (Table 1), it is possible our results contain false positives. Indeed, while some conditions such as HIV have a strong signal of measurable evolution across multiple studies, others such as naive hepatitis B vaccination and allergen-specific immunotherapy are just above the significance threshold (≤7.2% lineages with p < 0.05) and are each represented by a single study (Figure 2A, Table 1). These latter datasets should be interpreted cautiously and with the understanding that the vast majority of their lineages were not measurably evolving. To limit the influence of false positives, a multiple testing correction (false discovery rate < 0.1) was performed in analyses investigating the properties of measurably evolving lineages (Figure 2B, Table 2). We note, however, that repeating our analyses of initial germline divergence using all measurably evolving lineages (unadjusted p < 0.05) or a more strict p value cutoff (adjusted p < 0.05) yielded similar results (Figure 2—figure supplement 4), indicating our results are robust to the thresholds used. Finally, we validated the specificity of the date randomization test in empirical data (Figure 1—figure supplement 2). 
While a significant decrease in divergence over time is biologically unlikely, false positives due to multiple testing or sequencing error should produce a similar number of lineages with a significant correlation in either direction. We quantified correlation in either direction using a two-tailed version of the clustered, resolved date randomization test with a critical value of 0.025 (see Methods). Under a null hypothesis of no ongoing evolution, 2.5% of lineages were expected to have a significant negative correlation between divergence and time (all false positives). However, we found such ‘negatively evolving’ lineages at a mean frequency of only 1.2% (median = 1.3%, range = 0–2.8%) across all datasets when using clustered permutations and resolved polytomies (Figure 1—figure supplement 2). This indicates the date randomization test we used is conservative, and that our chosen thresholds are likely more strict than necessary.

Beyond multiple testing, there are some biological scenarios that could plausibly give rise to a signal of measurable evolution without additional SHM occurring during the sampling interval. For instance, it is possible that all SHM within a lineage occurred before the first sampled timepoint in a study, but that more mutated, higher-affinity BCRs were preferentially stimulated and sampled in later timepoints. Such a scenario would likely result in a positive correlation between divergence and sample time. However, our results are easier to explain if measurable B cell evolution results at least in part from ongoing SHM. For instance, our analysis of measurably evolving lineages following influenza vaccination showed (1) many later-sampled GC B cells had higher divergence than any sampled day 5 plasmablast sequence in the same lineage and (2) continued association between GC cells (rather than the blood) and measurable evolution even among influenza-binding lineages. Nonetheless, it is still theoretically possible that more mutated B cells were not sampled at early timepoints, and were preferentially expanded in GCs compared to the peripheral blood. Thus, while our interpretation is that measurable B cell evolution most likely represents an ongoing SHM process, we cannot conclusively rule out biased selection of more mutated sequences that were generated before the sampling interval.

Affinity maturation is a rapid evolutionary process. It is perhaps surprising that, while we identified conditions enriched for measurably evolving lineages, most lineages in circulation were not measurably evolving (Table 1). One explanation is that our analyses did not use sufficiently long sampling intervals to detect affinity maturation, though we believe this is unlikely. Studies in mice have estimated that SHM occurs at ~10⁻³ SHM/bp/division (Kleinstein et al., 2003; McKean et al., 1984), and that GC B cells cycle every 6–12 hr (Allen et al., 2007; Hauser et al., 2007; Victora and Nussenzweig, 2012). Simulations using conservative assumptions (strong selection, 24-hr cell cycle) and replicating the sample structure of our shortest-term influenza dataset (4 weeks), showed high power with >90% true positive rate (Figure 1H). Further, we found an enrichment of blood-derived measurably evolving lineages after only 4 weeks in one study (Galson et al., 2015a), and after 8 weeks corresponding to a known context of affinity maturation (GC entry, Figure 3). Overall, these results show that the sample times of our surveyed datasets should be sufficient to detect ongoing B cell evolution if it were occurring. However, it is possible that lineages may not remain in GCs continuously, which would slow the rate of evolution compared to our simulations. A more plausible explanation for the lack of measurable evolution is that most lineages in the blood are either nonspecific to the condition being studied, or derive from a GC-independent response (Mesin et al., 2020; Takemori et al., 2014; Taylor et al., 2012). It is also possible that lineages relevant to the condition being studied are inefficiently stimulated.
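
To make the scale of these rates concrete, the following Python sketch works through the arithmetic. It is a back-of-the-envelope illustration only: the function name is hypothetical, and it assumes a cell line that divides continuously in a GC at a fixed cycle time with no selection, which is a simplification of the simulations described in the Methods.

```python
def expected_shm_per_site(days, hours_per_division, rate_per_site_per_division=1e-3):
    """Expected SHM per site along one line of descent.

    Assumes continuous division at a fixed cycle time and a constant
    per-division mutation rate (both simplifying assumptions).
    """
    divisions = days * 24 / hours_per_division
    return divisions * rate_per_site_per_division


# Four weeks with a conservative 24-hr cycle: 28 divisions, about 0.028 SHM/site.
slow = expected_shm_per_site(28, 24)
# Four weeks with a 6-hr cycle: 112 divisions, about 0.112 SHM/site.
fast = expected_shm_per_site(28, 6)
```

Even under the slow, 24-hr assumption, this corresponds to roughly 3 mutations per 100 sites over 4 weeks, which gives some intuition for why a 4-week interval could be sufficient for detection.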

We find that seasonal influenza virus vaccination in young adults induces a GC reaction where maturation of vaccine-specific B cell lineages occurs, including those likely recruited from the pre-existing memory B cell compartment. These results imply that poor efficacy of seasonal influenza vaccination does not result from a complete lack of vaccine-induced B cell evolution. While we showed that B cells in these evolving lineages increased in amino acid replacement mutation frequency, it remains possible that this evolution is less able to select affinity-increasing mutations (Hoehn et al., 2019), that the overall number of evolving lineages is reduced, or that the products of this vaccine-induced evolution are not efficiently translated into memory and long-lived plasma cells. These latter two explanations are consistent with the results of our survey of longitudinally sampled peripheral blood datasets, which found an enrichment of measurably evolving lineages in some primary immune response conditions, but not influenza vaccination. Future studies will be needed to fully test these hypotheses about the causes of poor efficacy of seasonal influenza vaccination.

Materials and methods

Study design

The goal of this study was to determine whether B cell lineages found in GCs following influenza vaccination evolved over a given sample interval. This necessitated describing and validating a test for measurable evolution from longitudinally sampled BCR sequencing data. Simulation-based power analyses determined that this date randomization test has sufficient sensitivity to detect evolving B cell populations over a sampling interval of approximately 2 weeks. To determine whether the date randomization test also worked on known examples of affinity maturation, all longitudinally sampled datasets hosted on OAS (as of 6/2020) were downloaded and tested. To cover as wide a variety of conditions as possible, these datasets were supplemented with processed, publicly available datasets from other prior studies. To ensure datasets were appropriately powered, datasets were only included if they contained at least 10 B cell lineages with at least 15 sequences sampled over 3 weeks and a minimum possible date randomization test p value <0.05. BCR data from blood and fine-needle aspirations following influenza vaccination were obtained from Turner et al., 2020.

BCR sequence datasets and preprocessing

All longitudinally sampled BCR repertoire datasets were publicly available and obtained both from primary publications and through the OAS database (antibodymap.org, accessed 6/2020; Kovaltsuk et al., 2018). Both assembled nucleotide sequences and deduplicated amino acid sequences were obtained from OAS. To reduce the effect of sequencing error in OAS datasets, only nucleotide sequences corresponding to an amino acid sequence with a multiplicity of at least two were included. Datasets obtained from OAS are labeled in Table 1. Raw sequence data obtained from Nielsen et al., 2019 were preprocessed with pRESTO v0.5.13 (Vander Heiden et al., 2014). Quality control was performed by first removing all sequences with a Phred quality score <20, length <300 bp, or any missing (‘N’) nucleotides. The 3′ and 5′ ends of each read were matched to forward and constant region primers with a maximum error rate of 0.1. The region adjacent to the constant region primer was exactly matched to subisotype-specific internal constant region sequences. Only sequences with the same isotype predicted by their constant region primer and internal constant region sequence were retained. Identical reads within the same isotype were collapsed and sequences observed only once were discarded. All other datasets used processed BCR sequence data provided by the authors of their respective publications. Data from Wang et al., 2014 were processed in Hoehn et al., 2019. Data from Jiang et al., 2020b used only blood samples.
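
The quality control and collapsing steps described above were performed with pRESTO. The Python sketch below is not pRESTO code. It is a minimal illustration of the same filtering logic, with hypothetical function names, and it assumes that the Phred threshold is applied to the mean quality of each read, which the text does not state explicitly.

```python
from collections import Counter


def passes_qc(sequence, phred_scores, min_quality=20, min_length=300):
    """Keep reads that are long enough, contain no 'N', and have adequate quality.

    Assumption: the quality threshold applies to the mean Phred score of the read.
    """
    if len(sequence) < min_length or "N" in sequence.upper():
        return False
    mean_quality = sum(phred_scores) / len(phred_scores)
    return mean_quality >= min_quality


def collapse_reads(reads, min_count=2):
    """Collapse identical reads within the same isotype and drop singletons.

    `reads` is an iterable of (isotype, sequence) pairs. The result maps each
    retained pair to the number of times it was observed.
    """
    counts = Counter(reads)
    return {key: n for key, n in counts.items() if n >= min_count}
```

For example, two identical IGHG reads would be collapsed into one entry with a count of two, while an identical sequence observed once with a different isotype would be discarded.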

BCR sequence processing, genotyping, and clonal clustering

Datasets were processed using the Immcantation framework (immcantation.org). V(D)J gene assignment on data obtained from Nielsen et al., 2019 was performed using IgBLAST v1.13 (Ye et al., 2013) against the IMGT human germline reference database (Giudicelli et al., 2005) (IMGT/GENE-DB v3.1.24; retrieved August 3, 2019). V(D)J gene assignments and clonal cluster assignments were already available in all other non-OAS datasets and were retained. Nonproductively rearranged sequences were excluded. Using Change-O v1.0.0 (Gupta et al., 2015), the V and J genes of unmutated germline ancestors for each sequence were constructed with D segment and N/P regions masked by ‘N’ nucleotides. Sequence chimeras were filtered by removing any sequence with more than six mutations in any 10 nucleotide window. Individual immunoglobulin genotypes were computationally inferred using TIgGER v1.0.0 and used to finalize V(D)J annotations (Gadala-Maria et al., 2015). To infer clonal clusters, sequences were first partitioned based on common V and J gene annotations, and junction region length. Within these groups, sequences differing from one another by a specified Hamming distance threshold within the junction region were clustered into clones using single linkage hierarchical clustering (Gupta et al., 2017). The Hamming distance threshold was determined by finding the local minimum of a bimodal distance to nearest sequence neighbor plot using SHazaM v1.0.2.999 (Yaari et al., 2013). In cases where automated threshold detection failed, usually because the distance to nearest neighbor distribution was not bimodal, the threshold was set to 0.1 and verified by manual inspection to ensure that a threshold of 0.1 was near a local minimum. Finally, the V and J genes of unmutated germline ancestors for each clone were constructed. Within these unmutated ancestral sequences, D segments and N/P regions were masked using ambiguous ‘N’ nucleotides.
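
The clonal clustering step can be sketched as follows. This is an illustrative Python version, not the Change-O implementation used in the study. The function names are hypothetical, and two details are assumptions: the Hamming distance is normalized by junction length (suggested by the threshold of 0.1), and sequences at exactly the threshold are linked.

```python
from collections import defaultdict


def normalized_hamming(a, b):
    """Fraction of mismatched positions between two equal-length, nonempty junctions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_clones(records, threshold=0.1):
    """Single-linkage clustering of (v_gene, j_gene, junction) records.

    Records are first partitioned by V gene, J gene, and junction length.
    Within a partition, any pair at or below the threshold is linked, and
    linked sequences are merged transitively. Returns one clone id per record.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    groups = defaultdict(list)
    for i, (v_gene, j_gene, junction) in enumerate(records):
        groups[(v_gene, j_gene, len(junction))].append(i)

    for members in groups.values():
        for pos, i in enumerate(members):
            for k in members[pos + 1:]:
                if normalized_hamming(records[i][2], records[k][2]) <= threshold:
                    parent[find(i)] = find(k)

    clone_ids = {}
    return [clone_ids.setdefault(find(i), len(clone_ids)) for i in range(len(records))]
```

Because linkage is transitive, two junctions that differ by more than the threshold can still fall in the same clone if an intermediate sequence links them, which is the defining property of single linkage clustering.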

Testing for measurable evolution

Testing for measurable evolution begins with building B cell lineage trees. Within each B cell clone, identical sequences or those differing only by ambiguous nucleotides were collapsed unless they were sampled at different timepoints. To reduce computational complexity, lineages were randomly down-sampled to at most 500 sequences each. B cell lineage tree topologies and branch lengths were estimated by maximum parsimony using the pratchet function of the R package phangorn v2.5.5 (Schliep, 2011). R packages dowser v0.0.3 (Hoehn et al., 2020), alakazam v1.0.2.999 (Gupta et al., 2015), and ape v5.4-1 (Paradis et al., 2004) were used for phylogenetic analysis. Trees were visualized using ggtree v2.4.2 (Yu et al., 2016), and other figures were generated using ggplot2 v3.3.5 (Wickham, 2016) and ggpubr v0.4.0 (Kassambara, 2020). R v3.6.1 (R Development Core Team, 2017) was used for analysis of measurable evolution except for data from Davis et al., 2019. Due to technical upgrades, figure generation and selection analysis were performed using R v4.0.3, as well as ape v5.5, phangorn v2.7.1, shazam v1.1.0, alakazam v1.1.0, and dowser v0.1.0. Data from Davis et al., 2019 were also analyzed using these updated packages.

To test for measurable evolution over time, we used a modified version of the previously described phylogenetic date randomization test (Duchêne et al., 2015; Murray et al., 2016) implemented in dowser v0.0.3 (Hoehn et al., 2020). Briefly, for a given tree the divergence of each tip was calculated as the sum of branch lengths leading to the tree’s most recent common ancestor (MRCA). Only branches directly between a tip and the tree’s most recent common ancestor were used to calculate divergence. We next calculated the Pearson’s correlation between the divergence and sampling time of each tip. Measurably evolving lineages should show a positive correlation between divergence and time (Figure 1A). Divergence from the lineage’s predicted unmutated ancestral sequence rather than the MRCA could also be used. Because all sequences relate to the unmutated ancestral sequence through the MRCA node, this would add a constant additional divergence to all sequences, resulting in the same correlation as when the MRCA is used. We next identified monophyletic clades containing only sequences from a single timepoint (here referred to as ‘clusters’). We then randomly permuted sampling times among clusters, such that all sequences within each cluster had the same, randomly chosen timepoint. We next measured the correlation between divergence and time in this randomized tree, and repeated the process 100,000 times. We then estimated the p value that the observed correlation between divergence and time was no greater than expected from random distribution of times among clusters. This p value was calculated as the proportion of permutation replicates that had an equal or higher correlation than in the observed tree. We used a pseudocount of one for this calculation. The minimum possible p value for a lineage was calculated as one divided by the number of possible distinct cluster permutations.
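
The permutation step of this test can be illustrated with the Python sketch below. It is not the dowser implementation, and the function names are hypothetical. It assumes that tip divergences and cluster assignments have already been derived from a tree, and that the pseudocount of one is added to both the numerator and the denominator of the p value. The default number of permutations is reduced from the 100,000 used in the study to keep the example fast.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, returning 0.0 if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def clustered_date_randomization(divergence, times, clusters, n_perm=1000, seed=0):
    """One-tailed clustered date randomization test.

    divergence, times, clusters: one entry per tip. All tips in a cluster
    must share a timepoint. Returns (observed correlation, p value).
    """
    rng = random.Random(seed)
    observed = pearson(divergence, times)

    cluster_time = {}
    for cluster, time in zip(clusters, times):
        if cluster_time.setdefault(cluster, time) != time:
            raise ValueError("each cluster must contain a single timepoint")
    cluster_ids = sorted(cluster_time)
    cluster_times = [cluster_time[c] for c in cluster_ids]

    at_least_as_high = 0
    for _ in range(n_perm):
        shuffled = cluster_times[:]
        rng.shuffle(shuffled)  # permute timepoints among clusters, not among tips
        new_time = dict(zip(cluster_ids, shuffled))
        permuted = [new_time[c] for c in clusters]
        if pearson(divergence, permuted) >= observed:
            at_least_as_high += 1

    # Assumed form of the pseudocount: one added to numerator and denominator.
    p_value = (at_least_as_high + 1) / (n_perm + 1)
    return observed, p_value
```

Permuting at the level of clusters means that tips within a single-timepoint clade always move together, so a lineage with few clusters has few distinct permutations and a correspondingly high minimum p value.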

We modified the date randomization test to account for the high degree of topological uncertainty of many B cell lineage trees. More specifically, B cell lineage trees often contain large clusters of zero-length branches (soft polytomies) that represent high uncertainty in branching order (e.g., Figure 1—figure supplement 1). In bulk BCR data, these polytomies may be due to PCR error or sequencing error. If polytomies are resolved randomly into bifurcations, this can produce more single-timepoint monophyletic clades than necessary and lead to a high false positive rate of the date randomization test (Figure 1—figure supplements 1 and 2). To ensure this source of uncertainty did not increase the false positive rate of our analyses, we resolved bifurcations within each polytomy such that sequences from the same timepoint were grouped into the fewest possible number of single-timepoint monophyletic clades before performing permutations. While we do not have direct evidence that polytomies in B cell lineage trees are produced from PCR error, the fact that resolving them reduces the rate of lineages with a significant negative correlation between divergence and time (a biologically implausible result, Figure 1—figure supplement 2) suggests they are at least in part due to technical artifacts.

The clustered date randomization approach is more conservative than tests that permute tips uniformly (e.g., Unterman et al., 2020), but has been shown to be less biased if different subpopulations are sampled at each timepoint (Murray et al., 2016). To explore the effect of this modeling choice, we repeated the analyses in Table 1 using two-tailed clustered and uniform date randomization tests (Figure 1—figure supplement 2). Two-tailed tests can identify lineages with a significant positive or negative correlation between divergence and time. This is useful because a significant negative correlation between divergence and time is biologically implausible and represents a likely false positive result. Due to multiple testing under an alpha value of 0.025, we expect no more than 2.5% of lineages to have a significant negative correlation from these two-tailed tests. We found the uniform permutation test had a high rate of negatively evolving lineages (mean = 8.3%), indicating a high false positive rate. By contrast, the clustered permutation test without resolved polytomies had a mean rate of only 2.2% negatively evolving lineages, approximately as expected given an alpha value of 0.025. Resolving polytomies and then performing the clustered permutation test improved performance even more, with a mean rate of 1.2% negatively evolving lineages and no dataset having more than 2.8% of lineages negatively evolving. This analysis shows the uniform date randomization test is prone to false positives in empirical B cell data, while the clustered date randomization test with resolved polytomies corrects this issue. All other tests performed in this study used a one-tailed, clustered date randomization test with resolved polytomies and an alpha value of 0.05.

To identify and characterize measurably evolving lineages while adjusting for multiple testing, all lineages tested were pooled together and p values were adjusted using the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995) implemented in the function p.adjust (R Development Core Team, 2017). Lineages with adjusted p values <0.1 were referred to as adjusted measurably evolving lineages (Figure 2B). To determine whether lineages were measurably increasing in amino acid divergence (Figure 3A), we repeated the clustered date randomization test for each tree. However, instead of calculating divergence as the sum of phylogenetic branch lengths leading from each tip to the most recent common ancestor of the lineage, we calculated divergence as the number of nonambiguous amino acid differences between each tip and the lineage’s clonal germline. The clustered permutation test then proceeded as before, using the same cluster assignments as in the nucleotide-based test. This tested whether sequences at later timepoints had more amino acid substitutions compared to the germline than sequences at earlier timepoints.
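
For readers unfamiliar with the Benjamini–Hochberg adjustment, the Python sketch below shows the calculation. The study used the R function p.adjust. This version is an illustrative reimplementation with a hypothetical function name, intended to follow the same step-up procedure.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    # Visit p values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank 1 is the smallest p value
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Lineages with adjusted p < 0.1 would be labeled as adjusted measurably evolving.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.1 for p in adjusted]
```

Because all lineages were pooled before adjustment, the adjusted value for a given lineage depends on how many lineages were tested in total and on the distribution of their p values.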

It is possible that the results reported are affected by the size (number of sequences) of lineages in each dataset. A large number of lineages without adequate power could result in a spurious lack of measurable evolution. To ensure the lineages included in each study were adequately powered, we included only lineages with at least 15 sequences, that were sampled over at least 3 weeks, and had a minimum possible p value <0.05 based on the number of distinct permutations of timepoints among clusters. If measurable evolution were still strongly confounded by lineage size even after these filtering steps, we would expect measurably evolving lineages to be larger on average than nonmeasurably evolving lineages. By contrast, measurably evolving lineages were significantly larger than nonmeasurably evolving lineages in only 5/21 datasets surveyed (Figure 3—figure supplement 2), indicating our results are not strongly confounded by lineage size.
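
The power filter described above can be sketched in Python as follows. The function names are hypothetical. The sketch also assumes that the number of distinct permutations of timepoints among clusters is the multinomial coefficient over the number of clusters at each timepoint, which is one reading of the description rather than a confirmed detail of the implementation.

```python
from collections import Counter
from math import factorial


def distinct_cluster_permutations(cluster_times):
    """Number of distinct assignments of the observed timepoints to clusters.

    Assumption: permutations that swap clusters sharing a timepoint are not
    distinct, so the count is n! divided by the product of count_t! over timepoints.
    """
    total = factorial(len(cluster_times))
    for count in Counter(cluster_times).values():
        total //= factorial(count)
    return total


def is_powered(n_sequences, cluster_times, min_sequences=15, min_days=21, alpha=0.05):
    """Apply the three inclusion criteria to one lineage (times in days)."""
    span = max(cluster_times) - min(cluster_times)
    min_p = 1 / distinct_cluster_permutations(cluster_times)
    return n_sequences >= min_sequences and span >= min_days and min_p < alpha
```

Under this reading, a lineage with six clusters split evenly across three timepoints has 90 distinct permutations and a minimum p value of about 0.011, so it passes the filter, whereas a lineage with only three clusters has at most six distinct permutations and cannot reach p < 0.05.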

Inclusion of experimentally validated mAbs

To identify B cell lineages that likely bind to the antigen under study, we included experimentally validated monoclonal antibody (mAb) heavy chain sequences provided from multiple studies. This included multiple anti-HIV mAbs: 11 from Liao et al., 2013, 12 from Doria-Rose et al., 2014, 7 from Johnson et al., 2018, 42 from Landais et al., 2017, 31 from Wu et al., 2015, and 4 from Huang et al., 2016. Doria-Rose et al., 2014 and Wu et al., 2015 also provided 680 and 1033 bulk BCR sequences, respectively, identified as clonally related to the provided anti-HIV broadly neutralizing mAbs. These sequences were also included in processing and clonal clustering but were not labeled as experimentally validated mAbs. Davis et al., 2019 provided 885 mAb heavy chain sequences, some of which were tested for binding against Ebola virus proteins. All of these sequences were included in processing and clonal clustering, but only 368 validated by ELISA to bind to Ebola virus were labeled as EBV-binding mAbs. All of the above sequences were processed in the same manner as bulk sequences from OAS, except they were not filtered as potential PCR chimeras. Clonal lineages containing experimentally validated mAbs were labeled as antigen-binding; however, because sample timepoints were not always apparent, mAb sequences themselves were removed before lineage tree inference for the abovementioned studies. Processed data from Turner et al., 2020 also included 196 anti-influenza mAbs. These sequences were retained during tree inference because they were explicitly labeled by timepoint and usually cloned from previously identified sequences within the data. Of all mAbs included, only the 58 clonally clustered within powered lineages (at least 15 sequences sampled over 3 weeks, and minimum p value <0.05) were included in tests of mAb enrichment (Figure 2—figure supplement 3).

Simulation-based power analysis

We used simulations to determine whether the clustered date randomization test was sufficiently powered to detect ongoing B cell evolution. These analyses used the bcr-phylo package accessed 9/21/2020 (Davidsen and Matsen, 2018; Ralph and Matsen, 2020), which simulates clonal lineages of B cells undergoing affinity maturation against a target sequence. For all simulations, a random naive heavy chain sequence was chosen from those provided in bcr-phylo and the rate of SHM was set to the default of λ = 0.356, which corresponds to an SHM rate of ~0.001 SHM/site/division (Teng and Papavasiliou, 2007). Mutations were introduced according to the S5F model (Yaari et al., 2013). Selection strength was chosen to be either 0 (neutral) or 1 (entirely affinity driven). A single target sequence was chosen for affinity maturation. All other parameters were set to their default.

We performed two sets of simulations. In the first, we simulated single B cell lineages from which 50 cells were sampled at generation 10, and 50 more cells were sampled after a specified number of additional generations (Figure 1G, Figure 1—figure supplement 3). In the second type of simulation, we replicated the sampling strategy of Laserson et al., 2014. Briefly, for each clone in subject hu420143 from Laserson et al., 2014, we simulated one lineage with the same number of cells (if enough cells had been generated) sampled after the number of generations corresponding to 1, 3, 7, 14, 21, and 28 days (Figure 1—figure supplement 4). The number of generations corresponding to each sample day was calculated using a strict generation time of either 12 or 24 hr, which are conservative given previous GC cycle estimates of 6–12 hr (Allen et al., 2007; Hauser et al., 2007; Victora and Nussenzweig, 2012). These simulations used a selection strength of 1, which gave more conservative results in previous simulations (Figure 1—figure supplement 3).
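The conversion from sample day to number of generations under a strict generation time is simple arithmetic. The sketch below illustrates it for the six sample days listed above; the function name is hypothetical, and it assumes a generation time that divides evenly into 24 hr.

```python
SAMPLE_DAYS = [1, 3, 7, 14, 21, 28]


def generations_at(days, generation_time_hr):
    """Number of generations elapsed by each sample day.

    Assumes a strict (constant) generation time, given in hours,
    that divides evenly into the elapsed time.
    """
    return [day * 24 // generation_time_hr for day in days]


print(generations_at(SAMPLE_DAYS, 12))  # 12 hr generation time
print(generations_at(SAMPLE_DAYS, 24))  # 24 hr generation time
```

With a 24 hr generation time the generation count equals the sample day, and with 12 hr it is twice the sample day.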

To account for possible issues with clonal clustering, we did not preserve clonal identities among simulated sequences in either simulation type. Instead, we pooled sequences from all simulation repetitions under a particular parameter set and used the same clonal clustering method used for empirical data analyses to group them into clonal clusters. We did not repeat the genotyping or chimera filtering steps done on empirical data analyses as genotyped individuals and sequence chimeras were not part of the simulations. We performed the clustered date randomization test with resolved polytomies on each lineage with a minimum possible p value <0.05. Because all sequences were simulated under affinity maturation, the proportion of lineages with p < 0.05 indicated the true positive rate of the test. To determine the false positive rate, we randomized sample times among tips within each tree and repeated the date randomization test (Figure 1—figure supplements 3 and 4). Here, the proportion with p < 0.05 indicated the false positive rate.
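The logic of a date randomization test can be sketched as a permutation test on the correlation between divergence and sample time. The version below is deliberately simplified and is not the clustered test used in this study: it permutes times among individual tips rather than among clusters, it takes divergence values as given rather than computing them from a tree, and the one-sided p value formula with a +1 correction is an assumption. All names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def date_randomization_p(divergence, times, n_perm=1000, seed=0):
    """One-sided permutation p value for a positive divergence/time correlation.

    Simplified: sample times are shuffled among tips, not among clusters.
    """
    rng = random.Random(seed)
    observed = pearson(divergence, times)
    shuffled = list(times)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(divergence, shuffled) >= observed:
            hits += 1
    # The +1 correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Applied to many simulated lineages, the fraction with p < 0.05 estimates the true positive rate; applied to the same lineages after shuffling their sample times, it estimates the false positive rate.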

Analysis of selection

To understand the force of selection operating on B cell lineages, we first separated all adjusted measurably evolving lineages into their respective subjects within each study. We then excluded all subjects with only one measurably evolving lineage. While all sequences included were labeled as productive by IgBlast, three contained premature stop codons in their IMGT-aligned sequences, likely due to insertions that were removed during alignment. These sequences were removed. For computational efficiency, all lineages were down-sampled to a maximum size of 100 sequences. Due to uncertainty in germline D-region assignment, only V-gene (IMGT positions 1–312) nucleotides were included for analyses of selection, similar to Hoehn et al., 2017. We then estimated lineage tree topologies, branch lengths, and subject-wide substitution model parameters under the GY94 model (Hoehn et al., 2019; Nielsen and Yang, 1998). Using fixed tree topologies estimated from the GY94 model, we then estimated branch lengths, subject-wide ω values for CDR and FWR partitions (ωCDR and ωFWR), and all six canonical SHM hot- and cold-spot motif parameters under the HLP19 model in IgPhyML v1.1.3 (Hoehn et al., 2019) for all adjusted measurably evolving lineages. Significance of ω estimates was determined using two phylogenetic likelihood ratio tests, similar to Hoehn et al., 2017. To determine the significance of ωCDR estimates, we compared the maximum log-likelihood obtained when both ωCDR and ωFWR were estimated by maximum likelihood (L) to that obtained when ωFWR was estimated by maximum likelihood but ωCDR was fixed at 1 (LCDR=1). The likelihood ratio statistic (LRS) for this test was calculated as 2×(L – LCDR=1). Because these models differ by one freely estimated parameter, the LRS will be approximately chi-squared distributed with one degree of freedom under the null hypothesis that ωCDR = 1, which allows for p value calculation (Huelsenbeck and Rannala, 1997). 
To determine the significance of ωFWR estimates, the process was the same except that LRS = 2×(L – LFWR=1), where LFWR=1 is the maximum log-likelihood obtained when ωCDR was estimated by maximum likelihood but ωFWR was fixed at 1. All of the above statistics are reported in Table 2.
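The p value calculation for these likelihood ratio tests can be sketched as follows. This is a minimal illustration that assumes the two maximum log-likelihoods have already been obtained (for example, from the model fits described above). The function name is hypothetical, and the closed form used for the chi-squared survival function holds only for one degree of freedom.

```python
from math import erfc, sqrt


def lrt_p_value(loglik_full, loglik_null):
    """p value for a likelihood ratio test with one degree of freedom.

    loglik_full: maximum log-likelihood with both omegas estimated (L).
    loglik_null: maximum log-likelihood with one omega fixed at 1.
    """
    lrs = 2 * (loglik_full - loglik_null)
    # Guard against tiny negative values caused by optimizer noise.
    lrs = max(lrs, 0.0)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return erfc(sqrt(lrs / 2))
```

A log-likelihood difference of about 1.92 units (LRS of about 3.84) corresponds to p = 0.05 under this approximation.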

Data and material availability

All data are publicly available from prior publications. Scripts to reproduce all analyses are available at https://bitbucket.org/kleinstein/projects.git (Kleinstein Lab, 2021; copy archived at swh:1:rev:1ca83cda5d1baac880c71c314b0adc359314f6fa).

Data availability

The manuscript is a computational study. All data used are publicly available. Source code is available at https://bitbucket.org/kleinstein/projects (copy archived at https://archive.softwareheritage.org/swh:1:rev:1ca83cda5d1baac880c71c314b0adc359314f6fa). All of the OAS datasets are available at http://opig.stats.ox.ac.uk/webapps/oas.

References

  1. R Development Core Team (2017) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. (Software)
  2. Wickham H (2016) ggplot2: Elegant Graphics for Data Analysis. Cham: Springer-Verlag. https://doi.org/10.1007/978-3-319-24277-4

Decision letter

  1. Armita Nourmohammad
    Reviewing Editor; University of Washington, United States
  2. Aleksandra M Walczak
    Senior Editor; École Normale Supérieure, France

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Human B cell lineages associated with germinal centers following influenza vaccination are measurably evolving" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Aleksandra Walczak as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. The authors should try to either show (with simulations) that the data rejects the following scenario or they should include it as an equally likely possibility:

"Suppose that B cell clones expanded and diversified through somatic hypermutation prior to the study period (that is, prior to the secondary vaccination event which is the focus of the study). It seems that preferential expansion of highly mutated subclones during the study period could bias detected sequences towards more divergent sequences, even without ongoing somatic mutation during the study period. Preferential expansion of divergent sequences would give rise to higher average divergence as the study period goes on, giving the appearance of accumulation of additional mutations, but in fact these mutations had occurred prior to the study period and are simply more readily detected in the sparsely sampled repertoire sequencing data after their expansion. Far from being simply a pathological counter-example, this scenario seems biologically plausible, given that B cells harboring more divergent, affinity-matured sequences should generally have higher affinity antibodies that allow them to better compete for limited antigen and thus provide stronger division stimulus. This model predicts that some highly divergent sequences exist at early timepoints and would occasionally be detected."

2. Do you see any evidence for positive (or negative) selection during somatic evolution? Do you see any significant statistical difference between non-synonymous and synonymous SHMs?

3. Please expand the discussion on multiple testing adjustments in the section "Measurably evolving lineages following influenza vaccination show signs of memory B cell origin". Notably, the reported percentages of measurably evolving lineages in several scenarios (7.2% for primary hepatitis B vaccination; 6.5% for allergen-specific immunotherapy; 5.9% for HIV infection) are near the false positive rate of the test (5% of lineages measurably evolving). The authors have performed this test on datasets from ~21 studies, raising a concern that multiple hypothesis testing could give rise to false positives in some of the datasets. These results are interpreted as evidence of measurable evolution, even though they could seemingly be explained by the false discovery rate combined with multiple hypothesis testing. The authors should clarify how these results can be interpreted in light of the false positive rate of their test and multiple hypothesis testing, and must consider whether more conservative conclusions are warranted in these scenarios.

4. Did non-measurably evolving lineages also contain monoclonal antibodies that bound to vaccine antigens? Is there enrichment of vaccine-binding monoclonal antibodies within measurably evolving lineages?

5. Please modify the abstract in how the following two scenarios are presented: "some lineages enter GCs and thus likely undergo SHM", and "the average SHM over all lineages doesn't increase more than some threshold". These two scenarios are not contradictory and could both be true.

6. Figure 3A is very visually striking, despite the small sample sizes. Could you explain why this figure shows a much stronger correlation than Figure S7?

7. p23 l22: why is the divergence from mrca (rather than naive ancestor) the one that we want for these tests?

8. p24 l15: do you have direct evidence for how much of the polytomy prevalence is from PCR/sequencing error?

9. The distribution of p-values should be plotted for all datasets as in Figure S6. It would be instructive to look at them and compare the full distribution given the two choices of significance threshold at 1% and 5%.

Reviewer #1 (Recommendations for the authors):

I have a number of questions about how or why different steps were undertaken, but none of them seem likely to significantly affect the basic conclusions.

– Abstract: I don't think the two findings are contradictory. To my understanding, the first says "some lineages enter GCs and thus likely undergo SHM", whereas the second says "the average SHM over all lineages doesn't increase more than some threshold". Since I think the first doesn't measure what "some" is, and since flu is usually given to non-naive individuals whose responses vary greatly depending on exposure history, and since the threshold could be too small to detect some SHM that occurs, both of these results seem compatible with what I would imagine is most researchers' prior: some lineages undergo SHM in some circumstances. The (in my view quite large) contribution of the current paper is in illuminating what both of the "some"s in the previous sentence mean. I think setting it up as a conflict between two prior results that (unless I'm misunderstanding) aren't actually in conflict just confuses the reader. As it says at p11 l21: "consistent with a primarily GC-independent memory B cell response and/or rarity of antigen-specific lineages in the peripheral blood". I prefer the framing in the first sentence of Discussion: "The extent to which seasonal influenza vaccination stimulates affinity maturation…"

– p2 l10: saying that you "demonstrate measurable evolution" in some cases seems like you care only about false negatives, but not false positives. I prefer the way this is framed at end of intro, as a "survey" with "significant heterogeneity" that conforms to expectations in both directions.

– p3 l10: "at a rate orders of magnitude".

– p3 l14: "and, rarely, re-enter".

– p3 l21: same comment as abstract: not convinced they're in conflict.

– p6 l10: "and a second time after the".

– Figure 2A:

– two categories of HIV here (empty/first 60 weeks) should match those in B (early/late).

– why is hep B not broken into naive/boost here? I assume T=0 is prime, T~1.4 is boost?

– suggest "healthy children" rather than "healthy" so reader can guess whether they expect enrichment for ME.

– Figure 2B: having ~half the y axis devoted to p values and "late hiv" makes it hard to compare everything else. I think main message is e.g. flu is like hep boost, but not hep naive, which i can only really tell by parsing the tiny p values at the top.

– p11 l21: my understanding is that equating "GC-independent" with "no SHM" isn't correct, e.g. this https://pubmed.ncbi.nlm.nih.gov/33326765/ takes as settled that some shm takes place outside the GC. Also, I could be wrong but it would make more sense to me to say something like "only a small fraction of existing flu lineages are restimulated" (which as you say earlier is relatively rare) as the first alternative.

– p14 l1 I would think that "occur at low frequency in the blood" might be better than "are not enriched in the blood", since the latter (to me) sounds like the bottleneck is only on exiting GCs, rather than the other (previous) steps.

– Figure 3A is very striking/convincing (although I guess given the small sample sizes almost worrisome it's so straight?). But could you explain why just by eye it seems so much more striking (stronger correlation) than Figure S7? I realize "min GC %" and "proportion GC B cells" are different, as are "-log P" and "% ME", and one is a scatter plot with low transparency and lots of dots are on top of each other, but 3A looks like almost a perfect relationship, whereas S7 it's hard to even see a linear relationship.

– p17 l19: this paragraph is great, it's really convincing to me.

– p19 l21: couldn't there also be a lot of lineages that are condition-specific and GC-derived, but not re-stimulated by the current stimulus? I don't know a number for the frequency with which re-stimulation causes an antigen-specific lineage to re-enter GCs, but I wouldn't expect it to be very close to 1.

– p21 l1: "to cover as wide a variety of conditions as possible".

– p21 l15: does "redundancy" mean number of observed sequences? Or does it have something to do with the number of nucleotide changes you could make without affecting the AA seq? (I presume the former, i just haven't heard it used in this way). Could use "multiplicity" or "observations" if they would be equivalent.

– p23 l2: what does "manual inspection" consist of? i.e. how do you know by eye that 0.1 is correct for non-bimodal distributions?

– p23 l3: What does "masking" consist of? Does this mean that you're not inferring the D/insert portions of the naive ancestral sequence?

– p23 l22: why is the divergence from mrca (rather than naive ancestor) the one that we want for these tests? Maybe the trunk bit would maybe just cancel out? But then at p26 l6 it looks like for the AA version you do compare to naive/germline seq? From intro to Duchene 2015, it seems they used root (not mrca)?

– p24 l15: do you have direct evidence for how much of the polytomy prevalence is from PCR/sequencing error? For instance do you get fewer polytomies in data with barcodes/UMIs?

– Figure S3: why is it so much easier to detect measurable evolution when we're looking at neutral evolution, i.e. what causes the long downward tails of points in the top right plot (selection strength 1) vs the left plot (neutral)?

– Figure S5: why does it look like there's only a lower bound/quantile (no upper box) for red (Boost, Standard)?

– I would find it very interesting if you could expand on the alternative explanations in the last paragraph of the Discussion. Partly because "does not result from a complete lack of vaccine-induced B cell evolution" seems like a very low bar/unlikely null hypothesis (i don't think many people thought there was zero).

– It might be worth discussing why you don't (I think) attempt to measure selection (it's fine that you didn't). You do an amino acid-based analysis, which is related to this (but doesn't discuss selection strength), and do simulations with both neutral and strong selection, but I'm curious why you focused only on detecting SHM/evolution, and not on whether it was neutral or not.

– It would also be nice to discuss why using parsimony (very heuristic, not very accurate) was preferred over more sophisticated methods.

Reviewer #2 (Recommendations for the authors):

Congratulations on the paper! I enjoyed reading the preprint, and I only have a few comments and suggestions that I list below.

1. For more clarity, the distribution of p-values should be plotted for all datasets as in Figure S6. It would be instructive to look at them and compare the full distribution given the two choices of significance threshold at 1% and 5%. To this end, I think it would make sense to plot the cumulative function.

2. I think the section "Measurably evolving lineages following influenza vaccination show signs of memory B cell origin" could use a more extensive explanation of the multiple testing adjustment. The p-values distributions would also be important here to distinguish the standard randomization test p-values with the BH adjusted p-values. Detecting lineages using the second definition of the p-value should also be tested with synthetic datasets.

3. The initial germline divergence is quantified using the sum of branch lengths for each lineage. I suppose this depends strongly on the lineage size (and that one on the experimental protocol). Is there a way to control for this? (For instance, would it make sense to look at these distributions for subsampled lineages of equal size?)

4. In the discussion you refer to the rates of somatic hypermutation and the length of the GC cycles as given by the literature you cite. For completeness (perhaps as a supplementary figure), could you report the values of the slope fitted in the SHM number vs sample time plot for measurably evolving lineages (as in Figure 1B)? I would be curious to see how these numbers compare with independent estimations from the data and whether their distribution changes significantly between cohorts you've studied.

5. Re discussion on page 11: Even if memory B cells do not re-enter GC, one could imagine detecting the ongoing evolution of naive cells – this possibility should be discussed. Later the results suggest the evolving lineages come mainly from memory cells (page 12) but a priori both scenarios could be true.

6. Figure 1G misses the y axis label and the x axis label is somewhat confusing without reference to the main text. The fractions in boxes should be written with the "%" sign (also in other figures).

7. In Figure 2A the point corresponding to the early-childhood dataset should be distinguishable from other healthy data (I guess it's the "significant" green point).

The caption of Figure 2B should use the term "initial germline divergence" again, as in the y axis label to avoid confusion.

8. Page 5 line 3: before using "SHM/site" first, it would be better to say what it means in words.

Page 5 line 4: In evolving lineages, sequences sampled at later time points are (…).

https://doi.org/10.7554/eLife.70873.sa1

Author response

Essential revisions:

1. The authors should try to either show (with simulations) that the data rejects the following scenario or they should include it as an equally likely possibility:

"Suppose that B cell clones expanded and diversified through somatic hypermutation prior to the study period (that is, prior to the secondary vaccination event which is the focus of the study). It seems that preferential expansion of highly mutated subclones during the study period could bias detected sequences towards more divergent sequences, even without ongoing somatic mutation during the study period. Preferential expansion of divergent sequences would give rise to higher average divergence as the study period goes on, giving the appearance of accumulation of additional mutations, but in fact these mutations had occurred prior to the study period and are simply more readily detected in the sparsely sampled repertoire sequencing data after their expansion. Far from being simply a pathological counter-example, this scenario seems biologically plausible, given that B cells harboring more divergent, affinity-matured sequences should generally have higher affinity antibodies that allow them to better compete for limited antigen and thus provide stronger division stimulus. This model predicts that some highly divergent sequences exist at early timepoints and would occasionally be detected."

We agree that this is a potential alternative explanation for a significant positive correlation between divergence and time, and have now addressed this as a possibility in the Results and Discussion sections. We have also removed explicit references to detecting “ongoing SHM” in the text, in favor of terms that more directly reflect what our test detects, such as “B cell evolution” or “increasing SHM frequency”, which do not imply novel SHM over the sampling interval. Nevertheless, we believe our results are more easily explained as a result of ongoing SHM, and have added some text making that point. In the context of influenza vaccination, day 5 plasmablasts represent the breadth of the B cell memory pool. If measurable evolution were due solely to preferential recall, we would expect the divergences of sequences at later timepoints to fall within the range of day 5 plasmablasts. Instead, in the high-GC influenza-binding lineages we identified (Figure 3B/C), many late-sampled GC sequences are clearly more diverged than the day 5 plasmablast response. Further, if measurable evolution from influenza vaccination were due simply to preferential re-stimulation of highly mutated B cells, we would expect influenza-binding lineages without any GC sequences to be measurably evolving. To test this, we repeated the analysis in Figure 3A using only lineages containing influenza-binding monoclonal antibodies (mAbs). Results were highly consistent with Figure 3A: influenza-binding lineages without GC sequences were less likely to be evolving than those with high proportions of GC sequences (Figure 3 —figure supplement 3). Thus, significant GC involvement, rather than simply binding to influenza, is more predictive of measurable evolution. All of these points are more easily explained if measurable evolution is the result of additional SHM. Nonetheless, because we cannot definitively rule out this alternative explanation, we have highlighted both possible mechanisms of B cell evolution.
We have included descriptions of this new analysis in the Results (pp. 14-15) and Discussion (pp. 20-21).

2. Do you see any evidence for positive (or negative) selection during somatic evolution? Do you see any significant statistical difference between non-synonymous and synonymous SHMs?

We have now included an additional analysis of selection in a new section in the Results (pp. 11-13), Methods (pp. 32-33) and in Table 2. We quantified the ratio of non-synonymous to synonymous substitutions (dN/dS) for CDRs and FWRs for all adjusted (and non-adjusted) measurably evolving lineages. We found evidence of purifying selection (dN/dS < 1), which is explainable as part of normal affinity maturation, and consistent with previous observations (Hoehn et al., 2019). We further added a discussion of these results to the Discussion section (p. 17).

3. Please expand the discussion on multiple testing adjustments in the section “Measurably evolving lineages following influenza vaccination show signs of memory B cell origin”. Notably, the reported percentages of measurably evolving lineages in several scenarios (7.2% for primary hepatitis B vaccination; 6.5% for allergen-specific immunotherapy; 5.9% for HIV infection) are near the false positive rate of the test (5% of lineages measurably evolving). The authors have performed this test on datasets from ~21 studies, raising a concern that multiple hypothesis testing could give rise to false positives in some of the datasets. These results are interpreted as evidence of measurable evolution, even though they could seemingly be explained by the false discovery rate combined with multiple hypothesis testing. The authors should clarify how these results can be interpreted in light of the false positive rate of their test and multiple hypothesis testing, and must consider whether more conservative conclusions are warranted in these scenarios.

We appreciate the reviewer’s concern and have added a new section on multiple hypothesis testing to the Discussion detailing these caveats (pp. 19-20), as well as additional details to the relevant Results sections (p. 10). We also repeated our initial germline divergence analysis that used “adjusted” measurably evolving lineages without the multiple testing correction, and found similar results (Figure 2 —figure supplement 4). We also repeated these analyses using a more strict cutoff (adjusted p < 0.05), which also yielded similar results (Figure 2 —figure supplement 4). These are discussed in the main text on p. 11.

4. Did non-measurably evolving lineages also contain monoclonal antibodies that bound to vaccine antigens? Is there enrichment of vaccine-binding monoclonal antibodies within measurably evolving lineages?

Yes, both measurably evolving and non-measurably evolving lineages contained antigen-binding mAbs. We added a new analysis to the Results (p. 9) showing that measurably evolving lineages were enriched for mAbs in nearly all studies that included mAbs relevant to the condition being studied (Figure 2 —figure supplement 3). This was also true in our analysis of influenza vaccination. While intriguing, we also detail multiple caveats for this new analysis in the Discussion (p. 18), including the low numbers of mAbs in individual studies and the fact that mAbs are not always randomly chosen among lineages. We also added further detail about how mAbs were selected and included in the Methods section (p. 30).

5. Please modify the abstract in how the following two scenarios are presented: “some lineages enter GCs and thus likely undergo SHM”, and “the average SHM over all lineages doesn’t increase more than some threshold”. These two scenarios are not contradictory and could both be true.

We have rephrased this sentence in the abstract to state both claims without implying they necessarily contradict each other. We also removed a sentence explicitly stating these results are contradictory in the Introduction (p. 3).

6. Figure 3A is very visually striking, despite the small sample sizes. Could you explain why this figure shows a much stronger correlation than Figure S7?

These figures are showing different metrics. Figure 3A shows the minimum GC % on the x axis, while Figure S7 (now Figure 3 —figure supplement 1) shows the raw GC %. We have changed Figure 3 —figure supplement 1 to show the correlation between divergence and time rather than the -log10(p value). We believe this shows the relationship much more clearly. We have also included a bootstrap analysis of this relationship to demonstrate its significance. These are detailed in the main text on p. 13.

7. p23 l22: why is the divergence from mrca (rather than naive ancestor) the one that we want for these tests?

Divergence from the lineage’s predicted germline ancestor sequence rather than the MRCA could also be used. However, because all sequences relate to the germline sequence through the MRCA node, this would add a constant additional divergence to all sequences, resulting in the same correlation as when the MRCA is used. We have now made note of this in the Methods (p. 26).

8. p24 l15: do you have direct evidence for how much of the polytomy prevalence is from PCR/sequencing error?

No, but circumstantially this seems like a real possibility. Further, without polytomy resolution we see biologically implausible patterns such as significant negative evolution in some lineages (Figure 1 —figure supplement 2). Because of this, we believe this is an important factor to account for. We have now made note of this in the Results (p. 5), Methods (p. 27), and Discussion (p. 20).

9. The distribution of p-values should be plotted for all datasets as in Figure S6. It would be instructive to look at them and compare the full distribution given the two choices of significance threshold at 1% and 5%.

We have now included these in the revised Figure 2 —figure supplement 1, referenced in the main text on p. 10.

Reviewer #1 (Recommendations for the authors):

I have a number of questions about how or why different steps were undertaken, but none of them seem likely to significantly affect the basic conclusions.

– Abstract: I don’t think the two findings are contradictory. To my understanding, the first says “some lineages enter GCs and thus likely undergo SHM”, whereas the second says “the average SHM over all lineages doesn’t increase more than some threshold”. Since I think the first doesn’t measure what “some” is, and since flu is usually given to non-naive individuals whose responses vary greatly depending on exposure history, and since the threshold could be too small to detect some SHM that occurs, both of these results seem compatible with what I would imagine is most researchers’ prior: some lineages undergo SHM in some circumstances. The (in my view quite large) contribution of the current paper is in illuminating what both of the “some”s in the previous sentence mean. I think setting it up as a conflict between two prior results that (unless I’m misunderstanding) aren’t actually in conflict just confuses the reader. As it says at p11 l21: “consistent with a primarily GC-independent memory B cell response and/or rarity of antigen-specific lineages in the peripheral blood”. I prefer the framing in the first sentence of Discussion: “The extent to which seasonal influenza vaccination stimulates affinity maturation…”

As detailed in Essential Revisions, we have edited the abstract so that these two concepts are not framed as in conflict.

– p2 l10: saying that you “demonstrate measurable evolution” in some cases seems like you care only about false negatives, but not false positives. I prefer the way this is framed at end of intro, as a “survey” with “significant heterogeneity” that conforms to expectations in both directions.

We have edited this in the abstract.

– p3 l10: “at a rate orders of magnitude”.

– p3 l14: “and, rarely, re-enter”.

– p3 l21: same comment as abstract: not convinced they’re in conflict.

– p6 l10: “and a second time after the”.

Thanks for catching these typos! We have fixed them. We also changed “and, rarely, re-enter” to read “or possibly re-enter” which more accurately reflects the potential fate of memory B cells.

– Figure 2A:

– two categories of HIV here (empty/first 60 weeks) should match those in B (early/late).

These are meant to be different categories. In Figure 2A, the “first 60 weeks” category is derived from all HIV studies and includes only sequences from the first 60 weeks of sampling. The early/late categories in Figure 2B refer to how long after HIV infection the study was performed. These categories were chosen to emphasize different aspects of the data. In Figure 2A, we wanted to show that the sampling interval did not determine the extent of evolution, and thus subsampled the data to 60 weeks. In Figure 2B, we separated studies by time since infection because we were interested in initial divergence at the first timepoint. We’ve clarified this in the caption for Figure 2B. It is also worth noting that there are only five “60 week” HIV studies rather than eight because three studies (Huang et al., 2016; Schanz et al., 2014; Wu et al., 2015) did not have two sample points within the first year of sampling and were thus not included in “HIV, first 60 weeks.” We have made note of this in the Figure 2 caption.

– why is hep B not broken into naive/boost here? I assume T=0 is prime, T~1.4 is boost?

– suggest "healthy children" rather than "healthy" so reader can guess whether they expect enrichment for ME.

We have made both of these changes in the updated Figure 2 caption.

– Figure 2B: having ~half the y axis devoted to p values and "late hiv" makes it hard to compare everything else. I think main message is e.g. flu is like hep boost, but not hep naive, which i can only really tell by parsing the tiny p values at the top.

We have increased the font size for the p values and lowered the bars somewhat for this figure.

The most interesting comparison for us was that between influenza vaccination and early/late HIV. This is because early vs late HIV likely represent early vs chronic primary immune response, while influenza vaccination likely represents a memory response. Because of that, we kept the HIV categories as close to the influenza results as possible.

– p11 l21: my understanding is that equating "GC-independent" with "no SHM" isn't correct, e.g. this https://pubmed.ncbi.nlm.nih.gov/33326765/ takes as settled that some shm takes place outside the GC. Also, I could be wrong but it would make more sense to me to say something like "only a small fraction of existing flu lineages are restimulated" (which as you say earlier is relatively rare) as the first alternative.

We agree with this reviewer that “GC independent” does not necessarily mean “no SHM” is occurring. However, in this sentence in the text we are making the converse claim, that “no SHM” indicates “GC independence.” We think this is a more defensible statement, since GCs are organs dedicated to SHM and affinity maturation (now p. 10).

– p14 l1 I would think that "occur at low frequency in the blood" might be better than "are not enriched in the blood", since the latter (to me) sounds like the bottleneck is only on exiting GCs, rather than the other (previous) steps.

We have changed this to “do not occur at high frequency in the blood” since the observed frequencies are close to the expected false positive rate (now p. 10).

– Figure 3A is very striking/convincing (although I guess given the small sample sizes almost worrisome it's so straight?). But could you explain why just by eye it seems so much more striking (stronger correlation) than Figure S7? I realize "min GC %" and "proportion GC B cells" are different, as are "-log P" and "% ME", and one is a scatter plot with low transparency and lots of dots are on top of each other, but 3A looks like almost a perfect relationship, whereas S7 it's hard to even see a linear relationship.

We have addressed this in the Essential Revisions.

– p17 l19: this paragraph is great, it's really convincing to me.

Thanks!

– p19 l21: couldn't there also be a lot of lineages that are condition-specific and GC-derived, but not re-stimulated by the current stimulus? I don't know a number for the frequency with which re-stimulation causes an antigen-specific lineage to re-enter GCs, but I wouldn't expect it to be very close to 1.

We have now revised the text to address this as a possibility (now p. 22).

– p21 l1: "to cover as wide a variety of conditions as possible".

We have made this change (p. 23).

– p21 l15: does "redundancy" mean number of observed sequences? Or does it have something to do with the number of nucleotide changes you could make without affecting the AA seq? (I presume the former, i just haven't heard it used in this way). Could use "multiplicity" or "observations" if they would be equivalent.

Yes. We have changed this to “multiplicity” as recommended (p. 23).

– p23 l2: what does "manual inspection" consist of? i.e. how do you know by eye that 0.1 is correct for non-bimodal distributions?

We have added more detail to clarify the approach. Briefly, it means we looked at the distributions to verify 0.1 was not too far away from an identifiable minimum.

– p23 l3: What does "masking" consist of? Does this mean that you're not inferring the D/insert portions of the naive ancestral sequence?

Yes. We have now clarified this in the text (p. 25).

– p23 l22: why is the divergence from mrca (rather than naive ancestor) the one that we want for these tests? Maybe the trunk bit would maybe just cancel out? But then at p26 l6 it looks like for the AA version you do compare to naive/germline seq? From intro to Duchene 2015, it seems they used root (not mrca)?

Addressed in Essential Revisions.

– p24 l15: do you have direct evidence for how much of the polytomy prevalence is from PCR/sequencing error? For instance do you get fewer polytomies in data with barcodes/UMIs?

Addressed in Essential Revisions.

– Figure S3: why is it so much easier to detect measurable evolution when we're looking at neutral evolution, i.e. what causes the long downward tails of points in the top right plot (selection strength 1) vs the left plot (neutral)?

We suspect this is due to selection reducing the rate of divergence in the simulations, therefore lowering the power of the test. We confirmed that simulations that include selection had lower average germline divergence than neutral simulations (Figure 1 —figure supplement 5). This makes sense, as selection can only reduce genetic diversity. We’ve addressed this now on p. 6.

– Figure S5: why does it look like there's only a lower bound/quantile (no upper box) for red (Boost, Standard)?

The lower bound whisker was covered by two of the dots. We re-ran these analyses, which, in addition to changing the numerical results slightly, also moved the dots out of the way (Figure 2 —figure supplement 2).

– I would find it very interesting if you could expand on the alternative explanations in the last paragraph of the Discussion. Partly because "does not result from a complete lack of vaccine-induced B cell evolution" seems like a very low bar/unlikely null hypothesis (i don't think many people thought there was zero).

We appreciate the reviewer’s interest in this question. This is an area of active investigation for us, and we are obtaining new data to look into alternative explanations. However, for this paper we wanted to avoid speculating too much on what could be driving the low efficacy of influenza vaccination. We’ve made it clearer that future work will be needed to test these alternative explanations (p. 22).

– It might be worth discussing why you don't (I think) attempt to measure selection (it's fine that you didn't). You do an amino acid-based analysis, which is related to this (but doesn't discuss selection strength), and do simulations with both neutral and strong selection, but I'm curious why you focused only on detecting SHM/evolution, and not on whether it was neutral or not.

We have addressed this in the Essential Revisions.

- It would also be nice to discuss why using parsimony (very heuristic, not very accurate) was preferred over more sophisticated methods.

We have added an explanation to the Discussion (p. 18).

Reviewer #2 (Recommendations for the authors):

Congratulations on the paper! I enjoyed reading the preprint, and I only have a few comments and suggestions that I list below.

Thanks! Glad you liked the paper.

1. For more clarity, the distribution of p-values should be plotted for all datasets as in Figure S6. It would be instructive to look at them and compare the full distribution given the two choices of significance threshold at 1% and 5%. To this end, I think it would make sense to plot the cumulative function.

We have addressed this in the Essential Revisions.
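
As a purely illustrative sketch of the cumulative function suggested in this comment, the snippet below builds an empirical cumulative distribution of p-values and reads off the fraction of lineages falling under the 1% and 5% thresholds. The p-values and the helper name `ecdf` are hypothetical and are not taken from the paper or its software.

```python
from bisect import bisect_right

def ecdf(pvalues):
    """Return a function giving the fraction of p-values <= a threshold."""
    ordered = sorted(pvalues)
    n = len(ordered)
    return lambda threshold: bisect_right(ordered, threshold) / n

# Hypothetical per-lineage p-values for one dataset (not real data).
pvals = [0.002, 0.004, 0.03, 0.04, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
F = ecdf(pvals)

frac_at_1pct = F(0.01)  # fraction of lineages significant at the 1% threshold
frac_at_5pct = F(0.05)  # fraction of lineages significant at the 5% threshold
```

Under a null hypothesis with uniformly distributed p-values, the cumulative fraction at a threshold is expected to be close to the threshold itself, so an excess at 0.01 or 0.05 is what such a plot would make visible.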

2. I think the section "Measurably evolving lineages following influenza vaccination show signs of memory B cell origin" could use a more extensive explanation of the multiple testing adjustment. The p-values distributions would also be important here to distinguish the standard randomization test p-values with the BH adjusted p-values. Detecting lineages using the second definition of the p-value should also be tested with synthetic datasets.

We have addressed this in the Essential Revisions.
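
For readers less familiar with the adjustment mentioned in this comment, a generic Benjamini-Hochberg (BH) correction can be sketched as follows. This is a textbook version under the assumption that all lineages in a dataset are adjusted together; the function name `bh_adjust` and the p-values are hypothetical, and this is not the code used in the paper.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical randomization-test p-values for five lineages (not real data).
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = bh_adjust(raw)

n_raw = sum(p < 0.05 for p in raw)  # lineages significant before adjustment
n_adj = sum(p < 0.05 for p in adj)  # lineages significant after adjustment
```

In this toy example four lineages pass an unadjusted 5% threshold but only two pass after adjustment, which is the kind of difference between the two p-value definitions that the comment asks to be shown.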

3. The initial germline divergence is quantified using the sum of branch lengths for each lineage. I suppose this depends strongly on the lineage size (and that one on the experimental protocol). Is there a way to control for this? (For instance, would it make sense to look at these distributions for subsampled lineages of equal size?)

The divergence is not the sum of all branch lengths within a lineage, which is more appropriately referred to as the “diversity” of a lineage. Rather, the divergence for an individual tip is calculated by tracing the branches from that tip to the most recent common ancestor (MRCA). The divergence is the sum of only those branches between the tip and the MRCA. In principle, it should not be strongly affected by lineage size. An example is shown in Author response image 1, with a full tree at top and a subsampled tree at bottom. The red branches show the lengths that are added up to get the divergence for the selected tip, and these are the same even though many tips have been dropped. While having fewer sequences may move the MRCA node, ultimately that should not affect the analysis because it is only the relative divergence at each timepoint that is relevant for the correlation test. We note that, in practice, long phylogenetic branch lengths tend to be underestimated with maximum parsimony methods, so the divergence of a tip can often increase when trees are sampled more completely. However, this is a much smaller effect than the difference between diversity and divergence. We have now added text to clarify the definition of divergence in the Methods section. While we expect statistical power to detect measurable evolution to be generally higher in larger lineages, we show in Figure 3 —figure supplement 2 that measurably evolving lineages are significantly larger in only 5/21 datasets surveyed, indicating that lineage size is not enough to drive a trend towards measurable evolution.

Author response image 1
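
The definition above can also be illustrated with a minimal sketch. The toy tree, branch lengths, sampling days, and function names below are all hypothetical, and the code is not the implementation used in the paper. It computes tip-to-MRCA divergence, then checks that dropping tips either leaves the divergence unchanged (when the MRCA stays put) or shifts every divergence by the same constant (when the MRCA moves), which leaves a correlation with sampling time unchanged.

```python
import math

# Hypothetical toy tree: child -> (parent, branch length in mutations/site).
TREE = {
    "A": ("root", 0.01), "B": ("root", 0.02),
    "t1": ("A", 0.02), "t2": ("A", 0.03),
    "t3": ("B", 0.01), "C": ("B", 0.02),
    "t4": ("C", 0.01), "t5": ("C", 0.03),
}
ALL_TIPS = ["t1", "t2", "t3", "t4", "t5"]

def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in TREE:
        path.append(TREE[path[-1]][0])
    return path

def mrca(tips):
    """Most recent common ancestor of the sampled tips."""
    paths = [path_to_root(t) for t in tips]
    shared = set(paths[0]).intersection(*paths[1:])
    # The first shared node on an upward path is the most recent one.
    return next(n for n in paths[0] if n in shared)

def divergence(tip, tips):
    """Sum of branch lengths between `tip` and the MRCA of `tips`.

    Assumes `tip` is one of `tips`.
    """
    ancestor = mrca(tips)
    total, node = 0.0, tip
    while node != ancestor:
        parent, length = TREE[node]
        total += length
        node = parent
    return total

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Case 1: dropping t2 and t4 keeps the MRCA at the root.
div_full = divergence("t5", ALL_TIPS)
div_sub_same_mrca = divergence("t5", ["t1", "t3", "t5"])

# Case 2: keeping only t3, t4, t5 moves the MRCA to node B.
subset = ["t3", "t4", "t5"]
days = [0, 7, 28]  # hypothetical sampling days for t3, t4, t5
r_full = pearson(days, [divergence(t, ALL_TIPS) for t in subset])
r_sub = pearson(days, [divergence(t, subset) for t in subset])
```

In the second case every divergence drops by the length of the branch between the old and new MRCA, so `r_full` and `r_sub` agree up to floating-point error. The sketch does not model the parsimony underestimation of long branches noted above.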

4. In the discussion you refer to the rates of somatic hypermutation and the length of the GC cycles as given by the literature you cite. For completeness (perhaps as a supplementary figure), could you report the values of the slope fitted in the SHM number vs sample time plot for measurably evolving lineages (as in Figure 1B)? I would be curious to see how these numbers compare with independent estimations from the data and whether their distribution changes significantly between cohorts you've studied.

We thank the reviewer for this suggestion, and agree that the results could make for an interesting comparison. However, we do not believe that this analysis is directly related to the claims we are making in this manuscript. It is certainly an area we are looking into for future work.

5. Re discussion on page 11: Even if memory B cells do not re-enter GC, one could imagine detecting the ongoing evolution of naive cells – this possibility should be discussed. Later the results suggest the evolving lineages come mainly from memory cells (page 12) but a priori both scenarios could be true.

We have now added text to address this possibility in the Results (p. 9).

6. Figure 1G misses the y axis label and the x axis label is somewhat confusing without reference to the main text. The fractions in boxes should be written with the "%" sign (also in other figures).

We added the y-axis label to Figure 1 and the % symbol to Figure 1 and relevant supplemental figures.

7. In Figure 2A the point corresponding to the early-childhood dataset should be distinguishable from other healthy data (I guess it's the "significant" green point).

The caption of Figure 2B should use the term "initial germline divergence" again, as in the y axis label to avoid confusion.

We have made these changes to distinguish healthy adults and children in Figure 2 and Table 1.

8. Page 5 line 3: before using "SHM/site" first, it would be better to say what it means in words.

Page 5 line 4: In evolving lineages, sequences sampled at later time points are (…).

We have made these changes.

https://doi.org/10.7554/eLife.70873.sa2

Article and author information

Author details

  1. Kenneth B Hoehn

    Department of Pathology, Yale School of Medicine, New Haven, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Methodology, Software, Writing – original draft, Writing – review and editing
    Competing interests
    receives consulting fees from Prellis Biologics
    ORCID: 0000-0003-0411-4307
  2. Jackson S Turner

    Department of Pathology and Immunology, Washington University School of Medicine, St Louis, United States
    Contribution
    Data curation, Resources, Writing – review and editing
    Competing interests
    is the recipient of a licensing agreement with AbbVie and has received consulting fees from Gerson Lehrman Group
  3. Frederick I Miller

    Worcester Polytechnic Institute, Worcester, United States
    Contribution
    Data curation, Software, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Ruoyi Jiang

    Department of Immunobiology, Yale School of Medicine, New Haven, United States
    Contribution
    Data curation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Oliver G Pybus

    Department of Zoology, University of Oxford, Oxford, United Kingdom
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Ali H Ellebedy

    1. Department of Pathology and Immunology, Washington University School of Medicine, St Louis, United States
    2. The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St Louis, United States
    Contribution
    Resources, Supervision, Writing – review and editing
    Competing interests
    The Ellebedy laboratory received funding under sponsored research agreements from Emergent BioSolutions and AbbVie
  7. Steven H Kleinstein

    1. Department of Pathology, Yale School of Medicine, New Haven, United States
    2. Department of Immunobiology, Yale School of Medicine, New Haven, United States
    3. Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, United States
    Contribution
    Conceptualization, Funding acquisition, Supervision, Writing – original draft, Writing – review and editing
    For correspondence
    steven.kleinstein@yale.edu
    Competing interests
    receives consulting fees from Northrop Grumman and Peraton
    ORCID: 0000-0003-4957-1544

Funding

National Institute of Allergy and Infectious Diseases (R01 AI104739)

  • Steven H Kleinstein

European Research Council (614725-PATHPHYLODYN)

  • Oliver G Pybus

National Institute of Allergy and Infectious Diseases (R21 AI139813)

  • Ali H Ellebedy

National Institute of Allergy and Infectious Diseases (U01 AI141990)

  • Ali H Ellebedy

National Institute of Allergy and Infectious Diseases (HHSN272201400006C)

  • Ali H Ellebedy

National Institute of Allergy and Infectious Diseases (5T32CA009547)

  • Jackson S Turner

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to thank Dr Louis Du Plessis for helpful discussion, and Dr Julian Q Zhou for providing processed data. This work was funded in part by National Institutes of Health, National Institute of Allergy and Infectious Diseases grant R01 AI104739, and by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/European Research Council grant agreement number 614725-PATHPHYLODYN. The Ellebedy laboratory was supported by NIAID grants R21 AI139813, U01 AI141990, and NIAID Centers of Excellence for Influenza Research and Surveillance (CEIRS) contract HHSN272201400006C to AHE. JST was supported by NIAID 5T32CA009547.

Senior Editor

  1. Aleksandra M Walczak, École Normale Supérieure, France

Reviewing Editor

  1. Armita Nourmohammad, University of Washington, United States

Publication history

  1. Preprint posted: January 7, 2021
  2. Received: June 1, 2021
  3. Accepted: November 11, 2021
  4. Accepted Manuscript published: November 17, 2021 (version 1)
  5. Accepted Manuscript updated: November 25, 2021 (version 2)
  6. Version of Record published: January 7, 2022 (version 3)

Copyright

© 2021, Hoehn et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Kenneth B Hoehn, Jackson S Turner, Frederick I Miller, Ruoyi Jiang, Oliver G Pybus, Ali H Ellebedy, Steven H Kleinstein (2021)
Human B cell lineages associated with germinal centers following influenza vaccination are measurably evolving
eLife 10:e70873.
https://doi.org/10.7554/eLife.70873
