Inference of germinal center evolutionary dynamics via simulation-based deep learning

  1. Duncan K Ralph (corresponding author)
  2. Athanasios G Bakis
  3. Jared G Galloway
  4. Ashni A Vora
  5. Tatsuya Araki
  6. Gabriel D Victora
  7. Yun S Song
  8. William S DeWitt
  9. Frederick A Matsen (corresponding author)
  1. Fred Hutchinson Cancer Research Center, United States
  2. Department of Statistics, University of California, Irvine, United States
  3. Laboratory of Lymphocyte Dynamics, The Rockefeller University, United States
  4. Howard Hughes Medical Institute, United States
  5. Department of Statistics, University of California, Berkeley, United States
  6. Computer Science Division, University of California, Berkeley, United States
  7. Department of Genome Sciences, University of Washington, United States
  8. Department of Statistics, University of Washington, United States
8 figures, 3 tables and 1 additional file

Figures

Figure 1 with 3 supplements
Overview of the workflow for this paper.

After specifying a birth-death model with sigmoid affinity–fitness response function, we simulate many trees and their sequences at each node, with parameters roughly consistent with our real data samples. The simulation is cell-based and implements a carrying-capacity population size limit. The results of the simulation are then encoded and used to train a neural network that infers the sigmoid response parameters on real data. In addition to encoded trees, the network also takes as input the assumed values of several non-sigmoid parameters (carrying capacity, initial population size, and death rate). Inference on real data is performed many times, with many different combinations of non-sigmoid parameter values, and additional ‘data mimic’ simulation is generated using each of the resulting inferred parameter value combinations. The summary statistics of each data mimic sample are compared to real data, with the best match selected as the ‘central data mimic’ sample with final parameter values for both sigmoid (inferred with the neural network) and non-sigmoid (inferred by matching summary statistics) parameters. Plots from elsewhere in the paper are rendered in schematic form: those in ‘infer on data’ refer to Figure 4—figure supplement 1, and those in ‘simulate with inferred parameters’ to Figure 5.
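
The final data mimic selection step in this workflow can be sketched as a nearest-match search over candidate non-sigmoid parameter combinations. This is a minimal sketch under stated assumptions: the summary statistic names, the per-statistic distance (relative difference of means), and the tuple layout (carrying capacity, initial population, death rate) are hypothetical illustrations, not the comparison the paper actually uses.

```python
from statistics import mean

# Hypothetical layout: each sample maps a summary statistic name to its
# per-GC values. Names and the distance below are illustrative assumptions.
Sample = dict[str, list[float]]


def summary_distance(data: Sample, sim: Sample) -> float:
    """Sum over statistics of |mean(data) - mean(sim)|, each scaled by the
    data mean so that statistics on different scales are comparable."""
    total = 0.0
    for name, data_values in data.items():
        d = mean(data_values)
        s = mean(sim[name])
        total += abs(d - s) / (abs(d) if d != 0 else 1.0)
    return total


def select_central_mimic(data: Sample, mimics: dict[tuple, Sample]) -> tuple:
    """Return the non-sigmoid parameter combination whose data mimic sample
    best matches the real-data summary statistics."""
    return min(mimics, key=lambda params: summary_distance(data, mimics[params]))


# Toy example; keys are (carrying capacity, initial population, death rate).
data = {"tree_depth": [0.10, 0.12], "abundance": [3.0, 5.0]}
mimics = {
    (500, 128, 0.2): {"tree_depth": [0.11, 0.12], "abundance": [4.0, 4.5]},
    (2000, 8, 0.5): {"tree_depth": [0.30, 0.25], "abundance": [1.0, 1.5]},
}
print(select_central_mimic(data, mimics))  # (500, 128, 0.2)
```

Scaling by the data mean is one simple way to keep a statistic with large values (such as abundance) from dominating one with small values (such as tree depth); any real implementation might weight statistics differently.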

Figure 1—figure supplement 1
Simulation response function example with sampled affinity values.

Example of simulation response function (red, left axis) and resulting node affinity values (histogram, right axis). The response function describes the relationship between affinity and fitness, while the node values show the actual affinity values of nodes in the resulting simulation for both internal (blue) and leaf (orange) nodes. Note that the histogram counts do not sample the response function, so we do not expect the histogram and the function to match.

Figure 1—figure supplement 2
Diagram of curve difference loss function calculation.

Curve difference loss function calculation. We divide the ‘difference’ area (red bars) between the true (green) and inferred (red) curves by the area (light red) under the true curve within the bounds [−2.5,3]. The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.
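
The loss described above can be computed numerically on a grid. The sketch below assumes a logistic parameterization of the sigmoid, f(x) = yscale / (1 + exp(−xscale·(x − xshift))) + yshift, which fits the roles given in these captions for xscale (steepness), xshift (location), and yscale (height) but is not stated in this section. The helper names and grid size are hypothetical, and the trapezoid rule stands in for whatever integration the authors used.

```python
import math


def sigmoid(x: float, xscale: float, xshift: float, yscale: float, yshift: float) -> float:
    # Assumed logistic form: midpoint at xshift, steepness xscale,
    # height yscale above the lower asymptote yshift.
    return yscale / (1.0 + math.exp(-xscale * (x - xshift))) + yshift


def curve_difference_loss(true_params, inferred_params, lo=-2.5, hi=3.0, n=2001) -> float:
    """Area between the true and inferred curves divided by the area under the
    true curve on [lo, hi], both approximated by the trapezoid rule on n points.
    Parameter tuples are ordered (xscale, xshift, yscale, yshift)."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    true_vals = [sigmoid(x, *true_params) for x in xs]
    diff = [abs(t - sigmoid(x, *inferred_params)) for x, t in zip(xs, true_vals)]

    def trapezoid(ys):
        return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

    return trapezoid(diff) / trapezoid(true_vals)


same = curve_difference_loss((1.6, 2.0, 18.2, 0.4), (1.6, 2.0, 18.2, 0.4))
half = curve_difference_loss((1.0, 0.0, 2.0, 0.0), (1.0, 0.0, 1.0, 0.0))
```

Identical parameters give a loss of 0. With yshift = 0, halving yscale halves the whole curve, so the difference area is exactly half the true area and the loss is 0.5.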

Figure 1—figure supplement 3
Example of approximate sigmoid parameter degeneracy.

A potential pair of true (green) and inferred (blue) sigmoid curves, labeled with true (inferred) parameter values. In the range shown, the inferred curve has compensated for a too-small xscale (transition steepness) by increasing xshift (shifting to the right) and yscale (upper asymptote). This results in a curve difference loss of only 9%, much smaller than the typical loss we expect from inference error.

Figure 2 with 1 supplement
Training and testing results for the sigmoid model on simulation.

We show curve difference loss distributions on several subsets of the training sample (where each GC has different parameter values): training and validation (left) and testing (right). For computational efficiency when plotting, the curve difference distributions display only the first 1000 values. See Figure 3 for the per-bin model.

Figure 2—figure supplement 1
Example true and inferred response functions on the training sample.

The ‘diff’ entry in the per-plot table gives the curve difference loss.

Figure 3
Training and testing results for the per-bin model on simulation.

See Figure 2 for details.

Figure 4 with 2 supplements
Inferred response functions on real data for sigmoid (left) and per-bin (right) models, corresponding to the non-sigmoid parameter values yielding simulation with the best-matching summary statistics.

The medoid curve is shown in orange with 68% and 95% confidence intervals in blue, with observed affinity values in gray.
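
A medoid curve and pointwise confidence bands can be sketched as follows, assuming each inferred response function has been evaluated on a shared affinity grid. The specific choices here (medoid under summed L1 distance, bands as pointwise empirical quantiles with linear interpolation) are illustrative assumptions; the caption does not say how the medoid or the intervals are computed.

```python
def quantile(values, q):
    """Empirical quantile (0 <= q <= 1) with linear interpolation."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def medoid_index(curves):
    """Index of the curve with the smallest summed L1 distance to all curves."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(range(len(curves)), key=lambda i: sum(dist(curves[i], c) for c in curves))


def pointwise_band(curves, level):
    """Lower and upper curves of a central pointwise interval at `level`."""
    tail = (1.0 - level) / 2.0
    columns = list(zip(*curves))  # values of all curves at each grid point
    return ([quantile(col, tail) for col in columns],
            [quantile(col, 1.0 - tail) for col in columns])


# Toy curves on a three-point grid.
curves = [[0.0, 1.0, 2.0], [0.0, 2.0, 4.0], [0.0, 3.0, 6.0]]
best = medoid_index(curves)
lower95, upper95 = pointwise_band(curves, 0.95)
lower68, upper68 = pointwise_band(curves, 0.68)
```

Unlike a pointwise median, the medoid is always one of the actual inferred curves, so it remains a valid sigmoid (or per-bin) function.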

Figure 4—figure supplement 1
Example inferred sigmoid curves on data for four representative GCs.

Figure 4—figure supplement 2
Example inferred per-bin response functions on data for four representative GCs.

Figure 5 with 2 supplements
Summary statistics on data vs simulation for the central data mimic simulation sample that most closely mimics inferred data parameters.

Simulation truth (dashed green) is unobservable and shown only for completeness; the important comparison is between purple and green solid lines, where both data and simulation have been run through IQ-TREE.

Figure 5—figure supplement 1
Additional summary statistics distributions for the same central data mimic sample as the main figure.

Figure 5—figure supplement 2
Summary statistic distributions for the simulation sample used for training.

In order to allow easy comparison of abundance distributions, these plots include only 120 of the simulated trees. This sample is designed to have a wide range of parameters, encompassing all plausible true data values, and is thus not designed to closely match data summary statistics.

Figure 6
Inferred sigmoid curves for the central data mimic data-like simulation sample.

The medoid curve is shown in orange, and the true curve is shown in green. This sample consists of 120 GCs all simulated with the same sigmoid parameters (from the central data prediction) and non-sigmoid parameters (from the best-matched summary statistics).

Figure 7
Effective birth rates (left column) calculated for three GCs simulated with the central data mimic parameters.

The effective birth rate for cell $i$ is defined as $m\lambda_i - \mu_i$ for the carrying capacity modulating factor $m$, intrinsic birth rate $\lambda_i$, and death rate $\mu_i$. It is shown at several different time values for all living cells, except that, for plotting clarity, cells closer to each other than 0.1 in affinity are not shown. Note that we extend time here to 50 days in order to aid comparison to the bulk data used by the traveling wave model (DeWitt et al., 2025, Figure S6D), which assumes a steady state. (Our extracted GC data is sampled at 15 and 20 days.) Vertical lines (left and middle columns) mark the point at which the slope has dropped to 1/2 its maximum value (if absent, the slope never falls below this threshold). We also show the intrinsic birth rate λ (middle column, solid red curve), and include histograms of sampled affinities at the final timepoint. The right column shows time traces of the number of living cells (i.e. nodes, in blue) and the mean affinity of all living cells (red).
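
The effective birth rate and the half-maximum slope marker described above can be sketched on an affinity grid. This sketch assumes a logistic sigmoid form for λ(x) (not stated in this section), uses the central data mimic values from Table 1 (xscale 1.6, xshift 2.0, yscale 18.2, yshift 0.4, functional death rate 0.2), and treats the carrying capacity modulating factor m as a fixed, hypothetical number, whereas in the simulation it presumably depends on the current population size. Slopes are finite differences, and the function names are hypothetical.

```python
import math


def sigmoid(x, xscale, xshift, yscale, yshift):
    # Assumed logistic form for the intrinsic birth rate lambda(x).
    return yscale / (1.0 + math.exp(-xscale * (x - xshift))) + yshift


def effective_birth_rate(x, m, mu, params):
    """m * lambda(x) - mu for a cell with affinity x."""
    return m * sigmoid(x, *params) - mu


def half_max_slope_affinity(xs, ys):
    """First grid affinity at or past the steepest point where the finite-difference
    slope has fallen to half its maximum; None if it never does."""
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    i_max = max(range(len(slopes)), key=lambda i: slopes[i])
    for i in range(i_max, len(slopes)):
        if slopes[i] <= 0.5 * slopes[i_max]:
            return xs[i]
    return None


central = (1.6, 2.0, 18.2, 0.4)  # (xscale, xshift, yscale, yshift) from Table 1
m = 0.1                          # hypothetical modulating factor
xs = [-2.5 + 0.01 * i for i in range(851)]
ys = [effective_birth_rate(x, m, 0.2, central) for x in xs]
x_half = half_max_slope_affinity(xs, ys)
```

Because m only rescales λ and μ only shifts it, the marked affinity depends only on the sigmoid shape; for the assumed logistic form it falls at xshift + ln(3 + 2√2)/xscale, about 3.10 for these parameters.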

Figure 8
The net growth rate from DeWitt et al., 2025 Figure S6D (left) compared to our effective birth rate (right) for three different ‘target’ mean affinity values.

We selected three well-spaced timepoints from Figure S6D, and matched the corresponding mean affinity values from the bulk data to particular time slices in 25 GCs simulated with our central data mimic parameters (inferred on the extracted GC data) out to time 70. The net growth rate describes the net population growth at affinity x and time t in a traveling wave fitness model (see text). In our model, the effective birth rate is defined and plotted as in Figure 7. We also show stacked histograms of observed affinity values from the bulk data (left) and the 25 simulated GCs (not from the extracted GC data) (right). The measured affinity values from the extracted GC data (which extends only to time 20) are shown in Figure 4.
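
Matching the bulk-data mean affinities to simulated time slices amounts to a nearest-value lookup. In this minimal sketch the snapshot layout (each time mapped to the pooled affinities of living cells across all simulated GCs) and the toy numbers are hypothetical; the caption does not say whether means are pooled across GCs or averaged per GC.

```python
from statistics import mean


def match_time_slices(snapshots, targets):
    """For each target mean affinity, return the simulation time whose pooled
    mean affinity is closest to it.

    snapshots: hypothetical dict mapping time -> affinities of living cells.
    """
    means = {t: mean(affs) for t, affs in snapshots.items()}
    return [min(means, key=lambda t: abs(means[t] - target)) for target in targets]


snapshots = {10: [0.1, 0.3], 30: [0.9, 1.1], 50: [1.8, 2.2], 70: [2.4, 2.6]}
print(match_time_slices(snapshots, [0.25, 1.9, 2.6]))  # [10, 50, 70]
```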

Tables

Table 1
Simulation parameter values for the sample used for training (left column, in which each GC has different parameters), and the ‘central data mimic’ sample used to evaluate final results (right column, in which all GCs have the same, data-inferred parameters).

Square brackets indicate a range, from which values are chosen either as detailed in the text (for sigmoid parameters) or uniformly at random. The mutability multiplier is an empirical factor that modulates the intensity of mutation so that the speed of evolution more closely matches that observed in data.

| Parameter | Training | Central data mimic |
|---|---|---|
| Birth response function | Sigmoid | Sigmoid |
| xscale (xc) | [0.01, 2] | 1.6 |
| xshift (xh) | [−0.5, 3] | 2.0 |
| yscale (yc) | [0.5, 35] | 18.2 |
| yshift (yh) | [0, 0.6] | 0.4 |
| Carrying capacity | [500, 2000] | 500 |
| Capacity method | birth | birth |
| Time to sampling | [10, 35] | 20 |
| # sampled cells/GC | [50, 130] | [60, 95] |
| # GCs | 50,000 | 120 |
| Naive birth rate | [0.1, 15] | 1.2 |
| Death rate (functional) | [0.05, 0.5] | 0.2 |
| Death rate (stops) | 10 | 10 |
| Initial population | [8, 128] | 128 |
| Mutability multiplier | 0.68 | 0.5 |

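
Drawing the training-sample parameters uniformly from the ranges in Table 1 can be sketched as below. The sigmoid parameters are left out because they are sampled as detailed in the text. The dictionary keys are hypothetical names, and treating carrying capacity, sampled cells, and initial population as integer ranges (and everything else as continuous) is an assumption.

```python
import random

# Training ranges from Table 1 for the parameters drawn uniformly at random.
# Key names are hypothetical; sigmoid parameters are omitted because the text
# samples them by a different scheme.
TRAINING_RANGES = {
    "carrying_capacity": (500, 2000),
    "time_to_sampling": (10, 35),
    "n_sampled_cells": (50, 130),
    "naive_birth_rate": (0.1, 15),
    "death_rate_functional": (0.05, 0.5),
    "initial_population": (8, 128),
}
# Assumption: counts are drawn as integers, everything else as floats.
INTEGER_PARAMS = {"carrying_capacity", "n_sampled_cells", "initial_population"}


def sample_gc_parameters(rng: random.Random) -> dict:
    """Draw one GC's non-sigmoid parameters uniformly from the training ranges."""
    params = {}
    for name, (lo, hi) in TRAINING_RANGES.items():
        if name in INTEGER_PARAMS:
            params[name] = rng.randint(lo, hi)  # inclusive of both endpoints
        else:
            params[name] = rng.uniform(lo, hi)
    return params


rng = random.Random(0)  # fixed seed for reproducibility
training_params = [sample_gc_parameters(rng) for _ in range(5)]
```

Giving every training GC its own independent draw is what lets the network see the wide parameter coverage described for the training sample, in contrast to the central data mimic sample, where all GCs share one parameter set.
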
Table 2
Neural network architecture.
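| Order | Type | Filters | Kernel size | Pool size | Stride length | N units | N params |
|---|---|---|---|---|---|---|---|
| 1 | 1D convolutional | 25 | 4 | | | | 426 |
| 2 | 1D convolutional | 25 | 4 | | | | 2525 |
| 3 | Max pooling | | | 2 | 2 | | |
| 4 | 1D convolutional | 40 | 4 | | | | 4040 |
| 5 | Global avg. pooling | | | | | | |
| 6 | Dense | | | | | 48 | 1968 |
| 7 | Dense | | | | | 32 | 1568 |
| 8 | Dense | | | | | 16 | 528 |
| 9 | Dense | | | | | 8 | 136 |
| 10 | Dense | | | | | 3 | 27 |
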
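
As a consistency check on Table 2, the parameter counts listed for layers 2, 4, and 6 through 10 match standard weight-plus-bias counting, assuming every layer has a bias term and that global average pooling leaves one feature per filter of layer 4. Layer 1 is omitted because its count depends on the number of channels in the tree encoding, which this section does not state. The helper names are hypothetical.

```python
def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    # One (kernel_size x in_channels) weight block per filter, plus one bias per filter.
    return filters * (kernel_size * in_channels + 1)


def dense_params(in_units: int, units: int) -> int:
    # Full weight matrix plus one bias per output unit.
    return units * (in_units + 1)


# Layers 2 and 4 both take the 25 channels produced by the preceding layers
# (max pooling has no parameters and keeps the channel count).
assert conv1d_params(25, 25, 4) == 2525
assert conv1d_params(25, 40, 4) == 4040

# Global average pooling leaves 40 features (one per layer-4 filter).
dense_units = [40, 48, 32, 16, 8, 3]
dense_counts = [dense_params(a, b) for a, b in zip(dense_units, dense_units[1:])]
print(dense_counts)  # [1968, 1568, 528, 136, 27]
```
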
Table 3
Clip function bounds for neural network training.
| Parameter | Min value | Max value |
|---|---|---|
| xscale (xc) | 0.001 | 3.5 |
| xshift (xh) | −1.5 | 5 |
| yscale (yc) | 0.1 | 65 |
| yshift (yh) | 0 | 10 |
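
Applying the Table 3 bounds amounts to clamping each sigmoid parameter into its interval. Where in training the clip is applied (for example, to network outputs before the loss is computed) is not stated here, so this sketch only shows the clamping itself; the dictionary keys are hypothetical.

```python
# Bounds from Table 3: name -> (min value, max value).
CLIP_BOUNDS = {
    "xscale": (0.001, 3.5),
    "xshift": (-1.5, 5.0),
    "yscale": (0.1, 65.0),
    "yshift": (0.0, 10.0),
}


def clip_parameters(pred: dict) -> dict:
    """Clamp each predicted sigmoid parameter into its [min, max] bound."""
    return {name: min(max(value, CLIP_BOUNDS[name][0]), CLIP_BOUNDS[name][1])
            for name, value in pred.items()}


print(clip_parameters({"xscale": 4.2, "xshift": -2.0, "yscale": 18.2, "yshift": -0.1}))
# {'xscale': 3.5, 'xshift': -1.5, 'yscale': 18.2, 'yshift': 0.0}
```

Each Table 3 interval is wider than the corresponding training range in Table 1, so clamping of this kind would only affect predictions that stray outside the region covered by training.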

Additional files

Cite this article

Ralph DK, Bakis AG, Galloway JG, Vora AA, Araki T, Victora GD, Song YS, DeWitt WS, Matsen FA (2026) Inference of germinal center evolutionary dynamics via simulation-based deep learning. eLife 14:RP108880. https://doi.org/10.7554/eLife.108880.3