Research Article

Evolutionary Biology

Inference of germinal center evolutionary dynamics via simulation-based deep learning

Fred Hutchinson Cancer Research Center, United States
Department of Statistics, University of California, Irvine, United States
Laboratory of Lymphocyte Dynamics, The Rockefeller University, United States
Howard Hughes Medical Institute, United States
Department of Statistics, University of California, Berkeley, United States
Computer Science Division, University of California, Berkeley, United States
Department of Genome Sciences, University of Washington, United States
Department of Statistics, University of Washington, United States

Apr 28, 2026

https://doi.org/10.7554/eLife.108880.3

Open access

8 figures, 3 tables and 1 additional file

Figures

Figure 1 with 3 supplements

Overview of the workflow for this paper.

After specifying a birth-death model with sigmoid affinity–fitness response function, we simulate many trees and their sequences at each node, with parameters roughly consistent with our real data samples. The simulation is cell-based and implements a carrying-capacity population size limit. The results of the simulation are then encoded and used to train a neural network that infers the sigmoid response parameters on real data. In addition to encoded trees, the network also takes as input the assumed values of several non-sigmoid parameters (carrying capacity, initial population size, and death rate). Inference on real data is performed many times, with many different combinations of non-sigmoid parameter values, and additional ‘data mimic’ simulation is generated using each of the resulting inferred parameter value combinations. The summary statistics of each data mimic sample are compared to real data, with the best match selected as the ‘central data mimic’ sample with final parameter values for both sigmoid (inferred with the neural network) and non-sigmoid (inferred by matching summary statistics) parameters. Plots from elsewhere in the paper are rendered in schematic form: those in ‘infer on data’ refer to Figure 4—figure supplement 1, and those in ‘simulate with inferred parameters’ to Figure 5.
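The final selection step described above (choosing the central data mimic by comparing summary statistics) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the quantile-based distance, the scaling by the data's standard deviation, the statistic names, and the use of (carrying capacity, initial population) tuples as keys are all hypothetical choices made for this example.

```python
import numpy as np

QUANTILES = np.linspace(0.05, 0.95, 19)  # assumed set of matched quantiles


def stat_distance(data_values, sim_values):
    """Quantile-matched mean absolute difference, scaled by the spread of the data (assumed metric)."""
    data = np.asarray(data_values, dtype=float)
    sim = np.asarray(sim_values, dtype=float)
    scale = float(np.std(data)) or 1.0  # avoid dividing by zero for a constant statistic
    diff = np.abs(np.quantile(data, QUANTILES) - np.quantile(sim, QUANTILES))
    return float(np.mean(diff)) / scale


def select_central_mimic(data_stats, mimic_samples):
    """Pick the mimic sample whose summary statistics are closest to the real data.

    data_stats: {statistic name: values observed on real data}
    mimic_samples: {non-sigmoid parameter combination: {statistic name: simulated values}}
    Returns (best key, its summed distance over all statistics).
    """
    scores = {
        key: sum(stat_distance(data_stats[name], sim_stats[name]) for name in data_stats)
        for key, sim_stats in mimic_samples.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Scaling each distance by the data's spread keeps statistics measured on different scales from dominating the sum; any comparable normalization would serve the same purpose in this sketch.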

Figure 1—figure supplement 1

Simulation response function example with sampled affinity values.

Example of simulation response function (red, left axis) and resulting node affinity values (histogram, right axis). The response function describes the relationship between affinity and fitness, while the histogram shows the actual affinity values of nodes in the resulting simulation for both internal (blue) and leaf (orange) nodes. Note that the histogram counts are not samples drawn from the response function, so we do not expect the histogram and the function to match.

Figure 1—figure supplement 2

Diagram of curve difference loss function calculation.

Curve difference loss function calculation. We divide the ‘difference’ area (red bars) between the true (green) and inferred (red) curves by the area (light red) under the true curve within the bounds [−2.5,3]. The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.
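A minimal numerical sketch of this loss, assuming the sigmoid is parameterized as yscale / (1 + exp(−xscale (x − xshift))) + yshift (an assumed form consistent with the parameter names in Table 1, not taken from this section) and approximating both areas with the trapezoidal rule on a uniform grid over [−2.5, 3]. The function names are hypothetical.

```python
import numpy as np


def sigmoid_response(x, xscale, xshift, yscale, yshift):
    # Assumed parameterization: yscale / (1 + exp(-xscale * (x - xshift))) + yshift
    return yscale / (1.0 + np.exp(-xscale * (x - xshift))) + yshift


def trapezoid_area(y, x):
    # Trapezoidal rule, written out to avoid depending on a particular NumPy version
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def curve_difference_loss(true_params, inferred_params, bounds=(-2.5, 3.0), n_points=2001):
    """Area between the true and inferred curves divided by the area under the true curve."""
    x = np.linspace(bounds[0], bounds[1], n_points)
    y_true = sigmoid_response(x, *true_params)
    y_inferred = sigmoid_response(x, *inferred_params)
    return trapezoid_area(np.abs(y_true - y_inferred), x) / trapezoid_area(y_true, x)
```

For example, raising yshift by 0.1 on a unit logistic curve adds a constant band of area 0.1 × 5.5 = 0.55 between the curves, which this sketch divides by the roughly 2.97 units of area under the true curve, giving a loss of about 0.19.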

Figure 1—figure supplement 3

Example of approximate sigmoid parameter degeneracy.

A potential pair of true (green) and inferred (blue) sigmoid curves, with true (inferred) parameter values. In the range shown, the inferred curve has compensated for a too-small xscale (transition steepness) by increasing xshift (shifting to the right) and yscale (upper asymptote). This results in a curve difference loss value of only 9%, which is much smaller than the loss we typically expect from inference error.

Figure 2 with 1 supplement

Training and testing results for the sigmoid model on simulation.

We show curve difference loss distributions on several subsets of the training sample (where each GC has different parameter values): training and validation (left) and testing (right). For computational efficiency when plotting, the curve difference distributions display only the first 1000 values. See Figure 3 for the per-bin model.

Figure 2—figure supplement 1

Example true and inferred response functions on the training sample.

The ‘diff’ entry in the per-plot table gives the curve difference loss.

Figure 3

Training and testing results for the per-bin model on simulation.

See Figure 2 for details.

Figure 4 with 2 supplements

Inferred response functions on real data for sigmoid (left) and per-bin (right) models, corresponding to the non-sigmoid parameter values yielding simulation with the best-matching summary statistics.

The medoid curve is shown in orange with 68% and 95% confidence intervals in blue, with observed affinity values in gray.
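The medoid is the member of a set with the smallest summed distance to all other members. A minimal sketch for inferred curves evaluated on a shared affinity grid, assuming mean absolute difference as the distance (the caption does not specify the metric, and `medoid_index` is a hypothetical name):

```python
import numpy as np


def medoid_index(curves):
    """Index of the curve with the smallest summed distance to all other curves.

    curves: array of shape (n_curves, n_grid), each row a response function
    evaluated on the same affinity grid. Distance (assumed): mean absolute difference.
    """
    curves = np.asarray(curves, dtype=float)
    pairwise = np.mean(np.abs(curves[:, None, :] - curves[None, :, :]), axis=2)
    return int(np.argmin(pairwise.sum(axis=1)))
```

Unlike a pointwise mean or median, the medoid is always one of the actual inferred curves, so it remains a valid member of the model family.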

Figure 4—figure supplement 1

Example inferred sigmoid curves on data for four representative GCs.

Figure 4—figure supplement 2

Example inferred per-bin response functions on data for four representative GCs.

Figure 5 with 2 supplements

Summary statistics on data vs simulation for the central data mimic simulation sample that most closely mimics inferred data parameters.

Simulation truth (dashed green) is unobservable and shown only for completeness; the important comparison is between purple and green solid lines, where both data and simulation have been run through IQ-TREE.

Figure 5—figure supplement 1

Additional summary statistics distributions for the same central data mimic sample as the main figure.

Figure 5—figure supplement 2

Summary statistic distributions for the simulation sample used for training.

In order to allow easy comparison of abundance distributions, these plots include only 120 of the simulated trees. This sample is designed to have a wide range of parameters, encompassing all plausible true data values, and is thus not designed to closely match data summary statistics.

Figure 6

Inferred sigmoid curves for the central data mimic simulation sample.

The medoid curve is shown in orange, and the true curve is shown in green. This sample consists of 120 GCs all simulated with the same sigmoid parameters (from the central data prediction) and non-sigmoid parameters (from the best-matched summary statistics).

Figure 7

Effective birth rates (left column) calculated for three GCs simulated with the central data mimic parameters.

The effective birth rate for cell $i$ is defined as $m\lambda_i - \mu_i$, for the carrying capacity modulating factor $m$, intrinsic birth rate $\lambda_i$, and death rate $\mu_i$. It is shown at several different time values for all living cells, except that for plotting clarity, cells closer to each other than 0.1 in affinity are not shown. Note that we extend time here to 50 days in order to aid comparison to the bulk data used by the traveling wave model (DeWitt et al., 2025, Figure S6D), which assumes a steady state. (Our extracted GC data is sampled at 15 and 20 days.) Vertical lines (left and middle columns) mark the point at which the slope has dropped to 1/2 its maximum value (if absent, the slope never falls below this threshold). We also show the intrinsic birth rate $\lambda$ (middle column, solid red curve), and include histograms of sampled affinities at the final timepoint. The right column shows time traces of the number of living cells (i.e. nodes, in blue) and the mean affinity of all living cells (red).
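A minimal sketch of the two quantities in this caption: the per-cell effective birth rate $m\lambda_i - \mu_i$, and the affinity at which the slope of a curve falls to half its maximum (the vertical lines). The modulating factor `m` is taken as a given input here because its computation is not described in this section; the function names and the "first point above the maximum-slope location" rule are assumptions.

```python
import numpy as np


def effective_birth_rate(m, birth_rates, death_rates):
    """m * lambda_i - mu_i per cell; m (carrying-capacity modulating factor) is supplied by the caller."""
    return m * np.asarray(birth_rates, dtype=float) - np.asarray(death_rates, dtype=float)


def half_slope_point(x, y):
    """First x beyond the maximum-slope point at which dy/dx has dropped to half its maximum.

    Returns None if the slope never falls to that threshold (no vertical line drawn).
    """
    slope = np.gradient(y, x)
    i_max = int(np.argmax(slope))
    below = np.nonzero(slope[i_max:] <= 0.5 * slope[i_max])[0]
    if below.size == 0:
        return None
    return float(x[i_max + below[0]])
```

For a unit logistic birth rate, the slope peaks at 0.25 at x = 0 and falls to 0.125 near x ≈ 1.76, so the sketch would place the vertical line there; subtracting a constant death rate shifts the curve but not that point.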

Figure 8

The net growth rate from DeWitt et al., 2025 Figure S6D (left) compared to our effective birth rate (right) for three different ‘target’ mean affinity values.

We selected three well-spaced timepoints from Figure S6D and matched the corresponding mean affinity values from the bulk data to particular time slices in 25 GCs simulated with our central data mimic parameters (inferred on the extracted GC data) out to 70 days. The net growth rate describes the net population growth at affinity x and time t in a traveling wave fitness model (see text). In our model, the effective birth rate is defined and plotted as in Figure 7. We also show stacked histograms of observed affinity values from the bulk data (left) and the 25 simulated GCs (*not* from the extracted GC data) (right). The measured affinity values from the extracted GC data (which extends only to day 20) are shown in Figure 4.
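The caption does not say exactly how a target mean affinity is matched to a time slice. One simple possibility, sketched here as an assumption, is to take the simulated time at which a GC's mean affinity is closest to the target; `match_time_slice` is a hypothetical helper.

```python
import numpy as np


def match_time_slice(times, mean_affinities, target):
    """Time at which a GC's mean affinity is closest to the target (nearest-value rule, assumed)."""
    times = np.asarray(times, dtype=float)
    mean_affinities = np.asarray(mean_affinities, dtype=float)
    return float(times[np.argmin(np.abs(mean_affinities - target))])
```

Applied to each of the 25 simulated GCs in turn, this yields one matched time per GC for each target value.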

Tables

Table 1

Simulation parameter values for the sample used for training (left column, in which each GC has different parameters), and the ‘central data mimic’ sample used to evaluate final results (right column, in which all GCs have the same, data-inferred parameters).

Square brackets indicate a range, from which values are chosen either as detailed in the text (for sigmoid parameters) or uniformly at random. The mutability multiplier is an empirical factor that modulates the intensity of mutation to more closely match the speed of evolution to that observed in data.

Parameter	Training	Central data mimic
Birth response function	Sigmoid	Sigmoid
xscale ( $x_{c}$ )	[0.01, 2]	1.6
xshift ( $x_{h}$ )	[–0.5, 3]	2.0
yscale ( $y_{c}$ )	[0.5, 35]	18.2
yshift ( $y_{h}$ )	[0, 0.6]	0.4
Carrying capacity	[500, 2000]	500
Capacity method	birth	birth
Time to sampling	[10, 35]	20
# sampled cells/GC	[50, 130]	[60, 95]
# GCs	50,000	120
Naive birth rate	[0.1, 15]	1.2
Death rate (functional)	[0.05, 0.5]	0.2
Death rate (stops)	10	10
Initial population	[8, 128]	128
Mutability multiplier	0.68	0.5
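The caption states that non-sigmoid training values are drawn uniformly at random from the bracketed ranges. A minimal sketch for a subset of those rows, assuming count-like parameters are drawn as integers and using hypothetical key names; the sigmoid parameters are omitted because the table defers their sampling to the text.

```python
import random

# (low, high, type) for a subset of Table 1's training ranges; integer sampling for counts is an assumption.
NON_SIGMOID_RANGES = {
    "carrying_capacity": (500, 2000, int),
    "time_to_sampling": (10, 35, float),
    "n_sampled_cells": (50, 130, int),
    "death_rate_functional": (0.05, 0.5, float),
    "initial_population": (8, 128, int),
}


def sample_non_sigmoid_params(rng):
    """Draw one GC's non-sigmoid parameters uniformly from the Table 1 training ranges."""
    params = {}
    for name, (low, high, kind) in NON_SIGMOID_RANGES.items():
        # randint is inclusive at both ends; uniform draws a float in [low, high]
        params[name] = rng.randint(low, high) if kind is int else rng.uniform(low, high)
    params["death_rate_stops"] = 10  # fixed value in Table 1
    params["mutability_multiplier"] = 0.68  # fixed value in Table 1
    return params
```

Passing an explicit `random.Random` instance keeps each simulated GC's draw reproducible from a seed.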

Table 2

Neural network architecture.

Order	Type	Filters	Kernel size	Pool size	Stride length	N units	N params
1	1D convolutional	25	4				426
2	1D convolutional	25	4				2525
3	Max pooling			2	2
4	1D convolutional	40	4				4040
5	Global avg. pooling
6	Dense					48	1968
7	Dense					32	1568
8	Dense					16	528
9	Dense					8	136
10	Dense					3	27
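The parameter counts in the table can be checked with the standard formulas for layers that include a bias term: a 1D convolution has filters × (kernel size × input channels + 1) parameters, a dense layer has units × (input features + 1), and pooling layers have none. Layer 1 is left out of this sketch because its input channel count is not listed in the table; the helper names are illustrative.

```python
def conv1d_params(filters, kernel_size, in_channels):
    # one kernel_size x in_channels weight block plus one bias per filter
    return filters * (kernel_size * in_channels + 1)


def dense_params(units, in_features):
    # full weight matrix plus one bias per unit
    return units * (in_features + 1)


# (table order, parameter count) for layers 2-10; pooling layers contribute no parameters.
rows = [
    (2, conv1d_params(25, 4, 25)),  # input: layer 1's 25 filters
    (4, conv1d_params(40, 4, 25)),  # max pooling (layer 3) keeps 25 channels
    (6, dense_params(48, 40)),      # global average pooling (layer 5) leaves 40 features
    (7, dense_params(32, 48)),
    (8, dense_params(16, 32)),
    (9, dense_params(8, 16)),
    (10, dense_params(3, 8)),
]
```

These formulas reproduce the N params column for every layer from 2 through 10.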

Table 3

Clip function bounds for neural network training.

Parameter	Min value	Max value
xscale ( $x_{c}$ )	0.001	3.5
xshift ( $x_{h}$ )	–1.5	5
yscale ( $y_{c}$ )	0.1	65
yshift ( $y_{h}$ )	0	10
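A minimal sketch of applying these bounds by clamping each sigmoid parameter into its range. Whether the paper clips targets, predictions, or both during training is not stated in this table, so the dictionary-of-predictions interface here is an assumption.

```python
# Bounds from Table 3: parameter -> (min value, max value)
CLIP_BOUNDS = {
    "xscale": (0.001, 3.5),
    "xshift": (-1.5, 5.0),
    "yscale": (0.1, 65.0),
    "yshift": (0.0, 10.0),
}


def clip_params(params):
    """Clamp each sigmoid parameter value into its Table 3 range."""
    return {
        name: min(max(float(value), CLIP_BOUNDS[name][0]), CLIP_BOUNDS[name][1])
        for name, value in params.items()
    }
```

For example, a predicted xscale of 10 would be clamped to 3.5, while an in-range yscale of 20 would pass through unchanged.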

Additional files

MDAR checklist: https://cdn.elifesciences.org/articles/108880/elife-108880-mdarchecklist1-v1.pdf
Cite this article

Duncan K Ralph, Athanasios G Bakis, Jared G Galloway, Ashni A Vora, Tatsuya Araki, Gabriel D Victora, Yun S Song, William S DeWitt, Frederick A Matsen (2026) Inference of germinal center evolutionary dynamics via simulation-based deep learning. eLife 14:RP108880. https://doi.org/10.7554/eLife.108880.3
