Invasion scenario for Top row: one immune group; Middle row: M = 10 immune groups and fast mixing C_ij = 1/M; Bottom row: M = 10 immune groups and slow mixing C_ij = 1/(10M). Other parameters are the same for all rows: in units of δ, we set α = 3 and γ = 5 · 10⁻³, and f = 0.65, b = 0.8, ε = 0.99. For all rows, graphs represent: Left: number of hosts infectious with the wild-type and the variant; Middle: number of hosts susceptible to the wild-type and the variant, with the equilibrium value δ/α as a gray line; Right: fraction of the infections due to the variant. The thick gray line shows the expected equilibrium frequency β in the case with one immune group, given in Eq. 6. The dashed line shows the trajectory of logistic growth with constant fitness and the same initial growth rate.
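
The dashed reference curve can be illustrated with a minimal sketch, assuming the standard logistic solution of dx/dt = s·x·(1 − x). The names `x0` (initial variant frequency) and `s` (constant growth-rate advantage) are illustrative and do not come from the source. Under this assumption the frequency always approaches 1, in contrast to trajectories that saturate at an intermediate equilibrium.

```python
import math

def logistic_frequency(t, x0, s):
    """Variant frequency at time t under a constant advantage s.

    Closed-form solution of dx/dt = s * x * (1 - x) with x(0) = x0,
    where 0 < x0 < 1. Written with exp(-s*t) to avoid overflow at
    large positive s*t.
    """
    odds_against = (1.0 - x0) / x0  # initial odds against the variant
    return 1.0 / (1.0 + odds_against * math.exp(-s * t))

# Example: a variant starting at 1% frequency with advantage s = 0.5
trajectory = [logistic_frequency(t, x0=0.01, s=0.5) for t in range(0, 40, 5)]
```

Matching "the same initial growth rate" would then amount to choosing `s` equal to the early exponential growth rate of the variant frequency in the simulation.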

A: Simulation of SIR Equations 1 & 2 with additional strains appearing at regular time intervals. The fraction of infections (frequency) caused by each strain is shown as a function of time. The first strain to appear at t = 0 is the variant of interest, and curves are shown in shades of red if they appear on the background of this variant, and of blue if they appear on the background of the wild-type. B: Same as A but with frequencies stacked vertically. The black line delimiting the red and blue areas represents the frequency at which the mutations defining the original variant are found. C: Three realizations of the random walk of Equation 8, all starting at x ≃ 0.5. Two instances converge rapidly to frequency 0 and 1, corresponding to apparent selective sweeps, while the remaining one oscillates for a longer time. D: Representation of a partial sweep using the scalar fitness parametrization of Equation 10. The frequency x of the variant is shown as a blue line saturating at value β (gray line). The thin dashed line shows a selective sweep with constant fitness advantage s0. The fitness s is shown as a red dashed line, using the right axis.

Retrospective analysis of predictability of viral evolution: frequency trajectories of all amino acid substitutions that are observed to rise from frequency 0 to x* for Top: influenza virus A/H3N2 from 2000 to 2023, and Bottom: SARS-CoV-2 from 2020 to 2023. Left: all trajectories for x* = 0.4, with blue ones ultimately vanishing and red ones ultimately fixing. The average of all trajectories is shown as a thick black line. Right: only the average trajectories, for different values of x* (grey lines).

Simulations under the Wright-Fisher model with expiring fitness. A: Average frequency dynamics of immune escape mutations that are found to cross the frequency threshold x* = 0.5, for four different rates of fitness decay. If the growth advantage is lost rapidly (high v/s0), the trajectories crossing x* have little inertia, while a stable growth advantage (small v/s0) leads to steadily increasing frequencies. B,C,D: Ultimate fixation probability pfix(x) of trajectories found crossing frequency threshold x. Each panel corresponds to a different rate of emergence of immune escape variants, with four rates of fitness decay per panel. Increased clonal interference ρ/s0 and fitness decay v/s0 both result in a gradual loss of predictability. E: Time to most recent common ancestor T_MRCA for the simulated population, as a function of the prediction obtained using the random walk. Points correspond to different choices of parameters s0

Left: Dynamics of the frequency of the variant for the SIR model from Equations A12 & A13 using the invasion scenario from the main text. Two 2 × 2 cross-immunity matrices are used, with off-diagonal parameters f and b chosen to give the same equilibrium. The gray line represents the equilibrium that would be obtained using the model of the main text. Right: Equilibrium frequency β for this new SIR model (y-axis) versus the β from the main text (x-axis). Each point corresponds to a given pair (f, b).

Probability of a strictly monotonic trajectory in the random walk of the main text, as a function of β (fixed) and the initial value x0. The “exact” solution is obtained by numerically computing the product in Equation B1 up to t = 100.

Average coalescence times for a partial sweep coalescent with effective population size Ne and a Kingman coalescent with population size Ne. For simplicity, a constant β is used: Left: a high value β = 0.25; Right: a low value β = 0.05. For low β, the two coalescent processes are very similar up to large n. They differ considerably if β is larger. Note that for the partial sweep process, Tn never goes below ρ⁻¹.
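
The comparison can be sketched numerically under stated assumptions that are not taken from the source: Tn is read as the mean waiting time until the next merger when n lineages are present; the Kingman pairwise merger rate is 1/N; and partial sweeps arrive at rate ρ, each catching every lineage independently with probability β and merging all caught lineages. With these assumptions the pairwise rate of the sweep process is ρβ², so ρ is set to 1/(Ne·β²) to match the two processes at n = 2. The function names are illustrative.

```python
from math import comb

def kingman_mean_time(n, N):
    """Mean time to the next merger among n lineages (pair rate 1/N)."""
    return N / comb(n, 2)

def partial_sweep_mean_time(n, rho, beta):
    """Mean time to the next merger under the assumed sweep model.

    A sweep produces a merger only if it catches at least two of the
    n lineages; each lineage is caught with probability beta.
    """
    p_merge = 1 - (1 - beta) ** n - n * beta * (1 - beta) ** (n - 1)
    return 1 / (rho * p_merge)

Ne, beta = 1000, 0.25
rho = 1 / (Ne * beta**2)  # matches the pairwise rate of the Kingman process

for n in (2, 5, 20, 100):
    print(n, kingman_mean_time(n, Ne), partial_sweep_mean_time(n, rho, beta))
```

Because the merger probability per sweep cannot exceed 1, the waiting time in this sketch is bounded below by 1/ρ, whereas the Kingman waiting time keeps shrinking as n grows.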

Realisations of different coalescence processes for 30 lineages (leaves). Left: Partial sweep coalescent, with constant β = 0.4 and ρ = 0.00625 such that Ne = (ρβ²)⁻¹ = 1000. Right: Kingman coalescent with population size N = Ne = 1000.
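
A minimal sketch of how such realisations could be generated, and a check of the arithmetic Ne = 1/(ρβ²) = 1/(0.00625 × 0.16) = 1000. The sweep model used here is an assumption for illustration (sweeps at rate ρ, each lineage caught independently with probability β, all caught lineages merged into one); the function names are hypothetical, and the code records only merger times, not the tree topology.

```python
import random

def kingman_merge_times(n_leaves, N, rng):
    """Times of successive pairwise mergers, with pair rate 1/N."""
    t, k, times = 0.0, n_leaves, []
    while k > 1:
        t += rng.expovariate(k * (k - 1) / (2 * N))
        k -= 1
        times.append(t)
    return times

def partial_sweep_merge_events(n_leaves, rho, beta, rng):
    """(time, lineages remaining) after each merger in the assumed sweep model."""
    t, k, events = 0.0, n_leaves, []
    while k > 1:
        t += rng.expovariate(rho)  # waiting time to the next sweep
        caught = sum(rng.random() < beta for _ in range(k))
        if caught >= 2:            # sweeps catching 0 or 1 lineage change nothing
            k -= caught - 1
            events.append((t, k))
    return events

rho, beta = 0.00625, 0.4
Ne = 1 / (rho * beta**2)  # approximately 1000, up to floating-point rounding

rng = random.Random(0)  # fixed seed for reproducibility
kingman = kingman_merge_times(30, Ne, rng)
sweeps = partial_sweep_merge_events(30, rho, beta, rng)
```

In this sketch the Kingman process always needs 29 binary mergers to reduce 30 leaves to one, while the sweep process can merge several lineages at once and so typically needs fewer events.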