Study analysis workflow.

BioVacSafe (a) mouse and (b) human transcriptomic datasets were retrieved, (c) normalized, and mapped through orthologous probes to enable cross-species comparison. A penalized ordinal regression model was trained on mouse muscle data to derive a time-specific reactogenicity signature. (d) The signature was subsequently applied to mouse and human blood to assess its translatability across compartments and species.

Pan-vaccine signature definition and evaluation in mouse muscle.

(a) A penalized ordinal regression model was trained and evaluated using repeated nested cross-validation at each time point, with performance assessed by the weighted F1-score on both the full mouse microarray and the subset of probes orthologous to human genes. (b) Gene importance for predicting reactogenicity was assessed by stability selection. The top 10 most frequently selected genes at 24, 48, and 72 hours are displayed. (c) Expression trajectories of the top 27 genes are shown over time for each reactogenicity class.
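The weighted F1-score in panel (a) can be illustrated with a minimal sketch. It assumes the usual definition, in which per-class F1 scores are averaged with weights proportional to the number of observed samples in each class; the implementation actually used in the study is not stated here. The function name and the class labels ("low", "medium", "high") are hypothetical.

```python
from collections import Counter


def weighted_f1(observed, predicted):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one sample is required")

    support = Counter(observed)  # number of observed samples per class
    total = len(observed)
    score = 0.0
    for label, n_obs in support.items():
        # True positives: samples observed and predicted as this class.
        tp = sum(1 for o, p in zip(observed, predicted) if o == label and p == label)
        n_pred = sum(1 for p in predicted if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_obs
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the observed samples.
        score += f1 * n_obs / total
    return score


# Hypothetical labels, for illustration only.
observed = ["low", "low", "medium", "high"]
predicted = ["low", "medium", "medium", "high"]
score = weighted_f1(observed, predicted)
```

Weighting by class support keeps a rare reactogenicity class from dominating the score while still penalizing errors in every class.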

Predicted reactogenicity in mouse blood.

(a) Latent variables predicted across all combinations of three time points (24h, 48h, 72h), with models trained in mouse muscle and applied in mouse blood. (b) Temporal expression profiles of the most predictive genes for each reactogenicity class in mouse blood.

Predicted reactogenicity in human blood.

(a) Latent variables predicted across all combinations of three time points (24h, 48h, 72h), using models trained in mouse muscle and applied in human blood. (b) CRP and (c) ReactoScore measured in the same individuals for comparison.

Principal component analysis of vaccine formulations across three transcriptomic datasets.

(a) Percentage of variance explained by the principal components. (b) Sample projections onto the first two principal components, coloured by vaccine formulation (top) and by time point (bottom).

Principal component analysis of vaccine formulations by time point across transcriptomic datasets.

Samples are projected onto the first two principal components, coloured by vaccine formulation, with ellipses representing reactogenicity classes.

(a) Confusion matrices comparing predicted (columns) and observed (rows) reactogenicity classes in mouse muscle by time point. Results are averaged over 100 repetitions of nested cross-validation. (b) Comparison of gene predictive power (selection frequency computed via stability selection) and univariate differential expression (log fold change and p-value) between low and high reactogenicity classes at 24h, 48h and 72h. (c) Gene-set enrichment analysis on selection frequency, measuring the association between gene ranking and MSigDB gene sets.
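The selection frequency in panel (b) can be sketched as the fraction of resampling runs in which a gene is retained by the model. The sketch assumes that the set of selected genes is already available for each run; the resampling scheme and the model fit are not shown, and the function names and gene identifiers are hypothetical.

```python
from collections import Counter


def selection_frequency(selected_sets):
    """Fraction of runs in which each gene was selected."""
    n_runs = len(selected_sets)
    if n_runs == 0:
        raise ValueError("at least one run is required")
    # Count each gene at most once per run.
    counts = Counter(gene for run in selected_sets for gene in set(run))
    return {gene: count / n_runs for gene, count in counts.items()}


def top_genes(frequencies, k):
    """The k most frequently selected genes, ties broken alphabetically."""
    ranked = sorted(frequencies, key=lambda g: (-frequencies[g], g))
    return ranked[:k]


# Hypothetical selections from four resampling runs.
runs = [{"GeneA", "GeneB"}, {"GeneA"}, {"GeneA", "GeneC"}, {"GeneB"}]
freq = selection_frequency(runs)
top = top_genes(freq, 2)
# Genes above a frequency threshold, here 10%.
kept = sorted(g for g, f in freq.items() if f > 0.10)
```

A gene that is selected in most runs is a stable predictor, which is a different property from a large fold change or a small p-value in a univariate test.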

(a) Once trained on mouse muscle, the two- and three-class models were evaluated on mouse blood at all combinations of time points using the weighted F1-score. (b) Scatter plots showing the relationship between log₂ fold-changes in mouse muscle and blood across low and high reactogenicity classes at 24h, 48h, and 72h for the most predictive genes (selection frequency greater than 10%). (c) Adjusted t-test p-values for all pairwise comparisons of latent variables predicted at three time points (24h, 48h, and 72h) using two- and three-class models trained on mouse muscle and applied to mouse blood.

Adjusted t-test p-values for all pairwise comparisons of (a) latent variables predicted at three time points (24h, 48h, and 72h) using two- and three-class models trained on mouse muscle and applied to human blood; (b) CRP and (c) ReactoScore values.
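The captions do not state which multiple-testing correction produced the adjusted p-values. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure; another correction, such as Bonferroni or Holm, may have been used instead. The function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the correction used in the study is not specified, so this
    procedure is shown purely as an example.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adj = benjamini_hochberg(raw)
```

The adjusted values are never smaller than the raw values and preserve their ordering, so comparisons can still be ranked after correction.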