Predicting structurally stable epitopes from the SARS-CoV-2 proteome.

(A) SARS-CoV-2 reference genome and proteome. (Red – non-structural proteins; Cyan – structural proteins). (B) Peptides with lower ΔASAr are more likely to adopt similar conformations as free peptides and are hypothesized to be more immunogenic than peptides covering the same region having higher ΔASAr. (C) The structural proteins of a SARS-CoV-2 virion. (D) Count of the 196 peptides from the SARS-CoV-2 proteome by viral protein. (E) Top: the blue dots show the values of ΔASAr for all possible 10-100 residue long peptides in spike along the linear sequence. Peptides are represented by their midpoints. Green dots mark the peptides selected using our structure-guided approach. Bars on top are co-linear with the residue index and show the immunogenic profile of spike as determined by VirScan phage-display (orange) or by our approach (green). VirScan: Z-score difference; ours: ELISA ratio of positive sera to negative sera, smoothed with a sliding window average of ±10 aa and normalized to a scale of 0-1. Without the sliding window average, the Pearson’s correlation is r = 0.68 (p=8.21×10⁻¹²⁹). Bottom: the amino acid region 750-900 of spike is shown in the structural context (left) or as a ΔASAr distribution (right). Orange dots are midpoints of VirScan peptides, which generally have higher ΔASAr values, as highlighted by the density diagram on the right (p<1×10⁻³ comparing all Spike proteins for both methods, Wilcoxon two-sided rank sum test).
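The smoothing, 0-1 normalization and Pearson correlation described in (E) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the truncation of the window at the sequence ends, and the toy input values are all assumptions made for illustration.

```python
from math import sqrt


def sliding_mean(values, half_window=10):
    """Centered moving average over +/- half_window positions.

    Assumption: the window is simply truncated at the sequence ends.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def min_max(values):
    """Rescale values linearly to the interval 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-residue ELISA ratios (toy numbers, not data from the study).
elisa_ratio = [1.0, 1.2, 3.5, 4.0, 1.1, 0.9, 2.8, 3.1, 1.0, 1.0]
profile = min_max(sliding_mean(elisa_ratio, half_window=2))
```

The resulting `profile` could then be compared against a second per-residue profile of the same length with `pearson_r`.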

Experimental validation of immunogenicity: an exposed Membrane epitope (M1) is an outlier with an IgM dominant response.

(A) Selection (red) of eight non-overlapping peptides based on immunogenicity. The ratios of the mean of at least three technical replicates for pooled positive sera to pooled negative sera against individual peptides from the SARS-CoV-2 structural proteins are shown (representative results of at least two biological repeats, i.e. antigens expressed and purified independently). (B) Position of the selected (red) epitopes on the Spike trimer and Nucleoprotein dimer models. (C) Position of the M1 epitope on the extravirion surface of the Membrane protein dimer (TM = transmembrane domain). (D) Ratio of IgM/IgA (M1 outlier, p=2.1×10⁻⁸) and IgM/IgG (M1 outlier, p=1.1×10⁻⁸) for all peptides. Larger size corresponds to higher absolute IgM immunogenicity. P values from Grubbs’ outlier test. (E) Heatmap of individual NIBSC reference sera (rows) against peptides (columns) for a selection of epitopes. NIBSC reference panel uninfected individuals (top panel) and individuals after confirmed SARS-CoV-2 infection (bottom panel). The heatmap demonstrates individual heterogeneity in the immune response to the eight selected peptides and other peptides from across the structural proteins of the virus.
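For readers unfamiliar with the outlier test used in (D), the sketch below computes the two-sided Grubbs statistic, G = max|x − mean| / SD, for a list of ratios. It is a minimal illustration only: the function name and the toy ratios are hypothetical, and the p-value (which requires a critical value from the t distribution) is deliberately not computed here.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (index, G) for the most extreme value in `values`.

    G is the largest absolute deviation from the sample mean,
    divided by the sample standard deviation (n - 1 denominator).
    """
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least three observations")
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("all values are identical; G is undefined")
    deviations = [abs(v - m) for v in values]
    idx = max(range(len(values)), key=deviations.__getitem__)
    return idx, deviations[idx] / s


# Hypothetical IgM/IgG ratios for five peptides; the last one mimics an outlier.
ratios = [1.0, 1.0, 1.0, 1.0, 10.0]
outlier_index, g = grubbs_statistic(ratios)
```

G is then compared against a critical value that depends on the sample size and the chosen significance level.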

The antibody kinetics for M1 antigen show it provokes a strong IgM response in most exposed persons but isotype class switching to IgG occurs late in only a minority of individuals.

(A) Heatmap of IgM response by ELISA for 30 European individuals collected pre-2019. (B) Proportion of three NHS clinical cohorts with M1 IgM above the mean + 3SD of the responses in A. (C) IgM and IgG titre to M1 in two clinical cohorts (Edinburgh and Manchester). Dashed line: mean + 2SD of responses for European negative controls (as in A). (D) IgM titre for the M1 peptide tends to be higher than that for any of the other peptides or the whole receptor binding domain early in the course of infection. (E) Coefficients for time post PCR of random intercept models by antigen. M1 is the only epitope to show a significantly increasing titre in IgG over the three months post PCR. (F) IgM titres to M1 fall over the 3 months post infection and fall fastest for those with highest titres. (G) IgM to M1 and spike S1’ subunit predict aggregate whole spike IgG titre measured by a Euroimmun assay. ns = not significant; * p<.05; ** p<.01; *** p<.001; **** p<.0001. All post-ANOVA pairwise comparisons are two-sided t-tests.
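The positivity cutoffs in (B) and (C) are of the form mean + k·SD of the pre-2019 negative controls. A minimal sketch of that calculation follows; the function names and numbers are hypothetical, and it assumes the sample standard deviation (n − 1 denominator) and a strict "greater than" comparison, neither of which is stated in the legend.

```python
from statistics import mean, stdev


def positivity_cutoff(negative_controls, k=3.0):
    """Cutoff = mean + k * SD of the negative-control responses."""
    return mean(negative_controls) + k * stdev(negative_controls)


def proportion_above(cohort, cutoff):
    """Fraction of cohort responses strictly above the cutoff."""
    if not cohort:
        raise ValueError("cohort is empty")
    return sum(1 for v in cohort if v > cutoff) / len(cohort)


# Hypothetical ELISA readings (toy numbers, not data from the study).
controls = [1.0, 2.0, 3.0]
cohort = [4.0, 6.0, 5.0, 10.0]
cutoff_3sd = positivity_cutoff(controls, k=3.0)
fraction_positive = proportion_above(cohort, cutoff_3sd)
```

Setting `k=2.0` gives the dashed-line threshold described for panel (C).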

M1 IgM is a strong predictor of severe/critical acute COVID-19.

(A) Oxford cohort heatmap of individual sera (rows) against peptides (columns) for three immunoglobulin isotypes (column panels). Individuals are represented by rows and separated by clinical severity of COVID-19 infection (row panels). (B) Penalized (Lasso) logistic regression of the outcome (severe/critical COVID-19) on the ELISA responses to the eight peptides and receptor binding domain, together with age and sex, demonstrates that M1 IgM is the strongest predictor: it requires the strongest shrinkage/penalization factor to reduce its coefficient to null, shown by its path reaching the null (0) coefficient line furthest along the X axis. M1 IgM is the only antibody response that is retained in the model as a stronger predictor of severity than age. (C) Multi-variable logistic regression models (without penalization). Points are effect size estimates (adjusted odds ratios) and whiskers are 95% confidence intervals. All models are adjusted for days post PCR positive. Significance is indicated by * p<.05; ** p<.01; *** p<.001 for the Wald test for each coefficient.
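The idea in (B), that the strongest predictor is the one needing the largest penalty before its coefficient reaches zero, can be illustrated without fitting a full Lasso path. For L1-penalized logistic regression with an unpenalized intercept, a standardized predictor stays at zero in the otherwise empty model only while the penalty is at least |Σ z·(y − ȳ)| / n. The sketch below computes that threshold per predictor. It is an illustration under these assumptions, not the authors' analysis; predictor names and data are hypothetical, and with correlated predictors the order along the full path can differ after the first entry.

```python
from statistics import mean, pstdev


def null_model_penalties(predictors, outcome):
    """Per-predictor penalty below which it would enter the empty model.

    predictors: dict mapping a name to a list of values.
    outcome:    list of 0/1 labels (1 = severe/critical in this toy setting).
    Each predictor is standardized (population SD) before scoring.
    """
    n = len(outcome)
    y_bar = mean(outcome)
    residuals = [y - y_bar for y in outcome]
    penalties = {}
    for name, values in predictors.items():
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            raise ValueError(f"predictor {name!r} has zero variance")
        z = [(v - mu) / sd for v in values]
        penalties[name] = abs(sum(a * r for a, r in zip(z, residuals))) / n
    return penalties


# Hypothetical toy data: "marker_a" tracks the outcome, "marker_b" does not.
outcome = [0, 0, 1, 1]
predictors = {"marker_a": [1.0, 2.0, 3.0, 4.0], "marker_b": [1.0, -1.0, 1.0, -1.0]}
penalties = null_model_penalties(predictors, outcome)
last_to_vanish = max(penalties, key=penalties.get)
```

The predictor with the largest value is the last one whose coefficient is shrunk to zero as the penalty increases.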

Persistent M1 IgM is associated with long COVID and symptom burden.

(A) Heatmap of Scottish National Blood Transfusion plasma donors who donated plasma after infection early in the coronavirus pandemic for trials of therapy with convalescent plasma. K-means clustering identifies two clusters with persistent very high IgM M1 (bottom and second bottom), one cluster with little persistent IgM (large middle group), one cluster with medium IgM M1 (second top), and one cluster with persistent S51 IgM and two nucleoproteins. (B) IgM responses in the final visit of the Edinburgh cohort to M1 antigen and a whole spike S1’ subunit. The whole spike S1’ subunit was used rather than other epitopes because no single epitope is close to the M1 epitope for IgM publicness and the spike S1’ subunit likely reflects an aggregate of hundreds of potential epitopes. LOD = limit of detection for calling positivity based on results in negative control subjects. (C) IgM responses in the long COVID cohort to M1 antigen and a whole spike S1 subunit. (D) Chalder fatigue scale (y axis) against PHQ score for anxiety and depression. Subjects are represented by points: persistent M1 IgM (cyan) and undetectable M1 IgM (red). (E) Chalder fatigue scale (y axis) against SF-12 score for health-related quality of life. SF-12 comprises two sub-scores with mean 50 reflecting the physical and mental domains of quality of life. Here these domains have been combined. Coloured as in D.
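As background to the clustering in (A), the following is a minimal sketch of k-means (Lloyd's algorithm) applied to rows of antibody responses. It is illustrative only: the legend does not state the software, initialization or distance measure used, so the seeded random initialization, squared Euclidean distance and toy data here are assumptions.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster `points` (sequences of numbers) into k groups.

    Returns (labels, centroids). Centroids start as k randomly
    sampled points; an empty cluster keeps its previous centroid.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical donors described by two IgM readings each (toy numbers).
donors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(donors, k=2)
```

In practice the number of clusters and the initialization both affect the result, so repeated runs with different seeds are commonly compared.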

Lack of isotype class switching may be due to T independent B cell activation triggered by the repetitive arrangement of M1 antigen on virions.

(A) Top-down view of the membrane protein dimer from the extra-virion perspective. (B) Repetitive surface of virions with membrane proteins arranged so that M1 is presented on their surface. (C) Model of T independent B cell activation by repetitive antigen. 1 – arrangement of repeating M1 antigens on a virion clusters membrane bound immunoglobulin (B cell receptors) on the surface of M1 specific B cells. 2 – cross-linking activation of 10-20 clustered B cell receptors and active BTK is sufficient to trigger calcium influx resulting in B cell activation without T cell help. 3 – secreted antibody (IgG shown) binding to surface epitopes can negatively regulate TI responses by competing with mIg binding or by triggering signaling through surface receptors on the B cell. 4 – cytokine/IFN release may potentiate or be triggered by TI activation, or may be required from other cell types (e.g. macrophages); antigen specific IgM is secreted with limited isotype class switching. (D) Schematic illustrating production of spikeless virus-like particles (VLPs) and the B cell in vitro stimulation assay. (E) Incubation of B cells with VLPs stimulates IgM secretion compared to unstimulated cells (PBS) or incubation with disassembled VLPs pre-treated with Triton-X-100, which have otherwise identical protein composition. *** = adjusted p<.001; **** = adjusted p<.0001 for Tukey’s post-hoc pairwise multiple comparison test. Shown are comparisons between IgM and other isotypes for VLPs at high concentration (1:10); all other pairwise comparisons were not significant.