Simultaneous polyclonal antibody sequencing and epitope mapping by cryo electron microscopy and mass spectrometry – a perspective

Douwe Schulte; Marta Šiborová; Lukas Käll; Joost Snijder

doi:10.7554/eLife.101322.1

eLife Assessment

This manuscript describes a method using EM polyclonal epitope mapping to help elucidate endogenous antibodies. Overall the work described is interesting and the contribution will be of use to the field that is expected to only increase in impact and value over time. The significance of the work is considered valuable and the strength of evidence to support its findings is considered solid.

https://doi.org/10.7554/eLife.101322.1.sa2

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence

solid: Methods, data and analyses broadly support the claims with only minor weaknesses

Abstract

Antibodies are a major component of adaptive immunity against invading pathogens. Here we explore possibilities for an analytical approach to characterize the antigen-specific antibody repertoire directly from the secreted proteins in convalescent serum. This approach aims to perform simultaneous antibody sequencing and epitope mapping using a combination of single particle cryo-electron microscopy (cryoEM) and bottom-up proteomics techniques based on mass spectrometry (LC-MS/MS). We evaluate the performance of the deep-learning tool ModelAngelo in determining de novo antibody sequences directly from reconstructed 3D volumes of antibody-antigen complexes. We demonstrate that while map quality is a critical bottleneck, it is possible to sequence antibody variable domains from cryoEM reconstructions with accuracies of up to 80-90%. While the rate of errors exceeds the typical levels of somatic hypermutation, we show that the ModelAngelo-derived sequences can be used to assign the used V-genes. This provides a functional guide to assemble de novo peptides from LC-MS/MS data more accurately and improves the tolerance to a background of polyclonal antibody sequences. Following this proof-of-principle, we discuss the feasibility and future directions of this approach to characterize antigen-specific antibody repertoires.

Introduction

Adaptive immunity to invading pathogens is mediated to an important degree by antibodies^1–4. The available repertoire of antibodies is unique to each individual and constantly shifting in response to immunological pressure, under which activation, selection, proliferation, maturation, and differentiation of the antibody-producing B-cells take place^4–6. Therefore, understanding the molecular mechanisms of antibody-mediated immunity requires knowledge of how a complex repertoire of polyclonal antibodies targets a diverse landscape of epitopes on their respective antigens.

The analytical challenge at hand is to resolve antigen-antibody interactions down to the pairwise contacts between specific amino acid residues in the epitope and paratope regions, respectively. This would enable reconstruction of the evolutionary pathways of somatic recombination and hypermutation that lead to high-affinity antigen binding. Conversely, this would also reveal how this selective immune pressure drives the evolutionary pathways of antigenic drift in targeted pathogens^7–10. Knowledge of the precise antibody sequences, as well as near-atomic details of the epitope-paratope interaction, are thus prerequisites to understand the coevolution between replicating pathogens and antibody-mediated adaptive immunity in the host.

Current methods to determine the antigen-specific antibody repertoire rely on single (memory) B-cell sorting, followed by targeted sequencing of the coding mRNAs for the heavy and light chains^11,12. This enables the production of recombinant monoclonal antibodies, whose epitopes may be mapped to near-atomic structural detail by X-ray crystallography and cryo electron microscopy (cryoEM). This approach has generated a wealth of information about antibody-antigen interactions, though it is biased by the limited pool of memory B-cells that it probes. Antibodies function as circulating glycoproteins in bodily fluid, secreted from plasma cells which are in turn derived from diverse pools of memory B cells located in various tissues, including bone marrow, spleen, lymph nodes, and only to a minor degree in blood^13–15. Serological assays that determine binding and neutralization titers specifically probe the secreted antibody in bodily fluid, and it remains an outstanding question how this ‘serum compartment’ of the antibody repertoire relates both qualitatively and quantitatively to the minor population of memory B-cells found in peripheral blood. This calls for new analytical approaches that can derive both antibody sequence and epitope information straight from the secreted antibody product.

Such approaches have been developed in recent years, based on mass spectrometry and electron microscopy. First, using a bottom-up proteomics approach, antibody-derived peptides can be sequenced de novo from fragmentation spectra and assembled into full heavy/light chain sequences^12,16. Sequence accuracy is such that functional monoclonal antibodies can be reconstructed from the input data, and several reports have described successful sequencing efforts of human serum, milk, and urine-derived antibodies^17–32. Second, both hydrogen-deuterium exchange mass spectrometry and electron microscopy have been used to resolve a complex landscape of epitopes targeted by polyclonal antibody mixtures^33–42. Ward and colleagues have reported that with the latter approach, which they coined Electron Microscopy based Polyclonal Epitope Mapping (EMPEM), they obtained near-atomic resolution reconstructions by cryoEM to resolve side-chain densities in the epitope-paratope region³³. This opens the possibility to derive antibody sequence information and integrate this into the structural modelling of the interaction. In essence, the reconstructions might reveal which antibodies from the complex polyclonal mixture bind which epitopes on the antigens. The improved resolutions of current cryoEM approaches thus allow for a type of visual proteomics in which protein identity (i.e. antibody sequence) may be directly inferred from the reconstructed 3D volumes^43–51. While the pure sequencing accuracy of these approaches is obviously limited by resolution and map quality, many tools have recently been developed to infer protein identity in an automated fashion, including cryoID, DeepTracer-ID, findmysequence, and most recently ModelAngelo^52–55.

Here we explore the use of ModelAngelo to derive de novo antibody sequences from experimental cryoEM density maps of antibody-antigen complexes. We have previously developed the software tool Stitch, which sorts and assembles MS-derived peptide sequences into full heavy/light chain sequences across complex repertoires^56,57. We adapted Stitch to perform the same task on ModelAngelo-derived de novo models and tested the accuracy of the approach on a benchmark of 164 publicly available cryoEM maps of monoclonal antibody-antigen pairs. We demonstrate that map quality is a critical bottleneck, but that antibody sequences can be derived with up to 80-90% accuracy. We test the utility of these sequences for assigning the V-genes used, which together with reconstruction of CDRH3 may offer a useful guide to assemble more accurate MS-derived peptide sequences. We show that such EM-derived templates indeed improve MS-based sequencing accuracy in the context of complex antibody mixtures and that publicly available EMPEM reconstructions are of sufficient quality to leverage this approach. This proof-of-principle offers a promising perspective to integrate cryoEM and MS methods for a comprehensive characterization of the antibody repertoire on both sequence and epitope levels.

Results

To assess the feasibility of deriving de novo antibody sequences from experimental cryoEM density maps, we assembled a benchmark dataset from the Electron Microscopy Data Bank (see Supplementary Table S1). To infer the antibody sequences, we chose the recently published deep-learning tool ModelAngelo, developed by Jamali, Kimanius, Scheres, and colleagues, as it is capable of inferring complete de novo models in cryoEM density maps without the need for user input sequences or main chain models^52–55. We searched the EMDB for published maps containing antigen-binding fragments (Fabs) of any species, at nominal resolutions of ≤ 4 Å, released after the training data for ModelAngelo was obtained. These results were filtered for maps that included a deposited atomic model and which contained only a single unique monoclonal Fab (of which multiple copies may be bound in the reconstructed antigen complex). The final benchmark includes 164 maps, including Fabs from human, rabbit, mouse, and macaque species. The maps were used as input data for ModelAngelo without user-provided sequences, yielding completely de novo atomic models of both antigen and Fabs.

The output models from ModelAngelo are typically fragmented to varying degrees depending on the local quality of the map. In addition, maps may contain multiple copies of the same unique Fab molecule. We therefore aimed to consolidate all fragments for the built Fabs into a single consensus sequence for the antibody variable domains of the heavy and light chains. We have previously developed the software Stitch, which performs assembly of LC-MS/MS derived de novo peptide reads into the correct framework of the heavy and light chains by alignment to germline-template sequences from the ImMunoGeneTics (IMGT) database^58–60. We adapted Stitch to use models from ModelAngelo (or any mmCIF file) as input data, extract the amino acid sequences, and perform the same template-based assembly with the resulting reads. Of the 164 input maps, 141 and 144 yielded a non-zero alignment score for the heavy and light chains in the Stitch result, respectively. These add up to a total of 152 maps which were analyzed further to evaluate the sequence accuracy of this approach (including 134 maps with a non-zero alignment score for both heavy and light chain).

Results for one of the top-scoring entries in the benchmark, an Influenza B virus neuraminidase tetramer in complex with four identical copies of a neutralizing Fab⁶¹, are shown in Figure 1. The consensus sequences generated in Stitch have complete coverage of both heavy and light chain variable domains, including both CDRL3 and CDRH3. The de novo determined sequence is 84% and 86% identical to the true heavy and light chain sequences, respectively. The resulting error rate of roughly 15% exceeds the typical levels of somatic hypermutation observed in mature antibody sequences, which are on the order of 1-10%. While the derived sequence should therefore not be taken at face value to reconstruct recombinant monoclonal antibodies, we reasoned that the accuracy is nevertheless likely sufficient to correctly infer the corresponding germline V-genes of the mature antibodies.
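
The identity figures above can be reproduced with a simple per-position comparison. The following is a minimal sketch, assuming the de novo and true sequences have already been aligned to equal length with '-' marking gaps; the function name and the choice to count gapped positions as errors are illustrative assumptions, not the exact convention used in Stitch.

```python
def percent_identity(predicted: str, truth: str) -> float:
    """Percentage of aligned positions at which both sequences agree.

    Assumption: inputs are pre-aligned to equal length, '-' marks a gap,
    and a gap in only one of the two sequences counts as a mismatch.
    """
    if len(predicted) != len(truth):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for p, t in zip(predicted, truth):
        if p == "-" and t == "-":
            continue  # column carries no information for either sequence
        compared += 1
        if p == t:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Identities reported in the text for the heavy and light chain.
heavy_identity, light_identity = 84.0, 86.0
# Mean error rate across both chains, i.e. the roughly 15% quoted above.
mean_error = 100.0 - (heavy_identity + light_identity) / 2.0
```

Under this convention the heavy and light chain error rates are 16% and 14%, which average to the 15% quoted in the text.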

Determining de novo antibody sequences from cryoEM data with ModelAngelo.
A) Exemplary map (with top 10% alignment scores of 1076/1174 for HC/LC) from the benchmark dataset, representing an Influenza B virus neuraminidase (NA) in complex with four copies of a neutralizing Fab, at global FSC resolution of 2.3 Å. Shown are the deposited map, model, and the de novo model generated by ModelAngelo, along with a detailed view of CDRH3. B) Consensus sequences for heavy and light chains as generated by Stitch compared to the true sequences. Sequencing errors are indicated by an asterisk (*).

For each of the 152 maps, we compared the pairwise sequence identity between the top scoring V-genes from the ModelAngelo input in Stitch with the top scoring V-genes from the true sequences (as deposited in the corresponding PDB entry). For reference, we also calculated the pairwise sequence identities of all available V-gene templates per species, reflecting what a completely random draw from the available V-gene repertoire would look like. As shown in Figure 2, the top scoring V-genes from the ModelAngelo sequences have significantly higher pairwise identities than a random draw from the V-gene repertoire in IMGT for both the heavy and light chains (p < 0.0001 in unpaired, two-tailed Kolmogorov-Smirnov tests). Furthermore, the identity of the inferred V-gene with the true sequence scales with the alignment scores in Stitch, making the alignment score a valuable metric for the quality of the V-gene inference. The mean/median V-gene identity in a random draw from the IMGT repertoire is 0.59/0.52 and 0.56/0.53 for the heavy and light chains, respectively. With the ModelAngelo-derived sequences this improves to 0.78/0.87 and 0.75/0.85 for the heavy and light chains in the full dataset. This gradually improves with higher alignment scores to 0.89/1.00 and 0.88/1.00 for heavy and light chains, starting at a cutoff of 80 (representing 92/141 and 80/134 maps for heavy and light chain, respectively). The complete complementarity determining regions CDRH3 and CDRL3 were covered in 66 and 68 maps for the heavy and light chain, respectively (see Supplementary Figure S1). The length of the complete antigen binding loops was estimated with an average error of 0.5 ± 3.3 and 1.7 ± 6.0 residues for heavy and light chain, respectively, with average sequence identities of 0.63 and 0.41.
We found the global FSC resolution of the input map to be a poor predictor of both the Stitch alignment score and the inferred V-gene identity, likely because it is dominated by the bulk of the antigen and not representative of the local resolution in the epitope-paratope region (see Supplementary Figure S2). These results demonstrate that candidate V-genes for the antibodies resolved in cryoEM densities can be accurately narrowed down using ModelAngelo and Stitch, and that a limited subset of maps contains accurate information on CDR3 sequence and length.
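
The cutoff analysis described above amounts to filtering maps by alignment score and summarizing the V-gene identity of those that remain. A minimal sketch of that summary is given below; the function name is hypothetical and the score/identity pairs are toy values chosen for illustration, not data from the benchmark.

```python
from statistics import mean, median


def identity_above_cutoff(results, cutoff):
    """Summarize V-gene identity for maps scoring at or above a cutoff.

    results: iterable of (alignment_score, v_gene_identity) pairs.
    Returns (number of maps kept, mean identity, median identity);
    mean and median are None when no map passes the cutoff.
    """
    kept = [identity for score, identity in results if score >= cutoff]
    if not kept:
        return 0, None, None
    return len(kept), mean(kept), median(kept)


# Hypothetical (alignment score, V-gene identity) pairs for illustration only.
toy_results = [(12, 0.50), (45, 0.70), (85, 0.90), (150, 1.00), (300, 1.00)]

for cutoff in (0, 50, 80):
    print(cutoff, identity_above_cutoff(toy_results, cutoff))
```

Reporting both mean and median matters here because the identity distribution is skewed: once the correct V-gene is found the identity is exactly 1.00, so the median reaches 1.00 well before the mean does.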

V-gene assignment from ModelAngelo data.
A) Correlation between Stitch alignment score and sequence identity between the top-scoring V-gene of the ModelAngelo vs PDB sequence of the heavy and light chain variable domains, as indicated with the non-parametric Spearman correlation coefficient. B) Distribution of V-gene sequence identity for progressive alignment score cutoffs, compared to the pairwise V-gene sequence identity in the IMGT repertoire.

In the context of polyclonal antibody mixtures, this analysis suggests that cryoEM densities of antigen-antibody complexes from EMPEM experiments can be leveraged to guide sequence assembly from complementary proteomics-based profiling of the same sample (see Figure 3). In such an experiment, reconstructed cryoEM densities would be used as input data for ModelAngelo, from which the sequences are extracted and run through Stitch to select the top-scoring V-gene and construct a placeholder sequence for CDR3 of both the heavy and light chain. These reconstructed variable domains may then act as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

Schematic workflow to estimate de novo antibody sequences by the integration of cryoEM and LC-MS/MS data with Stitch.

As a proof-of-principle, we tested this on the monoclonal antibody CR3022, for which both a cryoEM reconstruction and LC-MS/MS data are publicly available. This antibody was isolated from a convalescent survivor of a SARS-CoV infection and targets a cryptic, but conserved epitope at the base of the Spike receptor-binding domain, with cross-neutralization to SARS-CoV-2^62,63. The antibody consists of an IGHV5-51 heavy chain, paired with an IGKV4-1 light chain. When complexed with full-length SARS-CoV-2 Spike, its Fab induces an odd rearrangement of the Spike protomers to yield an antiparallel dimer of S1 subunits in the cryoEM reconstructions^64,65. When using this map as input for ModelAngelo and subsequently Stitch, the IGHV5-51 and IGKV4-1 germline sequences are correctly identified based on their alignment scores. Moreover, complete sequences for CDR3 of the heavy and light chains are built. When using these reconstructed variable domains as templates to guide assembly of the de novo peptide reads from the LC-MS/MS data published by Person and colleagues⁶⁶, the final consensus sequences are 96% and 99% identical to the true heavy and light chain respectively. Of note, the only three remaining errors in the six CDR sequences are I/L assignments, which have identical masses and are notoriously challenging for MS-based sequencing.

For this CR3022 dataset, derived from the monoclonal antibody, the LC-MS/MS data alone assembled with Stitch against the full range of V-genes already yields a similar accuracy of 97% and 98% for the heavy and light chain, respectively. For the case of an EMPEM experiment, the challenge would rather be to correctly assemble the CR3022 sequence against a background of unrelated antibody sequences in a complex mixture. We therefore tested the utility of the ModelAngelo-derived templates for sequence assembly by also mixing input reads from LC-MS/MS data of unrelated antibodies (Figure 4). First, we mixed in a background of whole IgG from a hospitalized COVID-19 patient⁵⁶, representing a diffuse polyclonal background in approximately a 1:1 ratio to the target input reads. Second, we mixed in a background of five additional unrelated anti-Influenza-HA monoclonal antibodies from the same study of Person and colleagues⁶⁶, amounting to a 5:1 ratio of background to target input reads. In our previous work on sequencing serum-derived antibodies by bottom-up proteomics with Stitch, we found that assembly of the consensus sequences becomes much more tolerant to background data if, beyond the top scoring V-gene, the remaining unrelated V-genes are also included as decoys for the final template matching step⁵⁶. The use of these decoy sequences is also included in the comparison here. Furthermore, we also included the use of the true CR3022 sequences as templates, serving as a best-case scenario, positive control. The analysis confirmed that even without ModelAngelo-derived templates, the sequence assembly with Stitch is already tolerant to the diffuse polyclonal background from the whole IgG fraction, yielding a similar accuracy as in the absence of these background peptides. By contrast, the sequence accuracy plummets to below 0.6 when the background consists of the five additional monoclonal antibodies in a 5:1 ratio to the target input reads. 
The accuracy is recovered to >0.95 when using either the true CR3022 sequences or the ModelAngelo-derived templates. There is also a gain in accuracy by using decoy templates with the LC-MS/MS data alone, though smaller compared to the use of ModelAngelo-derived templates. These results demonstrate that ModelAngelo-derived templates are useful for sequence assembly against complex polyclonal backgrounds.

Sequencing CR3022 with integrated cryoEM and LC-MS/MS data.
A) Shown are the deposited cryoEM map (global FSC resolution 4.1 Å), model, and de novo ModelAngelo output for the CR3022 Fab in complex with the SARS-CoV-2 Spike S1 subunit. The sequences were extracted from the de novo model and used as input for Stitch, resulting in the identification of the indicated V-genes and CDR3 sequences. These variable domains were used as templates in Stitch to assemble the LC-MS/MS derived de novo peptides. B) Consensus sequences for CR3022 from the integrated cryoEM-MS data in Stitch compared to the true heavy and light chain sequences. Sequencing errors are indicated with an asterisk (*).

Targeted sequencing of CR3022 against a complex background of other antibodies.
Plotted are the de novo consensus sequence identities derived from the LC-MS/MS data using either the true sequences, the full IMGT repertoire, or the ModelAngelo-derived variable domains as templates. We compare the output from the CR3022 dataset alone (‘No backgr.’) with the output after adding either a diffuse polyclonal IgG background from a COVID-19 patient (‘+whole IgG’) or full datasets from five additional anti-Influenza-HA monoclonals (‘+5 mAbs’). Use of decoy sequences as indicated by dark/light colors.

Inferring V-genes from published EMPEM data.
A) Plotted are all non-zero alignment scores in Stitch from published EMPEM maps. B) Views of the variable domains of EMPEM maps with alignment scores >50 for both heavy and light chains. The EMDB identifiers are indicated at each panel.

We have demonstrated that cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences. We also evaluated whether EMPEM data is indeed of sufficient quality to infer V-genes from automated de novo modeling in the maps. We downloaded all published EMPEM maps from EMDB of which 23 were of sufficient quality to give a non-zero alignment score when using the ModelAngelo results in Stitch (see Supplementary Table S2). Analysis of the benchmark of monoclonal antigen-antibody complexes showed that the quality of the V-gene inference scales with the alignment scores. In this set of EMPEM reconstructions, the alignment scores range from ca. 20 to 300. Of the 23 maps, 8 have alignment scores above 50 for both heavy and light chain, at which point we estimate the mean/median V-gene identity to be approximately 0.85/0.90. This analysis shows that experimental EMPEM studies may yield sufficiently detailed reconstructions of the antigen-antibody complexes to narrow down the candidate V-genes of the resolved Fabs accurately.

Discussion

The development of EMPEM and MS-based polyclonal antibody sequencing now makes it possible to profile the antigen-specific antibody repertoire straight from the secreted pool of antibodies in bodily fluids. This approach bridges the gap between established single B-cell sequencing approaches and serological assays to probe binding and neutralization titers. The present work demonstrates that epitope and sequence information can be integrated using ModelAngelo and Stitch. This approach holds promise to better understand the serum compartment of the antibody repertoire. Use of the cryoEM data in these workflows complements the MS data beyond epitope mapping in several significant ways. First, we demonstrated that narrowing down the candidate V-genes improves the tolerance of LC-MS/MS-derived peptide sequence assembly to background in a complex antibody mixture. Second, heavy-light chain pairing is a problematic blind spot for proteomics sequencing, as the antibodies are denatured and digested as part of the sample workup. In contrast, this pairing is trivial in cryoEM data as the chains are in direct contact in the resolved Fabs in the map. Finally, reconstruction of CDRH3 is especially challenging with proteomics data alone, as it spans the junction between the recombined V-, D-, and J-segments, of which the D-segment is short and hypervariable to a point that germline sequences do not provide a functional template for sequence assembly. While CDRH3 coverage is also limited in ModelAngelo data, it can be built in many cases and a correct estimate of CDRH3 length is already useful to guide assembly of the de novo peptide reads.

Here we have extracted the flat sequences from ModelAngelo output to use as input for the template matching step in Stitch, using a modified Smith-Waterman alignment. We believe the template matching step in Stitch could be further improved in several ways. First, ModelAngelo has a built-in database search based on HMMER using profile Hidden Markov models (HMM) encoding the amino acid probabilities across the alphabet at each position⁶⁷. While the ModelAngelo search currently does not consolidate the fragmented output models into a single search to build consensus sequences, we anticipate that implementing a similar HMM profile search in Stitch may further improve the V-gene inference. Similarly, from the benchmark dataset analyzed here we might learn what sequencing mistakes are common in ModelAngelo data. This could be used to adjust the conventional Smith-Waterman alignment in Stitch accordingly, analogous to recent improvements in the alignment algorithm we implemented for MS data⁵⁷. Finally, the template matching step in Stitch is now solely based on the sequence of the ModelAngelo models, but we may expand this to a structure-based alignment to better place the error-prone sequence reads in the correct framework of the Ig-domains.
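
For readers unfamiliar with the alignment underlying the template matching step, the textbook Smith-Waterman local alignment score can be sketched as follows. This is not the modified algorithm implemented in Stitch: the flat match, mismatch, and linear gap scores are illustrative assumptions, whereas a tuned alignment would use a substitution matrix reflecting common sequencing mistakes.

```python
def smith_waterman_score(read: str, template: str,
                         match: int = 2, mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between a read and a template.

    Textbook Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not Stitch parameters.
    """
    prev = [0] * (len(template) + 1)  # previous row of the score matrix
    best = 0
    for r in read:
        curr = [0]  # first column is always zero for local alignment
        for j, t in enumerate(template, start=1):
            diag = prev[j - 1] + (match if r == t else mismatch)
            up = prev[j] + gap         # gap in the template
            left = curr[j - 1] + gap   # gap in the read
            cell = max(0, diag, up, left)  # zero floor makes it local
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Scoring each read against every germline template and keeping the highest-scoring template gives a simple form of template matching; adapting the mismatch scores to the error profile of ModelAngelo output would correspond to the adjustment proposed above.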

Next steps in our efforts are to bring together the EMPEM work and MS-based polyclonal antibody sequencing on antigen-Fab complexes purified from patient serum. The goal is to reconstruct functional monoclonal antibodies from these analyses as ultimate proof for the accuracy of the derived antibody sequences and then start working on the throughput and depth of coverage in the repertoire. We believe that both EMPEM and MS-based polyclonal antibody sequencing are still limited to the top 1-10 antibodies in the polyclonal mixture. The EMPEM approach is biased towards bigger and well-ordered target antigens, which calls for additional complementary approaches like HDX-MS for a comprehensive polyclonal epitope mapping exercise. Bringing together these perspectives in an integrated structural biology approach promises new insights into the serum compartment of the antibody repertoire to better understand the coevolutionary processes of antibody maturation and antigenic drift.

Materials and Methods

Benchmark of monoclonal antibody-antigen pairs and EMPEM maps from EMDB

We searched the EMDB for published maps containing antigen-binding fragments (Fabs) of any species, at nominal resolutions of ≤ 4 Ångström, released after the training data for ModelAngelo was obtained (April 1st 2022). The search was performed on February 11th 2023 at https://www.ebi.ac.uk/emdb/ using the search term “antibody fab resolution:[* TO 4] AND release_date:[2022-03-31T00:00:00Z TO 2023-11-02T00:00:00Z] sample_name:*fab* fitted_pdbs:[* TO *]”. These results were filtered for maps that included a deposited atomic model and which contained only a single unique monoclonal Fab (of which multiple copies may be bound in the reconstructed antigen complex). When multiple redundant maps from the same study were included, we selected the single representative map of the highest quality, based on manual inspection. Global FSC resolution was not a good indicator as the maps were often dominated by the bound antigen, which may be better resolved than the bound antibody. Typically, the selected map was the focused/local refinement around the epitope-paratope region, despite its lower nominal resolution. A full overview of the selected maps is provided in Supplementary Table S1. The EMPEM maps were compiled based on a literature survey, complemented with a search of the EMDB using the term “polyclonal”. A full overview of the selected EMPEM maps is provided in Supplementary Table S2.

Changes made to Stitch to take CIF input

Stitch was extended to allow mmCIF files as input. From these files all chains were extracted as separate amino acid sequences to align in Stitch. ModelAngelo outputs a confidence score per residue in the B-factor column of the mmCIF file, which was used as a local confidence for the sequence. First, each polypeptide is assigned an Average Local Confidence (ALC) score based on the average across all residues, which can be used as an input filter on the data (along with polypeptide length). Second, the local confidence is used as a weight in determining the consensus sequence of overlapping polypeptides, following assembly in Stitch.
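
The two uses of the local confidence described above can be illustrated with a minimal sketch. It assumes each polypeptide has already been placed on the template at a known start position (the alignment itself is done by Stitch and is not reproduced here). The function names, the tuple layout, and the use of 'X' for uncovered positions are hypothetical; the default cutoffs mirror the CutoffALC 80 and minimum length 5 used for the benchmark.

```python
from collections import defaultdict


def average_local_confidence(confidences):
    """ALC: mean of the per-residue confidence scores of one polypeptide."""
    return sum(confidences) / len(confidences)


def weighted_consensus(placed_reads, length, cutoff_alc=80.0, min_length=5):
    """Confidence-weighted consensus over overlapping polypeptides.

    placed_reads: list of (start, sequence, per_residue_confidences),
    where start is the assumed template position of the first residue.
    Reads that are too short or below the ALC cutoff are discarded.
    """
    votes = [defaultdict(float) for _ in range(length)]
    for start, seq, conf in placed_reads:
        if len(seq) < min_length:
            continue
        if average_local_confidence(conf) < cutoff_alc:
            continue
        for offset, (residue, weight) in enumerate(zip(seq, conf)):
            pos = start + offset
            if 0 <= pos < length:
                votes[pos][residue] += weight  # confidence acts as the vote weight
    # Highest summed confidence wins; 'X' marks positions without coverage.
    return "".join(max(v, key=v.get) if v else "X" for v in votes)


reads = [
    (0, "QVQLV", [90, 90, 90, 90, 90]),
    (2, "QLAES", [95, 95, 60, 95, 95]),  # low-confidence 'A' at position 4
    (0, "AAAAA", [10, 10, 10, 10, 10]),  # removed by the ALC filter
]
print(weighted_consensus(reads, 8))
```

In this toy case the conflicting residue at position 4 is resolved in favour of the read with the higher local confidence, which is the intended effect of weighting by the ModelAngelo score rather than by simple majority.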

Analysis of monoclonal antigen-antibody and EMPEM benchmarks

The deposited model mmCIF files and EM maps for the full benchmark were downloaded, and the maps were run through ModelAngelo (version 1.01) in ‘build_no_seq’ mode. For each entry in the benchmark the deposited mmCIF was run with Stitch (version 1.5.0-rc.1+6d3b540) using CutoffALC 80, minimum length 5 and TemplateMatching CutoffScore 8. The same was done for each mmCIF file produced by ModelAngelo but with an additional segment containing the antigen and Ig constant domain template sequences. From these runs the consensus sequence and highest scoring germline for IGHV and IGLV (lambda + kappa) were retrieved. For each consensus sequence the CDR3 was determined if the flanking cysteine on the V gene and the tryptophan or phenylalanine on the J gene were present. As these conserved residues were not all positioned correctly in the IMGT database, the data was manually fixed based on the same rules. The data from the Stitch runs on the deposited and the ModelAngelo-produced models was compared to obtain the identity between the consensus sequences, the distance between the inferred germlines, and the identity between the CDR3s. The script used for this analysis is deposited as Supplementary Data. The results generated by this analysis are included in Supplementary Table S1. The EMPEM benchmark was downloaded and run through ModelAngelo and subsequently Stitch with identical parameters as above. In contrast to the benchmark detailed before, the ground-truth sequences of these antibodies are not known.
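
The CDR3 rule described above (conserved cysteine on the V gene, tryptophan or phenylalanine on the J gene) can be sketched on a plain sequence as follows. This is an illustration rather than the deposited analysis script: locating the J-gene anchor via the [WF]G.G motif is an assumed heuristic, and returning the loop without the two flanking residues is likewise an assumption about the boundary convention.

```python
import re

# Assumed heuristic: the J-gene Trp/Phe anchor is followed by Gly-X-Gly.
J_MOTIF = re.compile(r"[WF]G.G")


def find_cdr3(variable_domain: str):
    """Return the residues between the conserved Cys and the J-gene W/F.

    Returns None when either anchor is missing, mirroring the rule that
    CDR3 is only determined if both flanking residues are present.
    """
    j_hits = list(J_MOTIF.finditer(variable_domain))
    if not j_hits:
        return None
    j_start = j_hits[-1].start()  # use the most C-terminal motif
    # Nearest cysteine upstream of the J anchor; a cysteine inside the
    # loop itself would shorten the result, a known limit of this sketch.
    cys = variable_domain.rfind("C", 0, j_start)
    if cys == -1:
        return None
    return variable_domain[cys + 1:j_start]


# Toy fragments for illustration, not sequences from the benchmark.
print(find_cdr3("AAAYYCARDYWGQGTLV"))
print(find_cdr3("YYCQQYYSTPLTFGGGTK"))
```

Comparing the length of the loop returned for the ModelAngelo-derived consensus with that of the deposited sequence gives the CDR3 length error, and a per-position comparison of the two loops gives the CDR3 identity.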

Analysis of CR3022 data

The EM data for CR3022 was downloaded from EMDB (EMD-11648) and run with ModelAngelo (version 1.01) in ‘build_no_seq’ mode. The raw data for monoclonal antibodies CR3022, 107, 1028, 2771, 3576, and 3634 from PRIDE PXD030094 was downloaded and analyzed with PEAKS 10+. The five additional monoclonal antibodies were chosen because they are the most distant sequences from CR3022 and therefore present the biggest challenge to sequence assembly in Stitch (version 1.5.0-rc.1+6d3b540). The PEAKS analysis for the COVID-19 data from PRIDE PXD031941 was downloaded. Three sets of input were prepared: ‘no background’ consisting of only the CR3022 data, ‘+whole IgG’ consisting of the CR3022 and the COVID-19 data, and ‘+5 mAbs’ which consists of all mAbs from the CR3022 study. Three Stitch configurations were prepared: ‘True’ using the known CR3022 sequence as template as retrieved from PDB 7A5R, ‘IMGT’ using the conventional configuration of Stitch with all IMGT germlines as templates, and ‘MA’ using the closest V gene germline to the ModelAngelo consensus sequence of CR3022 together with the CDR3 sequence present. Each of these configurations was run with the Stitch Recombine Decoy option off and on. With Decoy on, the full IMGT germline database was added for ‘True’ and ‘MA’, and for ‘IMGT’ the Stitch parameter Decoy in Recombine was set, allowing any unused germline from the Template Matching step to be matched in the Recombine step. For all Stitch runs the CutoffALC was 90 and the TemplateMatching CutoffScore 10. The resulting consensus sequences from these 18 Stitch runs were then compared with the known CR3022 sequence to determine the identity. The full script used for this analysis is included in the deposited Supplementary Data.
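
The 18 Stitch runs follow from combining three input sets, three template configurations, and the decoy option off or on. A minimal sketch of that run matrix is given below; the dictionary keys are hypothetical labels for bookkeeping and do not reflect the syntax of a Stitch configuration file.

```python
from itertools import product

INPUT_SETS = ["no background", "+whole IgG", "+5 mAbs"]
TEMPLATES = ["True", "IMGT", "MA"]
DECOY = [False, True]

# One entry per Stitch run; key names are illustrative, not Stitch syntax.
runs = [
    {
        "input": input_set,
        "templates": templates,
        "decoy": decoy,
        "cutoff_alc": 90,                       # CutoffALC used for all runs
        "template_matching_cutoff_score": 10,   # TemplateMatching CutoffScore
    }
    for input_set, templates, decoy in product(INPUT_SETS, TEMPLATES, DECOY)
]

print(len(runs))
```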

Data and code availability

Stitch is available at https://github.com/snijderlab/stitch. The CR3022 and COVID-19 whole IgG LC-MS/MS data were taken from PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) via the PRIDE partner repository with the data set identifiers PXD030094 and PXD031941, respectively. All ModelAngelo and Stitch results, including a script to reproduce the full analysis, are made available on Zenodo under 10.5281/zenodo.12207014.

Acknowledgements

This research was funded by the Dutch Research Council NWO Gravitation 2013 BOO, Institute for Chemical Immunology (ICI; 024.002.009), and the European Research Council Executive Agency HORIZON ERC-2022-STG (FLAVIR; 101077640).

Additional files

Supplementary Information. Figure S1: Overview of CDR3 results from benchmark. Figure S2: Correlation of global FSC resolution with results from benchmark.

Supplementary Table S1. Overview and results of maps for benchmark.

Supplementary Table S2. Overview and results of maps for EMPEM benchmark.

Article and author information

Author information

Douwe Schulte
Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
ORCID iD: 0000-0003-0594-0993
Marta Šiborová
Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
ORCID iD: 0000-0002-6879-5247
Lukas Käll
Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, Royal Institute of Technology – KTH, Solna, Sweden
ORCID iD: 0000-0001-5689-9797
Joost Snijder
Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
ORCID iD: 0000-0002-9310-8226
- For correspondence: j.snijder@uu.nl

Author Notes

Competing Interest Statement: The authors have declared no competing interest.

Version history

Preprint posted: June 27, 2024
Sent for peer review: November 4, 2024
Reviewed Preprint version 1: January 16, 2025
Reviewed Preprint version 2: March 18, 2025
Version of Record published: April 23, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.101322. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
