PGFinder, a novel analysis pipeline for the consistent, reproducible, and high-resolution structural analysis of bacterial peptidoglycans

  1. Ankur V Patel (corresponding author)
  2. Robert D Turner
  3. Aline Rifflet
  4. Adelina E Acosta-Martin
  5. Andrew Nichols
  6. Milena M Awad
  7. Dena Lyras
  8. Ivo Gomperts Boneca
  9. Marshall Bern
  10. Mark O Collins (corresponding author)
  11. Stéphane Mesnage (corresponding author)
  1. School of Biosciences, University of Sheffield, United Kingdom
  2. Department of Computer Science, University of Sheffield, United Kingdom
  3. Institut Pasteur, Unité Biologie et Génétique de la Paroi Bactérienne, France
  4. INSERM, Équipe Avenir, France
  5. CNRS, UMR 2001 "Microbiologie intégrative et moléculaire", France
  6. biOMICS Facility, Faculty of Science Mass Spectrometry Centre, University of Sheffield, United Kingdom
  7. Protein Metrics Inc, United States
  8. Infection and Immunity Program, Monash Biomedicine Discovery Institute, Australia
  9. Department of Microbiology, Monash University, Australia
9 figures, 3 tables and 2 additional files

Figures

Diversity of peptidoglycan composition and structure.

(a) Representative peptidoglycan building block made of N-acetylglucosamine (GlcNAc) and N-acetylmuramic acid (MurNAc), forming a disaccharide subunit linked to a pentapeptide stem attached to the MurNAc via a lactyl moiety. Peptide stems contain both L- and D-amino acids and show great diversity in composition. Some examples of amino acids found in peptidoglycan are shown for each residue. Modifications of the sugars are also shown. (b) Representation of crosslinking diversity. 4–3 bonds (direct or via a peptide crossbridge) and 3–3 bonds are made by D,D- and L,D-transpeptidases, respectively. The enzymes catalysing 1–3 and 4–2 bonds remain unknown. Acceptor stems are shown in blue and donor stems in red. DAA: diamino acid; m-DAP: meso-diaminopimelic acid; D-Lac: D-lactate; X: cell surface polymer (e.g. teichoic acid); Z: lateral chain.

Flowchart outlining the algorithm for the matching script.

Muropeptides are identified in four successive steps, indicated by different colours (orange, green, blue, and red, respectively). In the first step, observed masses in the dataset are compared to a list of theoretical monomer masses (database 1). Masses matched within the set ppm tolerance (10 ppm for Orbitrap data) are used to build a list of inferred monomeric structures and their corresponding theoretical masses (library 1). This library is then used to generate a list of theoretical multimers (dimers and trimers) and their masses (database 2). A second matching round is carried out to build a list of inferred multimers (library 2). At this stage, matched monomers and multimers are combined to generate a list of modified muropeptides (library 3). The two libraries of matched theoretical masses (monomers; dimers and trimers) and the third library (their modified counterparts) are used to search the dataset. Muropeptide structures are inferred from a match within tolerance between theoretical and observed masses. The data are then ‘cleaned up’ by adding the intensities of ions corresponding to in-source decay products and salt adducts to those of the parent ions. The final matched mass spectrometry data are then written to a .csv file.
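The first two matching rounds described in this caption can be illustrated with a minimal Python sketch. It is not PGFinder's own code: the function names (`delta_ppm`, `match_masses`, `build_multimers`) are hypothetical, the sign of the ppm error is chosen to agree with the values tabulated on this page (theoretical minus observed), and the loss of one water molecule per crosslink is an assumption inferred from the monomer and dimer masses listed in Table 2. The example masses are taken from Table 2.

```python
from itertools import combinations_with_replacement

WATER = 18.010565      # monoisotopic mass of H2O in Da (assumed loss per crosslink)
PPM_TOLERANCE = 10.0   # tolerance quoted in the caption for Orbitrap data


def delta_ppm(theoretical, observed):
    """Mass error in ppm; positive when the observed mass is below the theoretical one."""
    return (theoretical - observed) / theoretical * 1e6


def match_masses(observed_masses, theoretical, tolerance_ppm=PPM_TOLERANCE):
    """Keep each theoretical structure that has at least one observed mass within tolerance."""
    matched = {}
    for name, theo in theoretical.items():
        hits = [obs for obs in observed_masses
                if abs(delta_ppm(theo, obs)) <= tolerance_ppm]
        if hits:
            matched[name] = (theo, hits)
    return matched


def build_multimers(monomers, size):
    """Combine matched monomers into theoretical multimers of the given size."""
    multimers = {}
    for combo in combinations_with_replacement(sorted(monomers), size):
        name = "-".join(combo)
        # Sum of the monomer masses minus one water per linkage (assumption).
        multimers[name] = sum(monomers[unit] for unit in combo) - (size - 1) * WATER
    return multimers


# Toy "database 1" and observed masses, using values listed in Table 2.
database_1 = {"GM-AEJ": 870.3706, "GM-AEJA": 941.4077}
observed = [870.3676, 941.4045, 1864.7962, 500.0]

library_1 = match_masses(observed, database_1)                    # inferred monomers
monomer_masses = {name: theo for name, (theo, _) in library_1.items()}
database_2 = build_multimers(monomer_masses, 2)                   # theoretical dimers
library_2 = match_masses(observed, database_2)                    # inferred dimers

print(sorted(library_1))
print(sorted(library_2))
```

Restricting the multimer search to monomers that were actually matched keeps database 2 small, which is the point of building library 1 first.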

Distribution of E. coli peptidoglycan fragments identified using automated search workflow.

Breakdown of peptidoglycan is shown by oligomerisation state (left) branching to specific composition (right). Branch size is proportional to percentage. Monomers, dimers, trimers, and glycan chains (left) are broken down into muropeptide composition and structure (right). Individual structures are grouped by colour according to oligomerisation state. Monomers, green; dimers, yellow; trimers, orange. Residues in square brackets are only found in some muropeptides. For example, GM-AEJ[A]-GM-AEJ[A] can represent GM-AEJA-GM-AEJA, GM-AEJA-GM-AEJ, and GM-AEJ-GM-AEJ. G: N-acetylglucosamine (in the disaccharide); M: N-acetylmuramic acid; A: L- or D-alanine; E: γ-D-glutamic acid; J: meso-diaminopimelic acid; K: D-lysine; R: D-arginine; G: glycine (in the peptide stem).
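The square-bracket shorthand used in this caption can be expanded mechanically. The following Python sketch is only an illustration; `expand_optional` is a hypothetical helper, not part of the published software.

```python
import re
from itertools import product


def expand_optional(structure):
    """Expand every '[X]' optional residue into its present and absent forms."""
    # The capture group makes re.split keep the bracketed parts in the result.
    parts = re.split(r"(\[[^\]]*\])", structure)
    choices = []
    for part in parts:
        if part.startswith("[") and part.endswith("]"):
            choices.append((part[1:-1], ""))   # residue present, residue absent
        else:
            choices.append((part,))            # fixed text
    return ["".join(combo) for combo in product(*choices)]


for name in expand_optional("GM-AEJ[A]-GM-AEJ[A]"):
    print(name)
```

The sketch returns four strings because it treats the two optional positions independently, so GM-AEJA-GM-AEJ and GM-AEJ-GM-AEJA both appear. The caption lists three structures; whether those two orderings are regarded as the same muropeptide is not stated here.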

Figure 4 with 1 supplement
Comparative analysis of C. difficile R20291 and M7404 peptidoglycan (PG) composition.

(a) Pearson’s correlation coefficients across biological replicates of R20291 and M7404 C. difficile isolates. Heatmap gradient shows highest value in green to lowest value in red. (b) Muropeptide distribution according to degree of crosslinking. Comparison was carried out using a Student’s t-test; the p-value is indicated for each category of muropeptides. (c) Volcano plot, where each dot represents an individual muropeptide, plotted against the significance (Student’s t-test, p-value < 0.05, FDR < 0.05, S0 = 0.1) and difference (log2). Muropeptides showing a significantly different abundance between strains are highlighted in red. Lac: lactyl group; A: D/L-alanine; E: γ-D-glutamate; J: meso-diaminopimelic acid; V: D-valine; L: D-leucine; I: D-isoleucine; G: glycine.

Figure 4—source data 1

C. difficile mass database.

https://cdn.elifesciences.org/articles/70597/elife-70597-fig4-data1-v1.csv
Figure 4—source data 2

C. difficile R20291 versus M7404: list of muropeptides, abundance, and RT.

https://cdn.elifesciences.org/articles/70597/elife-70597-fig4-data2-v1.xlsx
Figure 4—figure supplement 1
C. difficile LC-MS chromatograms.
Appendix 1—figure 1
UHPLC-MS chromatogram of E. coli reduced disaccharide peptides.
Appendix 1—figure 2
Consistency of E. coli PG analyses.

(a) Pearson’s correlation coefficients across biological replicates of E. coli BW25113. (b) Muropeptide distribution according to degree of crosslinking. The crosslinking index was calculated as described previously (Glauner, 1988). (c) Pairwise comparisons of intensities corresponding to individual muropeptides identified in biological replicates. WT1, WT2 and WT3 correspond to individual biological replicates; Av., average abundance; SD, standard deviation.
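As an illustration of the replicate comparison in panel (a), Pearson's correlation coefficient can be computed for every pair of replicates as in the sketch below. The abundance values are hypothetical placeholders, not data from the study, and the `pearson` helper is written out by hand only to keep the example self-contained.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical abundances (%) of the same four muropeptides in three replicates.
replicates = {
    "WT1": [36.1, 14.4, 8.0, 17.2],
    "WT2": [38.0, 14.0, 7.5, 16.9],
    "WT3": [34.5, 14.8, 8.6, 17.9],
}

# One coefficient per pair of replicates, as in a correlation heatmap.
pairwise = {
    (a, b): pearson(replicates[a], replicates[b])
    for a, b in combinations(replicates, 2)
}
for pair, r in pairwise.items():
    print(pair, round(r, 3))
```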

Appendix 2—figure 1
Workflow for production of MaxQuant compatible MS data files from Agilent QTOF data.

Agilent MS data (data: .d) are converted by ProteoWizard to mzML format (data: XML). Relevant settings for ProteoWizard are shown (left). The mzML file is then converted by TOPPAS to an mzXML file (data: XML). Relevant settings are shown (right).

Appendix 2—figure 2
Workflow for MS data processing using MaxQuant, before automated analysis.

The mzXML file (data: XML) is passed to MaxQuant (process) for deconvolution and monoisotopic mass determination. Default values are used except where indicated (right). The MaxQuant output (data: text file) is then passed to the data parser module (process). This module removes superfluous data and reformats the remaining data as an Excel file (data: xlsx) compatible with the matching script.

Author response image 1

Tables

Table 1
Processed match output.
Structure                             RT (min)    Abundance (%)  Monoisotopic mass (Da)
                                      Av±SD       Av±SD          Obs       Theo      Δppm

Glycans (4.38%±0.35%)
GM|0                                  3.62±0.01   3.465±0.683    498.205   498.206   2.5
GM (x2)|0                             10.11±0.03  0.428±0.349    976.384   976.386   2.2
GM (anhydro)|0                        8.20±1.92   0.238±0.025    478.179   478.180   2.9
GM (deacetyl)|0                       2.57±0.00   0.155±0.032    456.194   456.196   3.5
GM (x2) (deacetyl)|0                  6.86±0.02   0.093±0.012    934.372   934.376   3.2

Monomers (63.14%±1.13%)
GM-AEmA|1                             10.04±0.04  36.098±2.131   941.405   941.408   2.8
GM-AEm|1                              6.57±0.01   14.352±0.397   870.368   870.371   3.0
GM-AEmKR|1                            9.56±0.05   8.030±0.774    1154.563  1154.567  3.6
GM-AE|1                               9.57±0.04   1.809±0.231    698.284   698.286   3.1
GM-AEmG|1                             7.85±0.05   0.689±0.049    927.390   927.392   2.3
GM-AEm (anhydro)|1                    13.98±0.02  0.668±0.073    850.342   850.344   2.2
GM-AEmA (anhydro)|1                   16.55±0.01  0.573±0.100    921.380   921.382   2.0
GM-AEmAG|1                            9.45±0.05   0.219±0.009    998.426   998.429   3.1
GM-AEmKR (anhydro)|1                  14.83±0.01  0.160±0.039    1134.537  1134.540  2.9
GM-AEmA (deacetyl)|1                  8.57±0.06   0.083±0.055    899.394   899.397   3.1
GM-GM-AEmA|1                          13.10±0.02  0.075±0.040    1419.584  1419.588  2.9
GM-AE (anhydro)|1                     17.44±0.01  0.069±0.013    678.258   678.260   2.8
M-AEm|1                               4.56±0.01   0.062±0.064    667.289   667.291   3.8
M-AEmKR|1                             8.16±0.06   0.061±0.056*   951.484   951.487   3.2
GM-AEmAA|1                            11.38±0.04  0.059±0.003    1012.442  1012.445  2.4
M-AEmA|1                              8.52±0.05   0.053±0.015    738.325   738.328   4.0
GM-GM-AEm|1                           11.31±0.04  0.042±0.025    1348.547  1348.551  2.4
GM-AEm (deacetyl)|1                   4.77±0.01   0.024±0.014    828.358   828.360   3.0
GM-GM-AEmKR|1                         12.18±0.03  0.011±0.002*   1632.742  1632.747  3.0

Dimers (29.54%±0.46%)
GM-AEmA-GM-AEmA|2                     16.01±0.02  17.247±0.777   1864.800  1864.805  2.3
GM-AEmA-GM-AEmKR|2                    14.83±0.02  4.589±0.589    2077.957  2077.964  3.0
GM-AEmA-GM-AEm|2                      15.09±0.02  3.207±0.168    1793.763  1793.768  2.6
GM-AEmA-GM-AEmA (anhydro)|2           20.56±0.01  0.873±0.037    1844.774  1844.778  2.4
GM-AEm-GM-AEmKR|2                     14.22±0.00  0.855±0.101    2006.920  2006.926  3.3
GM-AEmA-GM-AEmKR (anhydro)|2          18.89±0.17  0.665±0.079    2057.934  2057.937  1.8
GM-AEm-GM-AEm|2                       14.23±0.01  0.558±0.062    1722.725  1722.730  3.0
GM-AEm-GM-AEmAG|2                     14.68±0.01  0.416±0.025    1850.785  1850.789  2.4
GM-AEmA-GM-AEm (anhydro)|2            19.66±0.01  0.381±0.028    1773.738  1773.741  2.1
GM-AEmA-GM-AEmAG|2                    15.33±0.02  0.179±0.005    1921.822  1921.826  2.2
GM-AEm-GM-AEmKR (anhydro)|2           18.07±0.01  0.170±0.024    1986.896  1986.900  2.1
GM-AEm-GM-AEm (anhydro)|2             18.77±0.01  0.141±0.015    1702.697  1702.704  4.5
GM-AEmA-GM-AEmAA|2                    16.54±0.01  0.075±0.002    1935.838  1935.842  2.1
GM-AEm-GM-AEmG|2                      13.91±0.01  0.054±0.003    1779.747  1779.752  2.7
GM-GM-AEmA-GM-AEmA|2                  17.51±0.01  0.046±0.028    2342.976  2342.985  3.6
GM-AEmA-GM-AEmA (deacetyl)|2          15.17±0.01  0.029±0.022    1822.789  1822.794  3.0
GM-AEmA-GM-AEmG (anhydro)|2           19.12±0.01  0.021±0.001    1830.761  1830.763  0.7
GM-AEmA-GM-AEmAG (anhydro)|2          19.73±0.01  0.019±0.002    1901.796  1901.800  2.1
GM-AEmA-GM-AEmAA (anhydro)|2          21.17±0.02  0.015±0.002    1915.812  1915.816  1.8
GM-GM-AEmA-GM-AEm|2                   16.85±0.00  0.003±0.004    2271.943  2271.947  2.1

Trimers (2.94%±0.36%)
GM-AEmA-GM-AEmA-GM-AEmA|3             18.86±0.01  1.751±0.221    2788.192  2788.202  3.5
GM-AEmA-GM-AEmA-GM-AEm|3              18.23±0.21  0.371±0.031    2717.158  2717.164  2.2
GM-AEmA-GM-AEmA-GM-AEmA (anhydro)|3   22.39±0.02  0.222±0.027    2768.169  2768.175  2.3
GM-AEmA-GM-AEmA-GM-AEmKR|3            17.54±0.01  0.207±0.028    3001.350  3001.360  3.4
GM-AEmA-GM-AEmA-GM-AEm (anhydro)|3    21.60±0.02  0.117±0.003    2697.133  2697.138  1.8
GM-AEmA-GM-AEmA-GM-AEmKR (anhydro)|3  20.90±0.16  0.088±0.026    2981.328  2981.334  2.2
GM-AEmA-GM-AEmA-GM-AEmG|3             17.72±0.01  0.039±0.004    2774.182  2774.186  1.4
GM-AEmA-GM-AEm-GM-AEm|3               17.45±0.01  0.029±0.005    2646.123  2646.127  1.7
GM-AEmA-GM-AEm-GM-AEm (anhydro)|3     21.16±0.01  0.025±0.001    2626.096  2626.101  1.9
GM-AEmA-GM-AEm-GM-AEmKR|3             17.11±0.01  0.022±0.002    2930.316  2930.323  2.7
GM-AEmA-GM-AEmA-GM-AEmAG|3            18.24±0.01  0.021±0.001    2845.217  2845.223  2.0
GM-AEmA-GM-AEmA-GM-AEmAA|3            19.23±0.01  0.014±0.002*   2859.235  2859.239  1.3
GM-AEmA-GM-AEm-GM-AEmKR (anhydro)|3   20.31±0.02  0.014±0.005    2910.293  2910.297  1.5
GM-AEm-GM-AEmG-GM-AEmAG|3             17.18±0.00  0.004±0.005    2703.143  2703.149  2.0
GM-AEmA-GM-AEm-GM-AEmG (anhydro)|3    21.21±0.02  0.011±0.003*   2754.157  2754.160  1.1
GM-AEmA-GM-AEmA-GM-AEmAG (anhydro)|3  21.77±0.01  0.006±0.004    2825.189  2825.197  2.8

  1. Inferred dimers and trimers are based on the most abundant monomers and could correspond to alternative structures.

  2. G: GlcNAc; M: MurNAc; m: meso-diaminopimelic acid; the number following the symbol ‘|’ refers to the oligomerisation state (1 for monomers, 2 for dimers, and 3 for trimers).

  3. *: Calculated from two values.
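The structure labels in Table 1 follow the pattern described in the footnotes: a backbone, optional modifications in parentheses, and an oligomerisation state after the ‘|’ symbol. A minimal Python sketch for splitting such a label is shown below. `parse_label` is a hypothetical helper, and mapping state 0 to glycans is an inference from the table grouping (the footnote only defines 1, 2, and 3).

```python
import re

# States 1 to 3 come from the footnote; 0 for glycans is inferred from the table.
STATE_NAMES = {0: "glycan", 1: "monomer", 2: "dimer", 3: "trimer"}


def parse_label(label):
    """Split a label such as 'GM-AEmA (anhydro) |1' into its components."""
    left, sep, state = label.rpartition("|")
    if not sep:
        raise ValueError(f"no '|' separator in label: {label!r}")
    # Text inside each pair of parentheses is treated as one modification.
    modifications = re.findall(r"\(([^)]*)\)", left)
    # Removing the parenthesised parts leaves the backbone.
    backbone = re.sub(r"\s*\([^)]*\)", "", left).strip()
    return {
        "backbone": backbone,
        "modifications": modifications,
        "state": STATE_NAMES[int(state)],
    }


print(parse_label("GM-AEmA (anhydro) |1"))
print(parse_label("GM (x2) (deacetyl) |0"))
```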

Table 2
Automated identification of P. aeruginosa peptidoglycan fragments.
Inferred structure                        Mass                    ∆ppm                        MaxQuant
                                          Theoretical  Observed   This work  Anderson et al.
GM (anhydro)                              478.1799     478.1780   4.0        –2.7             +
GM                                        498.2061     498.2042   3.9        –4.2             +
GM (x2) (deacetyl)                        934.3755     934.3706   5.3        –8.6             +
GM (x2) (anhydro)                         956.3598     956.3551   5.0        6.0              +
GM (x2)                                   976.3860     976.3794   6.7        –6.1             +
GM (x3) (deacetyl)                        1412.5554    1412.5490  4.5        –6.2             +
GM (x3) (anhydro)                         1434.5397    1434.5348  3.4        –7.5             +
GM (x3)                                   1454.5659    1454.5592  4.6        –5.3             +
GM (x4)                                   1932.7458    1932.7352  5.5        –5.1             +
GM-AE (anhydro)                           678.2596     678.2567   4.3        –9.1             +
GM-AE                                     698.2858     698.2830   3.9        –12.9            +
GM-AEJ (anhydro)                          850.3444     850.3401   5.1        –10.6            +
GM-AEJ                                    870.3706     870.3676   3.5        –5.9             +
GM-AEJA (anhydro)                         921.3815     921.3765   5.4        –9.9             +
GM-AEJG                                   927.3920     927.3868   5.6        –8.9             +
GM-AEJA                                   941.4077     941.4045   3.4        –5.0             +
GM-AEJC                                   973.3843     973.3763   8.2        –2072.2          +
GM-AEJL                                   983.4593     983.4498   9.6        –15.5            +
GM-AEJK                                   998.4703     998.4624   8.0        –10.6            +
GM-AEJM                                   1001.4153    1001.4060  9.2        –13.5            +
GM-AEJAA                                  1012.4448    1012.4413  3.4        –7.8             +
GM-AEJY (anhydro)                         1013.4091    1013.4242  –14.9      17.8             +
GM-AEJF                                   1017.4433    1017.4347  8.4        –15.0            +
GM-AEJY                                   1033.4353    1033.4278  7.2        –5.3             +
GM-AEJAV                                  1040.4808    1040.4716  8.8        –14.7            +
GM-AEJIA                                  1054.4964    1054.4874  8.5        –11.3            +
GM-AEJW                                   1056.4394    1056.4455  –5.8       4.0              +
GM-AEJAM                                  1072.4524    1072.4460  5.9        –4.3             +
GM-AEJKR                                  1154.5667    1154.5631  3.1        –8.1             +
GM-GM-AE                                  1176.4836    1176.4590  20.9       –24.7            +
GM-GM-AEJ                                 1348.5684    1348.5457  16.9       –24.9            +
GM-GM-AEJA                                1419.6055    1419.5824  16.2       –23.5            +
GM-AEJA-GM-AEJ (amidase product)          1313.5721    1313.5674  3.5        –11.0            +
GM-AEJA-GM-AEJA (amidase product)         1384.6092    1384.6037  4.0        –7.4             +
GM-AEJ-GM-AEJ (anhydro)                   1702.7042    1702.6976  3.9        38.3             +
GM-AEJ-GM-AEJ                             1722.7304    1722.7234  4.1        –8.6             +
GM-AEJA-GM-AEJ (double anhydro)           1753.7151    1753.7043  6.2        –7.2             +
GM-AEJA-GM-AEJ (anhydro)                  1773.7413    1773.7339  4.2        –11.1            +
GM-AEJA-GM-AEJ                            1793.7675    1793.7596  4.4        –8.8             +
GM-AEJA-GM-AEJA (deacetyl)                1822.7941    1822.7808  7.3        –7.4             +
GM-AEJA-GM-AEJA (double anhydro)          1824.7601    1824.7447  8.4        –15.6            +
GM-AEJA-GM-AEJA (anhydro)                 1844.7784    1844.7704  4.3        –8.3             +
GM-AEJA-GM-AEJG                           1850.7889    1850.8158  –14.6      9.7              +
GM-AEJA-GM-AEJA                           1864.8046    1864.7962  4.5        –6.6             +
GM-AEJA-GM-AEJK (anhydro)                 1901.8410    1901.8297  5.9        –14.5            +
GM-AEJA-GM-AEJL                           1906.8562    1906.8452  5.8        –11.3            +
GM-AEJA-GM-AEJK                           1921.8672    1921.8586  4.5        –12.0            +
GM-AEJA-GM-AEJF                           1940.8402    1940.8263  7.2        –8.8             +
GM-AEJA-GM-AEJY                           1956.8322    1956.8210  5.7        –7.6             +
GM-AEJA-GM-AEJAL                          1977.8933    1977.8813  6.0        –10.7            +
GM-AEJA-GM-AEJKR                          2077.9636    2077.9589  2.2        –13.0            +
GM-GM-AEJ-GM-AEJ                          2200.9282    2200.9000  12.8       –17.7            +
GM-GM-AEJA-GM-AEJ                         2271.9653    2271.9368  12.6       –18.4            +
GM-GM-AEJA-GM-AEJA                        2343.0024    2342.9734  12.4       411.4            +
GM-AEJA-GM-AEJA-GM-AEJ (double anhydro)   2677.1120    2677.1000  4.5        –10.7            +
GM-AEJA-GM-AEJA-GM-AEJ (anhydro)          2697.1382    2697.1259  4.6        –8.6             +
GM-AEJA-GM-AEJA-GM-AEJ                    2717.1644    2717.1532  4.1        –10.7            +
GM-AEJA-GM-AEJA-GM-AEJA (double anhydro)  2748.1491    2748.1363  4.7        –11.0            +
GM-AEJA-GM-AEJA-GM-AEJA (anhydro)         2768.1753    2768.1674  2.9        –11.2            +
GM-AEJA-GM-AEJA-GM-AEJA                   2788.2015    2788.1919  3.4        –9.7             +
GM-AEJA-GM-AEJA-GM-AEJK (anhydro)         2825.2379    2825.2205  6.1        –9.3             +
GM-GM-AEJA-GM-AEJA-GM-AEJ                 3195.3622    3195.3264  11.2       –14.0            +
GM-GM-AEJA-GM-AEJA-GM-AEJA                3266.3993    3266.3630  11.1       –12.5            +
  1. Alternative structures were matched: GM-AEJ-GM-AEJK; GM-AEJ-GM-AEJKA (anhydro); GM-AEJ-GM-AEJKA; GM-AEJ-GM-AEJA-GM-AEJKA (anhydro).

Table 2—source data 1

Pseudomonas aeruginosa matched muropeptides not reported previously.

https://cdn.elifesciences.org/articles/70597/elife-70597-table2-data1-v1.xlsx
Table 2—source data 2

Raw output of automated search using MaxQuant and PGFinder.

https://cdn.elifesciences.org/articles/70597/elife-70597-table2-data2-v1.xlsx
Key resources table
Reagent type (species) or resource                    Designation          Source or reference                           Identifiers         Additional information
Strain, strain background (Escherichia coli)          BW25113              https://doi.org/10.1073/pnas.120163297        RRID:Addgene_72340  Model strain for PG analysis
Strain, strain background (Clostridioides difficile)  R20291               https://doi.org/10.1128/JB.0073107            -                   Model strain for PG analysis
Strain, strain background (Clostridioides difficile)  M7404                https://doi.org/10.1371/journal.ppat.1002317  -                   Model strain for PG analysis
Software, algorithm                                   PGFinder v.0.02      This work                                     -                   Used for MS1 analysis of PG structure
Software, algorithm                                   Byos v.3.9–32        Protein Metrics Inc                           -                   Used for MS data deconvolution and MS/MS analysis
Software, algorithm                                   MaxQuant v2.0.1.0    Cox and Mann, 2008                            RRID:SCR_014485     Used for MS data deconvolution
Software, algorithm                                   Perseus v.1.6.10.53  Tyanova et al., 2016                          RRID:SCR_015753     Used for statistical analysis of muropeptide abundance

Additional files


Cite this article

Ankur V Patel, Robert D Turner, Aline Rifflet, Adelina E Acosta-Martin, Andrew Nichols, Milena M Awad, Dena Lyras, Ivo Gomperts Boneca, Marshall Bern, Mark O Collins, Stéphane Mesnage (2021)
PGFinder, a novel analysis pipeline for the consistent, reproducible, and high-resolution structural analysis of bacterial peptidoglycans
eLife 10:e70597.
https://doi.org/10.7554/eLife.70597