Structure-guided secretome analysis of gall-forming microbes offers insights into effector diversity and evolution
Figures

Description of the pathogens included in the study, workflow overview and statistics of structure prediction.
(A) Cladogram of the pathogens used in the study. The schematics represent the disease symptoms on their respective hosts, with the white areas representing galls. The secretome count indicates the number of proteins per species predicted to be secreted, and the functional annotation shows the percentage of the secretome predicted to contain a known protein domain in the Pfam, SUPERFAMILY, or Gene3D databases. (B) Flowchart of the workflow used to predict the secretome and the corresponding 3D structures. (C) Raincloud plot showing the median and density distribution of pLDDT scores of the predicted structures in each pathogen.

Visualization of dominant protein folds present in each pathogen.
(Top) Network plot of structurally similar secretome clusters with at least 15 members. To reduce complexity, only a subset of the 255 clusters is shown. Each node represents a single protein, and an edge between two nodes represents structural similarity (TMScore >0.5). (Bottom) Representative structure of the dominant fold in each pathogen. Since Ankyrin repeats are common in both P. brassicae and S. subterranea, they are represented only once.
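The network construction described in this legend can be illustrated with a minimal sketch. It assumes that pairwise TMScore values are already available as a dictionary keyed by protein pairs, and it treats each connected component of the similarity graph as a cluster. Both the input layout and the function name `similarity_clusters` are hypothetical, and the authors' actual clustering procedure may differ.

```python
from collections import defaultdict


def similarity_clusters(pair_scores, threshold=0.5, min_size=15):
    """Group proteins into connected components of a similarity graph.

    pair_scores: dict mapping (protein_a, protein_b) -> TMScore (hypothetical input).
    An edge is drawn only when the score is strictly above `threshold`.
    Components with fewer than `min_size` members are dropped.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        root_a, root_b = find(a), find(b)  # registers both nodes
        if score > threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return [members for members in groups.values() if len(members) >= min_size]


# Toy example with a small size cutoff so that the result is easy to inspect.
toy_scores = {
    ("p1", "p2"): 0.8,
    ("p2", "p3"): 0.6,
    ("p3", "p4"): 0.4,  # below threshold: no edge
    ("p4", "p5"): 0.9,
}
toy_clusters = similarity_clusters(toy_scores, threshold=0.5, min_size=3)
```

With the defaults (`threshold=0.5`, `min_size=15`), the function mirrors the cutoffs stated in the legend; the toy call lowers `min_size` only for illustration.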

The structures of Beta vulgaris GTPases, KP6 fold, RALPH-like fold, and ToxA fold share structural similarity with candidate effectors found in P. betae, U. maydis, and T. deformans, respectively.
The number of clusters where proteins share structural similarities with these folds is provided.

Sequence and structural similarity among HopQ1 homologs in U. maydis, A. candida, and P. brassicae.
(A) Network plot showing the structural similarity between the members of cluster 21. Edges denote structural similarity (TMScore >0.5). (B) Pairwise sequence identity between selected HopQ1 structural homologs from plasmodiophorids, oomycetes, and fungi, illustrating sequence dissimilarity between some proteins despite structural homology. (C) Gene expression values (log2 TPM) of two highly induced P. brassicae genes at 16 and 26 dpi. (D) 3D structure of the mature protein sequences, assuming a HopQ1-like fold.

Sequence and structural similarity among Mig1 homologs in Ustilago maydis.
(A) Similarity matrix showing the pairwise sequence identity (%) between Mig1 cluster members. (B) Similarity matrix showing the pairwise structural homology scores (TMScore) between Mig1 cluster members. (C) Superimposition of two Mig1 homologues, illustrating structural similarity despite extreme sequence divergence. (D) Differential gene expression patterns of two Mig1 tandem duplicates. (E) Multiple alignment of protein sequences, highlighting the conservation of cysteine residues (marked in yellow). (F) Visualization of the conserved cysteine residues forming disulfide bridges.

SUSS effector families are enriched in common motifs.
(A) Network plots demonstrating that the two primary effector families in A. candida and S. endobioticum can only be grouped together when structural data is incorporated into the sequence-based clustering. The plots also indicate which sequence-based clusters contain the known effectors from these groups. (B) ‘RAYH’ and ‘CCG’ motif patterns identified by MEME scan. (C) Disulfide bridges in the ‘CCG’ motif, likely playing a pivotal role in structural maintenance, are highlighted in the virulence factor CCG30. A zoomed-in view of the ‘CCG module’ shows the four conserved cysteine residues forming disulfide bridges. (D) The ‘RAYH’ motif, occupying the central position in the core alpha-helix bundle, is highlighted in six sequence-based subclusters within the AvrSen1-like cluster in S. endobioticum.

Analysis evidencing purifying selection in the CCG effector family.
Plots have been downloaded from the Datamonkey server, which hosts the FEL package. The indicated codons are numbered according to the codon-aligned nucleotide alignment file, where positions with 50% gaps have been trimmed.
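The gap-trimming step mentioned in this legend can be sketched as follows. This is an illustration only: it assumes the alignment is held as a dictionary of equal-length, codon-aligned nucleotide strings, that trimming acts on whole codon columns, and that a column is removed when at least 50% of sequences are gapped there. The function name `trim_gappy_codons` and the exact cutoff rule are assumptions, not the authors' documented procedure.

```python
def trim_gappy_codons(aligned, max_gap_fraction=0.5):
    """Drop codon columns in which the fraction of gapped sequences
    reaches `max_gap_fraction` (assumed rule: remove when >= cutoff).

    aligned: dict mapping sequence name -> codon-aligned nucleotide string.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if length % 3 or any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must share one length that is a multiple of 3")

    kept_starts = []
    for start in range(0, length, 3):
        codons = [aligned[name][start:start + 3] for name in names]
        # A codon counts as gapped if any of its three positions is a gap.
        gap_fraction = sum("-" in codon for codon in codons) / len(codons)
        if gap_fraction < max_gap_fraction:
            kept_starts.append(start)

    return {
        name: "".join(aligned[name][s:s + 3] for s in kept_starts)
        for name in names
    }


# Toy alignment: the second codon column is gapped in 2 of 4 sequences.
toy_alignment = {
    "a": "ATG---AAA",
    "b": "ATG---AAA",
    "c": "ATGCCCAAA",
    "d": "ATGCCC---",
}
trimmed = trim_gappy_codons(toy_alignment)
```

Because codon positions in the plots are numbered on the trimmed alignment, any mapping back to the untrimmed sequences would need the list of retained columns, which this sketch does not return.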

Multiple alignment of representative members of CCG sequence-based clusters.
Conserved residues (>70% similar by biochemical properties in ESPript3.0) are highlighted.

Multiple alignment of representative members of AvrSen1-like sequence-based clusters.
Conserved residues (>70% similar by biochemical properties in ESPript3.0) are highlighted.

Diversity, structural features, and host immune targets of ankyrin repeats in Plasmodiophorids.
(A) Frequency of repeat-containing proteins in P. brassicae and S. subterranea. (B) Network plot showing structural homology within P. brassicae Ankyrin repeats, also highlighting the Ankyrin repeats with SKP1/BTB/POZ superfamily domains. (C) Alignment of Ankyrin motifs from P. brassicae and S. subterranea. (D) Visualization of conserved hydrophobic residues in a single Ankyrin repeat module. (E) Number of Ankyrin repeat proteins predicted to target Arabidopsis immune proteins. (F) AlphaFold Multimer predicted complex of MPK3 and PbANK1 (PBTT_00818), highlighting the predicted aligned errors of surface contacts under 4 Ångströms.

Validation of PbANK1-MPK3 and PbANK1-GroES-like interactions through yeast two-hybrid (Y2H) and bimolecular fluorescence complementation (BiFC).
(A) 1-by-1 Y2H assay results evaluating the interaction PbANK1-MPK3 (AT3G45640) predicted through AlphaFold-Multimer. N=3. (B) 1-by-1 Y2H assay results evaluating the interaction PbANK1-GroES-like (AT3G56460) predicted through a Y2H screening of an Arabidopsis seedling library. N=3. (C) BiFC assay results evaluating the interaction PbANK1-MPK3. N=3, presented in Figure 7—figure supplement 1. Bar = 50 μm. (D) BiFC assay results evaluating the interaction PbANK1-GroES-like. N=3, presented in Figure 7—figure supplement 2. Bar = 50 μm.

Extended BiFC assay results evaluating the interaction PbANK1-MPK3 presented in Figure 7c including the three replicates, empty vectors, and positive controls.

Extended BiFC assay results evaluating the interaction PbANK1-GroES-like presented in Figure 7d including the three replicates, empty vectors, and positive controls.
Additional files
-
Supplementary file 1
Detailed information about the pathogens investigated in this study and their secretome composition with pLDDT scores, IUPred3 results, and InterProScan identity.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp1-v1.xlsx
-
Supplementary file 2
Known/orphan effector families searched for in the secretomes of the gall-forming pathogens investigated in this study.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp2-v1.xlsx
-
Supplementary file 3
Structural clusters generated in this study; ID of the top 26 clusters detected; and frequency of clusters, orphan effectors and known fungal effectors per species studied here.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp3-v1.xlsx
-
Supplementary file 4
Detailed information on cluster 2, FoldSeek search results, AFDB cluster web tool, and sequence identity matrix with HopQ1.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp4-v1.xlsx
-
Supplementary file 5
Sequence identity matrix of Ustilago maydis Mig1 protein with other members of cluster 30.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp5-v1.xlsx
-
Supplementary file 6
Expression of Ustilago maydis mig1 and members of cluster 30 in axenic culture and at 1, 2, 4, 6, 8, and 12 dpi.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp6-v1.xlsx
-
Supplementary file 7
Detailed information on the new SUSS effector families identified in this study.
Structure-based and Sequence-based clustering of CCG effectors. Structure-based and Sequence-based clustering of AvrSen1-like effectors. MEME output of CCG and RAYH motif scan. Selection of one representative member (highest pLDDT) from each sequence-based cluster for selection pressure analysis in CCGs. Selection of one representative member (highest pLDDT) from each sequence-based cluster to examine RAYH motif conservation in AvrSen1-like cluster.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp7-v1.xlsx
-
Supplementary file 8
Sequence clusters generated in this study.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp8-v1.xlsx
-
Supplementary file 9
Detailed information of proteins containing repeats.
Ankyrin repeat-containing proteins, leucine-rich repeats, and ankyrin repeat-containing proteins with SKP1 domains in P. brassicae and S. subterranea. Expression of P. brassicae ankyrin repeat-containing proteins and leucine-rich repeats at 16 and 26 dpi in A. thaliana. MEME scan of P. brassicae and S. subterranea ankyrin repeat-containing proteins.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp9-v1.xlsx
-
Supplementary file 10
List of A. thaliana and P. brassicae proteins used to study modeled interactions with AlphaFold-Multimer.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp10-v1.xlsx
-
Supplementary file 11
Output of the AlphaFold-Multimer predicting interaction between A. thaliana immunity hubs and P. brassicae Ankyrin repeat-containing proteins.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp11-v1.xlsx
-
Supplementary file 12
Detailed list of PBTT_00818 His+ clones resulting from a yeast two-hybrid screen against an Arabidopsis seedling library.
- https://cdn.elifesciences.org/articles/105185/elife-105185-supp12-v1.xlsx
-
MDAR checklist
- https://cdn.elifesciences.org/articles/105185/elife-105185-mdarchecklist1-v1.pdf