Introduction

Specificity in protein-protein interactions is key for several biological processes, such as metabolism, development, and intercellular signaling. Binding to an incorrect partner or disrupted binding of the specific (cognate) protein can cause a diseased state or cell death1. For bacterial social behaviors, cognate protein-protein interactions between cells impact organism fitness and community structure, for example by excluding foreign cells2. The importance of specificity between cognate partners in social behaviors remains largely unchallenged. In other contexts, flexible (“promiscuous”) binding allows protein partners to retain their interactions when undergoing rapid mutational changes, such as during immune recognition of viral particles3, 4. Whether flexible binding between noncognate proteins can occur during bacterial social behaviors is unknown. An expanded protective function would reveal new bacterial behaviors that influence individual fitness and community structure in dynamic ecosystems.

Microbes often exist within dense, multi-phyla communities, like the human gut microbiome, where they communicate and compete with neighbors. Bacteria can use effector-immunity protein (EI) pairs in these environments to gain advantages5, 6. Unlike toxin-antitoxin systems in which a single cell produces both toxic and neutralizing proteins, bacteria inject cell-modifying proteins (effectors) directly into neighboring cells via several contact-dependent transport mechanisms, including the type VI secretion system (T6SS), type IV secretion system (T4SS), and contact-dependent inhibition (CDI)7, 8. Clonal siblings produce the matching immunity protein that modifies the effector’s activity. For lethal effectors, both clonal and non-clonal cells are negatively impacted when binding is disrupted or absent9. These interactions between EI pairs can shape community composition by changing bacterial fitness.

The binding specificity between matching EI pairs has historically defined immunity protein protection, but recent studies raise doubts. Antitoxins have multiple mechanisms for neutralizing their cognate toxin, usually within a single cell. By contrast, EI pairs interact within a neighboring cell which puts unique restrictions on both their protection mechanisms and their evolution10. The predominant model is that T6SS-associated EI pairs act like a tumbler lock-and-key, where each effector protein has a single cognate partner11. Most immunity proteins bind their cognate effectors, often at the active site, to neutralize effector activity12. As such, EI pairs typically have increased protein-protein specificity between matching pairs to protect sibling cells13. However, experiments with engineered proteins reveal that changes to an immunity protein could allow it to bind effectors other than its cognate partner14, 15. Also, the T6SS-associated effector and immunity proteins from Salmonella enterica subsp. enterica serovar Typhimurium and Enterobacter cloacae, which are phylogenetically close, bind each other in vitro and protect against the other in vivo16. These studies suggest that the predominant model needs to be revised to explain the breadth of immunity protein protection.

We studied an EI pair in Proteus mirabilis to examine the prevailing model. This opportunistic pathogen resides in human and animal guts and can cause recurrent and persistent urinary tract infections17. P. mirabilis encodes a lethal T6SS-dependent EI pair that impacts competition and collective motility18. The molecular functions of the putative effector and immunity proteins remained unknown. Here, we characterized this EI pair and determined the critical residues for activity, leading to the discovery of two protein families. We showed that these immunity proteins bind non-cognate effectors produced by bacteria from different phyla. Our results indicated that binding was insufficient to neutralize effector proteins within this immunity protein family. Further, we found that these flexible EI pairs from various phyla naturally co-occur in individual human microbiomes. This work provides compelling evidence for cross-protection and supports a fundamental reinterpretation of EI pairs and their ecological significance.

Results

RdnE is a DNA nuclease and seeds a PD-(D/E)XK subfamily

To compete against other strains, P. mirabilis strain BB2000 needs both the idrD gene and the T6SS, suggesting that the idrD-encoded protein functions as a T6SS-associated effector18. Similar effectors often contain an enzymatic domain in the C-terminus19, 20. As a result, we investigated the function of the final 138 amino acids at IdrD’s C-terminus, now named RdnE for recognition DNA nuclease effector. We measured bacterial growth using a strain derived from BB2000 that has disruptions in its native idrD and downstream genes18. These P. mirabilis cells had 1000-fold fewer cells per mL when engineered to overproduce RdnE in trans than the negative control containing no RdnE protein (Figure 1A). An equivalent growth pattern occurred in Escherichia coli cells under the same conditions (Supplemental Figure 1). Thus, RdnE was lethal in vivo. RdnE’s initial 86 amino acids contain a PD-(D/E)XK motif, which is suggestive of nucleotide degradation. The PD-(D/E)XK superfamily includes proteins with broad functions, including effectors that degrade DNA or RNA21–23. Three residues in the catalytic site (D, D/E, and K) are required for activity24. Therefore, we changed the corresponding residues in RdnE (D39, E53, and K55) to alanine, separately and together.
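The substitution notation used above (for example, D39A) can be illustrated with a short sketch. This is a minimal Python example for illustration only: the helper name `apply_substitutions` and the seven-residue toy sequence are hypothetical, and neither is taken from this study or from the RdnE sequence. The helper checks that the stated wild-type residue is actually present at the stated 1-based position before replacing it, which guards against numbering errors when several substitutions are combined.

```python
import re


def apply_substitutions(seq, substitutions):
    """Apply point substitutions written like 'D39A' (1-based positions).

    Raises ValueError if the notation is malformed, the position is out of
    range, or the wild-type residue does not match the sequence.
    """
    residues = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild_type = match.group(1)
        index = int(match.group(2)) - 1  # convert to 0-based index
        replacement = match.group(3)
        if index < 0 or index >= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[index] != wild_type:
            raise ValueError(
                f"{sub}: expected {wild_type}, found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


# Hypothetical toy sequence (NOT the RdnE sequence): M1 K2 D3 L4 E5 A6 K7
toy_sequence = "MKDLEAK"
single_mutant = apply_substitutions(toy_sequence, ["D3A"])
triple_mutant = apply_substitutions(toy_sequence, ["D3A", "E5A", "K7A"])
```

With a real sequence, the same call would take the substitutions named in the text, for example `["D39A", "E53A", "K55A"]` for the combined mutant.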

RdnE homologs act as DNA endonucleases and contain interchangeable domains.

A) Cell viability (colony forming units per mL) after protein production in swarms of P. mirabilis strain idrD*, which does not produce RdnE and RdnI. Cells produced GFPmut2, RdnE, or mutant variants in the predicted PD-(D/E)XK motif: D39A, E53A, K55A, or all. B) In vitro DNA degradation assay for ProteusRdnE. Increasing concentrations of a negative control, ProteusRdnE-FLAG, or ProteusRdnED39A-FLAG were incubated with methylated or unmethylated lambda DNA (48,502 bp) and analyzed by gel electrophoresis. Plasmid DNA degradation is in Supplemental Figure 1. C) In vitro DNA degradation assay for domain deletions of ProteusRdnE. The first construct removed the first alpha helix without disturbing the catalytic residues, and the second construct contained the PD-(D/E)XK motif and removed region 2. Increasing concentrations were analyzed as in (B). D) Multiple sequence alignment between P. mirabilis and R. dentocariosa RdnE sequences. The black bar shows the PD-(D/E)XK motif and the grey bar marks the variable region 2 domain. Conserved residues are highlighted in dark blue. Secondary structure predictions were identified using Ali2D55, 56 (h for alpha helix, e for beta sheet); the catalytic residues (stars) are noted above the alignment. (E,F) In vitro DNA degradation assay and analysis as in (B). (E) Increasing concentrations of either a negative control, RothiaRdnE-FLAG, or RothiaRdnED39A-FLAG. (F) The PD-(D/E)XK motifs were swapped between the RothiaRdnE (orange) and the ProteusRdnE (green) sequences and compared to the wild-type RdnE proteins.

P. mirabilis producing these mutant proteins showed growth equivalent to the negative control lacking RdnE (Figure 1A). We also noticed that E. coli cells that were producing RdnE had morphologies that were indicative of DNA damage or stress. These cells were elongated, and the DAPI-stained DNA was distributed irregularly within the cells (Supplemental Figure 1). Cells producing a defunct D39A mutant (RdnED39A) did not have this appearance (Supplemental Figure 1). Therefore, the PD-(D/E)XK motif was essential for growth arrest.

The importance of the PD-(D/E)XK motif for activity suggested that RdnE was a nuclease, but defining its molecular target required direct analysis. Because of its lethality, we synthesized RdnE with a C-terminal FLAG epitope tag using in vitro translation (Supplemental Figure 1). We measured the protein concentrations, added phage lambda DNA (methylated or unmethylated) to progressively higher RdnE protein concentrations, and then performed agarose gel electrophoresis analysis. Degradation of lambda DNA occurred in the presence of RdnE, regardless of the DNA methylation state (Figure 1B). The RdnED39A construct caused a slight reduction in lambda DNA, while the negative control showed no DNA loss (Figure 1B). RdnE also caused a reduction in plasmid DNA, indicating its endonuclease activity (Supplemental Figure 1). These results revealed that RdnE caused DNA degradation in vitro in a PD-(D/E)XK-dependent manner.

RdnE likely has two different domains, as there is a region directly following the PD-(D/E)XK motif. A two-domain architecture is similar to that described for DNases25, 26 and other T6SS effectors like some PoNe-containing DNases21. We made independent deletions of each potential RdnE domain to assess whether both were essential for DNase activity. One construct deleted the first alpha helix without disturbing the catalytic residues; the other deleted the region after the PD-(D/E)XK motif, which we termed “region 2.” The resulting proteins, produced via in vitro translation, were assayed for DNase activity as described above. The truncated proteins resulted in no loss of lambda DNA (Figure 1C), indicating that both domains are necessary for degradation activity.

We next asked whether RdnE homologs also act as DNA nucleases. A bioinformatics search revealed that the closest RdnE homolog outside of Proteus was in the actinobacterium Rothia dentocariosa C6B. Rothia species are inhabitants of the normal oral flora, dwelling in biofilms within the human oral cavity and pharynx27. The two RdnE proteins (ProteusRdnE and RothiaRdnE) share approximately 55% amino acid sequence identity, mostly within the PD-(D/E)XK domain, and have similar predicted secondary structures (Figure 1D). Given this, we hypothesized that RothiaRdnE also acted as a DNA nuclease.
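For readers unfamiliar with how a percent identity value is derived from a pairwise alignment, the following minimal Python sketch shows one common convention. It is an assumption-laden illustration, not the procedure used to obtain the value reported above: the function name `percent_identity` and the toy aligned strings are hypothetical, and here identity is counted only over alignment columns in which neither sequence has a gap. Other conventions (for example, dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked by `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (NOT the RdnE sequences)
toy_a = "MKD-LEAK"
toy_b = "MRDQLE-K"
toy_identity = percent_identity(toy_a, toy_b)  # 5 identical of 6 compared columns
```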

We analyzed RothiaRdnE for PD-(D/E)XK-dependent DNA nuclease activity. We produced it and a predicted null mutant, RothiaRdnED39A, using in vitro translation. Samples containing the RothiaRdnED39A protein or a negative control had similar DNA levels (Figure 1E). By contrast, samples with the wild-type RothiaRdnE protein showed a loss of lambda DNA regardless of methylation state, indicating that RothiaRdnE also had DNA nuclease activity (Figure 1E). To further analyze RdnE’s architecture, we exchanged region 2 between the ProteusRdnE and RothiaRdnE sequences and assayed for nuclease activity. The hybrid proteins degraded lambda DNA, unlike the negative control (Figure 1F). While DNA degradation by these trans-phyla hybrid proteins required more protein than the wild-type enzymes, the cross-phyla protein fragments could complement one another. These data indicated that these RdnE proteins form a novel PD-(D/E)XK-containing DNA nuclease subfamily.

RdnI binds and neutralizes RdnE

Effectors have cognate immunity proteins, often adjacently located on the chromosome. We hypothesized that the gene adjacent to rdnE in BB2000, rdnI (previously named “idrE”), encodes the cognate immunity protein (Figure 2A). RdnI did not have defined domains, and its function was unknown. We assessed RdnI’s activity using microscopic and cell growth analysis. Swarming P. mirabilis cells are normally elongated with DAPI-stained DNA found along the cell body (Figure 2B). By contrast, swarming cells producing RdnE in trans did not elongate, had a reduced DAPI signal, and had an accumulation of misshapen cells (Figure 2B). Cell shape and DNA-associated fluorescence levels returned to normal when cells concurrently produced the RdnE and RdnI proteins (Figure 2B). RdnI production also rescued cell growth in E. coli cells producing RdnE (Supplemental Figure 2). These data suggest that RdnI inhibits RdnE’s lethality.

RdnI binds to and protects against RdnE in vivo and in vitro.

A) Domain architecture for the idr locus in P. mirabilis strain BB2000 (ref. 18). At the top are genes with Pfam domains listed below them. Gray boxes denote PAAR and Rhs domains in the N-terminal region of the full-length IdrD protein. B) Micrographs of P. mirabilis strain idrD* cells carrying an empty vector, a vector for producing RdnE, or a vector for producing RdnE and RdnI. DNA was visualized by DAPI stain. Phase, left; fluorescence, right. C) Swarm competition assay18 of wild-type P. mirabilis strain BB2000 (donor) competed against the vulnerable target, which is P. mirabilis strain ATCC29906 carrying an empty vector, a vector for producing RdnI-StrepII, or a vector for producing GFP. Left, schematic of swarm competition assay where top left colony is BB2000, top right colony is ATCC29906 with its vector cargo, and bottom colony is a 1:1 mixture of BB2000 and ATCC29906 with its vector cargo. Gray boxes underneath indicate whether BB2000 (top) or ATCC29906 (bottom) dominated in the 1:1 mixture and white arrows point to the boundary line that forms between different strains. D) Bacterial two-hybrid (BACTH) assay with RdnED39A-FLAG, RdnI-StrepII, and GFPmut2. The colorimetric change was discerned in the presence of the substrate X-gal and inducer IPTG. E) An anti-FLAG batch co-immunoprecipitation of RdnED39A-FLAG and RdnI-StrepII. RdnED39A-FLAG or exogenous FLAG-BAP (soluble fraction) was incubated with anti-FLAG resin (FLAG flow through). RdnI-StrepII was then added to the resin (RdnI-StrepII flow through). Any proteins bound to resin were eluted with FLAG-peptide (Elution) and analyzed by anti-FLAG and anti-StrepII western blots.

To determine whether RdnI provided protection against injected RdnE within mixed communities, we used swarm competition assays, which combine one-to-one mixtures of P. mirabilis strains and then measure dominance18. The control strain was wild-type strain BB2000 (herein called “BB2000”), which naturally produced RdnE and RdnI. The other was strain ATCC29906, which did not naturally produce RdnE and RdnI. These two strains formed a visible boundary between swarming monoculture colonies (Figure 2C). The mixed-strain colony merged with BB2000 in one-to-one competitions, demonstrating BB2000’s dominance (Figure 2C). A similar outcome was seen when ATCC29906 was producing Green Fluorescent Protein (GFPmut2). However, BB2000 did not outcompete ATCC29906 engineered to produce RdnI with a C-terminal StrepII epitope tag (“RdnI-StrepII”), seen by a merging of the mixed-strain colony with ATCC29906 (Figure 2C). Thus, RdnI protected cells against injected RdnE within mixed communities.

Based on the prevailing model, we predicted that the members of a cognate EI pair should bind to one another, which we evaluated in vivo and in vitro. We used the defunct mutant (RdnED39A-FLAG) for these assays because producing the wild-type RdnE protein kills cells. For in vivo analysis, we used bacterial two-hybrid assays (BACTH) in which the reconstitution of the T18 and T25 fragments of adenylate cyclase results in a colorimetric change to blue in the presence of the substrate, X-gal28, 29. Constructed vectors contained genes for RdnED39A-FLAG, RdnI-StrepII, or GFPmut2 fused to the C-termini of the T18 or the T25 fragments. When the reporter strain produced RdnED39A-FLAG or RdnI-StrepII with GFPmut2, the resultant yellow color was equivalent to when X-gal was absent (Figure 2D). There was also minimal color change when an individual protein was produced on both fragments (Figure 2D). However, the reporter strains made blue colonies when X-gal was present and the cells concurrently produced RdnED39A-FLAG and RdnI-StrepII. These results indicated that RdnE and RdnI bind to each other in vivo. We used batch in vitro co-immunoprecipitation assays to confirm the in vivo binding result. E. coli cells either produced RdnED39A-FLAG or, as a negative control, had exogenous FLAG-BAP (E. coli bacterial alkaline phosphatase with a FLAG epitope tag) added to the cell lysate. An anti-FLAG western blot showed both FLAG-BAP (∼ 50 kDa) and RdnED39A-FLAG (∼ 17 kDa) in the soluble and elution fractions. RdnI-StrepII eluted with RdnED39A-FLAG but not the negative control (Figure 2E). The western blot results corresponded with the Coomassie blue-stained gels (Supplemental Figure 2). Overall, our data showed that Proteus RdnE and RdnI form a cognate EI pair. However, questions about their prevalence among bacteria remained, because RdnE and RdnI matched no known effector or immunity families.

Expansion of the RdnE and RdnI protein families revealed similar gene architecture and secondary structures

Gene neighborhood analysis can guide homology inference and protein comparisons. We conducted consecutive searches with BLAST30 and HMMER31 to identify sequences that encoded proteins with high similarity to RdnE and RdnI (Supplemental Figure 3). The final list contained 21 EI pairs from a variety of phyla (Table 3). Although the genes surrounding these putative EI pairs differed, many shared mobile and transport-associated elements, such as rearrangement hotspot (Rhs) sequences or other similar peptide-repeat sequences (Figure 3A). Several gene neighborhoods had transport-associated genes, such as the T6SS-associated vgrG/tssC genes and the CDI-associated cdiB gene. A few also included putative immunity proteins from other reported families, like immunity protein 44 (Pfam15571) in Taylorella asinigenitalis MCE3 and immunity protein 51 (Pfam15595) in Chryseobacterium populi CF314. Notably, these organisms varied widely in origin and residence. Some were from the soil rhizosphere (Pseudomonas ogarae and C. populi) and others from the human microbiome (P. mirabilis, R. dentocariosa, and Prevotella jejuni) (Figure 3A). The phylogenetic tree showed distribution across bacterial phyla (Figure 3B), perhaps reflecting these proteins’ role in community interactions.

RdnE and RdnI protein families share conserved residues and predicted structures.

A) Gene neighborhoods for RdnE and RdnI homologs. Listed are gene neighborhoods, relevance (Agr: Agriculture, Med: Medical, Env: Environmental), niche, and the site of isolation, which we identified using IMG/M from the Joint Genome Institute. Colors highlight conserved function/genes (not to scale). B) Phylogenetic tree based on NCBI taxonomy. Scale is located below the graph. The colored circles represent phyla (green: Actinobacteriota; yellow: Firmicutes; blue: Bacteroidota; pink: Proteobacteria). C) Unrooted maximum likelihood trees of the RdnE (left) and RdnI (right) homologs. Trees were created with RAxML60, and the scale is annotated below. The colored circles represent phyla (same as in B). D) Protein alignments overlaid with either predicted secondary structures (top) or conserved residues (bottom) of the RdnE and RdnI homologs. MUSCLE alignments52 are highlighted by secondary structures (red: alpha helices, light blue: beta sheets), or conserved residues (dark blue). White represents gaps in the protein alignment. The bars below mark the predicted conserved and variable domains. E) Alignments of AlphaFold2 predictions for RdnE and RdnI sequences from P. mirabilis (green), R. dentocariosa (orange), P. jejuni (magenta), and P. ogarae (dark blue). Structures were generated using ColabFold32 and aligned using PyMOL. The P. mirabilis RdnI sequence is a natural variant with two residue substitutions (V101E and I222M) as compared to BB2000 (Supplemental Figure 6).

We next constructed maximum likelihood trees to examine the relationship between the identified proteins. Both the RdnE and RdnI trees diverged from the species tree (Figure 3B) and each other (Figure 3C). Proteins from evolutionarily distant bacteria appeared to share more similarities compared to those from more closely related genera. These results are consistent with potential horizontal gene transfer seen for other EI pairs7.

Yet, despite the differences in phyla and gene neighborhoods, the RdnE and RdnI proteins displayed the defining characteristics of protein families. The predicted secondary structures of the RdnE proteins were similar even though the amino acid sequences differed (Figure 3D, Supplemental Figure 3). The RdnE proteins showed two distinct domains, as with the Proteus and Rothia results (Figure 1): the PD-(D/E)XK region followed by a sequence variable region (region 2). Further, the predicted tertiary structures were consistent with PD-(D/E)XK folds (three β-sheets flanked by two α-helices, α/β/α) found in other proteins (Figure 3E, Supplemental Figure 4)24, 32. These findings suggested that domains in RdnE are conserved across diverse phyla and the sequences seed a distinct PD-(D/E)XK subfamily.

While immunity proteins within a family have diverse overall amino acid sequences, conserved secondary structures and some conserved residues are not uncommon among immunity protein families. Indeed, they are often used to characterize these families33. We found that, while the RdnI proteins shared minimal primary amino acid sequence identity, they were predicted to contain several α-helices (Figure 3D, Supplemental Figure 3) and had similar AlphaFold2-predicted tertiary structures (Figure 3E, Supplemental Figure 4). We also discovered a region with three alpha-helices and several conserved residues, which we named the “conserved motif” (Figure 3D). Interestingly, the addition of this conserved motif indicates that the RdnI architecture mirrors that of the two-domain RdnE proteins (Figure 3D). Therefore, the RdnI conserved motif may also indicate a novel immunity domain and seed a protein family.

Binding flexibility in RdnI allows for cross-species protection

We deployed a structure-function approach to determine the conserved motif’s role in ProteusRdnI’s activity. Bioinformatics using AlphaFold232 and Consurf34 revealed seven highly conserved residues within this region; four of these (Y197, H221, P244, E246) clustered together within the AlphaFold2 structure and are identical between sequences (Figure 4A). In a sequence-optimized (SO) RdnI, we individually changed each of these four to alanine and discovered that each alanine-substituted variant behaved like the wild type and inhibited RdnE activity (Supplemental Figure 5). We then replaced all seven residues (Y197, S235, K258, and the original four) with alanine (ProteusRdnI7mut-StrepII) and found that, unlike wild-type ProteusRdnI-StrepII, this construct was not protective in swarm competition assays (Figure 4B). However, the ProteusRdnI7mut-StrepII mutant still bound RdnED39A-FLAG in bacterial two-hybrid assays (Figure 4C). Therefore, the seven residues in the conserved motif are critical for RdnI’s neutralizing function but dispensable for binding RdnE.

The RdnI protein family can offer cross-protection due to an interchangeable conserved domain that is critical for function.

A) Sequence logo of the RdnI’s conserved motif. Stars indicate the seven analyzed residues. B) Swarm competition assay with ATCC29906 producing either RdnI-StrepII or RdnI7mut-StrepII, which contains mutations in all seven conserved residues. We used a sequence-optimized (SO) RdnI protein that had a higher GC% content and an identical amino acid sequence for ease of cloning. Left: schematic of swarm competition assay as in Figure 2. Gray boxes indicate which strain dominated over the other. White arrows point to the boundary formed between different strains. C) BACTH assay of RdnED39A-FLAG with SO RdnI-StrepII or RdnI7mut-StrepII. GFPmut2 was used as a negative control. D) Swarm competition assay with ATCC29906 expressing either the wild-type RdnI or a RdnI truncation. The three truncations were in the first alpha helix (amino acids 1-85), the second half of RdnI (amino acids 150-305), and the end of the protein (amino acids 235-305). E) BACTH assay of RdnED39A-FLAG with wild-type RdnI and the three RdnI truncations. F) Swarm competition assay with ATCC29906 expressing foreign RdnI proteins. G) BACTH assay of RdnED39A-FLAG with each of the foreign RdnI proteins. GFPmut2 was used as a negative control. H) Swarm competition assay with ATCC29906 producing SO RdnI with swapped conserved motifs. I) BACTH assay of RdnED39A-FLAG with SO RdnI with swapped conserved motifs. Colored bars denote RdnI-StrepII proteins from P. mirabilis (green), R. dentocariosa (orange), P. jejuni (magenta), or P. ogarae (dark blue).

Given that the conserved motif and nearby regions are likely involved in function, we queried for potential functions in the remainder of the RdnI protein. We engineered variants that were either (1) the first 85 amino acids, (2) amino acids 150 to 305, which contained an intact conserved motif, or (3) amino acids 235 to 305, which contained the last alpha helix of the conserved motif (Supplemental Figure 3). None of these constructs protected against RdnE’s lethality in vivo (Figure 4D), demonstrating that the entire protein is likely essential for function. The variant containing the first 85 amino acids of ProteusRdnI was the only construct to bind ProteusRdnE, indicating that the N-terminal region is sufficient for binding between this P. mirabilis EI pair (Figure 4E). Thus, binding is necessary but insufficient for neutralization, and the neutralization activity likely resides within the second half of RdnI. These results, as well as the functional domain swaps between the Proteus and Rothia RdnE proteins (Figure 1F), contradicted the prevalent model’s assertion that one-to-one binding is both necessary and sufficient for immunity protein function against a cognate effector.

Therefore, we explored the relationship between non-cognate RdnE and RdnI proteins from various phyla. We first asked whether non-cognate RdnI immunity proteins could protect against injected ProteusRdnE. Using the swarm competition assays, we competed BB2000 against ATCC29906 engineered to produce RdnI homologs from R. dentocariosa, P. jejuni, or P. ogarae (RothiaRdnI-StrepII, PrevotellaRdnI-StrepII, and PseudomonasRdnI-StrepII, respectively). BB2000 dominated the swarm when ATCC29906 produced GFPmut2, PrevotellaRdnI-StrepII, or PseudomonasRdnI-StrepII (Figure 4F). However, ATCC29906 outcompeted BB2000 when making ProteusRdnI-StrepII or RothiaRdnI-StrepII (Figure 4F). In agreement with these findings, the RdnI immunity proteins from Proteus and Rothia bound ProteusRdnED39A in both in vivo and in vitro assays, but the Prevotella and Pseudomonas variants did not (Figure 4G, Supplemental Figure 7). Thus, RothiaRdnI bound and protected against ProteusRdnE, demonstrating that cross-protection between non-cognate EI pairs from different phyla is possible, provides a fitness benefit during competition, and influences community structure.

Given that the entire immunity protein is necessary for protection, we next evaluated which region(s) of RdnI allowed for cross-protection. We first moved the conserved motif of the three foreign RdnI homologs into ProteusRdnI-StrepII and measured neutralizing activity using swarm competition assays and binding activity using BACTH. The conserved motifs from Rothia and Prevotella were sufficient to preserve ProteusRdnI’s neutralizing (Figure 4H) and binding functions (Figure 4I). However, the conserved motif from Pseudomonas was insufficient to neutralize ProteusRdnE (Figure 4H), even though the construct could still bind ProteusRdnED39A (Figure 4I). We then moved the Proteus conserved motif into the RdnI variants from Prevotella and Pseudomonas, but the motif was insufficient to confer neutralizing (Figure 4H) or binding function (Figure 4I). These results showed that binding RdnE is necessary but not sufficient for RdnI protection. Further, within RdnI, the conserved motif is necessary for neutralization.

RdnE and RdnI proteins from diverse phyla are present in individual human microbiomes

Our findings indicate that immunity proteins such as RdnI could provide a broader protective umbrella for a cell beyond inhibiting the effector proteins of their siblings. If so, one would expect to find evidence of RdnE and RdnI homologs from different phyla in the same environment or microbial community. We tested this hypothesis by analyzing about 500,000 publicly available microbiomes (metagenomes) for the specific rdnE and rdnI genes examined in Figure 4 (Figure 5A). In total, 2,296 human and terrestrial metagenomes contained reads matching these rdnE sequences with over 90% identity (Figure 5B). The reads mapped to the expected niche for each organism, underscoring the presence of these specific effector proteins in naturally occurring human-associated microbiomes. Further, the rdnE and rdnI genes from various human-associated bacteria occurred concurrently in individual human oral and, to a lesser extent, gut metagenomes. The rdnE and rdnI genes from Rothia and Prevotella co-occurred in approximately 5% of the metagenomes analyzed (Figure 5C). Because we used stringent detection parameters, the true number could be higher. We then compared the abundance of rdnI to rdnE, since metagenomic coverage (i.e., the number of short reads that map to a gene) approximates the underlying gene’s abundance in the sampled community. In most gut samples, rdnI recruited more reads than rdnE, although there was substantial variance (Figure 5D). These data could indicate the presence of orphan rdnI genes, which is consistent with published T6SS orphan immunity alleles35, 36. These metagenomic patterns suggest that a single community can produce multiple RdnE and RdnI proteins from different phyla to defend against each other, further highlighting the intricate nature of bacterial defense in crowded environments.
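The comparison of rdnI to rdnE abundance uses the ratio log10(I/E), where I and E are the coverage values for rdnI and rdnE in a sample (as defined in the Figure 5D legend). The arithmetic can be sketched in a few lines of Python. This is a hedged illustration rather than the analysis pipeline itself: the function name `log10_coverage_ratios`, the dictionary layout, the toy coverage values, and the use of a 1x threshold as the detection limit for both genes are all assumptions made for the example.

```python
import math
from statistics import median


def log10_coverage_ratios(samples, min_coverage=1.0):
    """Return log10(rdnI / rdnE) for samples where both genes exceed min_coverage.

    `samples` is a list of dicts with mean coverage values under the keys
    'rdnE' and 'rdnI' (hypothetical layout). The 1x threshold is an assumption.
    """
    ratios = []
    for sample in samples:
        e_cov = sample["rdnE"]
        i_cov = sample["rdnI"]
        if e_cov > min_coverage and i_cov > min_coverage:
            ratios.append(math.log10(i_cov / e_cov))
    return ratios


# Hypothetical toy coverage values, not data from this study
toy_samples = [
    {"rdnE": 10.0, "rdnI": 100.0},  # rdnI tenfold higher
    {"rdnE": 50.0, "rdnI": 50.0},   # equal coverage
    {"rdnE": 0.5, "rdnI": 20.0},    # rdnE below threshold, excluded
    {"rdnE": 200.0, "rdnI": 20.0},  # rdnE tenfold higher
]
toy_ratios = log10_coverage_ratios(toy_samples)
toy_median = median(toy_ratios)
```

A positive ratio means that rdnI recruited more reads than rdnE in that sample, a ratio of zero means equal coverage, and a negative ratio means that rdnE was more abundant.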

The RdnI protein family has the potential for broader protection within oral and gut microbiomes.

A) Methodology used to identify rdnE and rdnI genes in publicly available metagenomic data. Metagenomes were mapped against sequences with a stringency of 90%. “Coverage” denotes the average depth of short reads mapping to a gene in a single sample. Colors represent rdnE and rdnI from P. mirabilis (green), R. dentocariosa (orange), P. jejuni (magenta), or P. ogarae (dark blue). B) The experimentally tested rdnE gene sequences from different organisms (colors) are found in thousands of human-associated metagenomes. Each dot represents a single sample’s coverage of an individual rdnE gene; note the log10-transformed y-axis. Only samples with >1x coverage are shown. C) Euler diagram showing the number of samples with co-occurring rdnE genes from different taxa (colors). D) Kernel density plot of the ratio of rdnI to rdnE coverage. The ratio of rdnI to rdnE was defined as log10(I/E), where I and E are the mean per-nucleotide coverage for rdnI and rdnE, respectively. The distribution of ratios was summarized as a probability density function (PDF) for each taxon (color) in each environment (subpanel). Here, the y-axis (unitless) reflects the probability of observing a given ratio (x-axis) in that dataset. The colored numbers in the top right of each panel show the number of metagenomes above the detection limit for both rdnE and rdnI for each taxon. Dashed vertical lines represent the median ratio. E) Skeleton-key model for immunity protein protection. Top, the current prevailing model for immunity proteins is that protection is defined by necessary and sufficient binding between cognate effectors (locks) and immunity proteins (keys). Bottom, our proposed skeleton-key model for protection is that multiple immunity proteins (skeleton-keys) can bind a single effector due to a flexible (promiscuous) binding site. Protection is a two-step process of binding and then neutralization.

Discussion

Using these results as a foundation, we propose an alternate model of “EI skeleton keys,” in which flexible (“promiscuous”) binding between immunity and effector proteins enables protection in mixed-species communities (Figure 5E). This model expands the current paradigm of (selective) cognate EI partners37 and has far-reaching implications for immunity protein evolution and molecular function. Unlike most known immunity proteins that fit within the prevailing model, we demonstrated that flexible EI binding is necessary but not sufficient to neutralize RdnE. We also showed that both RdnE and RdnI proteins have multiple domains, one conserved with functional activity and the other variable across phyla. The combined findings point to a two-step mechanism for RdnI protection: the N-terminal variable-sequence domain mediates binding to an effector, while the C-terminal conserved domain is required for neutralization in a function that has yet to be determined (Figure 5E).

The RdnE and RdnI protein families have parallel structural organizations. RdnE’s conserved region contains the enzymatic PD-(D/E)XK motif, which is necessary for DNA nuclease activity. Its variable region comprises the second half of the protein, which is also necessary for degradation activity. Despite amino acid variability among variants, these domains can be swapped between phyla and retain function. On the other hand, RdnI’s variable and conserved regions likely determine binding and function, respectively. ProteusRdnI’s first 85 amino acids are sufficient and necessary for binding ProteusRdnE, while the conserved motif, found in the last two-thirds of RdnI, is necessary for function. RdnI’s protective ability can be preserved when the conserved motif is swapped between species, even among those that cannot otherwise bind. The interchangeability between domains in RdnE and RdnI variants could reflect constraints on their evolutionary paths.

Compared to the prevailing model, our proposed skeleton-key model better explains RdnI’s conserved motif, orphan immunity genes, and other non-cognate protein interactions. Sequence diversity among EI pairs is often attributed to the coevolution between effector and immunity proteins, which is thought to maintain specificity and binding33. However, this coevolution model does not fully explain immunity proteins’ conserved regions or the presence and protective activity of orphan immunity proteins38, 39. A recent study with the Tri1 immunity protein shows that it has conserved enzymatic activity, allowing for protection against foreign effectors, but there is no cross-binding between non-cognate EI pairs40. These data suggest that conserved enzymatic activity can be maintained alongside the coevolution necessary for strict cognate pair binding. Therefore, we propose that EI protein domains may evolve independently rather than coevolve across the full protein. For example, RdnE’s PD-(D/E)XK motif and RdnI’s conserved motif might need to be maintained for activity, while the variable domains may diversify in sequence more freely. Depending on the selective pressures, the variable regions could reinforce specificity between cognate EI pairs as they coevolve, as in the Tri1 family. Additional evolutionary analysis would reveal how the balance between specificity and flexibility evolves in EI pairs, both within domains and across the entire protein.

Flexible protection likely applies beyond contact-dependent EI pairs. Indeed, this model is conceptually similar to the cross-binding seen in type II antitoxins and bacteriocins. Within the ParD antitoxin family, 45% of the twenty antitoxins tested show cross-protective activity with at least one non-cognate ParE toxin41. For the E group of colicins, non-cognate immunity proteins bind with lower affinity than cognate immunity proteins, but they still offer cross-protection13, 42. Similarly, immunity proteins from Carnobacterium piscicola and Enterococcus faecium offer cross-protection against the other’s cognate bacteriocin43. In these systems, as with RdnE-RdnI, flexible binding between a harmful protein and a non-cognate neutralizing protein can result in a potential fitness advantage for the bacteria.

When considering the impacts on community structure, the broadened activity of RdnI proteins against RdnE effectors from multiple phyla likely increases bacterial fitness, which is advantageous in dense environments. RdnI production increased cells’ fitness and modified the community in experimental setups like the swarm competition assays as it enabled vulnerable bacteria to inhabit previously restricted spaces. Supporting this experimental data, both gut and oral metagenomes showed evidence of multiple rdnE-rdnI pairs within individual samples, particularly between Rothia and Prevotella. Interestingly, the oral microbiome had roughly equivalent abundance between the effector and immunity genes, which might reflect that bacteria occupy distinct spatiotemporal niches within oral microbiomes, e.g. R. dentocariosa is predominantly on tooth surfaces44. By contrast, rdnI genes had greater abundance compared to rdnE in the gut microbiomes, which may reflect the greater diversity in member species and community structures found in the gut45. Orphan immunity genes are indeed a known phenomenon in T6SS EI literature but are usually documented through single isolate sequencing. This community level assessment affirms the presence of rdnI orphan genes on a population scale and points to relatively widespread immunity genes in hundreds or thousands of samples.

Given the ability of immunity genes to protect against non-cognate effectors, the presence of diverse orphan rdnI genes hints at the ecological complexity surrounding RdnE and RdnI. This community of immunity proteins is reminiscent of the model for shared immunity proteins within an ecosystem, also called a “hyper-immunity state,” which was seen among colicins in wild field mice46. In this hyper-immunity state, a set of immunity proteins shared among a community could offer an advantage against pathogens. Invading bacteria would be unable to protect themselves from certain effectors while the community would be protected as they share the immunity proteins necessary for defense. Flexible binding like RdnI could contribute to such a “hyper-immunity state” to help a bacterial community maintain its niche.

Indeed, bacteria have a diverse set of protective measures to ward off foreign effectors in addition to flexible immunity proteins. Recent work has identified non-specific mechanisms of protection including stress response, physical barriers, and a stronger offense11. Orphan immunity genes also exist throughout many bacterial genomes and may be a part of this system38; for example, these genes offer a fitness advantage in mouse microbiomes39. Our data extend the current repertoire of protection mechanisms by adding another tool: a flexible immunity protein collection, where each immunity protein acts as a skeleton key against a wider class of effectors. This collection could be useful in dense, diverse communities where contact-dependent competition using EI pairs is critical to maintain one’s population. Taken together, the physical interactions between, and evolution of, effector and immunity proteins remain a rich area for new explorations.

Acknowledgements

We thank Caroline Boyd, Neils Bradshaw, Emma Keteku, Alecia Septer, Nora Sullivan, Adnan Syed, and Larissa Wenren for contributing experimental materials to this project. Rachelle Gaudet, Colleen Cavanaugh, and members of the Gibbs Lab provided valued advice on the manuscript. The David and Lucile Packard Foundation, the George W. Merck Fund, Harvard University, and the University of California, Berkeley and its Rausser College of Natural Resources funded our research. A.K., D.S., D.U., and K.A.G. designed and performed research as well as analyzed data. A.K., D.S., D.U., and K.A.G. wrote the paper. We have no competing interests to declare.

Materials and Methods

Bacterial strains and media

All strains are described in Table 1. Strains for bacterial two-hybrid assays were transformed fresh the day before. Overnight cultures were grown aerobically at 37°C in LB (Lennox) broth47. All E. coli strains were plated on LB (Lennox) agar surfaces (1.5% Bacto agar) and all P. mirabilis strains were plated on LSW- agar47 or 25 mL CM55 media (blood agar base agar [Oxoid, Basingstoke, England]). When necessary, antibiotics were used in the media at the following concentrations: 35 μg/mL kanamycin, 100 μg/mL carbenicillin.

List of strains used in this study.

Plasmid construction

Plasmids were constructed according to Table 2. Primers and gBlocks were ordered from Integrated DNA Technologies (IDT), Coralville, IA. PidrA-RdnE was constructed by Polymerase Chain Reaction (PCR) amplifying the last 416 bp of the idrD gene from BB2000 and cloning it into the SacI and AgeI sites of the pBBR1-NheI vector, resulting in plasmid pAS1054. RdnE is the final 138 amino acids of IdrD (out of its total of 1581). PidrA-rdnE-rdnI was constructed by PCR amplifying the last 416 bp of the idrD gene through the end of the rdnI gene from BB2000, resulting in the plasmid pAS1059. The gBlock and primer sequences will be archived on an OSF website (https://osf.io/scb7z/) and made publicly available upon publication.

Plasmids used in this study.

RdnE and RdnI homolog species.

We used several standard protocols for vector construction. Seamless ligation cloning extract (SLiCE) was adapted from Zhang et al.48. Restriction-digest reactions were based on the manufacturers’ protocols. Splicing by overlap extension (SOE) PCR amplification was adapted from Heckman 200749. Plasmids were transformed into OmniMax E. coli and confirmed using Sanger sequencing (UC Berkeley DNA Sequencing Facility and Genewiz, South Plainfield NJ).

In vitro DNase assay

RdnE proteins were produced using the New England Biolabs PURExpress In Vitro Protein Synthesis Kit (New England BioLabs Inc., Ipswich MA). Template DNA contained the rdnE gene and required elements specified by the PURExpress kit. We adapted this protocol from prior in vitro DNA-degradation assays50. Reactions were performed with 250 ng of template DNA (no template DNA was added to the negative control reaction) and incubated at 37°C for two hours. Protein amount was determined using an anti-FLAG western blot with a known gradient of FLAG-BAP (2.5, 5, 10, and 20 ng). Synthesized protein (2.5, 5, and 10 ng) was added to 0.5 µg of lambda DNA (methylated and unmethylated) and 5 µL of New England Biolabs Buffer 3.1, and the reaction was brought up to a final volume of 25 µL. For plasmid DNase assays, 10 ng of synthesized protein was added to 250 ng of circular or linear plasmid DNA (pids 51). This reaction was incubated for one hour at 37°C, then Proteinase K (New England Biolabs Inc., Ipswich MA) was added and incubated for an additional 15 minutes at 37°C. The reaction was then run on a 1% agarose gel for analysis.
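The quantification step above can be sketched as a standard-curve calculation. This is a minimal illustration, assuming that band intensity scales roughly linearly with FLAG-BAP mass over the 2.5 to 20 ng range; the intensity values and the function names (`fit_line`, `estimate_ng`) are hypothetical placeholders and are not measurements or code from this study.

```python
# FLAG-BAP standards loaded on the blot, in ng (values from the text).
STANDARD_NG = [2.5, 5.0, 10.0, 20.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_ng(band_intensity, slope, intercept):
    """Invert the standard curve to estimate protein mass in ng."""
    return (band_intensity - intercept) / slope


# Hypothetical band intensities (arbitrary units) for the four standards.
standard_intensities = [260.0, 510.0, 1010.0, 2010.0]
slope, intercept = fit_line(STANDARD_NG, standard_intensities)

# Hypothetical intensity of a synthesized RdnE band on the same blot.
print(estimate_ng(760.0, slope, intercept))
```

Estimates are only meaningful for bands whose intensity falls within the range spanned by the standards.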

E. coli liquid growth and viability assays

Overnight cultures were grown at 37°C in a shaking incubator in LB broth with appropriate antibiotics. Cultures were normalized to an optical density at 595 nm (OD595) of 1 and diluted 1:100 into LB broth with 35 μg/mL kanamycin, with and without 200 nM anhydrotetracycline (aTc). Some samples were analyzed for OD595 every thirty minutes for 16 hours in a 96-well plate using a TECAN. Other samples were incubated at 37°C for 6 hours while rocking. At indicated time points, 100 μL of sample was removed, diluted, and then plated on fresh LB agar plates to measure colony forming units per mL (CFU) after overnight growth at 37°C using standard protocols.
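As an illustration of the viability measurement, the following sketch converts a plate count into CFU per mL. The plated volume and dilution factor are assumptions chosen for the example, since the text does not state them, and `cfu_per_ml` is a hypothetical helper name.

```python
def cfu_per_ml(colony_count, dilution_factor, plated_volume_ml):
    """Convert a plate count to colony forming units per mL.

    dilution_factor is the total fold dilution of the plated sample,
    e.g. 1e4 for a 10^-4 dilution of the original culture.
    """
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return colony_count * dilution_factor / plated_volume_ml


# Hypothetical example: 42 colonies from 0.1 mL of a 10^-4 dilution,
# which corresponds to roughly 4.2e6 CFU/mL in the original culture.
print(cfu_per_ml(42, 1e4, 0.1))
```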

Microscopy

We performed microscopy on P. mirabilis strain idrD::Tn5 (CmR) (also called idrD*), which has a transposon insertion to disrupt rdnE and rdnI expression18, carrying either vector pBBR1-NheI or pDS0002 (producing RdnE) and on E. coli carrying either pBBR1-NheI, pDS0002, or pDS0048 (producing RdnED39A). P. mirabilis cells were normalized to OD595 of 0.1 after overnight growth in LB broth supplemented with kanamycin. Cells were inoculated onto CM55 swarm pads containing 10 µg/mL DAPI and 10 nM aTc and grown in humidified chambers at 37°C. Images were taken after five and six hours of growth. From overnight cultures, E. coli cells were grown in LB broth plus kanamycin until mid-logarithmic phase and then mounted directly onto glass slides. Glass coverslips were sealed with nail polish. For all microscopy, we captured phase contrast and DAPI (150 ms exposure) images using a Leica DM5500B microscope (Leica Microsystems, Buffalo Grove IL) and CoolSnap HQ CCD camera (Photometrics, Tucson AZ) cooled to −20°C. MetaMorph version 7.8.0.0 (Molecular Devices, Sunnyvale CA) was used for image acquisition.

Sequence Optimized RdnI

The P. mirabilis rdnI-StrepII sequence was difficult to engineer genetically due to its low GC content (23%). As such, we redesigned the nucleotide sequence to have a higher GC content without changing the amino acid sequence; we call this construct “Sequence optimized (SO) ProteusRdnI-StrepII.” The change to the nucleotide sequence did not affect the construct’s ability to offer protection to a vulnerable strain (Figure 3G).
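The two properties described above, a higher GC content and an unchanged amino acid sequence, can be checked computationally. The sketch below is illustrative only: it uses the standard genetic code, and the two short sequences are toy examples rather than the actual rdnI or SO rdnI sequences.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def translate(seq):
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous(original, recoded):
    """True if both nucleotide sequences encode the same protein."""
    return translate(original) == translate(recoded)


# Toy sequences (hypothetical): both encode Met-Lys-Leu-stop.
original = "ATGAAATTATAA"
recoded = "ATGAAGCTGTGA"
print(is_synonymous(original, recoded))
print(gc_percent(original) < gc_percent(recoded))
```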

Swarm competition assay

The swarm competition (territoriality) assay was adapted from Wenren et al. 201318. Cultures (5 mL) were grown in LB broth with appropriate antibiotics overnight in a 37°C rocking incubator. Overnight cultures were normalized to an OD595 of 1. For the competition samples, the strains were mixed 1:1. 2 μL of each sample were inoculated onto CM55 agar with the appropriate antibiotic. Plates were incubated at 37°C for 22 hours and then photographed and assessed for boundary formation.
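The normalization step can be expressed with the usual dilution relationship C1·V1 = C2·V2. The sketch below assumes that OD595 is proportional to cell density in the measured range; the final volume and the helper name `normalize_volumes` are hypothetical choices for illustration.

```python
def normalize_volumes(measured_od, target_od=1.0, final_volume_ul=1000.0):
    """Return (culture_ul, diluent_ul) needed to reach target_od.

    Uses C1 * V1 = C2 * V2 and assumes OD scales linearly with density.
    """
    if measured_od < target_od:
        raise ValueError("culture is below the target OD; concentrate it instead")
    culture_ul = final_volume_ul * target_od / measured_od
    return culture_ul, final_volume_ul - culture_ul


# Hypothetical overnight culture at OD595 = 2.5, normalized to OD595 = 1
# in 1000 uL: 400 uL of culture plus 600 uL of fresh broth.
print(normalize_volumes(2.5))
```

Equal volumes of two normalized cultures can then be combined to give the 1:1 competition mixture.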

BACTH assay

The vectors are described in Battesti and Bouveret 201228 with an added linker region between the T25 or T18 fragments and multiple cloning sites. BTH101 cultures were grown at 30°C overnight in LB broth with kanamycin and carbenicillin. 10 μL of the overnight culture were inoculated onto LB agar with kanamycin, carbenicillin, 1 mM IPTG, and 0 or 40 μg/mL of X-gal (Thermo Fisher, Waltham MA), and grown at 30°C for 24 hours. Color was amplified by an additional 24 hours at 4°C, and then samples were imaged.

FLAG co-immunoprecipitation assays

The protocol was adapted from Cardarelli et al. 20152. E. coli cells were harvested from LB broth, grown for either 3 hours after induction with 200 nM aTc at 37°C or 16 to 20 hours at 16°C after induction with 1 mM IPTG. Cells were then pelleted by centrifugation and flash frozen in liquid nitrogen. RdnE-containing samples were lysed in 50 mM Tris pH 7.4, 150 mM NaCl, and 1x Protease Inhibitor Cocktail (Selleck Chemicals LLC, Houston TX), via bead bashing for 20 minutes at 4°C. RdnI-containing samples were lysed in 100 mM Tris-HCl pH 8, 180 mM NaCl, and 1x Protease Inhibitor Cocktail via 10x 10 second sonication pulses. The soluble fraction for both samples was obtained after centrifugation at 15,000 rpm for 15 minutes. FLAG epitope-containing samples were incubated with prepared resin for two hours at 4°C. The resin was then washed twice (50 mM Tris pH 7.4, 150 mM NaCl, and 1% Tween-20), incubated with approximately 1 mL of the soluble fraction of the RdnI-StrepII-containing samples for another 2 hours, and washed thrice more. The protein was finally eluted with 50 μL of 300 ng/μL 3x FLAG peptide (Sigma-Aldrich, St. Louis, MO) for 45 minutes at 4°C. Sample buffer (63 mM Tris pH 6.8, 2% Sodium Dodecyl Sulfate, 10% glycerol, 5% 2-Mercaptoethanol) was added to samples, which were then boiled at 95°C for 10 minutes and frozen at −80°C.

SDS-PAGE and Western blotting

The protocol was adapted from Cardarelli et al. 20152. Protein samples were separated by gel electrophoresis using 13% Tris-Tricine polyacrylamide gels and either transferred to a 0.45-μm nitrocellulose membrane (Bio-Rad Laboratories, Hercules CA) or stained with Coomassie blue (Bio-Rad Laboratories, Hercules CA). Western blot membranes were probed with primary antibody (either 1:4000 rabbit anti-FLAG [Sigma Aldrich, St. Louis MO] or 1:4000 mouse anti-StrepII [Genscript, Piscataway NJ]) for 1 hour at room temperature or overnight at 4°C and with secondary antibody (either 1:5000 goat anti-rabbit or anti-mouse, respectively, conjugated to horseradish peroxidase (HRP) [KPL, Inc., Gaithersburg MD]) for 30 minutes at room temperature. Samples were finally visualized using the Immun-Star HRP substrate kit (Bio-Rad Laboratories, Hercules CA) and the Chemidoc XRS system (Bio-Rad Laboratories, Hercules, CA). TIFF files were analyzed in Fiji (ImageJ, Madison, WI).

Bioinformatics search for RdnE and RdnI homologs

A BLAST30 search of the P. mirabilis RdnE protein sequence revealed seven RdnE homologs from a variety of species. The downstream genes of these RdnE homologs were identified using the DOE Joint Genome Institute (JGI) Integrated Microbial Genomes and Microbiomes (IMG/M). The seven RdnE and RdnI amino acid sequences were separately aligned with MUSCLE using Jalview52. These alignments were then used as seeds for a second homology search using HMMERsearch (HmmerWeb version 2.41.2) and the Ensembl Database53. The two data sets were then compared for genomes that contained both rdnE and rdnI genes next to one another within their respective genomes. Any EI pairs that contained disrupted PD-(D/E)XK motifs within their RdnE sequence were removed.

Gene neighborhood and primary conservation analyses

Gene neighborhoods were obtained using JGI’s IMG/M Neighborhood viewer and then redrawn using Adobe Illustrator (Adobe Inc., 2022). Locations of predicted functions are approximate primarily based on the Pfam domain calling by IMG/M. The final 21 RdnE and RdnI sequences were aligned with MUSCLE using Jalview52. The conserved residues were identified using Jalview and the cartoons were created using Adobe Illustrator (Adobe Inc., 2022). The sequence logo for the RdnI conserved motif was generated with WebLogo54 and constrained to only visualize the conserved motif.

Secondary and tertiary structure predictions

Secondary structure predictions of the MUSCLE-aligned sequences were determined with Ali2D from the MPI Bioinformatics toolkit55, 56. The resulting predictions were made into cartoons manually using Adobe Illustrator (Adobe Inc., 2022). Tertiary structure predictions were done with AlphaFold2 using MMseqs2 on Google Colab32. Query protein sequences were entered into the program, which produced five models ranked 1-5. Rank 1 models are shown. pLDDT scores indicate confidence levels for each amino acid position. Structures were analyzed in PyMOL (The PyMOL Molecular Graphics System, Version 2.2.3 Schrödinger, LLC.). The pLDDT graphs are in the supplemental data.
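Per-residue confidence values of the kind plotted in the supplemental graphs can be read directly from a predicted structure, because AlphaFold2 stores the pLDDT score in the B-factor column of its PDB output. The sketch below is a minimal parser under that assumption; it reads fixed-width ATOM records and is not the procedure used to generate the figures. The function names are hypothetical.

```python
def plddt_per_residue(pdb_text):
    """Return [(residue_number, pLDDT), ...] from alpha-carbon ATOM records.

    Relies on the fixed-width PDB layout: atom name in columns 13-16,
    residue number in columns 23-26, B-factor (pLDDT) in columns 61-66.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append((int(line[22:26]), float(line[60:66])))
    return scores


def mean_plddt(pdb_text):
    """Average pLDDT over all residues with an alpha carbon."""
    scores = [score for _, score in plddt_per_residue(pdb_text)]
    if not scores:
        raise ValueError("no alpha-carbon ATOM records found")
    return sum(scores) / len(scores)
```

A typical use would be to read the rank 1 model file into a string and pass it to `plddt_per_residue` before plotting score against residue number.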

Metagenomic analyses

A sourmash-based approach was used to screen approximately 500,000 public metagenomes stored on NCBI’s SRA (https://github.com/sourmash-bio/2022-search-sra-with-mastiff) for the presence of the ten genomes shown in Figure 3A. Hits with a containment score greater than 0.2 were downloaded for further analysis, representing 9,137 metagenomes. Each metagenome was then mapped with bbmap57 against the 10 rdnE and rdnI sequences with a stringency of 90% (minid = 0.9), along with quality filtering (trimq = 20, minaveragequality = 10). After mapping, metagenomes were retained if they had (1) a mean coverage greater than 2X, (2) at least one base covered at greater than 5X, and (3) more than half of the bases of the reference rdnE-rdnI sequence receiving coverage. 2,857 metagenomes met these criteria, of which 2,296 contained P. mirabilis, R. dentocariosa, P. jejuni, or P. ogarae sequences and could be confidently assigned to samples obtained from the human gut or oral microbiome or from terrestrial sources. Gene-level coverage in a sample was then summarized as the mean per-nucleotide coverage of each gene. The ratio of rdnI to rdnE coverage was then calculated for each sample and log10-transformed, and the distribution of ratios was summarized with Python’s seaborn kdeplot using a bandwidth of 0.458.
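The retention criteria and the coverage ratio described above can be sketched as follows. This is an illustrative reimplementation rather than the analysis code: it assumes per-base read depths are already available as lists of numbers (for example, exported from the mapping output), and the function names are hypothetical.

```python
import math


def passes_coverage_filter(per_base_depth):
    """Apply the three retention criteria to one reference sequence.

    per_base_depth holds the read depth at each reference base.
    """
    n = len(per_base_depth)
    if n == 0:
        return False
    mean_depth = sum(per_base_depth) / n
    max_depth = max(per_base_depth)
    covered_fraction = sum(1 for d in per_base_depth if d > 0) / n
    return mean_depth > 2 and max_depth > 5 and covered_fraction > 0.5


def log10_coverage_ratio(rdnI_depth, rdnE_depth):
    """log10(I/E), where I and E are mean per-nucleotide coverages."""
    mean_i = sum(rdnI_depth) / len(rdnI_depth)
    mean_e = sum(rdnE_depth) / len(rdnE_depth)
    if mean_i <= 0 or mean_e <= 0:
        raise ValueError("both genes need nonzero coverage")
    return math.log10(mean_i / mean_e)


# Toy depths (hypothetical): rdnI is covered ten times more deeply than rdnE.
print(passes_coverage_filter([6, 6, 6, 0]))
print(log10_coverage_ratio([10, 10], [1, 1]))
```

A ratio of 0 corresponds to equal coverage of the two genes, while positive values indicate more rdnI than rdnE reads in a sample.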

Open Science statement

The sequence files and associated data will be archived on an OSF website (https://osf.io/scb7z/) and made publicly available upon publication.