Genetic organization, signature sequence motifs, structural models, and phyletic distribution of McrB GTPases detected in this work

A) McrBC is a two-component restriction system. Except for extremely rare gene fusions, each component is encoded by a separate gene, and the two genes are expressed as a single operon. Genes are depicted here and in subsequent figures as arrows pointing in the direction of transcription. In most cases, McrB is the upstream gene in the operon. B) In the prototypical E. coli K-12 McrBC system, McrB contains an N-terminal methylcytosine-binding domain, ADAM/DUF3578, fused to a GTPase of the AAA+ ATPase clade (Sukackaite et al., 2012). This GTPase contains the Walker A and Walker B motifs that are conserved in P-loop NTPases as well as a signature NxxD motif, all of which are required for GTP hydrolysis (Nirwan et al., 2019, Niu et al., 2020, Pieper et al., 1999). McrC consists of a PD-DxK nuclease and an N-terminal DUF2357 domain, which comprises a helical bundle with a stalk-like extension that interacts with and activates individual McrB GTPases while they are assembled into hexamers (Niu et al., 2020, Nirwan et al., 2019). C) AlphaFold2 structural model of the E. coli K-12 ADAM-McrB GTPase fusion protein monomer and separate X-ray diffraction and cryo-EM structures of the ADAM and GTPase domains (Niu et al., 2020, Sukackaite et al., 2012). D) AlphaFold2 structural model and cryo-EM structure of the E. coli K-12 McrC monomer with DUF2357-PD-D/ExK architecture (Niu et al., 2020). The structures were visualized with ChimeraX (Pettersen et al., 2021). E) Phyletic distribution of McrB GTPase homologs detected in this work. The homologs, each found in a genomic island with distinct domain composition, were clustered at a similarity threshold of 0.9, and each was then assigned a weight inversely proportional to the size of its cluster (see Methods).
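
The weighting described for panel E can be illustrated with a minimal sketch. It assumes that each sequence receives a weight of exactly 1 / (size of its cluster), so that every cluster contributes a total of 1 to the distribution; the actual proportionality constant and procedure are given in the Methods and may differ. The function names and the input dictionaries are hypothetical.

```python
from collections import Counter


def cluster_weights(assignments):
    """Map each sequence ID to 1 / (size of its cluster).

    `assignments` maps sequence ID -> cluster ID (hypothetical input format).
    The constant of proportionality is assumed to be 1.
    """
    sizes = Counter(assignments.values())
    return {seq_id: 1.0 / sizes[cluster] for seq_id, cluster in assignments.items()}


def weighted_taxon_counts(assignments, taxa):
    """Sum the per-sequence weights within each taxon.

    `taxa` maps sequence ID -> taxon name (hypothetical input format).
    """
    totals = {}
    for seq_id, weight in cluster_weights(assignments).items():
        taxon = taxa[seq_id]
        totals[taxon] = totals.get(taxon, 0.0) + weight
    return totals
```

Under this assumption, a large cluster of near-identical sequences from a heavily sequenced lineage counts no more than a singleton cluster, which is the usual motivation for weighting of this kind.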

Phylogenetic tree of the McrB GTPases

The major clades in the phylogenetic tree of the McrB GTPases are distinguished by distinct versions of the Nx(xx)D signature motif. The teal and yellow groups, with bootstrap support of 97%, have an NxD motif, whereas the blue, green, red, and purple groups, with variable bootstrap support, have an NxxD motif, indicated by the arrows; the sequences in the smaller, cyan clade, with 98% bootstrap support, have an NxxxD motif. Each of the differently colored groups is characterized by distinct conserved genomic associations that are abundant within, but not completely confined to, the respective groups. This tree was built from the representatives of 90% identity clusters of all validated homologs. Abbreviations of domains: McrB – McrB GTPase domain; CoCo/CC – coiled-coil; MN – McrC N-terminal domain (DUF2357); CSD – cold shock domain; IG – Immunoglobulin (IG)-like beta-sandwich domain; ZnR – zinc ribbon domain; SPB – SmpB-like domain; RTL – RNase toxin-like domain; HEPN – HEPN family nuclease domain; OB – OB-fold domain; iPD-D/ExK – inactivated PD-D/ExK fold; Hsp70-like ATPase – Hsp70-like NBD/SBD; HEAT – HEAT-like helical repeats; YprA – YprA-like helicase domain; DUF1998 – domain often found in or associated with helicases that contains four conserved, putatively metal ion-binding cysteine residues; SWI2/SNF2 – SWI2/SNF2-family ATPase; PglX – PglX-like DNA methyltransferase; HsdR/M/S – Type I RM system restriction, methylation, and specificity factors.
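
The three motif variants named above (NxD, NxxD, NxxxD) differ only in the number of residues between the conserved asparagine and aspartate. The sketch below labels a motif segment accordingly. It assumes the segment has already been extracted from the signature position of an alignment; scanning a whole protein sequence with this pattern would yield many spurious matches. The function name is hypothetical, and this is not the procedure used to build the tree.

```python
import re

# N, then one to three residues of any type, then D.
MOTIF_RE = re.compile(r"N([A-Z]{1,3})D")


def classify_motif(segment):
    """Return 'NxD', 'NxxD', or 'NxxxD' for a pre-extracted motif segment.

    Returns None when the segment does not fit the Nx(xx)D pattern.
    """
    match = MOTIF_RE.fullmatch(segment.upper())
    if match is None:
        return None
    return "N" + "x" * len(match.group(1)) + "D"
```

For example, a segment with two intervening residues is labeled `NxxD`, and a segment with no intervening residue or with four returns `None`.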

CoCoNuT system phylogeny and classification

The figure shows the detailed phylogeny of McrB GTPases from CoCoNuT systems and their close relatives. All these GTPases possess an NxD GTPase motif rather than NxxD. This tree was built from the representatives of 90% identity clusters of all validated homologs. Abbreviations of domains: McrB – McrB GTPase domain; CoCo – coiled-coil; MN – McrC N-terminal domain (DUF2357); CSD – cold shock domain; YTH – YTH-like domain; IG – Immunoglobulin (IG)-like beta-sandwich domain; Hsp70 – Hsp70-like NBD/SBD; HEAT – HEAT-like helical repeats; ZnR – zinc ribbon domain; PYD – pyrin/CARD-like domain; SPB – SmpB-like domain; RTL – RNase toxin-like domain; HEPN – HEPN family nuclease domain; OB – OB-fold domain; iPD-D/ExK – inactivated PD-D/ExK fold; REC – Phosphoacceptor receiver-like domain; PLD – Phospholipase D-like nuclease domain; Vsr – very-short-patch-repair PD-D/ExK nuclease-like domain. Underneath each gene is a proposed protein name, with Cnu as an abbreviation for CoCoNuT.

Phyletic distribution of CoCoNuTs

The phyletic distribution of CnuB/McrB GTPases in CoCoNuT systems found in genomic islands with distinct domain compositions. Most CoCoNuTs are found in either Bacillota or Pseudomonadota, with particular abundance in Gammaproteobacteria. Type I-B and the related Pseudo-Type I-B CoCoNuTs are restricted mainly to Bacillota. In contrast, the other types are more common in Pseudomonadota, but can be found in a wide variety of bacteria. Type III-B and III-C are primarily found in Alphaproteobacteria and Cyanobacteriota, respectively.

Domain composition, operon organization, and AlphaFold2 structural predictions of components of the Type I CoCoNuT systems

A) Type I CoCoNuT domain composition and operon organization. The arrows indicate the direction of transcription. B-D) High-quality (average pLDDT > 80), representative AlphaFold2 structural predictions for protein monomers in B) Type I-A CoCoNuT systems (CnuB and CnuC, from top to bottom), C) Type I-B CoCoNuT systems (CnuA, CnuB, and CnuC, from top to bottom), and D) Type I-C CoCoNuT systems (CnuB and CnuC, from top to bottom). Models were generated from representative sequences with the following GenBank accessions (see Supplementary Data for sequences and locus tags): ROR86958.1 (Type I-A CnuB), APL73566.1 (Type I-A CnuC), TKH01449.1 (Type I-B CnuA), GED20858.1 (Type I-B CnuB), GED20857.1 (Type I-B CnuC), GFD85286.1 (Type I-C CnuB), MBV0932851.1 (Type I-C CnuC). Abbreviations of domains: CSD – cold shock domain; YTH – YTH-like domain; CoCo – coiled-coil; IG – Immunoglobulin (IG)-like beta-sandwich domain; ZnR – zinc ribbon domain; PYD – pyrin/CARD-like domain; REC – Phosphoacceptor receiver-like domain. These structures were visualized with ChimeraX (Pettersen et al., 2021).
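
The pLDDT cutoff used to select models can be checked directly from an AlphaFold2 PDB file, because AlphaFold2 writes the per-residue pLDDT into the B-factor column of its PDB output. The following minimal sketch averages that value over C-alpha atoms, one per residue. Whether the average reported here was taken over residues or over all atoms is an assumption, as are the function names and the strict `>` comparison.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column (pLDDT in AlphaFold2 output) over CA atoms.

    Uses the fixed-width PDB ATOM record layout: atom name in columns 13-16,
    temperature factor in columns 61-66.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found in PDB text")
    return sum(scores) / len(scores)


def passes_threshold(pdb_text, cutoff=80.0):
    """True when the mean pLDDT is strictly above the cutoff (assumed to be 80)."""
    return mean_plddt(pdb_text) > cutoff
```

A typical use would be to read a model file with `open(path).read()` and pass the text to `passes_threshold`; the path is whatever file the prediction run produced.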

Domain composition, operon organization, and AlphaFold2 structural predictions for core protein components of Type II and III CoCoNuT systems

A) Type II and III CoCoNuT domain composition and operon organization. The arrows indicate the direction of transcription. Type II and III-A systems very frequently, but not invariably, also contain TerY-P systems, which are never found in Type III-B or III-C; thus, we do not consider them core components. B-D) High-quality (average pLDDT > 80), representative AlphaFold2 structural predictions for protein monomers in B) Type II CoCoNuT systems (CnuB and CnuC, from top to bottom), C) Type II and III-A CoCoNuT systems (CnuH at the top, Type II CnuE on the bottom left, Type III-A CnuE on the bottom right), and D) Type III-A CoCoNuT systems (CnuB and CnuC, from top to bottom). Models were generated from representative sequences with the following GenBank accessions (see Supplementary Data for sequences and locus tags): AMO81401.1 (Type II CnuB), AVE71177.1 (Type II CnuC), AMO81399.1 (Type II and III-A CoCoNuT CnuH), AVE71179.1 (Type II CnuE), ATV59464.1 (Type III-A CnuE), PNG83940.1 (Type III-A CnuB), NMY00740.1 (Type III-A CnuC). Abbreviations of domains: CSD – cold shock domain; CoCo – coiled-coil; IG – Immunoglobulin (IG)-like beta-sandwich domain; ZnR – zinc ribbon domain; SPB – SmpB-like domain; RTL – RNase toxin-like domain; HEPN – HEPN family nuclease domain; OB/stalk – OB-fold domain attached to a helical stalk-like extension of the ATPase; Vsr – very-short-patch-repair PD-D/ExK nuclease-like domain; PLD – Phospholipase D family nuclease domain; HEAT – HEAT-like helical repeats. These structures were visualized with ChimeraX (Pettersen et al., 2021).

Complex operonic associations of Type II CoCoNuTs

Type II CoCoNuTs are frequently associated with RtcR homologs, and in many cases, ancillary defense genes are located between the RtcR gene and the CoCoNuT, almost always oriented in the same direction in an apparent superoperon. Abbreviations of domains: YprA – YprA-like helicase domain; DUF1998 – domain often found in or associated with helicases that contains four conserved, putatively metal ion-binding cysteine residues; PLD – Phospholipase D family nuclease domain; SWI2/SNF2 – SWI2/SNF2-family ATPase; HsdR/M/S – Type I RM system restriction, methylation, and specificity factors; ShdA – Shield system core component ShdA; TPR – Tetratricopeptide repeat protein; MBL fold – Metallo-beta-lactamase fold; 4 TM domain – Protein with 4 predicted transmembrane helices; Mod/Res – Type III RM modification and restriction factors.