Figures and data

Overview of the methodological pipeline for massive protein structure prediction and experimental crystallographic analysis of extant and ancestral nitrogenase enzymes.
A) Nitrogenase phylogeny built from concatenated Nif/Vnf/AnfHDK protein sequences. Clades are labeled according to the nomenclature of Raymond et al. (2004) (Nif-I: Group I Nif; Nif-II: Group II Nif; Nif-III: Group III Nif). Nif, Vnf, or Anf homologs from select model organisms are labeled with dashed lines (Avin: Azotobacter vinelandii, Cpas: Clostridium pasteurianum, Kpne: Klebsiella pneumoniae, Mace: Methanosarcina acetivorans, Rpal: Rhodopseudomonas palustris). The Anc1A/B and Anc2 ancestors targeted for crystallographic analysis are labeled with stars. B) Graphical overview of the pipeline for nitrogenase protein structure prediction and crystallization (see Materials and Methods for further details). Colored rectangles correspond to protein sequences for the H, D, and K subunits. For each ancestral node, protein structures were predicted for the most likely ancestral sequence (“ML”) and five alternative sequences (“Alt”) reconstructed from the site-wise posterior probability distributions of the ancestral sequence. The Anc1A and Anc2 enzymes were crystallized as hybrids containing an ancestral NifD subunit and wild-type (WT) NifH and NifK subunits (WT subunits are indicated by a lighter color). All predicted structures are publicly available at https://nsdb.bact.wisc.edu.

Global analyses of nitrogenase DDKK sequence and structural diversity.
A) Distribution of root mean square deviation (RMSD) values for pairwise nitrogenase structure alignments across extant and ML-ancestor pairs. B) Distribution of sequence identity and structural similarity (RMSD of aligned predicted structures) for pairwise nitrogenase alignments. C) Hierarchical clustering of predicted nitrogenase structures based on structural similarity (RMSD). Each tile in the heatmap corresponds to the RMSD between two nitrogenase structures.
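For readers unfamiliar with the two quantities plotted in this figure, the following is a minimal sketch of how RMSD and sequence identity can be computed for a pair of structures and assembled into the symmetric matrix that a heatmap such as panel C displays. It is illustrative only and is not the pipeline used in the study: the function names (rmsd, percent_identity, pairwise_matrix) are hypothetical, the coordinates are assumed to be already superposed with matched atoms in the same order, and identity is assumed to be counted over alignment columns in which neither sequence has a gap.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: the two structures are already superposed and the
    points are matched one-to-one (e.g. aligned C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))


def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: other denominators (e.g. full alignment length) are also
    in common use; this sketch picks one convention.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def pairwise_matrix(items, metric):
    """Symmetric matrix of metric(items[i], items[j]) with a zero diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = metric(items[i], items[j])
    return matrix
```

A matrix built with pairwise_matrix(structures, rmsd) could then be passed to any standard hierarchical clustering routine; the choice of linkage method is not specified in this caption and is left open here.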

Nitrogenase structure variation in a phylogenetic context.
A) Nitrogenase protein phylogeny. Branches and ancestral nodes corresponding to structural insertion events, as well as representative extant variants conserving those insertions, are highlighted and/or labeled. Clade and node colors correspond to the subunit for which an insertion is observed (i.e., blue for the D subunit, red for the K subunit). B) Elongation of the NifD C-terminus coincident with the origin of the Anf clade. C) Progressive elongation of the NifK N-terminus through the early evolution of the Nif-I clade. D) Insertion within NifD coincident with the origin of the Nif-II clade. B-D) All visualized structures are predicted unless otherwise specified with the corresponding Protein Data Bank identifier. Bound G- and H-subunit structures were not predicted together with the NifDK structures and are thus indicated with an asterisk. The binding positions of the G- and H-subunit structures are inferred based on alignment with PDB 8BOQ (Trncik et al., 2023) and PDB 1M34 (Schmid et al., 2002), respectively.

Phylogenetic patterns of nitrogenase structural attributes calculated across DDKK proteins.
A) Node taxonomy overview. B) Predicted binding affinity for extant and ancestral nitrogenase nodes mapped to the nitrogenase phylogeny. C) Overview of the D and K subunit interactions around the N-terminal insertion in Nif-I nitrogenases. The structure shown is Azotobacter vinelandii NifDK (PDB 3U7Q). Note: The number of displayed phylogenetic nodes was reduced to mitigate visual overcrowding; refer to Supplementary Figure S6 for a complete visualization of all nodes.

Crystal structures of targeted ancestral nitrogenases.
A) Nitrogenase protein phylogeny from which nitrogenase ancestral proteins were reconstructed and crystallized for structural characterization. Major clades are labeled following Garcia et al. (2023). B) Residue-level structural similarity (root mean square deviation, RMSD) between ancestral and wild-type (WT; A. vinelandii) NifDK/NifH structures. C) Spatial distribution of ancestral amino acid substitutions relative to WT. The crystallized protein complexes for Anc1A and Anc2 contain ancestral NifD and WT NifH and NifK; ancestral substitutions are therefore present only in the NifD subunit of these structures. D) Ancestral amino acid substitutions within the NifH-NifDK interface of Anc1B. Bound positions of NifH are inferred by alignment with either the nucleotide-free (PDB 2AFH (Tezcan et al., 2005)) or MgATP-bound (PDB 7UT8 (Rutledge et al., 2022)) structures. E-G) Close-up views of specific ancestral amino acid substitutions that are inferred to impact NifH-NifDK interactions.