Structural evolution of nitrogenase enzymes over geologic time

  1. Department of Bacteriology, University of Wisconsin-Madison, Madison, United States
  2. Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM)—Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria-CSIC (INIA/CSIC), Madrid, Spain
  3. Institute of Biochemistry, University of Freiburg, Freiburg, Germany
  4. Department of Chemistry and Biochemistry, Utah State University, Logan, United States

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.


Editors

  • Reviewing Editor
    Volker Dötsch
    Goethe University Frankfurt, Frankfurt am Main, Germany
  • Senior Editor
    Volker Dötsch
    Goethe University Frankfurt, Frankfurt am Main, Germany

Reviewer #1 (Public review):

This was a clearly written manuscript that did an excellent job summarizing complex data. In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph.

This work provides a useful resource for studying nitrogenase evolution. However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

Comments on revisions:

I appreciate the authors responding to my comments. I think Fig. S10 helps put the structural data into more context. It would be helpful to make clearer in the legend which proteins are being compared, especially in Fig. S10C.

Although I can see why the authors focus on the NifK extension and its potential connection to oxygen protection, I would point out that Vnf and Anf do not have this extension in their K subunit, and you find both Vnf and Anf in aerobic and facultative anaerobic diazotrophs. This is a minor point, but I think it is important to mention in the discussion.

Reviewer #2 (Public review):

Summary:

This work aims to study the evolution of nitrogenases, to understand how their structure and function adapted to changes in environment, including oxygen levels and metal availability.

The study predicts > 3000 structures of nitrogenases, corresponding to extant, ancestral and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive and admirable undertaking. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

Author response:

The following is the authors’ response to the original reviews.

Reviewer #1 (Recommendations for the authors):

Line 122: There were a number of qualitative descriptors in the paper. For instance, if the authors describe a "massive campaign", how massive? How rapid? These are relative terms in this context.

We have revised the text to minimize qualitative descriptors and to provide concrete numbers where possible. The revised sentence (line 121) now reads “We began our structural investigation of nitrogenase evolutionary history by conducting a large-scale structure prediction analysis of 5378 protein structures, a more than threefold increase compared to available nitrogenase structures in the PDB. We then analyzed our phylogenetic dataset to identify notable structural changes.”

Line 179: "massively scale up" How massive?

We agree with the reviewer’s observation; in response, we have removed the phrase “massively scale up” and revised the text.

Line 182: "no compromise on alignment depth and negligible cost to prediction accuracy". How do you know this? Is this shown somewhere? Was there a comparison between known structures and the predicted structure for those nitrogenases that have structures?

In response to this comment, we have made several clarifications and revisions in the manuscript:

We modified Figure S1, which now shows the pLDDT values (the per-residue confidence metric from AlphaFold) of all our predictions. These scores are consistently high (over 90 for the D and K subunits, and approximately 90 for the H subunits) regardless of whether the recycling protocol or the bona fide protocol was used.

The reviewer’s comment showed us that Figure S1 needed to represent these values more clearly; we therefore updated it accordingly.

To prevent any misinterpretation of our claims about the accuracy and cost of the method, we have revised the text at line 179 as follows:

“In total, 2,689 unique extant and ancestral nitrogenase variants were targeted. All structures were generated in approximately 805 hours, including GPU computations and MMseqs2 alignments performed using two different protocols: one for extant or most likely ancestral sequences, and another for ancestral variants.”

To support our analyses further, Figure S10A compares our model predictions with available PDB structures for nitrogenases.

Additionally, Figure S10B compares our predicted structures with the experimental structures reported in this article. In all cases, we observe low RMSD values.

Line 220: "fall within 2 angstroms" instead of "fall 2A"?

We have updated it in the text.

Line 315: It is not clear how the binding affinities and other measurements in Figure 4 and S6C were measured, and it is not discussed in the material and methods.

We thank the reviewer for pointing out this lack of clarity. The binding affinity estimations were performed using Prodigy. We have updated the main text (see line 322) to explicitly state that binding affinities were estimated using Prodigy. In addition, we have expanded the Materials and Methods section to include additional information about the structure characterization methods (lines 745-749). Previously, these details were only noted in Supplementary Table S6.
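Binding affinity estimates of the kind shown in Figure 4 are commonly expressed either as a binding free energy ΔG or as a dissociation constant Kd, which are related by ΔG = RT ln Kd. The minimal sketch below converts between the two, assuming ΔG in kcal/mol, Kd in molar units, and a temperature of 25 °C. The helper names and the ΔG value are hypothetical illustrations, not part of Prodigy or of the authors' pipeline.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def kd_from_dg(dg_kcal_per_mol: float, temp_c: float = 25.0) -> float:
    """Dissociation constant (M) from a binding free energy, via dG = RT ln Kd."""
    temp_k = temp_c + 273.15
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


def dg_from_kd(kd_molar: float, temp_c: float = 25.0) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant (M)."""
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd_molar)


# Hypothetical value for illustration only, not a result from the study
dg = -12.0
kd = kd_from_dg(dg)
print(f"dG = {dg} kcal/mol -> Kd = {kd:.2e} M at 25 C")
```

A more negative ΔG corresponds to a smaller Kd, that is, tighter binding, so the two scales rank complexes in the same order.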

Line 510-511: "Subtle, modular structural adjustments away from the active site were key to the evolution and persistence of nitrogenases over geologic time". This seems like a bit of an overstatement. While the authors see structural differences in the ancestral nitrogenase and speculate these differences could be involved in oxygen protection, there is no evidence that the ancestral nitrogenase is more sensitive to oxygen than the extant nitrogenase.

We appreciate the reviewer’s comment. Our intention was to emphasize that subtle, modular structural adjustments might have contributed to oxygen protection rather than to assert that ancestral nitrogenases are more oxygen-sensitive than their extant counterparts. We have revised the text to clarify.

Reviewer #2 (Recommendations for the authors):

What is the reference for the measured RMSDs in Fig 2A? What is the value on the y-axis? The range of 'Count' is unclear, given that there are 5000 structures predicted in the study.

Figure 2A presents a histogram of RMSD values from all pairwise alignments among 769 structures (385 extant and 384 ancestral DDKK), totaling 591,361 comparisons. We excluded the alternative ancestral DDKK variants due to computational limitations.
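The reported comparison count can be checked against the number of structures. Assuming every one of the 769 structures (385 extant plus 384 ancestral) is aligned against every structure in the set, self-alignments included and each pair counted in both orders, the total is 769², which equals the reported 591,361. Counting each unordered pair of distinct structures once would give about half as many. A minimal Python sketch of that arithmetic:

```python
from math import comb

n_extant = 385
n_ancestral = 384
n = n_extant + n_ancestral  # structures in the Figure 2A set

# All ordered pairs, including each structure aligned against itself
ordered_with_self = n * n
# Each unordered pair of distinct structures counted once
unique_pairs = comb(n, 2)

print(n, ordered_with_self, unique_pairs)
```

Under this assumption, the "Count" axis of the histogram ranges over ordered pairs of structures rather than over individual structures.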

Similarly, what is the sequence identity in Figure 2B calculated relative to?

In Figure 2B, sequence identities are derived from pairwise comparisons across all structures in our dataset. Each value represents the identity between two specific structures, rather than being measured against a single reference.

The claim that 'structural analysis could reproduce sequence-based phylogenetic variation' should probably be tempered or qualified, given that the RMSD differences calculated are so low.

We hope that our responses to the previous comments address the concerns about the low RMSD values. We have revised the text (line 204), which now reads: “it still strongly correlates with sequence identity (Figure 2B), indicating that even minor structural variations can recapitulate sequence-based phylogenetic distinctions.”

How are binding affinities (Figure 4) calculated?

We have now clarified the binding affinity calculations in the main text. The model used is now detailed at line 322, with additional information provided in the Methods section.

Presumably, crystallized proteins (Anc1A, Anc1B, Anc2) were also among those whose structures were predicted with AF. A comparison should be provided of the predicted and crystallized structures, as this is an excellent opportunity to further comment on the reliability of AlphaFold.

In the revised manuscript, Figure S10 now presents structural comparisons between the crystallized proteins and their AlphaFold-predicted counterparts.

The labels in Figure 5B are not clear. Are the 3rd and 4th panels also comparative RMSD values? But only one complex name is provided.

We appreciate this feedback and have revised Figure 5B for clarity.

Page 9 line 220, missing word: 'variants fall within/under 2 angstroms'

We thank the reviewer for the correction; we have updated the text.
