Negative-stain data of AdhE spirosomes indicate that conformation differs between bacteria.

A) Sample negative-stain images. Ct = C. thermocellum, Ec = E. coli. B) Measurement of spirosome conformation as determined in at least 500 instances. C) Violin plot of the lengths of spirosomes measured in 1250 instances – example spirosomes to the left of each group represent the majority conformation in the apo state. The difference in mean length was statistically significant for all three samples.

Cryo-EM structure of the extended C. thermocellum AdhE spirosome.

A) Representative cryo-EM image. B) 2D class averages from the single-particle analysis. C) Local resolution of the final map. D) Cartoon model fit into the density. E) Stick representation showing that side chains can be resolved at this resolution; the sphere represents the catalytic Fe in the ADH domain. F) The dimer interface shows domain-swapped dimerization with a large buried surface area. G) The tetramer interface has a smaller buried surface area.

Hydrogen bond networks are more prevalent in the extended spirosomes of both E. coli and C. thermocellum.

A) C. thermocellum extended spirosome. B) E. coli extended spirosome from PDB ID 7BVP. C) SWISS-MODEL of the C. thermocellum compact spirosome. D) E. coli compact spirosome from PDB ID 6AHC. Individual monomers have unique colors; the E. coli monomers are colored in lighter shades of the color used for the corresponding C. thermocellum monomer. All salt bridges and hydrogen bonds are represented with spheres to emphasize their locations in the ribbon structure.

Previously identified mutants in E. coli compared to the C. thermocellum structure.

A) Top row represents the mutant E. coli F670 that disrupts spirosomes, as well as E. coli F670/S705 that lengthens spirosomes as found in PDB ID 7BVP. B) Bottom row represents the E. coli 446-449 linker deletion as found in PDB ID 6TQH.

Active sites of C. thermocellum AdhE compared to E. coli AdhE.

A) ALDH NAD+-binding domain, where all residues within 2.5 Å of the NAD+ are shown as sticks (Ct – green, NAD+ – navy). B) ADH NAD+-binding domain, where all residues within 2.5 Å of the NAD+ are shown as sticks (Fe2+ – rust sphere). C) Sequence alignment of the active site residues. D) Surface representation of the novel NAD(P)H binding site (magenta) shown between two AdhE monomers (blue and yellow). E) ALDH binding domain compared to R. palustris (Rp) bound to acetyl-CoA, where all residues within 2.5 Å of the docked Rp acetyl-CoA are shown as sticks (Ct – green, acetyl-CoA – pink). F) Zoom panel from C showing the catalytic cysteine in relation to the docked acetyl-CoA (Ct – green, Ec – cyan, Rp – magenta). G) Surface view of C. thermocellum with both NAD+ (Ec, navy) and acetyl-CoA (Rp, pink) docked into the structure; the white circle indicates a clash between NAD+ and C. thermocellum. H) Density of the C. thermocellum ADH active site, showing the coordinated Fe2+ atom and an empty density between H734, D844, and C846, circled in red. I) Snapshot from molecular dynamics simulation at the ADH active site, illustrating hydrogen bonding between water molecules occupying the active site and the three catalytic residues (H734, D844, and C846, green sticks). Also shown are NADH molecules (navy sticks), H664, H730, and H744 (green sticks), and zinc (gray sphere).

Analysis of the residues lining the spirosome channel.

A) Sequence homology of the channel from both C. thermocellum and E. coli; residues identified only in the Ct channel are green and those identified only in the Ec channel are cyan. B) Log2-fold difference in residue representation between the residues that line the channel and the full-length protein. The scale of change is color-coded from deep red for fewer residues represented in the channel to deep blue for more residues represented in the channel.
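As a rough illustration of the quantity in panel B, the log2-fold difference for each amino acid can be sketched as the log2 ratio of its frequency among channel-lining residues to its frequency in the full-length sequence. The function name and the toy sequences below are hypothetical and are not taken from the study, and the handling of residues absent from the channel (reported as negative infinity) is an assumption.

```python
import math
from collections import Counter

def log2_fold_enrichment(channel_residues, full_sequence):
    """Return {amino acid: log2(frequency in channel / frequency in full sequence)}.

    Assumption: residues present in the full sequence but absent from the
    channel are reported as -inf rather than smoothed with a pseudocount.
    """
    channel_counts = Counter(channel_residues)
    full_counts = Counter(full_sequence)
    result = {}
    for aa, n_full in full_counts.items():
        f_full = n_full / len(full_sequence)
        f_channel = channel_counts[aa] / len(channel_residues)
        result[aa] = math.log2(f_channel / f_full) if f_channel > 0 else float("-inf")
    return result

# Hypothetical toy data: an 18-residue "protein" and 4 channel-lining residues.
toy_full_sequence = "MKVLAAGHHDEKLLAAGG"
toy_channel_residues = "HHDK"
toy_enrichment = log2_fold_enrichment(toy_channel_residues, toy_full_sequence)
```

Positive values correspond to residues over-represented in the channel (blue in the color scale) and negative values to under-represented residues (red).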

Molecular dynamics (MD) of aldehyde channeling in C. thermocellum AdhE spirosomes.

A) Starting configuration for MD simulation of the C. thermocellum extended spirosome structure overlaid with the channel connecting the ALDH and ADH active sites as determined by MOLE (shown in gold spheres) (36). Also shown are C252 (representing the ALDH active site, green sticks), three histidine residues that coordinate the divalent metal at the ADH active site (His664, His730, His744, yellow sticks), zinc ion (purple sphere), the starting location for acetaldehyde (violet spheres), and the tertiary AdhE structure (each AdhE molecule colored distinctly). B) Same as panel A, except the MOLE tunnel is removed and the acetaldehyde location from the MD simulation is shown every 1 ns for 160 ns. In this simulation, the acetaldehyde molecule exits the enzyme into solution after about 125 ns. C) Representations are the same as in A, but here the channel shown is determined by MOLE for the compact spirosome structure (E. coli PDB ID 6TQM, aligned with the C. thermocellum AdhE homology structure). D) Same as panel C, except the MOLE tunnel is replaced with the acetaldehyde location from the MD simulation, shown every 1 ns. In this simulation, the acetaldehyde molecule exits the enzyme into solution at about 4 ns. E) Average residence time for acetaldehyde in the channel before exiting AdhE in MD simulations for two different starting configurations of the AdhE spirosome (extended and compact) and of acetaldehyde (at the ALDH active site and midway along the channel). Error bars are the standard deviation for simulations performed in triplicate.
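To make the quantity in panel E concrete, the sketch below shows one way a residence time and its triplicate mean and standard deviation could be computed from per-frame distances. The exit criterion (a fixed distance cutoff), the function name, and all numbers are hypothetical and are not taken from the simulations reported here. The sample standard deviation is assumed, since the legend does not specify which form was used.

```python
from statistics import mean, stdev

def exit_time_ns(distances_angstrom, cutoff_angstrom, frame_interval_ns=1.0):
    """Return the time (ns) of the first frame in which the ligand is farther
    than the cutoff from the channel, or None if it never leaves.

    Assumption: "exit" is defined by a simple distance cutoff.
    """
    for frame, distance in enumerate(distances_angstrom):
        if distance > cutoff_angstrom:
            return frame * frame_interval_ns
    return None

# Hypothetical ligand-to-channel distances (Å), one frame per ns, three replicates.
toy_replicates = [
    [2.0, 3.0, 4.0, 12.0, 15.0],
    [2.0, 2.5, 11.0, 14.0, 20.0],
    [1.5, 3.5, 5.0, 6.0, 13.0],
]
toy_exit_times = [exit_time_ns(d, cutoff_angstrom=10.0) for d in toy_replicates]

# Mean residence time and sample standard deviation across the triplicate.
toy_mean = mean(toy_exit_times)
toy_sd = stdev(toy_exit_times)
```

A run in which the ligand never leaves returns None and would need separate handling (for example, reporting the full simulation length as a lower bound) before averaging.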

CryoDRGN results show that there is conformational heterogeneity in the sample.

A) UMAP representation of the particles shows continuous heterogeneity (bottom right corner). The colored UMAP graph shows the division of the particles into five classes, separated into individual maps by color. B) One representative density for each group (except the junk class) shown at two angles to illustrate the movement of the spirosome (purple – extended, lavender – intermediate 1, light blue – intermediate 2, teal – compact). C) Back-projected template classes generated in cryoSPARC from PDB ID 6AHC (top) and the matching classes selected from our data (bottom). D) Two angles of the final compact spirosome density shown with local resolution coloring as indicated by the scale bar in the center.

Comparing structural properties of the interfaces of the C. thermocellum and E. coli AdhE spirosomes

A chart examining the dimer and tetramer interfaces of the C. thermocellum (Ct) and E. coli (Ec) extended and compact structures. Ec extended – PDB ID 7BVP, Ec compact – PDB ID 6AHC, Ct compact – SWISS-MODEL. Scale bars for the electrostatics, hydrophobicity, and B-factor are included at the bottom of each cell.

Distribution of amino acids found in spirosome interfaces

Bar graph representing the amino acid types found in the interfaces. “Special” amino acids are cysteine, glycine, and proline.

AdhE active site consensus sequences

Graphs showing the prevalence of residues at positions lining the NADH binding pocket of both the ALDH and ADH domains in 1000 AdhE sequences. The position on the x-axis represents the location in C. thermocellum. The residue listed at the top of each bar indicates the consensus residue, as well as the percentage of consensus.

Consensus sequences of the AdhE channel-lining residues

Graphs showing the prevalence of residues at positions lining the extended and compact channels in 1000 AdhE sequences. The position on the x-axis represents the location in C. thermocellum. The residue listed at the top of each bar indicates the consensus residue, as well as the percentage of consensus.
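For readers unfamiliar with how a consensus residue and its percentage are obtained, a minimal sketch follows: for each alignment column, the most common residue is taken as the consensus and its share of the non-gap residues as the percentage. The function name and the four-sequence toy alignment are hypothetical. Excluding gaps from the denominator is an assumption, and mapping alignment columns back to C. thermocellum numbering is not shown.

```python
from collections import Counter

def column_consensus(aligned_sequences, column):
    """Return (consensus residue, percent of sequences) for one alignment column.

    Assumption: gap characters ('-') are ignored when computing the percentage.
    """
    residues = [seq[column] for seq in aligned_sequences if seq[column] != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, 100.0 * count / len(residues)

# Hypothetical toy alignment of four sequences, four columns each.
toy_alignment = ["GHDC", "GHDC", "AHEC", "G-DC"]
toy_consensus = [column_consensus(toy_alignment, i) for i in range(4)]
```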

Molecular dynamics (MD) of aldehyde channeling in E. coli

A) Starting configuration for MD simulation of the E. coli extended spirosome structure overlaid with the channel connecting the ALDH and ADH active sites as determined by MOLE (36) (shown in gold spheres, structure used is PDB code 7BVP). Also shown are C246 (representing the ALDH active site, green sticks), three histidine residues that coordinate the divalent metal at the ADH active site (His657, His723, His737, yellow sticks), zinc ion (purple sphere), the starting location for acetaldehyde (violet spheres), and the tertiary AdhE structure (each AdhE molecule colored distinctly). B) Same as panel A, except the MOLE tunnel is removed and the acetaldehyde location from the MD simulation is shown every 1 ns for 200 ns. In this simulation, the acetaldehyde molecule remains within the enzyme for the full 200 ns simulation. C) Representations are the same as in A, but here the channel shown is determined by MOLE for the compact spirosome structure (structure used is PDB code 6TQM, aligned with the homology structure for the compact C. thermocellum spirosome). D) Same as panel C, except the MOLE tunnel is replaced with the acetaldehyde location from the MD simulation, shown every 1 ns. In this simulation, the acetaldehyde molecule exits the enzyme into solution at about 8 ns. E) Average residence time for acetaldehyde in the channel before exiting AdhE in MD simulations for two different starting configurations of the AdhE spirosome (extended and compact) and of acetaldehyde (at the ALDH active site and midway along the channel). Error bars are the standard deviation for simulations performed in triplicate. F) Channel radius profile from MOLE for the E. coli extended cryo-EM structure (PDB code 6TQH). G) Channel radius profile from MOLE for the C. thermocellum extended cryo-EM structure. H) To give a sense of the overall MD system setup, the extended E. coli spirosome (PDB code 6TQH) is shown as an example (solvent omitted for clarity). Two full AdhE molecules are shown in yellow and green. Light blue and royal blue indicate the two capping ADH domains. The active sites of the ALDH and ADH domains are indicated. Also shown is the tunnel found by MOLE that connects the two active sites. This panel is analogous to panel A, shown at larger scale.

Sequences of proteins expressed in this manuscript

The full sequences of the E. coli and C. thermocellum AdhE genes. The 6-His tag is highlighted in cyan and the TEV protease cleavage site is highlighted in green.

Lower-resolution C. thermocellum AdhE structure

A) Density of the 3.8 Å AdhE structure colored by local resolution. B) Initial model fit into the density (gray). This model was used as the template for the final structure model.

Data processing of the C. thermocellum AdhE structure

A) Pipeline of data processing indicating the number of particles at each major step, as well as resolutions pre- and post-polishing. B) Final FSC curve of the masked density, indicating that the structure is resolved to 3.28 Å at an FSC of 0.143. C) SMOC correlation of the residue-to-density fit for all 6 chains as calculated by TEMPy local in CCPEM.
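As an illustration of how a resolution is read from an FSC curve such as the one in panel B, the sketch below finds the spatial frequency at which the curve first drops below the 0.143 threshold (by linear interpolation between sampled points) and reports its reciprocal in Å. The function name and the toy curve are hypothetical and do not reproduce the data in this figure. Processing packages may use a different interpolation scheme.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (Å) where the FSC first falls below the threshold.

    spatial_freqs are in 1/Å and must be increasing; the crossing point is
    found by linear interpolation between the two bracketing samples.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc_values)):
        prev, curr = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > curr:
            frac = (prev - threshold) / (prev - curr)
            freq = spatial_freqs[i - 1] + frac * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None

# Hypothetical toy FSC curve sampled at four spatial frequencies (1/Å).
toy_freqs = [0.10, 0.20, 0.30, 0.40]
toy_fsc = [1.00, 0.90, 0.243, 0.043]
toy_resolution = resolution_at_threshold(toy_freqs, toy_fsc)
```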

Plasmids, primers, and strains used in this study

Data collection and processing