Introduction

Oxygenic photosynthesis in cyanobacteria, algae, and land plants converts solar energy into chemical energy concomitant with the evolution of oxygen molecules (1). The light-energy conversion reactions occur within two multi-subunit membrane protein complexes known as photosystem I and photosystem II (PSI and PSII, respectively), which perform light harvesting, charge separation, and electron transfer reactions (2–5). To harvest light energy efficiently, a number of light-harvesting antenna subunits are attached to the periphery of the PSI and PSII core complexes and transfer excitation energy to the two types of photosystem cores (1). Light-harvesting antennae vary widely among photosynthetic organisms in their protein sequences and pigment compositions and can be categorized into two major groups: membrane proteins and water-soluble proteins (1).

The membrane protein category is composed mainly of the light-harvesting complex (LHC) protein superfamily (6, 7), which absorbs light energy using chlorophylls (Chls) and carotenoids (Cars). The number and types of Chls and Cars are highly diversified among LHCs, resulting in color variations in photosynthetic organisms, which can be classified into green and red lineages (8). The green lineage organisms include green algae and land plants, whereas the red lineage organisms include red algae, diatoms, haptophytes, cryptophytes, and dinoflagellates (8). LHCs that are specific to PSI (LHCIs) interact with a eukaryotic PSI monomer, thereby forming a PSI-LHCI supercomplex (9, 10), whose structures have been revealed by cryo-electron microscopy (cryo-EM) in a number of eukaryotes (9, 10). In the red lineage, the number of LHCIs and their protein and pigment compositions have been found to differ greatly among the PSI-LHCI structures of red algae (11–14), a diatom (15, 16), a cryptophyte (17), and dinoflagellates (18, 19).

We have recently shown the conservation and diversity of LHCIs among red-lineage algae based on structural and phylogenetic analyses of PSI-LHCI supercomplexes (14). This work showed that the binding sites of LHCIs to PSI were conserved to some extent among red-lineage algae, but that the evolutionary relationships among the LHCIs at these sites were weak. It is known that LHCIs have similar protein structures, especially their three transmembrane helices, among photosynthetic organisms irrespective of the green and red lineages (9, 10); nevertheless, individual LHCIs have changed their sequences and structures to adapt to their binding sites on the PSI cores during assembly of the PSI-LHCI supercomplexes. These observations raise a critical question as to how LHCIs recognize their binding sites in the PSI core.

Diatoms are among the most important phytoplankton in aquatic environments and contribute substantially to primary production in the ocean (20). Diatoms have unique LHCs called fucoxanthin Chl a/c-binding proteins (FCPs), whose pigment compositions and amino acid sequences are different from those of LHCs in land plants (21–23). The isolation, characterization, and structures of PSI-FCPI supercomplexes from the diatom Chaetoceros gracilis have been reported (15, 16, 24–29). Kumazawa et al. showed the high diversity of FCPs between two types of diatoms, C. gracilis and Thalassiosira pseudonana (30). C. gracilis and T. pseudonana have 46 and 44 FCPs, respectively, which were categorized into multiple, closely related subgroups (30), and the amino acid sequences of FCPs are not completely identical between the two diatoms. Therefore, comparing FCPIs, including their amino acid residues and protein structures, at similar binding sites in PSI-FCPIs between the two diatoms may lead to molecular insights into how FCPIs are bound to PSI. However, an overall structure of the T. pseudonana PSI-FCPI has not yet been solved.

In this study, we solved a PSI-FCPI structure of T. pseudonana CCMP1335 at a resolution of 2.30 Å by cryo-EM single-particle analysis. The structure shows a PSI-monomer core and five FCPI subunits. Structural and sequence comparisons exhibit unique protein-protein interactions of each FCPI subunit with PSI. Based on these findings, we discuss the molecular assembly and selective binding mechanisms of FCPI subunits in diatom species.

Results and discussion

Overall structure of the T. pseudonana PSI-FCPI supercomplex

The PSI-FCPI supercomplexes were purified from the diatom T. pseudonana CCMP1335 as described in the Methods section. Biochemical and spectroscopic analyses showed that this complex is intact with several bands corresponding to FCPI (Fig. S1). Cryo-EM images of the PSI-FCPI supercomplex were obtained by a JEOL CRYO ARM 300 electron microscope operated at 300 kV. The final cryo-EM map was determined at a resolution of 2.30 Å with a C1 symmetry (Fig. S2, S3, Table S1), based on the “gold standard” Fourier shell correlation (FSC) = 0.143 criterion (Fig. S3A).

The atomic model of PSI-FCPI was built based on the resultant cryo-EM map (see Methods; Fig. S3, Table S1–S3). The structure reveals a monomeric PSI core associated with five FCPI subunits (Fig. 1A, B). The five FCPI subunits were named FCPI-1 to 5 (Fig. 1A) following the naming of LHCI subunits in the PSI-LHCI structure of Cyanidium caldarium RK-1 (NIES-2137) (14); in particular, the sites of FCPI-1 and FCPI-2 in the T. pseudonana PSI-FCPI structure (Fig. 1A) corresponded to those of LHCI-1 and LHCI-2 in the C. caldarium PSI-LHCI structure. The PSI core contains 94 Chls a, 18 β-carotenes (BCRs), 1 zeaxanthin (ZXT), 3 [4Fe-4S] clusters, 2 phylloquinones, and 6 lipid molecules, whereas the five FCPI subunits contain 45 Chls a, 7 Chls c, 2 BCRs, 15 fucoxanthins (Fxs), 7 diadinoxanthins (Ddxs), and 3 lipid molecules (Table S3).

Overall structure of the PSI-FCPI supercomplex from T. pseudonana.

Structures are viewed from the stromal side (left panels) and the direction perpendicular to the membrane normal (right panels). Only protein structures are shown, and cofactors are omitted for clarity. The FCPI (A) and PSI-core (B) subunits are labeled and colored differently. The five FCPI subunits are labeled as FCPI-1 to 5 (red) with their gene products indicated in parentheses (black) in panel A.

Structure of the T. pseudonana PSI core

The PSI core contains 12 subunits, 11 of which are identified as PsaA, PsaB, PsaC, PsaD, PsaE, PsaF, PsaI, PsaJ, PsaL, PsaM, and Psa29 (Fig. 1B), whereas the remaining one could not be assigned and was modeled as polyalanines because of insufficient map resolution for identification of this protein (Fig. S4A). The unidentified subunit was named Unknown, which is located at the same site as Psa28 in the C. gracilis PSI-FCPI (15). The protein structure of Unknown is virtually identical to that of Psa28 in the C. gracilis PSI-FCPI (Fig. S4B). Psa28 is a novel subunit found in the C. gracilis PSI-FCPI structure (15), and its name follows the nomenclature suggested previously (31). It is known that the genes encoding various PSI proteins have been designated as psaA, psaB, etc. PsaZ has been identified in PSI of Gloeobacter violaceus PCC 7421 (32, 33). After psaZ, the newly identified genes should be named psa27, psa28, etc., and the corresponding proteins are called Psa27, Psa28, etc. Psa27 has been identified in PSI of Acaryochloris marina MBIC11017 (34–36). Thus, we named the novel subunit Psa28 (15). In contrast, Psa28 was also named PsaR in the PSI-FCPI structure of C. gracilis (16) and the structure of a PSI supercomplex containing alloxanthin Chl a/c-binding protein (PSI-ACPI) from Chroomonas placoidea (17).

Psa29 is newly identified in the T. pseudonana PSI-FCPI structure using ModelAngelo (37) and the NCBI database (https://www.ncbi.nlm.nih.gov/) (Fig. 2). The subunit corresponding to Psa29 was found in the C. gracilis PSI-FCPI structures (15, 16), in which it was modeled as polyalanines and named Unknown1 (15) or PsaS (16). Psa29 shows a unique structure distinct from the other PSI subunits in the T. pseudonana PSI-FCPI (Fig. 2A), and engages in multiple interactions with PsaB, PsaC, PsaD, and PsaL at distances of 2.5–3.2 Å (Fig. 2B–G). Based on sequence analyses, Psa29 shows evolutionary divergence between Bacillariophyceae (diatoms) and Bolidophyceae, the latter of which is a sister group of diatoms within Stramenopiles (Fig. 2H); however, it was not found in other organisms. The arrangement of each PSI subunit in the T. pseudonana PSI-FCPI is virtually identical to that in the C. gracilis PSI-FCPI structures already reported (15, 16).

Structure and diversity of Psa29.

(A) Structure of Psa29 depicted as cartoons. Psa29 was modeled from V47 to L178. (B) Cryo-EM map of Psa29 and its surrounding environment viewed from the stromal side. The red squared areas are enlarged in panels C–G. Yellow, PsaB; cyan, PsaC; blue, PsaD; magenta, PsaL; orange, Psa29. (C–G) Protein-protein interactions of Psa29 with PsaB/PsaC (C), PsaC/PsaD (D), PsaB/PsaD (E), PsaB/PsaD/PsaL (F), and PsaD (G). Interactions are indicated by dashed lines, and the numbers are distances in Å. Amino acid residues participating in the interactions are labeled; for example, A71/C means Ala71 of PsaC. B, PsaB; C, PsaC; D, PsaD; L, PsaL; 29, Psa29. (H) Phylogenetic analysis of Psa29 in photosynthetic organisms. A maximum-likelihood tree of the Psa29 proteins was inferred by IQ-TREE v2.2.2.7 using the WAG+F+G4 model and a trimmed alignment of 22 sequences with 245 amino-acid residues. Numbers at nodes are ultrafast bootstrap support (%) (1000 replicates). The tree was mid-point rooted between diatoms and Bolidophyceae Parmales. Psa29 of T. pseudonana CCMP1335 is indicated by a red line.

The number and arrangement of Chls and Cars within the PSI core in the T. pseudonana PSI-FCPI structure (Fig. S4C, D) are largely similar to those in the C. gracilis PSI-FCPI structure (15). However, Chl a102 of PsaI was found in the T. pseudonana PSI-FCPI structure but not in the C. gracilis PSI-FCPI structure (15), whereas Chl a845 of PsaA and BCR863 of PsaB were identified in the C. gracilis PSI-FCPI structure (15) but not in the T. pseudonana PSI-FCPI structure. One of the Car molecules in PsaJ was identified as ZXT103 in the T. pseudonana PSI-FCPI structure, whereas the corresponding molecule is BCR112 in the C. gracilis PSI-FCPI structure (15).

Structure of the T. pseudonana FCPIs

Kumazawa et al. classified 44 Lhc genes in T. pseudonana with their characteristic names of Lhcf, Lhcq, Lhcr, Lhcx, and Lhcz (30). Based on this report, the five FCPI subunits in the PSI-FCPI structure were identified using five genes, namely, RedCAP, Lhcr3, Lhcq10, Lhcf10, and Lhcq8, at the sites of FCPI-1 to 5, respectively (Fig. 1A). It should be noted that RedCAP is not included in the 44 Lhc genes (30) but grouped into the LHC protein superfamily (6, 7). For the assignments of each FCPI subunit, we focused on characteristic amino acid residues based on their cryo-EM map, especially S61/V62/Q63 in FCPI-1; A70/R71/W72 in FCPI-2; Y64/R65/E66 in FCPI-3; M63/R64/Y65 in FCPI-4; and A62/R63/R64 in FCPI-5 (Fig. S5). The root mean square deviations (RMSDs) of the structures between FCPI-4 and the other four FCPIs range from 1.91 to 3.73 Å (Table S4).

Each FCPI subunit binds several Chl and Car molecules, namely, 7 Chls a/1 Chl c/2 Fxs/3 Ddxs/2 BCRs in FCPI-1; 10 Chls a/1 Chl c/3 Fxs/1 Ddx in FCPI-2; 7 Chls a/3 Chls c/2 Fxs/2 Ddxs in FCPI-3; 11 Chls a/2 Chls c/4 Fxs in FCPI-4; and 10 Chls a/4 Fxs/1 Ddx in FCPI-5 (Fig. S6, Table S3). The axial ligands of the central Mg atoms of Chls within each FCPI are provided mainly by the main and side chains of amino acid residues (Table S5). Possible excitation-energy-transfer pathways can be proposed based on close physical interactions among Chls between FCPI-3 and PsaA, between FCPI-3 and PsaL, between FCPI-1 and PsaI, and between FCPI-2 and PsaB (Fig. S7).
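The per-subunit pigment counts listed above can be cross-checked against the totals stated for the five FCPI subunits (45 Chls a, 7 Chls c, 2 BCRs, 15 Fxs, and 7 Ddxs). The following is a minimal bookkeeping sketch in Python; the dictionary layout and the function name `total_pigments` are illustrative choices, not part of the deposited data.

```python
from collections import Counter

# Pigment counts per FCPI subunit, copied from the text (Fig. S6, Table S3).
FCPI_PIGMENTS = {
    "FCPI-1": {"Chl a": 7, "Chl c": 1, "Fx": 2, "Ddx": 3, "BCR": 2},
    "FCPI-2": {"Chl a": 10, "Chl c": 1, "Fx": 3, "Ddx": 1},
    "FCPI-3": {"Chl a": 7, "Chl c": 3, "Fx": 2, "Ddx": 2},
    "FCPI-4": {"Chl a": 11, "Chl c": 2, "Fx": 4},
    "FCPI-5": {"Chl a": 10, "Fx": 4, "Ddx": 1},
}


def total_pigments(per_subunit):
    """Sum pigment counts over all subunits (hypothetical helper)."""
    totals = Counter()
    for counts in per_subunit.values():
        totals.update(counts)  # Counter.update adds counts key by key
    return dict(totals)


totals = total_pigments(FCPI_PIGMENTS)
for pigment, count in sorted(totals.items()):
    print(f"{pigment}: {count}")
```

The summed values agree with the totals reported for the five FCPI subunits in the overall-structure section.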

Structural characteristics of RedCAP and its evolutionary implications

Among the FCPI subunits, only FCPI-1 has BCRs in addition to Fxs and Ddxs (Fig. S6A). This is the first report of the binding of BCRs to FCPIs in diatoms. FCPI-1 is a RedCAP, which belongs to the LHC protein superfamily but is distinct from the LHC protein family (6, 7). FCPI-1 is located near PsaB, PsaI, and PsaL through protein-protein interactions with these subunits at the stromal and lumenal sides (Fig. 3A). I138 and S139 of FCPI-1 interact with K121, G122, and F125 of PsaL at the stromal side (Fig. 3B), whereas at the lumenal side, multiple interactions are found between I109 of FCPI-1 and F5 of PsaI, between T105/L106/T108 of FCPI-1 and W92/P94/F96 of PsaB, and between E102/W103 of FCPI-1 and S71/I73 of PsaL (Fig. 3C). The protein-protein interactions at the lumenal side (Fig. 3C) appear to be caused by a loop structure of FCPI-1 from Q96 to T116 (pink in Fig. 3D), which is characteristic of FCPI-1 and is not present in the other four FCPI subunits (pink in Fig. 3E). This loop structure is inserted into a cavity formed by PsaB, PsaI, and PsaL (Fig. 3C, D). These findings indicate that the Q96–T116 loop recognizes the cavity for its binding at the specific FCPI-1 site.

Structural characteristics of FCPI-1 (RedCAP).

(A) Interactions of FCPI-1 with PsaB, PsaI, and PsaL viewed from the stromal (left) and lumenal (right) sides. The areas encircled by black squares are enlarged in panels B, C. Yellow, PsaB; magenta, PsaI; dark red, PsaL; red, FCPI-1. (B, C) Protein-protein interactions of FCPI-1 with PsaL (B) and PsaB/PsaI/PsaL (C), respectively. Interactions are indicated by dashed lines, and the numbers are distances in Å. Amino acid residues participating in the interactions are labeled; for example, S139/1 means Ser139 of FCPI-1. B, PsaB; I, PsaI; L, PsaL; 1, FCPI-1. (D) Characteristic loop structure from Q96 to T116 in FCPI-1 viewed from the lumenal side. Q96 and T116 are labeled with sticks and the Q96–T116 loop is colored pink. (E) Superpositions of FCPI-1 (red) with FCPI-2 (green), FCPI-3 (blue), FCPI-4 (magenta), and FCPI-5 (orange). Only proteins are depicted. The Q96–T116 loop of FCPI-1 is colored pink and Q96 and T116 are labeled with sticks.

RedCAP of C. gracilis (CgRedCAP) was not identified in the C. gracilis PSI-FCPI structures (15, 16). As discussed previously (14), we identified CgRedCAP by sequence analysis and suggested the binding of CgRedCAP to the C. gracilis PSI core at the site similar to LHCI-1 of the red alga C. caldarium PSI-LHCI, whose site corresponds to the FCPI-1 site in the PSI-FCPI of T. pseudonana in this study. A sequence alignment between RedCAP of T. pseudonana (TpRedCAP) and CgRedCAP is shown in Fig. S8A, which shows that CgRedCAP has a sequence similarity of 72% to TpRedCAP and has a protein motif of Q106–I113 (QWGTLATI) corresponding to E102–I109 (EWGTLATI) of TpRedCAP (Fig. 3C). This suggests the possible binding of CgRedCAP to PSI of C. gracilis at the position similar to FCPI-1 in the T. pseudonana PSI-FCPI structure. However, it is unknown (i) whether CgRedCAP is indeed bound to the C. gracilis PSI-FCPI supercomplex and (ii) whether a loop structure corresponding to the Q96–T116 loop of TpRedCAP exists in CgRedCAP. Further structural study of the C. gracilis PSI-FCPI will pave the way for elucidating the mechanism of molecular assembly of diatom RedCAPs.

RedCAPs have been found in the structures of PSI-LHCI of the red alga Porphyridium purpureum (13) and a PSI supercomplex with alloxanthin Chl a/c-binding proteins (PSI-ACPI) of the cryptophyte Chroomonas placoidea (17), which have been summarized in our previous study (14). The RedCAPs of both P. purpureum (PpRedCAP) and C. placoidea (CpRedCAP) display loop structures similar to the Q96–T116 loop in TpRedCAP found in the present study (Fig. S8B). Multiple sequence alignments of TpRedCAP with PpRedCAP and CpRedCAP are shown in Fig. S8C. PpRedCAP and CpRedCAP show sequence similarities of 39% and 60%, respectively, to TpRedCAP. PpRedCAP has a protein motif of V105–L112 (VWGPLAQL), while CpRedCAP has a protein motif of Q117–A124 (QWGPLASA). These two motifs correspond to E102–I109 (EWGTLATI) of TpRedCAP; however, the conservation of sequences is lower between TpRedCAP and PpRedCAP/CpRedCAP than between TpRedCAP and CgRedCAP. Among the four RedCAPs, the four amino acids of Trp, Gly, Leu, and Ala are conserved in the protein motifs (xWGxLAxx), implying that the characteristic loop structure including the conserved protein motifs (xWGxLAxx) contributes to the binding of RedCAP to PSI among the red-lineage algae.
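The xWGxLAxx consensus can be reproduced directly from the four eight-residue motifs quoted in the text. The sketch below is illustrative only: it assumes the four segments are already aligned position by position, as written above, and the names `SEGMENTS`, `consensus`, and `MOTIF` are hypothetical.

```python
import re

# Eight-residue motifs quoted in the text for the four RedCAPs.
SEGMENTS = {
    "TpRedCAP": "EWGTLATI",  # E102-I109
    "CgRedCAP": "QWGTLATI",  # Q106-I113
    "PpRedCAP": "VWGPLAQL",  # V105-L112
    "CpRedCAP": "QWGPLASA",  # Q117-A124
}


def consensus(segments):
    """Keep a residue where all sequences agree; write 'x' elsewhere."""
    columns = zip(*segments)  # iterate over aligned positions
    return "".join(col[0] if len(set(col)) == 1 else "x" for col in columns)


motif_consensus = consensus(SEGMENTS.values())
print(motif_consensus)

# The same consensus as a regular expression ('.' stands for any residue).
MOTIF = re.compile(r".WG.LA..")
print(all(MOTIF.fullmatch(seq) for seq in SEGMENTS.values()))
```

A pattern of this kind could in principle be used to scan other LHC superfamily sequences for candidate RedCAP-like loops, although a motif match alone would not establish binding to PSI.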

Protein-protein interactions of other FCPI subunits

FCPI-2 (Lhcr3) is located near PsaB and PsaM through protein-protein interactions with these subunits at distances of 3.0–4.3 Å at the stromal and lumenal sides (Fig. 4). The amino acid residues of I63/T65/D66/Y69/W134/Y138/D140 in FCPI-2 are associated with W153/L154/K159/F160/W166 in PsaB at the stromal side (Fig. 4B), whereas F116 and F120 of FCPI-2 interact with F5/I9/M12 of PsaM at the lumenal side (Fig. 4C). The amino acid sequences of I63–Y69, F116–F120, and W134–D140 in Lhcr3 are not conserved in the Lhcr subfamily comprising Lhcr1, Lhcr4, Lhcr7, Lhcr11, Lhcr12, Lhcr14, Lhcr17, Lhcr18, Lhcr19, and Lhcr20 according to Kumazawa et al. (30) (Fig. S9).

Structural characteristics of FCPI-2.

(A) Interactions of FCPI-2 with PsaB and PsaM viewed from the stromal (left) and lumenal (right) sides. The areas encircled by black squares are enlarged in panels B, C. The PSI and FCPI subunits are colored grey and yellow, respectively. Protein-protein interactions are shown as different colors: green, FCPI-2; cyan, PsaB; pink, PsaM. (B, C) Protein-protein interactions of FCPI-2 with PsaB (B) and PsaM (C), respectively. Interactions are indicated by dashed lines, and the numbers are distances in Å. Amino acid residues participating in the interactions are labeled; for example, Y138/2 means Tyr138 of FCPI-2. B, PsaB; M, PsaM.

FCPI-3 (Lhcq10) is located near PsaL through protein-protein interactions with it at distances of 2.3–4.2 Å at the stromal side (Fig. 5A, B). The amino acid residues of L126/I130/L142/Y146/W147/V148/W155 in FCPI-3 are associated with F4/K6/P20/S25/L26/L30 in PsaL (Fig. 5B). Because TpLhcq10 is homologous to CgLhcr9 (30), we compared the amino acid sequence of Lhcq10 with the Lhcq and Lhcr subfamilies in T. pseudonana (Fig. S10A, B). It was shown that the amino acid sequence of L126–W155 in Lhcq10 is not conserved in the Lhcq subfamily consisting of Lhcq1, Lhcq2, Lhcq3, Lhcq4, Lhcq5, Lhcq6, Lhcq7, Lhcq8, and Lhcq9 (Fig. S10A), as well as in the Lhcr subfamily consisting of Lhcr1, Lhcr3, Lhcr4, Lhcr7, Lhcr11, Lhcr12, Lhcr14, Lhcr17, Lhcr18, Lhcr19, and Lhcr20 according to Kumazawa et al. (30) (Fig. S10B).

Structural characteristics of FCPI-3, 4, and 5.

(A) Interactions among FCPIs and between FCPIs and PsaL viewed from the stromal (left) and lumenal (right) sides. The areas encircled by black squares are enlarged in panels B–D. The PSI and FCPI subunits are colored grey and yellow, respectively. Protein-protein interactions are shown as different colors: blue, FCPI-3; magenta, FCPI-4; orange, FCPI-5; purple, PsaL. (B–D) Protein-protein interactions between FCPI-3 and PsaL (B), between FCPI-4 and FCPI-5 (C), and between FCPI-5 and PsaL (D), respectively. Interactions are indicated by dashed lines, and the numbers are distances in Å. Amino acid residues participating in the interactions are labeled; for example, L126/3 means Leu126 of FCPI-3. 4, FCPI-4; 5, FCPI-5; L, PsaL.

FCPI-4 (Lhcf10) is located near FCPI-5 through protein-protein interactions with it at distances of 2.6–3.6 Å at the lumenal side (Fig. 5A, C). The amino acid residues of Y196/P198/F199 in FCPI-4 interact with F82/F86/G87 in FCPI-5 (Fig. 5C). The amino acid sequence of Y196–F199 in Lhcf10 is not conserved in the Lhcf subfamily comprising Lhcf1, Lhcf2, Lhcf3, Lhcf4, Lhcf5, Lhcf6, Lhcf7, Lhcf8, Lhcf9, Lhcf11, and Lhcf12 according to Kumazawa et al. (30) (Fig. S11).

FCPI-5 (Lhcq8) is located near PsaL and FCPI-4 through protein-protein interactions with them at distances of 2.6–4.1 Å at the stromal and lumenal sides (Fig. 5A, C, D). The amino acid residues of P108/Q109/A112/I115 in FCPI-5 are associated with F134/I137/S141 in PsaL at the lumenal side (Fig. 5D). The interactions of FCPI-5 with FCPI-4 are shown in Fig. 5C. The amino acid sequences of F82–G87 and P107–I115 in Lhcq8 are not conserved in the Lhcq subfamily consisting of Lhcq1, Lhcq2, Lhcq3, Lhcq4, Lhcq5, Lhcq6, Lhcq7, Lhcq9, and Lhcq10 according to Kumazawa et al. (30) (Fig. S12A, B).

Molecular insights into the assembly of FCPIs in diatom PSI-FCPI supercomplexes

To evaluate the molecular assembly of FCPI subunits in the T. pseudonana PSI-FCPI structure, we focused on both protein-protein interactions based on their close distances (Fig. 3–5) and amino acid residues in regions that are not conserved among the 44 FCPs (Fig. S9–S12). This is because selective associations of FCPIs with PSI require specific amino acid residues of each FCPI. The protein-protein interactions among FCPI subunits and between FCPI and PSI subunits exist at the stromal and lumenal sides (Fig. 3–5) and appear to be mediated by unique amino acid residues of FCPIs, which are not conserved in each LHC subfamily (Fig. S9–S12). Thus, the binding and assembly of each FCPI subunit to PSI are likely determined by the differences in amino acid sequences among the 44 FCPs in T. pseudonana, especially in their loop regions.

The diatom C. gracilis has shown two different PSI-FCPI structures, one with 16 FCPI subunits (15) and the other with 24 FCPI subunits (16). This is because of alterations of antenna sizes of FCPIs in the C. gracilis PSI-FCPI supercomplexes in response to growth conditions, especially CO2 concentrations and temperatures (38). The C. gracilis PSI-FCPI structure has five FCPI subunits at the same binding sites of FCPI-1 to 5 in the T. pseudonana PSI-FCPI structure (Fig. 6A). A relationship of the Lhc genes encoding FCPs and RedCAP with the binding positions of FCPI-1 to 5 in T. pseudonana and C. gracilis is summarized in Fig. 6B. It should be noted that the gene names of the C. gracilis FCPIs are based on Kumazawa et al. (30) as discussed in our recent study (14).

Comparisons of structures and sequences of FCPIs in the PSI-FCPI structures between T. pseudonana and C. gracilis.

(A) Superposition of the PSI-FCPI structures between T. pseudonana and C. gracilis (PDB: 6LY5). The T. pseudonana and C. gracilis FCPI subunits are colored red and cyan, respectively. The structures are viewed from the stromal side. The FCPI-1 to 5 sites are labeled. (B) Correlation of the names of FCPIs in the structures with their genes between T. pseudonana and C. gracilis. The genes of FCPIs are derived from Kumazawa et al. (30) and Kato et al. (14) for C. gracilis. (C) Phylogenetic analysis of FCPs and RedCAPs from T. pseudonana (Tp) and C. gracilis (Cg). In addition to the RedCAP subfamily, 44 TpFCPs and 46 CgFCPs are grouped into five Lhc subfamilies and CgLhcr9 homologs. Maroon, RedCAP subfamily; magenta, Lhcq subfamily; red, Lhcz subfamily; orange, Lhcr subfamily; brown, CgLhcr9 homologs; green, Lhcf subfamily; blue, Lhcx subfamily. The FCPs and RedCAPs located at the FCPI-1 to 5 sites are labeled. The tree was inferred by IQ-TREE 2 (58) using the Q.pfam + R4 model selected with ModelFinder (59). The light purple circular symbols on the tree represent bootstrap support (%).

Phylogenetic analysis showed that at the sites of FCPI-1, 2, 3, and 5 in the T. pseudonana PSI-FCPI structure, TpRedCAP, TpLhcr3, TpLhcq10, and TpLhcq8 are orthologous to CgRedCAP, CgLhcr1, CgLhcr9, and CgLhcq12, respectively (Fig. 6C). The characteristic protein loops of TpRedCAP and CpRedCAP appear to be involved in interactions with PSI at the FCPI-1 site, as described above (Fig. S8). At the FCPI-2 site, comparative analyses revealed that the amino acid residues facilitating interactions between TpLhcr3 and TpPsaB/TpPsaM closely parallel those observed in the CgLhcr1-CgPsaB and CgLhcr1-CgPsaM pairs (Fig. S13). Similarly, a high degree of similarity characterized the residues involved in the interaction pairs of TpLhcq10-TpPsaL/CgLhcr9-CgPsaL at the FCPI-3 site (Fig. S14A, B) and TpLhcq8-TpPsaL/CgLhcq12-CgPsaL at the FCPI-5 site (Fig. S14C, D). However, TpLhcf10 was not homologous to CgLhcf3 (Fig. 6C), both of which are located at the FCPI-4 site in each PSI-FCPI structure (Fig. 6A). Thus, the two diatoms appear to possess both a conserved mechanism of protein-protein interactions across characteristic protein motifs between FCPI and PSI subunits and a different interaction mechanism among FCPIs.

It is of note that the PSI-FCPI structure of C. gracilis binds many more FCPI subunits than that of T. pseudonana, namely, 16 or 24 subunits in C. gracilis found in the previous studies (15, 16) vs. 5 subunits in T. pseudonana found in the present study. We do not know the reason for this difference at present. It is possible that some of the FCPI subunits are released during detergent solubilization in T. pseudonana, whereas they are retained in C. gracilis. Alternatively, it is also possible that the number of FCPI subunits in T. pseudonana is inherently smaller than that in C. gracilis, probably due to the differences in their living environments. Further studies will be needed to clarify this question.

Extension to molecular assembly of PSI-LHCI supercomplexes

The mechanisms of protein-protein interactions for diatom PSI-FCPI supercomplexes are likely developed by specific bindings of FCPs selected from 44 TpFCPs and 46 CgFCPs in addition to RedCAPs. Like a lock-and-key mechanism, one FCP cannot be substituted by the other FCPs to form the PSI-FCPI supercomplexes in the two diatoms, e.g., TpLhcq10 cannot bind at the FCPI-2 site. The selective binding mechanism of FCPIs may dictate the fate of the molecular assembly of PSI-FCPI. It is important to note that the selective bindings of FCPIs are found for the first time by comparing structures of PSI-FCPI supercomplexes and amino acid sequences of FCPIs in the two diatom species. This approach can be extended to the LHC protein superfamily in the green and red lineages, so that comparing protein structures and sequences of PSI-LHCI supercomplexes among closely related species lays the foundation for elucidating the general mechanism of molecular assembly of PSI-LHCI supercomplexes. Thus, this study will shed light on answering the evolutionary question as to how LHCIs recognize their binding sites to PSI in photosynthetic organisms.

Methods

Cell growth and preparation of thylakoid membranes

The marine centric diatom, T. pseudonana CCMP1335, was grown in artificial seawater containing sodium metasilicate and KW21 (39) at 20 °C at a photosynthetic photon flux density of 30 μmol photons m−2 s−1 provided by white LED with bubbling of air containing 3% (v/v) CO2. The cells were harvested by centrifugation, and then disrupted by agitation with glass beads (40), followed by centrifugation to pellet the thylakoid membranes. The thylakoid membranes thus prepared were suspended in a 50 mM MES-NaOH (pH 6.5) buffer containing 1 M betaine and 1 mM EDTA.

Purification of the PSI-FCPI supercomplex

Thylakoid membranes were solubilized with 1% (w/v) n-dodecyl-β-D-maltoside (β-DDM) at a Chl concentration of 0.5 mg mL−1 for 20 min on ice in the dark with gentle stirring. After centrifugation at 162,000 × g for 20 min at 4°C, the resultant supernatant was loaded onto a Q-Sepharose anion-exchange column (1.6 cm of inner diameter and 25 cm of length) equilibrated with a 20 mM MES-NaOH (pH 6.5) buffer containing 0.2 M trehalose, 5 mM CaCl2, 10 mM MgCl2, and 0.03% β-DDM (buffer A). The column was washed with buffer A until the eluate became colorless. Elution was performed at a flow rate of 1.0 mL min−1 using a linear gradient of buffer A and buffer B (buffer A plus 500 mM NaCl): 0–600 min, 0–60% buffer B; 600–800 min, 60–100% buffer B; 800–900 min, 100% buffer B. The fraction enriched in PSI-FCPI was eluted at 194–247 mM NaCl, which was collected and loaded onto a linear gradient containing 10–40% (w/v) trehalose in a medium of 20 mM MES-NaOH (pH 6.5), 5 mM CaCl2, 10 mM MgCl2, 100 mM NaCl, and 0.03% β-DDM. After centrifugation at 154,000 × g for 18 h at 4°C (P40ST rotor; Hitachi), a green fraction (Fig. S1A) was collected and concentrated using a 150 kDa cut-off filter (Apollo; Orbital Biosciences) at 4,000 × g. The concentrated samples were stored in liquid nitrogen until use.
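The relationship between the gradient program and the NaCl concentration at which PSI-FCPI elutes can be illustrated with a short calculation. The sketch below assumes an ideal piecewise-linear gradient with no system dead volume or mixing delay, which is a simplification; the function names `percent_b`, `nacl_mm`, and `time_for_nacl` are hypothetical.

```python
# Gradient program from the text: (time in min, % buffer B) breakpoints.
GRADIENT = [(0.0, 0.0), (600.0, 60.0), (800.0, 100.0), (900.0, 100.0)]
NACL_IN_B_MM = 500.0  # buffer B = buffer A plus 500 mM NaCl


def percent_b(t_min):
    """Percentage of buffer B at time t_min by linear interpolation."""
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the gradient program")


def nacl_mm(t_min):
    """Nominal NaCl concentration (mM) delivered at time t_min."""
    return NACL_IN_B_MM * percent_b(t_min) / 100.0


def time_for_nacl(conc_mm):
    """Earliest time (min) at which the gradient reaches conc_mm NaCl."""
    target_b = 100.0 * conc_mm / NACL_IN_B_MM
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if b1 > b0 and b0 <= target_b <= b1:
            return t0 + (target_b - b0) * (t1 - t0) / (b1 - b0)
    raise ValueError("concentration not reached by the gradient")


start = time_for_nacl(194.0)
end = time_for_nacl(247.0)
print(f"Nominal elution window: {start:.0f}-{end:.0f} min")
```

Under these idealized assumptions, 194–247 mM NaCl corresponds to 38.8–49.4% buffer B, that is, roughly 388–494 min into the gradient at 1.0 mL min−1; the actual retention times on a real system would be shifted by its delay volume.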

Biochemical and spectroscopic analyses of the PSI-FCPI supercomplex

The polypeptide bands of PSI-FCPI were analyzed by SDS-polyacrylamide gel electrophoresis (PAGE) containing 16% (w/v) acrylamide and 7.5 M urea according to Ikeuchi and Inoue (41) (Fig. S1B). The PSI-FCPI supercomplexes (4 µg of Chl) were solubilized by 3% lithium lauryl sulfate and 75 mM dithiothreitol for 10 min at 60°C, and loaded onto the gel. A standard molecular weight marker (SP-0110; APRO Science) was used. The absorption spectrum of PSI-FCPI was measured at room temperature using a UV-Vis spectrophotometer (UV-2450; Shimadzu) (Fig. S1C), and the fluorescence-emission spectrum of PSI-FCPI was measured at 77 K upon excitation at 430 nm using a spectrofluorometer (RF-5300PC; Shimadzu) (Fig. S1D). The pigment composition of PSI-FCPI was analyzed by HPLC according to Nagao et al. (42), and the elution profile was monitored at 440 nm (Fig. S1E).

Cryo-EM data collection

A 3-μL aliquot of the T. pseudonana PSI-FCPI supercomplex (3.0 mg of Chl mL−1) in a 20 mM MES-NaOH (pH 6.5) buffer containing 0.5 M betaine, 5 mM CaCl2, 10 mM MgCl2, and 0.03% β-DDM was applied to Quantifoil R1.2/1.3 Cu 300 mesh grids in the chamber of FEI Vitrobot Mark IV (Thermo Fisher Scientific). Then, the grid was blotted with a filter paper for 4 sec at 4°C under 100% humidity and plunged into liquid ethane cooled by liquid nitrogen. The frozen grid was transferred into a CRYO ARM 300 electron microscope (JEOL) equipped with a cold-field emission gun operated at 300 kV. All image stacks were collected from 5 × 5 holes per stage adjustment to the central hole and image shifts were applied to the surrounding holes while maintaining an axial coma-free condition. The images were recorded with an in-column energy filter with a slit width of 20 eV and at a nominal magnification of × 60,000 on a direct electron detector (Gatan K3, AMETEK). The nominal defocus range was −1.8 to −1.2 μm. Physical pixel size corresponded to 0.752 Å. Each image stack was exposed at a dose rate of 21.46 eÅ−2sec−1 for 2.33 sec in CDS mode with dose-fractionated 50 movie frames. In total 8,950 image stacks were collected.
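The exposure parameters above imply a total dose and a per-frame dose that are often useful when reproducing the motion-correction and dose-weighting steps. The following sketch simply performs this arithmetic from the stated values; the variable names are illustrative, and the Nyquist limit is computed as twice the physical pixel size.

```python
# Acquisition parameters as stated in the text.
DOSE_RATE = 21.46      # electrons per square angstrom per second
EXPOSURE_S = 2.33      # seconds per image stack
N_FRAMES = 50          # dose-fractionated movie frames
PIXEL_SIZE_A = 0.752   # physical pixel size in angstroms

total_dose = DOSE_RATE * EXPOSURE_S      # electrons per square angstrom
dose_per_frame = total_dose / N_FRAMES   # electrons per square angstrom per frame
nyquist_a = 2.0 * PIXEL_SIZE_A           # finest resolvable spacing in angstroms

print(f"Total dose:     {total_dose:.2f} e/A^2")
print(f"Dose per frame: {dose_per_frame:.3f} e/A^2")
print(f"Nyquist limit:  {nyquist_a:.3f} A")
```

The total dose is thus about 50 e Å−2 (about 1 e Å−2 per frame), and the Nyquist limit of 1.504 Å lies well beyond the reported resolution of 2.30 Å.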

Cryo-EM image processing

The resultant movie frames were aligned and summed using MotionCor2 (43) to yield dose-weighted images. Estimation of the contrast transfer function (CTF) was performed using CTFFIND4 (44). All of the following processes were performed using RELION-4.0 (45). In total 2,733,572 particles were automatically picked up and used for reference-free 2D classification. Then, 1,132,721 particles were selected from good 2D classes and subsequently subjected to 3D classification without any symmetry. An initial model for the first 3D classification was generated de novo from 2D classification. As shown in Fig. S2C, the final PSI-FCPI structure was reconstructed from 75,667 particles. The overall resolution of the cryo-EM map was estimated to be 2.30 Å by the gold-standard FSC curve with a cut-off value of 0.143 (Fig. S3A) (46). Local resolutions were calculated using RELION (Fig. S3C).
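To make the FSC = 0.143 criterion and the particle-selection funnel concrete, the sketch below reads a resolution off an FSC curve by linear interpolation and computes the fraction of particles retained at each stage. It is not RELION's implementation; the FSC values in `example_freqs` and `example_fsc` are invented for illustration and are not the data of Fig. S3A, and the function name is hypothetical.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (angstroms) where the FSC first drops below threshold.

    freqs: spatial frequencies in 1/angstrom, increasing.
    fsc:   FSC values at those frequencies.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing shells.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            crossing = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / crossing
    return None


# Hypothetical FSC curve, for illustration only.
example_freqs = [0.1, 0.2, 0.3, 0.4, 0.5]
example_fsc = [1.0, 0.9, 0.6, 0.2, 0.05]
example_resolution = resolution_at_threshold(example_freqs, example_fsc)
print(f"Example resolution: {example_resolution:.2f} A")

# Particle counts stated in the text.
picked, after_2d, final = 2_733_572, 1_132_721, 75_667
print(f"Kept after 2D classification: {100 * after_2d / picked:.1f}%")
print(f"In final reconstruction:      {100 * final / picked:.1f}%")
```

With the stated counts, roughly 41% of the picked particles passed 2D classification and under 3% contributed to the final map.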

Model building and refinement

Two types of cryo-EM maps were used for the model building of the PSI-FCPI supercomplex: one was a postprocessed map, and the other was a map denoised using Topaz version 0.2.4 (47). The postprocessed map was denoised using a model trained for 100 epochs with the two half-maps. Initial models of each subunit in the PSI-FCPI supercomplex were built by ModelAngelo (37), and then their structures were inspected and manually adjusted against the maps with Coot (48). Each model was built based on interpretable features from the density maps at a contour level of 2.5 σ in the denoised and postprocessed maps. For the assignment of Chls, Chls a and c were distinguished by inspection of the density map corresponding to the phytol chain at the above threshold, which was found to be the lowest level at which the map of Chls did not merge with that of noise. All Chls c were assigned as Chl c1, because of the difficulty in distinguishing between Chl c1 and Chl c2 at the present resolution. For the assignment of Cars, Fx and Ddx were distinguished based on the density covering the head group of carotenoids at the above threshold. The PSI-FCPI structure was refined with phenix.real_space_refine (49) and Servalcat (50) with geometric restraints for the protein-cofactor coordination. The final model was validated with MolProbity (51), EMRinger (52), and Q-score (53). The statistics for all data collection and structure refinement are summarized in Table S1, S2. All structural figures were made by PyMOL (54), UCSF Chimera (55), and UCSF ChimeraX (56).

Since the numbering of Chls, Cars, and other cofactors in this paper differs from that in the PDB data, we list the correspondence between the numbering in this paper and that in the PDB data in Table S6–S8.

Phylogenetic analysis

Amino acid sequences were aligned using MAFFT L-INS-i v7.490 or MAFFT E-INS-i v7.520 (57). The alignment was trimmed using ClipKit v1.4.1 with smart-gap mode. The phylogenetic tree was inferred using IQ-TREE 2 (58) with the model selected by ModelFinder (59). The tree was visualized by iTOL v6 (60). Ultrafast bootstrap approximation was performed with 1000 replicates (61).
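The align, trim, and infer steps described above could be scripted roughly as follows. This is a sketch under stated assumptions rather than the commands actually used: the file names are hypothetical, the MAFFT options shown are those documented for L-INS-i, and `-m MFP` asks IQ-TREE 2 to run ModelFinder before tree inference. The commands are only assembled and printed here, not executed.

```python
import shlex


def build_pipeline(fasta, prefix, bootstrap=1000):
    """Return argv lists for alignment, trimming, and tree inference.

    File names derived from `prefix` are hypothetical placeholders.
    """
    aligned = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trimmed.fasta"
    return [
        # MAFFT L-INS-i; MAFFT writes the alignment to stdout,
        # so its output would be redirected to `aligned`.
        ["mafft", "--localpair", "--maxiterate", "1000", fasta],
        # ClipKIT in smart-gap mode.
        ["clipkit", aligned, "-m", "smart-gap", "-o", trimmed],
        # IQ-TREE 2 with ModelFinder and ultrafast bootstrap.
        ["iqtree2", "-s", trimmed, "-m", "MFP",
         "-B", str(bootstrap), "--prefix", prefix],
    ]


commands = build_pipeline("fcp_sequences.fasta", "fcp_tree")
for argv in commands:
    print(shlex.join(argv))
```

For MAFFT E-INS-i, the documented equivalent replaces `--localpair` with `--genafpair`. Each argv list could be passed to `subprocess.run` once the tools are installed, with the MAFFT output captured and written to the alignment file.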

Data availability

Atomic coordinates and cryo-EM maps for the reported structure have been deposited in the Protein Data Bank under the accession code 8XLS [https://www.rcsb.org/structure/8XLS] and in the Electron Microscopy Data Bank under the accession code EMD-38457 [https://www.ebi.ac.uk/emdb/EMD-38457].

Acknowledgements

We thank Kumiyo Kato and Satoko Kakiuchi for their assistance in this study. The cells of T. pseudonana CCMP1335 were given by Prof. Yusuke Matsuda, Kwansei Gakuin University, Japan. Cryo-EM data was obtained using EM01CT and EM02CT of SPring-8 with the approval of the Japan Synchrotron Radiation Research Institute (JASRI Proposal No. 2022B2728 (J.-R.S.) and No. 2023A2715 (Y.N.)). This work was supported by JSPS KAKENHI grant Nos. JP22KJ2017 (M.K.), JP23K14211 (Y.N.), JP22H04916 (J.-R.S.), JP23H02347 (K.I.), and JP23H02423 (R.N.), Takeda Science Foundation (R.N., K.K.), and Research Support Project for Life Science and Drug Discovery (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under support No. 4176 (J.-R.S.).

Author Contributions

R.N. conceived the project; H.O. and R.N. prepared the PSI-FCPI supercomplexes and analyzed their biochemical characteristics; J.X., M.K., and K.I. performed phylogenetic and sequence analyses; Y.N. collected cryo-EM images; K.K. processed the cryo-EM data and reconstructed the final cryo-EM map; K.K. built the structural model and refined the final model; K.K., Y.N., M.K., K.I., and R.N. drafted the original manuscript; J.-R.S. modified the manuscript; and R.N. wrote the final manuscript, and all authors joined the discussion of the results.

Declaration of competing interest

The authors declare no conflict of interest.