Introduction

Immunoglobulin light-chain (AL) amyloidosis is a systemic disease associated with the overproduction and subsequent amyloid aggregation of patient-specific light chains (LCs) (1-4). Such aggregation may take place in one or several organs, with the heart and kidneys being the most frequently affected (1). AL originates from an abnormal proliferation of a plasma cell clone that results in LC overexpression and over-secretion into the bloodstream (1). LCs belonging to both the lambda (λ) and kappa (κ) isotypes are associated with AL; however, λ-LCs are greatly overrepresented in the repertoire of AL patients. Specifically, AL-causing LCs (AL-LCs) most often belong to a specific subset of lambda germlines such as IGLV6 (λ6), IGLV1 (λ1), and IGLV3 (λ3) (5-8).

λ-LCs are dimeric in solution, with each subunit characterized by two immunoglobulin domains: a constant domain (CL) with a highly conserved sequence and a variable domain (VL) whose extreme sequence variability is the result of genomic recombination and somatic mutations (9-12). VL domains are generally regarded as the main determinant of LC amyloidogenic behavior. The observation that the fibrillar core in most of the structures of ex vivo AL amyloid fibrils consists of VL residues further strengthens this hypothesis (13-17). However, in a recent cryo-EM structure, a stretch of residues belonging to the CL domain is also part of the fibrillar core, and mass spectrometry (MS) analysis of several ex vivo fibrils from different patients indicates that amyloids are composed of several LC proteoforms including full-length LCs (18-21).

Interestingly, the overproduction of a light chain is a necessary but not sufficient condition for the onset of AL. Indeed, the uncontrolled production of a clonal LC is often associated with Multiple Myeloma (MM), a blood cancer, but only a subset of MM patients develops AL, indicating that specific sequence/biophysical properties determine LC amyloidogenicity and AL onset (12,22-24). To date, the extreme sequence variability of AL-LCs has prevented the identification of sequence patterns predictive of LC amyloidogenicity; however, it has been reproducibly reported that several biophysical properties correlate with LC aggregation propensity. In particular, AL-LCs display lower thermodynamic and kinetic fold stability than the non-amyloidogenic LCs found overexpressed in MM patients (hereafter named M-LCs) (12,20-24).

Interestingly, previous work on LCs has indicated that differences in conformational dynamics can play a role in the aggregation properties of AL-LCs (22-26). Oberti et al. compared multiple λ-LCs obtained from either AL patients or MM patients, identifying the susceptibility to proteolysis as the best biophysical parameter distinguishing the two sets (12). Weber et al. showed, using a mouse-derived κ-LC, that a modification in the linker region can lead to greater conformational dynamics, an increased susceptibility to proteolysis, and an increased in vitro aggregation propensity (25). Additionally, AL-LC flexibility and conformational freedom have been correlated with the proteotoxicity observed in patients affected by cardiac AL and experimentally verified in human cardiac cells and a C. elegans model (27,28). It is noteworthy that the amyloidogenic LCs analyzed in this study were originally purified from patients with cardiac amyloidosis.

Here, building on this previous work as well as on our previous experience with β2-microglobulin, another natively folded amyloidogenic protein (29-33), we investigated the native solution-state dynamics of multiple λ-LCs by combining molecular dynamics (MD) simulations, small-angle X-ray scattering (SAXS), and hydrogen-deuterium exchange mass spectrometry (HDX-MS). We found a unique conformational fingerprint of amyloidogenic LCs corresponding to a low-populated state characterized by extended linkers, with an accessible VL-CL interface and possible structural rearrangements in the CL-CL interface.

Results

SAXS suggests differences in the conformational dynamics of amyloidogenic and non-amyloidogenic LC

SAXS data were acquired either in bulk or in-line with size-exclusion chromatography (SEC) for a set of previously described LCs (cf. Table 1 and Methods). H3, H7, H18 (AL-LCs), M7, and M10 (M-LCs) were studied by Oberti et al. (12) and identified in multiple AL or MM patients, while ex vivo fibrils of AL55 from heart, kidney, and fat tissue of an AL patient have previously been studied by cryo-EM and MS (16,17,19,20). These LCs cover multiple germlines, with H18 and M7 belonging to the same germline (cf. Table 1). The sequence identity is highest for H18 and M7 (91.6%) and lowest for AL55 and M7 (75.2%). Statistics for all pairwise sequence alignments are reported in Table S1 in the Supporting Information. For H3, H7, and M7 a crystal structure was previously determined (12), while for H18, AL55, and M10 we obtained a model using either homology modeling (H18 and AL55) or AlphaFold2 (M10). Qualitatively, the SAXS curves in Figure 1 did not reveal any macroscopic deviation of the solution behavior from the crystal or model conformation. For each LC, we compared the experimental curve with the theoretical curve calculated from the LC structure (cf. Table 1), analyzing the residuals and the associated χ2. The analysis indicated a discrepancy between the model conformation and the data in the case of the AL-LCs, which was not observed in the case of the M-LCs. For the AL-LCs, residuals deviate from normality in the low-q region, suggesting some variability in the global size of the system. Additionally, a weak trend distinguishing AL-LCs from M-LCs could be identified in the radius of gyration (Rg) (cf. Table 1): H3, H7, H18, and AL55 display an Rg, as derived from the Guinier analysis of the SAXS curves, 0.5 to 0.8 Å larger than M7 and M10. Overall, SAXS measurements point to less compact and more structurally heterogeneous AL-LCs compared to more compact and structurally homogeneous M-LCs.
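As an illustration of the two quantities discussed above, the following minimal Python sketch estimates Rg from a Guinier fit, ln I(q) ≈ ln I0 − (q·Rg)²/3, and computes a reduced χ2 between an experimental and a theoretical curve. The function names, the q·Rg ≤ 1.3 cutoff, and the synthetic curve are illustrative assumptions; this is not the analysis pipeline used in this work.

```python
import numpy as np


def guinier_rg(q, intensity, qrg_max=1.3, n_iter=20):
    """Estimate (Rg, I0) from a linear fit of ln I versus q^2.

    The fit range is refined iteratively so that q * Rg <= qrg_max
    (a commonly used limit for globular particles; an assumption here).
    """
    q = np.asarray(q, dtype=float)
    log_i = np.log(np.asarray(intensity, dtype=float))
    mask = np.ones(q.shape, dtype=bool)
    rg, i0 = float("nan"), float("nan")
    for _ in range(n_iter):
        slope, intercept = np.polyfit(q[mask] ** 2, log_i[mask], 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope: no Rg can be derived")
        rg = float(np.sqrt(-3.0 * slope))
        i0 = float(np.exp(intercept))
        new_mask = q * rg <= qrg_max
        # Stop when the range is stable or too few points would remain.
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, i0


def reduced_chi2(i_exp, i_calc, sigma, n_params=0):
    """Reduced chi-square between experimental and calculated intensities."""
    resid = (np.asarray(i_exp, float) - np.asarray(i_calc, float)) / np.asarray(sigma, float)
    return float(np.sum(resid ** 2) / (resid.size - n_params))


# Synthetic, noise-free Guinier curve (hypothetical values: Rg = 25 A, I0 = 10).
q = np.linspace(0.005, 0.3, 300)                # 1/Angstrom
i_synth = 10.0 * np.exp(-(q * 25.0) ** 2 / 3.0)
rg_fit, i0_fit = guinier_rg(q, i_synth)
```

On real data the theoretical curve is usually scaled (and sometimes offset) to the experimental one before χ2 is computed; the `n_params` argument is included so that such fitted parameters can be subtracted from the degrees of freedom.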

LC systems studied in this work. For each LC in the table are reported the germline, the phenotype, the structure or the method used to obtain one, the agreement between the structure and the SAXS curves, and the radius of gyration derived from the SAXS data.

SAXS measurements for AL- and M light chains. Kratky plots comparing experimental (orange), and theoretical (black) curves and associated residuals (bottom panels) indicate that H LC solution behavior deviates from reference structures more than M LC. (A) H3 measured in bulk (Hamburg), 3.4 mg/ml. (B) H7 measured in bulk (Hamburg), 3.4 mg/ml. (C) H18 measured by online SEC-SAXS (ESRF), starting at 2.8 mg/ml. (D) AL55 measured in bulk (ESRF), 2.6 mg/ml. (E) M7 measured in bulk (Hamburg), 3.6 mg/ml. (F) M10 measured by online SEC-SAXS starting at 6.7 mg/ml (ESRF).

MD simulations reveal a conformational fingerprint for amyloidogenic light chains

To investigate the conformational dynamics of the six LCs we performed Metadynamics Metainference (M&M) MD simulations employing the SAXS curves (q < 0.3 Å⁻¹) as restraints (cf. Methods) (34-37). Metainference is a Bayesian framework that allows experimental knowledge to be integrated on-the-fly in MD simulations, improving the latter while accounting for the uncertainty in the data and their interpretation. Metadynamics is an enhanced sampling technique that speeds up the sampling of the conformational space of complex systems. The combination of SAXS and MD simulations has been shown to be effective for multi-domain proteins as well as for intrinsically disordered proteins (38-40).

For each LC we performed two independent M&M simulations coupled by SAXS restraint, accumulating around 120-180 µs of MD per protein (cf. Methods and Table S2 in the Supporting Information). The resulting conformational ensembles showed a generally improved agreement with the SAXS data employed as restraints (cf. Table S2 and Figure S1 in the Supporting Information). To investigate differences in the LC local flexibility we first analyzed the root mean square fluctuations (RMSF) for the CL and VL separately, averaging over the chains and the replicates (Figure 2A). The RMSF indicates comparable flexibility in most regions, with differences localized in the termini and in some loops. The VLs of amyloidogenic LCs are generally more flexible than those of the M-LCs, but this may be associated with the lengths of their complementarity-determining regions (CDRs). Indeed, M10 has the longest and most flexible CDR1 and the shortest and least flexible CDR3. Unexpectedly, there are also some differences in the CL domains: in Figure 2A, the AL-LCs are always more flexible in at least one region, even if these differences are relatively small. Overall, the RMSF does not provide a clear indication to differentiate AL- and M-LCs. To provide a global description of the dynamics of the six LC systems, we then introduced two collective variables, namely the elbow angle, describing the relative orientation of the VL and CL dimers, and the distance between the centers of mass of the VL and CL dimers, illustrated in Figure 2B.
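The descriptors introduced above can be sketched in a few lines of Python. The sketch below computes a per-atom RMSF from a pre-aligned trajectory, the distance between two centers of mass, and a bending angle defined by three points. The exact atom selections and the precise definition of the elbow angle used in this work are given in the Methods; the three-point angle used here (VL-dimer center, hinge, CL-dimer center) and all function names are illustrative assumptions.

```python
import numpy as np


def center_of_mass(coords, masses):
    """Mass-weighted centroid of an (n_atoms, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def com_distance(coords_a, masses_a, coords_b, masses_b):
    """Distance between the centers of mass of two atom groups."""
    delta = center_of_mass(coords_a, masses_a) - center_of_mass(coords_b, masses_b)
    return float(np.linalg.norm(delta))


def bending_angle(p_vl, p_hinge, p_cl):
    """Angle in radians at p_hinge between the directions to p_vl and p_cl.

    A value close to pi corresponds to a straight arrangement,
    smaller values to a bent one (hypothetical definition).
    """
    v1 = np.asarray(p_vl, dtype=float) - np.asarray(p_hinge, dtype=float)
    v2 = np.asarray(p_cl, dtype=float) - np.asarray(p_hinge, dtype=float)
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def rmsf(trajectory):
    """Per-atom RMSF from a pre-aligned trajectory of shape (n_frames, n_atoms, 3)."""
    trajectory = np.asarray(trajectory, dtype=float)
    mean_positions = trajectory.mean(axis=0)
    squared_dev = ((trajectory - mean_positions) ** 2).sum(axis=2)
    return np.sqrt(squared_dev.mean(axis=0))
```

For ensembles generated with metadynamics, averages such as the RMSF would additionally need per-frame statistical weights; the unweighted form is shown here only for brevity.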

(A) Residue-wise root mean square fluctuations (RMSF) obtained by averaging the two Metainference replicates and the two equivalent domains for the six systems studied. The top panel shows data for the variable domains, while the bottom panel shows data for the constant domain. Residues are reported using Chothia numbering (49). (B) Schematic representation of two global collective variables used to compare the conformational dynamics of the different systems, namely the distance between the center of mass of the VL and CL dimers and the angle describing the bending of the two domain dimers.

In Figure 3 we report the free energy surfaces (FES) obtained from the two replicates of each LC as a function of the elbow angle and the distance between the centers of mass of the CL and VL dimers. Visual inspection of the FES indicates that the simulations are converged: in all cases, the replicates explore a comparable free-energy landscape with comparable features. All six LC FES share common features: a relatively continuous low free energy region along the diagonal, spanning configurations where the CL and VL are bent and close to each other (state LB), and configurations where the CL and VL domains are straight and at a relative distance between 3.4 and 4.1 nm (state LS). A subset of LCs, namely H18, M7, and AL55, display conformations where the domains are straight in line (elbow angle greater than 2.5 rad) and in close vicinity, with a relative distance between the centers of mass of less than 3.4 nm (state G). Of note, H18 and M7 belong to the same germline, which lets us speculate that state G may be germline-specific. Most importantly, only the AL-LCs display configurations with CL and VL straight in line but well separated, at relative distances greater than 4.1 nm; this state H seems to be a fingerprint specific for AL-LCs. A set of configurations exemplifying the four states is reported in Figure 2. The estimates of the populations for the four states LB, LS, G, and H are reported in Table 2. The quantitative analysis indicates that, within the statistical significance of the simulations, states LB and LS represent in all cases most of the conformational space. In the case of H18, AL55, and M7, the compact state G is also significantly populated (10-34%). The state H, associated with amyloidogenic LCs, has a population between 5 and 10% in H3, H7, H18, and AL55 and of less than 1% in M7 and M10.
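The state definitions above translate into a simple classification of each conformation by its two collective variables. The following Python sketch assigns frames to LB, LS, G, or H and accumulates weighted populations. The distance thresholds (3.4 and 4.1 nm) and the 2.5 rad angle are taken from the text; applying the same 2.5 rad cutoff to separate "straight" from "bent" for all states, and treating every bent conformation as LB, are simplifying assumptions that may differ from the rectangular regions drawn in Figure 3.

```python
import numpy as np

ANGLE_STRAIGHT = 2.5  # rad; quoted for state G, reused for LS and H as an assumption
D_CLOSE = 3.4         # nm; upper distance limit of state G
D_FAR = 4.1           # nm; lower distance limit of state H


def assign_state(angle, distance):
    """Map an (elbow angle, VL-CL distance) pair to one of LB, LS, G, H."""
    if angle <= ANGLE_STRAIGHT:
        return "LB"   # bent conformations
    if distance < D_CLOSE:
        return "G"    # straight and compact
    if distance <= D_FAR:
        return "LS"   # straight, intermediate separation
    return "H"        # straight and well separated


def state_populations(angles, distances, weights=None):
    """Weighted fraction of frames in each state.

    weights: optional per-frame statistical weights (for example from
    reweighting a biased simulation); uniform weights are used if omitted.
    """
    angles = np.asarray(angles, dtype=float)
    distances = np.asarray(distances, dtype=float)
    if weights is None:
        weights = np.ones_like(angles)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    populations = {state: 0.0 for state in ("LB", "LS", "G", "H")}
    for angle, distance, weight in zip(angles, distances, weights):
        populations[assign_state(angle, distance)] += float(weight)
    return populations
```

Comparing the populations obtained separately from the two replicates of each LC gives a rough estimate of their statistical uncertainty.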

Populations of the four states shown in Figure 3 resulting from the two independent Metadynamics Metainference simulations performed for each of the six LCs. The population of the H state, which we propose to be a fingerprint specific for AL-LCs, is in bold.

Free Energy Surfaces (FESes) for the six light chain systems under study by Metadynamics Metainference MD simulations. For each system, the simulations are performed in duplicate. The x-axis represents the elbow angle indicating the relative bending of the constant and variable domains (in radians), while the y-axis represents the distance in nm between the center of mass of the CL and VL dimers. The free energy is shown with color and isolines every 2kBT corresponding to 5.16 kJ/mol. On each FES are represented four regions (green, red, blue, and black rectangles) highlighting their main features. For each region, a representative structure is reported.
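For readers less familiar with this representation, a free energy surface can be obtained from the (weighted) probability density of two collective variables as F = −kBT ln P. The Python sketch below illustrates this relation and the isoline spacing. The temperature of 310.15 K is an assumption, chosen only because it reproduces the quoted 2kBT = 5.16 kJ/mol; the function name and binning are likewise illustrative.

```python
import numpy as np

KB = 0.0083144626      # kJ/(mol K), Boltzmann constant in molar units
TEMPERATURE = 310.15   # K; assumed, consistent with 2 kB T = 5.16 kJ/mol
ISOLINE_STEP = 2.0 * KB * TEMPERATURE  # spacing between isolines, kJ/mol


def free_energy_surface(x, y, weights=None, bins=50, temperature=TEMPERATURE):
    """Return (fes, x_edges, y_edges) with fes = -kB T ln P, shifted so min = 0.

    Empty bins have zero probability and therefore infinite free energy.
    """
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins, weights=weights)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        fes = -KB * temperature * np.log(prob)
    fes -= fes[np.isfinite(fes)].min()
    return fes, x_edges, y_edges


# Toy example: three frames in one bin and one frame in another.
fes_toy, _, _ = free_energy_surface([0, 0, 0, 1], [0, 0, 0, 1], bins=2)
```

In the toy example the less populated bin lies kBT ln 3 above the minimum, since its probability is three times lower.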

To identify additional differences between the conformations observed in state H and the rest of the conformational space, we focused our attention on the VL-VL and CL-CL dimerization interfaces. In Figure 4, we show the free energy as a function of the distance between the CL domains versus the distance between the VL domains for each of the four states for one of the two simulations performed on H3; the same analysis for all other simulations is shown in Figures S2 to S7 in the Supporting Information. From the comparison of the FESes, it is clear that only in the conformations corresponding to state H do the CL-CL dimers display an alternative configuration. In the case of H3, the CL-CL domains in the H state shift towards configurations with a larger distance; the same is observed for H18 and AL55, while for H7 the H state is characterized by a smaller distance between the CL domains.

Free energy surfaces for the four substates identified in Figure 3 in the case of the first H3 Metainference simulation. The x-axis shows the distance between the centers of mass of the constant domains, while the y-axis shows the distance between the centers of mass of the variable domains. The free energy is shown with color and isolines every 2kBT corresponding to 5.16 kJ/mol.

Our conformational ensembles allowed us to hypothesize a conformational fingerprint for AL proteins, namely the presence of a weakly but significantly populated state (H) characterized by a more extended quaternary structure, with VL and CL dimers well separated, and with perturbed CL-CL interfaces.

HDX independently validates the amyloidogenic LC conformational fingerprint

To gain further molecular insight into how the dynamics of the tertiary and quaternary structures differ between AL- and M-LCs, HDX-MS was performed on our set of proteins. HDX-MS probes protein dynamics by monitoring hydrogen-to-deuterium exchange over time, and the resulting data complement structural, biophysical, and computational data well. Four LCs from our set (H3, H7, AL55, and M10) yielded good peptide sequence coverages of 98.6, 92.5, 98.6, and 99.1%, respectively (Figures S8A-S11A and Table S3 in the Supporting Information), while H18 and M7 were excluded from this analysis due to their poor sequence coverage.
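The sequence coverage quoted above is the percentage of residues that fall within at least one identified peptide. A minimal Python sketch of this calculation is given below; the function name and the example peptide boundaries are hypothetical, and the actual peptide maps are those reported in the Supporting Information.

```python
def sequence_coverage(peptides, sequence_length):
    """Percentage of residues covered by at least one peptide.

    peptides: iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping peptides are counted once; out-of-range positions are ignored.
    """
    covered = set()
    for start, end in peptides:
        first = max(start, 1)
        last = min(end, sequence_length)
        covered.update(range(first, last + 1))
    return 100.0 * len(covered) / sequence_length


# Hypothetical peptide map on a 200-residue chain: two overlapping peptides.
example_coverage = sequence_coverage([(1, 50), (40, 100)], 200)
```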

HDX-MS analysis revealed subtle differences in the structural dynamics of the individual proteins. The most significant difference between the AL- and M-LCs is observed for residues 34-50, which are part of both the VL-VL dimerization interface and, more importantly in the context of this work, the CL-VL interface. These residues show significantly higher deuterium uptake in all H-proteins, with H3 being the highest, implying that the dimeric interfaces of AL-LCs (VL-VL and CL-VL) are more dynamic, and hence significantly more destabilized, than in M10 (Figure 5, Figure S12 in the Supporting Information). The highly dynamic VL-VL interface of H3 also correlates well with its open VL-VL interface in a crystal structure (PDB 8P89), which houses two nanobodies interacting with each VL in a dimeric structure (28). On the other hand, residues 54-70, which are not part of either interface, show higher deuterium uptake and hence more dynamics in the M10 protein, which may be a result of redistributed dynamics due to the rigidity of its VL-VL interface, as observed previously (41,42) (Figure 5). In contrast, the VL-CL hinge regions (residues 100-120) show homogeneously high flexibility in all the proteins due to their higher accessible surface area (Figure 5). As expected, the CL domains also show a similar pattern of deuterium uptake and hence flexibility in AL- and M-LCs, with a minor difference contributed by the rigid VL-CL interface containing residues 161-180 (Figure S10 in the Supporting Information). This region shows significantly less deuterium uptake (i.e., it is more rigid) in both AL and M proteins when compared to other peptides in the CL domain (Figures S8B-S11B in the Supporting Information). However, comparing the average uptake for this region (161-180) between AL and M proteins shows that H3 and AL55 have higher uptake than M10. In contrast, H7 is an exception with the lowest deuterium uptake in this region (Figure S12 in the Supporting Information).
These data are particularly interesting in the light of our simulations. The dimeric conformations identified in the H state (Figure 3) are characterized by higher accessibility of the CL-VL interface, which is in agreement with the increased accessibility of region 34-50 in the VL and region 161-180 in the CL observed in the HDX-MS analysis. Notably, the H state of H7 is the only one in which the CL-CL interface is remarkably compact (see Figure S3 in the Supporting Information), consistent with the lower H/D exchange for region 161-180 observed in H7. Overall, the HDX-MS data provide an independent validation of the H state predicted from our conformational ensembles.

HDX-MS analysis. The top panel shows a simplified representation of the primary structure of an LC, including the variable domain (VL) and constant domain (CL); the location of β-strands is given according to Chothia and Lesk (49). The middle panel shows the relative HDX butterfly plots of the H3, H7, AL55, and M10 proteins. The peptides showing significantly higher deuterium uptake are labeled on their respective peaks. The peptides spanning residues 34-50 in AL-LCs and 54-70 in the M-LC are labeled in orange. The lower panel shows the structural mapping, rendered with PyMOL, of the selected peptides showing the highest deuterium uptake. The VL-VL and VL-CL interfaces covering residues 34-50 and residues 161-180 are indicated with dark orange and light orange arrows, respectively.

Discussion

Understanding the molecular determinants of AL amyloidosis has been hampered by the high sequence variability of LCs, in contrast to their highly conserved three-dimensional structure (8). In this work, building on our previous studies highlighting susceptibility to proteolysis as a property that can discriminate between AL-LCs and M-LCs, as well as the role of conformational dynamics in protein aggregation, we characterized LC conformational dynamics under the assumption that AL-LC proteins, despite their sequence diversity, may share a property that emerges at the level of their dynamics. We combined SAXS measurements with MD simulations under the integrative framework of Metainference to generate conformational ensembles representing the native-state conformational dynamics of four AL-LCs and two M-LCs. While SAXS alone already indicated possible differences, its combination with MD allowed us to observe a possible low-populated state, which we refer to as state H, characterized by well-separated VL and CL dimers and a perturbed CL-CL interface, which is significantly populated in AL-LCs while only marginally populated in M-LCs. HDX measurements allowed us to independently validate this state by observing increased accessibility in CL-VL interface regions. Notably, our conformational ensembles are similar to those observed for a linker mutation in the case of a kappa LC (25). Furthermore, the presence of high-energy, so-called excited states associated with amyloidogenic proteins has been previously identified in the case of SH3 (43) and β2m (29).

Having established a conformational fingerprint for AL-LC proteins, it is tempting to identify possible mutations that could be associated with the presence of the H state. Comparing the sequences and structures of M7 and H18, both of which belong to the IGLV3-19*01 germline, we can identify a single mutation, A40G, that could be associated with the appearance of the H state in H18. This mutation is located in the 37-43 loop, which H/D exchange showed to be more accessible in our three AL-LCs than in our M-LC (see Figure 5 and Figure S12), and it breaks a hydrophobic interaction with the methyl group of T165, as observed in the crystal structure of M7 (PDB 5MVG and Figure S13), potentially making T165 more accessible in H18 than in M7 (see Figure 5 and Figure S12). Comparing the H18 and M7 sequences with the germline reference sequence, we see that position 40 in IGLV3-19*01 is a glycine (see Figure S13). This suggests the intriguing interpretation that the G40A mutation in M7 may increase the interdomain stability compared to the germline sequence, making it less susceptible to aggregation. However, it should also be noted that while this framework position is a glycine in H3, H7, and AL55, it is also a glycine in M10. Previous research has often focused on identifying, on a case-by-case basis, the key mutations that may be considered responsible for the emergence of the aggregation propensity, under the assumption that such aggregation propensity should not be present in germline sequences. This assumption may be misleading, given that few germlines are strongly overrepresented in AL, which suggests that these starting germline sequences may be inherently more aggregation-prone than the germline genes that are absent or rarely found in AL patients.
More generally, by comparing our AL-LC sequences with their germline references (Figures S14 to S19 in the Supporting Information), we observe that all mutations fall exclusively in the variable domain, allowing us to exclude for these systems a direct role for residues in the linker region, as observed in refs (25,44), or in the constant domain, as observed in ref (45). Many mutations fall in the CDR regions, as expected, but others are found in the framework regions, both near the dimerization interface and in other regions of the protein. Regarding mutations in the CDRs, it has been suggested that AL-LC proteins may exhibit frustrated CDR2 and CDR3 loops, with a few key residues populating the left-handed alpha-helix region or other high-energy conformations (46), resulting in the destabilization of the VL. In Figures S14 to S19 in the Supporting Information, we have analyzed the Ramachandran plots obtained from our conformational ensembles, focusing only on those residues that most populate the left-handed alpha-helix region, which are marked with a red circle and whose Ramachandran plots are reported. Our data indicate the presence of residues populating the left-handed alpha-helix region of the Ramachandran plot, but these are also found in the M-LCs, so our simulations do not allow us to confirm or exclude this mechanism in our set of protein systems.

In conclusion, our study provides a novel, complementary perspective on the determinants of the misfolding propensity of AL-LCs, which we schematize in Figure 6. The identification of a high-energy state with perturbed CL dimerization interfaces, extended linkers, and accessible regions in both the VL-CL and VL-VL interfaces may be the common feature interplaying with specific properties shown by previous work, including the direct or indirect destabilization of both the VL-VL and CL-CL dimerization interfaces (22,23,45-48). Our conformational fingerprint is also consistent with the observation that protein stability does not fully correlate with the tendency to aggregate, whereas susceptibility to proteolysis and conformational dynamics may better capture the differences between AL-LCs and M-LCs. In this context, our data allow us to rationally suggest that targeting the constant domain region at the CL-VL interface, which is more labile in the H state, may be a novel strategy to search for molecules against LC aggregation in AL amyloidosis.

Schematic representation summarizing our findings in the context of previous work on the biophysical properties of amyloidogenic light chains. We propose that the H state is the conformational fingerprint distinguishing AL LCs from other LCs, which together with other features contributes to the amyloidogenicity of AL LCs.

Materials and Methods

LC production and purification

Recombinant AL- (H3, H7, H18, and AL55) and M- (M7, M10) proteins were produced in and purified from the host E. coli strain BL21(DE3). First, competent BL21(DE3) cells were transformed with pET21(b+) plasmids containing the genes encoding the H3, H7, H18, AL55, M7, and M10 proteins. The transformed cells were selected for each plasmid by growing them on LB agar plates containing ampicillin at a final concentration of 100 µg/ml. For protein over-expression, one colony was picked from each plate and grown overnight in 20 ml of LB broth containing ampicillin at a final concentration of 100 µg/ml. The overnight culture was then used to inoculate a secondary culture in one liter of LB broth. The cells were grown until the turbidity (OD600nm) reached 0.6-0.8, and protein expression was subsequently induced by adding 0.5 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for 4 h. The bacterial cells containing overexpressed LCs were then harvested using a Beckman Coulter centrifuge at 6000 rpm for 20 min at 4 °C. All the proteins were overexpressed as inclusion bodies. For protein purification, the inclusion bodies were isolated by cell lysis induced by sonication and purified by washing with a buffer containing 10 mM Tris (pH 8) and 1% Triton X-100. The purified inclusion bodies were unfolded in a buffer containing 6.0 M guanidinium hydrochloride (GdnHCl) for 4 h at 4 °C. The unfolded LCs were then refolded in a buffer containing reduced and oxidized glutathione to assist disulfide bond formation. The refolded proteins were subjected to anion exchange and size-exclusion chromatography steps for final purification. Protein purity was checked on 12% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gels. The final protein concentration was measured using the molecular weight and extinction coefficient of the individual proteins.
The purified proteins were stored at -20 °C for further use.
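A concentration derived from the molecular weight and extinction coefficient typically follows the Beer-Lambert law, c = A / (ε·l), converted to mg/ml by multiplying by the molecular weight. The Python sketch below shows this conversion under the assumption that the measurement is a UV absorbance reading (for example at 280 nm), which the text does not state explicitly; the function name and the numerical values are hypothetical.

```python
def concentration_mg_per_ml(absorbance, extinction_coeff, molecular_weight,
                            path_cm=1.0, dilution=1.0):
    """Protein concentration in mg/ml from an absorbance reading.

    absorbance:        measured absorbance (dimensionless)
    extinction_coeff:  molar extinction coefficient, 1/(M cm)
    molecular_weight:  g/mol
    path_cm:           optical path length, cm
    dilution:          dilution factor applied before the measurement
    """
    molar = absorbance * dilution / (extinction_coeff * path_cm)  # mol/l
    return molar * molecular_weight                                # g/l = mg/ml


# Hypothetical example: A = 1.0, epsilon = 25000 1/(M cm), MW = 25000 g/mol.
example_concentration = concentration_mg_per_ml(1.0, 25000.0, 25000.0)
```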

Additional Methods for SAXS, MD simulations, and HDX experiments are available in the Supporting Information. SAXS data are available on the SASBDB (cf. Dataset S1 in the Supporting Information). Simulation data are available on Zenodo (cf. Dataset S2 in the Supporting Information).

Acknowledgements

We acknowledge Martin A. Schroer and Dmitri Svergun at EMBL Hamburg, and Sonia Longhi at AFMB Marseille, for discussion and support on the SAXS data acquisition and analysis. We acknowledge PRACE for awarding us access to Piz Daint at CSCS, Switzerland. The authors acknowledge CINECA for an award under the ISCRA initiative for the availability of high-performance computing resources and support. This work was supported by the Italian Ministry of Research PRIN 2020 (20207XLJB2) and by CARIPLO/TELETHON Foundations (GJC23044). SP acknowledges Fondazione Veronesi for a postdoctoral fellowship. We also acknowledge Karen Hsu and Yong-sheng Wang for protein quality check experiments, and Shu-Yu Lin and Ming-Jie Tsi of the MS core facility, IBC, Academia Sinica, for their support in HDX-MS experiments.