Conformational and oligomeric states of SPOP from smallangle Xray scattering and molecular dynamics simulations
Abstract
Speckletype POZ protein (SPOP) is a substrate adaptor in the ubiquitin proteasome system, and plays important roles in cellcycle control, development, and cancer pathogenesis. SPOP forms linear higherorder oligomers following an isodesmic selfassociation model. Oligomerization is essential for SPOP’s multivalent interactions with substrates, which facilitate phase separation and localization to biomolecular condensates. Structural characterization of SPOP in its oligomeric state and in solution is, however, challenging due to the inherent conformational and compositional heterogeneity of the oligomeric species. Here, we develop an approach to simultaneously and selfconsistently characterize the conformational ensemble and the distribution of oligomeric states of SPOP by combining smallangle Xray scattering (SAXS) and molecular dynamics (MD) simulations. We build initial conformational ensembles of SPOP oligomers using coarsegrained molecular dynamics simulations, and use a Bayesian/maximum entropy approach to refine the ensembles, along with the distribution of oligomeric states, against a concentration series of SAXS experiments. Our results suggest that SPOP oligomers behave as rigid, helical structures in solution, and that a flexible linker region allows SPOP’s substratebinding domains to extend away from the core of the oligomers. Additionally, our results are in good agreement with previous characterization of the isodesmic selfassociation of SPOP. In the future, the approach presented here can be extended to other systems to simultaneously characterize structural heterogeneity and selfassembly.
Editor's evaluation
In this important paper, the authors have developed an approach for simultaneously optimizing the conformational ensemble and degrees of oligomerization, and this has been tested by applying it to a specific protein (SPOP). Comparison of the quality of fits with different models also provides valuable insights into structural features important to the assembly of oligomers. The approach, presented with compelling experimental support, is potentially applicable to other systems as well.
https://doi.org/10.7554/eLife.84147.sa0Introduction
Protein selfassociation is fundamental for many processes in biology (Ali and Imperiali, 2005; Marsh and Teichmann, 2015), and it has been estimated that around half of all proteins form dimers or higherorder complexes (Lynch, 2012). One such protein is Speckletype POZ protein (SPOP), a substrate adaptor in the ubiquitin proteasome system, which recruits substrates for the Cullin3RING ubiquitin ligase (CRL3) (HernándezMuñoz et al., 2005; Kent et al., 2006; Kwon et al., 2006). SPOP targets a range of substrates for degradation, including proteins involved in hormonal signalling, epigenetic modification, and cellcycle control, such as the androgen receptor (AR) (An et al., 2014) and deathassociated protein 6 (DAXX) (Kwon et al., 2006; Cuneo and Mittag, 2019). SPOP is thus an important regulator of cellular signalling, and mutations in SPOP are associated with a variety of cancers (Le Gallo et al., 2012; Kim et al., 2013; Cuneo and Mittag, 2019).
The 374residue SPOP monomer consists of three domains. From N to Cterminus, these are the MATH domain (i.e., the meprin and TRAFC homology domain), the BTB domain (i.e. the broadcomplex, tramtrack, and bricabrac domain), and the BACK domain (i.e. the BTB and Cterminal Kelch domain). MATH is the substrate binding domain, while the BTB domain mediates interaction with CRL3 (Zhuang et al., 2009; Bosu and Kipreos, 2008). Both the BTB and BACK domains can homodimerize, resulting in the formation of polydisperse, linear higherorder SPOP oligomers with alternating BTBBTB and BACKBACK interfaces (Errington et al., 2012; van Geersdaele et al., 2013; Marzahn et al., 2016). The BTBmediated dimer is formed with nanomolar affinity, and this dimer acts as the unit of higherorder oligomerization, which occurs through micromolar affinity BACK dimerization. Thus, only evennumbered SPOP oligomers are substantially populated (Marzahn et al., 2016; Figure 1).
Chemical crosslinking experiments have shown that SPOP oligomers form inside cells (Marzahn et al., 2016), and analysis of SPOP homologues shows sequence covariation across both the BTBBTB and BACKBACK interfaces (Bouchard et al., 2018), together suggesting that selfassociation has physiological relevance. By presenting multiple MATH domains for substrate binding, SPOP oligomers can simultaneously bind to multiple lowaffinity binding motifs in a single substrate, resulting in an overall increased affinity through avidity effects (Pierce et al., 2016). The longer lifetimes of these complexes enable effective polyubiquitination (Pierce et al., 2016). This suggests that tuning SPOP’s oligomerization state could act as a mechanism to regulate substrate degradation (Errington et al., 2012). SPOP oligomerization is also involved in phase separation. SPOP localizes to nuclear speckles in cells (Marzahn et al., 2016), and upon overexpression of certain substrates, SPOP and substrate colocalize to condensates which recruit CRL3 and display active substrate ubiquitination (Bouchard et al., 2018). This process requires both SPOP oligomerization and substrate binding (Marzahn et al., 2016; Bouchard et al., 2018), and it has been proposed that SPOP oligomers function as scaffolds that enable binding of substrates both within and between oligomers, resulting in filamentformation at low substrate concentrations and condensate formation at higher substrate concentrations (Bouchard et al., 2018; Schmit et al., 2020).
The higherorder selfassociation of SPOP follows the isodesmic model (Marzahn et al., 2016), in which the equilibrium between oligomer $i$ and $i$+1 is described by a single equilibrium constant independently of oligomer size (Oosawa and Kasai, 1962). In the case of SPOP, the BTBmediated dimer acts as the protomer of higherorder selfassociation, and the isodesmic $K}_{D$ thus describes BACKBACK selfassociation. The isodesmic model can be used to calculate the equilibrium concentration of every oligomeric species as a function of the total protomer concentration (Figure 1a and c). For SPOP, a low micromolar isodesmic ${K}_{D}$ has been determined from composition gradient multiangle light scattering experiments (CGMALS; Marzahn et al., 2016). While these insights describe the heterogeneity in oligomer sizes, the conformational heterogeneity of the higherorder oligomers has not been characterized. Previous work revealed that constitutive SPOP dimers, created via deletion of the BACK domain, have considerable conformational heterogeneity in the position of their MATH domains. The MATH domains are seen docked onto the BTB dimer in the structure, but smallangle Xray scattering (SAXS) experiments showed that they could undock from the BTB domains, enabled by a long flexible linker (Zhuang et al., 2009). This may enable binding of multivalent substrates with different spacing between SPOP binding motifs. Whether this conformational flexibility also exists in higherorder SPOP oligomers is unclear.
Here, we aimed to determine simultaneously both the distribution of oligomeric states of SPOP and the conformational ensemble of each SPOP oligomer by combining SAXS experiments and MD simulations. SAXS can provide lowresolution information on protein structure in solution, but reports on an ensemble average, which in the case of SPOP is both an average over different oligomeric states and the structural heterogeneity of each oligomeric state. Therefore, SAXS experiments are often combined with MD simulations to provide a full structural model of the system (Thomasen and LindorffLarsen, 2022). In the case of polydisperse systems, it is sometimes possible to deconvolute the information into contributions from a small number of individual species and analyse these individually (HerranzTrillo et al., 2017; Meisburger et al., 2021). We took a different approach and aimed to explicitly model every relevant configuration of SPOP in its range of oligomeric states along with the associated thermodynamic weight of each configuration. We collected SAXS data on SPOP at a range of protein concentrations and constructed initial conformational ensembles of every substantially populated oligomeric state using coarsegrained MD simulations. We then developed an approach to simultaneously and selfconsistently optimize the distribution of oligomeric states, given by the isodesmic model (Oosawa and Kasai, 1962; Shemesh et al., 2021), and refine the conformational ensemble of each oligomer against the SAXS data using Bayesian/maximum entropy (BME) reweighting (Bottaro et al., 2020; Figure 2). Our results show that SPOP forms rigid, helical oligomers in solution, and that the linker connecting the MATH and BTB domains is likely flexible, allowing for repositioning of the MATH domains during substrate binding. Our results also provide further evidence that SPOP selfassociation follows the isodesmic model, and we find an isodesmic ${K}_{\text{D}}$ in good agreement with the previously determined value (Marzahn et al., 2016). Using SAXS experiments of a cancer variant of SPOP we also show how our approach can be used to determine changes in the level of selfassociation.
Results
We collected a concentration series of SAXS data on a previously used truncated version of SPOP, SPOP^{28–359} (full length is 374 residues), with total protein concentrations ranging from 5 to 40 µM. In order to build structural models to refine against the SAXS data, we first needed to decide which oligomeric species to include in our modelling. To this aim, we used the isodesmic selfassociation model, which has previously been shown to describe SPOP oligomerization well (Marzahn et al., 2016). We fitted previously measured CGMALS data (Marzahn et al., 2016) to obtain an isodesmic ${K}_{D}$ of 1.6±0.3 µM (Figure 1—figure supplement 1). Based on the isodesmic model fitted to the CGMALS data, the population of oligomers larger than ~30mersshould be very low at the concentration range used in our SAXS experiments. As scattering intensity is proportional to particle size squared, larger oligomers, however, make a considerable contribution to the SAXS signal despite their low concentrations (Figure 1c). Given the concentrations from the isodesmic model and taking into account the increased scattering of larger oligomers, we decided that constructing models of oligomers up to the 60mer should be sufficient to capture all substantial contributions to the SAXS data.
There are no crystal structures of SPOP^{28–359} available, so we constructed a model of the SPOP^{28–359} BTBdimer using the crystal structure of the isolated BACK domain (4HS2) (van Geersdaele et al., 2013) and the crystal structure of a truncated construct containing only the MATH and BTB domains (3HQI) (Zhuang et al., 2009; Figure 1b). We used this model of the BTBdimer to construct SPOP^{28–359} oligomers, which we used as starting structures for MD simulations. We ran 60 µs MD simulations of oligomers ranging from the dimer to the dodecamer; we used a coarsegrained representation of SPOP modelled using the Martini 3 force field (Souza et al., 2021) further modified by increasing proteinwater interactions by 6% (Thomasen et al., 2022). It would be computationally prohibitive to run simulations of large oligomers up to the 60mer. Instead, we relied on the assumption that the dodecamer behaves similarly to a segment of an arbitrarily long oligomer, and constructed conformational ensembles of oligomers up to the 60mer by joining together conformers from the simulations of the dodecamer at the BACKBACK interface (Figure 1d).
We calculated SAXS intensities from our conformational ensembles and, given the relative population of each oligomer from the isodesmic model with ${K}_{\text{D}}$=1.6 µM, determined from CGMALS, we calculated SAXS profiles averaged over all the oligomeric species. We found that the SAXS data calculated in this way from the ensembles generated by MD simulations convoluted with the isodesmic model were in good agreement with the experimental SAXS data, giving a reduced ${\chi}^{2}$ to the concentration series of SAXS data (${\chi}_{r,\text{global}}^{2}$) of 1.53 (Figure 3). Despite the overall good agreement, the residuals revealed some systematic deviations to the experimental SAXS profiles. These deviations could potentially arise from inaccuracies in the distribution of oligomeric states given by the isodesmic model, from inaccuracies in the modelled conformational ensembles, or from both. As a first step, we wanted to see if we could eliminate the deviations by only tuning the distribution of oligomeric states. We globally optimized the ${K}_{\text{D}}$ of the isodesmic model against the concentration series of SAXS data, which gave ${K}_{\text{D}}$=0.9±0.4 µM, in good agreement with ${K}_{\text{D}}$=1.6±0.3 µM determined from CGMALS, and resulted in a ${\chi}_{r,\text{global}}^{2}$ of 1.24 to the SAXS data (Figure 3). However, this did still not fully eliminate the systematic deviations from the experimental SAXS profiles.
To improve the agreement with the experimental SAXS data further, we aimed to simultaneously refine the conformational ensemble of each oligomer and optimize the distribution of oligomeric states. We developed a selfconsistent optimization scheme, in which the isodesmic ${K}_{\text{D}}$ is optimized globally to the entire concentration series of SAXS data followed by reweighting of the conformations of each oligomer against a SAXS profile deconvoluted from the experimental SAXS data (Figure 2; see Methods section for details). To reweight the ensembles, we used BME reweighting, in which the population weights of the conformational ensemble are minimally perturbed with respect to the prior ensemble (generated by the MD simulations) to improve the agreement with a set of experimental data (Bottaro et al., 2020). This approach resulted in excellent agreement with the experimental SAXS data, giving a ${\chi}_{r,\text{global}}^{2}$ of 0.69, while only small deviations remained (Figure 3). The isodesmic ${K}_{\text{D}}$ was fitted to 1.3±0.5 µM, and thus also remained in good agreement with the previously determined value (Marzahn et al., 2016). To validate our approach and to examine the possibility of overfitting, we left out one SAXS profile recorded with 15 µM protein from the optimization. The optimized ${K}_{\text{D}}$ and ensemble weights did not substantially affect the fit to this SAXS profile, suggesting that we had avoided overfitting (Figure 3—figure supplement 1). These results show that SAXS data on SPOP can be explained well by conformational ensembles of linear oligomers with populations given by the isodesmic model, and thus provide further evidence that SPOP selfassociation follows a simple isodesmic mechanism (Marzahn et al., 2016).
The previously published CGMALS data on SPOP clearly precludes a simple dimer–tetramer, dimer–hexamer, dimer–octamer, or dimer–decamer equilibrium in favor of an isodesmic selfassociation model (Marzahn et al., 2016). To determine whether the SAXS data also favors the isodesmic model, we used our conformational ensembles to examine whether monodisperse oligomers, ranging in size between an octamer and 60mer, as well as corresponding dimeroligomer equilibria, would be compatible with the SAXS concentration series (Figure 3—figure supplement 2). For each dimer–oligomer equilibrium, we fitted the ${K}_{\text{D}}$ globally to the SAXS data. Thus, the isodesmic model and dimer–oligomer models are of comparable complexity, with only a single free parameter. The results show that the SAXS concentration series is in better agreement with an isodesmic distribution of oligomers than with any of the tested single oligomers or dimer–oligomer equilibria.
We also wished to examine whether the conformational ensembles of SPOP generated by the MD simulations described the SAXS data better than static structures. As a first comparison, we calculated SAXS profiles from the initial SPOP oligomer structures constructed based on crystal structures. To make the results comparable with our optimized ensembles, we fitted the isodesmic ${K}_{\text{D}}$ to the SAXS data for the static structures. This resulted in a worse agreement with the SAXS data (${\chi}_{r,\text{global}}^{2}$=4.03 with ${K}_{\text{D}}$=0.43 µM) than what we obtained using the ensembles both before and after reweighting. We also investigated the agreement with the SAXS data for individual structures drawn from the ensembles of the oligomers, again fitting the isodesmic ${K}_{D}$ for each set of structures. We found that some of the single structures from the ensembles could fit the SAXS data as well as the entire ensembles before reweighting, but no set of static structures fit the SAXS data as well as the reweighted ensembles (Figure 3—figure supplement 3). This result highlights that, while an ensemble of multiple conformers is likely necessary to produce the best agreement with the SAXS data, SPOP oligomers have a relatively rigid structure overall, allowing for reasonable agreement with the SAXS data without modelling the conformational heterogeneity for each oligomer. The improvement in agreement with the SAXS data over the starting structures, also for individual conformers, suggests that the MD simulations contribute, not only by modelling the conformational heterogeneity, but also by simply relaxing the structure to a more accurate state.
The results described above show that the SAXS data fit well to an isodesmic model with a ${K}_{D}$ value close to that determined from CGMALS. We wished to validate our approach further by comparing the SAXSderived model of selfassociation with the CGMALS data more directly. We therefore calculated the average molecular weight given by the isodesmic model with the ${K}_{D}$ of 1.3 µM that we obtained by fitting to the SAXS data and compared the results with the CGMALS data (Figure 3—figure supplement 4). This analysis confirmed that the model of selfassociation derived from our analysis of the SAXS data is fully consistent with the independently measured CGMALS data.
Having generated a conformational ensemble of each SPOP oligomer in agreement with the SAXS data, we proceeded to analyze the structures. Reweighting resulted in an increase in the radius of gyration (${R}_{g}$) for almost all oligomeric species, suggesting that slightly more expanded conformations than those sampled with our modified version of Martini are more consistent with the SAXS data (Figure 4a–b and Figure 4—figure supplement 1). This expansion can be attributed both to a slight increase in the endtoend distance for most oligomers (Figure 4c–d and Figure 4—figure supplement 2), as well as a slight increase in the average distance between the MATH and BTB/BACK domains for all oligomers upon reweighting (Figure 4e–f).
In order to investigate the global flexibility and compaction of SPOP oligomers, we fitted a power law to the average endtoend distance (${R}_{\text{EE}}$) as a function of the number of subunits in the oligomer ($N$), ${R}_{\text{EE}}={R}_{0}{N}^{\nu}$, where R_{0} is the subunit segment size and $\nu $ is a scaling exponent (Figure 4c). The fit gave R_{0}=~3.1 nm and $\nu $=0.99, showing a linear growth of the endtoend distance with the number of subunits. This result is consistent with no significant curvature or compaction of the oligomers and, along with the narrow distribution of endtoend distances for each oligomer (Figure 4—figure supplement 2), suggests that the SAXS data is compatible with a distribution of straight and relatively rigid SPOP oligomers, at least on length scales up to the ~180 nm of the 60mer. The helical structure of larger oligomers, with ~16 subunits per turn, is evident as small periodic deviations from the fit (Figure 4c).
The MATH and BTB domains are connected through a ~20 residue long linker region (Figure 1b). We hypothesized that this linker may be flexible, allowing for reconfiguration of the MATH domains with respect to the crystal structure (Zhuang et al., 2009). We calculated the distances between the centerofmass (COM) of the MATH domain and the COM of the BTB/BACK domains in the ensembles for every subunit of every oligomer. The distribution of these MATHBTB/BACK distances reveal two populations overlapping with the two crystal structure configurations (Figures 4e, i ,, 5c), where the MATH domains are in close proximity to the BTB/BACK domains (Zhuang et al., 2009). However, there is also a third population in which the MATH domains are extended away from the BTB/BACK domains, suggesting that the MATHBTB linker is flexible and allows for movement out of the configurations observed in the crystal structure. Reweighting slightly increased the population of this extended state (Figure 4e–f). This flexibility in the configuration of the MATH domains gives rise to a broad distribution of distances between the substrate binding sites in neighbouring MATH domains, which is also slightly increased upon reweighting for all oligomers (Figure 4g–h). Both the overall rigidity of the oligomers and the flexibility of the MATH domains are also evident from visual inspection of the conformational ensemble of the 60mer (Figure 4j).
To examine further whether the SAXS data support the observed flexibility of the MATHBTB linker and repositioning of the MATH domains, we generated new ensembles of SPOP oligomers following the same protocol as above, but this time restraining the MATH domains to the BTBBACK domains based on the configuration in the crystal structure using the elastic network model implemented in Martini. We calculated SAXS data from the generated ensembles and again fitted the isodesmic ${K}_{\text{D}}$ globally to the SAXS data, resulting in ${K}_{\text{D}}$=0.2±0.2 µM. The agreement with the SAXS data was substantially worse than for the original ensembles with the MATH domains unrestrained (${\chi}_{r,\text{global}}^{2}$=4.38 and ${\chi}_{r,\text{global}}^{2}$=1.24 respectively), and the systematic deviations from the experimental SAXS profiles were clearly exacerbated (Figure 5). These results suggest that, first, the resolution of the SAXS data is high enough to distinguish between different configurations of the MATH domains and, second, that the SAXS data are indeed in better agreement with a model where the MATHBTB linker is flexible. Taken together, our results support a model where, in solution, SPOP oligomers behave as rigid, helical structures with flexible MATH domains that can extend away from the BTB/BACK domains.
The comparison between the conformational ensembles generated with the MATH domains free or restrained also suggests that accurate conformational ensembles are necessary for accurate determination of the isodesmic ${K}_{\text{D}}$. Fitting the SAXS data using ensembles with the MATH domains restrained resulted in a lower isodesmic ${K}_{\text{D}}$, and calculating the agreement with SAXS for a range of isodesmic ${K}_{\text{D}}$ values revealed that there was no clear minimum in the ${\chi}_{r,\text{global}}^{2}$ for ${K}_{\text{D}}$ values > 0, which is also reflected in the large error range for the fitted ${K}_{\text{D}}$ (Figure 5—figure supplement 1). In line with this observation, validation with SAXS data at 15 µM protein revealed that improving the accuracy of the ensembles by reweighting also improved the accuracy of the fitted isodesmic ${K}_{\text{D}}$ independently of the fitted ensemble weights (Figure 3—figure supplement 1d).
To explore further how changes in the conformational ensemble would affect the agreement with the SAXS data, we used subsampling to generate ensembles with specific properties based on the ensembles with unrestrained MATH domains. To keep the comparison of ensembles unbiased by our previous fitting to the SAXS data, we used the isodesmic ${K}_{D}$=1.6 µM from CGMALS for all comparisons with SAXS. First, we selected frames with lower average MATHBTB/BACK COM distances (Figure 5—figure supplement 3). In line with the results from simulations with restrained MATH domains, this worsened the agreement with the SAXS data. In contrast, selecting frames with higher average MATHBTB/BACK COM distance slightly improved the agreement with the SAXS data, in line with the results from reweighting (Figure 5—figure supplement 4). We also wished to test how sensitive the agreement with the SAXS data was to the overall shape of the oligomers. However, the conformational space that we could explore by subsampling the ensembles was limited by the rigidity of the oligomers. Despite this limitation, we subsampled ensembles with slightly higher and lower endtoend distances than the original ensembles, corresponding to oligomers that are more or less extended than the original ensembles. Again consistent with the results from reweighting, ensembles with lower endtoend distance resulted in slightly worse agreement with the SAXS data (Figure 5—figure supplement 5), while ensembles with higher endtoend distance did not substantially change the agreement with the SAXS data (Figure 5—figure supplement 6). This result suggests that the SAXS data is less consistent with more compact SPOP oligomers, at least within the local part of conformational space explored here.
Selfassociation allows SPOP to bind disordered substrates that contain multiple SPOP binding motifs through multivalent interactions (Pierce et al., 2016). We hypothesized that the spacing between MATH domains in SPOP oligomers could be related to the spacing between SPOP binding motifs in disordered substrates. To investigate this, we selected five SPOP substrates with multiple SPOP binding motifs located in IDRs (SETD2 Zhu et al., 2017, SCAF1 Theurillat et al., 2014, SRC3 Li et al., 2011; Geng et al., 2013; Janouskova et al., 2017, Gli2, and Gli3 Zhang et al., 2006; Zhang et al., 2009) and ran coarsegrained simulations of their IDRs (seven IDRs in total) using CALVADOS, a onebeadperresidue implicit solvent model that has been optimized to reproduce accurate global dimensions and transient interactions in IDPs (Tesei et al., 2021). We calculated the distances between neighbouring SPOP binding motifs in the simulations, and compared these with the distances between substratebinding sites in neighbouring MATH domains given by our ensembles of SPOP oligomers (Figure 4g). This revealed substantial overlap between the two distributions, with a similar average distance between neighbouring binding sites in SPOP and in substrates, suggesting that the spacing of SPOP binding motifs in substrates may be evolutionarily optimized for multivalent binding to MATH domains.
Having analyzed the conformational properties of wild type SPOP and shown that the SAXS data are sensitive to the degree of selfassociation, we next wished to test whether our approach could capture the effects of mutations on SPOP selfassociation. We collected a concentration series of SAXS data on the SPOP mutant R221C, which has been identified in melanoma (Krauthammer et al., 2012) and colorectal cancer (Giannakis et al., 2016). R221C is located in the BTBBTB interface, so we hypothesized that it may affect SPOP’s propensity to selfassociate. We used the same approach as for wild type to fit the isodesmic ${K}_{\text{D}}$ globally to the SAXS data, but without reweighting the conformational ensembles. For R221C, the isodesmic ${K}_{\text{D}}$ was fitted to 8.2±2.3 µM, which resulted in a reasonable fit to the SAXS data with ${\chi}_{global}^{2}$=1.79 (Figure 3—figure supplement 7), suggesting that the mutation results in a decreased propensity to selfassociate compared with wild type (${K}_{\text{D}}$=0.9 µM using a comparable approach or 1.3 µM when also reweighting the ensemble). Because R221C is located at the BTBBTB interface, the 6–9 fold increase of the isodesmic ${K}_{\text{D}}$ (which relates to BACKBACK dimerization) is perhaps surprising. While a longrange effect of R221C cannot be ruled out, an alternative mechanism may involve shifting the equilibrium of the BTBBTB dimer, thus effectively decreasing the concentration of dimeric species available for selfassociation.
Discussion
The ability of SPOP, a cancerassociated substrate adaptor in the ubiquitination machinery, to selfassociate is important for its role in biology and disease. Characterizing the conformational ensemble of flexible and selfassociating proteins such as SPOP from ensembleaveraged experiments is, however, difficult due to conformational and compositional heterogeneity. In one approach, SAXS data of mixtures may be attempted to be decomposed into contributions of individual components that may then be analysed separately (HerranzTrillo et al., 2017; Meisburger et al., 2021). Here, we have developed an alternative ‘forward modelling’ approach to characterize proteins that undergo polydisperse oligomerization by selfconsistently and globally fitting the distribution of oligomeric species and reweighting the conformational ensembles of the oligomers against SAXS data. A similar idea has recently been applied to study the selfassociation of tubulin using static structures as input (Shemesh et al., 2021). We recorded a concentration series of SAXS data on SPOP, which is known to form linear higherorder oligomers, and combined MD simulations with our approach to simultaneously refine conformational ensembles of thirty oligomeric states of SPOP along with the relative populations.
Our results suggest that SPOP oligomers are rigid, helical structures in solution and that the MATHBTB linker is flexible, allowing for the extension of MATH domains away from the oligomer core. This is consistent with SPOP’s proposed role in phase separation, as reconfiguration of the MATH domains could facilitate binding of substrates across multiple MATH domains and between different SPOP oligomers (Pierce et al., 2016; Bouchard et al., 2018). Indeed, the spacing of the MATH domains in our model of SPOP oligomers is consistent with the distances between motifs in ensembles of disordered SPOP substrates, based on coarsegrained simulations of disordered SPOP substrates. It has been suggested previously that rigidity could play an important role in the phase separation of SPOP oligomers by ensuring a low conformational entropy penalty upon stacking linear oligomers with crossbound substrates in the dense phase (Schmit et al., 2020). This is also consistent with the rigid structural model of SPOP oligomers proposed here. Our results also provide orthogonal evidence that SPOP selfassociation is described well by the isodesmic model, and that the isodesmic ${K}_{\text{D}}$ for BACKBACK mediated selfassociation is in the low micromolar range, in agreement with previous measurements by CGMALS (Marzahn et al., 2016). We also collected SAXS data and fitted the isodesmic ${K}_{\text{D}}$ for the SPOP mutant R221C. Our results suggest that SPOP R221C has a six to ninefold decreased propensity to selfassociate.
While the analysis of the SAXS data presented here does not strictly exclude the possibility that SPOP forms branched or otherwise nonlinear oligomers, our results show that linear oligomers based only on the selfassociation interfaces known from existing crystal structures are consistent with SAXS data. Thus, linear oligomers seem to be the most plausible model based on this and other existing experimental evidence, for example that removal or mutation of either the BACKBACK or BTBBTB interface results in abolishment of higherorder selfassociation and that higherorder oligomers are formed through the selfassociation of SPOP dimers with every step of subunitaddition populated (Marzahn et al., 2016). Finally, as shown here, linear isodesmic selfassociation with the same ${K}_{D}$ provides a good fit to both SAXS and CGMALS data (Marzahn et al., 2016).
The approach presented here to study SPOP can be extended to other polydisperse systems to characterize the distribution of oligomeric states and their conformational properties. However, there are a few limitations to be aware of; SAXS is a lowresolution technique, and may not be able to distinguish between all relevant conformations, a problem that is likely exacerbated here, as the contribution of many species to the SAXS signal may average out distinct features in the profile. One way to mitigate this problem is to construct multiple structural models, and test whether they show any difference in the agreement with the SAXS data. In the case of SPOP we used this approach to examine the flexibility of the MATH domain in SPOP^{28–359}.
Another limitation of the approach is the correlation between the fitted distribution of oligomeric states and the conformational properties of the oligomers. Here, we observed that a low isodesmic ${K}_{\text{D}}$ with large uncertainty was fitted when using more compact structures (MATH domains restrained), which suggests that the model can compensate for the underestimated dimensions of the proteins by increasing the populations of larger oligomers. Therefore, it is important to use prior conformational ensembles that are as accurate as possible. Additionally, it is important to include all the oligomeric species that make a substantial contribution to the SAXS data in the modelling. In the future, it might be relevant to include independent data reporting on the distribution of oligomeric species, such as from CGMALS, when fitting SAXS data.
In the case of SPOP, we described the distribution of oligomers using the isodesmic selfassociation model, but this can be replaced by any model that describes the populations of the species in solution — with the caveat that there should not be too many free parameters to fit to the SAXS data. Similarly, the approach to generate prior conformational ensembles is not limited to MD simulations, and can be varied based on the system at hand. This flexibility in the modelling approach will make it useful to study other polydisperse systems in the future.
Methods
Protein expression and purification
The SPOP gene encoding residues 28–359 (HisSUMOSPOP^{28–359}) was expressed and purified as previously described (Bouchard et al., 2018). Briefly, HisSUMOSPOP^{28–359} was transformed into BL21RIPL cells and expressed in autoinduction media (Studier, 2005). Cells were harvested, lysed, and cell debris was pelleted by centrifugation. The clarified supernatant was applied to a gravity Ni Sepharose resin equilibrated in resuspension buffer (30 mM imidazole, 1 M NaCl, pH 7.8). After washing with wash buffer (75 mM imidazole, 1 M NaCl, pH 7.8), the protein was eluted with a buffer containing 300 mM imidazole, 1 M NaCl, pH 7.8. One milligram of TEV protease was added to the eluted protein and the reaction was left to dialyze into 20 mM Tris pH 7.8, 300 mM NaCl, and 5 mM DTT at 4 °C overnight. The cleaved protein was then further purified using a Superdex S200 sizeexclusion chromatography column equilibrated with 20 mM Tris pH 7.8, 300 mM NaCl, and 5 mM DTT.
Smallangle Xray scattering
SAXS experiments were performed at the LIXbeamline (16ID) of the National Synchrotron Light Source II (Upton, NY) (DiFabio et al., 2016). Data were collected at a wavelength of 1.0 Å, yielding an accessible scattering angle range of 0.006 $<q<$ 3.2 Å^{−1}, where $q$ is the momentum transfer, defined as $q=4\pi \mathrm{sin}(\theta )/\lambda $, where $\lambda $ is the Xray wavelength and 2θ is the scattering angle. Data with $q<$ 0.4 Å^{−1} were used for all analyses. Prior to data collection, SPOP was dialyzed into 20 mM Tris pH 7.8, 150 mM NaCl, and 5 mM DTT. Samples were loaded into a 1 mm capillary for ten 1 s Xray exposures. Data were reduced at the beamline using the Python package py4xs.
Molecular dynamics simulations with Martini
We ran coarse grained molecular dynamics simulations of six SPOP^{28–359} oligomers ranging from the dimer to dodecamer (in steps of dimeric protomer subunits) using a beta version (3.0.4.17) of the Martini 3 force field (https://github.com/KULLCentre/papers/tree/main/2020/TIA1SASLarsenetal/Martini; Souza et al., 2021) and Gromacs 2020 (Abraham et al., 2015). We built the SPOP monomer structure using Modeller (Sali and Blundell, 1993) based on the crystal structure of the MATH and BTB domains (PDB: 3HQI) (Zhuang et al., 2009) and a crystal structure of the BACK domain (PDB: 4HS2) (van Geersdaele et al., 2013). We built the dimer structure by superposing two monomer structures to the crystal structure of the BTBBTB dimer interface in 3HQI. We then built larger oligomers by iteratively adding dimer structures to the linear oligomer. Dimers were added by superposing the terminal BACK domain of the oligomer and a terminal BACK domain of the dimer to the structure of the BACKBACK dimer (4HS2).
The starting structures were coarse grained using the Martinize2 python script. Elastic network restraints of 500 kJ mol^{–1} nm^{–2} between backbone beads within a 1.2 nm cutoff were applied with Martinize2 to keep folded domains intact and to hold oligomer subunits together. In the ‘MATH free’ model, we removed all elastic network restraints between MATH and BTB/BACK domains, between MATH and MATH domains, and in the linker region between MATH and BTB/BACK domains, while in the ‘MATH restrained’ model, we only removed elastic network restraints between MATH and MATH domains and in the linker region between MATH and BTB/BACK domains, but kept restraints between MATH and BTB/BACK domains. We added dihedral and angle potentials between side chains and backbone beads with the scfix flag in Martinize2. Using Gromacs editconf, we placed the dimer and tetramer in a dodecahedral box. To keep the box volume small, larger oligomers were aligned with the principal axis of the system and placed in triclinic boxes that were thus elongated along the xaxis. To keep these oligomers from rotating and selfassociating across the periodic boundary, we added soft harmonic position restraints of 5 J mol^{–1} nm^{–2} along the y and zaxis to the backbone beads of the terminal BTB/BACK domains. We solvated the systems using the Insane python script (Wassenaar et al., 2015) and added 150 mM NaCl along with Na^{+} ions to neutralize the systems. In the ‘MATH free’ system, we rescaled the $\u03f5$ of the LennardJones potentials between all protein and water beads by a factor 1.06 to favour extension of the MATH domains into solution (Thomasen et al., 2022), while the unmodified Martini 3 beta v.3.0.4.17 was used for the ‘MATH restrained’ model.
Energy minimization was performed using steepest descent for 10,000 steps with a 30 fs timestep. Simulations were run in the NPT ensemble at 300 K and 1 bar using the VelocityRescaling thermostat (Bussi et al., 2007) and ParinelloRahman barostat (Parrinello and Rahman, 1981). Nonbonded interactions were treated with the Verlet cutoff scheme. The cutoff for Van der Waals interactions and Coulomb interactions was set to 1.1 nm. A dielectric constant of 15 was used. We equilibrated the systems for 10 ns with a 2 fs timestep and ran production simulations for 60 µs with a 20 fs timestep, saving a frame every 1 ns.
After running the simulations, molecule breaks over the periodic boundaries were treated with Gromacs trjconv using the flags pbc mol center. Simulations were backmapped to allatom using a modified version of the Backward algorithm (Wassenaar et al., 2014), in which simulation runs are excluded and energy minimization is shortened to 200 steps (Larsen et al., 2020). Every fourth simulation frame was backmapped for a total of 15,000 conformers in each backmapped ensemble.
Constructing ensembles of larger SPOP oligomers
We constructed conformational ensembles of larger SPOP^{28–359} oligomers with up to 60 subunits by joining together conformers from the allatom backmapped ensembles of the SPOP dodecamer. Using ensembles of two input SPOP oligomers (SPOP 1 and SPOP 2) we started by removing the last subunit of SPOP 1 and the first subunit of SPOP 2 to ensure that the newly joined subunits were internal and not terminal. We then removed additional subunits from SPOP 2 to reach the desired length of the output oligomer. Then, we selected a random frame from SPOP 1 and SPOP 2, superposed the BTB/BACK domains of the last two subunits of SPOP 1 to the BTB/BACK domains of the first two subunits of SPOP 2, and deleted the first two subunits of SPOP 2. Next, we checked for clashes between the newly joined subunits (shortest interatomic distance <0.4 Å), and rejected the new frame if there was a clash. This approach ensured that the terminal subunits in the constructed oligomer were also the terminal subunits in the MD simulation of the dodecamer, while all internal subunits in the constructed oligomer were also internal in the MD simulation. This approach was repeated to create 15,000 structures of each larger oligomer.
Calculating SAXS intensities from conformational ensembles
We calculated SAXS intensities from each of the 15,000 conformers in each of our allatom ensembles of SPOP oligomers using PepsiSAXS (Grudinin et al., 2017). To avoid overfitting to the experimental SAXS data, we used fixed values for the parameters that describe the contrast of the hydration layer, $\delta \rho $=3.34 e/nm^{3}, and the volume of displaced solvent, r_{0}/r_{m} = 1.025, that have been shown to work well for intrinsically disordered and multidomain proteins (Pesce and LindorffLarsen, 2021). The forward scattering ($I(0)$) was set equal to the number of subunits in the oligomer, in order to scale the SAXS intensities proportionally to the particle volume.
The isodesmic selfassociation model and averaging of SAXS intensities
The experimental SAXS profiles of SPOP report on the average of a polydisperse mixture of oligomeric species in solution. The concentration of each oligomer should follow the isodesmic model where the concentration of the smallest subunit, the BTBBTB dimer, is given by:
The concentration c_{i} of any larger oligomer with $i$ subunits can be calculated given c_{1} and the concentration of oligomer $i$–1, c_{i1}:
${K}_{A}$ is the isodesmic association constant and ${c}_{tot}$ is the total concentration of protomers. Here we assume that the SPOP BTBBTB dimer is always fully formed (Marzahn et al., 2016) and ${c}_{tot}$ in Equation 1 is thus half of the total protein concentration reported for the SAXS experiments, which refers to the SPOP monomer concentration. Given the concentration c_{i} of each oligomer $i$ from the isodesmic model, we can calculate the volume fraction ${\varphi}_{i}$ of the oligomer:
The average SAXS intensities from the mixture of oligomers ${\u27e8I\u27e9}_{\text{mix}}$ are then given by:
where ${\u27e8I\u27e9}_{i,\text{ensemble}}$ is the conformationally averaged SAXS intensity of oligomer $i$. Note that the magnitude of the SAXS intensities calculated with PepsiSAXS were set to be proportional to the number of subunits in the oligomer, so given Equations 3 and 4 the total contribution of each oligomer to the averaged SAXS intensity is proportional to ${i}^{2}{c}_{i}$.
Selfconsistent optimization of isodesmic model parameters and conformational ensemble weights
The algorithm we developed to selfconsistently optimize the isodesmic distribution of oligomer concentrations and reweight the conformational ensemble of each oligomer against SAXS data consists of three iterative steps: (1) fitting the scale and constant background of the SAXS data, (2) fitting the isodesmic ${K}_{\text{A}}$, and (3) reweighting the conformational ensemble of each oligomer using BME reweighting. We used a concentration series of SAXS experiments, to which the isodesmic ${K}_{\text{A}}$ was fitted globally, and only subsequently transformed the ${K}_{\text{A}}$ to the ${K}_{\text{D}}$ (${K}_{D}=1/{K}_{\text{A}}$) for reporting our results.
Step 1: Fitting the SAXS scale and constant background
The following step was repeated for each SAXS experiment in the concentration series. The concentration of each oligomer was calculated using the isodesmic model with the given ${c}_{tot}$ (Equations 1 and 2). The average SAXS intensities ${\u27e8I\u27e9}_{\text{mix}}$ from all oligomers were then calculated using Equations 3 and 4. The scale and constant background ($cst$) of ${\u27e8I\u27e9}_{\text{mix}}$ were fitted to the experimental SAXS intensities, ${I}_{\text{exp}}$, using leastsquares linear regression weighted by the experimental errors (LinearRegression function in scikitlearn Pedregosa et al., 2011):
In practice, to avoid modifying the SAXS scale and constant background for every conformer in our ensembles, we instead performed the inverse operation on the experimental SAXS profile:
and propagated the experimental errors ${\sigma}_{\text{exp}}$ accordingly:
Step 2: Fitting the isodesmic model
The isodesmic ${K}_{\text{A}}$ was fitted globally to the concentration series of SAXS experiments using Metropolis Monte Carlo (Metropolis et al., 1953) with simulated annealing. For each Monte Carlo step, we generated a new random ${K}_{\text{A}}$ with a Gaussian probability distribution centered around the previous ${K}_{\text{A}}$, calculated new oligomer concentrations and corresponding ${\u27e8I\u27e9}_{\text{mix}}$ for each SAXS experiment in the concentration series using Equations 1–4, and for each SAXS experiment calculated the reduced ${\chi}^{2}$, ${\chi}_{r}^{2}$, as:
where $m$ is the number of SAXS intensities $j$ in the SAXS profile. We then calculated the average of the ${\chi}_{r}^{2}$values across the SAXS concentration series to get the global ${\chi}_{r}^{2}$, ${\chi}_{r,\text{global}}^{2}$, as the number of intensities was the same in each SAXS profile. Next, we evaluated the acceptance criterion by calculating:
where ${\chi}_{\text{new,r,global}}^{2}$ and ${\chi}_{\text{old,r,global}}^{2}$ are the from the current and previous Monte Carlo step respectively and $T$ is the simulated annealing temperature. If $\alpha $ > 1, we accepted the new ${K}_{\text{A}}$. If $\alpha \le $=1, we generated a random number, $rand$, between 0 and 1, and if $\alpha $ > $rand$, accepted the new ${K}_{\text{A}}$. Otherwise, we kept the ${K}_{\text{A}}$ from the previous Monte Carlo step. Finally, we decreased $T$ for the next Monte Carlo step.
Step 3: Reweighting the conformational ensemble
The following step was repeated for each SAXS experiment in the concentration series. We calculated the oligomer concentrations using the isodesmic model given the new ${K}_{\text{A}}$ determined in step 2. For each oligomer $i$, we extracted a SAXS profile for BME reweighting from the experimental profile using the following method: we calculated the average SAXS profile from the ensembles as in Equation 4 but leaving out oligomer $i$ from the sum to get ${\u27e8I\u27e9}_{\text{mix,rest}}$. Next, we determined the contribution of species $i$ to the experimental SAXS intensity as:
where ${I}_{\text{exp}}$ is the experimental SAXS intensity and ${\varphi}_{i}$ is the volume fraction of oligomer $i$. We then propagated the error $\sigma}_{i,\text{extr}$ on $\u27e8I{\u27e9}_{i,\text{extr}}$ from both the errors on the experimental SAXS intensities and the errors on the calculated SAXS intensities, which we determined using block error analysis (Flyvbjerg and Petersen, 1989). The propagated errors were given by:
where the sum $r$ to $N$ runs over all oligomers that contributed to ${\u27e8I\u27e9}_{\text{mix,rest}}$, ${\sigma}_{\text{exp}}$ is the error on the experimental SAXS intensity, $\sigma}_{r,\text{block}$ is the error on the average SAXS intensity calculated from the ensemble of oligomer $r$ prior to reweighting using block error analysis (https://github.com/fpesceKU/BLOCKING; Pesce, 2023), and ${\varphi}_{i}$ is the volume fraction of oligomer $i$. The conformational ensemble of oligomer $i$ was then reweighted against this extracted SAXS profile using BME reweighting (Bottaro et al., 2020), in which a set of ensemble weights $w$ are obtained by minimizing the function:
where $n$ is the number of ensemble conformations, $m$ is the number of experimental observables (in this case the number of SAXS intensities in the profile), ${\chi}_{r}^{2}$ quantifies the agreement between ${\u27e8I\u27e9}_{\text{ensemble}}$ and $\u27e8I{\u27e9}_{\text{extr}}$, $S}_{\text{rel}$ is the relative Shannon entropy that quantifies the deviation of the new weights from the initial weights, ${w}^{0}$, and $\theta $ is a scaling parameter that quantifies the confidence in the experimental data versus the prior ensemble. ${\chi}_{r}^{2}$ is given by:
where ${I}_{j,k,\text{ensemble}}$ is the SAXS intensity $j$ calculated from the conformer $k$ of the ensemble. $S}_{\text{rel}$ is given by:
Using the ensemble weights obtained from BME reweighting, we calculated new weighted average SAXS intensities, ${\u27e8I\u27e9}_{i,\text{ensemble}}$, from the ensemble of oligomer $i$. The process of extracting a SAXS profile followed by BME reweighting was repeated for each oligomer.
Optimization parameters
The three steps described above were repeated iteratively to converge on selfconsistent values of the SAXS scale and constant background, the isodesmic ${K}_{\text{A}}$, and the ensemble weights for each oligomer species. As the SAXS profile against which the ensemble of each oligomer was reweighted is a function of the ensemble weights of all other oligomeric species, we wished to reweight the ensembles only slightly in initial iterations, and then gradually increase the degree of reweighting as the conformational weights and isodesmic ${K}_{\text{A}}$ converged. We achieved this by starting with a high value of $\theta $ (Equation 12) and then gradually decreasing $\theta $ each iteration. The fraction of effective frames, ${\varphi}_{\text{eff}}$, given by $\mathrm{exp}({S}_{\text{rel}})$, provides a measure of the fraction of the initial ensemble that is retained after reweighting. At every iteration, we checked whether the ensemble of each oligomer had reached a ${\varphi}_{\text{eff}}$ below a set cutoff, after which $\theta $ was no longer decreased for that specific oligomer. Thus, the overall degree of reweighting could be tuned through selection of this ${\varphi}_{\text{eff}}$cutoff.
We ran the optimization scheme for 1000 iterations starting with $\theta $=100 and decreasing $\theta $ by 2% every iteration. The simulated annealing of the isodesmic ${K}_{\text{A}}$ was run from $T$=10 to $T$=0.1 every iteration, with $T$ decreased by 30% every Monte Carlo step, and with a standard deviation of 0.1 µM^{−1} for the Gaussian probability distribution used to generate the new ${K}_{\text{A}}$. The step was repeated if ${K}_{A}\le $ 0 was generated.
Preventing overfitting of ensemble weights
We ran the optimization with a range of ${\varphi}_{\text{eff}}$cutoffs from 0.1 to 1. To prevent overfitting, we aimed to choose a value of ${\varphi}_{\text{eff}}$ that retained as much of the prior ensemble as possible (high ${\varphi}_{\text{eff}}$) while not sacrificing substantial improvement in the fit to the SAXS data (low ${\chi}_{r,\text{global}}^{2}$). As an additional approach to prevent overfitting, we left out the SAXS experiment recorded with 15 µM protein from the optimization, and used it as validation for the determined weights (averaged as explained in the next section) and isodesmic ${K}_{\text{A}}$ at different values of the ${\varphi}_{\text{eff}}$cutoff. For each ${\varphi}_{\text{eff}}$cutoff, we fitted only the SAXS scale and constant background to the 15 µM SAXS experiment. We tested the effect of using the fitted ${K}_{\text{A}}$ and ensemble weights in combination, but also the effect of using only the fitted ${K}_{\text{A}}$ or ensemble weights independently. Although in all cases the fitted ${K}_{\text{A}}$ and ensemble weights combined improved the fit to the SAXS data compared with the initial weights and ${K}_{\text{A}}$, ${\varphi}_{\text{eff}}$cutoff=0.4 was the lowest value of ${\varphi}_{\text{eff}}$ where the fit was not improved by replacing the fitted weights with uniform weights in combination with the fitted ${K}_{\text{A}}$ (Figure 3—figure supplement 1). Thus, we selected the conformational weights and isodesmic ${K}_{\text{A}}$ determined with ${\varphi}_{\text{eff}}$cutoff=0.4 to avoid overfitting the ensemble weights.
Averaging the conformational weights from different SAXS experiments
The optimization scheme outputs a set of conformational weights for each SAXS experiment in the concentration series. We combined these conformational weights to obtain a single set of weights for further analysis, under the assumption that the conformational properties of each SPOP oligomer are independent of protein concentration. The distribution of oligomeric species from the isodesmic model depends on the protein concentration. Thus, each SAXS experiment does not contain the same amount of information on every oligomer; SAXS experiments at lower concentrations have a relatively smaller contribution from large oligomers and vice versa. Therefore, we weighted the averaging of the conformational weights to reflect this mismatch in information. The average weight of conformation $k$ of oligomer $i$ was calculated as:
where ${w}_{k,i,l}$ is the weight of conformer $k$ of oligomer $i$ from reweighting against SAXS experiment $l$ and ${\rho}_{i,l}$ is the contribution of oligomer $i$ to SAXS experiment $l$ relative to the contribution of oligomer $i$ to the other SAXS experiments in the concentration series, given by:
where ${c}_{i,l}$ is the concentration of oligomer $i$ in SAXS experiment $l$ given by the isodesmic model. For a plot of the contributions ${\rho}_{i,l}$, see Figure 3—figure supplement 9.
Determining the error of the fitted isodesmic K_{D}
To determine the uncertainty of the isodesmic ${K}_{\text{D}}$ fitted with our optimization scheme, we scanned a range of ${K}_{\text{D}}$ values around the fitted ${K}_{\text{D}}$ and determined the ${\chi}_{r,\text{global}}^{2}$ to the concentration series of SAXS data. We used the same ensemble weights for every value of ${K}_{\text{D}}$, and only fitted the scale and constant background to the SAXS data. We then defined the error of the fitted ${K}_{\text{D}}$ to include all ${K}_{\text{D}}$ values that gave a ${\chi}_{r,\text{global}}^{2}$ to the SAXS data within 10% of the minimum ${\chi}_{r,\text{global}}^{2}$.
Analysis of SPOP conformational ensembles
${R}_{g}$ was calculated from ensembles using the gyrate function in Gromacs. Endtoend distances were calculated from ensembles as the distances between the centerofmass (COM) of the BTB/BACK domains in the terminal subunits using the compute_center_of_mass function in MDTraj (McGibbon et al., 2015) and the linalg.norm function in NumPy (Harris et al., 2020). We fitted ensemble averaged endtoend distances against oligomer size (number of subunits) with a power law: ${R}_{\text{EE}}={R}_{0}{N}^{\nu}$, where R_{0} is the subunit segment size, $N$ is the number of subunits in the oligomer, and $\nu $ is a scaling exponent, using the curve_fit function in SciPy (Virtanen et al., 2020). To subsample ensembles with extended or compacted oligomers, frames were selected with $R}_{\text{EE}}>max({R}_{\text{EE}})\frac{max({R}_{\text{EE}})\u27e8{R}_{\text{EE}}\u27e9}{2$ or $R}_{\text{EE}}<min({R}_{\text{EE}})+\frac{\u27e8{R}_{\text{EE}}\u27e9min({R}_{\text{EE}})}{2$ respectively, where $max({R}_{\text{EE}})$ and $min({R}_{\text{EE}})$ are the maximum and minimum over all frames of the ensemble and $\u27e8{R}_{EE}\u27e9$ is the ensemble average. MATHBTB/BACK COM distance was calculated from ensembles as the distance between the COM of the MATH domain and BTB/BACK domains in every subunit using the compute_center_of_mass function in MDTraj and the linalg.norm function in NumPy. The histogram of MATHBTB/BACK COM distances shows values for all conformations of all subunits of all oligomers. To subsample ensembles with compacted or extended MATH domains, frames were selected with an average MATHBTB/BACK COM distance over all subunits <4.4 nm or >5.2 nm, respectively. The COM distance between substrate binding sites in neighbouring MATH domains was calculated from ensembles using the distance function in Gromacs. The MATH substrate binding site was defined as residue Arg70, Tyr87, Ser119, Tyr123, and Lys129Phe133. The histogram of MATH binding site COM distances shows values for all conformations of all subunits of all oligomers. Structures for Figure 4i were selected by fitting three Gaussians to the histogram in Figure 4e (after reweighting) using SciPy curve_fit and for each Gaussian selecting conformers within 0.1σ of the mean. All visualizations of protein structures were made with ChimeraX (Pettersen et al., 2021). To examine the agreement of single frames drawn from the ensembles with SAXS data, we drew a random frame from the ensemble of each oligomer and scanned the isodesmic ${K}_{D}$ from 0.01 to 100 µM (with 10,000 logspaced steps) to select the ${K}_{D}$ that gave the optimal agreement with the SAXS concentration series based on ${\chi}_{r,\text{global}}^{2}$. The SAXS scale and constant background were fitted for each ${K}_{D}$. This procedure was repeated for 10,000 iterations. The same procedure was performed with oligomer structures constructed prior to the MD simulations.
Dimeroligomer equilibria and averaging of SAXS intensities
For dimeroligomer equilibria, the total concentration of BTBmediated dimer subunits (both free and in oligomers), $c}_{\text{tot,dimer}$, was assumed to be half of the total SPOP monomer concentration. We determined the equilibrium dimer concentration, $c}_{\text{dimer}$, and oligomer concentration, c_{i}, for a given association constant ${K}_{A}$ using:
and the equation for conservation of mass:
where, $i$ is the number of subunits in the oligomer. The averaged SAXS intensities $\u27e8I{\u27e9}_{\text{mix}}$ were then calculated as:
where $\u27e8I{\u27e9}_{\text{dimer,ensemble}}$ and $\u27e8I{\u27e9}_{i,\text{ensemble}}$ are the ensemble averaged SAXS intensity for the dimer and oligomer, and $\varphi}_{\text{dimer}$ and ${\varphi}_{i}$ are the volume fractions of the dimer and oligomer calculated based on the concentrations and number of subunits. For each possible dimeroligomer equilibrium, we scanned ${K}_{A}$ values from 10^{−12}–10^{12} µM^{−1} and selected the ${K}_{A}$ that gave the optimal agreement with the SAXS concentration series based on ${\chi}_{r,\text{global}}^{2}$. The SAXS scale and constant background were fitted for each ${K}_{A}$.
Molecular dynamics simulations with CALVADOS
We selected five SPOP substrates with at least 8 SPOP binding motifs (Cuneo and Mittag, 2019) for simulations (SETD2 Zhu et al., 2017, SCAF1 Theurillat et al., 2014, SRC3 Li et al., 2011; Geng et al., 2013; Janouskova et al., 2017, Gli2, and Gli3 Zhang et al., 2006; Zhang et al., 2009). We selected the IDRs of these proteins based on low Alphafold pLDDT scores and pairwise alignment errors (Jumper et al., 2021). We ran coarsegrained simulations of these with CALVADOS 2 (Tesei et al., 2021; Tesei and LindorffLarsen, 2023). Simulations were run at 298 K, with an ionic strength of 150 mM, and pH 7.2 for determining the partial charge of histidine sidechains. Simulations were run for $3\times {10}^{3}{N}_{\text{res}}^{2}$ steps, where $N}_{\text{res}$ is the number of residues, using a 10 fs timestep (Tesei and LindorffLarsen, 2023). Frames were saved every $3{N}_{\text{res}}^{2}$ steps to obtain weakly correlated frames. We used a 2 nm cutoff for the AshbaughHatch potential and a 4 nm cutoff for the DebyeHückel potential. All simulations were started from a linear arrangement of the protein chain, except for simulations of the two longest IDRs, SCAF1 IDR and SETD2 IDR 1, which were started from an Archimedean spiral arrangement. Simulations were performed with HOOMDblue 2.9.3 (Anderson et al., 2020).
Analysis of motif spacing in SPOP substrates
We identified SPOP binding motifs in the substrate sequences as five consecutive positions with residues 1: GAVLIMWFPC, 2: STCYNQDEHR, 3: ST, 4: STCYNQDEHR, 5: ST or 1: GAVLIMWFPC, 2: STCYNQDEHR, 3: ST, 4: ST, 5: STCYNQDEHR, where each set of amino acids are allowed at the given position (Zhuang et al., 2009; Cuneo and Mittag, 2019). We calculated a histogram of all distances between neighbouring motifs in the SPOP susbstrate sequences over the CALVADOS simulations. Distances were calculated between the middle residue beads of the neighbouring motifs using the compute_contacts function in MDTraj. We also calculated the average distance, $R$, between each neighbouring motif and fit this with a power law $R={R}_{0}{N}^{\nu}$, where R_{0} is the segment size, $N$ is the number of residues spacing the two motifs, and $\nu $ is a scaling exponent, using the curve_fit function in SciPy.
Fitting CGMALS data
Given the concentration of each oligomer from the isodesmic model, the average molecular weight, as measured by CGMALS, was calculated as:
where $N$ is the number of oligomers, c_{i} is the concentration of oligomer $i$ given by the isodesmic model and $M{W}_{\text{monomer}}$ is the molecular weight of the subunit of oligomerization.
We fitted the isodesmic ${K}_{D}$ and $M{W}_{\text{monomer}}$ to the CGMALS data from Marzahn et al., 2016. The CGMALS data consists of two merged datasets, so we allowed a different $M{W}_{\text{monomer}}$ for each of the two merged datasets to absorb uncertainties from determination of the protein concentrations. The ${K}_{D}$ was fitted globally to the two merged datasets, and the error of the fit on the ${K}_{D}$ was set to two standard deviations. Fitting was done with the curve_fit function in SciPy.
Data availability
Code and data is available at https://github.com/KULLCentre/_2022_Thomasen_SPOP (copy archived at swh:1:rev:be995dd615079fe8b4fbb86941160d519429ee4c). Simulation data is available at https://doi.org/10.17894/ucph.ef999f72b5e845c4835f3e49619a0f91. Plasmids are available from Addgene (plasmid IDs 194115 and 194116).

Electronic Research Data Archive at University of CopenhagenSupporting data for Conformational and oligomeric states of SPOP from smallangle Xray scattering and molecular dynamics simulations.https://doi.org/10.17894/ucph.ef999f72b5e845c4835f3e49619a0f91
References

Protein oligomerization: How and whyBioorganic & Medicinal Chemistry 13:5013–5020.https://doi.org/10.1016/j.bmc.2005.05.037

HOOMDblue: A python package for highperformance molecular dynamics and hard particle monte carlo simulationsComputational Materials Science 173:109363.https://doi.org/10.1016/j.commatsci.2019.109363

Integrating molecular simulation and experimental data: A bayesian/maximum entropy reweighting approachMethods in Molecular Biology 2112:219–240.https://doi.org/10.1007/9781071602706_15

The ubiquitin ligase adaptor Spop in cancerThe FEBS Journal 286:3946–3958.https://doi.org/10.1111/febs.15056

ConferenceThe life science xray scattering beamline at NSLSIIProceedings of the 12th international conference on synchrotron radiation instrumentation – SRI2015.https://doi.org/10.1063/1.4952872

Error estimates on averages of correlated dataJ Chem Phys 91:461–466.https://doi.org/10.1063/1.457480

PepsiSAXS: An adaptive method for rapid and accurate computation of smallangle Xray scattering profilesActa Crystallographica. Section D, Structural Biology 73:449–464.https://doi.org/10.1107/S2059798317005745

Mutational and expressional analyses of SPOP, A candidate tumor suppressor geneIn Prostate, Gastric and Colorectal Cancers. Apmis 121:626–633.https://doi.org/10.1111/apm.12030

Exome sequencing identifies recurrent somatic Rac1 mutations in melanomaNature Genetics 44:1006–1014.https://doi.org/10.1038/ng.2359

Btb domaincontaining speckletype POZ protein (Spop) serves as an adaptor of Daxx for ubiquitination by Cul3based ubiquitin ligaseThe Journal of Biological Chemistry 281:12664–12672.https://doi.org/10.1074/jbc.M600204200

The evolution of multimeric protein assemblagesMolecular Biology and Evolution 29:1353–1366.https://doi.org/10.1093/molbev/msr300

Structure, dynamics, assembly, and evolution of protein complexesAnnual Review of Biochemistry 84:551–575.https://doi.org/10.1146/annurevbiochem060614034142

MD Traj: A modern open library for the analysis of molecular dynamics trajectoriesBiophysical Journal 109:1528–1532.https://doi.org/10.1016/j.bpj.2015.08.015

Equation of state calculations by fast computing machinesJ Chem Phys 21:1087–1092.https://doi.org/10.1063/1.1699114

A theory of linear and helical aggregations of macromoleculesJournal of Molecular Biology 4:10–21.https://doi.org/10.1016/s00222836(62)801120

Polymorphic transitions in single crystals: A new molecular dynamics methodJournal of Applied Physics 52:7182–7190.https://doi.org/10.1063/1.328693

Scikitlearn: Machine learning in pythonJournal of Machine Learning Research 12:2825–2830.

Multiple weak linear motifs enhance recruitment and processivity in SPOPmediated substrate ubiquitinationJournal of Molecular Biology 428:1256–1271.https://doi.org/10.1016/j.jmb.2015.10.002

Comparative protein modelling by satisfaction of spatial restraintsJournal of Molecular Biology 234:779–815.https://doi.org/10.1006/jmbi.1993.1626

Protein network structure enables switching between liquid and gel statesJournal of the American Chemical Society 142:874–883.https://doi.org/10.1021/jacs.9b10066

Structure and energetics of GTP and GDPtubulin isodesmic selfassociationACS Chemical Biology 16:2212–2227.https://doi.org/10.1021/acschembio.1c00369

Protein production by autoinduction in high density shaking culturesProtein Expression and Purification 41:207–234.https://doi.org/10.1016/j.pep.2005.01.016

Conformational ensembles of intrinsically disordered proteins and flexible multidomain proteinsBiochemical Society Transactions 50:541–554.https://doi.org/10.1042/BST20210499

Improving martini 3 for disordered and multidomain proteinsJournal of Chemical Theory and Computation 18:2033–2041.https://doi.org/10.1021/acs.jctc.1c01042

Structural basis of highorder oligomerization of the cullin3 adaptor SpopActa Crystallographica. Section D, Biological Crystallography 69:1677–1684.https://doi.org/10.1107/S0907444913012687

Going backward: A flexible geometric approach to reverse transformation from coarse grained to atomistic modelsJournal of Chemical Theory and Computation 10:676–690.https://doi.org/10.1021/ct400617g

Computational lipidomics with insane: A versatile tool for generating custom membranes for molecular simulationsJournal of Chemical Theory and Computation 11:2144–2155.https://doi.org/10.1021/acs.jctc.5b00209

SPOPcontaining complex regulates SETD2 stability and h3k36me3coupled alternative splicingNucleic Acids Research 45:92–105.https://doi.org/10.1093/nar/gkw814
Article and author information
Author details
Funding
Lundbeckfonden (R15520152666)
 Kresten LindorffLarsen
Novo Nordisk Fonden (NNF18OC0033950)
 Kresten LindorffLarsen
National Institutes of Health (R01GM112846)
 Tanja Mittag
American Lebanese Syrian Associated Charities
 Tanja Mittag
Novo Nordisk Fonden (NNF18OC0032608)
 Kresten LindorffLarsen
National Institutes of Health (P30GM133893)
 Tanja Mittag
DOE Office of Science's Biological and Environmental Research (KP1605010)
 Tanja Mittag
National Institutes of Health (OD012331)
 Tanja Mittag
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by the Lundbeck Foundation BRAINSTRUC structural biology initiative (R15520152666, to K.L.L.), the PRISM (Protein Interactions and Stability in Medicine and Genomics) centre funded by the Novo Nordisk Foundation (NNF18OC0033950, to K.L.L.), by NIH grant R01GM112846 (to T.M.) and by the American Lebanese Syrian Associated Charities (to T.M.). We acknowledge access to computational resources from the ROBUST Resource for Biomolecular Simulations (supported by the Novo Nordisk Foundation; NNF18OC0032608), the Danish National Supercomputer for Life Sciences (Computerome), and the Biocomputing Core Facility at the Department of Biology, University of Copenhagen. We thank Melissa R Marzahn and Erik W Martin for the generation of preliminary data. We thank Shirish Chodankar for assistance with SAXS data collection and reduction. The LiX beamline is part of the Center for BioMolecular Structure (CBMS), which is primarily supported by the National Institutes of Health, National Institute of General Medical Sciences (NIGMS) through a P30 Grant (P30GM133893), and by the DOE Office of Biological and Environmental Research (KP1605010). LiX also received additional support from NIH Grant S10 OD012331. As part of NSLSII, a national user facility at Brookhaven National Laboratory, work performed at the CBMS is supported in part by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences Program under contract number DESC0012704.
Copyright
© 2023, Thomasen et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,317
 views

 202
 downloads

 3
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Structural Biology and Molecular Biophysics
The KCNH family of potassium channels serves relevant physiological functions in both excitable and nonexcitable cells, reflected in the massive consequences of mutations or pharmacological manipulation of their function. This group of channels shares structural homology with other voltagegated K^{+} channels, but the mechanisms of gating in this family show significant differences with respect to the canonical electromechanical coupling in these molecules. In particular, the large intracellular domains of KCNH channels play a crucial role in gating that is still only partly understood. Using KCNH1(K_{V}10.1) as a model, we have characterized the behavior of a series of modified channels that could not be explained by the current models. With electrophysiological and biochemical methods combined with mathematical modeling, we show that the uncovering of an open state can explain the behavior of the mutants. This open state, which is not detectable in wildtype channels, appears to lack the rapid flicker block of the conventional open state. Because it is accessed from deep closed states, it elucidates intermediate gating events well ahead of channel opening in the wild type. This allowed us to study gating steps prior to opening, which, for example, explain the mechanism of gating inhibition by Ca^{2+}Calmodulin and generate a model that describes the characteristic features of KCNH channels gating.

 Structural Biology and Molecular Biophysics
The M2 proton channel aids in the exit of mature influenza viral particles from the host plasma membrane through its ability to stabilize regions of high negative Gaussian curvature (NGC) that occur at the neck of budding virions. The channels are homotetramers that contain a cytoplasmfacing amphipathic helix (AH) that is necessary and sufficient for NGC generation; however, constructs containing the transmembrane spanning helix, which facilitates tetramerization, exhibit enhanced curvature generation. Here, we used allatom molecular dynamics (MD) simulations to explore the conformational dynamics of M2 channels in lipid bilayers revealing that the AH is dynamic, quickly breaking the fourfold symmetry observed in most structures. Next, we carried out MD simulations with the protein restrained in four and twofold symmetric conformations to determine the impact on the membrane shape. While each pattern was distinct, all configurations induced pronounced curvature in the outer leaflet, while conversely, the inner leaflets showed minimal curvature and significant lipid tilt around the AHs. The MDgenerated profiles at the protein–membrane interface were then extracted and used as boundary conditions in a continuum elastic membrane model to calculate the membranebending energy of each conformation embedded in different membrane surfaces characteristic of a budding virus. The calculations show that all three M2 conformations are stabilized in inwardbudding, concave spherical caps and destabilized in outwardbudding, convex spherical caps, the latter reminiscent of a budding virus. One of the C2broken symmetry conformations is stabilized by 4 kT in NGC surfaces with the minimum energy conformation occurring at a curvature corresponding to 33 nm radii. In total, our work provides atomistic insight into the curvature sensing capabilities of M2 channels and how enrichment in the nascent viral particle depends on protein shape and membrane geometry.