Introduction

Amyloids are highly ordered protein aggregates with self-templating activity. This activity drives the progression of multiple incurable diseases of aging, such as Alzheimer’s (Chiti and Dobson, 2017; Huang et al., 2019). Understanding how amyloids start, or nucleate, is therefore fundamental to preventing these diseases. However, despite decades of intense research, we still lack a clear picture of even the gross anatomy of an amyloid nucleus.

Polyglutamine (polyQ) is an amyloid-forming sequence common to eukaryotic proteomes (Mier et al., 2020). In humans, it is responsible for nine invariably fatal neurodegenerative diseases, the most prevalent of which is Huntington’s disease. Cells exhibiting polyQ pathology, whether in Huntington’s disease patients, tissue culture, organotypic brain slices, or animal models, die independently and stochastically with a constant frequency (Clarke et al., 2000; Linsley et al., 2019). Two observations suggest that this emerges directly from the amyloid nucleation barrier. First, polyQ aggregation itself occurs stochastically in cells (Colby et al., 2006; Kakkar et al., 2016; Sinnige et al., 2021). Second, polyQ disease onset and progression are determined almost entirely by an intramolecular change, specifically, a genetically encoded expansion in the number of sequential glutamines beyond a protein-specific threshold of approximately 36 residues (Lieberman et al., 2019). Unlike other amyloid diseases (Book et al., 2018; Kim et al., 2020; Selkoe and Hardy, 2016), polyQ disease severity generally does not worsen with gene dosage (Cubo et al., 2019; Lee et al., 2019; Wexler et al., 1987), implying that the rate-determining step in neuronal death occurs in a fixed minor fraction of the polyglutamine molecules. It is unclear how, and why, those molecules differ from the bulk. Hence, more so than for any other proteopathy, therapeutic progress against polyQ diseases awaits detailed knowledge of the very earliest steps of amyloid formation.

Amyloid nuclei cannot be observed directly by any existing experimental approach. This is because unlike mature amyloid fibrils, which are stable and amenable to structural biology, nuclei are unstable by definition. Their structures do not necessarily propagate into or correspond with the structures of mature amyloids that arise from them (Auer et al., 2008; Buell, 2017; Hsieh et al., 2017; Levin et al., 2014; Liang et al., 2018; Li et al., 2010; Sil et al., 2018; Yamaguchi et al., 2005; Zanjani et al., 2020). Moreover, nucleation occurs far too infrequently, and involves far too many degrees of freedom, to simulate computationally from a naive state (Barrera et al., 2021; Kar et al., 2011; Strodel, 2021). Unlike phase separation, which primarily concerns a loss of intermolecular entropy, amyloid formation additionally involves a major loss of intramolecular entropy. That is, nucleation selects for a specific combination of backbone and side chain torsion angles (Khan et al., 2018; Vitalis and Pappu, 2011; Zhang and Schmit, 2016). As a consequence, amyloid-forming proteins can accumulate to supersaturating concentrations while remaining soluble, thereby storing potential energy that will subsequently drive their aggregation following a stochastic nucleating event (Buell, 2017; Khan et al., 2018).

Due to the improbability that the requisite increases in both density and conformational ordering will occur spontaneously at the same time (i.e. homogeneously), amyloid nucleation tends to occur heterogeneously. In other words, amyloids tend to emerge via a progression of relatively less ordered but more probable, metastable intermediates of varying stoichiometry and conformation (Auer et al., 2008; Buell, 2017; Hsieh et al., 2017; Levin et al., 2014; Liang et al., 2018; Li et al., 2010; Serio et al., 2000; Sil et al., 2018; Vekilov, 2012; Vitalis and Pappu, 2011; Yamaguchi et al., 2005; Zanjani et al., 2020). Heterogeneities divide the nucleation barrier into smaller, more probable steps, only one of which will be rate-limiting (Fig. 1A). Therefore, the occurrence of heterogeneities implies that the nature of the actual nucleus for a given amyloid can depend not only on the protein’s sequence but also its concentration and cellular factors that influence its conformation (Bradley et al., 2002; Buell, 2017; Collinge and Clarke, 2007; Sanders et al., 2014; Törnquist et al., 2018).

Figure 1. A hidden pattern of glutamines governs amyloid nucleation.

A. Reaction coordinate diagram schematizing the energy barriers for forming an amyloid nucleus homogeneously (left), facilitated by a density heterogeneity (middle), or facilitated by a conformational heterogeneity (right). Because amyloid involves a transition from disordered monomer to ordered multimer, the nucleation barrier (purple) results from a combination of high energy fluctuations in both density (blue) and conformation (red).

B. Cellular volumes quantize amyloid nucleation. Amyloid nuclei occur at such low concentrations that fewer than one exists in the femtoliter volumes of cells. This causes amyloid formation to be rate-limited by stochastic nucleation in individual yeast cells (right) but not in the microliter volumes of conventional in vitro kinetic assays (left). Taking a population-level snapshot of the extent of protein self-assembly as a function of concentration in each cell reveals heterogeneity attributable to the nucleation barrier.

C. DAmFRET plots, showing the extent of de novo self-assembly (AmFRET) as a function of protein concentration for polyglutamine (Q60) or polyasparagine (N60) in yeast cells lacking endogenous amyloid ([pin-]). Cells expressing Q60 partition into distinct populations that either lack (no AmFRET) or contain (high AmFRET) amyloid. The bimodal distribution persists even among cells with the same concentration of protein, indicating that amyloid formation is rate-limited by nucleation. The nucleation barrier for N60 is so large that spontaneous amyloid formation occurs at undetectable frequencies. Insets show histograms of AmFRET values. AU, arbitrary units.

D. Bar plot of the fraction of [pin-] cells in the AmFRET-positive population for the indicated length variants of polyQ, along with the pathologic length thresholds for polyQ tracts in the indicated proteins. Shown are means ± SEM of biological triplicates. CACNA1A, Cav2.1; ATXN2, Ataxin-2; HTT, Huntingtin; AR, Androgen Receptor; ATXN7, Ataxin-7; ATXN1, Ataxin-1; TBP, TATA-binding protein; ATN1, Atrophin-1; ATXN3, Ataxin-3.

E. Bar plot of the fraction of [pin-] cells in the AmFRET-positive population for the indicated sequences, showing that amyloid is inhibited when Q tracts are interrupted by a non-Q side chain at odd-numbered intervals. Shown are means ± SEM of biological triplicates.

Heterogeneities may be responsible for amyloid-associated proteotoxicity. Partially ordered species accumulate during the early stages of amyloid aggregation by all well-characterized pathological amyloids, but generally do not occur (or less so) during the formation of functional amyloids (Otzen and Riek, 2019). In the case of pathologically expanded polyQ, or the huntingtin protein containing pathologically expanded polyQ, amyloid-associated oligomers have been observed in vitro, in cultured cells, and in the brains of patients (Auer et al., 2008; Hsieh et al., 2017; Legleiter et al., 2010; Levin et al., 2014; Liang et al., 2018; Li et al., 2010; Olshina et al., 2010; Sathasivam et al., 2010; Sil et al., 2018; Takahashi et al., 2008; Vitalis and Pappu, 2011; Yamaguchi et al., 2005; Zanjani et al., 2020), and are likely culprits of proteotoxicity (Kim et al., 2016; Leitman et al., 2013; Lu and Palacino, 2013; Matlahov and van der Wel, 2019; Miller et al., 2011; Takahashi et al., 2008; Wetzel, 2020). In contrast, mature amyloid fibers are increasingly viewed as benign or even protective (Arrasate et al., 2004; Kim et al., 2016; Leitman et al., 2013; Lu and Palacino, 2013; Matlahov and van der Wel, 2019; Takahashi et al., 2008; Wetzel, 2020). The fact that polyQ disease kinetics is governed by amyloid nucleation, but the amyloids themselves appear not to be responsible, presents a paradox. A structural model of the polyQ amyloid nucleus can be expected to resolve this paradox by illuminating a path for the propagation of conformational order into distinct multimeric species that either cause or mitigate toxicity.

Structural features of amyloid nucleation may be deduced by studying the effects on amyloid kinetics of rational mutations made to the polypeptide (Thakur and Wetzel, 2002). The occurrence of density heterogeneities confounds this approach, however, as they blunt the dependence of amyloid kinetics on concentration that could otherwise reveal nucleus stoichiometry (Vitalis and Pappu, 2011). Resolving this problem would require an assay that can detect nucleation events independently of amyloid growth kinetics, and which can scale to accommodate the large numbers of mutations necessary to identify sequence-structure relationships. Classic assays of in vitro amyloid assembly kinetics are poorly suited to this task, both because of their limited throughput, and because even the smallest experimentally tractable reaction volumes are too large to observe discrete aggregation events (Michaels et al., 2017).

We recently developed an assay to circumvent these limitations (Khan et al., 2018; Posey et al., 2021; Venkatesan et al., 2019). Distributed Amphifluoric FRET (DAmFRET) uses a photoconvertible fusion tag and high throughput flow cytometry to treat living cells as femtoliter-volume test tubes, thereby providing the large numbers of independent reaction vessels of exceptionally small volume that are required to discriminate independent nucleation events under physiological conditions (Fig. 1B). Compared to conventional reaction volumes for protein self-assembly assays, the budding yeast cells employed in DAmFRET increase the dependence of amyloid formation on nucleation by nine orders of magnitude (the difference in volumes), allowing amyloid to form in the same nucleation-limited fashion as it does in afflicted neurons (Colby et al., 2006; Wetzel, 2006). DAmFRET employs a high-variance expression system to induce a hundred-fold range of concentrations of the protein of interest. By taking a single snapshot of the protein’s extent of self-association in each cell, at each concentration, at a point in time appropriate for the kinetics of amyloid formation (hours), DAmFRET probes the existence and magnitude of critical density fluctuations that may govern nucleation.
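The volume argument can be made concrete with a little arithmetic. The Python sketch below is illustrative only. It uses 1 fL and 1 µL as stand-ins for "femtoliter" and "microliter" volumes, and it assumes that nucleation events are independent and Poisson-distributed with a rate proportional to volume. The rate density `HYPOTHETICAL_RATE` is an invented number chosen only to show the contrast; it is not a measured value.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_in_volume(conc_molar, volume_liters):
    """Number of molecules at a given molar concentration in a given volume."""
    return conc_molar * volume_liters * AVOGADRO


def prob_nucleated(rate_per_liter_hour, volume_liters, hours):
    """P(at least one nucleation event), assuming independent Poisson events."""
    return 1.0 - math.exp(-rate_per_liter_hour * volume_liters * hours)


CELL_VOLUME = 1e-15  # 1 fL, stand-in for a "femtoliter-volume" cell
TUBE_VOLUME = 1e-6   # 1 µL, stand-in for a conventional in vitro reaction

# Difference in volume, in orders of magnitude.
volume_ratio_orders = round(math.log10(TUBE_VOLUME / CELL_VOLUME))

# Hypothetical nucleation rate density (events per liter per hour).
HYPOTHETICAL_RATE = 1e14
p_cell = prob_nucleated(HYPOTHETICAL_RATE, CELL_VOLUME, hours=1.0)
p_tube = prob_nucleated(HYPOTHETICAL_RATE, TUBE_VOLUME, hours=1.0)

print(volume_ratio_orders)                       # orders of magnitude
print(round(molecules_in_volume(1e-6, 1e-15)))   # molecules at 1 µM in 1 fL
print(p_cell, p_tube)
```

Under these assumptions, a nucleation event that is rare in any one cell is essentially certain to occur somewhere in the larger volume, which is why only the small volumes expose nucleation as the rate-limiting step.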

The yeast system additionally allows for orthogonal experimental control over the critical conformational fluctuation. This is because yeast cells normally contain exactly one cytosolic amyloid species -- a prion state formed by the low abundance Q-rich endogenous protein, Rnq1 (Kryndushkin et al., 2013; Nizhnikov et al., 2014). The prion state can be gained or eliminated experimentally to produce cells whose sole difference is whether the Rnq1 protein does ([PIN+]) or does not ([pin-]) populate an amyloid conformation (Derkatch et al., 2001). [PIN+] serves as a partial template for amyloid formation by compositionally similar proteins including polyQ (Alexandrov et al., 2008; Duennwald et al., 2006a; Meriin et al., 2002; Serpionov et al., 2015). Known as cross-seeding, this phenomenon is analogous to, but much less efficient than, amyloid elongation by molecularly identical species (Keefer et al., 2017; Khan et al., 2018; Serio, 2018). By evaluating nucleation frequencies as a function of concentration in both [pin-] and [PIN+] cells, we can uncouple the two components of the nucleation barrier and thereby relate specific sequence features to the nucleating conformation. Sequence changes that are specific to the sequence-encoded nucleus will more strongly influence nucleation in [pin-] cells than in [PIN+] cells.

PolyQ is an ideal polypeptide with which to deduce the physical nature of a pathologic amyloid nucleus. It has zero complexity, which profoundly simplifies the design and interpretation of sequence variants. Unlike other pathogenic protein amyloids, which occur as different structural polymorphs each plausibly with their own structural nucleus, polyQ amyloids have an invariant core structure under different assembly conditions and in the context of different flanking domains (Boatz et al., 2020; Galaz-Montoya et al., 2021; Lin et al., 2017; Schneider et al., 2011). This core contains antiparallel beta-sheets (Buchanan et al., 2014; Matlahov and van der Wel, 2019; Schneider et al., 2011), while most other amyloids, including all other Q-rich amyloids, have a parallel beta-sheet core (Eisenberg and Jucker, 2012; Margittai and Langen, 2008). This suggests that amyloid nucleation and pathogenesis of polyQ may follow from an ability to spontaneously acquire a specific nucleating conformation. Any variant that decelerates nucleation can therefore be interpreted with respect to its effect on that conformation.

The experiments and results reported herein reveal that the polyQ amyloid nucleus is a steric zipper encoded by a pattern of approximately twelve Q residues in a single polypeptide molecule. We find that the clinical length threshold for polyQ disease is the minimum length that can encompass the pattern. Consistent with a monomeric nucleus, amyloid formation occurred less frequently at high concentrations or when polyQ was expressed in oligomeric form. We further found that the nucleus promotes its own kinetic arrest by templating competing dimensions of Q zipper ordering -- both along the amyloid axis and orthogonally to it. This leads to the accumulation of partially ordered aggregates.

Results

A hidden pattern of glutamines governs amyloid nucleation

To validate the use of DAmFRET for pathologic polyQ, we first surveyed the length-dependence for amyloid formation by polyQ tracts expressed as fusions to mEos3.1 in nondividing [pin-] yeast cells. We genetically compromised protein degradation in these cells to prevent differential turnover of potential polyQ heterogeneities, which could otherwise obscure the relationship of aggregation to concentration. Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length- and concentration-dependent manner (Fig. 1C top, 1D, S1A). Specifically, the cells partitioned into two discontinuous populations: one with high AmFRET and the other with none (top and bottom populations in Fig. 1C). The two populations occurred with overlapping concentration ranges (red dashed box in Fig. 1C top), indicating a kinetic barrier to forming the high-FRET state. The two populations were well-resolved at high concentrations, where relatively few cells populated intermediate values of AmFRET. These features together indicate that aggregation of polyQ is rate-limited by a nucleation barrier associated with a rare conformational fluctuation.
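The reasoning about overlapping populations can be illustrated by computing, for each concentration bin, the fraction of cells in the high-AmFRET population. The Python sketch below is a minimal illustration and not the gating procedure actually used for DAmFRET data. The function name, the single `threshold` separating the two populations, the bin edges, and the toy per-cell values are all hypothetical and invented for the example.

```python
from bisect import bisect_right


def fraction_positive_by_bin(concentrations, amfret, bin_edges, threshold):
    """Fraction of cells with AmFRET above `threshold` in each concentration bin.

    Bins are half-open intervals [edge_i, edge_i+1). Empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    positives = [0] * n_bins
    for conc, fret in zip(concentrations, amfret):
        i = bisect_right(bin_edges, conc) - 1
        if 0 <= i < n_bins:
            totals[i] += 1
            positives[i] += fret > threshold
    return [p / t if t else None for p, t in zip(positives, totals)]


# Toy per-cell values, invented purely for illustration (arbitrary units).
toy_conc = [1, 2, 12, 15, 18, 120, 150]
toy_amfret = [0.0, 0.01, 0.0, 0.9, 0.02, 1.0, 0.95]

fractions = fraction_positive_by_bin(
    toy_conc, toy_amfret, bin_edges=[1, 10, 100, 1000], threshold=0.5
)
print(fractions)
```

A bin whose fraction lies strictly between 0 and 1 contains both populations at the same concentration, which is the signature of a kinetic barrier described above.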

Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments, and corresponds clinically to disease onset in early adulthood (Kuiper et al., 2017). We therefore focused on this length moving forward. The frequency of polyQ nucleation greatly exceeds that of other Q-rich amyloid-forming proteins of comparable length (Khan et al., 2018; Posey et al., 2021). To determine if the protein’s exceptional amyloid propensity can be attributed to a property of the Q residues themselves, as opposed to its extremely low sequence complexity, we next tested a 60-residue homopolymer of asparagine (polyN), whose side chain is one methylene shorter but otherwise identical to that of glutamine. PolyN populated the high FRET state at much lower frequencies (typically undetectable) than polyQ even at the highest concentrations (Fig. 1C bottom, approximately 200 µM (Khan et al., 2018)).

The much larger nucleation barrier for polyN -- despite its physicochemical similarity to polyQ -- led us to consider whether substituting Ns for Qs at specific positions in polyQ might reveal patterns that uniquely encode the structure of the polyQ nucleus. We reasoned that a random screen of Q/N sequence space would be unlikely to yield informative patterns, however, given that there are more than one trillion combinations of Q and N for even a 40-residue peptide. We therefore devised a systematic approach to rationally sample Q/N sequence space. Specifically, we characterized by DAmFRET a series of related sequences with a single N residue inserted after every q Q residues over a total length of 60, for all values of q from 0 through 9 (designated Q1N, Q2N, etc. with the value of q subscripted; Table S1). The resulting dataset revealed a shockingly strong dependence of amyloid propensity on the exact sequence of Q residues, and specifically the following two determinants.
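The size of Q/N sequence space and the construction of the interrupted series can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the construct design itself: the function name `interrupted_polyq` is hypothetical, and the choices to begin each sequence with Q and to truncate the final repeat at 60 residues are assumptions; the actual sequences are those listed in Table S1.

```python
def interrupted_polyq(q, length=60, interrupter="N"):
    """Tandem repeats of q Qs followed by one interrupter, truncated to `length`.

    Assumes the sequence starts with Q and that the last repeat is simply
    cut off at `length`; q = 0 yields a homopolymer of the interrupter.
    """
    if q < 0:
        raise ValueError("q must be non-negative")
    unit = "Q" * q + interrupter
    repeats = length // len(unit) + 1
    return (unit * repeats)[:length]


# The series for q = 0 through 9.
series = {q: interrupted_polyq(q) for q in range(10)}

# Number of distinct Q/N sequences for a 40-residue peptide.
n_qn_40mers = 2 ** 40

for q, seq in series.items():
    print(q, seq)
print(n_qn_40mers)
```

Ten designed sequences thus stand in for a space of 2^40 (about 1.1 trillion) possible 40-residue Q/N peptides.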

First, we observed amyloid formation for all values of q ≥ 6 (Fig. S1B). Second, amyloid formation for values of q < 6 was limited to odd numbers 1, 3, and 5 (Fig. 1E). This simple pattern could be explained by a local requirement for Q at every other position along the sequence (i.e. i, i+2, i+4, etc.). To test this, we inserted a second N residue into the repeats for two such sequences. Indeed, these failed to aggregate (Fig. S1C).

To determine if this “odd” dependency results from a general preference of amyloid for homogeneity at every other position, irrespective of their amino acid identity, we created an analogous “polyN” series with a single Q placed after every first, second, third, or fourth N. This series showed no preference for even or odd values (Fig. S1C), confirming that Qs, specifically, are required at every other position. We next asked if the odd dependency concerns the Q side chains themselves, rather than their interaction with Ns, by replacing the Ns with residues of diverse physicochemistry -- either glycine, alanine, serine, or histidine. Again, nucleation was much more frequent for Q3X and Q5X than for Q4X (Fig. 1E, S1D), confirming that the odd dependency resulted from the Q side chains interacting specifically with other Q side chains. The identity of X did, however, influence nucleation frequency, particularly for Q4X. Glycine proved just as detrimental as asparagine; histidine slightly less so; and alanine less still. Serine was relatively permissive. The relative indifference of Q3X and Q5X to the identity of X suggests that these sequences form a pure Q core that excludes every other side chain. In other words, the nucleus is encoded by segments of sequence with Qs at every other position.

To reduce the contribution of the conformational fluctuation to nucleation, we next performed DAmFRET experiments for all sequences in [PIN+] cells. Amyloid formation broadly increased (Fig. S1A-D), with polyQ now nucleating in most [PIN+] cells even at the lowest concentrations sampled by DAmFRET (∼1 µM, (Khan et al., 2018)). Importantly, amyloid formation again occurred only for polyQ lengths exceeding the clinical threshold. Additionally, nucleation in [PIN+] retained the structural constraints of de novo nucleation, as indicated by a) the relative impacts of different X substitutions, which again increased in the order serine < alanine < histidine < glycine/asparagine; and b) the persistence of an odd preference, with q = 1, 3, and 5 nucleating more frequently than q = 4. The latter failed entirely to form amyloid in the cases of asparagine and glycine. All N-predominant sequences and only two Q-predominant sequences (Q2N and (Q3N)2N; introduced below) nucleated robustly in [PIN+], even though they failed entirely to do so in [pin-] cells. Their behavior resembles that of typical “prion-like” sequences (Khan et al., 2018; Posey et al., 2021), consistent with our deduction that these amyloids necessarily have a different nucleus structure than that of uninterrupted polyQ.

The pattern encodes a single, singular steric zipper

What structure does the pattern encode, and why is it so sensitive to Ns? We sought to uncover a physical basis for the different nucleation propensities between Q and N using state-of-the-art amyloid predictors (Charoenkwan et al., 2021; Keresztes et al., 2021; Prabakaran et al., 2021). Unfortunately, none of the predictors were able to distinguish Q3N and Q5N from Q4N (Table S2), despite their very dissimilar experimentally-determined amyloid propensities. In fact, most predictors failed outright to detect amyloid propensity in this class of sequences. Apparently, polyQ is exceptional among the known amyloid-forming sequences on which these predictors were trained, again hinting at a specific nucleus structure.

Most pathogenic amyloids feature residues that are hydrophobic and/or have a high propensity for beta strands. Glutamine falls into neither of these categories (Fujiwara et al., 2012; Nacar, 2020; Simm et al., 2016). We reasoned that nucleation of polyQ therefore corresponds to the formation of a specific tertiary structure unique to Q residues. Structural investigations of predominantly Q-containing amyloid cores reveal an exquisitely ordered tertiary motif wherein columns of Q side chains from each sheet fully extend and interdigitate with those of the opposing sheet (Hoop et al., 2016; Schneider et al., 2011; Sikorski and Atkins, 2005). The resulting structural element exhibits exceptional shape complementarity (even among amyloids) and is stabilized by the multitude of resulting van der Waals interactions. Additionally, the terminal amides of Q side chains form regular hydrogen bonds to the backbones of the opposing beta-sheet (Esposito et al., 2008; Hervas et al., 2020; Man et al., 2015; Schneider et al., 2011; Y. Zhang et al., 2016). To minimize confusion while referring to these distinguishing tertiary features of the all-Q amyloid core, relative to the similarly packed cores of virtually all other amyloids, which lack regular side chain-to-backbone H bonds (Eisenberg and Sawaya, 2017; Sawaya et al., 2007), we will henceforth refer to them as “Q zippers” (Fig. 2A).

Figure 2. The pattern encodes a single, singular steric zipper.

A. A view down the axis of a local segment of all-glutamine steric zipper, or “Q zipper”, between two antiparallel two-stranded sheets. Residues with internally facing side-chains on the top layer are colored red to emphasize interdigitation and H-bonding (dashed lines) between the terminal amides and the opposing backbone.

B. Schema of the “odd dependency” for Q zipper formation, showing side chain arrangements along a continuous beta strand for sequences composed of tandem repeats of Qs (red) interrupted by single Ns (gray). Shading highlights contiguous stretches of Qs that would occur in a continuous beta strand. Note that the illustrated strands will not necessarily be continuous in the context of the nucleus; i.e. the nucleus may contain shorter strands connected by loops.

C. Schema of the tertiary contacts between two beta strands, as in a steric zipper. The zipper can be formed only when the single interrupting non-Q residue follows an odd number of Qs (e.g. Q1N), but not when it follows an even number of Qs (e.g. Q2N).

D. Molecular simulations of model Q zippers formed by a pair of two-stranded antiparallel beta sheets, wherein non-Q residues (in red) face either inward or outward. The schema are oriented so the viewer is looking down the axis between two sheets. The zipper is stable for pure polyQ (QQQQQQQ, top simulation), or when substitutions face outward (QQQNQQQ, second simulation; and QNQNQNQ, fourth simulation), but not when even a single substitution faces inward (QQQNQQQ, third simulation).

E. Snapshot from the uninterrupted Q zipper simulation, showing H-bonds (black arrows) between internal extended Q side chains and the opposing backbones.

F. Snapshot from the internally interrupted Q zipper simulation, illustrating that the side chain of N is too short to H-bond the opposing backbone. However, the N side chain is long enough to H-bond the opposing Q side chain (red arrow), thereby intercepting the side chain-backbone H-bond that would otherwise occur (dashed arrow) between that Q side chain and the backbone amide adjacent to the N. This leads to dissolution of the zipper.

We reasoned that the entropic cost of acquiring such an exquisitely ordered structure by the entirely disordered ensemble of naive polyQ (Chen et al., 2001; Moradi et al., 2012; Newcombe et al., 2018; Vitalis et al., 2007; Wang et al., 2006) may very well underlie the nucleation barrier. If so, then the odd dependency could relate to the fact that successive side chains alternate by 180° along a beta strand (Fig. 2B-C). This would force non-Q side chains into the zipper for all even values of q. Our experimental results imply that the intruding residue is most disruptive when it is an N, and much less so when it is an S. To determine if the Q zipper has such selectivity, we carried out fully atomistic molecular dynamics simulations on Q-substituted variants of a minimal Q zipper model (Y. Zhang et al., 2016). Specifically, we aligned four Q7 peptides into a pair of two-stranded antiparallel β-sheets with interdigitated Q side chains. We then substituted Q residues in different positions with either N or S and allowed the structures to evolve for 200 ns. In agreement with the experimental data, the Q zipper was highly specific for Q side chains: it rapidly dissolved when any proximal pair of inward pointing Qs was substituted, as would occur for unilaterally discontinuous sequences such as Q4X (Fig. 2D, S2A). In contrast, it remained intact when any number of outward-pointing Q residues were substituted, as would occur for Q3X and Q5X. Given the sensitivity of this minimal zipper to substitutions, we next constructed a zipper with twice the number of strands, where each strand contained a single inward pointing N or S substitution, and allowed the structures to evolve for 1200 ns. In this case, the S-containing zipper remained intact while the N-containing zipper dissolved (Fig. S2B), consistent with our experimental results.
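The parity argument, that side chains alternate faces along a beta strand so that interruptions after an even number of Qs land on both faces, can be checked with a short Python sketch. It idealizes each sequence as one continuous strand, as in the schema of Fig. 2B, and therefore does not capture nuclei built from shorter strands connected by loops. The helper names are hypothetical, and the sequence construction (starting with Q, truncating at 60 residues) is an assumption.

```python
import re


def interrupted_polyq(q, length=60, interrupter="N"):
    """Tandem repeats of q Qs followed by one interrupter, truncated to `length`."""
    unit = "Q" * q + interrupter
    return (unit * (length // len(unit) + 1))[:length]


def longest_q_run_per_face(seq):
    """Longest run of consecutive Qs on each face of an idealized beta strand.

    Even-indexed residues point to one face and odd-indexed residues to the
    other, reflecting the 180-degree alternation of successive side chains.
    """
    faces = (seq[0::2], seq[1::2])
    return tuple(
        max((len(run) for run in re.findall(r"Q+", face)), default=0)
        for face in faces
    )


face_runs = {q: longest_q_run_per_face(interrupted_polyq(q)) for q in range(1, 6)}
for q, runs in face_runs.items():
    print(q, runs)
# 1 (30, 0)
# 2 (2, 2)
# 3 (30, 1)
# 4 (4, 4)
# 5 (30, 2)
```

In this idealization, odd values of q place every interruption on the same face and leave the other face entirely Q, whereas even values of q interrupt both faces.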

A close examination of the simulation trajectories revealed how N disrupts Q zippers. When inside a Q zipper, the N side chain is too short to H-bond the opposing backbone. In our simulations, the terminal amide of N competed against the adjacent backbone amides for H-bonding with the terminal amide of an opposing Q side-chain, blocking that side chain from fully extending as required for Q zipper stability (Fig. 2E-F). The now detached Q side chain collided with adjacent Q side chains, causing them to also detach, ultimately unzipping the zipper (Fig. S2C and Movies S1, S2). We found that the side chain of S, which is slightly shorter and less bulky than that of N, does not intercept H-bonds from the side chains of opposing Q residues (Fig. S2D). As a result, order is maintained so long as multiple S side chains do not occur in close proximity inside the zipper.

We also considered an alternative mechanism whereby N side chains H-bond to adjacent Q side chains on the same side of the strand (i.e. i + 2). This structure -- termed a “polar clasp” (Gallagher-Jones et al., 2018) -- would be incompatible with interdigitation by opposing Q side chains (Fig. S2E). We therefore quantified the frequencies and durations of side chain-side chain H-bonds in simulated singly N-substituted polyQ strands in explicit solvent, both in the presence and absence of restraints locking the backbone into a beta strand. This analysis showed no evidence that Q side chains preferentially H-bond adjacent N side chains (Fig. S2F-G), ruling out the polar clasp mechanism of inhibition.

In sum, our simulations are consistent with polyQ amyloid nucleating from a single Q zipper comprising two β-sheets engaged across an interface of fully interdigitated Q side chains.

The Q zipper grows in two dimensions

Prior structural studies show that polyQ amyloids have a lamellar or “slab-like” architecture (Boatz et al., 2020; Galaz-Montoya et al., 2021; Nazarov et al., 2022; Sathasivam et al., 2010; Sharma et al., 2005). This indicates that a nucleus comprising a single Q zipper would have to propagate not only axially but also laterally as it matures toward amyloid, provided that both sides of each strand can form a Q zipper (Fig. 3A). We observed that when expressed to high concentrations, sequences with q values larger than 5 generally reached higher AmFRET values than those with q = 3 or 5 (Fig. 3B). This suggests that the former have a higher subunit density consistent with a multilamellar structure. We will therefore refer to such sequences as bilaterally contiguous or QB and sequences that are only capable of Q zipper formation on one side of a strand, such as Q3X and Q5X, as unilaterally contiguous or QU. AmFRET is a measure of total cellular FRET normalized by concentration. In a two-phase regime, AmFRET scales with the fraction of protein in the assembled phase (Posey et al., 2021). At very high expression levels where approximately all the protein is assembled, AmFRET should theoretically approach a maximum value determined only by the proximity of fluorophores to one another in the assembled phase. This proximity in turn reflects the density of subunits in the phase as well as the orientation of the fluorophores on the subunits. To control for the latter, we varied the terminus and type of linker used to attach mEos3.1 to polyQ, Q3N, and Q7N. We found that polyQ and Q7N achieved higher AmFRET than Q3N regardless of linker terminus and identity (Fig. S3A). The high AmFRET level achieved by polyQ amyloids therefore results from the subunits occurring in closer proximity than in amyloids that lack the ability to form lamellae.
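The expected approach of AmFRET to a plateau can be sketched with a simple two-phase model in Python. This is an assumption-laden illustration rather than the analysis used here: it supposes that, in cells where assembly has occurred, the soluble concentration is pinned at a saturation concentration `c_sat`, so that the assembled fraction is 1 - c_sat/c. Both `c_sat` and `amfret_max` are hypothetical parameters with invented values.

```python
def assembled_fraction(total_conc, c_sat):
    """Fraction of protein in the assembled phase under a simple two-phase model.

    Assumes the soluble concentration stays at c_sat once assembly has occurred.
    """
    if total_conc <= c_sat:
        return 0.0
    return 1.0 - c_sat / total_conc


def expected_amfret(total_conc, c_sat, amfret_max):
    """AmFRET taken as proportional to the assembled fraction (an assumption)."""
    return amfret_max * assembled_fraction(total_conc, c_sat)


# Hypothetical parameters in arbitrary units, chosen only for illustration.
C_SAT = 10.0
AMFRET_MAX = 1.0

curve = [expected_amfret(c, C_SAT, AMFRET_MAX) for c in (5, 20, 100, 1000)]
print(curve)
```

In this picture, comparing sequences at very high expression isolates `amfret_max`, the term that reflects fluorophore proximity in the assembled phase, from the concentration-dependent term.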

Figure 3. The Q zipper grows in two dimensions.

A. Schematic illustrating how sequences with bilaterally contiguous Qs (QB) can hypothetically allow for lateral growth (secondary nucleation) of Q zippers giving rise to lamellar amyloid fibers. In contrast, sequences with only unilateral contiguous Qs (QU) can form amyloids with only a single Q zipper.

B. Maximum AmFRET values for the indicated sequences in [pin-] cells, suggesting that QB amyloids have a greater subunit density. Shown are means ± SEM of the median AmFRET values of triplicates. **** p < 0.0001; ANOVA and Dunnett’s multiple comparison test.

C. Densitometric analysis of SDD-AGE characterizing amyloid length distributions for the indicated QU and QB amyloids, showing that QB amyloid particles are larger. Data are representative of multiple experiments.

D. Fraction of cells at intermediate AmFRET values for the indicated sequences in [pin-] cells, suggesting that QB amyloids grow more slowly. Shown are means ± SEM of the percentage of cells between lower and upper populations, of triplicates. ****, ***, * p < 0.0001, < 0.001, < 0.05; ANOVA.

To more directly investigate structural differences between amyloids of these sequences, we analyzed the size distributions of SDS-insoluble multimers using semi-denaturing detergent-agarose gel electrophoresis (SDD-AGE) (Halfmann and Lindquist, 2008; Kryndushkin et al., 2003). We found that QU amyloid particles were smaller than those of QB (Fig. 3C, S3B). This difference necessarily means that QU amyloids nucleate more frequently (resulting in more but smaller multimers at steady state), grow more slowly, and/or fragment more than polyQ amyloids (Knowles et al., 2009). The DAmFRET data do not support the first possibility (Fig. S1D). To compare growth rates, we examined AmFRET histograms through the bimodal region of DAmFRET plots for [pin-] cells. Cells with intermediate AmFRET values (i.e. cells in which amyloid had nucleated but not yet reached steady state) were less frequent for QU (Fig. 3D, S3C), which is inconsistent with the second possibility. Therefore, the SDD-AGE data are most consistent with the third possibility -- increased fragmentation. We attribute this increase to a difference in the structures rather than sequences of the two types of amyloids, because Q and N residues are not directly recognized by protein disaggregases (Alexandrov et al., 2012, 2008; Osherovich et al., 2004). Based on the fact that larger amyloid cores, such as those with multiple steric zippers, oppose fragmentation (Tanaka et al., 2006; Verges et al., 2011; Zanjani et al., 2020), these data are therefore consistent with a lamellar (Fig. 3A) architecture for polyQ.

Q zippers poison themselves

A close look at the DAmFRET data reveals a complex relationship of amyloid formation to concentration. While all sequences formed amyloid more frequently with concentration in the low regime, as concentrations increased further, the frequency of amyloid formation plateaued somewhat before increasing again at higher concentrations (Fig. 4A). A multiphasic dependence of amyloid formation on concentration implies that some higher-order species inhibits nucleation and/or growth (Vitalis and Pappu, 2011). The non-amyloid containing cells in the lower population lacked AmFRET altogether, suggesting that the inhibitory species either have very low densities of the fluorophore (e.g. due to co-assembly with other proteins), and/or do not accumulate to detectable concentrations. Liquid-liquid phase separation by low-complexity sequences can give rise to condensates with low densities and heterogeneous compositions (Wei et al., 2017), and condensation has been shown to inhibit amyloid formation in some contexts (Gabryelczyk et al., 2022; Küffner et al., 2021; Lipiński et al., 2022). We therefore inspected the subcellular distribution of representative proteins in different regions of their respective DAmFRET plots. The amyloid-containing cells had large round or stellate puncta, as expected. To our surprise, however, cells lacking AmFRET contained exclusively diffuse protein (no detectable puncta), even at high expression (Fig. 4B, S4A). This means that naive polyQ does not itself phase separate under normal cellular conditions. The inhibitory species are therefore some form of soluble oligomer. That these oligomers are too sparse to detect by AmFRET, even at high total protein concentration, further suggests that they are dead-end (kinetically trapped) products of the Q zipper nuclei themselves. In other words, high concentrations of soluble polyQ stall the growth of nascent Q zippers.

Q zippers poison themselves.

A. Fraction of [PIN+] cells in the AmFRET-positive population as a function of concentration for the indicated sequences. Arrows denote the population of cells with self-poisoned aggregation (inset), and the corresponding plateaus in the relationship of amyloid formation to concentration. The purple arrow highlights the sharp reduction in aggregation for Q7N relative to Q5N, which we attribute to enhanced poisoning as a result of intramolecular Q zipper formation. Shown are means +/-SEM of triplicates.

B. Distribution of cytosolic concentrations (AFU/μm3) of Q60 in [pin-] cells either lacking or containing puncta, showing that the protein remains diffuse even when supersaturated relative to amyloid. Representative diffuse or punctate cells (N = 31 and 26, respectively) of equivalent total concentration are shown. Scale bar: 5 μm.

C. Schematic illustrating self-poisoned growth as a function of concentration for Q zippers of Q5N and Q7N. Conformational conversion of Q5N to amyloid decelerates (becomes poisoned) at high concentrations, as a consequence of polypeptides interfering with each other’s conversion on the templating surface. This is illustrated here by the red trace and inset showing entangled, partially ordered polypeptides on the axial surface. The presence of contralaterally contiguous Qs in Q7N exacerbates poisoning at low concentrations, as illustrated here by the blue trace and inset showing partially-ordered species immobilized with bilateral zippers. Growth resumes at high concentrations through relatively disordered deposition with subsequent slow ordering.

D. Graph of spline fits of AmFRET values for the indicated sequences in [PIN+] cells. The upper and lower populations of Q3N and Q5N were treated separately due to the extreme persistence of the low population for these sequences. The red dashed lines denote that these are subpopulations of the same samples. The ability of amyloid to grow at low concentrations fell sharply with the onset of bilateral contiguity at Q6N and then gradually increased with higher q values. Shown are means +/-SEM of triplicates.

E. Histogram of AmFRET values for Q7N-expressing cells transitioning from the low to high populations (boxed region from DAmFRET plot in Fig. S4D) upon translation inhibition for six hours following 18 hours of expression. Shown are means +/-95% CI of biological triplicates. Blocking new protein synthesis prior to analysis causes AmFRET to rise, whether by cycloheximide or lactimidomycin (p < 0.01, < 0.05, respectively, Dunnett’s test).

We were struck by the similarity of this phenomenon to “self-poisoning” in the polymer crystal literature (Sadler, 1983; Ungar et al., 2005; Whitelam et al., 2016; M. Zhang et al., 2016; Zhang et al., 2021). Self-poisoning is a deceleration of crystal growth with increasing driving force. In short, when multiple molecules simultaneously engage the templating crystal surface, they tend to “trap” each other in partially ordered configurations that block the recruitment of subsequent molecules (Fig. 4C, red). This phenomenon requires conformational conversion on the templating surface to be slow relative to the arrival of new molecules. Indeed, disordered polyQ is an extremely viscous globule (Crick et al., 2006; Dougan et al., 2009; Kang et al., 2017; Walters and Murphy, 2009) whose slow conformational fluctuations limit the rate of amyloid elongation (Walters et al., 2012).

Self-poisoning is most extreme where phase boundaries converge, as when a polymer is equally compatible with either of two crystal polymorphs (Hu, 2018; Ungar et al., 2005). For example, the rate of polymer crystallization rises, falls, and then rises again as a function of polymer length, where the second minimum results from the polymer heterogeneously conforming to either a single-long-strand polymorph or a hairpin polymorph (Fig. S4B). This is a consequence of secondary nucleation within the molecule (Hu, 2018; Ungar et al., 2005; M. Zhang et al., 2016; Zhang et al., 2021). Remarkably, an analogous phenomenon manifested for Q zippers. Specifically, we noticed that while Q5N formed amyloid robustly at low concentrations (i.e. prior to self-poisoning at higher concentrations), Q7N did not (Fig. 4A, purple arrow). We were initially perplexed by this because all odd q sequences have a fully contiguous pattern of Qs at every other position that should allow for Q zipper nucleation. We then realized, however, that the anomalous solubility of Q7N coincides with the expected enhancement of self-poisoning at the transition between single long Q zippers (favored by QU sequences) and short lamellar Q zippers (favored by QB sequences). In other words, the striking difference between Q5N and Q7N appears to result from the formation of intramolecular Q zippers with strands only six residues long, which form contralaterally to the intermolecular Q zipper interface (Fig. 4C, blue). As for other polymers, QB sequences increasingly escaped the poisoned state as the potential length of lamellar strands increased beyond the minimum, i.e. with increasing q values beyond 6 (Fig. 4D).

In our experimental system, ongoing protein translation will cause the polypeptide to become increasingly supersaturated with respect to the self-poisoned amyloid phase. However, as concentrations increase, the contributions of unstructured low affinity interactions (primarily H-bonds between Q side chains (Kang et al., 2018; Punihaole et al., 2018)) to intermolecular associations will increase relative to the contributions of Q zipper elements. Disordered polyQ has a greater affinity for itself than does disordered polyN (Halfmann et al., 2011). The ability of QB sequences to escape poisoning with increasing concentrations may therefore be a simple consequence of their greater content of Qs. Indeed, polyQ amyloid (Walters et al., 2012), as for other polymer crystals (Zhang et al., 2021), has a relatively disordered growth front at high polymer concentrations.

Our finding that polyQ does not phase-separate prior to Q zipper nucleation implies that intermolecular Q zippers will be less stable than intramolecular Q zippers of the same length, because the effective concentration of Q zipper-forming segments will always be lower outside the intramolecular globule than inside it. Therefore, the critical strand length for growth of a single (non-lamellar) Q zipper must be longer than six, whereas strands of length six will only grow in the context of lamellae. To test this prediction, we designed a series of sequences with variably restricted unilateral or bilateral contiguity (Fig. S4C-D). We found that, while all sequences formed amyloid with a detectable frequency in [PIN+] cells, only those with at least five unilaterally contiguous Qs, or at least six bilaterally contiguous Qs, did so in [pin-] cells. Given that the ability to nucleate in the absence of a pre-existing conformational template is characteristic of Q zippers, this result is consistent with a threshold strand length of nine or ten residues for single Q zippers to propagate, and six residues for lamellar Q zippers to propagate.

One consequence of self-poisoning is that crystallization rates accelerate with monomer depletion (Ungar et al., 2005). In our system, poisoning should be most severe in the early stages of amyloid formation when templates are too few for polypeptide deposition to outpace polypeptide synthesis, thereby maintaining concentrations at poisoning levels. For sufficiently fast nucleation rates and slow growth rates, this property predicts a bifurcation in AmFRET values as concentrations enter the poisoning regime, as we see for QB in [PIN+] cells. To determine if the mid-AmFRET cells in the bifurcated regime indeed have not yet reached steady state, we used translation inhibitors to stop the influx of new polypeptides -- and thereby relax self-poisoning -- for six hours prior to analysis. As predicted, treated AmFRET-positive cells in the bifurcated regime achieved higher AmFRET values than cells whose translation was not arrested (Fig. 4E, S4E).

The nucleus forms within a single molecule

Because amyloid formation is a transition in both conformational ordering and density, and the latter is not rate-limiting over our experimental time-scales (Khan et al., 2018), the critical conformational fluctuation for most amyloid-forming sequences is generally presumed to occur within disordered multimers (Auer et al., 2008; Buell, 2017; Serio et al., 2000; Vekilov, 2012; Vitalis and Pappu, 2011). In contrast, homopolymer crystals nucleate preferentially within monomers when the chain exceeds a threshold length (Hu, 2018; Xu et al., 2021). The possibility that polyQ amyloid may nucleate as a monomer has long been suspected to underlie the length threshold for polyQ pathology (Chen et al., 2002). A minimal intramolecular Q zipper would comprise a pair of two-stranded beta sheets connected by loops of approximately four residues (Chou and Fasman, 1977; Hennetin et al., 2006). If we take the clinical threshold of approximately 36 residues as the length required to form this structure, then the strands must be approximately six residues with three unilaterally contiguous Qs -- exactly the length we deduced for intramolecular strand formation in the previous section.

This interpretation leads to an important prediction concerning the inability of Q4N to form amyloid. With segments of four unilaterally and bilaterally contiguous Qs, this sequence should be able to form the hypothetical intramolecular Q zipper, but fail to propagate it either axially or laterally as doing so requires at least five unilaterally or six bilaterally contiguous Qs, respectively. If the rate-limiting step for amyloid is the formation of a single short Q zipper, then appending Q4N to another Q zipper amyloid-forming sequence will facilitate its nucleation. If nucleation instead requires longer or lamellar zippers, then appending Q4N will inhibit nucleation. To test this prediction, we employed Q3N as a non-lamellar “sensor” of otherwise cryptic Q zipper strands donated by a 30 residue Q4N tract appended to it. To control for the increased total length of the polypeptide, we separately appended a non Q zipper-compatible tract: Q2N. As predicted, the Q4N appendage increased the fraction of cells in the high-AmFRET population relative to those expressing Q3N alone, and even more so relative to those expressing the Q2N-appended protein (Fig. 5A, S5A). These data are consistent with a single short Q zipper as the amyloid nucleus.

The nucleus forms within a single molecule.

A. Fraction of cells in the AmFRET-positive population when expressing Q3N with the indicated non-amyloidogenic fusions. Shown are means +/-SEM of triplicates. ***, ** p < 0.001, < 0.01; ANOVA.

B. Fraction of cells in the AmFRET-positive population (with higher AmFRET than that of the oligomer itself) when expressing the indicated protein fused to proteins with the indicated stoichiometry. Shown are means +/-SEM of triplicates. *** p < 0.001; t-test.

C. Schema, DAmFRET plots, and quantitation of amyloid formation by a synthetic minimal polyQ amyloid-forming sequence. Q side chains in the nucleating zipper are colored red, while those necessary for growth of the zipper -- which requires lateral propagation due to its short length -- are colored blue. The three G3 loops are represented by dashed gray lines; the actual topology of the loops may differ. Mutating a single Q to N blocks amyloid formation. Shown are means +/- SEM of triplicates. ** p < 0.01; t-test.

To determine if the nucleating zipper occurs preferentially in a single polypeptide chain, we genetically fused QU and QB to either of two homo-oligomeric modules: a coiled coil dimer (oDi, (Fletcher et al., 2012)), or a 24-mer (human FTH1, (Bracha et al., 2018)). Importantly, neither of these fusion partners contains β-structure that could conceivably template Q zippers. Remarkably, the oDi fusion reduced amyloid formation, and the FTH1 fusion all but eliminated it, for both QU and QB sequences (Fig. 5B, S5B). To exclude trivial explanations for this result, we additionally tested fusions to oDi on the opposite terminus, with a different linker, or with a mutation designed to block dimerization (designated Odi{X}). The inhibitory effect of oDi manifested regardless of the terminus tagged or the linker used, and the monomerizing mutation rescued amyloid formation (Fig. S5C-F), altogether confirming that preemptive oligomerization prevents Q zipper formation.

Finally, we designed a sequence representing the minimal “polyQ” amyloid nucleus. This sequence has the precise number and placement of Qs necessary for amyloid to form via intramolecular Q zipper nucleation: four tracts of six Qs linked by minimal loops of three glycines. The sequence itself is too short -- at 33 residues -- to form amyloid unless the glycine loops are correctly positioned (recall from Fig. 1D and S1A that pure polyQ shorter than 40 residues does not detectably aggregate). We found that the protein indeed formed amyloid robustly, and with a concentration-dependence and [PIN+]-independence that is characteristic of QB (Fig. 5C, S5G). If the nucleus is indeed a monomer, then a defect in even one of the four strands should severely restrict amyloid formation. We therefore mutated a single Q residue to an N. Remarkably, this tiny change -- removing just one carbon atom from the polypeptide -- completely eliminated amyloid formation (Fig. 5C, S5G). We conclude that the polyQ amyloid nucleus is an intramolecular Q zipper.
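The length arithmetic behind this design can be checked with a short sketch. The Python below is an illustration only, not the authors' design code: the function names are hypothetical, the construct is rebuilt solely from the description above (four tracts of six Qs joined by three GGG loops), and the position of the Q-to-N substitution is chosen arbitrarily because the text does not specify it. The exact sequences are those listed in Table S1.

```python
# Illustrative reconstruction of the minimal design described in the text.
# All names here are hypothetical; the real constructs are in Table S1.

def design_length(n_strands: int, strand_len: int, loop_len: int) -> int:
    """Total length of n_strands strands joined by (n_strands - 1) loops."""
    return n_strands * strand_len + (n_strands - 1) * loop_len


def minimal_q_zipper(n_strands: int = 4, strand_len: int = 6, loop: str = "GGG") -> str:
    """Build the tract-loop-tract layout: Q6-G3-Q6-G3-Q6-G3-Q6 by default."""
    return loop.join(["Q" * strand_len] * n_strands)


def point_mutant(seq: str, index: int, residue: str = "N") -> str:
    """Replace a single Q at the given 0-based index with another residue."""
    if seq[index] != "Q":
        raise ValueError("the chosen position is not a Q")
    return seq[:index] + residue + seq[index + 1:]


seq = minimal_q_zipper()
# Position 2 is an arbitrary choice for illustration; the text does not say
# which Q was mutated.
mutant = point_mutant(seq, 2)

print(seq, len(seq))
print(mutant, len(mutant))
```

With three-residue loops the layout spans 33 residues, as stated above. With loops of approximately four residues the same layout gives `design_length(4, 6, 4)`, which is 36, the clinical threshold discussed earlier.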

Discussion

PolyQ amyloid begins within a molecule

The initiating molecular event leading to pathogenic aggregates is the most important yet intractable step in the progression of age-associated neurodegenerative diseases such as Huntington’s. Here we used our recently developed assay to identify sequence features that govern the dependence of nucleation frequency on concentration and conformational templates, to deduce that amyloid formation by pathologically expanded polyQ begins with the formation of a minimal Q zipper within a single polypeptide molecule.

Kinetic studies performed both with synthetic peptides in vitro (Bhattacharyya et al., 2005; Chen et al., 2002; Kar et al., 2011) and recombinant protein in animals (Sinnige et al., 2021) and neuronal cell culture (Colby et al., 2006) have found that polyQ aggregates with a reaction order of approximately one. Under the assumption of homogeneous nucleation, this implies that pathogenic nucleation occurs within a monomer (Bhattacharyya et al., 2005; Chen et al., 2002; Kar et al., 2011). However, that conclusion, along with the assumption it rests on, contrasts with observations that polyQ can form apparently non-amyloid multimers prior to amyloids in vitro (Crick et al., 2006; Vitalis and Pappu, 2011), and that phase separation can promote amyloid formation (Babinchak and Surewicz, 2020; Camino et al., 2021; Shin and Brangwynne, 2017; Sprunger and Jackrel, 2021). By explicitly manipulating concentration, length, density and conformational heterogeneities, we found that, indeed, the rate-limiting step for pathologic polyQ amyloid formation occurs within a monomer. Importantly, it does so even in the dynamic, intermingled, living cellular environment.

Once formed, this minimal Q zipper germinates in all three dimensions to ultimately produce amyloid fibers with lamellar zippers longer than that of the nucleus proposed here. Both features have been confirmed by structural studies (Boatz et al., 2020; Galaz-Montoya et al., 2021; Hoop et al., 2016; Nazarov et al., 2022). The difference in critical lengths between intra- and intermolecular Q zippers follows from the fact that intramolecular segments are covalently constrained to much higher effective concentrations. In the case of polyQ, the short-stranded nucleus is merely a catalyst for longer, lamellar Q zippers that make up the amyloid core. That the nascent amyloid should become progressively more ordered is in keeping with decades of experimental and theoretical work by polymer physicists (Keller and O’Connor, 1957; Xu et al., 2021; M. Zhang et al., 2016). The minimally competent zipper identified here provides lower bounds on the widths of fibers -- they can be as thin as two sheets (1.6 nm) and as short as six residues. This hypothetical restriction is thinner and shorter than the cores of all known amyloid structures of full length polypeptides (Sawaya et al., 2021).

Remarkably, observations from recent cryoEM tomography studies closely match this prediction: pathologic polyQ or huntingtin amyloids, whether in cells or in vitro, feature a slab-like architecture with restrictions as thin as 2 nm (Galaz-Montoya et al., 2021) and as short as five residues (Nazarov et al., 2022).

Multidimensional polymer crystal growth offers a simple explanation for the characteristically antiparallel arrangement of strands in polyQ amyloid. Each lamella first nucleates and then propagates along the templating lamella by the back-and-forth folding of monomers (M. Zhang et al., 2016). In the context of polyQ, the end result is a stack of beta hairpins. If lamellae are responsible for antiparallel strands, then QU sequences should form typical amyloid fibers with parallel beta strands. The only atomic resolution structure of a Q zipper amyloid happens to be of a protein with striking unilateral (but not bilateral) contiguity, and the strands are indeed arranged in parallel (Hervas et al., 2020).

Seven of the nine polyQ diseases have length thresholds (Lieberman et al., 2019) at or above the minimum length for an intramolecular Q zipper as deduced here. The two polyQ diseases with length thresholds below that of the intramolecular Q zipper, SCA2 (32 residues) and SCA6 (21 residues), are also atypical for this class of diseases in that disease onset is accelerated in homozygous individuals (Laffita-Mesa et al., 2012; Mariotti et al., 2001; Soga et al., 2017; Spadafora et al., 2007; Tojima et al., 2018). This is consistent with nucleation occurring in oligomers, as would necessarily be the case for such short polyQ tracts.

We expect that the monomeric nucleus of polyQ will prove to be unusual among pathologic amyloid-forming proteins. Among the hundreds of amyloid-forming proteins we have now characterized by DAmFRET (Khan et al., 2018; Posey et al., 2021; and unpublished), only Cyc8 PrD has a similar concentration-dependence. We now attribute this to self-poisoning due to its exceptionally long tract of 50 unilaterally contiguous Qs, primarily in the form of QA dipeptide repeats.

Proteotoxic multimers are likely to be self-poisoned polymer crystals

Soluble oligomers accumulate during the aggregation of pathologically lengthened polyQ and/or Htt in vitro (Auer et al., 2008; Hsieh et al., 2017; Levin et al., 2014; Liang et al., 2018; Li et al., 2010; Sil et al., 2018; Vitalis and Pappu, 2011; Yamaguchi et al., 2005; Zanjani et al., 2020), in cultured cells (Olshina et al., 2010; Takahashi et al., 2008), and in the brains of patients (Legleiter et al., 2010; Sathasivam et al., 2010). They are likely culprits of proteotoxicity (Kim et al., 2016; Leitman et al., 2013; Lu and Palacino, 2013; Matlahov and van der Wel, 2019; Takahashi et al., 2008; Wetzel, 2020). We found that Q zipper nucleation precedes the accumulation of multimers, rather than the other way around. Notwithstanding one report to the contrary (Peskett et al., 2018) -- which we respectfully attribute to a known artifact of the FLAG tag when fused to polyQ (Duennwald et al., 2006a, 2006b; Jiang et al., 2017) -- our findings corroborate prior demonstrations that polyQ (absent flanking domains) does not phase separate prior to amyloid formation, whether expressed in human neuronal cells (Colby et al., 2006; Kakkar et al., 2016), C. elegans body wall muscle cells (Sinnige et al., 2021), or [pin-] yeast cells (Duennwald et al., 2006a, 2006b; Jiang et al., 2017).

Our finding that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation suggests a resolution to the paradox that polyQ diseases are rate-limited by amyloid nucleation despite the implication of non-amyloid species (Kim et al., 2016; Leitman et al., 2013; Lu and Palacino, 2013; Matlahov and van der Wel, 2019; Takahashi et al., 2008; Wetzel, 2020). Likewise, our discovery that polyQ amyloid formation is blocked by oligomerization or phase separation, together with demonstrations that full length Huntingtin protein with a pathologically lengthened polyQ tract can phase separate in cells (Aktar et al., 2019; de Mattos et al., 2022; Peskett et al., 2018; Wan et al., 2021), suggests a simple explanation for the otherwise paradoxical fact that Huntington’s Disease onset and severity do not increase in homozygous individuals (Cubo et al., 2019; Lee et al., 2019; Wexler et al., 1987). Specifically, if mutant Huntingtin forms an endogenous condensate (if only in some microcompartment of the cell) when expressed from just one allele, then the buffering effect of phase separation (Klosin et al., 2020) will prevent a second allele from increasing the concentration of nucleation-competent monomers.

Our data indirectly illuminate the nature of these arrested multimers. For QU, the multimers failed to accumulate to a level detectable by AmFRET, and failed to coalesce to microscopic puncta. They therefore seem to involve only a small fraction of the total protein, presumably limited to soluble species that have acquired a nascent Q zipper. The disordered polyQ globule is extremely viscous and this causes very slow conformational conversion on the tips of growing fibers (Bhattacharyya et al., 2005; Walters et al., 2012). It should therefore be highly susceptible to self-poisoning as has been widely observed and studied for polymer crystals in vitro (Jiang et al., 2016; Ungar and Keller, 1987; Whitelam et al., 2016; Zhang et al., 2020, 2018). In short, the multimers are nascent amyloids that templated their own arrest in the earliest stages of growth following nucleation. We do not yet know if these aborted amyloids contribute to pathology. However, the role of contralateral zippers in their formation is consistent with the apparent protective effect of CAT trinucleotide insertions that preserve unilateral contiguity (QHQHQH) in the polyQ disease protein, SCA1 (Menon et al., 2013; Nethisinghe et al., 2018; Sen et al., 2003), and the absence of bilateral contiguity in the only known functional Q zipper, formed by the neuronal translational regulator, Orb2 (Hervas et al., 2020). We speculate that the toxicity of polyQ amyloid arises from its lamellar architecture.

Concluding remarks

The etiology of polyQ pathology has been elusive. Decades-long efforts by many labs have revealed precise measurements of polyQ aggregation kinetics, atomistic details of the amyloid structure, a catalog of proteotoxic candidates, and snapshots of the conformational preferences of disordered polyQ. What they have not yet led to are treatments. We synthesized those insights to recognize that pathogenesis likely begins with a very specific conformational fluctuation. We set out to characterize the nature of that event in the cellular milieu, deploying a technique we developed to do just that, and arrived at an intramolecular four-stranded polymer crystal. Our findings rationalize key aspects of polyQ diseases, such as length thresholds, kinetics of progression, and involvement of pre-amyloid multimers. More importantly, they illuminate a new avenue for potential treatments. Current therapeutic efforts focused on lowering the levels of mutant Huntingtin have not been successful. As an admittedly radical alternative, we suggest that therapies designed to (further) oligomerize huntingtin preemptively will delay nucleation and thereby decelerate the disease.

Acknowledgements

We thank anonymous reviewers of the Review Commons platform for providing invaluable feedback prior to this manuscript’s submission. We thank Viet Man and Jonathon Russell for assistance with molecular simulations; Rohit Pappu and Wei-feng Xue for helpful discussions; and Mark Miller, Alejandro Rodriguez Gama, and Megan Halfmann for assistance with figure preparation. We thank Alexander I. Alexandrov for plasmids encoding Q4X repeats, early results from which stimulated this project’s inception. This work was funded by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM130927 (to RH) and the Stowers Institute for Medical Research. A portion of this work was done to fulfill, in part, requirements for PhD thesis research by T.S.K. as a student registered with the Open University, UK at the Stowers Institute for Medical Research Graduate School, USA. Original data underlying this manuscript can be accessed from the Stowers Original Data Repository at http://www.stowers.org/research/publications/libpb-1494

Materials and Methods

Plasmid and Strain Construction

ORFs were codon optimized for expression in S. cerevisiae, synthesized, and cloned into vector V08 (Khan et al., 2018) by Genscript (Piscataway, NJ, USA). See Table S1 for all full-length protein sequences.

Yeast strain rhy3078a ([PIN+]) was constructed as follows. We first deleted PDR5 and ATG8 from strain rhy1713 (Khan et al., 2018) by sequentially mating and sporulating it with the respective strains from the MATa deletion collection (Open Biosystems). We then used PCR-based mutagenesis (Goldstein and McCusker, 1999) with template vector CX (Miller et al. submitted) to integrate BDFP1.6:1.6 prior to the stop codon of chromosomal PGK1. Yeast strain rhy3082 ([pin-]) is rhy3078a with the amyloid form of Rnq1 eliminated by passaging it four times on YPD plates containing 3 mM GdHCl, a prion-curing agent (Ferreira et al., 2001).

DAmFRET

The yeast strains were transformed using a standard lithium acetate protocol with plasmids encoding the sequence to be tested as a fusion to the indicated tags (Table S1) under the control of the GAL1 promoter.

Individual colonies were picked and incubated in 200 µL of a standard synthetic medium containing 2% dextrose (SD -ura) overnight while shaking on a Heidolph Titramax-1000 at 1000 rpm at 30°C. Following overnight growth, cells were spun down and resuspended in a synthetic induction medium containing 2% galactose (SGal -ura). Cells were induced for 16 hours while shaking before being resuspended in fresh 2% SGal -ura for 4 hours to reduce autofluorescence. Cells (75 µL) were then re-arrayed from a 96-well plate to a 384-well plate and photoconverted, while shaking at 800 rpm, for 25 minutes using an OmniCure S1000 fitted with a 320-500 nm (violet) filter and a beam collimator (Exfo), positioned 45 cm above the plate.

High-throughput flow cytometry (10 µL/well, flow speed 1.8) was performed on the Bio-Rad ZE5 with a Propel automation setup and a GX robot (PAA Inc). Donor and FRET signals were collected from a 488 nm laser set to 100 mW, with voltages 351 and 370 respectively, into 525/35 and 593/52 detectors. Acceptor signal was collected from a 561 nm laser set to 50 mW with voltage at 525, into a 589/15 detector. Autofluorescence was collected from a 405 nm laser at 100 mW and voltage 450, into a 460/22 detector.

Compensation was performed manually, collecting files for non-photoconverted mEos3.1 for pure donor fluorescence and dsRed2 to represent acceptor signal as it has a similar spectrum to the red form of mEos3.1. FRET is compensated only in the direction of donor and acceptor fluorescence out of FRET channels, as there is not a pure FRET control. Acceptor fluorescence intensities were divided by side scatter (SSC), a proxy for cell volume (Miller et al. submitted), to convert them to concentration (in arbitrary units). Imaging flow cytometry was conducted as in previous work (Khan et al., 2018; Venkatesan et al., 2019).

DAmFRET Automated Analysis

FCS 3.1 files resulting from the assay were gated using an automated R script built on flowCore. Prior to gating, the forward scatter (FS00.A, FS00.W, FS00.H), side scatter (SS02.A), donor fluorescence (FL03.A) and autofluorescence (FL17.A) channels were transformed using a logicle transform in R. Gating was then done in three sequential steps: first gating for cells using FS00.A vs SS02.A, then selecting for single cells using FS00.H vs FS00.W, and finally selecting for expressing cells using FL03.A vs FL17.A.

Gating for cells was done using a rectangular gate with values of Xmin = 2.7, Xmax = 4.8, Ymin = 2.7, Ymax = 4.7. Gating for single cells was done using a rectangular gate with values of Xmin = 4.45, Xmax = 4.58, Ymin = 2.5, Ymax = 4.4. Gating of expressing cells was done using a polygon gate with x/y vertices of (1, 0.1), (1.8, 2), (5, 2), (5, 0.1). Cells falling within all of these gates were then exported as FCS 3.0 files for further analysis.
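The three sequential gates can be sketched as follows. This is a plain-Python illustration of the logic, not the R/flowCore script used for the analysis. It assumes that each event is supplied as a dictionary of already logicle-transformed channel values, and that the first-named channel of each pair (e.g. FS00.A in "FS00.A vs SS02.A") is the x axis; the text does not state the axis assignment explicitly. The function names are hypothetical.

```python
# Sketch of the sequential gating described above (illustrative only).
# Assumption: values are already logicle-transformed, and the first-named
# channel in each "A vs B" pair is plotted on the x axis.

def in_rectangle(x, y, xmin, xmax, ymin, ymax):
    """True if the point lies inside the axis-aligned rectangle."""
    return xmin <= x <= xmax and ymin <= y <= ymax


def in_polygon(x, y, vertices):
    """Ray-casting (even-odd rule) point-in-polygon test."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Gate coordinates as given in the text.
CELL_GATE = (2.7, 4.8, 2.7, 4.7)        # FS00.A vs SS02.A
SINGLET_GATE = (4.45, 4.58, 2.5, 4.4)   # FS00.H vs FS00.W
EXPRESSING_POLYGON = [(1, 0.1), (1.8, 2), (5, 2), (5, 0.1)]  # FL03.A vs FL17.A


def passes_all_gates(event):
    """Apply the cell, single-cell, and expressing-cell gates in sequence."""
    return (
        in_rectangle(event["FS00.A"], event["SS02.A"], *CELL_GATE)
        and in_rectangle(event["FS00.H"], event["FS00.W"], *SINGLET_GATE)
        and in_polygon(event["FL03.A"], event["FL17.A"], EXPRESSING_POLYGON)
    )
```

Only events for which `passes_all_gates` returns true would be carried forward to the quantification step.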

The FCS files resulting from the automated gating described above were then used for the Java-based quantification of a curve, similar to the analysis in the original assay (Khan et al., 2018). Specifically:

The quantification procedure first divides a defined negative control DAmFRET histogram into 64 logarithmically spaced bins across a pre-determined range large enough to accommodate all potential data sets. An upper gate value is then determined for each bin as the 99th percentile of the DAmFRET distribution in that bin. For bins at very low and very high acceptor intensities, there are not enough cells to accurately calculate this gate value. As a result, for bins above the 99th acceptor percentile and bins below 2 million acceptor intensity units, the upper gate value is set to the value of the nearest valid bin. The upper gate profile is then smoothed by boxcar smoothing with a width of 5 bins and shifted upwards by 0.028 DAmFRET units to ensure that the negative control signal lies completely within the negative FRET gate. The lower gate values for all bins are set equal to -0.2 DAmFRET units. For all samples, then, cells falling above this negative FRET gate can be said to contain assembled (FRET-positive) protein. The gross fraction of expressing cells containing assembled protein is therefore reported as fgate, a unitless statistic between 0 and 1.
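The gate construction just described can be sketched in a few lines. The Python below is an approximation written for illustration, not the Java implementation used for the analysis. Several details are assumptions because the text does not specify them: the bin range, the percentile interpolation method, the rule for deciding whether a bin lies "above" or "below" a cutoff (here, by its lower edge), the treatment of empty bins as invalid, and the truncation of the boxcar window at the ends of the profile. All function names are hypothetical.

```python
import math
from bisect import bisect_right

N_BINS = 64          # from the text
SHIFT = 0.028        # upward shift in DAmFRET units, from the text
LOWER_GATE = -0.2    # lower bound of the negative gate; does not affect f_gate
MIN_ACCEPTOR = 2e6   # bins below this intensity borrow the nearest valid value


def log_edges(lo, hi, n=N_BINS):
    """n logarithmically spaced bins between lo and hi (range is an assumption)."""
    step = (math.log10(hi) - math.log10(lo)) / n
    return [10 ** (math.log10(lo) + i * step) for i in range(n + 1)]


def percentile(values, q):
    """Linear-interpolation percentile (method assumed, not stated in the text)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bin_index(acceptor, edges):
    """Index of the bin containing this acceptor value (-1 or n if out of range)."""
    return bisect_right(edges, acceptor) - 1


def build_upper_gate(acceptor, amfret, edges):
    """Upper gate profile from a negative control, one value per bin."""
    n = len(edges) - 1
    binned = [[] for _ in range(n)]
    for a, f in zip(acceptor, amfret):
        i = bin_index(a, edges)
        if 0 <= i < n:
            binned[i].append(f)
    acc_cut = percentile(acceptor, 99)
    raw = []
    for i in range(n):
        # Assumed validity rule: non-empty, and lower edge within both cutoffs.
        valid = bool(binned[i]) and MIN_ACCEPTOR <= edges[i] <= acc_cut
        raw.append(percentile(binned[i], 99) if valid else None)
    valid_idx = [i for i, v in enumerate(raw) if v is not None]
    if not valid_idx:
        raise ValueError("no bin has enough data to define a gate")
    # Fill invalid bins with the value of the nearest valid bin.
    filled = [raw[min(valid_idx, key=lambda j: abs(j - i))] for i in range(n)]
    gate = []
    for i in range(n):
        window = filled[max(0, i - 2): i + 3]   # boxcar of width 5
        gate.append(sum(window) / len(window) + SHIFT)
    return gate


def f_gate(acceptor, amfret, edges, upper_gate):
    """Fraction of in-range cells lying above the negative FRET gate."""
    n = len(upper_gate)
    total = positive = 0
    for a, f in zip(acceptor, amfret):
        i = bin_index(a, edges)
        if 0 <= i < n:
            total += 1
            if f > upper_gate[i]:
                positive += 1
    return positive / total if total else float("nan")
```

In use, `build_upper_gate` would be run once on the negative control, and `f_gate` would then be evaluated for each sample against that fixed profile.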

This gate was then applied to all DAmFRET plots to define cells containing proteins that are either positive (self-assembled) or negative (monomeric). In each of the 64 bins, the fraction of cells in the assembled population was plotted as a ratio to the total cells in that bin.
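Applying the gate to a sample can be sketched as below, again as an illustration rather than the original Java code. The sketch assumes that the bin edges and the per-bin upper gate values are already available as lists, and it does not treat cells below the lower gate specially. The function name is hypothetical.

```python
import bisect


def assembled_fractions(acceptor, amfret, edges, upper_gate):
    """Return (per-bin fraction of cells above the gate, overall fraction).

    edges has one more entry than upper_gate. Empty bins yield None.
    """
    n_bins = len(edges) - 1
    total = [0] * n_bins
    positive = [0] * n_bins
    for a, f in zip(acceptor, amfret):
        i = bisect.bisect_right(edges, a) - 1
        if not 0 <= i < n_bins:
            continue  # cell falls outside the binned acceptor range
        total[i] += 1
        if f > upper_gate[i]:
            positive[i] += 1
    per_bin = [p / t if t else None for p, t in zip(positive, total)]
    n_cells = sum(total)
    overall = sum(positive) / n_cells if n_cells else None
    return per_bin, overall
```

The second return value corresponds to the overall fraction of gated cells above the negative FRET gate, and the first to the per-bin ratios that are plotted.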

Microscopy

Cells were imaged using a CSU-W1 spinning disc Ti2 microscope (Nikon) and visualized through a 100x Plan Apochromat objective (NA 1.45). mEos3.1 was excited at 488 nm and emission was collected for 50 ms per frame through an ET525/36M bandpass filter onto a Flash 4 camera (Hamamatsu). Full z stacks of all cells were acquired over ∼15 μm total distance with z spacing of 0.2 μm. Transmitted light was collected at the middle of the z stack for reference. To quantify the total intensity of each cell, the z stacks were processed using Fiji (https://imagej.net/software/fiji/). Images were first converted to 32-bit and sum projected in Z. Regions of interest (ROIs) were hand drawn around each cell using the ellipse tool in Fiji on the transmitted light image. These ROIs were then used to measure the area, mean, standard deviation, and integrated density of each cell on the fluorescence channel. Each cell was then classified as being diffuse or punctate by calculating the coefficient of variance (CV, standard deviation divided by the square root of the mean intensity) of the fluorescence. Cells were only considered punctate if their CV was greater than 30. Cells from each category that had equivalent integrated densities were directly compared and the images were plotted on the same intensity scale. The volume of each cell was calculated by fitting the hand-drawn transmitted light ROI to an ellipse. The cell was then assumed to be a symmetric ellipsoid with the parameters of the fit ellipse from Fiji. The volume was calculated as 4/3*pi*major*minor*minor, where major and minor are the major and minor axes of the Fiji ellipse fit to the hand-drawn ellipse ROI. The volumes reported are the volumes of the 3D ellipsoids. Concentrations were calculated by dividing the integrated densities by the calculated volume in μm³, yielding units of fluorescence per μm³.
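The per-cell calculations described above can be written as a short Python sketch. It follows the formulas literally as stated in the text and is not the authors' Fiji workflow. The text does not say whether "major" and "minor" are full axis lengths or semi-axes, so the volume function applies the stated expression without conversion; this is an assumption. Function names are hypothetical.

```python
import math

PUNCTATE_CV_THRESHOLD = 30.0  # from the text


def cv_as_described(mean_intensity, std_intensity):
    """CV as defined in the text: standard deviation / sqrt(mean intensity)."""
    return std_intensity / math.sqrt(mean_intensity)


def classify_cell(mean_intensity, std_intensity):
    """Label a cell 'punctate' if its CV exceeds the threshold, else 'diffuse'."""
    cv = cv_as_described(mean_intensity, std_intensity)
    return "punctate" if cv > PUNCTATE_CV_THRESHOLD else "diffuse"


def ellipsoid_volume(major, minor):
    """Volume per the stated formula 4/3*pi*major*minor*minor.

    ASSUMPTION: major and minor are used exactly as in the text, with no
    conversion between full axis lengths and semi-axes.
    """
    return 4.0 / 3.0 * math.pi * major * minor * minor


def concentration(integrated_density, volume_um3):
    """Fluorescence per cubic micrometer."""
    return integrated_density / volume_um3
```

For example, a cell with mean intensity 100 and standard deviation 400 has a CV of 40 under this definition and would be classified as punctate.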

SDD-AGE

Semi-denaturing detergent agarose gel electrophoresis (SDD-AGE) was done as in (Khan et al., 2018). The gel was imaged directly using a GE Typhoon Imaging System using a 488 nm laser and 525(40) BP filter. Images were then loaded into ImageJ for contrast adjustment. Images were Gaussian blurred with a radius of 1 and then background subtracted with a 200 pixel rolling ball. Representative samples were then cropped from the original image for emphasis. Dotted or solid lines denote where different regions of the same gel were aligned for comparison. Line profiles of the protein smear were quantified using the following procedure. All images were processed in Fiji (https://imagej.net/software/fiji/). Gel images were first background subtracted using a rolling ball with a radius of 200 pixels. The images were then rotated so that the lanes were aligned. A user-defined line was drawn on the image down the first lane. A 10 pixel wide line profile was generated using an in-house plugin, “polyline kymograph jru v1”. This line was then programmatically shifted to every other lane and line profiles were generated for each. Each line profile was then normalized to the integral under the line profile using “normalize trajectories jru v1”. From these line profiles, CSV files were generated and imported into Python to make line profile plots.
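The normalization of each lane profile to its integral can be illustrated with a minimal Python sketch. It is not the "normalize trajectories jru v1" plugin; it assumes the integral is approximated as the sum of the profile values times a uniform pixel spacing, and the function name is hypothetical.

```python
def normalize_to_integral(profile, spacing=1.0):
    """Scale a line profile so that its integral equals 1.

    ASSUMPTION: the integral is approximated by a rectangle rule,
    i.e. sum(profile) * spacing, with uniform pixel spacing.
    """
    area = sum(profile) * spacing
    if area == 0:
        raise ValueError("profile has zero integral and cannot be normalized")
    return [value / area for value in profile]
```

After this step, profiles from lanes with different total signal can be compared by shape on a common scale.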

Amyloid Prediction

Comparative analyses were performed using default settings on the listed public web servers, as available on April 1st, 2021. See Table S2 for further details.

Molecular Simulations

The simulations were carried out using the AMBER 20 package (Case et al., 2020) with the ff14SB force field (Maier et al., 2015) and the explicit TIP3P water model (Price and Brooks, 2004). The aggregates were placed in a cubic solvation box. For each aggregate, the minimum distance from the aggregate to the box boundary was set to 12 nm to avoid self-interactions. An 8 Å cutoff was applied for the nonbonded interactions, and the particle mesh Ewald (PME) method (Essmann et al., 1995) was used to calculate the electrostatic interactions with cubic-spline interpolation and a grid spacing of approximately 1 Å. Once the box was set up, we performed a structural optimization with the aggregates fixed and, in a second step, allowed them to move, followed by a gradual heating procedure (NVT) with individual steps of 200 ps from 0 K to 300 K. Finally, the production runs were carried out using NPT Langevin dynamics with a constant pressure of 1 atm at 300 K.

Supplemental Files

Table S1. List of plasmids and sequences.

Table S2. Amyloid predictor output.

Movie S1. Simulation of the pure Q zipper shown in Fig. 2D (top model), showing its persistence.

Movie S2. Simulation of the N-substituted Q zipper shown in Fig. 2D (bottom model), showing the motions of side chains summarized in Fig. S2C.

A. DAmFRET plots of polyQ length variants. Shown are representative plots of biological triplicates.

B. DAmFRET plots of polypeptides composed of tandem repeats of q (subscripted) Qs separated by an N for a total length of 60 residues. Plots are representative of biological triplicates.

C. DAmFRET plots of polypeptides composed of tandem repeats of the indicated N-rich sequences, for a total length of 60 residues, showing negligible nucleation in the absence of a conformational template. Note that because the nominal pattern repeats, “Q1N2”, “Q1N3”, and “Q1N4” are synonymous to “N2Q1”, “N3Q1”, and “N4Q1”, respectively. Plots are representative of biological triplicates.

D. DAmFRET plots of polypeptides composed of tandem repeats of the indicated sequences, for a total length of 60 residues, showing that Q3X and Q5X have a greater amyloid propensity than Q4X regardless of the identity of X. Labels above the boxed regions of the [PIN+] Q4N, Q4G, and Q4H plots indicate the percentage of cells in the high-FRET region, revealing rare but significant nucleation for the latter (p = 0.046 versus Q4N, one-tailed T-test). Plots are representative of biological triplicates.

A. Molecular simulations of model Q zippers formed by a pair of two-stranded antiparallel beta sheets, containing a single serine residue (QQQSQQQ) per strand. The structure is unstable when the S side chains face inward (top), but not when the S side chains face outward (bottom).

B. Simulations of model steric zippers formed by a pair of four-stranded antiparallel beta sheets, containing a single asparagine (top) or serine (bottom) residue per strand. The structure proved less stable in the case of asparagine.

C. As a consequence of the N side chain’s interception of the opposing Q side chain’s H-bond, the Q is no longer anchored in the outstretched configuration and sterically interferes with the ordering of adjacent Qs. This effect propagates through the zipper, resulting in its dissolution.

D. As for N, the side chain of S is too short to H-bond with the opposing backbone. Unlike for N, however, the S side chain is also too short to intercept the opposing Q side chain’s H-bond, allowing the Q to H-bond (black arrow) the backbone amide adjacent to the S. Therefore, whereas Q zippers cannot accommodate internal N residues, they can accommodate sparse internal S residues.

E. Schematic demonstrating how a polar clasp (red dashed line) would preclude Q zipper formation.

F. Schema and frequencies of polar clasps occurring between two unilaterally adjacent Q side chains (left) or a unilaterally adjacent N and Q side chain (right) within a QQQNQQQ peptide, simulated either with (top) or without (bottom) the backbone restrained in a beta conformation. The bar graphs show that polar clasps between Q and N occur less frequently than between Q and Q, indicating that the mechanism of Q zipper destabilization by N side chains cannot be attributed to polar clasps.

G. Schema and lifetimes of H-bonds between exterior stacked (axially adjacent) Q side chains (top) or N and Q side chains (bottom) in the Q zipper simulated in Fig. 2D, showing no difference in stabilities between axial H-bonds between Q and Q/N side chains.

A. DAmFRET plots of the indicated sequence variants with either a C-terminal (EAAAR)4-mEos3.1 fusion, as used throughout this work, an N-terminal mEos3.1-(EAAAR)4 fusion, or a C-terminal (GGGGS)4-mEos3.1 fusion, showing that the sequence-specific differences in relative steady state AmFRET levels do not depend on the linker or terminus fused. Plots are representative of biological triplicates.

B. Fluorescence images of SDD-AGE gels showing the size distributions of SDS-resistant complexes of the indicated mEos3.1-tagged proteins. Left: raw data quantified in Fig. 3C. Right: Additional Q3X, Q4X, Q5X proteins, showing that QU amyloids are consistently smaller than other amyloids, such as those of Q60 and Sup35 PrD (which only nucleates in [PIN+] cells). The solid line shows where the image was spliced, although all lanes are from the same gel. Lysates were normalized by fluorescence to within 50% of each other prior to loading.

C. Histograms of AmFRET values for the indicated gates (at the respective approximate EC50s) for the indicated sequences. The brown gate on the histograms shows the percentage of transitioning cells.

A. DAmFRET plots of Q5N, Q60, and Q8N acquired using imaging flow cytometry, showing gates at high expression for both high- or low-AmFRET populations. Insets show from left to right the distribution of donor, FRET, and acceptor fluorescence, respectively, in representative cells from each gate.

B. Schematic of the rate of polymer crystallization as a function of length, showing a sharp deceleration when the polymer length is equally compatible with either of two polymorphs. Adapted from (Ungar et al., 2005).

C. DAmFRET plots of [pin-] cells expressing unilateral contiguity variants of the Q3N base sequence, showing that at least five unilaterally contiguous glutamines (see schematic) are required for de novo nucleation of single long Q zipper amyloids. Plots are representative of biological triplicates.

D. DAmFRET plots of bilateral contiguity variants (Q4N2, Q6N2, Q8N2), showing that at least six bilaterally contiguous Qs are required for de novo amyloid formation. Numbers indicate the percentage of cells in the high-FRET boxed region, revealing significant nucleation by Q6N2 (p = 0.0004, T-test).

E. DAmFRET plots of Q7N in [pin-] cells treated as indicated for six hours prior to analysis. The boxed region was used to compute histograms of AmFRET in Fig. 4E.

A. DAmFRET plots of cells expressing Q3N (length 60) with the indicated appendage (length 30). Plots are representative of biological triplicates.

B. DAmFRET plots of [pin-] and [PIN+] cells expressing the indicated sequences either unfused (“monomer”) or fused to oDi (“dimer”) or FTH1 (“24-mer”). Plots are representative of biological triplicates.

C. DAmFRET plots of cells expressing Q60 either with or without oDi and with the indicated linkers and termini of the fusion. Plots are representative of biological triplicates.

D. Quantification of the data in C), showing that oDi reduces Q60 nucleation irrespective of the linker and terminus it is fused to. Shown are means +/-SEM. **, ***, **** p < 0.01, < 0.001, < 0.0001; t-test.

E. DAmFRET plots of cells expressing Q60 either with or without oDi or a monomeric mutant of oDi (oDi{X}). Plots are representative of biological triplicates.

F. Quantification of the data in E), showing that the monomerizing mutation eliminates the amyloid-inhibiting effect of oDi. Shown are means +/-SEM. **** p < 0.0001; t-test.

G. DAmFRET plots of [pin-] cells expressing a synthetic minimal polyQ amyloid-forming sequence, or the same sequence with the tenth Q mutated to N. Quantitation is the same as in Fig. 5C. Plots are representative of biological triplicates.

Summary of results from publicly available APR and aggregation propensity predictor web servers. All methods were available as of April 2021.