Amino acid homorepeats act as buffers to maintain proteostasis and constrain the compatible sequence space of proteomes

Yukihiro Murase; Naoki Kitamura; Shotaro Namba; Ayano Satoh; Takashi Makino; Ayako Moriya; Hisao Moriya

doi:10.7554/eLife.110733.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Johannes Herrmann
University of Kaiserslautern, Kaiserslautern, Germany
Senior Editor
David Ron
University of Cambridge, Cambridge, United Kingdom

Reviewer #1 (Public review):

Summary:

The authors created a metric to score the toxicity of specific amino acid homorepeats that accounts for differences in physicochemical properties. This "neutrality" score reflects how often a particular homorepeat appears in nature across the proteomes of different species. It can be used to understand known proteins and their characteristics, and it could also inform the emerging field of protein design.

Strengths:

This is a very careful and thorough study of amino acid homorepeats, and it does a remarkable job of accounting for the effects of the fluorescent protein tags.

Weaknesses:

The initial characterization of the neutrality score is missing a control of a known toxic homorepeat to help validate this method of characterizing amino acid homorepeats.

The authors did achieve their aim of developing a metric by which to score the toxicity and properties of amino acid homorepeats. This can be used in the future with other common amino acid motifs that are not homorepeats and can help scientists refine computer models for rational protein design.

https://doi.org/10.7554/eLife.110733.1.sa2

Reviewer #2 (Public review):

Summary

The aim of this study was to assess which amino acid stretches are tolerated or favoured in the course of evolution, considering their physico-chemical properties, metabolic costs and proteotoxicity. To address this question, the authors expressed PolyX variants in yeast and E. coli and also referred to COS cells. The PolyX constructs were tagged at the C-terminus with GFP or a different fluorescent reporter, with or without a cleavable linker, to assess expression levels and localization or to study topological effects. The PolyX stretch was also embedded between two different fluorescent proteins. The authors used growth rate and expression levels, as judged by fluorescence intensities, to calculate the relative neutrality in comparison to GFP alone.

They could show that harmful/beneficial effects depend on the specific amino acid (aa) and polar aa are tolerated well, whereas hydrophobic and positively charged aa are harmful to the cell. This is not surprising as hydrophobic and positively charged aa are known to be aggregation-prone. They could further show that the topology matters for some, but not all, PolyX variants. The PolyX stretch can affect the subcellular localization and aggregation propensity of the GFP it is fused to. Interestingly, overexpression of PolyG, PolyQ or PolyS was not harmful, and overexpression of PolyE was potentially even beneficial for the cell. The authors concluded their study with a theoretical analysis of the presence of aa stretches in various species and identified a high correlation between their expression in yeast and other species, suggesting that the selection of aa stretches is conserved and follows biochemical rules (trade-off between tolerance of expression levels, solubility, sub-cellular localization, and maybe metabolic costs).
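As an illustration of the kind of proteome scan described above, the sketch below counts maximal single-residue runs in a set of protein sequences, per amino acid. It is not the authors' pipeline: the function name `count_homorepeats` and the default 10-residue threshold (chosen here only to match the 10-mer constructs) are assumptions made for this example.

```python
import re
from collections import Counter

# The 20 standard amino acids; other letters (e.g. X, U) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_homorepeats(sequences, min_length=10):
    """Count maximal runs of one residue with length >= min_length.

    Hypothetical helper for illustration only. Each run is counted once,
    however long it is, because the greedy regex consumes the whole run.
    """
    counts = Counter({aa: 0 for aa in AMINO_ACIDS})
    # A residue followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    for seq in sequences:
        for match in pattern.finditer(seq.upper()):
            residue = match.group(1)
            if residue in counts:
                counts[residue] += 1
    return counts


if __name__ == "__main__":
    proteome = ["M" + "Q" * 11 + "A", "M" + "E" * 16 + "KK", "MAAAAA"]
    print(count_homorepeats(proteome))
```

Comparing such counts across species, and against an experimentally derived score, is the kind of analysis the summary refers to; the threshold strongly affects the counts and would need to be justified.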

Strengths:

The authors performed a large number of experiments and systematically assessed the expression and tolerance of 10-mer stretches of each of the 20 aa fused to GFP or other fluorophores in yeast and E. coli. This is an impressive effort.

Weaknesses:

(1) The analysis of expression levels of the various PolyX variants should not rely only on fluorescence intensities. The fusion of the PolyX stretch may affect the fluorescence properties (brightness, photostability) of the fluorescent partner and may or may not affect abundance. A quantitative analysis of PolyX-GFP (the same applies to the other fusion constructs shown in Figure 3) is needed, preferably by MS-based proteomic analysis using peptide counts. Western blotting is less ideal, as it relies on epitope recognition by the respective antibody, and epitope accessibility might be altered upon fusion with different PolyX stretches. In addition, the authors should analyse the PolyX stretch without an attached fluorophore as a control.

(2) The images shown in Figure 4 are not very informative. The constructs should be subjected to FRAP to assess the solubility of the PolyX variants and Ssa1 (Hsp70). FCS could be an alternative as well.

(3) The observation of the lack of mCherry fluorescence for PolyK and PolyP (Figure 4) can also be interpreted as an instability of the fusion protein (partial truncation and degradation) or quenching. The authors should test different fluorophores and different linker lengths between the PolyX stretch and the fluorophores. Fluorophore swapping (N/C-terminally) would also be a good control.

(4) The study would benefit from a consideration of the large body of literature on protein aggregation and the contribution of amino acid composition. The amino acids identified here as harmful to the cell when present as 10-mer stretches are known to be aggregation-prone and are also recognised by molecular chaperones that prevent their aggregation.

(5) The study could further benefit from ex vivo and in vitro analyses of the PolyX constructs. The authors could isolate the PolyX variants and study their solubility outside the cellular context, e.g., by light scattering.

https://doi.org/10.7554/eLife.110733.1.sa1

Reviewer #3 (Public review):

Summary:

The constraints limiting the usage of repetitive amino acid sequences in proteins remain enigmatic. In their manuscript, Murase et al. analyse the impact of polyamino acid homorepeats (PolyX) on the expression of EGFP variants with PolyX modifications. A newly introduced measure, relative neutrality, makes it possible to rate beneficial versus harmful sequences. The authors find that hydrophobic repeats in particular (I, V, W, F, Y) have harmful effects on the respective proteins, enhancing their aggregation. Hydrophilic repeats (E, S, N, Q), on the other hand, show beneficial properties and suppress proteotoxic stress. Interestingly, these observations correlate with the occurrence of such PolyX in natural proteins across the proteomes of different organisms.

Strengths:

The manuscript seems especially valuable in the context of rational or de novo protein design. On the one hand, the observations should make it possible to enhance the solubility of proteins by using beneficial PolyX. On the other hand, they explain very well why some PolyX do not occur in natural proteins. The authors present a sound, broad and well-analysed dataset. The study is well designed, the manuscript is very well written, and the conclusions drawn are overall valid.

Weaknesses:

The whole data set relies on the definition of the newly introduced "relative neutrality" score. Although it is a well-chosen tool, this score is limited and biased because it does not directly include a measure of "solubility" but relies on "fluorescence emission" from the respective EGFP fusion proteins.

A second major weakness is that the influence of PolyX-modifications on secondary structure is neither analysed nor discussed.

https://doi.org/10.7554/eLife.110733.1.sa0
