Heterogeneity of the GFP fitness landscape and data-driven protein design
Abstract
Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited due to scarceness of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design - instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work paves insights for practical application of fitness landscape heterogeneity in protein engineering.
Data availability
All data generated or analysed during this study are included in the manuscript and supporting file and are available on GitHub https://github.com/aequorea238/Orthologous_GFP_Fitness_Peaks
Article and author information
Author details
Funding
European Research Council (771209-CharFL)
- Fyodor A Kondrashov
MRC London Institute of Medical Sciences (UKRI MC-A658-5QEA0)
- Karen S Sarkisyan
President's Grant (МК-5405.2021.1.4)
- Karen S Sarkisyan
Marie Skłodowska-Curie Fellowship (898203)
- Aubin Fleiss
Russian Science Foundation (19-74-10102)
- Alexander S Mishin
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2022, Gonzalez Somermeyer et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 5,503
- views
-
- 726
- downloads
-
- 34
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Computational and Systems Biology
Live-cell microscopy routinely provides massive amounts of time-lapse images of complex cellular systems under various physiological or therapeutic conditions. However, this wealth of data remains difficult to interpret in terms of causal effects. Here, we describe CausalXtract, a flexible computational pipeline that discovers causal and possibly time-lagged effects from morphodynamic features and cell–cell interactions in live-cell imaging data. CausalXtract methodology combines network-based and information-based frameworks, which is shown to discover causal effects overlooked by classical Granger and Schreiber causality approaches. We showcase the use of CausalXtract to uncover novel causal effects in a tumor-on-chip cellular ecosystem under therapeutically relevant conditions. In particular, we find that cancer-associated fibroblasts directly inhibit cancer cell apoptosis, independently from anticancer treatment. CausalXtract uncovers also multiple antagonistic effects at different time delays. Hence, CausalXtract provides a unique computational tool to interpret live-cell imaging data for a range of fundamental and translational research applications.
-
- Cell Biology
- Computational and Systems Biology
Induced pluripotent stem cell (iPSC) technology is revolutionizing cell biology. However, the variability between individual iPSC lines and the lack of efficient technology to comprehensively characterize iPSC-derived cell types hinder its adoption in routine preclinical screening settings. To facilitate the validation of iPSC-derived cell culture composition, we have implemented an imaging assay based on cell painting and convolutional neural networks to recognize cell types in dense and mixed cultures with high fidelity. We have benchmarked our approach using pure and mixed cultures of neuroblastoma and astrocytoma cell lines and attained a classification accuracy above 96%. Through iterative data erosion, we found that inputs containing the nuclear region of interest and its close environment, allow achieving equally high classification accuracy as inputs containing the whole cell for semi-confluent cultures and preserved prediction accuracy even in very dense cultures. We then applied this regionally restricted cell profiling approach to evaluate the differentiation status of iPSC-derived neural cultures, by determining the ratio of postmitotic neurons and neural progenitors. We found that the cell-based prediction significantly outperformed an approach in which the population-level time in culture was used as a classification criterion (96% vs 86%, respectively). In mixed iPSC-derived neuronal cultures, microglia could be unequivocally discriminated from neurons, regardless of their reactivity state, and a tiered strategy allowed for further distinguishing activated from non-activated cell states, albeit with lower accuracy. Thus, morphological single-cell profiling provides a means to quantify cell composition in complex mixed neural cultures and holds promise for use in the quality control of iPSC-derived cell culture models.