Epithelialtomesenchymal transition proceeds through directional destabilization of multidimensional attractor
Abstract
How a cell changes from one stable phenotype to another one is a fundamental problem in developmental and cell biology. Mathematically, a stable phenotype corresponds to a stable attractor in a generally multidimensional state space, which needs to be destabilized so the cell relaxes to a new attractor. Two basic mechanisms for destabilizing a stable fixed point, pitchfork and saddlenode bifurcations, have been extensively studied theoretically; however, direct experimental investigation at the singlecell level remains scarce. Here, we performed live cell imaging studies and analyses in the framework of dynamical systems theories on epithelialtomesenchymal transition (EMT). While some mechanistic details remain controversial, EMT is a cell phenotypic transition (CPT) process central to development and pathology. Through timelapse imaging we recorded single cell trajectories of human A549/VimRFP cells undergoing EMT induced by different concentrations of exogenous TGFβ in a multidimensional cell feature space. The trajectories clustered into two distinct groups, indicating that the transition dynamics proceeds through parallel paths. We then reconstructed the reaction coordinates and the corresponding quasipotentials from the trajectories. The potentials revealed a plausible mechanism for the emergence of the two paths where the original stable epithelial attractor collides with two saddle points sequentially with increased TGFβ concentration, and relaxes to a new one. Functionally, the directional saddlenode bifurcation ensures a CPT proceeds towards a specific cell type, as a mechanistic realization of the canalization idea proposed by Waddington.
Editor's evaluation
This is a multifaceted study of the epithelial to mesenchymal transition (EMT) in live cells. EMT is relevant for cancer, development, and wound healing. The authors were able to discern two possible cell transition path categories without multicolor labeling or other advanced experimental approaches, which could be impactful for other studies. The study draws on a wide range of experimental, data science, and modelling tools and techniques.
https://doi.org/10.7554/eLife.74866.sa0eLife digest
Cells with the same genetic code can take on many different formss, or phenotypes, which have distinct roles and appearances. Sometimes cells switch from one phenotype to another as part of healthy growth or during disease. One such change is the epithelialtomesenchymal transition (EMT), which is involved in fetal development, wound healing and the spread of cancer cells.
During EMT, closely connected epithelial cells detach from one another and change into mesenchymal cells that are able to migrate. Cells undergo a number of changes during this transition; however, the path they take to reach their new form is not entirely clear. For instance, do all cells follow the same route, or are there multiple ways that cells can shift from one state to the next?
To address this question, Wang et al. studied individual lung cancer cells that had been treated with a protein that drives EMT. The cells were then imaged at regular intervals over the course of two to three days to see how they changed in response to different concentrations of protein. Using a mathematical analysis designed to study chemical reactions, Wang et al. showed that the cells transform into the mesenchymal phenotype through two main routes.
This result suggests that attempts to prevent EMT, in cancer treatment for instance, would require blocking both paths taken by the cells. This information could be useful for biomedical researchers trying to regulate the EMT process. The quantitative approach of this study could also help physicists and mathematicians study other types of transition that occur in biology.
Introduction
Cells of multicellular organisms assume different phenotypes that can have drastically different functions, morphologies, and gene expression patterns, and can undergo distinct changes when subjected to specific stimuli and microenvironments. Examples of cell phenotypic transitions (CPTs) include cell differentiation during development and induced and spontaneous cell fate transition such as reprogramming and transdifferentiation. Epithelialtomesenchymal transition (EMT) is a prototypic progress that has been extensively studied due to its significance in cell and developmental biology as well as in biomedical research (Figure 1a). Recent advances in singlecell techniques have further accelerated the longterm efforts on unraveling the mechanisms of CPTs, for understanding processes such as differentiation and reprogramming in developmental and cell biology, and for potential biomedical implications of modulating cell phenotypes in regenerative medicine and diseases such as cancer and chronic diseases (Wagner and Klein, 2020; Weinreb et al., 2020).
Considering cell as a dynamical system, understanding the CPT process from a dynamical systems theory perspective is an intriguing longstanding question in mathematical and systems biology. Mathematically, at any time a cell state can be specified by a set of state variables, such as gene expression levels, or other collective cell feature variables. A stable cell phenotype is an attractor in a highdimensional cell state space formed by the state variables (Huang et al., 2005), and a CPT is a transition between different attractors. Waddington famously made an analogy comparing developmental processes to a ball sliding down a potential landscape, and diverging into multiple paths at some bifurcation points (Rand et al., 2021). This picture exemplifies a pitchfork bifurcation, where an original stable attractor turns to an unstable saddle point together with the emergence of two new attractors (Figure 1b top). Another welldiscussed mechanism for attractortoattractor transition is through a series of saddlenode bifurcations, where first a new fixed point emerges, splits into a saddle point and attractor, then the saddle point separating the two attractors moves toward and eventually collides with the old attractor to destabilize the latter (Figure 1b bottom). A notable difference between the two types of bifurcations is that after a pitchfork bifurcation the original state can relax to multiple new stable states, while the saddlenode bifurcation the system is directed to relax to one attractor. A fundamental theoretical question is which mechanism a CPT process assumes, and why.
Pitchfork and saddlenode bifurcation are two important theoretical mechanisms of critical state transition and have been discussed in the context of CPTs (Mojtahedi et al., 2016). A new challenge then is how to evaluate various mathematical models experimentally at single cell levels. Several studies have suggested that analyzing snapshot singlecell genomics data (Mojtahedi et al., 2016; Chen et al., 2012), fixed cell data, to understand critical state transitions cannot provide temporal information on how individual cells transition. In addition, information from fluorescencebased live cell imaging is typically restricted to a small number of molecular species. Thus, tracking a small number of molecular species through live cell imaging cannot provide the collective transition dynamics due to the intrinsic highdimensional nature.
Specifically in the context of EMT, theoretical studies suggest that EMT proceeds as a saddlenode bifurcation (Tian et al., 2013); however, there is no direct experimental study on the singlecell transition dynamics. As recognized in a consensus statement from researchers in the EMT field, several open questions and challenges exist on understanding the mechanisms of the EMT process (Yang et al., 2020). For example, it is unclear whether the process proceeds as hopping among a small number of discrete and distinct intermediate states, or a continuum of such states with no clear boundary. The transition may proceed either along a linear array of states or through multiple parallel paths. Pseudotime analyses of high throughput single cell genomics studies infer that EMT proceeds through a 1D continuum path (McFalineFigueroa et al., 2019), consistent with a prevalent EMT axis concept with the epithelial and the mesenchymal states as the two end states (Nieto et al., 2016). These predictions, which are indirectly inferred from snapshot singlecell data, require direct testing by tracking single cells over time. However, live cell imaging studies are impeded because EMT status cannot be assessed based on only a small number of molecular markers such as key transcriptional factors (Yang et al., 2020).
To address the above challenge when studying EMT dynamics, recently we developed a platform of tracking cell state change in a composite cell feature space that is accessible for multiplex and longterm live cell imaging (Wang et al., 2020). Here, we first apply the platform to study EMT in a human A549 derivative cell line with endogenous vimentinRFP labeling (ATCC CCL185EMT, denoted as A549/VimRFP in later discussions) induced by different concentrations of TGFβ. Then we analyze an ensemble of recorded multidimensional singlecell trajectories within the framework of reaction rate theories that have been a focused subject in the context of physics and chemistry.
Results
Singlecell trajectories are mathematically represented in a collective morphology/texture feature space
For quantitative studies of CPT dynamics, one first needs to establish a mathematical representation of the cell states and cell trajectories. Mathematically, one can represent a cell state by a point in a multidimensional space defined by gene expression (Ye and Sarkar, 2018) or other cell properties (Gordonov et al., 2016; Chang and Marshall, 2019; Kimmel et al., 2018). Noticing that cells have phenotypespecific morphological features that can be monitored even with transmission light microscopes, recently we developed a framework that defines cell states in a combined morphology and texture feature space (Wang et al., 2020). The framework allows one to trace individual cell trajectories during a CPT process through livecell imaging. We applied the framework to study the TGFβ induced EMT with the A549/VimRFP cells (Figure 2; Wang et al., 2020). Vimentin is a type of intermediate filament protein commonly used as a mesenchymal marker, and changes its expression and spatial distribution during EMT (Zhang et al., 2014). Furthermore, during EMT cells undergo dramatic change of cell morphology accompanying gene expression change (Figure 1a; Zhang et al., 2014; Zhang et al., 2019). Accordingly, we used a combination of vimentin texture and cell shape features to specify a cell state.
We first performed timelapse imaging of the EMT process of A549/VimRFP induced with 4 ng/ml TGFβ (Figure 2a, Materials and methods Wagner and Klein, 2020; Weinreb et al., 2020), then performed singlecell segmentation and tracking on the acquired images (Figure 2b, Materials and methods Huang et al., 2005). We quantified the images with an active shape model (Cootes et al., 1995) and performed principal component analysis (PCA) to form a set of orthonormal basis vectors of collective variables, which include cell body shape of 296 degrees of freedom (DoF) quantified by an active shape model (Cootes et al., 1995), and texture features of cellular vimentin distribution quantified by 13 Haralick features (Haralick, 1979; Figure 2c, Materials and methods Huang et al., 2005). Then the state of a cell at a given instant is represented as a point in the 309dimensional composite morphology/texture feature space (Figure 2d), and the temporal evolution of the state forms a continuous trajectory in the space subject to further theoretical analyses (Figure 2e).
Transition path analyses identify an ensemble of reactive singlecell trajectories
Before TGFβ treatment, a population of cells assume a localized stationary distribution in this 309dimensional composite space, and most cells are epithelial (Figure 3a, blue). TGFβ treatment destabilizes such distribution, and the cells relax into a new stationary distribution dominated by mesenchymal cells (Figure 3a red). We recorded 204 continuous trajectories in the state space. A representative trajectory shown in Figure 2e (right) (also in Video 1) reveals how a cell transits stepbystep from an epithelial cell with convex polygon shapes and a localized intracellular vimentin distribution, to the mesenchymal phenotype with elongated spear shapes and a dispersive vimentin distribution.
Next, we applied rate theory analyses on the recorded singlecell trajectories. Rate theory studies how a system escapes from a metastable state, or relaxation from one stationary distribution to a new one (Hänggi et al., 1990). Specifically, transition path theory (TPT) is a modern development in the field (Weinan et al., 2005; Rohrdanz et al., 2013; Bolhuis et al., 2002; Weinan and VandenEijnden, 2010). Within the TPT framework, one divides the state space into regions containing the initial (A) and final (B) attractors, and an intermediate (I) region. A reactive trajectory is one that originates from region A, and enters region I then B before reentering region A (Figure 3b). The reactive trajectories form an ensemble of transition paths that connect regions A and B.
Accordingly, we divided the fourdimensional principal component subspace into epithelial (E), intermediate (I), and mesenchymal (M) regions (Figure 2e left) (Wang et al., 2020). With this division of space, we identified a subgroup of 135 recorded singlecell trajectories that form an ensemble of reactive trajectories that connect E and M by day 2. It should be noted that the practical definition of reactive trajectories here differs from the theoretical definition in classical TPT. In classical TPT, one runs a trajectory of infinite length that travel back and forth between the initial and final regions numerous times, and a reactive trajectory refers to a segment that starts from the initial region and ends in the final region.
TGFβ induced EMT in A549/VimRFP cells proceeds through parallel transition paths
We first examined whether EMT proceeds through one or multiple types of paths by analyzing the ensemble of reactive trajectories using selforganizing map (SOM) (Materials and methods Rand et al., 2021). SOM is an unsupervised artificial neural network that utilizes a neighborhood function to represent the topology structure of input data (Kohonen, 1982). The algorithm clusters the recorded cell states into 144 discrete states, and represents the EMT process as a Markovian transition process among these states (Figure 3—figure supplement 1). Shortest path analysis using the Dijkstra algorithm (Dijkstra, 1959) over the transition matrix reveals two groups of paths: vimentin Haralick PC1 varies first, and concerted variation of morphology PC1 and vimentin Haralick PC1, with finite probabilities of transition between the two groups (Figure 3c). This result is consistent with our previous trajectory clustering analysis using dynamics time warping (DTW) distance between reactive trajectories (Wang et al., 2020). To further support this conclusion, we examined the density of reactive trajectories ρ_{R}, in the plane of morphology PC1 and vimentin Haralick PC1 (Materials and methods Mojtahedi et al., 2016). The contour map of ρ_{R} shows two peaks corresponding to the two groups of shorted paths in the directed network (Figure 3d). The peak where vimentin Haralick PC1 varies first is higher than the peak of concerted variation, indicating more reactive trajectories along this path. The distinct features of the two types of trajectories are apparent from the two representative singlecell trajectories shown in Figure 3e.
A revised string method is developed to reconstruct reaction coordinate from singlecell trajectories
Due to the continuous and stochastic nature of the system dynamics there are infinite number of reactive trajectories, which mainly concentrate within a tube in the state space that connects regions A and B (Figure 3b; Weinan and VandenEijnden, 2010). The center of the tube has been used to define a reaction coordinate (RC), a concept central to rate theories (Figure 4a; VandenEijnden and Venturoli, 2009). A RC is a onedimensional geometric parameter (denoted as s in the subsequent discussions) that describes the progression along a continuous reaction path defined in the state space (Rohrdanz et al., 2013). A good choice of RC can provide mechanistic insight on how the transition process proceeds.
Therefore, we set to reconstruct the RC for each path with the recorded reactive trajectories using a revised finite temperature string method, which was originally designed for RC reconstruction for theoretical and computational modeling systems (VandenEijnden and Venturoli, 2009; Allen et al., 2005; Dickson et al., 2009). The method first approximates the RC by a set of discrete image points that are uniformly distributed along the arc length of the RC, so the multidimensional state space can be divided by a 1D array of Voronoi polyhedra containing individual images, and the data points can be assigned to individual Voronoi cells (Figure 4a). Starting with a trial RC to define the initial division of the Voronoi cells, one optimizes the trial RC iteratively by minimizing the distance dispersion between the string point and sample points within each Voronoi cell (Figure 4—figure supplement 1). Since here we have an ensemble of continuous trajectories, we modified the iteration procedure slightly. Specifically, we minimized both the distance between the ensemble of measured reactive trajectories and the image point within each individual Voronoi cell, as well as the overall distance between each individual trajectory and the trial RC (Materials and methods Chen et al., 2012).
Therefore, we grouped the reactive trajectories based on the DTW distance, then identified the RCs for each group separately following the modified string method. The iteration procedure gives the RC of each path (${s}_{1}$ and ${s}_{2}$) that characterizes common features of the reactive trajectories for the TGFβ induced EMT in A549/VimRFP cells (Figure 4b). The recorded trajectories fluctuate around the RCs and form reaction tubes as expected from the TPT theory (Figure 4c). Along each RC, the cell shape changes dominantly through elongation and growth (Figure 4d & e, left), and most of the 13 vimentin Haralick features increase or decrease monotonically and continuously over time (Figure 4d & e, right) (Materials and Methods Tian et al., 2013). The two RCs first diverge from the E region to follow two distinct paths, then converge within the M region. In one group (Figure 4d), most of the Haralick feature changes take place before major morphology change. For the group with concerted dynamics (Figure 4e), both cell shape and Haralick features vary gradually along the RC.
Reconstructed reaction coordinate and quasipotentials reveal EMT as relaxation along continuum manifolds
Considering a cell as a dynamical system, with the singlecell trajectories one may formulate an inverse problem to reconstruct the underlying dynamical equations governing the EMT transition. Let’s take a minimal ansatz that assumes the dynamics of the collective variables (x) can be described by a set of Langevin equations in the morphology/texture feature space, $d\mathbf{x}/dt=\mathbf{F}\left(\mathbf{x}\right)+\eta \left(\mathbf{x},t\right)$, where F(x) is a governing vector field, and η are white Gaussian noises with zero mean. Then in principle one can reconstruct F(x) from the singlecell trajectory data, $\mathrm{F}(\mathrm{x})=\u27e8\mathrm{d}\mathit{x}/\mathrm{d}\mathrm{t}\u27e9$, averaged over the neighborhood of each x. Notice for the ensemble average one needs to use all the reactive and nonreactive trajectories.
For mathematical simplicity and better numerical convergence, here we restricted to reconstructing the dynamics along the RC (Figure 5a). The ansatz becomes a 1D convectiondiffusion process, $ds/dt=d\varphi /ds+\eta$. Notice that for a 1D system even without detailed balance one can define a quasiscalar potential ϕ (Xing, 2010; Xing and Kim, 2011; Risken, 1996). This quasipotential corresponds to a potential of mean force in the case of a conserved system. To confirm that the ansatz of using a memoryless 1D Langevin equation is a good approximation for the EMT dynamics along a RC, we performed the ChapmanKolmogorov Test (CKtest). The test shows that the onestep transition matrix can indeed predict the dynamics on longer time scales, and thus the Markovian assumption is a good zero^{th}order approximation of the EMT dynamics (Figure 5—figure supplement 1, Materials and Methods Yang et al., 2020).
To reconstruct the dynamical equations, we used the measured trajectories and instant velocities (along the RC direction) as input. An enlarged view in Figure 5b shows the instant velocities (ds/dt) of various trajectories segments within Voronoi cells. Numerically, we related the potential gradient dϕ/ds with the mean velocity within the i_{th} Voronoi cell through averaging over all recorded reactive and nonreactive trajectory segments that locate within the Voronoi cell at any time t (Figure 5a & c, Materials and methods McFalineFigueroa et al., 2019; Nieto et al., 2016). On the obtained curve of dϕ/ds vs. s (Figure 5—figure supplement 2), the zeroes correspond to stationary points of the potential. We then reconstructed the quasipotential through integrating over s, $\varphi ({s}_{i})=\varphi ({s}_{0})+{\int}_{{s}_{0}}^{{s}_{i}}(d\varphi /ds)ds$ (Figure 5d). We also obtained the quasipotential of the untreated cells (Figure 5e) from the steady state distribution of untreated cells along the RC, ${\varphi}_{\text{0}}\propto \mathrm{log}{p}_{ss}$.
The reconstructed RCs and quasipotentials provide mechanistic insight on the transition. Before TGFβ induction, cells reside on the untreated cell potential centered with a potential well corresponding to the epithelial attractor (Figure 5e). After induction, the system relaxes following the new potential into a new well corresponding to the mesenchymal attractor. Notably, in the new potential the original epithelial attractor disappears, reflecting that the epithelial phenotype is destabilized under the applied 4 ng/ml TGFβ concentration.
With the reconstructed quasipotentials and variance of dϕ/ds obtained from experiment, we solved the FokkerPlanck (FP) equation corresponding to the Langevin equation to predict the steady distribution vs. RC (Materials and methods Wang et al., 2020). While the dynamical equations were reconstructed from trajectories during the transition process and with a minimal Langevin equation ansatz, the predicted stationary distribution have average values and standard deviations, (17.2, 4.6) and (14.7, 4.4) for the two paths, respectively, close to those calculated from the distributions sampled from cells after 39 h TGFβ treatment, (16.0, 5.0) and (16.2, 4.0), respectively (Figure 5—figure supplement 3a and b).
Lowering TGFβ concentration leads to relaxation to a new partial EMT attractor through two parallel paths
Notice that quasipotential with 4 ng/ml TGFβ for the Vimentin varies first path shows a plateau around s_{1} = 8 (Figure 5d). We hypothesized that this plateau is a remnant of the original epithelial attractor, and if so we expect to observe a flatter plateau or even a metastable attractor with a lower TGFβ concentration. Therefore, we treated A549/VimRFP cells with 1 ng/ml TGFβ and recorded 135 3day long singlecell trajectories. Among the recorded trajectories 65 indeed reached the M region. The reactive trajectories also cluster into two groups (Figure 5—figure supplement 4). For the purpose of comparison, we projected the trajectories onto the RCs obtained from the 4 ng/ml TGFβ and reconstructed the quasipotentials in the case of 1 ng/ml of TGFβ. Indeed, the quasipotential for the path with vimentin varying first has a plateau flatter than that of the 4 ng/ml TGFβtreated cells. The quasipotential of another path, however, does not show such flattening.
In the EMT field, it is under debate whether EMT has a discrete or continuum spectrum of intermediate states (Yang et al., 2020). Singlecell transcriptomic studies reveal intermediate states (Cook and Vanderhyden, 2020). If EMT proceeds through a set of discrete and definite states, we expect to observe attractors, i.e., potential wells, corresponding to these states. In addition, lowering the TGFβ concentration would lead to an increased barrier between neighboring wells and cells trapped at some intermediate state for a long time. While all the quasipotentials at the two TGFβ concentrations reveal destabilization of the epithelial attractor, they do not have multiple attractors anticipated for discrete EMT states (Figure 5e dϕ/ds in Figure 5—figure supplement 5a, b). Compared to those at 4 ng/ml TGFβ, both quasipotentials at 1 ng/ml TGFβ have the new attractor closer to the epithelial state. Examination of individual trajectories also reveals that cells with 1 ng/ml TGFβ treatment leave the E region and mostly reside around a new attractor in the I region (Figure 5—figure supplement 5c, d). Some trajectories fluctuate within the attractor and only transiently reach the M region. That is, cells reach a new stable phenotype whose degree of mesenchymal features depends on the TGFβ concentration, which is consistent with previous studies on MCF10A cells treated with different concentrations of TGFβ (Zhang et al., 2014). Fluctuations of singlecell trajectories limit the state resolution in the cell feature space and prevent us from telling whether the two paths reach one common intermediate state or two distinct ones. To examine how the number of Voronoi cells affects the quasipotential properties, we varied the number of discretization points along the RCs and did the same analysis to reconstruct the quasipotential (Figure 5—figure supplement 6a and b). The reconstructed potentials have similar shapes, although show different smoothness.
Discussion
Cell type regulation is an important topic in mathematical and systems biology, and several theoretical and computational studies on modeling CPT systems have been performed in the context of rate theories (Aurell and Sneppen, 2002; Brackston et al., 2018; Morelli et al., 2008; Wang et al., 2014; Ge et al., 2015; Wang et al., 2011; Tse et al., 2018). Recent advances in singlecell techniques have catalyzed quantitative measurements on the dynamics of CPTs, and have opened new questions on how to quantify and analyze singlecell data in the framework of dynamical systems theory and how it compares with theoretical results. In our previous studies (Wang et al., 2020), we developed an integrated live cell imaging and image analysis framework for quantifying and representing live cell imaging data. In this work, we further showed that the mathematical representation allows one to analyze the single cell trajectories and study CPT dynamics using rate theories for quantitative mechanistic insights (Figure 5—figure supplement 7).
A longheld concept in cell and developmental biology is that cells exist as discrete and distinct phenotypes. Single cell genomics measurements have challenged such views and have revealed that cells move along continuum manifolds. The live cell imaging studies presented here support such a picture by demonstrating EMT as a relaxation along a continuum manifold, which is also supported by single cell RNAseq studies (McFalineFigueroa et al., 2019).The highthroughput singlecell data provides high dimensional information, but identifying the intermediate states in the snapshot data requires the development of new dimension reduction and clustering methods (MacLean et al., 2018). Compared to the snapshot studies, live cell trajectories reveal that the EMT trajectories can be clustered as fluctuating around two distinct paths that connect the epithelial and mesenchymal phenotypes. The existence of two types of transition paths suggest that the previously proposed conceptual 1D EMT axis (Nieto et al., 2016) is insufficient for understanding how EMT proceeds, and puts dynamics inferred indirectly from snapshot data questionable (McFalineFigueroa et al., 2019). Extracting multidimensional features from live cell imaging data provides a complementary strategy, and further development along this direction will benefit from advances of imaging technology and algorithm (Christiansen et al., 2018; Ounkomol et al., 2018).
In rate theories, identifying an appropriate RC provides mechanistic insights of a transition process. For example, recognizing collective solvent reorganization as part of the reaction coordinate is a key part of Marcus’s electron transfer theory (Marcus, 1993). Adopting the fraction of native contacts as a RC also historically plays a key role in the development of protein folding theories. Therefore, continuous efforts have been made on defining an optimal RC for a complex dynamical system (Li and Ma, 2014; Best and Hummer, 2005; Fukui, 2002), and it is even questionable to assume that the system dynamics can be well presented by a 1D RC, as suggested by theoretical analysis on protein folding (Udgaonkar, 2008). Indeed, the analyses presented in this study shows such a breakdown of this basic assumption.
In the introduction, we discussed two basic mechanisms of critical state transitions. For TGFβ induced EMT in A549 cells, analysis of the trajectories reveals that mathematically multiple paths may originate from destabilization of a multidimensional epithelial attractor through colliding with multiple saddle points sequentially. We illustrated such a mechanism in Figure 5f and in Video 2 schematically with a metaphorical potential system. With no or low TGFβ, the system resides in the epithelial attractor (Figure 5f, first). Adding TGFβ leads to the appearance of a new (mesenchymal) attractor, and elevation of the epithelial attractor (Figure 5f, second). The epithelial attractor approaches and collides first with a saddle point to form a barrierless (concertedvariation) path to the new attractor (Figure 5f, third). At this TGFβ concentration (e.g. 1 ng/ml), some barrier still exists along an alternative (vimentinfirst) path, which then disappears with the increase of TGFβ concentration again (as revealed in Figure 5d) through saddle node collision (Figure 5f, fourth). Indeed, we observed the reactive trajectories predominantly assume the concertedvariation path under 1 ng/ml TGFβ treatment (Figure 5—figure supplement 4a), while the vimentinfirst path becomes dominant under 4 ng/ml TGFβ treatment (Figure 5—figure supplement 4b). It should be pointed out that this sequential saddlenode bifurcation mechanism is only a plausible one that can explain the experimental data. Multidimensional systems can show complex bifurcation patterns (Rand et al., 2021; Kheir Gouda et al., 2019). With sufficient singlecell trajectories, one can extend the procedure described in this work and in Qiu et al., 2022 to reconstruct the multidimensional vector field and the full governing FokkerPlanck equations directly. The full model will help address questions on the role of noise in CPTs (Balázsi et al., 2011).
One major limitation of the minimal Langevin equation ansatz used in this work is that the dynamics is assumed to be Markovian, that is, the temporal evolution of the system depends only on the current but not previous cell state. A cell has a much larger number of degrees of freedom than what can be measured explicitly through livecell imaging, and these implicit degrees of freedom result in possible nonMarkovian dynamics. Future studies may use the generalized Langevin equation formalism that takes into account the nonMarkovian effects. In addition, we used the added exogenous TGFβ concentration as a control parameter, as in previous studies (Tian et al., 2013; Zhang et al., 2014). Cells also secrete endogenous TGFβ, and interact with other cells, so the extracellular microenvironment for individual cells is in general spatially and temporally changing. In future studies, one can relax the meanfield approximation assumed in this work. Experimentally, one can use microfluidic chambers for more precise control of extracellular environments. Furthermore, one needs to identify the expression profiles of cell states along the two paths for deeper mechanistic understanding how the two paths emerge and how one can modulate the EMT dynamics, possibly through combining live cell imaging and fixedcell measurements.
In summary, through an integrated framework of live cell imaging and singlecell trajectories with dynamical systems theories we obtained quantitative mechanistic insights of TGFβ induced EMT as a prototype for CPTs in general. Since a cell is a complex dynamical system with many strongly coupled degrees of freedom and a broad range of relevant time scales, further development will benefit from a finer resolution of cell state through including additional measurable cell features.
Materials and methods
Cell line
Request a detailed protocolA549/VIMRFP cells were obtained from (ATCC CCL185EMT, RRID:CVCL_LI35), and cells within 5–15 generations were used in the studies. Cells were cultured in F12K medium (Corning) with 10% fetal bovine serum (FBS) in MatTek glass bottom culture dishes (P35G0–10 C) in a humidified atmosphere at 37 °C and 5% CO2, as detailed in Wang et al., 2020. The cell line was authenticated as A549 cells with short tandem repeat (STR) analysis by University of Arizona Genetics Core (UAGC). The result of mycoplasma contamination test (MycoFluor Mycoplasma Detection Kit Molecular Probes, M7006) was negative.
Timelapse imaging
Request a detailed protocolTimelapse images were taken with a Nikon Ti2E microscope with differential interference contrast (DIC) and TRITC channels (Excitation wavelength is 555 nm and Emission wavelength is 587) (20× objective, N.A. = 0.75). The cell culture condition was maintained with Tokai Hit Microscope Stage Top Incubator. For the 4 ng/ml TGFβ (R&D Systems 240B) experiment, cells were imaged every 5 min with the DIC channel and every 10 min with the TRITC channel for 2 days. The exposure time for DIC was 100 ms and the exposure time for the TRITC channel was 30 ms. For the 1 ng/ml TGFβ experiment, cells were imaged every 5 min with the DIC channel and every 15 min with the TRITC channel for 3 days. The exposure time for DIC was 100ms and the exposure time for the TRITC channel was 30ms. While taking the images, all the imaging fields were chosen randomly.
Trajectory analyses
Request a detailed protocolAfter singlecell segmentation, we performed single cell tracking with CellProfiler (RRID:SCR_007358) using a linear assignment problem (LAP) method (Jaqaman et al., 2008; Carpenter et al., 2006). To reduce the influence of over and undersegmentation on the trajectory analysis, we utilized the cell tracking to detect these false segmentations through calculating the overlap relationship between single cells in the consecutive frames. If a cell in one frame and its maximum overlap cell in the next frame do not belong to the same trajectory, cells that overlap with the two cells were manually checked. Specifically, we trained an CNN (ResNet) to identify the over and undersegmented cells (He et al., 2016). To increase accuracy, we used both the prediction results from this CNN, and other criteria. For instance, falsely segmented cells usually do not exist for a long duration. We used the lasting duration (30 minutes) to identify possible over and undersegmentation. The identified falsely segmented cells were removed from the trajectories, and the broken trajectories were relinked using the LAP method.
We represented the state of a cell by its morphology and Vimentin Haralick features (Wang et al., 2020). The morphology was described with the active shape model (Pincus and Theriot, 2007). Calculation of Haralick features was based on the graylevel cooccurrence matrix G. $\mathrm{G}}_{\mathrm{i}\mathrm{j}$ is the frequency of observing the graylevels values of two neighboring pixels i and j, respectively. It was normalized by the total counts. In 2D images, there are four directions of such neighbors (0°, 45°, 90°, and 135°). To be rotation invariant, each feature was calculated on all the four directions of G matrix and averaged over them. Haralick features include the following features: 1 Angular Second Moment; 2: Contrast; 3: Correlation; 4: Sum of Squares: Variance; 5: Inverse Difference Moment; 6: Sum Average; 7: Sum Variance; 8: Sum Entropy; 9: Entropy; 10: Difference Variance; 11: Difference Entropy; 12: Information Measure of Correlation 1; 13: Information Measure of Correlation 2.
Selforganizing map and shortest transition paths in the directed network
Request a detailed protocolThe selforganizing map is an unsupervised machine learning method to represent the topology structure of date sets. We used a 12 × 12 grid (neurons) to perform space approximation of all reactive trajectories. The SOM was trained for 50 epochs on the data with Neupy (http://neupy.com/pages/home.html). We set the learning radius as one and standard deviation 1. These neurons divide the data into 144 microclusters ($\left\{\psi \right\}$). With the singlecell trajectory data, we counted the transition probabilities from cluster i to cluster j (including self), with $\sum _{j}{p}_{ij}=1$. If the transition probability is smaller than 0.01, the value is then reset as 0. With the transition probability matrix, we built a directed network of these 144 neurons (Figure 3—figure supplement 1). The distance (weight) of the edge between neuron $\psi}_{i$ and neuron $\psi}_{j$ is defined as the negative logarithm of transition probability ($\mathrm{l}\mathrm{o}\mathrm{g}\phantom{\rule{thinmathspace}{0ex}}{p}_{ij}$). We set the neurons that close to the center of epithelial and mesenchymal state (sphere with radius = 0.7) as epithelial community and mesenchymal community, respectively, and used Dijkstra algorithm to find the shortest path between each pair of epithelial and mesenchymal neurons (Dijkstra, 1959) with NetworkX (Hagberg et al., 2008). We recorded the frequency of neurons and edge between these neurons that were past by these shortest paths.
Calculation of density of reactive trajectories
Request a detailed protocolThe density of reactive trajectory on the plane of morphology PC1 and vimentin Haralick PC1 was calculated with the following procedure:
Divide the whole plane into 200 × 200 grids.
In each grid, count the number of reactive trajectory (only the parts of each reactive trajectory that were in the intermediate region were taken into consideration) that enters and leaves it. If a reactive trajectory passes certain grid multiple times, only one was added in this grid’s density. Thus, the density matrix was obtained.
Use Gaussian filter to smooth the obtained density matrix. The standard deviation was set to be two and the truncation was two (i.e., truncate at twice of the standard deviation).
Procedure for determining a reaction coordinate
Request a detailed protocolWe followed a procedure adapted from what used in the finite temperature string method for numerical searching of reaction coordinate and nonequilibrium umbrella sampling (VandenEijnden and Venturoli, 2009; Dickson et al., 2009), with a major difference that we used experimentally measured singlecell trajectories (Figure 4—figure supplement 1).
Identify the starting and ending points of the reaction path as the means of data points in the epithelial and mesenchymal regions, respectively. The two points are fixed in the remaining iterations.
Construct an initial guess of the reaction path that connects the two ending points in the feature space through linear interpolation. Discretize the path with N ( = 30) points (called images, and the k_{th} image denoted as ${s}_{k}$ with corresponding coordinate X(s_{k})) uniformly spaced in arc length.
Collect all the reactive singlecell trajectories that start from the epithelial region and end in the mesenchymal region.
For a given trial RC, divide the multidimensional state space by a set of Voronoi polyhedra containing individual images, and calculate the score function F given in the main text (with w = 10 in our calculations). We carried out the minimization procedure through an iterative process. For a given trial path defined by the set of image points, we calculated a set of average points using the following equations, $\overline{\mathbf{X}}({s}_{k})=\frac{\u27e8\sum _{u}\sum _{\alpha}\left\{{\mathbf{X}}_{u,{t}_{\alpha}}{\mathbf{X}}_{u,{t}_{\alpha}}\in {s}_{k}\right\}\u27e9+w\u27e8\sum _{u}{\mathbf{X}}_{u,\mathrm{a}\mathrm{r}\mathrm{g}\mathrm{m}\mathrm{i}\mathrm{n}\Vert \mathbf{X}({\mathrm{s}}_{\mathrm{k}}){\mathbf{X}}_{\mathrm{u},{\mathrm{t}}_{\alpha}}{\Vert}^{2}}\u27e9}{1+w}$. Next we updated the continuous reaction path through cubic spline interpolation of the average positions (Virtanen et al., 2020), and generated a new set of N images $\left\{X({s}_{k})\right\}$ that are uniformly distributed along the new reaction path. We set a smooth factor, that is, the upper limit of $\sum _{k=1}^{N}(\overline{X}({s}_{{}_{k}})X({s}_{{}_{k}}))$, as one for calculating the RC in Figure 4.
We iterated the whole process in step three until there was no further change of Voronoi polyhedron assignments of the data points.
For obtaining the quasipotential of a larger range of s, extrapolate the obtained reaction path forward and backward by adding additional image points (three for the two parallel reaction paths) beyond the two ends of the path linearly, respectively. These new image points are also uniformly distributed along s as the old image sets do. Reindex the whole set of image points as $\left\{{s}_{0},{s}_{1},...,{s}_{i},...{s}_{N},{s}_{N+1}\right\}$.
Calculation of dynamics of morphology and Haralick features along reaction path
Request a detailed protocolThe reaction path is calculated in the principal component (PC) space of morphology PC1, vimentin Haralick PC1, PC3, and PC4. Distribution of cells show significant shift before and after TGFβ treatment in these dimensions (Wang et al., 2020). To reconstruct dynamics in the original features space from PCs, the reaction path’s coordinates on the other dimensions of PC are set as means of data in the corresponding Voronoi cell of each point on the reaction path. We obtain the reaction path in full dimension of PC space. The dynamics of morphology and Haralick features are calculated by inversetransform of coordinates of PCs.
Markov property test
Request a detailed protocolWe examined the Markov property of the RC trajectories with ChapmanKolmogorov Test (CKtest) (Figure 5—figure supplement 1). The CKtest compares the left and right sides of ChapmanKolmogorov equation ($P(k\tau )={p}^{k}(\tau )$). The $k$step transition matrix should equal to the ${k}_{th}$ power of 1step transition matrix if the process is Markovian. We estimated the transition matrix with PyEMMA (Scherer et al., 2015).
Reconstruction of quasipotential along the reaction coordinate
Request a detailed protocolBased on the theoretical framework in Figure 5c, we followed the procedure below:
The N + 2 image points of an identified RC divide the space in N + 2 Voronoi cells that data points can assign to. Ignore the first and last Voronoi cells, and use the remaining N cells for the remaining analyses.
Within the i_{th} Voronoi cell, calculate the mean drift speed (and thus $d\varphi /ds$) at $s}_{i$ approximately by $\frac{d\varphi}{ds}{\left{}_{{s}_{i}}=\u27e8\frac{d{s}_{i}}{dt}\u27e9\approx \u27e8\frac{s\u27ee\mathit{X}(t+\mathrm{\Delta}t)\u27ef{s}_{i}}{\mathrm{\Delta}t}\u27e9\right}_{s(\mathit{X}(t))={s}_{i}}$ where s(X) is the assumed value along s for a cell state X in the morphology/texture feature space. The sum $s(\mathit{X}(t))={s}_{i}$ is over all time and all data points from all the recorded trajectories that lie within the i_{th} Voronoi cell ($s(\mathit{X}(t))={s}_{i}$), and Δt = 1 is one recording time interval. Using data points from all instead of just reactive trajectories is necessary for unbiased sampling within each Voronoi cell with ${\u27e8\eta \u27e9}_{{s}_{i}}=0$.
Calculate the quasipotential through numerical integration, $\varphi ({s}_{i})=\varphi ({s}_{0})+{\int}_{{s}_{0}}^{{s}_{i}}\frac{d\varphi}{ds}ds\approx \varphi ({s}_{0})+\mathrm{\Delta}s\sum _{j=1}^{i}\frac{d\varphi}{ds}{}_{{s}_{j}}$. The exact value of ϕ(s_{0}) does not affect the quasipotential shape.
Reconstruction of quasipotential along parallel paths
Request a detailed protocolWe followed the same mapping procedure for trajectories of cells treated with different TGFβ concentration. We used tslearn to calculate the DTW distance between two reactive trajectories, then performed KMeans clustering (Sammut and Webb, 2017) on the DTW distance matrix (Sakoe and Chiba, 1978) to cluster the reactive trajectories into two groups. We then followed the procedure in Materials and methods (McFalineFigueroa et al., 2019) to reconstruct the RC for each group. We reconstructed the quasipotentials using all trajectories.
For a singlecell trajectory, it is possible that the cell jumps out its original path due to fluctuation. For example, for a trajectory that mainly follows RC1, certain parts of it may transit into the range of RC2. So we used a partaligning method to map all the trajectories to the RCs.
For a trajectory not belonging to the reactive trajectory ensemble, we assigned it to one of the two group associated to the two RCs, $\left\{{s}_{1}\right\}=\left\{{s}_{1,1},...,{s}_{1,i},...{s}_{1,N}\right\}$ and $\left\{{s}_{2}\right\}=\left\{{s}_{2,1},...,{s}_{2,i},...{s}_{2,N}\right\}$ by using subsequence DTW distance (Sammut and Webb, 2017). We first calculate the subsequence DTW distance of this trajectory to the Voronoi cells of $\left\{{S}_{1}\right\}$ and $\left\{{S}_{2}\right\}$, respectively, and identified its matching coordinates $\left\{{s}_{1,a},...,{s}_{1,i},...,{s}_{1,b}\right\}$, and $\left\{{s}_{2,c},...,{s}_{2,i},...{s}_{2,d}\right\}$ on the RCs. Each point on the trajectory was assigned to the RCs based on minimum Euclidean distance. Then the consecutive parts of this trajectory along each RC were used to calculate the drift speed $\u27e8ds/dt\u27e9$ and quasipotential $\varphi (s)$ following the definition and procedure described in Materials and methods (McFalineFigueroa et al., 2019).
Numerical solution of the FokkerPlanck equation
Request a detailed protocolThe Langevin equation, $ds/dt=F\left(s\right)+\eta (s,t)$, describes how one cell evolves along a reaction coordinate s in the state space. Equivalently under the Ito interpretation the FokkerPlanck equation describes on the probability density of observing cells at a specific point s, $\rho \left(s,t\right)$ changes over time (Kubo et al., 1991; Schnoerr et al., 2017),
In the equation, the spatially dependent diffusion constant D was estimate from the experiment data, ${D}_{{s}_{i}}=Var\left(\frac{ds}{dt}{{}_{s}}_{{}_{i}}\right)$, the variance of $\frac{ds}{dt}{{}_{s}}_{{}_{i}}$ calculated from experiment data in corresponding Voronoi cells. The term $F\left(s\right)=\nabla U$, where $U$ is the quasipotential obtained from experiment data, the pseudotemperature T was set as $\frac{1}{2{k}_{B}}$ , where ${k}_{B}$ is the Boltzmann constant. With these inputs, we solved the equations numerically (Holubec et al., 2019).
Data availability
Request a detailed protocolThe scripts and singlecell trajectory data can be found in the link https://github.com/opnumten/trajectory_analysis (Wang, 2022 copy archived at swh:1:rev:1153076f88acab6f1030eefa119f4e6bd49cbb97).
Data availability
The computer code are shared on GitHub, so other researchers can run to reproduce Figure 3, 4, and 5. The processed single cell trajectory data are on Dryad.

Dryad Digital RepositoryA549 VIMRFP EMT single cell trajectory data.https://doi.org/10.5061/dryad.7h44j0zvp
References

Sampling rare switching events in biochemical networksPhysical Review Letters 94:018104.https://doi.org/10.1103/PhysRevLett.94.018104

Epigenetics as a first exit problemPhysical Review Letters 88:048101.https://doi.org/10.1103/PhysRevLett.88.048101

Transition path sampling: throwing ropes over rough mountain passes, in the darkAnnual Review of Physical Chemistry 53:291–318.https://doi.org/10.1146/annurev.physchem.53.082301.113146

Transition state characteristics during cell differentiationPLOS Computational Biology 14:e1006405.https://doi.org/10.1371/journal.pcbi.1006405

Context specificity of the EMT transcriptional responseNature Communications 11:2142.https://doi.org/10.1038/s41467020160662

Active Shape ModelsTheir Training and ApplicationComputer Vision and Image Understanding 61:38–59.https://doi.org/10.1006/cviu.1995.1004

Nonequilibrium umbrella sampling in spaces of many order parametersThe Journal of Chemical Physics 130:074104.https://doi.org/10.1063/1.3070677

A note on two problems in connexion with graphsNumerische Mathematik 1:269–271.https://doi.org/10.1007/BF01386390

The path of chemical reactions  the IRC approachAccounts of Chemical Research 14:363–368.https://doi.org/10.1021/ar00072a001

BookExploring Network Structure, Dynamics, and Function Using NetworkXLos Alamos National Lab.(LANL), Los Alamos, NM (United States.

Reactionrate theory: fifty years after KramersReviews of Modern Physics 62:251–341.https://doi.org/10.1103/RevModPhys.62.251

Statistical and structural approaches to textureProceedings of the IEEE 67:786–804.https://doi.org/10.1109/PROC.1979.11328

ConferenceDeep Residual Learning for Image Recognition2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778.https://doi.org/10.1109/CVPR.2016.90

Selforganized formation of topologically correct feature mapsBiological Cybernetics 43:59–69.https://doi.org/10.1007/BF00337288

BookStatistical Physics IIIn: Kubo R, editors. Statistical Physics II: Nonequilibrium Statistical Mechanics (2nd Ed). Berlin, Heidelberg: Springer. pp. 1–279.https://doi.org/10.1007/9783642582448

Recent developments in methods for identifying reaction coordinatesMolecular Simulation 40:784–793.https://doi.org/10.1080/08927022.2014.907898

Exploring intermediate cell states through the lens of single cellsCurrent Opinion in Systems Biology 9:32–41.https://doi.org/10.1016/j.coisb.2018.02.009

Electron transfer reactions in chemistryTheory and Experiment. Reviews of Modern Physics 65:599–610.https://doi.org/10.1103/RevModPhys.65.599

Reaction coordinates for the flipping of genetic switchesBiophysical Journal 94:3413–3423.https://doi.org/10.1529/biophysj.107.116699

Comparison of quantitative methods for cellshape analysisJournal of Microscopy 227:140–156.https://doi.org/10.1111/j.13652818.2007.01799.x

BookThe FokkerPlanck EquationIn: Risken H, editors. The FokkerPlanck Equation: Methods of Solutions and Applications (2nd Ed). Berlin, Heidelberg: SpringerVerlag. pp. 1–474.https://doi.org/10.1007/9783642615443

Dynamic programming algorithm optimization for spoken word recognitionIEEE Transactions on Acoustics, Speech, and Signal Processing 26:43–49.https://doi.org/10.1109/TASSP.1978.1163055

PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov ModelsJournal of Chemical Theory and Computation 11:5525–5542.https://doi.org/10.1021/acs.jctc.5b00743

Rareevent sampling of epigenetic landscapes and phenotype transitionsPLOS Computational Biology 14:e1006336.https://doi.org/10.1371/journal.pcbi.1006336

Multiple routes and structural heterogeneity in protein foldingAnnual Review of Biophysics 37:489–510.https://doi.org/10.1146/annurev.biophys.37.032807.125920

Revisiting the finite temperature string method for the calculation of reaction tubes and free energiesThe Journal of Chemical Physics 130:194103.https://doi.org/10.1063/1.3130083

Lineage tracing meets singlecell omics: opportunities and challengesNature Reviews Genetics 21:410–427.https://doi.org/10.1038/s4157602002232

Softwaretrajectory_analysis, version swh:1:rev:1153076f88acab6f1030eefa119f4e6bd49cbb97Software Heritage.

Finite temperature string method for the study of rare eventsThe Journal of Physical Chemistry. B 109:6688–6693.https://doi.org/10.1021/jp0455430

Transitionpath theory and pathfinding algorithms for the study of rare eventsAnnual Review of Physical Chemistry 61:391–420.https://doi.org/10.1146/annurev.physchem.040808.090412

Lineage tracing on transcriptional landscapes links state to fate during differentiationScience (New York, N.Y.) 367:6479.https://doi.org/10.1126/science.aaw3381

Mapping between dissipative and Hamiltonian systemsJournal of Physics A 43:375003.https://doi.org/10.1088/17518113/43/37/375003

Application of the projection operator formalism to nonhamiltonian dynamicsThe Journal of Chemical Physics 134:044132.https://doi.org/10.1063/1.3530071

Guidelines and definitions for research on epithelialmesenchymal transitionNature Reviews. Molecular Cell Biology 21:341–352.https://doi.org/10.1038/s4158002002379

Towards a Quantitative Understanding of Cell IdentityTrends in Cell Biology 28:1030–1048.https://doi.org/10.1016/j.tcb.2018.09.002

Spatial clustering and common regulatory elements correlate with coordinated gene expressionPLOS Computational Biology 15:e1006786.https://doi.org/10.1371/journal.pcbi.1006786
Article and author information
Author details
Funding
National Institute of Diabetes and Digestive and Kidney Diseases (R01DK119232)
 Jianhua Xing
National Cancer Institute (R37 CA232209)
 Jianhua Xing
National Institutes of Health (NIBIB T32EB009403)
 Dante Poe
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was partially supported by National Cancer Institute (R37 CA232209), National Institute of Diabetes and Digestive and Kidney Diseases (R01DK119232) to JX, and National Institutes of Health (NIBIB T32EB009403) to DP. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) at SDSC Dell Cluster with NVIDIA V100 GPUs NVLINK and HDR IB (Expanse GPU) through allocation BIO200085. We thank Eric Siggia, John J Tyson, and Sophia Hu for critical reading of the manuscript and constructive comments.
Version history
 Preprint posted: January 28, 2020 (view preprint)
 Received: October 20, 2021
 Accepted: February 6, 2022
 Accepted Manuscript published: February 21, 2022 (version 1)
 Version of Record published: March 14, 2022 (version 2)
Copyright
© 2022, Wang et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 2,192
 views

 372
 downloads

 16
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Physics of Living Systems
While inhomogeneous diffusivity has been identified as a ubiquitous feature of the cellular interior, its implications for particle mobility and concentration at different length scales remain largely unexplored. In this work, we use agentbased simulations of diffusion to investigate how heterogeneous diffusivity affects the movement and concentration of diffusing particles. We propose that a nonequilibrium mode of membraneless compartmentalization arising from the convergence of diffusive trajectories into lowdiffusive sinks, which we call ‘diffusive lensing,’ is relevant for living systems. Our work highlights the phenomenon of diffusive lensing as a potentially key driver of mesoscale dynamics in the cytoplasm, with possible farreaching implications for biochemical processes.

 Physics of Living Systems
Filamentous cyanobacteria are one of the oldest and today still most abundant lifeforms on earth, with manifold implications in ecology and economics. Their flexible filaments, often several hundred cells long, exhibit gliding motility in contact with solid surfaces. The underlying force generating mechanism is not yet understood. Here, we demonstrate that propulsion forces and friction coefficients are strongly coupled in the gliding motility of filamentous cyanobacteria. We directly measure their bending moduli using micropipette force sensors, and quantify propulsion and friction forces by analyzing their selfbuckling behavior, complemented with analytical theory and simulations. The results indicate that slime extrusion unlikely generates the gliding forces, but support adhesionbased hypotheses, similar to the betterstudied singlecelled myxobacteria. The critical selfbuckling lengths align well with the peaks of natural length distributions, indicating the importance of selfbuckling for the organization of their collective in natural and artificial settings.