Deciphering anomalous heterogeneous intracellular transport with neural networks

  1. Daniel Han  Is a corresponding author
  2. Nickolay Korabel
  3. Runze Chen
  4. Mark Johnston
  5. Anna Gavrilova
  6. Victoria J Allan  Is a corresponding author
  7. Sergei Fedotov  Is a corresponding author
  8. Thomas A Waigh  Is a corresponding author
  1. Department of Mathematics, University of Manchester, United Kingdom
  2. School of Biological Sciences, University of Manchester, United Kingdom
  3. Department of Physics and Astronomy, University of Manchester, United Kingdom
  4. Department of Computer Science, University of Manchester, United Kingdom
  5. The Photon Science Institute, University of Manchester, United Kingdom


Intracellular transport is predominantly heterogeneous in both time and space, exhibiting varying non-Brownian behavior. Characterization of this movement through averaging methods over an ensemble of trajectories or over the course of a single trajectory often fails to capture this heterogeneity. Here, we developed a deep learning feedforward neural network trained on fractional Brownian motion, providing a novel, accurate and efficient method for resolving heterogeneous behavior of intracellular transport in space and time. The neural network requires significantly fewer data points compared to established methods. This enables robust estimation of Hurst exponents for very short time series data, making possible direct, dynamic segmentation and analysis of experimental tracks of rapidly moving cellular structures such as endosomes and lysosomes. By using this analysis, fractional Brownian motion with a stochastic Hurst exponent was used to interpret, for the first time, anomalous intracellular dynamics, revealing unexpected differences in behavior between closely related endocytic organelles.


The majority of transport inside cells on the mesoscale (nm-100μm) is now known to exhibit non-Brownian anomalous behavior (Metzler and Klafter, 2004; Barkai et al., 2012; Waigh, 2014). This has wide ranging implications for most of the biochemical reactions inside cells and thus cellular physiology. It is vitally important to be able to quantitatively characterize the dynamics of organelles and cellular responses to different biological conditions (van Bergeijk et al., 2015; Patwardhan et al., 2017; Moutaux et al., 2018). Classification of different non-Brownian dynamic behaviors at various time scales has been crucial to the analysis of intracellular dynamics (Fedotov et al., 2018; Bressloff and Newby, 2013), protein crowding in the cell (Banks and Fradin, 2005; Weiss et al., 2004), microrheology (Waigh, 2005; Waigh, 2016), entangled actin networks (Amblard et al., 1996), and the movement of lysosomes (Ba et al., 2018) and endosomes (Flores-Rodriguez et al., 2011). Anomalous transport is currently analyzed by statistical averaging methods and this has been a barrier to understanding the nature of its heterogeneity.

Spatiotemporal analysis of intracellular dynamics is often performed by acquiring and tracking microscopy movies of fluorescing membrane-bound organelles in a cell (Rogers et al., 2007; Flores-Rodriguez et al., 2011; Chenouard et al., 2014; Zajac et al., 2013). These tracks are then commonly interpreted using statistical tools such as the mean square displacement (MSD) averaged over the ensemble of tracks, $\langle \Delta r^2(t) \rangle$. The MSD is a measure that is widely used in physics, chemistry and biology. In particular, MSDs serve to distinguish between anomalous and normal diffusion at different temporal scales by determining the anomalous exponent α through $\langle \Delta r^2(t) \rangle \sim t^{\alpha}$ (Metzler and Klafter, 2000). Diffusion is defined as α=1, sub-diffusion 0<α<1 and super-diffusion 1<α<2 (Klafter and Sokolov, 2011). To improve the statistics of MSDs, they are often averaged over different temporal scales, forming the time-averaged MSD (TAMSD), $\overline{\Delta r^2(\tau)} \sim \tau^{\alpha}$, where τ is the lag time (Sokolov, 2012).
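As an illustration of the TAMSD scaling described above, the following is a minimal sketch for a one-dimensional track sampled at equal time intervals. The function names `tamsd` and `anomalous_exponent` are hypothetical and chosen for this example only; the exponent is taken as the slope of a straight-line fit in log-log coordinates, which is one common choice rather than the specific procedure used in this work.

```python
import numpy as np

def tamsd(x, max_lag):
    """Time-averaged MSD of a 1D track x for lags 1..max_lag (in frames)."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(1, max_lag + 1)
    values = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    return lags, values

def anomalous_exponent(lags, msd):
    """Slope of log(MSD) against log(lag), i.e. an estimate of alpha."""
    slope, _intercept = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope

# Ballistic test track x(t) = t: the TAMSD equals lag**2, so alpha = 2.
lags, msd = tamsd(np.arange(200), max_lag=10)
alpha_ballistic = anomalous_exponent(lags, msd)

# Ordinary random walk: alpha is expected to be close to 1.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(20000))
lags_w, msd_w = tamsd(walk, max_lag=10)
alpha_walk = anomalous_exponent(lags_w, msd_w)
```

For fBm with constant H, the same fit would give an estimate of 2H, which is why short tracks (few lags, few points per lag) make this kind of estimate noisy.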

For stochastic processes with long-range time dependence such as fractional Brownian motion (fBm), other statistical averaging methods exist. For fBm, the MSD is $\langle B_H^2(t) \rangle \sim t^{2H}$ with the Hurst exponent, H, varying between 0 and 1. One can use rescaled and sequential range analysis (Samorodnitsky, 2016; Peters, 1994) to estimate H. The advantage of modeling intracellular transport with fBm is that both sub-diffusion (0<H<1/2) and super-diffusion (1/2<H<1) can be explained in a unified manner using only the Hurst exponent. The essence of fBm is that long-range correlations result in random trajectories that are anti-persistent (0<H<1/2) or persistent (1/2<H<1). How can we understand persistence in the context of intracellular transport? The term persistence can be understood as the processive motor-protein transport of cargo in one direction, whether it be retrograde or anterograde. From a probabilistic viewpoint, persistence can be interpreted as the cargo being more likely to keep the same direction given it had been moving in this fashion before. Conversely, anti-persistence is interpreted as cargo being more likely to change its direction given it had been moving in that direction before. Anti-persistence can arise if cargo is confined to a local volume in the cytoplasm simply due to crowding or tethering biochemical interactions (Harrison et al., 2013), which in effect leads to sub-diffusion (Weiss et al., 2004; Ernst et al., 2012). By interpreting intracellular cargo transport as fBm, there are two main advantages: we can describe movement with the intuitive biological concepts of persistence and anti-persistence; and we can provide an immediate link to anomalous diffusion since α = 2H for constant H.
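The link between the Hurst exponent and persistence can be made explicit with a short worked calculation. It assumes the standard normalization of the fBm covariance (prefactor 1/2 and unit time step), which is a convention adopted here for illustration; $X_1$ and $X_2$ denote two consecutive increments.

```latex
\begin{aligned}
\langle B_H(t) B_H(s) \rangle &= \tfrac{1}{2}\left( t^{2H} + s^{2H} - |t-s|^{2H} \right), \\
X_1 &= B_H(1) - B_H(0), \qquad X_2 = B_H(2) - B_H(1), \\
\langle X_1 X_2 \rangle &= \langle B_H(1) B_H(2) \rangle - \langle B_H^2(1) \rangle
  = \tfrac{1}{2}\left( 1 + 2^{2H} - 1 \right) - 1 = 2^{2H-1} - 1 .
\end{aligned}
```

Under this assumption the correlation between consecutive steps is positive for H > 1/2 (persistent), zero for H = 1/2 (Brownian motion) and negative for H < 1/2 (anti-persistent).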

Cargo movement in vivo often exhibits random switching between persistent and anti-persistent movement, even in a single trajectory (Chen et al., 2015). Therefore, we can model this by a stochastic local Hurst exponent, H(t), which jumps between persistent (1/2<H(t)<1) and anti-persistent (0<H(t)<1/2) states. Still, a major challenge exists: how can we estimate a local stochastic Hurst exponent from a trajectory?

Whilst exponent estimation using neural networks is an emerging field (Bondarenko et al., 2016), segmentation of single trajectories into persistent and anti-persistent sections based on instantaneous dynamic behavior has not been studied. Instead, hidden Markov models (Monnier et al., 2015; Persson et al., 2013) and windowed analyses (Getz and Saltz, 2008) are commonly used to segment local behavior along single trajectories (see Appendix A for comparisons). Even so, most methods neglect the microscopic processes which are often a feature of intracellular transport (e.g. alternation between ‘runs’ and ‘rests’) (Weiss et al., 2004; Chen et al., 2015; Fedotov et al., 2018) and the non-Markovian nature of their motion (Fuliński, 2017). fBm was chosen due to its self-similar properties that allow direct analysis at short time scales given by experimental systems; and the experimental evidence for fBm in the crowded cytoplasm (Weiss et al., 2004; Szymanski and Weiss, 2009; Krapf et al., 2019). Moreover, other anomalous diffusion models, such as scaled Brownian motion (Lim and Muniandy, 2002), subdiffusive continuous time random walks (Sokolov, 2012) and superdiffusive Lévy walks (Fedotov et al., 2018) are not suitable to interpret anomalous trajectories on the microscopic level.

Here, we present a new method for characterizing anomalous transport inside cells based on a Deep Learning Feedforward Neural Network (DLFNN) that is trained on fBm. Neural networks are becoming a general tool in a wide range of fields, such as single-cell transcriptomics (Deng et al., 2019) and protein folding (Evans et al., 2018). We find the neural network is a much more sensitive method to characterise fBm than previous statistical tools, since it is an intrinsically non-linear regression method that accounts for correlated time series. In addition, it can estimate the Hurst exponent using as few as seven consecutive time points with good accuracy.

To test the ability of the DLFNN to segment real-world biological motility, we focused on organelles in the endocytic pathway. This pathway is essential for cell homeostasis, allowing nutrient uptake, the turnover of plasma membrane components, and uptake of growth factor receptors bound to their ligands. Early endosomes then sort components destined for degradation from material that needs to be recycled back to the cell surface or to the Trans-Golgi Network (TGN) (Naslavsky and Caplan, 2018). Many aspects of endosome function are regulated by Rab5, a small GTPase that is localized to the cytosolic face of early endosomes (Stenmark and Olkkonen, 2001). Sorting nexin 1 (SNX1) also localises to early endosomes, where it works with the retromer complex to retrieve and recycle cargoes from early endosomes to the TGN (Simonetti and Cullen, 2019). SNX1 achieves this through regulating tubular membrane elements on early endosomes by associating with regions of high membrane curvature (Carlton et al., 2004). Early endosomes mature into late endosomes, which then fuse with lysosomes, delivering their contents for degradation (Huotari and Helenius, 2011). Endocytic pathway components are highly dynamic, with microtubule motors driving long-distance movement while short-range dynamics involve actin-based motility (Granger et al., 2014; Cabukusta and Neefjes, 2018), making them ideal test cases for DLFNN analysis. The new method enables the interpretation of experimental trajectories of lysosomes and endosomes as fBm with a stochastic local Hurst exponent, H(t). This in turn allows us to unambiguously and directly classify endosomes and lysosomes as being in anti-persistent or persistent states of motion at different times. From experiments, we observe that the times spent in these two states both exhibit truncated heavy-tailed distributions.

To our knowledge, this is the first method which is capable of resolving heterogeneous behavior of anomalous transport in both time and space. We anticipate that this method will be useful in characterizing a wide range of systems that exhibit anomalous heterogeneous transport. We have therefore created a GUI computer application in which the DLFNN is implemented, so that the wider community can conveniently access this analysis method.

Results and discussion

The DLFNN is more accurate than established methods

We tested a DLFNN trained on fBm with three hidden layers of densely connected nodes on $N = 10^4$ computer-generated fBm trajectories, each with $n = 10^2$ evenly spaced time points and a constant Hurst exponent $H_{\mathrm{sim}}$, randomly chosen between 0 and 1. The DLFNN estimated the Hurst exponents $H_{\mathrm{est}}$ based on the trajectories, and these were compared with those estimated from TAMSD, rescaled range, and sequential range methods (Figure 1a). The difference between the simulated and estimated values, $\Delta H = H_{\mathrm{sim}} - H_{\mathrm{est}}$, was much smaller for the DLFNN than for the other methods (Figure 1a), and the DLFNN was 3 times more accurate at estimating Hurst exponents, with a mean absolute error $\sigma_H \sim 0.05$. Also, the estimation errors of the DLFNN are more stable across values of $H_{\mathrm{sim}}$.

Tests of exponent estimation for the DLFNN using $N = 10^4$ simulated fBm trajectories.

(a) Plots showing the Hurst exponent estimates of fBm trajectories with $n = 10^2$ data points by a triangular DLFNN with three hidden layers compared with conventional methods. Plots are vertically grouped by Hurst exponent estimation method: (left to right) rescaled range, MSD, sequential range and DLFNN. $\sigma_H$ values are shown in the title. Top row: Scatter plots of estimated Hurst exponents $H_{\mathrm{est}}$ and the true value of Hurst exponents from simulation $H_{\mathrm{sim}}$. The red line shows perfect estimation. Second row: Due to the density of points, a Gaussian kernel density estimation was made of the plots in the top row (see Materials and methods). Third row: Scatter plots of the difference between the true value of Hurst exponents from simulation and estimated Hurst exponent, $\Delta H = H_{\mathrm{sim}} - H_{\mathrm{est}}$. Last row: Gaussian kernel density estimation of the plots in the third row. (b) $\sigma_H$ as a function of the number of consecutive fBm trajectory data points n for different methods of exponent estimation. Example structures for two hidden layers and n=5 time series input points of the anti-triangular, rectangular and triangular DLFNN are shown in (c, d and e), respectively. (f) $\sigma_H$ as a function of the number of hidden layers in the DLFNN for triangular, rectangular and anti-triangular structures. (g) $\sigma_H$ as a function of the number of randomly sampled fBm trajectory data points $n_{\mathrm{rand}}$ with different numbers of hidden layers in the DLFNN shown in the legend. (h) $\sigma_H$ as a function of the noise-to-signal ratio (NSR = Noise/Signal) from Gaussian random numbers added to all $n = 10^2$ data points in simulated fBm trajectories. (i) Plots of bias $b(H_{\mathrm{sim}})$, variance $\mathrm{Var}(H_{\mathrm{sim}})$ and mean square error (MSE) as functions of $H_{\mathrm{sim}}$. For each value of $H_{\mathrm{sim}}$, fBm trajectories with n=100 points were simulated and estimated by a triangular DLFNN.

Tracking of intracellular motion usually generates trajectories with a variable number of data points. We therefore compared the performance of the different exponent estimation methods when the number of evenly spaced, consecutive fBm time points in a trajectory varied over $n = 5, 6, \ldots, 10^2$ points. The DLFNN maintained an accuracy of $\sigma_H \sim 0.05$ across n, whereas $\sigma_H$ for the other methods increased as n decreased (Figure 1b) and was always substantially worse than that of the DLFNN estimation. Different DLFNN structures (see Figure 1c, d and e) performed similarly, and introducing more hidden layers did not affect the accuracy of estimation (Figure 1f and g). Given that the structure of the DLFNN does not significantly affect the accuracy of exponent estimation, a triangular densely connected DLFNN was used for all subsequent analyses.

The structure of a triangular DLFNN means that the input layer consists of n nodes, which are densely connected to n-1 nodes in the first hidden layer, such that at the $l$th hidden layer, there would be n-l densely connected nodes. Then, to estimate the Hurst exponent, these nodes are connected to a single node using a Rectified Linear Unit (ReLU) activation function, which returns the exponent estimate. A triangular DLFNN therefore uses only $\sum_{l=0}^{L}(n-l)+1$ nodes for L hidden layers and n input points, whereas the rectangular structure uses $nL+1$ nodes and the anti-triangular structure uses $\sum_{l=0}^{L}(n+l)+1$. The triangular structure results in a significant decrease in training parameters, and hence computational requirements, while maintaining good levels of accuracy. This demonstrates that a computationally inexpensive neural network can accurately estimate exponents.
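The node counts for the triangular and anti-triangular structures can be checked with a small sketch. The helper names `layer_sizes` and `dense_parameter_count` are hypothetical, and the parameter count assumes one weight per connection plus one bias per node in every layer after the input, which is the usual convention for dense layers but is an assumption here.

```python
def layer_sizes(n, n_hidden, shape="triangular"):
    """Node counts per layer: input, hidden layers 1..n_hidden, single output."""
    step = {"triangular": -1, "anti-triangular": 1}[shape]
    hidden = [n + step * l for l in range(1, n_hidden + 1)]
    if hidden and min(hidden) < 1:
        raise ValueError("too many hidden layers for this input length")
    return [n] + hidden + [1]

def dense_parameter_count(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

# Two hidden layers and n = 5 inputs, as in the example structures of Figure 1.
tri = layer_sizes(5, 2, "triangular")        # [5, 4, 3, 1]
anti = layer_sizes(5, 2, "anti-triangular")  # [5, 6, 7, 1]
```

With these sizes the triangular network has 13 nodes and 43 trainable parameters, against 19 nodes and 93 parameters for the anti-triangular one, which illustrates where the saving in training parameters comes from.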

The DLFNN’s estimation capabilities were tested further by inputting $n_{\mathrm{rand}}$ randomly sampled time points from the original fBm trajectories. Surprisingly, $\sigma_H \sim 0.05$ is regained even with just 40 out of 100 data points randomly sampled from the time series for any triangular DLFNN with more than one hidden layer (Figure 1g). For this method to work with experimental systems, it must estimate Hurst exponents even when the trajectories are noisy. Figure 1h shows how the exponent estimation error increases when Gaussian noise with varying strength compared to the original signal is added to the fBm trajectories. Importantly, the DLFNN accuracy $\sigma_H$ at 20% NSR is as good as the accuracy of other methods with no noise (compare Figure 1a and h).

To characterize the accuracy of $H_{\mathrm{sim}}$ estimation by the DLFNN, we calculated the bias, $b(H_{\mathrm{sim}}) = \mathbb{E}[H_{\mathrm{est}}] - H_{\mathrm{sim}}$; variance, $\mathrm{Var}(H_{\mathrm{sim}}) = \mathbb{E}\left[(H_{\mathrm{est}} - \mathbb{E}[H_{\mathrm{est}}])^2\right]$; and mean square error, $\mathrm{MSE} = \mathrm{Var}(H_{\mathrm{sim}}) + b(H_{\mathrm{sim}})^2$ (Figure 1i). To quantify the efficiency of the estimator, the Fisher information of the neural network’s estimation needs to be found and the Cramer-Rao lower bound calculated. The values of bias, variance and MSE were very low (Figure 1i), which, taken together with the simplicity of calculation and the accuracy of estimation even with a small number of data points, demonstrates the strength of the DLFNN method. Furthermore, once trained, the model can be saved and reloaded at any time. Saved DLFNN models, code and the DLFNN Exponent Estimator GUI are available to download (see Software and Code).
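The three quantities can be computed from repeated estimates at a fixed simulated exponent as in the sketch below. The function name `estimator_stats` and the four example estimates are made up for illustration and are not values from this study.

```python
import numpy as np

def estimator_stats(h_est, h_sim):
    """Bias, variance and mean square error of estimates h_est of a true h_sim."""
    h_est = np.asarray(h_est, dtype=float)
    bias = h_est.mean() - h_sim
    variance = np.mean((h_est - h_est.mean()) ** 2)
    mse = np.mean((h_est - h_sim) ** 2)
    return bias, variance, mse

# Hypothetical estimates for trajectories simulated with H = 0.30.
bias, variance, mse = estimator_stats([0.28, 0.31, 0.33, 0.32], h_sim=0.30)
```

Here the MSE is computed directly from the errors, so comparing it with `variance + bias**2` gives a quick check of the decomposition stated above.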

DLFNN allows analysis of simulated trajectories with local stochastic Hurst exponents

Estimating local Hurst exponents is fundamentally important because much research has focused on inferring active and passive states of transport within living cells using position-derived quantities such as windowed MSDs, directionality and velocity (Arcizet et al., 2008; Monnier et al., 2015). The trajectories are then segmented and Hurst exponents measured in an effort to characterize the behavior of different cargo when they are actively transported by motor proteins (Chen et al., 2015; Fedotov et al., 2018) or sub-diffusing in the cytoplasm (Jeon et al., 2011). However, conventional methods such as the MSD and TAMSD need trajectories with many time points ($n \sim 10^2$–$10^3$) to calculate a single Hurst exponent value with high fidelity. In contrast, the DLFNN enables the Hurst exponent to be estimated, directly from positional data, for a small number of points. Furthermore, the DLFNN measures local Hurst exponents without averaging over time points and is able to characterize particle trajectories that may exhibit multi-fractional, heterogeneous dynamics.

To provide a synthetic data set that mimics particle motion in cells, we simulated fBm trajectories with Hurst exponents that varied in time, and applied a symmetric moving window to estimate the Hurst exponent using a small number of data points before and after each time point (Figure 2). The DLFNN was able to identify segments with different exponents, and provided a good running estimation of the Hurst exponent values. The DLFNN could also handle trajectories with different diffusion coefficients, and generally performed better than MSD analysis when a sliding window was used (see Appendix B).
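The symmetric moving window can be sketched independently of the estimator that is plugged into it. In the example below, `local_hurst` is a hypothetical helper, and `two_lag_hurst` is a deliberately simple stand-in estimator (the ratio of lag-2 to lag-1 MSD); it is not the DLFNN and is used only so that the sketch runs without a trained model.

```python
import numpy as np

def two_lag_hurst(window):
    """Stand-in estimator (not the DLFNN): H from the ratio of lag-2 to lag-1 MSD."""
    w = np.asarray(window, dtype=float)
    msd1 = np.mean((w[1:] - w[:-1]) ** 2)
    msd2 = np.mean((w[2:] - w[:-2]) ** 2)
    return 0.5 * np.log2(msd2 / msd1)

def local_hurst(x, estimator, window=15):
    """Apply estimator to a symmetric window centred on each time point."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it is symmetric")
    x = np.asarray(x, dtype=float)
    half = window // 2
    h = np.full(len(x), np.nan)          # edges without a full window stay NaN
    for i in range(half, len(x) - half):
        h[i] = estimator(x[i - half:i + half + 1])
    return h

# A perfectly directed track x(t) = 0.3 t has MSD ~ t^2, i.e. H = 1 everywhere.
track = 0.3 * np.arange(50)
h_local = local_hurst(track, two_lag_hurst, window=15)
```

Passing the estimator as an argument means a trained network's prediction function could be substituted for `two_lag_hurst` without changing the windowing logic.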

DLFNN analysis of a simulated trajectory.

Top: Plot of displacement as a function of time from a simulated fBm trajectory (blue) with multiple exponent values. Bottom: Hurst exponent values used for simulation (magenta), and the DLFNN exponent predictions using a 15 point moving window (black).

DLFNN analysis reveals differences in motile behavior of organelles in the endocytic pathway

Early endosomes labeled with green fluorescent protein (GFP)-Rab5 undergo bursts of rapid cytoplasmic dynein-driven motility interspersed with periods of rest (Flores-Rodriguez et al., 2011; Zajac et al., 2013). We therefore applied the DLFNN method to experimental trajectories obtained from automated tracking (Newby et al., 2018) data of GFP-Rab5-labeled endosomes in an MRC-5 cell line that stably expressed GFP-Rab5 at low levels (Figure 3). A moving window of 15 points identified persistent (green) and anti-persistent (magenta) segments, which corresponded well to the moving window velocity plots (Figure 3, lower panel), confirming that the neural network is indeed distinguishing passive states from active transport states with non-zero average velocity. We then used it to analyze the motility of two other endocytic compartments: SNX1-positive endosomes (Allison et al., 2017; Hunt et al., 2013) and lysosomes (Cabukusta and Neefjes, 2018; Hendricks et al., 2010). It successfully segmented tracks of GFP-SNX1 endosomes (Figure 3—figure supplement 1) in a stable MRC-5 cell line (Allison et al., 2017) and lysosomes visualized using lysobrite dye (Figure 3—figure supplement 2). A total of 63–71 MRC-5 cells were analyzed, giving 40,800 (GFP-Rab5 endosome), 11,273 (GFP-SNX1 endosome) and 38,039 (lysosome) tracks that were segmented into 277,926 (GFP-Rab5), 215,087 (GFP-SNX1) and 474,473 (lysosome) persistent or anti-persistent sections, each yielding a displacement, duration and average H.

Figure 3 with 2 supplements
DLFNN analysis of a GFP-Rab5 endosome trajectory.

Top: Plot of displacement from a single trajectory in an MRC-5 cell (blue). Shaded areas show persistent (0.55 < H < 1 in green) and anti-persistent (0 < H < 0.45 in magenta) behaviour. Middle: A 15 point moving window DLFNN exponent estimate for the trajectory (black) with a dashed line marking diffusion (H = 0.5) and two dotted lines marking the confidence bounds for estimation (H = 0.45 and 0.55). Bottom: Plot of instantaneous and moving (15 point) window velocity. Right: Plot of the trajectory with start and finish positions. Persistent (green) and anti-persistent (magenta) segments are shown. Sections with 0.45 < H < 0.55 were not classified as persistent or anti-persistent and are depicted in blue.
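The thresholding used in this figure (H > 0.55 persistent, H < 0.45 anti-persistent, otherwise unclassified) can be sketched as a labelling step followed by grouping into contiguous segments. The function names and the short example series of H values are hypothetical.

```python
import numpy as np

def classify(h, low=0.45, high=0.55):
    """Label each point: +1 persistent, -1 anti-persistent, 0 unclassified."""
    h = np.asarray(h, dtype=float)
    labels = np.zeros(len(h), dtype=int)
    labels[h > high] = 1
    labels[h < low] = -1
    return labels

def segments(labels):
    """Runs of identical labels as (label, start_index, end_index_exclusive)."""
    labels = np.asarray(labels)
    if len(labels) == 0:
        return []
    breaks = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(labels)]))
    return [(int(labels[s]), int(s), int(e)) for s, e in zip(starts, ends)]

# Hypothetical local Hurst exponents along one track.
h = [0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 0.4]
runs = segments(classify(h))
```

Each run then yields a duration (number of frames times the frame interval) and a displacement between its first and last positions.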

These data revealed intriguing similarities and differences in behavior between the three endocytic components. Analysis of the duration and displacement of segments (Appendix C) revealed that all organelles spent longer in anti-persistent than persistent states (Figure 4) but moved much further when persistent (Appendix 3—figure 1), as expected. However, GFP-SNX1 endosomes spent much less time than GFP-Rab5 endosomes or lysosomes in an anti-persistent state (Figure 4). This difference in behavior was also seen when histograms of the Hurst exponents were plotted (Figure 5), as SNX1 endosomes were much less likely to exhibit anti-persistent behavior, particularly with H<0.3, than Rab5 endosomes or lysosomes. This was confirmed by fitting the histograms of the Hurst exponent with a six component Gaussian mixture model (Figure 5b–d; Appendix D). In contrast, all three organelle classes exhibited a similar range of Hurst exponents when they underwent directionally persistent motion.

Survival functions plotted with error bars for persistent and anti-persistent segments for Rab5-positive endosomes, SNX1-positive endosomes and lysosomes with the power-law fits.

Fit parameters can be found in Appendix 3—table 1.

Comparison of Hurst exponent distributions for GFP-Rab5, GFP-SNX1 and lysosomes.

(a) Histograms of Hurst exponents for GFP-Rab5 (black), GFP-SNX1 (magenta) endosomes and lysosomes (green) plotted on the same axes for comparison. The individual histograms of Hurst exponents (black solid) for GFP-Rab5-tagged endosomes, GFP-SNX1-tagged endosomes and lysosomes are shown in (b, c and d) respectively. For each histogram, the Gaussian mixture model fit for six components (red dashed) and individual Gaussian distribution components are shown on the same plot. The number of components was chosen through the Bayes information criterion shown in Appendix 4—figure 1.

To understand organelle motility in the context of cell behavior, an additional layer of complexity needs to be considered - the location of the moving structure within the cell itself. Such information would reveal zones that favor anti-persistent or persistent movement (Bálint et al., 2013). Using the neural network, trajectories of GFP-Rab5, GFP-SNX1 endosomes and lysosomes from MRC-5 cells were plotted with colors depicting the changing Hurst exponent at different points in each trajectory (Figure 6). For Rab5- and SNX1-positive endosomes, anti-persistent organelles were enriched in the cell periphery, but occasionally underwent long-range persistent movement towards the nucleus (Figure 6—video 1; Figure 6—video 2), as expected (Flores-Rodriguez et al., 2011; Zajac et al., 2013; Hunt et al., 2013; Allison et al., 2017). Lysosomes displayed completely different behavior, with most trajectories being anti-persistent, while the persistent trajectories were not obviously organized spatially (Figure 6; Figure 6—video 3). The location information together with classification of anti-persistent and persistent trajectories qualitatively shows the regions of high motor-driven activity within the cell for different endocytic organelles.

Figure 6 with 4 supplements
MRC-5 cells stably expressing GFP-Rab5, GFP-SNX1 or stained with Lysobrite with tracking data overlaid.

The colours show the value of H estimated by the neural network using a 15 point window. The scalebar is 10 µm.

Many cargos that move along microtubules can switch their direction of motility, between dynein-driven inward (retrograde) motion toward the microtubule minus ends at the cell centre and plus-end-directed outward (anterograde) movement driven by kinesin family members (Hancock, 2014). To investigate the characteristics of anterograde and retrograde motility of endocytic organelles, we adapted our method to subdivide persistent segments according to whether the movement occurred towards or away from the user-defined centrosomal region (see Materials and methods). Only tracks with displacement of >0.5µm from their start point were selected, which yielded 2369 Rab5, 2099 SNX1 and 7645 lysosome persistent segments that were then analyzed to give the duration, displacement and velocity of anterograde and retrograde excursions (Figure 7; Table 1). The anti-persistent segments contained within these tracks were also analyzed.
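One simple way to express the subdivision described above is to compare the distance to the centrosomal region at the start and end of each persistent segment. This is only a sketch under that assumption: the actual criterion is given in Materials and methods, and the function names, the point-like centre and the example coordinates are all hypothetical.

```python
import numpy as np

def direction(segment_xy, centre_xy):
    """'retrograde' if the segment ends closer to the centre than it starts."""
    seg = np.asarray(segment_xy, dtype=float)
    centre = np.asarray(centre_xy, dtype=float)
    r_start = np.linalg.norm(seg[0] - centre)
    r_end = np.linalg.norm(seg[-1] - centre)
    return "retrograde" if r_end < r_start else "anterograde"

def travelled_far_enough(track_xy, min_displacement=0.5):
    """True if the track is ever more than min_displacement (µm) from its start."""
    xy = np.asarray(track_xy, dtype=float)
    return bool(np.max(np.linalg.norm(xy - xy[0], axis=1)) > min_displacement)

# Hypothetical positions in µm, with the centrosomal region at the origin.
inward = direction([[5.0, 0.0], [4.0, 0.0], [3.0, 0.0]], centre_xy=[0.0, 0.0])
outward = direction([[1.0, 1.0], [4.0, 4.0]], centre_xy=[0.0, 0.0])
```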

Box and whisker plots of displacements, times and velocities of persistent retrograde, persistent anterograde and anti-persistent segments in experimental trajectories.

Any segment with H>0.55 was classed as persistent and H<0.45 as anti-persistent. These H values were chosen as a precaution against the mean error of the neural network estimation. Each data point within the box and whisker plots is the average of all trajectory segments in a single cell. A total of 65 MRC-5 cells for GFP-Rab5-tagged endosomes, 63 MRC-5 cells for GFP-SNX1-tagged endosomes and 71 MRC-5 cells for lysosomes were analysed, with between 5 and 500 (average 54) anterograde or retrograde segments for each cell.

Table 1
Statistics of experimental trajectory segments.

The persistent and anti-persistent segments in this table are: from trajectories that travelled over 0.5 µm at any point from their initial starting positions; contained more points than the window size; and switched behavior more than twice in the trajectory. Note that these conditions are much stricter than those to generate Figures 4 and 5. Each persistent segment was then further subdivided into retrograde and anterograde segments (see Materials and methods).

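| Quantity | Statistic | GFP-Rab5 | GFP-SNX1 | Lysosomes |
| --- | --- | --- | --- | --- |
| Number of persistent segments | | 2369 | 2099 | 7645 |
| Number of anti-persistent segments | | 6983 | 3947 | 19,320 |
| Number of retrograde segments | | 2925 | 2343 | 5882 |
| Number of anterograde segments | | 2303 | 1609 | 6827 |
| Anti-persistent displacement (µm) | Mean | 0.05 | 0.05 | 0.03 |
| | St. Dev | 0.02 | 0.01 | 0.004 |
| Anti-persistent speed (µm s⁻¹) | Mean | 0.82 | 0.75 | 0.10 |
| | St. Dev | 0.31 | 0.19 | 0.02 |
| Anti-persistent time (s) | Mean | 0.23 | 0.20 | 0.93 |
| | St. Dev | 0.05 | 0.03 | 0.11 |
| Retrograde displacement (µm) | Mean | 0.53 | 0.74 | 0.29 |
| | St. Dev | 0.19 | 0.28 | 0.08 |
| Retrograde speed (µm s⁻¹) | Mean | 2.29 | 1.35 | 1.49 |
| | St. Dev | 0.87 | 0.39 | 0.25 |
| Retrograde time (s) | Mean | 0.22 | 0.46 | 0.17 |
| | St. Dev | 0.09 | 0.09 | 0.03 |
| Anterograde displacement (µm) | Mean | 0.35 | 0.43 | 0.31 |
| | St. Dev | 0.17 | 0.20 | 0.08 |
| Anterograde speed (µm s⁻¹) | Mean | 2.06 | 1.10 | 1.51 |
| | St. Dev | 0.95 | 0.30 | 0.27 |
| Anterograde time (s) | Mean | 0.18 | 0.34 | 0.18 |
| | St. Dev | 0.08 | 0.08 | 0.03 |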

These statistics revealed that each endocytic organelle moved with different characteristics. GFP-Rab5 endosomes moved much faster than GFP-SNX1 endosomes or lysosomes, particularly in the retrograde direction (Figure 7, upper panel). Strikingly, although the GFP-SNX1 endosomes were slowest in both directions, they moved furthest and for longest in each segment, in keeping with the longer duration of persistent segments seen in the global analysis of tracks (Figure 4) and higher H values (Figure 5). The differences in behavior between Rab5 and SNX1 endosomes are intriguing, since both are recruited to the early endosome by the lipid phosphoinositol-3-phosphate (Christoforidis et al., 1999; Carlton et al., 2004; Behnia and Munro, 2005; Huotari and Helenius, 2011). However, SNX1 also senses membrane curvature (Carlton et al., 2004), and immunofluorescence labeling of MRC-5 cells with antibodies to Rab5 and SNX1 demonstrated that they reside on distinct domains of larger early endosomes (Figure 6—figure supplement 1), as expected (van Weering et al., 2012). In addition, while SNX1 endosomes were usually Rab5-positive, there was a significant population of Rab5 endosomes that lacked SNX1, especially smaller early endosomes that were often located in the cell periphery. It is likely that this population of Rab5-positive, SNX1-negative endosomes is particularly motile. The high retrograde velocity of these endosomes might be explained by the recruitment of dynein to Rab5 endosomes via Hook family members (Bielska et al., 2014; Zhang et al., 2014; Schroeder and Vale, 2016; Guo et al., 2016). These dynein adaptors have the intriguing property of recruiting two dyneins per dynactin (Urnavicius et al., 2018; Grotjahn et al., 2018), leading to faster rates of movement in motility assays using purified protein than adaptors that only recruit one dynein per dynactin. Perhaps SNX1 endosomes move more slowly than Rab5 endosomes because they use a ‘single-dynein’ adaptor. An alternative explanation could be that SNX1 endosomes are slowed down by interactions with the actin cytoskeleton, since SNX1 domains are enriched in the WASH complex, which in turn controls localized actin assembly (Gomez and Billadeau, 2009; Simonetti and Cullen, 2019). Actin might also contribute to the slow, steady motion of SNX1 endosomes via myosin motors or the formation of actin comets (Simonetti and Cullen, 2019). These interesting possibilities remain to be tested experimentally.

The analysis of anterograde and retrograde segments revealed that lysosomes moved at moderate speed, and were equally fast in both directions, but each burst of movement was short (Figure 7, upper panels). In addition, pauses were 4 times longer for lysosomes than for either early endosome type (Figure 7, lower panels). Lysosomes also often changed direction of movement (e.g. Figure 3—figure supplement 2), as previously reported (Hendricks et al., 2010). So far, no activating dynein adaptor has been identified on lysosomes (Reck-Peterson et al., 2018), although several potential dynein interactors have been identified, such as RILP (Rab7-interacting lysosomal protein; Cabukusta and Neefjes, 2018). Whether this underlies the difference in motile behavior between lysosomes and early endosomes remains to be tested; however, a less active dynein could well contribute to frequent reversals in direction (Hancock, 2014).

fBm with a stochastic Hurst exponent is a new possible intracellular transport model

fBm is a Gaussian process $B_H(t)$ with zero mean and covariance $\langle B_H(t) B_H(s) \rangle \sim t^{2H} + s^{2H} - (t-s)^{2H}$, where the Hurst exponent, H, is a constant between 0 and 1. With the DLFNN providing local estimates of the Hurst exponent, the motion of endosomes and lysosomes can be described as fBm with a stochastic Hurst exponent, H(t). This is different to multifractional Brownian motion (Peltier and Lévy Véhel, 1995) where H(t) is a function of time. In our case, H(t) is itself a stochastic process and such a process has been considered theoretically (Ayache and Taqqu, 2005). This is the first application of such a theory to intracellular transport and opens a new method for characterizing vesicular movement. Furthermore, Figure 3 shows that the motion of a vesicle, $B_H(t)$, exhibits regime switching behavior between persistent and anti-persistent states.
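Because fBm is fully specified by its covariance, a path with constant H can be drawn by factorising the covariance matrix. The sketch below uses the Cholesky method with the standard prefactor 1/2 and the absolute value |t-s|; both the normalization and the choice of the Cholesky method are assumptions made for illustration, not a statement of how the training trajectories in this work were generated. The function name `simulate_fbm` is hypothetical.

```python
import numpy as np

def simulate_fbm(n, hurst, dt=1.0, rng=None):
    """One fBm path at times dt, 2*dt, ..., n*dt by Cholesky factorisation."""
    rng = np.random.default_rng() if rng is None else rng
    t = dt * np.arange(1, n + 1)
    # Covariance matrix C[i, j] = 0.5 * (t_i^2H + t_j^2H - |t_i - t_j|^2H).
    cov = 0.5 * (t[:, None] ** (2 * hurst) + t[None, :] ** (2 * hurst)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * hurst))
    # If cov = L L^T and z is standard normal, then L z has covariance cov.
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

# Check the MSD scaling <B_H^2(t)> = t^(2H) at the final time point.
rng = np.random.default_rng(1)
paths = np.array([simulate_fbm(20, 0.8, rng=rng) for _ in range(4000)])
end_variance = paths[:, -1].var()   # expected to be near 20 ** (2 * 0.8)
```

The factorisation costs O(n³) operations and becomes ill-conditioned as H approaches 1, so this approach suits the short trajectories considered here better than very long ones.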

We found that the times that lysosomes and endosomes spend in a persistent and anti-persistent state are heavy-tailed (Figure 4). These times are characterized by the probability densities $\psi(t) \sim t^{-\mu-1}$, where anti-persistent states have 0 < µ < 1 and persistent states have 1 < µ < 2. Extensive plots and fittings are shown in Figure 4 and Appendix C. In fact, the residence time in an anti-persistent state (0<H(t)<1/2) has an infinite mean, whereas in persistent states (1/2<H(t)<1) the mean residence time is finite and the second moment is infinite. This implies that the vesicles may have a biological mechanism to prioritize certain interactions within the complex cytoplasm, similar to ecological searching patterns (Reynolds and Rhodes, 2009), mRNPs (Song et al., 2018), swarming bacteria (Ariel et al., 2015) and how human dynamics are often heavy tailed and bursty (Barabási, 2005).
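The statements about the moments follow directly from the tail of the density. As a short worked calculation for an untruncated power-law tail beyond some lower cutoff $t_0 > 0$ (the cutoff is an assumption introduced here so that the integrals are well defined at small times):

```latex
\begin{aligned}
\Psi(t) &= \int_t^{\infty} \psi(t')\,dt' \sim t^{-\mu}, \\
\langle t \rangle &\sim \int_{t_0}^{\infty} t \cdot t^{-\mu-1}\,dt = \int_{t_0}^{\infty} t^{-\mu}\,dt
  \quad \text{finite only if } \mu > 1, \\
\langle t^2 \rangle &\sim \int_{t_0}^{\infty} t^{2} \cdot t^{-\mu-1}\,dt = \int_{t_0}^{\infty} t^{1-\mu}\,dt
  \quad \text{finite only if } \mu > 2 .
\end{aligned}
```

Hence 0 < µ < 1 gives a divergent mean, while 1 < µ < 2 gives a finite mean and a divergent second moment. For the truncated distributions observed experimentally, all moments are finite in the strict sense, and these divergences describe the behavior below the truncation scale.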


We developed a Deep Learning Feedforward Neural Network trained on fBm that estimates accurately the Hurst exponent for heterogeneous trajectories. Estimating the Hurst exponent using a DLFNN is not only more accurate than conventional methods but also enables direct trajectory segmentation without a drastic increase in computational cost. We package this DLFNN analysis code into a user-friendly application, which can predict the Hurst exponent with consistent accuracy for as few as seven consecutive data points. This is useful to biologists since major limitations to trajectory analysis are: the brevity of tracks due to the fact that particles may rapidly switch between motile states or move out of the plane of focus; the rapid nature of some biochemical reactions; and the bleaching of fluorescent probes (with non-bleaching probes often being bulky or cytotoxic). This method can be used to detect persistent and anti-persistent states of motion purely from the positional data of trajectories and removes the prerequisite of time or ensemble averaging for effective heterogeneous transport characterization.

The DLFNN enabled us to discover regime switching in lysosome and endosome movement that can be modeled by fBm with a stochastic Hurst exponent. This interpretation is a unified approach to describe motion with anti-persistence and persistence varying over time. Furthermore, the residence time of vesicles in a persistent or anti-persistent state is found to be heavy tailed, which implies that endosomes and lysosomes possess biological mechanisms to prioritize varying biological processes similar to ecological searching patterns (Reynolds and Rhodes, 2009), mRNPs (Song et al., 2018), swarming bacteria (Ariel et al., 2015) and even human dynamics (Barabási, 2005). Importantly, applying this method to identify and analyze the anterograde and retrograde motility reveals unexpected differences in behavior between closely related organelles. Finally, in addition to providing a new segmentation method of active and passive transport, this new technique distinguishes the difference in motility between lysosomes, Rab5-positive endosomes and SNX1-positive endosomes. The results suggest that the manner in which these vesicles move is dependent on their identity within the endocytic pathway, especially when the motion is anti-persistent. This implies that directionality and the correlation between consecutive steps are important to measure in addition to the displacement, velocity and duration of movement. There is considerable scope for using these methods to identify changes in motility of different organelles caused by disease. We hope that this type of analysis will allow discoveries in particle motility of a more refined nature and make applying anomalous transport theory more accessible to researchers in a wide variety of disciplines.

Materials and methods

Key resources table
Reagent type or resource | Designation | Source | Identifiers | Additional information
Cell line (Homo sapiens) | Lung fibroblast line | Allison et al., 2017 | | cell line stably expressing GFP-SNX1. Mycoplasma free.
Cell line (H. sapiens) | Lung fibroblast line | Other | GFP-Rab5-MRC5 | MRC5 cell line stably expressing GFP-Rab5 generated by retroviral transduction by G. Pearson and E. Reid, University of Cambridge. Mycoplasma free.
Cell line (H. sapiens) | MRC-5 SV1 TG1 Lung fibroblast line | ECACC | MRC-5 SV1 TG1 cells, cat no. 85042501 | Mycoplasma free.
Antibody | Anti-human Rab5A Rabbit monoclonal | Cell Signalling Technology | 3547S | IF (1/200)
Antibody | Anti-human sorting nexin 1 (mouse monoclonal) | BD Biosciences | 611482 | IF (1/200)
Antibody | Alexa594-conjugated anti-mouse IgG (donkey polyclonal) | Jackson ImmunoResearch | 715-585-150 | IF (1/400)
Antibody | A488-conjugated donkey anti-rabbit IgG | Jackson ImmunoResearch | 711-545-152 | IF (1/400)
Recombinant DNA reagent | pLXIN-GFP-Rab5C-I-NeoR | Other | | Used by G. Pearson and E. Reid, University of Cambridge to generate retrovirus containing GFP-Rab5C
Sequence-based reagent | Hpa1 GFP Forward | Other | PCR primer | Used by G. Pearson and E. Reid, University of Cambridge to generate retrovirus containing GFP-Rab5C. TAGGGAGTTAACATGGTGAGCAAGGGCGAGGA
Sequence-based reagent | Not1 Rab5C Reverse | Other | PCR primer | Used by G. Pearson and E. Reid, University of Cambridge to generate retrovirus containing GFP-Rab5C. ATCCCTGCGGCCGCTCAGTTGCTGCAGCACTGGC
Chemical compound, drug | DAPI | Biolegend | 422801 | IF (1 µg/mL)
Chemical compound, drug | Prolong Gold | ThermoFisher | P36930 |
Chemical compound, drug | Lysobrite Red | AAT Bioquest | 22645 | (1/2500)
Chemical compound, drug | Geneticin (G418) | Sigma-Aldrich | G1397 | 200 µg/mL to maintain GFP-Rab5-MRC5 and GFP-SNX1-MRC5 cells in culture.
Chemical compound, drug | Formaldehyde solution, 37% (wt/v) | Sigma-Aldrich | 252549 |
Chemical compound, drug | Triton X-100 | Anatrace | T1001 |
Software, algorithm | NNT | Newby et al., 2018 | AITracker | Web-based automated tracking service
Software, algorithm | Metamorph | Molecular Devices LLC | Metamorph | Metamorph Microscopy Automation and Image Analysis Software
Software, algorithm | FIJI | Schindelin, J.; Arganda-Carreras, I. and Frise, E. et al. (2012), ‘Fiji: an open-source platform for biological-image analysis’, Nature Methods 9 (7): 676–682, PMID 22743772, doi:10.1038/nmeth.2019 | FIJI/ImageJ |
Software, algorithm | DLFNN Exponent Estimator | Han, Daniel. (2020, January 20). DLFNN Exponent Estimator (Version 0). | Exponent Estimator | Hurst exponent estimator with Deep Learning Feed-forward Neural Network application for Windows 10. Documentation included.
Software, algorithm | Python3 | Python Software Foundation. Python Language Reference 3.7. Available at www.python.org | Python/Python3 |
Software, algorithm | SciPy | Virtanen et al. (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, in press. | SciPy/scipy |
Software, algorithm | Tensorflow | Abadi et al., 2016 | Tensorflow |
Software, algorithm | Keras | Chollet, François and others. ‘Keras.’ (2015). Available from https://keras.io | Keras |
Software, algorithm | fbm | Flynn, Christopher, fbm 0.3.0 available for download at or package in Python | | Exact methods for simulating fractional Brownian motion (fBm) or fractional Gaussian noise (fGn) in python. Approximate simulation of multifractional Brownian motion (mBm) or multifractional Gaussian noise (mGn).
Other | 35 mm glass-bottomed dishes (µ-Dish) | Ibidi | Cat. No. 81150 |

Hurst exponent estimation methods

Time averaged MSDs were calculated using

(1) $\langle x^2(n\delta t)\rangle=\frac{1}{N-n}\sum_{m=0}^{N-n}\left[x((m+n)\delta t)-x(m\delta t)\right]^2$

where $x(n\delta t)$ is the track displacement at time $n\delta t$ and a track contains $N$ coordinates spaced at regular time intervals of $\delta t$. From now on, $\langle x\rangle$ will denote the time average of $x$ unless explicitly specified otherwise. The total time is $T=(N-1)\delta t$ and $n=1,2,\ldots,N-1$. Lag-times are the set of possible $n\delta t$ within the data set and $\langle x^2(n\delta t)\rangle$ was then fit to a power law $\sim t^{2H}$ using the ‘scipy.optimize’ package in Python3 to estimate the exponent $H$.
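The MSD-based estimate described above can be sketched as follows. This is a minimal example rather than the authors' code: the function names are hypothetical, the average is taken over all available pairs of points at each lag, and `curve_fit` is one reasonable choice among the fitting routines in ‘scipy.optimize’.

```python
import numpy as np
from scipy.optimize import curve_fit

def time_averaged_msd(x, max_lag=None):
    """Time-averaged MSD of a 1D track for lags n = 1..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    max_lag = len(x) - 1 if max_lag is None else max_lag
    lags = np.arange(1, max_lag + 1)
    # For each lag n, average the squared displacement over the N - n available pairs.
    msd = np.array([np.mean((x[n:] - x[:-n]) ** 2) for n in lags])
    return lags, msd

def fit_hurst_from_msd(lags, msd, dt=1.0):
    """Fit msd ~ amplitude * t^(2H) and return the estimate of H."""
    def power_law(t, amplitude, hurst):
        return amplitude * t ** (2 * hurst)
    params, _ = curve_fit(power_law, lags * dt, msd, p0=(msd[0], 0.5))
    return params[1]

# Example: a purely ballistic track has MSD = n^2.
lags, msd = time_averaged_msd(np.arange(100.0), max_lag=20)
```

Long lags are averaged over few pairs and are therefore noisy, so in practice the fit is usually restricted to a fraction of the available lags via `max_lag`.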

Rescaled ranges were calculated by creating a mean-adjusted cumulative deviate series $z(n\delta t)=\sum_{m=0}^{n}\left[x(m\delta t)-\langle x\rangle\right]$ from the original displacements $x(n\delta t)$ and mean displacement $\langle x\rangle$. Then the rescaled range is calculated by

(2) $[R/S](n\delta t)=\frac{\max(\{z\}_n)-\min(\{z\}_n)}{\sqrt{\frac{1}{n\delta t}\sum_{m=0}^{n}\left(x(m\delta t)-\langle x(n\delta t)\rangle\right)^2}}$

where $\{z\}_n=z(0),z(\delta t),z(2\delta t),\ldots,z(n\delta t)$. The rescaled range is then fitted to a power law $[R/S](n\delta t)\sim(n\delta t)^H$, where $H$ is the Hurst exponent (Hurst, 1951). The ‘compute_Hc’ function in the ‘hurst’ package in Python3 estimates the Hurst exponent in this way.
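A minimal sketch of the rescaled range of a single window is given below. The function name is hypothetical, and the denominator is taken as the plain (population) standard deviation of the window, which is an assumption about normalization; the Hurst exponent would then be obtained by repeating this over windows of increasing length and fitting a power law.

```python
import numpy as np

def rescaled_range(x):
    """Rescaled range R/S of one window of displacements (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()             # mean-adjusted series
    z = np.cumsum(deviations)             # cumulative deviate series
    spread = z.max() - z.min()            # range R
    scale = np.sqrt(np.mean(deviations ** 2))  # standard deviation S (assumed normalisation)
    return spread / scale

example_rs = rescaled_range([1.0, 2.0, 3.0, 4.0])
```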

Sequential ranges are defined as

(3) $M(n\delta t)=\sup_{0\le s\le n\delta t}\left(x(s)-x(0)\right)-\inf_{0\le s\le n\delta t}\left(x(s)-x(0)\right)$

where $\sup(x)$ is the supremum and $\inf(x)$ is the infimum for the set $x$ of real numbers. Then $M(n\delta t)=(n\delta t)^{H}M(\delta t)$ (Feller, 1951).

DLFNN structure and training

The fractional Brownian trajectories were generated using the Hosking method within the ‘FBM’ function available from the ‘fbm’ package in Python3. The DLFNN was built using Tensorflow (Abadi et al., 2016) and Keras (Chollet, 2015) in Python3 and trained using the simulated fractional Brownian trajectories. The training and testing of the neural network were performed on a workstation PC equipped with 2 CPUs with 32 cores (Intel(R) Xeon CPU E5-2640 v3) and 1 GPU (NVIDIA Tesla V100 with 16 GB memory). The structure of the neural network was a multilayer, feedforward neural network where all nodes of the previous layer were densely connected to nodes of the next layer. Each node had a ReLU activation function and the parameters were optimized using the RMSprop optimizer (see Keras documentation; Chollet, 2015). Three separate structures were explored and examples of these structures for two hidden layers and five time point inputs are shown in Figure 1g, h and i. The triangular structure was predominantly used since it was the least computationally expensive and the accuracy of the different structures was similar. To compare the accuracy of different methods, the mean absolute error ($\sigma_H$) of $N$ trajectories, $\sigma_H=\sum_{n=1}^{N}|H_n^{\mathrm{sim}}-H_n^{\mathrm{est}}|/N$, was used. Before inputting values into the neural network, the time series was differenced to make it stationary. The input values of an fBm trajectory $\{x\}=x_0,x_1,\ldots,x_n$ were differenced and normalized so that $\{x_{\mathrm{input}}\}=(x_1-x_0)/\mathrm{range}(x),(x_2-x_1)/\mathrm{range}(x),\ldots,(x_n-x_{n-1})/\mathrm{range}(x)$. Since the model requires differenced and normalized input values, in theory it should be applicable to a wide range of datasets. However, further testing is needed to confirm this expectation.
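The input preparation and the error measure described above can be sketched with NumPy alone. The function names are hypothetical, and range(x) is taken here to be the maximum minus the minimum of the trajectory window, which is an assumption about the intended definition.

```python
import numpy as np

def prepare_input(x):
    """Difference a trajectory window and normalise by its range (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    increments = np.diff(x)                # x_1 - x_0, ..., x_n - x_{n-1}
    return increments / (x.max() - x.min())  # assumed definition of range(x)

def mean_absolute_error(h_sim, h_est):
    """Mean absolute difference between simulated and estimated Hurst exponents."""
    h_sim = np.asarray(h_sim, dtype=float)
    h_est = np.asarray(h_est, dtype=float)
    return np.mean(np.abs(h_sim - h_est))

example_input = prepare_input([0.0, 1.0, 3.0, 2.0])
example_error = mean_absolute_error([0.5, 0.7], [0.4, 0.9])
```

Because the increments are divided by the range of the window, rescaling a trajectory by any constant factor leaves the network input unchanged, which is the property tested against different diffusion coefficients in Appendix 2.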

Gaussian kernel density estimation

Kernel density estimation (KDE) is a non-parametric method to estimate the probability density function (PDF) of random variables. If $N$ random variables $x_n$ are distributed by an unknown density function $P(x)$, then the kernel density estimate $\hat{P}(x)$ is

(4) $\hat{P}(x)=\frac{1}{N}\sum_{n=1}^{N}K\left(\frac{x-x_n}{l}\right)$

where $K(\cdot)$ is the kernel function and $l$ is the bandwidth. In this paper, we have used a Gaussian KDE, $K(y)=\frac{1}{\sqrt{2\pi}}e^{-y^2/2}$, to estimate the two-dimensional PDFs of the second and bottom row in Figure 1a. This was performed in Python3 using ‘scipy.stats.gaussian_kde’ and Scott’s rule of thumb for bandwidth selection.
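For orientation, a two-dimensional Gaussian KDE with Scott's rule can be obtained from ‘scipy.stats.gaussian_kde’ as sketched below. The sample data are synthetic standard normal draws used purely as a stand-in; they are not the data behind Figure 1a.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# gaussian_kde expects data with shape (n_dimensions, n_points).
samples = rng.standard_normal((2, 500))

# Scott's rule is the default bandwidth; it is passed explicitly here for clarity.
kde = gaussian_kde(samples, bw_method="scott")

# Evaluate the estimated density at the origin (one query point, two coordinates).
density_at_origin = kde([[0.0], [0.0]])[0]
```

For these synthetic samples the true density at the origin is 1/(2π), so the estimate can be compared against a known value when experimenting with the bandwidth.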

Segmenting trajectories into persistent and anti-persistent segments

From the estimates of the Hurst exponent from the DLFNN, trajectories were segmented into persistent and anti-persistent segments. Given an experimental trajectory $x=x_0,x_1,\ldots,x_n$ and a window of length $N_w$ (an odd number) starting at $x_i$, we obtain the $H$ estimate for the position $x_j$, where $j=i+(N_w-1)/2$. This gives a series of $H_t$ values, $H_{(N_w-1)/2},H_{(N_w-1)/2+1},\ldots,H_{n-(N_w-1)/2}$, which correspond to the positions $x_{(N_w-1)/2},x_{(N_w-1)/2+1},\ldots,x_{n-(N_w-1)/2}$. Then, the values $H_t$ can be segmented into consecutive points of persistence, $H_t>0.55$, and anti-persistence, $H_t<0.45$. The bounding values, 0.55 and 0.45, were used since the mean error of the DLFNN estimation method was $\sigma_H\approx0.05$. Any segment shorter than $N_w$ was discarded as a precaution against spurious detection.
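The thresholding step can be sketched as below, assuming the local Hurst estimates have already been computed for each window position. The function name, the tuple layout of the output and the use of half-open index ranges are choices made for this example only.

```python
import numpy as np

def segment_states(h_values, window, upper=0.55, lower=0.45):
    """Group consecutive local Hurst estimates into persistent / anti-persistent runs.

    Returns (start, stop, label) tuples with half-open index ranges (hypothetical helper).
    """
    h_values = np.asarray(h_values, dtype=float)
    # +1: persistent, -1: anti-persistent, 0: within the error band around H = 0.5.
    labels = np.where(h_values > upper, 1, np.where(h_values < lower, -1, 0))
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        run_ended = i == len(labels) or labels[i] != labels[start]
        if run_ended:
            # Keep only classified runs that are at least one window long.
            if labels[start] != 0 and i - start >= window:
                name = "persistent" if labels[start] == 1 else "anti-persistent"
                segments.append((start, i, name))
            start = i
    return segments

h_series = [0.7] * 5 + [0.5] * 2 + [0.3] * 4 + [0.8] * 2
example_segments = segment_states(h_series, window=3)
```

In this example the final run of two persistent points is shorter than the window and is therefore discarded.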

Directional segmentation of persistent segments

Once segments of persistence and anti-persistence were defined, we measured the displacement, time and velocity of these segments, shown in the bottom row of Figure 7 and Table 1. The persistent segments were filtered to be only from trajectories that travelled over 0.5 µm; contained more points than the window size; and switched behaviour more than twice in the trajectory. In addition, we assessed if persistent segments were anterograde or retrograde in direction. In order to do this, the centrosomal region was defined by the user as the point where the lysosomes, Rab5 and SNX1 organelles were the largest, brightest, or the most clustered. Image contrast enhancements, such as histogram equalization, were used to locate the centrosomes. By locating the centrosomal region and the cell boundary from user input, the persistent segments can then be classified as anterograde or retrograde. This was done by finding the cosine of the angle, $\cos(\theta)$, between the vector, $\mathbf{r}_{0,i}$, from the centrosome to the current particle position $x_i$ and the vector, $\mathbf{r}_{i,i+1}$, from the current particle position to the next particle position $x_{i+1}$. The exact formula is $\cos(\theta)=\mathbf{r}_{0,i}\cdot\mathbf{r}_{i,i+1}/\left(|\mathbf{r}_{0,i}||\mathbf{r}_{i,i+1}|\right)$. Using windows in a similar fashion as determining persistent and anti-persistent segments, $\cos(\theta_i)$ corresponding to position $x_i$ was found for the points within a persistent segment. If $\cos(\theta_i)>\sigma_{\cos(\theta)}$, then the motion was deemed to be anterograde and if $\cos(\theta_i)<-\sigma_{\cos(\theta)}$, retrograde. Sweeping through the points $x_i$, consecutive retrograde or anterograde points formed segments from the persistent segments. A threshold of $\sigma_{\cos(\theta)}=0.3$ was used.
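The directional test can be sketched as follows. This simplified example classifies each step individually and omits the windowing described above; the function names and the "undetermined" label for steps inside the threshold band are assumptions for illustration, and steps of zero length would need separate handling.

```python
import numpy as np

def direction_cosines(positions, centrosome):
    """cos(theta) between the centrosome-to-particle vector and each step (hypothetical helper)."""
    positions = np.asarray(positions, dtype=float)
    centrosome = np.asarray(centrosome, dtype=float)
    radial = positions[:-1] - centrosome       # vector from centrosome to x_i
    steps = positions[1:] - positions[:-1]     # vector from x_i to x_{i+1}
    norms = np.linalg.norm(radial, axis=1) * np.linalg.norm(steps, axis=1)
    return np.einsum("ij,ij->i", radial, steps) / norms

def classify_direction(cos_theta, threshold=0.3):
    """Label steps as anterograde, retrograde or undetermined."""
    cos_theta = np.asarray(cos_theta, dtype=float)
    return np.where(cos_theta > threshold, "anterograde",
                    np.where(cos_theta < -threshold, "retrograde", "undetermined"))

# Example: one step away from the centrosome, then one step back towards it.
track = [[1.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
cosines = direction_cosines(track, centrosome=[0.0, 0.0])
directions = classify_direction(cosines)
```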

Cell lines

The MRC-5 SV1 TG1 lung fibroblast cell line was purchased from ECACC. MRC-5 cell lines stably expressing GFP-Rab5C and GFP-SNX1 were kindly provided by Drs. Guy Pearson and Evan Reid (Cambridge Institute for Medical Research, University of Cambridge). The GFP-SNX1 cell line has been previously described in Allison et al. (2017). Cell lines were routinely tested for mycoplasma infection. To generate the MRC-5 GFP-Rab5C stable cell line, GFP-Rab5C was amplified by PCR from pIRES GFP-Rab5C (Seaman, 2004) using ‘Hpa1 GFP Forward’ (TAGGGAGTTAACATGGTGAGCAAGGGCGAGGA) and ‘Not1 Rab5C Reverse’ (ATCCCTGCGGCCGCTCAGTTGCTGCAGCACTGGC) oligonucleotide primers. The GFP-Rab5C PCR product and a pLXIN-I-NeoR plasmid were digested using Hpa1 (New England Biolabs - R0105) and Not1 (New England Biolabs - R3189) restriction enzymes. The GFP-Rab5C PCR product was then ligated into the digested pLXIN-I-NeoR using T4 DNA Ligase (New England Biolabs - M0202). The ligated plasmid was amplified in bacteria selected with ampicillin and verified using Sanger sequencing. To generate the GFP-Rab5C MRC-5 cell line, Phoenix retrovirus producer HEK293T cells were transfected with the pLXIN-GFP-Rab5C-I-NeoR plasmid to generate retrovirus containing GFP-Rab5C. MRC-5 cells were inoculated with the virus, and successfully transduced cells were selected using 200 µg/mL Geneticin (G418 - Sigma-Aldrich G1397). Cells used for imaging were not clonally selected.

Live-imaging and tracking

Stably expressing MRC-5 cells were co-stained with LysoBrite Red (AAT Bioquest), imaged live using fluorescence microscopy and tracked with NNT (Newby et al., 2018). The cells were grown in MEM (Sigma Life Science) and 10% FBS (HyClone) and incubated for 48 hr at 37 °C in 5% CO2 on 35 mm glass-bottomed dishes (µ-Dish, Ibidi, Cat. No. 81150). For LysoBrite staining, LysoBrite was diluted 1 in 500 with Hank’s Balanced Salt solution (Sigma Life Science). Then 0.5 mL of this solution was added to cells on a 35 mm dish containing 2 mL of growing media and incubated at 37 °C for at least 1 hr. Cells were then washed with sterile PBS and the media replaced with growing media.

After at least 6 hr incubation, the growing media was replaced with live-imaging media composed of Hank’s Balanced Salt solution (Sigma Life Science, Cat. No. H8264) with added essential and non-essential amino acids, glutamine, penicillin/streptomycin, 25 mM HEPES (pH 7.0) and 10% FBS (HyClone). Live-cell imaging was performed on an inverted Olympus IX71 microscope with an Olympus 100×/1.35 oil PH3 objective. Samples were illuminated using an OptoLED (Cairn Research) light source with 470 nm and white LEDs. For GFP, a 470 nm LED and Chroma ET470/40 excitation filter were used in combination with a Semrock FITC-3540C filter set. For Lysobrite-Red, a white light LED and Chroma ET573/35 were used with a dualband GFP/mCherry dichroic and an mCherry emission filter (ET632/60). GFP-Rab5-labeled endosomes were imaged in a total of 65 cells, from three independent experiments. GFP-SNX1-labeled endosomes were imaged in a total of 63 cells from four independent experiments. Lysosomes were imaged in separate experiments, with 71 cells imaged from three independent repeats. A stream of 20 ms exposures was collected with a Prime 95B sCMOS Camera (Photometrics) for 17 s using Metamorph software while the cells were kept at 37 °C (in atmospheric CO2 levels). The endosomes and lysosomes in the videos were then tracked using automated tracking software (AITracker; Newby et al., 2018).

Confocal imaging

To compare the localization of SNX1 and Rab5, GFP-Rab5-MRC-5 cells were grown on #1.5 coverslips and then fixed in 3% (w/v) formaldehyde in PBS for 20 min at room temperature (RT). Coverslips were washed twice in PBS, quenched in PBS with glycine, then permeabilized by incubation for 5 min in 0.1% Triton X-100. After another wash in PBS, coverslips were labeled with antibodies to SNX1 and Rab5 for 1 h at RT, washed three times in PBS, then labeled with Alexa488-donkey anti-rabbit and Alexa594-donkey anti-mouse antibodies in 1 µg/mL DAPI in PBS for 30 min. After three PBS washes, coverslips were dipped in deionized water, air-dried and mounted on slides using Prolong Gold.

Images were collected on a Leica TCS SP8 AOBS inverted confocal using a 100x/1.40 NA PL apo objective. The confocal settings were as follows: pinhole, one airy unit; scan speed 400 Hz unidirectional; format 2048 × 2048. Images were collected using hybrid detectors (A488 and A594) or a PMT (DAPI) with these detection mirror settings; [Alexa488, 498 nm-577 nm; Alexa594, 602 nm-667 nm; DAPI, 420 nm-466 nm] using the SuperK Extreme supercontinuum white light laser for 488 nm (10.5%) and 594 nm (5%) excitation, and a 405 nm laser (5%) for DAPI. Images were collected sequentially to eliminate cross-talk between channels. When acquiring 3D optical stacks the confocal software was used to determine the optimal number of Z sections. The data were deconvolved using Huygens software before generating maximum intensity projections of 3D stacks using FIJI.

Software and code

The code and documentation for determining the Hurst exponent can be found in (copy archived at, 2019) and a GUI is available on

Appendix 1

Comparison of HMM model against fBm model segmentation of experimental trajectories

In order to compare the effectiveness of the neural network and hidden Markov models (HMM), qualitative plots were made of real trajectories and their respective comparisons. The models in the HMM analysis approach had a maximum of three different motion states. It is clear from comparing the endosome track segmented using DLFNN (Figure 3) with Appendix 1—figures 1–3, that segmentation using hidden Markov models is not suitable for endosome trajectories. Perhaps, by increasing the number of states within models, the hidden Markov models can achieve similar results to the neural network, but this analysis becomes computationally expensive.

Appendix 1—figure 1
The same single trajectory as in Figure 3 but processed using the HMM-Bayes package described in Monnier et al. (2015).

The plot shows: the original trajectory (top left); the inferred state sequence of the most likely model (top right); the model probabilities given a maximum of three possible states (bottom left); the average lifetime of each state and estimated parameters of the most likely model (bottom center left), which in this case is two different diffusive states; individual increment displacements (bottom center right); and the step size distribution of those increments classed into the two different states.

Appendix 1—figure 2
Top: Plot of displacement from a single GFP-SNX1 endosome trajectory in an MRC-5 cell (blue).

Shaded areas show persistent (0.55<H<1 in green) and anti-persistent (0<H<0.45 in magenta) behaviour. Middle: A 15 point moving window DLFNN exponent estimate for the trajectory (black) with a line (dashed) marking diffusion H=0.5 and lines (dotted) marking the confidence bounds H=0.55 and 0.45. Bottom: Plot of instantaneous and moving (15 point) window velocity. Right: Plot of the trajectory of a GFP-SNX1 endosome in an MRC-5 cell with start and finish positions, and persistent (green) and anti-persistent (magenta) segments indicated.

Appendix 1—figure 3
The same HMM-Bayes analysis as shown in Appendix 1—figure 1 applied to the trajectory in Appendix 1—figure 2.

Appendix 2

Testing DLFNN accuracy for different diffusion coefficients

The DLFNN was compared to the MSD estimation method for simulated trajectories with different diffusion coefficients to ensure that the DLFNN estimation was not scale dependent.

Appendix 2—figure 1
Top: MSD (points) and power-law fits (dashed) for two different Brownian trajectories containing 1000 data points with diffusion coefficient 2.5 (red) and 0.025 (blue).

The Hurst exponent should be α=2H=1 for both trajectories. Second Row: Simulation of the two Brownian trajectories with diffusion co-efficient 2.5 (red) and 0.025 (blue). Third Row: Local Hurst exponent estimates given by the DLFNN for the two different trajectories using a 90 point window. The averages of DLFNN Hurst exponent estimates are α=2H=1.110 (red) and α=2H=1.114 (blue). Bottom: Local Hurst exponent estimates of the D=2.5 track given by DLFNN and MSDs using a 90 point window. The average of DLFNN Hurst exponent estimates is α=2H=1.110 and the average of MSD estimates is α=2H=0.937.

Appendix 3

Measuring the residence time and flight length probability density functions of persistent and anti-persistent states

Classifying persistent and anti-persistent states by Hurst exponent values 1/2<H<1 and 0<H<1/2 respectively, individual lysosome and endosome trajectories were segmented with a moving window of 15 points. Data were extracted from microscopy movies from three independent experiments. GFP-Rab5 endosomes from 65 cells, GFP-SNX1 endosomes from 63 cells and lysosomes from 71 different cells were tracked using AITracker (Newby et al., 2018). Then the trajectories were segmented into anti-persistent (0<H<1/2) and persistent (1/2<H<1) using the Hurst exponent estimates by DLFNN. The time duration and particle displacement of these segments were measured and then fitted to distributions. In this way, we could measure the stochastic switching between active and passive transport and the statistics of vesicle movement within these states.

Figure 4 shows the survival time probabilities $\Psi(t)$ of different states of motion (persistent and anti-persistent) in vesicle trajectories. $\Psi(t)$ is the probability that the vesicle will still be in the same state of motion after time $t$ has elapsed. Figure 4 shows that the persistent and anti-persistent states follow $\Psi(t)=e^{-\lambda t}\left(\frac{\tau_0}{\tau_0+t}\right)^{\mu}$. The survival time probabilities show that lysosomes and endosomes are far more likely to remain trapped in an anti-persistent state than be persistently transported by motor proteins. While this is intuitively obvious in the context of cell biology, this analysis provides quantitative characterization of endosomal and lysosomal motility. Appendix 3—table 1 shows the parameters of fitting for Figure 4. Appendix 3—figure 1 shows the empirical probability density functions (PDF) of the particle displacements for different states of motion. Displacements of segments are fitted to Burr Type XII distributions,


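$f(x)=\frac{ck}{x_0}\left(\frac{x}{x_0}\right)^{c-1}\left[1+\left(\frac{x}{x_0}\right)^{c}\right]^{-k-1}$

where $x$, $x_0$, $c$ and $k>0$.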

As expected, this analysis demonstrates that both endosomes and lysosomes are far more likely to move large distances when they are in the persistent state. This reconciles how vesicles are able to move large distances even though they are more likely to stay in an anti-persistent state for long periods of time. The large displacements in the persistent state compete with long durations spent in the anti-persistent state.
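The two fitted forms used in this appendix can be written as short functions for plotting or checking parameter values. This is a sketch with hypothetical function names; the Burr density is written in the standard Type XII form with scale $x_0$ and shapes $c$ and $k$, which is assumed to match the parameterization used for the fits.

```python
import numpy as np

def survival(t, mu, tau0, lam):
    """Tempered power-law survival function exp(-lam*t) * (tau0 / (tau0 + t))**mu."""
    t = np.asarray(t, dtype=float)
    return np.exp(-lam * t) * (tau0 / (tau0 + t)) ** mu

def burr12_pdf(x, x0, c, k):
    """Burr Type XII density with scale x0 and shape parameters c, k (standard form, assumed)."""
    u = np.asarray(x, dtype=float) / x0
    return (c * k / x0) * u ** (c - 1) * (1 + u ** c) ** (-k - 1)

# Example: Rab5 anti-persistent parameters from Appendix 3-table 1, evaluated over 0 to 5 s.
times = np.linspace(0.0, 5.0, 6)
rab5_anti_persistent = survival(times, mu=0.518, tau0=0.140, lam=0.352)
```

The exponential factor tempers the power-law tail at long times, so the heavy-tailed behaviour is most visible for times shorter than 1/λ.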

Appendix 3—figure 1
Normalized histograms (blue) and corresponding maximum likelihood estimation for Burr distributions (line) of segment displacements from lysosome and endosome experimental trajectories segmented using DLFNN.

Parameter estimates are shown in the legend.

Appendix 3—table 1
Results for the fits of survival time probabilities shown in Figure 4.

The parameters and the analytical survival functions used to fit the Kaplan-Meier estimator survival curves.

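Rab5 data | Survival function Ψ(t) | Fit parameters for Figure 4
Anti-persistent | $e^{-\lambda t}\left(\frac{\tau_0}{\tau_0+t}\right)^{\mu}$ | $\mu=0.518\pm0.004$, $\tau_0=0.140\pm0.002\ \mathrm{s}$, $\lambda=0.352\pm0.002\ \mathrm{s}^{-1}$
Persistent | $e^{-\lambda t}\left(\frac{\tau_0}{\tau_0+t}\right)^{\mu}$ | $\mu=1.352\pm0.102$, $\tau_0=0.045\pm0.006\ \mathrm{s}$, $\lambda=1.286\pm0.142\ \mathrm{s}^{-1}$
SNX1 data | Survival function Ψ(t) | Fit parameters for Figure 4
Anti-persistent | $e^{-\lambda t}\left(\frac{\tau_0}{\tau_0+t}\right)^{\mu}$ | $\mu=0.757\pm0.023$, $\tau_0=0.118\pm0.006\ \mathrm{s}$, $\lambda=1.004\pm0.016\ \mathrm{s}^{-1}$
Persistent | $e^{-\lambda t}\left(\frac{\tau_0}{\tau_0+t}\right)^{\mu}$ | $\mu=2.034\pm0.205$, $\tau_0=0.185\pm0.026\ \mathrm{s}$, $\lambda=0.659\pm0.137\ \mathrm{s}^{-1}$
Lysosome data | Survival function Ψ(t) | Fit parameters for Figure 4
Anti-persistent | $e^{-\lambda t}\left(\frac{\tau_0}{\tau_0+t}\right)^{\mu}$ | $\mu=1.113\pm0.009$, $\tau_0=0.208\pm0.003\ \mathrm{s}$, $\lambda=0.5\ \mathrm{s}^{-1}$ (fixed)
Persistent | $e^{-\lambda t}\left(\frac{\tau_0}{\tau_0+t}\right)^{\mu}$ | $\mu=1.748\pm0.065$, $\tau_0=0.041\pm0.003\ \mathrm{s}$, $\lambda=1.216\pm0.139\ \mathrm{s}^{-1}$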

Appendix 4

Calculating information criteria for GMM fittings

In order to determine the minimum number of components necessary to model the histograms of Hurst exponents in Figure 5, the Akaike and Bayesian information criteria were computed.
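For reference, both criteria follow from the maximized log-likelihood of a fitted mixture, the number of free parameters and the number of samples. The sketch below uses the standard definitions; the function names are hypothetical, and the parameter count assumes a one-dimensional Gaussian mixture with one mean, one variance and one weight per component (weights summing to one).

```python
import numpy as np

def information_criteria(log_likelihood, n_params, n_samples):
    """Return (AIC, BIC) from the maximised log-likelihood of a fitted model."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * np.log(n_samples) - 2 * log_likelihood
    return aic, bic

def gmm_1d_param_count(n_components):
    """Free parameters of a 1D Gaussian mixture: means, variances and independent weights."""
    return 3 * n_components - 1

# Example with made-up numbers: a two-component fit to 200 samples.
example_aic, example_bic = information_criteria(-100.0, gmm_1d_param_count(2), 200)
```

The preferred number of components is the one that minimizes the chosen criterion; BIC penalizes additional components more strongly than AIC once the sample size exceeds about eight.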

Appendix 4—figure 1
The Akaike and Bayes information criterion against number of components in the Gaussian mixture model shown in Figure 5 for GFP-Rab5 tagged endosomes (top), SNX1-GFP tagged endosomes (middle) and lysosomes (bottom).

Data availability

Supporting files are on GitHub and Zenodo.


References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M. Tensorflow: a system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation. [Conference]
  2. Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Zidek A, Nelson A, Bridgland A, Penedones H. De novo structure prediction with deeplearning based scoring. Annual Review of Biochemistry 77:6.
  3. Hurst HE. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116:770–799.
  4. Peltier R-F, Lévy Véhel J. Multifractional Brownian Motion: Definition and Preliminary Results. Research Report RR-2645, INRIA, Projet FRACTALES. [Report]
  5. Peters EE. Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. [Book]
  6. Waigh TA. The Physics of Living Processes: A Mesoscopic Approach. John Wiley & Sons. [Book]

Article and author information

Author details

  1. Daniel Han

    1. Department of Mathematics, University of Manchester, Manchester, United Kingdom
    2. School of Biological Sciences, University of Manchester, Manchester, United Kingdom
    3. Department of Physics and Astronomy, University of Manchester, Manchester, United Kingdom
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9088-1651
  2. Nickolay Korabel

    Department of Mathematics, University of Manchester, Manchester, United Kingdom
    Formal analysis, Validation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Runze Chen

    Department of Computer Science, University of Manchester, Manchester, United Kingdom
    Software, Formal analysis
    Competing interests
    No competing interests declared
  4. Mark Johnston

    School of Biological Sciences, University of Manchester, Manchester, United Kingdom
    Validation, Investigation
    Competing interests
    No competing interests declared
  5. Anna Gavrilova

    1. Department of Mathematics, University of Manchester, Manchester, United Kingdom
    2. School of Biological Sciences, University of Manchester, Manchester, United Kingdom
    Data curation, Investigation
    Competing interests
    No competing interests declared
  6. Victoria J Allan

    School of Biological Sciences, University of Manchester, Manchester, United Kingdom
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Project administration, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4583-0836
  7. Sergei Fedotov

    Department of Mathematics, University of Manchester, Manchester, United Kingdom
    Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Methodology, Project administration, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
  8. Thomas A Waigh

    1. Department of Physics and Astronomy, University of Manchester, Manchester, United Kingdom
    2. The Photon Science Institute, University of Manchester, Manchester, United Kingdom
    Conceptualization, Supervision, Validation, Methodology, Project administration, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7084-559X


Wellcome Trust (215189/Z/19/Z)

  • Daniel Han

EPSRC (EP/J019526/1)

  • Nickolay Korabel
  • Victoria J Allan
  • Sergei Fedotov
  • Thomas A Waigh

BBSRC (BB/H017828/1)

  • Victoria J Allan

Wellcome Trust (108867/Z/15/Z)

  • Anna Gavrilova

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


The authors thank: Dr. Jay Newby and Zach Richardson (UNC Chapel Hill) for assistance in using the automated tracking system (AITracker); Prof. Philip Woodman, Prof. Martin Lowe, Prof. Matthias Weiss (Universität Bayreuth), Dr. Henry Cox, Dr. Jack Hart, Rebecca Yarwood and Hannah Perkins for discussions; and Dr. Guy Pearson and Dr. Evan Reid (University of Cambridge) for providing the GFP-Rab5 and GFP-SNX1 MRC-5 cell lines. DH acknowledges financial support from the Wellcome Trust Grant No. 215189/Z/19/Z. SF, NK, TAW, and VJA acknowledge financial support from EPSRC Grant No. EP/J019526/1. VJA acknowledges support from the Biotechnology and Biological Sciences Research Council grant number BB/H017828/1. AG acknowledges financial support from the Wellcome Trust Grant No. 108867/Z/15/Z. The Leica SP8 microscope used in this study was purchased by the University of Manchester Strategic Fund. Special thanks go to Dr. Peter March for his help with the confocal imaging.

Version history

  1. Received: September 26, 2019
  2. Accepted: March 22, 2020
  3. Accepted Manuscript published: March 24, 2020 (version 1)
  4. Version of Record published: April 8, 2020 (version 2)
  5. Version of Record updated: May 26, 2020 (version 3)


© 2020, Han et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



  1. Daniel Han
  2. Nickolay Korabel
  3. Runze Chen
  4. Mark Johnston
  5. Anna Gavrilova
  6. Victoria J Allan
  7. Sergei Fedotov
  8. Thomas A Waigh
Deciphering anomalous heterogeneous intracellular transport with neural networks
eLife 9:e52224.
