Deciphering anomalous heterogeneous intracellular transport with neural networks
Abstract
Intracellular transport is predominantly heterogeneous in both time and space, exhibiting varying non-Brownian behavior. Characterization of this movement through averaging methods over an ensemble of trajectories or over the course of a single trajectory often fails to capture this heterogeneity. Here, we developed a deep learning feedforward neural network trained on fractional Brownian motion, providing a novel, accurate and efficient method for resolving heterogeneous behavior of intracellular transport in space and time. The neural network requires significantly fewer data points compared to established methods. This enables robust estimation of Hurst exponents for very short time series data, making possible direct, dynamic segmentation and analysis of experimental tracks of rapidly moving cellular structures such as endosomes and lysosomes. By using this analysis, fractional Brownian motion with a stochastic Hurst exponent was used to interpret, for the first time, anomalous intracellular dynamics, revealing unexpected differences in behavior between closely related endocytic organelles.
Introduction
The majority of transport inside cells on the mesoscale (nm to 100 μm) is now known to exhibit non-Brownian anomalous behavior (Metzler and Klafter, 2004; Barkai et al., 2012; Waigh, 2014). This has wide-ranging implications for most of the biochemical reactions inside cells and thus cellular physiology. It is vitally important to be able to quantitatively characterize the dynamics of organelles and cellular responses to different biological conditions (van Bergeijk et al., 2015; Patwardhan et al., 2017; Moutaux et al., 2018). Classification of different non-Brownian dynamic behaviors at various time scales has been crucial to the analysis of intracellular dynamics (Fedotov et al., 2018; Bressloff and Newby, 2013), protein crowding in the cell (Banks and Fradin, 2005; Weiss et al., 2004), microrheology (Waigh, 2005; Waigh, 2016), entangled actin networks (Amblard et al., 1996), and the movement of lysosomes (Ba et al., 2018) and endosomes (Flores-Rodriguez et al., 2011). Anomalous transport is currently analyzed by statistical averaging methods and this has been a barrier to understanding the nature of its heterogeneity.
Spatiotemporal analysis of intracellular dynamics is often performed by acquiring and tracking microscopy movies of fluorescing membrane-bound organelles in a cell (Rogers et al., 2007; Flores-Rodriguez et al., 2011; Chenouard et al., 2014; Zajac et al., 2013). These tracks are then commonly interpreted using statistical tools such as the mean square displacement (MSD) averaged over the ensemble of tracks, $\langle \mathrm{\Delta}{r}^{2}(t)\rangle$. The MSD is a measure that is widely used in physics, chemistry and biology. In particular, MSDs serve to distinguish between anomalous and normal diffusion at different temporal scales by determining the anomalous exponent $\alpha $ through $\langle \mathrm{\Delta}{r}^{2}(t)\rangle \sim {t}^{\alpha}$ (Metzler and Klafter, 2000). Diffusion is defined as $\alpha =1$, subdiffusion $0<\alpha <1$ and superdiffusion $1<\alpha <2$ (Klafter and Sokolov, 2011). To improve the statistics of MSDs, they are often averaged over different temporal scales, forming the time-averaged MSD (TAMSD), $\overline{\mathrm{\Delta}{r}^{2}(\tau )}\sim {\tau}^{\alpha}$, where $\tau $ is the lag time (Sokolov, 2012).
For stochastic processes with long-range time dependence such as fractional Brownian motion (fBm), other statistical averaging methods exist. For fBm, the MSD is $\langle {B}_{H}^{2}(t)\rangle \sim {t}^{2H}$ with the Hurst exponent, $H$, varying between 0 and 1. One can use rescaled and sequential range analysis (Samorodnitsky, 2016; Peters, 1994) to estimate $H$. The advantage of modeling intracellular transport with fBm is that both subdiffusion ($0<H<1/2$) and superdiffusion ($1/2<H<1$) can be explained in a unified manner using only the Hurst exponent. The essence of fBm is that long-range correlations result in random trajectories that are antipersistent ($0<H<1/2$) or persistent ($1/2<H<1$). How can we understand persistence in the context of intracellular transport? The term persistence can be understood as the processive motor-protein transport of cargo in one direction, whether it be retrograde or anterograde. From a probabilistic viewpoint, persistence can be interpreted as the cargo being more likely to keep the same direction given it had been moving in this fashion before. Conversely, antipersistence is interpreted as cargo being more likely to change its direction given it had been moving in that direction before. Antipersistence can arise if cargo is confined to a local volume in the cytoplasm simply due to crowding or tethering biochemical interactions (Harrison et al., 2013), which in effect leads to subdiffusion (Weiss et al., 2004; Ernst et al., 2012). By interpreting intracellular cargo transport as fBm, there are two main advantages: we can describe movement with the intuitive biological concepts of persistence and antipersistence; and we can provide an immediate link to anomalous diffusion since $\alpha =2H$ for constant $H$.
Cargo movement in vivo often exhibits random switching between persistent and antipersistent movement, even in a single trajectory (Chen et al., 2015). Therefore, we can model this by a stochastic local Hurst exponent, $H(t)$, which jumps between persistent ($1/2<H(t)<1$) and antipersistent ($0<H(t)<1/2$) states. Still, a major challenge exists: how can we estimate a local stochastic Hurst exponent from a trajectory?
Whilst exponent estimation using neural networks is an emerging field (Bondarenko et al., 2016), segmentation of single trajectories into persistent and antipersistent sections based on instantaneous dynamic behavior has not been studied. Instead, hidden Markov models (Monnier et al., 2015; Persson et al., 2013) and windowed analyses (Getz and Saltz, 2008) are commonly used to segment local behavior along single trajectories (see Appendix A for comparisons). Even so, most methods neglect the microscopic processes which are often a feature of intracellular transport (e.g. alternation between ‘runs’ and ‘rests’) (Weiss et al., 2004; Chen et al., 2015; Fedotov et al., 2018) and the non-Markovian nature of their motion (Fuliński, 2017). fBm was chosen due to its self-similar properties that allow direct analysis at short time scales given by experimental systems; and the experimental evidence for fBm in the crowded cytoplasm (Weiss et al., 2004; Szymanski and Weiss, 2009; Krapf et al., 2019). Moreover, other anomalous diffusion models, such as scaled Brownian motion (Lim and Muniandy, 2002), subdiffusive continuous time random walks (Sokolov, 2012) and superdiffusive Lévy walks (Fedotov et al., 2018) are not suitable to interpret anomalous trajectories on the microscopic level.
Here, we present a new method for characterizing anomalous transport inside cells based on a Deep Learning Feedforward Neural Network (DLFNN) that is trained on fBm. Neural networks are becoming a general tool in a wide range of fields, such as single-cell transcriptomics (Deng et al., 2019) and protein folding (Evans et al., 2018). We find the neural network is a much more sensitive method to characterise fBm than previous statistical tools, since it is an intrinsically nonlinear regression method that accounts for correlated time series. In addition, it can estimate the Hurst exponent using as few as seven consecutive time points with good accuracy.
To test the ability of the DLFNN to segment real-world biological motility, we focused on organelles in the endocytic pathway. This pathway is essential for cell homeostasis, allowing nutrient uptake, the turnover of plasma membrane components, and uptake of growth factor receptors bound to their ligands. Early endosomes then sort components destined for degradation from material that needs to be recycled back to the cell surface or to the Trans-Golgi Network (TGN) (Naslavsky and Caplan, 2018). Many aspects of endosome function are regulated by Rab5, a small GTPase that is localized to the cytosolic face of early endosomes (Stenmark and Olkkonen, 2001). Sorting nexin 1 (SNX1) also localises to early endosomes, where it works with the retromer complex to retrieve and recycle cargoes from early endosomes to the TGN (Simonetti and Cullen, 2019). SNX1 achieves this through regulating tubular membrane elements on early endosomes by associating with regions of high membrane curvature (Carlton et al., 2004). Early endosomes mature into late endosomes, which then fuse with lysosomes, delivering their contents for degradation (Huotari and Helenius, 2011). Endocytic pathway components are highly dynamic, with microtubule motors driving long-distance movement while short-range dynamics involve actin-based motility (Granger et al., 2014; Cabukusta and Neefjes, 2018), making them ideal test cases for DLFNN analysis. The new method enables the interpretation of experimental trajectories of lysosomes and endosomes as fBm with stochastic local Hurst exponent, $H(t)$. This in turn allows us to unambiguously and directly classify endosomes and lysosomes to be in antipersistent or persistent states of motion at different times. From experiments, we observe that the times spent within these two states both exhibit truncated heavy-tailed distributions.
To our knowledge, this is the first method which is capable of resolving heterogeneous behavior of anomalous transport in both time and space. We anticipate that this method will be useful in characterizing a wide range of systems that exhibit anomalous heterogeneous transport. We have therefore created a GUI computer application in which the DLFNN is implemented, so that the wider community can conveniently access this analysis method.
Results and discussion
The DLFNN is more accurate than established methods
We tested a DLFNN trained on fBm with three hidden layers of densely connected nodes on $N={10}^{4}$ computer-generated fBm trajectories each with $n={10}^{2}$ evenly spaced time points and constant Hurst exponent ${H}_{sim}$, randomly chosen between 0 and 1. The DLFNN estimated the Hurst exponents ${H}_{est}$ based on the trajectories, and these were compared with those estimated from TAMSD, rescaled range, and sequential range methods (Figure 1a). The difference between the simulated and estimated values $\mathrm{\Delta}H={H}_{sim}-{H}_{est}$ was much smaller for the DLFNN than for the other methods (Figure 1a), and the DLFNN was $\sim 3$ times more accurate at estimating Hurst exponents with a mean absolute error (${\sigma}_{H}$) $\sim 0.05$. Also, the errors in estimation of the DLFNN are more stable across values of ${H}_{sim}$.
Tracking of intracellular motion usually generates trajectories with a variable number of data points. We therefore compared the performance of the different exponent estimation methods when the number of evenly spaced, consecutive fBm time points in a trajectory varied over $n=5,6,\mathrm{\dots},{10}^{2}$ points. The DLFNN maintained an accuracy of ${\sigma}_{H}\sim 0.05$ across $n$, whereas ${\sigma}_{H}$ of the other methods increased as $n$ decreased (Figure 1b) and was always substantially worse than that of the DLFNN estimation. Different DLFNN structures (see Figure 1c,d and e) performed similarly, and introducing more hidden layers did not affect the accuracy of estimation (Figure 1f and g). Given that the structure of the DLFNN does not significantly affect the accuracy of exponent estimation, a triangular densely connected DLFNN was used for all subsequent analyses.
The structure of a triangular DLFNN means that the input layer consists of $n$ nodes, which are densely connected to $n-1$ nodes in the first hidden layer, such that at the $l$th hidden layer, there would be $n-l$ densely connected nodes. Then to estimate the Hurst exponent these nodes are connected to a single node using a Rectified Linear Unit (ReLU) activation function, which returns the exponent estimate. A triangular DLFNN therefore uses only ${\sum}_{l=0}^{L}(n-l)+1$ nodes for $L$ hidden layers and $n$ input points, whereas the rectangular structure uses $nL+1$ nodes and the antitriangular structure uses ${\sum}_{l=0}^{L}(n+l)+1$. The triangular structure results in a significant decrease in training parameters, and hence computational requirements, while maintaining good levels of accuracy. This demonstrates that a computationally inexpensive neural network can accurately estimate exponents.
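As an illustration of the node counts quoted above, the following minimal sketch evaluates the triangular and antitriangular sums and also counts the weights and biases of the corresponding densely connected layers. The function names are hypothetical and are not taken from the published code; the parameter count assumes one bias per non-input node, which is the usual convention for dense layers.

```python
def layer_sizes(n, n_hidden, shape="triangular"):
    """Layer widths: input (l = 0), n_hidden hidden layers, one output node."""
    step = {"triangular": -1, "antitriangular": 1}[shape]
    return [n + step * l for l in range(n_hidden + 1)] + [1]


def node_count(sizes):
    """Total number of nodes, matching sum_{l=0}^{L} (n -/+ l) + 1."""
    return sum(sizes)


def dense_parameter_count(sizes):
    """Weights plus biases for fully connected consecutive layers."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    for shape in ("triangular", "antitriangular"):
        sizes = layer_sizes(5, 2, shape)
        print(shape, sizes, node_count(sizes), dense_parameter_count(sizes))
```

For five inputs and two hidden layers the triangular network has fewer than half the trainable parameters of the antitriangular one, which is the saving the text refers to.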
The DLFNN’s estimation capabilities were tested further by inputting ${n}_{rand}$ randomly sampled time points from the original fBm trajectories. Surprisingly, ${\sigma}_{H}\sim 0.05$ is regained even with just 40 out of 100 data points randomly sampled from the time series for any triangular DLFNN with more than one hidden layer (Figure 1g). For this method to work with experimental systems, it must estimate Hurst exponents even when the trajectories are noisy. Figure 1h shows how the exponent estimation error increases when Gaussian noise with varying strength compared to the original signal is added to the fBm trajectories. Importantly, the DLFNN accuracy ${\sigma}_{H}$ at a 20% noise-to-signal ratio (NSR) is as good as the accuracy of other methods with no noise (compare Figure 1a and h).
To characterize the accuracy of ${H}_{sim}$ estimation by the DLFNN, we calculated the bias, $b({H}_{sim})=\mathbb{E}\left[{H}_{est}\right]-{H}_{sim}$; variance, $\text{Var}({H}_{sim})=\mathbb{E}\left[{\left({H}_{est}-\mathbb{E}\left[{H}_{est}\right]\right)}^{2}\right]$; and mean square error, $\text{MSE}=\text{Var}({H}_{sim})+b{({H}_{sim})}^{2}$ (Figure 1i). To quantify the efficiency of the estimator, the Fisher information of the neural network’s estimation needs to be found and the Cramér-Rao lower bound calculated. The values of bias, variance and MSE were very low (Figure 1i), which taken together with the simplicity of calculation and the accuracy of estimation even with a small number of data points, demonstrates the strength of the DLFNN method. Furthermore, once trained, the model can be saved and reloaded at any time. Saved DLFNN models, code and the DLFNN Exponent Estimator GUI are available to download (see Software and Code).
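The bias, variance and mean square error defined above can be computed from repeated estimates at a fixed simulated exponent. The sketch below is a minimal illustration using sample means in place of expectations; the function name and the example values are hypothetical and are not results from the paper.

```python
def estimator_summary(h_sim, h_estimates):
    """Sample bias, variance and MSE of estimates for one true exponent h_sim."""
    count = len(h_estimates)
    mean_est = sum(h_estimates) / count
    bias = mean_est - h_sim
    variance = sum((h - mean_est) ** 2 for h in h_estimates) / count
    # Computed directly from the errors; equals variance + bias**2.
    mse = sum((h - h_sim) ** 2 for h in h_estimates) / count
    return bias, variance, mse


if __name__ == "__main__":
    # Illustrative numbers only.
    bias, variance, mse = estimator_summary(0.3, [0.25, 0.35, 0.30, 0.40])
    print(bias, variance, mse)
```

Computing the MSE directly from the errors and comparing it with the sum of variance and squared bias is a convenient consistency check on an implementation.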
DLFNN allows analysis of simulated trajectories with local stochastic Hurst exponents
Estimating local Hurst exponents is fundamentally important because much research has focused on inferring active and passive states of transport within living cells using position-derived quantities such as windowed MSDs, directionality and velocity (Arcizet et al., 2008; Monnier et al., 2015). The trajectories are then segmented and Hurst exponents measured in an effort to characterize the behavior of different cargo when they are actively transported by motor proteins (Chen et al., 2015; Fedotov et al., 2018) or subdiffusing in the cytoplasm (Jeon et al., 2011). However, conventional methods such as the MSD and TAMSD need trajectories with many time points ($n\sim {10}^{2}$ to ${10}^{3}$) to calculate a single Hurst exponent value with high fidelity. In contrast, the DLFNN enables the Hurst exponent to be estimated, directly from positional data, for a small number of points. Furthermore, the DLFNN measures local Hurst exponents without averaging over time points and is able to characterize particle trajectories that may exhibit multifractional, heterogeneous dynamics.
To provide a synthetic data set that mimics particle motion in cells, we simulated fBm trajectories with Hurst exponents that varied in time, and applied a symmetric moving window to estimate the Hurst exponent using a small number of data points before and after each time point (Figure 2). The DLFNN was able to identify segments with different exponents, and provided a good running estimation of the Hurst exponent values. The DLFNN could also handle trajectories with different diffusion coefficients, and generally performed better than MSD analysis when a sliding window was used (see Appendix B).
DLFNN analysis reveals differences in motile behavior of organelles in the endocytic pathway
Early endosomes labeled with green fluorescent protein (GFP)-Rab5 undergo bursts of rapid cytoplasmic dynein-driven motility interspersed with periods of rest (Flores-Rodriguez et al., 2011; Zajac et al., 2013). We therefore applied the DLFNN method to experimental trajectories obtained from automated tracking (Newby et al., 2018) data of GFP-Rab5-labeled endosomes in an MRC-5 cell line that stably expressed GFP-Rab5 at low levels (Figure 3). A moving window of 15 points identified persistent (green) and antipersistent (magenta) segments, which corresponded well to the moving window velocity plots (Figure 3, lower panel), confirming that the neural network is indeed distinguishing passive states from active transport states with non-zero average velocity. We then used it to analyze the motility of two other endocytic compartments: SNX1-positive endosomes (Allison et al., 2017; Hunt et al., 2013) and lysosomes (Cabukusta and Neefjes, 2018; Hendricks et al., 2010). It successfully segmented tracks of GFP-SNX1 endosomes (Figure 3—figure supplement 1) in a stable MRC-5 cell line (Allison et al., 2017) and lysosomes visualized using lysobrite dye (Figure 3—figure supplement 2). A total of 63–71 MRC-5 cells were analyzed, giving 40,800 (GFP-Rab5 endosome), 11,273 (GFP-SNX1 endosome) and 38,039 (lysosome) tracks that were segmented into 277,926 (GFP-Rab5), 215,087 (GFP-SNX1) and 474,473 (lysosome) persistent or antipersistent sections, each yielding a displacement, duration and average $H$.
These data revealed intriguing similarities and differences in behavior between the three endocytic components. Analysis of the duration and displacement of segments (Appendix C) revealed that all organelles spent longer in antipersistent than persistent states (Figure 4) but moved much further when persistent (Appendix 3—figure 1), as expected. However, GFP-SNX1 endosomes spent much less time than GFP-Rab5 endosomes or lysosomes in an antipersistent state (Figure 4). This difference in behavior was also seen when histograms of the Hurst exponents were plotted (Figure 5), as SNX1 endosomes were much less likely to exhibit antipersistent behavior, particularly with $H<0.3$, than Rab5 endosomes or lysosomes. This was confirmed by fitting the histograms of the Hurst exponent with a six-component Gaussian mixture model (Figure 5b–d; Appendix D). In contrast, all three organelle classes exhibited a similar range of Hurst exponents when they underwent directionally persistent motion.
To understand organelle motility in the context of cell behavior, an additional layer of complexity needs to be considered: the location of the moving structure within the cell itself. Such information would reveal zones that favor antipersistent or persistent movement (Bálint et al., 2013). Using the neural network, trajectories of GFP-Rab5 and GFP-SNX1 endosomes and lysosomes from MRC-5 cells were plotted with colors depicting the changing Hurst exponent at different points in each trajectory (Figure 6). For Rab5- and SNX1-positive endosomes, antipersistent organelles were enriched in the cell periphery, but occasionally underwent long-range persistent movement towards the nucleus (Figure 6—video 1; Figure 6—video 2), as expected (Flores-Rodriguez et al., 2011; Zajac et al., 2013; Hunt et al., 2013; Allison et al., 2017). Lysosomes displayed completely different behavior, with most trajectories being antipersistent, while the persistent trajectories were not obviously organized spatially (Figure 6; Figure 6—video 3). The location information together with classification of antipersistent and persistent trajectories qualitatively shows the regions of high motor-driven activity within the cell for different endocytic organelles.
Many cargos that move along microtubules can switch their direction of motility, between dynein-driven inward (retrograde) motion toward the microtubule minus ends at the cell centre and plus-end-directed outward (anterograde) movement driven by kinesin family members (Hancock, 2014). To investigate the characteristics of anterograde and retrograde motility of endocytic organelles, we adapted our method to subdivide persistent segments according to whether the movement occurred towards or away from the user-defined centrosomal region (see Materials and methods). Only tracks with displacement of >0.5 µm from their start point were selected, which yielded 2369 Rab5, 2099 SNX1 and 7645 lysosome persistent segments that were then analyzed to give the duration, displacement and velocity of anterograde and retrograde excursions (Figure 7; Table 1). The antipersistent segments contained within these tracks were also analyzed.
These statistics revealed that each endocytic organelle moved with different characteristics. GFP-Rab5 endosomes moved much faster than GFP-SNX1 endosomes or lysosomes, particularly in the retrograde direction (Figure 7, upper panel). Strikingly, although the GFP-SNX1 endosomes were slowest in both directions, they moved furthest and for longest in each segment, in keeping with the longer duration of persistent segments seen in the global analysis of tracks (Figure 4) and higher $H$ values (Figure 5). The differences in behavior between Rab5 and SNX1 endosomes are intriguing, since both are recruited to the early endosome by the lipid phosphoinositol-3-phosphate (Christoforidis et al., 1999; Carlton et al., 2004; Behnia and Munro, 2005; Huotari and Helenius, 2011). However, SNX1 also senses membrane curvature (Carlton et al., 2004), and immunofluorescence labeling of MRC-5 cells with antibodies to Rab5 and SNX1 demonstrated that they reside on distinct domains of larger early endosomes (Figure 6—figure supplement 1), as expected (van Weering et al., 2012). In addition, while SNX1 endosomes were usually Rab5-positive, there was a significant population of Rab5 endosomes that lacked SNX1, especially smaller early endosomes that were often located in the cell periphery. It is likely that this population of Rab5-positive, SNX1-negative endosomes is particularly motile. The high retrograde velocity of these endosomes might be explained by the recruitment of dynein to Rab5 endosomes via Hook family members (Bielska et al., 2014; Zhang et al., 2014; Schroeder and Vale, 2016; Guo et al., 2016). These dynein adaptors have the intriguing property of recruiting two dyneins per dynactin (Urnavicius et al., 2018; Grotjahn et al., 2018), leading to faster rates of movement in motility assays using purified protein than adaptors that only recruit one dynein per dynactin. Perhaps SNX1 endosomes move more slowly than Rab5 endosomes because they use a ‘single-dynein’ adaptor.
An alternative explanation could be that SNX1 endosomes are slowed down by interactions with the actin cytoskeleton, since SNX1 domains are enriched in the WASH complex, which in turn controls localized actin assembly (Gomez and Billadeau, 2009; Simonetti and Cullen, 2019). Actin might also contribute to the slow, steady motion of SNX1 endosomes via myosin motors or the formation of actin comets (Simonetti and Cullen, 2019). These interesting possibilities remain to be tested experimentally.
The analysis of anterograde and retrograde segments revealed that lysosomes moved at moderate speed, and were equally fast in both directions, but each burst of movement was short (Figure 7, upper panels). In addition, pauses were $\ge 4$ times longer for lysosomes than for either early endosome type (Figure 7, lower panels). Lysosomes also often changed direction of movement (e.g. Figure 3—figure supplement 2), as previously reported (Hendricks et al., 2010). So far, no activating dynein adaptor has been identified on lysosomes (Reck-Peterson et al., 2018), although several potential dynein interactors have been identified, such as RILP (Rab7 interacting lysosomal protein) (Cabukusta and Neefjes, 2018). Whether this underlies the difference in motile behavior between lysosomes and early endosomes remains to be tested; however, a less active dynein could well contribute to frequent reversals in direction (Hancock, 2014).
fBm with a stochastic Hurst exponent is a new possible intracellular transport model
fBm is a Gaussian process ${B}_{H}(t)$ with zero mean and covariance $\langle {B}_{H}(t){B}_{H}(s)\rangle \sim {t}^{2H}+{s}^{2H}-{(t-s)}^{2H}$, where the Hurst exponent, $H$, is a constant between 0 and 1. With the DLFNN providing local estimates of the Hurst exponent, the motion of endosomes and lysosomes can be described as fBm with a stochastic Hurst exponent, $H(t)$. This is different to multifractional Brownian motion (Peltier and Lévy Véhel, 1995) where $H(t)$ is a function of time. In our case, $H(t)$ is itself a stochastic process and such a process has been considered theoretically (Ayache and Taqqu, 2005). This is the first application of such a theory to intracellular transport and opens a new method for characterizing vesicular movement. Furthermore, Figure 3 shows that the motion of a vesicle, ${B}_{H}(t)$, exhibits regime switching behavior between persistent and antipersistent states.
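To make the covariance structure concrete, the sketch below draws a constant-$H$ fBm path by Cholesky factorization of the covariance matrix. This is an illustrative alternative, not the Hosking method used for the training data in this work. It assumes the standard normalization with a prefactor of 1/2 and $|t-s|$ in the last term, and the function names are hypothetical.

```python
import numpy as np


def fbm_covariance(times, hurst):
    """Covariance 0.5 * (t^2H + s^2H - |t - s|^2H) on a grid of times."""
    t = np.asarray(times, dtype=float)[:, None]
    s = t.T
    two_h = 2.0 * hurst
    return 0.5 * (t ** two_h + s ** two_h - np.abs(t - s) ** two_h)


def sample_fbm(n_steps, hurst, dt=1.0, seed=None):
    """Draw one fBm path at times dt, 2*dt, ..., n_steps*dt."""
    rng = np.random.default_rng(seed)
    times = dt * np.arange(1, n_steps + 1)
    # Cholesky factor turns independent normals into correlated increments.
    chol = np.linalg.cholesky(fbm_covariance(times, hurst))
    return times, chol @ rng.standard_normal(n_steps)


if __name__ == "__main__":
    times, path = sample_fbm(100, hurst=0.3, seed=0)
    print(path[:5])
```

The Cholesky approach costs $O(n^3)$ and is only practical for short paths, but it makes explicit that the whole process is determined by $H$ through the covariance. For $H=1/2$ the covariance reduces to $\min(t,s)$, that of ordinary Brownian motion.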
We found that the times that lysosomes and endosomes spend in a persistent and antipersistent state are heavy-tailed (Figure 4). These times are characterized by the probability densities $\psi (t)\sim {t}^{-\mu -1}$, where antipersistent states have $0<\mu <1$ and persistent states have $1<\mu <2$. Extensive plots and fittings are shown in Figure 4 and Appendix C. In fact, the residence time probability density has an infinite mean to remain in an antipersistent state ($0<H(t)<1/2$) but in persistent states ($1/2<H(t)<1$) the mean of the residence time probability density is finite and the second moment is infinite. This implies that the vesicles may have a biological mechanism to prioritize certain interactions within the complex cytoplasm, similar to ecological searching patterns (Reynolds and Rhodes, 2009), mRNPs (Song et al., 2018), swarming bacteria (Ariel et al., 2015) and how human dynamics are often heavy tailed and bursty (Barabási, 2005).
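The statements about the moments follow from the tail exponent alone. As a short worked check, assuming a pure power-law tail $\psi (t)\sim {t}^{-\mu -1}$ beyond some lower cutoff ${t}_{0}$ (the cutoff and the upper limit $T$ are introduced here only for illustration, and the truncation seen experimentally is ignored):

```latex
\begin{aligned}
\langle t \rangle &\sim \int_{t_0}^{T} t \, t^{-\mu-1}\,\mathrm{d}t
  = \int_{t_0}^{T} t^{-\mu}\,\mathrm{d}t \sim T^{\,1-\mu}
  && \text{diverges as } T\to\infty \text{ for } 0<\mu<1,\ \text{converges for } \mu>1,\\
\langle t^{2} \rangle &\sim \int_{t_0}^{T} t^{2}\, t^{-\mu-1}\,\mathrm{d}t
  = \int_{t_0}^{T} t^{1-\mu}\,\mathrm{d}t \sim T^{\,2-\mu}
  && \text{diverges as } T\to\infty \text{ for } \mu<2 .
\end{aligned}
```

Hence antipersistent residence times ($0<\mu <1$) have no finite mean, whereas persistent residence times ($1<\mu <2$) have a finite mean but an infinite second moment.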
Conclusions
We developed a Deep Learning Feedforward Neural Network trained on fBm that accurately estimates the Hurst exponent for heterogeneous trajectories. Estimating the Hurst exponent using a DLFNN is not only more accurate than conventional methods but also enables direct trajectory segmentation without a drastic increase in computational cost. We package this DLFNN analysis code into a user-friendly application, which can predict the Hurst exponent with consistent accuracy for as few as seven consecutive data points. This is useful to biologists since major limitations to trajectory analysis are: the brevity of tracks due to the fact that particles may rapidly switch between motile states or move out of the plane of focus; the rapid nature of some biochemical reactions; and the bleaching of fluorescent probes (with non-bleaching probes often being bulky or cytotoxic). This method can be used to detect persistent and antipersistent states of motion purely from the positional data of trajectories and removes the prerequisite of time or ensemble averaging for effective heterogeneous transport characterization.
The DLFNN enabled us to discover regime switching in lysosome and endosome movement that can be modeled by fBm with a stochastic Hurst exponent. This interpretation is a unified approach to describe motion with antipersistence and persistence varying over time. Furthermore, the residence time of vesicles in a persistent or antipersistent state is found to be heavy tailed, which implies that endosomes and lysosomes possess biological mechanisms to prioritize varying biological processes similar to ecological searching patterns (Reynolds and Rhodes, 2009), mRNPs (Song et al., 2018), swarming bacteria (Ariel et al., 2015) and even human dynamics (Barabási, 2005). Importantly, applying this method to identify and analyze the anterograde and retrograde motility reveals unexpected differences in behavior between closely related organelles. Finally, in addition to providing a new segmentation method of active and passive transport, this new technique distinguishes the difference in motility between lysosomes, Rab5-positive endosomes and SNX1-positive endosomes. The results suggest that the manner in which these vesicles move is dependent on their identity within the endocytic pathway, especially when the motion is antipersistent. This implies that directionality and the correlation between consecutive steps are important to measure in addition to the displacement, velocity and duration of movement. There is considerable scope for using these methods to identify changes in motility of different organelles caused by disease. We hope that this type of analysis will allow discoveries in particle motility of a more refined nature and make applying anomalous transport theory more accessible to researchers in a wide variety of disciplines.
Materials and methods
Hurst exponent estimation methods
Time-averaged MSDs were calculated using
$$\langle {x}^{2}(n\delta t)\rangle =\frac{1}{N-n}\sum _{i=0}^{N-n-1}{\left[x\left((i+n)\delta t\right)-x(i\delta t)\right]}^{2},$$
where $x(n\delta t)$ is the track displacement at time $n\delta t$ and a track contains $N$ coordinates spaced at regular time intervals of $\delta t$. From now on, $\langle x\rangle$ will denote the time average of $x$ unless explicitly specified otherwise. The total time is $T=(N-1)\delta t$ and $n=1,2,\mathrm{\dots},N-1$. Lag times are the set of possible $n\delta t$ within the data set and $\langle {x}^{2}(n\delta t)\rangle$ was then fit to a power law $\sim {t}^{2H}$ using the ‘scipy.optimize’ package in Python3 to estimate the exponent $H$.
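The time-averaged MSD and the exponent fit can be sketched in a few lines. This is a minimal illustration rather than the analysis code used here. As a simplifying assumption it fits a straight line to the log-log data with `numpy.polyfit` instead of a nonlinear power-law fit with ‘scipy.optimize’, and the function names are hypothetical.

```python
import numpy as np


def tamsd(x, max_lag=None):
    """Time-averaged MSD of a 1D track for lags 1..max_lag (in frames)."""
    x = np.asarray(x, dtype=float)
    if max_lag is None:
        max_lag = len(x) - 1
    lags = np.arange(1, max_lag + 1)
    # Average the squared displacement over all start points for each lag.
    msd = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    return lags, msd


def hurst_from_tamsd(x, dt=1.0, max_lag=None):
    """Estimate H from the slope 2H of log(TAMSD) against log(lag time)."""
    lags, msd = tamsd(x, max_lag)
    slope, _ = np.polyfit(np.log(lags * dt), np.log(msd), 1)
    return slope / 2.0


if __name__ == "__main__":
    # A ballistic track has TAMSD proportional to lag^2, so H should be 1.
    print(hurst_from_tamsd(np.arange(50.0)))
```

In practice the largest lags are averaged over very few start points, so restricting `max_lag` to a fraction of the track length usually gives a more stable fit.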
Rescaled ranges were calculated by creating a mean adjusted cumulative deviate series $z(n\delta t)={\sum}_{m=0}^{n}\left[x(m\delta t)-\langle x\rangle \right]$ from original displacements $x(n\delta t)$ and mean displacement $\langle x\rangle$. Then the rescaled range is calculated by
$$\left[\text{R/S}\right](n\delta t)=\frac{\mathrm{max}\left({\left\{z\right\}}_{n}\right)-\mathrm{min}\left({\left\{z\right\}}_{n}\right)}{S(n\delta t)},$$
where ${\left\{z\right\}}_{n}=z(0),z(\delta t),z(2\delta t),\mathrm{\dots},z(n\delta t)$ and $S(n\delta t)$ is the standard deviation of the displacements up to time $n\delta t$. The rescaled range is then fitted to a power law $\left[\text{R/S}\right](n\delta t)\sim {(n\delta t)}^{H}$ where $H$ is the Hurst exponent (Hurst, 1951). The ‘compute_Hc’ function in the ‘hurst’ package in Python3 estimates the Hurst exponent in this way.
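Sequential ranges are defined as
$$M(n\delta t)=\underset{0\le m\le n}{sup}\,x(m\delta t)-\underset{0\le m\le n}{inf}\,x(m\delta t),$$
where $sup(x)$ is the supremum and $inf(x)$ is the infimum for the set $x$ of real numbers. Then $M(n\delta t)={(n\delta t)}^{H}M(\delta t)$ (Feller, 1951).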
DLFNN structure and training
The fractional Brownian trajectories were generated using the Hosking method within the ‘FBM’ function available from the ‘fbm’ package in Python3. The DLFNN was built using Tensorflow (Abadi et al., 2016) and Keras (Chollet, 2015) in Python3 and trained by using the simulated fractional Brownian trajectories. The training and testing of the neural network were performed on a workstation PC equipped with 2 CPUs with 32 cores (Intel(R) Xeon CPU E5-2640 v3) and 1 GPU (NVIDIA Tesla V100 with 16 GB memory). The structure of the neural network was a multilayer, feedforward neural network where all nodes of the previous layer were densely connected to nodes of the next layer. Each node had a ReLU activation function and the parameters were optimized using the RMSprop optimizer (see Keras documentation; Chollet, 2015). Three separate structures were explored and examples of these structures for two hidden layers and five time point inputs are shown in Figure 1g,h and i. The triangular structure was predominantly used since this was the least computationally expensive and the accuracy of the different structures was similar. To compare the accuracy of different methods, the mean absolute error (${\sigma}_{H}$) of $N$ trajectories, ${\sigma}_{H}={\sum}_{m=1}^{N}\left|{H}_{m}^{sim}-{H}_{m}^{est}\right|/N$, was used. Before inputting values into the neural network, the time series was differenced to make it stationary. The input values of a fBm trajectory $\left\{x\right\}={x}_{0},{x}_{1},\mathrm{\dots},{x}_{n}$ were differenced and normalized so that $\left\{{x}_{input}\right\}=({x}_{1}-{x}_{0})/\text{range}(x),({x}_{2}-{x}_{1})/\text{range}(x),\mathrm{\dots},({x}_{n}-{x}_{n-1})/\text{range}(x)$. Since the model requires differenced and normalized input values, in theory it should be applicable to a wide range of datasets. However, further testing must be done in order to confirm this expectation.
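The input preprocessing and the error measure described above are simple to state in code. The following is a minimal sketch with hypothetical function names. It assumes $\text{range}(x)$ means the maximum minus the minimum of the track, and it does not reproduce the network itself.

```python
def difference_and_normalize(track):
    """Increments x_{k+1} - x_k, each divided by range(x) = max(x) - min(x)."""
    span = max(track) - min(track)
    if span == 0:
        raise ValueError("track is constant; range(x) is zero")
    return [(b - a) / span for a, b in zip(track, track[1:])]


def mean_absolute_error(h_sim, h_est):
    """sigma_H: mean of |H_sim - H_est| over paired trajectories."""
    if len(h_sim) != len(h_est):
        raise ValueError("inputs must have the same length")
    return sum(abs(s - e) for s, e in zip(h_sim, h_est)) / len(h_sim)


if __name__ == "__main__":
    print(difference_and_normalize([0.0, 2.0, 1.0, 4.0]))
    print(mean_absolute_error([0.2, 0.8], [0.25, 0.7]))
```

A track of $n+1$ positions yields $n$ inputs, so the window length used for segmentation must match the input width the network was trained with.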
Gaussian kernel density estimation
Kernel density estimation (KDE) is a nonparametric method to estimate the probability density function (PDF) of random variables. If $N$ random variables ${x}_{n}$ are distributed by an unknown density function $P(x)$, then the kernel density estimate $\hat{P}(x)$ is
$\hat{P}(x)=\frac{1}{Nl}{\sum}_{n=1}^{N}K\left(\frac{x-{x}_{n}}{l}\right)$,
where $K(\cdot )$ is the kernel function and $l$ is the bandwidth. In this paper, we have used a Gaussian KDE, $K(y)=\frac{1}{\sqrt{2\pi}}{e}^{-{y}^{2}/2}$, to estimate the two dimensional PDFs of the second and bottom row in Figure 1a. This was performed in Python3 using ‘scipy.stats.gaussian_kde’ and Scott’s rule of thumb for bandwidth selection.
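As a rough illustration of the estimator, the sketch below evaluates a one-dimensional Gaussian KDE in plain Python. It is not the analysis code: the paper's PDFs are two-dimensional and were computed with ‘scipy.stats.gaussian_kde’. The function names are hypothetical, and the bandwidth is assumed to be the one-dimensional form of Scott's rule, the sample standard deviation times $N^{-1/5}$.

```python
import math
import statistics


def scott_bandwidth(samples):
    """One-dimensional Scott's rule: sample standard deviation times N^(-1/5)."""
    return statistics.stdev(samples) * len(samples) ** (-1.0 / 5.0)


def gaussian_kernel(y):
    """Standard normal density K(y) = exp(-y^2 / 2) / sqrt(2 pi)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def kde(x, samples, bandwidth=None):
    """Kernel density estimate at x: (1 / (N l)) * sum_n K((x - x_n) / l)."""
    l = scott_bandwidth(samples) if bandwidth is None else bandwidth
    return sum(gaussian_kernel((x - s) / l) for s in samples) / (len(samples) * l)


# Illustrative values only (not data from the paper)
samples = [0.1, 0.4, 0.35, 0.8, 0.5]
grid = [-5.0 + 0.01 * i for i in range(1001)]
# Riemann sum of the estimate over the grid; should be close to 1
area = sum(kde(x, samples) for x in grid) * 0.01
```

Because each kernel integrates to one and the estimate averages $N$ of them, the estimated density integrates to one regardless of the bandwidth chosen.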
Segmenting trajectories into persistent and antipersistent segments
From the estimates of the Hurst exponent from the DLFNN, trajectories were segmented into persistent and antipersistent segments. Given an experimental trajectory $x={x}_{0},{x}_{1},\mathrm{\dots},{x}_{n}$ and a window of length ${N}_{w}$ (an odd number) starting at ${x}_{i}$, we obtain the $H$ estimate for the position at ${x}_{j}$, where $j=i+({N}_{w}-1)/2$. This gives a series of ${H}_{t}$ values, ${H}_{({N}_{w}-1)/2},{H}_{({N}_{w}-1)/2+1},\mathrm{\dots},{H}_{n-({N}_{w}-1)/2}$, which correspond to the positions ${x}_{({N}_{w}-1)/2},{x}_{({N}_{w}-1)/2+1},\mathrm{\dots},{x}_{n-({N}_{w}-1)/2}$. Then, the values ${H}_{t}$ can be segmented into consecutive points of persistence (${H}_{t}>0.55$) and antipersistence (${H}_{t}<0.45$). The bounding values, 0.55 and 0.45, were used since the mean error of the DLFNN estimation method was ${\sigma}_{H}\sim 0.05$. Any segment shorter than ${N}_{w}$ was discarded as a precaution against spurious detection.
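The thresholding step can be sketched as follows, assuming the series of local estimates ${H}_{t}$ has already been produced by the network. The function names are hypothetical, and the indices returned refer to the ${H}_{t}$ series; adding $({N}_{w}-1)/2$ converts them to trajectory indices.

```python
from itertools import groupby


def classify(h, low=0.45, high=0.55):
    """Label one Hurst estimate; values between the bounds are left unlabeled."""
    if h > high:
        return "persistent"
    if h < low:
        return "antipersistent"
    return None


def segment_hurst(h_series, window, low=0.45, high=0.55):
    """Return (state, start, end) runs of equal labels, with end exclusive.

    Runs shorter than the window length are discarded, as are unlabeled runs.
    """
    labels = [classify(h, low, high) for h in h_series]
    segments = []
    index = 0
    for state, run in groupby(labels):
        length = len(list(run))
        if state is not None and length >= window:
            segments.append((state, index, index + length))
        index += length
    return segments


# Illustrative values only, with a short window so the example stays small
h_values = [0.7, 0.8, 0.6, 0.5, 0.3, 0.2, 0.4, 0.7]
segments = segment_hurst(h_values, window=3)
```

In this example the final persistent point forms a run of length one and is dropped, which mirrors the precaution against spurious detection.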
Directional segmentation of persistent segments
Once segments of persistence and antipersistence were defined, we measured the displacement, time and velocity of these segments, shown in the bottom row of Figure 7 and Table 1. The persistent segments were filtered to be only from trajectories that travelled over 0.5 µm, contained more points than the window size, and switched behaviour more than twice in the trajectory. In addition, we assessed whether persistent segments were anterograde or retrograde in direction. In order to do this, the centrosomal region was defined by the user as the point where the lysosomes, Rab5 and SNX1 organelles were the largest, brightest, or the most clustered. Image contrast enhancements, such as histogram equalization, were used to locate the centrosomes. By locating the centrosomal region and the cell boundary from user input, the persistent segments can then be classified as anterograde or retrograde. This was done by finding the cosine of the angle, $\mathrm{cos}(\theta )$, between the vector, ${\overrightarrow{r}}_{0,i}$, from the centrosome to the current particle position ${x}_{i}$ and the vector, ${\overrightarrow{r}}_{i,i+1}$, from the current particle position to the next particle position ${x}_{i+1}$. The exact formula is $\mathrm{cos}(\theta )={\overrightarrow{r}}_{0,i}\cdot {\overrightarrow{r}}_{i,i+1}/\left(|{\overrightarrow{r}}_{0,i}||{\overrightarrow{r}}_{i,i+1}|\right)$. Using windows in a similar fashion as for determining persistent and antipersistent segments, $\mathrm{cos}({\theta}_{i})$ corresponding to position ${x}_{i}$ was found for the points within a persistent segment. If $\mathrm{cos}({\theta}_{i})>{\sigma}_{\mathrm{cos}(\theta )}$, then the motion was deemed to be anterograde and if $\mathrm{cos}({\theta}_{i})<-{\sigma}_{\mathrm{cos}(\theta )}$, retrograde. Sweeping through the points ${x}_{i}$, consecutive retrograde or anterograde points formed segments from the persistent segments. A threshold of ${\sigma}_{\mathrm{cos}(\theta )}=0.3$ was used.
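The directional test for a single step can be sketched as below. This is a simplified illustration with hypothetical function names: it works on two-dimensional coordinates, omits the windowing over neighbouring points, and assumes that steps with $|\mathrm{cos}(\theta )|$ at or below the threshold are left unclassified.

```python
import math


def direction_cosine(centrosome, current, nxt):
    """Cosine of the angle between centrosome->current and current->next."""
    r0 = (current[0] - centrosome[0], current[1] - centrosome[1])
    step = (nxt[0] - current[0], nxt[1] - current[1])
    norm = math.hypot(*r0) * math.hypot(*step)
    if norm == 0.0:
        # Undefined angle (particle at the centrosome or no movement)
        return 0.0
    return (r0[0] * step[0] + r0[1] * step[1]) / norm


def classify_direction(cos_theta, threshold=0.3):
    """Anterograde if moving away from the centrosome, retrograde if towards it."""
    if cos_theta > threshold:
        return "anterograde"
    if cos_theta < -threshold:
        return "retrograde"
    return "undetermined"


# Illustrative coordinates only, with the centrosome at the origin
outward = classify_direction(direction_cosine((0, 0), (1, 0), (2, 0)))
inward = classify_direction(direction_cosine((0, 0), (1, 0), (0, 0)))
sideways = classify_direction(direction_cosine((0, 0), (1, 0), (1, 1)))
```

A step perpendicular to the radial vector gives a cosine of zero and falls between the two thresholds, so it contributes to neither an anterograde nor a retrograde segment.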
Cell lines
The MRC5 SV1 TG1 lung fibroblast cell line was purchased from ECACC. MRC5 cell lines stably expressing GFP-Rab5C and GFP-SNX1 were kindly provided by Drs. Guy Pearson and Evan Reid (Cambridge Institute for Medical Research, University of Cambridge). The GFP-SNX1 cell line has been previously described in Allison et al. (2017). Cell lines were routinely tested for mycoplasma infection. To generate the MRC5 GFP-Rab5C stable cell line, GFP-Rab5C was amplified by PCR from pIRES GFP-Rab5C (Seaman, 2004) using ‘Hpa1 GFP Forward’ (TAGGGAGTTAACATGGTGAGCAAGGGCGAGGA) and ‘Not1 Rab5C Reverse’ (ATCCCTGCGGCCGCTCAGTTGCTGCAGCACTGGC) oligonucleotide primers. The GFP-Rab5C PCR product and a pLXININeoR plasmid were digested using Hpa1 (New England Biolabs, R0105) and Not1 (New England Biolabs, R3189) restriction enzymes. The GFP-Rab5C PCR product was then ligated into the digested pLXININeoR using T4 DNA Ligase (New England Biolabs, M0202). The ligated plasmid was amplified in bacteria selected with ampicillin and verified using Sanger sequencing. To generate the GFP-Rab5C MRC5 cell line, Phoenix retrovirus producer HEK293T cells were transfected with the pLXINGFPRab5CINeoR plasmid to generate retrovirus containing GFP-Rab5C. MRC5 cells were inoculated with the virus, and successfully transduced cells were selected using 200 µg/mL Geneticin (G418; Sigma-Aldrich, G1397). Cells used for imaging were not clonally selected.
Live-imaging and tracking
Stably expressing MRC5 cells were co-stained with LysoBrite Red (AAT Bioquest), imaged live using fluorescence microscopy and tracked with NNT (aitracker.net; Newby et al., 2018). The cells were grown in MEM (Sigma Life Science) and 10% FBS (HyClone) and incubated for 48 hr at 37 °C in 5% CO$_{2}$ on 35 mm glass-bottomed dishes (µ-Dish, Ibidi, Cat. No. 81150). For LysoBrite staining, LysoBrite was diluted 1 in 500 with Hank’s Balanced Salt solution (Sigma Life Science). Then 0.5 mL of this solution was added to cells on a 35 mm dish containing 2 mL of growing media and incubated at 37 °C for at least 1 hr. Cells were then washed with sterile PBS and the media replaced with growing media.
After at least 6 hr incubation, the growing media was replaced with live-imaging media composed of Hank’s Balanced Salt solution (Sigma Life Science, Cat. No. H8264) with added essential and nonessential amino acids, glutamine, penicillin/streptomycin, 25 mM HEPES (pH 7.0) and 10% FBS (HyClone). Live-cell imaging was performed on an inverted Olympus IX71 microscope with an Olympus 100×/1.35 oil PH3 objective. Samples were illuminated using an OptoLED (Cairn Research) light source with 470 nm and white LEDs. For GFP, a 470 nm LED and a Chroma ET470/40 excitation filter were used in combination with a Semrock FITC-3540C filter set. For LysoBrite Red, a white light LED and a Chroma ET573/35 filter were used with a dual-band GFP/mCherry dichroic and an mCherry emission filter (ET632/60). GFP-Rab5-labeled endosomes were imaged in a total of 65 cells from three independent experiments. GFP-SNX1-labeled endosomes were imaged in a total of 63 cells from four independent experiments. Lysosomes were imaged in separate experiments, with 71 cells imaged from three independent repeats. A stream of 20 ms exposures was collected with a Prime 95B sCMOS Camera (Photometrics) for 17 s using Metamorph software while the cells were kept at 37 °C (in atmospheric CO$_{2}$ levels). The endosomes and lysosomes in the videos were then tracked using automated tracking software (AITracker; Newby et al., 2018).
Confocal imaging
To compare the localization of SNX1 and Rab5, GFP-Rab5-MRC5 cells were grown on #1.5 coverslips and then fixed in 3% (w/v) formaldehyde in PBS for 20 min at room temperature (RT). Coverslips were washed twice in PBS, quenched in PBS with glycine, then permeabilized by incubation for 5 min in 0.1% Triton X-100. After another wash in PBS, coverslips were labeled with antibodies to SNX1 and Rab5 for 1 hr at RT, washed three times in PBS, then labeled with Alexa488-donkey anti-rabbit and Alexa594-donkey anti-mouse antibodies in 1 µg/mL DAPI in PBS for 30 min. After three PBS washes, coverslips were dipped in deionized water, air-dried and mounted on slides using Prolong Gold.
Images were collected on a Leica TCS SP8 AOBS inverted confocal using a 100x/1.40 NA PL apo objective. The confocal settings were as follows: pinhole, one airy unit; scan speed, 400 Hz unidirectional; format, 2048 × 2048. Images were collected using hybrid detectors (A488 and A594) or a PMT (DAPI) with the following detection mirror settings: Alexa488, 498–577 nm; Alexa594, 602–667 nm; DAPI, 420–466 nm. The SuperK Extreme supercontinuum white light laser was used for 488 nm (10.5%) and 594 nm (5%) excitation, and a 405 nm laser (5%) for DAPI. Images were collected sequentially to eliminate crosstalk between channels. When acquiring 3D optical stacks, the confocal software was used to determine the optimal number of Z sections. The data were deconvolved using Huygens software before generating maximum intensity projections of 3D stacks using FIJI.
Software and code
The code and documentation for determining the Hurst exponent can be found at https://github.com/dadanhan/hurstexp (copy archived at https://github.com/elifesciencespublications/hurstexp; Han, 2019) and a GUI is available at https://zenodo.org/record/3613843#.XkPf2Wj7SUl.
Appendix 1
Comparison of HMM model against fBm model segmentation of experimental trajectories
In order to compare the effectiveness of the neural network and hidden Markov models (HMM), qualitative plots were made of real trajectories and their respective comparisons. The models in the HMM analysis approach had a maximum of three different motion states. It is clear from comparing the endosome track segmented using DLFNN (Figure 3) with Appendix 1—figures 1–3 that segmentation using hidden Markov models is not suitable for endosome trajectories. By increasing the number of states within the models, the hidden Markov models could perhaps achieve results similar to those of the neural network, but this analysis becomes computationally expensive.
Appendix 2
Testing DLFNN accuracy for different diffusion coefficients
The DLFNN was compared to the MSD estimation method for simulated trajectories with different diffusion coefficients to ensure that the DLFNN estimation was not scale dependent.
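One reason to expect scale independence is that the network inputs are differenced and then divided by the range of the trajectory, so multiplying every position by a constant leaves the inputs unchanged. The sketch below checks this for a toy trajectory. The function name is hypothetical, and a constant rescaling is used here as a stand-in for changing the diffusion coefficient at fixed $H$, which is an assumption of this illustration.

```python
import math


def normalized_increments(x):
    """Increments of x divided by range(x) = max(x) - min(x)."""
    span = max(x) - min(x)
    return [(b - a) / span for a, b in zip(x, x[1:])]


# Illustrative trajectory and an arbitrary rescaling factor
track = [0.0, 0.5, 0.25, 1.0, 2.0]
scaled = [8.0 * v for v in track]

original_inputs = normalized_increments(track)
scaled_inputs = normalized_increments(scaled)

# The two input vectors agree element by element
same = all(
    math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
    for a, b in zip(original_inputs, scaled_inputs)
)
```

This argument only covers a global change of scale. It does not address measurement noise, whose relative size does change with the diffusion coefficient, which is why an empirical comparison against the MSD method remains useful.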
Appendix 3
Measuring the residence time and flight length probability density functions of persistent and antipersistent states
Classifying persistent and antipersistent states by Hurst exponent values $1/2<H<1$ and $0<H<1/2$ respectively, individual lysosome and endosome trajectories were segmented with a moving window of 15 points. Data were extracted from microscopy movies from three independent experiments. GFP-Rab5 endosomes from 65 cells, GFP-SNX1 endosomes from 63 cells and lysosomes from 71 different cells were tracked using AITracker (Newby et al., 2018). Then the trajectories were segmented into antipersistent ($0<H<1/2$) and persistent ($1/2<H<1$) segments using the Hurst exponent estimates from the DLFNN. The time duration and particle displacement of these segments were measured and then fitted to distributions. In this way, we could measure the stochastic switching between active and passive transport and the statistics of vesicle movement within these states.
Figure 4 shows the survival time probabilities $\mathrm{\Psi}(t)$ of different states of motion (persistent and antipersistent) in vesicle trajectories. $\mathrm{\Psi}(t)$ is the probability that the vesicle will still be in the same state of motion after time $t$ has elapsed. Figure 4 shows that the persistent and antipersistent states follow $\mathrm{\Psi}(t)={e}^{-\lambda t}{\left(\frac{{\tau}_{0}}{{\tau}_{0}+t}\right)}^{\mu}$. The survival time probabilities show that lysosomes and endosomes are far more likely to remain trapped in an antipersistent state than be persistently transported by motor proteins. While this is intuitively obvious in the context of cell biology, this analysis provides quantitative characterization of endosomal and lysosomal motility. Appendix 3—table 1 shows the parameters of fitting for Figure 4. Appendix 3—figure 1 shows the empirical probability density functions (PDF) of the particle displacements for different states of motion. Displacements of segments are fitted to Burr Type XII distributions, which in their standard form read
$f(x)=\frac{ck}{{x}_{0}}{\left(\frac{x}{{x}_{0}}\right)}^{c-1}{\left[1+{\left(\frac{x}{{x}_{0}}\right)}^{c}\right]}^{-(k+1)}$,
where $x$, ${x}_{0}$, $c$ and $k>0$.
As expected, this analysis demonstrates that both endosomes and lysosomes are far more likely to move large distances when they are in the persistent state. This reconciles how vesicles are able to move large distances even though they are more likely to stay in an antipersistent state for long periods of time. The large displacements in the persistent state compete with long durations spent in the antipersistent state.
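For readers who want to evaluate the fitted forms, the sketch below implements the tempered power-law survival function $\mathrm{\Psi}(t)={e}^{-\lambda t}{\left({\tau}_{0}/({\tau}_{0}+t)\right)}^{\mu}$ and a Burr Type XII density. The function names are hypothetical, the Burr density is assumed to take the standard three-parameter form with scale ${x}_{0}$ and shapes $c$ and $k$, and the parameter values are illustrative rather than the fitted values of the paper.

```python
import math


def survival(t, lam, tau0, mu):
    """Psi(t) = exp(-lam * t) * (tau0 / (tau0 + t)) ** mu."""
    return math.exp(-lam * t) * (tau0 / (tau0 + t)) ** mu


def burr12_pdf(x, x0, c, k):
    """Standard Burr Type XII density with scale x0 and shapes c, k (assumed form)."""
    z = x / x0
    return (c * k / x0) * z ** (c - 1) * (1.0 + z ** c) ** (-(k + 1))


def burr12_cdf(x, x0, c, k):
    """Cumulative distribution matching burr12_pdf."""
    return 1.0 - (1.0 + (x / x0) ** c) ** (-k)


# Illustrative parameters only
psi_start = survival(0.0, lam=0.5, tau0=1.0, mu=1.5)
psi_later = survival(2.0, lam=0.5, tau0=1.0, mu=1.5)
density = burr12_pdf(1.0, x0=1.0, c=2.0, k=3.0)
```

Setting $\lambda =0$ recovers a pure power-law tail, while a positive $\lambda$ truncates it exponentially at long times.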
Appendix 4
Calculating information criteria for GMM fittings
In order to determine the minimum number of components necessary to model the histograms of the Hurst exponent in Figure 5, the Akaike and Bayesian information criteria were computed.
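The two criteria can be computed from the maximized log-likelihood of each candidate mixture. The sketch below uses the textbook definitions, AIC $=2p-2\mathrm{ln}L$ and BIC $=p\mathrm{ln}n-2\mathrm{ln}L$, for a one-dimensional Gaussian mixture. The function names are hypothetical and the fitting step itself (for example by expectation maximization) is not shown.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gmm_log_likelihood(samples, weights, means, sigmas):
    """Log-likelihood of samples under a one-dimensional Gaussian mixture."""
    total = 0.0
    for x in samples:
        density = sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))
        total += math.log(density)
    return total


def gmm_parameter_count(components):
    """Free parameters in 1D: a mean and a variance per component, plus weights."""
    return 3 * components - 1


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Illustrative numbers only: a two-component fit with log-likelihood -100
p = gmm_parameter_count(2)
aic_value = aic(-100.0, p)
bic_value = bic(-100.0, p, n_samples=1)
```

The number of components with the lowest criterion value is preferred. BIC penalizes additional components more strongly than AIC once $\mathrm{ln}n>2$, so the two can disagree for large samples.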
Data availability
Supporting files are on GitHub and Zenodo.
References

Tensorflow: a system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation.
Defects in ER-endosome contacts impact lysosome function in hereditary spastic paraplegia. Journal of Cell Biology 216:1337–1355. https://doi.org/10.1083/jcb.201609033
Subdiffusion and anomalous local viscoelasticity in actin networks. Physical Review Letters 77:4470–4473. https://doi.org/10.1103/PhysRevLett.77.4470
Temporal analysis of active and passive transport in living cells. Physical Review Letters 101:248103. https://doi.org/10.1103/PhysRevLett.101.248103
Swarming bacteria migrate by Lévy walk. Nature Communications 6:1–6. https://doi.org/10.1038/ncomms9396
Multifractional processes with random exponent. Publicacions Matemàtiques 49:459–486. https://doi.org/10.5565/PUBLMAT_49205_11
Anomalous diffusion of proteins due to molecular crowding. Biophysical Journal 89:2960–2971. https://doi.org/10.1529/biophysj.104.051078
Strange kinetics of single molecules in living cells. Physics Today 65:29–35. https://doi.org/10.1063/PT.3.1677
Hook is an adapter that coordinates kinesin-3 and dynein cargo attachment on early endosomes. The Journal of Cell Biology 204:989–1007. https://doi.org/10.1083/jcb.201309022
Inverse problems of anomalous diffusion theory: An artificial neural network approach. Journal of Applied and Industrial Mathematics 10:311–321. https://doi.org/10.1134/S1990478916030017
Stochastic models of intracellular transport. Reviews of Modern Physics 85:135–196. https://doi.org/10.1103/RevModPhys.85.135
Objective comparison of particle tracking methods. Nature Methods 11:281–289. https://doi.org/10.1038/nmeth.2808
Phosphatidylinositol-3-OH kinases are Rab5 effectors. Nature Cell Biology 1:249–252. https://doi.org/10.1038/12075
Fractional Brownian motion in crowded fluids. Soft Matter 8:4886–4889. https://doi.org/10.1039/c2sm25220a
De novo structure prediction with deep-learning based scoring. Annual Review of Biochemistry 77:6.
The asymptotic distribution of the range of sums of independent random variables. The Annals of Mathematical Statistics 22:427–432. https://doi.org/10.1214/aoms/1177729589
Fractional Brownian motions: memory, diffusion velocity, and correlation functions. Journal of Physics A: Mathematical and Theoretical 50:054002. https://doi.org/10.1088/17518121/50/5/054002
A FAM21-containing WASH complex regulates retromer-dependent sorting. Developmental Cell 17:699–711. https://doi.org/10.1016/j.devcel.2009.09.009
The role of the cytoskeleton and molecular motors in endosomal dynamics. Seminars in Cell & Developmental Biology 31:20–29. https://doi.org/10.1016/j.semcdb.2014.04.011
Cryo-electron tomography reveals that dynactin recruits a team of dyneins for processive motility. Nature Structural & Molecular Biology 25:203–207. https://doi.org/10.1038/s4159401800277
Bidirectional cargo transport: moving beyond tug of war. Nature Reviews Molecular Cell Biology 15:615–628. https://doi.org/10.1038/nrm3853
Microtubule motors mediate endosomal sorting by maintaining functional domain organization. Journal of Cell Science 126:2493–2501. https://doi.org/10.1242/jcs.122317
Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116:770–799.
In vivo anomalous diffusion and weak ergodicity breaking of lipid granules. Physical Review Letters 106:048103. https://doi.org/10.1103/PhysRevLett.106.048103
First Steps in Random Walks: From Tools to Applications. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199234868.001.0001
Spectral content of a single non-Brownian trajectory. Physical Review X 9:011019. https://doi.org/10.1103/PhysRevX.9.011019
Self-similar Gaussian processes for modeling anomalous diffusion. Physical Review E 66:021114. https://doi.org/10.1103/PhysRevE.66.021114
The restaurant at the end of the random walk: recent developments in the description of anomalous transport by fractional dynamics. Journal of Physics A: Mathematical and General 37:R161–R208. https://doi.org/10.1088/03054470/37/31/R01
Inferring transient particle transport dynamics in live cells. Nature Methods 12:838–840. https://doi.org/10.1038/nmeth.3483
The enigmatic endosome: sorting the ins and outs of endocytic trafficking. Journal of Cell Science 131:jcs216499. https://doi.org/10.1242/jcs.216499
Multifractional Brownian Motion: Definition and Preliminary Results. Research Report RR2645, INRIA Projet FRACTALES.
Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons.
The cytoplasmic dynein transport machinery and its many cargoes. Nature Reviews Molecular Cell Biology 19:382–398. https://doi.org/10.1038/s4158001800043
Assembly and activation of dynein–dynactin by the cargo adaptor protein Hook3. Journal of Cell Biology 214:309–318. https://doi.org/10.1083/jcb.201604002
Cargo-selective endosomal sorting for retrieval to the Golgi requires retromer. Journal of Cell Biology 165:111–122. https://doi.org/10.1083/jcb.200312034
Actin-dependent endosomal receptor recycling. Current Opinion in Cell Biology 56:22–33. https://doi.org/10.1016/j.ceb.2018.08.006
Models of anomalous diffusion in crowded environments. Soft Matter 8:9043–9052. https://doi.org/10.1039/c2sm25701g
The Rab GTPase family. Genome Biology 2:reviews3007.1. https://doi.org/10.1186/gb200125reviews3007
Elucidating the origin of anomalous diffusion in crowded fluids. Physical Review Letters 103:038102. https://doi.org/10.1103/PhysRevLett.103.038102
Microrheology of complex fluids. Reports on Progress in Physics 68:685–742. https://doi.org/10.1088/00344885/68/3/R04
Advances in the microrheology of complex fluids. Reports on Progress in Physics 79:074601. https://doi.org/10.1088/00344885/79/7/074601
Anomalous subdiffusion is a measure for cytoplasmic crowding in living cells. Biophysical Journal 87:3518–3524. https://doi.org/10.1529/biophysj.104.044263
HookA is a novel dynein-early endosome linker critical for cargo movement in vivo. The Journal of Cell Biology 204:1009–1026. https://doi.org/10.1083/jcb.201308009
Decision letter

Robert H Singer, Reviewing Editor; Albert Einstein College of Medicine, United States

Aleksandra M Walczak, Senior Editor; École Normale Supérieure, France

Hyeyoon Park, Reviewer; Seoul National University, Republic of Korea
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
The authors develop a useful Deep Learning Neural Network to classify heterogeneous single-particle motion from live-cell imaging data. They apply their model to infer fractional Brownian motion from both simulated and cellular data, and show that it outperforms competing approaches such as MSD-based averaging and exponent inference, accurately predicting Hurst exponents in as few as 7 steps.
Decision letter after peer review:
Thank you for submitting your article "Deciphering anomalous heterogeneous intracellular transport with neural networks" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Aleksandra Walczak as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Hyeyoon Park (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
The authors develop a useful Deep Learning Neural Network to classify heterogeneous single-particle motion from live-cell imaging data. They apply their model to infer fractional Brownian motion from both simulated and cellular data, and show that it outperforms competing approaches such as MSD-based averaging and exponent inference, accurately predicting Hurst exponents in as few as 7 steps. Their article is well written and presented.
Summary:
All the reviewers see the merit in the approach, although there is some concern that it is not particularly novel and the approach may be biased.
Essential revisions:
The authors should either explain or illustrate how training their model solely on the class of models they seek to infer, rather than a broader set of models, may bias their inference procedure. For example, could they perform a test of this potential bias directly by training the neural network by including other models that are not fBm, and show how this impacts their results, particularly in the case of cellular datasets?
Limited comparison would be helpful for the cellular datasets in particular in order to illustrate to the reader how/why application of the fBm to infer single-particle trajectory data from living systems is useful for inferring molecular/other mechanism of dynamical molecular motion in living systems.
It would be helpful if the authors could elaborate a bit further on how their fBm and Hurst exponents are useful to infer motion mechanism.
The authors need to offer a bit more biological/cellular insight into their preceding findings on vesicular motion.
Reviewer #1:
The authors provide an algorithmic approach for evaluating cellular transport distinct from MSD-based approaches. The useful feature is that presumably fewer points are needed for predictive value. This is important since photobleaching is a major limitation in particle tracking; the more information that can be extracted from limited data, the better.
I feel this approach may be useful, but it is not clear how this compares to the current standard HMM analysis in a head to head matchup. Also, it seems a bit cumbersome so anything they can do to make it user friendly would be appreciated.
Reviewer #2:
The authors develop a useful Deep Learning Neural Network to classify heterogeneous single-particle motion from live-cell imaging data. They apply their model to infer fractional Brownian motion from both simulated and cellular data, and show that it outperforms competing approaches such as MSD-based averaging and exponent inference, accurately predicting Hurst exponents in as few as 7 steps. Their article is well written and presented.
Training the neural networks on a fractional Brownian motion (fBm) model alone must effectively "bias" the inference procedure towards this model, acting analogous to a prior in Bayesian inference. Can the authors either explain or illustrate how training their model solely on the class of models they seek to infer, rather than a broader set of models, may bias their inference procedure? For example, could they perform a test of this potential bias directly by training the neural network by including other models that are not fBm, and show how this impacts their results, particularly in the case of cellular datasets?
In this vein, the authors cite HMM-based models that infer diffusion and directed motion from single-particle trajectories, but they do not compare their procedure with these methods. Some limited comparison would be helpful for the cellular datasets in particular in order to illustrate to the reader how/why application of the fBm to infer single-particle trajectory data from living systems is useful for inferring molecular/other mechanism of dynamical molecular motion in living systems.
Related to this, it would also be helpful if the authors could elaborate a bit further on how their fBm and Hurst exponents are useful to infer motion mechanism, immediately before Conclusions, where the authors write:
"This implies that the vesicles may have a biological mechanism to prioritise certain interactions within the complex cytoplasm, similar to how human dynamics are often heavy tailed and bursty (Barabasi, 2005)."
Given the rather large difference between human dynamics and molecular, organelle, or vesicular dynamics, could the authors offer a bit more biological/cellular insight into their preceding findings on vesicular motion?
For a wide application of this analysis tool in biology, can the authors provide a directly executable GUI software?
Reviewer #3:
This paper describes an analysis tool for particle trajectory data. The authors used a deep learning feedforward neural network (DLFNN) to extract a stochastic Hurst exponent H(t) from a trajectory. They found that the neural network is a more sensitive method to characterize fractional Brownian motion (fBM) than previous statistical tools such as mean squared displacement (MSD), rescaled range, and sequential range methods. They applied this tool to analyze the trajectories of lysosomes and endosomes in live cells. The topic is interesting, but the novelty and impact of the work are not very clear. Also, the software is based on Python, which may limit the application of the tool by a wide range of researchers in cell biology. A user-friendly, standalone software would be more helpful.
1) As the authors mentioned in the Introduction, exponent estimation using neural networks has been already demonstrated. The authors claimed a novelty in that the local H(t) is used to segment single trajectories into persistent and antipersistent sections. However, the manuscript is lacking comparison with the existing methods using hidden Markov models and rolling windowed analysis. The authors also claimed that "fBM with a stochastic Hurst exponent is a new intracellular transport model". However, this section is rather brief, and some notations are not clearly defined. I think stronger impact and novelty are required for publication in eLife.
2) For a wide application of this analysis tool in biology, can the authors provide a directly executable GUI software?
3) It is unclear whether users can apply the pretrained model for a broad range of data with different time and spatial scales. Or do users have to train the neural network for their own data set? More detailed instruction for the training is required. In the subsection “DLFNN structure and training”, the last sentence needs more explanation.
4) In Figure 1E, it is counterintuitive that the error (σ_{H}) increases as the SNR increases. In subsection “The DLFNN is more accurate than established methods”, “Gaussian noise with increasing signal-to-noise ratio” needs to be revised. Please use a different term for this parameter or revise the figure with a commonly used definition of SNR.
5) It would be informative if the authors also show the Hurst exponent estimated from the TAMSD method in Figure 2.
6) Heavy-tailed distributions have been observed not only in human dynamics but also in many biological systems. Some of the relevant review and original articles are listed below.
Reynolds and Rhodes (2009)
Ariel et al. (2015)
Song et al. (2018)
Chen et al. (2015)
https://doi.org/10.7554/eLife.52224.sa1
Author response
The authors develop a useful Deep Learning Neural Network to classify heterogeneous single-particle motion from live-cell imaging data. They apply their model to infer fractional Brownian motion from both simulated and cellular data, and show that it outperforms competing approaches such as MSD-based averaging and exponent inference, accurately predicting Hurst exponents in as few as 7 steps. Their article is well written and presented.
Summary:
All the reviewers see the merit in the approach, although there is some concern that it is not particularly novel and the approach may be biased.
We acknowledge the reviewers' concerns about the novelty of and potential bias in our method. However, we believe that our specifically trained neural network allows us to illuminate the heterogeneous behavior in single experimental trajectories with great accuracy. To our knowledge, this has not been achieved before by other methods. Please see our responses to the essential revisions for further details.
Essential revisions:
1) The authors should either explain or illustrate how training their model solely on the class of models they seek to infer, rather than a broader set of models, may bias their inference procedure. For example, could they perform a test of this potential bias directly by training the neural network by including other models that are not fBm, and show how this impacts their results, particularly in the case of cellular datasets?
We fully agree that a broad set of possible anomalous transport models poses a challenge in selecting the optimal one, especially in deducing the origin of anomalous behaviour. However, we believe that training the neural network solely on fractional Brownian motion (fBm) is well justified for cellular datasets and especially for the short time scales in which we are seeking to classify motion. On the trajectory level, fBm behaves the same as the generalized Langevin equation (GLE) model, which is a good description of equilibrium and nonequilibrium behavior of a tracer particle undergoing anomalous diffusion. This supports our choice of the fBm model. Other anomalous transport models, such as scaled Brownian motion (sBm), subdiffusive continuous time random walks (CTRW) and superdiffusive Lévy walks (LW), have been shown to be good models for describing transport of intracellular vesicles on a mesoscopic scale. However, none of these models is suitable for direct segmentation and interpretation from positions of tracked particles in consecutive frames of experimental videos on scales smaller than single unidirectional runs.
On a microscopic level, sBm trajectories display normal Gaussian diffusion, CTRW trajectories are stationary and LW trajectories are ballistic. If we consider CTRW models, the microscopic dynamics at scales smaller than the waiting times would show the particle mostly stationary with instantaneous jumps to different locations, which is clearly unrealistic. For Lévy walk models on microscopic scales, the particle would be ballistic for the majority of the time except at the turning points in the trajectory. The ‘anomalous diffusiveness’ of such random walks is thus generated on the mesoscopic scale when considering the statistics of many jumps or runs. On the other hand, fBm has consistent statistical interpretations for all scales due to its self-similar properties. Furthermore, the effectiveness of the neural network in estimating the Hurst exponent is due to the correlation of stationary increments and self-similarity. For other models based on the statistics of multiple jumps, the neural network would have a similar efficiency of estimation to standard statistical methods such as mean-square displacements (MSDs).
Experimentally, there is a strong motivation for considering transport inside cells as fBm [1, 2]. Therefore, we find that fBm (or GLE) is the model that is most physical. Furthermore, fBm can model ‘both subdiffusion (0 < H < 1/2) and superdiffusion (1/2 < H < 1) … in a unified manner using only the Hurst exponent’ (Introduction).
In order to better address these concerns, we add the following to the Introduction:
‘fBm was chosen due to its self-similar properties that allow direct analysis at short time scales given by experimental systems; and the experimental evidence for fBm in the crowded cytoplasm [3, 1, 2]. Moreover, other anomalous diffusion models, such as scaled Brownian motion [4], subdiffusive continuous time random walks [5] and superdiffusive Lévy walks [6] are not suitable to interpret anomalous trajectories on the microscopic level.’
2) Limited comparison would be helpful for the cellular datasets in particular in order to illustrate to the reader how/why application of the fBm to infer single-particle trajectory data from living systems is useful for inferring molecular/other mechanism of dynamical molecular motion in living systems.
There are several other models for inferring vesicle motion in the cytoplasm, most notably the HMM-Bayes model introduced by Monnier et al., 2017. To address the concern about the usefulness of our method, we compare results from the HMM-Bayes package with our results. The HMM-Bayes model was applied to the same trajectory as shown in Figure 3. The result in the newly added Appendix 1 clearly shows that the HMM-Bayes model is less suitable for segmenting anomalous intracellular trajectories of endosomes and lysosomes since the active motility was not picked up properly. It is apparent that the fBm description with persistent and antipersistent movement produces more intuitive results than that of HMM-Bayes with multiple velocity and diffusive states. Quantitatively, our results for subdiffusion and superdiffusion segmentation show residence time probability density functions that are consistent with the truncated power-law distributed run times found in [Chen, Wang and Granick, 2015] (Figure 4), although they use a directionally persistent ecological method to detect single runs.
The comparisons are made in Appendix 1, a newly added section, and the following is added to the Introduction:
“are commonly used to segment local behaviour along single trajectories (see Appendix 1 for comparisons).”
Also, we would like to mention that windowed analysis using mean squared displacements is conducted in Appendix 2—figure 1 (bottom row), showing the superior performance of our new method.
For further response on the biological relevance, see the response to 3 below.
3) It would be helpful if the authors could elaborate a bit further on how their fBm and Hurst exponents are useful to infer motion mechanism. The authors need to offer a bit more biological/cellular insight into their preceding findings on vesicular motion.
In order to elaborate further on biological insights and how fBm and Hurst exponents are useful to infer the motion mechanism, we introduce a new experimental dataset for SNX1 positive endosomes made by Anna Gavrilova (who is now also included as an author) and we add new biologically relevant analysis to the manuscript. To reflect this increased biological content, we have replaced the section ‘Trajectory analysis using DLFNN shows regime switching’ with ‘DLFNN analysis reveals differences in motile behavior of organelles in the endocytic pathway’. We also now include an introduction to the endocytic pathway (Introduction).
We tracked GFP-SNX1 tagged endosomes and analyzed them in the same way as before to show a significant difference in the distribution of Hurst exponents (new data in Figure 4 and Figure 5). These results are intriguing, since both Rab5 and SNX1 are present on many of the same early endosomes, although in distinct domains (shown by confocal imaging in Figure 6—figure supplement 4). We now include example tracks and DLFNN analysis for a GFP-SNX1 endosome and a lysosome, for comparison (Figure 3—figure supplement 1 and Figure 3—figure supplement 2). We also include example videos for each organelle marker (Figure 6—figure supplement 1, Figure 6—figure supplement, Figure 6—figure supplement 3).
Importantly, we extend our analysis and biological relevance considerably by subdividing persistent segments of long runs into anterograde (outward, kinesin-driven) and retrograde transport (dynein-driven movement towards the cell center), as described in the Materials and methods section. These new results are presented in Figure 7 and Table 1. Altogether, this powerful combination of approaches reveals unexpected differences in behavior of the different endocytic compartments, even though they are using the same motor, dynein, for retrograde movement. Molecular mechanisms that might generate such distinct dynamics are discussed (subsection “DLFNN analysis reveals differences in motile behavior of organelles in the endocytic pathway”). The results suggest that the manner in which these vesicles move is highly correlated with the different functions of vesicles on the endocytic pathway. This implies that directionality and the correlation between consecutive steps are important measurements in addition to the displacement, velocity and duration of movement.
Reviewer #1:
The authors provide an algorithmic approach for evaluating cellular transport distinct from MSD based approaches. The useful feature is that presumably fewer points are needed for predictive value. This is important since photobleaching is a major limitation in particle tracking, the more information that can be extracted from limited data, the better.
1.1) I feel this approach may be useful, but it is not clear how this compares to the current standard HMM analysis in a head to head matchup.
Please see response 2 to the essential revisions.
1.2) Also, it seems a bit cumbersome so anything they can do to make it user friendly would be appreciated.
We had previously made a GUI for the initial submission but were unable to make it available on a public platform due to its size (∼1GB). However, after a long and tedious process of software engineering, we have been able to reduce the size of the GUI (∼100MB) and it is available online at https://zenodo.org/record/3613843#.XkPf2Wj7SUl. We have included this information in subsection “Software and code”:
“and a GUI is available on https://zenodo.org/record/3613843#.XkPf2Wj7SUl.”
Reviewer #2:
The authors develop a useful Deep Learning Neural Network to classify heterogeneous single-particle motion from live-cell imaging data. They apply their model to infer fractional Brownian motion from both simulated and cellular data, and show that it outperforms competing approaches such as MSD-based averaging and exponent inference, accurately predicting Hurst exponents in as few as 7 steps. Their article is well written and presented.
2.1) Training the neural networks on a fractional Brownian motion (fBm) model alone must effectively "bias" the inference procedure towards this model, acting analogous to a prior in Bayesian inference. Can the authors either explain or illustrate how training their model solely on the class of models they seek to infer, rather than a broader set of models, may bias their inference procedure? For example, could they perform a test of this potential bias directly by training the neural network by including other models that are not fBm, and show how this impacts their results, particularly in the case of cellular datasets?
Please see response 1 to the essential revisions.
2.2) In this vein, the authors cite HMM-based models that infer diffusion and directed motion from single-particle trajectories, but they do not compare their procedure with these methods. Some limited comparison would be helpful for the cellular datasets in particular in order to illustrate to the reader how/why application of the fBm to infer single-particle trajectory data from living systems is useful for inferring molecular/other mechanism of dynamical molecular motion in living systems.
Please see response 2 to the essential revisions.
2.3) Related to this, it would also be helpful if the authors could elaborate a bit further on how their fBm and Hurst exponents are useful to infer motion mechanism, immediately before Conclusions, where the authors write:
"This implies that the vesicles may have a biological mechanism to prioritise certain interactions within the complex cytoplasm, similar to how human dynamics are often heavy tailed and bursty Barabasi, (2005)."
Given the rather large difference between human dynamics and molecular, organelle, or vesicular dynamics, could the authors offer a bit more biological/cellular insight into their preceding findings on vesicular motion?
Please see response 3 to the essential revisions, where we further elaborate on the usefulness of fBm and Hurst exponents. The similarity between superdiffusive dynamics of vesicle transport and human dynamics is based on the origin of heavy-tailed dynamics: most tasks (performed by humans, or by chemical reactions in a biological/cellular context) are rapidly executed, whereas a few tasks (reactions which control directional switching of vesicles) experience very long waiting times due to complex signaling processes, which leads to a power-law distribution of unidirectional run lengths. A detailed elaboration of this idea is a work in progress.
2.4) For a wide application of this analysis tool in biology, can the authors provide a directly executable GUI software?
Please see response 1.2.
Reviewer #3:
This paper describes an analysis tool for particle trajectory data. The authors used a deep learning feedforward neural network (DLFNN) to extract a stochastic Hurst exponent H(t) from a trajectory. They found that the neural network is a more sensitive method to characterize fractional Brownian motion (fBM) than previous statistical tools such as mean squared displacement (MSD), rescaled range, and sequential range methods. They applied this tool to analyze the trajectories of lysosomes and endosomes in live cells. The topic is interesting, but the novelty and impact of the work are not very clear. Also, the software is based on Python, which may limit the application of the tool by a wide range of researchers in cell biology. A user-friendly, standalone software would be more helpful.
3.1) As the authors mentioned in the Introduction, exponent estimation using neural networks has been already demonstrated. The authors claimed a novelty in that the local H(t) is used to segment single trajectories into persistent and antipersistent sections. However, the manuscript is lacking comparison with the existing methods using hidden Markov models and rolling windowed analysis.
Please see response 2 to the essential revisions and response 3.2.
3.2) The authors also claimed that "fBM with a stochastic Hurst exponent is a new intracellular transport model". However, this section is rather brief, and some notations are not clearly defined. I think stronger impact and novelty are required for publication in eLife.
We thank reviewer 3 for this comment since it pushed us to develop further the mathematical aspects of the research. In fact, the theory of multifractional processes with a random exponent already exists [Ayache and Taqqu, 2005]. However, there were no clear applications of this theory in experiments since there were no methods to measure the changing exponents for such a small number of points. To the authors’ knowledge, this paper is the first time that a changing Hurst exponent has been detected, measured and applied in a way that is biologically meaningful. This became possible with the increasing capabilities of the neural network as an estimator.
Although estimating the anomalous exponent with a neural network was already demonstrated in [Bondarenko, Bugueva, and Dedok, 2016], our work presents a far more detailed evaluation of the neural network estimation for very short trajectories as well as demonstrating a biological application. To the authors’ knowledge, this is also the first time where the Hurst exponent estimation of the neural network has been characterized to such an extent, with comparisons to other statistical methods. Furthermore, we believe that this is also the first time that the Hurst exponent has been used to segment experimental trajectories, generating useful biological information. We have also shown that the Hurst exponent displays values of H > 0.5 when the windowed velocity of the endosome or lysosome is much greater than zero (displaying correlated directional motion) and H < 0.5 when the windowed velocity is close to zero, confirming that the Hurst exponent is a consistent measure for our experimental data.
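The segmentation rule described above (H > 0.5 persistent, H < 0.5 antipersistent) can be sketched as follows. This is only an illustration under stated assumptions, not the published implementation: the function name `segment_by_hurst` is hypothetical, the input is assumed to be a series of local Hurst estimates that has already been computed, and a value of exactly 0.5 is assigned to the antipersistent state by convention.

```python
import numpy as np


def segment_by_hurst(h_values, threshold=0.5):
    """Split a series of local Hurst estimates into contiguous segments.

    Returns a list of (start, stop, label) tuples with stop exclusive.
    """
    h = np.asarray(h_values, dtype=float)
    if h.size == 0:
        return []
    persistent = h > threshold
    # Indices at which the state differs from the previous point
    change = np.flatnonzero(persistent[1:] != persistent[:-1]) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [h.size]))
    return [
        (int(a), int(b), "persistent" if persistent[a] else "antipersistent")
        for a, b in zip(starts, stops)
    ]


segments = segment_by_hurst([0.7, 0.8, 0.3, 0.2, 0.6])
```

The lengths of the resulting segments, multiplied by the frame interval, give residence times in each state.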
In order to clarify the impact and the novelty of our work we change the last sentence of the Abstract to:
“By using this analysis, fractional Brownian motion with a stochastic Hurst exponent was used to interpret, for the first time, anomalous intracellular dynamics, revealing unexpected differences in behavior between closely related endocytic organelles.”
Also, we rewrite the subsection “fBm with a stochastic Hurst exponent is a new possible intracellular transport model”:
“In our case, H(t) is itself a stochastic process and such a process has been considered theoretically [9]. This is the first application of such a theory to intracellular transport and opens a new avenue for characterizing vesicular movement. Furthermore, Figure 3 shows that the motion of a vesicle, B_{H}(t), exhibits regime switching behaviour between persistent and antipersistent states.”
Finally, we add the following to the end of the Conclusion:
“Finally, in addition to providing a new segmentation method of active and passive transport, this new technique distinguishes the difference in motility between lysosomes, Rab5 positive endosomes and SNX1 positive endosomes. The results suggest that the manner in which these vesicles move is highly linked with which endocytic pathway they are associated with especially when the motion is antipersistent. This implies that directionality and the correlation between consecutive steps is important to measure in addition to the displacement, velocity and duration of movement. We hope that this type of analysis will allow discoveries in particle motility of a more refined nature and make applying anomalous transport theory more accessible to researchers in a wide variety of disciplines.”
3.3) For a wide application of this analysis tool in biology, can the authors provide a directly executable GUI software?
Please see response 1.2.
3.4) It is unclear whether users can apply the pretrained model for a broad range of data with different time and spatial scales. Or do users have to train the neural network for their own data set?
Since the model requires differenced and normalized input values, we anticipate that it should be applicable to a wide range of data sets with no further training required. In order to clarify this, we add the following at the end of the subsection “DLFNN structure and training”:
“Since the model requires differenced and normalized input values, in theory, it should be applicable to a wide range of datasets. However, further testing must be done in order to confirm this.”
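To show why differenced and normalized inputs should not depend on the spatial scale of the data, here is a minimal sketch. The exact normalization used by the DLFNN is not restated in this response, so scaling by the standard deviation of the increments is an assumption made for illustration, and `prepare_window` is a hypothetical name.

```python
import numpy as np


def prepare_window(positions):
    """Difference a window of positions and rescale the increments.

    Assumption: increments are divided by their standard deviation.
    """
    x = np.asarray(positions, dtype=float)
    increments = np.diff(x)      # removes any constant offset
    scale = np.std(increments)
    if scale == 0:
        raise ValueError("window has constant increments and cannot be normalized")
    return increments / scale    # removes the unit of length


rng = np.random.default_rng(1)
track = np.cumsum(rng.standard_normal(20))
features = prepare_window(track)
# The same track shifted and expressed in different units gives the same input
features_rescaled = prepare_window(1000.0 * track + 5.0)
```

Under this assumption, two recordings of the same motion in different length units produce identical inputs, which is why no retraining is anticipated; as the added text notes, this still needs to be confirmed by further testing.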
3.5) More detailed instruction for the training is required. In the subsection “DLFNN structure and training”, the last sentence needs more explanation.
We have clarified the training details in subsection “DLFNN structure and training”, where we specify the optimizer and the activation functions used for neural network training.
3.6) In Figure 1E, it is counterintuitive that the error (σ_{H}) increases as the SNR increases. In subsection “The DLFNN is more accurate than established methods”, ‘Gaussian noise with increasing signal-to-noise ratio’ needs to be revised. Please use a different term for this parameter or revise the figure with a commonly used definition of SNR.
We thank the reviewer for pointing this out. Due to rearrangement of the figure, 1E is now 1H. The x-axis label of Figure 1H has been revised to NoiseSignal and the text in subsection “The DLFNN is more accurate than established methods” has also been revised to ‘Figure 1H shows how the exponent estimation error increases when Gaussian noise with varying strength compared to the original signal is added to the fBm trajectories.’
3.7) It would be informative if the authors also show the Hurst exponent estimated from the TAMSD method in Figure 2.
We would like to draw the reviewer’s attention to Appendix 2—figure 1 where we have used a rolling window with TAMSD method to estimate the anomalous exponent.
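For readers unfamiliar with the windowed TAMSD estimate, a minimal sketch is given below. It is not the code behind Appendix 2—figure 1: the window length, the number of lags and the function names are illustrative assumptions. The anomalous exponent α is taken as the slope of log TAMSD against log lag, and for fBm α = 2H.

```python
import numpy as np


def tamsd(x, max_lag):
    """Time-averaged mean squared displacement of a 1D track for lags 1..max_lag."""
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2)
                     for lag in range(1, max_lag + 1)])


def tamsd_exponent(x, max_lag=4):
    """Anomalous exponent from a log-log linear fit of the TAMSD.

    Note: a completely stationary window gives TAMSD = 0 and cannot be fitted.
    """
    lags = np.arange(1, max_lag + 1)
    slope, _intercept = np.polyfit(np.log(lags), np.log(tamsd(x, max_lag)), 1)
    return slope


def rolling_exponent(x, window, max_lag=4):
    """Exponent estimated in every window of the given length along the track."""
    x = np.asarray(x, dtype=float)
    return np.array([tamsd_exponent(x[i:i + window], max_lag)
                     for i in range(len(x) - window + 1)])


# A purely ballistic track should give an exponent of 2 in every window
ballistic = np.arange(50, dtype=float)
alphas = rolling_exponent(ballistic, window=15)
```

With short windows only a few lags are available for the fit, which is one reason such estimates become noisy for very short segments.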
3.8) Heavy-tailed distributions have been observed not only in human dynamics but also in many biological systems. Some of the relevant review and original articles are listed below.
Reynolds and Rhodes, (2009)
Ariel, et al., (2015)
Song et al., (2018)
Chen et al., (2015)
We thank reviewer 3 for pointing this out. We have added to subsection “fBm with a stochastic Hurst exponent is a new possible intracellular transport model”:
“ecological searching patterns [11], swarming bacteria [12] and…” and the Conclusion:
“…varying biological processes similar to ecological searching patterns [11], swarming bacteria [12] and…”
https://doi.org/10.7554/eLife.52224.sa2
Article and author information
Author details
Funding
Wellcome Trust (215189/Z/19/Z)
 Daniel Han
EPSRC (EP/J019526/1)
 Nickolay Korabel
 Victoria J Allan
 Sergei Fedotov
 Thomas A Waigh
BBSRC (BB/H017828/1)
 Victoria J Allan
Wellcome Trust (108867/Z/15/Z)
 Anna Gavrilova
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The authors thank: Dr. Jay Newby and Zach Richardson (UNC Chapel Hill) for assistance in using the automated tracking system (AITracker); Prof. Philip Woodman, Prof. Martin Lowe, Prof. Matthias Weiss (Universität Bayreuth), Dr. Henry Cox, Dr. Jack Hart, Rebecca Yarwood and Hannah Perkins for discussions; and Dr. Guy Pearson and Dr. Evan Reid (University of Cambridge) for providing the GFP-Rab5 and GFP-SNX1 MRC5 cell lines. DH acknowledges financial support from the Wellcome Trust Grant No. 215189/Z/19/Z. SF, NK, TAW, and VJA acknowledge financial support from EPSRC Grant No. EP/J019526/1. VJA acknowledges support from the Biotechnology and Biological Sciences Research Council grant number BB/H017828/1. AG acknowledges financial support from the Wellcome Trust Grant No. 108867/Z/15/Z. The Leica SP8 microscope used in this study was purchased by the University of Manchester Strategic Fund. Special thanks go to Dr. Peter March for his help with the confocal imaging.
Senior Editor
 Aleksandra M Walczak, École Normale Supérieure, France
Reviewing Editor
 Robert H Singer, Albert Einstein College of Medicine, United States
Reviewer
 Hyeyoon Park, Seoul National university, Republic of Korea
Version history
 Received: September 26, 2019
 Accepted: March 22, 2020
 Accepted Manuscript published: March 24, 2020 (version 1)
 Version of Record published: April 8, 2020 (version 2)
 Version of Record updated: May 26, 2020 (version 3)
Copyright
© 2020, Han et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.