Recovering mixtures of fast-diffusing states from short single-particle trajectories

Abstract

Single-particle tracking (SPT) directly measures the dynamics of proteins in living cells and is a powerful tool to dissect molecular mechanisms of cellular regulation. Interpretation of SPT with fast-diffusing proteins in mammalian cells, however, is complicated by technical limitations imposed by fast image acquisition. These limitations include short trajectory length due to photobleaching and shallow depth of field, high localization error due to the low photon budget imposed by short integration times, and cell-to-cell variability. To address these issues, we investigated methods inspired by Bayesian nonparametrics to infer distributions of state parameters from SPT data with short trajectories, variable localization precision, and absence of prior knowledge about the number of underlying states. We discuss the advantages and disadvantages of these approaches relative to other frameworks for SPT analysis.

Editor's evaluation

This paper will be of interest to cellular biologists who perform single-particle tracking experiments and develop new tracking methodologies. The authors investigate a new way of estimating an unknown number of diffusion states from short single-molecule trajectories. Ideas developed in the paper are likely to be used for further algorithm development. The authors give users access to a GitHub repository containing comprehensive code that supports the paper.

https://doi.org/10.7554/eLife.70169.sa0

Introduction

Biological processes are driven by interactions between molecules. To understand the role of a molecular species in a process, a central challenge is to measure subpopulations of the molecule engaged in distinct interactions without perturbing the living system. Some interactions – such as complex formation – cause changes in a molecule’s mobility. As a result, live-cell single-particle tracking (SPT), by separately observing the motion of individual molecules, is a promising tool to meet this challenge (Shen et al., 2017).

While SPT originally targeted proteins on cellular membranes, advances in the past two decades led to intracellular applications (Barak and Webb, 1982, Ghosh and Webb, 1994, Kubitscheck et al., 2000, Goulian and Simon, 2000). These include the use of stochastic labeling to isolate a single emitter’s path (Manley et al., 2008), a principle that can be extended into intracellular settings with genetically encoded photoconvertible proteins (Ando et al., 2002, Wiedenmann et al., 2004) or cell-permeable dyes (Grimm et al., 2015, Grimm et al., 2016). Another advance is pulsed or ‘stroboscopic’ excitation, which reduces blur associated with fast-diffusing emitters (Elf et al., 2007). Together with modifications of TIRF microscopes (Tokunaga et al., 2008), these techniques have facilitated the application of SPT to intracellular settings with fast-moving subpopulations (English et al., 2011, Persson et al., 2013, Izeddin et al., 2014, Normanno et al., 2015, Hansen et al., 2017). Following Manley et al., 2008, we refer to this experiment as ‘sptPALM’ (Figure 1A, Video 1).

Figure 1

Overview of sptPALM.

(A) Schematic of experimental setup. An inclined illumination source is used in combination with a high-numerical aperture (NA) objective to resolve molecules in a thin slice in a cell. The excitation laser is pulsed to limit motion blur. Tracking yields a set of short trajectories (mean track length 3–5 frames). Trajectories shown are from a tracking movie acquired at 7.48 ms frame intervals with retinoic acid receptor α-HaloTag (RARA-HaloTag) labeled with photoactivatable JF549 in U2OS nuclei. Asterisks in the movie frames mark particles at the edge of the focus. (B) Schematic of our inference problem. Each trajectory’s state is assumed to be a random draw from a distribution of state parameters. The goal is to recover this distribution from the observed trajectories. (C) Effects of particle mobility on trajectory length. RARA-HaloTag trajectories from U2OS nuclei were binned into five groups based on their mean squared displacement (MSD). Individual data points are the mean trajectory length of each group for three distinct knock-in clones of RARA-HaloTag (c156: 36961 trajectories, c239: 27543 trajectories, c258: 60347 trajectories); bar heights are the means across clones.

Video 1

Example of sptPALM data.

NPM1-HaloTag in U2OS osteosarcoma nuclei was labeled with 100 nM PA-JFX549-HTL for 5 min followed by washes (‘Materials and methods’), then imaged with a HiLo setup at 7.48 ms frame intervals with 1.5 ms excitation pulses. The pixel size after accounting for magnification is 160 nm. Dots and lines indicate the output of the detection and tracking algorithm; each trajectory has been given a distinct color.

sptPALM experiments on fast-moving emitters in 3D settings pose several challenges for analysis (Hansen et al., 2018). First, apparent motion in sptPALM reflects both the true motion of the emitter and error associated with the estimate for its position (‘localization error’) (Martin et al., 2002, Matsuoka et al., 2009). Like fixed cell PALM and STORM microscopies (Betzig et al., 2006, Rust et al., 2006), the magnitude of localization error in sptPALM depends on the number of photons collected from each emitter (Thompson et al., 2002). But unlike fixed cell microscopies, sptPALM has another component of error due to motion blur, the convolution of the microscope’s point spread function with the path of the emitter. This component of error is not trivial: the mean 2D displacement of a Brownian particle with diffusion coefficient 10 μm² s^-1 during a 1 ms integration is ~180 nm, substantially larger than typical localization error in fixed cell PALM/STORM (Figure 1—figure supplement 1B). Consequently, localization error in sptPALM depends on both the emitter’s mobility and its distance from the focus and is not simple to measure (Kubitscheck et al., 2000, Berglund, 2010, Michalet and Berglund, 2012). Pulsed excitation can be used to reduce motion blur (Elf et al., 2007), but because the laser pulse still has nonzero duration (usually ≥1 ms), motion blur remains an important part of the measurement (Deschout et al., 2012, Lindén et al., 2017).
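The ~180 nm figure quoted above follows from the mean radial displacement of 2D Brownian motion, $\sqrt{\pi D t}$. The short check below is a sketch that assumes isotropic diffusion in the image plane; the function name is illustrative and not part of any package.

```python
import math

def mean_2d_displacement_um(diff_coef_um2_s: float, time_s: float) -> float:
    """Mean radial displacement of 2D Brownian motion, sqrt(pi * D * t).

    The radial displacement is Rayleigh distributed with scale
    sqrt(2 * D * t), whose mean is scale * sqrt(pi / 2).
    """
    return math.sqrt(math.pi * diff_coef_um2_s * time_s)

# Values quoted in the text: D = 10 um^2/s and a 1 ms integration.
blur_nm = 1000.0 * mean_2d_displacement_um(10.0, 1e-3)
print(f"mean displacement during integration: {blur_nm:.0f} nm")
```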

Second, the high numerical aperture (NA) objectives required to resolve single emitters induce short depths of field, typically less than a micron. Whereas bacteria such as Escherichia coli are often small enough to fit into the resulting focal volume, mammalian cells – with depths ≥5–10 μm – cannot. As a result, intracellular SPT experiments only capture short transits of emitters through the focal volume, a behavior termed defocalization (Figure 1C, Video 2, Video 3; Kues and Kubitscheck, 2002, Mazza et al., 2012, Hansen et al., 2018). The duration of each transit depends on the emitter’s mobility. This creates a sampling problem: slow particles with long residences inside the focal volume contribute a few long trajectories, while fast particles with short residences contribute many short trajectories. Mean trajectory length is often as little as 3–4 frames, severely limiting the ability to infer dynamic parameters (such as diffusion coefficient) from any single trajectory. Fast multifocal imaging may mitigate this problem (Abrahamsson et al., 2013), but such methods currently require higher photon budgets and are not yet applicable to fast-diffusing targets with high motion blur. Meanwhile, the use of cylindrical optics to encode axial position in PSF astigmatism (Kao and Verkman, 1994), while popular in fixed cell PALM/STORM, is complicated in sptPALM by its resemblance to motion blur.
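The sampling problem can be illustrated with a toy simulation of how long a particle stays in focus. This is a deliberately simplified sketch, not the simulation used in this work: it assumes free diffusion along the axial direction, particles starting uniformly inside the focal slab, no photobleaching, and no nuclear boundary. The 0.7 μm focal depth and 7.48 ms frame interval are taken from the figure legends; all other names and values are illustrative.

```python
import random

def mean_frames_in_focus(diff_coef, n_particles=2000, frame_interval=0.00748,
                         focal_depth=0.7, max_frames=200, seed=0):
    """Mean number of consecutive frames a Brownian particle stays in focus.

    Simplifying assumptions: particles start uniformly inside the focal
    slab, diffuse freely along z, never bleach, and are lost the first
    time they leave the slab. Lengths are capped at max_frames.
    """
    rng = random.Random(seed)
    half_depth = focal_depth / 2.0
    step_sd = (2.0 * diff_coef * frame_interval) ** 0.5  # per-axis jump s.d. (um)
    total = 0
    for _ in range(n_particles):
        z = rng.uniform(-half_depth, half_depth)
        n_frames = 1  # the particle is observed in its first frame
        while n_frames < max_frames:
            z += rng.gauss(0.0, step_sd)
            if abs(z) > half_depth:
                break
            n_frames += 1
        total += n_frames
    return total / n_particles

slow = mean_frames_in_focus(0.1)   # D = 0.1 um^2/s
fast = mean_frames_in_focus(10.0)  # D = 10 um^2/s
print(f"slow: {slow:.1f} frames in focus; fast: {fast:.1f} frames in focus")
```

Under these assumptions the fast state leaves the slab within a frame or two while the slow state persists far longer, so equal numbers of molecules in each state yield very different numbers and lengths of trajectories.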

Video 2


Video 3


Third, the true number of dynamic subpopulations or ‘states’ for a protein of interest is usually unknown a priori. Proteins often participate in many complexes with distinct dynamics. Model-dependent analyses that assume a fixed number of states (Mazza et al., 2012, Hansen et al., 2017, Hansen et al., 2018), while powerful when combined with complementary measurements (Izeddin et al., 2014, Hansen et al., 2020), are limited to measuring coefficients of known models. To compound model complexity, a protein may behave differently in distinct subcellular environments. Indeed, although sptPALM directly observes the spatial context for each trajectory (Xiang et al., 2020), analyses such as jump distribution modeling often discard this information by aggregating jumps across all subcellular locations.

The central problem for sptPALM analysis is to recover the underlying dynamic states for a protein of interest given a set of observed trajectories in the presence of these three challenges.

A common approach to recover subpopulations from sptPALM is to construct histograms of the mean squared displacement (MSD), the maximum likelihood estimator for the diffusion coefficient in the absence of localization error. The MSD is highly variable for short trajectories and, when used to estimate diffusion coefficient, becomes especially error-prone when the variance of localization error is unknown (Michalet and Berglund, 2012). More problematically, MSD histograms assume that sampling from slow and fast states with equal occupation produces the same number of trajectories, which leads to severe state biases in the presence of defocalization (Mazza et al., 2012, Hansen et al., 2018). Common preprocessing steps to select for long trajectories compound the problem by introducing biases for slow emitters that remain in focus.

Methods based on least-squares fitting of the jump length cumulative distribution function (CDF) have interpreted sptPALM data with two- and three-state models while accounting for defocalization (Mazza et al., 2012, Hansen et al., 2018), but extend poorly to more complex models due to overfitting and do not provide a way to select between competing models.

A different approach to model selection is represented by vbSPT, a variational Bayesian framework for reaction-diffusion models (Persson et al., 2013). vbSPT relies on the evidence lower bound to identify the number of states, and it excels at recovering occupations and transition rates for a small number of diffusing states from short trajectories. However, it is not appropriate for situations where the target’s dynamic profile is not discrete, and it does not consider defocalization or localization error, although it can be complemented with a separate estimate of localization error (Lindén et al., 2017). As such, there is a need for methods that combine the advantages of Bayesian methods like vbSPT with a model that can accommodate nondiscrete dynamic profiles, while accounting for biases induced by sptPALM imaging geometry.

Here, we examine two alternative methods for recovering an sptPALM target’s dynamic profile. The first is based on a Dirichlet process mixture model (DPMM) and the second on a finite state approximation to the DPMM that we refer to as a state array (SA). Exploring these techniques on simulated and real datasets, we find that although both DPMMs and SAs recover complex mixtures of states and can be applied to nondiscrete distributions of diffusion coefficients, SAs far outperform DPMMs due to their robustness to variable localization error variance. Both methods share the limitation that they do not deal with transitions between states. We investigate how this limitation affects apparent state occupations recovered with these methods.

The SA method is publicly available as the pip-installable Python package saspt (source: https://github.com/alecheckert/saspt, Heckert, 2022c; documentation: https://saspt.readthedocs.io/en/latest/).

Results

Two approaches to infer subpopulations in sptPALM datasets

We considered how to infer dynamic subpopulations from the short, fragmented trajectories produced by sptPALM in a manner robust to the effects of localization error and defocalization (Figure 1).

A simple and popular approach to this problem is to make a separate estimate for the parameters of each trajectory, then compile a histogram of the results. In the case of Brownian motion, we refer to this method as the ‘MSD histogram’ approach since the MSD is the maximum likelihood estimator for the diffusion coefficient of a Brownian motion with no localization error.
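For concreteness, the per-trajectory estimate behind an MSD histogram can be sketched as follows. This is a minimal single-lag version for 2D trajectories; the function name and the example trajectory are hypothetical.

```python
def msd_diffusion_estimate(track, frame_interval):
    """Estimate D from one 2D trajectory as mean squared jump / (4 * dt).

    `track` is a list of (x, y) positions in um at consecutive frames.
    This is the maximum likelihood estimate only for pure Brownian motion
    with no localization error.
    """
    jumps = zip(track[:-1], track[1:])
    sq = [(x1 - x0) ** 2 + (y1 - y0) ** 2 for (x0, y0), (x1, y1) in jumps]
    return sum(sq) / (4.0 * frame_interval * len(sq))

# Hypothetical four-point trajectory (um) at 7.48 ms frame intervals.
track = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.2), (0.0, 0.2)]
d_hat = msd_diffusion_estimate(track, 0.00748)
print(f"estimated D: {d_hat:.3f} um^2/s")
```

With only three jumps, as in this example, the estimate is dominated by sampling noise, which is why histograms of such estimates are broad for short trajectories.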

Real estimates of a particle’s position, however, are invariably associated with localization error. In sptPALM, this problem is more significant due to motion blur, which increases the magnitude of the error (Figure 1—figure supplement 1). To incorporate these effects, we refer to the combination of regular Brownian motion with normally distributed, mean-zero localization error as ‘RBME’ (‘Materials and methods’). Each RBME is characterized by two parameters: the diffusion coefficient and the localization error variance. (For brevity, we refer to the latter simply as ‘localization error.’) Importantly, the increments of RBME are only Markovian when the localization error is zero (Martin et al., 2002; Figure 1—figure supplement 1).
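The loss of the Markov property can be seen directly in the covariance of sequential jumps. The sketch below assumes instantaneous exposures (no motion blur) and independent localization errors at each frame; under those assumptions neighboring jumps share one error term and are anticorrelated. The function name and parameter values are illustrative.

```python
def rbme_jump_covariance(n_jumps, diff_coef, loc_var, frame_interval):
    """Covariance matrix of sequential 1D jumps under RBME.

    Assumes no motion blur. Each observed jump is a true Brownian
    increment plus the difference of two independent localization
    errors, so neighboring jumps share one error term.
    """
    diag = 2.0 * diff_coef * frame_interval + 2.0 * loc_var
    cov = [[0.0] * n_jumps for _ in range(n_jumps)]
    for i in range(n_jumps):
        cov[i][i] = diag
        if i + 1 < n_jumps:
            cov[i][i + 1] = cov[i + 1][i] = -loc_var
    return cov

# Illustrative values: D = 1 um^2/s, localization error variance 0.03^2 um^2.
cov = rbme_jump_covariance(3, 1.0, 0.03 ** 2, 0.00748)
for row in cov:
    print(["%.5f" % value for value in row])
```

Setting `loc_var` to zero removes the off-diagonal terms, recovering the independent increments of pure Brownian motion.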

Because individual trajectories produced by sptPALM are usually too short to estimate localization error, and because it does not take into account other effects like defocalization, the MSD histogram approach is prone to large systematic biases (Michalet and Berglund, 2012, Hansen et al., 2018). While techniques exist to mitigate some biases of MSD fitting (Kepten et al., 2015), most are difficult to apply at the single trajectory level due to the small number of points per trajectory.

A distinct approach is represented by Bayesian finite state mixture models (Marin et al., 2005, McLachlan et al., 2019; Figure 2A, Figure 2—figure supplement 1A). Such models comprise a collection of states labeled $k = 1, \dots, K$. Each state is associated with an occupation $\tau_{k}$ (the probability of observing a trajectory from that state) and a vector of state parameters $\theta_{k}$ (describing the kind of trajectories produced by that state). Importantly, $\theta_{k}$ can also incorporate measurement parameters like the localization error. The probability of observing a particular trajectory $x$ is then $\sum_{k = 1}^{K} \tau_{k} p_{X} (x | \theta_{k})$, where $p_{X} (x | \theta_{k})$ is a distribution over trajectories produced by state $k$ and depends on the type of motion being considered. The goal is to infer $\tau_{k}$ and $\theta_{k}$ for each state given some observed set of trajectories $F$. Two challenges with such methods are choosing the number of states $K$ and the high computational cost when $p_{X} (x | \theta)$ is nonconjugate to the prior over $\theta$.

Figure 2

Schematic comparison of finite state mixtures, Dirichlet process mixtures, and state arrays (SAs).

(A) Finite state mixture models use a discrete set of $K$ states. Challenges include estimating $K$ and producing intelligible output when the underlying dynamic profile is not discrete. (B) Dirichlet process mixture models (DPMMs) address the problem of nondiscrete dynamic profiles by using a continuous distribution over state parameters. Inference routines are slow, so in this work we use approximate motion models. (C) SAs, a special case of the finite state mixture. SAs approximate DPMMs by using a discrete grid of state parameters and have a faster inference routine. Challenges with SAs include the choice of the parameter grid.

Potential solutions can be found in the Bayesian nonparametric class of methods. These approaches begin with a single model comprising a very large or infinite collection of states. A Bayesian inference algorithm is then used to prune away superfluous complexity, leaving a sparse subset of states sufficient to explain the observed trajectories. The foundational example is the DPMM (Ferguson, 1973), which has the distinct advantage of being able to approximate essentially any mixture of states, discrete or continuous (Neal, 1992, Teh, 2010; Figure 2B). Its disadvantage is the high computational cost associated with inference, which becomes especially severe when considering types of motion with multiple parameters (such as RBME) (Neal, 2000, Andrieu et al., 2003).
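One way to picture the 'very large or infinite collection of states' is the stick-breaking construction of the Dirichlet process prior, in which state weights are successive random fractions of what remains of a unit-length stick. The sketch below draws from a truncated version of that prior only; it performs no inference, and the concentration value and truncation level are arbitrary illustrative choices.

```python
import random

def stick_breaking_weights(concentration, n_states, seed=0):
    """Draw truncated stick-breaking weights for a Dirichlet process prior.

    Each weight is a Beta(1, concentration) fraction of the stick that
    remains after earlier breaks. Truncating at `n_states` leaves a small
    unassigned remainder.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(n_states):
        fraction = rng.betavariate(1.0, concentration)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights

weights = stick_breaking_weights(concentration=1.0, n_states=50)
largest = sorted(weights, reverse=True)[:5]
print("five largest weights:", ["%.3f" % w for w in largest])
```

Smaller concentration values place most of the mass on a handful of states, which is the sense in which the prior favors sparse explanations.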

We considered two responses to this challenge. First, we constructed a DPMM that uses a cheap approximation to RBME by treating the RBME as a Markov process (Matsuda et al., 2018; Figure 3C). This assumption is strictly true only when the localization error is zero and is the same assumption used to estimate diffusion coefficient via the MSD (Michalet and Berglund, 2012). Because localization error is never actually zero, we were curious to see when and how this method breaks down.

Figure 3

Application of state arrays and Dirichlet process mixture models (DPMMs) to mixtures of Brownian motions.

(A) Regular Brownian motion with localization error (RBME) is a motion model that involves two parameters: diffusion coefficient and localization error variance. (For brevity, we refer to the latter simply as ‘localization error.’) Unlike pure Brownian motion, RBME has correlations between sequential jumps due to the influence of localization error. (B) State array inference for RBMEs. The naive occupation estimate is the initial estimate for the posterior, which is subsequently refined through variational inference. At the end of inference, we marginalize out localization error to yield 1D distributions over the diffusion coefficient. (C) DPMM inference for mixtures of Brownian motions. Because the Gibbs sampling routine for a pure DPMM is slow, we use an approximate motion model that neglects the off-diagonal terms of the covariance matrix in (A). (D) Example of state arrays evaluated on simulated sptPALM. Tracking was simulated in a spherical nucleus with 700 nm focal depth, uniform photoactivation probability, 14 Hz bleaching rate, 7.48 ms frame intervals, and variable localization error. The lines represent the state array posterior mean occupations for independent replicates of the same simulation.

The second approach we explored is a model we refer to as a ‘state array’ (SA). This model is a special case of the finite state mixture, obtained by selecting a large number of states $K$ and fixing the state parameters to the vertices of an ‘array’ that spans some target parameter space (Figure 2C, Figure 2—figure supplement 1). For example, the array for RBME might span a range of biologically plausible diffusion coefficients and localization error variances. An array for an anomalous diffusion model may also incorporate one or more anomaly parameters. The occupation of each ‘state’ in this array is inferred through a variational Bayesian algorithm, driving the occupation of most states to zero to leave a minimal set sufficient to explain the observations (‘Materials and methods’). Importantly, SAs jointly infer a ‘global’ distribution over the state parameters along with ‘individual’ distributions for each trajectory. The nature of the variational inference algorithm means that the ‘global’ distribution is always a weighted mean of these ‘individual’ distributions. We focus our attention on the global distribution in this article, with some consideration of the individual distributions for each trajectory at the end.
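To make the pruning behavior concrete, the sketch below implements a generic mean-field variational update for the occupations of a finite mixture whose state parameters are fixed, with a sparse symmetric Dirichlet prior. It is a textbook-style illustration under those assumptions and is not necessarily the algorithm described in ‘Materials and methods’ (for instance, it gives every trajectory equal weight). All function names, the prior concentration, and the toy likelihood values are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))

def infer_occupations(likelihoods, prior_conc=0.01, n_iter=100):
    """Mean-field update for mixture weights with fixed state parameters.

    `likelihoods[i][k]` is p(trajectory i | state k), computed once and
    reused. Returns posterior mean occupations (the 'global' distribution)
    and per-trajectory responsibilities (the 'individual' distributions).
    """
    n_states = len(likelihoods[0])
    alpha = [prior_conc + len(likelihoods) / n_states] * n_states
    resp = []
    for _ in range(n_iter):
        log_w = [digamma(a) for a in alpha]  # E[log occupation] up to a constant
        shift = max(log_w)
        weights = [math.exp(lw - shift) for lw in log_w]
        counts = [0.0] * n_states
        resp = []
        for row in likelihoods:
            scores = [lik * w for lik, w in zip(row, weights)]
            total = sum(scores)
            r = [s / total for s in scores]
            resp.append(r)
            counts = [c + rk for c, rk in zip(counts, r)]
        alpha = [prior_conc + c for c in counts]
    total_alpha = sum(alpha)
    return [a / total_alpha for a in alpha], resp

# Toy input: 12 trajectories favor state 0, 8 favor state 2, none favor state 1.
likelihoods = [[0.9, 0.05, 0.01]] * 12 + [[0.01, 0.05, 0.9]] * 8
occ, resp = infer_occupations(likelihoods)
print(["%.3f" % o for o in occ])
```

In this toy case the poorly supported middle state is driven toward zero occupation, and the posterior counts are the sum of the per-trajectory responsibilities plus the prior.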

Because the parameters for each state in an SA are fixed, the most expensive computations can be cached and reused throughout inference. As a result, SAs can handle more complex models than DPMMs. In this article, we use a 2D SA for RBME spanning a range of diffusion coefficients and localization error variances. After inference, we marginalize out the localization error part to yield 1D functions of the diffusion coefficient (Figure 3B). This procedure naturally incorporates uncertainty about localization error variance, rendering SAs more robust to variations in localization error than DPMMs (Figure 3—figure supplement 1).

DPMMs and SAs work best with thousands to tens of thousands of trajectories. This often requires aggregating trajectories across multiple cells, which can mask cell-to-cell variability. To assess cell-to-cell variability, we also found it useful to have a ‘cheap and dirty’ estimate of state occupation that works with a smaller number (hundreds) of trajectories. This is derived from the SA calculation and is simply the sum of the normalized RBME likelihood function across all of the trajectories observed in a cell. We refer to this as the ‘naive occupation estimate.’ Functionally it behaves like a less precise version of the SA method (‘Materials and methods’).
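A minimal sketch of a naive occupation estimate on a small grid is given below. It assumes an RBME likelihood without motion blur, treats the x and y jumps as independent, and gives each trajectory equal weight; these are simplifications chosen for illustration, and the implementation in saspt may differ. The grid values and example trajectories are hypothetical.

```python
import math

def rbme_log_likelihood(jumps, diff_coef, loc_var, frame_interval):
    """Log-likelihood of 1D jumps under RBME (no motion blur assumed).

    The jump covariance is tridiagonal (diagonal 2*D*dt + 2*loc_var,
    off-diagonal -loc_var), so an LDL^T recursion gives the determinant
    and quadratic form in linear time.
    """
    diag = 2.0 * diff_coef * frame_interval + 2.0 * loc_var
    off = -loc_var
    log_lik, pivot, resid = 0.0, diag, jumps[0]
    for i, jump in enumerate(jumps):
        if i > 0:
            ratio = off / pivot
            resid = jump - ratio * resid
            pivot = diag - ratio * off
        log_lik -= 0.5 * (math.log(2.0 * math.pi * pivot) + resid * resid / pivot)
    return log_lik

def naive_occupations(tracks, diff_coefs, loc_vars, frame_interval):
    """Sum per-trajectory normalized likelihoods over a (D, loc_var) grid,
    then marginalize out the localization error variance."""
    occupations = [0.0] * len(diff_coefs)
    for track in tracks:
        # Jumps along x and y, treated as independent 1D RBMEs.
        axes = [[b[d] - a[d] for a, b in zip(track[:-1], track[1:])] for d in (0, 1)]
        log_liks = [[sum(rbme_log_likelihood(j, dc, lv, frame_interval) for j in axes)
                     for lv in loc_vars] for dc in diff_coefs]
        peak = max(max(row) for row in log_liks)
        liks = [[math.exp(v - peak) for v in row] for row in log_liks]
        norm = sum(sum(row) for row in liks)
        for i, row in enumerate(liks):
            occupations[i] += sum(row) / norm
    return [o / len(tracks) for o in occupations]

tracks = [
    [(0.00, 0.00), (0.02, -0.01), (0.00, 0.01), (0.01, 0.00)],  # nearly immobile
    [(0.0, 0.0), (0.3, -0.2), (0.5, 0.1), (0.2, 0.4)],          # fast
]
occ = naive_occupations(tracks, [0.01, 1.0, 10.0], [0.02 ** 2, 0.04 ** 2], 0.00748)
print(["%.3f" % o for o in occ])
```

Because the grid is fixed, the likelihood of each trajectory at each grid point needs to be computed only once, which is what makes this estimate inexpensive.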

Finally, to account for defocalization we developed a method applicable to the posterior distributions of both DPMMs and SAs (Figure 3—figure supplement 2, ‘Materials and methods’).

Evaluating DPMMs and SAs on simulated sptPALM data

As the target for inference, we considered a mixture of RBMEs enclosed in a spherical membrane with a thin focal volume bisecting the sphere, with dimensions similar to a mammalian cell nucleus. Emitters photoactivate and photobleach throughout the sphere and are only observed when their positions coincide with the focal volume. Because no gaps are allowed during tracking, the result is a highly fragmented set of trajectories with mean length 3–5 frames. We chose simulation settings to approximate real sptPALM experiments, with bleaching rates ≥10 Hz, diffusion coefficients in the range 0–100 μm² s^-1, and localization error variances between 0² and 0.06² μm².

We compared the ability of DPMMs, SAs, and MSD histograms to recover the underlying distribution of diffusion coefficients from these data. We divided these simulations into four classes with increasing difficulty. In class 1, localization error for all states was provided as a known constant to the algorithms (Figure 4A, Figure 4—figure supplement 1A). In class 2, localization error was held constant for all states but was unknown to the algorithms (Figure 4B, Figure 4—figure supplement 1B). In class 3, localization error was allowed to vary between diffusive states and was also unknown to the algorithms (Figure 4C, Figure 4—figure supplement 1C). Finally, for class 4 we simulated full sptPALM-like movies that incorporate heterogeneous localization error, motion blur, camera noise, tracking errors, and defocus (Figure 4—figure supplement 5, Figure 4—figure supplement 6). In these simulations, the localization error is unique for each emitter and depends on the emitter’s axial position, the stochastic number of photons it emits during each integration, and its pattern of motion blur (Video 4, Video 5).

Figure 4

Comparison of the mean squared displacement (MSD) histogram, Dirichlet process mixture model (DPMM), and state array (SA) methods to recover dynamic profiles from trajectory simulations.

(A–C) Mixtures of diffusing states were simulated in a 700 nm focal volume with 7.48 ms frame intervals. Simulations were divided into three classes of increasing difficulty based on the treatment of localization error as described in the text. For each replicate, exactly 12,800 trajectories were simulated. Estimated occupations for five independent replicates are overlaid on each subplot. (D) Accuracy of state occupation estimates for each method as a function of sample size. Each method was run on trajectory simulations generated from an underlying three-state dynamic model (0.02 μm² s^-1 [20%], 0.5 μm² s^-1 [30%], 5.0 μm² s^-1 [50%]), then occupations were estimated by integrating the distribution produced by each method. Limits of integration were set to 0–0.08 μm² s^-1 (state 1), 0.08–1.5 μm² s^-1 (state 2), or 1.5–40 μm² s^-1 (state 3). 20 replicates were run per condition. (E) Mean absolute error (MAE) in state occupation estimates for the simulations in (D). Each value is the average MAE across all replicates. (F) Inferring mixtures of diffusing states with similar diffusion coefficients using SAs. For each replicate, a total of 6400 trajectories were simulated with the indicated underlying state distribution. (G) Effect of state transitions on the MSD, DPMM, and SA approaches. We varied the first-order transition rate constant between two diffusing states, simulating 6400 trajectories per replicate.

Video 4


Video 5


DPMMs and SAs both recovered the dynamic profile for simulations in class 1 with a resolution that exceeded the MSD histogram approach. With large samples of trajectories, DPMMs and SAs inferred even nondiscrete distributions of states (Figure 4A, Figure 4—figure supplement 1A).

When knowledge of the localization error was removed (classes 2 and 3), the SA approach outperformed both the MSD and DPMM approaches. The DPMM’s performance was especially poor when the contributions of diffusion and error to jump variance were similar ($D \Delta t \approx \sigma_{\text{loc}}^{2}$), likely due to its simplistic treatment of localization error. Meanwhile, the dynamic profile estimated by SAs was unperturbed by variations in the localization error (Figure 4B and C, Figure 4—figure supplement 1B and C). Comparing the results from simulations in class 3 numerically, we found that the root mean squared deviation of the estimated CDF from the true CDF was ≤5% for SAs, while it was 5–20% for both the MSD histogram and DPMM approaches (Figure 4—figure supplement 2).

The dynamic profiles produced by the MSD, DPMM, and SA approaches can be integrated to yield occupation estimates over particular diffusion coefficient ranges. We compared the accuracy and precision of these estimates with discrete two-, three-, or four-state models (Figure 4D, Figure 4—figure supplement 3, Figure 4—figure supplement 4). As the number of trajectories increased, occupations estimated by DPMMs and SAs converged to within 3% of the true values. In contrast, the MSD approach was associated with large systematic errors, an effect previously reported (Mazza et al., 2012, Hansen et al., 2018).
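The integration step and the mean absolute error can be sketched as follows. The integration limits and true occupations are those given for the three-state model in the legend of Figure 4D and E; the coarse grid and the profile values are invented for illustration, and the function names are hypothetical.

```python
def integrate_occupations(diff_coefs, occupations, limits):
    """Sum a discrete dynamic profile over half-open ranges [low, high)."""
    return [sum(o for d, o in zip(diff_coefs, occupations) if low <= d < high)
            for low, high in limits]

def mean_absolute_error(estimated, true):
    """Average absolute difference between estimated and true occupations."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)

# Invented profile on a coarse grid (um^2/s); real grids are much finer.
diff_coefs = [0.01, 0.03, 0.3, 0.6, 4.0, 6.0]
occupations = [0.10, 0.12, 0.15, 0.13, 0.30, 0.20]

# Limits of integration and true occupations from the Figure 4D legend.
limits = [(0.0, 0.08), (0.08, 1.5), (1.5, 40.0)]
estimated = integrate_occupations(diff_coefs, occupations, limits)
mae = mean_absolute_error(estimated, [0.2, 0.3, 0.5])
print(["%.2f" % e for e in estimated], "MAE = %.4f" % mae)
```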

On full optical and dynamic simulations in class 4, SAs also outperformed the DPMM approach (Figure 4—figure supplement 5, Figure 4—figure supplement 6). Again, the difference was particularly pronounced for small diffusion coefficients, for which the DPMM state occupation estimates were severely inaccurate. Both methods had difficulty recovering the fastest diffusion coefficient tested (Figure 4—figure supplement 5B), possibly due to the restrictive conditions on the maximum jump distance used during tracking.

A central limitation of DPMMs and SAs is that they do not account for transitions between diffusive states. To determine the effect of state transitions on the output of these algorithms, we simulated mixtures of two diffusive states with increasing transition rates (Figure 4G, Figure 4—figure supplement 7). While slow transition rates had a negligible effect on the estimated state profile, transition rates approaching the inverse of the frame interval appeared as a single state with an intermediate diffusion coefficient (Figure 4—figure supplement 7C), consistent with a result from reaction-diffusion systems (Crank, 1975). The shift from the two-state to single-state regime occurred in a narrow window of mean state dwell times between 0.05 and 0.5 frame intervals.
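The intermediate diffusion coefficient seen at high transition rates matches the standard fast-exchange limit, in which a particle that switches states many times per frame diffuses with the occupation-weighted mean of the two coefficients. The sketch below computes that limit and the mean dwell time in units of frame intervals; the rate constants and diffusion coefficients are illustrative values, not those used in the simulations.

```python
def fast_exchange_diffusion(d1, d2, k12, k21):
    """Effective D for two states exchanging much faster than the frame rate.

    k12 and k21 are first-order rate constants (s^-1) for the 1->2 and
    2->1 transitions, respectively.
    """
    frac1 = k21 / (k12 + k21)  # steady-state occupation of state 1
    return frac1 * d1 + (1.0 - frac1) * d2

def mean_dwell_frames(rate, frame_interval):
    """Mean state dwell time, 1 / rate, in units of frame intervals."""
    return 1.0 / (rate * frame_interval)

d_eff = fast_exchange_diffusion(0.5, 5.0, k12=2000.0, k21=2000.0)
dwell = mean_dwell_frames(2000.0, 0.00748)
print(f"effective D: {d_eff:.2f} um^2/s; mean dwell: {dwell:.3f} frame intervals")
```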

In this article, we restricted DPMM/SA inference to a range of diffusion coefficients from 10⁻² to 10² μm² s^-1. We also explored what happens when the true diffusion coefficient lies outside this range. DPMMs and SAs still recovered the correct state occupations by using the closest diffusion coefficient in their respective supports (Figure 4—figure supplement 8).

In the presence of multiple diffusing states with similar diffusion coefficients, both DPMMs and SAs tended to identify a single population with occupation equal to the sum of the occupations for each true state (Figure 4F, Figure 4—figure supplement 9).

We compared the performance of SAs and vbSPT (Persson et al., 2013) using simulated SPT movies with different dynamic models (Figure 4—figure supplement 10). Both methods had comparable accuracy on simple two-state models (Figure 4—figure supplement 10B). On more complex models (Figure 4—figure supplement 10C), the two methods encountered distinct difficulties, with vbSPT tending to overestimate and SAs tending to underestimate the number of states. For clusters of states with similar parameters (Figure 4—figure supplement 10C, bottom), SAs tended to produce a ‘smear’ of state occupations over a range of diffusion coefficients, while vbSPT tended to produce a different cluster of states in the same region of parameter space. vbSPT was noticeably less accurate at recovering slow-moving states with small diffusion coefficients (<0.1 μm² s^-1). We concluded that both approaches are useful and may provide complementary information.

While our investigation focused primarily on Brownian motion, SAs can be applied to any motion model parameterized by a likelihood function. To explore applications of SAs outside of Brownian motion, we applied them to fractional Brownian motion (FBM), a generalization of Brownian motion capable of producing anomalous diffusion (Mandelbrot and Van Ness, 1968). Whereas Brownian motion’s sole parameter is the diffusion coefficient, FBM parameterizes both the magnitude (via a scaling coefficient) and the temporal correlations (via the Hurst parameter) of a particle’s increments. As with Brownian motion, we simulated sptPALM movies of fractional Brownian particles with variable diffusion coefficient and Hurst parameter (Video 6). To construct a state array for FBM, we used a 3D array over scaling coefficient, Hurst parameter, and localization error variance (Figure 4—figure supplement 11C). As with the RBME array, we marginalized out localization error after inference. While the SA accurately recovered the diffusion coefficient and Hurst parameter for multistate FBM models (Figure 4—figure supplement 11D), we noted a systematic error in the estimation of low (subdiffusive) Hurst parameters due to motion blur (Figure 4—figure supplement 12).
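The role of the Hurst parameter can be made concrete through the covariance of sequential FBM increments. The sketch below assumes the convention that the variance of the 1D displacement after time $t$ is $2 D t^{2H}$ for scaling coefficient $D$ and Hurst parameter $H$, so that $H = 0.5$ recovers Brownian motion; the convention used in this work may differ, and localization error and motion blur are omitted. The function name and parameter values are illustrative.

```python
def fbm_jump_covariance(n_jumps, scale_coef, hurst, frame_interval):
    """Covariance of sequential 1D FBM increments (no localization error).

    Assumes Var[B(t)] = 2 * scale_coef * t**(2 * hurst), which reduces to
    Brownian motion with D = scale_coef at hurst = 0.5.
    """
    h2 = 2.0 * hurst
    pref = scale_coef * frame_interval ** h2
    return [[pref * (abs(i - j + 1) ** h2 + abs(i - j - 1) ** h2
                     - 2.0 * abs(i - j) ** h2)
             for j in range(n_jumps)] for i in range(n_jumps)]

brownian = fbm_jump_covariance(3, 1.0, 0.5, 0.01)      # independent jumps
subdiffusive = fbm_jump_covariance(3, 1.0, 0.3, 0.01)  # anticorrelated jumps
print("adjacent-jump covariance, H = 0.5: %.2e" % brownian[0][1])
print("adjacent-jump covariance, H = 0.3: %.2e" % subdiffusive[0][1])
```

Subdiffusive Hurst parameters give negative correlations between neighboring jumps, which is the same signature produced by localization error.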

Video 6


Performance of state arrays on experimental sptPALM

After observing that SAs outperformed DPMMs on simulations, we proceeded to evaluate SAs on real data. We acquired an sptPALM dataset in U2OS osteosarcoma nuclei with endogenously tagged retinoic acid receptor-α-HaloTag (RARA-HT) (Pontén and Saksela, 1967, Los et al., 2008; Figure 5—figure supplement 1). RARA-HT is a type II nuclear receptor that heterodimerizes via its ligand-binding domain (LBD) with the retinoid X receptor (RXR) to form a complex competent to bind chromatin and regulate target genes (Giguere et al., 1987, Petkovich et al., 1987, Brand et al., 1988, Yu et al., 1991, Bugge et al., 1992, Marks et al., 1992, Leid et al., 1992; reviewed in Evans and Mangelsdorf, 2014). In addition, association of coregulator complexes with the RAR/RXR heterodimer has been shown to influence the dimer’s dynamics in FCS studies (Brazda et al., 2011, Brazda et al., 2014). As such, RARA-HT is expected to inhabit a variety of dynamic states in sptPALM.

For comparison, we also performed identical sptPALM experiments with histone H2B-HaloTag (H2B-HT), a protein with a high-occupation immobile state (Hansen et al., 2017, McSwiggen et al., 2019), as well as HaloTag and HaloTag-NLS (HT and HT-NLS), which are fast-diffusing proteins with low immobile fractions.

The four proteins presented distinct dynamic profiles (Figure 5A). For both HT and HT-NLS, the SA identified a single highly mobile state. In agreement with previous reports (Xiang et al., 2020), we observed that addition of the NLS reduces HaloTag’s diffusion coefficient by two- to threefold. In contrast, both RARA-HT and H2B-HT had substantial immobile fractions, accounting for roughly 40 and 70% of their total populations, respectively (Figure 5C). SAs identified stark differences in the mobile subpopulations for RARA-HT and H2B-HT. Whereas H2B-HT presented a fast population at 8–10 μm² s^-1, RARA-HT inhabited a broad spectrum of diffusing states ranging from 0.3 to 10.0 μm² s^-1. Biological replicates gave similar results (Figure 5—figure supplement 2A).
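The two summary statistics reported in Figure 5C are defined in its legend: the immobile fraction is the total occupation below 0.05 μm² s^-1, and the mean free diffusion coefficient is the posterior mean diffusion coefficient above that threshold. A minimal sketch of both, applied to an invented discrete profile (the function name and the values are hypothetical):

```python
def summarize_profile(diff_coefs, occupations, threshold=0.05):
    """Immobile fraction and mean free diffusion coefficient of a profile.

    States with D below `threshold` (um^2/s) count as immobile; the mean
    free D is the occupation-weighted mean of the remaining states.
    """
    immobile = sum(o for d, o in zip(diff_coefs, occupations) if d < threshold)
    free = [(d, o) for d, o in zip(diff_coefs, occupations) if d >= threshold]
    free_mass = sum(o for _, o in free)
    if free_mass > 0:
        mean_free = sum(d * o for d, o in free) / free_mass
    else:
        mean_free = float("nan")  # no mobile population to average over
    return immobile, mean_free

# Invented profile: 40% of the occupation lies below the threshold.
diff_coefs = [0.01, 0.03, 1.0, 3.0, 9.0]
occupations = [0.3, 0.1, 0.2, 0.2, 0.2]
immobile, mean_free = summarize_profile(diff_coefs, occupations)
print(f"immobile fraction: {immobile:.2f}; mean free D: {mean_free:.2f} um^2/s")
```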

Figure 5 (with 2 supplements)

State arrays (SAs) applied to experimental sptPALM.

All sptPALM experiments were performed with the photoactivatable dye PA-JFX549 using a TIRF microscope with HiLo illumination, 7.48 ms frame intervals, and 1 ms excitation pulses. (A) Naive and SA occupations for four different tracking targets. The upper two panels are the naive occupations for each nucleus in each of two biological replicates. Biological replicates correspond to separate knock-in clones for RARA-HaloTag or separate transfections for the other constructs (mean 1627 trajectories per nucleus). The bottom panel displays the SA occupations for a run of the SA algorithm on trajectories pooled from a single biological replicate (mean 17,899 trajectories per biological replicate). Asterisks for RARA-HaloTag and H2B-HaloTag indicate that the immobile fraction for these constructs has been truncated to visualize the faster-moving states. (B) Naive occupation estimate for RARA-HaloTag constructs bearing domain deletions or point mutations. ‘Exogenously expressed’ constructs were expressed from a nucleofected PiggyBac vector under an L30 promoter. (C) Quantification of the immobile fractions and mean free diffusion coefficients for the four constructs in (A). The ‘immobile fraction’ was defined as the total occupation below 0.05 μm² s^-1, while the mean free diffusion coefficient was the posterior mean diffusion coefficient above this threshold. Each dot represents a biological replicate (a different knock-in clone for RARA-HT or a different nucleofection for H2B-HT, HT-NLS, and HT).
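
The two summary statistics defined for panel (C) can be computed directly from a vector of state occupations. The following is a minimal sketch, assuming the occupations are available on a grid of diffusion coefficients; the function name `summarize_occupations` and the toy values are illustrative and are not part of saSPT.

```python
import numpy as np

def summarize_occupations(diff_coefs, occupations, threshold=0.05):
    """Return (immobile fraction, mean free diffusion coefficient).

    diff_coefs:  1D array of diffusion coefficients (um^2/s) on the grid
    occupations: 1D array of nonnegative occupations, one per grid point
    threshold:   states below this diffusion coefficient count as immobile
    """
    diff_coefs = np.asarray(diff_coefs, dtype=float)
    occ = np.asarray(occupations, dtype=float)
    occ = occ / occ.sum()                      # normalize to a probability vector
    immobile = diff_coefs < threshold
    immobile_fraction = occ[immobile].sum()
    free_occ = occ[~immobile]
    # Occupation-weighted mean over the states above the threshold
    mean_free_D = (diff_coefs[~immobile] * free_occ).sum() / free_occ.sum()
    return immobile_fraction, mean_free_D

# Toy example (made-up numbers): 40% at 0.01, 30% at 1.0, 30% at 3.0 um^2/s
D = np.array([0.01, 1.0, 3.0])
occ = np.array([0.4, 0.3, 0.3])
f_immobile, D_free = summarize_occupations(D, occ)
```

The sketch assumes at least some occupation lies above the threshold; a fully immobile profile would need a separate guard.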

Figure 5—source data 1. Raw and labeled RARA-HaloTag Western blots used in Figure 5: https://cdn.elifesciences.org/articles/70169/elife-70169-fig5-data1-v1.zip

To determine the origins of the dynamic states observed for RARA-HT, we performed domain deletions (Figure 5B). Removal of either the DNA-binding domain (DBD) or LBD resulted in loss of the immobile population. Because both the DBD and LBD are required for chromatin binding by the RAR/RXR heterodimer, this suggests that the immobile fraction represents chromatin-bound molecules. To confirm this, we introduced a point mutation (C88G) in the zinc fingers for the RARA-HT DBD that abolishes DNA-binding in vitro (Zhu et al., 1999). This led to loss of the immobile fraction (Figure 5B). Deletion of the unstructured N-terminal domain (NTD) or C-terminal domain (CTD) had a milder effect, suggesting that these domains are not the primary determinants of the dynamic behavior of RARA-HT.

To understand the origins of heterogeneity in the diffusive profile, we performed three variants of bootstrap aggregation (Figure 5—figure supplement 2B). The primary origins of variability for both DPMMs and SAs were cell-to-cell rather than clone-to-clone variability or intrinsic variability due to finite sample sizes.
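
The idea of resampling at a chosen level of the experimental hierarchy can be sketched as follows. This is only an illustration of cell-level resampling, under the simplifying assumption that each cell already has a normalized occupation vector that can be averaged; it does not reproduce the three variants referenced above, and the function name is hypothetical. The same pattern applies to clones or trajectories by changing what a row represents.

```python
import numpy as np

def bootstrap_occupations(occ_per_cell, n_boot=200, seed=0):
    """Resample cells with replacement and average their occupations.

    occ_per_cell: array of shape (n_cells, n_states); each row is the
                  normalized occupation vector for one cell
    Returns an array of shape (n_boot, n_states) of bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    occ_per_cell = np.asarray(occ_per_cell, dtype=float)
    n_cells = occ_per_cell.shape[0]
    # Each replicate draws n_cells cell indices with replacement
    idx = rng.integers(0, n_cells, size=(n_boot, n_cells))
    return occ_per_cell[idx].mean(axis=1)

# Toy data (made-up numbers): four cells, two states
occ = np.array([[0.5, 0.5], [0.3, 0.7], [0.4, 0.6], [0.45, 0.55]])
reps = bootstrap_occupations(occ)
spread = reps.std(axis=0)   # variability attributable to which cells were sampled
```

Comparing the spread obtained when resampling at different levels gives a rough indication of which level contributes most to the variability.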

Spatiotemporal context of cellular protein dynamics

In the process of inferring the global distribution over state parameters for an sptPALM dataset, SAs jointly infer individual distributions for each trajectory. Up to this point, we have analyzed the global distribution. However, it is also possible to aggregate the individual distribution for each trajectory as a function of space or time, yielding, for instance, separate dynamic profiles for every spatial location in an experiment. This approach offers a potential route to understand spatiotemporal variation in the dynamics of a protein target.

We explored this aspect of SAs with a U2OS nucleophosmin-HaloTag (NPM1-HT) sptPALM dataset. NPM1-HT exhibits partial nucleolar localization (Figure 6—figure supplement 1B) and distinct dynamic behavior inside and outside nucleoli (Mitrea et al., 2018). The SA identified a broad range of diffusion coefficients for NPM1-HT, with three modes including an effectively immobile population (Figure 6A). Selecting four ranges of diffusion coefficients for analysis (Figure 6A), we visualized the posterior distribution as a function of space, calculating local fractional occupations for each range (Figure 6B, Figure 6—figure supplement 1C). This analysis revealed that some populations (including a slow-moving mobile population at 0.23 μm² s^-1) are enriched in nucleoli, while others (for instance, a fast-moving population at 4 μm² s^-1) are depleted and still others show no preference (Figure 6C). Notably, these preferences are apparent even in the naive occupations for trajectories in each compartment (Figure 6—figure supplement 1D).
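
A posterior-weighted kernel density estimate of this kind can be sketched in a few lines. The example below assumes one representative position per trajectory and a per-trajectory posterior probability for each selected range of diffusion coefficients; the function name, array layout, and toy inputs are assumptions made for illustration, and the 0.1 μm bandwidth mirrors the 100 nm Gaussian kernel used for Figure 6B.

```python
import numpy as np

def local_occupations(xy_um, post, grid_x, grid_y, bandwidth_um=0.1):
    """Posterior-weighted Gaussian KDE for each diffusion-coefficient range.

    xy_um:  (n_traj, 2) trajectory positions in microns
    post:   (n_traj, n_ranges) posterior probability of each range per trajectory
    grid_x, grid_y: 1D arrays of grid coordinates in microns
    Returns (n_ranges, len(grid_y), len(grid_x)) local fractional occupations.
    """
    xy_um = np.asarray(xy_um, dtype=float)
    post = np.asarray(post, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)                  # each (ny, nx)
    # Squared distance from every grid point to every trajectory: (ny, nx, n_traj)
    d2 = (gx[..., None] - xy_um[:, 0]) ** 2 + (gy[..., None] - xy_um[:, 1]) ** 2
    kernel = np.exp(-d2 / (2.0 * bandwidth_um ** 2))
    # Weight each trajectory's kernel by its posterior mass in each range
    density = np.einsum("yxn,nk->kyx", kernel, post)
    total = density.sum(axis=0, keepdims=True)
    # Normalize across ranges; pixels with no nearby data stay at zero
    return np.divide(density, total, out=np.zeros_like(density), where=total > 0)

# Toy input: one trajectory per range, placed 1 um apart
xy = np.array([[0.0, 0.0], [1.0, 0.0]])
post = np.array([[1.0, 0.0], [0.0, 1.0]])
occ = local_occupations(xy, post, grid_x=np.array([0.0, 1.0]), grid_y=np.array([0.0]))
```

Evaluating the kernel densely scales with the number of grid points times the number of trajectories, so a binned or truncated kernel may be preferable for large datasets.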

Figure 6 (with 1 supplement)

Spatiotemporal variation in the state array posterior distribution.

(A) Posterior occupations for a state array evaluated on NPM1-HaloTag trajectories in U2OS nuclei. The ranges labeled i, ii, iii, and iv indicate parts of the dynamic profile isolated for analysis in subsequent panels. (B) Spatial distribution of the posterior probability in (A) for NPM1-HaloTag trajectories in a single U2OS nucleus. The posterior model over the diffusion coefficient was evaluated for each of the origin trajectories, and these points were then used to perform a kernel density estimate (KDE) with a 100 nm Gaussian kernel. For the local normalized occupation, these KDEs were normalized to estimate the relative fractions of molecules in each state. (C) Quantification of the analysis in (B) for 15 nuclei. ‘Nucleoplasmic’ trajectories were defined as trajectories outside nucleoli but inside the nucleus. (D) Temporal variation in the posterior distribution.

The NPM1-HT tracking experiments were performed with an acquisition sequence comprising several phases with distinct levels of photoactivation. As a result, the localization density varied temporally in each movie. To understand the effect of localization density on the diffusion coefficient likelihoods, we aggregated the naive state occupations over 100-frame temporal blocks (Figure 6D). These experiments demonstrated that high localization densities led to a deflation in the occupation of slower-moving states, probably due to tracking errors. As a result, only phases with low localization density were used for posterior estimation. This demonstrates how the temporal perspective on the posterior may be useful as a guide for subsequent analysis, including quality control in SPT experiments.
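
Aggregation over temporal blocks can be sketched as below, assuming each trajectory is assigned to a block by its first frame and already carries a normalized occupation vector (for example, its normalized likelihood over the parameter grid). The function name and the toy arrays are illustrative assumptions.

```python
import numpy as np

def occupations_by_block(start_frames, occ_per_traj, block_size=100):
    """Average per-trajectory occupations within consecutive frame blocks.

    start_frames: (n_traj,) first frame index of each trajectory
    occ_per_traj: (n_traj, n_states) occupation vector for each trajectory
    Returns (n_blocks, n_states); blocks without trajectories are NaN.
    """
    start_frames = np.asarray(start_frames, dtype=int)
    occ_per_traj = np.asarray(occ_per_traj, dtype=float)
    block = start_frames // block_size
    n_blocks = block.max() + 1
    out = np.full((n_blocks, occ_per_traj.shape[1]), np.nan)
    for b in range(n_blocks):
        members = block == b
        if members.any():
            out[b] = occ_per_traj[members].mean(axis=0)
    return out

# Toy data: two trajectories in frames 0-99, two in frames 100-199
frames = np.array([3, 50, 120, 199])
occ = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
blocks = occupations_by_block(frames, occ)
```

Plotting each row of the result against block index alongside the localization density makes shifts in apparent occupation easy to spot.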

Discussion

Intracellular sptPALM with fast-diffusing proteins presents unique challenges for analysis. In particular, the issues of state bias arising from imaging geometry, limited information available from any single trajectory, and variable localization error must be addressed prior to biological interpretation of sptPALM data.

The two methods investigated here, DPMMs and SAs, represent distinct approaches to this problem inspired by Bayesian nonparametrics. These methods identify sparse explanatory models from more complex alternatives, similar to other popular SPT approaches like vbSPT, but can use a broader range of dynamic models and are applicable when the dynamic profile is not composed of discrete states. Between the two methods, SAs far outperformed DPMMs.

When evaluated on real sptPALM data, SAs revealed previously unappreciated features of the dynamic profile for RARA-HaloTag and H2B-HaloTag. In particular, RARA-HaloTag exhibited a broad spectrum of diffusive states that stands in contrast to the more discretized profile of H2B-HaloTag or HaloTag-NLS. The ability to identify the presence or absence of discrete diffusing states is a major advantage of SAs over existing methods, which are generally premised on the existence of discrete states. We found that SAs were especially useful when complemented with the naive occupation estimate to visualize cell-to-cell and movie-to-movie variability. A Python tool that implements SAs can be found at https://github.com/alecheckert/saspt with documentation at https://saspt.readthedocs.io.

DPMMs and SAs have several limitations. DPMMs require prior measurement of the localization error, while SAs require selection of a parameter grid with spacing fine enough to avoid discretization artifacts. The saSPT package uses default parameter grids that satisfy this requirement for regular Brownian motion and FBM. However, the grid needs to be reevaluated for any new types of motion to which SAs are applied. Additionally, neither DPMMs nor SAs consider transitions between states, a major shortcoming of these methods.

Our experiments used a fixed range of diffusion coefficients from 10^-2 to 10^2 μm² s^-1. Even when the true diffusion coefficient was outside this range, SAs accurately estimated state occupations by using the nearest available diffusion coefficient (Figure 4—figure supplement 8). Our experimental SPT results, with large spikes at the lowest diffusion coefficient, suggest this is common in real data for SPT targets with very slow or immobile populations. A potential area for future improvement is to extend the support iteratively until the slowest and fastest states are captured. Such an approach would need to contend with the increased difficulty in estimating the diffusion coefficient when it is much smaller than the localization error variance (Figure 3—figure supplement 1C).
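
The behavior at the edges of the support can be illustrated with a log-spaced grid. In this sketch the number of grid points (100) and the helper name are illustrative choices, not the saSPT defaults; it only shows where out-of-range diffusion coefficients land on the grid.

```python
import numpy as np

# Log-spaced grid spanning 10^-2 to 10^2 um^2/s; 100 points is an illustrative choice
grid = np.logspace(-2.0, 2.0, 100)

def nearest_grid_index(D, grid):
    """Index of the grid point closest to D on a log scale."""
    return int(np.argmin(np.abs(np.log(grid) - np.log(D))))

i_slow = nearest_grid_index(1e-3, grid)   # below the support: lands on the first bin
i_fast = nearest_grid_index(5e2, grid)    # above the support: lands on the last bin
```

Occupation that piles up in the first or last bin is therefore a hint that the support may need to be extended.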

While we have only investigated the application of SAs to regular Brownian motion (and, briefly, FBM) in this article, the model could be extended to any type of motion parameterized by a likelihood function. We highlight two potential challenges for any such work. First, the SA’s size scales with the number of parameters of the motion, meaning that more complex models are more computationally expensive. This could be addressed at the implementation level, for instance by porting SA inference to graphics processing units. The second and more fundamental challenge is the similarity of the various flavors of anomalous diffusion to localization and tracking errors. For instance, both the Hurst parameter in FBM and the localization error primarily manifest as negative off-diagonal components of the trajectory increment covariance matrix (Figure 4—figure supplement 11B). Likewise, the erratic jumps of Lévy flights have similarities to tracking errors. These issues are likely to become more significant when the sptPALM data are lower in quality or highly heterogeneous (due to motion blur, defocus, and nonstationary camera noise).
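
The overlap between subdiffusive FBM and localization error can be seen by writing out the increment covariance matrix. The sketch below assumes a common one-dimensional FBM parameterization in which positions have covariance D(t^{2H} + s^{2H} - |t - s|^{2H}) and the localization error is independent Gaussian noise with standard deviation `loc_error` on each position; the function name and parameter values are illustrative.

```python
import numpy as np

def increment_covariance(n_jumps, D, hurst, loc_error, dt):
    """Covariance of 1D trajectory increments for FBM with localization error.

    D in um^2/s, loc_error (1D standard deviation) in um, dt in s.
    hurst = 0.5 recovers regular Brownian motion.
    """
    idx = np.arange(n_jumps)
    k = np.abs(np.subtract.outer(idx, idx)).astype(float)   # lag between increments
    h2 = 2.0 * hurst
    # FBM term: D * dt^(2H) * (|k+1|^(2H) + |k-1|^(2H) - 2|k|^(2H))
    C = D * dt ** h2 * ((k + 1) ** h2 + np.abs(k - 1) ** h2 - 2.0 * k ** h2)
    # Localization error adds 2*sigma^2 on the diagonal and -sigma^2 next to it
    C += loc_error ** 2 * (
        2.0 * np.eye(n_jumps) - np.eye(n_jumps, k=1) - np.eye(n_jumps, k=-1)
    )
    return C

dt = 0.00748  # frame interval in seconds, as in the experiments above
brownian_with_error = increment_covariance(4, D=1.0, hurst=0.5, loc_error=0.03, dt=dt)
subdiffusive_no_error = increment_covariance(4, D=1.0, hurst=0.3, loc_error=0.0, dt=dt)
```

Under these assumptions, both matrices have negative entries immediately off the diagonal. Localization error affects only adjacent increments, whereas a Hurst parameter below 0.5 also produces weaker negative correlations at longer lags, which are hard to resolve in trajectories of only a few frames.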

In a recent objective evaluation of methods to measure anomalous diffusion (Muñoz-Gil et al., 2021), even top-performing methods (including recent machine learning approaches) were associated with mean absolute error >0.3 when estimating anomaly parameters for short trajectories (<10 frames). Because SAs create mixture models out of any underlying set of motion models, they could potentially be combined with such approaches (rather than the raw RBME likelihood function we use here) to boost their performance when run on large collections of short sptPALM trajectories.

Neither DPMMs nor SAs have any built-in mechanism to distinguish true jumps from tracking errors. Both rely on trajectories produced by another algorithm. It may be possible to combine both tracking and state occupation estimation into a single inference step using a model defining a joint distribution over states and possible links between detected particles.

Materials and methods

Plasmids

Unless otherwise noted, all PCRs were performed with New England Biolabs Phusion High-Fidelity DNA polymerase (M0530S), and Gibson assemblies (Gibson et al., 2009) were performed with New England Biolabs Gibson Assembly Master Mix (E2611S) following the manufacturer’s instructions. Cloning and expression of plasmids was performed in E. coli DH5α using the Inoue protocol (Im et al., 2011). Plasmids used for nucleofections were purified with Zymo midiprep kit (Zymo D4200) and concentrations were quantified by absorption at 260 nm. Cloning primers were synthesized by Integrated DNA Technologies as 25 nmol DNA oligos with standard desalting, and sequences were verified by Sanger sequencing at the UC Berkeley DNA Sequencing Facility. A complete list of the primers used in this article is provided in Supplementary file 1, and a complete list of the plasmids used in this article is provided in Supplementary file 2.

We produced the vector PB PGKp-PuroR L30p MCS-GDGAGLIN-HaloTag-3xFLAG by amplifying the human L30 promoter with prAH675 and prAH676 and assembling into AsiSI- (NEB R0630) and XbaI- (NEB R0145) digested PB PGKp-PuroR EF1a MCS-GDGAGLIN-HaloTag-3xFLAG. For the expression plasmid PB PGKp-PuroR EF1a 3x-FLAG-HaloTag-GDGAGLIN, we cloned three tandem copies of the SV40 nuclear localization sequence into XbaI- and BamHI-HF (NEB R3136)-digested PB PGKp-PuroR EF1a 3xFLAG-HaloTag-MCS using Gibson assembly.

For constructs expressing RARA-HaloTag domain deletions and point mutations, we first cloned the RARA coding sequence out of U2OS cDNA by extracting RNA from cycling U2OS cells with a QIAGEN RNeasy kit (QIAGEN 74104), preparing cDNA with the iScript Reverse Transcription Supermix (Bio-Rad 1708840), amplifying the CDS with prAH495 and prAH496, then assembling into an XbaI- and NotI-HF- (NEB R3189) digested PB PGKp-PuroR EF1a MCS-GDGAGLIN-HaloTag-3xFLAG using Gibson assembly. Next, to produce the mutants, we amplified parts of the RARA coding sequence in PCR fragments while introducing point mutations or domain deletions at the intersections of the fragments. PCR fragments were assembled into XbaI- and BamHI-HF-digested PB PGKp-PuroR L30p-MCS-GDGAGLIN-HaloTag-3xFLAG using Gibson assembly. The primers used for each construct were as follows: for PB PGKp-PuroR EF1a RARA[ΔNTD]-HaloTag-GDGAGLIN-3xFLAG, PCR fragment 1 was produced with prAH1111 and prAH1112; for PB PGKp-PuroR EF1a RARA[ΔCTD]-HaloTag-GDGAGLIN-3xFLAG, PCR fragment 1 was produced with prAH1113 and prAH1114; for PB PGKp-PuroR EF1a RARA[ΔNTD,ΔCTD]-HaloTag-GDGAGLIN-3xFLAG, PCR fragment 1 was produced with prAH1111 and prAH1114; for PB PGKp-PuroR EF1a RARA[C88G]-HaloTag-GDGAGLIN-3xFLAG, PCR fragment 1 was produced with prAH1113 and prAH1069 and PCR fragment 2 was produced with prAH1112 and prAH1070; for PB PGKp-PuroR EF1a RARA[ΔDBD]-HaloTag-GDGAGLIN-3xFLAG, PCR fragment 1 was produced with prAH596 and prAH704 and PCR fragment 2 was produced with prAH597 and prAH705; for PB PGKp-PuroR EF1a RARA[ΔLBD]-HaloTag-GDGAGLIN-3xFLAG, PCR fragment 1 was produced with prAH596 and prAH706 and PCR fragment 2 was produced with prAH597 and prAH707.

To generate the plasmid-based homology repair donor for gene editing at the human RARA exon 9 locus, we assembled the following fragments by Gibson assembly. For fragment 1, we digested the pUC57 vector with EcoRI and HindIII. For fragment 2, we amplified the left homology arm out of U2OS genomic DNA with prAH599 and prAH600. For fragment 3, we amplified the GDGAGLIN-HaloTag-3xFLAG insert out of the plasmid PB PGKp-PuroR L30p MCS-GDGAGLIN-HaloTag-3xFLAG with prAH601 and prAH602. For fragment 4, we amplified the right homology arm out of U2OS genomic DNA with prAH603 and prAH604.

To generate guide RNA/Cas9 expression plasmids for gene editing at the human RARA exon 9 locus, we cloned the two guide RNA sequences under a U6 promoter in a vector that coexpresses the sgRNA, mVenus, and S. pyogenes Cas9, which has been previously described (Hansen et al., 2017).

In luciferase assays, we used the retinoic acid-responsive firefly luciferase expression vector pGL3-RARE-luciferase (Addgene plasmid #13458; http://n2t.net/addgene:13458; RRID:Addgene_13458), a gift from T. Michael Underhill (Hoffman et al., 2006). Renilla luciferase was expressed from pRL CMV Renilla (Promega E2261).

Author details

Alec Heckert

Liza Dahal

Robert Tjian

Xavier Darzacq
