Dynamic structure of locomotor behavior in walking fruit flies
Abstract
The function of the brain is unlikely to be understood without an accurate description of its output, yet the nature of movement elements and their organization remains an open problem. Here, movement elements are identified from dynamics of walking in flies, using unbiased criteria. On one time scale, dynamics of walking are consistent over hundreds of milliseconds, allowing elementary features to be defined. Over longer periods, walking is well described by a stochastic process composed of these elementary features, and a generative model of this process reproduces individual behavior sequences accurately over seconds or longer. Within elementary features, velocities diverge, suggesting that dynamical stability of movement elements is a weak behavioral constraint. Rather, longterm instability can be limited by the finite memory between these elementary features. This structure suggests how complex dynamics may arise in biological systems from elements whose combination need not be tuned for dynamic stability.
https://doi.org/10.7554/eLife.26410.001Introduction
Behaving animals can act as quickly as their nervous systems allow, as slowly as their environments fluctuate, or over intervals determined by some task. For instance, behaviors may change quickly to avoid predators, slowly to adjust to seasons, or at the intervals of locomotor steps, song phrases, or nestbuilding stages. How different behaviors between these limits fit together remains poorly understood.
Classical ethological studies examined different stereotyped, goaldriven behaviors including feeding, courtship rituals, aggressive encounters, and escape sequences (Tinbergen, 1963). To human observers, many of these behaviors consisted of recognizable movements that often occurred in characteristic sequences. These movements were inferred to be the fundamental units of behavior, and their modularity suggested a similarly modular organization in neural control (Tinbergen, 1950; Simmons and Young, 2010). Yet on the whole, animals can adjust their movements in both discrete and graded ways. For instance, houseflies track other flies in flight using a combination of stereotyped saccadic course corrections and smooth pursuit (Wagner, 1986), and primates track small targets using both saccadic and smooth orienting movements (Fuchs, 1967). While discrete movements can be imprecise, gradual velocity adjustments can be used to tune movements quickly with fine resolution. However, graded movement control, as a dynamical system, requires stability to be useful. As stable implementation of arbitrary dynamical systems is challenging (Smale, 1966), graded control may impose limits on movement complexity or risk susceptibility to unpredictable behavior over time. Thus, while an intuitive description of movement has converged on modularity in both high and low level motor control (reviewed in Giszter and Hart, 2013; Flash and Hochner, 2005), variability and flexibility of ethological units have long been appreciated (Barlow, 1968), and taskspecific coordination of motor units down to individual muscles has been suggested (Kutch et al., 2008; ValeroCuevas et al., 2009). It remains unclear whether degrees of freedom in movements are themselves limited by a finite set of functional modules, or may be adjusted continuously and in a taskspecific manner (Tresch and Jarc, 2009). In invertebrates, it has been argued that behavior modules represent about half of all behaviors (Berman et al., 2014), but it has also been argued that apparent modules represent mere extremes within a continuum of behaviors (Gallagher et al., 2013; Szigeti et al., 2015; Hums et al., 2016). Thus, the extent to which behavior is modular remains unresolved.
Recent advances in monitoring behavior, computing power, and statistical tools have encouraged efforts to examine behavioral organization in an unbiased manner. However, there is scant consensus on methods or criteria for unbiased segmentation. A variety of dimensionality reduction techniques using both linear and nonlinear approaches have been applied to dissecting limb and animal movements in both vertebrates and invertebrates (Fod et al., 2002; Del Vecchio et al., 2003, Avella and Bizzi, 2005; Stephens et al., 2008; 2010, Braun et al., 2010; Gallagher et al., 2013; Mendes et al., 2013; Berman et al., 2014; Vogelstein et al., 2014; Schwarz et al., 2015; Wiltschko et al., 2015). In addition, a number of approaches have used human observations to train statistical models to identify behavioral patterns. For example, the movements of both flies and worms have been described as sequences of behavioral events that were originally selected by human observers (Croll, 1975; PierceShimomura et al., 1999; Branson et al., 2009; Dankert et al., 2009; Kabra et al., 2013; Kain et al., 2013). These methods have allowed humanobserved behaviors to be scored more efficiently, enabling highthroughput quantification of gender and individual differences, or encounter types between individuals (Branson et al., 2009; Dankert et al., 2009). However, given the overall diversity of approaches that have been applied, and the corresponding differences in their conclusions, identifying principles of behavioral organization across different behaviors and organisms remains challenging. Here we develop a set of criteria for behavior segmentation that may be applied wherever time series measurements of behavior can be obtained.
As a simple model of spontaneous behavior, we studied fruit fly locomotion. Due to the fly’s relatively simple nervous systems and compact behavioral repertoire, this model system provides an opportunity for highthroughput studies of behavior that can be linked with circuit function. In this model system, a variety of behavioral sequences have been studied (Strauss and Heisenberg, 1990; Pick and Strauss, 2005; Branson et al., 2009; Chen et al., 2002; Spieth, 1974), and powerful genetic tools can be used to link behavior to circuits (Simpson, 2009).
To build a catalog of spontaneous behaviors, we acquired large datasets capturing the locomotor movements of freely walking animals. We then identified patterns in movement dynamics from body velocity time series using an iterative ICA procedure, and defined unique or interchangeable behavior components from their occurrence statistics. We found that walking behavior can be decomposed into a small number of patterns that occur largely independently of all but the immediately preceding behavior. Next, we developed a statistical model that captured sequences of these patterns over longer timescales, and tested the ability of this model to generate sets of synthetic behavior trajectories matching real fly behaviors. On short time scales, we find a continuum of behaviors whose variation can be captured by a small number of parameters. Over longer times, variation grows, but behavior is predictable because it resolves into a small number of distinct, finite memory episodes, corresponding to elementary walking patterns. As a result, these studies systematically connect the time scales of velocity modulation on the order of the actionperception cycle with longer time scales at which ethological units of behavior are observed.
Results
Behavior shows consistency over short periods, describing the extent of behavior episodes
We set out to characterize walking at time scales from tens of milliseconds to a few seconds, beginning at the time scale of the fly actionperception cycle and extending to the time scale of locomotor behaviors (David, 1984; Nagle and Bell, 1987). Spontaneous walking behavior of female flies was examined in a uniformly illuminated environment without systematically varying visual, auditory or olfactory stimuli. We were interested in finding reproducible behaviors and sought to eliminate individual differences and behavioral changes that might emerge over extended periods of time. To minimize slow behavior changes due to fatigue, circadian phase, or age, wellfed 2–3 dayold flies (n = 7364) walking on a glass surface under dim illumination were tracked alone or in groups, in 10 minute trials during 2 hours of peak circadian activity. In aggregate, our datasets comprised over 1000 flyhours of behavior, and contained over ${10}^{6}$ trajectories (Katsov et al., 2017). Three datasets were collected for different purposes: a dataset of $\sim {10}^{6}$ trajectories that densely sampled walking behavior on short time scales (Dataset S, trajectory length $mean$ = 1.5 sec, ${P}_{95}$ = [0.22,5.1] sec), a dataset of $\sim {10}^{4}$ longer trajectories in a large arena (Dataset L, $mean$ = 5.22 sec, ${P}_{95}$ = [0.42,14.55] sec), and a dataset of $\sim {10}^{4}$ trajectories from isolated individuals (Dataset A, $mean$ = 2.2 sec, ${P}_{95}$ = [0.63,7.2] sec). Given these large datasets, we could effectively sample a wide range of spontaneous locomotor behaviors.
Locomotor behaviors are commonly defined using simple speed heuristics. For instance, turns have been defined in several species, including flies, by setting a threshold on rotational speed, and runs and pauses have been defined by thresholds on translation (Berg and Brown, 1972; PierceShimomura et al., 1999; Geurten et al., 2010; Drai et al., 2000; Wolf et al., 2002). However, behavior categories are not always reflected unambiguously in instantaneous speeds. For instance, speed distributions with multiple modes can reflect distinct behaviors, each represented by a typical speed, but can also reflect biomechanical or other constraints that suppress some speeds in all behaviors, producing troughs in the distribution. To use more information, we looked at movement properties over time.
As Drosophila shows little independent head movement in walking maneuvers (Geurten et al., 2014; Fujiwara et al., 2017), we measured displacement of the entire animal as an oblong shape on a two dimensional surface and inferred the position of the head statistically with greater than 99.8% accuracy (Figure 1A; Katsov and Clandinin, 2008). Three velocity components, translation, rotation, and sideslip, $\overrightarrow{\mathit{v}}=({\mathit{v}}_{T},{\mathit{v}}_{R},{\mathit{v}}_{S})$, completely describe walking at this level (Figure 1B). These velocity components, plus accelerations $\dot{\overrightarrow{\mathit{v}}}=({\dot{\mathit{v}}}_{T},{\dot{\mathit{v}}}_{R},{\dot{\mathit{v}}}_{S})$ were measured from ${10}^{6}$ individual trajectories (Figure 1C).
Movement properties were first examined over a small time step, 33 milliseconds, in a velocityacceleration phase space. In this space, starting from velocity trajectories, the average acceleration at each $\overrightarrow{\mathit{v}},\text{}\mathbf{E}\left[\dot{\overrightarrow{\mathit{v}}}\mid \overrightarrow{\mathit{v}}\right]$, was used to construct a kinematic field (Materials and methods). This field shows average tendencies at different velocities. For example, when flies are not turning (Figure 1D, ${\mathit{v}}_{R}\approx 0$), but have high sideslip they tend to speed up (arrows up), while at low sideslip they tend to slow down (center of panel, arrows down). Altogether, this field constitutes a local description of behavior dynamics as it explicitly describes average dynamics at each velocity bin. For locomotor behavior, this description captures the mapping between observed behavior and aggregate forces due to neural control and biomechanical constraints, as indirectly represented in the accelerations observed at each velocity. If locomotor behaviors are separated by distinct combinations of velocityacceleration pairs, different regions of this space should describe distinct behaviors well. However, in the kinematic field, mean dispersion about the average accelerations is greater than five times the local averages, manyfold greater than our measurement error (Materials and methods). This large dispersion may represent intrinsic behavior variability, or suggest that distinct behaviors are not well represented in their average tendencies at one time scale.
We considered three possible explanations for this large dispersion (Figure 2). First, though unlikely, the maneuvers of walking flies, like those of flying insects, may be relatively unstable (Taylor and Thomas, 2003; Sun and Xiong, 2005). As a consequence, two flies initiating a turn in the same way, with nearly identical velocity, might move in very different ways a short time later. A broad range of accelerations at any given velocity will result, and behavior dynamics, represented by accelerations, would appear poorly constrained. This possibility predicts that if one were to follow trajectories in velocity space, behaviors would diverge quickly from initially similar velocities (Figure 2A). Second, behavior dynamics may be stable, but related trajectories can be reasonably expected to vary (Barlow, 1968). Hence, even if distinct behaviors were well separated in velocity phase space, a range of accelerations at any given velocity would still be observed. However, unlike the first possibility, the second predicts that movements of one type will maintain velocities that are more similar to each other over time than to other movement types (Figure 2B). Related behaviors should remain close in velocity space over time, in distinct regions from other behaviors. The final possibility is that distinct movement patterns use distinct accelerations at similar velocities, and therefore overlap in velocity phase space. As a result, different patterns contribute different accelerations to the same velocity neighborhood. In this case, initially similar velocities would remain similar when behavior is committed to a particular movement type, but will diverge at other times (Figure 2C, red, black squares, respectively).
To test these alternatives, we randomly sampled small velocity neighborhoods, each corresponding to approximately 0.004% of velocity space, and measured divergence of trajectories from each neighborhood over time (Materials and methods). On average, trajectories quickly diverged up to the extent of the entire velocity space (Figure 2D, solid line), excluding the possibility that stable behaviors are wellseparated in instantaneous velocities (Figure 2B). Of the remaining possibilities, is walking behavior unstable (Figure 2A), or composed of distinct patterns that reuse similar velocities (Figure 2C)? We reasoned that if stable behaviors exist, they should be most consistent when behavior is committed to a particular movement type. Therefore, we looked for behavior episodes between locomotor adjustments. Whenever locomotor behavior is adjusted, some force must be applied to alter velocity. As a result, behavior adjustments will correspond to accelerations or decelerations. As flies accelerate or decelerate for finite periods of time, episodes of behavior adjustment must have extrema. Acceleration extrema correspond to inflections in velocity and hence we chose to examine trajectory intervals that included at least two velocity inflections, bounding an interval between behavior adjustments. We observed that the rotational component fluctuated throughout all trajectories, and displayed a steep autocorrelation falloff (Figure 2E,F). Indeed, local peaks in $\mathit{v}}_{R$ arise frequently enough, with an interpeak interval of $250\pm 110$ ms, that almost every trajectory can be represented in its entirety by short trajectory fragments surrounding $\mathit{v}}_{R$ peaks. When fragments include most of each interval before and after a peak, they are nearly guaranteed to include behavior adjustments on either side of the peak, capturing a relevant behavior interval. To test whether behavior divergence is ever bounded during these episodes, we examined behavior starting from defined times preceding velocity peaks. Specifically, we selected velocity neighborhoods randomly throughout velocity space again, but only examined divergence of trajectories where a $\mathit{v}}_{R$ peak occurred a defined time later, ranging from 70 ms to 170 ms. Divergence should be unaffected by this selection if behavior is unstable (Figure 2A). To the contrary, we observed that on average divergence plateaued for up to 200 ms when trajectories were sampled up to 130 ms before $v}_{R$ peaks, but not earlier (Figure 2D, dashed line, and d.n.s). This period of bounded divergence suggested that there exist behavior patterns that remain consistent for about 200 ms, beginning from approximately the middle of the average interpeak interval. Taken together, our findings excluded possibilities A and B, suggesting that different movements must overlap in velocity phase space (Figure 2C), while consistent behaviors may be defined with respect to $\mathit{v}}_{R$ peaks.
Unbiased segmentation of trajectories defines modal movement patterns
We next sought to separate behavior patterns statistically. In walking flies, certain movement patterns stand out for human observers: for instance, stops, crabwalks, sharp turns, straight and curved walks (eg: Strauss and Heisenberg, 1990; Branson et al., 2009; Kain et al., 2013; Geurten et al., 2014). However, on closer inspection, even seemingly stereotyped patterns like sharp turns present a variety of movements, producing a range of very large to barely perceptible orientation changes. Does this range include distinct movement patterns? Conversely, are movement patterns that look similar to human observers really used by animals as if they are the same? Without explicit human guidance, approaches to behavior segmentation have been based on models of joint kinematics, movement dynamics, or muscle synergies (eg: Fod et al., 2002; Del Vecchio et al., 2003; d'Avella and Bizzi, 2005), or on classification of movements from distributions of instantaneous velocities (Braun et al., 2010; Gallagher et al., 2013), or postures (Stephens et al., 2008, 2010; Berman et al., 2014; Schwarz et al., 2015; Wiltschko et al., 2015), or experimental activation of neuron groups (Vogelstein et al., 2014). Our strategy was to identify any behaviors that may be different in dynamic structure, and then to use statistical relationships between these behaviors to determine if multiple behaviors belong in the same group, or whether one identified behavior is used in different ways so as to suggest multiple underlying states.
Distinct behaviors arise from differences in how flies accelerate or decelerate. Due to inertia, velocities cannot change instantaneously. However, as observed in Figure 2D, velocities may or may not remain similar over time, depending on what phase of behavior is examined. Because behaviors maintained similarity around $v}_{R$ peaks, trajectory fragments surrounding $\mathit{v}}_{R$ peaks were chosen for analysis. Specifically, fragments were selected including 550 ms before and after each $\mathit{v}}_{R$ peak, encompassing over 99% of all interpeak intervals (Figure 3A, Box 1). This fragment length extended beyond the average period of bounded divergence to include as many full intervals between behavior adjustments as possible, allowing for extended behaviors. As a result, a series of partially overlapping fragments were extracted from each trajectory. These fragments were aligned in time to when a peak occurred in $\mathit{v}}_{R$, thereby aligning the adjoining rising and falling $\mathit{v}}_{R$ signals across trajectories. In this aligned dataset, distinct movement patterns would be marked by different magnitudes of $\mathit{v}}_{R$ peaks, different temporal patterns of changes in velocities around turn peaks, or both. The aligned dataset was represented in a matrix, where each column contained a different fragment around a unique $\mathit{v}}_{R$ peak, and each row represented a step in time relative to that peak, for each velocity component. Different columns then contained either variants of related behaviors, or distinct movement patterns.
Segmentation of movement patterns, setup.
Starting with the full set of trajectories $\overrightarrow{\mathit{v}}(\tau )=[{\mathit{v}}_{T},{\mathit{v}}_{R},{\mathit{v}}_{S}](\tau )$, times $\{{t}_{e}\}$ of local extrema in ${v}_{R}(\tau )$ were identified and trajectory fragments were isolated around each extremum, $\overrightarrow{\mathit{v}}(\tau ={t}_{e})$. These fragments, $\overrightarrow{\mathit{v}}({t}^{\mathrm{\prime}})$, ${t}^{\prime}=\tau {t}_{e}$, were normalized to the sign of ${v}_{R}({t}^{\mathrm{\prime}}=0)$, making all turns at ${t}^{\prime}=0$ positive but preserving sideslip direction relative to the turn. Each fragment of $n$ time samples was represented as a $(3n\times 1)$ column vector:
Then, vectors ${\mathbf{v}}^{(j)},\phantom{\rule{thinmathspace}{0ex}}j=1,\dots ,N$, representing $N\text{}{v}_{R}$ peaks, were concatenated horizontally into a $(3n\times N)$ matrix $\mathbf{M}$, such that each row $i$ was a velocity $c{v}_{R},\text{}{v}_{T}\text{}\text{or}\text{}c{v}_{S}$ at the same time ${t}^{\prime}$ from a $v}_{R$ peak at ${t}^{\prime}=0$;
Correlations between rows in this matrix relate velocity components at aligned times ${t}^{\prime}$ from $v}_{R$ peaks in the dataset. Because velocity components differ in range and distributions, which can also change over time relative to $v}_{R$ peaks, rows were standardized to obtain matrix
This matrix may contain two kinds of mixtures of movement patterns. First, each column (or a part of it in time) may represent a variant of a single movement pattern. This movement pattern may be described in terms of independently adjustable parts of its temporal structure, or independent components. Framed in terms of the standard ICA generative model (Hyvärinen and Oja, 2000), it is hypothesized that vectors of this matrix can be described as:
where $\mathbf{a}}_{i$ are columns of a mixing matrix $\mathbf{A}$, and ${s}_{ij}$ is a coefficient of sample $j$ in the ${i}^{th}$ basis vector, of the $D$ basis vectors that span the space of independent components of $\mathbf{Z}$. This basis vector set represents degrees of freedom in velocity trajectory shapes over time. ${s}_{ij}$ may take on continuous or discrete values, corresponding to continuous or discrete modes of movement control. The matrix $\mathbf{A}$ produces timeevolving trajectories by giving each scalar ${s}_{ij}$ an extent in time, and mixing these components. If movement dynamics can be represented by fewer degrees of freedom than dimensions of the standardized velocitytime vector $\mathbf{z}$, then $D\text{}\text{}{\mathbf{z}}^{(j)}=3n$.
The second kind of mixture in $\mathbf{Z}$ may be a mixture of distinct movement patterns, such that any given column (or part of it in time) contains one type of movement pattern, while different columns may contain qualitatively different patterns. If all movement dynamics are subject to the same degrees of freedom, then the same mixing matrix $\mathbf{A}$ applies to the entire dataset, and the space of independent components $\{\mathbf{s}\}$ is spanned by the same basis vector set $\mathit{\beta}$. Alternatively, different mixing matrices $\mathbf{A}}^{(C)$ and independent components $\mathit{\beta}}^{(C)$ describe each subset $C$, corresponding to distinct movement patterns.
https://doi.org/10.7554/eLife.26410.005To identify potentially different movement patterns, we looked for simple descriptors of velocity fragments, asking which fragments are well captured by the same set of descriptors, and which fragments require different sets of descriptors. These descriptors were inferred from patterns of variation in the fragment set, using independent components analysis, ICA (Hyvärinen, 1999; Hyvärinen and Oja, 2000; Himberg et al., 2004), Box 1, 2, Materials and methods). Using ICA, the shapes of trajectory fragments in velocity space were reduced to a set of independent parameters. These parameters may be thought of as degrees of freedom in the temporal structure of velocity trajectories. An individual trajectory fragment, as one trajectory shape described by a set of parameters, is represented by a coefficient in each parameter. To separate potentially unrelated fragments, we assumed that under most circumstances, parameter coefficients should be distributed unimodally if a parameter represents one degree of freedom of a single movement pattern. Alternatively, a multimodal coefficient distribution indicates that movement represented in this parameter may be impeded by some mechanical constraint, that movement is not well represented in this parameter, or that more than one movement type is present. For these reasons, coefficient distributions were examined once independent components were identified in a fragment dataset, starting from the complete dataset consisting of $2.9\cdot {10}^{6}$ trajectory fragments. Whenever a coefficient distribution showed multiple modes or inflections, raising the possibility of contributions from multiple movement patterns, data were separated by mode and the smaller pool of trajectory fragments corresponding to each subset was reanalyzed to find its own independent components. Because different movement patterns could arise at different frequencies, and require disambiguation of different velocity components on distinct time scales, the entire procedure was repeated iteratively for each subset of raw velocity trajectories corresponding to a distinct coefficient subset, until further iteration produced no separable data features, did not converge, or produced independent components that did not differ from previous iteration (Box 2 and Materials and methods).
Segmentation iteration.
(1) Construct matrix $\mathbf{M}$ from trajectory fragments around $v}_{R$ peaks. Include all velocity components (Box 1, and Materials and methods Equation 3). Columns are aligned in time with respect to $v}_{R$ peaks.
(2) Standardize, whiten rows of $\mathbf{M}$ to obtain matrices $\mathbf{Z}$ and $\mathbf{Z}}^{w$, respectively (Materials and methods Equations 4–5).
(3) Select the largest principal components (PCs) to retain $>85\%$ of total data variance. Eliminate rows corresponding to the remaining PCs from $\mathbf{Z}}^{w$, obtaining a reduced matrix $\stackrel{~}{\mathbf{Z}}}^{w$.
(4) Apply Independent Components Analysis (ICA) to $\stackrel{~}{\mathbf{Z}}}^{w$. Repeat with random initial guesses for ICs.
(5) Examine ICs and data projected into ICs
if ICs fail to converge then
(5.1) Terminate procedure. Examine previous steps; reduce velocitytime points in $\mathbf{M}$
(dimensions of interest) as warranted. Proceed from step 1.
else if ICs represent the same dimensions as preICA then
(5.2) Terminate procedure. Trajectories in matrix $\mathbf{M}$ represent a submode
else
(5.3) Project trajectories into ICs
if Parameter distribution in ICs contains no separable features then
(5.3.1) Terminate procedure. Trajectories in matrix $\mathbf{M}$ represent a submode
else
(5.3.2) Separate features in IC projections.
(5.3.3) Identify trajectory fragments that correspond to each feature class in ICs.
(5.3.4) Repeat, from Step 1, for each data subset.
end
end
https://doi.org/10.7554/eLife.26410.006Walking patterns classified in this way were termed submodes, as this analysis favored splitting over lumping of distinct behaviors and left open the possibility that identified behaviors may be subsets of larger behavior groups. Submodes included stops and dithers, when flies shifted around while remaining in one place, straight runs, smooth turns and several kinds of sharp turns, as well as different crabwalks marked by sideways motion. In addition, several variations of a sharp turn were found that included sideways or backward movement at high velocity (Figure 3B, Figure 4 and Figure 4—figure supplement 1). As no more than 5–6 parameters were needed to account for at least 85% of data variance at each iteration of the classification procedure, each movement pattern was reasonably well captured by a small number of parameters (see Materials and methods). Altogether, 15 submodes were identified that tiled individual trajectories from a dataset collected from thousands of individuals, likely accounting for all major walking patterns of female flies.
As a whole, the submode set included movements salient for human observers (Branson et al., 2009; Kain et al., 2013, pers. observation), but the extent and composition of some submodes and the diversity of others was not intuitive. For instance, turns represented in submodes 7,10, and 11 (Figure 4, Figure 4—figure supplement 1) were easy to pick out as one kind of behavior, but not necessarily three. Some human calls picked out features that were not identified as individual submodes: for instance, reversals did not appear in submodes on their own, but as part of more complex movement patterns. Similarly, submode 1 spanned a range of movements from clear turns to relatively straight walks, while submode 2 included both average and very slow walks, sometimes but not always with a clear sideslip component. Are such different movements really associated together? Conversely, five submodes represented more or less subtle variants of sharp turns (submodes 3,6,7,10,11) and several submodes included various subtle rotations or sideslip (submodes 4,5,9,12–15). Are these similar movement patterns really used by animals in different ways?
To answer these questions, we examined submode relationships over time. One possibility is that each of the identified submodes represents a completely autonomous behavior. If this were the case, each submode would occur independently of submodes that come before or after. In all other cases, the statistics of submodes that tend to occur before or after a given submode can be used to identify submodes that share the same statistical relationships. Such submodes will be grouped into units we will call ‘modes.’
Submode sequences demonstrate limited memory
To measure the extent of submode relationships over time, we first measured how well the identity of submodes at one time predicts the identity of their neighbors in the same trajectory. Velocity trajectories were converted to submode sequences by classifying each $\mathit{v}}_{R$ peak and the surrounding trajectory fragment, using parameters found in the previous analysis (Materials and methods). For each trajectory $j$, we obtained a submode sequence ${\{{u}_{i}\}}^{(j)}$, where $u$ represents submode identity, and $i$ indexes time. Among all trajectories, we found that over a quarter of the uncertainty about submode identity ${u}_{i}$ is explained by the previous submode ${u}_{i1}$, indicating that submodes are not independent $(\rho =I({u}_{i};{u}_{i1})/H(u)=0.271,C{I}_{95}=[0.270,\mathrm{\hspace{0.25em}0.272}]$; where $I$ is mutual information and $H$ is Shannon entropy (Cover and Thomas, 1991) ). However, after accounting for the history of one previous submode, almost none of the remaining uncertainty was explained by submodes one step further in the past, ${u}_{i2}({\rho}_{2}=I({u}_{i};{u}_{i2}\mid {u}_{i1})/H({u}_{i}\mid {u}_{i1})=0.015,C{I}_{95}=[0.014,\mathrm{\hspace{0.25em}0.015}])$. This result suggested that transitions between submodes depend on the current submode, and negligibly on submode history two or more steps in the past. However, it is possible that a small contribution of prior memory can accumulate over time.
To test the cumulative impact of history on submode transitions, two Markov models that neglected all but one or twostep submode history were used to generate short synthetic submode sequences that were then compared with real sequences. These two models generated nearly indistinguishable sets of submode sequences, each comparable to random sets of real sequences (Figure 5—figure supplement 1A). Specifically, synthetic sequences from a firstorder Markov model matched almost the same number of real sequences as were matched between two random sets of real sequences (matched fraction = $0.93\pm 0.009$, see Materials and methods), while sequences from a secondorder Markov model matched real sequences only slightly better (matched fraction = $0.96\pm 0.008$). Given that nearly equal data fractions were matched accurately by assuming either one or two submode history, we concluded that models conditioned on one previous submode are sufficient to capture most short submode sequences. As one submode but not prior history effectively influenced transitions to the next submode, this analysis argues that memory of past behavior must extinguish between the start of one submode and the start of the next. Hence, while individual submodes represent unique temporal correlations in velocities, submodes also correspond to periods during which memory of past behavior is effectively extinguished. In this way, velocity associations over short periods captured by submode identities also defined behavior elements that are decoupled over longer periods.
Statistical relationships between submodes suggest groups of related submodes
Next, we asked whether the statistics of submode transitions suggested additional levels of behavior organization. In principle, there are two possibilities. Statistical relationships shared by more than one submode could indicate that these submodes are interchangeable in the context of other submodes, and therefore may be grouped, or transition statistics may indicate that more than one behavior underlies a single submode. To allow for both possibilities, we used the framework of Hidden Markov Models (HMMs) to identify statistical patterns in submode sequences. Different HMMs were constructed and evaluated using multiple criteria while systematically varying the assumed number of hidden states (Materials and methods). For each number of hidden states, multiple models were trained from random initial conditions to fit observed submode sequences. All models assumed that state transitions depend only on the current state, approximating history dependence in submode transitions. Models of equivalent complexity were evaluated using the likelihood attributed by each model to real submode sequences. Models of different complexity were compared using a variant of sampling from a generative model, testing each model’s ability to generate sets of individual behavior sequences that correctly represented the distribution of real trajectories (Materials and methods). By these criteria, it was found that 6state models reproduced real submode sequences as well as models that used information about all 15 submodes, some 5state models performed similarly to 6state models, and all simpler models performed significantly more poorly (Figure 5—figure supplement 1C). Therefore, we concluded that a five state model represented a good tradeoff between model complexity and accuracy, and chose one such model based on its performance (Figure 5—figure supplement 1E,F and Materials and methods). This model, as all wellperforming models, revealed a largely straightforward pattern: most submodes mapped predominantly to a unique underlying state (Figure 5—figure supplement 1D). This result was surprising as by construction, HMMs allowed each submode to be associated with multiple states. As most submodes mapped to only a single state, some submodes were associated with the same state because there were more submodes than states. Submodes that mapped to the same state could be considered a group based on their shared statistical properties. An exception to this pattern was behavior at low velocities and stops (submode 8, and one or more of the rare submodes 4,5,9,14,15), which showed signatures of potentially mixed behavioral states (see Materials and methods and below). However, where evaluated, division of submodes into more underlying processes increased model complexity without substantially improving fit to data (see below). As many different models were evaluated, we observed that groupings of the most frequent submodes agreed between the best Markov models but submodes that occurred rarely in trajectories could be assigned to different groups, presumably reflecting their modest contributions to the dataset. As a result, a small number of models that grouped rare submodes in different ways performed comparably well by all model selection criteria, and could not be differentiated further (Figure 5—figure supplement 1C,E). In addition, transition probabilities below approximately 0.005 could not be accurately determined, due to finite dataset sampling error (see Materials and methods). Nevertheless, as a 5state Markov model could generate sets of synthetic behavior sequences that matched the distribution of real sequences nearly as well as more complex models, we concluded that five submode groups, or modes, provided a sufficient model of submode organization (Figure 5—figure supplement 1A,E, Materials and methods).
How are submodes organized? Multiple submodes may map to the same mode because each occurred between the same set of other movements, with similar transition probabilities. Provided their statistics are wellsampled, either similar or distinct movements may be legitimately grouped by these criteria. Were similar or distinct movements grouped in modes? Each mode, like each submode, can be described as a distribution in four dimensions: the threecomponent velocity vector $\overrightarrow{\mathit{v}}$, plus time. We call this distribution a velocity profile. Qualitatively, the velocity profiles of most submodes assigned to the same mode are similar (Figure 4), suggesting modes represent major groups of movements.
Movement patterns represented in two of the modes were confined to either low or high ranges of $\mathit{v}}_{R$ and $\mathit{v}}_{T$ (Modes I, V respectively, Figure 5A,E). Velocity profiles of the other three modes all spanned low and high velocities in multiple velocity components (Modes IIIV Figure 5B–D). Movement patterns in these modes included: sharp and shallow turns accompanied by forward acceleration (Mode II), various sideslip motions at slow speeds, often but not always preceded by decelerations (Mode III), and slower, relatively straight runs with occasional sideslip (Mode IV) (Figure 5B–D, Figure 4). Individual movement patterns assigned to a mode could be largely judged by human observers to represent variations on a theme (Figure 4, Figure 4—figure supplement 1, and pers. obs.).
However, both unexpected groupings and distinctions were found for statistically wellrepresented movement patterns. Fast, straight walking was grouped with smooth turning in the high velocity Mode V, while intermediate and slow straight walks, with occasional side slip, were grouped together in a different mode, IV (Figure 5D). That is, in this segmentation, two apparently different behaviors, some turns and straight runs, were grouped together, and were distinguished from other types of related behaviors. On the other hand, two sharp turns (submodes 6 and 7) differed only subtly, but were never assigned to the same mode in any wellperforming Markov model we evaluated. These groupings contrast with previous behavior segmentation efforts, where turns were deemed by human observers to be separate, a priori, from straight runs of all speeds, and sharp turns were treated as a single category (PierceShimomura et al., 1999; Branson et al., 2009; Kain et al., 2013; Geurten et al., 2014). Crucially, in our segmentation distinct movements such as turns and straight walks of Mode V were not grouped because they tended to occur one after the other, but because they occurred interchangeably between other movements. Statistical relationships therefore also revealed associations beyond movement similarity. These relationships, the transition statistics between modes, showed an additional distinction. Modes I and V had high selftransition probabilities, and did not communicate with each other directly at significant frequencies (Table 1A). Rather, transitions between Modes I and V typically required passage through one or more of the intermediate Modes IIIV (Figure 5H). Due to this transition structure, Modes I and V showed longer residence times than other modes (Figure 5G).
A critical prediction of the Markov model used to capture mode transitions is that the dwell time distribution describing the time flies spent in each mode should be fit by a single exponential. An excellent fit between real and modelgenerated dwell time distributions shows that the model captures most transition probabilities in all modes, although some slow transitions in modes I and III are missed (Figure 5G). Because mode III occurred infrequently (accounting for 7.3% of data), and its dominant, fast component is well captured by the model (Figure 5G) it was not segmented further. Two components in the stopped mode (I) dwell time distribution were separated using a Hidden Markov model, capturing observed dwell times that varied over 2–3 orders of magnitude (Figure 5G). However, the additional stopped mode (${I}_{B}$) that the HMM uncovered only communicated with other stops (mode ${I}_{A}$), and did not improve model performance quantitatively (data not shown). We therefore did not consider this additional mode in our subsequent analysis, and infer that this second stopping mode, ${I}_{B}$, likely represents a generally nonresponsive state, such as sleep (Shaw et al., 2000; Hendricks et al., 2000). However, stopped flies also engage in several nonlocomotor behaviors such as eye, wing, and abdomen grooming, oviposition, and defecation that our dataset was not designed to resolve. These behaviors would then comprise subsets of mode I.
The set of observed behaviors, characterized as submodes, is also described more coarsely by the smaller number of modes, potentially eliminating information. However, in sequences, uncertainty about mode identity is explained as well by the previous mode as uncertainty about submode identity is explained by the previous submode (mode $\rho =0.303,C{I}_{95}=[\mathrm{0.302\hspace{0.25em}0.304}]$). Conversely, as with submodes, almost none of the remaining uncertainty is explained between modes two steps apart (mode ${\rho}_{2}=0.0069,C{I}_{95}=[\mathrm{0.0066\hspace{0.25em}0.0072}]$). By contrast, when submodes are grouped arbitrarily into five elements, information can both decrease between adjacent elements in a sequence, and increase spuriously between elements two steps apart (Figure 5—figure supplement 2). These results argue that grouping submodes into modes neither lumped behaviors in a way that eliminated information, nor scrambled existing relationships.
We next sought to determine the extent to which the structure of our model was influenced by the structure of our chamber, or by trajectory length. We therefore examined whether submode segmentation, groupings, and transition structure remained adequate in another dataset collected under identical illumination, but in a large arena where longer individual trajectories were obtained (Dataset L). Altogether, stereotyped patterns were broadly preserved between datasets, as the same segmentation criteria unmixed trajectory coefficient distributions from either dataset in the same independent components at all steps (Figure 6). At a more detailed level, we quantified velocity profile similarity using an intuitive measure of distribution overlap, the Bhattacharyya coefficient $B$ that ranges from 0 (no overlap) to 1 (complete overlap) (Kailath, 1967, Materials and methods). Overlap was high between the same mode types segmented in different datasets (median overlap $B=0.976,{P}_{95}=[\mathrm{0.942\hspace{0.25em}0.982}]$). By comparison, overlap was lower between different modes both within and between datasets (median $B=0.642,{P}_{95}=[\mathrm{0.190\hspace{0.25em}0.822}]$). Comparable results were obtained using other metrics of similarity, as well as in a third dataset where individual flies were recorded in isolation (data not shown). Thus, modes retained over 90% identity and remained distinct, with only small shifts in their velocity profiles, across different datasets.
Because trajectories are continuous and different modes must connect with each other as the fly moves, a mode’s velocity profile could include parts of transition periods between modes. Velocity profiles may thus change because the distribution of adjacent modes changes. In addition, velocities may also be modulated independently of mode structure. To investigate these possibilities, we compared transition probabilities and velocity profiles during transition periods in datasets S and L, representing two conditions. Indeed, although mode connectivity remained largely the same, transition probabilities differed between conditions (Figure 6—figure supplement 1). As modes were defined by classifying trajectory fragments around $\mathit{v}}_{R$ peaks, in this segmentation transition periods occurred in the intervals between $\mathit{v}}_{R$ peaks. Accordingly, we compared velocity profiles of the 25 possible interval types between the five modes from datasets S and L. Of these interval types, 22 were sufficiently well sampled to compare (Table 1), and of these, 19/22 showed velocity profiles that were statistically distinguishable between conditions (Table 2, Table 3, Materials and methods). However, interval velocity profiles of the same type overlapped to a high degree, even when they differed statistically (median overlap $B=0.96,{P}_{95}=[\mathrm{0.87\hspace{0.25em}1.0}]$). Hence, considering both velocity profiles and mode connectivity, model structure was broadly similar between two different experimental conditions. At the same time, transition probabilities between modes changed with condition, and velocity profiles differed subtly regardless of how they were measured. While modes can be reliably identified under different conditions, subtle velocity profile shifts may reflect altered transitions between modes, behavioral changes independent of mode structure, or both.
How well does a Markov model that ignores velocity profile shifts and lingering correlations between submodes predict real behavior? Can real, extended behaviors be reproduced by a model that abstracts velocity trajectories as modes connected by Markovian transitions? To test the model, we measured matches between real mode sequences observed in trajectories, and synthetic sequences produced by the Markov model, while varying sequence length. Real sequences were obtained by classifying sequences of $\mathit{v}}_{R$ peaks and their surrounding trajectory fragments in trajectories from dataset L according to the 5mode model (Materials and methods, Figure 5—figure supplement 1). Synthetic sequences were sampled from a generative model, the 5state Markov chain representing mode transitions (Figure 5H). As in previous analyses, sets of both real and synthetic sequences can be considered a random sample of all sequences flies, or the model, might generate. Accordingly, model performance was assessed by resampling sequences from real and synthetic sequence sets, then measuring the average number of exact matches between synthetic and real samples, normalized by matches between two real samples (Materials and methods). As before, this procedure was quantified by the match fraction $f$, ranging from 0 when synthetic sequences differ completely from real ones, to one when synthetic and real sequence sets are indistinguishable. Measuring $f$ at increasing sequence lengths, we defined ${\tau}_{P}$ as the time interval over which $f\ge 0.9$, corresponding to sequence lengths over which the model performed accurately. Remarkably, the Markov model remained accurate over seconds (${\tau}_{P}=4.3\pm 0.55$ s, Figure 7A). By comparison, velocity autocorrelations nearly vanish by 1 s (Figure 2F) and similar velocity patterns diverge throughout velocity space after about 200 ms (Figures 2D and 7B). The Markov model thus reproduced samples of individual behavior beyond the time scale when patterns in velocity trajectories remain similar enough to be predictable. In fact, the Markov model was accurate across the entire time span of behavior that was well sampled in our trajectory datasets, raising the possibility that it may also describe walking behavior over longer periods.
Did the model capture the range of observed individual behaviors? In principle, a model might perform well simply by capturing the most common transitions in individual trajectories, such as selftransitions of stops and highspeed runs and turns (modes I and V), and matching only common behaviors like extended stop or run episodes. If this happened, matches evaluated in the generative test could be skewed by common behavioral sequences. To address this possibility, frequencies of real and modelpredicted mode sequences were compared over increasing sequence lengths up to ${\tau}_{P}=4.3$ s. Importantly, the observed frequencies of individual sequences in real trajectories were reproduced in the modelgenerated sequences, and thus the model did not simply describe behavior on average (Figure 7C). Moreover, as common mode sequences were no better matched than rarer ones, model accuracy did not depend on a few overrepresented trajectory types (Figure 7C). Finally, because frequencies of all real mode sequences above the sampling error in our data were matched by the Markov model with a Pearson Coefficient of R = 1.00, we conclude that the model captured all measurable transition types. Ten of these mode sequences were selected randomly and five random samples of each sequence type are shown in Figure 7—figure supplement 1, along with samples of the most frequent sequence type, repeats of the highspeed runs and turns mode V. In sum, velocity patterns were distinguished by the same segmentation criteria on different conditions and Markov models captured transition probabilities specific to each condition, sufficiently to predict individual behavior sequences at least an order of magnitude longer than the extent of locally correlated velocity patterns.
Finally, we asked whether velocity trajectories become predictable over longer periods once mode identity is known. Starting from small neighborhoods in velocity space preceding each of the five modes, trajectory divergence was measured separately for each mode. We found that after the same short period over which unsegmented trajectories remain nearby, divergence in velocity space is unaffected by segmentation (Figures 2D and 7B). Whether segmented by mode identity or not, similar trajectories diverged from $<0.004\%$ of velocity space 100 ms before $\mathit{v}}_{R$ peaks, near mode initiation, across 15–20% of velocity space over the following 200 ms, during mode execution, and throughout velocity space afterwards (Figure 7B, Figure 7—figure supplement 2). As unbounded divergence occurred at the average time of behavior adjustments, midway between velocity peaks or modes, this finding raises the possibility that divergence is linked with apparent memory extinction captured by Markov models between the start of one mode and the start of the next. Intriguingly, bounded but significant divergence within modes also raises the possibility that a mechanism for memory loss can be intrinsic to mode structure: first order Markovity may be approximated as trajectory history is scrambled over the course of one mode (Figure 7D,E).
Discussion
This work examined dynamics of walking in a relatively complex animal, the fruit fly, using principled criteria. We find that fly movements naturally decompose into different time scales, identify major movement patterns, and characterize properties of these patterns over time. On a fast time scale, distinct movement patterns emerge from patterns of variation in trajectory fragments. Each movement pattern corresponds to a continuum of behaviors that can be described by a common set of parameters, representing stereotyped, but not invariant behaviors. On a longer time scale, sequences of these patterns can be approximated as a stochastic, finitememory process. In all, this description captures a dynamic structure of behavior spanning tens of milliseconds to seconds.
Defining behavior structure from dynamics
Our behavior segmentation criteria were designed to identify subsets of trajectories that showed common dynamics within a subset, and mutually independent variation between subsets, consistent with signatures of distinct motor planning or control. Segmented by these criteria, some elementary behaviors were behaviors picked out by human observers or prior automated segmentation strategies, while others revealed surprising associations or distinctions between movement patterns. For instance, behaviors we termed submodes identified smooth turns, sharp turns, and sideways movements, all of which can be picked out by human observers (Branson et al., 2009; Kain et al., 2013), pers. obs.). Less intuitively, sharp turns that differed only subtly were distinguished as different behaviors (Figure 4: submodes 3 or 7 vs. 6), while a range of behaviors from turns to straight walks were associated in a single behavior element (Figure 4 and Figure 4—figure supplement 1: submode 1). Consistently, when statistical relationships between segmented elements were examined, subtly different sharp turns remained separate (in modes II vs. V), and turns remained grouped with some straight runs but not others (modes IV vs. V, modes II vs. III). By comparison, human observers have recognized sharp (‘saccadic’) turns as a distinct behavior, but had not recognized the spectrum of different turn types, and had culled turns from straight runs as different behaviors a priori (Croll, 1975; PierceShimomura et al., 1999; Branson et al., 2009; Kain et al., 2013). Likewise, so had behavior segmentation strategies that largely removed human judgment but parsed behavior in instantaneous observable parameters, by thresholding fast versus slow velocities, or by segmenting orthogonal (mutually exclusive) animal postures (Braun et al., 2010; Geurten et al., 2010, Geurten et al., 2014). These simple examples underscore how differently behavior dynamics may be organized relative to behavior features salient to human observers, or machine classification that collapses time scales.
Contrary to these segmentation strategies, our criteria identified distinct behavior patterns that overlap substantially in velocity space (Figure 5F). Because they overlap, these patterns cannot be separated from each other solely due to biomechanical constraints. At the same time, these patterns are unlikely to represent mere extremes of a behavior continuum (Gallagher et al., 2013; Szigeti et al., 2015), because they show mutually independent variation and approximately Markovian statistical relationships over time.
We hypothesize that dynamically independent behaviors in our segmentation reflect differences in neural control. While no comparable data yet exist in walking Drosophila, dynamics of global neural activity in C. elegans suggested that worm locomotion includes multiple, distinct turn types and forward run states that had not been uniquely identified in previous analyses of worm behavior (Kato et al., 2015). In worms, however, behaviors have not yet been defined directly from the dynamics of locomotion. It remains to be seen whether neural and behavioral states correspond when both are defined from dynamics.
Measurement
Several approaches to behavior segmentation have been developed to date, using different measurements that include parameterized models of limb or wholeanimal posture (Fod et al., 2002; Stephens et al., 2008), two or threedimensional images of postures (Berman et al., 2014; Wiltschko et al., 2015), measures of gait (Kain et al., 2013), and wholebody velocity (Braun et al., 2010; Geurten et al., 2010). The most comprehensive measurements could seem like the best starting point for unbiased segmentation, but which measures of behavior are informative depends on behavior organization, itself unknown. Since similar goals can be reached using different movements, and similar movements can be produced by different muscle actions (Bernstein, 1967; Winter, 1984), an increase in measured parameters does not guarantee a proportional increase in behavior resolution. On the other hand, to find behaviors from first principles requires searching potential combinations of parameters and their dynamics. This search space expands with the number of parameters. To aid search, two strategies have been applied. In one, the space of instantaneous postures is decomposed first, and then, in a reduced space of posture elements, dynamics are fitted to indirect measures of behavior (Stephens et al., 2008, 2010). In other approaches, indirect measures of dynamics are included with posture measurements (Berman et al., 2014; Wiltschko et al., 2015). In both cases, the representation of either behavior or its dynamics is indirect at segmentation.
For this work, our aim was to analyze behavior directly from dynamics. For this reason we chose behaviors that could be represented by a small number of parameters. Walking Drosophila show multiple gaits (Strauss and Heisenberg, 1990), but gait use is correlated with velocity, gaits transition smoothly with velocity changes (Mendes et al., 2013), and movements of different body segments are tightly correlated during walking maneuvers (Geurten et al., 2014; Fujiwara et al., 2017). For these reasons, and since body displacement is the goal of locomotion, measurement of body velocities over time can provide sufficient resolution for locomotor behaviors in flies. However, many animals use more complicated limb or body segment movements during locomotion, and nonlocomotor behaviors require measurements of posture (Stephens et al., 2008; Wiltschko et al., 2015). In spite of a long history of investigations, it remains to be seen in what ways movement is structured with respect to specific body configurations, action goals irrespective of configuration, or both.
Experimental biases
Animal behavior bridges many types of external events and internal processes. Unsurprisingly, behaviors operate on multiple time scales, and many behavior properties are sensitive to environmental conditions. Fly locomotion shows trends on the scale of minutes to hours that are sensitive to age, sex, strain, satiety and other internal and environmental conditions (Martin et al., 1999a); pers. obs.). For instance, after removal from one type of laboratory food, locomotion decreases over tens of minutes under low illumination in both genders of different strains, but activity is genderdependent at shorter time scales, straindependent at longer time scales, and illuminationdependent on the scale of 1–2 hr (Martin et al., 1999a). Under some conditions, fly locomotion has also been suggested to be scalefree (Cole, 1995). In all, fly locomotion can show trends on multiple time scales that depend on experimental conditions.
For this work, the goal was to describe properties of behavior elements under stable conditions. Therefore, experimental conditions were chosen so as to minimize trends in locomotion over the course of experiments. In these conditions, flies kept moving over 30 min and velocities approximated steady state for the entire period. Data was then collected for 10 min to stay well within this period. In addition, a few different conditions were explored but all shared the same criterion, that flies remain active without gross trends in locomotion during the experiment (see Results). By this criterion, we tried to isolate behavior properties from other potential variables controlling behavior.
Under different conditions, trends in fly behavior were found on long time scales by Berman et al. (2014), (2016), who attributed these trends to an internal state change. In those conditions, locomotion largely stopped shortly after removal from food and behavior was then analyzed over the following hour (Berman et al., 2014). As a result, over 85% of analyzed behaviors were nonlocomotor movements while flies were standing, and here, Markovity was found to break down over one to a few seconds. This time scale is wellsampled in our data, where, by contrast, a generative Markov model produced excellent fits to data. In all, datasets and analyses differ between the two studies in several ways (Materials and methods). Since the majority of behaviors analyzed by Berman et al. are nonlocomotor, and our studies used different genders, it is possible that nonlocomotor and locomotor behaviors have different properties, or that memory in behavior is genderspecific. A likely possibility is that behavior transitions can be adjusted over time by internal or external variables (Berman et al., 2016), suggesting one level at which behavior structure may be modified. Notably, the time scale of elementary behaviors may be invariant to conditions, gender, or other differences between studies to date. This time scale is seen to be a few hundred milliseconds by three different methods in different datasets: increased densities in posture time series power spectra (Berman et al., 2014), mean residence times in inferred postural states (Berman et al., 2016), and trajectory divergence properties in velocity phase space (this work).
Mode structure and finite memory in behavior
We find that temporal structure, defined by correlations in velocities over short times, implies a decoupling over longer time scales. This decoupling between modes can be described as approximately Markovian, in that transitions between modes depend on the current mode with negligible contribution from prior history. Within any one mode, similar velocities diverge, up to a bound of approximately 20% of velocity space. Around the time of transition between modes, effectively unbounded velocity divergence is observed. These findings raise three possibilities: history of prior behavior can be forgotten in the course of executing a mode, during transitions between modes, or both. While it is possible that there might be specific neural circuits devoted to maintaining and forgetting prior states (Martin et al., 1998, 1999b), we note that memory loss may also be intrinsic to the mode structure we describe, or the dynamics of neural activity underlying mode execution.
Mode structure and stability
Behavior is relatively similar any time an animal executes the same mode. Whatever the animal does beforehand, the range of potentially different behaviors that can precede a given mode must narrow to the domain of that mode. At this level, modes can be said to show a transient stability. However, our data provides no evidence that modes show asymptotic stability, under which behaviors would tend towards some typical, canonical behavior within each mode, or more abstractly, a behavior attractor. In fact, on average, originally similar velocities diverge severely in the course of mode execution, and divergence of the more typical velocity trajectories within a mode differs little from that of the less typical (Figure 7—figure supplement 2). If behavior can be described by attractor dynamics with noise, the attractors cannot be strong relative to noise. In all cases, similar velocities at the start of a mode are unlikely to remain similar at the time of transition to the next mode. Hence, at the level of modes, dynamic stability is a weak constraint in spontaneous behavior.
Interface between behaviors
Many behaviors are thought to comprise sequences of relatively stereotyped movements that unfold over seconds or longer (e.g. Marler and Hamilton, 1966; Baerends, 1971; Spieth, 1974; Liu and Sternberg, 1995; Seeds et al., 2014). To specify an entire sequence reproducibly would require a complex, relatively highdimensional dynamical system. This dynamical system would need to cope with internal, neural variation and potentially different initial conditions each time a behavior is to be executed. For instance, variation in neural signals, mismatched mechanical impedances, or external forces, as well as changing behavioral goals, muscle tone, or afferent feedback all change the context in which any given movement is planned and executed (Bernstein, 1967). These differences may persist or become amplified in the course of a sufficiently complex dynamical process unless it is carefully configured for stability. Yet, in theory, stable highdimensional dynamical systems are rare (Smale, 1966).
As one solution, behaviors may be stabilized using feedback between actual and predicted movement outcomes (eg. Todorov and Jordan, 2002). However, some stereotyped behaviors proceed to completion irrespective of taskspecific feedback (Lorenz and Tinbergen, 1938; Willows, 1967; Barlow, 1968). Moreover, even when movement execution can be stabilized by feedback, the template of an intended movement still corresponds to a dynamical system, whose stability remains relevant.
As another solution, complex movements may be represented in terms of simpler, independently specified components, sometimes termed primitives (reviewed in: Flash and Hochner, 2005), potentially composed into actions via hierarchies of motor control (Sherrington, 1906; Tinbergen, 1950). However, modular organization on its own does not guarantee dynamic stability over time. Behavior primitives must interface with each other. Likewise, so must different levels of control. Yet, when different dynamical systems interact, the history of one system can propagate to another. In this way, unintended deviations can also propagate over time. Longterm system stability is not guaranteed.
The properties of behavior elements described in this work suggest how potential challenges in building and maintaining complex, reproducible behaviors may be managed in nature. Broad velocity distributions associated with each behavior mode may accommodate a range of initial conditions, allowing different kinds of behaviors to follow one another. At the interface between behaviors, sensitivity to initial conditions can be minimized by memory loss observed over the course of single behaviors. In this way, subsequent behavior transitions can be insulated from past ones. Over longer times, finite memory between behavior elements can bound susceptibility of behavior to instability, allowing construction of more complex behavioral sequences. Interestingly, human and other vertebrate movements decompose into short episodes with a single velocity extremum following stroke or ablations, fusing gradually into smooth movements in the course of recovery; this suggests that normally smooth movement in vertebrates may be composed from such submovements (Rohrer et al., 2002, 2004; Stein, 2008). To what extent behavior properties found in this work are also found in other contexts and organisms remains to be investigated. This work suggests one possibility for how animals maintain dynamic complexity but limited susceptibility to consequences of any one behavior.
Materials and methods
Datasets
Request a detailed protocolA lab strain of wildtype OregonR D. melanogaster with homology to OregonRmodENCODE (RRID:BDSC_25211; D. Gohl, pers. comm.) was used for all data collection. Single fly trajectories were obtained as described in Katsov and Clandinin (2008) and are available at the Dryad Digital Repository (Katsov et al., 2017). Briefly, the centroid location and orientation of individual flies walking alone or in groups under uniform illumination from below (5.5 cd/m2) in an otherwise dark room were recorded at 30 frames per second. Flies walked either in long test tubes (25 × 150 mm, VWR 89001–458; Datasets S, A) or a large, custombuilt arena (300 mm diameter; Dataset L). Behavior was recorded for 10 min under conditions where locomotion remained near steady state up to 30 min (Katsov and Clandinin, 2008). Trajectories were retained only from the top, central portion of each chamber (8 × 100 mm of test tube, 200 × 200 mm of arena). The position of the head was inferred statistically with $>99.8\%$ accuracy. Trajectories were filtered with a Gaussian kernel $FWHM\approx 0.08$ s (s.d.=1 camera frame), as in previous work (Katsov and Clandinin, 2008).
Velocity phase space, construction and measurements
Request a detailed protocolConstruction. Approx. 10^{6} trajectories from 6930 female flies (median length 0.93 s, P_{2.5}P_{97.5} = [0.30, 4.5] s) were used to measure instantaneous velocity $\overrightarrow{\mathit{v}}=({\mathit{v}}_{T},{\mathit{v}}_{R},{\mathit{v}}_{S})$ and acceleration $\dot{\overrightarrow{\mathit{v}}}=({\dot{\mathit{v}}}_{T},{\dot{\mathit{v}}}_{R},{\dot{\mathit{v}}}_{S})$. These were used to construct distributions of accelerations at $N=41615$ velocity component locations, $Pr\left[\dot{\overrightarrow{\mathit{v}}}\mid \overrightarrow{\mathit{v}}\right]$ spanning ${\mathit{v}}_{R}<$ 450°s^{−1} in 25°s^{−1} increments, $0.6\le {\mathit{v}}_{\mathit{T}}\le \mathit{3.2}$ cm s^{−1} in 0.09 cm s^{−1} increments, and ${\mathit{v}}_{S}<$ 0.94 cm s^{−1} in 0.063 cm s^{−1} increments. Although flies are capable of attaining velocities outside this range, these were not well sampled in our dataset and the velocity range was truncated accordingly to retain useful estimates of $Pr\left[\dot{\overrightarrow{\mathit{v}}}\mid \overrightarrow{\mathit{v}}\right]$. The average number of acceleration data points per velocity bin was $\u27e8{n}_{v}\u27e9\approx 500$, but taking into account autocorrelation time, the effective number of data points per velocity bin was $\u27e8{n}_{v}{\u27e9}_{eff}\approx 100$, with cutoff set at ${n}_{eff}<10$. Acceleration distributions spanned, ${\dot{v}}_{R}<$ 4640 °s^{−2}, $\dot{{v}_{T}}<$ 16.4 cm s^{−1}, ${\dot{v}}_{S}<$ 10.1 cm s^{−1}. In all, 99.7% of recorded behavior was retained. The distribution $Pr\left[\dot{\overrightarrow{\mathit{v}}}\mid \overrightarrow{\mathit{v}}\right]$ was averaged over $\dot{\overrightarrow{\mathit{v}}}$ at each $\overrightarrow{\mathit{v}}$ to produce $\mathbf{E}\left[\dot{\overrightarrow{\mathit{v}}}\mid \overrightarrow{\mathit{v}}\right]$. This expectation is a vector field, describing average changes in velocity at each combination of velocity components. Local dispersion was estimated by the mean absolute deviation, a conservative estimate of dispersion, taking expected value $\mathbf{E}\left[\dot{\overrightarrow{\mathit{v}}}\u27e8\dot{\overrightarrow{\mathit{v}}}\u27e9\right]$ over $\dot{\overrightarrow{\mathit{v}}}$ at each $\overrightarrow{\mathit{v}}$, where $\u27e8\dot{\overrightarrow{\mathit{v}}}\u27e9=\mathbf{E}\left[\dot{\overrightarrow{\mathit{v}}}\mid \overrightarrow{\mathit{v}}\right]$.
Divergence of trajectories $\overrightarrow{\mathit{v}}(\tau )$ was measured from similar initial conditions, defined as small neighborhoods in velocity space each covering 0.004% of this space. Velocity neighborhoods were randomly chosen by sampling velocity bin indices $b\in \{1,\mathrm{\dots},N\}$ corresponding to bin locations in the 3component velocity space. A random sample of ${N}_{B}=200$ bins was drawn. Each sampled bin ${b}_{i}$ produced a set of trajectories $\{\overrightarrow{\mathit{v}}(t){\}}^{({b}_{i})},i\in \{1,\dots ,{N}_{B}\}$, that passed through this bin at time ${t}_{b}$, with $t=\tau {t}_{b}$. Trajectory spread at time $t$ was quantified by standard deviations ${\mathit{\sigma}}_{t}\left[\{\overrightarrow{\mathit{v}}(t){\}}^{({b}_{i})}\right]$, averaged over all randomly sampled neighborhoods and normalized to the standard deviations of the timeindependent velocity component distributions:
In the space of the 3 velocity components, $\overline{\mathit{\sigma}}(t)$ describes the dispersion of trajectories as a fraction of the range of attainable velocities. Trajectories were sampled in four ways for different analyses. First, all trajectories were included that passed the same neighborhood at $t=0$. Second, trajectories were selected that passed the same neighborhood at $t=0$ then attained a $\mathit{v}}_{R$ extremum at different times afterward, $t=[67,100,\mathrm{\dots},300]$ ms. The delay $t=100$ ms is half of the average interval between $\mathit{v}}_{R$ extrema. Third, trajectories were selected that passed the same neighborhood at $t=0$, attained a $\mathit{v}}_{R$ extremum at $t=100$ ms, and were classified as a specific mode $m$ at that time. Fourth, trajectories were selected that passed the same neighborhood at $t=0$, attained a $\mathit{v}}_{R$ extremum at $t=100$ ms, were classified as mode $m$, and fell in the same quartile of this mode’s velocity profile at $t=0$, $Pr{\left[{v}_{T},{v}_{R},{v}_{S}\right]}^{(m)},\text{}m\in \{I..V\}$. 95% confidence intervals were obtained by repeating the procedure 50 times to estimate the error of the mean.
Submode segmentation
ICA setup
Request a detailed protocolAs described in Box 1, times $\{{t}_{e}\}$ of local extrema in ${\mathit{v}}_{R}(\tau )$ were identified over $\tau \pm 83.3$ ms (±2 camera frames), and trajectory fragments $\overrightarrow{\mathit{v}}({t}^{\mathrm{\prime}}),{t}^{\mathrm{\prime}}=\tau {t}_{e}$, covering $n$ camera frames were isolated around each ${v}_{R}({t}_{e})$; initially $n=33,{t}^{\prime}<550$ ms, but see below. Fragments $\overrightarrow{v}({t}^{\mathrm{\prime}})$ were represented as $(3n\times 1)$ column vectors concatenating the 3 velocity components, normalizing turn direction and preserving relative sideslip direction:
Then, these vectors ${\mathbf{v}}^{(j)},j=[1,\dots ,N]$, were concatenated horizontally and standardized rowwise:
We followed Hyvärinen and Oja (2000), Hyvärinen (1999) and Himberg et al. (2004) to identify robust convergence of the FastICA algorithm for decomposition $\mathbf{}=\mathbf{}$, where $\mathbf{A}$ is a $(3n\times D)$ mixing matrix and $\mathbf{S}$ is a $(D\times N)$ matrix that contains $D$ independent component coefficients of sample $j,\text{}{\mathbf{s}}^{(j)}$, in each column. This procedure was done on the entire dataset in the first iteration, $N\approx 2.9\cdot {10}^{6}$ trajectory fragments, and then iteratively on subsets as described below.
Treating trajectory fragments $\mathbf{z}$ as mixtures of components $\mathbf{s}$ in independent components basis $\mathit{\beta}$, the generative model is reformulated using demixing matrix $\mathit{W}}_{\beta$ as: $\mathit{S}={W}_{\beta}Z$. Taking a row $\mathbf{w}}_{i$ of the demixing matrix, $\mathbf{}={\mathbf{}}_{\mathbf{}}\mathbf{}$ represents a coefficient of fragment $\mathbf{z}$ in independent component $i$. Independent components are found using the expectation that coefficient distributions tend to become Gaussian when $\mathbf{w}\mathbf{z}$ projections represent sums of independent variables, and become less Gaussian as independent variables are demixed. FastICA searches for $\mathbf{w}\mathbf{z}$ projections distributed the least like Gaussians. The FastICA algorithm was used to maximize $\mathbf{E}\left[G(\mathbf{w}\mathbf{z})\right]$, where $G$ is a contrast function. We used $G\approx kurtosis$, in part to minimize computation time (Hyvärinen, 1999; Hyvärinen and Oja, 2000). To ensure robust convergence, the search was repeated from random initial guesses of $\mathbf{W}$ (Himberg et al., 2004).
Dimensionality reduction
Request a detailed protocolIn practice, the search is aided by lower data dimensionality and a $\mathbf{W}$ search space limited to orthogonal transformations. The matrix $\mathbf{Z}$ (Equation 4) was whitened to produce
where $\mathbf{D}$ is a diagonal matrix of the eigenvalues of $Cov[\mathbf{z},{\mathbf{z}}^{\mathrm{\prime}}]=\frac{1}{\mathbf{N}}\mathbf{Z}{\mathbf{Z}}^{\mathrm{\top}}$, and $\mathbf{E}$ is the corresponding matrix of column eigenvectors.
In preliminary data analysis, it was found that principal components corresponding to time points ${t}^{\mathrm{\prime}}\text{}\text{}300$ ms accounted for a negligible fraction of total variance in the dataset. Moreover, when included, these dimensions slowed FastICA convergence and increased the spread of converged values of $\mathbf{W}$ without substantially affecting their mean. We inferred that these time points contributed more noise than signal and retained only ${t}^{\prime}\le 300$ ms in further analysis, corresponding to $n=19$ camera frames per fragment.
While performing ICA on the entire dataset, it was found that FastICA convergence suffered when all 3 velocity components were included. Because $\mathit{v}}_{S$ often correlated with $\mathit{v}}_{R$, FastICA convergence was tested using fragments without any $\mathit{v}}_{S$ dimensions. This improved convergence. However, we noted that when they did converge, IC solutions found from all 3 velocity components separated data features better than any solutions from fragments lacking $\mathit{v}}_{S$. To improve ICA convergence without eliminating information contributed by $\mathit{v}}_{S$, we substituted one velocity variable in fragment matrix $\mathbf{M}$ for $\mathit{v}}_{R$ and ${v}_{S}:\text{}{v}_{H}={v}_{R}\frac{\mathrm{\partial}}{\mathrm{\partial}t}(\mathrm{a}\mathrm{r}\mathrm{c}\mathrm{t}\mathrm{a}\mathrm{n}\frac{{v}_{S}}{{v}_{T}})$. This is a combined rotational velocity corresponding to velocity of heading direction, rather than of body orientation only.
Operating on the entire dataset, the first iteration of ICA was set up using modified Equations 2–5. Reduced $(38\times 1)$ vectors,
were concatenated into (38 x N) matrices
and standardized to produce matrix
The matrix $\mathcal{\mathcal{Z}}$ was then whitened to produce
where $\mathbf{D}$ and $\mathbf{E}$ are eigenvalue and eigenvector matrices of $Cov[{{\textstyle \text{}}\phantom{\rule{0.4em}{0ex}}z,{\textstyle \text{}}\phantom{\rule{0.4em}{0ex}}z}^{\mathrm{\prime}}]=\frac{1}{N}\mathcal{\mathcal{Z}}{\mathcal{\mathcal{Z}}}^{\mathrm{\top}}$, respectively.
Evaluating the entire fragment dataset, it was found that six principal components account for 87% of total variance, and only the $(6\times N)$ projection $\stackrel{~}{\mathcal{\mathcal{Z}}}}^{\mathbf{w}$ into these PCs was used in the first iteration of ICA. Eliminated dimensions may contain biological noise or nondominant contributors to total behavior variation. Nondominant contributors were pursued by applying the procedure iteratively on data subsets after removing major components (see below).
Iterative ICA, first iteration
Request a detailed protocolWhen ICA converged on a solution $\mathbf{S}\mathbf{=}\mathbf{W}{\stackrel{~}{\mathcal{\mathcal{Z}}}}^{\mathbf{w}}$ , we examined joint histograms of independent component coefficients obtained in $\mathbf{S}$, $\{Pr\left[{\mathbf{S}}_{{\mathbf{i}}_{1}},{\mathbf{S}}_{{\mathbf{i}}_{2}}\right]\},{i}_{n}$ specifying a row index, ${i}_{1}\ne {i}_{2}$. Each row in $\mathbf{S}$ corresponded to coefficients in one independent component for all trajectory fragments, and rows were examined pairwise for simplicity. After one ICA iteration, at least three clusters in the 6dimensional space of $\mathbf{S}$ were apparent by visual inspection. As distributions with clear clusters are nonGaussian at least in some dimensions, these components may have maximized the contrast function $G$ without necessarily being independent. For this reason, clearly nonGaussian features of IC projections were segmented iteratively. Statistical dependence of segmented components was subsequently assessed explicitly (below).
To begin the iterative procedure, the biggest cluster in $Pr\left[\mathbf{S}\right]$ was separated from smaller ones along the minimum data density between these clusters (Figure 3B, decision point 1). This segmentation turned out to approximately divide trajectories into patterns that graze nearzero $\mathit{v}}_{T$ (Cluster 1) and those that do not (Cluster 2).
Segmentation iteration
Request a detailed protocolThe ICA procedure was repeated on each cluster. Matrix $\mathbf{M}$ (Equation 3) was separated into $\mathbf{M}}^{({C}_{1})}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{\mathbf{\text{M}}}^{({C}_{2})},\phantom{\rule{thinmathspace}{0ex}}{C}_{1}\phantom{\rule{thinmathspace}{0ex}}\text{and}\text{}{C}_{2$ corresponding to column indices of velocity fragments belonging to Cluster 1 and Cluster 2, respectively, when projected to $\mathbf{S}$. The rest of the procedure was repeated on each $\mathbf{M}$ subset as described above, including rowwise standardization, PCA, and ICA repeated from random initial guesses of demixing matrices $\mathbf{W}$. Pairwise kmeans clustering was used to separate $\mathbf{S}$ features whenever appropriate (Duda et al., 2000). Like the first iteration, each subsequent iteration began with full 1.1 s fragments $\mathbf{v}}^{(j)$. It was found empirically, as in the first iteration, that fragment interval could be reduced in each subsequent iteration too, in some cases down to ${t}^{\prime}<200$ ms. All velocity components $({v}_{R},{v}_{T},{v}_{S})$ contributed useful information for segmentation at decision points 4,7 and subsequent iterations (Figure 3B). At each iteration, 5–6 PCs were retained prior to ICA accounting for $>90\%$ variance within each subset. The total variance accounted for after all ICA iterations is a weighted sum, rather than product, of variance retained by PCs at each iteration (weighted by subset size), because each round began with the raw data subsets of velocity fragments (M columns, v). Hence, the total variance accounted for at the end of the procedure is likely $>87\%$, the lowest variance accounted for by PCs retained at any one step. Iteration was halted when further rounds failed to find ICs substantially different from those of the starting subset, when projection into ICs did not reveal clearly separable data features, or when ICA failed to converge from random initial conditions.
Cluster 1 was segmented over four further iterations into 13 submodes (Figure 3B, decision points 3–7). These 13 submodes comprised different movement types traversing zero or nearzero $\mathit{v}}_{T$.
Cluster 2 was segmented into two submodes, but not using ICA because further ICA on this subset did not converge on dimensions substantially different from the starting ones in $\mathbf{v}$. Nevertheless, this subset appeared to contain a mixture of movement patterns as its joint distribution $Pr\left[{v}_{T}({t}_{e}),{v}_{R}({t}_{e})\right]$ showed two tails and kinetics of exit from this subset showed at least two distinct components (Figure 6, decision point 2 and data not shown). Furthermore, it was found that behavior switched out of this subset by two distinct sets of velocity trajectories (data not shown). Based on these observations, Cluster 2 was segmented to separate its distinct joint velocity distribution tails and transition paths, producing submodes 1 and 2 (Figures 3 and 6).
Evaluating history dependence between submodes
Request a detailed protocolIndividual trajectories $\mathbf{v}(t)$ from Dataset S were segmented using demixing matrices $\mathbf{W}}^{(C)$ from the iterative classification, producing submode sequences $(u{)}_{k},u\in \{\mathrm{1..15}\},k\text{}\text{}2$, and transition probability distributions $\mathrm{Pr}\left[u\right],\mathrm{Pr}[{u}_{i}\mid {u}_{i1}],\mathrm{Pr}[{u}_{i}\mid {u}_{i1},{u}_{i2}]$ were measured assuming history dependence on 0 to 2 previous submodes. We then constructed artificial submode sequences drawing iteratively from these distributions and compared each set with a set of real sequences. The ’matched fraction’ of real sequences, quantifying set overlap, was calculated as described below (‘sampling from a generative model’, and ‘match test’). This fraction is the average number of exact matches between synthetic and real sets, normalized by the average number of matches between two real sets (Equation 10).
Markov models
We constructed HMMs treating submode sequences as emissions. Models were trained from random initial conditions, assuming a small number of hidden states underlying the 15 previously identified submodes. Markov model assumptions were tested via three different approaches (described below), and adequate model convergence was verified when training from different, random initial conditions converged to nearly identical models. Repeatedly converged models were evaluated in two ways. Models of the same order (same number of parameters) were compared using the loglikelihood of observed data given a particular model. In addition, models of the same and different order were compared using a variant of sampling from a generative model (described below). In wellperforming HMMs, almost every submode was predominantly emitted from a single major state, with all secondary states together contributing a small fraction of total emissions per submode (Figure 5—figure supplement 1D). Based on these observations of HMM structure, we constructed Markov Models (MM) in which underlying states were no longer hidden, significantly reducing the number of model parameters. Moreover, MMs permitted direct estimation of model parameters from submode frequencies observed in movement trajectories, rather than by inference as in HMMs. MMs were constructed by grouping submodes into the same emission class based on their proximity in the hierarchical clustering tree (Figure 3), or by lumping submodes constituting a dominant emission from the same hidden state in the best performing HMMs (Figure 5—figure supplement 1E,F). Multiple MM variants were constructed and evaluated, comprising different MMs with 4–6 states.
The markov model
Request a detailed protocolStarting from a set of possible states and emissions, classified at $\mathrm{v}}_{\mathrm{R}$ extrema occurring at times $\{{t}_{e}\}$, where $state({t}_{e})\in \{{q}_{1},\mathrm{\dots},{q}_{N}\},emission({t}_{e})\in \{{u}_{1},\mathrm{\dots},{u}_{M}\},$ we trained HMMs with N = 2 to 6 states and MMs with N = 4 to 6 states, and used all $M=15$ submodes identified in segmentation as emissions in both cases. In the case of MMs, states $q$ correspond to submode groups termed modes, $m\in \{{m}_{1},\mathrm{\dots},{m}_{N}\}$.
Number of model parameters
Request a detailed protocolThe transition probability matrix ${\mathbf{}}_{\mathbf{}}\in {\mathbb{R}}^{N\times N}$ for both HMM and MM models is given by: ${[Tr]}_{ij}=\mathrm{Pr}[state({t}_{e+1})={q}_{j}\mid state({t}_{e})={q}_{i}]$
Since the sum of each row in the matrix (corresponding to a specific pretransition state) is equal to 1, there are $N(N1)$ free parameters defining this matrix. The emission probability matrix $\mathbf{E}\in {\mathbb{R}}^{N\times M}$ is given by:
In the HMM case, the sum of each row in the emission matrix (corresponding to a specific state) is equal to 1, and hence there are $N(M1)$ free parameters defined by the matrix. In the MM case, an additional condition is that each column of the emission matrix (corresponding to a particular submode) contains only 1 nonzero. Accordingly, $(MN)$ free parameters define the matrix (provided $M\ge N$). Hence, while number of states and their definitions contribute the same number of parameters in HMMs and MMs of comparable size, discrete emissions contribute $O(MN)$ free parameters in the HMM case and $O(M)$ free parameters for MMs.
HMM training
Request a detailed protocolFor a given model order, models were trained using 15 sets of randomly chosen initialization parameters and training data subsets. Random initial transition and emission probability matrices were generated by sampling from a uniform distribution on [0,1] and normalizing matrix rows, and 1500 short submode sequences were randomly chosen for training out of a set of 371775 sequences (mean sequence length = 3.32 emissions; s.d. 3.27). A constrained optimization procedure (using the MATLAB optimization toolbox function fmincon, with an interiorpoint algorithm) was used to find locally optimal parameters which maximize the loglikelihood of the training set (computed using the MATLAB statistics toolbox function hmmdecode) with the constraints being that all rows in the transition and emission matrices sum to 1.
MM training
Request a detailed protocolA set of models was assessed for each model order of 4, 5 or 6 states. Submodes were grouped based on (1) results of HMM training, (2) relationship between submodes in the classification tree, and (3) similarities between submode velocity profiles and connectivity with other submodes. The initial submode assignment to states was as follows:
for 4 states  {8,[3 10:13],[4 5 14],1};
for 5 states  {8,[3 10:13],[4 5 14],1,2};
for 6 states, 3 different initial state assignments were evaluated:
{8,[10:13],[4 5 14],1,2,3}, {8,[3 10:13],[4 5],14,1,2}, {8,[3 10:13],[4 5 14],15,1,2}.
Models were generated for all possible combinations of submodestate assignments for the unassigned submodes in each case. Maximum likelihood estimates for single and joint submode probabilities were computed from submode occurrence frequencies in the dataset, and transition and emission matrices were derived from these probability distributions:
Model comparison criteria
Request a detailed protocolWe computed loglikelihoods of 371775 submode sequences in the training set from Dataset S, based on each model. Ten models with the highest loglikelihood scores for each model type (4–6 states) were kept for further validation and performance testing.
likelihood test
Request a detailed protocolFor each model, ten test sets consisting of 20000 submode sequences were chosen randomly from Dataset S, likelihoods for each test set were estimated given the model, and the average likelihood of all ten test sets reported as the mean likelihood.
sampling from a generative model
Request a detailed protocolFirst, 11 sets of 5000 submode sequences of length 5 were randomly selected from the 83746 submode sequences in the training dataset that consisted of no less than 5 submodes. This was done by first randomly selecting a sequence from all sequences consisting of more than 5 submodes, and then randomly selecting a starting index that is at least 5 submodes away from the end of the sequence. No sequence was used more than once in any of the 11 test datasets. The number of exact matches between one set and each of the remaining 10 sets was counted as described in ‘Match Test’ below (Equation 8). Then, 10 sets of 5000 sequences of length 5 were generated from each of the generative models evaluated.
For independent, and first and second order Markov models (models $I,M1,M2$), synthetic sequences were generated by iteratively sampling from submode transition probability distributions $\mathrm{Pr}\left[u\right],\mathrm{Pr}[{u}_{i}\mid {u}_{i1}],\mathrm{Pr}[{u}_{i}\mid {u}_{i1},{u}_{i2}]$ (Figure 5—figure supplement 1A).
For HMMs and MMs, first, state or mode sequences were drawn according to model transition probabilities, and then submode emissions were drawn according to model emission probabilities, conditioned on the state sequences sampled, to generate submode sequences (Figure 5—figure supplement 1A,C,E).
Exact matches were then counted between these synthetic sets and the first set of real sequences (Equation 9), and matched fraction reported (Equation 10, with N = 10).
Stationarity
Request a detailed protocolStationarity of submode and mode frequencies over the experimental time period was confirmed by checking their distributions in data subsets over time, and by comparing empirical distributions with equilibrium distributions estimated from $\mathbf{T}}_{\mathbf{i}\mathbf{j}}\text{}\text{and}\text{}\mathbf{E$.
Dwell time analysis
Request a detailed protocolDwell times were calculated as follows. Trajectories were segmented according to Markov model $MM5$ mode definitions, producing mode sequences of length $k,\{{({m}_{i})}_{k}\},m\in \{I\mathrm{\dots}V\},i=1\mathrm{\dots}k$, and time sequences of $\mathit{v}}_{R$ extrema associated with each mode ${m}_{i},\{{({t}_{i})}_{k}\}$. Only sequences with an observed switch into and out of Mode $m$ contributed to the dwell time distribution for Mode $m$. Hence, only sequences of length $k\text{}\text{}2$ were retained. Dwell time in Mode $m$ was taken as the difference between times of last and first sequential $\mathit{v}}_{R$ extrema classified as Mode $m$. Dwell times were also calculated by taking half the time to the $\mathit{v}}_{R$ extremum of last Mode ${m}_{i}$ as time of entry into mode $m$, and half the time to the $\mathit{v}}_{R$ extremum of the next Mode ${m}_{j}$ as time of exit from $m,{m}_{i},{m}_{j}\ne m$. This extended dwell time calculation did not affect distribution shape (data not shown). Dwell times in modelgenerated mode sequences were estimated the same way as in real sequences. The time base of all modelgenerated sequences was fixed such that the interval between modes was taken as a constant 167 ms. This time corresponds to the modal value of times between $\mathit{v}}_{R$ extrema.
Test of velocity profile changes between training and test conditions
Request a detailed protocolVelocity distributions $\{Pr{\left[\mathbf{v}\right]}^{(m)}=Pr{\left[{\mathit{v}}_{\mathit{T}}\mathit{,}{\mathit{v}}_{\mathit{R}}\mathit{,}{\mathit{v}}_{\mathit{S}}\mid {t}^{\mathrm{\prime}}\right]}^{(m)}\},m\in \{I\dots V\}$, represent timedependent velocity profiles for each mode $m$ specified by $MM5$. These distributions were compared between modes classified from Dataset S and Dataset L over ${t}^{\mathrm{\prime}}\text{}\text{}300$ ms. Distribution overlap was computed for each comparison using the Bhattacharya Coefficient for discrete distributions: $B(P,{P}^{\mathrm{\prime}})={\sum}_{\mathbf{v}}{\left[P(\mathbf{v}){P}^{\mathrm{\prime}}(\mathbf{v})\right]}^{\frac{1}{2}}$ (Kailath, 1967). Significance was estimated by bootstrap (Rice, 1994): trajectory fragments representing the same transition type (same from: and to: states) were pooled from the two datasets, and resampled to generate 10000 comparisons between two randomly drawn trajectory subsets, maintaining the same sample sizes as in original sets. Each comparison produced an overlap value, and distributions of overlap values $\mathrm{Pr}\left[B\right]$ were used to evaluate the probability that the observed differences between two conditions are due to random chance while sampling from identical $Pr{\left[\mathbf{v}\right]}^{(m)}$ distributions.
Match test
Sequence subsets were sampled from a set of real behavior sequences or produced by a generative model. Subsets $R$ and ${R}^{\prime}$ were drawn independently from a set of real sequences, and subset $S$ was drawn from a set of synthetic sequences. Different $R,{R}^{\prime},\text{and}S$ were drawn and compared $N$ times, as described for each analysis.
Some sequences may occur more than once in the same subset. Therefore, we counted sequence matches between two sets taking into account sequence multiplicity. For each unique element $x$ of multiset X, multiplicity ${\nu}_{X}(x)$ is the number of times element $x$ occurs in X. For two multisets X and Y, ${\nu}_{X\cap Y}(x):=min({\nu}_{X}(x),{\nu}_{Y}(x))$.
For each unique sequence $r$ in subset $R$ and each unique sequence $s$ in subset $S$, with subsets of equal size $R=S$, define:
where ${\u27e8\cdot \u27e9}_{N}$ is an average over $N$ comparisons. Metric $f$ reports the average fraction of synthetic sequences found in a random real set, normalized to the average fraction of real sequences found in two randomly sampled real sets.
Match test as a function of time
Request a detailed protocolMode sequences of length $k,{(m)}_{k}$ were sampled with replacement from sets of real or synthetic sequences. Real sequences were derived from Dataset L trajectories using the 15submode segmentation criteria and 5mode grouping criteria of $MM5$. Synthetic sequences were generated by iteratively sampling from state transition probability distributions of model $MM5$. The total number of sequences in each set ranged from 88434 $(k=2)$, to 5159 $(k=20)$. For each set of length $k$ sequences, ${f}_{k}$ was computed for subsets $\{({m}_{i}{)}_{i=0}^{k}\},\text{}k=2\dots 20,\text{}\mathrm{w}\mathrm{i}\mathrm{t}\mathrm{h}\text{}R=S=500,N=5000$. $\u27e8{F}^{R:R}{\u27e9}_{N}$ ranged from 465 ($k$=2) to 96 ($k$=20). ${\u27e8{F}^{S:R}\u27e9}_{N}$ ranged from 462 ($k$=2) to 76 ($k$=20).
Dataset and code availability
Request a detailed protocolFull trajectory datasets and segmentation code are available at the Dryad Digital Repository (Katsov et al., 2017). The datasets include raw trajectories and annotation of submodes and modes.
Summary of dataset and analysis parameters
Request a detailed protocolMaterials and methods  Table 4.
Data availability

Data from: Dynamic structure of locomotor behavior in walking fruit fliesAvailable at Dryad Digital Repository under a CC0 Public Domain Dedication.
References

The ethological analysis of fish behaviorFish Physiology 6:279–370.https://doi.org/10.1016/S15465098(08)601508

BookEthological units of behaviorIn: Ingle D, editors. The Central Nervous System and Fish Behavior. Chicago: Chicago University Press. pp. 217–232.

Mapping the stereotyped behaviour of freely moving fruit fliesJournal of the Royal Society Interface 11:20140672.https://doi.org/10.1098/rsif.2014.0672

Highthroughput ethomics in large groups of DrosophilaNature Methods 6:451–457.https://doi.org/10.1038/nmeth.1328

Fractal time in animal behaviour: the movement activity of DrosophilaAnimal Behaviour 50:1317–1324.https://doi.org/10.1016/00033472(95)800476

Behavioural analysis of nematode movementAdvances in Parasitology 13:71–122.https://doi.org/10.1016/S0065308X(08)60319X

The dynamics of height stabilization in DrosophilaPhysiological Entomology 9:377–386.https://doi.org/10.1111/j.13653032.1984.tb00778.x

Statistical discrimination of natural modes of motion in rat exploratory behaviorJournal of Neuroscience Methods 96:119–131.https://doi.org/10.1016/S01650270(99)001946

Motor primitives in vertebrates and invertebratesCurrent Opinion in Neurobiology 15:660–666.https://doi.org/10.1016/j.conb.2005.10.011

Automated derivation of primitives for movement classificationAutonomous Robots 12:39–54.https://doi.org/10.1023/A:1013254724861

Saccadic and smooth pursuit eye movements in the monkeyThe Journal of Physiology 191:609–631.https://doi.org/10.1113/jphysiol.1967.sp008271

Saccadic body turns in walking DrosophilaFrontiers in Behavioral Neuroscience 8:265.https://doi.org/10.3389/fnbeh.2014.00365

A syntax of hoverfly flight prototypesJournal of Experimental Biology 213:2461–2475.https://doi.org/10.1242/jeb.036079

Motor primitives and synergies in the spinal cord and after injurythe current state of playAnnals of the New York Academy of Sciences 1279:114–126.https://doi.org/10.1111/nyas.12065

Genetic dissection of neural circuits and behavior in mus MusculusAdvances in Genetics 65:138.https://doi.org/10.1016/S00652660(09)65001X

Independent component analysis: algorithms and applicationsNeural Networks 13:411–430.https://doi.org/10.1016/S08936080(00)000265

Fast and robust fixedpoint algorithms for independent component analysisIEEE Transactions on Neural Networks 10:626–634.https://doi.org/10.1109/72.761722

The divergence and Bhattacharyya Distance measures in signal selectionIEEE Transactions on Communications 15:52–60.https://doi.org/10.1109/TCOM.1967.1089532

Legtracking and automated behavioural classification in DrosophilaNature Communications 4:1910.https://doi.org/10.1038/ncomms2908

DataData from: dynamic structure of locomotor behavior in walking fruit fliesDryad Digital Repository.https://doi.org/10.5061/dryad.854j2

Endpoint force fluctuations reveal flexible rather than synergistic patterns of muscle cooperationJournal of Neurophysiology 100:2455–2471.https://doi.org/10.1152/jn.90274.2008

Taxis und Instinkthandlung in der Eirollbewegung der Graugans. 1Zeitschrift Für Tierpsychologie 2:1–29.https://doi.org/10.1111/j.14390310.1939.tb01558.x

Mushroom bodies suppress locomotor activity in Drosophila MelanogasterLearning & Memory 5:179–191.https://doi.org/10.1101/lm.5.1.179

Temporal pattern of locomotor activity in Drosophila MelanogasterJournal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology 184:73–84.https://doi.org/10.1007/s003590050307

Central complex substructures are required for the maintenance of locomotor activity in Drosophila MelanogasterJournal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology 185:277–288.https://doi.org/10.1007/s003590050387

Goaldriven behavioral adaptations in gapclimbing DrosophilaCurrent Biology 15:1473–1478.https://doi.org/10.1016/j.cub.2005.07.022

The fundamental role of pirouettes in Caenorhabditis elegans chemotaxisJournal of Neuroscience 19:9557–9569.

Movement smoothness changes during stroke recoveryJournal of Neuroscience 22:8297–8304.

Changes in Postural Syntax characterize Sensory modulation and natural variation of C. elegans LocomotionPLOS Computational Biology 11:e1004322.https://doi.org/10.1371/journal.pcbi.1004322

BookNerve Cells and Animal BehaviourCambridge University Press.https://doi.org/10.1017/CBO9780511782138

Mapping and manipulating neural circuits in the fly brainAdvances in Genetics 65:79–143.https://doi.org/10.1016/S00652660(09)650033

Structurally stable Systems are not denseAmerican Journal of Mathematics 88:491–496.https://doi.org/10.2307/2373203

Courtship behavior in DrosophilaAnnual Review of Entomology 19:385–405.https://doi.org/10.1146/annurev.en.19.010174.002125

Motor pattern deletions and modular organization of turtle spinal cordBrain Research Reviews 57:118–124.https://doi.org/10.1016/j.brainresrev.2007.07.008

Dimensionality and dynamics in the behavior of C. elegansPLoS Computational Biology 4:e1000028.https://doi.org/10.1371/journal.pcbi.1000028

Coordination of legs during straight walking and turning in Drosophila MelanogasterJournal of Comparative Physiology A 167:403–412.https://doi.org/10.1007/BF00192575

Dynamic flight stability of a hovering bumblebeeJournal of Experimental Biology 208:447–459.https://doi.org/10.1242/jeb.01407

Searching for motifs in the behaviour of larval Drosophila Melanogaster and Caenorhabditis elegans reveals continuity between behavioural statesJournal of the Royal Society Interface 12:20150899.https://doi.org/10.1098/rsif.2015.0899

Dynamic flight stability in the desert locust Schistocerca gregariaJournal of Experimental Biology 206:2803–2829.https://doi.org/10.1242/jeb.00501

The hierarchical organization of nervous mechanisms underlying instinctive behaviourSymposia of the Society for Experimental Biology 4:305–312.

On aims and methods of ethologyZeitschrift Für Tierpsychologie 20:410–433.https://doi.org/10.1111/j.14390310.1963.tb01161.x

Optimal feedback control as a theory of motor coordinationNature Neuroscience 5:1226–1235.https://doi.org/10.1038/nn963

The case for and against muscle synergiesCurrent Opinion in Neurobiology 19:601–607.https://doi.org/10.1016/j.conb.2009.09.002

Structured variability of muscle activations supports the minimal intervention principle of motor controlJournal of Neurophysiology 102:59–68.https://doi.org/10.1152/jn.90324.2008

Flight Performance and Visual Control of Flight of the FreeFlying Housefly (Musca Domestica L.) II. pursuit of targetsPhilosophical Transactions of the Royal Society B: Biological Sciences 312:553–579.https://doi.org/10.1098/rstb.1986.0018

Highresolution analysis of ethanolinduced locomotor stimulation in DrosophilaJournal of Neuroscience 22:11035–11044.
Article and author information
Author details
Funding
Stanford University (Stanford Graduate Fellowship)
 Alexander Y Katsov
 Limor Freifeld
Jane Coffin Childs Memorial Fund for Medical Research
 Alexander Y Katsov
NIH Office of the Director (DP1 OD003530)
 Thomas R Clandinin
National Institutes of Health (R01 EY022638)
 Thomas R Clandinin
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The authors thank Daniel Ramot, Eran Mukamel, James Fitzgerald, and members of the Clandinin and Bargmann Labs for discussions, and Zak Frentz for comments on the manuscript.
Version history
 Received: February 28, 2017
 Accepted: June 8, 2017
 Version of Record published: July 25, 2017 (version 1)
 Version of Record updated: August 18, 2017 (version 2)
 Version of Record updated: August 29, 2017 (version 3)
Copyright
© 2017, Katsov et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 4,417
 views

 533
 downloads

 33
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
Negative memories engage a brain and bodywide stress response in humans that can alter cognition and behavior. Prolonged stress responses induce maladaptive cellular, circuit, and systemslevel changes that can lead to pathological brain states and corresponding disorders in which mood and memory are affected. However, it is unclear if repeated activation of cells processing negative memories induces similar phenotypes in mice. In this study, we used an activitydependent tagging method to access neuronal ensembles and assess their molecular characteristics. Sequencing memory engrams in mice revealed that positive (maletofemale exposure) and negative (foot shock) cells upregulated genes linked to anti and proinflammatory responses, respectively. To investigate the impact of persistent activation of negative engrams, we chemogenetically activated them in the ventral hippocampus over 3 months and conducted anxiety and memoryrelated tests. Negative engram activation increased anxiety behaviors in both 6 and 14monthold mice, reduced spatial working memory in older mice, impaired fear extinction in younger mice, and heightened fear generalization in both age groups. Immunohistochemistry revealed changes in microglial and astrocytic structure and number in the hippocampus. In summary, repeated activation of negative memories induces lasting cellular and behavioral abnormalities in mice, offering insights into the negative effects of chronic negative thinkinglike behaviors on human health.

 Neuroscience
Synaptic inputs to cortical neurons are highly structured in adult sensory systems, such that neighboring synapses along dendrites are activated by similar stimuli. This organization of synaptic inputs, called synaptic clustering, is required for highfidelity signal processing, and clustered synapses can already be observed before eye opening. However, how clustered inputs emerge during development is unknown. Here, we employed concurrent in vivo wholecell patchclamp and dendritic calcium imaging to map spontaneous synaptic inputs to dendrites of layer 2/3 neurons in the mouse primary visual cortex during the second postnatal week until eye opening. We found that the number of functional synapses and the frequency of transmission events increase several fold during this developmental period. At the beginning of the second postnatal week, synapses assemble specifically in confined dendritic segments, whereas other segments are devoid of synapses. By the end of the second postnatal week, just before eye opening, dendrites are almost entirely covered by domains of coactive synapses. Finally, coactivity with their neighbor synapses correlates with synaptic stabilization and potentiation. Thus, clustered synapses form in distinct functional domains presumably to equip dendrites with computational modules for highcapacity sensory processing when the eyes open.