Abstract
Most behaviors such as making tea are not stereotypical but have an obvious structure. However, analytical methods to objectively extract structure from nonstereotyped behaviors are immature. In this study, we analyze the locomotion of fruit flies and show that this nonstereotyped behavior is welldescribed by a Hierarchical Hidden Markov Model (HHMM). HHMM shows that a fly's locomotion can be decomposed into a few locomotor features, and odors modulate locomotion by altering the time a fly spends performing different locomotor features. Importantly, although all flies in our dataset use the same set of locomotor features, individual flies vary considerably in how often they employ a given locomotor feature, and how this usage is modulated by odor. This variation is so large that the behavior of individual flies is best understood as being grouped into at least three to five distinct clusters, rather than variations around an average fly.
https://doi.org/10.7554/eLife.41235.001eLife digest
Many behaviors that we perform everyday, including something as familiar as making a peanutbutter sandwich, consist of a sequence of recognizable acts. These acts may include, for example, holding a knife and opening a jar. Yet often neither the sequence nor the individual acts are always performed in the exact same way. For example, there are many ways to hold a knife and there are many ways to open a jar, meaning neither of these actions could be called “stereotyped”.
A lack of stereotypy makes it difficult for a computer to automatically recognize the individual acts in a sequence. This same problem would apply to other common behaviors, such as walking around somewhere you have not visited before. While we easily recognize it when we see it, walking is not a stereotyped behavior. It consists of a series of movements that differ between individuals, and even in the same individual at different times. So how can someone automatically recognize the individual acts in a nonstereotyped behavior like walking?
To begin to find out, Tao et al. developed a mathematical model that can recognize the walking behavior of a fruit fly. Existing recordings of fruit flies walking were analyzed using a type of mathematical model called a Hierarchical Hidden Markov Model (often shortened to HHMM). Such models assume that there are hidden states that influence the behaviors we can see. For example, someone’s chances of going skiing (an observable behavior) depend on whether or not it is winter (a hidden state).
The HHMM revealed that the seemingly random wanderings of a fly consist of ten types of movement. These include the “meander”, the “stopandwalk”, as well as right turns and left turns. Exposing the flies to a pleasant odor – in this case, apple cider vinegar – altered how the flies walked by changing the time they spent performing each of the different types of movement. All flies in the dataset used the same ten movements, but in different proportions. This means that each fly showed an individual pattern of movement. In fact, the differences between flies are so great that Tao et al. argue that there is no such thing as an average walk for a fruit fly.
The model represents a complete description of how fruit flies walk. It thus provides clues to the processes that transform an animal’s sensory experiences into behavior. But it also has potential clinical applications. Similar models for human behaviors could help reveal behaviors that are abnormal because of disease. Normal behaviors also show variability, and some diseases increase or decrease this variability. By making it easier to detect these changes, mathematical models could support earlier diagnosis of medical conditions.
https://doi.org/10.7554/eLife.41235.002Introduction
There are many approaches to the study of neural underpinnings of behavior: One large body of work is rooted in the psychophysical literature where an animal is forced to choose between a few discrete behaviors (Green and Swets, 1974). This approach allows stimulus control and rigorous analysis of behavior based on an established framework (Gold and Shadlen, 2000), but sacrifices a full analysis of behavioral dynamics, leaving critical issues unexplored. Other studies have focused on behaviors that are reflexive (albeit with some flexibility) such as saccades (Laurutis and Robinson, 1986) and collision avoidance in insects (Tammero and Dickinson, 2002). Another large body of work has focused on the control processes involved in goaldirected behaviors such as reaching movements and has revealed many fundamental principles of motor control. Yet, another popular behavioral motif that has received much attention is behaviors that require meticulous sequencing (Graybiel, 2008). Finally, much work has been done to elucidate the workings of central pattern generators that underlie the rhythmic motor activity during walking and running (Grillner, 1979). Although many of these relatively stereotypical behavioral motifs are at play during most behaviors, they are not helpful in describing the structure underlying most everyday activities such as making a cup of coffee or a peanutbutter sandwich or walking to a car which consist of a sequence of actions, but neither the sequence nor each subaction is stereotyped. These activities and their underlying subactions cannot be described either as sensorimotor reflexes or as behaviors that arise out of meticulous sequencing. An important example of such a behavior is an animal’s locomotion. While tracks of a mouse or a fly exploring a chamber are not stereotypical there is an obvious structure to it.
Uncovering the structure within nonstereotyped behaviors such as locomotion requires sophisticated analytical tools. These tools can be applied to two complementary representations of an animal’s behavior which can be described in the shape/posture space or in the world coordinate system. Recently much progress has been made in employing analytical tools that describe behaviors as a sequence of transformations in the shape/posture space using both supervised (Branson et al., 2009; Kabra et al., 2013), and unsupervised (Berman et al., 2016; Berman et al., 2014; Wiltschko et al., 2015; Vogelstein et al., 2014) algorithms. These studies provided remarkable insights by showing that much of an animal’s behavioral repertoire can be described as transitions between few postures. While behavioral descriptions as transformations in posture space accurately classify the type of behavior (such as locomotion vs. grooming), because an animal’s position in the world coordinate system is ignored, the structure within an animal’s trajectory in world coordinates remains relatively unexplored.
Much of the work in extracting structure from an organism’s trajectory is derived from the ‘run and tumble model’ which was originally employed in the context of bacterial chemotaxis (Berg and Brown, 1972). In this model, the organism is assumed to travel in relatively straight lines (runs) of exponentially distributed run lengths until they make sharp turns (tumble) to choose another direction at random. It is tempting to consider the motion of larger animals as roughly approximated by a runandtumble model, and many studies (explicitly and implicitly) employ a runandtumble framework with increasing sophistication as an analytical framework for locomotion (Kim and Dickinson, 2017; PierceShimomura et al., 1999; Schulze et al., 2015). One obvious welldocumented limitation of this framework is that animals do not turn at discrete times (Stephens et al., 2010; Ohashi et al., 2014; Iino and Yoshida, 2009; Jung et al., 2015; GomezMarin and Louis, 2014; Straub and Heisenberg, 1990), and therefore, their locomotion cannot be described using a runandtumble framework. More generally, larger animals are likely to exert greater control over speed and direction of their locomotion and a better model is necessary to understand the resulting structure in their trajectory.
The lack of a model for locomotion makes it difficult to quantify the effect of stimuli on locomotion and is a critical missing piece in understanding the underlying sensorimotor transformations. For example, in many studies of odor modulation of locomotion, odors are primarily described as attractive or repulsive; this description is based on the end result, and does not consider the navigational maneuvers that underlie these endresults. Ignoring the underlying navigational maneuvers has led to a fundamental misunderstanding of odor modulation of locomotion. In a recent detailed analysis of a fly’s locomotion, we demonstrated that it's navigational maneuvers in response to similarly attractive odors are quite distinct (Jung et al., 2015); the analysis used was based on an adhoc parametrization of locomotion, and not on a generative model of locomotion, making it difficult to determine whether our chosen parameter set was appropriate. A model of locomotion also makes it possible to compare how locomotion is affected by a given stimuli, and also how different individuals differ in their locomotion and in their response to stimuli.
In this study, we employ a hierarchical statistical model, Hierarchical Hidden Markov Model (HHMM) to describe the structure in the fly’s locomotion (Fine et al., 1998). We show that fly locomotion is wellstructured and an HHMM is an elegant representation of this structure. HHMM provides a simple and intuitive description of both a fly's locomotion and the effect of odors on the same. Surprisingly, different flies employ different strategies in their locomotion both before odor onset and in response to odors. Our data are, thus, inconsistent with the idea that the behavior of different flies represent variations around an ‘average’ fly. Rather, our data are most consistent with the idea that flies employ three to five different strategies, at a minimum, to explore a small circular arena and a similar number in their response to odors.
Results
Rationale for the choice of HHMM as the model and the model architecture
We model the locomotion of wildtype flies exploring a circular arena (Jung et al., 2015) whose center (odorzone) consists of a fixed concentration of odor (Figure 1A). The arena and the experimental procedure was previously described (Jung et al., 2015). Briefly, locomotion of each of the 34 flies in our dataset was measured 3 min before an odor (apple cider vinegar or ACV) was turned on, and 3 min during the presence of ACV. Sample trajectories are shown in Figure 1B.
We first attempted to model the fly’s locomotion using Hidden Markov Model (HMM) (Gallagher et al., 2013; Isakov, 2016). HMMs create discrete states based on a time series of observables such as position, speed or acceleration. The advantage of using HMMs in modeling locomotion is well described in earlier studies (Gallagher et al., 2013) (see Materials and methods).
In this study, we use observables that describe the change of position as a function of time, and hence our analysis will focus on behavioral states in the velocity space. Speed and angular speed are commonly used measures of velocity. But, because it is difficult to measure angular speed accurately at low speeds (Gallagher et al., 2013) (see Materials and methods section 2 for details), we fit the model to the component of speed parallel ($\hat{v}}_{$) and perpendicular ($\hat{v}}_{\perp$) to its movement during the previous time point (Figure 1C, Materials and methods section 2). If the fly walks straight ahead, $\hat{v}}_{\perp$ would be zero, therefore, $\hat{v}}_{$ and $\hat{v}}_{\perp$are closely related to speed and angular speed. We fit a time series of $\hat{v}}_{$ and $\hat{v}}_{\perp$ to an HMM. The HMM architecture is shown in Figure 1D. We employed models with 24 to 50 states. HMM was only modestly successful because it was never able to classify >70% of the tracks into one of the states with >85% confidence.
The transition probability matrix for the HMM was sparse (Figure 1—figure supplement 1) suggesting that from each state there are transitions to only a handful of other states. One method to improve upon HMM performance is to cluster the states obtained by HMM according to the transition probability matrix. A common approach to clustering is to blockdiagonalize the transition probability matrix (Berman et al., 2016). Tracks corresponding to the 10 states obtained by clustering are shown in Figure 1—figure supplement 1. Some of these states appear to describe recognizable features in the data such as left (state 10) or right turn (state 9). But efforts to blockdiagonalize the transition probability matrix were only partially successful. The most obvious failure corresponds to the states with little movement. These states – describing the absence of movement – can occur in many different contexts, such as the fully stopped state or intermittent runs. When the same state is used in different contexts, an approximate block diagonalization of the transition probability matrix fails because the same state belongs to two blocks. In these cases, the different states that correspond to the absence of movement did not cluster, and appeared alone even after blockdiagonalization (Figure 1—figure supplement 1). Thus, the existence of the same state in different contexts is one important reason for the modest success of HMM (Marco et al., 2017; Wonjoon et al., 2010; Michele Weiland and Nelson, 2005; Nguyen et al., 2005; Murphy and Paskin, 2001; Chou, 2006).
We employed a twolayered Hierarchical Hidden Markov model (HHMM) to model the data (Figure 2A). We reasoned that the lowlevel states (LL states) would be represented by Gaussian distributions on the observables, and the highlevel states (HL states) would therefore be a mixture of Gaussians and would be able to model the experimental data better. Moreover, these HL states would have longer duration than the states discovered by HMM allowing it to more naturally model composite states.
Indeed, in a Bayesian model comparison (see Materials and methods section 4), HHMMs outperformed HMMs with the same number of states. Since HMMs rarely used more than 35 states (even when models with higher number of states were fit), to perform model comparisons, we used models with less number of states than the particular HHMM we eventually employed. We compared a twolevel HHMM with 6 HL states and 4 LL states to a singlelevel HMM with the same number of states (4 × 6 = 24 states). We labeled the nonhierarchical model as the null model, and were able to reject the null model using Bayesian model comparison at p < 0.0001, implying that a hierarchical model is necessary. Another model comparison – a twolevel model with 8 HL and 4 LL states compared to a singlelevel model with 32 states – also yielded similar results. The objectively better performance of an HHMM compared to an HMM suggests that a model that includes a hierarchical structure is more consistent with a fly’s locomotion. It is important to note that, HHMMs are actually simpler than HMMs with the same number of states. This simplicity comes from the fact that any HHMM – which puts very specific constraints on the transition probability matrix – can be represented by an HMM but not viceversa. HHMMs with the same number of states has far fewer parameters. Thus, for the two comparisons above, the HHMM has 6^{2} + 6*4^{2} + 6*4*5 = 252, and 8^{2} + 8*4^{2} + 8*4*5 = 352 parameters, and the HMM has 24^{2} + 24*5 = 696 and 32^{2} + 32*5 = 1184 parameters respectively; therefore, HHMM has fewer parameters. When a simpler model better characterizes the data, we can conclude that the additional structure contained in that model provides a more accurate characterization of the structure within the data.
The model we chose has 10 HL states (Figure 2A) and 5 LL states for each HL state. The model was fit to the entire dataset – both before and during the presence of the odors. The fitting process initializes by fitting each fly’s tracks to its own HHMM and then clusters these 34 HHMMs – one for each fly – using a Gaussian mixture model, resulting in a smaller number of models. Remarkably, a single HHMM is an excellent fit for all the data suggesting that the behavior of wildtype flies is composed of similar components. The model was able to successfully assign an HL state (defined as >85% confidence) for >80% of the data points (Figure 2B, median 81%). This percentage was consistently high for all flies in our dataset (Figure 2B). In comparison, an HMM with 50 states can only classify 68% of the data with the same level of confidence. Tracks of a fly with each HL state labeled with a different color are shown in Figure 2C.
To make the difference between HHMM and HMM clearer, we compare the HHMM above to a HMM. As expected the time a fly spends in a HMM state is shorter than that in a HHMM state (Figure 2—figure supplement 1A). The longer time a fly spends in a HHMM state results from its hierarchical structure, and allows a HHMM to more accurately assign states, and is illustrated with two examples. First, consider a track that is assigned as a left turn by the HHMM, the HMM only classifies parts of the track as a left turn because of the inability of HMMs to consider longer duration trends in the observables (Figure 2—figure supplement 1B1). Shortterm inhomogeneity in the data throws the HMM off; as soon as the $\hat{v}}_{\perp$ decreases, the HMM exits its left turn state. Another example (Figure 2—figure supplement 1B2) shows that the HMM exits the stopped state as soon as there is a small movement. The net result is that the HHMM can classify all long stops into a single state while HMM needs four different stop states. HHMM also assigns more of the left turn as such (6% compared to only 2% by HMM).
The duration of the LL states of a HHMM is shorter than the duration of the HL state state (Figure 2—figure supplement 2A). Moreover, as the duration of HL states becomes longer, the mean number of transitions increase. The shorter duration of LL states compared to the HL states, and increased number of transitions between LL states within each HL state transition support the idea that there is structure at multiple timescales, and some of this structure is captured by the HHMM.
The limitations of HMM in describing phenomenon which have hierarchical and shared structure because of the short duration of its states is well documented (Marco et al., 2017; Wonjoon et al., 2010; Michele Weiland and Nelson, 2005; Nguyen et al., 2005; Murphy and Paskin, 2001; Chou, 2006). Therefore, an HHMM is an objectively a better model of fly walking data than an HMM.
HL states of the HHMM model describe locomotor features
Both the organized transitions between HL states, and the narrow range of observables associated with each state shows that the fly’s locomotion is structured: The transition probability matrix is sparse – a vast majority of transitions from each HL state were to 23 other HL states. When we reordered the states (see Materials and methods section 5) from lowspeedhighturnstates to highspeedlowturn states, we found that from any state the flies transitioned to the neighboring states with a high probability (Figure 2—figure supplement 3) suggesting a gradual transition from lowspeedhighturn states to highspeedlowturn states. This gradual transition is not because flies cannot make large transitions due to biomechanical limitations because 47/81 1 possible transitions between HL states have a nonzero probability. Rather, transitions to states with similar kinematics show that under our experimental conditions – locomotion in a dark, small circular arena  flies locomote at similar $\hat{v}}_{$ and $\hat{v}}_{\perp$ for extended periods of time, and represent one way in which locomotion is organized.
More important to the organization is the narrow distribution of observables  $\hat{v}}_{$ and $\hat{v}}_{\perp$ associated with each HLstate. The distribution of observables for a HL state is a composite of the distributions of its LL states (Figure 3A). Both the model (solid line in Figure 3A1) and a random sample of observables drawn from the time points assigned to a given LL state (gray markers) show that during each LL state within HL state 10, the observables are limited to a narrow range of values. In each LL state, $\hat{v}}_{$ is large and $\hat{v}}_{\perp$ is negative implying that in HL state 10, flies turn counterclockwise at high speeds as observed in sample tracks corresponding to a single transition to HL state 10 (Figure 3B). The sample tracks also show that within each transition to a HL state there are multiple transitions between LL states, a signature of the hierarchical organization in our data. Fast, counterclockwise turns represent a locomotor feature which describes a fly’s locomotion in HL state 10. To better visualize this feature, we translated each track such that it began at the origin and rotated the tracks so that the initial velocity vector pointed along the yaxis (Figure 3C_{1}, see Materials and methods). These transformations make it apparent that the locomotor feature for state 10 is turning left at high speeds (Figure 3C_{2}).
Rotated and translated (as in Figure 3C) tracks for each of the 10 HL states are shown in Figure 4. The distribution of the observables for each HL state is also plotted. HL state 1 represents very slow walking with frequent changes in direction. In state 2, flies are either completely stopped or they walk at a speed about twice the speed of the fly in state 1; state 2 represents stop and start locomotion. The subtle, but important differences between state 1 and state 2 show an instance in which the HHMM is successful at extracting an unexpected feature in the velocity profile in a fly’s locomotion. During state 3, the fly is exhibiting a sharp turn that is reflected in the increase in $\hat{v}}_{\perp$ with a concomitant decrease in $\hat{v}}_{$. These three states together represent slow locomotion.
In states 47, flies are walking at a mediumspeed. In contrast to the clear drop in $\hat{v}}_{$ with increases in $\hat{v}}_{\perp$ in state 3, $\hat{v}}_{$ remains strikingly constant irrespective of $\hat{v}}_{\perp$. These states are different from each other because $\hat{v}}_{$ is slightly different.
States 8–10 are highspeed states; each of these states is also characterized by their turn direction. During states 8 and 9, the fly turns right; the fly’s speed is higher during state 9 than during state 8. During state 10, the fly turns left. States 9 and 10 are mirrorsymmetric versions of each other.
Flies spend 60% of time performing a locomotor feature for >300 ms, and >10% of their time performing a single locomotor feature for >3 s (Figure 2—figure supplement 2). Thus, flies spend extended time in the same state.
Odors affect locomotion by altering the occupancy of HL states
In the absence of ACV, the state occupancy inside and outside the odorzone are quite similar: The fly spends 30% of its time in state 2 and roughly equal time in all other HL states. Introducing ACV changes the fly’s locomotor behavior both inside and outside the odorzone, but with opposite effects on the HL state occupancy in the two zones. Inside the odorzone, in the presence of ACV, the fly spends more time in HL states 1 and 3 at the expense of time spent in HL states 7–10 (Figure 5A). These changes from highspeed states to lowspeed states suggest that in the presence of ACV the fly is performing a local search, presumably to find food. Outside the odorzone (Figure 5B), the fly spends more time in the highspeed states (HL states 8–10), with a decrease in the occupancy of HL state 2 (which includes stopping). Decreased stopping and increased highspeed walking with turning is likely to represent a different search strategy, wherein the fly might be attempting to refind the odor it has recently lost. We also investigated whether there were changes in the LL state composition of the HL states and found no changes (Figure 5—figure supplement 1). Overall, these results showed that odors affect locomotion not by creating new locomotor features, but by altering the frequency with which existing locomotor features are used.
The divergent effect of ACV on the probability of HL states inside and outside the odorzone is consistent with our previous analysis (Jung et al., 2015) and shows that the effect of ACV can be described by the change in the probability of the fly occupying HL states. To assess whether there is a more finegrained spatial structure to the effect of ACV on a fly’s behavior, we divided the arena into a 60by60 grid and measured the ACVinduced changes in the probability of occupying a given HL state at each of the 3600 locations (Figure 6). The probability that a fly is in HL state 1 increases dramatically only at the edge of the odorzone (Figure 6A), and not throughout the odorzone where the odor concentration is uniform throughout, showing that the effect of odor on locomotion has a finegrained spatial structure. Locationspecific change in the probability of each HL state are shown in Figure 6B. The finegrained modulation of locomotion is observed in other states as well  increases in state 2 are largest in an annular region just inside the odorzone and increases in state 3 are largest at the very center of the arena. A similar specificity is observed in the increase in the probability of HL states outside the odorzone. Increases in the occupancy of state 8 are uniform across the entire chamber outside the odorzone; in contrast, the occupancy of states 9 and 10 increases in the region close to the odorzone.
The structure that we observe represents the timeaverage over the entire duration of the odor period, and ignores the time evolution of the behavior (Figure 6—figure supplement 1 and discussion). We were unable to explore the spatiotemporal evolution of behavior because of substantial flytofly differences in locomotion and how it is modulated by odor.
Both locomotion and an odor’s effect on locomotion is fly dependent
Given that ACV affects the occupancy of HL states, it should be possible to do the reverse, that is decode the presence of ACV based on the distribution of HL states. Surprisingly, a variety of different decoding techniques failed to decode the presence of ACV based on the distribution of HL states. One such method (Figure 7—figure supplement 1) in which we employed logistic regression to classify each one second of every fly’s track into ‘ACV present’ or ‘ACV absent’ failed. Even more surprisingly, population decoding based on HL states did not perform any better than decoding based on the observables (Figure 7—figure supplement 1). One possibility that the logistic regression approach failed is because the average behavior represented in Figure 5 does not accurately encapsulate individual fly behavior. Large flytofly differences, where different individuals have fundamentally different basal locomotion or response to odor, might doom decoding methods aimed at discovering a single set of regressors that captures individual fly behavior.
Consistent with large flytofly differences in behavior, we found that the distances between empirical flies are much larger than the distances between the synthetic flies (Figure 7A). Synthetic flies were generated as described in Figure 7—figure supplement 7–2. It is statistically impossible (p < 10^{−131}) that the observed Euclidean distance represents variations around the same average fly. The same conclusion applied to the fly’s behavior in the other three conditions (beforeinside, duringoutside and duringinside: Figure 7—figure supplement 3).
Because the data in Figure 7A is inconsistent with individual flies being variations around a single average fly. We assessed whether the observed variability can be approximated based on a small number of discrete locomotortypes. Xmeans clustering (see Materials and methods section 6) showed that there are 4 clusters of flies based on their locomotion outside the odorzone, before odor onset (Figure 7B). Although the identity of flies that cluster together changed, a similar number of clusters was found in each of the four conditions (Before odor/inside odorzone, during odor/inside odorzone, beforeoutside and duringoutside, Figure 7 and Figure 7—figure supplement 3). Importantly, Xmeans clustering on a set of 34 randomly sampled points from a uniform distribution in the probability simplex space that the data reside in found no clusters. The Euclidean distances between synthetic flies drawn from the four different clusters were similar to the distances between empirical flies (Figure 7B and Figure 7—figure supplement 3). Since Xmeans clustering tends to underestimate the actual number of clusters in data (Pelleg and Moore, 2000), the analyses in Figure 7 and Figure 7—figure supplement 3 suggest that there are at least three to four flytypes based on the frequency with which they use the HL states.
Consistent with the analysis with Euclidean distances above, the KL divergences (Figure 7C, left set of data points) show a large range indicating that some, but not all, flies are wellrepresented by the population average while others are not. Employing three to four flytypes defined as average distribution of their respective cluster decreased the information loss (Figure 7C, right).
How was the behavior of the flies in the four clusters different from each other? The average occupancy of the HL states for flies in each cluster and one example from each cluster is shown in Figure 7D. Cluster two was distinct from the others because flies move at markedly slower speed and spent >60% of their time in State 2 (Figure 7D) during which the fly was often stopped. Locomotion of flies in the largest cluster (cluster 1) was characterized by an alternation between medium and slowspeed states. Flies in this cluster employed states 5 and 7 with a high frequency while making radially inward forays into the center of the arena. Similar behavior was observed in cluster 3, except that the flies employed the slower mediumspeed states (states 4 and 5). Finally, the fourth cluster of flies demonstrated a different locomotor strategy. Flies in cluster four traversed the arena in concentric circles using the highspeed states (states 8–10). Thus, Xmeans performed on the HL state distributions appear to identify different locomotor strategies employed by the fly.
How similar is the locomotion of a fly at different times during a trial? To investigate this issue we first examined whether the mean behavior of flies from different clusters are different enough that they can be accurately clustered based on a small sample of the HL states (Figure 7—figure supplement 4A). We limited the analysis to beforeoutside and duringinside scenarios because the fly spent much of its time in these two scenarios. Only a onesecond chunk of data is sufficient for better than chance clustering, and just 30 s of sampling is enough to accurately classify >85% of the flies into their respective clusters. We performed two analyses to test whether a fly’s behavior is persistent: First, we divided the tracks into bins of different length, and asked whether cluster assignments based on small bin sizes is stable (Figure 7—figure supplement 4B). We found that state distribution within each bin was highly predictive of the cluster they belonged to. Second, we repeated the same analysis, but with bins of varying size starting from the first data point or ending at the last data point (Figure 7—figure supplement 4C). These analyses show that within the admittedly short timeframe of our experiments, the cluster assignments are stable.
Apart from the flytofly variability, another reason why decoding based on the average fly fails is that the behavior of the fly before odor onset is only weakly predictive of its behavior during the presence of odor. Some flies exhibited similar behavior in the before odor/outside odorzone, but were divided into separate clusters in the presence of odor because their locomotion differed (e.g. flies 11 and 33, and 17 and 27;Figure 7—figure supplement 5A, see the distributions of HL states).A similar trend is observed inside the odorzone (Figure 7—figure supplement 5B). These examples imply that behavior before odor onset is unlikely to be strongly predictive of behavior during the presence of odor. This conclusion is supported by the weak correlation between Euclidean distances between pair of flies in the before and during periods (Figure 7—figure supplement 5A and Figure 7—figure supplement 5B).
A small number of strategies can explain the variability in flies’ response to odor
The analysis presented above suggests that individual differences explain why the logistic regression approach based on the average HL state distribution across flies failed to decode the presence of ACV from its absence based on the HL states usage by individual flies. If so, individualized logistic regression should be more successful. Logistic regression based on individual flies to decode the presence or absence of odor based on HL state occupancy during a 1sinterval was able to classify odorno odor at better than chance level for every single fly (Figure 8A, see Materials and methods section seven and Figure 8—figure supplement 1 for details). Moreover, as expected, logistic regression using the HL states performed significantly better than did the observables (Figure 8B) which indicate that HL states are more predictive of the presence of ACV than the observables.
The analysis in Figure 8A shows that the occupancy of HL states is predictive of the presence or absence of odor when the analysis is performed at the level of individual flies. Does this mean that each fly follows an individualistic strategy? To evaluate whether a fly’s response to ACV cluster into a small number of response types, we once again started with Xmeans clustering based on the change in state occupancy before and during odor. Xmeans clustering found five clusters inside the odorzone and four clusters outside the odorzone (Figure 8C). Using these clusters as a starting point, we could reconfigure the clusters such that the logistic regression on flies in each cluster performed at a better than chance level for each fly in the cluster (Figure 8C, see Materials and methods section seven for details), thus implying that a fly’s response to odors can be approximated as a choice between few responsetypes.
Based on their behavior inside the odorzone, the flies were divided into five clusters, four of these clusters have more than three flies (Figure 9A). Flies in cluster 5, just like the average fly (in Figure 5), slow down inside the odorzone. Flies in cluster 3 also demonstrate a strategy similar to flies in cluster five except that ACV causes a large decrease in the time a fly spends in the mediumspeed states rather than the highspeed states. In contrast to clusters 3 and 5 during which the fly slows down inside the odorzone, flies in cluster two demonstrate a fundamentally different strategy in which there is a large decrease in state 2 in favor of states 1, 3 and 4. The flies in this cluster go from stopstart locomotion to locomotion in which they either meander at slow speeds or walk slowly with many sharp turns. Finally, for the flies in cluster 1, there is no dramatic change in state. These different strategies represent diametrically different effects of ACV on some HL states – the most striking example is the opposing effects of the odor on HL state 2 occupancies in different clusters – a large decrease in cluster 2, and an increase in clusters 3 and 5. These differences explain the odorinduced increase in usage of all the slow states in the average fly inside the odorzone, except state 2 (Figure 6A).
Based on how odors modulate their behavior outside the odorzone, there were four clusters. Behaviors that represent three of these clusters are shown in Figure 9B. Flies in cluster 1 decrease the time they spend in slowstates (state 1–2) and instead spend time in the fast states (states 8–10) resembling the behavior of the average fly. Cluster 2) shows a large decrease in HL state 2 occupancy similar to the behavior of flies in inside odorzone cluster 2 while exhibiting a large increase in mediumspeed states (state 4–6). Cluster 3 showed no dramatic change in state.
Discussion
A cornerstone of neuroethology is that behavior unfolds in discrete packets, that is, behavior can be temporally segmented into natural units (Barlow, 1977; Tinbergen, 1996; Baerends, 1976). In some behaviors, these discrete packets are readily recognizable. But, in most daily behaviors, there is enough variability in these discrete packets to make the underlying natural units unrecognizable without the help of sophisticated analytical tools. Recently, there has been significant progress in discovering structure in sequences of posture in C. elegans (Stephens et al., 2010), Drosophila (Berman et al., 2016; Vogelstein et al., 2014) and mice (Wiltschko et al., 2015). Here we describe structure in a fly’s locomotion in the velocity space.
Our salient results are:
A fly moves at a relatively constant $\hat{v}}_{$ and $\hat{v}}_{\perp$ for an extended time that can last tens of steps. Therefore, a fly’s locomotion can be decomposed into a few locomotor features – 10 features in the case of the model we present. The same 10 locomotor features could describe the behavior of all flies in our dataset. The effect of odors on locomotion, and the differences in behavior across flies can be described in terms of these 10locomotor features.
Using this analytical framework, we show that odors affect locomotion by altering the time that a fly spends performing a given locomotor feature (instead of creating new features). The odorinduced change in locomotion has a fine spatial scale – the fly’s response to odor changes as it moves from the border of the odorzone to its center, and as it moves away from the odor border.
The HHMM framework also allowed us to show that flies used the same 10 locomotor features, but in different proportions. The variation is so large that the fly’s behavior cannot be understood as variations around the same average fly. Instead, the flies employ a minimum of at least 3–4 different strategies.
Below we discuss the limitation and implication of these findings.
Model limitations
The model presented here is a model of locomotion and not the model of locomotion. The choice of observables and model strongly influences the features of the structure that is discovered. Our particular model reveals the structure of locomotion in the velocity space.
In choosing the observables, we employ a common method for describing locomotion, that is we treat the fly as a point object and measure the instantaneous change in the position of this point object; therefore, much of the insights from the model relate to how the fly changes its position in time. Apart from $\hat{v}}_{$ and $\hat{v}}_{\perp$, another similar and more commonly used representation of the change in fly’s position: instantaneous speed and angular speed yielded similar locomotor features (data not shown). Ultimately, we used $\hat{v}}_{$ and $\hat{v}}_{\perp$ because this representation is more closely related to movement representation within the insect brain (Green et al., 2017; Heinze, 2017; TurnerEvans and Jayaraman, 2016), and because the measurement errors associated with angular speed are particularly large when flies moves slowly (Gallagher et al., 2013).
A fly’s position can also be described using the actual position of the animal as observables rather than the change in the position, as employed in a recent study in rats (Shan and Mason, 2017) Using the instantaneous position as an observable would reveal different aspects of the structure underlying an animal’s locomotion. Consider the trajectories of flies in Clusters 1 and 3 (Figure 7). They cross similar spatial positions, but are classified into different states because the flies travel at different speeds. In terms of the sequence of position in space, flies in both clusters have a similar behavior – they explore the outer arena border and make occasional radial forays inside the odorzone. An analysis based on position would likely place these two clusters of flies together whereas our analysis, in the velocity space, placed them in different clusters.
Model architecture is also important. A hierarchical model performed better than a nonhierarchical model. The current model has state durations of <3 s. It is clear to human observers that there is structure in the data that is >3 s long. Flies sometime explore the outer border of the arena using characteristic paths that can last up to a minute. The short duration of states in our model cannot capture structure on these longtime scales. One possibility is choosing a deeperlayered architecture. Given the structured transitions between the HL states in our model, it is likely that if we used a deeperlayered architecture, we would likely uncover structure on a longer timescale.
Locomotor features and implications for neural control of behavior
During both HL states 1 and 2, the fly’s locomotion is quite slow, but in state 2 the fly stops and runs intermittently while in state 1, the fly is continuously in motion, albeit slowly. Similarly, in each of the HL states 4, 5 and 7, $\hat{v}}_{$ lies within a narrow range, which is distinct for each of these three states, implying a tight control over forward speed. These locomotor characteristics can persist over 3 seconds (Figure 2—figure supplement 2) – a time period during which a fly takes 30 steps on average (given a step frequency of 10 Hz; Mendes et al., 2013). This tight control over $\hat{v}}_{$ over many steps strongly suggests that locomotion unfolds in blocks. The HHMM presented here or the states revealed by it may not reflect the actual states employed by neurons in the brain. In fact, there is an ongoing debate whether behavior and its control is better represented as a continuum than by discrete states. The presence of longlasting states that are employed repeatedly by all flies in our dataset implies that either locomotion does consists of transition between discrete states, or that these states represent fixed points or peaks of a dynamical system around which the animal spends most of its time (Berman, 2018).
Another surprising result is that the same set of locomotor features describes the behavior of all the flies in the dataset. This result is particularly surprising given that our model explicitly allows each fly its own set of locomotor features. The fact that all flies can be reasonably modeled by the same model implies that within a given environment all flies construct their locomotion from the same building blocks, and differences in locomotion amongst flies or the effect of sensory stimulation can be quantified as changes in the frequency with which these building blocks are employed. An important avenue for future research is to assess whether these locomotor features are fixed or flexible.
Odors affect behavior on a fine spatial scale
In nature, animals encounter odors in a cluttered and dynamic sensory environment (Riffell et al., 2014). Discriminating between odors, navigating towards the chosen odor source and pinpointing the odor source requires a flexible deployment of multiple different motor programs. It is difficult to replicate the complex natural environment in the laboratory. Therefore, laboratory studies are typically aimed at different subsets of the complex environment experienced by animals. In insects, much research has focused on an environment in which it experiences odors in a highly structured odor plume often within a highcontrast visual environment (Budick and Dickinson, 2006; Kennedy, 1983; Vickers, 2000; van Breugel and Dickinson, 2014; ÁlvarezSalvado et al., 2018). Recently, similar experiments have been repeated for flies walking towards an odor source (Bell and Wilson, 2016). These experiments model an insect’s behavior under one specific condition wherein the fly tries to locate an odor source at a distance using strong directional information from wind and vision. The experiments described here explore a fly’s behavior in a small, dark circular arena. At most locations in the arena, the air speed was 0.07 m/s; the highest windspeed was 0.11 m/s (Jung et al., 2015), a value lower than has been employed in most studies. Therefore, nonolfactory directional cues from vision or wind are minimized (Jung et al., 2015). Consistent with this idea, there was no change in the distribution of the flies when wind was completely eliminated.
We find that this behavior near the odor source can be described by changes in the HL states. The clearest evidence that changes in HL states are a good description of the fly’s behavior is the analysis in which we measured the spatial distribution of odorevoked changes in HL states (Figure 6), and observed a pattern that strongly resembles the odorzone. This analysis shows that the HL state description is accurate enough to facilitate discovery of arena structure. However, because we averaged HL state distribution over the entire 3 min of odor exposure, the analysis misses some details. The flies first detect odor slightly outside the odorzone as defined in this study (Jung et al., 2015), and their behavior during the first 10 s after odor encounter differs from their behavior during the rest of the odor period (Figure 6—figure supplement 1A). Moreover, at least some of the spatial structure results from the change in the radial density of the fly as a function of time (Figure 6—figure supplement 1B). A fly’s interaction with odor is dynamic and that HL states are a good analytical framework to extract the spatiotemporal patterning of fly’s behavior by odor. This spatiotemporal pattern likely differs among flies; a full description of this pattern requires a larger dataset and represents an important avenue for future research.
Flies show considerable variability in locomotion despite employing the same locomotor features
Even singlecell organisms and animals with simple nervous systems display substantial individuality (Jordan et al., 2013; Gallagher et al., 2013). Animals with larger nervous systems are likely to display even greater individuality, in the case of adult flies this individuality was demonstrated in the context of locomotor handedness (Buchanan et al., 2015) in a choice assay. The nature and extent of individuality is harder to assess in more complex behaviors because of the difficulty in assessing individuality in a large behavioral space; differences in behavior can simply be different instantiation of the same behavior, or reflect fundamental differences in behavior. In this study, we find that different flies employ the same locomotor features but use them in vastly different proportions. The observed variability between the flies is inconsistent with a single type of locomotor behavior but can be approximated by invoking 3–4 clusters of flies. Because the clustering framework we employed (Xmeans) underestimates the number of clusters, and because there were only 34 flies in our dataset, it remains to be seen whether there are a few locomotortypes or a whole continuum of locomotortypes. Another limitation of this study is that we have not yet ascertained whether a fly’s behavior persists over a longer time frame. Despite these limitations, this study makes two important contribution to the study of individuality. First, we develop a statistical framework to study complex behaviors. This method can be extended to examine whether the differences we observe truely represent individuality. Second, in many behavioral studies, researchers focus on the effect of some stimuli on behavior and make conclusions based on the average fly. In this study, we provide a framework for testing whether the description based on an average fly is appropriate, and ways to proceed if such an approach is inadequate.
The diversity of odor responses observed here is consistent with work done on moths where (Willis and Arbas, 1998) but in sharp contrast to recent work on walking Drosophila (Bell and Wilson, 2016) in which the authors reported that attraction to odors results from a stereotypical motor pattern. The authors of that study claimed that the relatively simple response to odors in their study is likely a result of their simple behavioral arena. Another possibility is that in their study there is a strong, directional wind cue. In the presence of a steady wind cue in a narrow arena, it is likely that the flies’ locomotor behavior is dominated by upwind walking and suppresses other elements of their behavior. It is wellestablished that a fly’s response to odor is strongly influenced by context, as was demonstrated recently by comparing the response to odors in different visual and air flow conditions (Saxena et al., 2018).
When considered from the viewpoint of an individual animal, this variability is hard to understand: A hungry fly in search of food should respond with a singular, hardwired behavior which represents an optimal strategy for locating food. However, species evolve as large populations of individuals, and a successful species should be able to adapt to fluctuating environmental condition. Recent work has shown that  bet hedging  a process by which the same genotype shows considerable phenotypic variation is important for adaptation to fluctuating environments (Kain et al., 2015). Having a diversity of phenotypes ensures that some individuals would thrive in any condition and behavioral variability is a feature not a bug and its careful consideration is critical.
Studying individual behavioral responses is also important to understand the mechanism underlying both the control of locomotion and how odors control locomotion. Analyzing behavior at the population level can provide important insights into an animal’s response but does not provide the resolution necessary for understanding the neural mechanism underlying the momentbymoment control of behavior at the level of individual flies. In this context, it is instructive to take a closer look at the average response to odors in the light of clusters of response to odors. The average response (Figure 5A) was surprising: the occupancy of all the slow states except state 2 is increased. The lack of increase in occupancy of state 2 results from a cluster of flies in which the occupancy of state 2 is strongly decreased (Figure 9A). A similar effect is observed in the response of the average fly to the mediumspeed states – states 4–7. The occupancy of these states decreases in some flies and increases in others. Thus, the average fly is an aggregate of these different clusters of flies, each of which has a distinct response to odor. Disaggregation is an essential first step to understanding neural control of behavior.
Materials and methods
All the code required to fit the HHMM model, and perform all the analysis in this manuscript can be found on Github (Tao et al., 2018; copy archived at https://github.com/elifesciencespublications/HHMM). The dataset can be found on Dryad (doi:10.5061/dryad.m930f2m).
Collection of behavioral data
Request a detailed protocolThe methods used to collect the behavioral data were reported in a previous study (Jung et al., 2015). Briefly, flies were raised in a sparse culture. Flies that were 3–5 days posteclosion were starved for 14–18 hr. Locomotion of a single fly was recorded for a 3 min period before odor was introduced (before period) and a 3min period during odor (during period), using a video camera at a rate of 30 frames per second. The coordinates of the fly were extracted using a custom MATLAB program (https://github.com/bhandawat/flywalkingbehavior/tree/master/tracking; Bhandawat, 2017; copy archived at https://github.com/elifesciencespublications/flywalkingbehavior).
Extracting observables from the trajectory
Request a detailed protocolThe behavioral arena was normalized to a unit circle centered at the origin. The raw coordinates of the centroid of the fly were smoothed using wavelet denoising followed by a locally weighted (lowess) filter. Speed and curvature were defined exactly as in the previous study.
To quantify the behavior of the 34 flies, we computed the speed of the fly along the original direction of movement ($\hat{v}}_{$) and the speed of the fly perpendicular to the original direction of movement ($\hat{v}}_{\perp$). $\hat{v}}_{$ at time t was defined as the component of the velocity at time $t$ in the direction of velocity of the fly at time $t1$. $\hat{v}}_{\perp$ was defined as the component of the velocity at time $t$ perpendicular to velocity of the fly at time $t1$ (Figure 1C). These values were calculated as follows:
Values of $\hat{v}}_{$ and $\hat{v}}_{\perp$ found to be further than 4 standard deviations away from the average were set to values drawn from a normally distributed distribution ($\sigma =1$) centered at the 4 standard deviation mark. The resulting $\hat{v}}_{$ and $\hat{v}}_{\perp$ were then set as the observables used in fitting a 2level Hierarchical Hidden Markov model (HHMM).
We employed $\hat{v}}_{$ and $\hat{v}}_{\perp$ instead of speed and curvature because curvature is very noisy at low speeds because the calculation of curvature requires division by the thirdpower of speed. $\hat{v}}_{$ and $\hat{v}}_{\perp$ are directly related to speed and curvature as follows
HHMM modeling
HMMs are widely used in a variety of fields for modeling time series data. An HMM is a Markov model which assumes that a given sequence of observations may be explained by a set of states that are not observed (or hidden states), and the time independent probability of transitioning between these states. The model processes which produce the observations in an HMM are hidden to the researcher and thus, the goal of fitting an HMM is to uncover the highest likelihood probability model parameters that can generate the data. Baum and others developed the core theory of HMMs (Baum and Petrie, 1966). Since then there has been much exploration of model architecture, and fitting procedure.
HMMs have been shown to be effective in modeling behavior because instantaneous measures of an observable are variable; therefore, behavioral states inferred by the application of simple thresholding to instantaneous measures of the observables are likely to be erroneous. HMM remedies this problem by inferring states based not only on the value of the observable at the current time point but also on the previous and following time points, and allows a more accurate determination of state (this idea is wellexplained in Figure 2 in ref 17). Specifically, the assumption of Markov dynamics with a sparse prior on state transitions penalizes the consideration of unlikely state transitions based upon recent history (forward filtering) and future destinations (backward smoothing).
The HHMM is an extension to the HMM which applies hierarchical structure in the form that higher level state is itself an HMM composed of its lower level states (Fine et al., 1998). The approach we take in exploring and fitting the HHMM closely follows the approach developed by Matt Beal in which he applied variational algorithms to fit HHMM to a time series of observables (Beal, 2003).
This section is divided into three parts. First is the description of the model, second is the details of the process by which the model is fit, and third is the thought process behind our model selection.
Model description
Request a detailed protocolThe model we describe here has 10 hidden states at the higher level (HL) and 5 hidden states at the lower level (LL) (Figure 2A). The empirical data to which the model is fit is a time series of the two observables  $\hat{v}}_{$ and $\hat{v}}_{\perp$. In building the model, the aim is to assign each instant in this time series to a LL and HL state. Each LL state is associated with a joint probability distribution on the observables. Each HL state is described by the transition probability matrix of its LL states. Therefore, while fitting a HHMM, we are determining three sets of quantities: First, the distribution of $\hat{v}}_{$ and $\hat{v}}_{\perp$ which describes each LL states (Figure 3S1A/B). Second, the transition probabilities (TP) between the LL states which describe each HL state (Figure 2S3B). Finally, the transition probabilities between the HL states (Figure 2S2A). Based on the values of $\hat{v}}_{$ and $\hat{v}}_{\perp$, the model assigns a sequence of HL and LL states which best describes the data.
TP matrix for the HL states and the LL states associated with each HL states is shown in Figure 2—figure supplement 3. In each case, the TP matrix describes the probability (P_{ij}) that a fly in a given state detailed in the j^{th} column will transition to a different highlevel state detailed in the i^{th} row. The highlevel states are arranged in ascending order of mean speed/variance in curvature ratio. Because of this arrangement, the TP for the HL states appears wellstructured. From any state, there is a strong tendency to transition to one of the neighboring state. This tendency implies that flies transition to states which have a similar speed/curvature ratio. LL states that belong to the same HL state often have more similar speed/curvature ratios. Thus, there is less of a tendency for LL states to transition to the LL state with the closest speed/curvature ratio. Nevertheless, the LL state transition probability matrices, too, are sparse; signifying a distinct pattern of transition between LL states (Figure 2—figure supplement 3B).
Figure 3—figure supplement 1A and B shows the joint distribution for $\hat{v}}_{$  $\hat{v}}_{\perp$ for each state. Each row represents one HL state and its lowlevel children. In each panel, the solid line represents the model predictions and dots represent empirical values. Each LL state is modeled as a multivariate normal distribution of observables. The solid line represents the bound within which the model predicts 85% of the data points corresponding to a given LL state lie. The empirical data points represent randomly selected subset of data corresponding to instants which the model assigns to a given LL state. The close agreement between the two for most LL states implies that the model is an excellent descriptor of the observables. The few exceptions where either the model distribution is too broad or where the model is not a good descriptor of the data reflect cases in which there are not too many data points in the concerned state (See for example, LL state 1 for HL state 6, Figure 3—figure supplement 1A ). Each HL state is a composite of the 5 LL state because for every time instance the fly is in a given HL state, the model assigns a LL state. Therefore, we can consider the probability density function of a HL state as the sum of the probability density functions of its LL states. From this, we can calculate the 85^{th} percentile contour for the corresponding probability density function for the HL states (Figure 3A2 , Figure 4). Again there is a strong agreement between model and empirical data.
Fitting process
Request a detailed protocolWe will first formally define a HMM as follows:
Variable  Description 

s_{t} ∈ [1,2,…N]  Indicates which of the N states is occupied at time t ∈ [1,2,…T] 
A = (a_{ij})  A NxN transition matrix where a_{ij} represents the probability of transitioning from state i to state j. 
O = o_{1},o_{2},…o_{T}  A sequence of observables composing of the twodimensional data set: $\hat{v}}_{$ and $\hat{v}}_{\perp$ 
B_{ti} = P(o_{t}s_{t} = i)  Emission probability describing the probability of an observation o_{t} t ∈ [1,2,…T] being generated from a state s_{t} = i; Assumed to be Gaussian. 
π= π_{1}, π _{2},… π _{N}  The initial probability distribution of states. 
For our two layered HHMM, we chose a structure composing of 10 highlevel states (HL state) on the top level with each HL state being associated with five lowlevel states (LL states) on the bottom level (Figure 2). Each LL state is modeled as a multivariate normal (MVN) distribution on the observables, and the HL states as being fully described by the mixture of LL state distributions associated with the HL state.
Now we will define our two layered HHMM as follows:
Variable  Description 

$s}_{t$ [1,2,…N_{HL}]  Indicates which high level state is occupied at time t ∈ [1,2,…T] 
$u}_{t$∈ [1,2,…N_{LL}]  Indicates which low level state is occupied 
$Q=\left[qkl\right]$  High level state transition probability matrix, q_{kl} represents the probability of transitioning from high level state k to high level state l. 
$A}^{k$= ($a}_{ij}^{k$)  The low level state transition probability matrix. Here $a}_{ij}^{k$ represents the probability of transitioning from low level state i to low level state j given that the high level state is k 
O = o_{1},o_{2},…o_{T}  A sequence of observables composing of the twodimensional data set: $\hat{v}}_{$ and $\hat{v}}_{\perp$ 
B_{tki} = P(o_{t}$s}_{t},{u}_{t$)  Emission probability describing the probability of an observation o_{t} t∈[1,2,…T] being generated from a HLS $s}_{t$=k and associated low level state ${u}_{t}=i$. Assumed to be Gaussian. 
π^{0} π^{k}  The initial probability distribution of HLS. The initial probability distribution of LLS for HLS k. 
We can henceforth refer to the set of model parameters as $\theta =\left[\pi ,A,B\right]$ and the latent state variables as $Z=\left[{s}_{1,\text{}}{u}_{1},{s}_{2,\text{}}{u}_{2},{s}_{3,\text{}}{u}_{3}\dots {s}_{T},{u}_{T}\right]$. For a given set of observations, the goal is to obtain a posterior probability distribution over the parameters and latent state variables. We approximate this posterior distribution as a factorized distribution over latent variables and parameters, that is $p\left(\theta ,Z\right)\approx q\left(\theta \right)q\left(Z\right)$, and use the Variational Bayesian Expectation Maximization (VBEM) algorithm to find the best approximation. q(Z) is obtained using the forwardbackward algorithm which provides sufficient statistics needed to update the approximate posterior distributions over the parameters. In this setting, posterior probability distributions over rows of transition probability matrices are assumed to be Dirichlet, and the prior is chosen to favor selftransition parameters, ${\alpha}_{ii}\phantom{\rule{thinmathspace}{0ex}}>\phantom{\rule{thinmathspace}{0ex}}1$, while discouraging the use of unneeded states that is, ${\alpha}_{ij}\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}1$ for, $i\ne j$. Initial state distributions were also assumed to be Dirichlet with ${\alpha}_{i}\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}1$.
The emissions probability distributions associated with each state were assumed to be Normal inverse Wishart with a prior favoring zero mean and unit variance. For each computational run, the initial parameters of these posterior distributions were randomized.
The priors used for fitting were such that within state transitions (ith HL state to ith HL state) was six times higher than transitions across HL states ((ith HL state to jth HL state). The prior for all itoj transitions were same. The Dirichlet prior over state transitions which we used was very weak. Specifically, in strength, the prior we used corresponds to the equivalent of two observations relative to the >10,000 observations that we used to fit the model for each fly.
VBEM algorithm
Request a detailed protocolThe Variational Bayesian Expectation Maximization algorithm functions by iteratively updating the two components of our factorized approximation to the true posterior over parameters and latent variables and exploits conditional conjugacy to identify explicit update rules for all distributions over parameters. The goal of the VBEM algorithm is described alternatively as minimizing model error as given by the KullbackLeibler divergence between the approximate and true posterior distributions; or maximizing a lower bound on the marginal probability of the data given the model. It is perhaps easiest to see how the algorithm works by considering the KL(q,p) instantiation. Under our factorized approximation, this objective function takes the form
The VBEM algorithm minimizes this objective function using coordinate ascent in the q(θ) q(z) function space. That is we obtain iterative update rules by simply taking the functional derivative if the KL with respect to q(Z) for fixed q(θ) and then solving for q(Z). This results in the socalled E step where we update the posterior distribution over latent variables by averaging the true joint distribution of observations, parameters, and latent variables over our current estimate of the posterior on parameters:
In the so called M step, the roles are reversed:
Here, the tilde indicates equality up to an additive constant. This twostep procedure is repeated until convergence. The simplicity of these equations belies the complexity of the actual calculation of the posterior distribution over latent state variables. This is accomplished using the wellknown forwardbackward algorithm. The particular implementation of the forwardbackward algorithm used in VB differs from the traditional EM implementation in that the parameters of the transition probability matrix are obtained by exponentiation of the geometric mean of the transition probability matrices:
Based on the current posterior over parameters instead of simply using a maximum likelihood or MAP estimate. Because we assumed the rows of the transition probability matrix to be sparse, this further encourages the model to leave unneeded states unused.
Regardless, based on the assignments of observable tracks to HHMM models, we can iteratively update model parameters for each cluster to maximize the probability of observing the observables given the model parameters and the initial LL state probability density using the forwardbackward algorithm. This operation can be thought of as the maximization step where we are calculating the best set of model parameters ($\theta$) that maximizes the Q function.
Although the EM algorithm is guaranteed to get a better fit on every iteration, it often does not converge to a global optimum of the likelihood function. As such, we implemented a system of cluster pruning, splitting, and reassignment to perturb the system in the case of reaching a local minimal fit. In the pruning step, unused clusters are removed until we are left with at most two unused clusters. In the splitting step, one cluster was selected based on a pseudorandom selection weighted by the size of the clusters (number of flies best fit to the cluster). The flies are then clustered into two clusters using kmeans clustering based on the expected distribution of lower level state usage. Individual HHMMs were fit to each of these new clusters. In the cluster reassignment step, we filled each unused cluster(s) with the fly(s) that had the worst fit to the current cluster assignment. After each step, we conducted 10 iterations of the EM algorithm. The sequence of perturbations was conducted for 10 iterations before a final fit using the EM algorithm was conducted until convergence.
Mixture Model
Request a detailed protocolTo account for differing search strategies that may be utilized by different flies we also fit a mixture of HHMMs to our multifly data set. The goal of this model is to identify a small set of different HHMMs that can describe our entire data set by clustering flies according to the similarity in their locomotion. In this context, two flies are said to behave similarly if the same HHMM provides a good description of their behavior. To model this scenario, we added another layer to our Bayesian model of the fly movement dataset. In this topmost layer, we instantiate a Dirichlet process which probabilistically assigns a label, z_{n}, to the n^{th} fly. When z_{n} = k, it indicates that the k^{th} HHMM under consideration governs the fly’s movement. Therefore, in addition to identifying the posterior distribution over high and lowlevel states, we also infer a probability distribution over cluster assignments. These probability distributions are then used to determine how much we should weight each fly’s movement data when updating the parameters of the different HHMM’s.
Model selection
Request a detailed protocolIn this study, we experimented with models with a varying number of HL states (616), and lowlevel states (46). One limitation of our fitting procedure is that it is not possible to compare models with a different number of states objectively, and thus the choice of model depends on the investigator. We chose a model with 10 HL states for two reasons: The most compelling reason is that different model runs with 10 HL states produced results that were more similar to each other than did model runs with either lower or higher number of states. Another reason is that as the number of HL states in the model is increased, many HL states are sparsely used.
We think that the range 6–12 is a fairly narrow range. One important point is that within this narrow range of states, the state duration in all the models we tried were very similar.
One important limitation of our modeling approach (which we realized in hindsight) is that it is difficult to compare across models because everything changes slightly– LL states change, HL states change. We are currently revising the model to keep the LL states fixed so that the modeling approach would mostly focus on how the HL select the LL states. This process will make it much easier to compare across models.
Comparison of HMM with HHMM
Block clustering of HMM
Request a detailed protocolFor a given HMM fit, we may obtain the transition probability matrix (A). To obtain higher level structure from the states fit to a HMM, we may cluster the states based on their likelihood to transition to each other. One method of sorting and clustering the transition matrix into a block diagonal structure involves the use of information bottleneck formalism (Berman et al., 2016; Tishby et al., 1999). We used this method to search for a given K=10 clusters with an inverse temperature $\beta$ from 1 to 300 and time lag values of between 1 to 45 (33 ms to 1500 ms). We found that $\beta$ from 100200 with time lags of 1040 showed some of the same structure as the HHMM. We showed one solution using a time lag of 10 and a $\beta$ of 100 on an HMM fit of 50 states with 31 used states (Figure 1—figure supplement 1 ).
Bayesian model comparison
Request a detailed protocolGiven an HMM and an HHMM model with equal number of total states and observables O, we wish to calculate the probability of the HHMM model given the observations (P(HHMMO)). Using Bayes theorem, we can calculate this with:
Utilizing a flat prior and Bayes factor:
We now obtain:
We utilized the evidence lower bound to approximate $P\left(OHHMM\right)$ and $P\left(OHMM\right)$. From this we can interpret the $P\left(HMMMO\right)$ as 1 – pvalues where $H}_{0$ = the HMM model is a better model fit. We obtain a $P\left(HMMMO\right)$ > 0.9999 indicating that we can reject the HMM model at p < 0.0001.
Parameter comparison
Request a detailed protocolTo define an HMM, we need the transition probability($p({x}_{j}\mid {x}_{i})$ for and the description for the probability distributions defining each state. This means that there will be $K}^{2}+K{n}_{p$ parameters where $K$ designates the total number of states and $n}_{p$ designates the number of parameters defining the distributions. To define a HHMM, we need the transition probabilities at each level $\mathrm{p}{\left({x}_{j}{x}_{i}\right)}_{l}$ for $1\le i,j\le {K}_{l},\text{}1\le l\le L$ and the probability distributions defining each state. This means that in a model architecture in which each state at a level ($l$) is composed of equal numbers of states one level lower ($l1$), there will be $K}_{L}^{2}+{K}_{L}{\left({K}_{L1}\right)}^{2}+\dots +{K}_{2}{\left({K}_{1}\right)}^{2}+{n}_{p}\prod _{l=1}^{L}{K}_{l$ parameters. The distributions used in this report consisted of twodimensional multivariate Gaussians, which are described by five parameters (${\mu}_{1},{\mu}_{2},{\sigma}_{1},{\sigma}_{2},\rho$). The model that we employ in the manuscript has 10^{2}+10*5^{2}+50*5 = 600 parameters.
Posthoc analyses based on the HHMM model
Preprocessing of the model output
Request a detailed protocolFor each time point, the model assigns the probability that the fly is in each of the 10 HL states. We only included time instances where the model assigned a probability of >0.85 for one of the 10 states; this condition was satisfied for 81% of all data points (Figure 2B). Instances during which the flies transitioned from one HL state to another HL state and back to the same initial state in less than five frames (~170 ms) were reassigned to the initial HL state. Furthermore, instances during which the fly only spends one frame (~30 ms) in a certain HL state before changing states were removed. These corrections were done because such short transitions are likely to arise from noise in the observables rather than rapid transitions. Tracks of HL and LL states were extracted to compute the empirical distribution of the observables (gray dots in Figures 3A and 4 and Figure 3—figure supplement 1A/B).
Each HL state track was translated to begin at the origin by subtracting the position of the first point in the track. Then we rotated each track such that the fly is moving in the forward direction (positive y axis) at the start of the track. We considered the overall vector of the 10 frames (330 ms) before each track as the vector defining the fly’s directional intent prior to starting a track. We defined the angle of rotation as the angle between this vector and the forward direction. These translated and rotated tracks allow us to visualize the distinct types of locomotion defined by the HL states (trajectories in Figures 3C and 4).
Sorting of the HL states
Request a detailed protocolThe model output numbers the HL state in a random order with respect to the underlying distribution of observables. To better understand the structure underlying transitions between HL states, we rearranged the states from lowspeedhighturn states to highspeedlowturnstates by sorting the HL states of the model based on the ratio of their mean speed over the variance in curvature.
Analysis of the effect of odors on the HL states
Request a detailed protocolThe nominal radius of the odorzone defined by the radius of the odor tube was 1.25 cm (Jung et al., 2015). However, because there was some spread of the odor outside the odorzone, the actual radius of the odorzone was 1.5cm (Jung et al., 2015). A fly was considered to be inside the odorzone when it was within the 1.5 cm radius. Although the odor was turned on 3 min following the start of the recording period, because the odor was only present inside the odorzone, we considered the first time the fly entered the odorzone after 3 min (first entry) as the start of the odor period. Based on a combination of the presence of odor and the location of the fly, we parsed the data into four categories: These were defined as inside odorzone before first entry (B_{I}), inside odorzone following first entry (D_{I}), outside odorzone before first entry (B_{O}), and outside odorzone following first entry (D_{O}). The standard deviation (SD) of the probability of HL states (P) in each scenario were calculated as follows:
where N is the total number of data points in the scenario.
For each fly, we measured the probability distribution of HL states occupancy for each of our four scenarios. Figure 5 shows the average probability distribution of these HL state occupancy with error bars denoting 95% confidence intervals calculated by bootstrap resampling using the BCa method (Efron and Tibshirani, 1993). We then conducted a bootstrap test for testing equal means in order to determine significance in the change in HL state distribution after odor onset for inside and outside the odor region separately (Efron and Tibshirani, 1993). Briefly, we first consider a given HL state probability of occurrence inside the odor zone for before (B) and after first entry (A). Our null hypothesis was that there was no change in the mean probability of observing the HL state. To test the null hypothesis, we first constructed populations F and G by translating populations A and B respectively such that F and G share a common mean. We then drew bootstrap samples of flies F’ from population F and G’ from population G and calculate the test statistic as follows:
We repeated this process 10,000 times to generate a distribution of the test statistic that we would expect from the null hypothesis. We then calculated the empirical statistic using the same formulation as the test statistic: TS(A,B). Using the distribution of test statistic and our empirical statistic, we conducted a twotailed test with alpha = 0.05 and 0.01. As our data included multiple statistical tests (one for each HL state), we corrected for multiple comparisons by applying the HolmBonferroni procedure. This process was repeated for outside the odor zone.
XMeans clustering
Request a detailed protocolXmeans clustering is an extension of Kmeans clustering. Kmeans clustering is an iterative algorithm that assigns data points to one of K groups based on the distance between points and the cluster centers; in most versions of Kmeans clustering, the number of clusters is specified by the user. Xmeans extends the Kmeans algorithm by computing the Bayesian information criterion (BIC) scores associated with a given Kmeans model fit, and, therefore allows a better assessment of the appropriate number of clusters in the data (Pelleg and Moore, 2000). Following Raftery et al. (Raftery, 1995), we computed the tscore based on the change in BIC given an increase in the number of clusters for N = 34 flies as follows:
In this case, the t statistic and the corresponding p values represent the likelihood that kmeans with a given cluster size will do significantly better than one with smaller cluster size. We chose the maximum cluster K that fulfilled t > 3.86 (p < 0.05) and within 5% of the minimum BIC.
In the analyses in Figure 7 and Figure 7—figure supplement 3, we clustered flies based on a 10dimensional representation of the fractional time a fly spends in each of the 10 HL states. We also employ Xmeans clustering in Figure 8 to cluster a fly’s response to odors. In this analysis, the clustering was performed on odorevoked change in HL state occupancy: For inside the odorzone, D_{I}B_{I} was computed for each fly and was the input to the Xmeans. For outside the odorzone, D_{o}B_{o} was computed for each fly and was the input to the Xmeans (Figure 8).
The clustering was performed on the 10dimensional representation; but, for visual representation, we conducted PCA on the HL state distributions in each of the four scenarios to obtain a 2dimensional representation of the clusters of flies (Figure 7 and Figure 7—figure supplement 3). The first two principal components explain less than 90% of the variance.
The 10dimensional HL state probability distributions used in Figure 7 and Figure 7—figure supplement 3 reside on a 9dimensional probability simplex:
To assess whether the xmeans clusters found in our data set are valid, we conducted xmeans on a set of 34 points sampled randomly from the uniformly distributed 9simplex space our data resides in. To sample from this simplex, we used a flat Dirichlet distribution marked by the following density function.
where K = the number of dimensions. We found that xmeans did not cluster these sampled data points into clusters.
Logistic regression model
Request a detailed protocolWe employed logistic regression (logit) models (Cox, 1958; Dobson and Barnett, 2008) – a generalized linear model which can be used to describe the relationship between independent observations and a binary dependent variable. In this study, we analyzed whether the distribution of HL states in a onesecond window is predictive of whether the fly is inside the odorzone or outside it. Because of the difference in a fly’s behavior inside and outside the odor zone, we separately performed this analysis inside and outside the odorzone. We performed three different logistic regressions in this study: The analysis process is the same; the only difference is how flies were grouped. In the first analysis (Figure 7—figure supplement 1), a single regression was performed for all the flies. In the second analysis (Figure 8A/B, Figure 8—figure supplement 1), we calculated regressions individually for each fly so that there are 34 regressions –one for each fly. Finally, in Figure 8C, we calculated logistic regression individually for each cluster of flies – five clusters inside the odorzone and four outside the odorzone. The process for performing logistic regression is described below and illustrated in Figure 7—figure supplements 1 and Figure 8—figure supplement 1.
First, we divide the data into 1 s timebins. We also performed logistic regression on data subdivided into 0.33, 0.66, and 3 s bins (corresponding to 10, 20 and 90 data points) with varying amounts of time overlap and found no notable differences in model predictions. Because we want the chance prediction to be 50% in each analysis, bins were randomly removed from either the before or during case such that the total number of bins were the same for the before period and during period. Next, we performed principal component analysis (PCA) on the distribution to obtain a smaller number of uncorrelated variables. We considered the smallest number of principal components that cumulatively explained over 90% of the variance in our analysis.
The resulting principal components were used as predictors in fitting to a logarithmic regression model. We used the MATLAB builtin function ‘glmfit.m’ to implement fitting to a generalized linear model. For fitting to logit model, we used a binomial distribution (having experienced the odor or not) and the ‘logit’ criterion. The resulting logistic function based on the population data was used to predict if a fly was experiencing odor in any given 1 s bin.
To evaluate the predictive power of the raw observables (speed and curvature) on the behavior of the flies, the GLM was fit using the speed and curvature as predictors instead of the distributions of HL states. To compare the relative probability of correct predictions on for each fly between two different types of GLM fits (M_{1}, M_{2}), we considered the perpendicular distance ($\overrightarrow{\mathrm{D}}\mathrm{j}$) of these fits from the line of unity (indicating perfect correlation).
To determine whether HL states performed better than the observables, a Wilcoxon matched rank test was conducted on the 34 distances calculated for each of the model comparisons (Figure 8B, Figure 7—figure supplement 1).
As Xmeans was conducted on the average distribution of HL states for each fly in each scenario, the partitioning obtained will not necessarily reflect the optimal clustering of flies based on ability to distinguish if the fly has encountered odor or not given a smaller (1 s) time bin observation of HLS distribution. To better partition flies into clusters based on both the average difference across time and smaller snapshots of difference across the 1 s time frame, we took Xmeans as the first partitioning of flies. Then we took a total of 6 flies from the largest clusters (n > 10) that had the worst predictive power given GLM fits based on clusters. We then redistributed these flies into clusters in order to maximize the sum of the predictive power of individual flies across all clusters.
Generation of synthetic flies
Request a detailed protocolHHMM synthetic flies were generated based on the transition probabilities for each of the four scenarios (B_{I}, B_{O}, D_{I} and D_{O}) separately. To generate a synthetic track, an HL state was first chosen based on the occurrence probability distribution of HL states. At this point, a new HL state was assigned for the next time instance ($x}_{i$) based on the HL transition probability matrix. Since the empirical flies spent variable time in the four scenarios, in the creation of the tracks we chose the median time spent. Therefore, each synthetic fly lasted until the total duration was reached for each scenario (Figure 7—figure supplement 2). 100 sets of 34 synthetic flies were generated for each of the four scenarios (Figure 7A, Figure 7—figure supplement 3). The resulting synthetic average distribution of HL states for each of the four scenarios was compared across the 100 iterations and showed high consistency between iterations.
Prediction of clustersbased subsampling
Request a detailed protocolTime points were sampled using four methods based on bin duration for each fly for the two scenarios with the most data (B_{o} and D_{i}) separately (Figure 7—figure supplement 4). In method one, for a given scenario, segments of continuous repeated HL states were randomly sampled and stitched together until the duration of the time bin was fulfilled. In the second method, a window lasting the time bin was sampled from the HL states for the scenario. In method three, the segments were sampled starting with the first time point of the scenarios. In method four, the segments were sampled starting with the last time point of the scenarios and moving backwards in time. After sampling, the average HL state distributions were calculated for each time bin and the Euclidean distance from the distribution to the centroid of each Xmeans cluster for the given scenario were calculated. The closest cluster was compared to the Xmeans cluster assignment based on all time points. This process was repeated 100 times to generate a mean percentage of correctly labeled flies based on the subsampling duration. Chance was calculated as the probability of choosing a fly for a given cluster and being in the Voronoi cell of the cluster. This translates to:
where $x}_{i$ is the probability of observing cluster i based on the number of flies in each cluster, $p}_{i$ is a weighting based on the size of Voronoi cell in the simplex space, and K is the total number of clusters.
Data availability
Data has been deposited in Dryad Data Repository and is available at doi:10.5061/dryad.m930f2m. All the code required to fit the HHMM model, and perform all the analysis in this manuscript can be found on Github (https://github.com/bhandawat/HHMM; copy archived at https://github.com/elifesciencespublications/HHMM).

Dryad Digital RepositoryData from: Statistical structure of locomotion and its modulation by odors.https://doi.org/10.5061/dryad.m930f2m
References

On drive, conflict and instinct, and the functional organization of behaviorProgress in Brain Research 45:425–447.https://doi.org/10.1016/S00796123(08)61002X

How Animals Communicate98–134, Modal action patterns, How Animals Communicate, Indianapolis, University of Indiana Press.

Statistical inference for probabilistic functions of finite state markov chainsThe Annals of Mathematical Statistics 37:1554–1563.https://doi.org/10.1214/aoms/1177699147

ThesisVariational Algorithms for Approximate Bayesian InferenceUniversity of London, London.

Mapping the stereotyped behaviour of freely moving fruit fliesJournal of the Royal Society Interface 11:20140672.https://doi.org/10.1098/rsif.2014.0672

Highthroughput ethomics in large groups of DrosophilaNature Methods 6:451–457.https://doi.org/10.1038/nmeth.1328

The role of visual and mechanosensory cues in structuring forward flight in Drosophila melanogasterJournal of Experimental Biology 210:4092–4103.https://doi.org/10.1242/jeb.006502

Freeflight responses of Drosophila melanogaster to attractive odorsJournal of Experimental Biology 209:3001–3017.https://doi.org/10.1242/jeb.02305

ConferenceTechniques to incorporate the benefits of a hierarchy in a modified hidden markov model21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. pp. 120–127.

The regression analysis of binary sequencesJournal of the Royal Statistical Society: Series B 20:215–232.https://doi.org/10.1111/j.25176161.1958.tb00292.x

BookAn Introduction to Generalized Linear Models (Third Edition)Chapman & Hall/CRC Press.

The hierarchical hidden markov model: analysis and applicationsMachine Learning 32:41–62.https://doi.org/10.1023/A:1007469218079

Multilevel control of run orientation in Drosophila larval chemotaxisFrontiers in Behavioral Neuroscience 8:.https://doi.org/10.3389/fnbeh.2014.00038

Habits, rituals, and the evaluative brainAnnual Review of Neuroscience 31:359–387.https://doi.org/10.1146/annurev.neuro.29.051605.112851

Interaction between central and peripheral mechanisms in the control of locomotionProgress in Brain Research 50:227–235.https://doi.org/10.1016/S00796123(08)608237

Unraveling the neural basis of insect navigationCurrent Opinion in Insect Science 24:58–67.https://doi.org/10.1016/j.cois.2017.09.001

Parallel use of two behavioral mechanisms for chemotaxis in Caenorhabditis ElegansJournal of Neuroscience 29:5370–5380.https://doi.org/10.1523/JNEUROSCI.363308.2009

Zigzagging and casting as a programmed response to windborne odour: a reviewPhysiological Entomology 8:109–120.https://doi.org/10.1111/j.13653032.1983.tb00340.x

Idiothetic path integration in the fruit fly Drosophila melanogasterCurrent Biology 27:2227–2238.https://doi.org/10.1016/j.cub.2017.06.026

The vestibuloocular reflex during human saccadic eye movementsThe Journal of Physiology 373:209–233.https://doi.org/10.1113/jphysiol.1986.sp016043

Multiscale chromatin state annotation using a hierarchical hidden markov modelNature Communications 8:15011.https://doi.org/10.1038/ncomms15011

ReportLearning Musical Pitch Structures with Hierarchical Hidden Markov Models Technical ReportUniversity of Edinburgh.

ConferenceLinear time inference in hierarchical HMMsProceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. pp. 833–840.

ConferenceLearning and detecting activities from movement trajectories using the hierarchical hidden Markov model2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). pp. 955–960.https://doi.org/10.1109/CVPR.2005.203

ConferenceXmeans: Extending Kmeans with efficient estimation of the number of clustersProceedings of the Seventeenth International Conference on Machine Learning. pp. 727–734.

The fundamental role of pirouettes in Caenorhabditis Elegans chemotaxisThe Journal of Neuroscience 19:9557–9569.https://doi.org/10.1523/JNEUROSCI.192109557.1999

Bayesian model selection in social researchSociological Methodology 25:111–163.https://doi.org/10.2307/271063

Odor source localization in complex visual environments by fruit fliesThe Journal of Experimental Biology 221:jeb172023.https://doi.org/10.1242/jeb.172023

Coordination of legs during straight walking and turning in Drosophila melanogasterJournal of Comparative Physiology A 167:403–412.https://doi.org/10.1007/BF00192575

Collisionavoidance and landing responses are mediated by separate pathways in the fruit fly, Drosophila melanogasterThe Journal of Experimental Biology 205:2785–2798.

Foundations of Animal Behavior: Classic Papers with Commentaries406–413, The hierarchical organization of nervous mechanisms underlying instinctive behaviour, Foundations of Animal Behavior: Classic Papers with Commentaries, Chicago, University of Chicago Press.

ConferenceThe information bottleneck method37th Annual Allerton Conference on Communication, Control and Computing. pp. 368–377.

Mechanisms of animal navigation in odor plumesThe Biological Bulletin 198:203–212.https://doi.org/10.2307/1542524

Variability in odormodulated flight by mothsJournal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology 182:191–202.https://doi.org/10.1007/s003590050170

ConferenceDetecting and predicting of abnormal behavior using hierarchical Markov model in smart home networkIEEE 17th International Conference on Industrial Engineering and Engineering Management. pp. 410–414.
Decision letter

Ronald L CalabreseReviewing Editor; Emory University, United States

K VijayRaghavanSenior Editor; National Centre for Biological Sciences, Tata Institute of Fundamental Research, India
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Statistical structure of locomotion and its modulation by odors" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Ronald L Calabrese as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by K VijayRaghavan as the Senior Editor.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
In this manuscript, the authors present a creative analysis of fly walking in response to an attractive odor (apple cider vinegar, ACV) in circular arena in the dark and where odor can be strictly limited to a central zone. They monitor fly movement as velocity (parallel and perpendicular to prior movement) prior to and after odor is introduced to the arena center. They then analyze the data with a Hidden Markov Model (HMM) and a Hidden Hierarchal Markov Model (HHMM) and show decisively that the HHMM fits the data better and that the high level sates (HL) describe stereotypical locomotor components and that transitions between them vary in probability such that states with similar velocities have higher transition probability. They then show that introduction of odor affects the probability that a given HL state is occupied and varies inside and outside the odor zone with final spatial structure. A surprising finding is that the average probability distributions of the HL states cannot be used to determine the presence of the odor. Subdividing the flies according to their individual pre and during odor probability distributions and clustering reveal 34 categories of responses and there are different cluster of the 34 flies tested for each of the four conditions (pre inside odor zone, pre outside odor zone, during inside odor zone, during outside odor zone). The most interesting conclusions of the paper are: 1) the observation that the HHMM model fits better than the HMM model, implies that there is structure at multiple time scales in the data 2) under the specific conditions imposed that although all flies in dataset use the same set of locomotor features, individual flies vary considerably in how often they employ a given locomotor feature (HL state), and how this usage is modulated by odor. The paper should arouse general interest in the behavioral neuroscience community.
Essential revisions:
1) There are concerns about how well the data constrains particular HHMM put forward that can be allayed by putting confidence intervals on the probability distributions of Figure 5 and Figure 6.
2) Several of the reviewer concerns focus on dwell times in each HL state. These dwell time should be analyzed and distributions presented and interpreted.
3) The observation that the HHMM model fits better than the HMM model, implies that there is structure at multiple time scales in the data. For example, HL states may be prolonged because the systems transitions among nearby LL states within the HL state; is this supported by the dwell time of LL states within a prolonged HL state? The authors also note in the Discussion that there is structure on longer time scales. Can they think of some analysis that would pull this out?
4) The findings about individuality and the fact that the presence of odor can be predicted from a model that takes individuality into account but not from a model that does not are interesting. Here however, some additional analyses or data would help support the claim. For example, can the authors compare the distribution of states early versus late in the trial and show that each individual still occupies a characteristic set of states? How do HL state dwell times vary among individual or early vs. late in a trial? Would more flies analyzed and/or for longer period of time help resolve the individuality issues?
5) The detailed reviewer comments are appended and that should be addressed in light of the consensus concerns above.
Reviewer #1:
Concerns
1) Subsection “A small number of strategies can explain the variability in flies’ response to odor” third paragraph: be explicitly clear as to how many clusters were found for each case illustrated in Figure 9. In Figure 9A, I assume there were at least 5 clusters but that cluster 4 had less than 4 flies; were there other clusters? Be very clear as to the number of clusters for Figure 9A and 9B.
2) Results section: you state in results that flies "…spend 60% of time performing a locomotor feature for >300 milliseconds, and >10% of their time performing a single locomotor feature for >3s (Figure 2—figure supplement 1)”. The first statement appears compatible with the cumulative distribution in the figure, but I don't see the second at all. Am I missing something or is the maximum duration illustrated 1.5 s and less that 5% of states are of this duration or longer? Please plot in a supplemental figure the real distribution of time duration of states in the data. These data are essential if you are to make claims like "These locomotor characteristics can persist over 3 seconds – a time period during which a fly takes 30 steps on average (given a step frequency of 10 Hz). This tight control over 𝑣̂ over tens of steps strongly suggests that locomotion unfolds, not on a stepbystep basis, but in blocks of tens of steps." Or like "A fly moves at a relatively constant 𝑣̂ and 𝑣̂⫠ for tens of step." It seems for example, that for states that last 300 ms that only 3 steps are possible and there are many states that last shorter periods. Are moving states distributed in duration longer/shorter that stopping states?
3) Subsection “Odors affect behavior on a fine spatial scale.”, paragraph two: Here you are creating a sequence of behavior based on averages and YET you claim that average data does not describe a fly's behavior but there are distinct strategies. Please clarify.
Reviewer #2:
1) The model assumes that behavior is best modeled as set of discrete states (with the implication, I think, that this is how they are controlled neurally). The alternative possibility, alluded to briefly in subsection “Flies show considerable variability in locomotion despite employing the same locomotor features.”, is that some parts of the behavior could be better represented (or controlled) as continua. Since velocity is a continuous 1D variable, and since the fly must pass through intermediate velocities to transition from low to high velocity and viceversa, I think this alternative should be considered. I am not suggesting the authors entirely revamp their model but I think this point could be discussed or considered a bit more prominently.
2) In the analysis of spatial structure (Figure 6), the role of time history should be considered/discussed. For example, behavior near the odor border might be different because the fly is more likely to have experienced no odor shortly before odor (or vice versa) than in the center of the arena. The responses of olfactory neurons are well known to show responses that depend on history over multiple time scales, so this point should be considered/discussed. Whether behavior could also be influenced by airflow at different parts of the chamber should be noted (without assuming the reader knows the details of the earlier paper). For example, could the airflow from the vacuum be causing the flies to slow down inside the odorized area, as in Yorozu et al., 2009?
3) The findings about individuality are quite interesting. As mentioned above, the analysis in which the presence of odor can be predicted based on individual flies or clusters of flies, but not the whole dataset is quite interesting. However, as the authors note in the response to Reviewer 3, individuality in responses has also been investigated elsewhere. A typical analysis in these studies (de Bivort, Branson) is to compare behavior of the same individual at different time points and to show that they are more similar to themselves than to each other. The authors might consider splitting their data in time and repeating the analysis, although the time interval of the data here is relatively short (6 min). In addition, it would be nice to know if they could correlate their clusters with anything else about the flies. In the response to reviewers the authors allude to different behaviors produced by different genotypes. I think it would add to the interest of the study if these data were included.
4) The comparison of HMM and HHMM models is nicely rigorous but seems highly technical for this journal. The authors should consider focusing the text on the conceptual conclusions that can be drawn from this comparison.
Reviewer #3:
Before publication some work is needed to address questions about: 1) how to interpret HL behaviors (survival time statistics of HL states and relationship to the structure of a 10HL5LL model); 2) error bars.
Main comments:
Error bars.
I could not get a clear sense of how well the data is constraining the HHMM model. For example in Figure 5, what is the uncertainty of these distributions? Can the authors bootstrap the data? Likewise in Figure 6 how many data points go in each colored dot? What is the uncertainty of the probability plots as a function of radial distance? There are no error bars on these plots.
Time spent in HLs.
One of the reasons HHMM seems to work better than HMM is that behaviors have extended durations and since the trajectory can jump around the LL states within a single HL state, that HL state can last long. Even though this is a key aspect of the model, there is no analysis or plot of the waiting time distribution in each HL state that would give an idea of the "duration" of such states during behavior and how that duration depends on the number of HL and LL states used in the HHMM model. The only thing I could find was Figure 2—figure supplement 1 (but this includes all states) and the statement made: "A fly moves at a relatively constant 𝑣̂ and 𝑣̂⫠ for tens of step. This tendency means that a fly's locomotion can be decomposed into a small number of locomotor features – 10 features in the case of the model we present.". However, this main conclusion is provided without plots about it. Supporting this point with quantitative data analysis is important because the authors use it to interpret various aspect of the results. It is also important to know if some HLs state last much longer than other in order to interpret the data. This may also clarify why 10HLs each with 5LLs are used in the HHMM. See below.
Subsection " HHMM reveals that a fly’s locomotion is surprisingly structured in the velocity space…".
An interesting outcome underlined by the authors (last paragraph) is that the resulting HHMM accounts for 80% of all the data (within 85% confidence interval) using 10 locomotor features (10 HLs). Some of the HL states correspond to clearly distinct behavior, such as 1=meander, 2=stoptowalk or 9=fast right turn, 10=fast left turn. However, other HLs seem to be part of a continuum, such as 4,5,7= medium speed walking, and it is not entirely clear why 3 distinct HLs are needed to describe medium speed walking. Some HLs seem to fit less well the data, e.g. HL 6. I could not find a clear explanation of why 10HLs each with 5LLs were chosen and whether fewer HLs would have worked as well. What would happen if the HHMM had less HL nodes but more LL nodes in each HL? A discussion of survival times in each HLs might help sort this out and provide a way to interpret states 4,5,7. Are these different behaviors or one behavior distributed over 3 HL?
Subsection “Locomotor features and implications for neural control of behavior”.
In the discussion the authors say that v_parallel lies within a narrow range that is distinct for each three state and that 3 states reflect a tight control on locomotion by the brain. They mention these locomotor state can persist up to 3 second/30 steps and suggest that this indicates that the brain controls locomotion in blocks of 10 steps. I find this conclusion drawn from the finding of these 3 clusters too speculative given the data: In Figure 3—figure supplement 1 and Figure 4 the distributions of v_par overlap significantly between HL 4,5 and 7. This is again related to the two points above. Finally, if HL 4 5 7 are really distinct behaviors then another possibility could be that they are needed to account for flytofly variability in the data, and that for a single fly only one state is sufficient to describe medium speed walk.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Statistical structure of locomotion and its modulation by odors" for further consideration at eLife. Your revised article has been favorably evaluated by K VijayRaghavan (Senior Editor), a Reviewing Editor, and two reviewers.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
Significant additional revisions are still required. The comments of the expert reviewers, below, are detailed and require full responses.
Reviewer #2:
This manuscript compares HMM and HHMM approaches to clustering behavioral data from flies in an olfactory paradigm. Based on a rigorous comparison of models, the authors conclude that the HHMM model performs better than the HMM model, implying the presence of hierarchical longtime scale structure in the data. They further analyze the behavior of individual flies and show that responses to odor cannot be understood in terms of variation around an "average fly" but rather are better understood as belonging to 34 response types. Response types of individuals are stable over the duration of the experiment. Overall the methods introduced for analyzing complex behavioral data appear to be sound and are likely to be of broad interest to scientists studying natural behavior and its neural correlates.
Two points need to be addressed before acceptance:
1) How do the priors used in fitting constrain the structure of the transition matrices found for the models? In the Results section, the authors state that "the assumption of Markov dynamics with a sparse prior on state transitions penalizes the consideration of unlikely state transitions base upon recent history and future destinations." This suggests that a sparse prior was applied in fitting. However, later in the paper the authors state: "The transition probability matrix for the HMM was sparse, suggesting that from each state there are transitions to only a handful of other states." If a sparse prior was applied during fitting than the fact that the resulting transition matrix is sparse is not really a finding about the data.
Similarly, the authors need to clarify what fitting priors or constraints were placed in the HHMM model. The Results state "This is because any HHMM which puts very specific constraints on the transitions probability matrix can be represented by an HMM but not viceversa." What are the constraints on the transition probability matrix for the HHMM? The text says "The transition probability matrix is sparse a vast majority of transitions from each HL state were to 23 other HL states." Was this imposed by fitting priors?
2) Parts of the manuscript are somewhat long and repetitive. I think the manuscript could be productively shortened to have greater impact on its readers.
For example, in the section titled "HHMM reveals that a fly's locomotion is surprisingly structured in the velocity space" the last paragraph is devoted to restating the major conclusion of the first section: HHMM performs better than HMM. I think this could be reduced or folded into section one, which focusses on this comparison, and the section focused more exclusively on describing the HL states uncovered by the model.
In the Discussion subsection “Flies show considerable variability in locomotion despite employing the same locomotor features” paragraphs two and three are highly descriptive of the individual fly data and somewhat repetitive with parts of the Results subsection “Both locomotion and an odor’s effect on locomotion is fly dependent” (already rather long). I think these sections could be condensed to make the major points about individual variability in the Results, and use the Discussion mostly to compare these findings with the results of other studies.
Reviewer #3:
The authors have addressed most of my concerns expect the important one regarding the confidence interval on the distributions in Figure 5. What I am asking is for the authors to use bootstrapping to extract many sample distributions from subsets of the data in order to get a confidence interval on how these bins are populated by the data and how significant the changes are between before and during odor exposure.
https://doi.org/10.7554/eLife.41235.030Author response
Essential revisions:
1) There are concerns about how well the data constrains particular HHMM put forward that can be allayed by putting confidence intervals on the probability distributions of Figure 5 and Figure 6.
There are two sources of variability. The first corresponds to how well the data constrains the model. This source of variability can be quantified as follows: The standard deviation (SD) of the probability of HL states (P) in each scenario can be calculated as follows:
SDj=Pj1PjN1
∀j∈1,2,3,…10HLstate
Where N is the total number of data points in each scenario. This standard deviation is extremely small and is shown in Figure 5.
The second source of variability is how different is the behavior of different flies in a given scenario. Getting to the bottom of this variability is the main question that drives the manuscript from Figure 79.
2) Several of the reviewer concerns focus on dwell times in each HL state. These dwell time should be analyzed and distributions presented and interpreted.
We have analyzed the dwell times of each HL state (Figure 2—figure supplement 2). Specifically, we plot the cumulative distributions of the HL states. We also plot the percentage of time a fly spends in a state of a given duration. These plots show the differences in dwell time in different HL state.
3) The observation that the HHMM model fits better than the HMM model, implies that there is structure at multiple time scales in the data. For example, HL states may be prolonged because the systems transitions among nearby LL states within the HL state; is this supported by the dwell time of LL states within a prolonged HL state? The authors also note in the Discussion that there is structure on longer time scales. Can they think of some analysis that would pull this out?
The plot on dwell time – Figure 2—figure supplement 2 – also plots the average dwell time of the lowlevel states and the number of LL state transitions/transition into HL state as a function of the duration of the HL state. These data support the idea stated above: The number of transitions increase when the duration of HL states become longer. Thus, this analysis works exactly as the reviewers envision above, and as one would expect if the data has structure on multiple time scales.
With regards to pulling out structure on an even longer time scale: we don’t think that there is any simple analysis that would work. We think that pulling out longer timescale likely requires a different model with more hierarchical layers, and likely a different set of observables.
4) The findings about individuality and the fact that the presence of odor can be predicted from a model that takes individuality into account but not from a model that does not are interesting. Here however, some additional analyses or data would help support the claim. For example, can the authors compare the distribution of states early versus late in the trial and show that each individual still occupies a characteristic set of states? How do HL state dwell times vary among individual or early vs. late in a trial? Would more flies analyzed and/or for longer period of time help resolve the individuality issues?
We performed the analysis suggested here (Figure 7—figure supplement 4). We find that even if you take a 1second chunk of data, it assigns flies to the correct cluster at a level greater than chance. A 30second chunk of data classifies >80% of the flies accurately. The correctly assigned fraction is about the same whether we use the first 30 seconds of data or the last 30 seconds of data.
It is important to note that what we are claiming is that these 34 sixminute trajectories do not belong to the same cluster. To establish individuality, one needs trajectories from the same fly over multiple days. It is also important to control the environmental conditions within a very tight bound. We were anal about the environment conditions. But, because the experiments were not specifically designed to get at the question of individuality, therefore the bounds on many conditions – age of the fly (35 days old), culture conditions (50200 eggs) should be controlled even more tightly. Moreover, we did not control humidity at the time of the experiment. In essence, collecting longitudinal data under even more controlled conditions from a large population of flies is necessary to settle questions regarding individuality, and is well beyond the scope of this study.
5) The detailed reviewer comments are appended and that should be addressed in light of the consensus concerns above.
We agree with most of the reviewer critiques and have addressed them in the manuscript.
Reviewer #1:
Concerns
1) Subsection “A small number of strategies can explain the variability in flies’ response to odor” third paragraph: be explicitly clear as to how many clusters were found for each case illustrated in Figure 9. In Figure 9A, I assume there were at least 5 clusters but that cluster 4 had less than 4 flies; were there other clusters? Be very clear as to the number of clusters for Figure 9A and 9B.
Thank you. We have specified the number of clusters in the figure in subsection “A small number of strategies can explain the variability in flies’ response to odor”.
2) Results section: you state in results that flies "…spend 60% of time performing a locomotor feature for >300 milliseconds, and >10% of their time performing a single locomotor feature for >3s (Figure 2—figure supplement 1)”. The first statement appears compatible with the cumulative distribution in the figure, but I don't see the second at all. Am I missing something or is the maximum duration illustrated 1.5 s and less that 5% of states are of this duration or longer? Please plot in a supplemental figure the real distribution of time duration of states in the data. These data are essential if you are to make claims like "These locomotor characteristics can persist over 3 seconds – a time period during which a fly takes 30 steps on average (given a step frequency of 10 Hz). This tight control over 𝑣̂ over tens of steps strongly suggests that locomotion unfolds, not on a stepbystep basis, but in blocks of tens of steps." Or like "A fly moves at a relatively constant 𝑣̂ and 𝑣̂⫠ for tens of step." It seems for example, that for states that last 300 ms that only 3 steps are possible and there are many states that last shorter periods. Are moving states distributed in duration longer/shorter that stopping states?
The referee raises three important points here: 1) one concerning statements that we made regarding state duration, 2) general issues related to state duration, 3) conclusions we draw based on the state durations.
With regards to the specific statements that >10% of the time is spent performing locomotor features >3 s: Figure 2—figure supplement 1 plots the fraction of transitions above a certain duration. To estimate how much time a fly spends in a state that is >a certain duration (t), we have to multiply N(t)*t, where N is the number of transitions of exactly duration t, and calculate P(>t) from the resulting quantity. We have now shown this calculation in Figure 2—figure supplement 2B. The number 10% comes from the mean curve (black line). When we examined the same distribution by state, we found (as implied by the reviewer above) that the stop state (State 2) dominates the long duration states. Interestingly, the fast states – states 9 and 10 and state 1 (meander) also has many long duration transitions. The medium speed states – states 47 rarely have long duration transitions. Therefore, the conclusion that “A fly moves at a relatively constant 𝑣̂ and 𝑣̂⫠ for tens of step” is weaker than we originally thought. The general idea that flies can reside within the same state for many steps is true, and we have clarified the discussion in subsection “Odors affect behavior on a fine spatial scale” to reflect the same.
With regards to the general issue related to state durations – we present a new analysis which quantifies – 1) duration of HL states, 2) duration of lowlevel states, 3) No. of lowlevel state transition/transition into HL states. These analyses are detailed in Figure 2—figure supplement 2.
Finally we have clarified the conclusions we draw from state durations. We state clearly that the states of the HHMM are not necessarily the states employed by the system. In fact, as pointed out by another reviewer, there is no guarantee that there are discrete states at all. We completely agree with that sentiment and described explicitly in the discussion subsection “Limitations of the model”.
3) Subsection “Odors affect behavior on a fine spatial scale.”, paragraph two: Here you are creating a sequence of behavior based on averages and YET you claim that average data does not describe a fly's behavior but there are distinct strategies. Please clarify.
This is an excellent point. We have deleted the entire discussion.
Reviewer #2:
1) The model assumes that behavior is best modeled as set of discrete states (with the implication, I think, that this is how they are controlled neurally). The alternative possibility, alluded to briefly in subsection “Flies show considerable variability in locomotion despite employing the same locomotor features.”, is that some parts of the behavior could be better represented (or controlled) as continua. Since velocity is a continuous 1D variable, and since the fly must pass through intermediate velocities to transition from low to high velocity and viceversa, I think this alternative should be considered. I am not suggesting the authors entirely revamp their model but I think this point could be discussed or considered a bit more prominently.
The most parsimonious way to think about the discrete vs. continuous is to think about the timeseries of velocity and curvature as outputs of a dynamical system. In this framework, the HL states would represent the system being in a vicinity of a fixed point for the vast majority of time. We have included this idea in the discussion in subsection “Extracting observables from the trajectory”.
2) In the analysis of spatial structure (Figure 6), the role of time history should be considered/discussed. For example, behavior near the odor border might be different because the fly is more likely to have experienced no odor shortly before odor (or vice versa) than in the center of the arena. The responses of olfactory neurons are well known to show responses that depend on history over multiple time scales, so this point should be considered/discussed. Whether behavior could also be influenced by airflow at different parts of the chamber should be noted (without assuming the reader knows the details of the earlier paper). For example, could the airflow from the vacuum be causing the flies to slow down inside the odorized area, as in Yorozu et al., 2009?
The reviewer raises two issues here:
The first issue is that the behavior is not just spatially structured, but spatiotemporally structured. We agree with the reviewer. The spatial structure we present in Figure 6 is actually spatiotemporal structure. We have discussed this issue and added a figure (Figure 6—figure supplement 1) which shows that the behavior evolves as a function of time. However, this evolution is complete within the first 10 seconds or so. Unfortunately, because of 1) individuality among flies, 2) because we did not quantify the concentration of odor as a function of time (after the first minute), we cannot make a definitive analysis of the nature of the spatiotemporal structure in the data. The main point we are making in Figure 6 is that the HL states provide an important analytical tool for analyzing this structure.
The second issue relates to the airflow. We are quite confident that behavior in the arena is minimally impacted by airflow because the airflow we use is much smaller than almost any other study on fly olfaction, and is wellbelow the reported thresholds for anemotactic responses. More importantly, in the original study (Jung et al., 2015), we had performed controls in which we compared the effect of turning on the air, then the air and vacuum on the fly’s locomotion. We found that there was no observed effect of either the air or the air/vacuum. This issue is discussed in subsection “HHMM modeling”.
3) The findings about individuality are quite interesting. As mentioned above, the analysis in which the presence of odor can be predicted based on individual flies or clusters of flies, but not the whole dataset is quite interesting. However, as the authors note in the response to Reviewer 3, individuality in responses has also been investigated elsewhere. A typical analysis in these studies (de Bivort, Branson) is to compare behavior of the same individual at different time points and to show that they are more similar to themselves than to each other. The authors might consider splitting their data in time and repeating the analysis, although the time interval of the data here is relatively short (6 min). In addition, it would be nice to know if they could correlate their clusters with anything else about the flies. In the response to reviewers the authors allude to different behaviors produced by different genotypes. I think it would add to the interest of the study if these data were included.
There are two important points here. First, given our limited dataset, can we still assess whether the clusters are stable? This analysis is presented in Figure 7—figure supplement 4. The stability of the cluster depends on – 1) How different are the clusters from each other? If the clusters are very different, then only a small temporal sample is enough to assign flies to clusters. To assess how different the clusters are from each other, we assembled tracks of a given length from each fly by sampling data points at random. We found that there is a high probability that tracks >10 second in length were assigned to the original cluster. This analysis implies that the cluster means are sufficiently different that small chunks are representative.
2) How long of a timesample does one need to describe the average behavior? We analyzed whether chunks of contiguous data points of a given length are representative of the original cluster. We performed two analyses: a) we asked whether chunks of a given length belong to the original cluster. We found that we could assign the flies to their original cluster at greater than chance level. b) We chose chunks either at the beginning or at the end of the track. Once again, we find that small chunks are predicted to be in their respective cluster at levels much greater than chance.
Second, we examined whether the clusters correlated with 1) time of recording, 2) starvation time, 3) age of the fly. We found no trends.
4) The comparison of HMM and HHMM models is nicely rigorous but seems highly technical for this journal. The authors should consider focusing the text on the conceptual conclusions that can be drawn from this comparison.
We agree with the reviewer. But, based on the previous set of reviews, we would like to keep that description. A large section of the audience has grave misunderstandings about HHMM vs. HMM. It is best to clarify these misconceptions at the outset.
Reviewer #3:
Before publication some work is needed to address questions about: 1) how to interpret HL behaviors (survival time statistics of HL states and relationship to the structure of a 10HL5LL model); 2) error bars.
Main comments:
Error bars.
I could not get a clear sense of how well the data is constraining the HHMM model. For example in Figure 5, what is the uncertainty of these distributions? Can the authors bootstrap the data? Likewise in Figure 6 how many data points go in each colored dot? What is the uncertainty of the probability plots as a function of radial distance? There are no error bars on these plots.
There are two sources of variability. The first corresponds to how well a model constrains the data. This source of variability can be quantified as follows: The standard deviation (SD) of the probability of HL states (P) in each scenario were calculated as follows:
SDj=Pj1PjN1
∀∀j∈1,2,3,…10HLstate
Where N is the total number of data points in each scenario. This standard deviation is extremely small and is shown in Figure 5.
The second source of variability is how different is the behavior of different flies in a given scenario. Getting to the bottom of this variability is one of the questions which drives the manuscript after Figure 6.
Time spent in HLs.
One of the reasons HHMM seems to work better than HMM is that behaviors have extended durations and since the trajectory can jump around the LL states within a single HL state, that HL state can last long. Even though this is a key aspect of the model, there is no analysis or plot of the waiting time distribution in each HL state that would give an idea of the "duration" of such states during behavior and how that duration depends on the number of HL and LL states used in the HHMM model. The only thing I could find was Figure 2—figure supplement 1 (but this includes all states) and the statement made: "A fly moves at a relatively constant 𝑣̂ and 𝑣̂⫠ for tens of step. This tendency means that a fly's locomotion can be decomposed into a small number of locomotor features – 10 features in the case of the model we present.". However, this main conclusion is provided without plots about it. Supporting this point with quantitative data analysis is important because the authors use it to interpret various aspect of the results. It is also important to know if some HLs state last much longer than other in order to interpret the data. This may also clarify why 10HLs each with 5LLs are used in the HHMM. See below.
This is an important point. We have now analyzed the duration distributions for both the HL and LL states (Figure 2—figure supplement 2). A short description of our findings is presented in the essential revisions and repeated below. We have analyzed the dwell times of each HL state (Figure 2—figure supplement 2). Specifically, we plot the cumulative distributions of the HL states. We also plot the percentage of time a fly spends in a state of a given duration. These plots show the differences in dwell time in different HL state.
The plot on dwell time – Figure 2—figure supplement 2 – also plots the average dwell time of the lowlevel states and the number of LL state transitions/transition into HL state as a function of the duration of the HL state. These data support the idea stated above: The number of transitions increase when the duration of HL states become longer. Thus, this analysis works exactly as the reviewers envision above, and as one would expect if the data has structure on multiple time scales.
Subsection " HHMM reveals that a fly’s locomotion is surprisingly structured in the velocity space".
An interesting outcome underlined by the authors (last paragraph) is that the resulting HHMM accounts for 80% of all the data (within 85% confidence interval) using 10 locomotor features (10 HLs). Some of the HL states correspond to clearly distinct behavior, such as 1=meander, 2=stoptowalk or 9=fast right turn, 10=fast left turn. However, other HLs seem to be part of a continuum, such as 4,5,7= medium speed walking, and it is not entirely clear why 3 distinct HLs are needed to describe medium speed walking. Some HLs seem to fit less well the data, e.g. HL 6. I could not find a clear explanation of why 10HLs each with 5LLs were chosen and whether fewer HLs would have worked as well. What would happen if the HHMM had less HL nodes but more LL nodes in each HL? A discussion of survival times in each HLs might help sort this out and provide a way to interpret states 4,5,7. Are these different behaviors or one behavior distributed over 3 HL?
The explanation of why we chose the model was in subsection “Model selection” of the Materials and methods. We have expanded this section to address the reviewer’s comments. To address the reviewer’s specific questions: The main reason for the choice of 10 HL and 5 LL states was repeatability.
We tried different combination of HL and LL states from 616 for HL states and 46 for LL states. There were some models with as few as 6 HL states that worked well. Models with 16 HL states rarely used more than 12 HL states. So, models with 612 HL states could work. However, multiple runs with 6 or 8 states gave widely different results, therefore we decided to use 10 states. With respect to the LL states, 46 states yielded similar results. If more than 6 LL states were chosen, many LL states remained unused.
We think that the range 612 is a fairly narrow range. One important point is that within this narrow range of states, the state duration in all the models we tried were very similar. The insensitivity of state duration made us confident that the structure we discovered is real.
A detailed analysis of the duration of the HL state seems consistent with the reviewer’s intuition that it would make more sense to combine those states 47 into a single state because these states have a shorter duration. Does reducing the number of HL state combine the three medium speed HL states into one state? The answer is largely a no (see Author response image 1). Although, the number of middle speed state (labeled in bold) decreases, even models with 8 HL state devote three separate states to medium speed locomotion. More importantly, the number of middle speed states stabilize to 4 states as the number of HL states is increased to 12.
One important limitation of our modelling approach (which we only realized in hindsight) is that it is difficult to compare across models because everything changes slightly– LL states change, HL states change. We are currently revising the model to keep the LL states fixed so that the modeling approach would mostly focus on how the HL select the LL states. This process will make it much easier to compare across models. Unfortunately, this change is nontrivial and beyond the scope of this manuscript.
Subsection “Locomotor features and implications for neural control of behavior”.
In the discussion the authors say that v_parallel lies within a narrow range that is distinct for each three state and that 3 states reflect a tight control on locomotion by the brain. They mention these locomotor state can persist up to 3 second/30 steps and suggest that this indicates that the brain controls locomotion in blocks of 10 steps. I find this conclusion drawn from the finding of these 3 clusters too speculative given the data: In Figure 3—figure supplement 1 and Figure 4 the distributions of v_par overlap significantly between HL 4,5 and 7. This is again related to the two points above. Finally, if HL 4 5 7 are really distinct behaviors then another possibility could be that they are needed to account for flytofly variability in the data, and that for a single fly only one state is sufficient to describe medium speed walk.
Yes, we have changed the discussion around this claim.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Significant additional revisions are still required. The comments of the expert reviewers, below, are detailed and require full responses.
Reviewer #2:
[…] Two points need to be addressed before acceptance:
1) How do the priors used in fitting constrain the structure of the transition matrices found for the models? In the Results section, the authors state that "the assumption of Markov dynamics with a sparse prior on state transitions penalizes the consideration of unlikely state transitions base upon recent history and future destinations." This suggests that a sparse prior was applied in fitting. However, later in the paper the authors state: "The transition probability matrix for the HMM was sparse, suggesting that from each state there are transitions to only a handful of other states." If a sparse prior was applied during fitting than the fact that the resulting transition matrix is sparse is not really a finding about the data.
Similarly, the authors need to clarify what fitting priors or constraints were placed in the HHMM model. The Results state "This is because any HHMM which puts very specific constraints on the transitions probability matrix can be represented by an HMM but not viceversa." What are the constraints on the transition probability matrix for the HHMM? The text says "The transition probability matrix is sparse a vast majority of transitions from each HL state were to 23 other HL states." Was this imposed by fitting priors?
We have made the priors that we use more explicit (see subsection “Posthoc analyses based on the HHMM model”). The description in subsection “Rationale for the choice of HHMM as the model and the model architecture” only refers to the fact that selftransitions are favored over transition to another state. The priors for all transitions from i^{th} – j^{th} state were the same.
With respect to HHMM versus HMM, this is already welldescribed in the manuscript. (“Rationale for the choice of HHMM as the model and the model architecture” and “Generation of synthetic flies”). Essentially, an HMM with 50 states allows transitions between each of the 50 states. HHMM with 10 HL states and 5 LL states only allows a fraction of these transitions because only LL states within each HL state can transition amongst each other. This property also means that HHMM can always be represented by HMM and not viceversa.
2) Parts of the manuscript are somewhat long and repetitive. I think the manuscript could be productively shortened to have greater impact on its readers.
For example, in the section titled "HHMM reveals that a fly's locomotion is surprisingly structured in the velocity space" the last paragraph is devoted to restating the major conclusion of the first section: HHMM performs better than HMM. I think this could be reduced or folded into section one, which focusses on this comparison, and the section focused more exclusively on describing the HL states uncovered by the model.
In the Discussion subsection “Flies show considerable variability in locomotion despite employing the same locomotor features” paragraphs two and three are highly descriptive of the individual fly data and somewhat repetitive with parts of the Results subsection “Both locomotion and an odor’s effect on locomotion is fly dependent” (already rather long). I think these sections could be condensed to make the major points about individual variability in the Results, and use the Discussion mostly to compare these findings with the results of other studies.
We agree with the reviewer. We have not only made the two specific changes suggested above. We have also taken a careful look at the manuscript, and by cutting repeated segments and tightening the writing, we have reduced the word count (except for the Materials and methods) by ~2000 words.
Reviewer #3:
The authors have addressed most of my concerns expect the important one regarding the confidence interval on the distributions in Figure 5. What I am asking is for the authors to use bootstrapping to extract many sample distributions from subsets of the data in order to get a confidence interval on how these bins are populated by the data and how significant the changes are between before and during odor exposure.
Apologies for not fully understanding your query. We have performed the bootstrapping. Confidence intervals and significance are noted in Figure 5.
https://doi.org/10.7554/eLife.41235.031Article and author information
Author details
Funding
National Institute of Neurological Disorders and Stroke
 Vikas Bhandawat
National Institute on Deafness and Other Communication Disorders
 Vikas Bhandawat
National Science Foundation
 Vikas Bhandawat
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We acknowledge the members of Bhandawat lab and Gaby Maimon for critical comments on earlier versions of the manuscript. This research was supported by NIDCD (VB), NINDS (VB) and an NSF CAREER award (VB).
Senior Editor
 K VijayRaghavan, National Centre for Biological Sciences, Tata Institute of Fundamental Research, India
Reviewing Editor
 Ronald L Calabrese, Emory University, United States
Publication history
 Received: August 21, 2018
 Accepted: January 5, 2019
 Accepted Manuscript published: January 8, 2019 (version 1)
 Version of Record published: February 4, 2019 (version 2)
 Version of Record updated: March 15, 2019 (version 3)
Copyright
© 2019, Tao et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.