Comprehensive machine learning analysis of Hydra behavior reveals a stable basal behavioral repertoire

  1. Shuting Han (corresponding author)
  2. Ekaterina Taralova
  3. Christophe Dupre
  4. Rafael Yuste
  1. Columbia University, United States
Research Article
Cite as: eLife 2018;7:e32605 doi: 10.7554/eLife.32605

Abstract

Animal behavior has been studied for centuries, but few efficient methods are available to automatically identify and classify it. Quantitative behavioral studies have been hindered by the subjective and imprecise nature of human observation, and the slow speed of annotating behavioral data. Here, we developed an automatic behavior analysis pipeline for the cnidarian Hydra vulgaris using machine learning. We imaged freely behaving Hydra, extracted motion and shape features from the videos, and constructed a dictionary of visual features to classify pre-defined behaviors. We also identified unannotated behaviors with unsupervised methods. Using this analysis pipeline, we quantified 6 basic behaviors and found surprisingly similar behavior statistics across animals within the same species, regardless of experimental conditions. Our analysis indicates that the fundamental behavioral repertoire of Hydra is stable. This robustness could reflect a homeostatic neural control of "housekeeping" behaviors which could have been already present in the earliest nervous systems.

https://doi.org/10.7554/eLife.32605.001

eLife digest

How do animals control their behavior? Scientists have been trying to answer this question for over 2,000 years, and many studies have analysed specific behaviors in different animals. However, most of these studies have traditionally relied on human observers to recognise and classify different behaviors such as movement, rest, grooming or feeding. This approach is subject to human error and bias, and is also very time consuming. Because of this, researchers normally only study one particular behavior, in a piecemeal fashion. But to capture all the different actions an animal generates, faster, more objective methods of systematically classifying and quantifying behavior would be ideal.

One promising opportunity comes from studying a small freshwater organism called Hydra, one of the most primitive animals with a nervous system. Thanks to Hydra’s transparent body, modern imaging techniques can be used to observe the activity of their whole nervous system all at once, while the animal is engaged in different actions. However, to realise this potential, scientists need a quick way of automatically recognising different Hydra behaviors, such as contracting, bending, tentacle swaying, feeding or somersaulting. This is particularly difficult because Hydra’s bodies can change shape in different situations.

To address this, Han et al. borrowed cutting-edge techniques from the field of computer vision to create a computer program that could automatically analyse hours of videos of freely-moving Hydra and classify their behavior. The computer algorithms can learn how to recognise different behaviors in two ways: by learning from examples already classified by humans (known as ‘supervised learning’) or by picking out different patterns by themselves (known as ‘unsupervised learning’). The program was able to identify all the behaviors previously classified by humans, as well as new types that had been missed by human observation.

Using this new computer program, Han et al. discovered that Hydra’s collection of six basic behaviors stays essentially the same under different environmental conditions, such as light or darkness. One possible explanation for this is that its nervous system adapts to the environment to maintain a basic set of actions it needs for survival, although another possibility is that Hydra just does not care and goes along with its basic behaviors, regardless of the environment. Han et al.’s new method is useful not only for classifying all behavioral responses in Hydra, but could potentially be adapted to study all the behaviors in other animal species. This would allow scientists to systematically perform experiments to understand how the nervous system controls all animal behavior, a goal that is the holy grail of neuroscience.

https://doi.org/10.7554/eLife.32605.002

Introduction

Animal behavior is generally characterized by an enormous variability in posture and the motion of different body parts, even if many complex behaviors can be reduced to sequences of simple stereotypical movements (Berman et al., 2014; Branson et al., 2009; Gallagher et al., 2013; Srivastava et al., 2009; Wiltschko et al., 2015; Yamamoto and Koganezawa, 2013). As a way to systematically capture this variability and compositionality, quantitative behavior recognition and measurement methods could provide an important tool for investigating behavioral differences under various conditions using large datasets, allowing for the discovery of behavior features that are beyond the capability of human inspection, and defining a uniform standard for describing behaviors across conditions (Egnor and Branson, 2016). In addition, much remains unknown about how the specific spatiotemporal pattern of activity of the nervous system integrates external sensory inputs and internal neural network states in order to selectively generate different behaviors. Thus, automatic methods to measure and classify behavior quantitatively could allow researchers to identify potential neural mechanisms by providing a standard measurement of the behavioral output of the nervous system.

Indeed, advances in calcium imaging techniques have enabled the recording of the activity of large neural populations (Chen et al., 2013; Jin et al., 2012; Kralj et al., 2011; St-Pierre et al., 2014; Tian et al., 2009; Yuste and Katz, 1991), including whole brain activity from small organisms such as C. elegans and larval zebrafish (Ahrens et al., 2013; Nguyen et al., 2016; Prevedel et al., 2014). A recent study has demonstrated that the cnidarian Hydra can be used as an alternative model to image the complete neural activity during behavior (Dupre and Yuste, 2017). As a cnidarian, Hydra is close to the earliest animals in evolution that had nervous systems. As the output of the nervous system, animal behavior allows individuals to adapt to the environment at a time scale that is much faster than natural selection, and drives the rapid evolution of the nervous system, providing a rich context to study nervous system functions and evolution (Anderson and Perona, 2014). As Hydra's nervous system evolved from that present in the last common ancestor of cnidarians and bilaterians, the behaviors of Hydra could also represent some of the most primitive examples of coordination between a nervous system and non-neuronal cells. This could make Hydra particularly relevant to our understanding of the nervous systems of model organisms such as Caenorhabditis elegans, Drosophila, zebrafish, and mice, as it provides an evolutionary perspective to discern whether neural mechanisms found in those species represent a specialization or are generally conserved. In fact, although Hydra behavior has been studied for centuries, it is still unknown whether Hydra possesses complex behaviors such as social interactions and learning, how its behavior changes under environmental, physiological, nutritional or pharmacological manipulations, or what the underlying neural mechanisms of these potential changes are. 
Having an unbiased and automated behavior recognition and quantification method would therefore enable such studies with large datasets. This could allow high-throughput systematic pharmacological assays, lesion studies, environmental and physiological condition changes in behavior, or alterations under activation of subsets of neurons, testing quantitative models, and linking behavior outputs with the underlying neural activity patterns.

Hydra behavior was first described by Trembley (1744), and it consists of both spontaneous and stimulus-evoked movements. Spontaneous behaviors include contraction (Passano and McCullough, 1964) and locomotion such as somersaulting and inchworming (Mackie, 1974), and can sometimes be induced by mechanical stimuli or light. Food-associated stimuli induce a stereotypical feeding response that consists of three distinct stages: tentacle writhing, tentacle ball formation and mouth opening (Koizumi et al., 1983; Lenhoff, 1968). This elaborate reflex-like behavior is fundamental to the survival of Hydra and sensitive to its needs: well-fed animals do not appear to show feeding behavior when exposed to a food stimulus (Lenhoff and Loomis, 1961). In addition, feeding behavior can be robustly induced by small molecules such as glutathione and S-methyl-glutathione (GSM) (Lenhoff and Lenhoff, 1986). Besides these relatively complex behaviors, Hydra also exhibits simpler behaviors with different amplitudes and in different body regions, such as bending, individual tentacle movement, and radial and longitudinal contractions. These simpler behaviors can be oscillatory and occur in an overlapping fashion and are often hard to describe in a quantitative manner. This, in turn, makes complex behaviors such as social or learning behaviors, which can be considered as sequences of simple behaviors, hard to quantitatively define. Indeed, to manually annotate behaviors in videos that are hours or days long is not only extremely time-consuming, but also partly subjective and imprecise (Anderson and Perona, 2014). However, analyzing large datasets of behaviors is necessary to systematically study behaviors across individuals in a long-term fashion. Recently, computational methods have been developed to define and recognize some behaviors of C. elegans (Brown et al., 2013; Stephens et al., 2008) and Drosophila (Berman et al., 2014; Johnson et al., 2016). 
These pioneering studies identify the movements of animals by generating a series of posture templates and decomposing the animal posture at each time point with these standard templates. This general framework works well for animals with relatively fixed shapes. However, Hydra has a highly deformable body shape that contracts, bends and elongates in a continuous and non-isometric manner, and the same behavior can occur at various body postures. Moreover, Hydra has different numbers of tentacles and buds across individuals, which presents further challenges for applying template-based methods. Therefore, a method that encodes behavior information in a statistical rather than an explicit manner is desirable.

As a potential solution to this challenge, the field of computer vision has recently developed algorithms for deformable human body recognition and action classification. Human actions have large variations based on the individual’s appearance, speed, the strength of the action, background, illumination, etc. (Wang et al., 2011). To recognize the same action across conditions, features from different videos need to be represented in a unified way. In particular, the Bag-of-Words model (BoW model) (Matikainen et al., 2009; Sun et al., 2009; Venegas-Barrera and Manjarrez, 2011; Wang et al., 2011) has become a standard method for computer vision, as it is a video representation approach that captures the general statistics of image features in videos by treating videos as ‘bags’ of those features. This makes it possible to generalize behavior features in a dataset that is rich with widely varied individual-specific characteristics. The BoW model originated from document classification and spam-detection algorithms, where a text is represented by an empirical distribution of its words. To analyze videos of moving scenes, the BoW model has two steps: feature representation and codebook representation. In the first step, features (i.e. ‘words’ such as movements and shapes) are extracted and unified into descriptor representations. In the second step, these higher order descriptors from multiple samples are clustered (i.e. movement motifs), usually by k-means algorithms, and then averaged descriptors from each cluster are defined as ‘codewords’ that form a large codebook. This codebook in principle contains representative descriptors of all the different movements of the animal. Therefore, each clip of the video can be represented as a histogram over all codewords in the codebook. These histogram representations can then be used to train classifiers such as SVMs, or as inputs to various clustering algorithms, supervised or unsupervised, to identify and quantify behavior types. 
BoW produces an abstract representation compared to manually specified features, and effectively leverages the salient statistics of the data, enabling modeling of large populations. Doing so on a large scale with manually selected features is not practical. The power of such a generalization makes the BoW framework particularly well suited for addressing the challenge of quantifying Hydra behavior.
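
The histogram step of the classic BoW model described above can be sketched in a few lines. This is a minimal illustration with hard nearest-codeword assignment, assuming a codebook has already been obtained (for example by k-means); the function names, the 2-D descriptors and the three-entry codebook are hypothetical and are not taken from the study.

```python
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one video clip."""
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(codebook)
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts.get(k, 0) / total for k in range(len(codebook))]


# Hypothetical 2-D descriptors and a 3-entry codebook, for illustration only.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
clip = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
hist = bow_histogram(clip, codebook)
print(hist)
```

Each clip is reduced to a fixed-length vector regardless of how many descriptors it contains, which is what allows clips from different animals to be compared or passed to a classifier.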

Inspired by previous work on C. elegans (Brown et al., 2013; Kato et al., 2015; Stephens et al., 2008) and Drosophila (Berman et al., 2014; Johnson et al., 2016; Robie et al., 2017) as well as by progress in computer vision (Wang et al., 2011), we explored the BoW approach, combining computer vision and machine learning techniques, to identify both known and unannotated behavior types in Hydra. To do so, we imaged behaviors from freely moving Hydra, extracted motion and shape features from the videos, and constructed a dictionary of these features. We then trained classifiers to recognize Hydra behavior types with manual annotations, and identified both annotated and unannotated behavior types in the embedding space. We confirmed the performance of the algorithms with manually annotated data and then used the method for a comprehensive survey of Hydra behavior, finding a surprising stability in the expression of six basic behaviors, regardless of the different experimental and environmental conditions. These findings are consistent with the robust behavioral and neural circuit homeostasis found in other invertebrate nervous systems for "housekeeping" functions (Haddad and Marder, 2017).

Results

Capturing the movement and shape statistics of freely moving Hydra

Our goal was to develop a method to characterize the complete behavioral repertoire of Hydra under different laboratory conditions. We collected a Hydra behavior video dataset (Han, 2018a) using a widefield dissecting microscope, allowing Hydra to move freely in a culture dish (Figure 1a). We imaged 53 Hydra specimens at a rate of 5 Hz for 30 min, and we either allowed each of them to behave freely, or induced feeding behavior with glutathione, since feeding could not be observed without the presence of prey (which would have obscured the imaging). From viewing these data, we visually identified eight different behaviors, and manually annotated every frame of the entire dataset with the following labels for these eight behavioral states: silent (no apparent motion), elongation, tentacle swaying, body swaying, bending, contraction, somersaulting, and feeding (Figure 1b, e–l; Videos 1–7). Overall, we acquired an annotated Hydra behavior dataset with 360,000 frames in total. We noticed that most behaviors in our manual annotation lasted less than 10 s (Figure 1c), and that, within a time window of 5 s, most windows contained only one type of behavior (Figure 1d). A post-hoc comparison of different window sizes (1–20 s) with the complete analysis framework also demonstrated that 5 s windows result in the best performance (Figure 2—figure supplement 1a). Therefore, we chose 5 s as the analysis length of a behavior element in Hydra.
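
The window-size reasoning above can be illustrated with a short sketch: at 5 Hz, a 5 s window holds 25 frames, and one can count how many distinct annotated behaviors fall in each window. The sketch assumes non-overlapping windows and per-frame string labels; the function names and the toy label sequence are hypothetical.

```python
FRAME_RATE_HZ = 5    # imaging rate stated in the text
WINDOW_SECONDS = 5   # window length chosen in the text


def split_into_windows(frame_labels, frame_rate=FRAME_RATE_HZ, seconds=WINDOW_SECONDS):
    """Split per-frame labels into consecutive, non-overlapping windows (assumption).

    Trailing frames that do not fill a whole window are dropped.
    """
    size = frame_rate * seconds
    n_full = len(frame_labels) // size
    return [frame_labels[i * size:(i + 1) * size] for i in range(n_full)]


def behaviors_per_window(windows):
    """Number of distinct behavior labels found in each window."""
    return [len(set(w)) for w in windows]


# Hypothetical annotation: 40 frames of elongation followed by 35 of contraction.
labels = ["elongation"] * 40 + ["contraction"] * 35
windows = split_into_windows(labels)
counts = behaviors_per_window(windows)
print(len(windows), counts)
```

A histogram of `counts` over a whole dataset, repeated for several window lengths, is the kind of summary that Figure 1d reports.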

Figure 1 with 1 supplement
Acquiring an annotated Hydra behavior dataset.

(a) Imaging Hydra behavior with a widefield dissecting microscope. A Hydra polyp was allowed to move freely in a Petri dish, which was placed on a dark surface under the microscope objective. The light source was placed laterally, creating a bright image of the Hydra polyp on a dark background. (b) Histogram of the eight annotated behavior types in all data sets. (c) Histogram of the duration of annotated behaviors. (d) Histogram of total number of different behavior types in 1 s, 5 s and 10 s time windows. (e–l) Representative images of silent (e), elongation (f), tentacle swaying (g), body swaying (h), bending (i), contraction (j), feeding (k), and somersaulting (l) behaviors.

https://doi.org/10.7554/eLife.32605.003
Video 1
Example of elongation behavior.

The animal was allowed to move freely in a petri dish. The video was taken at 5 Hz, and was accelerated 20 fold.

https://doi.org/10.7554/eLife.32605.005
Video 2
Example of tentacle swaying behavior.

The animal was allowed to move freely in a petri dish. The video was taken at 5 Hz, and was accelerated 20 fold.

https://doi.org/10.7554/eLife.32605.006
Video 3
Example of body swaying behavior.

The animal was allowed to move freely in a petri dish. The video was taken at 5 Hz, and was accelerated 20 fold.

https://doi.org/10.7554/eLife.32605.007
Video 4
Example of bending behavior.

The animal was allowed to move freely in a petri dish. The video was taken at 5 Hz, and was accelerated 20 fold.

https://doi.org/10.7554/eLife.32605.008
Video 5
Example of a contraction burst.

The animal was allowed to move freely in a petri dish. The video was taken at 5 Hz, and was accelerated 20 fold.

https://doi.org/10.7554/eLife.32605.009
Video 6
Example of induced feeding behavior.

The animal was treated with reduced L-glutathione at 45 s. The video was taken at 5 Hz, and was accelerated 20 fold.

https://doi.org/10.7554/eLife.32605.010
Video 7
Example of somersaulting behavior.

The video was taken at 5 Hz, and was accelerated by 20 fold.

https://doi.org/10.7554/eLife.32605.011

Due to the large shape variability of the highly deformable Hydra body during behavior, methods that construct postural eigenmodes from animal postures are not suitable. Therefore, we designed a novel pipeline consisting of four steps: pre-processing, feature extraction, codebook generation, and feature encoding (Han, 2018b) (Figure 2), in line with the BoW framework. Pre-processing was done to exclude the variability in size and rotation angle during imaging, which introduces large variance. To do so, we first defined a behavior element as a 5 s time window, splitting each behavior video into windows accordingly. Then we fitted the body column of Hydra into an ellipse, and centered, rotated, and scaled the ellipse to a uniform template ellipse in each element window. We then encoded spatial information into the BoW framework by segmenting the Hydra area through the videos with an automated program, dividing it into a tentacle region, an upper body region, and a lower body region (Materials and methods; Video 8).
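
The centering, rotation and scaling step can be sketched with image moments. This is only an illustration under stated assumptions: the body column is given as a list of pixel coordinates, the ellipse is estimated from second-order moments, and scaling is isotropic so that the spread along the major axis equals a template value. The actual fitting procedure is described in Materials and methods and may differ; all names below are hypothetical.

```python
import math


def fit_ellipse_moments(points):
    """Estimate centre, orientation and major-axis spread from second moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Orientation of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Largest eigenvalue of the 2x2 covariance matrix.
    major_var = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    return (cx, cy), theta, math.sqrt(major_var)


def normalize_to_template(points, target_spread=1.0):
    """Centre, rotate (major axis onto x) and isotropically scale the points."""
    (cx, cy), theta, spread = fit_ellipse_moments(points)
    if spread == 0:
        raise ValueError("degenerate shape: all points coincide")
    c, s = math.cos(-theta), math.sin(-theta)
    scale = target_spread / spread
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append(((c * dx - s * dy) * scale, (s * dx + c * dy) * scale))
    return out


# Hypothetical body-column pixels lying roughly along a 45-degree diagonal.
body_pixels = [(10, 10), (11, 11), (12, 12), (13, 13), (14, 14), (11, 13), (13, 11)]
registered = normalize_to_template(body_pixels)
```

After registration, refitting the ellipse gives a centre at the origin, an orientation of zero and unit spread, so features computed in different windows share a common frame of reference.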

Figure 2 with 1 supplement
Analysis pipeline.

Videos of freely moving Hydra polyps were collected (1); then, Hydra images were segmented from background, and the body column was fit to an ellipse. Each time window was then centered and registered, and the Hydra region was separated into three separate body parts: tentacles, upper body column, and lower body column (2). Interest points were then detected and tracked through each time window, and HOF, HOG and MBH features were extracted from local video patches of interest points (3). Gaussian mixture codebooks were then generated for each feature subtype (4), and Fisher vectors were calculated using the codebooks (5). Supervised learning using SVM (6), or unsupervised learning using t-SNE embedding (7) was performed using Fisher vector representations.

https://doi.org/10.7554/eLife.32605.012
Video 8
Example of the output of body part segmentation.

White represents tentacle region, yellow represents upper body column region, and red represents lower body column region.

https://doi.org/10.7554/eLife.32605.014

After this encoding, in a feature extraction step we applied a dense trajectory method in each 5 s window element (Wang et al., 2011). This dense trajectory method represents video patches by several shape and motion descriptors, including a Histogram of Oriented Gradient (HOG) (Dalal and Triggs, 2005), which is based on edge properties in the image patch; and a Histogram of Optical Flow (HOF) as well as a Motion Boundary Histogram (MBH) (Dalal et al., 2006), based on motion properties. With the dense trajectory method, we first detected and tracked points with prominent features throughout the videos. Then, for each feature point, we analyzed a small surrounding local patch and computed the motion and shape information therein represented by HOF, HOG and MBH descriptors (Video 9). Thus, each video window element was captured as motion and shape descriptors associated with a set of local video patches with distinguished visual features.

Video 9
Examples of detected interest points (red) and dense trajectories (green) in tentacle swaying (left), elongation (middle left), body swaying (middle right), and contraction (right) behaviors in 2 s video clips.

Upper panels show the original videos; lower panels show the detected features.

https://doi.org/10.7554/eLife.32605.015

To quantize the ‘bags’ of features from each element time window, we collected a uniform feature codebook using all the dense trajectory features. Intuitively, the elements in the codebook are the representative features for each type of motion or shape in a local patch; therefore, they can be regarded as standard entries in a dictionary. Here, we generated the codebook in a ‘soft’ manner, where the codebook contains information on the centroids of the clusters and their shapes. We fitted the features with k Gaussian mixtures. Because each Gaussian is characterized not only by its mean, but also by its variance, we preserved more information than with ‘hard’ methods like k-means. The next step was to encode the features with the codebook. For this, ‘hard’ methods, in which one encodes the features by assigning each feature vector to its nearest Gaussian mixture, lose information concerning the shapes of the Gaussians. To avoid this, we encoded the features using Fisher vectors, which describe the distance between features and the Gaussian mixture codebook entries in a probabilistic way, encoding both the number of occurrences and the distribution of the descriptors (Perronnin et al., 2010) (Figure 2—figure supplement 1b). Since each element window was split into tentacle, upper body and lower body regions, we were able to integrate spatial information by encoding the features in each of the three body regions separately (Figure 2—figure supplement 1b). Finally, we represented the behavior in each element window by the concatenated Fisher vector from the three regions.
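
The ‘soft’ encoding described above can be sketched as follows. The code computes a Fisher vector for a diagonal-covariance Gaussian mixture using the commonly used gradients with respect to the means and variances, then concatenates the vectors from the three body regions. It is a sketch under assumptions: the mixture parameters and descriptors are hypothetical toy values, the mixture is assumed to be already fitted, and the power and L2 normalizations often applied to Fisher vectors are omitted.

```python
import math


def posteriors(x, weights, means, variances):
    """Soft assignment of descriptor x to each Gaussian (diagonal covariance)."""
    logs = []
    for w, m, v in zip(weights, means, variances):
        log_pdf = sum(-0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
                      for xd, md, vd in zip(x, m, v))
        logs.append(math.log(w) + log_pdf)
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descriptors, weights, means, variances):
    """Gradients w.r.t. means and variances, giving a vector of length 2 * K * D."""
    T, K, D = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, variances)
        for k in range(K):
            for d in range(D):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                grad_mu[k][d] += gamma[k] * z
                grad_sigma[k][d] += gamma[k] * (z * z - 1.0)
    fv = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2 * weights[k])) for g in grad_sigma[k])
    return fv


# Hypothetical two-component codebook over 2-D descriptors.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical descriptors from the three body regions of one 5 s window.
tentacle = [[0.2, -0.1], [2.9, 3.2]]
upper_body = [[0.0, 0.3]]
lower_body = [[3.1, 2.8], [0.1, 0.1]]

window_vector = []
for region in (tentacle, upper_body, lower_body):
    window_vector.extend(fisher_vector(region, weights, means, variances))
print(len(window_vector))
```

Unlike a hard histogram, each entry records how far the descriptors sit from a codebook Gaussian and how their spread compares with its variance, which is the extra information the text attributes to the soft approach.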

Hydra behavior classified from video statistics

Like all animals, Hydra exhibits behaviors at various time scales. Basic behaviors such as elongation and bending are usually long and temporally uniform, while tentacle swaying, body swaying and contraction are usually short and executed in a burst-like manner. Feeding and somersaulting are more complex behaviors that can be broken down into short behavior motifs (Videos 6 and 7) (Lenhoff and Loomis, 1961). Feeding is apparently a stepwise, fixed action pattern-like uniform behavior, with smooth transitions between tentacle writhing, ball formation, and mouth opening (Video 6). Somersaulting represents another fixed action pattern-like behavior and typically consists of a sequence of basic behaviors with elongation accompanied by tentacle movements, contraction, bending, contraction, elongation, and contraction; completing the entire sequence takes a few minutes (Video 7). The time spent during each step and the exact way each step is executed vary between animals. Thus, to study Hydra behavior, it is essential to accurately recognize the basic behavior types that comprise these complex activities.

We aimed to capture basic behaviors including silent, elongation, tentacle swaying, body swaying, bending, contraction, and feeding, using the Fisher vector features that encode the video statistics. These features were extracted from 5 s element windows and exhibited stronger similarity within the same behavior type, but were distinguished from features of different behavior types (Figure 3a). We then trained support vector machine (SVM) classifiers with manual labels on data from 50 Hydra, and tested them on a random 10% withheld validation dataset. We evaluated classification performance via the standard receiver operating characteristic (ROC) curve and area under curve (AUC). In addition, we calculated three standard measurements from the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN): accuracy, defined as (TP + TN)/(TP + TN + FP + FN); precision, defined as TP/(TP + FP); and recall, defined as TP/(TP + FN). We achieved perfect training performance (AUC = 1, accuracy 100%), while on the validation data the overall accuracy was 86.8%, and mean AUC was 0.97 (Figure 3b and c; Table 1). This classification framework was easily generalized to new data. With data from three Hydra that were not involved in either codebook generation or classifier training, we extracted and encoded features using the generated codebook, and achieved classification accuracy of 90.3% for silent (AUC = 0.95), 87.9% for elongation (AUC = 0.91), 71.9% for tentacle swaying (AUC = 0.76), 83.4% for body swaying (AUC = 0.75), 93.9% for bending (AUC = 0.81) and 92.8% for contraction (AUC = 0.92). All the classifiers achieved significantly better performance than chance levels (Figure 3b, c and d; Table 1; Video 10). Interestingly, the variability in classifier performance with new data matched human annotator variability (Figure 1—figure supplement 1). 
This demonstrates that the codebook generated from training data efficiently captured Hydra behaviors and that trained classifiers can robustly identify the basic behaviors of Hydra and predict their occurrence automatically from the data.
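
The three measurements defined above follow directly from the four counts. The sketch below assumes a one-vs-rest evaluation for a single behavior class; the counts are hypothetical and chosen only to show how a rare behavior can combine high accuracy with low precision, a pattern also visible for some test-set entries in Table 1.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall as defined in the text."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, precision, recall


# Hypothetical counts for a behavior present in only 50 of 1000 windows.
acc, prc, rec = classification_metrics(tp=45, tn=900, fp=50, fn=5)
print(f"accuracy={acc:.3f} precision={prc:.3f} recall={rec:.3f}")
```

Here most windows are true negatives, so accuracy stays high even though roughly half of the positive predictions are wrong; this is why precision and recall are reported alongside accuracy.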

Figure 3
SVM classifiers recognize pre-defined Hydra behavior types.

(a) Pairwise Euclidean similarity matrix of extracted Fisher vectors. Similarity values are indicated by color code. (b) Confusion matrices of trained classifiers predicting training, validation, and test data. Each column of the matrix represents the number in a predicted class; each row represents the number in a true class. Numbers are color coded as color bar indicates. (Training: n = 50, randomly selected 90% samples; validation: n = 50, randomly selected 10% samples; test: n = 3) (c) ROC curves of trained classifiers predicting training, validation and test data. TPR, true positive rate; FPR, false positive rate. Dashed lines represent chance level. (d) An example of predicted ethogram using the trained classifiers. (e) Three examples of SVM classification of somersaulting behaviors. Dashed boxes indicate the core bending and flipping events.

https://doi.org/10.7554/eLife.32605.016
Table 1
SVM statistics. AUC: area under curve; Acc: accuracy; Prc: precision; Rec: recall.
https://doi.org/10.7554/eLife.32605.017
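Train

| Behavior | AUC | AUC chance | Acc | Acc chance | Prc | Rec |
|---|---|---|---|---|---|---|
| Silent | 1 | 0.5 | 100% | 9.6% | 100% | 100% |
| Elongation | 1 | 0.5 | 100% | 14.2% | 100% | 100% |
| Tentacle sway | 1 | 0.5 | 100% | 25.1% | 100% | 100% |
| Body sway | 1 | 0.5 | 100% | 10.0% | 100% | 100% |
| Bending | 1 | 0.5 | 100% | 5.2% | 100% | 100% |
| Contraction | 1 | 0.5 | 100% | 6.6% | 100% | 100% |
| Feeding | 1 | 0.5 | 100% | 29.2% | 100% | 100% |

Withheld

| Behavior | AUC | AUC chance | Acc | Acc chance | Prc | Rec |
|---|---|---|---|---|---|---|
| Silent | 0.98 | 0.5 | 95.6% | 9.6% | 75.6% | 97.4% |
| Elongation | 0.96 | 0.5 | 93.4% | 13.6% | 76.4% | 95.9% |
| Tentacle sway | 0.95 | 0.5 | 89.6% | 25.0% | 77.5% | 92.4% |
| Body sway | 0.92 | 0.5 | 92.9% | 9.3% | 65.7% | 97.0% |
| Bending | 0.98 | 0.5 | 97.3% | 6.1% | 74.4% | 98.4% |
| Contraction | 0.97 | 0.5 | 95.7% | 6.9% | 70.4% | 97.7% |
| Feeding | 1 | 0.5 | 98.8% | 29.6% | 98.5% | 99.4% |

Test

| Behavior | AUC | AUC chance | Acc | Acc chance | Prc | Rec |
|---|---|---|---|---|---|---|
| Silent | 0.95 | 0.5 | 90.3% | 1.9% | 18.4% | 90.3% |
| Elongation | 0.91 | 0.5 | 87.9% | 22.2% | 71.4% | 92.6% |
| Tentacle sway | 0.76 | 0.5 | 71.9% | 30.2% | 47.9% | 76.7% |
| Body sway | 0.75 | 0.5 | 83.4% | 17.7% | 52.8% | 95.4% |
| Bending | 0.81 | 0.5 | 93.9% | 6.1% | 38.9% | 96.5% |
| Contraction | 0.92 | 0.5 | 92.8% | 11.7% | 63.2% | 95.5% |
| Feeding | 0.83 | 0.5 | 81.0% | 10.2% | 39.6% | 94.1% |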
Video 10
Example of the trained SVM classifiers predicting new data.
https://doi.org/10.7554/eLife.32605.018

Hydra can exhibit overlapping behaviors at the same time. For example, a Hydra specimen could be moving its tentacles while bending, or swaying its body while elongating. In such cases, it would be imprecise to allow only a single behavior label per time window. To capture this situation, we allowed a ‘soft’ classification strategy, keeping up to the three highest-ranked classification types whose classifier probabilities were within a twofold difference of one another. With joint classifiers, we achieved 86.8% overall accuracy on the validation data (81.6% with hard classification), and 59.0% with new test data (50.1% with hard classification). Soft classification improved classification performance by accommodating the realistic situation in which Hydra transitions between two behaviors, or executes multiple behaviors simultaneously.
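
One way to read the soft classification rule is sketched below: rank the behavior types by classifier probability, consider at most the top three, and keep those whose probability is at least half of the highest one. This interpretation of the twofold criterion is an assumption, and the function name and probabilities are hypothetical.

```python
def soft_labels(probabilities, max_labels=3, ratio=2.0):
    """Return up to `max_labels` behavior types within `ratio`-fold of the top probability.

    `probabilities` maps behavior name to classifier probability.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    top_prob = ranked[0][1]
    return [name for name, p in ranked[:max_labels] if p * ratio >= top_prob]


# Hypothetical per-window classifier outputs.
window_probs = {
    "elongation": 0.40,
    "tentacle swaying": 0.30,
    "bending": 0.15,
    "silent": 0.10,
    "contraction": 0.05,
}
labels = soft_labels(window_probs)
print(labels)
```

In this toy window the animal would be labeled as elongating and tentacle swaying at once, while bending falls outside the twofold margin and is dropped.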

In addition to optimally classifying the seven basic behaviors described above, classifying somersaulting video clips with basic behavior classifiers showed a conserved structure during the progression of this behavior (Figure 3e; Video 11). Somersaulting is a complex behavioral sequence that was not included in the seven visually identified behavior types. This long behavior can typically be decomposed into a sequence of simple behaviors of tentacle swaying, elongation, body swaying, contraction, and elongation. Indeed, in our classification of somersaulting with the seven basic behavior types, we noticed a strong corresponding structure: the classified sequences start with tentacle swaying, elongation, and body swaying, then a sequence of contraction and elongation before a core bending event (Figure 3e); finally, elongation and contraction complete the entire somersaulting behavior. This segmented classification based on breaking down a complex behavior into a sequence of multiple elementary behaviors agrees with human observations, indicating that our method is able to describe combined behaviors using the language of basic behavior types.

Video 11
Example of the trained SVM classifiers predicting somersaulting behavior from a new video.

Soft prediction was allowed here.

https://doi.org/10.7554/eLife.32605.019

Unsupervised discovery of behavior states in embedding space

Manual annotation identifies behavior types on the basis of distinct visual features. However, it is subjective by nature, especially when the Hydra exhibits multiple behaviors simultaneously, and can be affected by the individual biases of the annotator. Therefore, to complement the supervised method described above, where classifiers were trained with annotated categories, we sought to perform unsupervised learning to discover the structural features of Hydra behaviors. Since the Fisher vector representation of video statistics is high-dimensional, we applied a nonlinear embedding technique, t-Distributed Stochastic Neighbor Embedding (t-SNE), to reduce the feature vector dimensionality (Berman et al., 2014; Van Der Maaten, 2009). This also allowed us to directly visualize the data structure in two dimensions while preserving the local structures in the data, serving as a method for revealing potential structures of the behavior dataset.

Embedding the feature vectors of training data resulted in a t-SNE map that corresponded well to our manual annotation (Figure 4a). Generating a density map over the embedded data points revealed cluster-like structures in the embedding space (Figure 4b). We segmented the density map into regions with a watershed method, which defined each region as a behavior motif region (Figure 4c and e). We evaluated the embedding results by quantifying the manual labels of data points in each behavior motif region. We then assigned a label to each region based on the majority of the manually labeled behavior types in it. Using this approach, we identified 10 distinct behavior regions in the map (Figure 4d). These regions represented not only the seven types we defined for supervised learning, but also a somersaulting region, and three separate regions representing the three stages of feeding behavior (Figure 4d). Embedding with continuous 5 s time windows, which excludes the effect of the hard boundaries separating the behavior elements, revealed the same types of behaviors (Figure 4—figure supplement 1).
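
The region-labeling step can be sketched as a majority vote. The example assumes that each embedded window already carries a region index (for instance from a watershed segmentation, which is not reproduced here) and a manual label; the function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def label_regions(region_ids, manual_labels):
    """Assign each region its most frequent manual label and that label's fraction.

    Ties are resolved in favor of the label encountered first.
    """
    by_region = defaultdict(Counter)
    for region, label in zip(region_ids, manual_labels):
        by_region[region][label] += 1
    result = {}
    for region, counts in by_region.items():
        label, n = counts.most_common(1)[0]
        result[region] = (label, n / sum(counts.values()))
    return result


# Hypothetical region assignments and manual labels for six windows.
region_ids = [0, 0, 0, 1, 1, 2]
manual = ["elongation", "elongation", "bending", "contraction", "contraction", "feeding"]
region_labels = label_regions(region_ids, manual)
print(region_labels)
```

The returned fraction gives a simple measure of how pure each region is with respect to the manual annotation, which can help flag regions that mix several behavior types.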

Figure 4 with 1 supplement
t-SNE embedding map of behavior types.

(a) Scatter plot with embedded Fisher vectors. Each dot represents projection from a high-dimensional Fisher vector to its equivalent in the embedding space. Color represents the manual label of each dot. (b) Segmented density map generated from the embedding scatter plot. (c) Behavior motif regions defined using the segmented density map. (d) Labeled behavior regions. Color represents the corresponding behavior type of each region. (e) Percentage of the number of samples in each segmented region. (f) Two examples of embedded behavior density maps from test Hydra polyps that were not involved in generating the codebooks or generating the embedding space. (g) Quantification of manual label distribution in training, validation and test datasets. Dashed boxes highlight the behavior types that were robustly recognized in all the three datasets. Feeding 1, the tentacle writhing or the first stage of feeding behavior; feeding 2, the ball formation or the second stage of feeding behavior; feeding 3, the mouth opening or the last stage of feeding behavior.

https://doi.org/10.7554/eLife.32605.020

The generated embedding space could be used to embed new data points (Berman et al., 2014). We embedded feature vectors from a withheld validation dataset, as well as from three Hydra that were involved neither in generating the feature codebook, nor in the embedding space generation (Figure 4f). Quantitative evaluation of embedding performance with manual labels showed that all behavior types were accurately identified by embedding in the validation data. In test samples, embedding identification of elongation, tentacle sway, body sway, contraction, and the ball formation stage of feeding, all agreed with manual labels (Figure 4g). Therefore, embedding of feature vectors can identify the same behavior types that are identified by human annotation.

Embedding reveals unannotated behaviors in long datasets

We wondered if Hydra has any spontaneous behaviors under natural day/night cycles that were not included in our manually labeled sets. We mimicked natural conditions by imaging a Hydra polyp for 3 days and nights with a 12 hr dark/light cycle (Figure 5a), keeping the Hydra in a 100 µm thick coverslip covered chamber to constrain it within the field of view of the microscope (Figure 5b) (Dupre and Yuste, 2017). This imaging approach, although constraining the movement of Hydra, efficiently reduced the complexity of the resulting motion from a three-dimensional to a two-dimensional projection, while still allowing the Hydra to exhibit a basic repertoire of normal behaviors.

t-SNE embedding reveals unannotated egestion behavior.

(a) Experimental design. A Hydra polyp was imaged for 3 days and nights, with a 12 hr light/12 hr dark cycle. (b) A Hydra polyp was imaged between two glass coverslips separated by a 100 µm spacer. (c) Left: density map of embedded behavior during the 3-day imaging. Right: segmented behavior regions with the density map. Magenta arrow indicates the behavior region with discovered egestion behavior. (d) Identification of egestion behavior using width profile. Width of the Hydra polyp (gray trace) was detected by fitting the body column of the animal to an ellipse, and measuring the minor axis length of the ellipse. The width trace was then filtered by subtracting a 15-minute mean width after each time point from a 15-minute mean width before each time point (black trace). Peaks (red stars) were then detected as estimated time points of egestion events (Materials and methods). (e) Density of detected egestion behaviors in the embedding space. Magenta arrow indicates the high density region that corresponds to the egestion region discovered in c.

https://doi.org/10.7554/eLife.32605.022

Using this new dataset, we generated a t-SNE embedding density map from the feature vectors as previously described, and segmented it into behavior motif regions (Figure 5c). Among the resulting 260 motif regions, we not only discovered previously defined behavior types including silent, elongation, bending, tentacle swaying, and contraction, but also found subtypes within certain classes (Videos 12–19). In elongation, for example, we found three different subtypes based on the state of the animal: slow elongation during the resting state of the animal, fast elongation after a contraction burst, and inter-contraction elongation during a contraction burst (Videos 13–15). In contraction, we found two different subtypes: the initial contraction of a contraction burst, and the subsequent individual contraction events when the animal is in a contracted state (Videos 18–19). Interestingly, we also discovered one region in the embedding map that showed a previously unannotated egestion behavior (Figure 5c; Video 20). Egestion behavior (also known as radial contraction) has been observed before (Dupre and Yuste, 2017), and is typically a fast, radial contraction of the body column that happens within 1 s and empties the body cavity of fluid. Although this behavior happens with animals in their natural free movement, its fast time scale and the unconstrained movement make it hard to identify visually during human annotation. In addition, another t-SNE region showed a novel hypostome movement associated with egestion, characterized by a regional pumping-like movement in the hypostome and lower tentacle regions (Video 21).

Video 12
Examples from the identified silent region in the embedding space.
https://doi.org/10.7554/eLife.32605.023
Video 13
Examples from the identified slow elongation region in the embedding space.
https://doi.org/10.7554/eLife.32605.024
Video 14
Examples from the identified fast elongation region in the embedding space.
https://doi.org/10.7554/eLife.32605.025
Video 15
Examples from the identified inter-contraction elongation region in the embedding space.
https://doi.org/10.7554/eLife.32605.026
Video 16
Examples from the identified bending region in the embedding space.
https://doi.org/10.7554/eLife.32605.027
Video 17
Examples from the identified tentacle swaying region in the embedding space.
https://doi.org/10.7554/eLife.32605.028
Video 18
Examples from the identified initial contraction region in the embedding space.
https://doi.org/10.7554/eLife.32605.029
Video 19
Examples from the identified contracted contraction region in the embedding space.
https://doi.org/10.7554/eLife.32605.030
Video 20
Examples from the identified egestion region in the embedding space.
https://doi.org/10.7554/eLife.32605.031
Video 21
Examples from the identified hypostome movement region in the embedding space.
https://doi.org/10.7554/eLife.32605.032

We evaluated the reliability of the identification of this newly discovered egestion behavior from the embedding method by detecting egestion with an additional ad-hoc method. We measured the width of the Hydra body column by fitting it to an ellipse, and low-pass filtered the width trace. Peaks in the trace then represent estimated time points of egestion behavior, which is essentially a rapid decrease in the body column width (Figure 5d). Detected egestion time points were densely distributed in the newly discovered egestion region in the embedding map (Figure 5e), confirming that our method is an efficient way to find novel behavior types.
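
The width-based detection can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used here: the filter follows the description in the Figure 5d legend (mean width before each time point minus mean width after it), while the function names, the window length in samples, the edge handling and the peak height threshold are hypothetical choices.

```python
def before_minus_after(width, window):
    """Mean width over `window` samples before t minus mean over `window` samples after t.

    A rapid drop in width at t therefore produces a positive peak.
    """
    filtered = []
    for t in range(len(width)):
        before = width[max(0, t - window):t]
        after = width[t + 1:t + 1 + window]
        if not before or not after:
            filtered.append(0.0)  # assumption: no estimate at the edges of the trace
            continue
        filtered.append(sum(before) / len(before) - sum(after) / len(after))
    return filtered

def find_peaks(trace, min_height):
    """Indices of local maxima at or above `min_height` (first sample of a plateau)."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i] >= min_height
            and trace[i] > trace[i - 1]
            and trace[i] >= trace[i + 1]]

# Toy width trace: the body column narrows abruptly from 10 to 6 after sample 19.
width = [10] * 20 + [6] * 20
filtered = before_minus_after(width, window=5)
peaks = find_peaks(filtered, min_height=2.0)
```

On real recordings the window would span many more samples (the legend specifies 15 min on each side), and the threshold would need to be tuned against visually confirmed egestion events.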

Basic behavior of Hydra under different experimental conditions

Although basic Hydra behaviors such as contraction, feeding and somersaulting have been described for over two centuries, the quantitative understanding of Hydra behaviors has been limited by the subjective nature of human annotation and by the amount of data that can be processed by manual examination. To build quantitative descriptions that link behaviors to neural processes and to explore behavior characteristics of Hydra, we used our newly developed method to compare the statistics of behavior under various physiological and environmental conditions.

In its natural habitat, Hydra experiences day/night cycles, food fluctuations, temperature variations, and changes in water chemistry. Therefore, we wondered whether Hydra exhibit different behavioral frequencies or behavioral variability under dark and light conditions, as well as in starved and well-fed conditions. Since we did not expect Hydra to exhibit spontaneous feeding behavior in the absence of prey, we only analyzed six basic behavior types using the trained classifiers: silent, elongation, tentacle swaying, body swaying, bending, and contraction. Lighting conditions (light vs. dark) did not result in any significant changes in either the average time spent in each of the six behavior types (Figure 6a) or the individual behavior variability defined by the variation of the percentage of time spent in each behavior in 30 min time windows (Figure 6b). Also, compared with starved Hydra, well-fed Hydra did not show significant changes in the percentage of time spent in elongation behavior (Figure 6c), but showed less variability in it (Figure 6d; starved: 8.95 ± 0.69%, fed: 5.46 ± 0.53%, p=0.0047).
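
The two summary statistics used here, the percentage of time spent in a behavior and its variability across 30 min windows, can be illustrated with a small Python sketch. The function names and the toy label sequence are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
from statistics import stdev

def percent_time(labels, behavior):
    """Percentage of classified time windows assigned to `behavior`."""
    return 100.0 * sum(1 for label in labels if label == behavior) / len(labels)

def behavior_variability(labels, behavior, windows_per_block):
    """Standard deviation of percent time across consecutive, non-overlapping blocks."""
    blocks = [labels[i:i + windows_per_block]
              for i in range(0, len(labels) - windows_per_block + 1, windows_per_block)]
    # Assumption: sample standard deviation (n - 1 in the denominator)
    return stdev([percent_time(block, behavior) for block in blocks])

# Toy sequence: 3 blocks of 4 windows each ("E" = elongation, "S" = silent).
# With 5 s windows, a 30 min block would hold 360 windows instead of 4.
labels = ["E", "S", "S", "S",
          "E", "E", "S", "S",
          "E", "E", "E", "S"]
overall = percent_time(labels, "E")
spread = behavior_variability(labels, "E", windows_per_block=4)
```

Here the animal spends half of its time in elongation overall, but the per-block percentages (25%, 50%, 75%) differ, which is what the variability measure captures.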

Similar behavior statistics under different conditions but differences across species.

(a) Percentage of time Hydra spent in each behavior, in dark (red to infra-red) and light conditions. Each circle represents data from one individual. The horizontal line represents the average of all samples. Red represents dark condition, blue represents light condition. (n_dark = 6, n_light = 7) (b) Standard deviations of behaviors within each individual animal, calculated with separate 30 min time windows in the recording. Each circle represents the behavior variability of one individual. (c) Percentage of time Hydra spent in each behavior, in starved and well-fed condition. (n_starved = 6, n_fed = 7) (d) Standard deviations of individual behaviors under starved and well-fed conditions. (e) Percentage of time small and large Hydra spent in each behavior. (n_small = 10, n_large = 7). (f) Standard deviations of behaviors of small and large individuals. (g) Percentage of time Hydra vulgaris and Hydra viridissima spent in each behavior type. (n_vulgaris = 7, n_viridissima = 5). (h) Standard deviations of individual brown and green Hydra. *p<0.05, **p<0.01, Wilcoxon rank-sum test.

https://doi.org/10.7554/eLife.32605.033

As Hydra polyps vary significantly in size depending on the developmental stage (e.g. freshly detached buds vs. fully grown animals) and nutrition status (e.g. Hydra that has been starved for a week vs. well-fed Hydra), we also explored whether Hydra of different sizes exhibit different behavioral characteristics. For this, we imaged behaviors of Hydra with up to a threefold difference in size. Large Hydra polyps had similar silent, body swaying, and contraction patterns, but spent slightly less time in elongation, and more in tentacle swaying (Figure 6e; elongation small: 22.42 ± 1.35%, large: 17.00 ± 0.74%, p=0.0068; tentacle swaying small: 34.24 ± 1.24%, large: 41.06 ± 2.70%, p=0.03). The individual behavior variability remained unchanged (Figure 6f).

Finally, we further inquired if different Hydra species have different behavioral repertoires. To answer this, we compared the behaviors of Hydra vulgaris and Hydra viridissima (i.e. green Hydra), which contains symbiotic algae in its endodermal epithelial cells (Martínez et al., 2010). The last common ancestor of these two species was at the base of Hydra radiation. Indeed, we found that Hydra viridissima exhibited statistically less silent and bending behaviors, but more elongations (Figure 6g; elongation vulgaris: 15.74 ± 0.50%, viridissima: 18.63 ± 0.87%, p=0.0303; bending vulgaris: 2.31 ± 0.27%, viridissima: 1.35 ± 0.17%, p=0.0177), while individual viridissima specimens also exhibited slightly different variability in bending (Figure 6h; vulgaris: 2.17 ± 0.26%, viridissima: 1.33 ± 0.20%, p=0.0480). We concluded that different Hydra species can have different basic behavioral repertoires.

Discussion

A machine learning method for quantifying behavior of deformable animals

Interdisciplinary efforts in the emerging field of computational ethology are seeking novel ways to automatically measure and model natural behaviors of animals (Anderson and Perona, 2014; Berman et al., 2014; Branson et al., 2009; Brown et al., 2013; Creton, 2009; Dankert et al., 2009; Johnson et al., 2016; Kabra et al., 2013; Pérez-Escudero et al., 2014; Robie et al., 2017; Stephens et al., 2008; Swierczek et al., 2011; Wiltschko et al., 2015). Most of these approaches rely on recognizing variation of the shapes of animals based on fitting video data to a standard template of the body of the animal. However, unlike bilaterian model organisms such as worms, flies, fish and mice, Hydra has an extremely deformable and elastic body. Indeed, during contraction, Hydra appears as a ball with all tentacles shortened, while during elongation, Hydra appears as a long and thin column with tentacles relaxed. Moreover, these deformations are non-isometric, that is, different axes, and different parts of the body, change differently. The number of tentacles each Hydra has also varies. These present difficult challenges for recognizing Hydra behaviors using preset templates.

To tackle the problem of measuring behavior in a deformable animal, we developed a novel analysis pipeline using approaches from computer vision that have achieved success in human action classification tasks (Ke et al., 2007; Laptev et al., 2008; Poppe, 2010; Wang et al., 2009; Wang et al., 2011). Such tasks usually involve various actions and observation angles, as well as occlusion and cluttered background. Therefore, they require more robust approaches to capture stationary and motion statistics, compared to using pre-defined template-based features. In particular, the bag-of-words (BoW) framework is an effective approach for extracting visual information from videos of humans or animals with arbitrary motion and deformation. The BoW framework originated from document classification tasks with machine learning. In this framework, documents are considered ‘bags’ of words, and are then represented by a histogram of word counts using a common dictionary. These histogram representations are widely used for classifying document types because of their efficiency. In computer vision, the BoW framework considers pictures or videos as ‘bags’ of visual words, such as small patches in the images, or shape and motion features extracted from such patches. Compared with another popular technique in machine vision, template matching, BoW is more robust against challenges such as occlusion, position, orientation, and viewing angle changes. It has also proved successful in capturing object features in various scenes, and thus has become one of the most important developments and cutting-edge methods in this field. For these reasons, BoW appears ideally suited for behavior recognition tasks in deformable animals, such as Hydra.
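
The document-classification origin of BoW can be made concrete with a short Python sketch. It is a toy illustration only: the dictionary, the example document and the function name `bow_histogram` are hypothetical, and the visual-word version used for video would replace text tokens with quantized shape and motion features.

```python
from collections import Counter

def bow_histogram(tokens, dictionary):
    """Represent a document as counts of each dictionary word, ignoring word order."""
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur in the document
    return [counts[word] for word in dictionary]

dictionary = ["hydra", "tentacle", "contraction", "neuron"]
document = "hydra tentacle contraction hydra contraction contraction".split()

histogram = bow_histogram(document, dictionary)
# Shuffling the words leaves the representation unchanged:
same_histogram = bow_histogram(sorted(document), dictionary)
```

Because the histogram discards ordering, two documents (or two video clips) with the same content in a different arrangement map to the same vector, which is the source of both the robustness and the loss of temporal structure discussed below.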

We modified the BoW framework by integrating other computational methods, including body part segmentation (which introduces spatial information), dense trajectory features (which encode shape and motion statistics in video patches) and Fisher vectors (which represent visual words in a statistical manner). Our choice of framework and parameters proved to be adequate, considering both its training and validation accuracy, as well as its generalizability on test datasets (Figure 2—figure supplement 1). Indeed, the robust correspondence between supervised, unsupervised and manual classification that we report provides internal cross-validation of the validity and applicability of our BoW machine learning approach. Our developed framework, which uses both supervised and unsupervised techniques, is in principle applicable to all organisms, since it does not rely on information specific to Hydra. Compared with previously developed methods, our method would be particularly suitable for behaviors in natural conditions that involve deformable body shapes, as a first step to developing more sophisticated behavioral methods in complex environments for other species.

Our goal was to describe all possible Hydra behavior quantitatively. Because of this, we used the BoW framework to capture the overall statistics within a given time frame. We defined the length of basic behavior elements to be 5 s, which maximizes the number of behaviors that were kept intact while uncontaminated by other behavior types (Figure 1c–d). However, it should be noted that our approach could not capture fine-level behavior differences, for example, single tentacle behavior. This would require modeling the animal with an explicit template, or with anatomical landmarks, as demonstrated by deformable human body modeling with wearable sensors. Our approach also does not recover transition probabilities between behavior types, or behavioral interactions between individual specimens. In fact, since our method treats each time window as an independent ‘bag’ of visual words, there was no constraint on the temporal smoothness of classified behaviors. Classifications were allowed to be temporally noisy, and therefore could not be used for temporal structure analysis. A few studies have integrated state-space models for modeling both animal and human behavior (Gallagher et al., 2013; Ogale et al., 2007; Wiltschko et al., 2015), while others have used discriminative models such as Conditional Random Field models for activity recognition (Sminchisescu et al., 2006; Wang and Suter, 2007). These methods may provide promising candidates for modeling behavior with temporal structure in combination with our approach (Poppe, 2010).

In our analysis pipeline, we applied both supervised and unsupervised approaches to characterize Hydra behavior. In supervised classifications (with SVM), we manually defined seven types of behaviors, and trained classifiers to infer the label of unknown samples. In unsupervised analysis (t-SNE), we did not pre-define behavior types, but rather let the algorithm discover the structures that were embedded in the behavior data. In addition, we found that unsupervised learning could discover previously unannotated behavior types such as egestion. However, the types of behaviors discovered by unsupervised analysis are limited by the nature of the encoded feature vectors. Since the BoW model provides only a statistical description of videos, those features do not encode fine differences in behaviors. Due to this difference, we did not apply unsupervised learning to analyze behavior statistics under different environmental and physiological conditions, as supervised learning appeared more suitable for applications where one needs to assign a particular label to a new behavior video.

Stability of the basic behavioral repertoire of Hydra

Once we established the reliability of our method, we quantified the differences between six basic behaviors in Hydra under different experimental conditions and with two different species of Hydra, and found that Hydra vulgaris exhibits essentially the same behavior statistics under dark/light, large/small and starved/fed conditions. Although some small differences were observed among experimental variables, the overall dwell time and variance of the behavioral repertoire of Hydra were unexpectedly similar in all these different conditions. Although we could not exclude the possibility that there were differences in the transition probabilities between behaviors, our results still show that, for the six basic behaviors analyzed, Hydra possesses surprisingly robust and similar behavioral frequencies across environmental and physiological conditions, while interspecies differences introduce stronger behavior differences.

Passano and McCullough (1964) reported that Hydra littoralis, a close relative of our Hydra vulgaris AEP strain (Martínez et al., 2010), showed fewer contraction bursts in the evenings and nights than in the day, and that feeding every third or fourth day resulted in fewer contraction bursts than was seen with daily feeding. However, they detected contraction bursts by electrical recording of epithelial cell activity, and defined coordinated activity as a contraction event. In our method, we did not measure the number of such events, but instead measured the number of time windows that contain such contractile behavior. This is essentially a measurement of the time spent in contractions instead of the frequency of individual events. Using natural light instead of lamp light could also lead to a difference in the observation results. Interestingly, we observed that Hydra vulgaris exhibits different behavior statistics compared with Hydra viridissima. The split leading to Hydra vulgaris and Hydra viridissima is the earliest one in the Hydra phylogenetic tree (Martínez et al., 2010), thus these two species are quite divergent. Hydra viridissima also possesses symbiotic algae, and requires light for normal growth (Lenhoff and Brown, 1970). These differences in genetics and growth conditions could help explain the observed behavioral differences.

Given the similarity in statistics of basic behaviors in different conditions across different animals within the same species, we naturally wondered if our approach might not be effective or sensitive enough to detect significant behavioral differences among animals. However, the high accuracy of the classification of annotated behavior subtypes (Figure 3) and the reproducibility of the method, with small variances when measuring different datasets, rule out the possibility that this machine learning method is insensitive, in which case the results of our behavioral analysis would have been noisy and irreproducible. This conclusion was corroborated by the statistical differences in behavior found across two different Hydra species.

We had originally expected to observe larger variability of behaviors under different experimental conditions and we report essentially the opposite result. We interpret the lack of behavioral differences across individuals as evidence for robust neural control of a basic behavioral pattern, which appears unperturbed by different experimental conditions. While this rigidity may not seem ideal if one assumes that behavior should flexibly adapt to the environment, it is possible that the six behaviors we studied represent a basic ‘house keeping’ repertoire that needs to be conserved for the normal physiology and survival of the animal. Our results are reminiscent of work on the stomatogastric ganglion of crustaceans that has revealed homeostatic mechanisms that enable central pattern generators to function robustly in different environmental conditions, such as changes in temperature (Haddad and Marder, 2017). In fact, in this system, neuropeptides and neuromodulators appear to be flexibly used to enable circuit and behavioral homeostasis (Marder, 2012). Although we do not yet understand the neural mechanisms responsible for the behavioral stability in Hydra, it is interesting to note that the Hydra genome has more than one hundred neuropeptides that could play neuromodulator roles (Chapman et al., 2010; Fujisawa and Hayakawa, 2012). This vast chemical toolbox could be used to supplement a relatively sparse wiring pattern with mechanisms to ensure that the basic behavior necessary for the survival of the animal remains constant under many different environmental conditions. One can imagine that different neuromodulators could alter the biophysical properties of connections in the Hydra nerve net and thus keep a stable operating regime of its neurons in the physiological states.

In addition, a possible reason for the behavioral similarity among different specimens of Hydra could be their genetic similarities. We used animals derived from the same colony (Hydra AEP strain), which was propagated by clonal budding. Thus, it is likely that many of the animals were isogenic, or genetically very similar. The lack of genetic variability, although it does not explain the behavioral robustness, could partly be a reason behind our differences across species, and it would explain a relatively small quantitative variability across animals of our H. vulgaris colony, as opposed to a larger variability in specimens from the wild.

Finally, it is also possible that the behavioral repertoire of cnidarians, which represents some of the simplest nervous systems in evolution in structure and probably also in function, could be particularly simple and hardwired as compared with other metazoans or with bilaterians. From this point of view, the robustness we observed could reflect a ‘passive stability’ where the neural mechanisms are simply unresponsive to the environment, as opposed to a homeostatic ‘active stability’, generated perhaps by neuromodulators. This distinction mirrors the difference between open-loop and closed-loop control systems in engineering (Schiff, 2012). Thus, it would be fascinating to reverse engineer the Hydra nerve net and discern to what extent its control mechanisms are regulated externally. Regardless of the reason for this behavioral stability, our analysis provides a strong baseline for future behavioral analysis of Hydra and for the quantitative analysis of the relation between behavior, neural and non-neuronal cell activity.

Hydra as a model system for investigating neural circuits underlying behavior

Revisiting Hydra as a model system with modern imaging and computational tools to systematically analyze its behavior provides a unique opportunity to image the entire neural network in an organism and decode the relation between neural activity and behaviors (Bosch et al., 2017). With recently established GCaMP6s transgenic Hydra lines (Dupre and Yuste, 2017) and the automated behavior recognition method introduced in this study, it should now be possible to identify the neural networks responsible for each behavior in Hydra under laboratory conditions.

With this method, we demonstrate that we are able to recognize and quantify Hydra behaviors automatically, and to identify novel behavior types. This allows us to investigate the stability of the behavioral repertoire under different environmental, physiological and genetic conditions, providing insight into how a primitive nervous system adapts to its environment. Although our framework does not currently model temporal information directly, it serves as a stepping-stone toward building more comprehensive models of Hydra behaviors. Future work that incorporates temporal models would allow us to quantify behavior sequences, and to potentially investigate more complicated behaviors in Hydra such as social and learning behaviors.

As a member of the phylum Cnidaria, Hydra is a sister to bilaterians, and its nervous system and bilaterian nervous systems share a common ancestry. As demonstrated by the analysis of its genome (Chapman et al., 2010), Hydra is closer in gene content to the last common ancestor of the bilaterian lineage than some other model systems used in neuroscience research, such as Drosophila and C. elegans. In addition, comparative studies are essential to discern whether the phenomena and mechanisms found when studying one particular species are specialized or general and can thus help illuminate essential principles that apply widely. Moreover, as was found in developmental biology, where the body plan of animals is built using the same logic and molecular toolbox (Nüsslein-Volhard and Wieschaus, 1980), it is possible that the function and structure of neural circuits could also be evolutionarily conserved among animals. Therefore, early diverging metazoans could provide an exciting opportunity to understand the fundamental mechanisms by which nervous systems generate and regulate behaviors.

Materials and methods

Hydra behavior dataset

The Hydra behavior dataset consisted of 53 videos from 53 Hydra with an average length of 30 min. The AEP strain of Hydra was used for all experiments. Hydra polyps were maintained at 18°C in darkness and were fed with Artemia nauplii one or more times a week by standard methods (Lenhoff and Brown, 1970). During imaging, Hydra polyps were placed in a 3.5 cm plastic petri dish under a dissecting microscope (Leica M165) equipped with a sCMOS camera (Hamamatsu ORCA-Flash 4.0). Videos were recorded at 5 Hz. Hydra polyps were allowed to behave either undisturbed, or in the presence of reduced L-glutathione (Sigma-Aldrich, G4251-5G) to induce feeding behavior, since Hydra does not exhibit feeding behavior in the absence of prey.

Manual annotation

Each video in the Hydra behavior dataset was examined manually at a high playback speed, and each frame in the video was assigned a label in the following eleven classes based on the behavior that Hydra was performing: silent, elongation, tentacle swaying, body swaying, bending, contraction, somersaulting, tentacle writhing of feeding, ball formation of feeding, mouth opening of feeding, and a none class. These behaviors were labeled as 1 through 11, where larger numbers correspond to more prominent behaviors, and the none class is labeled as 0. To generate manual labels for a given time window, the top two most frequent labels, L1 and L2, within this time window were identified. The window was assigned as L2 if its count exceeded that of L1 by three-fold and if L1 is more prominent than L2; otherwise, the window was assigned as L1. This annotation method labels time windows as more prominent behaviors if behaviors with large motion, e.g. contraction, happen in only a few frames, while the majority of frames are slow behaviors.
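
One possible reading of this window-labeling rule is sketched below in Python. The interpretation is an assumption: of the two most frequent labels in a window, the more prominent one (larger label number) is taken as L1 and is assigned unless the other label is more than three times as frequent. The function name, the numeric codes in the example (2 for elongation, 6 for contraction, following the order of the list above) and the handling of single-label windows are likewise illustrative, and the none class is not treated specially.

```python
from collections import Counter

def window_label(frame_labels, ratio=3):
    """Assign one label to a time window from its per-frame labels.

    Assumed rule: among the two most frequent labels, keep the more prominent
    one (larger number) unless the other occurs more than `ratio` times as often.
    """
    top_two = Counter(frame_labels).most_common(2)
    if len(top_two) == 1:
        return top_two[0][0]  # only one behavior present in the window
    (a, n_a), (b, n_b) = top_two
    if a > b:
        prominent, n_prominent, other, n_other = a, n_a, b, n_b
    else:
        prominent, n_prominent, other, n_other = b, n_b, a, n_a
    return other if n_other > ratio * n_prominent else prominent

# A 5 s window at 5 Hz holds 25 frames.
mostly_elongation = window_label([2] * 22 + [6] * 3)   # 22 > 3 * 3
brief_contraction = window_label([2] * 18 + [6] * 7)   # 18 <= 3 * 7
single_behavior = window_label([1] * 25)
```

Under this reading, a contraction occupying 7 of 25 frames still labels the whole window as contraction, whereas one occupying only 3 frames does not.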

Video pre-processing

Prior work has shown that bag-of-words methods for video action classification perform better when encoding spatial structure (Taralova et al., 2011; Wang et al., 2009). Encoding spatial information is especially important in our case because allowing the animal to move freely produces large variations in orientation, which are not related to behavior classification. Therefore, we performed a basic image registration procedure that keeps the motion information invariant, but aligns the Hydra region to a canonical scale and orientation. This involves three steps: background segmentation, registration, and body part segmentation. In brief, the image background was calculated by a morphological opening operation, and the background was removed from the raw image. Then, image contrast was adjusted to enhance tentacle identification. Images were then segmented by k-means clustering of the pixel intensity profiles into three clusters corresponding to Hydra body, weak-intensity tentacle regions and background; the largest cluster from the result was treated as background, and the other two clusters as foreground, that is, the Hydra region. Connected components that occupied less than 0.25% of total image area in this binary image were removed as noise, and the resulting Hydra mask was then dilated by three pixels. To detect the body column, the background-removed image was convolved with a small 3-by-3 Gaussian filter with sigma equal to one pixel, and the filtered image was thresholded with Otsu’s segmentation algorithm. The binarization was repeated with a new threshold defined with Otsu’s method within the previous above-threshold region, and the resulting binary mask was considered as the body column. The body column region was then fitted with an ellipse; the major axis, centroid, and orientation of the ellipse were noted.
To determine the orientation, two small square masks were placed on both ends of the ellipse along the major axis, and the area of the Hydra region excluding the body column under the patch was calculated; the end with the larger area was defined as the tentacle/mouth region, and the end with the smaller area was defined as the foot region. To separate the Hydra region into three body parts, the part under the upper body square mask excluding the body column was defined as the tentacle region, and the rest of the mask was split at the minor axis of the ellipse; the part close to the tentacle region was defined as the upper body region, and the other as the lower body region. This step has been shown to improve representation efficiency (Figure 2—figure supplement 1b).
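
The two-pass Otsu thresholding used to isolate the body column can be illustrated with a minimal pure-Python sketch. This is not the code used in this study: it operates on a flat list of integer intensities rather than an image, omits the Gaussian filtering step, and the function names and toy intensity values are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with intensity <= t form the lower class. `values` are integers
    in the range [0, levels).
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split here
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def two_pass_otsu(values):
    """Threshold once, then threshold again within the above-threshold pixels."""
    t1 = otsu_threshold(values)
    bright = [v for v in values if v > t1]
    t2 = otsu_threshold(bright)
    return t1, t2

# Toy intensities: background (10), tentacles (100), body column (200).
pixels = [10] * 50 + [100] * 30 + [200] * 20
t1, t2 = two_pass_otsu(pixels)
body_column = [v for v in pixels if v > t2]
```

In this toy example the first pass separates the animal from the background and the second pass separates the brighter body column from the dimmer tentacles, mirroring the repeated binarization described above.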

Each 5-s video clip was then centered on the average ellipse centroid position calculated across its frames. The average major axis length and the average orientation were also calculated. Each image in the video clip was rotated according to the average orientation to make the Hydra vertical, and was scaled so that the length of the Hydra body was 100 pixels, with an output size of 300 by 300 pixels; only the region under the Hydra binary mask was kept.
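The registration step amounts to one translation, rotation and scaling per clip. The sketch below shows the coordinate mapping only, under stated assumptions: the orientation is taken as the angle of the major axis measured from the vertical axis (so a unit vector along the axis is (sin θ, cos θ)), orientations are averaged naively without handling angular wrap-around, and the function names are hypothetical.

```python
import math


def registration_params(centroids, major_axes, orientations_deg, body_len=100.0):
    """Average per-frame ellipse fits over one clip."""
    n = len(centroids)
    cx = sum(c[0] for c in centroids) / n
    cy = sum(c[1] for c in centroids) / n
    scale = body_len / (sum(major_axes) / n)  # body length becomes body_len pixels
    angle = sum(orientations_deg) / n         # naive mean, no wrap-around handling
    return (cx, cy), scale, angle


def register_point(pt, center, scale, angle_deg, out_size=300):
    """Map an input pixel coordinate into the registered output frame."""
    theta = math.radians(angle_deg)
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    # Rotation that sends the major-axis direction (sin t, cos t) to (0, 1).
    xr = dx * math.cos(theta) - dy * math.sin(theta)
    yr = dx * math.sin(theta) + dy * math.cos(theta)
    half = out_size / 2.0
    return (half + scale * xr, half + scale * yr)
```

In practice the same transform would be applied to whole images with an image-warping routine; the point mapping here only makes the geometry explicit.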

Feature extraction

Video features including HOF, HOG and MBH were extracted using a previously released codebase (Wang et al., 2011). Briefly, interest points were densely sampled with five-pixel spacing at each time point in each 5-s video clip and were then tracked throughout the video clip with optical flow for 15 frames. The tracking quality threshold was set to 0.01; the minimum variation of trajectory displacement was set to 0.1, the maximum variation was set to 50, and the maximum displacement was set to 50. The neighboring 32 pixels of each interest point were then extracted, and HOF (eight dimensions for the eight orientations, plus one extra zero bin), HOG (eight dimensions) and MBH (eight dimensions) features were calculated with standard procedures. Note that MBH was calculated for horizontal and vertical optical flow separately; therefore two sets of MBH features, MBHx and MBHy, were generated. All features were placed into three groups based on the body part they fell in, that is, tentacles, upper body column, and lower body column. All parameters above were cross-validated with the training and test datasets.
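To make the HOF layout concrete, the sketch below bins a set of optical-flow vectors into eight orientation bins plus one zero bin. It is an illustration only and not the released dense-trajectory code: the magnitude threshold `min_mag`, the magnitude weighting, and the function name are assumptions.

```python
import math


def hof_histogram(flows, n_bins=8, min_mag=0.4):
    """Histogram of optical flow: n_bins orientation bins plus a zero bin.

    flows is an iterable of (fx, fy) flow vectors from one patch.
    Vectors shorter than min_mag (an assumed threshold) go to the zero bin.
    """
    hist = [0.0] * (n_bins + 1)
    bin_width = 2 * math.pi / n_bins
    for fx, fy in flows:
        mag = math.hypot(fx, fy)
        if mag < min_mag:
            hist[n_bins] += 1.0  # last entry is the zero bin
            continue
        angle = math.atan2(fy, fx) % (2 * math.pi)
        b = int(angle / bin_width) % n_bins
        hist[b] += mag           # weight each vote by flow magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

HOG and MBH follow the same binning idea, applied to image gradients and to the gradients of each optical-flow component, respectively.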

Gaussian mixture codebook and Fisher vector

A Gaussian mixture codebook and Fisher vectors were generated for each feature type using the code developed by Jégou et al. (Jégou et al., 2012), using 50 Hydra in the behavior dataset that includes all behavior types. Features from each body part were centered at zero, then PCA was performed on centered features from all three body parts, keeping half of the original dimensions (five for HOF, four for HOG, MBHx and MBHy). Whitening, which de-correlates the data and removes redundant information, was performed on the PCA data as follows:

$$x_{\mathrm{white},i} = \frac{x_i}{\sqrt{\lambda_i}}$$

where $x$ denotes principal components, and $\lambda$ denotes eigenvalues. K=256 Gaussian mixtures were then fitted to the whitened data using a subset of 256,000 data points. We then calculated the Fisher vectors as follows:

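$$z_X = L_\lambda \nabla_\lambda \mathcal{L}(X \mid \lambda)$$

where $X = \{x_t,\ t = 1, \ldots, T\}$ is a set of $T$ data points that were assumed to be generated by the Gaussian mixture distribution $u_\lambda(x) = \sum_{i=1}^{K} w_i u_i(x)$, $\lambda = \{w_i, \mu_i, \sigma_i,\ i = 1, \ldots, K\}$ denotes the Gaussian parameters, $\mathcal{L}(X \mid \lambda) = \log u_\lambda(X)$ is the log-likelihood of $X$, and $L_\lambda$ comes from the decomposition of the inverse Fisher Information Matrix:

$$F_\lambda = E_{x \sim u_\lambda}\left[\nabla_\lambda \log u_\lambda(x)\, \nabla_\lambda \log u_\lambda(x)^T\right], \qquad F_\lambda^{-1} = L_\lambda^T L_\lambda$$

Fisher vectors then represent the normalized gradient vector obtained from the Fisher kernel $K(X, X')$:

$$K(X, X') = \nabla_\lambda \mathcal{L}(X \mid \lambda)^T\, F_\lambda^{-1}\, \nabla_\lambda \mathcal{L}(X' \mid \lambda) = z_X^T z_{X'}$$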
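As an illustration of what a Fisher vector contains, the sketch below computes only its mean-gradient part for a diagonal-covariance Gaussian mixture, using the commonly used closed-form approximation of the Fisher information (Perronnin et al., 2010). It is a toy pure-Python version, not the Jégou et al. code used in the paper; the variance and weight gradients are omitted, the gradient is averaged over the T points, and all function names are hypothetical.

```python
import math


def gaussian_diag(x, mu, sigma):
    """Density of a diagonal-covariance Gaussian at point x."""
    p = 1.0
    for xd, md, sd in zip(x, mu, sigma):
        p *= math.exp(-0.5 * ((xd - md) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return p


def posteriors(x, weights, mus, sigmas):
    """Soft assignment of x to each mixture component."""
    joint = [w * gaussian_diag(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]


def fisher_vector_means(X, weights, mus, sigmas):
    """Normalized gradient of the average log-likelihood w.r.t. the means."""
    T, K, D = len(X), len(weights), len(mus[0])
    acc = [[0.0] * D for _ in range(K)]
    for x in X:
        gamma = posteriors(x, weights, mus, sigmas)
        for i in range(K):
            for d in range(D):
                acc[i][d] += gamma[i] * (x[d] - mus[i][d]) / sigmas[i][d]
    # Concatenate the K blocks, each scaled by 1 / (T * sqrt(w_i)).
    return [v / (T * math.sqrt(weights[i])) for i in range(K) for v in acc[i]]
```

Each data point contributes to every Gaussian in proportion to its posterior, which is the soft assignment that distinguishes this encoding from a hard-assigned histogram of code words.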

Compared with hard-assigning each feature to a code word, the Gaussian mixtures can be regarded as a probabilistic vocabulary, and Fisher vectors encode information about both the position and the shape of each word with respect to the Gaussian mixtures. Power normalization was then performed on the Fisher vectors to improve the quality of representation:

$$f(z) = \mathrm{sign}(z)\,|z|^{\alpha}$$

with α=0.5, followed by l2 normalization, which removes scale dependence (Perronnin et al., 2010). The final representation of each video clip is a concatenation of Fisher vectors of HOF, HOG, MBHx and MBHy. In this paper, the GMM size was set to 128 with cross-validation (Figure 2—figure supplement 1c).
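The two normalization steps are simple enough to write out directly. The following is a minimal sketch of power normalization with α = 0.5 followed by l2 normalization, applied to one Fisher vector stored as a plain list; the function name is hypothetical.

```python
import math


def power_l2_normalize(z, alpha=0.5):
    """Apply f(z) = sign(z) * |z|**alpha element-wise, then l2-normalize."""
    powered = [math.copysign(abs(v) ** alpha, v) for v in z]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered
```

After this step every non-zero vector has unit length, so vectors from clips with different numbers of tracked points become directly comparable.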

SVM classification

PCA was first performed on the concatenated Fisher vectors to reduce the dimensions while keeping 90% of the original variance. A random 90% of samples from the 50 training Hydra were selected as training data, and the remaining 10% were withheld as validation data. Another three Hydra that exhibited all behavior types were kept as test data. Because the behavior types have different numbers of data points, we trained SVM classifiers using the libSVM implementation (Chang and Lin, 2011), assigning each type a weight of $w_i = (\sum_j N_j)/N_i$, where $i = 1, \ldots, 7$ denotes the behavior type and $N_i$ denotes the number of data points that belong to type $i$. We trained SVM classifiers with a radial basis kernel, allowing probability estimates, and used fivefold cross-validation to test the cost parameter $c$ over the range $\log_2 c \in$ −5:2:15 and the kernel parameter $g$ over the range $\log_2 g \in$ −5:2:15, where −5:2:15 denotes integers ranging from −5 to 15 with a step of 2. The best parameter combination from cross-validation was chosen to train the SVM classifiers.
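The class weights and the parameter grid described above can be written out as follows. This is a small sketch with hypothetical function names and made-up label counts; the resulting values would be handed to whichever SVM trainer is used.

```python
def class_weights(counts):
    """Weight each class by (total number of samples) / (samples in that class)."""
    total = sum(counts.values())
    return {label: total / n for label, n in counts.items()}


def log2_grid(start=-5, stop=15, step=2):
    """Values 2**e for e = start, start + step, ..., stop."""
    return [2.0 ** e for e in range(start, stop + 1, step)]


# Every (c, g) pair that the fivefold cross-validation would evaluate.
param_pairs = [(c, g) for c in log2_grid() for g in log2_grid()]
```

Rare behavior types receive proportionally larger weights, so misclassifying them costs more during training, and the grid contains 11 values per parameter, or 121 combinations.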

To classify test data, features were extracted as above and were encoded with Fisher vectors using the codebook generated from the training data. PCA was performed using the projection matrix from the training data. A probability estimate for each behavior type was given by the classifiers, and the final assigned label was the behavior type with the highest probability. For soft classifications, we allowed up to three labels for each sample if the second highest label probability was >50% of the highest, and the third was >50% of the second highest. To evaluate classification performance, true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) were calculated. Accuracy was defined as Acc=(TP+TN)/(TP+TN+FP+FN); precision was defined as Prc=TP/(TP+FP); recall was defined as Rec=TP/(TP+FN). Two other measurements were calculated: true-positive rate TPR=TP/(TP+FN), and false-positive rate FPR=FP/(FP+TN). Plotting TPR against FPR gives the standard ROC curve, and the area under the curve (AUC) reflects the performance of classification. In this plot, the straight line TPR = FPR with AUC = 0.5 represents random guessing; the region above this line, toward the upper left, with AUC >0.5 represents better performance than random.
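The soft-labeling rule and the evaluation measures can be sketched as below. The reading of the rule (a third label is only considered when the second was accepted) is an interpretation, recall is taken to be the true-positive rate TP/(TP+FN), and the function names are hypothetical.

```python
def soft_labels(probs, ratio=0.5, max_labels=3):
    """Return up to max_labels labels, ranked by probability.

    A label is kept if its probability exceeds `ratio` times the
    probability of the label ranked just above it.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    labels = [ranked[0][0]]
    for (_, prev_p), (label, p) in zip(ranked, ranked[1:max_labels]):
        if p > ratio * prev_p:
            labels.append(label)
        else:
            break
    return labels


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (TPR) and FPR from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }
```

For a multi-class problem such as this one, the confusion counts would be computed per behavior type in a one-versus-rest fashion.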

t-SNE embedding

Embedding was performed with the dimension-reduced data. A random 80% of the dataset from the 50 training Hydra were chosen to generate the embedding map, and the remaining 20% were withheld as validation dataset. Three other Hydra were used as test dataset. We followed the procedures of Berman et al. (2014), with a slight modification that uses Euclidean distance as the distance measurement. Embedding perplexity was chosen as 16. To generate a density map, a probability density function was calculated in the embedding space by convolving the embedded points with a Gaussian kernel; σ of the Gaussian was chosen to be 1/40 of the maximum value in the embedding space by cross-validation with human examination to minimize over-segmentation. In the 3-day dataset, σ was chosen to be 1/60 of the maximum value in order to reveal finer structures. To segment the density map, peaks were found in the density map, a binary map containing peak positions was generated, and peak points were dilated by three pixels. A distance map of the binary image was generated and inverted, and the peak positions were set to be minimum. Watershed was performed on the inverted distance map, and the boundaries were defined with the resulting watershed segmentation.
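The density-map step can be illustrated with a small pure-Python sketch that sums a Gaussian kernel over the embedded points on a regular grid. It covers only the density estimate, not the watershed segmentation; the grid size, the reading of "maximum value in the embedding space" as the largest absolute coordinate, and the function name are assumptions.

```python
import math


def density_map(points, grid_size=41, sigma_frac=1.0 / 40):
    """Gaussian-kernel density of 2-D embedded points on a square grid."""
    # Assumption: sigma is a fraction of the largest absolute coordinate.
    max_val = max(max(abs(x), abs(y)) for x, y in points)
    sigma = sigma_frac * max_val
    step = 2.0 * max_val / (grid_size - 1)
    grid = []
    for r in range(grid_size):
        gy = -max_val + r * step
        row = []
        for c in range(grid_size):
            gx = -max_val + c * step
            row.append(sum(
                math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * sigma ** 2))
                for x, y in points))
        grid.append(row)
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]  # sums to 1 over the grid
```

A smaller `sigma_frac`, such as the 1/60 used for the 3-day dataset, yields narrower peaks and therefore a finer segmentation after the watershed step.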

Egestion detection

Estimated egestion time points were calculated by first extracting the width profile of Hydra from the pre-processing step, then filtering the width profile by taking the mean width during 15 min after each time point t, and the mean width during 15 min before time t, and subtracting the former from the latter. Peaks were detected on the resulting trace and were regarded as egestion behaviors, since they represent a sharp decrease in the thickness of the animals.
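The width-based filter can be sketched as follows. This is a minimal illustration in which `win` is the number of samples that spans 15 min at the recording's sampling rate; the peak criterion, the `min_height` parameter and the function names are assumptions rather than the authors' settings.

```python
def width_drop_trace(widths, win):
    """Mean width before t minus mean width after t, for each valid t."""
    n = len(widths)
    trace = [0.0] * n
    for t in range(win, n - win + 1):
        before = sum(widths[t - win:t]) / win
        after = sum(widths[t:t + win]) / win
        trace[t] = before - after  # large positive value = sharp narrowing
    return trace


def find_peaks(trace, min_height):
    """Indices of local maxima that reach at least min_height."""
    return [
        t for t in range(1, len(trace) - 1)
        if trace[t] >= min_height
        and trace[t] > trace[t - 1]
        and trace[t] >= trace[t + 1]
    ]
```

For a width profile that drops abruptly from one plateau to another, the trace peaks at the time of the drop, which is the estimated egestion time point.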

Behavior experiments

All Hydra used for experiments were fed three times a week and were cultured at 18°C. On non-feeding days, the culture medium was changed. Hydra viridissima was cultured at room temperature under sunlight coming through the laboratory windows. For imaging, animals were placed in a petri dish under the microscope and left undisturbed to habituate for at least 30 min. Imaging typically started between 7 pm and 9 pm, and ended between 9 am and 11 am, except for the large/small experiments. All imaging was performed with environmental light excluded by a black curtain placed around the microscope. For the dark condition, a longpass filter with a cutoff wavelength of 650 nm (Thorlabs, FEL0650) was placed in the source light path to create ‘Hydra darkness’ (Passano and McCullough, 1962). For the starved condition, Hydra were fed once a week. For the large/small experiment, Hydra buds that had detached from their parents within the previous 3 days were chosen as small Hydra, and mature post-budding Hydra polyps were chosen as large Hydra. There was a two- to threefold size difference between small and large Hydra when they were relaxed. However, since the Hydra body was constantly contracting and elongating, it was difficult to measure the exact size. Imaging for this experiment was done during the daytime for 1 hr per Hydra.

Statistical analysis

All statistical analyses were done using the Wilcoxon rank-sum test unless otherwise indicated. Data are presented as mean ± S.E.M. unless otherwise indicated.

Resource availability

The code for the method developed in this paper is available at https://github.com/hanshuting/Hydra_behavior. A copy is archived at https://github.com/elifesciences-publications/hydra_behavior (Han, 2018b). The annotated behavior dataset is available on Academic Commons (dx.doi.org/10.7916/D8WH41ZR).

References

  7. C-C Chang, C-J Lin (2011) LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2:1–27. https://doi.org/10.1145/1961189.1961199
  11. N Dalal, B Triggs, C Schmid (2006) Human detection using oriented histograms of flow and appearance. Lecture Notes in Computer Science, Springer-Verlag, pp. 428–441. https://doi.org/10.1007/11744047_33
  12. N Dalal, B Triggs (2005) Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), IEEE, pp. 886–893. https://doi.org/10.1109/CVPR.2005.177
  16. T Fujisawa, E Hayakawa (2012) Peptide signaling in Hydra. The International Journal of Developmental Biology 56:543–550. https://doi.org/10.1387/ijdb.113477tf
  19. S Han (2018a) Hydra behavior dataset. Columbia Academic Commons.
  20. S Han (2018b) hydra_behavior. GitHub.
  23. MJ Johnson, D Duvenaud, AB Wiltschko, SR Datta, RP Adams (2016) Composing graphical models with neural networks for structured representations and fast inference. Advances in Neural Information Processing Systems 29:514–521.
  26. Y Ke, R Sukthankar, M Hebert (2007) Spatio-temporal shape and flow correlation for action recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2007.383512
  27. O Koizumi, Y Haraguchi, A Ohuchida (1983) Reaction chain in feeding behavior of hydra: different specificities of three feeding responses. Journal of Comparative Physiology. A, Sensory, Neural, and Behavioral Physiology. https://doi.org/10.1007/BF00605293
  29. I Laptev, M Marszalek, C Schmid, B Rozenfeld (2008) Learning realistic human actions from movies. 2008 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 1–8. https://doi.org/10.1109/CVPR.2008.4587756
  31. HM Lenhoff, WF Loomis (1961) The Biology of Hydra: And of Some Other Coelenterates.
  33. SG Lenhoff, HM Lenhoff (1986) Hydra and the Birth of Experimental Biology - 1744. Boxwood Pr.
  34. GO Mackie (1974) Coelenterate Biology: Reviews and New Perspectives. Elsevier.
  37. P Matikainen, M Hebert, R Sukthankar (2009) Trajectons: Action recognition through the motion analysis of tracked features. 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, IEEE, pp. 514–521.
  40. AS Ogale, A Karapurkar, Y Aloimonos (2007) View-Invariant Modeling and Recognition of Human Actions Using Grammars. Dynamical Vision, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 115–126.
  42. LM Passano, CB McCullough (1964) Co-ordinating systems and behaviour in hydra: i. pacemaker system of the periodic contractions. The Journal of Experimental Biology 41:643–664.
  43. F Perronnin, J Sánchez, T Mensink (2010) Improving the Fisher Kernel for Large-Scale Image Classification. Lecture Notes in Computer Science, pp. 143–156. https://doi.org/10.1007/978-3-642-15561-1_11
  48. SJ Schiff (2012) Neural Control Engineering: The Emerging Intersection Between Control Theory and Neuroscience. MIT Press.
  53. J Sun, X Wu, S Yan, L-F Cheong, T-S Chua, J Li (2009) Hierarchical spatio-temporal context modeling for action recognition. 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 2004–2011. https://doi.org/10.1109/CVPR.2009.5206721
  55. E Taralova, F De la Torre, M Hebert (2011) Source constrained clustering. 2011 International Conference on Computer Vision, pp. 1927–1934.
  57. A Trembley (1744) Mémoires Pour Servir À l’Histoire D’un Genre De Polypes D’eau Douce, Bras en Forme De Cornes. A Leide: Chez Jean & Herman Verbeek.
  58. L Van Der Maaten (2009) Learning a parametric embedding by preserving local structure. JMLR Proc 5:384–391.
  59. CS Venegas-Barrera, J Manjarrez (2011) Visual Categorization with Bags of Keypoints. Revista Mexicana De Biodiversidad 82:179–191.
  60. H Wang, A Kl, C Schmid, C-L Liu (2011) Action Recognition by Dense Trajectories. pp. 3169–3176. https://doi.org/10.1109/CVPR.2011.5995407
  61. H Wang, MM Ullah, A Klaser, I Laptev, C Schmid (2009) Evaluation of local spatio-temporal features for action recognition. BMVC 2009 - Br. Mach. Vis. Conf. pp. 124.1–12124.

Decision letter

  1. Ronald L Calabrese
    Reviewing Editor; Emory University, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Comprehensive machine learning analysis of Hydra behavior reveals a stable behavioral repertoire" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Ronald Calabrese, is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Eve Marder as the Senior Editor. The following individual involved in review of your submission has also agreed to reveal his identity: Gordon J Berman.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This is an interesting manuscript that reports the development of a novel behavioral analysis pipeline using approaches from computer vision to assess various natural behaviors compatible with variable observation angles. It extracts visual information from videos of animals with arbitrary motion and deformation. It then integrates a few modern computational methods, including dense trajectory features, Fisher vectors and t-SNE embedding, for robust recognition and classification of Hydra behaviors within the bag-of-words framework. The pipeline, which uses both supervised and unsupervised techniques, is suitable for use not only with Hydra, as demonstrated here; compared with previously developed methods, it is particularly suitable for behaviors in natural conditions that involve animals with deformable body shapes. The pipeline is used to describe behaviors in Hydra and is successful in recovering previously described behaviors and in identifying novel ones. The paper then goes on to specify the frequency and variance of these behaviors under a variety of conditions (e.g. fed vs. unfed) and, surprisingly, finds similar behavioral statistics. The authors conclude that the behavioral repertoire of Hydra is robust, which may reflect homeostatic neural principles or a particularly stable ground state of the nervous system. Comparisons with another distantly related Hydra species interestingly reveal some strong differences.

Essential revisions:

The reviewers found that there was a substantial contribution to methodology and behavioral analyses of Hydra in this paper but had concerns that these contributions were obscured by the explanation of the methodology and the presentation and interpretation of the behavioral results. Their concerns are well summarized in the thorough review discussion. Reviewer #2 commented "I agree with the importance of quantifying Hydra behavior but have two reservations:

1) The choices for their particular pipeline need to be clarified and discussed. This includes Reviewer #3's concern on the 5-second window but extends to other choices in the stream. As machine learning techniques advance it is getting much easier to represent video information with numbers and I expect that as a result we will see many future advances in behavioral representation. However, these representations are often idiosyncratic and so it is important to understand what aspects are universal, or at the very least include a discussion about various choices. I also think it is important to discuss what kind of behavior might be missing in this approach.

2) They need to do a better job of motivating and discussing the important questions that their quantitative behavioral repertoire can answer. This is the science of behavior, not simply representation of videos as numbers. And it's here that we can learn how Hydra compares to worms and flies and what we might expect to find in the neural recordings."

Reviewer #3 then commented "I think that Reviewer #2 put it well. I would like to see them talk a bit more about:

1) The motivations for their representational choices and, importantly, the consequences and implications that these choices have.

2) How they anticipate using these numbers to answer biological questions. Any measurement representation should have a rational relationship to the questions being asked, so addressing what their method will be useful for (and what it won't be useful for) will be valuable for the literature moving forward. Will this simply be a better detection technique, or does their unsupervised approach allow the field fundamentally new types of behavioral measurements?"

These concerns and the minor issues from the original review that are appended can form the basis of a revision that clarifies the methodology and brings out the behavioral significance revealed by the new methodology.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Comprehensive machine learning analysis of Hydra behavior reveals a stable behavioral repertoire" for further consideration at eLife. Your revised article has been favorably evaluated by Eve Marder (Senior Editor), a Reviewing editor, and three reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

The authors have endeavored to address many of the major concerns of the previous review with further explanations of the method and its underlying assumptions and method choices, and the inherent limitations of the method/analysis. They have also tried to clarify how their method can be used to answer biological questions. There is still one major concern.

1a) The authors should try windows greater than 5 seconds. It's hardly surprising that less than five seconds is less effective, but why not 8, 10, 20? Just saying "we noticed 5 seconds is a reasonable length to define a single behavior" is hardly convincing (neither is Figure 1C).

1b) It still remains possible that the highly fragmented t-SNE representation results from the fact that behaviors are unnecessarily chopped up by imposing a 5-second window. Problems might occur because the behavior spans a window boundary. The analysis should be performed using a sliding 5-second window rather than separated windows. This may remove some of the observed over-segmentation of the space. There are several methods (including one in Berman, 2014, but others as well) for handling tens to hundreds of millions of data points. Since the space is one of the cruxes of the paper's arguments, and the authors might get better results with the sliding window, it seems somewhat remiss to not attempt this (it would be ~24 hours of running time on a machine that can handle a 30,000-point t-SNE). The Barnes-Hut implementation from https://lvdmaaten.github.io/tsne/ may prove helpful.

https://doi.org/10.7554/eLife.32605.038

Author response

1) The choices for their particular pipeline need to be clarified and discussed. This includes Reviewer #3's concern on the 5-second window but extends to other choices in the stream. As machine learning techniques advance it is getting much easier to represent video information with numbers and I expect that as a result we will see many future advances in behavioral representation. However, these representations are often idiosyncratic and so it is important to understand what aspects are universal, or at the very least include a discussion about various choices. I also think it is important to discuss what kind of behavior might be missing in this approach.

Our goal was to describe all possible Hydra behavior quantitatively. For this purpose, we chose the bag-of-words (BoW) framework, which captures the overall statistics of a dataset with a given time frame and has demonstrated success in deformable human action classification tasks. The BoW framework originated from document classification tasks in the machine learning field. In this framework, documents are considered “bags” of words, and are then represented by a histogram of word counts using a common dictionary. These histogram representations are demonstrated to be efficient for classifying document types. In computer vision, the BoW framework considers instead pictures or videos as “bags” of visual words such as small patches in the images, or shape and motion features extracted from such patches. Compared with another popular technique, template matching, BoW is more robust against challenges such as occlusion, position, orientation, and viewing angle changes. It also proves to be successful in capturing object features in various scenes and is thus one of the most important concepts and cutting-edge techniques in this field. For behavior recognition tasks of deformable animals, BoW is therefore ideally suited for the problem.

In the BoW framework, we chose to segment the Hydra from the background; to scale and register the Hydra to eliminate variance introduced by size and orientation; to use dense trajectory features, which capture both shape and motion statistics; to use Gaussian Mixture codebooks and Fisher vectors to encode the features in a probabilistic way; to classify Hydra behaviors with standard SVM classifiers; and to identify behavior types in an unsupervised way with t-SNE embedding, which has demonstrated success in fly behavior analysis. Although these choices may seem arbitrary, they are anchored on the structure of the data and the task at hand, as we explain below. Our developed framework is a modified version of the original BoW framework, which is simply a normalized histogram representation of selected visual features. This modification includes the key steps of 3 body part segmentation, dense trajectory features, and Fisher vector encoding. We also compared the supervised classification performance of histogram representation vs. Fisher vector representation, the effect of introducing body part segmentation of 3 and 6 segments (Figure 2—figure supplement 1B), different time window sizes (Figure 2—figure supplement 1A), and different Gaussian Mixture codebook sizes (Figure 2—figure supplement 1C). Our choice of framework and parameters proves to be well suited, considering both training and validation accuracy, as well as generalizability on test datasets. Although it is too early to say whether BoW will be adopted by computational ethologists and neuroscientists, our developed framework is in principle applicable to all organisms, since it does not rely on information specific to Hydra, and it provides a stepping stone to developing more sophisticated behavioral methods.

As a limitation, we should mention that our framework is constrained by the lack of temporal information, which is lost in the bag-of-words approach. Nevertheless, we show that we can still encode Hydra behavior even when we do not model the temporal information explicitly. BoW also does not model fine behaviors at the level of single tentacle twitching, or local muscle twitching in the body column. This would require an explicit model of the Hydra body, instead of the statistical bag-of-words model. Depending on the specific biological question, more specialized methods could be developed in the future to investigate these behavior differences.

Revision:

We expanded a paragraph in subsection “A machine learning method for quantifying behavior of deformable animals”, to discuss the general choice of BoW for our behavior recognition task:

“To tackle the problem of measuring behavior in a deformable animal, we developed a novel analysis pipeline using approaches from computer vision that have achieved success in human action classification tasks (Ke et al., 2007; Laptev et al., 2008; Poppe, 2010; Wang et al., 2009, 2011). Such tasks usually involve various actions and observation angles, as well as occlusion and cluttered background. Therefore, they require more robust approaches to capture stationary and motion statistics, compared to using pre-defined template-based features. In particular, the bag-of-words (BoW) framework is an effective approach for extracting visual information from videos of animals with arbitrary motion and deformation. The BoW framework originated from document classification tasks with machine learning. In this framework, documents are considered “bags” of words, and are then represented by a histogram of word counts using a common dictionary. These histogram representations are demonstrated to be efficient for classifying document types. In computer vision, the BoW framework considers pictures or videos as “bags” of visual words such as small patches in the images, or shape and motion features extracted from such patches. Compared with another popular technique, template matching, it is robust against challenges such as occlusion, position, orientation, and viewing angle changes. It also proves to be successful in capturing object features in various scenes and is thus one of the most important concepts and cutting edge techniques in this field. For behavior recognition tasks of deformable animals, it is therefore ideally suited for the problem.”

We modified the following paragraph to discuss the specific modifications we made to the original BoW framework, subsection “A machine learning method for quantifying behavior of deformable animals”:

“We modified the BoW framework by integrating a few state-of-the-art computational methods, including body part segmentation, which introduces spatial information; dense trajectory features, which encode shape and motion statistics in video patches; and Fisher vectors, which represent visual words in a statistical manner. Our choice of framework and parameters proves to be quite adequate, considering both its training and validation accuracy, as well as the generalizability on test datasets (Figure 2—figure supplement 1). Indeed, the robust correspondence between supervised, unsupervised and manual classification that we report provides internal cross-validation of the validity and applicability of our machine learning approach. Our developed framework, which uses both supervised and unsupervised techniques, is in principle applicable to all organisms, since it does not rely on specific information of Hydra. Compared with previously developed methods, our method is particularly suitable for behaviors in natural conditions that involve deformable body shapes, as a first step to developing more sophisticated behavioral methods in complex environments for other species.”

We also introduced a paragraph with discussions concerning the potential drawbacks of our method, subsection “A machine learning method for quantifying behavior of deformable animals”:

“Our goal was to describe all possible Hydra behavior quantitatively. Because of this, we chose the bag-of-words framework, which captures the overall statistics within a given time frame. We defined the length of basic behavior elements to be 5 seconds, which maximizes the number of behaviors that were kept intact while uncontaminated by other behavior types (Figure 1C–D). The bag-of-words framework has shown success in human action classification tasks; here we improved the basic bag-of-words framework by densely sampling feature points in the videos and allowing soft feature quantization with Gaussian Mixture codebook and Fisher vector encoding. However, it should be noted that our approach could not capture fine-level behavior differences, e.g. single tentacle behavior. This would require modeling the animal with an explicit template, or with anatomical landmarks as demonstrated by deformable human body modeling with wearable sensors. Our approach also does not recover transition probabilities between behavior types, or behavioral interactions between individual specimens. In fact, since our method treats each time window as an independent “bag” of visual words, there was no constraint on the temporal smoothness of classified behaviors. Classifications were allowed to be temporally noisy; therefore, they could not be applied to temporal structure analysis. A few studies have integrated state-space models for modeling both animal and human behavior (Gallagher et al., 2013; Ogale et al., 2007; Wiltschko et al., 2015), while others have used discriminative models such as Conditional Random Field models for activity recognition (Sminchisescu et al., 2006; Wang and Suter, 2007). These methods may provide promising candidates for modeling behavior with temporal structure in combination with our approach (Poppe, 2010).”

In the Materials and methods section, we added discussions to justify our choice of framework and parameters, as follows. In subsection “Video pre-processing”:

“… To separate the Hydra region into three body parts, the part under the upper body square mask excluding the body column was defined as the tentacle region, and the rest of the mask was split at the minor axis of the ellipse; the part close to the tentacle region was defined as the upper body region, and the other as the lower body region. This step was shown to improve representation efficiency (Figure 2—figure supplement 1B).”

In subsection “Feature extraction”:

“…All parameters above were cross-validated with the training and test datasets.”

And in subsection “Gaussian mixture codebook and Fisher vector”:

“In this paper, the GMM size was set to 128 with cross-validation (Figure 2—figure supplement 1C).”

Along with the revised text, we provided a supplementary figure (Figure 2—figure supplement 1) to justify our specific choices of framework and parameters.

We believe that these modifications together will make our choice of framework and specific steps stronger and will provide a more comprehensive view of our choices.
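For readers unfamiliar with Fisher vector encoding, one widely used formulation for a diagonal-covariance Gaussian mixture is sketched below. The source does not state which variant was used, so this is an assumption for orientation only. Here $x_1,\dots,x_T$ are the $D$-dimensional features in one time window, and $w_k$, $\mu_k$, $\sigma_k$ are the weight, mean and standard deviation of component $k$ out of $K$; the division and squaring are element-wise.

```latex
\begin{align}
\gamma_t(k) &= \frac{w_k \,\mathcal{N}(x_t;\, \mu_k, \sigma_k^2)}
                    {\sum_{j=1}^{K} w_j \,\mathcal{N}(x_t;\, \mu_j, \sigma_j^2)} \\
\mathcal{G}_{\mu,k} &= \frac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\,
                       \frac{x_t - \mu_k}{\sigma_k} \\
\mathcal{G}_{\sigma,k} &= \frac{1}{T\sqrt{2 w_k}} \sum_{t=1}^{T} \gamma_t(k)
                       \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]
\end{align}
```

Concatenating $\mathcal{G}_{\mu,k}$ and $\mathcal{G}_{\sigma,k}$ over all components gives a vector of dimension $2KD$ in this formulation, which is $256D$ for the GMM size of $K = 128$ reported above. The feature dimension $D$ is not given in this section.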

2) They need to do a better job of motivating and discussing the important questions that their quantitative behavioral repertoire can answer. This is the science of behavior, not simply representation of videos as numbers. And it's here that we can learn how Hydra compares to worms and flies, and what we might expect to find in the neural recordings."

Thank you for the comments. Quantitative behavior recognition and measurement methods provide an important tool for investigating behavioral differences under various conditions from large datasets, allow the discovery of behavior features that are beyond the capability of the human visual system, and define a uniform standard for describing behaviors across conditions. But beyond the purely ethological questions that such methods could answer, they also allow researchers to address potential neural mechanisms by providing a standard and quantitative measurement of the behavioral output of the nervous system. Both ethological and neuroscience applications seem important, and our approach is well poised for these tasks.

Our method also enables the recognition and quantification of Hydra behaviors in an automated fashion. Because of this, it provides a quantitative and objective tool to characterize the behavior differences of Hydra under pharmacological assays, lesion studies, and optogenetic activation of subsets of neurons, and to test for the existence of more advanced behaviors such as learning and social behavior. As a proof of concept, it also allows testing quantitative models of behaviors in Hydra, investigating the underlying neural activity patterns of each behavior, and predicting the behavioral output from neural activity. As the first pass of its kind, our method opens the possibility of discovering interesting behavioral mechanisms in Hydra. And why is Hydra interesting? We would argue that, as a cnidarian, Hydra’s nervous system represents one of the earliest nervous systems in evolution. Thus, studying Hydra behavior as the output of this primitive nervous system would provide insight into how the nervous system adapts to the changing environment and further evolves.

Revision:

We modified the first and second paragraphs of the Introduction section to integrate the above discussion:

“Animal behaviors are generally characterized by an enormous variability in posture and the motion of different body parts, even if many complex behaviors can be reduced to sequences of simple stereotypical movements (Berman et al., 2014; Branson et al., 2009; Gallagher et al., 2013; Srivastava et al., 2009; Wiltschko et al., 2015; Yamamoto and Koganezawa, 2013). As a way to systematically capture this variability and compositionality, quantitative behavior recognition and measurement methods could provide an important tool for investigating behavioral differences under various conditions from large datasets, allowing for the discovery of behavior features that are beyond the capability of the human visual system, and defining a uniform standard for describing behaviors across conditions (Egnor and Branson, 2016). In addition, much remains unknown about how the specific spatiotemporal pattern of activity of the nervous system integrates external sensory inputs and internal neural network states in order to selectively generate different behaviors. Thus, automatic methods to measure and classify behavior quantitatively allow researchers to address potential neural mechanisms by providing a standard measurement of the behavioral output of the nervous system.”

“Indeed, advances in calcium imaging techniques have enabled the recording of large neural populations (Chen et al., 2013; Jin et al., 2012; Kralj et al., 2012; St-Pierre et al., 2014; Tian et al., 2009; Yuste and Katz, 1991) and whole brain activity from small organisms such as C. elegans and larval zebrafish (Ahrens et al., 2013; Nguyen et al., 2016; Prevedel et al., 2014). A recent study has demonstrated that the cnidarian Hydra can be used as an alternative model to image the complete neural activity during behavior (Dupre and Yuste, 2017). As a cnidarian, Hydra is closer to the earliest animals in evolution that possess a nervous system. As an important function of the nervous system, animal behaviors allow individuals to adapt to the environment at a time scale that is much faster than natural selection, and drive rapid evolution of the nervous system, providing a rich context to study nervous system functions and evolution (Anderson and Perona, 2014). As the Hydra nervous system evolved from the nervous system present in the last common ancestor of cnidarians and bilaterians, the behaviors of Hydra could also represent some of the most primitive examples of coordination between a nervous system and non-neuronal cells, making it relevant to our understanding of the nervous systems of model organisms such as C. elegans, Drosophila, zebrafish, and mice, as it provides an evolutionary perspective to discern whether neural mechanisms found in a particular species represent a specialization or are generally conserved. In fact, although Hydra behavior has been studied for centuries, it is largely unknown whether Hydra possesses complex behaviors such as social behavior and learning behavior, how its behavior changes under environmental, physiological, nutritional or pharmacological manipulations, and the underlying neural mechanisms of the potential changes. Having an unbiased and automated behavior recognition and quantification method would therefore enable such studies with large datasets, allowing us to address the behavioral differences of Hydra with pharmacological assays, lesion studies, environmental and physiological condition changes, under activation of subsets of neurons, testing quantitative models of Hydra behaviors, and linking behavior outputs with the underlying neural activity patterns.”

Reviewer #3 then commented "I think that Reviewer #2 put it well. I would like to see them talk a bit more about:

1) The motivations for their representational choices and, importantly, the consequences and implications that these choices have.

If we have to point to a single reason, we chose the current framework specifically because of its advantage in dealing with deformable shapes. But, as discussed in the response to Reviewer #2’s first question, many specific choices in the framework were made for additional advantages: segmentation and registration eliminate variance caused by background noise and orientation; body part segmentation introduces spatial information to the BoW framework; dense trajectory features maximize the information captured by the features; and the Gaussian Mixture codebook and Fisher vectors avoid the inaccuracy of hard encoding from simple k-means codebook and histogram representations. These modifications to the original BoW framework greatly improved the representation efficiency, as shown by an overall increase in classification accuracy and generalization ability (Figure 2—figure supplement 1).

As discussed in the response to Reviewer #2’s first question, one important consequence of our framework is the lack of temporal information, which could be addressed in future work.

Revision:

We made the changes described in answer to Reviewer #2’s first question.

2) How they anticipate using these numbers to answer biological questions. Any measurement representation should have a rational relationship to the questions being asked, so addressing what their method will be useful for (and what it won't be useful for) will be valuable for the literature moving forward. Will this simply be a better detection technique, or does their unsupervised approach allow the field fundamentally new types of behavioral measurements?

These concerns and the minor issues from the original review that are appended can form the basis of a revision that clarifies the methodology and brings out the behavioral significance revealed by the new methodology.

Thank you for the comments. Our method could be used to recognize and quantify Hydra behaviors from large datasets in an automated and consistent way and would allow us to address questions at the level of behavior repertoire statistics. As discussed in the answer to Reviewer #2’s second question, our method provides the possibility to study the behavioral changes of Hydra under various conditions such as pharmacological regulations, lesions, and environmental and physiological changes. It also provides a tool to investigate the existence of complex behaviors such as social and learning behaviors, as well as to build and test quantitative models of Hydra behaviors. In this paper, we demonstrated that we can use this method to investigate the behavioral difference under different conditions (e.g. fed/starved). Importantly, our framework is not limited to Hydra; it is potentially applicable to all animal models since it does not rely on assumptions about the specific features of Hydra. However, as pointed out above, our method lacks temporal information and therefore cannot be used to model any particular behavioral sequences and the differences therein.

In addition, the unsupervised approach depends heavily on the quality of encoded features. Since the BoW model provides only a statistical description of videos, the features do not encode fine differences in behaviors. Therefore, the types of behavior that can be identified and quantified by the unsupervised approach have the same constraints as described in the response to Reviewer #2’s first question.

Revision:

Besides the changes we made in response to Reviewer #2’s second question, we added a paragraph in subsection “Hydra as a model system for investigating neural circuits underlying behavior”, to address the second part of this comment, concerning the usefulness of our method:

“With our method, we demonstrate that we are able to recognize and quantify Hydra behaviors automatically and identify novel behavior types. This allows us to investigate the behavioral repertoire stability under different environmental, physiological and genetic conditions, providing insight into how a primitive nervous system adapts to its environment. Although our framework does not currently model temporal information directly, it serves as a stepping-stone towards building more comprehensive models of Hydra behaviors. Future work that incorporates temporal models would allow us to quantify behavior sequences, and to potentially investigate more complicated behaviors in Hydra such as social and learning behaviors.”

We also revised the last paragraph of subsection “A machine learning method for quantifying behavior of deformable animals”, in response to the question “will this simply be a better detection technique, or does their unsupervised approach allow the field fundamentally new types of behavioral measurements”:

“In our pipeline, we applied both supervised and unsupervised approaches to characterize Hydra behavior. In supervised classifications (with SVM), we manually defined seven types of behaviors, and trained classifiers to infer the label of unknown samples. In unsupervised analysis (t-SNE), we did not pre-define behavior types, but rather let the algorithm discover the structures that were embedded in the behavior data. In addition, we found that unsupervised learning could discover previously unannotated behavior types such as egestion. However, the types of behaviors discovered by unsupervised analysis are limited by the nature of the encoded feature vectors. Since the bag-of-words model provides only a statistical description of videos, those features do not encode fine differences in behaviors. Due to this difference, we did not apply unsupervised learning to analyze the behavior statistics under different environmental and physiological conditions, as supervised learning appears more suitable for applications where one needs to assign a particular label to a new behavior video.”

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The authors have endeavored to address many of the major concerns of the previous review with further explanations of the method and its underlying assumptions and method choices, and the inherent limitations of the method/analysis. They have also tried to clarify how their method can be used to answer biological questions. There is still one major concern.

1a) The authors should try windows greater than 5 seconds. It's hardly surprising that less than five seconds is less effective, but why not 8, 10, 20? Just saying "we noticed 5 seconds is a reasonable length to define a single behavior" is hardly convincing (neither is Figure 1C).

We have now tested window sizes of 8, 10 and 20 seconds with our analysis framework, and compared the training, validation and test classification accuracy. The results show that the 5-second time window still performs best on all the accuracy measurements (Figure 2—figure supplement 1A). Therefore, we believe 5 seconds is a reasonable length to define a single behavior with our method.

Revision:

We modified Figure 2—figure supplement 1A to include the classification accuracy of window sizes greater than 5 seconds, and modified the corresponding text (subsection “Capturing the movement and shape statistics of freely-moving Hydra”):

“Our goal was to […] A post hoc comparison of different window sizes (1-20 seconds) with the complete analysis framework also demonstrated that 5-second windows result in the best performance (Figure 2—figure supplement 1A). Therefore, we chose 5 seconds as the length of a behavior element in Hydra.”

We also modified the corresponding Figure legend.

1b) It still remains possible that the highly-fragmented t-SNE representation results from the fact that behaviors are unnecessarily chopped up by imposing a 5-second window. Problems might occur because the behavior spans a window boundary. The analysis should be performed using a sliding 5-second window rather than separated windows. This may remove some of the observed over-segmentation of the space. There are several methods (including one in Berman, 2014, but others as well) for handling tens to hundreds of millions of data points. Since the space is one of the cruxes of the paper's arguments, and the authors might get better results with the sliding window, it seems somewhat remiss to not attempt this (it would be ~ 24 hours of running time on a machine that can handle a 30,000-point t-SNE). The Barnes-Hut implementation from: https://lvdmaaten.github.io/tsne/ may prove helpful.

We performed t-SNE with the fast Barnes-Hut implementation on the complete training dataset of 50 Hydra, with a sliding 5-second window. The resulting space did not show improved segmentation (see Author response image 1). It is possible that, by performing sliding windows, the highly-overlapping windows introduce more local structures that represent the similarities within each individual, rather than within each behavior category. As t-SNE is designed for discovering local similarity structures, this results in the highly segmented embedding map as shown below. We believe that the original embedding analysis presented in the manuscript represents a proof-of-concept demonstration of categorizing behavior types with unsupervised methods, without too much bias created by individual similarities.

Author response image 1
t-SNE embedding of continuous time windows.

a, Scatter plot with embedded Fisher vectors from 50 Hydra. Each dot represents projection from a high-dimensional Fisher vector to its equivalent in the embedding space. The Fisher vectors were encoded from continuous 5-second windows with an overlap of 24 frames. Color represents the manual label of each dot. b, Segmented density map generated from the embedding scatter plot. c, Behavior motif regions defined using the segmented density map. d, Labeled behavior regions with manual labels. Color represents the corresponding behavior type of each region.
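The difference between separated windows and sliding windows with a 24-frame overlap can be sketched as follows. This is an illustration only: the frame rate of 5 frames per second is an assumption (chosen because it makes a 5-second window 25 frames long, so that a 24-frame overlap corresponds to a one-frame step), and the helper `window_starts` and the one-minute recording length are hypothetical.

```python
def window_starts(n_frames, window_len, stride):
    """Start indices of every full window of window_len frames, stepping by stride."""
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    return list(range(0, n_frames - window_len + 1, stride))

FPS = 5                 # assumed frame rate, not stated in this section
WINDOW_LEN = 5 * FPS    # a 5-second window is 25 frames under that assumption
n_frames = 60 * FPS     # hypothetical one-minute recording

# Separated windows: each window starts where the previous one ends.
separated = window_starts(n_frames, WINDOW_LEN, stride=WINDOW_LEN)

# Sliding windows: step by one frame, so neighbours share all but one frame.
sliding = window_starts(n_frames, WINDOW_LEN, stride=1)

overlap = WINDOW_LEN - 1  # frames shared by consecutive sliding windows
```

Under these assumed numbers, one minute of video yields 12 separated windows but 276 sliding windows, and consecutive sliding windows share 24 of their 25 frames. Such near-duplicate samples are consistent with the explanation given in the response above, that highly overlapping windows add local similarity structure within each individual.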

https://doi.org/10.7554/eLife.32605.035
https://doi.org/10.7554/eLife.32605.039

Article and author information

Author details

  1. Shuting Han

    NeuroTechnology Center, Department of Biological Sciences, Columbia University, New York, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    shuting.han@columbia.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9315-3089
  2. Ekaterina Taralova

    NeuroTechnology Center, Department of Biological Sciences, Columbia University, New York, United States
    Contribution
    Resources, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Christophe Dupre

    NeuroTechnology Center, Department of Biological Sciences, Columbia University, New York, United States
    Contribution
    Resources, Data curation, Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5929-8492
  4. Rafael Yuste

    NeuroTechnology Center, Department of Biological Sciences, Columbia University, New York, United States
    Contribution
    Conceptualization, Funding acquisition, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4206-497X

Funding

Defense Advanced Research Projects Agency (HR0011-17-C-0026)

  • Rafael Yuste

Howard Hughes Medical Institute (Howard Hughes Medical Institute International Student Research Fellowship)

  • Shuting Han

Grass Foundation (Grass Fellowship)

  • Christophe Dupre

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Drs. Robert Steele, Charles David, and Adrienne Fairhall for discussions. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-17-C-0026. SH is a Howard Hughes Medical Institute International Student Research fellow. This work was partly supported by the Grass Fellowship (CD) during the summer of 2016, and CD would like to thank the Director, Associate Director, members of the Grass laboratory and Grass Foundation for their generous feedback and support. RY was a Whitman fellow at the Marine Biological Laboratory and this Hydra research was also supported in part by competitive fellowship funds from the H Keffer Hartline, Edward F MacNichol, Jr. Fellowship Fund, and the E E Just Endowed Research Fellowship Fund, Lucy B. Lemann Fellowship Fund, and Frank R. Lillie Fellowship Fund of the Marine Biological Laboratory in Woods Hole, MA. The authors declare no competing financial interests.

Reviewing Editor

  1. Ronald L Calabrese, Emory University, United States

Publication history

  1. Received: October 9, 2017
  2. Accepted: March 23, 2018
  3. Accepted Manuscript published: March 28, 2018 (version 1)
  4. Version of Record published: April 27, 2018 (version 2)

Copyright

© 2018, Han et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 3,506
    Page views
  • 479
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
