1. Genetics and Genomics
  2. Neuroscience
Download icon

The structure of behavioral variation within a genotype

  1. Zachary Werkhoven
  2. Alyssa Bravin
  3. Kyobi Skutt-Kakaria
  4. Pablo Reimers
  5. Luisa F Pallares
  6. Julien Ayroles
  7. Benjamin L de Bivort  Is a corresponding author
  1. Center for Brain Science and Department of Organismic and Evolutionary Biology, United States
  2. Division of Biology and Biological Engineering, California Institute of Technology, United States
  3. Janelia Research Campus, Howard Hughes Medical Institute, United States
  4. Department of Ecology and Evolutionary Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, United States
Research Article
  • Cited 0
  • Views 1,357
  • Annotations
Cite this article as: eLife 2021;10:e64988 doi: 10.7554/eLife.64988

Abstract

Individual animals vary in their behaviors. This is true even when they share the same genotype and were reared in the same environment. Clusters of covarying behaviors constitute behavioral syndromes, and an individual’s position along such axes of covariation is a representation of their personality. Despite these conceptual frameworks, the structure of behavioral covariation within a genotype is essentially uncharacterized and its mechanistic origins unknown. Passing hundreds of inbred Drosophila individuals through an experimental pipeline that captured hundreds of behavioral measures, we found sparse but significant correlations among small sets of behaviors. Thus, the space of behavioral variation has many independent dimensions. Manipulating the physiology of the brain, and specific neural populations, altered specific correlations. We also observed that variation in gene expression can predict an individual’s position on some behavioral axes. This work represents the first steps in understanding the biological mechanisms determining the structure of behavioral variation within a genotype.

Introduction

Individuals display idiosyncratic differences in behavior that often persist through time and are robust to situational context. Some persistent individual behavioral traits commonly occur in correlated groups and can therefore be said to covary. Behavioral ethologists have long understood that types of human and animal personalities often fall on multivariate axes of variation. A five-dimensional model (Goldberg, 1993) known as The Big Five personality traits is frequently used by psychologists to describe the range of human personality, and a similar model has been used to describe personality in fish (Réale et al., 2007). For example, human propensity for behaviors such as assertiveness, talkativeness, and impulsiveness is collectively described as extraversion and is thought to be anticorrelated with behaviors such as passivity, shyness, and deliberateness, all behaviors associated with introversion (Matthews et al., 2003).

In animal species, correlated suites of behaviors are described as behavioral syndromes. Aggressive behaviors such as fighting over mates or food are frequently correlated with exploratory behaviors such as foraging and social interaction. Correlated aggressive and exploratory behaviors have been observed in insects (Jeanson and Weidenmüller, 2014), arachnids (Johnson and Sih, 2007), fish (Huntingford, 1976), and birds (van Oers et al., 2004). The prevalence of behavioral correlation in so many species suggests that covariation is likely a universal feature of behavior. Although correlated individual differences in behavior are commonly observed, the structure and mechanisms of behavioral covariation are not well understood. Typically, where an individual lands on these behavioral axes is thought to be established by a deterministic confluence of genetic and environmental effects. But there is increasing evidence that substantial individual behavioral variation is rooted in intragenotypic variation (Honegger and de Bivort, 2018; Akhund-Zade et al., 2019). The extent to which intragenotypic behavioral variation is organized into syndromes or axes is essentially uncharacterized.

Substantial variation in specific behavioral measures, even in inbred lines raised in standardized conditions, has been observed in several clonal animals, including geckoes (Sakai, 2018), amazonian mollies (Bierbach et al., 2017), and aphids and nematodes (Schuett et al., 2011; Stern et al., 2017). Genetic model systems hold particular promise for the mechanistic dissection of this variation, and intragenotypic variability (IGV) in behavior has been characterized in mice (Freund et al., 2013), zebrafish (Bierbach et al., 2017), and Drosophila. In flies, IGV of many behaviors has been studied, including phototaxis (Kain et al., 2012), locomotor handedness and wing-folding (Buchanan et al., 2015), spontaneous microbehaviors (Kain et al., 2013; Todd et al., 2017), thermal preference (Kain et al., 2015), and object-fixated locomotion (Linneweber et al., 2020). Mechanistic studies of these behavioral phenomena have addressed two major questions: (1) what biological mechanisms underlie the magnitude of behavioral variability (e.g., genetic variation [Ayroles et al., 2015] or neural state variation [Kain et al., 2012; Buchanan et al., 2015]), and (2) what specific differences within individual nervous systems predict individual behavioral biases (Linneweber et al., 2020; Mellert et al., 2016). The mechanistic basis of individuality is an exciting new field, but no study to date has focused on characterizing the large-scale variance-covariance structure of IGV in behavior.

Behavioral correlations within a genotype could arise through a number of biological mechanisms including cell-to-cell variation in gene expression or individual differences in neural circuit wiring or synaptic weights. For example, stochastic variation during developmental critical windows has the potential to impart lasting differences between individuals in the absence of conspicuous genetic or environmental differences. Any such differences affecting nodes common to multiple behaviors in neural or molecular pathways may result in correlated shifts in behavior. Here, we study this directly by focusing on the correlation structure of behavioral variation when genetic and environmental variation are minimized.

This is an important biological question for several reasons. The structure of intragenotypic behavioral variability (1) will shape the distribution and kinds of personalities that a population of organisms displays, even when they have matched genomes and environments, (2) is the product of stochastic biological outcomes, and its organization reveals how stochasticity drives variation, (3) constrains the evolution of behavior and adaptive phenotypic strategies like bet-hedging (Hopper, 1999), (4) is a relatively uncharacterized component of neural diversity and its manifestations in behavior and disease, and (5) may shed light on how the nervous system orchestrates behavior as a whole. In flies, we have a suitable experimental system for directly characterizing this structure as we can produce large numbers of individuals with nearly identical genomes, reared in the same environment, and collect many behavioral measures per individual. We performed this experiment in wild-type inbred flies as well as wild-type outbred flies and collections of transgenic lines manipulating neural activity. This approach let us contrast the structure of intragenotypic behavioral variability in animals where the source of variability is, respectively, stochastic fluctuations, genetic differences + stochastic fluctuations, and systematic perturbations of the nervous system + stochastic fluctuations. We found that in all cases, behavioral variation has high dimensionality, that is, many independent axes of variation. The addition of variation from genetic differences and neural perturbations did not fundamentally alter this qualitative result, suggesting that stochastic fluctuations and genetic differences may structure behavior through common biological mechanisms.

Results

High-dimensional measurement of individual behaviors

The first step in revealing the structure of behavioral variation within a genotype was to devise an experimental pipeline that produced a data set of many (200+) individual flies, with many behavioral measurements each. We developed a number of behavioral assays, measuring both spontaneous and stimulus-evoked responses of individual flies, which could be implemented in a common experimental platform (Figure 1A; Werkhoven et al., 2019). This instrument features an imaging plane, within which flies moved in arenas of various geometries. Fly position was tracked with digital cameras using diffused infrared illumination invisible to the flies. Visual stimuli were presented to the animals using DLP projectors or LEDs embedded in the arena walls. We implemented six assays (Supplementary file 1) in this style, assessing (1) spontaneous walking in circular arenas, (2) preference to rest in brighter or dimmer positions (in an environment of spatially structured illumination), (3) preference to rest in brighter or dimmer light levels (in a fictive, temporally modulated light environment), (4) optomotor responses to rotating visual stripes, (5) spontaneous left-right decision making in Y-mazes, and (6) phototaxis in Y-mazes, where flies are given a choice of walking toward or away from a lit LED (Figure 1B).

Figure 1 with 12 supplements see all
Decathlon experimental design and structure of intragenotypic behavioral variation.

(A) Schematic of the imaging rig used for most Decathlon experiments. (B) Schematics of the behavioral assays, illustrating the geometry of the arenas and stimulus structure. (C) Timeline of the Decathlon experiment. Colors indicate the assays conducted on each day, half-black half-white blocks indicate the circadian assay and storage in 96-well plates. (D) Timelines of the three Decathlon experiments, indicating the randomized order of assays 2–8. (E) Full correlation matrix of all raw behavioral measures taken in the Decathlon. Colored blocks indicate blocks of measures we thought a priori might be correlated (outer blocks, text labels). Inner blocks indicate assay. (F) Example scatter plots associated with measure correlations. Points are individual flies. Line is the best fit (principal component [PC]1 of these points), gray region is the 95% confidence interval of the fit, as determined by bootstrap resampling. (G) Distilled correlation matrix in which all correlated measures represent unexpected relationships. Meaningful correlations in this matrix can be found outside the within-a-priori-group on-diagonal blocks. (H) Example scatter plot from the distilled correlations. Plot elements as in (F). (I) Scree plot of the ranked, normalized eigenvalues, that is, the % variance explained by each PC, of the distilled behavior matrix, versus PC #. (J) Connected components spectrum (see text and Figure 1—figure supplement 10) for the distilled correlation matrix. Height of bars indicates organization at that dimensionality. (K) Points corresponding to individual flies nonlinearly embedded using t-SNE from the 121-dimensional full matrix to two dimensions. (L) Points corresponding to behavioral measures nonlinearly embedded using t-SNE from the 384-dimensional space of flies to two dimensions. Colors indicate groups of measures we expected a priori to be related.

To these assays, we added three more, assessing (7) odor sensitivity in linear chambers (Claridge-Chang et al., 2009; Honegger et al., 2020) in which half of the compartment is filled with an aversive odorant, (8) spontaneous behavior, acquired via high-resolution 100 Hz video and suitable for pixel-based unsupervised classification (Berman et al., 2014), and (9) circadian activity and spontaneous locomotion in 96-well plates with access to food. Each of these assays produced multiple behavioral measures for each individual fly. For example, flies behaving in the phototactic ‘LED Y-maze’ (assay 6) are performing phototaxis and exploratory locomotion but yield several different behavioral measures, including the number of choices made by passing through the choice point of the Y-maze (a measure of total activity), the fraction of turns that are to the right, the fraction of turns that are toward the lit LED, the number of pauses in which the animal did not move, the average duration of pauses, etc. Thus, the total collection of behavioral measures across all assays per fly was quite large (up to 121), constituting a diverse, inclusive characterization of individual behavior. Each assay has a particular measure that captures the behavior it is primarily designed to assess (e.g., the fraction of turns toward the lit LED in the LED Y-maze). In control experiments, we confirmed that these primary measures are consistent across days within an individual (Figure 1—figure supplement 1; i.e., they reflect persistent idiosyncrasies [Kain et al., 2012; Buchanan et al., 2015; Kain et al., 2015; Honegger et al., 2020]).

In order to obtain all of the behavioral measures from each experimental animal, we combined our behavioral assays in a serial experimental pipeline lasting 13 consecutive days (Figure 1C), generally with one unique assay per day and continuous circadian imaging (assay 9) between assays. This pipeline begins with 3-day-old flies being loaded into the 96-well circadian imaging plates. Using a common behavioral platform as much as possible and storing flies between experiments in 96-well plates made maintaining the errorless identity of flies over the whole 13-day experiment substantially easier. Starting on day 3, daily assays began. On each day, flies were lightly anesthetized on an ice-chilled plate and aspirated, maintaining their identity, into the assay arrays. After the assay was completed (typically after 2 hr of recording), flies were again lightly anesthetized and returned to 96-well plates for renewed circadian imaging. On the first such day, flies were loaded into an array of circular arenas and imaged for total activity (in a version of assay 1). At this point, the most active 192 flies were retained for further testing. In preliminary experiments, we found that flies that were inactive at the beginning of the pipeline were very unlikely to produce substantial amounts of data over the rest of the pipeline. With the addition of this activity-screening assay, the total number of experiments was 10, and as each fly ‘competes’ in all 10 events, we refer to the entire pipeline as a Decathlon.

It is possible that the assay order has some effect on the recorded behavioral measures. So we randomized the assay order between Decathlon implementations as much as possible (Figure 1D), subject to two restrictions: activity screening was always the first assay, and high-resolution imaging for unsupervised analysis (assay 8) was always the last assay. (This assay has lower throughput, and 3 days were required to complete all 168 remaining flies. If this assay were performed earlier in the pipeline, it might introduce heterogeneity across subsequent assays.) When each fly completed its run through all Decathlon assays (i.e., over the 3 days of assay 8 imaging), it was flash-frozen in liquid nitrogen for RNA sequencing.

Behavioral variability among inbred flies has high dimensionality and sparse pairwise correlations

To collect data that would reveal the structure of behavioral variation within a genotype, we conducted two Decathlons using highly inbred, nearly isogenic flies derived from the wild-type strain Berlin-K (BSC#8522; Nöthel, 1981). We confirmed that this strain was, indeed, highly isogenic with genomic sequencing of individual animals, finding ~75 SNPs in the population across the entire genome (Figure 1—figure supplement 2). 115 flies completed the first Decathlon, and 176 the second. While we aimed to collect 121 measures per fly, a portion of values were missing, typically because flies did not meet assay-specific activity cutoffs. The sample sizes achieved for each assay and the distributions of all raw measures are given in Figure 1—figure supplement 3. For subsequent analyses, it was sometimes necessary to have a complete data matrix (see Figure 1—figure supplement 4 for a schematic of all analysis pipelines). We infilled missing values using the alternating least-squares method, which, as judged by analyses of toy ground truth data, performed better than mean-infilling (Figure 1—figure supplement 5). For the sake of maximal statistical power, we wanted to merge the data sets from the two Berlin-Kiso Decathlons. The correlation matrices of these two data sets were not identical, but were substantially more similar than expected by chance (Figure 1—figure supplement 6), implying that while there were inter-Decathlon effects, much of the same structure was present in each and merging them was justified. To do this, we z-score-normalized the data points from each arena array/batch (within each Decathlon) across flies, thus eliminating any arena, assay, and Decathlon effects and enriching the data for contrasts between individuals. A grand data matrix was made by concatenating these batches (382 individuals × 121 behavioral measures).

The full correlation matrix of this Berlin-Kiso data set is shown in Figure 1E. It contains a substantial amount of structure, indicating that large groups of behavioral measures covary. But the covariance of many pairs of measures in the matrix is not surprising. For example, almost all our assays generate some measure of locomotor activity (meanSpeed in circular arenas, number of turns in Y-mazes, meanSpeed in the olfactory tunnels, etc.), and one might expect that especially active flies in one assay will be especially active in another assay. Additional unsurprising structure in this matrix comes in identical measures recorded in each of the 11–13 circadian assays each fly completed. But, even in this first analysis, surprising correlations were evident. For example, flies with higher variation in the inter-turn interval in the olfactory assay (‘clumpiness’ in their olfactory turning) exhibited higher mean speed in the circadian assays, and flies with higher variation in inter-turn intervals in the Y-maze (clumpiness in their Y-maze turning) exhibited lower mutual information in the direction of subsequent turns in the Y-maze (‘switchiness’ in their Y-maze handedness; Figure 1F. See below and Buchanan et al., 2015; Akhund-Zade et al., 2019) for more about these measures.

To produce an exhaustive list of such non-trivial correlations, we distilled the grand correlation matrix to a smaller matrix (the ‘distilled matrix’; Figure 1G) in which two kinds of interesting relationships were revealed: (1) uncorrelated dimensions among measures for which we had a prior expectation of correlation (e.g., if meanSpeed in circular arenas is found to be uncorrelated with meanSpeed in olfactory tunnels), and (2) correlated dimensions among measures for which we had no prior expectation of correlation. Relationships of the former class were identified by enumerating, before we ran any correlation analyses, groups of measures we expected to be correlated (‘a priori groups’; Figure 1E and G). See Supplementary file 2 and Supplementary file 3 for a breakdown of the measures included in each a priori group and their respective inclusion criteria. We looked for surprising independence within such groups by computing the principal components (PCs) of data submatrices defined by the grouping (e.g., for the ‘activity’ a priori group, by running principal component analysis (PCA) on the data set consisting of 382 individual flies, and 57 nominal measures of activity). We then replaced each a priori group submatrix with its projection onto its statistically significant PCs, as determined by a reshuffling analysis (see Materials and methods, Figure 1—figure supplement 7). Some a priori groups largely matched our expectations, with relatively few uncorrelated dimensions among many measures (e.g., the gravitaxis group that had 10 measures and only 2 significant PCs), while others exhibited relatively many uncorrelated dimensions (e.g., the clumpiness group that had five measures and five significant PCs; see Figure 1—figure supplement 7 for all a priori group PCAs).

With a priori group submatrices represented in their respective significant PCs, the grand data matrix now contained 38 behavioral measures. Every significant correlation between behavioral measures at this point represents an unexpected element of structure of behavioral variation (Figure 1G). Given the high dimensionality of the full data set and the complex structure of the behavioral measures in the distilled correlation matrix, we created an online data browser (http://decathlon.debivort.org) to explore the data and compare the alternative correlation matrices. The first impression of the distilled correlation matrix is that it is sparse. Most behavioral measures are uncorrelated or weakly correlated, suggesting that there are many independent dimensions of behavioral variation. However, 176 pairs of behaviors were significantly correlated at a false discovery rate (FDR) of 38%, and the distribution of p-values for the entries in this matrix exhibits a clear enrichment of low values (Figure 1—figure supplement 8), indicating an enrichment of significant correlations. As an example, flies with high values in the first PC of the phototaxis a priori group tend to have high values in the second PC of the activity level a priori group. As another example, the third switchiness PC is positively correlated with the second clumpiness PC (Figure 1H; a relationship that is likely related to the positive correlation between the Y-maze hand clumpiness and Y-maze hand switchiness, Figure 1F). Interpreting the loadings (Figure 1—figure supplements 9 and 10) of these PCs indicates that this is a correlation between olfactory tunnel turn direction switchiness and olfactory tunnel turn timing clumpiness. We detected a substantial number of correlations between different dimensions of switchiness and clumpiness (Figure 1—figure supplement 11), suggesting that there are multiple couplings between these suites of traits.

Stepping back from specific pairwise correlations, we examined the overall geometry of behavioral variation. The full matrix contained 22 significant PCs, with PCs 1–3 explaining 9.3, 6.9, and 5.8% of the variance, respectively (the distilled matrix contained 16 significant PCs, with PCs 1–3 explaining 13.8, 10.0, and 7.7% of the variance, respectively; Figure 1I). But the amount of variance explained across PCs does not provide the full picture of how many dimensions of variation are present in a data set. A correlation matrix can be organized at different scales/hierarchically, so there need not be a single number that characterizes dimensionality. As an intuitive example, data filling a volume shaped like a frisbee is organized largely in two dimensions. Data shaped like a rugby ball is somewhat one-dimensional, not particularly two-dimensional, and somewhat three-dimensional. An approach is needed that can characterize such continuous variation in organization across dimensionalities, particularly the possibility that the data are not structured tidily in a single dimensionality.

We developed a ‘connected components spectrum’ analysis that characterizes the continuously varying degree of organization of a correlation matrix from dimensionality 1 to d, the total dimensionality of the data. Briefly, we thresholded the absolute value of the correlation matrix at values ranging from 1 to d, and, treating these matrices as adjacency matrices, determined the number of connected components, recording how often n connected components were observed. See Materials and methods and Figure 1—figure supplement 12. These connected components spectra can be interpreted as follows: peaks at dimensionality = 1 indicate that all measures are coupled in a network of at least weak correlations; peaks at dimensionality = d indicate that all measures are to some degree uncorrelated; peaks in between these values indicate intermediate scales of organization. Multiple peaks are possible because these kinds of organization are not mutually exclusive. The connected components spectrum (Figure 1J) of the distilled Decathlon data set had peaks at 1 and d (Li and Durbin, 2009). There was also evidence for structure over the full range of intermediate dimensionalities. Overall, the organization is one of predominantly uncorrelated behaviors, with sparse sets of behaviors correlated with continuously varying strengths.

To assess how individual flies are distributed in behavior space, we embedded them from the 384-dimensional space into two dimensions using t-SNE (van der Maaten and Hinton, 2008). There appear to be no discrete clusters corresponding to ‘types’ of flies. Instead, variation among flies appears continuously distributed around a single mode (Figure 1K). The same is true for an embedding of flies as represented in the distilled matrix (Figure 4—figure supplement 3D). We also embedded behavioral measures as points from the 121-dimensional space of flies into two dimensions (Figure 1L). This confirmed that while our intuition for which sets of measures would be similar (the a priori groups) was right in many cases, measures we thought would be similar were often dissimilar across flies, and sometimes measures we did not anticipate being similar were (e.g., phototaxis and activity level).

Behaviors that covary among individuals tend to be patterned similarly in time and across the body

With data from the second Decathlon, we characterized the structure of variation in a set of behaviors that was potentially exhaustive for one behavioral condition (free walking/motion in a 2D arena; Figure 2A). High-speed, high-resolution video was acquired for flies simultaneously in each of two rigs. Over 3 days, we acquired 13.5 GB of 200 × 200 px 100 Hz videos centered on each fly as they behaved spontaneously over the course of 60 min. These frames were fed into an unsupervised analysis pipeline (Berman et al., 2014) that computed high-dimensional representations of these data in the time-frequency domain before embedding them in two dimensions and demarcating boundaries between 70 discrete modes of behavior (Figure 2B). The behavior of each fly was thus represented as one of 70 values at each frame. Flies exhibited a broadly similar probability distribution of performing each of these behaviors (Figure 2C), though there were conspicuous differences among individual patterns of behavior (Figure 2D).

Figure 2 with 1 supplement see all
Correlation structure of unsupervised behavioral classifications.

(A) Schematic of the four camera imaging rig used to acquire single fly videos. (B) Overview of the data processing pipeline from single fly videos to behavioral probability maps. (C) Sample individual behavior mode probability density functions (PDFs visualized in an embedded t-SNE space). Discrete regions correspond to watersheds of the t-SNE embedding. (D) Sample individual PDFs mapped to locations in t-SNE space. Discrete regions correspond to watersheds of the t-SNE embedded probability densities. (E) Correlation matrix (top) for individual PDFs with rows and columns hierarchically clustered. Colored blocks indicate labels applied to classifications post hoc. Example scatter plots (bottom) of individual behavioral probabilities. Points correspond to probabilities for individual flies. Line is the best fit (principal component [PC]1 of these points), gray region is the 95% confidence interval of the fit, as determined by bootstrap resampling. (F) Connected components histogram of the thresholded PDF correlation matrix (see Materials and methods). (G) Discrete behavioral map with individuals zones colored by post hoc labels as in (E). (H) Transition probability matrix for behavioral classifications. Entries in the ith row and jth column correspond to the probability of transitioning from state i to state j over consecutive frames. Blocks on the diagonal indicate clusters of post hoc labels as in (E) and (G).

The correlation matrix of behavioral modes identified in the unsupervised analysis was highly structured (Figure 2E; like the correlation matrix of the rest of the Decathlon measures, Figure 1E), appearing to be strongly organized in 10 or fewer dimensions, with some evidence of organization in higher dimensions (Figure 2F). Correlations between behavioral modes identified by unsupervised clustering and measures from the other Decathlon assays were generally weaker than correlations within these categories, but were nevertheless enriched for significant relationships (Figure 2—figure supplement 1). For this analysis, there was no equivalent of a priori groups of behavioral measures as measures were not defined prior to the analysis. But, in examining sample movies of flies executing each of the 70 unsupervised behavioral modes (Videos 14), it was clear that highly correlated behavioral modes tended to reflect variations on the same type of behavior (e.g., walking) or behaviors performed on the same region of the body (e.g., anterior movements including eye and foreleg grooming; Figure 2G). In other words, individual flies that perform more eye grooming tend to perform more of other anterior behaviors. There were some correlations between behaviors implemented by disparate parts of the body. For example, flies that spent more time performing anterior grooming also spent more time performing slow leg movements (Figure 2G). The overall similarity of covarying behaviors was confirmed by defining groups of covarying behaviors and observing that they were associated with contiguous regions of the embedded behavioral map (Figure 2G). That is, behaviors whose prevalence covaries across individuals have similar time-frequency patterns across the body. Moreover, these clusters of covarying, contiguously embedded behaviors exhibited similar temporal transitions; behaviors that covary across individuals tend to precede specific sets of subsequent behaviors (Figure 2H). Thus, there appear to be couplings between the dimensions of behavioral variation across individuals, the domains of the body implementing behavior, and the temporal patterning of behaviors.

Video 1
Examples of a mode of walking behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Colored dots indicate whether flies are outbred (NEX; red) or inbred (Berlin-Kiso; blue).

Video 2
Examples of a mode of wing grooming behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Colored dots indicate whether flies are outbred (NEX; red) or inbred (Berlin-Kiso; blue).

Video 3
Examples of a mode of head grooming behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Colored dots indicate whether flies are outbred (NEX; red) or inbred (Berlin-Kiso; blue).

Video 4
Examples of a mode of abdomen flexing behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Colored dots indicate whether flies are outbred (NEX; red) or inbred (Berlin-Kiso; blue).

Thermogenetic neural perturbation alters the correlations between behaviors

To (1) confirm that the Decathlon experiments revealed biologically meaningful couplings between behaviors and (2) probe biological mechanisms potentially giving rise to behavioral correlations, we treated correlations in the Decathlon matrices as hypotheses to test using data from a thermogenetic neural circuit perturbation screen (Skutt-Kakaria et al., 2019). Specifically, we focused on the many correlations between measures of turn timing clumpiness and turn direction switchiness (Figure 1H and S8). Before the Decathlon experiment, we had no reason to think these measures would be correlated, as one describes higher-order structure in the timing of locomotor turns (clumpiness), and the other describes higher-order structure in the direction of sequential turns (switchiness). Our prediction derived from the Decathlon was that if perturbing a circuit element caused a change in clumpiness it would tend to also cause a change in switchiness in a consistent direction. We looked for such correlated changes when we inactivated or activated neurons in the central complex (Figure 3—figure supplement 1A–C), a cluster of neuropils involved in locomotor behaviors (Buchanan et al., 2015; Ofstad et al., 2011; Kottler et al., 2019) and heading estimation (Seelig and Jayaraman, 2015; Green et al., 2017; Kakaria and de Bivort, 2017). We used a set of Gal4 lines (Wolff and Rubin, 2018), each of which targets a single cell type and that tile the entire protocerebral bridge (a central complex neuropil; see Supplementary file 4), to express Shibirets (Kitamoto, 2001) or dTRPA1 (Hamada et al., 2008), thermogenetic reagents that block vesicular release and depolarize cells, respectively. As controls, we used flies heterozygous for the Gal4 lines and lacking the effector transgenes.

Flies with these genotypes were loaded into Y-mazes for behavior imaging before, during, and after a temperature ramp from the permissive temperature (23°C) to the restrictive temperature (32°C for dTRPA1 and 29°C for Shibirets) (Figure 3—figure supplement 1A). At the permissive temperature, we observed significant negative correlations in the average line clumpiness and switchiness of control and dTrpA1 expressing lines (Figure 3—figure supplement 1D and E). This suggests that the mechanisms that couple variation in switchiness and clumpiness within a genotype may also be at play across genotypes. Surprisingly, at the restrictive temperature, we saw significant positive correlations between clumpiness and switchiness in all three experimental treatments: control (Gal4/+), Gal4/Shibirets, and Gal4/dTRPA1 lines. That this correlation appeared in controls suggests that temperature alone can selectively alter the function of circuit elements regulating both clumpiness and switchiness, effectively reversing their coupling. The dTrpA1- and Shibirets-expressing lines also showed this reversal, but to a lesser extent, suggesting that perturbing neurons in the central complex can block temperature-induced changes in the coupling of clumpiness and switchiness.

Individual gene expression variation correlates with individual behaviors

That thermogenetic manipulation can disrupt correlations between behavioral measures suggests that specific patterns of neural activity underlie the structure of behavioral variation. Such physiological variation could arise in stochastic variation in gene expression (Lin et al., 2016) in circuit elements. To test this hypothesis, we performed RNA sequencing on the heads of the flies at the end of the first Decathlon experiment (Figure 3A). We used Tm3′seq (Pallares et al., 2020) to make 3′-biased libraries for each individual animal. We quantified the expression of 17,470 genes in 101 individual flies. The expression profiles were strongly correlated across individuals (Figure 3B), but there was some significant variation across individuals (Figure 3C). To assess whether this variation was meaningful with respect to behavioral variation, we trained linear models (over 625,000 in total) to predict an individual’s behavioral measures from its transcriptional idiosyncrasies. Specifically, we fit simple linear models for each of our 97 behavioral measures as a function of the 6642 most highly expressed genes. The median model had an r2 value of 0.008% and 5% of models predicted behavior at r2 > 0.135. Behavioral measures varied greatly in their number of significant (p<0.05) gene predictors (Figure 3D), ranging from 147 to 1172 genes.

Figure 3 with 1 supplement see all
Correlation between individual transcriptomes and behavioral biases.

(A) Steps for collecting transcriptomes from flies that have completed the Decathlon. (B) Data matrix of individual head transcriptomes. Rows are individual flies (n = 98). Columns are 17,470 genes sorted by their mean expression across individuals. The dashed line indicates the mean expression cutoff at 10 reads per million (RPM), below which genes were excluded from analysis. (C) Scree plots of the logged % variance explained for ranked principal components of gene expression variation, for observed (red) and shuffled (black) data. Shaded region corresponds to 95% CI as calculated by bootstrap resampling. (D) Performance heatmap (-log p) of linear models predicting the behavior of individual flies from single-gene expression. Colored bars (left) indicate a priori group identity of behavioral measures (rows). Bar graphs show the number of significant (p<0.05) models for each gene (top) and behavior (right). (E) Heatmap showing the probability across bootstrap replicates of a KEGG pathway being significantly enriched in the list of predictive genes for a given behavior. (F) Bar plot showing the average across bootstrap replicates of the maximum (across behaviors) negative log adjusted p-value of all enriched KEGG pathways. Color indicates results from observed (gray) or shuffled control (black) data. (G) Average maximum adjusted -log p-value for enriched KEGG pathways common to all Decathlon iterations. Pathway labels (right) are ordered by batch 3 (outbred) -log adjusted p-value.

After identifying genes that were predictive of each behavioral measure, we assessed whether they shared functional characteristics (Figure 4—figure supplement 1A) using KEGG pathway enrichment analysis (Mootha et al., 2003; Kanehisa and Goto, 2000; Yu et al., 2012). We independently performed KEGG enrichment analysis on each gene list to identify functional categories of genes overrepresented for any particular behavioral measure compared to the background list of 6642 genes. Of this background, 1982 genes (30%) were associated with one or more KEGG pathways. Across all behaviors, we found that 37 KEGG pathways were significantly enriched in the genes that predicted individual behavioral measures (Figure 3E, Figure 4—figure supplement 1, Supplementary file 5). Many of these pathways were significantly enriched compared to shuffled controls and were enriched in multiple behavioral measures (Figure 3F). We included features in the online Decathlon Data Browser to explore gene-behavior correlations (searching either by gene or by behavior) as well as KEGG pathway-behavior correlations. We repeated this experiment and analysis on flies of the second Decathlon experiment and found that the significance of pathway enrichments was highly correlated (r = 0.87; Figure 3G), with 43 and 73% of significant KEGG pathways in the first and second Decathlon experiments (respectively) shared in the other experiment.

Genes related to cellular respiration, protein translation, and phototransduction were significantly enriched for multiple behavioral measures, a result that was highly robust to bootstrap resampling. Cellular respiration was the most common enrichment, significant in 11 of 97 behavioral measures, suggesting that variation in metabolic rate may be predictive of variation in many behaviors. Indeed, metabolic function was a common link between significant categories, with a total of 24 of 37 being related to metabolism. We also found 11 pathways (including two highly enriched categories: ribosome and proteasome) associated with protein turnover. While most enriched pathways were related to basic cellular processes, others (Hedgehog signaling, Wnt signaling, insect hormone biosynthesis, SNARE vesicular transport, and phototransduction) suggested roles for development and neuronal function in individual behavioral variation.

Behavioral variability has a similar structure in inbred and outbred lines

If transcriptomic differences predict individual behavioral differences within a genotype, then the structure of behavioral variability might be very different in outbred populations, where transcriptomic differences are (presumably) much more substantial. We tested this by conducting a Decathlon experiment on outbred flies (n = 192) from a synthetic genetic mapping population (Long et al., 2014). These animals were from a high (~100)-generation intercross population (‘NEX’; seeded in the first generation by eight kinds of F1 heterozygotes produced by round-robin cross from eight inbred wild strains). A distilled correlation matrix of behavioral measures (Figure 4A) was produced by the same method as above. At first glance, it appears qualitatively similar to the distilled correlation matrix from inbred animals (Figure 1G). This impression was confirmed in more formal comparisons of the structure of behavior in inbred and outbred populations. In inbred and outbred populations, (1) individuals do not fall into discrete clusters, as determined by t-SNE embedding of individuals as points (Figure 4B). Moreover inbred and outbred flies appear to lie on the same manifold in behavioral measure space; (2) behavioral measures cluster according to their membership in a priori groups similarly in outbred (Figure 4C) and inbred (Figure 1L) populations; (3) the distribution of percent variance explained by PC was similar (Figure 4D); and (4) there is a similar connected components spectrum, with substantial distribution of behavioral biases across the full d dimensions and a sparse network of correlations coupling behaviors over a continuum of dimensionalities (Figure 4E). In addition to the Decathlon assay measurements, we also recorded high-speed, high-resolution videos of outbred flies for unsupervised classification and co-embedded their postural behavior modes with those of inbred flies. Correlated clusters of unsupervised behavior modes from outbred flies corresponded broadly to the clusters mapped to anatomical regions in inbred flies (Figure 4—figure supplement 1). We further conducted transcriptomic sequencing and analysis on the heads of these outbred flies and found that, similar to the inbred flies, their behaviors were broadly correlated to gene expression (Figure 4—figure supplement 2, Figure 4—figure supplement 3). The percent of significant pathways overlapping between pairs of inbred and outbred Decathlon experiments was similar to that of pairs of inbred Decathlon experiments, ranging from 38% to 64%.

Figure 4 with 4 supplements see all
Structure of behavioral variation in outbred flies.

(A) Distilled correlation matrix for outbred NEX flies. Colored blocks indicate a priori groups as described in Figure 1. (B) Points corresponding to individual flies nonlinearly embedded using t-SNE from the 121-dimensional full matrix to two dimensions. Color indicates whether the flies were inbred (blue) or outbred (orange). (C) Points corresponding to behavioral measures nonlinearly embedded using t-SNE from the 192-dimensional space of flies to two dimensions. Colors correspond to a priori group. (D) Scree plot of the ranked, normalized eigenvalues, that is, the % variance explained by each principal component (PC), of the distilled covariance matrix, versus PC #. (E) Connected components spectra for outbred and inbred correlation matrices (see Materials and methods). (F) Scatter plot of the distilled matrix correlation coefficients for inbred and outbred flies. Points correspond to distilled matrix measure pairs. (G) Example scatter plots of distilled matrix measure pairs for inbred (left) and outbred (right) flies. The rows of plots highlight a pair of measures with qualitatively different (top) and similar (bottom) correlations in inbred and outbred flies.

After determining that the overall structure of behavioral variation in inbred and outbred populations is similar, we asked whether it was also similar in specific correlations. There appears to be some similarity at this level (Figure 4F); the correlation coefficient between the inbred and outbred populations in the pairwise correlations between behavioral measures is statistically significant (p=0.001), but low in magnitude (r = 0.11). Examining specific pairs of scatter plots, it is clear that the correlations between specific behaviors are sometimes the same between the inbred and outbred animals, but sometimes not (Figure 4G, Figure 4—figure supplement 4). A caveat in interpreting apparent differences between the inbred and outbred matrices is that two qualities are different between the animals used in the respective experiments: the degree of genetic diversity, but also (necessarily) the genetic background of the flies.

Behavioral variability has high dimensionality regardless of the mechanistic origins of variation

Lastly, we examined how the correlation structure of behavior compared between sets of flies with variation coming from different sources. Specifically, we looked at four data sets: (1) the BABAM data set (Robie et al., 2017), in which measures were acquired from groups of flies behaving in open arenas, and variation came from the thermogenetic activation of 2381 different sets of neurons (the first-generation FlyLight Gal4 lines Jenett et al., 2012); (2) a Drosophila Genome Reference Panel (DGRP; Mackay et al., 2012) behavioral data set, in which measures were acquired in behavioral assays similar to the Decathlon experiments (sometimes manually, sometimes automatically), and variation came from the natural genetic variation between lines in the DGRP collection; (3) a DGRP physiological data set, in which measures are physiological or metabolic (e.g., body weight and glucose levels) and variation came from the natural genetic variation between lines in the DGRP collection; and (4) the split-Gal4 Descending Neuron (DN) data set (Cande et al., 2018) in which measures came from the same unsupervised cluster approach as Figure 2, and variation came from the optogenetic activation of specific sets of descending neurons projecting from the brain to the ventral nerve cord (Namiki et al., 2018). We analyzed these data sets with the same tools we used to characterize the structure of behavioral variation in the Decathlon experiments.

All of these data sets show substantial structure in their correlation matrices (Figure 5A and Figure 5—figure supplement 1). The BABAM and especially the DN correlation matrices contain numerous high correlation values, indicative of strong couplings between behaviors under these neuronal manipulations. The DGRP correlation matrices, especially the DGRP behavioral matrix, look more qualitatively similar to the Decathlon matrix, with lower, sparser correlations. This suggests that behavioral variation has coarsely similar structure whether variation arises intragenotypically (e.g., through stochastic variation in transcription; Figures 3B and 1J), intergenotypically among outbred individuals (Figure 4E), or intergenotypically among inbred lines derived from wild populations (Figure 5A2). A caveat of this conclusion is that sparse correlation matrices can arise either from true, biological independence of behavioral measures or from measurement error.

Figure 5 with 1 supplement see all
Analysis of Drosophila behavioral covariation in other non-isogenic populations.

(A) Correlation matrices of previously published data sets. Rows correspond to analyses performed on each data set. From left to right, the data sets (columns) are as follows: line averages of supervised behavioral classifications following thermogenetic inactivation in the fly olympiad screen (Robie et al., 2017), line averages of behavioral phenotypic data from wild-type inbred lines in the Drosophila Genomic Reference Panel (DGRP) database, line averages of physiological phenotypic data from the DGRP database, line averages of the fold change in unsupervised behavioral classifications following optogenetic activation of descending neurons (Cande et al., 2018). (B) Connected components spectra for each correlation matrix (see Materials and methods). Color in the rightmost plots (B–D) indicates either control (Gal4 driver only) or experimental animals (Gal4 × dTrapA1). (C) Points corresponding to lines nonlinearly embedded using t-SNE from the D-dimensional raw measure space to two dimensions (from left to right, d = 871, 31, 77, 151). (D) Points corresponding to lines nonlinearly embedded using t-SNE from the n-dimensional raw measure space to two dimensions (from left to right, n = 2083, 169, 169, 176).

The connected components spectra of these matrices (Figure 5B) are similar in offering evidence of organization over a wide range of dimensionalities, including high dimensionalities. Only the BABAM spectrum has no power at the dimensionality of its raw count of measurements. The BABAM and DN spectra have a single predominant peak (at dimensionality = 1), suggesting that most measures belong to a single network of at least weak couplings. This is especially true in the BABAM data and is more true in the optogenetic experimental animals than controls in the DN data. The DGRP physiology data exhibits weak peaks at dimensionality = 1 and d, but also peaks at ~25 and 37 dimensions, suggesting that intergenotypic variation in physiology may have an intrinsic dimensionality in that range. Alternatively, since these data sets comprise multiple studies, organization at this dimensionality may reflect batch effects or multiple related measurements within studies. The spectrum of the DGRP behavior data looks similar to that of the Decathlons, with peaks at dimensionality = 1 and d, and evidence for structure over the full range of intermediate dimensionalities. There is also a peak at ~27 dimensions, which may also correspond to study-level effects. Ultimately, we found no evidence for low-dimensional organization in either DGRP data set. The distribution of individual lines in the space of DGRP behavior (and physiology) measures appears to be distributed single mode (Figure 5C2 and C3), like individual flies in the Decathlon (Figure 1K). In contrast, there is some organization of individual lines in the BABAM and DN data sets, likely reflecting neuronal perturbations affecting multiple circuit elements mediating the same behavior(s) (e.g., multiple lines targeting the same neuropil). Measures fall into clusters in all of these data sets except the DGRP behavioral measures (Figure 5D), which appear distributed around a single mode, perhaps reflecting the high dimensionality of behavior itself.

Discussion

Individuals exhibit different behaviors, even when they have the same genotype and have been reared in the same environment. These differences might covary or lie on a manifold of specific geometry in behavioral variation space, but the structure of intragenotypic behavioral variation is uncharacterized. We designed a pipeline of 10 behavioral assays (Figure 1), which collectively yielded up to 121 behavioral measures per individual animal. We also used unsupervised clustering to identify an additional 70 measures per individual based on a time-frequency analysis of high-resolution video of the flies behaving spontaneously (Figure 2). These measures were the fly-specific rates of exhibiting each of the 70 unsupervised behavioral modes. All in all, across three 15-day Decathlon experiments, we collected 191 behavioral measures from 576 flies. This allowed us to produce a full correlation matrix of the behavioral measures exhibited by inbred animals grown in the lab (Figure 1E).

There is a well-developed theoretical framework for understanding the multivariate correlation structure of phenotypes. In quantitative genetics, G-matrices characterize the variance and covariance structure of phenotypes (be they behavioral, physiological, morphological) stemming from genetic differences among individuals or strains (Mackay, 2009; Bruijning et al., 2020). These representations allow the quantitative prediction of responses to selection and constrain the combinations of phenotypes individuals can exhibit. As such, these representations are a key part of predicting the future trajectories of evolution. Just as the phenotypic variance can be parsed into genetic variance, environmental variance, GxE interaction variance, etc., covariance can be similarly dissected (Charmantier et al., 2014; Berdal and Dochtermann, 2019). For example, the classic model of phenotypic variance VP = VG + VE has a direct phenotypic covariance analogue: CovP = CovG + CovE. The last term (CovE) is further broken down into temporary environmental covariances and permanent environmental covariances (CovPE) that endure for the duration of observations (similar to our measurements of individual behavioral bias). In flies, we have the potential to directly measure CovPE by rearing inbred animals in standardized lab environments, profiling their individual biases over a wide range of behavioral measures, and directly quantifying the variance and covariance of behavioral bias. This is significant for quantitative geneticists because a meta-analysis of behavioral traits indicates that across behaviors ~23% of variance can be attributed to heritable factors (Dochtermann et al., 2019). This means that environmental factors, which include both deterministic effects and stochastic intragenotypic effects, explain ~77% of behavioral variance. Thus, characterizing the structure of CovPE will contribute to closing a significant gap in our understanding of the basis of behavioral diversity.

For ethologists and behavioral neuroscientists, this work yields a view of the geometry of intragenotypic behavioral variation, which can be thought of as an emergent product of developmental biological processes and the dynamic interaction of neural activity and animals’ environment. From the full behavioral matrix, we made a so-called ‘distilled’ matrix in which any significant correlation indicates a surprising new relation between behaviors (Figure 1G and Figure 1—figure supplement 11). This form of the data minimizes duplicated measures of the same behavior, allowing us to cleanly analyze the geometry of behavioral variation. We found that behavioral measures were largely uncorrelated with each other, but small sets of behaviors were significantly correlated to varying degrees. This sparse correlation structure means that behavioral variation cannot be readily compressed to a small number of dimensions. Moreover, a single number cannot fully characterize the dimensional organization of a correlation matrix; so we developed a spectral approach (based on a connected components analysis of thresholded correlation matrices) that examined the degree of organization across all possible dimensionalities in the data (Figures 1J, 2F, 4E and 5B, Figure 1—figure supplement 12). This approach revealed organization across intermediate dimensionalities corresponding to correlations of varying strength between sparse sets of behaviors (Figure 1F and H and Figure 1—figure supplement 12). Embedding data points corresponding to individual flies from the high-dimensional space of individual biases into two dimensions produced a broad distribution around a single mode (Figure 1K), implying that there are no discrete types of flies.

One of the specific, surprising correlations we discovered was between ‘clumpiness’ and ‘switchiness’ (Figure 1F, H and Figure 1—figure supplement 11, Figure 3—figure supplement 1). These are slightly abstract, higher-order behavioral measures, corresponding respectively to the burstiness of turn/action/decision timing and the degree of independence between consecutive binary choices. We had no a priori reason to expect these measures would be correlated since one pertains to the structure of actions in time and one pertains to the persistence of trial-to-trial biases. However, their linkage may reflect a shared role in controlling the higher-order statistics of exploration. Sequences of behavior with clumps of bouts (either in time or in space) might contribute to fat-tailed distributions of dispersal that are advantageous over Brownian motion for foragers in environments of sparse resources (Bartumeus et al., 2002). Variation in switchiness and clumpiness across individuals might therefore reflect variation in multibehavioral navigational strategies, perhaps as part of bet-hedging evolutionary strategy (Hopper, 1999; Kain et al., 2015). From a perspective of biological mechanism, the correlation between these two behaviors (or other pairs we discovered) could be established during development. Individual wiring (Mellert et al., 2016; Linneweber et al., 2020) or physiological variations in neurons that mediate more than one behavior could impart coupled changes to all such behaviors.

If such an explanation accounts for the correlation of clumpiness and switchiness, there may be shared neural circuit elements in the circuits controlling decision timing and decision bias. We tested this idea in a thermogenetic screen of circuit elements in the central complex, a brain region where heading direction is represented (Seelig and Jayaraman, 2015) in a ring-attractor circuit (Kakaria and de Bivort, 2017). We found that increasing the temperature changed the sign of the correlation between switchiness and clumpiness (Figure 3—figure supplement 1D and E), suggesting that changes to brain physiology can alter correlation structure, though these effects might instead be caused by any of the many changes in neural state that accompany a temperature change. Indeed, we also found that the effector-inducing temperature manipulation alone concomitantly changed clumpiness and switchiness in some lines. This suggests that potentially subtle alterations of circuit physiology (e.g., temperature shifts [Haddad and Marder, 2018] well within physiological limits) can affect the function of circuit elements governing multiple behaviors.

We included behavioral assays in the Decathlon pipeline under a number of constraints. The assays had to be high throughput, both in the number of flies that could be assayed and in measures being automatically acquired. Flies had to survive at high rates, and the measures had to be stable over multiple days (Figure 1—figure supplement 1), because the whole experiment lasted 15 days. Because not all behavioral measures showed robust stability for this duration (all showed at least some day-to-day stability), the distilled Decathlon matrices likely represent an underestimate of the behavioral couplings exhibited over short periods (and perhaps an overestimate of lifelong couplings as flies can live more than a month). In the end, we employed a number of spontaneous locomotion assays and simple stimulus-evoked assays like odor avoidance and phototaxis.

Light responses were measured in a number of assays (Supplementary file 1), specifically the LED Y-maze (Werkhoven et al., 2019; in which flies turned toward or away from lit LEDs in a rapid trial-by-trial format), the spatial shade-light assay (in which flies chose to stand in lit or shaded regions of an arena that only changed every 4 min), and temporal shade-light (in which the same luminance levels were used as the previous assay, but a fly experienced them by traveling into virtual zones that triggered the illumination of the whole arena at a particular luminance) (Figure 1B). These assays were potentially redundant, and we included this cluster of phototaxis assays in part as a positive control. However, the three phototactic measures we thought would be correlated a priori were, in fact, uncorrelated, each being represented in the distilled correlation matrix (Figure 1G). This may reflect flies using different behavioral algorithms (Krakauer et al., 2017), implemented by non-overlapping circuits, to implement these behaviors. Indeed, a lack of correlation between behavioral measures was the typical observation. This also suggests that we have not come close to sampling the full dimensionality of intragenotypic behavioral variation; if we were able to add more measurements to the experiment, they too would likely be uncorrelated.

To address potential biases in our sampling of assay and measure space, we performed an unsupervised analysis (Berman et al., 2014; Cande et al., 2018) of flies walking spontaneously in arenas. This approach has the potential to identify all the modes of behavior exhibited in that context. Moreover, because the unsupervised algorithm is fundamentally a clustering algorithm, it does not necessarily return a definitive number of clusters/behavioral modes (with more data, it can find increasingly more clusters). Because we can also extract second-order behavioral measures from this approach, such as Markov transition rates between modes, this approach has the potential to yield a huge number of measures. In the end, we were conservative in the number of measurements we chose to work with, matching it to the same order of magnitude as the number of flies we tested. The correlation matrix for the unsupervised behavioral measures featured stronger correlations than the distilled Decathlon matrices (Figure 2E). Yet it had a similar connected components spectrum, indicating many dimensions of variation and that not all behavior modes had been sampled (Figure 2F).

Interestingly, the blocks of structure in the correlation matrix aligned, to some extent, with the blocks of structure in the Markov transition matrix of these behavioral measures. This suggests that behaviors mediated by non-overlapping circuits (those that vary independently across individuals) more rarely transition to each other over time. Conversely, behaviors mediated by overlapping circuitry are likely to follow each other sequentially. This may reflect the influence of internal states (Calhoun et al., 2019), with an internal state jointly determining what subset of overlapping neurons drives behaviors that are appropriate to string together in succession (e.g., Seeds et al., 2014). We did not assess the day-to-day persistence of behavioral modes identified in the unsupervised analysis, so the observed variation across flies could reflect moods rather than permanent personalities. Indeed, Hernández et al., 2020 find that high-level clusters in unsupervised modes identified through an information bottleneck analysis of the behavioral transitions that occur over many minutes align with clusters in the cross-individual covariance matrix. This suggests either that individuality observed in approximately hours-long video recordings reflects approximately hours-long transient states or, alternatively, that the structure of days-long or lifelong variation mirrors that of hours-long variation. The latter may be particularly plausible, given that previous studies that measured spontaneous microbehaviors using supervised (Kain et al., 2013) and unsupervised (Todd et al., 2017) approaches over repeated tests found substantial day-to-day correlations.

We investigated whether individual variation in transcript abundance would predict individual behavioral variation (Figure 3). At the end of the Decathlon, flies were flash-frozen and their heads were RNA sequenced. We fit linear models for thousands of gene-behavior measure pairs to identify a set of genes that predict behavioral variation. Some measures had many gene predictors, while others had none. Gene function enrichment analysis revealed that variation in the expression of genes involved in respiration and other kinds of metabolism predicted variation in many behavioral measures. Genes involved in neuronal and developmental processes were also enriched among predictors of individual behavior. These two functional categories are likely linked as there are strong causal couplings between metabolism and neural activity (Mann et al., 2020). Variation in behavioral measures without expression correlates may not have mechanistic origins in transcriptional variation, or the genes that do determine these behaviors may be not expressed in adults, expressed at low levels, or expressed in too few cells to detect in bulk head tissue.

Increasing transcriptional variation by adding genetic variation had the potential to change the correlation structure of behavioral measures. To our surprise, the distilled correlation matrix of outbred flies was qualitatively similar to that of our original inbred Decathlon (Figure 4). Both outbred and inbred matrices were dominated by independent axes of variation and sparse correlations between axes, with rough agreement in the specific pairwise correlations between these two data sets. These two data sets may have differed in their absolute variances (while appearing qualitatively similar in their covariances), but the normalization steps we took to resolve inter-Decathlon and assay batch effects precluded easy assessment of this possibility. That outbred and inbred variation had qualitatively similar structures raises the possibility that the same kinds of biological fluctuations underlie behavioral variation in populations of each of these kinds. The high dimensionality of the inbred and outbred matrices suggests that behaviors can evolve largely independently of one another (i.e., without pleiotropic constraint) either via plastic mechanisms within a genotype or by natural selection in a genetically diverse population.

We finally examined the structure of behavioral variation in collections of flies where variation came from three additional sources: thermogenetic activation of 2205 sets of neurons across the brain (Robie et al., 2017), optogenetic activation of 176 sparse populations of descending neurons connecting the brain to the motor centers of the ventral nerve cord (Cande et al., 2018), and variation in genotype across ~200 inbred strains derived from wild flies (Mackay et al., 2012). Overall, all of these data sets exhibited high degrees of independence in their behavioral measures, though there is some variation from one data set to the next. This variation could arise from several non-biological sources, including differing extents of collecting multiple measures from the same experiment, which can inflate correlations due to coupled noise; differences in the number of repeated or numerically coupled measures; batch effects from combining studies conducted in different labs at different times; and varying signal-to-noise ratios in the measurements. The structure of behavioral variation in the neural activation data sets was somewhat different from that of inbred and outbred flies, with a dimensionality smaller than that of the number of measures, and substantially more organization in lower dimensionalities (Figure 5A and B). These data sets also showed clustering of individual flies in behavior space (Figure 5C). Behavioral variation across the inbred strains derived from wild flies was organized qualitatively similarly to the variation across individual flies in inbred and outbred populations, again suggesting that the biological fluctuations across genotypes mirror those within a genotype as respects the coordination of behavior.

This work represents the most complete characterization to date of the structure of behavioral variation within a genotype. We found that there are no discrete types of flies, and there are many independent dimensions of behavioral variation. Moreover, the similar organization of biological variation within and among genotypes suggests that fluctuations in the same biological processes underpin behavioral variation at both of these levels. Elucidating the causal molecular and genetic fluctuations underpinning intragenotypic variation and covariation will be a challenging and illuminating future research direction.

Materials and methods

Software

All software for analysis and figures in this paper as well as documentation are available at http://lab.debivort.org/structure-of-behavioral-variability/. Static DOI for this code is available athttps://doi.org/10.5281/zenodo.4110049. Actively maintained versions of the analysis software are available at https://github.com/de-Bivort-Lab/decathlon (copy archived at swh:1:rev:6c9e338db6f03e42bbbb2e2afa0cfd52162e7772; de Bivort, 2021a).

MARGO (animal tracking and stimulus delivery software): https://github.com/de-Bivort-Lab/margo (copy archived at swh:1:rev:fe61a873494e464d2a3ee48f67e885eb95359e0a, de Bivort, 2021b).

Decathlon Data Browser (interactive web application for exploring data): .http://decathlon.debivort.org.

Data

Data such as behavioral measures, unsupervised embedding PDFs, RNAseq read counts, and formatted versions of the BABAM, DGRP phenotypic, and DN screen data are available at http://lab.debivort.org/structure-of-behavioral-variability/. Static DOI for these data in addition to the unsupervised classification embedding time series data is available at https://doi.org/10.5281/zenodo.4110049. Our raw tracking data from the assays covers ~600 flies over 12 days of continuous tracking at between 3 and 60 Hz and are greater than 50 GB in size. Our unsupervised classification videos comprise ~144 million frames of uncompressed video data and are greater than 8 TB in size. For these reasons, we have not posted the rawest forms of the data to online repositories and instead offer them upon request by external hard drive.

Fly strains

Request a detailed protocol

All Decathlon experiments were performed on virgin female fruit flies derived from inbred wild-type Berlin-K (BSC# 8522) isogenic flies or custom outbred flies (NEX) (REFEF). Prior to Decathlon testing, a lineage of Berlin-K isogenic flies (Berlin-Kiso) were selected for robustness from a pool of inbred lineages as the result of a short screen designed to mimic the stresses of the Decathlon assay battery. All inbred lineages tested were derived from three wild-type strains: Berlin-K (BK), Canton-S (CS), and Oregon-R (OR). For each wild-type strain, six virgin females were dispensed into separate vials and paired with a single male fly to establish separate lineages. Parental flies were removed from their vials after 2 days to separate them from their progeny. For all successive generations after the first generation, the six virgin females were picked from the three vials (two from each) with the highest number of progeny to select for fecundity and overall ease of maintenance. Virgins were then dispensed into separate vials with full sibling males for inbreeding. After 13 generations of inbreeding, this way the resulting lineages (six total for each strain) were screened for overall activity level via behavioral testing on the Y-maze assay in cohorts of 120 flies from each lineage. For each strain, the lineage with the highest number of choices (turns) in the Y-maze from was kept for further testing to improve overall sampling in behavioral experiments, resulting in a single lineage from each strain: Berlin-Kiso, Canton-Siso, and Oregon-Riso.

To screen robustness to the stresses of multiple days of behavioral testing such as periodic food deprivation and repeated cold anesthetization, cohorts of 96 flies from the resulting inbred lines were singly housed and repeatedly tested on the Y-maze assay over a 3-day period. Berlin-Kiso and Canton-Siso were selected as the lines with lowest and second lowest (respectively) mortality after 3 days of testing. To potentially introduce hybrid vigor and reduce the deleterious effects of inbreeding, four F1 hybrid inbred strains were generated by crossing each combination of parental sex and inbred strain (e.g., ♂ Berlin-Kiso × ♀ Canton-Siso, ♀ Berlin-Kiso × ♂ Canton-Siso, etc.) and screened for mortality in the 3-day experiment described above. Contrary to our expectation that increased heterozygosity in the F1 hybrid lines would result in more robust flies and lower mortality, F1 flies displayed intermediate mortality and activity level (choice number) between Berlin-Kiso and Canton-Siso parentals. Thus Berlin-Kiso flies were ultimately selected for the Decathlon screen.

Fly handling

Request a detailed protocol

Parental flies and Decathlon flies (prior to individual housing) were raised on CalTech formula cornmeal media under 12 hr/12 hr light and dark cycle in an incubator at 25°C and 70% humidity. A single cohort of virgin Decathlon flies were collected over an 8 hr period (10 AM to 6 PM) and stored in group housing up to a maximum of 288 flies. On the following day, flies were anesthetized using carbon dioxide (CO2) and were aspirated into custom multiwell plates for individual housing (FlySorter LLC, Seattle, WA) on cornmeal media. Individually housed flies were stored in a custom imaging box for circadian activity measurements (circadian chamber) on a 12 hr light and dark cycle (10 AM to 10 PM), which was housed inside of an environmentally controlled room at 23°C and 40% humidity. Food media was replaced and individual housing trays were cleaned every 2 days while flies were in behavioral testing to keep food moisturized and prevent microbial buildup.

Following the start of Decathlon (3 days post-eclosion), flies were removed from the circadian chamber each day for behavioral testing between 11 AM and 2 PM. Prior to all behavioral experiments (excluding olfaction), flies were cold anesthetized by transferring individual housing plates and food trays to an ice-chilled aluminum block in a 4°C for 10 min. This time was the minimum time required to reliably induce chill coma in most flies given the mass and low thermal conductivity of the cornmeal media separating the flies and the chill block. Once anesthetized, flies were individually transferred via aspiration to room temperature custom behavioral arenas (without food) where they quickly awoke (typically 1–10 s) after transfer. After all flies were transferred, they were given an additional 20 min to recover prior to behavioral testing. Flies were then tested in custom behavioral platforms, lasting as little as 15 min (olfaction only) and as much as 2 hr (typical). Following testing, flies were returned to individual housing via aspiration either directly (olfaction only) or after re-anesthetizing by transferring the custom behavioral arenas to an ice-chilled aluminum block for 10 min. Once all flies were returned, the individual housing plates were returned to the circadian chamber between 2 and 5 PM.

Custom behavioral experiments

Assay fabrication

Request a detailed protocol

Unless otherwise specified, all tracking was conducted in custom imaging boxes constructed with laser-cut acrylic and aluminum rails. Schematics of custom behavioral arenas and behavioral boxes were designed in AutoCAD 2014 (AutoDesk). Arena parts were laser-cut from black and clear acrylic and were formed as stacked layers joined with Plastruct plastic weld. Arena floors were made from sandblasted clear acrylic and arena lids were made from custom-cut clear eighth inch acrylic. Schematics for behavioral boxes and behavioral arenas can be found on the de Bivort Lab schematics repository (https://github.com/de-Bivort-Lab/dblab-schematics). Illumination was provided by dual-channel white and infrared LED array panels mounted at the base (Part BK3301, Knema LLC, Shreveport, LA). Adjacent pairs of white and infrared LEDs were arrayed in a 14 × 14 grid spaced 2.2 cm apart. White and infrared LEDs were wired for independent control by MOSFET transistors and a Teensy 3.2 microcontroller. Two sand-blasted clear acrylic diffusers were placed in between the illuminator and the behavioral arena for smooth backlighting.

For circadian experiments, flies were housed and imaged in individual fly storage units (FlyPlates) from FlySorter, LLC. Circadian imaging boxes consisted of a small (6" × 6" × 18") enclosed behavioral box with a dual-channel LED illuminator on a 12:12 hr light and dark cycle programmatically controlled via MARGO behavioral tracking software (https://github.com/de-Bivort-Lab/margo). A heat-sink was affixed to the underside of the illuminator panel (outside of the box), and fans were used to ensure the interior of the boxes remained consistent with the temperature of the environmental room.

The olfactory sensitivity assay was performed in a custom behavioral chamber modeled on Claridge-Chang et al., 2009. The apparatus consisted of 15 parallel tunnels constructed from Accura 60 plastic using stereolithography (In’Tech Industries) fabrication. Stainless steel hypo tubing (Small Parts) was used to connect the apparatus with (ID: 0.7 mm). Odorized or clean air was delivered via Teflon odor tubes to inlet ports at each end of the tunnel and streams vented to the room through exhaust ports in the center forming a sharp choice zone. An active vacuum was not applied to the exhaust ports, and the tunnels operated close to atmospheric pressure throughout the experiment. A clear acrylic lid was clamped in place above the apparatus to ensure an air-tight seal during odor presentation. Air dilutions could be made independently for each side of the apparatus. A custom 15-way PEEK manifold was used on each side to split the odorized flow equally between 15 tunnel inlets. A final valve (SH360T041; NResearch) was used immediately upstream of each manifold to quickly switch between pure dehumidified air and the odorized stream.

Phototactic stimuli were delivered with a custom 12" × 12" PCB designed in Express PCB CAD software and manufactured by ExpressPCB. Briefly, the PCB was designed to power and independently control 216 white light LEDs (11.34 mW ± 4.3 μW at 20 ma) geometrically arranged to match the maze arm ends of an array of Y-shaped behavioral arenas. LED intensity control was provided over USB via board interface with a Teensy 3.2 microcontroller and constant current driver boards (Adafruit 24-Channel TLC5947 LED Driver). Holes in the footprint shape of each individual arena were custom cut into the PCB with water jet cutter to provide infrared backlighting.

Visual stimuli in the spatial shade-light, temporal shade-light, and optomotor assays were delivered in a behavior box modified to accommodate stimulus delivery with an overhead projector (Optoma S310e DLP). The DLP color wheel was removed to reduce apparent flicker to the flies caused by low sensitivity to red light in flies. Stimuli were accurately targeted to individual flies by creating a registration mapping between the projector and the tracking camera using previously described method using MARGO tracking software. Briefly, a 2D polynomial registration model of a projector targeting coordinates of a small (10 px) was fitted to coordinates obtained by tracking the dot as it was rastered over the projector display field. All visual stimuli were crafted and displayed with PsychToolbox.

Single-fly videos used in the motionmapper unsupervised classification pipeline were acquired in behavioral boxes as described above. We replaced the dual-channel LED illuminator with an LCD screen backlight to reduce apparent non-uniformities in the videos. Behavioral arenas consisted of custom thermoformed clear PTEG lid with sloping sides to prevent wall walking. Lids were coated with sigmacote (Sigma-Aldrich SL2-100ML) to prevent flies from walking on the ceiling and were clamped in place to a 0.25" tempered glass base via custom acrylic 2 × 2 array of arenas.

Behavioral tracking

Request a detailed protocol

Flies were imaged with overhead tracking cameras at varying resolutions and frame rates. Unless otherwise specified, images for behavior tracking were acquired at 10 Hz with a 1280 × 1024 pixel BlackFly GigE camera (PointGrey BFLY-U3-13S2M-CS) fitted with a Fujinon YV2.8 × 2.8 SA-2, 2.8–8 mm, 1/3", CS mount lens. Circadian and optomotor experiments were conducted with 3 Hz and 60 Hz imaging, respectively. Tracking images for the odor sensitivity assay were acquired at 1328 × 1048 pixel at 23 Hz (PointGrey FMVU-13S2C). Single-fly videos for unsupervised classification were acquired.

Fly tracking and stimulus control for all experiments (excluding unsupervised single-fly imaging) were programmed in MATLAB and implemented with MARGO (Werkhoven et al., 2019). The MARGO tracking algorithm has been previously described in detail. Briefly, binary foreground blobs were segmented from a thresholded difference image computed by subtracting each frame from an estimate of the background. A rolling model of the background was computed as the median image of the rolling stack of background sample images. Every 2 min, a new sample image of the background was acquired and replaced the oldest image in the stack. Regions of interest (ROIs) were defined to encompass only a single behavioral arena (i.e., one fly), and the tracking algorithm proceeded independently for each ROI. The tracking threshold used to segment binary foreground blobs from the background was manually set prior to tracking and was independent for each experiment. Blobs below a minimum or maximum area (also defined manually for each experiment) were excluded from tracking in each frame. After filtering, the centroid trace of each ROI was updated with the position of the blob with shortest distance to the last known (non-NaN) position of each trace. The centroid trace of any ROI with no detected blobs in any given frame received no position (i.e., NaN). Output measurements of speed and area were converted from pixels/s and pixels2 to mm/s and mm2, respectively, by measuring the length of a known landmark size (e.g., arena diameter) prior to tracking. Fisheye distortion of camera lenses was modeled and corrected via MATLAB’s camera calibrator app.

For the acquisition of single-fly videos used in the motionmapper unsupervised classification pipeline, flies were tracked in real time to reduce video file size by extracting a 200 × 200 pixel region around the fly centroid. Tracking was performed by computing the centroid of the entire difference image acquired and was calculated via the method described above. Tracking was coordinated simultaneously for four 1280 × 1024 pixel resolution cameras (PointGrey CM3-U3-13Y3M-CS) at 100 Hz with a custom LabView script obtained from the J. Shaevitz lab (personal communication), which was lightly modified to add background subtraction to the tracking.

Decathlon experimental design

Request a detailed protocol

Flies underwent behavioral testing from 3 days post-eclosion up to a maximum of 14 days post-eclosion depending on when unsupervised classification was performed. For each Decathlon experiment, behavioral testing began with activity screen of 288 flies in the Y-maze assay. Flies were then sorted by number turns made during the activity screen. The top 192 flies with the highest activity level were selected for testing in the remainder of the Decathlon and were transferred to individual housing on day 3 post-eclosion for overnight circadian behavior profiling. On subsequent days, all flies were anesthetized as described above and transferred from individual housing into a behavioral assay each day (from days 3 to 11 post-eclosion) and were returned individual housing for overnight circadian behavioral measurement. Flies were tested in cohorts dependent on the throughput capacity (described for each assay below) of the assay run on any given day. Although the range of testing times for all animals on some days was relatively broad (all experiments were conducted between 11 AM and 5PM), no behavioral experiment lasted more than 2 hr in duration, meaning that individual cohorts typically spent more than 3 hr (2 hr experiment +2× anesthetization and transferring) off of food media. The order of individual assays (shown in Figure 1) was randomized at the start of each Decathlon with the exception of activity screening and unsupervised imaging, which occurred at the beginning and end of each Decathlon (respectively) for logistical feasibility. Descriptions and implementation details of the Decathlon behavioral assays are provided below and are summarized in Supplementary file 1.

After day 11 post-eclosion, remaining living flies were split into three roughly equal-sized cohorts (approximately 56 flies each) for unsupervised behavioral imaging, which were imaged in groups of four flies between 10 AM and 6 PM on subsequent days. Cohorts of flies were flash-frozen on liquid nitrogen immediately following unsupervised behavioral imaging and prepped for RNAseq as described above.

Arena circling and circadian assays

Request a detailed protocol

Both the arena circling and circadian assays consisted of exploration of a circular arena. Although a near identical list of measures of locomotor handedness, speed, and movement bout dynamics were recorded in each assay (circadian includes gravitaxis also), the assays are distinguished by their background light level, duration, behavioral arena, and access to food. The arena circling assay was conducted over 2 hr with constant light in a wide (28 mm diameter) and shallow (1.6 mm depth) arena without food. The circadian assay was conducted overnight (20–21 hr) on a 12:12 hr light and dark cycle (10 AM:10 PM) with each fly in a narrow (6.8 mm) and deep (10.6 mm) well of a 96-multiwell plate. The depth of the arena in addition to the difference in arena ceiling (sigmacoted acrylic in arena circling and uncoated plastic mesh circadian) adds a vertical and body orientation dimension to the circadian assay not present in arena circling. This feature not only adds a measure of floor vs. ceiling preference (i.e., gravitaxis) to the circadian assay but also potentially confounds measures of circling directionality due to the fact that flies can circle the arena right-side up on the floor, upside-down on the ceiling, or sideways on the walls.

LED Y-maze assay

Request a detailed protocol

Individual flies explored symmetrical Y-shaped arenas with LEDs at the end of each arm. For all arenas in parallel, real-time tracking detected which arm the fly was in at each frame. At the start of each trial, an LED was randomly turned on in one of the unoccupied arms. Once flies walked into one of these two new arms, all the LEDs in the arena were turned off and a new trial was initiated by randomly turning on an LED in one of the newly unoccupied arms. This process is repeated for each fly independently over 2 hr. Turns were scored as positively (toward a lit LED) or negatively (toward an unlit LED) phototactic. LEDs were either fully off (OFF) or were driven at maximum intensity (ON). No intermediate LED intensity values were tested. In addition to the phototactic bias, measures of locomotor handedness peed, choice timing, and choice sequence were also recorded.

Odor sensitivity assay

Request a detailed protocol

In the odor sensitivity assay, flies explored linear tunnels where half of the tunnel contained an odor (typically noxious) and the other contained no odor. Odor ports positioned at either end of the tunnels delivered either odor or odorless air. Vents positioned at the midpoint of each tunnel form a sharp boundary between the two air streams. Flies were presented with a 3 min baseline period with no odor on either side followed by a 12 min experimental period with half odor and half no odor. Sensitivity to the odor was estimated by measuring the fraction of occupancy time spent in the odor half of the tunnel. Flies showed a weak aversion to the designated odor side during baseline measurement, prior to the presentation of odor (0.47 ± 0.30 mean pre-odor occupancy). Flies displayed strong initial aversion (0.17 ± 0.22 mean first half odor occupancy) to the side of the tunnels with the odor, which slightly diminished (0.22 ± 0.23 mean second half occupancy) over the stimulus duration. In addition to odor occupancy, locomotor handedness was scored as the average direction of heading direction reversals on the long axis of the tunnels. In addition to the above measures, speed, turn timing, and turn direction sequence were also recorded.

Optomotor assay

Request a detailed protocol

In this assay, optomotor stimuli were centered on the bodies of flies by projecting them onto the floor of their arenas in which they are walking freely. These stimuli consisted of a maximally contrasting (black = 0, white = 1, min. power = 79 µW ± 61 nW, max. power = 2.4 mW ± 4.7 µW) rotating pinwheel (spatial frequency = 0.18 cycles/°, rotational speed = 320°/s) and typically evoked a turn in the direction of the rotation to stabilize the visual motion. The pinwheel center followed the position of the fly as it moved so that the only apparent motion of the stimulus is rotational motion around the body. Therefore, the stimulus was closed loop with respect to position and open loop with respect to rotation velocity. We observed that optomotor responses could be reliably elicited, provided individuals were already moving when the pinwheel was initiated. We therefore only presented the pinwheel when (1) flies were moving, (2) a minimum inter-trial interval (2 s) had passed to prevent behavioral responses from adapting, and (3) flies exceeded a minimum distance from the edge of the arena to ensure that the stimulus occupied a significant portion of the animal’s FoV. Over 2 hr of testing, an optomotor index was calculated for each fly as the fraction (normalized to [–1,1]) of body angle change that occurred in the same direction as the stimulus rotation over the duration of the stimulus. On average, flies displayed reliable optomotor responses (mean = 0.49 ± 0.16) when stimulated. In addition to optomotor index measures of fly activity, movement bout dynamics and locomotor handedness were also recorded.

Spatial light-shade assay

Request a detailed protocol

Ambient light preference is estimated in this assay by projecting a circular stimulus with one side fully bright (intensity = 1) and the other side fully dark (intensity = 0), with the two regions separated by a small (10% of the arena diameter) boundary zone of intermediate brightness (intensity = 0.5) (min. power = 79 µW ± 61 nW, max. power = 2.4 mW ± 4.7 µW). A light occupancy measure was calculated for each fly as the fraction of time spent in the lit region divided by the total time spent in both the lit and unlit regions. Time spent in the intermediate boundary zone was scored as no preference and was excluded from the calculation. Each 2 hr experiment was divided into 15 stimulus cycles with each cycle consisting of an alternating 4 min baseline block where each arena was unlit and a 4 min experimental block where shade-light stimuli were targeted to each arena. At the start of each block, each stimulus was rotated to center the intermediate zones (or virtual intermediate zone in the baseline block) over the flies’ bodies. In effect, this required flies to move from the intermediate zone during the block to express any preference. In addition to light occupancy measures of fly activity, movement bout dynamics and locomotor handedness were also recorded.

Temporal light-shade assay

Request a detailed protocol

Ambient light preference is estimated in this assay by projecting a fully bright (intensity = 1) or fully dark (intensity = 0) stimulus to the entire arena when flies crossed an invisible virtual boundary in the center (min. power = 79 µW ± 61 nW, max. power = 2.4 mW ± 4.7 µW). Therefore, this stimulus paradigm was distinct from the spatial light-shade assay in that flies could not express a preference by navigating with respect to an apparent spatial pattern of light. A light occupancy measure was calculated for each fly as the fraction of time flies received the lit stimulus divided by the total duration of the experiment (2 hr). The invisible boundary zone was always positioned in the center of the arena such that a randomly exploring fly would receive both stimuli in roughly equal amounts. The angular orientation of the boundary zone was randomly initialized for each arena independently at the start of the experiment. To avoid rapid switching or flickering of the stimulus for flies sitting directly on the boundary, flies were required to cross a small buffer zone (5% of the arena diameter) beyond the arena center into the other zone before switching the stimulus.

Y-maze assay

Request a detailed protocol

In the Y-maze assay, locomotor handedness was estimated as flies explored symmetrical Y-shaped arenas. Each time flies changed position from one arm of the maze to another, turns were scored as left-handed or right-handed depending on whether the chosen arm was to the left of right on the choice point (i.e., the arena center). To avoid scoring centroid estimation errors around the choice point or small forays into new arms as turns, the flies were required to traverse a minimum distance (60% the length of the arm) into any arm before a turn was scored. In addition to locomotor handedness, measures of speed, movement bout dynamics, turn choice timing, and turn choice sequence were also recorded.

Central complex thermogenetic screen

Request a detailed protocol

We extracted measure of turn timing clumpiness (choiceClumpiness) and turn direction switchiness (handSwitchiness) from a previously conducted thermogenetic screen for light-dependent modulation of locomotor handedness (Skutt-Kakaria et al., 2019). Gal4 driver and neuronal effector parental lines were crossed to generate F1 progeny with sparse neuronal expression of effectors that could be thermogenetically activated (dTrpA1) or silenced (Shibirets). For all screen experiments, fly locomotor handedness was assayed for 4 hr in the Y-maze with the following temperature program: 1 hr at 23°C (permissive temperature), 1 hr temperature ramp up to the restrictive temperature (Shibirets = 29°C, dTrpA1 = 32°C), 1 hr at the restrictive temperature, and 1 hr temperature ramp down from the restrictive temperature to 23°C. Although activity level varied over the duration of the experiment, flies made many turns throughout all temperature blocks: completing an average of 193 ± 119 and 757 ± 311 turns in the permissive and restrictive periods, respectively. The background light was repeatedly switched on and off in time blocks (min = 5 s, other times = 10 and 30, max = 60 s, each hour had the same repeated sequence) in a random order shared between all screen experiments. Because we were interested only in temperature-dependent modulation of turn dynamics, we computed individual clumpiness and switchiness scores for all turns within each temperature condition (permissive and restrictive) regardless of the light condition.

RNAseq and transcriptomic analyses

Single-fly RNAseq preparation and sequencing was performed using TM3′seq, a high-throughput low-cost RNA protocol previously described in Pallares et al., 2020. An overview of the method is provided below, but more detailed descriptions are available in an online protocol and in the original publication.

Fly tissue preparation

Request a detailed protocol

Following Decathlon behavioral testing, flies were briefly anesthetized on CO2 and transferred to a 96-well plate via aspiration. Immediately following, flies were flash-frozen on liquid nitrogen and were decapitated to separate heads and bodies. Well plates were then transferred to a dry ice and ethanol cold bath to keep the samples frozen while heads were individually transferred to a separate 96-well plate with a cold probe needle. Samples were then stored at –80°C. After storage, tissue was ground in 100 μl of lysis buffer with a 2.8 mm stainless steel grinding bead steel grinding bead for 10 min at maximum speed with a homogenizer. CyBio FeliX liquid handling robot (Analitik Jena) was used to perform mRNA extraction from the resulting lysate using a Dynabeads mRNA DIRECT Purification Kit (ThermoFisher, #61012) and a custom protocol (Pallares et al., 2020) optimized for low cost, yielding approximately 10–20 ng mRNA per head.

Library preparation

Request a detailed protocol

RNA (10 μl of 1 ng/μl mRNA) was added to 1 μl of 0.83 µM oligo (Tn5ME-B-30T) for a 3 min incubation at 65°C immediately prior to reverse transcription. The first strand of cDNA was synthesized by reverse transcription of the mRNA via a 1 hr incubation at 42°C by adding the following to the reaction mixture above: 1 μl SMARTScribe RT (Takara, #639538), 1 μl dNTPs 10 mM (NEB, #N0447S), 2 μl DTT 0.1 M (Takara, #639538), 4 μl 5× First-Strand buffer (Takara, #639538), and 1 μl B-tag-sw oligo. Following synthesis, the reverse transcriptase was inactivated via a 15 min incubation 70°C. A cDNA amplification mixture was prepared by adding 5 μl of the resulting first-strand cDNA with 7.5 μl of OneTaq HS Quick-load 2× (NEB, #M0486L) and 2.5 μl water. The cDNA was then amplified in a thermocycler via the following program: 68°C 3 min, 95°C 30 s, [95°C 10 s, 55°C 30 s, 68°C 3 min] * 3 cycles, 68°C 5 min.

Tn5 tagmentation was used alongside universal adaptors for library amplification. The adapter-B was previously added during synthesis of the first cDNA strand. To create a Tn5 adaptor-A, an adapter annealing mixture was prepared by adding 10 μl (100 μM) of a forward oligo (adapter-A) and 10 μl (100 μM) reverse adapter-A oligo (Tn5MErev) to 80 μl re-association buffer (10 mM Tris pH 8.0, 50 mM NaCl, 1 mM EDTA). Oligos were annealed in a thermocycler with the following cycle program: 95°C for 10 min, 90°C for 1 min followed by 60 cycles reducing temperature by 1°C/cycle, hold at 4°C. 5 μl of 1 μM annealed adapter was then anneal to 5 μl of Tn5 in a thermal cycler for 30 min at 37°C. 5 μl of cDNA was mixed with 1 μl of pre-charged Tn5. The adapter-A loaded Tn5 was diluted 7× in re-association buffer: glycerol (1:1), 4 μl of TAPS buffer 5× pH 8.5; 50 mM TAPS, 25 mM MgCl2, 50% v/v DMF, and 5 μl of water. The mixture was then and incubated for 7 min at 55°C followed by an additional 7 min incubation with 3.5 μl of SDS 0.2% (Promega, #V6551) to ensure that Tn5 was dissociated from the cDNA. The resulting cDNA libraries were then amplified. Briefly, 10 μl of OneTaq HS Quick-Load 2× (NEB, #M0486L), 1 μl i5 primer 1 μM, 1 μl i7 primer 1 μM, and 7 μl of water were used to amplify 1 μl of the tagmentation reaction following the program: 68°C 3 min, 95°C 30 s, [95°C 10 s, 55°C 30 s, 68°C 30 s] * 12 cycles, 68°C 5 min.

Sequencing and expression quantification

Request a detailed protocol

Samples were sequenced on an Illumina HiSeq 2500 in separate runs using dual indexes and single-end ~100 bp sequencing. Low-quality bases and adapter sequences were removed from reads using Trimmomatic 0.32 (SE ILLUMINACLIP: 1:30:7 LEADING: 3 TRAILING: 3 SLIDINGWINDOW: 4:15 MINLEN:20; Bolger et al., 2014). Downstream analysis was only performed on reads at least 20 nucleotides after trimming. Reads were then mapped to the r6.14 Drosophila melanogaster genome assembly using STAR (Dobin et al., 2013), and were assigned to genes using featureCounts from the package Subread (Liao et al., 2014). Read counts were further filtered to include only reads assigned to protein coding genes with a minimum of 500k reads per individual (n.b. measuring 3′ biased expression).

Enrichment analyses

Request a detailed protocol

Following sequencing, raw reads from individual flies with fewer than 1M reads total were removed. Raw read counts were normalized to reads per million (RPM) and were further filtered to exclude genes with less than 10 RPM averaged across all remaining flies. Individual fly read distributions were then quantile normalized so that the distributions of normalized reads were matched across all individual samples. One-dimensional linear models were fit using expression data from the remaining 6710 genes to predict each of the 97 individual behavioral scores (97 × 6710 models total). 69 × 6710 models were fit for unsupervised measures. Gene lists were constructed for each behavior separately by collecting their respective significantly predictive genes (model p-value < 0.05).

Enrichment scores were computed using clusterProfiler (Yu et al., 2012) as part of the Bioconductor (https://www.bioconductor.org) package for R (version 3.6; R Development Core Team, 2020). We performed KEGG enrichment analysis on predictive gene lists from each behavior independently to identify functional categories of genes overrepresented for any particular behavioral measure compared to the background list of 6710 genes. clusterProfiler uses a Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995) to adjust p-values. We report these adjusted p-values. KEGG categories were defined as significant if the adjusted p-values for the category were less than 0.05. Significant categories and their corresponding genes were identified separately for each behavior. Significant category (and gene) lists for groups of behaviors (i.e., all behaviors or a priori group behaviors) were constructed as the union of the significant enrichment categories (and genes) from each group’s member behaviors. To assess reproducibility of enrichment categories and their corresponding gene sets, enrichment lists were repeatedly generated from bootstrap resampled individuals (500 replicates), downstream of quantile normalization. Average p-values were calculated as the mean across bootstrap replicates of the minimum adjusted p-value for each enrichment category across behaviors within a group.

Genomic sequencing of inbred flies

Request a detailed protocol

DNA was extracted from individual flies using Zymo Quick-DNA Microprep Kit (Cat# D3020). Sequencing libraries were prepared using the transposase Tn5 using the same protocol described above (from cDNA). Samples were sequenced on an Illumina HiSeq 2500 using dual indexes, paired-end ~150 bp sequencing. After de-multiplexing, low-quality bases and adapter sequences were removed from reads using Trimmomatic 0.32 (SE ILLUMINACLIP: 1:30:7 LEADING: 3 TRAILING: 3 SLIDINGWINDOW: 4:15 MINLEN: 20). Reads were then mapped to the r6.14 Drosophila melanogaster genome assembly using BWA -aln (Li and Durbin, 2009). Genotypes were generated following GATK best practices (https://software.broadinstitute.org/gatk/best-practices/; McKenna et al., 2010). Heterozygosity was estimated using VCFTools v0.1.16 --het (Danecek et al., 2011).

Quantification and statistical analysis

Merging Decathlon data sets

Request a detailed protocol

Inspection of raw measure distributions separated by batch (i.e., a single cohort of flies tested in a single behavioral box at the same time) showed batch effects on the sample means and dispersion even within the same Decathlon experiment. Therefore, data was z-scored separately by batch and measure type to control for sample differences in mean and variance. Data of the same assay, measure, and day of testing (i.e., the unique measures from a single day of Decathlon behavior) were then combined, resulting in an initial n × d data matrix for a single Decathlon experiment, where n is the number of individuals and d is the number of measures. The matrix contained a substantial fraction of missingness (mean = 0.23 ± 0.22) within measures after initial construction. We used alternating least squares (ALS) to estimate a complete matrix that preserved, as accurately as possible, the covariance structure of the underlying data (Figure 1—figure supplement 5). To reduce run to run variation in ALS, we generated 200 complete data matrices with ALS, under different random seeds, and computed the final infilled matrix as the median of these repetitions. Simulations with ground truth data (Figure 1—figure supplement 5) for comparison of infilling methods were run with data randomly drawn from a multivariate normal distribution matching that of the inbred Decathlon covariance matrix. These simulated comparisons showed that median ALS imputation did not necessarily produce the infilling with the lowest mean-squared error, but did result in a correlation structure closer to ground truth than any other method. Since we are ultimately interested in the correlation structure of behavioral biases, we therefore opted for median ALS infilling. This process resulted in three complete matrices: two n × d matrices for the two Berlin-Kiso Decathlon experiments and one n × d matrix for the NEX Decathlon experiment.

To combine the Decathlon-1 and Decathlon-2 matrices for Berlin-Kiso, we z-scored by behavioral measure within each matrix to adjust for Decathlon experiment batch effects on mean and variance. Data from the two matrices were then combined by matching unique assay and measure combination. Because the order of assays was randomized for each Decathlon experiment, day of testing was ignored when combining behavioral measures from all non-circadian assays (e.g., olfaction odorOccupancy from day 7 of Decathlon-1 was combined with olfaction odorOccupancy from day 8 of Decathlon-2). Circadian measures were matched by day of testing due to circadian measurements being collected on all days of testing (e.g., circadian meanSpeed from day 1 of Decathlon-1 was combined with circadian meanSpeed from day 1 of Decathlon-2). Placeholder NaN values were inserted in cases where no matching measure existed in the other matrix (e.g., temporal phototaxis for Decathlon-1). The resulting full data matrix was then z-scored by measure and any residual missing values (due to lack of an existing measure match) were then infilled via the ALS method described above.

Distilled matrix generation

Request a detailed protocol

We created a distilled matrix to condense any covarying features for which we had a prior expectation that the measures might be correlated for obvious or uninteresting reasons. We defined such groups of measures (a priori groups) that primarily consisted of either duplicate measures (e.g., all 10 days of circadian gravitaxis) or measures that are likely to be linked by the same underlying phenomenon (e.g., choiceNumber and meanSpeed across multiple assays). Details of the measures in each a priori group are detailed in Supplementary file 2. Criteria used to assign each metric to its corresponding a priori group are shown in Supplementary file 3. We performed PCA on each a priori group separately in an attempt to capture group variance with fewer dimensions. We defined an adaptive cutoff for the number of PCs retained from each group as the highest PC above or within the 95% confidence interval of the variance explained by PCs computed on a shuffled version of the same matrix. We reasoned that PCs above or within the variance explained for PCs computed on the shuffled matrix (a matrix with approximately independent features) represented components corresponding to meaningful covariance.

Multidimensional analyses

Request a detailed protocol

To characterize the dimensional organization of the Decathlon, we computed the number of connected components in a thresholded covariance matrix representing a directionless graph. We swept 200 threshold values uniformly spaced between the minimum and maximum covariance. A spectrum of dimensional organization was formed by iteratively counting the number of connected components in the graph at each threshold value. To estimate a lower limit on the dimensionality and to assess the degree to which the number of connected components was dependent on specific groups of measures, we repeated the steps above on matrices after randomly selecting measures to drop from the matrix, iteratively increasing the number of measures dropped from one to d-1 (i.e., one feature remaining).

Unless otherwise specified, t-SNE embeddings of all data sets was performed on the Euclidean pairwise distances of z-scored values with perplexity = 20. All embeddings were optimized by minimizing the KL-divergence between the original and embedded data. t-SNE performed on Decathlon measures was run with perplexity = 8, where we had an expectation that clusters would be relatively small (e.g., 3–8 data points) due to the low number of measures in our a priori groups (median = 5) and low number of unique measures from each assay (median = 8) and day of testing.

Unsupervised behavioral classification

Request a detailed protocol

As previously described (Berman et al., 2014), single-fly videos were decomposed into behavioral classification time series with the motion mapper pipeline. Briefly, 200 × 200 px frames centered on the flies were translationally and rotationally aligned with sub-pixel accuracy to a template fly body to restrict frame-to-frame variation to postural changes by the flies. The data dimensionality was reduced by restricting further analysis to the 6700 px with the highest variance. The data for all individuals was further compressed with PCA into 50 eigenmode postural time series. PC time series were then spectrally decomposed into 25 uniformly spaced frequency channels via Morlet wavelet transformation, resulting in a high-dimensional (1250) representation of each frame at various timescales. A representative subsampled training set of frames was constructed for each individual by embedding their data into two dimensions with t-SNE (van der Maaten and Hinton, 2008) and selecting frames according to proportionally to their local probability density in the embedded space. A joint two-dimensional embedding for all individuals data was then constructed by embedding the combined training set via t-SNE. Individual embeddings were generated by projecting all remaining data points into the joint embedding. As previously described, the distribution of log-speed trajectories within the embedded space was well-described by a two-component Gaussian mixture model (GMM) with the majority of frames falling into a low-speed mode. We used the standard deviation of the low-speed GMM component to define a small Gaussian kernel (σ = 1.37) that was convolved with the embedded points the compute a continuous density. The embedded space was then segmented into discrete regions by computing watersheds (Najman and Schmitt, 1994) of the negative density. Behavioral classifications were then assigned to frames by the watershed region occupied. Frames were filtered by defining an embedded speed threshold (2.78) as the intersection of GMM components where the log-speed distribution was maximally separable. All frames above speed threshold received no classification. Individual occupancy within each classification was then used to compute a discrete probability density function for each individual. The behavioral state transition matrix (Figure 2G) was calculated as the row-normalized probability that a fly in state i (rows) would be in state j (columns) 1 frame later.

Human readable labels were applied to each classification by generating 8 × 8 tiled movies of frames corresponding to individual pauses within each classification above a minimum duration of 10 ms. Labels were designated post hoc to describe the behaviors that appeared to be frequently represented after repeated viewing. The labels were composed with the following format: body part (e.g., wing, forelimb, abdomen), the behaviors displayed (e.g., walking, grooming, movement), and a qualitative descriptor of the speed of movement (e.g., idle, slow, fast).

Statistics

Unless otherwise stated, all reported correlations were computed as the Spearman rank correlation. p-Values reported for correlation coefficients were calculated via two-tailed t-test of the null hypothesis that the regression coefficient was not significantly different from zero.

To generate bootstrapped distributions of correlation the correlation matrices, Decathlon matrices (either the full or distilled matrices) were sorted to match measures across Decathlon data sets as described above. A pair of bootstrapped matrices were created by bootstrapping individuals from either the same (e.g., resampling Berlin-Kiso matrix twice) or different Decathlon matrices up to the size of the original matrix. The correlation matrix was computed for each bootstrapped matrix and the unique, off-diagonal r-values of each matrix were stored. A single correlation of correlations was then calculated on the two sets of r-values. The above steps were then repeated over 100 repetitions for all unique combinations of Decathlon data sets.

False discovery rate calculation

Request a detailed protocol

We computed the FDR for significance of measure correlations as a function of α-value for each combination of Decathlon data set (Berlin-Kiso or NEX) and matrix type (full or distilled). Each data matrix was bootstrap resampled (100 repetitions) up to the size of the original matrix. Correlation matrices were computed for each matrix and p-values were calculated correlation coefficient (see statistics). We then calculated mean kernel density estimates for all p-values of unique measure combinations across bootstrap replicates. The above steps were then repeated for bootstrap shuffled data matrices to create a null distribution of p-values for correlation coefficients. We defined FDR as the shuffled p-value density divided by the observed p-value density. For determining significance of correlation, we set α = 0.05, corresponding to the following FDR: Berlin-Kiso full = 0.30, NEX full = 0.30, Berlin-Kiso distilled = 0.37, NEX distilled = 0.42.

Data availability

All analysis data and code generated are linked in the manuscript and supporting files at: https://zenodo.org/record/4081667#.X8U1tohKiHs Our raw tracking data from the assays covers ~600 flies over 12 days of continuous tracking between 3 and 60Hz and are greater than 50GB in size. Our unsupervised classification videos comprise ~144 million frames of uncompressed video data and are greater than 8TB in size. For these reasons, we have not posted the rawest forms of the data to online repositories, and are instead offering them upon request by external hard drive.

The following data sets were generated
The following previously published data sets were used
    1. Mackay et al.
    (2020) phenotypic data
    ID DGRP2. The Drosophila melanogaster Genetic Reference Panel.
    1. Robie et al.
    (2017) BABAM
    ID behavioraldata. Mapping the Neural substrates of Behavior.

References

  1. Book
    1. Nöthel H
    (1981)
    Investigations on Radiosensitive and Radioresistant Populations of Drosophila Melanogaster VIII. The System of Relative Radioresistance in Immature Oocytes of the Irradiated Population Roi4
    Mutation Research.
  2. Software
    1. R Development Core Team
    (2020) R: A Language and Environment for Statistical Computing
    R Foundation for Statistical Computing, Vienna, Austria.
    1. van der Maaten L
    2. Hinton G
    (2008)
    Visualizing data using T-SNE
    Journal of Machine Learning Research 9:2579–2605.

Decision letter

  1. Manuel Zimmer
    Reviewing Editor; University of Vienna, Austria
  2. Ronald L Calabrese
    Senior Editor; Emory University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This paper is interesting to psychologists, ethologists, and neuroscientists. It uses fruit flies as a model system and a sophisticated high-throughput experimental pipeline that captures a large space of behavioral metrics over long timescales, while keeping the identity of individual animals. Using innovative computational tools, the authors identify structure in form of covariations in this behavioral parameter space. They find that behavioral covariates appear rather sparse and high-dimensional. Hence, there are no stereotyped groups of individuality-types found in the population of flies. The work further shows that variability in behavior can be predicted in part by gene expression, and makes first attempts in targeting neuronal circuits that potentially contribute to behavioral variability.

Decision letter after peer review:

Thank you for submitting your article "The structure of behavioral variation within a genotype" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Ronald Calabrese as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Please go over the reviewers detailed comments below which we believe most of them can be addressed by additional analyses, text revisions and clarifications. The authors should revise the manuscript accordingly and prepare a point-by-point response.

2) The authors should provide a more thorough analysis showing which behaviours show persistent idiosyncrasies in individual animals as opposed to trial-trial variability (see reviewer #1). Based on this, perhaps the analyses can be repeated only on those behavioural metrics.

3) One concern is that the two decathlon data are not reproducible which undermines many of the conclusions and the subsequent comparisons to outbred fly lines. See reviewer #1 (1-2) and discussion with reviewer #3. Can the authors justify better their conclusions in the face of these concerns?

4) The reviewers agree that the results in Figure 3 are very weak. Besides the finding that behavioural covariates can change with temperature in the iso line, there is little that can be learn from the circuit interrogation lines. We wonder whether the paper would be stronger without these data included.

Reviewer #1:

The definition of individuality and its neurogenetic basis is a fundamental problem in ethology and neuroscience. Individuals might fall into discrete groups of personality types; alternatively, individuals might be better described by a broader spectrum of independent traits. An unbiased and quantitative analysis of behavioural traits that make up an individual's personality is a prerequisite of investigating the neuronal and genetic basis of individuality. Given the technical challenges in systematically measuring many behavioural traits across sufficiently large and genetically defined populations and over long time-scales, these questions remain unanswered. This manuscript represents a tour-de-force trying to shed more light in these directions. Werkhoven and colleagues aim at characterizing structure in correlations among a large set of quantitative behavioural measures obtained from the model organism Drosophila melanogaster. The authors performed a large number of high throughput behavioural experiments that cover behavioural paradigms ranging from locomotion to perceptual decision-making. Data were acquired from an inbred, hence isogenic fly line, an outbred line, and various neuronal circuit manipulations. In addition, gene expression data were obtained from individuals. In this way, the authors were able to capture hundreds of behavioural metrics from hundreds of flies, while keeping their individual identities over the course of 13 days. They developed a computational analysis pipeline that quantifies the correlation matrix computed from these metrics. In a 2-step procedure, they condense this matrix into a "distilled" matrix, the entries of which contain all remaining behavioural covariates that were not a priori expected by the authors.

A central claim in this paper is that any structure in this distilled matrix should reveal the principal axes along which individuality should be described. Based on these measurements and analyses flies could not be categorized into discrete types. Moreover, behavioral covariates appear rather sparse and derive from a high-dimensional behavioral space. This would mean that each individual fly is better described by a large combinatorial set of parameters. The same qualitative finding was made between inbred and outbred flies, leading the authors to a conclusion that larger genetic diversity does not change the principal organization of behaviour. The authors perform a set of neuronal-circuit manipulations and claim in conclusion that specific neuronal activity patterns underly structure in behavioural correlations. Some correlations between gene expression and behavioral metrics were discovered, for example gene expression of metabolic pathways

can predict some variability found in the behaviour of flies. The behavioural pipeline is sophisticated and presents a great leap forward in enabling researchers to capture a large set of behavioural measures from large fly population, keeping the identity of individuals. The work is also presenting an innovative and interesting analysis pipeline.

Although we applaud these ambitious experimental paradigms and computational techniques used, we have several major reservations about this work. Reading through the manuscript multiple times, one is left confused whether the major finding is that no structure whatsoever can be found in these data and to what extent the remaining sparse correlations are of biological / ethological relevance. Another major concern arises from the high level of trial-trial variability that is found in the data, which seems to preclude identification of persistent idiosyncrasies in the behavioural traits of individuals and impedes the reproducibility of the data matrices in two repetitions of the main experiment. We feel that most of the authors' conclusions and claims are confounded by these caveats.

(1) Distinguishing persistent idiosyncrasies from trial-to-trial variability and reproducibility of decathlon data.

A major challenge in measuring personality traits or individuality is to distinguish between persistent idiosyncrasies and trial-to-trial variation; the latter could result from inherent stochastic properties of behaviors, environmental or measurement noise. To identify an idiosyncratic behavioral trait in an animal one needs to show that individuals exhibit a distinct distribution in a behavioral metric that cannot be explained by trial-to-trial variability. Such a distinction cannot be made if a behavioral metric is measured just once or during a short period, but requires repeated measures over longer time-scales from a sufficiently large population of animals. Unfortunately, in this study many measures have been taken during just one 1-2hs episode per individual of a decathlon. For other measures that were taken repeatedly (circadian assays, unsupervised video acquisition) no efforts have been undertaken by the authors to make the above distinction. Hence, the authors' conclusion that there are no "types" of flies seems premature. In Figure S1 we are surprised to see how low most behavioral measures auto-correlate when recorded on two subsequent days; most auto-correlations further drop to meaningless values when compared over time-periods that correspond to the different epochs of a decathlon. This indicates that trial-to-trial variability dominates the data. In our view it makes little sense to ask whether two behavioral metrics are correlated or not, if their autocorrelations measured over the same time-scale are already extremely low. Moreover, Figure S5B shows that the two decathlons generate largely different data matrices (correlation ~0.25), raising concerns that the results are not reproducible. We wonder whether any structure in behavioral correlations was masked by various sources of noise in this study.

Related to above, there should be error bars and number of flies for the plots in Figure S1. This figure undermines the starting point of the paper claiming persistent idiosyncratic behaviors.

(2) Given the concerns above, it is not surprising that the outbred fly line delivers another set of covariates which lack otherwise any further structure. If experiments with >100 inbred flies cannot deliver reproducible results, it cannot be expected that a similarly sized population of outbred flies would. Perhaps the needed population size must be orders of magnitudes larger in this case.

(3) Figure 3. It is intriguing to observe how the relationship between switchiness and clumpiness is perturbed upon temperature shifts. But, it seems rather uncorrelated at the restrictive temperature in the Iso line, with a slightly positive value. However, the switchiness-clumpiness correlation is not reproducible in both perturbation types at permissive temperatures. Note, that at both temperatures the Shi and Trp datasets show no – or very low correlations: the Trp lines produce correlations from approx. -0.2 (permissive T) to 0.1 (restrictive T); the Shi lines 0, 0.1 respectively. Figure 3D is very misleading in showing the best fits to the combined datasets. We are not convinced that there is a robust sign-inversion in any of these correlation. The authors' major conclusion that " thermogenetic manipulation and specific neuronal activity patterns underlie the structure of behavioral variation" is not supported by these data. The effect of temperature in the control line, although interesting, is a major caveat for interpreting the results from the Shi and Trp results.

(4) The authors measure a large set of low- and high-level behavioral metrics, e.g. walking speed and choices in Y-mazes respectively. A fundamental problem is that many of these metrics potentially have common underlying but trivial causes, e.g. covariation between speeds measured in various conditions is expected. Therefore, the authors condense their original correlation matrix (Figure 1E) into a distilled matrix (1G) by making such judgements. In the present form, it is impossible to evaluate how systematic or arbitrarily these choices were. In many cases, where the same measure was recorded repeatedly (e.g. circadian bout length) or across different conditions (e.g. mean speed) it is obvious, but for other cases it is not obvious at all for the non-expert: for example, why are circadian-bout-length and LED-Y-maze-choice-number lumped into one block of expected behavioral covariates? The current manuscript lacks detailed explanations how the authors systematically created the distilled matrix. Can the sparseness of the distilled matrix be a consequence of too generous pre-allocations? See also point (6). The bulk of the analysis in this paper is done on the "distilled matrices" which are produced by removing correlations within previously defined groups of behavioral metrics. This is said to cleanly reveal unexpected correlations, leading to a main result of the paper, the correlations between "Switchiness" and "Clumpiness". However, if the a priori categories were defined differently, then in the extreme case this correlation would have been completely removed. How sensitive is this correlation to the choice of categories, especially given that many of the Switchiness and Clumpiness metrics are from similar assays (Figure S8)?

(5) For the second pipeline that uses t-SNE and watershed (Figure 2 and S3C), a previous publication from some of the authors [1] appears to show low repeatability of this analysis. Thus, the repeatability and noise levels of the pipeline must be investigated further. These were 3x 1h recordings per decathlon. Related to comments (1-2), the authors need to show that the differences across flies (Figure 2C,D) are not expected from the level of trial-to-trial variability. Perhaps more data from individual flies need to be recorded?

(6) 1G: To our understanding, within-block entries to the distilled matrix should indicate zero correlations, because these are correlations between PCA-projections. But we see many nonzero entries. Given the information provided in the methods it is unclear why this is the case; this requires further explanation.

In any case, within-block correlations are expected to be at least very low. Hence, we expect the distilled matrix to be relatively sparse given how it was calculated. Of interest are then the across-block correlations, the authors should make this pointy more clear top the readers.

(7) Some of the author's claims are related to the spectral dimensionality description technique described in Figure S9. However, none of the real data shown in the main paper figures look qualitatively similar to the toy data. Indeed, the histograms from the main figures are on a log scale, and are thus not comparable to the toy data results. Although the technique might be well suited for certain classes of data, one interpretation of the main paper figures seems to be that no structure is revealed whatsoever. More work should be done to exclude this as a possible interpretation, at least by generating toy data that look like the real Datasets; also with respect to point (6) above.

(8) Throughout the paper, the authors use the term "independence" for orthogonal / uncorrelated datasets. Correlation/uncorrelation – dependence/independence are not interchangeable terms. To my understanding PCA decomposes into independent variables only under certain circumstances (multivariate normal distributed data). Have the authors tested for independence?

[1] Todd, J.G., Kain, J.S. and de Bivort, B.L., 2017.

Systematic exploration of unsupervised methods for mapping behavior. Physical biology, 14(1), p.015002.

The authors should be more rigorous in identifying persistent idiosyncrasies.

The authors could repeat their analysis on the behavioral parameters that show consistent auto-correlations only.

Data/results related to Figure 3 could be removed from the paper.

Reviewer #2:

In this paper, Werkhoven and colleagues describe a large-scale effort, using Drosophila, to study variation in behavior among individuals with identical genotypes, and raised in very similar environmental conditions. This addresses the important and basic question of how much behavioral variability exists under such conditions, e.g. due to stochastic processes during development. By looking across many different behaviors, the authors are able also to investigate the nature of this variability. The key conclusion of the paper is that this intragenotypic variability is high dimensional, and cannot be explained by a small set of behavioral syndromes. They find that this observation is robust to the method they use to quantify behavior, and also holds to different degrees in data sets acquired from outbred flies, or files subjected to genetic perturbations of neural activity. Furthermore, they have generated a data set that allows correlation of behavioral biases in individual animals with transcriptomic data. Altogether, this is an impressive study that, beyond its important conclusions, opens up the possibilities for many further explorations in this area, and should be interesting to a broad audience. The experiments are well designed and overall the paper is very nicely written and clear to understand.

I have a few small questions or points that were unclear to me that the authors might like to address.

1) I missed a description of the distribution across individuals of the different behavioral measures that were used for the correlation analysis. E.g. were they generally Gaussian, heavy-tailed etc?

2) One complication in their data set is that there are quite a lot of points with missing data, because a particular variable could not be measured in a particular fly. The authors deal with this by filling in the missing data using an approach which they validate using synthetic data, where they have randomly deleted a subset of points. However it appears that the missing points in their datasets are not distributed randomly (Figure S3A) but have quite a marked structure. Perhaps the authors could apply similarly structured perturbations in their toy data to strengthen the argument for the methods they use for infilling.

3) A recent preprint from the Berman lab (Hernández et al., 2020. https://arxiv.org/abs/2007.09689) ascribes some aspects of interindividual variability to temporary differences in occupancy of longer time-scale behavioral states, and they make some attempts to normalize for this in their analysis. Perhaps the authors could discuss the possible implications, if any, for interpretation of their data set.

4) In the introduction the authors cite Pantoja et al., 2016 in the context of studying intragenotypic variability, but I don't think this paper looked at individuals with identical genotypes.

5) The data presented in Figure 3 seems less convincing than the rest of the paper. First, it appears to focus on only one particular behavioral coupling so that it is difficult to make a general conclusion from this that the Decathlon experiments tend to identify behaviorally meaningful couplings. I am also confused by Figure 3D, which appears to show, in the 95% confidence interval, a very wide range of possible slopes in the control data, which overlap considerably with the result at the higher temperature. Maybe the authors can help clarify this?

Reviewer #3:

In this paper Werkhoven et al., ask a fundamental question in behavioral neuroscience – what is the structure of co-varying behaviors among individuals within populations. While questions in the context of inter-individual behavioral differences have been studied across organisms, this work represents a highly novel and comprehensive analysis of the behavioral structure of inter-individual variation in the fly, and the underlying biological mechanism that may shape this structure of covariation. In particular, for their experiments they combined a set of behavioral tests (some of them were explored in previous studies) to a 13-day long behavioral paradigm that tested single individuals in a highly controlled and precise way. Through clever analysis the authors interestingly showed strong correlations only between a small set of behaviors, indicating that most of the behaviors that they tested do not co-vary, exhibiting many dimensions of inter-individual variation in the data. They further used perturbations of neuronal circuits and showed that temperature and circuit perturbations can change dependencies among sets of behaviors. In a different set of experiments where they integrated gene-expression data (from the brains of single individuals), they showed that some of the genes are correlated with individual-specific parameters of behaviors. Interestingly, through comparison of inbred and outbred population they demonstrated that also outbred populations are showing relatively low covariance of behaviors across individuals.

Overall, the data in the paper indicate that surprisingly, even for a 'simple' organism, there are many dimensions of inter-individual variation, e.g. many specific characters that can change among individuals in a non-depended way. The ability of the authors to precisely measure such dependencies in such a highly robust and precise way allowed their investigation of the underlying processes that may generate this variation. The results in this study are highly interesting and novel. They uncover a general picture of the structure of behavioral variation among individuals and open many avenues for further analyses of the underlying neuronal and molecular mechanisms that control variation in sets of behaviors. Furthermore, the methods that were developed in this paper can be of great use by other researches in the field.

However, while the key claims of the manuscript are well supported by the data and analyses methods, some aspects of data analysis need to be clarified or extended.

– It is not clear what is the motivation for using the 'Effective dimensionality spectrum' analysis presented in the paper and how it significantly adds to existing methods of clustering that are relying directly on the correlation/distance matrix (some of them were used in this study).

– While it is clear that the distilled behavioral covariation matrix has many independent dimensions (as the authors indicated, most of the a-priori PCs are not strongly correlated), the number of 'significant' Pcs was not calculated directly for the distilled matrix, and t-SNE analysis is presented only for the original covariation matrix (1L).

– It is possible that some if the behaviors that covary across individuals in the high temporal resolution assay and also tend to be associated over time within an individual, may indicate sequences of behavior on longer time-scales (than the timescales in which parameters are quantified).

– Further analyses are needed for extending the detection of correlations between variation in gene-expression data and the independent behavioral measures in the covariation matrix.

The results in the study by Werkhoven et al., are highly novel and important. They provide a compelling evidence for many independent dimensions of inter-individual variation within populations. Furthermore, using their careful and accurate measurements, as well as their clever analyses, they authors interestingly demonstrated that the structure of behavioral covariation among individuals may change under specific environmental/circuit manipulations and can be predicted from gene-expression differences. For their experiments they examined ~120 different behavioral parameters (across ten behavioral paradigms) in hundreds of individuals. Their robust and precise measurements allowed them to construct a covariation matrix that represents covarying behavioral parameters across individuals, thus defining characters of individuals in a more complex way. Analyses of these data detected many dimensions of variation, uncovering a complex behavioral covariation space. They further extended their experiments to (1) perturb co-variation structure by neuronal manipulation (2) find association between gene-expression and behavioral variation (3) show that both inbred and outbred populations have many dimensions of behavioral variation (4) run their analysis pipeline on available datasets.

Overall, this is an important study that will greatly impact many areas of behavioral neuroscience. The data in the paper indicate that surprisingly, even for a 'simple' organism, there are many dimensions of inter-individual variation, e.g. many specific characters that can change among individuals in a non-dependent way. The ability of the authors to precisely measure such dependencies allowed their investigation of the underlying processes that may generate this variation. This work uncovers a more global picture of the structure of behavioral variation among individuals and open many avenues for further analyses of the underlying neuronal and molecular mechanisms that control variation in sets of behaviors. Furthermore, the methods that were developed in this paper can be of use by other researches in the field.

However, while the key claims of the manuscript are well supported by the data and analyses methods, some aspects of data analysis need to be clarified or extended.

– It is not clear what is the motivation for using the 'Effective dimensionality spectrum' analysis presented in the paper and how it significantly adds to existing methods of clustering/network analysis that are relying directly on the correlation/distance matrix (some of them were used in this study). This analysis seems more of analysis of group structure of the covariation data and not a direct dimensionality assessment such as by using PCA. Also, a possibly more direct description of X axis in Figure 1J is 'number of connected components.

– It is not clear what is the number of 'significant' Pcs in the distilled matrix (Figure 1I), using the same shuffling methods that the authors used for the original matrix. It will be interesting to include t-SNE analysis for the distilled covariation matrix (such as in 1L).

– Figure 2 – why only the second Decathlon was used (p. 6)?

– Not clear how the probability matrix in Figure 2H was generated. What are the defined/measured parameters? Not in methods.

– Temporal correlations in Figure 2H may indicate sequences of behavior on longer time-scales that can change in intensity among individuals. In this case they may covary across individuals because they are part of the same temporal sequence

– It will be interesting to correlate the 'high resolution' movement parameters (Figure 2) with the rest of the behavioral parameters (Figure 1), although there is a clear separation in timescales between behavioral parameters.

– Figure 3: "colored hash marks" – do not appear in the figure.

– Figure 4 – it will be informative to correlate gene-expression data with measures of the distilled behavioral covariation matrix (PCs), given that genes are probably correlated in the same way with behavioral parameters from the same a-priori group.

– Figure 7- Analysis of other datasets indicate that in some of the cases there are stronger correlations among behavioral parameters. It will be interesting to discuss potential sources of differences in structure between the datasets.

Figure S9 Dashed line not explained in figure caption.

Comment by reviewer #3 that arouse during the discussion, in response to reviewer #1, point (1):

The reproducibility of the two experiments- the correlations in the distilled matrix (S5B) are lower than in the original matrix (S5A). I assume that the low correlation values in the distilled matrix makes the correaltion comparison more noisy. Actually, it is hard to learn a lot by computing correlation on correlations. I would ask the authors to plot the correlations of D1, D2 (one against the other) to directly show reproducibility.

https://doi.org/10.7554/eLife.64988.sa1

Author response

Reviewer #1:

The definition of individuality and its neurogenetic basis is a fundamental problem in ethology and neuroscience. Individuals might fall into discrete groups of personality types; alternatively, individuals might be better described by a broader spectrum of independent traits. An unbiased and quantitative analysis of behavioural traits that make up an individual's personality is a prerequisite of investigating the neuronal and genetic basis of individuality. Given the technical challenges in systematically measuring many behavioural traits across sufficiently large and genetically defined populations and over long time-scales, these questions remain unanswered. This manuscript represents a tour-de-force trying to shed more light in these directions. Werkhoven and colleagues aim at characterizing structure in correlations among a large set of quantitative behavioural measures obtained from the model organism Drosophila melanogaster. The authors performed a large number of high throughput behavioural experiments that cover behavioural paradigms ranging from locomotion to perceptual decision-making. Data were acquired from an inbred, hence isogenic fly line, an outbred line, and various neuronal circuit manipulations. In addition, gene expression data were obtained from individuals. In this way, the authors were able to capture hundreds of behavioural metrics from hundreds of flies, while keeping their individual identities over the course of 13 days. They developed a computational analysis pipeline that quantifies the correlation matrix computed from these metrics. In a 2-step procedure, they condense this matrix into a "distilled" matrix, the entries of which contain all remaining behavioural covariates that were not a priori expected by the authors.

A central claim in this paper is that any structure in this distilled matrix should reveal the principal axes along which individuality should be described. Based on these measurements and analyses flies could not be categorized into discrete types. Moreover, behavioral covariates appear rather sparse and derive from a high-dimensional behavioral space. This would mean that each individual fly is better described by a large combinatorial set of parameters. The same qualitative finding was made between inbred and outbred flies, leading the authors to a conclusion that larger genetic diversity does not change the principal organization of behaviour. The authors perform a set of neuronal-circuit manipulations and claim in conclusion that specific neuronal activity patterns underly structure in behavioural correlations. Some correlations between gene expression and behavioral metrics were discovered, for example gene expression of metabolic pathways can predict some variability found in the behaviour of flies. The behavioural pipeline is sophisticated and presents a great leap forward in enabling researchers to capture a large set of behavioural measures from large fly population, keeping the identity of individuals. The work is also presenting an innovative and interesting analysis pipeline.

The reviewer has identified the major contributions of the manuscript, and we are glad to hear they feel the work is innovative and interesting.

Although we applaud these ambitious experimental paradigms and computational techniques used, we have several major reservations about this work. Reading through the manuscript multiple times, one is left confused whether the major finding is that no structure whatsoever can be found in these data and to what extent the remaining sparse correlations are of biological / ethological relevance.

Technical responses to these concerns are provided below. In short, while the overall structure is one of many independent dimensions, there is structure in the form of sparse correlations between specific behaviors. Some of these correlations are not so surprising (e.g., many different measures of activity correlate across time and assay, within and between Decathlons). Some were surprising (e.g., switchiness and clumpiness) and robust in that they appear across Decathlon replicates and in wholly separate experiments. We have edited the Discussion to make it clear that these are the top-level conclusions. Specifically, we now say “We found that behavioral measures were largely independent of each other, but small sets of behaviors were significantly correlated to varying degrees. This sparse correlation structure means that with no evidence that behavioral variation cannot be readily compressed to a small number of dimensions low dimensionality.”

The pertinent sentence of the abstract has been revised to “we found sparse but significant correlations among small sets of behaviors.”

As for the ethological significance of this structure, while this is indeed an important question, particularly how this structure plays out in wild populations, directly addressing this question is beyond the scope of this lab study. We speculate that behavior having many independent axes of variation, in both inbred and outbred populations, may allow evolution to freely optimize each behavioral trait without the constraint of pleiotropic effects on other traits. In the Discussion we now say:

“The high dimensionality of the inbred and outbred matrices suggests that behaviors can evolve largely independent of one another (i.e., without pleiotropic constraint) either via plastic mechanisms within a genotype, or by natural selection in a genetically diverse population.”

It is also worth mentioning that even small but real correlations (e.g., r=0.1) can have large evolutionary effects given large populations/sufficient time.

Another major concern arises from the high level of trial-trial variability that is found in the data, which seems to preclude identification of persistent idiosyncrasies in the behavioural traits of individuals and impedes the reproducibility of the data matrices in two repetitions of the main experiment. We feel that most of the authors' conclusions and claims are confounded by these caveats.

We believe we have now improved the analysis of trial-to-trial variability and its presentation in the figures. Please see detailed replies below.

(1) Distinguishing persistent idiosyncrasies from trial-to-trial variability and reproducibility of decathlon data

A major challenge in measuring personality traits or individuality is to distinguish between persistent idiosyncrasies and trial-to-trial variation; the latter could result from inherent stochastic properties of behaviors, environmental or measurement noise. To identify an idiosyncratic behavioral trait in an animal one needs to show that individuals exhibit a distinct distribution in a behavioral metric that cannot be explained by trial-to-trial variability. Such a distinction cannot be made if a behavioral metric is measured just once or during a short period, but requires repeated measures over longer time-scales from a sufficiently large population of animals. Unfortunately, in this study many measures have been taken during just one 1-2hs episode per individual of a decathlon. For other measures that were taken repeatedly (circadian assays, unsupervised video acquisition) no efforts have been undertaken by the authors to make the above distinction. Hence, the authors' conclusion that there are no "types" of flies seems premature. In Figure S1 we are surprised to see how low most behavioral measures auto-correlate when recorded on two subsequent days; most auto-correlations further drop to meaningless values when compared over time-periods that correspond to the different epochs of a decathlon.

The reviewer is correct about the conditions that are needed to infer that a behavioral trait is idiosyncratic. We performed many experiments prior to the Decathlons to establish that most of the traits measured are idiosyncratic by this standard. In our view the autocorrelations in Figure 1—figure supplement 1 demonstrate that the behavior metrics measured include some measurement error (estimated in bootstrap CIs now reported), some trial-to-trial variability that drifts over longer timescales as well as, in many cases, biases that persist over the timescale of the decathlon experiment. This is especially evident now that Figure 1—figure supplement 1 Includes confidence intervals, see below. These experiments support the conclusion that most measures are stably idiosyncratic over the duration of the Decathlon.

We did not conduct such experiments for all behaviors analyzed in this study, as stability over days-long timescales has been established in previous studies for olfactory preferences (Honegger and Smith et al., 2020, PNAS) and behavioral modes identified by unsupervised classification (Todd et al., 2017, Physical Biology; Hernandez…Berman, 2020, BioRxiv). Additionally, at this point we have relatively strong priors that most individual behavioral measures acquired over many trials are repeatable over days as was also previously seen in locomotor handedness (Buchanan et al., 2015, PNAS), thermal and shade preference (Kain et al., 2015; Akhund-Zade et al., 2020, BioRxiv), visually guided locomotion (Linneweber et al., 2020, Science), dyadic affinity measures (Alisch et al., 2018, eLife) and olfactory learning (Smith et al., 2020 BioRxiv).

There is evidence of drift in biases over weeks-long timescales for most behavior measures. We interpret the fact that the autocorrelations progressively decay over multiple days of measurement suggest that this shift is biological and not due primarily to measurement error. There are indeed many behavioral autocorrelations that decay to meaninglessness within the time scale of the experiment. That is a limitation of the study, but it does not preclude us from concluding that behavior is not organized along a small number of strongly correlated axes. Nor does it preclude the possibility of measuring correlations between behavioral measures over shorter timescales. Correlations between behavioral measurements over shorter timescales could still be of ethological significance, we identified sparse significant correlations between unexpected pairs of behaviors despite the time-dependent loss of power to detect such correlations in some behaviors.

We expect that behavioral measurements are composed of many unknown variables and that many real behavioral correlations will be weak. We designed the decathlon pipeline to be high throughput so it could afford us the power to detect weak correlations. We believe that the abundance of significant day-to-day correlations that are low is indeed a key finding of our study and, in fact, what one might expect from behaviors orchestrated by overlapping circuits, but also subject to modulatory fluctuations and internal states changing on many timescales.

This indicates that trial-to-trial variability dominates the data. In our view it makes little sense to ask whether two behavioral metrics are correlated or not, if their autocorrelations measured over the same time-scale are already extremely low. Moreover, Figure S5B shows that the two decathlons generate largely different data matrices (correlation ~0.25), raising concerns that the results are not reproducible. We wonder whether any structure in behavioral correlations was masked by various sources of noise in this study.

We expected that the distribution of correlation coefficients in previous FigS5 (now Figure 1—figure supplement 6) would be quite low for the distilled matrices when comparing the first and second decathlon iterations. This is partly because our inclusion criteria only specifies that included PCs must be above or within the 95% confidence interval of the shuffled scree plot. This means we are liberally including higher PCs that may indeed reflect noisy measurements. Although the correlation of the inbred and outbred distilled matrices is relatively low overall, there are several strong relationships between distilled matrix PCs that are reproducible in inbred and outbred animals (Figure 4—figure supplement 4). These reproducible relationships suggest that the low correlation between distilled matrices is a result of (1) a biologically significant lack of correlation among the distilled behavior measures and (2) noise in some measurements, particularly the higher PCs in each a priori grouping.

We also find it encouraging that the inbred and outbred full matrices, which do not have the caveats of the a priori distillation approach, are much more correlated. This suggests that the lack of conspicuous structure common to both distilled matrices is a reflection of the lack of broad scale structure between “surprising” pairs of behaviors, rather than low reproducibility of behavioral measures.

Related to above, there should be error bars and number of flies for the plots in Figure S1. This figure undermines the starting point of the paper claiming persistent idiosyncratic behaviors.

We agree that error bars in previously Figure S1 (now Figure 1—figure supplement 1) are necessary to interpret the plot and have reorganized the figure to show the persistence of these traits in the same style as the correlation matrices reported in the main figures, as well as confidence intervals for each trait as estimated by bootstrap resampling. We believe that it can now be clearly seen that our sample sizes (in the hundreds) provided the statistical power to detect even weak correlations. Thus, our finding of few strong correlations between behaviors is a positive result, rather than reflecting a lack of power.

(2) Given the concerns above, it is not surprising that the outbred fly line delivers another set of covariates which lack otherwise any further structure. If experiments with >100 inbred flies cannot deliver reproducible results, it cannot be expected that a similarly sized population of outbred flies would. Perhaps the needed population size must be orders of magnitudes larger in this case.

Given that we had the power to detect even modest correlations between behaviors, and did detect many significant correlations in both the full and distilled matrices in inbred decathlons (Figures1E-H, 4A,G and Figure 1—figure supplement 10), we believe we had sufficient power to detect moderate changes to the correlation structure between inbred and outbred populations (particularly in the full matrix, see Figure 1—figure supplement 6, Figure 1—figure supplement 10, Figure 4—figure supplement 4).

We believe that the reviewer’s impression of a lack of shared dense structure between inbred and outbred distilled matrices is the right top-line conclusion. That said, there are specific behavioral correlations that are conserved between the inbred and outbred populations. We have added a new supplementary Figure 4—figure supplement 4 that highlights a few of these conserved relationships and also presents scatter plots of the within-decathlon correlations between the inbred and outbred experiments in the style of Figure 1—figure supplement 6.

It is certainly true that with a much larger sample size, we might have been able to (1) detect weaker correlations conserved between inbred and outbred populations, and (2) more precisely delineate the ways in which the covariation structure differs between these conditions. But we were operating very close to practical limits with this experiment, and to conduct it with substantially larger sample sizes would likely require a novel technical approach like automated fly-handling.

(3) Figure 3. It is intriguing to observe how the relationship between switchiness and clumpiness is perturbed upon temperature shifts. But, it seems rather uncorrelated at the restrictive temperature in the Iso line, with a slightly positive value. However, the switchiness-clumpiness correlation is not reproducible in both perturbation types at permissive temperatures. Note, that at both temperatures the Shi and Trp datasets show no – or very low correlations: the Trp lines produce correlations from approx. -0.2 (permissive T) to 0.1 (restrictive T); the Shi lines 0, 0.1 respectively. Figure 3D is very misleading in showing the best fits to the combined datasets. We are not convinced that there is a robust sign-inversion in any of these correlation.

We believe that at least some of the reviewer’s concerns here stem from problematic data visualization. We used a PCA-based regression to fit trend lines based on residuals that are orthogonal to the trend line. While this is appropriate given that neither of these measures stands out as a dependent vs independent variable, in combination with the z-score normalization of each measure, it produces fit lines only with slope 1 or -1. Given the modest underlying correlation, when bootstrapping is applied there is a large resulting variation in the fit lines used to determine the CI of the regression.

We are including below a comparable analysis based on standard linear regression for comparison, with switchiness as the independent variable:

This visualization of the regression supports the original interpretations. However, since this entire figure is now in the supplementary materials (Figure 3—figure supplement 1), we prefer to use the original PCA regression for consistency with the rest of the regression analyses throughout the manuscript and data browser.

The authors' major conclusion that " thermogenetic manipulation and specific neuronal activity patterns underlie the structure of behavioral variation" is not supported by these data. The effect of temperature in the control line, although interesting, is a major caveat for interpreting the results from the Shi and Trp results.

The reviewer is correct that the effect of temperature alone in the control lines limits the extent to which we can conclude that neural state affects the correlation structure. We have now re-worked the discussion of the thermogenetic experiments to emphasize the effect of temperature alone. We believe that the temperature effect on correlation is real, and quite possibly mediated through effects on brain state (c.f. work showing that latent individuality in circuit activity is revealed by thermal stress; Haddad and Marder, 2018, Neuron). However, since we did not design these experiments to test the effects of temperature, and given other reviewer’s concerns with this experiment, we have moved this figure into the supplementary materials as Figure 3—figure supplement 1.

(4) The authors measure a large set of low- and high-level behavioral metrics, e.g. walking speed and choices in Y-mazes respectively. A fundamental problem is that many of these metrics potentially have common underlying but trivial causes, e.g. covariation between speeds measured in various conditions is expected. Therefore, the authors condense their original correlation matrix (Figure 1E) into a distilled matrix (1G) by making such judgements. In the present form, it is impossible to evaluate how systematic or arbitrarily these choices were.

Although the assignment of metrics to a priori groups was a somewhat subjective process, we tried to be thoughtful in our approach. Measures grouped together met one or more of the following criteria:

1. Multiple measures of the same metric in the same assay across days (e.g. circadian speed days 1-10)

2. Multiple measures of the same metric across assays (e.g. Y-Maze hand clumpiness and Olfaction hand clumpiness)

3. Metrics that could share a plausible trivial coupling (e.g. Circadian bout length and LED Y-Maze choice number)

4. Measures of responses to similar stimuli (e.g. three measurements in response to phototactic stimuli in LED Y-Maze, Temporal Shade-light, and Spatial Shade-light)

We appreciate that this process was not transparent and have added a supplemental table to clarify our classification of each metric into its respective group. These criteria are now mentioned explicitly in Supplementary File 3.

In many cases, where the same measure was recorded repeatedly (e.g. circadian bout length) or across different conditions (e.g. mean speed) it is obvious, but for other cases it is not obvious at all for the non-expert: for example, why are circadian-bout-length and LED-Y-maze-choice-number lumped into one block of expected behavioral covariates?

As an example of the process, we reasoned that LED Y-Maze choice number and Circadian bout length would be correlated under criterion #3. Although it is not by definition true that more active animals will make more passes through the center of the Y-Maze (i.e. choices), we have observed that this is true in practice because most turn around events occur only when flies reach the end of an arm. We similarly reasoned that flies that are more active by other measures such as mean speed and choice number would also tend to move in longer bouts. Importantly, we recognized that these assumptions might be wrong, which is why we measured the compressibility of each a priori group separately and included all PCs from each group with more variance captured than shuffled controls. Indeed, there are several a priori groups that we found to be incompressible and therefore passed all the original metrics straight to the distilled matrix.

While the above explains our decision-making for this specific example, we hope the reviewer agrees that it is not essential to provide a detailed explanation for all measures in all a priori groups. We believe that the PCA process, which retains independent measures, makes our qualitative results largely robust to the subjectivity of the a priori grouping process. In addition, we now provide detailed descriptions of each measure as a supplementary table (in addition to the data browser). Along with (what we hope are) well-annotated public repositories of our data and code, it should be feasible for curious third parties to try their own alternative a priori groupings.

The current manuscript lacks detailed explanations how the authors systematically created the distilled matrix. Can the sparseness of the distilled matrix be a consequence of too generous pre-allocations? See also point (6).

Given that we grouped behaviors for which we would not be surprised to find a correlation, any additional correlations seen in a matrix constructed via less-generous grouping would, to us, be not surprising. Surprising correlations might be present in more than one “copy” in such a matrix, but the current distilled matrix should include all such redundant correlations as at least a single entry, given the within-group PCA process that retains non-compressible dimensions.

To confirm that the top qualitative conclusion of high independence with sparse significant correlations is indeed robust to the generosity of a priori group definitions, we computed a larger distilled matrix where the only a priori groups were identical measures taken across multiple days. This distilled matrix (Author response image 1) supports the same qualitative conclusions as the distilled matrix presented in the manuscript.

Author response image 1

The bulk of the analysis in this paper is done on the "distilled matrices" which are produced by removing correlations within previously defined groups of behavioral metrics.

By our count, in the main figures, 37 plots present results from the full matrix and 12 plots present results from the distilled matrix. We interpret the reviewer’s comment to mean that some of the most high-level conclusions of the paper (e.g. behavioral correlations are sparse and that the structure of covariation is similar in inbred and outbred flies) were based on observations in the distilled matrix. We base our responses to the following comments on that assumption.

This is said to cleanly reveal unexpected correlations, leading to a main result of the paper, the correlations between "Switchiness" and "Clumpiness". However, if the a priori categories were defined differently, then in the extreme case this correlation would have been completely removed.

We constructed the pipeline carefully so as not to remove any correlation from the data, surprising or unsurprising. Both of these kinds of correlation are shown in the full matrix. The distilled matrix reflects our subjective definition of surprising correlations, given our experience with fly behavior.

While it is true that if we had grouped switchiness and clumpiness together in an “a priori group” (this is now in scare quotes because such a group would no longer reflect our goal of only grouping behaviors for which we had a prior expectation of correlation), then the correlation between them would have been assumed, rather than discovered. As such, the distilled matrix can be considered a conservative list of interesting correlations, with a more liberal definition of interesting being reflected in the full matrix.

How sensitive is this correlation to the choice of categories, especially given that many of the Switchiness and Clumpiness metrics are from similar assays (Figure S8)?

If we had constructed the groups differently, our quantitative interpretation of the results in previous Figure S8 (now Figure 1—figure supplement 11) would have changed, but the qualitative result would likely not have changed. If for example we did not group clumpiness measures together and switchiness measures together but grouped by assay, this correlation would have been preserved but would have been spread more broadly over different features via the loadings of the PCs that comprise the distilled matrix.

The distilled matrix shown above indicates that this less inclusive approach to defining a priori groups (a more liberal definition of what is subjectively surprising) still produces a cluster of significant correlations between switchiness and clumpiness measures.

(5) For the second pipeline that uses t-SNE and watershed (Figure 2 and S3C), a previous publication from some of the authors [1] appears to show low repeatability of this analysis.Thus, the repeatability and noise levels of the pipeline must be investigated further.

In implementing the unsupervised classification pipeline outlined in Figure 2 and previous Figure S3C (now Figure 1—figure supplement 4), we found that different runs of the embedding heuristic (the potential point of repeatability failure) gave consistent partitionings of the data. To document this, we are including a confusion matrix from two runs of the embedding algorithm in (Author response image 2).

Author response image 2

These were 3x 1h recordings per decathlon. Related to comments (1-2), the authors need to show that the differences across flies (Figure 2C,D) are not expected from the level of trial-to-trial variability. Perhaps more data from individual flies need to be recorded?

To be clear, each fly was recorded for unsupervised analysis for only 1 h. Given the throughput of this imaging, we had to record flies over 3 separate days/batches. For this same reason, it was not feasible to include multiple recordings for unsupervised analysis per individual within the decathlon, nor over multiple consecutive days for hundreds of individuals in the style of our Figure 1—figure supplement 1 persistence experiments.

We are not certain about the reviewer’s suggestion of recording from more flies. We observed dense significant correlations among the behavioral modes Figure 2E, suggesting we had ample statistical power for detecting cross-mode correlations. While we did not conduct persistence experiments with the unsupervised classification in this study, adding more flies would not address this.

Nevertheless, we have some confidence that the measures produced by unsupervised analyses represent a similar level of day-to-day stability as the measures collected in our other assays. First, in Todd et al., we saw within-individual persistence in the distribution of behavioral modes performed, across trials separated by 24 and 48 hours (Figure 8D in that publication). The per-experiment transition matrices were similarly persistent across days within individuals (8E). Analyses from the Berman group (Berman et al., 2014, J R Soc Interface; Hernandez et al., 2020, BioRxiv) also show that differences in the profile of unsupervised behavioral modes exhibited by individuals are attributable to “long-lasting internal states'' which potentially reflect biases stable over days. Though, to our knowledge, they have not conducted test-retest experiments maintaining individual identity to distinguish hours-long from days-long internal states.

We additionally believe that the interpretation of the high-level conclusion that the correlations among behavioral motifs are organized into blocks corresponding to anatomical region is supported whether measured differences in behavioral biases reflect hours-long or days-long states.

(6) 1G: To our understanding, within-block entries to the distilled matrix should indicate zero correlations, because these are correlations between PCA-projections. But we see many nonzero entries. Given the information provided in the methods it is unclear why this is the case; this requires further explanation.

The non-zero values between PCs are the result of imperfect alignment of the covariance within inbred and outbred flies. The behavior data from both experimental groups is combined for upstream of PCA so that they share a common projection (This is necessary so that PCX of the inbred matrix can be directly compared to PCX of the outbred matrix). The data are then split back into inbred and outbred groups before performing correlation analysis. We interpret the relative low magnitude of these residual correlations as an indication that the variance is relatively well aligned across inbred and outbred flies.

In any case, within-block correlations are expected to be at least very low. Hence, we expect the distilled matrix to be relatively sparse given how it was calculated. Of interest are then the across-block correlations, the authors should make this pointy more clear top the readers.

We have added the following sentence to the caption of Figure 1G to make this clear: “Meaningful correlations in this matrix can be found outside the within-a-priori-group on-diagonal blocks.”

(7) Some of the author's claims are related to the spectral dimensionality description technique described in Figure S9. However, none of the real data shown in the main paper figures look qualitatively similar to the toy data. Indeed, the histograms from the main figures are on a log scale, and are thus not comparable to the toy data results.

We have now plotted connected components spectra in what was previously Figure S9 (now Figure 1—figure supplement 12) on a log-scale. We have done this for all such plots. We have also added analyses to Figure 1—figure supplement 12 comparing connected components spectra of the observed data to those of random data drawn sampled from the covariance matrix observed in the empirical decathlon data.

Although the technique might be well suited for certain classes of data, one interpretation of the main paper figures seems to be that no structure is revealed whatsoever.

As a philosophical point, we think it is worth respectfully remarking that correlation distributed uniformly across dimensions is indeed a structure around which behavioral variation might be organized. Not finding organization in tight dimensional bands is not a negative result or technical failure of the approach.

Insufficient motivation for the effective dimensionality spectra was brought up by reviewer 2, and we have added a couple sentences when the approach is introduced to motivate it more clearly. We now explicitly state that organization on a wide range of dimensionalities is a possible outcome: ”As an intuitive example, data filling a volume shaped like a frisbee is organized largely in two dimensions. Data shaped like a rugby ball is somewhat one-dimensional, not particularly two-dimensional and somewhat three dimensional. An approach is needed that can characterize such continuous variation in organization across dimensionalities, particularly the possibility that the data are not structured tidily in a single dimensionality.”

More work should be done to exclude this as a possible interpretation, at least by generating toy data that look like the real Datasets; also with respect to point (6) above.

We have now included in Figure 1—figure supplement 12 an example with (1) on-diagonal blocks with no correlation, (2) noise, and (3) sparse stronger correlations in off-diagonal blocks modeled from the empirical decathlon correlation matrices. This produces a dimensionality spectrum that is concordant with the observed spectrum for the distilled matrix.

(8) Throughout the paper, the authors use the term "independence" for orthogonal / uncorrelated datasets. Correlation/uncorrelation – dependence/independence are not interchangeable terms. To my understanding PCA decomposes into independent variables only under certain circumstances (multivariate normal distributed data). Have the authors tested for independence?

We thank the reviewer for catching this mistake. While we think “independent” is appropriate at several points in the paper, in the sense of linearly independent vectors or axes, data can certainly be uncorrelated without being independent. In general when we used this word in the context of PCA of real data, we meant uncorrelated. We have replaced all the misused instances of “independent” with “uncorrelated.”

[1] Todd, J.G., Kain, J.S. and de Bivort, B.L., 2017.

Systematic exploration of unsupervised methods for mapping behavior. Physical biology, 14(1), p.015002.

The authors should be more rigorous in identifying persistent idiosyncrasies.

We hope that with the revisions described above the statistical support for the persistence of these idiosyncrasies is now clearer.

The authors could repeat their analysis on the behavioral parameters that show consistent auto-correlations only.

We have now generated a distilled matrix using only behavioral measures with r-values greater than 0.15 for all time points in the persistence experiments (Author response image 3) . It is qualitatively similar to the original distilled matrix, suggesting that the high-level results of this approach are robust to the exclusion of less repeatable behaviors.

Author response image 3

Data/results related to Figure 3 could be removed from the paper.

We are open to removing them altogether, since they are challenging to interpret. However, since they do suggest mechanistic bases for the correlation structure of behavior (physiologic broadly defined to include both temperature and neural state) we would prefer to include them in the manuscript. We believe that de-emphasizing them in the Results and Discussion, and moving Figure 3 to the supplementary materials strikes a good balance.

Reviewer #2:

In this paper, Werkhoven and colleagues describe a large-scale effort, using Drosophila, to study variation in behavior among individuals with identical genotypes, and raised in very similar environmental conditions. This addresses the important and basic question of how much behavioral variability exists under such conditions, e.g. due to stochastic processes during development. By looking across many different behaviors, the authors are able also to investigate the nature of this variability. The key conclusion of the paper is that this intragenotypic variability is high dimensional, and cannot be explained by a small set of behavioral syndromes. They find that this observation is robust to the method they use to quantify behavior, and also holds to different degrees in data sets acquired from outbred flies, or files subjected to genetic perturbations of neural activity. Furthermore, they have generated a data set that allows correlation of behavioral biases in individual animals with transcriptomic data. Altogether, this is an impressive study that, beyond its important conclusions, opens up the possibilities for many further explorations in this area, and should be interesting to a broad audience. The experiments are well designed and overall the paper is very nicely written and clear to understand.

I have a few small questions or points that were unclear to me that the authors might like to address.

1) I missed a description of the distribution across individuals of the different behavioral measures that were used for the correlation analysis. E.g. were they generally Gaussian, heavy-tailed etc?

Many of the behavioral measures are roughly normally distributed, some with heavy tails. Histograms of the raw measures were previously available on the Decathlon Data Browser. We have now added them asFigure 1—figure supplement 3.

2) One complication in their data set is that there are quite a lot of points with missing data, because a particular variable could not be measured in a particular fly. The authors deal with this by filling in the missing data using an approach which they validate using synthetic data, where they have randomly deleted a subset of points. However it appears that the missing points in their datasets are not distributed randomly (Figure S3A) but have quite a marked structure. Perhaps the authors could apply similarly structured perturbations in their toy data to strengthen the argument for the methods they use for infilling.

This is a good suggestion. We have repeated the analysis in previous Figure S4 (now Figure 1—figure supplement 5) with data randomly drawn from multivariate normal distributions derived from the empirical inbred full covariance matrix. To asses the statistical impact of the empirical missing data, we took a resampling approach to generating missing values in this ground-truth sampled data. Values were deleted from the data matrix by shuffling columns of the full matrix missingness mask randomly and drawing columns up to the observed amount (40%) of missing values. We found that the infilling method performed similarly with missing data of this structure and have updated Figure 1—figure supplement 5 to show these new analyses.

3) A recent preprint from the Berman lab (Hernández et al., 2020. https://arxiv.org/abs/2007.09689) ascribes some aspects of interindividual variability to temporary differences in occupancy of longer time-scale behavioral states, and they make some attempts to normalize for this in their analysis. Perhaps the authors could discuss the possible implications, if any, for interpretation of their data set.

Our understanding of those results is as follows: the covariance structure of within-species behavior modes has similar high-level clusters as the clusters which emerge from an information bottleneck analysis that identifies the best high-level clustering of the modes such that the current mode is predictive of modes ~1h in the future. This is the longest time scale in that data because of the length of the recordings. Thus, the authors have found that the behavior is organized on hours-long timescales in ways that mirror the covariation across individuals. But this does not imply that behavior is not organized on days-long timescales in ways that also mirror the covariation across individuals. We believe the key experiment would be to test individuals on two occasions, separated by days, maintaining their identity, and our understanding is that the Berman group has not done this experiment. Given the days-long stability of unsupervised behavioral modes we observed in Todd, et al., we would not be surprised if biases in the behavioral modes identified by Berman’s approach were correlated over days.

That said, the concordance of the clustering between their within-species covariance matrix and information bottleneck analysis seems consistent with the concordance we observed between the correlation matrix in Figure 2E and the markov transition matrix between behavioral modes (2H). We now discuss this consistency in the Discussion.

4) In the introduction the authors cite Pantoja et al., 2016 in the context of studying intragenotypic variability, but I don't think this paper looked at individuals with identical genotypes.

The reviewer is correct that these animals were not deliberately inbred. However, it was a study looking at domesticated animals, which tend to have relatively low genetic diversity by virtue of their small population culture conditions. So we would guess it is pertinent to the structure of IGV. But a more relevant set of fish experiments was conducted in clonal Amazon mollies (Bierbach and Laskowski, Wolf, 2017, Nat. Comm) which we now cite instead.

5) The data presented in Figure 3 seems less convincing than the rest of the paper. First, it appears to focus on only one particular behavioral coupling so that it is difficult to make a general conclusion from this that the Decathlon experiments tend to identify behaviorally meaningful couplings.

Indeed, these focused experiments do not provide a broad validation of the many correlations observed in the Decathlon experiments. We think the experiments presented in the previous Figure 3 (now Figure 2—figure supplement 1) serve the purpose of exploring a potential mechanistic origin of the correlations between behaviors, namely physiological states.

I am also confused by Figure 3D, which appears to show, in the 95% confidence interval, a very wide range of possible slopes in the control data, which overlap considerably with the result at the higher temperature. Maybe the authors can help clarify this?

The wide range of fits seen in those confidence intervals, reflect our choice of a PCA-based regression, which minimizes residuals orthogonal to the regression line, rather than typical regressions that minimize residuals on the dependent variable (y) axis. In combination with z-score normalization applied to each behavioral measure, this approach produces regression lines that have slope 1 or -1. For a weak correlation like the one in 3D, bootstrapping produces best flt lines of both of these slopes frequently, and consequently wide CIs.

Moreover, since these experiments include changes to the correlation structure in control lines subject to the heatshock, we cannot definitively interpret the results as implying that neural state (rather than physiological state more broadly construed). For these reasons we have moved this figure into the supplementary materials, and re-emphasized the caveats associated with these experiments in the discussion.

Reviewer #3:

In this paper Werkhoven et al., ask a fundamental question in behavioral neuroscience – what is the structure of co-varying behaviors among individuals within populations. While questions in the context of inter-individual behavioral differences have been studied across organisms, this work represents a highly novel and comprehensive analysis of the behavioral structure of inter-individual variation in the fly, and the underlying biological mechanism that may shape this structure of covariation. In particular, for their experiments they combined a set of behavioral tests (some of them were explored in previous studies) to a 13-day long behavioral paradigm that tested single individuals in a highly controlled and precise way. Through clever analysis the authors interestingly showed strong correlations only between a small set of behaviors, indicating that most of the behaviors that they tested do not co-vary, exhibiting many dimensions of inter-individual variation in the data. They further used perturbations of neuronal circuits and showed that temperature and circuit perturbations can change dependencies among sets of behaviors. In a different set of experiments where they integrated gene-expression data (from the brains of single individuals), they showed that some of the genes are correlated with individual-specific parameters of behaviors. Interestingly, through comparison of inbred and outbred population they demonstrated that also outbred populations are showing relatively low covariance of behaviors across individuals.

Overall, the data in the paper indicate that surprisingly, even for a 'simple' organism, there are many dimensions of inter-individual variation, e.g. many specific characters that can change among individuals in a non-depended way. The ability of the authors to precisely measure such dependencies in such a highly robust and precise way allowed their investigation of the underlying processes that may generate this variation. The results in this study are highly interesting and novel. They uncover a general picture of the structure of behavioral variation among individuals and open many avenues for further analyses of the underlying neuronal and molecular mechanisms that control variation in sets of behaviors. Furthermore, the methods that were developed in this paper can be of great use by other researches in the field.

However, while the key claims of the manuscript are well supported by the data and analyses methods, some aspects of data analysis need to be clarified or extended.

We are glad the reviewer saw these strengths of the paper and hope that our responses to their feedback below, and the responses to the other reviews address these aspects.

– It is not clear what is the motivation for using the 'Effective dimensionality spectrum' analysis presented in the paper and how it significantly adds to existing methods of clustering that are relying directly on the correlation/distance matrix (some of them were used in this study).

Please see detailed response below.

– While it is clear that the distilled behavioral covariation matrix has many independent dimensions (as the authors indicated, most of the a-priori PCs are not strongly correlated), the number of 'significant' Pcs was not calculated directly for the distilled matrix, and t-SNE analysis is presented only for the original covariation matrix (1L).

Please see detailed response below.

– It is possible that some if the behaviors that covary across individuals in the high temporal resolution assay and also tend to be associated over time within an individual, may indicate sequences of behavior on longer time-scales (than the timescales in which parameters are quantified).

Please see detailed response below.

– Further analyses are needed for extending the detection of correlations between variation in gene-expression data and the independent behavioral measures in the covariation matrix.

Please see detailed response below.

The results in the study by Werkhoven et al., are highly novel and important. They provide a compelling evidence for many independent dimensions of inter-individual variation within populations. Furthermore, using their careful and accurate measurements, as well as their clever analyses, they authors interestingly demonstrated that the structure of behavioral covariation among individuals may change under specific environmental/circuit manipulations and can be predicted from gene-expression differences. For their experiments they examined ~120 different behavioral parameters (across ten behavioral paradigms) in hundreds of individuals. Their robust and precise measurements allowed them to construct a covariation matrix that represents covarying behavioral parameters across individuals, thus defining characters of individuals in a more complex way. Analyses of these data detected many dimensions of variation, uncovering a complex behavioral covariation space. They further extended their experiments to (1) perturb co-variation structure by neuronal manipulation (2) find association between gene-expression and behavioral variation (3) show that both inbred and outbred populations have many dimensions of behavioral variation (4) run their analysis pipeline on available datasets.

Overall, this is an important study that will greatly impact many areas of behavioral neuroscience. The data in the paper indicate that surprisingly, even for a 'simple' organism, there are many dimensions of inter-individual variation, e.g. many specific characters that can change among individuals in a non-dependent way. The ability of the authors to precisely measure such dependencies allowed their investigation of the underlying processes that may generate this variation. This work uncovers a more global picture of the structure of behavioral variation among individuals and open many avenues for further analyses of the underlying neuronal and molecular mechanisms that control variation in sets of behaviors. Furthermore, the methods that were developed in this paper can be of use by other researches in the field.

However, while the key claims of the manuscript are well supported by the data and analyses methods, some aspects of data analysis need to be clarified or extended.

– It is not clear what is the motivation for using the 'Effective dimensionality spectrum' analysis presented in the paper and how it significantly adds to existing methods of clustering/network analysis that are relying directly on the correlation/distance matrix (some of them were used in this study).

Our use of the effective dimensionality spectrum analysis was originally motivated by a desire to provide a spectral approach that could quantify the amount of organization at each dimension. Considering that much of the organization we observed in the behavioral data was distributed broadly over many weak correlations of varying strength, we felt that using a method that defined a single number for the dimensionality would inevitably be incomplete. We agree that the method was not sufficiently motivated in the text and that interpretation of dimensionality spectra is difficult even for the toy data sets in previous Figure S9 (now Figure 1—figure supplement 12). For that reason,(1) we have introduced the idea with new (hopefully relatable) examples, specifically considering the spectrum of dimensionality associated with data filling the volume of a rugby ball and a frisbee. (2) We have replaced the single value previously listed in the "ground truth dimensions" column of previous Figure 1—figure supplement 12 with multiple values for each example reflecting the ground-truth structure we built into these covariance matrices. This more accurately reflects the continuous distribution of organization across dimensionalities. (3) To this figure, per the suggestion of another reviewer, we have added connected components spectra based on ground truth correlation matrices constructed to match what we conclude are the key statistical features of the decathlon correlation matrices: a specific distribution of sparse correlations of varying strength. We find that spectra of this synthetic data are a qualitative match to those of the decathlon data. This suggests that these key statistical features are sufficient to capture the dimensional organization of the decathlon matrices. (4) We renamed the method to “connected components strectra” to be more transparent about the method and how it differs from other methods.

We believe these edits more clearly communicate the motivation for this analysis and the clarity of the examples in Figure 1—figure supplement 12.

This analysis seems more of analysis of group structure of the covariation data and not a direct dimensionality assessment such as by using PCA. Also, a possibly more direct description of X axis in Figure 1J is 'number of connected components.

We feel that this analysis does reflect the dimensional organization of the data, the same way that PCA (the eigen-decomposition of the covariance matrix) does. We do, however, agree that the plot labels could be clearer and have relabeled the x-axes of the dimensionality spectra “connected components” to reflect this recommendation. We have also renamed the y-axes “log(count).”

– It is not clear what is the number of 'significant' Pcs in the distilled matrix (Figure 1I), using the same shuffling methods that the authors used for the original matrix.

The scree plot of the distilled matrix is shown in Figure 1I and the number of significant PCs in the distilled matrix is reported in the text: “The full matrix contained 22 significant PCs, with PCs 1-3 explaining 9.3, 6.9 and 5.8% of the variance, respectively (the distilled matrix contained 16 significant PCs, with PCs 1-3 explaining 13.8, 10.0, and 7.7% of the variance, respectively; Figure 1I).”

It will be interesting to include t-SNE analysis for the distilled covariation matrix (such as in 1L).

The t-SNE result embedding of individual flies using the distilled matrix features is now shown in Figure 4—figure supplement 4D and shows a unimodal distribution of flies for both inbred and outbred flies to similar to flies embedding using the full matrix features. We have added a description this result to the Results section. We have not added a t-SNE embedding of the behavioral measures of the distilled matrix in the space of individual flies because 38 points is really not enough to see structure using this method.

– Figure 2 – why only the second Decathlon was used (p. 6)?

We were unable to perform the unsupervised classification analysis because the video quality of the unsupervised footage from the first decathlon was too low to reliably align frames. We improved our imaging strategy for the second (inbred batch 2) and third (outbred) decathlon iterations. The unsupervised analysis was also performed on the outbred flies but was not originally shown. We have now included Figure 4—figure supplement 2, which is comparable to Figure 2, in the supplement for the outbred flies.

– Not clear how the probability matrix in Figure 2H was generated. What are the defined/measured parameters? Not in methods.

The probability matrix in Figure 2H was generated by calculating the frame to frame probability of transitioning from state i to state j in the following frame. Therefore each row represents the transition probabilities for a given behavioral state. We have added text to the figure caption and methods to clarify this point.

– Temporal correlations in Figure 2H may indicate sequences of behavior on longer time-scales that can change in intensity among individuals. In this case they may covary across individuals because they are part of the same temporal sequence

We believe this is a very plausible scenario. It would match our observation that the clusters in the covariance matrix match those of the high-level clusters of the unsupervised embedding, as has also been reported in Hernandez et al., 2020, BioRxiv. The authors of that paper also note that corresponding clusters are seen in within-species covariance matrices and clusters of the embedded behavioral space defined information bottleneck criteria on hours-long timescales. While they interpret that to mean individual variation may be due primarily to hours-long states, we do not believe they can formally exclude the possibility of days-long or life-long states, as we see in the other measures of the Decathlon and in previous unsupervised clusterings of fly behavior (Todd et al., 2017, Physical Biology). Per this and another reviewer’s suggestion, we have expanded the discussion around this topic.

– It will be interesting to correlate the 'high resolution' movement parameters (Figure 2) with the rest of the behavioral parameters (Figure 1), although there is a clear separation in timescales between behavioral parameters.

We have added this as a new analysis in Figure 2—figure supplement 1. While the correlations between the unsupervised behavioral modes and the assay measures are generally weaker than the correlations within either category, there is nevertheless an enrichment for statistically significant correlations, and the structure of correlations between these two kinds of behavioral measures is roughly conserved between the inbred and outbred decathlons. This also adds confidence to our having identified consistent structure in the organization of behavior across contexts.

– Figure 3: "colored hash marks" – do not appear in the figure.

We have updated the figure (now Figure 3—figure supplement 1) caption to reflect this point.

– Figure 4 – it will be informative to correlate gene-expression data with measures of the distilled behavioral covariation matrix (PCs), given that genes are probably correlated in the same way with behavioral parameters from the same a-priori group.

The individual gene expression model p-values and enriched KEGG pathways showed some similarity across behaviors from the same a priori groups (Figure 3D, E; Figure 4—figure supplement 1 B3-4, C3-4). However, we found no significant enriched pathways for distilled matrix PCs. We now state this explicitly in the captions of Figure 4—figure supplements 1 and 3.

– Figure 7- Analysis of other datasets indicate that in some of the cases there are stronger correlations among behavioral parameters. It will be interesting to discuss potential sources of differences in structure between the datasets.

We agree that some of the data sets in this figure (now Figure 5) have stronger correlations between behavioral parameters and show much more organization at lower dimensionality. We believe that this variation in organization could reflect biological differences, as well as several non-biological causes, including:

1. Parameters calculated on simultaneous measurements – we see significantly higher correlations between measures collected on the same day. This may be in part due to shared noise. It also likely reflects increased correlation due to a broader biological state. We see that nearly all of our behavioral measurements decrease in correlation over time (see Figure 1—figure supplement 1), suggesting that only part of the behavioral variation represents a persistent idiosyncratic difference (as multiple reviewers have suggested here). Thus, data sets collected over shorter windows may exhibit higher correlations.

2. Differences in the amount of duplicated or coupled behavioral measures. JAABA collects hundreds of behavioral measures that are likely numerically or mechanically coupled (e.g. multiple measures of body angle, relative angle of several body parts between 2 flies etc). Likewise, segmentations between behavioral modes in the unsupervised pipeline are designed to be uncorrelated. In most unsupervised pipelines, the number of clusters is a “mostly free” parameter, and the more clusters are labeled, the higher the max correlation between clusters.

3. There may be differences in the signal-to-noise ratio of the measurements made across these diverse assays.

Figure S9 Dashed line not explained in figure caption.

We have improved previous Figure S9 (now Figure 1—figure supplement 12) figure in several ways: There are now several dashed lines corresponding to the multiple dimensionalities expected based on the constructed covariance matrix block-diagonal structure. They represent the peaks we would expect to see in the spectra if they can identify the dimensionalities around which the data are organized. We now say this in the caption as well.

Comment by reviewer #3 that arouse during the discussion, in response to reviewer #1, point (1):

The reproducibility of the two experiments- the correlations in the distilled matrix (S5B) are lower than in the original matrix (S5A). I assume that the low correlation values in the distilled matrix makes the correaltion comparison more noisy. Actually, it is hard to learn a lot by computing correlation on correlations. I would ask the authors to plot the correlations of D1, D2 (one against the other) to directly show reproducibility.

We have now included scatter plots of the full and distilled matrix correlation values in inbred-D1, inbred-D2, outbred-D3 in Figure 4—figure supplement 4. We have also now included panels showing exemplar correlations that are conserved between inbred and outbred decathlons as pairs of scatter plots.

https://doi.org/10.7554/eLife.64988.sa2

Article and author information

Author details

  1. Zachary Werkhoven

    Center for Brain Science and Department of Organismic and Evolutionary Biology, Cambridge, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Validation, Visualization, Project administration, Writing – original draft
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0611-5864
  2. Alyssa Bravin

    Center for Brain Science and Department of Organismic and Evolutionary Biology, Cambridge, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  3. Kyobi Skutt-Kakaria

    1. Center for Brain Science and Department of Organismic and Evolutionary Biology, Cambridge, United States
    2. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, United States
    Contribution
    Data curation, Investigation
    Competing interests
    No competing interests declared
  4. Pablo Reimers

    1. Center for Brain Science and Department of Organismic and Evolutionary Biology, Cambridge, United States
    2. Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
    Contribution
    Data curation, Investigation
    Competing interests
    No competing interests declared
  5. Luisa F Pallares

    Department of Ecology and Evolutionary Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, United States
    Contribution
    Data curation, Investigation
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6547-1901
  6. Julien Ayroles

    1. Center for Brain Science and Department of Organismic and Evolutionary Biology, Cambridge, United States
    2. Department of Ecology and Evolutionary Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8729-0511
  7. Benjamin L de Bivort

    Center for Brain Science and Department of Organismic and Evolutionary Biology, Cambridge, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing – review and editing, Project administration, Writing – original draft
    For correspondence
    debivort@oeb.harvard.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6165-7696

Funding

National Science Foundation (DGE-1144152)

  • Zachary Werkhoven

National Science Foundation (2013170544)

  • Kyobi Skutt-Kakaria

National Institutes of Health (GM124881)

  • Julien Ayroles

Sloan Research Fellowship

  • Benjamin L de Bivort

Klingenstein-Simons Fellowship Award

  • Benjamin L de Bivort

Smith Family Foundation Odyssey Award

  • Benjamin L de Bivort

Harvard/MIT Basic Neuroscience Grant

  • Benjamin L de Bivort

National Science Foundation (IOS-1557913)

  • Benjamin L de Bivort

National Institutes of Health (MH119092)

  • Benjamin L de Bivort

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Ed Soucy, Brett Graham, Adam Bercu, and Joel Greenwood of Harvard’s CBS Neuroengineering core for help fabricating our instruments. Julie Peng helped with the genomic experiments. Gordon Berman generously consulted on the analysis and sharing data. Timothy Sackton provided insightful guidance on the RNAseq modeling and KEGG enrichment analysis. Joshua Shaevitz kindly shared code and expertise for the unsupervised analysis imaging rig. Tanya Wolff and Gerry Rubin kindly shared most of the Gal4 lines from the circuit screen. James Crall, Jennifer Erickson, Danylo Lavrentovich, and Shradda Lall provided helpful feedback on the manuscript. ZW and KSK were supported by NSF Graduate Research Fellowships DGE-1144152 and #2013170544; JFA was supported by the NIH under grant no. GM124881. BdB was supported by a Sloan Research Fellowship, a Klingenstein-Simons Fellowship Award, a Smith Family Odyssey Award, a Harvard/MIT Basic Neuroscience Grant, the NSF under grant no. IOS-1557913, and the NIH under grant no. MH119092. We also thank the reviewers for providing thorough, helpful feedback on a hefty manuscript in the midst of a pandemic.

Senior Editor

  1. Ronald L Calabrese, Emory University, United States

Reviewing Editor

  1. Manuel Zimmer, University of Vienna, Austria

Publication history

  1. Preprint posted: September 23, 2019 (view preprint)
  2. Received: November 18, 2020
  3. Accepted: September 14, 2021
  4. Version of Record published: October 19, 2021 (version 1)

Copyright

© 2021, Werkhoven et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,357
    Page views
  • 117
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Developmental Biology
    2. Genetics and Genomics
    Juliet R Girard et al.
    Research Article Updated

    Mechanistic studies of Drosophila lymph gland hematopoiesis are limited by the availability of cell-type-specific markers. Using a combination of bulk RNA-Seq of FACS-sorted cells, single-cell RNA-Seq, and genetic dissection, we identify new blood cell subpopulations along a developmental trajectory with multiple paths to mature cell types. This provides functional insights into key developmental processes and signaling pathways. We highlight metabolism as a driver of development, show that graded Pointed expression allows distinct roles in successive developmental steps, and that mature crystal cells specifically express an alternate isoform of Hypoxia-inducible factor (Hif/Sima). Mechanistically, the Musashi-regulated protein Numb facilitates Sima-dependent non-canonical, and inhibits canonical, Notch signaling. Broadly, we find that prior to making a fate choice, a progenitor selects between alternative, biologically relevant, transitory states allowing smooth transitions reflective of combinatorial expressions rather than stepwise binary decisions. Increasingly, this view is gaining support in mammalian hematopoiesis.

    1. Cancer Biology
    2. Genetics and Genomics
    Andrew S McNeal et al.
    Research Article

    Benign melanocytic nevi frequently emerge when an acquired BRAFV600E mutation triggers unchecked proliferation and subsequent arrest in melanocytes. Recent observations have challenged the role of oncogene-induced senescence in melanocytic nevus formation, necessitating investigations into alternative mechanisms for the establishment and maintenance of proliferation arrest in nevi. We compared the transcriptomes of melanocytes from healthy human skin, nevi, and melanomas arising from nevi and identified a set of microRNAs as highly expressed nevus-enriched transcripts. Two of these microRNAs—MIR211-5p and MIR328-3p—induced mitotic failure, genome duplication, and proliferation arrest in human melanocytes through convergent targeting of AURKB. We demonstrate that BRAFV600E induces a similar proliferation arrest in primary human melanocytes that is both reversible and conditional. Specifically, BRAFV600E expression stimulates either arrest or proliferation depending on the differentiation state of the melanocyte. We report genome duplication in human melanocytic nevi, reciprocal expression of AURKB and microRNAs in nevi and melanomas, and rescue of arrested human nevus cells with AURKB expression. Taken together, our data describe an alternative molecular mechanism for melanocytic nevus formation that is congruent with both experimental and clinical observations.