The structure of behavioral variation within a genotype

Abstract

Individual animals vary in their behaviors. This is true even when they share the same genotype and were reared in the same environment. Clusters of covarying behaviors constitute behavioral syndromes, and an individual’s position along such axes of covariation is a representation of their personality. Despite these conceptual frameworks, the structure of behavioral covariation within a genotype is essentially uncharacterized and its mechanistic origins unknown. Passing hundreds of inbred Drosophila individuals through an experimental pipeline that captured hundreds of behavioral measures, we found sparse but significant correlations among small sets of behaviors. Thus, the space of behavioral variation has many independent dimensions. Manipulating the physiology of the brain, and specific neural populations, altered specific correlations. We also observed that variation in gene expression can predict an individual’s position on some behavioral axes. This work represents the first steps in understanding the biological mechanisms determining the structure of behavioral variation within a genotype.

Introduction

Individuals display idiosyncratic differences in behavior that often persist through time and are robust to situational context. Some persistent individual behavioral traits commonly occur in correlated groups and can therefore be said to covary. Behavioral ethologists have long understood that types of human and animal personalities often fall on multivariate axes of variation. A five-dimensional model (Goldberg, 1993) known as the Big Five personality traits is frequently used by psychologists to describe the range of human personality, and a similar model has been used to describe personality in fish (Réale et al., 2007). For example, human propensity for behaviors such as assertiveness, talkativeness, and impulsiveness is collectively described as extraversion and is thought to be anticorrelated with behaviors such as passivity, shyness, and deliberateness, all behaviors associated with introversion (Matthews et al., 2003).

In animal species, correlated suites of behaviors are described as behavioral syndromes. Aggressive behaviors such as fighting over mates or food are frequently correlated with exploratory behaviors such as foraging and social interaction. Correlated aggressive and exploratory behaviors have been observed in insects (Jeanson and Weidenmüller, 2014), arachnids (Johnson and Sih, 2007), fish (Huntingford, 1976), and birds (van Oers et al., 2004). The prevalence of behavioral correlation in so many species suggests that covariation is likely a universal feature of behavior. Although correlated individual differences in behavior are commonly observed, the structure and mechanisms of behavioral covariation are not well understood. Typically, where an individual lands on these behavioral axes is thought to be established by a deterministic confluence of genetic and environmental effects. But there is increasing evidence that substantial individual behavioral variation is rooted in intragenotypic variation (Honegger and de Bivort, 2018; Akhund-Zade et al., 2019). The extent to which intragenotypic behavioral variation is organized into syndromes or axes is essentially uncharacterized.

Substantial variation in specific behavioral measures, even in inbred lines raised in standardized conditions, has been observed in several clonal animals, including geckos (Sakai, 2018), Amazon mollies (Bierbach et al., 2017), and aphids and nematodes (Schuett et al., 2011; Stern et al., 2017). Genetic model systems hold particular promise for the mechanistic dissection of this variation, and intragenotypic variability (IGV) in behavior has been characterized in mice (Freund et al., 2013), zebrafish (Bierbach et al., 2017), and Drosophila. In flies, IGV of many behaviors has been studied, including phototaxis (Kain et al., 2012), locomotor handedness and wing-folding (Buchanan et al., 2015), spontaneous microbehaviors (Kain et al., 2013; Todd et al., 2017), thermal preference (Kain et al., 2015), and object-fixated locomotion (Linneweber et al., 2020). Mechanistic studies of these behavioral phenomena have addressed two major questions: (1) what biological mechanisms underlie the magnitude of behavioral variability (e.g., genetic variation [Ayroles et al., 2015] or neural state variation [Kain et al., 2012; Buchanan et al., 2015]), and (2) what specific differences within individual nervous systems predict individual behavioral biases (Linneweber et al., 2020; Mellert et al., 2016). The mechanistic basis of individuality is an exciting new field, but no study to date has focused on characterizing the large-scale variance-covariance structure of IGV in behavior.

Behavioral correlations within a genotype could arise through a number of biological mechanisms including cell-to-cell variation in gene expression or individual differences in neural circuit wiring or synaptic weights. For example, stochastic variation during developmental critical windows has the potential to impart lasting differences between individuals in the absence of conspicuous genetic or environmental differences. Any such differences affecting nodes common to multiple behaviors in neural or molecular pathways may result in correlated shifts in behavior. Here, we study this directly by focusing on the correlation structure of behavioral variation when genetic and environmental variation are minimized.

This is an important biological question for several reasons. The structure of intragenotypic behavioral variability (1) will shape the distribution and kinds of personalities that a population of organisms displays, even when they have matched genomes and environments, (2) is the product of stochastic biological outcomes, and its organization reveals how stochasticity drives variation, (3) constrains the evolution of behavior and adaptive phenotypic strategies like bet-hedging (Hopper, 1999), (4) is a relatively uncharacterized component of neural diversity and its manifestations in behavior and disease, and (5) may shed light on how the nervous system orchestrates behavior as a whole. In flies, we have a suitable experimental system for directly characterizing this structure as we can produce large numbers of individuals with nearly identical genomes, reared in the same environment, and collect many behavioral measures per individual. We performed this experiment in wild-type inbred flies as well as wild-type outbred flies and collections of transgenic lines manipulating neural activity. This approach let us contrast the structure of intragenotypic behavioral variability in animals where the source of variability is, respectively, stochastic fluctuations, genetic differences + stochastic fluctuations, and systematic perturbations of the nervous system + stochastic fluctuations. We found that in all cases, behavioral variation has high dimensionality, that is, many independent axes of variation. The addition of variation from genetic differences and neural perturbations did not fundamentally alter this qualitative result, suggesting that stochastic fluctuations and genetic differences may structure behavior through common biological mechanisms.

Results

High-dimensional measurement of individual behaviors

The first step in revealing the structure of behavioral variation within a genotype was to devise an experimental pipeline that produced a data set of many (200+) individual flies, with many behavioral measurements each. We developed a number of behavioral assays, measuring both spontaneous and stimulus-evoked responses of individual flies, which could be implemented in a common experimental platform (Figure 1A; Werkhoven et al., 2019). This instrument features an imaging plane, within which flies moved in arenas of various geometries. Fly position was tracked with digital cameras using diffused infrared illumination invisible to the flies. Visual stimuli were presented to the animals using DLP projectors or LEDs embedded in the arena walls. We implemented six assays (Supplementary file 1) in this style, assessing (1) spontaneous walking in circular arenas, (2) preference to rest in brighter or dimmer positions (in an environment of spatially structured illumination), (3) preference to rest in brighter or dimmer light levels (in a fictive, temporally modulated light environment), (4) optomotor responses to rotating visual stripes, (5) spontaneous left-right decision making in Y-mazes, and (6) phototaxis in Y-mazes, where flies are given a choice of walking toward or away from a lit LED (Figure 1B).

Figure 1 (with 12 supplements)

Decathlon experimental design and structure of intragenotypic behavioral variation.

(A) Schematic of the imaging rig used for most Decathlon experiments. (B) Schematics of the behavioral assays, illustrating the geometry of the arenas and stimulus structure. (C) Timeline of the Decathlon experiment. Colors indicate the assays conducted on each day, half-black half-white blocks indicate the circadian assay and storage in 96-well plates. (D) Timelines of the three Decathlon experiments, indicating the randomized order of assays 2–8. (E) Full correlation matrix of all raw behavioral measures taken in the Decathlon. Colored blocks indicate blocks of measures we thought a priori might be correlated (outer blocks, text labels). Inner blocks indicate assay. (F) Example scatter plots associated with measure correlations. Points are individual flies. Line is the best fit (principal component [PC]1 of these points), gray region is the 95% confidence interval of the fit, as determined by bootstrap resampling. (G) Distilled correlation matrix in which all correlated measures represent unexpected relationships. Meaningful correlations in this matrix can be found outside the within-a-priori-group on-diagonal blocks. (H) Example scatter plot from the distilled correlations. Plot elements as in (F). (I) Scree plot of the ranked, normalized eigenvalues, that is, the % variance explained by each PC, of the distilled behavior matrix, versus PC #. (J) Connected components spectrum (see text and Figure 1—figure supplement 10) for the distilled correlation matrix. Height of bars indicates organization at that dimensionality. (K) Points corresponding to individual flies nonlinearly embedded using t-SNE from the 121-dimensional full matrix to two dimensions. (L) Points corresponding to behavioral measures nonlinearly embedded using t-SNE from the 384-dimensional space of flies to two dimensions. Colors indicate groups of measures we expected a priori to be related.
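The fit lines in panels F and H are described as PC1 of each scatter, with a 95% confidence region from bootstrap resampling. A minimal sketch of that idea follows, assuming a percentile bootstrap over flies and a fixed grid of x values; the function names `pc1_fit` and `bootstrap_band` and the default of 1,000 resamples are hypothetical choices, not necessarily the authors' procedure.

```python
import numpy as np


def pc1_fit(x, y):
    """Fit a line through (x, y) along the first principal component
    (total least squares). Returns (slope, intercept)."""
    xy = np.column_stack([x, y])
    center = xy.mean(axis=0)
    # The first right singular vector of the centered data is PC1.
    _, _, vt = np.linalg.svd(xy - center, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx
    intercept = center[1] - slope * center[0]
    return slope, intercept


def bootstrap_band(x, y, grid, n_boot=1000, seed=0):
    """Percentile 95% band of the PC1 line evaluated on `grid`,
    obtained by resampling flies (points) with replacement."""
    x, y = np.asarray(x), np.asarray(y)
    rng = np.random.default_rng(seed)
    n = len(x)
    lines = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of flies
        slope, intercept = pc1_fit(x[idx], y[idx])
        lines[b] = slope * grid + intercept
    lo, hi = np.percentile(lines, [2.5, 97.5], axis=0)
    return lo, hi
```

Unlike ordinary least squares, the PC1 line treats both measures symmetrically, which suits scatter plots where neither behavior is a natural predictor of the other.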

To these assays, we added three more, assessing (7) odor sensitivity in linear chambers (Claridge-Chang et al., 2009; Honegger et al., 2020) in which half of the compartment is filled with an aversive odorant, (8) spontaneous behavior, acquired via high-resolution 100 Hz video and suitable for pixel-based unsupervised classification (Berman et al., 2014), and (9) circadian activity and spontaneous locomotion in 96-well plates with access to food. Each of these assays produced multiple behavioral measures for each individual fly. For example, flies behaving in the phototactic ‘LED Y-maze’ (assay 6) are performing phototaxis and exploratory locomotion but yield several different behavioral measures, including the number of choices made by passing through the choice point of the Y-maze (a measure of total activity), the fraction of turns that are to the right, the fraction of turns that are toward the lit LED, the number of pauses in which the animal did not move, the average duration of pauses, etc. Thus, the total collection of behavioral measures across all assays per fly was quite large (up to 121), constituting a diverse, inclusive characterization of individual behavior. Each assay has a particular measure that captures the behavior it is primarily designed to assess (e.g., the fraction of turns toward the lit LED in the LED Y-maze). In control experiments, we confirmed that these primary measures are consistent across days within an individual (Figure 1—figure supplement 1; i.e., they reflect persistent idiosyncrasies [Kain et al., 2012; Buchanan et al., 2015; Kain et al., 2015; Honegger et al., 2020]).

In order to obtain all of the behavioral measures from each experimental animal, we combined our behavioral assays in a serial experimental pipeline lasting 13 consecutive days (Figure 1C), generally with one unique assay per day and continuous circadian imaging (assay 9) between assays. This pipeline begins with 3-day-old flies being loaded into the 96-well circadian imaging plates. Using a common behavioral platform as much as possible and storing flies between experiments in 96-well plates made it substantially easier to maintain each fly’s identity without error over the whole 13-day experiment. Starting on day 3, daily assays began. On each day, flies were lightly anesthetized on an ice-chilled plate and aspirated, maintaining their identity, into the assay arrays. After the assay was completed (typically after 2 hr of recording), flies were again lightly anesthetized and returned to 96-well plates for renewed circadian imaging. On the first such day, flies were loaded into an array of circular arenas and imaged for total activity (in a version of assay 1). At this point, the most active 192 flies were retained for further testing. In preliminary experiments, we found that flies that were inactive at the beginning of the pipeline were very unlikely to produce substantial amounts of data over the rest of the pipeline. With the addition of this activity-screening assay, the total number of experiments was 10, and as each fly ‘competes’ in all 10 events, we refer to the entire pipeline as a Decathlon.

Because assay order might affect the recorded behavioral measures, we randomized the assay order between Decathlon implementations as much as possible (Figure 1D), subject to two restrictions: activity screening was always the first assay, and high-resolution imaging for unsupervised analysis (assay 8) was always the last assay. (This assay has lower throughput, and 3 days were required to complete all 168 remaining flies. If this assay were performed earlier in the pipeline, it might introduce heterogeneity across subsequent assays.) When each fly completed its run through all Decathlon assays (i.e., over the 3 days of assay 8 imaging), it was flash-frozen in liquid nitrogen for RNA sequencing.
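The ordering constraint above (screening fixed first, high-resolution imaging fixed last, everything in between shuffled) can be sketched as a constrained permutation. The assay labels below are hypothetical placeholders, and a fully random shuffle is an assumption; the text says only that order was randomized "as much as possible".

```python
import random


def randomize_assay_order(first, middle, last, seed=None):
    """Return an assay schedule with `first` and `last` fixed and the
    assays in `middle` shuffled uniformly at random."""
    rng = random.Random(seed)
    shuffled = list(middle)  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return [first, *shuffled, last]


# Hypothetical labels for the assays run between screening and final imaging
middle_assays = ["circular_arena", "spatial_shade", "temporal_shade",
                 "optomotor", "y_maze", "led_y_maze", "olfaction"]
order = randomize_assay_order("activity_screen", middle_assays,
                              "unsupervised_imaging", seed=1)
```

Drawing a new seed per Decathlon gives each implementation its own order, so assay-order effects are spread across assays rather than confounded with any single one.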

Behavioral variability among inbred flies has high dimensionality and sparse pairwise correlations

To collect data that would reveal the structure of behavioral variation within a genotype, we conducted two Decathlons using highly inbred, nearly isogenic flies derived from the wild-type strain Berlin-K (BSC#8522; Nöthel, 1981). We confirmed that this strain was indeed highly isogenic by sequencing the genomes of individual animals, finding ~75 SNPs in the population across the entire genome (Figure 1—figure supplement 2). In total, 115 flies completed the first Decathlon and 176 the second. While we aimed to collect 121 measures per fly, some values were missing, typically because flies did not meet assay-specific activity cutoffs. The sample sizes achieved for each assay and the distributions of all raw measures are given in Figure 1—figure supplement 3. For subsequent analyses, it was sometimes necessary to have a complete data matrix (see Figure 1—figure supplement 4 for a schematic of all analysis pipelines). We infilled missing values using the alternating least-squares method, which, as judged by analyses of toy ground-truth data, performed better than mean-infilling (Figure 1—figure supplement 5). For maximal statistical power, we wanted to merge the data sets from the two Berlin-K^iso Decathlons. The correlation matrices of these two data sets were not identical, but were substantially more similar than expected by chance (Figure 1—figure supplement 6), implying that while there were inter-Decathlon effects, much of the same structure was present in each and merging them was justified. To do this, we z-score-normalized the data from each arena array/batch (within each Decathlon) across flies, removing differences in mean and scale attributable to arena, assay, and Decathlon and enriching the data for contrasts between individuals. A grand data matrix was made by concatenating these batches (382 individuals × 121 behavioral measures).
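The two preprocessing steps above, per-batch z-scoring and alternating least-squares (ALS) infilling, can be sketched as follows. This is an illustration under stated assumptions: `batches` is a hypothetical per-fly batch label, the ALS rank, ridge penalty, and SVD initialization are arbitrary choices, and the order in which the authors applied the two steps is not specified here.

```python
import numpy as np


def zscore_by_batch(data, batches):
    """z-score each measure (column) across flies within each batch,
    ignoring missing values (NaN)."""
    out = np.full_like(data, np.nan, dtype=float)
    for b in np.unique(batches):
        rows = batches == b
        block = data[rows]
        mu = np.nanmean(block, axis=0)
        sd = np.nanstd(block, axis=0)
        sd[sd == 0] = 1.0  # leave constant columns centered but unscaled
        out[rows] = (block - mu) / sd
    return out


def als_infill(data, rank=5, n_iter=100, reg=1e-3):
    """Fill NaNs with a rank-`rank` factorization U @ V.T fit to the
    observed entries by alternating (ridge-regularized) least squares."""
    mask = ~np.isnan(data)
    X = np.where(mask, data, 0.0)
    # Initialize the factors from a truncated SVD of the zero-filled matrix
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    U = u[:, :rank] * s[:rank]
    V = vt[:rank].T.copy()
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        for i in range(X.shape[0]):  # each fly, from its observed measures
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ X[i, mask[i]])
        for j in range(X.shape[1]):  # each measure, from flies that have it
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ X[mask[:, j], j])
    # Keep observed values; use the low-rank reconstruction only for gaps
    return np.where(mask, data, U @ V.T)
```

Concatenating the row blocks of the normalized, infilled batches then yields a grand flies × measures matrix of the kind described above.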

The full correlation matrix of this Berlin-K^iso data set is shown in Figure 1E. It contains a substantial amount of structure, indicating that large groups of behavioral measures covary. But the covariance of many pairs of measures in the matrix is not surprising. For example, almost all our assays generate some measure of locomotor activity (meanSpeed in circular arenas, number of turns in Y-mazes, meanSpeed in the olfactory tunnels, etc.), and one might expect that especially active flies in one assay will be especially active in another assay. Additional unsurprising structure in this matrix comes from identical measures recorded in each of the 11–13 circadian assays each fly completed. But, even in this first analysis, surprising correlations were evident. For example, flies with higher variation in the inter-turn interval in the olfactory assay (‘clumpiness’ in their olfactory turning) exhibited higher mean speed in the circadian assays, and flies with higher variation in inter-turn intervals in the Y-maze (clumpiness in their Y-maze turning) exhibited lower mutual information in the direction of subsequent turns in the Y-maze (‘switchiness’ in their Y-maze handedness; Figure 1F; see below and Buchanan et al., 2015; Akhund-Zade et al., 2019 for more about these measures).
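To make the two measure families concrete, here is a sketch computing a clumpiness-like score from turn timestamps and the mutual information between successive turn directions from a binary turn sequence. Using the coefficient of variation as the "variation" in inter-turn intervals is an assumption, as is coding turns as 0 (left) and 1 (right); the published definitions may differ.

```python
import numpy as np


def clumpiness(turn_times):
    """Coefficient of variation of inter-turn intervals (an assumed
    stand-in for 'variation in the inter-turn interval')."""
    intervals = np.diff(np.sort(np.asarray(turn_times, dtype=float)))
    return intervals.std() / intervals.mean()


def turn_mutual_information(turns):
    """Mutual information (bits) between consecutive binary turn
    directions, coded 0 = left, 1 = right."""
    turns = np.asarray(turns, dtype=int)
    prev, nxt = turns[:-1], turns[1:]
    joint = np.zeros((2, 2))
    np.add.at(joint, (prev, nxt), 1)  # count (turn k, turn k+1) pairs
    joint /= joint.sum()
    p_prev = joint.sum(axis=1, keepdims=True)  # marginal of turn k
    p_next = joint.sum(axis=0, keepdims=True)  # marginal of turn k+1
    independent = p_prev @ p_next  # joint expected if turns were independent
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / independent[nz])))
```

Evenly spaced turns give a clumpiness of zero, and a strictly alternating left-right sequence carries close to 1 bit of information about the next turn.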

To produce an exhaustive list of such non-trivial correlations, we distilled the grand correlation matrix to a smaller matrix (the ‘distilled matrix’; Figure 1G) in which two kinds of interesting relationships were revealed: (1) uncorrelated dimensions among measures for which we had a prior expectation of correlation (e.g., if meanSpeed in circular arenas is found to be uncorrelated with meanSpeed in olfactory tunnels), and (2) correlated dimensions among measures for which we had no prior expectation of correlation. Relationships of the former class were identified by enumerating, before we ran any correlation analyses, groups of measures we expected to be correlated (‘a priori groups’; Figure 1E and G). See Supplementary file 2 and Supplementary file 3 for a breakdown of the measures included in each a priori group and their respective inclusion criteria. We looked for surprising independence within such groups by computing the principal components (PCs) of data submatrices defined by the grouping (e.g., for the ‘activity’ a priori group, by running principal component analysis (PCA) on the data set consisting of 382 individual flies and 57 nominal measures of activity). We then replaced each a priori group submatrix with its projection onto its statistically significant PCs, as determined by a reshuffling analysis (see Materials and methods, Figure 1—figure supplement 7). Some a priori groups largely matched our expectations, with relatively few uncorrelated dimensions among many measures (e.g., the gravitaxis group that had 10 measures and only 2 significant PCs), while others exhibited relatively many uncorrelated dimensions (e.g., the clumpiness group that had five measures and five significant PCs; see Figure 1—figure supplement 7 for all a priori group PCAs).
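A reshuffling test for the number of significant PCs in an a priori group, followed by projection onto those PCs, might look like the sketch below. It assumes that shuffling each measure independently across flies (which preserves marginals but destroys covariation) defines the null, that each observed eigenvalue is compared with the 95th percentile of the rank-matched null eigenvalue, and that counting stops at the first failure. These are assumptions; the Materials and methods describe the actual procedure.

```python
import numpy as np


def _standardize(data):
    """Center and scale each measure (column) to unit variance across flies."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def _sorted_eigenvalues(z):
    """Eigenvalues of the correlation matrix of standardized data, largest first."""
    return np.sort(np.linalg.eigvalsh(z.T @ z / len(z)))[::-1]


def n_significant_pcs(data, n_shuffles=200, alpha=0.05, seed=0):
    """Number of leading PCs whose eigenvalues beat a column-shuffled null."""
    rng = np.random.default_rng(seed)
    z = _standardize(data)
    observed = _sorted_eigenvalues(z)
    null = np.empty((n_shuffles, z.shape[1]))
    for s in range(n_shuffles):
        # Permute each column independently to break covariation between measures
        shuffled = np.column_stack([rng.permutation(col) for col in z.T])
        null[s] = _sorted_eigenvalues(shuffled)
    threshold = np.quantile(null, 1 - alpha, axis=0)
    above = observed > threshold
    return len(above) if above.all() else int(np.argmin(above))


def project_onto_significant_pcs(data, k):
    """Scores of each fly on the first k PCs of the standardized submatrix."""
    z = _standardize(data)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:k].T
```

Replacing every a priori group with `project_onto_significant_pcs(group, n_significant_pcs(group))` and concatenating the results gives a distilled matrix in the spirit of the one described above.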

With a priori group submatrices represented in their respective significant PCs, the grand data matrix now contained 38 behavioral measures. Every significant correlation between behavioral measures at this point represents an unexpected element of structure of behavioral variation (Figure 1G). Given the high dimensionality of the full data set and the complex structure of the behavioral measures in the distilled correlation matrix, we created an online data browser (http://decathlon.debivort.org) to explore the data and compare the alternative correlation matrices. The first impression of the distilled correlation matrix is that it is sparse. Most behavioral measures are uncorrelated or weakly correlated, suggesting that there are many independent dimensions of behavioral variation. However, 176 pairs of behaviors were significantly correlated at a false discovery rate (FDR) of 38%, and the distribution of p-values for the entries in this matrix exhibits a clear enrichment of low values (Figure 1—figure supplement 8), indicating an enrichment of significant correlations. As an example, flies with high values in the first PC of the phototaxis a priori group tend to have high values in the second PC of the activity level a priori group. As another example, the third switchiness PC is positively correlated with the second clumpiness PC (Figure 1H; a relationship that is likely related to the positive correlation between the Y-maze hand clumpiness and Y-maze hand switchiness, Figure 1F). Interpreting the loadings (Figure 1—figure supplements 9 and 10) of these PCs indicates that this is a correlation between olfactory tunnel turn direction switchiness and olfactory tunnel turn timing clumpiness. We detected a substantial number of correlations between different dimensions of switchiness and clumpiness (Figure 1—figure supplement 11), suggesting that there are multiple couplings between these suites of traits.
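Calling a correlation "significant at an FDR" requires a p-value per pair and a multiple-testing procedure. The sketch below assumes the Benjamini–Hochberg procedure and approximates two-sided p-values with Fisher's z-transform (a normal approximation); neither choice is stated in the text, and `significant_pairs` is a hypothetical helper.

```python
import math

import numpy as np


def correlation_pvalues(r, n):
    """Approximate two-sided p-values for Pearson r from n samples
    using Fisher's z-transform and a normal approximation."""
    z = np.arctanh(np.clip(r, -0.999999, 0.999999)) * math.sqrt(n - 3)
    return np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2))


def benjamini_hochberg(p, q):
    """Boolean mask of hypotheses rejected at false discovery rate q."""
    p = np.asarray(p)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    k = np.nonzero(passed)[0].max() + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True  # reject the k smallest p-values
    return reject


def significant_pairs(data, q):
    """Index pairs (i, j), i < j, of columns correlated at FDR q."""
    n, d = data.shape
    r = np.corrcoef(data, rowvar=False)
    iu = np.triu_indices(d, k=1)  # test each unordered pair once
    reject = benjamini_hochberg(correlation_pvalues(r[iu], n), q)
    return [(int(i), int(j)) for i, j, ok in zip(*iu, reject) if ok]
```

Testing only the upper triangle matters: the correlation matrix is symmetric, so counting both (i, j) and (j, i) would double the number of hypotheses.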

Stepping back from specific pairwise correlations, we examined the overall geometry of behavioral variation. The full matrix contained 22 significant PCs, with PCs 1–3 explaining 9.3, 6.9, and 5.8% of the variance, respectively (the distilled matrix contained 16 significant PCs, with PCs 1–3 explaining 13.8, 10.0, and 7.7% of the variance, respectively; Figure 1I). But the amount of variance explained across PCs does not provide the full picture of how many dimensions of variation are present in a data set. A correlation matrix can be organized at different scales/hierarchically, so there need not be a single number that characterizes dimensionality. As an intuitive example, data filling a volume shaped like a frisbee is organized largely in two dimensions. Data shaped like a rugby ball is somewhat one-dimensional, not particularly two-dimensional, and somewhat three-dimensional. An approach is needed that can characterize such continuous variation in organization across dimensionalities, particularly the possibility that the data are not structured tidily in a single dimensionality.
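The percentage of variance explained by each PC (the quantity on a scree plot) comes from the eigenvalues of the covariance matrix. The sketch below also builds the paragraph's frisbee and rugby-ball shapes as hypothetical toy data, to show how a scree plot summarizes them. Note that the geometric example uses unstandardized data; on z-scored data this computation is equivalent to using the correlation matrix.

```python
import numpy as np


def percent_variance(data):
    """Percent of total variance captured by each PC, largest first."""
    centered = data - data.mean(axis=0)
    eig = np.linalg.eigvalsh(centered.T @ centered / (len(data) - 1))
    eig = np.sort(np.clip(eig, 0, None))[::-1]  # clip tiny negative round-off
    return 100 * eig / eig.sum()


rng = np.random.default_rng(0)
# Toy shapes: a flattened disc and an elongated ellipsoid
frisbee = rng.normal(size=(5000, 3)) * [1.0, 1.0, 0.1]
rugby = rng.normal(size=(5000, 3)) * [1.0, 0.4, 0.4]
fr = percent_variance(frisbee)
rb = percent_variance(rugby)
```

Here `fr` concentrates nearly all variance in two PCs, while `rb` puts most variance in one PC but leaves non-negligible amounts in the other two. This is the kind of mixed organization that a single "number of dimensions" summarizes poorly.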

We developed a ‘connected components spectrum’ analysis that characterizes the continuously varying degree of organization of a correlation matrix from dimensionality 1 to d, the total dimensionality of the data. Briefly, we thresholded the absolute value of the correlation matrix at values ranging from 0 to 1, and, treating these matrices as adjacency matrices, determined the number of connected components, recording how often n connected components were observed. See Materials and methods and Figure 1—figure supplement 12. These connected components spectra can be interpreted as follows: peaks at dimensionality = 1 indicate that all measures are coupled in a network of at least weak correlations; peaks at dimensionality = d indicate that all measures are to some degree uncorrelated; peaks in between these values indicate intermediate scales of organization. Multiple peaks are possible because these kinds of organization are not mutually exclusive. The connected components spectrum (Figure 1J) of the distilled Decathlon data set had peaks at 1 and d (Li and Durbin, 2009). There was also evidence for structure over the full range of intermediate dimensionalities. Overall, the organization is one of predominantly uncorrelated behaviors, with sparse sets of behaviors correlated with continuously varying strengths.
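A minimal sketch of the connected components spectrum: sweep a threshold over [0, 1), keep edges where |r| exceeds it, count connected components, and tally how often each count occurs. Evenly spaced thresholds, the strict ">" comparison, and the absence of any normalization are assumptions; the Materials and methods may differ in these details.

```python
import numpy as np


def n_components(adjacency):
    """Count connected components of an undirected graph (depth-first search)."""
    d = adjacency.shape[0]
    seen = np.zeros(d, dtype=bool)
    count = 0
    for start in range(d):
        if seen[start]:
            continue
        count += 1  # a new, previously unreached component
        stack = [start]
        seen[start] = True
        while stack:
            node = stack.pop()
            for nb in np.nonzero(adjacency[node] & ~seen)[0]:
                seen[nb] = True
                stack.append(nb)
    return count


def connected_components_spectrum(corr, n_thresholds=1000):
    """spectrum[n - 1] = number of thresholds in [0, 1) at which the graph
    |corr| > threshold has exactly n connected components."""
    d = corr.shape[0]
    strength = np.abs(corr)
    spectrum = np.zeros(d, dtype=int)
    for t in np.linspace(0.0, 1.0, n_thresholds, endpoint=False):
        adjacency = strength > t
        np.fill_diagonal(adjacency, False)  # self-edges do not affect components
        spectrum[n_components(adjacency) - 1] += 1
    return spectrum
```

For a matrix with two tightly correlated pairs that are weakly correlated with each other, the spectrum has mass at 1 (low thresholds), 2 (intermediate), and d = 4 (high thresholds), but none at 3, illustrating the multiple peaks described above.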

To assess how individual flies are distributed in behavior space, we embedded them from the 121-dimensional space of behavioral measures into two dimensions using t-SNE (van der Maaten and Hinton, 2008). There appear to be no discrete clusters corresponding to ‘types’ of flies. Instead, variation among flies appears continuously distributed around a single mode (Figure 1K). The same is true for an embedding of flies as represented in the distilled matrix (Figure 4—figure supplement 3D). We also embedded behavioral measures as points from the 384-dimensional space of flies into two dimensions (Figure 1L). This confirmed that while our intuition for which sets of measures would be similar (the a priori groups) was right in many cases, measures we thought would be similar were often dissimilar across flies, and measures we did not anticipate being similar sometimes were (e.g., phototaxis and activity level).

Behaviors that covary among individuals tend to be patterned similarly in time and across the body

With data from the second Decathlon, we characterized the structure of variation in a set of behaviors that was potentially exhaustive for one behavioral condition (free walking/motion in a 2D arena; Figure 2A). High-speed, high-resolution video was acquired for flies simultaneously in each of two rigs. Over 3 days, we acquired 13.5 GB of 200 × 200 px 100 Hz videos centered on each fly as they behaved spontaneously over the course of 60 min. These frames were fed into an unsupervised analysis pipeline (Berman et al., 2014) that computed high-dimensional representations of these data in the time-frequency domain before embedding them in two dimensions and demarcating boundaries between 70 discrete modes of behavior (Figure 2B). The behavior of each fly was thus represented as one of 70 values at each frame. Flies exhibited a broadly similar probability distribution of performing each of these behaviors (Figure 2C), though there were conspicuous differences among individual patterns of behavior (Figure 2D).
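Once every frame carries one of the 70 mode labels, each fly's behavioral probability distribution is the fraction of its frames spent in each mode, and stacking these across flies gives a matrix whose column correlations are the mode-by-mode correlations across individuals. The sketch below assumes integer labels 0 to 69 per frame; the function names are hypothetical.

```python
import numpy as np


def mode_probabilities(labels, n_modes=70):
    """Fraction of frames a fly spends in each behavioral mode."""
    counts = np.bincount(np.asarray(labels), minlength=n_modes)
    return counts / counts.sum()


def mode_correlations(label_sequences, n_modes=70):
    """Stack per-fly mode probabilities (flies x modes) and correlate
    modes across flies. Modes that never vary across flies yield NaN."""
    pdfs = np.vstack([mode_probabilities(s, n_modes) for s in label_sequences])
    return pdfs, np.corrcoef(pdfs, rowvar=False)
```

Working with per-fly fractions rather than raw frame counts removes differences in recording length from the comparison between individuals.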

Figure 2 (with 1 supplement)

Correlation structure of unsupervised behavioral classifications.

(A) Schematic of the four-camera imaging rig used to acquire single fly videos. (B) Overview of the data processing pipeline from single fly videos to behavioral probability maps. (C) Sample individual behavior mode probability density functions (PDFs visualized in an embedded t-SNE space). Discrete regions correspond to watersheds of the t-SNE embedding. (D) Sample individual PDFs mapped to locations in t-SNE space. Discrete regions correspond to watersheds of the t-SNE embedded probability densities. (E) Correlation matrix (top) for individual PDFs with rows and columns hierarchically clustered. Colored blocks indicate labels applied to classifications post hoc. Example scatter plots (bottom) of individual behavioral probabilities. Points correspond to probabilities for individual flies. Line is the best fit (principal component [PC]1 of these points), gray region is the 95% confidence interval of the fit, as determined by bootstrap resampling. (F) Connected components histogram of the thresholded PDF correlation matrix (see Materials and methods). (G) Discrete behavioral map with individual zones colored by post hoc labels as in (E). (H) Transition probability matrix for behavioral classifications. Entries in the ith row and jth column correspond to the probability of transitioning from state i to state j over consecutive frames. Blocks on the diagonal indicate clusters of post hoc labels as in (E) and (G).

The correlation matrix of behavioral modes identified in the unsupervised analysis was highly structured (Figure 2E; like the correlation matrix of the rest of the Decathlon measures, Figure 1E), appearing to be strongly organized in 10 or fewer dimensions, with some evidence of organization in higher dimensions (Figure 2F). Correlations between behavioral modes identified by unsupervised clustering and measures from the other Decathlon assays were generally weaker than correlations within these categories, but were nevertheless enriched for significant relationships (Figure 2—figure supplement 1). For this analysis, there was no equivalent of a priori groups of behavioral measures as measures were not defined prior to the analysis. But, in examining sample movies of flies executing each of the 70 unsupervised behavioral modes (Videos 1–4), it was clear that highly correlated behavioral modes tended to reflect variations on the same type of behavior (e.g., walking) or behaviors performed on the same region of the body (e.g., anterior movements including eye and foreleg grooming; Figure 2G). In other words, individual flies that perform more eye grooming tend to perform more of other anterior behaviors. There were some correlations between behaviors implemented by disparate parts of the body. For example, flies that spent more time performing anterior grooming also spent more time performing slow leg movements (Figure 2G). The overall similarity of covarying behaviors was confirmed by defining groups of covarying behaviors and observing that they were associated with contiguous regions of the embedded behavioral map (Figure 2G). That is, behaviors whose prevalence covaries across individuals have similar time-frequency patterns across the body. Moreover, these clusters of covarying, contiguously embedded behaviors exhibited similar temporal transitions; behaviors that covary across individuals tend to precede specific sets of subsequent behaviors (Figure 2H). 
Thus, there appear to be couplings between the dimensions of behavioral variation across individuals, the domains of the body implementing behavior, and the temporal patterning of behaviors.
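The transition matrix in Figure 2H is described as the probability of moving from mode i to mode j over consecutive frames. A sketch under that definition follows; it counts consecutive identical labels as self-transitions, which the original analysis may instead exclude, and leaves rows of never-visited modes as zeros. The function name is hypothetical.

```python
import numpy as np


def transition_matrix(labels, n_modes=70):
    """Row-normalized counts of mode i at frame t followed by mode j at t + 1."""
    labels = np.asarray(labels)
    counts = np.zeros((n_modes, n_modes))
    np.add.at(counts, (labels[:-1], labels[1:]), 1)
    rows = counts.sum(axis=1, keepdims=True)
    # Divide each row by its total; rows with no outgoing frames stay zero
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
```

Pooling the count matrices of all flies before normalizing would give a population-level matrix, whereas normalizing per fly would allow individual transition structure to be compared.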

Video 1

Examples of a mode of walking behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior. Colored dots indicate whether flies are outbred (NEX; red) or inbred (Berlin-K^iso; blue).

Video 2

Video 3

Video 4

Thermogenetic neural perturbation alters the correlations between behaviors

To (1) confirm that the Decathlon experiments revealed biologically meaningful couplings between behaviors and (2) probe biological mechanisms potentially giving rise to behavioral correlations, we treated correlations in the Decathlon matrices as hypotheses to test using data from a thermogenetic neural circuit perturbation screen (Skutt-Kakaria et al., 2019). Specifically, we focused on the many correlations between measures of turn timing clumpiness and turn direction switchiness (Figure 1H and S8). Before the Decathlon experiment, we had no reason to think these measures would be correlated, as one describes higher-order structure in the timing of locomotor turns (clumpiness), and the other describes higher-order structure in the direction of sequential turns (switchiness). Our prediction derived from the Decathlon was that if perturbing a circuit element caused a change in clumpiness it would tend to also cause a change in switchiness in a consistent direction. We looked for such correlated changes when we inactivated or activated neurons in the central complex (Figure 3—figure supplement 1A–C), a cluster of neuropils involved in locomotor behaviors (Buchanan et al., 2015; Ofstad et al., 2011; Kottler et al., 2019) and heading estimation (Seelig and Jayaraman, 2015; Green et al., 2017; Kakaria and de Bivort, 2017). We used a set of Gal4 lines (Wolff and Rubin, 2018), each of which targets a single cell type and which together tile the entire protocerebral bridge (a central complex neuropil; see Supplementary file 4), to express Shibire^ts (Kitamoto, 2001) or dTRPA1 (Hamada et al., 2008), thermogenetic reagents that block vesicular release and depolarize cells, respectively. As controls, we used flies heterozygous for the Gal4 lines and lacking the effector transgenes.

Flies with these genotypes were loaded into Y-mazes for behavior imaging before, during, and after a temperature ramp from the permissive temperature (23°C) to the restrictive temperature (32°C for dTRPA1 and 29°C for Shibire^ts) (Figure 3—figure supplement 1A). At the permissive temperature, we observed significant negative correlations between line-average clumpiness and switchiness in control and dTRPA1-expressing lines (Figure 3—figure supplement 1D and E). This suggests that the mechanisms that couple variation in switchiness and clumpiness within a genotype may also be at play across genotypes. Surprisingly, at the restrictive temperature, we saw significant positive correlations between clumpiness and switchiness in all three experimental treatments: control (Gal4/+), Gal4/Shibire^ts, and Gal4/dTRPA1 lines. That this correlation appeared in controls suggests that temperature alone can selectively alter the function of circuit elements regulating both clumpiness and switchiness, effectively reversing their coupling. The dTRPA1- and Shibire^ts-expressing lines also showed this reversal, but to a lesser extent, suggesting that perturbing neurons in the central complex can attenuate temperature-induced changes in the coupling of clumpiness and switchiness.
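The across-line comparison above reduces to correlating two line-average measures at each temperature and asking whether the sign flips. A minimal sketch, assuming one clumpiness and one switchiness value per Gal4 line and a simple label-shuffling permutation test (a swapped-in choice; the screen's own statistics may differ). The toy data are hypothetical.

```python
import numpy as np


def permutation_corr(x, y, n_perm=2000, seed=0):
    """Pearson r between two line-average measures, with a two-sided
    permutation p-value from shuffling y across lines."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                     for _ in range(n_perm)])
    # +1 in numerator and denominator counts the observed labeling itself
    p = (np.sum(np.abs(null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p


def sign_reversed(r_permissive, r_restrictive):
    """True when the coupling flips sign between the two temperatures."""
    return np.sign(r_permissive) != np.sign(r_restrictive)


# Toy line averages (hypothetical): negative coupling at 23°C, positive at 32°C
rng = np.random.default_rng(0)
clump = rng.normal(size=30)
switch_23 = -0.7 * clump + 0.5 * rng.normal(size=30)
switch_32 = 0.7 * clump + 0.5 * rng.normal(size=30)
r23, p23 = permutation_corr(clump, switch_23)
r32, p32 = permutation_corr(clump, switch_32)
print(r23, r32, sign_reversed(r23, r32))
```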

Individual gene expression variation correlates with individual behaviors

That thermogenetic manipulation can disrupt correlations between behavioral measures suggests that specific patterns of neural activity underlie the structure of behavioral variation. Such physiological variation could arise from stochastic variation in gene expression (Lin et al., 2016) in circuit elements. To test this hypothesis, we performed RNA sequencing on the heads of the flies at the end of the first Decathlon experiment (Figure 3A). We used Tm3′seq (Pallares et al., 2020) to make 3′-biased libraries for each individual animal. We quantified the expression of 17,470 genes in 101 individual flies. The expression profiles were strongly correlated across individuals (Figure 3B), but there was some significant variation across individuals (Figure 3C). To assess whether this variation was meaningful with respect to behavioral variation, we trained linear models (over 625,000 in total) to predict an individual’s behavioral measures from its transcriptional idiosyncrasies. Specifically, we fit simple linear models for each of our 97 behavioral measures as a function of each of the 6642 most highly expressed genes. The median model had an r² value of 0.008, and 5% of models predicted behavior at r² > 0.135. Behavioral measures varied greatly in their number of significant (p<0.05) gene predictors (Figure 3D), ranging from 147 to 1172 genes.
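Each of these models regresses a single behavioral measure on a single gene, so all of them can be computed at once from z-scored matrices. For one Decathlon, 97 measures × 6642 genes gives 644,274 pairs, in line with the count above. The sketch below is a hypothetical vectorized implementation on toy data; it assumes ordinary least squares with an intercept and leaves the p-value lookup to a Student's t routine.

```python
import numpy as np


def univariate_fits(expr, behav):
    """Fit behavior ~ a + b * gene separately for every (behavior, gene) pair.

    expr:  (n_flies, n_genes) expression values (e.g. log RPM)
    behav: (n_flies, n_behaviors) behavioral measures
    Returns r2 and the slope t statistic, each shaped (n_behaviors, n_genes).
    """
    n = expr.shape[0]
    # For a one-predictor least-squares fit, r^2 equals the squared Pearson
    # correlation, which is the mean product of z-scored columns.
    zx = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    zy = (behav - behav.mean(axis=0)) / behav.std(axis=0)
    r = zy.T @ zx / n
    r2 = r ** 2
    # t statistic for the slope with n - 2 degrees of freedom; a two-sided
    # p-value is 2 * sf(|t|) of Student's t (e.g. scipy.stats.t.sf).
    t = r * np.sqrt((n - 2) / (1 - r2))
    return r2, t


rng = np.random.default_rng(0)
expr = rng.normal(size=(101, 50))    # 101 flies x 50 genes (toy data)
behav = rng.normal(size=(101, 3))    # 3 behavioral measures (toy data)
r2, t = univariate_fits(expr, behav)
print(r2.shape)                      # (3, 50)
```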

Figure 3 with 1 supplement

Correlation between individual transcriptomes and behavioral biases.

(A) Steps for collecting transcriptomes from flies that have completed the Decathlon. (B) Data matrix of individual head transcriptomes. Rows are individual flies (n = 98). Columns are 17,470 genes sorted by their mean expression across individuals. The dashed line indicates the mean expression cutoff at 10 reads per million (RPM), below which genes were excluded from analysis. (C) Scree plots of the logged % variance explained for ranked principal components of gene expression variation, for observed (red) and shuffled (black) data. Shaded region corresponds to 95% CI as calculated by bootstrap resampling. (D) Performance heatmap (-log p) of linear models predicting the behavior of individual flies from single-gene expression. Colored bars (left) indicate a priori group identity of behavioral measures (rows). Bar graphs show the number of significant (p<0.05) models for each gene (top) and behavior (right). (E) Heatmap showing the probability across bootstrap replicates of a KEGG pathway being significantly enriched in the list of predictive genes for a given behavior. (F) Bar plot showing the average across bootstrap replicates of the maximum (across behaviors) negative log adjusted p-value of all enriched KEGG pathways. Color indicates results from observed (gray) or shuffled control (black) data. (G) Average maximum adjusted -log p-value for enriched KEGG pathways common to all Decathlon iterations. Pathway labels (right) are ordered by batch 3 (outbred) -log adjusted p-value.
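Panel C contrasts observed and shuffled data. One common way to build such a shuffled control, assumed here rather than taken from the Materials and methods, is to permute each gene's values independently across flies, which keeps each gene's distribution but removes covariation between genes. The sketch uses toy data, hypothetical function names, and omits the bootstrap confidence band.

```python
import numpy as np


def pct_variance_explained(X):
    """Percent variance explained by each principal component of X
    (rows = individuals, columns = genes), largest first."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2
    return 100 * var / var.sum()


def shuffle_columns(X, seed=0):
    """Permute each column independently across rows. Every gene keeps its
    own distribution, but gene-gene covariance is destroyed."""
    rng = np.random.default_rng(seed)
    return np.column_stack([rng.permutation(col) for col in X.T])


rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 1))            # one shared axis of variation
X = latent @ rng.normal(size=(1, 20)) + 0.3 * rng.normal(size=(100, 20))
print(pct_variance_explained(X)[0], pct_variance_explained(shuffle_columns(X))[0])
```

A first PC that explains far more variance in the observed data than in the shuffled data indicates covariation among genes beyond what their individual variances produce.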

After identifying genes that were predictive of each behavioral measure, we assessed whether they shared functional characteristics (Figure 4—figure supplement 1A) using KEGG pathway enrichment analysis (Mootha et al., 2003; Kanehisa and Goto, 2000; Yu et al., 2012). We independently performed KEGG enrichment analysis on each gene list to identify functional categories of genes overrepresented for any particular behavioral measure compared to the background list of 6642 genes. Of this background, 1982 genes (30%) were associated with one or more KEGG pathways. Across all behaviors, we found that 37 KEGG pathways were significantly enriched in the genes that predicted individual behavioral measures (Figure 3E, Figure 4—figure supplement 1, Supplementary file 5). Many of these pathways were significantly enriched compared to shuffled controls and were enriched in multiple behavioral measures (Figure 3F). We included features in the online Decathlon Data Browser to explore gene-behavior correlations (searching either by gene or by behavior) as well as KEGG pathway-behavior correlations. We repeated this experiment and analysis on flies of the second Decathlon experiment and found that the significance of pathway enrichments was highly correlated (r = 0.87; Figure 3G), with 43% and 73% of significant KEGG pathways in the first and second Decathlon experiments (respectively) shared in the other experiment.
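KEGG over-representation tests of this kind (for example, clusterProfiler's; Yu et al., 2012) are typically a one-sided hypergeometric test per pathway followed by a multiple-testing correction such as Benjamini-Hochberg. The standard-library Python sketch below illustrates that logic; the numbers in the example are hypothetical, and the exact settings used for the Decathlon analysis are not stated in this section.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: N genes drawn from a background of M,
    of which n belong to the pathway. Exact, using integer binomials."""
    hits = range(k, min(n, N) + 1)
    return sum(comb(n, i) * comb(M - n, N - i) for i in hits) / comb(M, N)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 1982 annotated background genes, a 40-gene pathway,
# 100 annotated predictive genes for one behavior, 10 of them in the pathway.
p = hypergeom_sf(10, M=1982, n=40, N=100)
print(p, bh_adjust([p, 0.2, 0.04]))
```

Restricting the background to annotated genes matters: the expected overlap (here 100 × 40 / 1982, about 2 genes) depends on the background size used.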

Genes related to cellular respiration, protein translation, and phototransduction were significantly enriched for multiple behavioral measures, a result that was highly robust to bootstrap resampling. Cellular respiration was the most common enrichment, significant in 11 of 97 behavioral measures, suggesting that variation in metabolic rate may be predictive of variation in many behaviors. Indeed, metabolic function was a common link between significant categories, with a total of 24 of 37 being related to metabolism. We also found 11 pathways (including two highly enriched categories: ribosome and proteasome) associated with protein turnover. While most enriched pathways were related to basic cellular processes, others (Hedgehog signaling, Wnt signaling, insect hormone biosynthesis, SNARE vesicular transport, and phototransduction) suggested roles for development and neuronal function in individual behavioral variation.

Behavioral variability has a similar structure in inbred and outbred lines

If transcriptomic differences predict individual behavioral differences within a genotype, then the structure of behavioral variability might be very different in outbred populations, where transcriptomic differences are (presumably) much more substantial. We tested this by conducting a Decathlon experiment on outbred flies (n = 192) from a synthetic genetic mapping population (Long et al., 2014). These animals came from a population intercrossed for ~100 generations (‘NEX’; seeded in the first generation by eight kinds of F₁ heterozygotes produced by a round-robin cross of eight inbred wild strains). A distilled correlation matrix of behavioral measures (Figure 4A) was produced by the same method as above. At first glance, it appears qualitatively similar to the distilled correlation matrix from inbred animals (Figure 1G). This impression was confirmed by more formal comparisons of the structure of behavior in inbred and outbred populations. In both inbred and outbred populations, (1) individuals do not fall into discrete clusters, as determined by t-SNE embedding of individuals as points (Figure 4B), and inbred and outbred flies appear to lie on the same manifold in behavioral measure space; (2) behavioral measures cluster according to their membership in a priori groups similarly in outbred (Figure 4C) and inbred (Figure 1L) populations; (3) the distribution of percent variance explained by PC was similar (Figure 4D); and (4) there is a similar connected components spectrum, with substantial distribution of behavioral biases across the full d dimensions and a sparse network of correlations coupling behaviors over a continuum of dimensionalities (Figure 4E). In addition to the Decathlon assay measurements, we also recorded high-speed, high-resolution videos of outbred flies for unsupervised classification and co-embedded their postural behavior modes with those of inbred flies.
Correlated clusters of unsupervised behavior modes from outbred flies corresponded broadly to the clusters mapped to anatomical regions in inbred flies (Figure 4—figure supplement 1). We further conducted transcriptomic sequencing and analysis on the heads of these outbred flies and found that, as in the inbred flies, their behaviors were broadly correlated with gene expression (Figure 4—figure supplement 2, Figure 4—figure supplement 3). The percent of significant pathways overlapping between pairs of inbred and outbred Decathlon experiments was similar to that of pairs of inbred Decathlon experiments, ranging from 38% to 64%.

Figure 4 with 4 supplements

Structure of behavioral variation in outbred flies.

(A) Distilled correlation matrix for outbred NEX flies. Colored blocks indicate a priori groups as described in Figure 1. (B) Points corresponding to individual flies nonlinearly embedded using t-SNE from the 121-dimensional full matrix to two dimensions. Color indicates whether the flies were inbred (blue) or outbred (orange). (C) Points corresponding to behavioral measures nonlinearly embedded using t-SNE from the 192-dimensional space of flies to two dimensions. Colors correspond to a priori group. (D) Scree plot of the ranked, normalized eigenvalues, that is, the % variance explained by each principal component (PC), of the distilled covariance matrix, versus PC #. (E) Connected components spectra for outbred and inbred correlation matrices (see Materials and methods). (F) Scatter plot of the distilled matrix correlation coefficients for inbred and outbred flies. Points correspond to distilled matrix measure pairs. (G) Example scatter plots of distilled matrix measure pairs for inbred (left) and outbred (right) flies. The rows of plots highlight a pair of measures with qualitatively different (top) and similar (bottom) correlations in inbred and outbred flies.

After determining that the overall structure of behavioral variation in inbred and outbred populations is similar, we asked whether it was also similar in specific correlations. There appears to be some similarity at this level (Figure 4F); the correlation coefficient between the inbred and outbred populations in the pairwise correlations between behavioral measures is statistically significant (p=0.001), but low in magnitude (r = 0.11). Examining specific pairs of scatter plots, it is clear that the correlations between specific behaviors are sometimes the same between the inbred and outbred animals, but sometimes not (Figure 4G, Figure 4—figure supplement 4). A caveat in interpreting apparent differences between the inbred and outbred matrices is that two qualities are different between the animals used in the respective experiments: the degree of genetic diversity, but also (necessarily) the genetic background of the flies.
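The inbred-outbred comparison correlates the pairwise entries of two correlation matrices. Because entries that share a behavioral measure are not independent, a Mantel-style permutation test, which relabels the measures of one matrix, is one conservative way to assess such a correlation. This is a swapped-in technique, not necessarily how the p-value above was computed, and it assumes both matrices cover the same measures in the same order.

```python
import numpy as np


def upper_triangle(m):
    """Entries above the diagonal, as a flat vector."""
    return m[np.triu_indices_from(m, k=1)]


def mantel(A, B, n_perm=999, seed=0):
    """Pearson r between the off-diagonal entries of two correlation matrices
    over the SAME measures in the SAME order, with a permutation p-value that
    relabels the measures of B (rows and columns together)."""
    rng = np.random.default_rng(seed)
    a = upper_triangle(A)
    r_obs = np.corrcoef(a, upper_triangle(B))[0, 1]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(B.shape[0])
        r_perm = np.corrcoef(a, upper_triangle(B[np.ix_(perm, perm)]))[0, 1]
        hits += int(abs(r_perm) >= abs(r_obs))
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting whole measures, rather than individual matrix entries, keeps the dependence among entries that share a row or column intact under the null.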

Behavioral variability has high dimensionality regardless of the mechanistic origins of variation

Lastly, we examined how the correlation structure of behavior compared between sets of flies with variation coming from different sources. Specifically, we looked at four data sets: (1) the BABAM data set (Robie et al., 2017), in which measures were acquired from groups of flies behaving in open arenas, and variation came from the thermogenetic activation of 2381 different sets of neurons (the first-generation FlyLight Gal4 lines; Jenett et al., 2012); (2) a Drosophila Genetic Reference Panel (DGRP; Mackay et al., 2012) behavioral data set, in which measures were acquired in behavioral assays similar to the Decathlon experiments (sometimes manually, sometimes automatically), and variation came from the natural genetic variation between lines in the DGRP collection; (3) a DGRP physiological data set, in which measures are physiological or metabolic (e.g., body weight and glucose levels) and variation came from the natural genetic variation between lines in the DGRP collection; and (4) the split-Gal4 Descending Neuron (DN) data set (Cande et al., 2018), in which measures came from the same unsupervised clustering approach as in Figure 2, and variation came from the optogenetic activation of specific sets of descending neurons projecting from the brain to the ventral nerve cord (Namiki et al., 2018). We analyzed these data sets with the same tools we used to characterize the structure of behavioral variation in the Decathlon experiments.

All of these data sets show substantial structure in their correlation matrices (Figure 5A and Figure 5—figure supplement 1). The BABAM and especially the DN correlation matrices contain numerous high correlation values, indicative of strong couplings between behaviors under these neuronal manipulations. The DGRP correlation matrices, especially the DGRP behavioral matrix, look more qualitatively similar to the Decathlon matrix, with lower, sparser correlations. This suggests that behavioral variation has coarsely similar structure whether variation arises intragenotypically (e.g., through stochastic variation in transcription; Figures 3B and 1J), intergenotypically among outbred individuals (Figure 4E), or intergenotypically among inbred lines derived from wild populations (Figure 5A2). A caveat of this conclusion is that sparse correlation matrices can arise either from true, biological independence of behavioral measures or from measurement error.
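The measurement-error caveat can be made quantitative with Spearman's classical attenuation relation: if two measures have reliabilities ρ_xx and ρ_yy, their observed correlation is expected to be r_true × √(ρ_xx ρ_yy). The simulation below illustrates that textbook result under Gaussian assumptions; it is not an analysis of these data sets, and the function name is hypothetical.

```python
import numpy as np


def attenuation_demo(r_true, rel_x, rel_y, n=200_000, seed=0):
    """Simulate two traits with true correlation r_true, add independent
    measurement noise giving reliabilities rel_x and rel_y, and return the
    simulated observed r alongside r_true * sqrt(rel_x * rel_y)."""
    rng = np.random.default_rng(seed)
    true = rng.multivariate_normal([0, 0], [[1, r_true], [r_true, 1]], size=n)
    # reliability = var(true) / var(observed) = 1 / (1 + noise_var)
    noise_sd = np.sqrt([(1 - rel_x) / rel_x, (1 - rel_y) / rel_y])
    observed = true + rng.normal(size=(n, 2)) * noise_sd
    r_obs = np.corrcoef(observed, rowvar=False)[0, 1]
    return r_obs, r_true * np.sqrt(rel_x * rel_y)


print(attenuation_demo(0.6, 0.5, 0.5))   # both values close to 0.3
```

With reliabilities of 0.5, a true correlation of 0.6 is halved, so a noisy assay can make genuinely coupled behaviors look nearly independent.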

Figure 5 with 1 supplement

Analysis of Drosophila behavioral covariation in other non-isogenic populations.

(A) Correlation matrices of previously published data sets. Rows correspond to analyses performed on each data set. From left to right, the data sets (columns) are as follows: line averages of supervised behavioral classifications following thermogenetic activation in the fly olympiad screen (Robie et al., 2017), line averages of behavioral phenotypic data from wild-type inbred lines in the Drosophila Genetic Reference Panel (DGRP) database, line averages of physiological phenotypic data from the DGRP database, and line averages of the fold change in unsupervised behavioral classifications following optogenetic activation of descending neurons (Cande et al., 2018). (B) Connected components spectra for each correlation matrix (see Materials and methods). Color in the rightmost plots (B–D) indicates either control (Gal4 driver only) or experimental animals (Gal4 × dTRPA1). (C) Points corresponding to lines nonlinearly embedded using t-SNE from the d-dimensional raw measure space to two dimensions (from left to right, d = 871, 31, 77, 151). (D) Points corresponding to measures nonlinearly embedded using t-SNE from the n-dimensional space of lines to two dimensions (from left to right, n = 2083, 169, 169, 176).

The connected components spectra of these matrices (Figure 5B) are similar in offering evidence of organization over a wide range of dimensionalities, including high dimensionalities. Only the BABAM spectrum has no power at the dimensionality of its raw count of measurements. The BABAM and DN spectra have a single predominant peak (at dimensionality = 1), suggesting that most measures belong to a single network of at least weak couplings. This is especially true in the BABAM data and is more true in the optogenetic experimental animals than controls in the DN data. The DGRP physiology data exhibits weak peaks at dimensionality = 1 and d, but also peaks at ~25 and 37 dimensions, suggesting that intergenotypic variation in physiology may have an intrinsic dimensionality in that range. Alternatively, since these data sets comprise multiple studies, organization at this dimensionality may reflect batch effects or multiple related measurements within studies. The spectrum of the DGRP behavior data looks similar to that of the Decathlons, with peaks at dimensionality = 1 and d, and evidence for structure over the full range of intermediate dimensionalities. There is also a peak at ~27 dimensions, which may also correspond to study-level effects. Ultimately, we found no evidence for low-dimensional organization in either DGRP data set. Individual lines in the space of DGRP behavior (and physiology) measures appear to be distributed around a single mode (Figure 5C2 and C3), like individual flies in the Decathlon (Figure 1K). In contrast, there is some organization of individual lines in the BABAM and DN data sets, likely reflecting neuronal perturbations affecting multiple circuit elements mediating the same behavior(s) (e.g., multiple lines targeting the same neuropil).
Measures fall into clusters in all of these data sets except the DGRP behavioral measures (Figure 5D), which appear distributed around a single mode, perhaps reflecting the high dimensionality of behavior itself.

Discussion

Individuals exhibit different behaviors, even when they have the same genotype and have been reared in the same environment. These differences might covary or lie on a manifold of specific geometry in behavioral variation space, but the structure of intragenotypic behavioral variation is uncharacterized. We designed a pipeline of 10 behavioral assays (Figure 1), which collectively yielded up to 121 behavioral measures per individual animal. We also used unsupervised clustering to identify an additional 70 measures per individual based on a time-frequency analysis of high-resolution video of the flies behaving spontaneously (Figure 2). These measures were the fly-specific rates of exhibiting each of the 70 unsupervised behavioral modes. All in all, across three 15-day Decathlon experiments, we collected 191 behavioral measures from 576 flies. This allowed us to produce a full correlation matrix of the behavioral measures exhibited by inbred animals grown in the lab (Figure 1E).

There is a well-developed theoretical framework for understanding the multivariate correlation structure of phenotypes. In quantitative genetics, G-matrices characterize the variance and covariance structure of phenotypes (be they behavioral, physiological, or morphological) stemming from genetic differences among individuals or strains (Mackay, 2009; Bruijning et al., 2020). These representations allow the quantitative prediction of responses to selection and constrain the combinations of phenotypes individuals can exhibit. As such, these representations are a key part of predicting the future trajectories of evolution. Just as the phenotypic variance can be parsed into genetic variance, environmental variance, GxE interaction variance, etc., covariance can be similarly dissected (Charmantier et al., 2014; Berdal and Dochtermann, 2019). For example, the classic model of phenotypic variance V_P = V_G + V_E has a direct phenotypic covariance analogue: Cov_P = Cov_G + Cov_E. The last term (Cov_E) is further broken down into temporary environmental covariances and permanent environmental covariances (Cov_PE) that endure for the duration of observations (similar to our measurements of individual behavioral bias). In flies, we have the potential to directly measure Cov_PE by rearing inbred animals in standardized lab environments, profiling their individual biases over a wide range of behavioral measures, and directly quantifying the variance and covariance of behavioral bias. This is significant for quantitative geneticists because a meta-analysis of behavioral traits indicates that across behaviors ~23% of variance can be attributed to heritable factors (Dochtermann et al., 2019). This means that environmental factors, which include both deterministic effects and stochastic intragenotypic effects, explain ~77% of behavioral variance.
Thus, characterizing the structure of Cov_PE will contribute to closing a significant gap in our understanding of the basis of behavioral diversity.
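The logic of measuring Cov_PE directly can be sketched with repeated measures. Assuming (1) inbred flies, so that Cov_G is negligible, (2) permanent effects that stay fixed across days, and (3) temporary effects drawn independently each day, the covariance between one day's measures and another day's measures recovers Cov_PE, whereas the within-day covariance also includes the temporary covariance (written Cov_TE here, a label introduced only for this example). The simulation and function name are hypothetical.

```python
import numpy as np


def cov_pe_cross_day(day1, day2):
    """Estimate permanent-environment covariance from two days of the same
    measures on the same flies (rows = flies, columns = measures).
    Temporary effects that are independent between days drop out of the
    cross-day covariance; symmetrize because Cov(A1, B2) and Cov(B1, A2)
    estimate the same quantity."""
    n = day1.shape[0]
    d1 = day1 - day1.mean(axis=0)
    d2 = day2 - day2.mean(axis=0)
    c = d1.T @ d2 / (n - 1)
    return (c + c.T) / 2


rng = np.random.default_rng(0)
n = 100_000
cov_pe = np.array([[1.0, 0.5], [0.5, 1.0]])   # permanent, per-fly biases
cov_te = np.array([[1.0, 0.8], [0.8, 1.0]])   # temporary, redrawn each day
pe = rng.multivariate_normal([0, 0], cov_pe, size=n)
day1 = pe + rng.multivariate_normal([0, 0], cov_te, size=n)
day2 = pe + rng.multivariate_normal([0, 0], cov_te, size=n)
print(np.cov(day1, rowvar=False))         # within-day: about cov_pe + cov_te
print(cov_pe_cross_day(day1, day2))       # cross-day: about cov_pe
```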

For ethologists and behavioral neuroscientists, this work yields a view of the geometry of intragenotypic behavioral variation, which can be thought of as an emergent product of developmental biological processes and the dynamic interaction of neural activity and animals’ environment. From the full behavioral matrix, we made a so-called ‘distilled’ matrix in which any significant correlation indicates a surprising new relation between behaviors (Figure 1G and Figure 1—figure supplement 11). This form of the data minimizes duplicated measures of the same behavior, allowing us to cleanly analyze the geometry of behavioral variation. We found that behavioral measures were largely uncorrelated with each other, but small sets of behaviors were significantly correlated to varying degrees. This sparse correlation structure means that behavioral variation cannot be readily compressed to a small number of dimensions. Moreover, a single number cannot fully characterize the dimensional organization of a correlation matrix, so we developed a spectral approach (based on a connected components analysis of thresholded correlation matrices) that examined the degree of organization across all possible dimensionalities in the data (Figures 1J, 2F, 4E and 5B, Figure 1—figure supplement 12). This approach revealed organization across intermediate dimensionalities corresponding to correlations of varying strength between sparse sets of behaviors (Figure 1F and H and Figure 1—figure supplement 12). Embedding data points corresponding to individual flies from the high-dimensional space of individual biases into two dimensions produced a broad distribution around a single mode (Figure 1K), implying that there are no discrete types of flies.
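The exact connected components procedure is given in the Materials and methods. As a hedged reconstruction, the sketch below assumes that one thresholds |r| at a sweep of values, treats surviving correlations as graph edges, and records the number of connected components (an effective dimensionality between 1 and d) at each threshold; the spectrum is then the distribution of those counts. The toy matrix and function names are hypothetical.

```python
import numpy as np


def count_components(adj):
    """Number of connected components of an undirected graph,
    given a symmetric boolean adjacency matrix (depth-first search)."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u] & ~seen):
                seen[v] = True
                stack.append(v)
    return components


def components_per_threshold(corr, thresholds):
    """Keep edges with |r| >= t and count components for each threshold t."""
    a = np.abs(np.asarray(corr, dtype=float))
    np.fill_diagonal(a, 0.0)
    return np.array([count_components(a >= t) for t in thresholds])


corr = np.array([[1.0, 0.8, 0.1, 0.1],
                 [0.8, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.8],
                 [0.1, 0.1, 0.8, 1.0]])
counts = components_per_threshold(corr, np.linspace(0, 1, 101))
# Spectrum: how often (across thresholds) the matrix splits into k = 1..d pieces
spectrum = np.bincount(counts, minlength=corr.shape[0] + 1)[1:]
print(spectrum)
```

In this toy matrix, weak cross-block correlations give power at k = 1, the two strongly coupled pairs give power at k = 2, and fully independent measures give power at k = d = 4.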

One of the specific, surprising correlations we discovered was between ‘clumpiness’ and ‘switchiness’ (Figure 1F, H and Figure 1—figure supplement 11, Figure 3—figure supplement 1). These are slightly abstract, higher-order behavioral measures, corresponding respectively to the burstiness of turn/action/decision timing and the degree of independence between consecutive binary choices. We had no a priori reason to expect these measures would be correlated since one pertains to the structure of actions in time and one pertains to the persistence of trial-to-trial biases. However, their linkage may reflect a shared role in controlling the higher-order statistics of exploration. Sequences of behavior with clumps of bouts (either in time or in space) might contribute to fat-tailed distributions of dispersal that are advantageous over Brownian motion for foragers in environments of sparse resources (Bartumeus et al., 2002). Variation in switchiness and clumpiness across individuals might therefore reflect variation in multibehavioral navigational strategies, perhaps as part of a bet-hedging evolutionary strategy (Hopper, 1999; Kain et al., 2015). From a perspective of biological mechanism, the correlation between these two behaviors (or other pairs we discovered) could be established during development. Individual wiring (Mellert et al., 2016; Linneweber et al., 2020) or physiological variations in neurons that mediate more than one behavior could impart coupled changes to all such behaviors.

If such an explanation accounts for the correlation of clumpiness and switchiness, there may be shared neural circuit elements in the circuits controlling decision timing and decision bias. We tested this idea in a thermogenetic screen of circuit elements in the central complex, a brain region where heading direction is represented (Seelig and Jayaraman, 2015) in a ring-attractor circuit (Kakaria and de Bivort, 2017). We found that increasing the temperature changed the sign of the correlation between switchiness and clumpiness (Figure 3—figure supplement 1D and E), suggesting that changes to brain physiology can alter correlation structure, though these effects might instead be caused by any of the many changes in neural state that accompany a temperature change. Indeed, we also found that the effector-inducing temperature manipulation alone concomitantly changed clumpiness and switchiness in some lines. This suggests that potentially subtle alterations of circuit physiology (e.g., temperature shifts [Haddad and Marder, 2018] well within physiological limits) can affect the function of circuit elements governing multiple behaviors.

We included behavioral assays in the Decathlon pipeline under a number of constraints. The assays had to be high throughput, both in the number of flies that could be assayed and in measures being automatically acquired. Flies had to survive at high rates, and the measures had to be stable over multiple days (Figure 1—figure supplement 1), because the whole experiment lasted 15 days. Because not all behavioral measures showed robust stability for this duration (all showed at least some day-to-day stability), the distilled Decathlon matrices likely represent an underestimate of the behavioral couplings exhibited over short periods (and perhaps an overestimate of lifelong couplings as flies can live more than a month). In the end, we employed a number of spontaneous locomotion assays and simple stimulus-evoked assays like odor avoidance and phototaxis.

Light responses were measured in a number of assays (Supplementary file 1), specifically the LED Y-maze (Werkhoven et al., 2019; in which flies turned toward or away from lit LEDs in a rapid trial-by-trial format), the spatial shade-light assay (in which flies chose to stand in lit or shaded regions of an arena that only changed every 4 min), and temporal shade-light (in which the same luminance levels were used as the previous assay, but a fly experienced them by traveling into virtual zones that triggered the illumination of the whole arena at a particular luminance) (Figure 1B). These assays were potentially redundant, and we included this cluster of phototaxis assays in part as a positive control. However, the three phototactic measures we thought would be correlated a priori were, in fact, uncorrelated, each being represented in the distilled correlation matrix (Figure 1G). This may reflect flies using different behavioral algorithms (Krakauer et al., 2017), implemented by non-overlapping circuits, to produce these behaviors. Indeed, a lack of correlation between behavioral measures was the typical observation. This also suggests that we have not come close to sampling the full dimensionality of intragenotypic behavioral variation; if we were able to add more measurements to the experiment, they too would likely be uncorrelated.

To address potential biases in our sampling of assay and measure space, we performed an unsupervised analysis (Berman et al., 2014; Cande et al., 2018) of flies walking spontaneously in arenas. This approach has the potential to identify all the modes of behavior exhibited in that context. Moreover, because the unsupervised algorithm is fundamentally a clustering algorithm, it does not necessarily return a definitive number of clusters/behavioral modes (with more data, it can find increasingly more clusters). Because we can also extract second-order behavioral measures from this approach, such as Markov transition rates between modes, this approach has the potential to yield a huge number of measures. In the end, we were conservative in the number of measurements we chose to work with, matching it to the same order of magnitude as the number of flies we tested. The correlation matrix for the unsupervised behavioral measures featured stronger correlations than the distilled Decathlon matrices (Figure 2E). Yet it had a similar connected components spectrum, indicating many dimensions of variation and that not all behavior modes had been sampled (Figure 2F).
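Second-order measures such as Markov transition rates can be derived from a per-frame sequence of behavioral mode labels. The sketch below assumes integer mode labels, drops self-transitions so that only changes of mode are counted, and row-normalizes the counts; these choices and the function names are illustrative assumptions, not the pipeline of Berman et al., 2014.

```python
import numpy as np


def mode_rates(labels, n_modes):
    """Fraction of frames a fly spends in each behavioral mode."""
    labels = np.asarray(labels)
    return np.bincount(labels, minlength=n_modes) / labels.size


def transition_matrix(labels, n_modes):
    """Row-normalized probabilities of switching from mode i to mode j.
    Self-transitions (the same mode on consecutive frames) are dropped."""
    labels = np.asarray(labels)
    src, dst = labels[:-1], labels[1:]
    changed = src != dst
    counts = np.zeros((n_modes, n_modes))
    np.add.at(counts, (src[changed], dst[changed]), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # rows for modes that were never left stay all-zero instead of dividing by 0
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


labels = [0, 0, 1, 1, 2, 0, 1]      # toy per-frame mode labels for one fly
print(mode_rates(labels, 3))
print(transition_matrix(labels, 3))
```

With k modes, the transition matrix alone adds k(k − 1) candidate measures per fly, which is why the number of measures must be kept in check relative to the number of flies.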

Interestingly, the blocks of structure in the correlation matrix aligned, to some extent, with the blocks of structure in the Markov transition matrix of these behavioral measures. This suggests that behaviors mediated by non-overlapping circuits (those that vary independently across individuals) more rarely transition to each other over time. Conversely, behaviors mediated by overlapping circuitry are likely to follow each other sequentially. This may reflect the influence of internal states (Calhoun et al., 2019), with an internal state jointly determining what subset of overlapping neurons drives behaviors that are appropriate to string together in succession (e.g., Seeds et al., 2014). We did not assess the day-to-day persistence of behavioral modes identified in the unsupervised analysis, so the observed variation across flies could reflect moods rather than permanent personalities. Indeed, Hernández et al. (2020) found that high-level clusters in unsupervised modes, identified through an information bottleneck analysis of the behavioral transitions that occur over many minutes, align with clusters in the cross-individual covariance matrix. This suggests either that individuality observed in approximately hours-long video recordings reflects approximately hours-long transient states or, alternatively, that the structure of days-long or lifelong variation mirrors that of hours-long variation. The latter may be particularly plausible, given that previous studies that measured spontaneous microbehaviors using supervised (Kain et al., 2013) and unsupervised (Todd et al., 2017) approaches over repeated tests found substantial day-to-day correlations.

We investigated whether individual variation in transcript abundance would predict individual behavioral variation (Figure 3). At the end of the Decathlon, flies were flash-frozen and their heads were RNA sequenced. We fit linear models for thousands of gene-behavior measure pairs to identify a set of genes that predict behavioral variation. Some measures had many gene predictors, while others had none. Gene function enrichment analysis revealed that variation in the expression of genes involved in respiration and other kinds of metabolism predicted variation in many behavioral measures. Genes involved in neuronal and developmental processes were also enriched among predictors of individual behavior. These two functional categories are likely linked, as there are strong causal couplings between metabolism and neural activity (Mann et al., 2020). Variation in behavioral measures without expression correlates may not have mechanistic origins in transcriptional variation, or the genes that do determine these behaviors may not be expressed in adults, may be expressed at low levels, or may be expressed in too few cells to detect in bulk head tissue.

Increasing transcriptional variation by adding genetic variation had the potential to change the correlation structure of behavioral measures. To our surprise, the distilled correlation matrix of outbred flies was qualitatively similar to that of our original inbred Decathlon (Figure 4). Both outbred and inbred matrices were dominated by independent axes of variation and sparse correlations between axes, with rough agreement in the specific pairwise correlations between these two data sets. These two data sets may have differed in their absolute variances (while appearing qualitatively similar in their covariances), but the normalization steps we took to resolve inter-Decathlon and assay batch effects precluded easy assessment of this possibility. That outbred and inbred variation had qualitatively similar structures raises the possibility that the same kinds of biological fluctuations underlie behavioral variation in populations of each of these kinds. The high dimensionality of the inbred and outbred matrices suggests that behaviors can evolve largely independently of one another (i.e., without pleiotropic constraint) either via plastic mechanisms within a genotype or by natural selection in a genetically diverse population.
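Why normalization hides absolute variance differences can be seen with per-batch z-scoring, assumed here as a stand-in for the actual normalization steps: after z-scoring, every measure has unit variance within each batch, while within-batch correlations are unchanged. Data and names below are hypothetical.

```python
import numpy as np


def zscore_within_batch(X, batch):
    """Z-score every column separately within each batch."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    out = np.empty_like(X)
    for b in np.unique(batch):
        rows = batch == b
        out[rows] = (X[rows] - X[rows].mean(axis=0)) / X[rows].std(axis=0)
    return out


rng = np.random.default_rng(0)
cov = [[1.0, 0.4], [0.4, 1.0]]
narrow = rng.multivariate_normal([0, 0], cov, size=500)          # batch 0
wide = 3 * rng.multivariate_normal([0, 0], cov, size=500) + 5    # batch 1: 9x the variance
X = np.vstack([narrow, wide])
batch = np.repeat([0, 1], 500)
Z = zscore_within_batch(X, batch)
# both equal to 1 up to floating-point rounding: the 9-fold difference is gone
print(Z[batch == 0].std(axis=0), Z[batch == 1].std(axis=0))
```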

We finally examined the structure of behavioral variation in collections of flies in which variation came from three additional sources. These were thermogenetic activation of 2205 sets of neurons across the brain (Robie et al., 2017), optogenetic activation of 176 sparse populations of descending neurons connecting the brain to the motor centers of the ventral nerve cord (Cande et al., 2018), and genotypic variation across ~200 inbred strains derived from wild flies (Mackay et al., 2012). All of these data sets exhibited a high degree of independence among their behavioral measures, though the degree varied from one data set to the next. This variation could arise from several non-biological sources:

- differences in how many measures were collected from the same experiment, which can inflate correlations through coupled noise;
- differences in the number of repeated or numerically coupled measures;
- batch effects from combining studies conducted in different labs at different times;
- differing signal-to-noise ratios in the measurements.

The structure of behavioral variation in the neural activation data sets differed somewhat from that of inbred and outbred flies. Their dimensionality was smaller than the number of measures, and they showed substantially more organization at lower dimensionalities (Figure 5A and B). These data sets also showed clustering of individual flies in behavior space (Figure 5C). Behavioral variation across the inbred strains derived from wild flies was organized qualitatively similarly to the variation across individual flies in inbred and outbred populations. This again suggests that, with respect to the coordination of behavior, the biological fluctuations across genotypes mirror those within a genotype.
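
One common way to summarize how many independent dimensions a correlation matrix has is the participation ratio of its eigenvalues, PR = (Σλ)² / Σλ². PR approaches the number of measures when all measures are uncorrelated and approaches 1 when a single axis dominates. This is a swapped-in summary statistic chosen for illustration. It is not necessarily the dimensionality estimate behind Figure 5, and the two simulated data sets below are hypothetical.

```python
import numpy as np


def participation_ratio(data):
    """Effective dimensionality of a flies x measures array from its correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eig = np.linalg.eigvalsh(corr)  # eigenvalues of the symmetric correlation matrix
    return eig.sum() ** 2 / (eig ** 2).sum()


rng = np.random.default_rng(2)
n_flies, n_measures = 500, 20

# Independent measures: each measure is its own axis of variation.
independent = rng.normal(size=(n_flies, n_measures))

# Low-dimensional case: three latent factors drive all 20 measures, plus a little noise,
# loosely mimicking a screen that pushes flies along a few shared behavioral axes.
latent = rng.normal(size=(n_flies, 3))
loadings = rng.normal(size=(3, n_measures))
low_dim = latent @ loadings + 0.3 * rng.normal(size=(n_flies, n_measures))

pr_ind = participation_ratio(independent)  # close to, but below, 20 because of sampling noise
pr_low = participation_ratio(low_dim)      # much smaller, near the 3 latent factors
print(pr_ind, pr_low)
```

Sampling noise alone keeps PR slightly below the number of measures even for truly independent measures. So in practice, dimensionality estimates are best compared against a shuffled or simulated null rather than against the raw count of measures.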

This work represents the most complete characterization to date of the structure of behavioral variation within a genotype. We found no discrete types of flies, but rather many independent dimensions of behavioral variation. Moreover, the similar organization of behavioral variation within and among genotypes suggests that fluctuations in the same biological processes underpin behavioral variation at both of these levels. Elucidating the causal molecular and genetic fluctuations underlying intragenotypic variation and covariation will be a challenging and illuminating direction for future research.

Materials and methods

Software

All analysis and figure-generation software for this paper, along with documentation, is available at http://lab.debivort.org/structure-of-behavioral-variability/. A static DOI for this code is available at https://doi.org/10.5281/zenodo.4110049. Actively maintained versions of the analysis software are available at https://github.com/de-Bivort-Lab/decathlon (copy archived at swh:1:rev:6c9e338db6f03e42bbbb2e2afa0cfd52162e7772; de Bivort, 2021a).

MARGO (animal tracking and stimulus delivery software): https://github.com/de-Bivort-Lab/margo (copy archived at swh:1:rev:fe61a873494e464d2a3ee48f67e885eb95359e0a; de Bivort, 2021b).

Decathlon Data Browser (interactive web application for exploring the data): http://decathlon.debivort.org.

Data

Data such as behavioral measures, unsupervised embedding PDFs, RNA-seq read counts, and formatted versions of the BABAM, DGRP phenotypic, and DN screen data are available at http://lab.debivort.org/structure-of-behavioral-variability/. A static DOI for these data, along with the unsupervised classification embedding time series data, is available at https://doi.org/10.5281/zenodo.4110049. Our raw tracking data from the assays cover ~600 flies over 12 days of continuous tracking at 3 to 60 Hz and exceed 50 GB in size. Our unsupervised classification videos comprise ~144 million frames of uncompressed video data and exceed 8 TB in size. For these reasons, we have not posted the rawest forms of the data to online repositories and instead offer them upon request on an external hard drive.

Decathlon experimental design and structure of intragenotypic behavioral variation.

Correlation structure of unsupervised behavioral classifications.

Examples of a mode of walking behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Examples of a mode of wing grooming behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Examples of a mode of head grooming behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Examples of a mode of abdomen flexing behavior as identified by the unsupervised analysis, from movies of single flies, made up of successive frames classified as the same behavior.

Correlation between individual transcriptomes and behavioral biases.

Structure of behavioral variation in outbred flies.

Analysis of Drosophila behavioral covariation in other non-isogenic populations.

Author details

Zachary Werkhoven

Alyssa Bravin

Kyobi Skutt-Kakaria

Pablo Reimers

Luisa F Pallares

Julien Ayroles

Benjamin L de Bivort (for correspondence)
