Individuality across environmental context in Drosophila melanogaster

  1. Institut für Biologie, Abteilung Neurobiologie, Fachbereich Biologie, Chemie und Pharmazie, Freie Universität Berlin, Berlin, Germany

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.

Read more about eLife’s peer review process.

Editors

  • Reviewing Editor
    Sonia Sen
    Tata Institute for Genetics and Society, Bangalore, India
  • Senior Editor
    Albert Cardona
    University of Cambridge, Cambridge, United Kingdom

Reviewer #1 (Public review):

Summary:

The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

Strengths:

The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw7182). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

Weaknesses:

The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e., just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
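To make the arithmetic explicit, the relationship between r and the coefficient of determination can be sketched in a few lines of Python, using the values quoted above:

```python
# r values quoted in the review above; R^2 is simply r squared.
correlations = {
    "time walked, 23 vs 32 degrees C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in correlations.items():
    r2 = r ** 2
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f} "
          f"({1 - r2:.0%} of the variance remains unexplained)")
```
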

The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
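The rank-based adjustment suggested here is equivalent to computing Spearman's rank correlation; a minimal sketch with hypothetical walking values (invented for illustration, not taken from the paper):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

# Hypothetical % time walked for 8 flies, tested at 23 C and re-tested
# at 32 C: a large group-level increase, but the relative ordering of
# individuals is preserved.
walk_23c = np.array([12.0, 35.5, 22.1, 48.0, 15.3, 40.2, 28.7, 9.8])
walk_32c = np.array([30.1, 61.0, 44.5, 83.2, 33.0, 70.4, 58.9, 25.7])

# Converting each condition to within-group ranks removes the mean shift;
# the Pearson correlation of the ranks is exactly Spearman's rho.
r_ranks, _ = pearsonr(rankdata(walk_23c), rankdata(walk_32c))
rho, _ = spearmanr(walk_23c, walk_32c)
print(f"rank-based correlation (Spearman rho) = {rho:.3f}")
```

Because ranking is applied within each condition separately, any group-level shift in the mean (here, the overall increase at the higher temperature) drops out of the comparison.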

What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open hardware and open software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.
The study discusses a number of interesting, stimulating ideas about inter-individual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.

Comments on revisions:

I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications, including a GLM analysis; however, some areas still require clarification for the benefit of an audience that includes geneticists.

(1) GLM Analysis Explanation (Figure 9)
While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:

- The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other non-statistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.
- The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.
- The relationship between the GLM findings and their original correlation-based conclusions needs better integration and connection, leading the reader through your reasoning.

(2) Documentation of Changes
One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of the revisions made. The authors should provide a more comprehensive summary of the specific changes implemented.

(3) Statistical Method Selection
The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:

- Why ridge regression was selected as the optimal method
- How the regularization parameter (λ) was determined
- How this choice affects the interpretation of environmental parameters' influence on individuality
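For readers less familiar with regularized regression, the usual way to determine λ is cross-validation over a candidate grid. The sketch below uses scikit-learn's RidgeCV on synthetic, deliberately collinear binary predictors; the variable names and effect sizes are invented, and this is generic usage rather than the authors' actual pipeline:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)

# Hypothetical design matrix: one binary column per environmental change
# (temperature, contrast, arena shape), deliberately collinear.
n = 200
temp = rng.integers(0, 2, n)
contrast = rng.integers(0, 2, n)
arena = temp | contrast              # collinear with the other two
X = np.column_stack([temp, contrast, arena])

# Simulated response: rank disruption grows with the number of changes.
y = 2.0 * temp + 1.0 * contrast + 3.0 * arena + rng.normal(scale=0.5, size=n)

# RidgeCV selects the regularization strength (alpha, i.e. lambda) by
# leave-one-out cross-validation over the supplied grid.
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print("selected lambda:", model.alpha_, "coefficients:", model.coef_)
```

Ridge shrinks correlated coefficients toward each other rather than arbitrarily assigning the shared effect to one predictor, which is why interpretation of any single environmental parameter's coefficient should be cautious.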

Reviewer #2 (Public review):

Summary:

The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

Strengths:

The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig designs and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting it to their own needs.

Weaknesses/Limitations:

I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.

I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative, and additionally, the authors have had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined, if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify the repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g., Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined, and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".)
So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!
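For concreteness, the variance-partitioning approach described above can be sketched with a random-intercept mixed model. The data here are synthetic (60 hypothetical flies, 3 trials each), so this illustrates the method rather than reanalyzing the paper's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Synthetic data: 60 flies, each scored in 3 trials; individuals differ
# in their average level (among-individual variance) plus trial noise.
n_flies, n_trials = 60, 3
fly_effect = rng.normal(scale=2.0, size=n_flies)   # among-individual spread
df = pd.DataFrame({
    "fly": np.repeat(np.arange(n_flies), n_trials),
    "trial": np.tile(np.arange(n_trials), n_flies),
})
df["walk"] = 20 + fly_effect[df["fly"]] + rng.normal(scale=1.0, size=len(df))

# Random-intercept model with fly identity as the grouping factor.
fit = smf.mixedlm("walk ~ trial", df, groups=df["fly"]).fit()
v_among = float(fit.cov_re.iloc[0, 0])   # among-individual variance
v_within = float(fit.scale)              # residual (within-individual) variance
repeatability = v_among / (v_among + v_within)
print(f"repeatability R = {repeatability:.2f}")
```

Repeatability here is the fraction of total variance attributable to differences among individuals; a multivariate extension would additionally estimate among-individual covariances between behaviors.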

I appreciate that the authors now include their sample sizes in the main body of the text (as opposed to the supplement), but I think that it would still help if the authors included a brief overview of their design at the start of the Methods. It is still unclear to me how many rigs each individual fly was run through. Were the same individuals measured in multiple different rigs/scenarios, or just one?

I really think a variance-partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper, as they'll be using methods that are standard in the study of individual behavioral variation.

Reviewer #3 (Public review):

This manuscript is a continuation of past work by the last author, where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (readouts of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

(1) Many individualistic behaviours remain stable over the course of many days.
(2) Some of these (walking speed) remain stable over changing visual cues; others (walking speed and centrophobicity) remain stable at different temperatures.
(3) All the behaviours they tested fail to remain stable over a spatially varying environment (arena shape).
(4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

The manuscript is a technical feat, with the authors having built many new high-throughput assays. The number of animals is large, and many variables have been tested: different types of behavioural paradigms, flying vs. walking, varying visual stimuli, and different temperatures, among others.

Comments on revisions:

The authors have addressed my previous concerns.

Author response:

The following is the authors’ response to the original reviews

Public Reviews:

Reviewer #1 (Public Review):

Summary:

The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

Strengths:

The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as:

(1) a large set of behavioral attributes,

(2) with inter-individual variability, that are

(3) stable over time.

A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw7182). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

We thank the reviewer for their exceptionally kind assessment of our work!

Weaknesses:

The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results.

We have now uploaded a high-resolution PDF to the GitHub address https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf, and this is also mentioned in the figure legend for Fig. S8.

Why were five or so parameters selected from the full set? How were these selected?

The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The "exploration" category is characterized by parameters describing the fly's activity. (2) Parameters related to "attention" reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter "centrophobicity," used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see, for example, Carter & Shieh, 2015, chapter 2: https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.

Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

As noted above, we only included a subset of parameters in our final analysis, as many were unsuitable for comparison across assays while still providing valuable assay-specific information, which is important for relating these results to previous publications.

The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms "stability" and "consistency," and decided to go with "stability" as the sole descriptor to keep things simple. We now fully agree with the reviewer's argument and have replaced "stability" with "consistency" throughout the current version of the manuscript in order to increase clarity and coherence.

The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

We agree with the reviewer that a multivariate analysis adds clear advantages in terms of statistical power, in addition to our chosen approach. On one hand, we believe that the simplicity of our initial analysis, for both correlational and mean data, makes it easy for readers to understand and reproduce our results. While preparing the previous version of the manuscript, we were skeptical, since more complex analyses often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of "forking paths" in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our straightforward approach. Still, in preparing this revised version of the manuscript, we accepted the reviewer's advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e., just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.

The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9).

What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

We agree that grouping our parameters into traits like exploration, attention, and anxiety always involves subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term "anxiety" in similar experiments (see, for example, Carter & Shieh, 2015, chapter 2: https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies' individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge to compare insights from very distinct models.

The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly revised the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

We have now uploaded all code and materials to GitHub and made them available as soon as we received the reviewers' comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now referenced frequently throughout the revised manuscript.

The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.

We thank the reviewer again for the extensive and constructive feedback.

Reviewer #2 (Public Review):

Summary:

The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

Strengths:

The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

We thank the reviewer for highlighting the strengths of our study.

Weaknesses/Limitations:

I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.

We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is defined differently, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and unbiased by interpretation: we only modified what we stated in the experimental description, without designing a specific situation that might itself be perceived differently by different individuals. For example, comparing a context with a predator and one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other fraction does not.

The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger and, my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?).

The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages, such as more statistical power, and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend our initial approach, since we think that the simplicity of the analysis for both correlational and mean data makes it easy to understand and reproduce. More complex analyses necessarily involve the selection of a specific statistical toolbox by the experimenters, and based on these decisions, different analyses become less comparable and more complicated to reproduce, unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results (Paul et al., 2024), with very surprising outcomes: some paths even reversed the findings. Such variance in conclusions is hardly possible with the rather simplistic and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for simple and easy-to-understand analyses.

We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).
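For readers unfamiliar with the "low-tech" correlational approach discussed in this exchange, the following sketch illustrates the basic idea on purely synthetic data. All numbers, variable names, and the choice of numpy are our own illustrative assumptions, not the authors' actual analysis: each fly receives a behavioral score in two situations, and the Pearson correlation across individuals quantifies how much of the rank order, i.e. individuality, persists.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example: 50 flies, each measured in two situations.
# Each fly has a stable "individual" walking speed plus situation-specific noise.
n_flies = 50
individual_speed = rng.normal(loc=15.0, scale=3.0, size=n_flies)  # mm/s
situation_a = individual_speed + rng.normal(0.0, 1.0, n_flies)
situation_b = individual_speed + rng.normal(0.0, 1.0, n_flies)

# The simple analysis: Pearson correlation of per-fly scores across
# the two situations measures how much individuality persists.
r = np.corrcoef(situation_a, situation_b)[0, 1]
print(f"inter-situation correlation r = {r:.2f}")
```

With the assumed variance ratio (between-fly SD of 3 vs. within-fly SD of 1), the correlation is expected to be high; shrinking the between-fly SD toward zero would drive it toward chance.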

Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial.

We apologize for this inconvenience. A detailed overview of male and female sample sizes is listed in the supplemental boxplots next to the plots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

I don't necessarily doubt the robustness of the results, and my guess is that the authors' interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

Reviewer #3 (Public Review):

This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (readouts of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

(1) Many individualistic behaviours remain stable over the course of many days.

(2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

(3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

(4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

The manuscript is a technical feat, with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures, among others.

We thank the reviewer for this extraordinary kind assessment of our work!

Recommendations for the authors:

Reviewing Editor (Recommendations For The Authors):

While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

(1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

(2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

Reviewer #2 (Recommendations For The Authors):

I want to start by saying this seems like a really cool study! It's an impressive amount of work and addresses a pretty basic question that is interesting (at least I think so!)

We thank the reviewer again for this assessment!

That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data, and just doing Pearson correlation coefficients across time points and situations is very clunky and actually loses out on a lot of information. The most up-to-date, state-of-the-art are mixed models - these models can handle very complex (or not so complex) random structures, which can estimate variance and, importantly, covariance in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models!

As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).
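As a rough illustration of what a random-intercept model of the kind recommended here estimates, the individual-level variance component (and hence repeatability) can also be computed by hand from one-way ANOVA variance components. The sketch below uses simulated data; fly numbers, variances, and variable names are all hypothetical assumptions for illustration, not the authors' GLM analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 flies, 4 repeated measurements each.
n_flies, n_reps = 40, 4
fly_effect = rng.normal(0.0, 2.0, n_flies)                 # between-individual SD = 2
data = fly_effect[:, None] + rng.normal(0.0, 1.0, (n_flies, n_reps))  # within SD = 1

# One-way ANOVA variance components - what a random-intercept
# mixed model partitions into between- and within-individual variance.
grand_mean = data.mean()
ms_between = n_reps * ((data.mean(axis=1) - grand_mean) ** 2).sum() / (n_flies - 1)
ms_within = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n_flies * (n_reps - 1))
var_between = (ms_between - ms_within) / n_reps

# Repeatability (intraclass correlation): fraction of total variance
# attributable to stable differences between individuals.
icc = var_between / (var_between + ms_within)
print(f"repeatability (ICC) = {icc:.2f}")
```

In a full mixed-model framework, the same variance components are estimated by maximum likelihood and can additionally covary across situations, which is what makes those models attractive for data like these.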

Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering?

We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used in order to address this critique, and we hope that the revised version is now clear.

Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the Indytrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (otherwise, sample sizes would drop over time, and the flies would have gone through too many experimental alterations). To make this clearer, we have elaborated on this point in the main text and added group sample sizes to the figure legends.

What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text by focusing on the term 'group'. By 'group', we refer to the average of all individuals tested in the same situation. Sample sizes in the figure legends now indicate group/population sizes.

Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

Reviewer #3 (Recommendations For The Authors):

This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written with brevity for a different journal, but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

We appreciate this suggestion. Initially, we were hoping that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both the main text and methods, leading to a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.
