Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Gordon Berman, Emory University, Atlanta, United States of America
- Senior Editor: Aleksandra Walczak, CNRS, Paris, France
Reviewer #1 (Public review):
Summary:
The submitted article reports the development of an unsupervised learning method that enables quantification of the behaviour and poses of C. elegans from 15-minute-long videos, and presents a spatial map of both. The pipeline is a two-part process: the first part uses contrastive learning to represent spatial poses in an embedded space, while the second part uses a transformer encoder to estimate masked parts of a spatiotemporal sequence.
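To make the first stage of the pipeline concrete, a minimal sketch of contrastive learning on image embeddings is given below. This uses a SimCLR-style NT-Xent loss as an illustrative assumption; the function name and the exact loss used by the authors may differ.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss (illustrative, not the authors'
    exact formulation): embeddings of two augmented views of the same
    worm image (z1[i], z2[i]) are pulled together, while all other
    pairs in the batch are pushed apart."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalise
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    N = len(z1)
    # index of the positive partner for each row: i <-> i+N
    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])
    # mean cross-entropy: -log softmax(sim)[i, pos[i]]
    logZ = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logZ - sim[np.arange(2 * N), pos]))
```

The loss is small when the two views of each worm image land close together in the embedding while unrelated images land far apart, which is what lets the embedded space organise poses without any manual labels.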
Strengths:
This analysis approach will prove to be useful for the C. elegans community. Applying the method to videos of various strains across a range of ages presents a good use case for the approach. The manuscript is well written and well presented.
Specific comments:
(1) One of the main motivations, stated in the introduction and emphasized in the discussion section, is that this approach requires neither key-point estimation for skeletonization nor the eigenworm approach for pose estimation. However, eigenworm data for the videos used in this work were estimated with the Tierpsy tracker and stored as metadata, and these data are subsequently used for interpretation. It is not clear how else the spatial embedded map could be interpreted without pose estimates obtained from other approaches. Please elaborate and comment.
(2) As per the manuscript, the second part of the pipeline is used to estimate the masked sequences of the spatiotemporal behavioral feature. However, it is not clear what the numbers listed in Fig. 2.3 represent.
(3) It is not clear how motion speed is linked to individual poses as mentioned in Figs. 4 (b) and (c).
Reviewer #2 (Public review):
Summary:
The manuscript by Maurice and Katarzyna describes a self-supervised, annotation-free deep-learning approach capable of quantitatively representing complex poses and behaviors of C. elegans directly from video pixel values. Their method overcomes limitations inherent to traditional methods relying on skeletonization or keypoint tracking, which often fail with highly coiled or self-intersecting worms. By applying self-supervised contrastive learning and a Transformer-based network architecture, the authors successfully capture diverse behavioral patterns and depict the aging trajectory of the behavioral repertoire. This provides a useful new tool for behavioral research in C. elegans and other flexible-bodied organisms.
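The masked-sequence training objective referred to above can be sketched as follows. This is a BERT-style masking setup given as an illustrative assumption (the mask fraction, the use of zeros as a mask token, and the function name are not taken from the paper): random time steps of a pose-embedding sequence are hidden, and the transformer encoder is trained to reconstruct them from the surrounding context.

```python
import numpy as np

def mask_sequence(seq, mask_frac=0.15, rng=None):
    """BERT-style masking for self-supervised sequence modelling
    (illustrative sketch): hide a random subset of time steps; the
    encoder's training target is to recover the hidden pose
    embeddings from the unmasked context."""
    rng = np.random.default_rng() if rng is None else rng
    T = seq.shape[0]
    n_mask = max(1, int(round(mask_frac * T)))
    idx = rng.choice(T, size=n_mask, replace=False)
    masked = seq.copy()
    masked[idx] = 0.0          # replace hidden steps with a mask token
    targets = seq[idx]         # values the encoder must reconstruct
    return masked, idx, targets
```

Because the reconstruction targets come from the data itself, no manual behavioral annotations are needed, which is what makes the approach annotation-free.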
Strengths:
Reliable tracking and segmentation of complex poses remain significant bottlenecks in C. elegans behavioral research, and the authors made valuable attempts to address these challenges. The presented method offers several advantages over existing tools, including freedom from manual labeling, independence from explicit skeletonization or keypoint tracking, and the capability to capture highly coiled or overlapping poses. Thus, the proposed method would be useful to the C. elegans research community.
The research question is clearly defined. Methods and results are engagingly presented, and the manuscript is concise and well-organized.
Weaknesses:
(1) In the abstract, the claim of an 'unbiased' approach is not well-supported. The method is still affected by dataset biases, as mentioned in the aging results (section 4.3).
(2) In section 3.2, the rationale behind rotating worm images to a vertical orientation is unclear.
(3) The methods section is clearly written but uses overly technical language, making it less accessible to the audience of eLife, the majority of whom are biologists. Clearer explanations of key methods and the rationale behind their selection are needed. For example, in section 3.3, the authors should briefly explain in simple language what contrastive learning is, why they chose it, and why this method potentially achieves their goal.
(4) The reason why the gray data points could not be resolved by Tierpsy is not quantitatively described. Are they all due to heavily coiled or overlapping poses?
(5) In section 4.1, generating pose representations grouped by genetic strains would provide insights into strain-specific differences resolved by the proposed method.
(6) Fig. 3a requires clarification. Highly bent poses (red points) intuitively should be close to highly coiled poses (gray points). The authors should explain the observed greenish/blueish points interfacing with the gray points.
(7) In Fig. 3a, some colored points overlap with the gray point cloud. Why can Tierpsy resolve these overlapping points representing highly coiled poses? A more systematic quantitative comparison between Tierpsy and the proposed method is required.
(8) The claim in section 4.2 regarding strain separation in pose embedding spaces is unsupported by Fig. 3a, which lacks strain-based distinctions. As mentioned in point #5, showing pose representations grouped by different strains is required.
(9) In section 4.2, how could the authors verify the statement, "This likely occurs since most strains share common behaviors such as simple forward locomotion"?
(10) An important weakness of the proposed method is its low direct interpretability, as it is not based on handcrafted features. To better interpret the pose/behavior embedding space, it would be helpful to compare it against more basic Tierpsy features in Fig. 3 and 4. This comparison could reveal what understandable features were learned by the neural network, thereby increasing human interpretability.
(11) The main conclusion of section 4.3 is not sufficiently tested. Is Fig. 5a generated only from data of N2 animals? To quantitatively verify the statement, "Young individuals appear to display a wide range of behaviors, while as they age their behavior repertoire reduces," the authors should perform a formal analysis of behavioral variability throughout aging.
(12) In Fig. 5a, better visualization of aging trajectories could include plotting the center of mass along with variance of the point cloud over time.
(13) To better reveal aging trajectories of behavioral changes for different genetic backgrounds, it would be meaningful to generate behavior representations for different strains as they age.
(14) As a methods paper, the ease of use for other researchers should be explicitly addressed, and source code and datasets should be provided.
Reviewer #3 (Public review):
Summary:
In this paper, the authors present an unsupervised learning approach to represent C. elegans poses and temporal sequences of poses in low-dimensional spaces by directly using pixel values from video frames. The method does not rely on the exact identification of the worm's contour/midline, nor on the identification of the head and tail prior to analyzing behavioral parameters. In particular, using contrastive learning, the model represents worm poses in low-dimensional spaces, while a transformer encoder neural network embeds sequences of worm postures over short time scales. The study evaluates this newly developed method using a dataset of different C. elegans genetic strains and aging individuals. The authors compared the representations inferred by the unsupervised learning with features extracted by an established approach, which relies on direct identification of the worm's posture and its head-tail direction.
Strengths:
The newly developed method provides a coarse classification of C. elegans posture types in a low-dimensional space using a relatively simple approach that directly analyzes video frames. The authors demonstrate that representations of postures or movements of different genotypes, based on pixel values, can be distinguishable to some extent.
Weaknesses:
- A significant disadvantage of the presented method is that it does not include the direction of the worm's body (e.g., head/tail identification). This severely limits the detailed and comprehensive identification of the worm's behavioral repertoire (on- and off-food), which requires body directionality in order to infer behaviors (for example, classifying forward vs. reverse movements). In addition, including a mix of opposite postures as input to the new method may create significant classification artifacts in the low-dimensional representation, such that, for example, curvature at opposite parts of the body could cluster together. This concern applies both to the representation of individual postures and to the representation of sequences of postures.
- The authors state that head-tail direction can be inferred during forward movement. This is true when individuals are measured off-food, where they are highly likely to move forward. However, when animals are grown on food, head-tail identification can also be based on quantifying the speed of the two ends of the worm (the head shows side-to-side movements). This does not require identifying morphological features. See, for example, Harel et al. (2024) or Yemini et al. (2013).
- Another confounding parameter that cannot be distinguished using the presented method is the size of individuals. Size can differ between genotypes, as well as with aging. This can potentially lead to clustering of individuals based on their size rather than behavior.
- There is no quantitative comparison between classification based on the presented method and methods that rely on identifying the skeleton.
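The speed-based head-tail heuristic mentioned in the second point above (the head end shows faster side-to-side movement than the tail end) can be sketched as follows. The function name, the fps parameter, and the use of mean tip speed are illustrative assumptions; see Yemini et al. (2013) for the established procedure.

```python
import numpy as np

def identify_head(tip_a, tip_b, fps=25.0):
    """Illustrative heuristic for head/tail disambiguation on food:
    given (T, 2) coordinate tracks of the worm's two ends, return
    'a' or 'b' for whichever end moves faster on average, since the
    head oscillates side to side while foraging."""
    def mean_speed(tip):
        steps = np.linalg.norm(np.diff(tip, axis=0), axis=1)  # per-frame displacement
        return np.mean(steps) * fps                           # units / second
    return 'a' if mean_speed(tip_a) > mean_speed(tip_b) else 'b'
```

Because this relies only on the coordinates of the two body ends, it does not require identifying morphological features such as the pharynx, which supports the reviewer's point that directionality could be recovered without skeleton-level detail.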