Automated analysis of long-term grooming behavior in Drosophila using a k-nearest neighbors classifier

  1. Bing Qiao
  2. Chiyuan Li
  3. Victoria W Allen
  4. Mimi Shirasu-Hiza
  5. Sheyum Syed (corresponding author)
  1. University of Miami, United States
  2. Columbia University, United States
8 figures, 4 videos, 3 tables and 1 additional file

Figures

Figure 1
Overview of approach for detecting Drosophila grooming.

(A) Apparatus used in recording behavior. Flies constrained to individual tubes are continuously illuminated by infrared light from below and recorded by a digital camera from above. LED lights on sides of chamber simulate day-night light conditions. Temperature and humidity probes placed in the chamber are monitored by a computer. Inset: Camera photo of fly tubes in chamber. (B) Examples of the most commonly observed types of grooming in our experiments. The top row displays postures of a fly in inactive state. The three rows below show how the limbs and body of a fly coordinate to perform specific grooming movements. Arrows point to the moving part during grooming. (C) Flowchart of our algorithm used to classify fly behavior. After generating a suitable background image, the algorithm characterizes movements of fly center (CD), core (CM) and periphery (PM) to fully classify behavior in each frame.

https://doi.org/10.7554/eLife.34497.003
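The detection step at the start of the flowchart in panel (C) can be illustrated with a minimal sketch. It assumes that detection amounts to thresholding the absolute grayscale difference between a frame and the background, then discarding small connected objects. The cutoffs C0 = 10 and C1 = 25 are the values quoted in Figure 2A,B; the function name `moving_objects`, the use of 4-connectivity and the list-of-lists image format are assumptions made for illustration and are not taken from the authors' implementation.

```python
from collections import deque

C0 = 10  # grayscale-change cutoff quoted in Figure 2A
C1 = 25  # minimum object area (pixels) quoted in Figure 2B


def moving_objects(frame, background, c0=C0, c1=C1):
    """Return one set of (row, col) pixels per detected object.

    frame, background: equal-sized 2D lists of 8-bit grayscale values.
    4-connectivity is an assumption of this sketch.
    """
    rows, cols = len(frame), len(frame[0])
    # Step 1: keep pixels whose grayscale change exceeds the noise cutoff.
    changed = [[abs(frame[r][c] - background[r][c]) > c0 for c in range(cols)]
               for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not changed[r][c] or seen[r][c]:
                continue
            # Step 2: flood-fill one connected object.
            queue, pixels = deque([(r, c)]), set()
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and changed[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Step 3: discard objects smaller than c1 pixels (noise).
            if len(pixels) >= c1:
                objects.append(pixels)
    return objects
```

With a typical fly area of ~300 pixels, a fly passes the area filter easily, while isolated noise pixels do not.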
Figure 2 with 1 supplement
Feature extraction and behavior classification.

(A) The distribution of grayscale fluctuations in the absence of mobile flies. A cutoff of grayscale value change C0 = 10 rules out >99.99% of fluctuations. Shown here are only positive values of fluctuations, which are symmetric about zero. (B) Maximum area (pixels) of a closed object generated by noise when different thresholds C0 are applied. With C0 = 10, noise produces no closed object larger than 20 pixels. Based on this, we set a threshold C1 = 25 to remove objects smaller than 25 pixels without affecting identification of flies, which have a typical area of ~300 pixels in our studies. (C) Grayscale value distribution of pixels belonging to 20 individual flies. Two regions are clearly seen: the left region with peak around 40 represents the core of the flies and the right region with peak around 90 represents their periphery. (D) Variations in the center position of a stationary fly. The minimum displacement that represents a true fly center movement is 0.5 pixel in our experiment, a requirement that excludes >99.99% of false displacements. (E) Examples of original and processed images of a fly displaying different behaviors: Top, left: front leg grooming; top, right: wing grooming; bottom, left: resting; bottom, right: locomoting. In each panel, original images from two consecutive frames are shown on left, periphery in the middle and core on the right. Changes of periphery and core are shown in the bottom row. PM and CM denote differences in the number of pixels representing the fly periphery and core, respectively, in two frames. Features PM and CM are different for different behaviors. Rubbing of front legs manifests through PM (top, left) while sweeping wings affects PM and CM (top, right). (F) The k-nearest neighbors (kNN) algorithm works by placing an unclassified sample (black circle) representing a frame into a feature space with pre-labeled samples (green/gray/purple circles, the training set). The label of the unclassified point is decided by the most frequent label among its k nearest neighbors. The three axes of the feature space are normalized periphery movement (PM), core movement (CM), and center displacement (CD). Fly activity in the feature space is separated into three regions: grooming (green), locomotion (gray) and resting (purple). Training samples (N = 9322 grooming, 9930 locomotion, 5748 rest) and nine unlabeled samples in PM-CM-CD space are shown.

https://doi.org/10.7554/eLife.34497.007
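The majority vote described in panel (F) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes Euclidean distance in the normalized PM-CM-CD space, and the function name `knn_label` and the choice of k in any call are hypothetical.

```python
import math
from collections import Counter


def knn_label(sample, training, k):
    """Classify one frame by majority vote among its k nearest neighbors.

    sample:   (PM, CM, CD) tuple of normalized features.
    training: list of ((PM, CM, CD), label) pairs, the pre-labeled set.
    Distance is Euclidean, which is an assumption of this sketch.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, the label encountered first (the closer neighbor) wins.
    return votes.most_common(1)[0][0]
```

A frame with large periphery movement but little core movement or center displacement would fall among grooming samples, consistent with the front-leg example in panel (E).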
Figure 2—figure supplement 1
Details of environmental conditions and fly detection.

(A) Locomotion (fraction of time spent), relative humidity (RH), and temperature (T) for 3 days during an experiment in constant darkness (DD) conditions. Data are binned in 5 min. (B) Binary images after background subtraction. If the background frame is not updated frequently (typically every 1000 s), both food debris (red boxes) and flies (blue boxes) may be identified as moving objects in a background-subtracted image (top, left and expanded view). The problem is rectified (bottom, left) when the background frame used is closer in time (<1000 s apart) to the image of interest. (C) An example 8-bit frame (on left) and its corresponding background-subtracted binary image showing identified flies. (D) The cross-validation loss of the kNN classifier at different k values. Loss decreases with increasing k, with the decrease slowing down for k ≥ 10. The loss function shown here is the averaged error of 10-fold cross-validation in behavioral classification. The validation was performed on 25,000 frames from video of 20 flies.

https://doi.org/10.7554/eLife.34497.008
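The cross-validation loss in panel (D) can be illustrated with a short sketch. It assumes that the loss is the misclassification rate averaged over 10 held-out folds and that neighbors are found by Euclidean distance; the names `knn_predict` and `cv_loss`, the fold assignment and the random seed are choices made here for illustration only.

```python
import math
import random
from collections import Counter


def knn_predict(x, train, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_loss(data, k, folds=10, seed=0):
    """Average misclassification rate over `folds` held-out partitions.

    data: list of (features, label) pairs.
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    losses = []
    for f in range(folds):
        # Every folds-th sample (offset f) is held out; the rest train.
        test = shuffled[f::folds]
        train = [item for i, item in enumerate(shuffled) if i % folds != f]
        errors = sum(knn_predict(x, train, k) != y for x, y in test)
        losses.append(errors / len(test))
    return sum(losses) / folds
```

Evaluating `cv_loss` over a range of k values gives a loss curve of the kind plotted in panel (D).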
Figure 3
Data pruning and performance evaluation.

(A) Grooming data are pruned after identification by the kNN classifier. A frame is finally labeled as grooming only if it lies in a group of 15 frames in which 12 or more were labeled as grooming by the classifier (see B below). A frame previously labeled as grooming by the classifier that does not pass the pruning procedure is relabeled as locomotion. (B) Performance of the classifier with pruning filter sizes of 4/5, 8/10, 8/15, 10/15, 10/20, 12/15, 14/15 and 15/20. Accuracy (closed circles) is equal to the ratio of correct grooming labels to all output grooming labels. Sensitivity (open circles) is equal to the ratio of grooming identified by the classifier to all visually labeled grooming events. We set the pruning filter to be 12/15 to attain >90% accuracy and sensitivity. (C) Fly genotypes vary by size and pigmentation, which can potentially affect performance of our classifier. To verify the generality and robustness of our method to different genotypes, accuracy (top) and sensitivity (bottom) of the classifier on w1118, Canton S, iso31, and yw were tested. Error rates in all tested strains were less than 10%.

https://doi.org/10.7554/eLife.34497.011
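The 12/15 pruning filter in panel (A) and the two performance measures in panel (B) can be sketched as follows. The sketch assumes that a grooming label survives if at least one span of 15 consecutive frames containing it holds 12 or more grooming labels; the caption does not say how the group of 15 frames is positioned, so this reading, like the function names, is an assumption.

```python
def prune_grooming(labels, window=15, min_hits=12):
    """Relabel weakly supported grooming frames as locomotion.

    A grooming label is kept only if some span of `window` consecutive
    frames containing it holds at least `min_hits` grooming labels.
    """
    n = len(labels)
    is_groom = [lab == "grooming" for lab in labels]
    # Prefix sums give the grooming count of any span in O(1).
    prefix = [0]
    for g in is_groom:
        prefix.append(prefix[-1] + g)
    keep = [False] * n
    for start in range(0, max(n - window, 0) + 1):
        end = min(start + window, n)
        if prefix[end] - prefix[start] >= min_hits:
            for i in range(start, end):
                keep[i] = keep[i] or is_groom[i]
    return [lab if (lab != "grooming" or keep[i]) else "locomotion"
            for i, lab in enumerate(labels)]


def accuracy_and_sensitivity(predicted, truth):
    """Accuracy = correct grooming labels / all output grooming labels;
    sensitivity = correct grooming labels / all visually labeled grooming."""
    tp = sum(p == "grooming" and t == "grooming" for p, t in zip(predicted, truth))
    n_pred = sum(p == "grooming" for p in predicted)
    n_true = sum(t == "grooming" for t in truth)
    return tp / n_pred, tp / n_true
```

Changing `window` and `min_hits` reproduces the other filter sizes compared in panel (B), such as 8/10 or 15/20.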
Figure 4 with 3 supplements
How grooming fits into the daily routine of a fly.

(A) Ethogram of grooming (green), locomotion (gray), feeding (blue), short rest (purple), and sleep (dark gray) performed by an iso31+ fly in 60 s (300 frames). Individual events of the first four behaviors are mutually exclusive and together constitute wake (yellow-orange), which is complementary to sleep (dark gray). (B) Average fraction of time flies spent in each behavior. N = 83 iso31+ flies. (C, D) Correlation between pairs of behaviors. There is strong negative correlation between sleep and locomotion (r = −0.93) and between sleep and short rest (r = −0.63). Interestingly, time spent in grooming does not show strong correlation with any of the other four behaviors. N = 83 iso31+ flies. r is the Pearson product-moment correlation coefficient. (E) Temporal patterns of behaviors of a single iso31+ fly during 4 days in LD cycles. Behaviors shown here are grooming (G), locomotion (L), feeding (F), short rest (R), wake (W), and sleep (S). Level of activity is shown in terms of fraction of time spent in each behavior. Fraction is calculated every 30 min. White/black horizontal bars indicate light/dark environmental conditions, respectively. (F) Rhythmicity in grooming, locomotion and wake in an example fly. In LD condition, the fraction of time spent in each of these behaviors is plotted on the left. In the power spectra of these time series on the right (horizontal dashed line denotes threshold power for p = 0.05), the temporal patterns of all three behaviors show significant circadian rhythmicity. At top right, the spectrum of randomized grooming shows no rhythmicity, while modified locomotion is still rhythmic. Similarly, in the time series at bottom right, with the same randomized grooming, wake remains rhythmic while grooming, one of its components, is arrhythmic. In time series of behaviors, activity is binned every 30 min.

https://doi.org/10.7554/eLife.34497.014
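The binned traces in panels (E) and (F) can be derived from per-frame labels as sketched below. The frame rate of 5 Hz follows from the 300 frames per 60 s quoted in panel (A); the function name `fraction_per_bin` and the handling of a final partial bin (normalized by its own length) are assumptions of this sketch.

```python
FRAME_RATE_HZ = 5  # 300 frames per 60 s, as in panel (A)


def fraction_per_bin(labels, behavior, bin_minutes=30, frame_rate=FRAME_RATE_HZ):
    """Fraction of frames in each time bin that carry `behavior`.

    labels: per-frame behavior labels in temporal order.
    A final partial bin is normalized by its own length.
    """
    frames_per_bin = int(bin_minutes * 60 * frame_rate)
    fractions = []
    for start in range(0, len(labels), frames_per_bin):
        chunk = labels[start:start + frames_per_bin]
        fractions.append(sum(lab == behavior for lab in chunk) / len(chunk))
    return fractions
```

For example, a 30 min bin at 5 Hz holds 9000 frames, so 4500 grooming frames in that bin give a grooming fraction of 0.5.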
Figure 4—figure supplement 1
Relationships among fly grooming, locomotion, feeding, short rest, and sleep.

(A) Average fraction of time flies spent in grooming (green), locomotion (gray), feeding (blue), short rest (purple), and sleep (dark gray). N = 76 Canton S flies. (B, C) Correlation between behaviors. Sleep shows different levels of negative correlation to locomotion (r = −0.849), short rest (r = −0.833) and feeding (r = −0.597). In addition, there is positive correlation between locomotion and short rest (r = 0.627). Interestingly, time spent in grooming does not show strong correlation with any of the other four behaviors. This suggests independent regulation of grooming behavior. N = 76 Canton S flies. (D) Example empirical probability distributions of randomly paired r values between grooming and short rest (top) and between locomotion and feeding (bottom) in iso31+ flies. p-Values of the Pearson coefficient r were calculated based on a two-tailed test of such distributions. (E) p-Values of all Pearson correlation coefficients r in Figure 4C,D (top table) and Figure 4—figure supplement 1B,C (bottom table). p-Values in red are from examples in (D). p < 10^−5 is displayed as 0 in these tables. (F) Example of binned data (reproduced from Figure 4E) showing fraction of time in different behaviors. In this representation, behaviors are not mutually exclusive and each behavior is free to assume any value between 0 and 1 (inclusive) such that wake time + sleep time = 1 for every bin. Grooming: G, Locomotion: L, Feeding: F, Short rest: R, Wake: W, Sleep: S.

https://doi.org/10.7554/eLife.34497.015
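The random-pairing test in panel (D) can be sketched as a permutation test on the Pearson coefficient. This is an illustration under assumptions: the number of random pairings (`n_shuffles`), the seed and the function names are hypothetical, and the two-tailed p-value is taken here as the share of random pairings whose |r| is at least the observed |r|.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_shuffles=10000, seed=0):
    """Two-tailed p-value from the empirical distribution of r obtained
    by randomly re-pairing the two behaviors across flies."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles
```

The smallest nonzero p-value such a test can resolve is 1/`n_shuffles`, which is why very small p-values are often reported as 0.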
Figure 4—figure supplement 2
Temporal relationships between grooming and locomotion.

(A) Position within the tube (top row), locomotion (middle) and grooming (bottom) of a single iso31+ fly during one day in LD. Locomotion and grooming are shown in terms of fraction of time spent in 5 min bins. White/black bars indicate light/dark environmental conditions, respectively. (B) Probability density of the intervals between grooming events (green) and between locomotion events (gray). Probability distributions were constructed from ~33,000 intervals between grooming events and ~73,000 intervals between locomotion events detected in 83 iso31+ flies. (C) Longest intervals between grooming events (green) and between locomotion events (gray). Each point represents an individual fly recorded for a day. N = 83 iso31+ flies, p = 1.2 × 10^−19. (D) Probability density of the duration of grooming events (green) and locomotion events (gray). Probability distributions were constructed from ~33,000 grooming events and ~73,000 locomotion events detected in 83 iso31+ flies. (E) Longest duration of grooming (green) and locomotion events (gray). Each point represents an individual fly recorded for a day. N = 83 iso31+ flies, p = 3.6 × 10^−8. (F, G) Example fits (red) of temporal patterns of grooming activity (green) and locomotion activity (gray) of an individual fly during 3 days in LD environment. Horizontal white/black bars represent alternating light/dark conditions. (H) Sketch of the mathematical model that uses four exponential terms to describe temporal patterns of a fly's activity. Parameters bMD, bER, bED, bMR, TM and TE (see Figure 4—figure supplement 3) are marked in the plot. (I–N) Comparison of parameter values yielded by fits to locomotion and grooming data. Each circle represents an individual fly (N = 9). Data from the same fly are connected by a solid line. (O) Average amount of time spent in grooming (green), visiting food (blue) and locomotion (gray) during two days in LD. Each behavior time series is normalized by its maximum to allow for easy comparison of their relative phases. In wild-type flies (top panel), the burst in visiting food happens ~1 hr after the morning peak in locomotion. Onset of evening peaks in grooming usually occurs earlier than the peak in locomotion. The time difference between the peaks in feeding and grooming is taken as the time delay of the grooming peak after feeding, as indicated by red arrows. N = 50 iso31+ flies. (P) The time difference in onset of bursts in grooming and locomotion (gray), and in grooming and feeding (blue), in LD conditions. Discreteness in time differences is a consequence of binning the time series in 30 min bins. N = 50 iso31+ flies.

https://doi.org/10.7554/eLife.34497.016
Figure 4—figure supplement 3
Mathematical description of temporal changes in grooming and locomotion patterns.

(A) Sketch of the mathematical model that uses four exponential terms to describe temporal patterns of a fly's activity. Horizontal white/black bars represent alternating light/dark conditions. (B, C) Example fits (red) of (B) temporal pattern and (C) power spectrum of grooming activity (green) of an individual fly during 3 days in LD environment. The activity data are binned in 1 hr for visual clarity. (D, E) Example fits (red) of (D) temporal pattern and (E) power spectrum of locomotion activity (gray) of an individual fly during 3 days in LD environment. The activity data are binned in 1 hr for visual clarity. To quantitatively compare the temporal patterns of grooming and locomotion (Figure 4—figure supplement 2), we applied a previously developed mathematical method that allows quantification of the main features in fly locomotion pattern (Lazopulo and Syed, 2016). The quantification is achieved by fitting activity data with a model that consists of four exponential terms:

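$$
F(t)=\begin{cases}
H_M\,\dfrac{e^{b_{MD}T_M}-e^{b_{MD}t}}{e^{b_{MD}T_M}-1}, & 0<t<T_M\\[6pt]
H_M\,\dfrac{e^{b_{MR}(t-T_M)}-1}{e^{b_{MR}(T_0-T_M)}-1}, & T_M<t<T_0\\[6pt]
H_E\,\dfrac{1-e^{b_{ER}\left(t-\frac{T_0}{2}+T_E\right)}}{1-e^{b_{ER}T_E}}, & \dfrac{T_0}{2}-T_E<t<\dfrac{T_0}{2}\\[6pt]
H_E\,e^{-b_{ED}\left(t-\frac{T_0}{2}\right)}, & \dfrac{T_0}{2}<t<T_0
\end{cases}
$$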

The model has nine independent parameters that describe activity pattern. Parameters bMD, bMR, bED, bER define rates of morning decay (MD), morning rise (MR), evening decay (ED) and evening rise (ER), respectively. Parameter T0 defines circadian period, TM and TE define widths of M and E peaks, and HM and HE define heights of M and E peaks, as shown in sketch in panel (A). The white and black horizontal bars represent lights-on and -off phases of the external light-dark cycle. Values of the parameters are obtained from the activity data in a few steps. First, the circadian period is estimated from the power spectrum of activity data. Then, preliminary parameter values are estimated by fitting the locomotion recording with the function F(t). These values serve as initial guess for fitting the data power spectrum with an analytical expression derived by calculating the Fourier transform of F(t):

$$
\tilde{F}(T_n)=\frac{1}{T_0}\int_{0}^{T_0}F(t)\,e^{-i2\pi n t/T_0}\,dt,
$$

where Tn = T0/n, with n = 1, 2, 3 and T0 is the circadian period. By using the spectral fit, we extract model parameters without filtering or binning. Fitting of the power spectrum produces final values for the model parameters, which are then used to construct the final form of F(t), our model of fly activity rhythms. Examples of fits of grooming and locomotion activities and their respective power spectra are provided in panels (B–E). Parameter values and least squares fitting errors from fitting the locomotion and grooming spectra of nine representative individual flies are shown in Table 1 and Table 2. Here the fitting error is calculated from:

$$
\mathrm{Error}=\frac{\sum_i\left(P_{\mathrm{fit},i}-P_{\mathrm{actual},i}\right)}{\sum_i\left(P_{\mathrm{fit},i}-P_{\mathrm{random},i}\right)}
$$

where P_actual,i and P_fit,i are the actual spectral power and fitted spectral power at the ith spectral frequency, respectively. P_random,i is the averaged spectral power from randomly shuffled data at the ith frequency. To get P_random,i, we first randomly shuffle the activity data 100 times and compute the power spectrum of each shuffled series. Then P_random,i is the average of the 100 individual spectral powers at the ith frequency.

https://doi.org/10.7554/eLife.34497.017
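The shuffled baseline (the averaged spectral power of randomly shuffled data) can be illustrated with a minimal sketch. It assumes an evenly sampled activity series, computes power as the squared magnitude of the Fourier coefficient at periods T0/n, and subtracts the mean before transforming. The function names, the mean subtraction, the seed and the restriction to n = 1, 2, 3 are assumptions made here; only the count of 100 shuffles comes from the text.

```python
import cmath
import math
import random


def harmonic_power(series, period_samples, n_max=3):
    """Squared magnitude of the Fourier coefficient at periods T0/n.

    series:         evenly sampled activity values.
    period_samples: circadian period T0 expressed in samples.
    """
    size = len(series)
    mean = sum(series) / size
    powers = []
    for n in range(1, n_max + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * n * t / period_samples)
                    for t, v in enumerate(series)) / size
        powers.append(abs(coeff) ** 2)
    return powers


def shuffled_baseline(series, period_samples, n_max=3, n_shuffles=100, seed=0):
    """Average power over `n_shuffles` random shuffles of the series."""
    rng = random.Random(seed)
    work = list(series)
    totals = [0.0] * n_max
    for _ in range(n_shuffles):
        rng.shuffle(work)
        for i, p in enumerate(harmonic_power(work, period_samples, n_max)):
            totals[i] += p
    return [total / n_shuffles for total in totals]
```

Shuffling destroys temporal order but keeps the distribution of values, so the baseline reflects the power expected from a series with no rhythm.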
Figure 5 with 3 supplements
Grooming is under control of the circadian clock.

(A) Average temporal patterns (fraction of time spent in 30 min bins) of locomotion, feeding, short rest and sleep of eight representative iso31+ flies during 3 days in constant darkness (DD). Black horizontal bar represents lights-off condition. (B) Power spectra of behaviors in panel (A). Except for short rest, temporal patterns of the other three behaviors show significant circadian rhythmicity. Horizontal dashed and dash-dotted lines denote threshold powers for p = 0.05 and p = 0.01, respectively. (C) Grooming activity (in 30 min bins) of wild-type and clock mutants during 2 days in LD followed by 4 days in DD. Grooming traces are population averages. In DD, wild-type (WT, iso31+) grooming continues to show 24 hr rhythms. In comparison, grooming in perS or perL flies shows shorter or longer rhythms, respectively. For per0 flies, grooming is arrhythmic in DD. N = 8 WT, 8 perS, 8 perL, and 8 per0 representative flies. (D) Example power spectra showing circadian rhythmicity in grooming patterns of three individual wild-type, perS, perL and per0 flies. Spectra are normalized to variance of activity (in 30 min bins). Dashed and dash-dotted lines represent threshold power at p = 0.05 and p = 0.01, respectively. More examples of individual power spectra are provided in Figure 5—figure supplement 1. (E) Spectral powers of circadian peaks of individual wild-type and circadian mutants. N = 29 control, 20 perS, 29 perL, 20 per0, 13 cyc01 and 11 clkJRK.

https://doi.org/10.7554/eLife.34497.023
Figure 5—figure supplement 1
(A) Example Lomb-Scargle periodograms of grooming activity of individual per mutants and their background control (WT).

Spectra are normalized by dividing by the variance of individual grooming activity binned in 30 min. Dashed and dash-dotted lines represent threshold power at p = 0.05 and p = 0.01, respectively. Spectra of perS, perL, and wt grooming show significant rhythmicities in accordance with the known effects of these alleles on the pace of the clock. Grooming of per0 flies (fourth column from left) is arrhythmic according to the individual spectral analyses. (B) Periods of significant rhythmicity (at p = 0.01 level) in grooming of individual wt, perS and perL flies. The different bin sizes of periods are a result of evenly sampled frequencies in the spectral analysis. N = 29 wt, 19 perS, and 29 perL. (C) To test the effect of binning on rhythmicity, we took grooming data of individual flies recorded at 5 Hz, binned them in 30 min, 5 min and 1 min and ran Lomb-Scargle periodogram analysis on these time series. Examples of five individual spectra for each bin size are shown here. In general, a smaller bin size increases the separation between the statistical cut-off power (p value, horizontal lines) and peak power because of their differential dependence on the number of data points in a time series (see Materials and methods).

https://doi.org/10.7554/eLife.34497.024
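A variance-normalized Lomb-Scargle periodogram of the kind used here can be sketched as below. This is the classical textbook form, not the authors' code. The threshold helper uses the standard false-alarm expression from Scargle (1982), which is an assumption: the cut-off actually used is described in the article's Materials and methods and may differ. The parameter `n_frequencies` (number of independent frequencies) is likewise hypothetical.

```python
import math


def lomb_scargle(times, values, periods):
    """Classical Lomb-Scargle power, normalized by the sample variance.

    times, values: sample times and binned activity values.
    periods:       trial periods, in the same units as `times`.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    powers = []
    for period in periods:
        w = 2 * math.pi / period
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum((v - mean) * ci for v, ci in zip(values, c))
        ys = sum((v - mean) * si for v, si in zip(values, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        powers.append((yc * yc / cc + ys * ys / ss) / (2 * var))
    return powers


def threshold_power(p, n_frequencies):
    """Power exceeded by pure noise with probability p (Scargle, 1982)."""
    return -math.log(1 - (1 - p) ** (1 / n_frequencies))
```

In this form the peak power of a rhythmic series grows with the number of data points while the threshold depends only on `p` and `n_frequencies`, which is one way to see why finer binning widens the gap between peak and cut-off in panel (C).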
Figure 5—figure supplement 2
Rhythmicity in grooming patterns need not be a direct result of rhythmicity in locomotion or sleep-wake cycles.

For each of the four example flies, raw data of the fraction of time spent in locomotion, grooming and wake behaviors are plotted in the left column. Their power spectra (adjacent plots) show significant circadian rhythmicity at p = 0.05 level (horizontal dashed line). If raw grooming data are randomly shuffled and locomotion is modified accordingly so that wake is unchanged (middle column), the power spectrum of randomized grooming shows no rhythmicity, while modified locomotion is still rhythmic. If instead wake data are modified when grooming is randomized (right column) so that locomotion is unchanged, then grooming again loses rhythmicity while wake remains rhythmic. Time series in the four examples were taken in constant darkness (DD) and binned in 30 min, and Lomb-Scargle periodograms were calculated from the binned data.

https://doi.org/10.7554/eLife.34497.025
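The middle-column manipulation (shuffle grooming, adjust locomotion so that wake is unchanged) can be sketched as follows. It assumes that, in each bin, the difference between original and shuffled grooming is simply added to locomotion. The caption does not say how bins are handled when this would push locomotion below 0, so the sketch leaves such values untouched; the function name and seed are hypothetical.

```python
import random


def shuffle_grooming_keep_wake(grooming, locomotion, seed=0):
    """Shuffle grooming across bins and move the per-bin difference into
    locomotion, so grooming + locomotion (and hence wake) is unchanged.

    grooming, locomotion: fraction of time per bin, same length.
    Returns (shuffled_grooming, modified_locomotion).
    """
    rng = random.Random(seed)
    shuffled = list(grooming)
    rng.shuffle(shuffled)
    # Whatever grooming gains or loses in a bin, locomotion absorbs.
    new_locomotion = [l + g - s
                      for l, g, s in zip(locomotion, grooming, shuffled)]
    return shuffled, new_locomotion
```

The right-column variant is simpler: grooming is shuffled, locomotion is left alone, and wake is recomputed as the sum of its components.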
Figure 5—figure supplement 3
(A) Locomotion (in 30 min bins) of wild-type (iso31+) and clock mutants during two days in LD cycle followed by four days in DD cycle.

Locomotion traces are population averages. In DD, wt locomotor activity continues to show 24 hr rhythms. In comparison, locomotion in perS or perL flies shows shorter or longer rhythms, respectively. For per0 flies, locomotion appears arrhythmic in DD. N = 8 WT, 8 perS, 8 perL, 8 per0 flies. (B) Temporal patterns of population-averaged grooming of two additional arrhythmic strains during 3 days in DD conditions. Top panel shows cyc01 (N = 13) and bottom shows clkJRK (N = 11). Data are binned in 30 min. (C, D) Average of spectra of individual cyc01 (panel C, left, N = 13) and clkJRK (panel D, left, N = 11) grooming. Dashed and dash-dotted lines represent threshold power at p = 0.05 and p = 0.01, respectively. Example spectra of individual cyc01 (C) and clkJRK (D) flies show that power over the circadian range is well below the p = 0.05 level.

https://doi.org/10.7554/eLife.34497.026
Figure 6 with 1 supplement
Control of grooming duration is independent of circadian rhythmicity.

In each panel, bar plots on left show average fractional time spent in grooming in mutant and control flies. Pie charts on right present average fractional time spent in grooming (green), locomotion (gray), sleep (dark gray), short rest (purple) and feeding (blue). Here, numerical values for fractional time spent in behavior are indicated only for grooming, locomotion and sleep, with additional details in Figure 6—figure supplement 1A. Although loss of a functional clock does not affect grooming amount (A), mutations in clock (B) and cycle (C) genes lead to robust increases in the time flies spend grooming. Additional time for grooming can come from reduction in sleep (B) or reduction in locomotion (C). Reduction in sleep, however, does not always entail similar changes in grooming since sleep mutants fumin (D) and sleepless (E) show divergent alterations in grooming durations. N = 83 control, 53 per0, p = 0.28. N = 76 control, 18 cyc01, p = 2.7 × 10^−4. N = 28 control, 25 clkJRK, p = 7.8 × 10^−9. N = 17 control, 23 fumin, p = 0.003. N = 28 control, 17 sss, p = 1.3 × 10^−10.

https://doi.org/10.7554/eLife.34497.031
Figure 6—figure supplement 1
Changes in grooming due to mutations in clock, sleep or immune genes.

(A–E) Average fraction of time flies spent in grooming (green), locomotion (gray), sleep (dark gray), short rest (purple) and feeding (blue). N = 53 per0 and 83 control, 18 cyc01 and 76 control, 25 clkJRK and 28 control, 23 fumin and 17 control, 17 sss and 28 control. (F) Correlation between normalized sleep and grooming in sss, fumin, cyc01, and clkJRK flies. (G) Correlation between normalized locomotion and grooming in sss, fumin, cyc01 and clkJRK flies. (F, G) For the mutants, the fraction of time spent in each behavior is normalized by dividing by the average fraction of time spent in that behavior by their respective control flies. N = 17 sss, 23 fumin, 18 cyc01, 25 clkJRK, and 53 per0. (H) Population-averaged fractional time spent in grooming. Grooming in imd flies is significantly lower than in control flies (p<0.001), while PGRP-SAseml does not significantly affect the time spent in grooming. This suggests that Drosophila grooming relies on a working immune system. The decrease in imd flies further suggests that this impact may be independent of the Toll pathway. N = 56 OR, 47 PGRP-SAseml, 45 imd.

https://doi.org/10.7554/eLife.34497.032

Videos

Video 1
Sample raw experimental video
https://doi.org/10.7554/eLife.34497.004
Video 2
Sample video of grooming on head and front legs
https://doi.org/10.7554/eLife.34497.005
Video 3
Sample video of grooming on wings and hind legs
https://doi.org/10.7554/eLife.34497.006
Video 4
Sample video of grooming-like behavior (stretching body)
https://doi.org/10.7554/eLife.34497.013

Tables

Table 1
Parameter values and fitting errors from fitting grooming time-series
https://doi.org/10.7554/eLife.34497.021
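Fly # | bMD | bMR | bER | bED | T0 | TM | TE | HM | HE | Error
1 | 0.000 | 0.007 | −0.024 | 0.010 | 23.9 | 6.0 | 3.0 | 294 | 442 | 0.0115
2 | 0.008 | 0.008 | −0.055 | 0.014 | 23.9 | 3.0 | 5.0 | 280 | 344 | 0.0894
3 | −0.008 | 0.005 | −0.070 | 0.027 | 24 | 4.0 | 3.0 | 295 | 811 | 0.0674
4 | −0.019 | 0.003 | −0.035 | 0.046 | 24 | 4.0 | 3.0 | 204 | 540 | 0.0674
5 | 0.002 | 0.007 | −0.042 | 0.019 | 24 | 4.0 | 2.0 | 262 | 1155 | 0.0541
6 | −0.013 | 0.005 | 0.009 | 0.026 | 24 | 3.0 | 3.0 | 171 | 317 | 0.0115
7 | 0.026 | 0.003 | −0.018 | 0.006 | 24.3 | 4.0 | 3.0 | 210 | 436 | 0.076
8 | 0.110 | 0.008 | −0.012 | 0.015 | 23.9 | 2.0 | 5.0 | 158 | 344 | 0.0057
9 | −0.015 | 0.003 | −0.001 | 0.098 | 23.9 | 3.0 | 4.0 | 267 | 475 | 0.0175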
Table 2
Parameter values and fitting errors from fitting locomotion time-series
https://doi.org/10.7554/eLife.34497.022
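Fly # | bMD | bMR | bER | bED | T0 | TM | TE | HM | HE | Error
1 | −0.001 | 0.004 | 0.004 | 0.033 | 24.0 | 6.0 | 2.0 | 1631 | 1675 | 0.01
2 | −0.005 | 0.073 | −0.069 | 0.028 | 24.1 | 2.0 | 2.0 | 825 | 1434 | 0.0037
3 | −0.013 | 0.063 | 0.020 | 0.002 | 23.9 | 2.9 | 1.9 | 4162 | 1208 | 0.0469
4 | −0.010 | 0.020 | 0.022 | 0.001 | 23.7 | 3.0 | 2.0 | 3355 | 1388 | 0.001
5 | 0.055 | 0.060 | −0.338 | 0.007 | 24 | 3.0 | 3.0 | 741 | 1948 | 0.0056
6 | 0.009 | 0.015 | −0.054 | 0.028 | 23.6 | 3.0 | 3.0 | 1509 | 1369 | 0.1029
7 | 0.001 | 0.028 | −0.022 | 0.023 | 24 | 2.0 | 3.0 | 1535 | 1010 | 0.0072
8 | −0.015 | 0.008 | −0.007 | 0.032 | 23.9 | 3.0 | 2.0 | 1504 | 2308 | 0.007
9 | 0.014 | 0.020 | −0.028 | 0.016 | 23.9 | 3.0 | 3.0 | 1519 | 2004 | 0.0007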
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Drosophila melanogaster, male) | sssp1 | DOI: 10.1126/science.1155942 | | on iso31 background
Strain, strain background (D. melanogaster, male) | iso31 | DOI: 10.1126/science.1155942 | |
Strain, strain background (D. melanogaster, male) | fumin | DOI: 10.1523/JNEUROSCI.2048-05.2005 | | on w1118 background
Strain, strain background (D. melanogaster, male) | w1118 | Bloomington Drosophila Stock Center | BDSC: 3605 |
Strain, strain background (D. melanogaster, male) | Canton S | Bloomington Drosophila Stock Center | BDSC: 64349 |
Strain, strain background (D. melanogaster, male) | clkJRK | this paper | | backcrossed for five generations to iso31
Strain, strain background (D. melanogaster, male) | per0 | this paper | | backcrossed for five generations to iso31+
Strain, strain background (D. melanogaster, male) | perS | this paper | | backcrossed for six generations to iso31+
Strain, strain background (D. melanogaster, male) | perL | this paper | | backcrossed for six generations to iso31+
Strain, strain background (D. melanogaster, male) | cyc01 | other | | on Canton S background, gifts from William Ja
Strain, strain background (D. melanogaster, male) | iso31+ | other | | gifts from Michael Young


Cite this article

Bing Qiao, Chiyuan Li, Victoria W Allen, Mimi Shirasu-Hiza, Sheyum Syed (2018)
Automated analysis of long-term grooming behavior in Drosophila using a k-nearest neighbors classifier
eLife 7:e34497.
https://doi.org/10.7554/eLife.34497