The DANCE assay provides an accessible approach for quantifying aggression and courtship behaviors.

(A) Comparison of existing machine-vision camera hardware with the simplified, low-cost DANCE hardware for behavior acquisition. (B) Workflow for developing DANCE classifiers, including training, benchmarking against existing methods and manual ground-truthing to generate behavioral scores. (C) Behavioral classifiers developed to quantify male aggression (lunge) and courtship (wing extension, circling, following, attempted copulation, and copulation). (D) Representative raster plots comparing ground-truth, DANCE, CADABRA, and Divider assay performance for aggression. (E) Representative raster plots comparing ground-truth, DANCE, and MateBook performance for courtship. Created with BioRender.com/i1hyc7d.

Validation of the DANCE lunge classifier for quantifying male aggression.

(A) Lunge scores from 20-minute-long videos obtained by manual ground-truth (gray), the DANCE lunge classifier (orange), CADABRA (purple), and the Divider assay classifier (green). (B–E) Comparison of lunge scores across different aggression levels, based on manual scoring and predictions from DANCE, CADABRA, and Divider: (B) 0–70 lunges, ‘low aggressive’ (n=10; **P<0.0017, ****P<0.0001); (C) 71–160 lunges, ‘moderately aggressive’ (n=11; **P<0.0102, ***P<0.0002, ***P<0.0001); (D) 161–300 lunges, ‘highly aggressive’ (n=11; **P<0.0057, ***P<0.0002, ***P<0.0004); and (E) >300 lunges, ‘hyper-aggressive’ (n=8; *P<0.0402, **P<0.0029, ***P<0.0006; Friedman’s ANOVA with Dunn’s test). (F) Regression analysis of the DANCE lunge classifier vs. manual scores (R²=0.9760, n=40). (G) Regression of CADABRA vs. the DANCE lunge classifier (R²=0.9, n=40). (H) Regression of the Divider assay lunge classifier score vs. manual score (R²=0.7739, n=40). (I) Precision, recall, and F1 scores of the DANCE lunge classifier compared with those of CADABRA and Divider.
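
The aggression levels in panels B–E amount to binning each video by its lunge count. A minimal sketch of that binning is given below; the function name is hypothetical, and the assumption that the bins are applied to the manual (ground-truth) count per 20-minute video is ours rather than something stated in the legend.

```python
def aggression_category(lunges: int) -> str:
    """Map a lunge count to the category labels used in panels B-E.

    Assumption: thresholds are applied to the per-video lunge count;
    the boundaries (70, 160, 300) are taken from the legend.
    """
    if lunges < 0:
        raise ValueError("lunge count cannot be negative")
    if lunges <= 70:
        return "low aggressive"
    if lunges <= 160:
        return "moderately aggressive"
    if lunges <= 300:
        return "highly aggressive"
    return "hyper-aggressive"


# Hypothetical counts, for illustration only (not data from the study).
example_counts = [12, 95, 240, 410]
example_labels = [aggression_category(c) for c in example_counts]
```

The upper bound of each bin is inclusive, matching the ranges 0–70, 71–160, 161–300, and >300.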

Evaluation of the DANCE wing extension classifier for quantifying courtship behavior.

(A) Wing extension index of males paired with decapitated virgin females, from 15-minute-long videos scored by manual ground-truth (gray), the DANCE wing extension classifier (orange), and MateBook (purple). MateBook underestimated wing extension across multiple videos (Friedman’s ANOVA with Dunn’s test: ground-truth vs. MateBook p=0.0020, n=15; ground-truth vs. DANCE p>0.9999). (B) Comparison of ground-truth, DANCE, and MateBook wing extension scores (Kruskal–Wallis ANOVA with Dunn’s test, ns, p>0.9999, *p=0.0436; n=15). (C) Regression analysis of the DANCE wing extension classifier vs. ground-truth (R²=0.9831, n=15). (D) Regression of MateBook vs. ground-truth (R²=0.1054, n=15). (E) Precision, recall, and F1 score of the DANCE wing extension classifier and MateBook relative to ground-truth scores.

Validation of the DANCE attempted-copulation classifier.

(A) Attempted copulation index of males paired with both mated and decapitated females, from 15-minute-long videos scored by manual ground-truth (gray), the DANCE attempted-copulation classifier (orange), and MateBook (purple). (B) Comparison of ground-truth, DANCE attempted-copulation classifier, and MateBook scores (Kruskal–Wallis ANOVA with Dunn’s test, ns, p>0.9999; ****p<0.0001, n=32). (C) Regression analysis of the attempted-copulation classifier vs. ground-truth (R²=0.8742, n=32). (D) Regression analysis of MateBook vs. ground-truth (R²=0.1512, n=32). (E) Precision, recall, and F1 score of the DANCE and MateBook attempted-copulation classifiers relative to the ground-truth scores.

Evaluation of the DANCE circling classifier.

(A) Circling index of males paired with decapitated virgin females, from 15-minute-long videos scored by manual ground-truth (gray), the DANCE circling classifier (orange), and MateBook (purple) (Friedman’s ANOVA with Dunn’s test, p=0.2049, ****p<0.0001, n=12). (B) Comparison of the ground-truth, DANCE, and MateBook circling scores (ordinary one-way ANOVA with Dunnett’s test, ns, p=0.8014; *p=0.0157, n=12). (C) Regression analysis of the DANCE circling classifier vs. ground-truth (R²=0.92, n=12). (D) Regression of MateBook vs. ground-truth (R²=0.88, n=12). (E) Precision, recall, and F1 score of the DANCE and MateBook circling classifiers relative to the ground-truth score.

DANCE hardware and recording setup.

(A) DANCE aggression setup. (B) 3D-rendered components of the aggression setup. (C) DANCE courtship setup. (D) 3D-rendered components of the courtship setup, showing males and females separated by an X-ray film separator or ‘divider comb’. (E–G) Top and side views of the DANCE setup with a smartphone camera for recording and an electronic tablet as the backlight.

Benchmarking DANCE hardware and application to neurogenetic tools.

(A–B) Courtship behaviors were recorded using a pre-existing circular setup and the DANCE setup in group-housed (GH) and single-housed (SH) flies. (C–D) Wing extension: (C) ***p<0.0010, n=23; (D) ****p<0.0001, GH, n=22 and SH, n=33. (E–F) Attempted copulation: (E) ***p<0.0002, n=23; (F) **p<0.0022, GH, n=21 and SH, n=33. (G–H) Following: (G) ns, p>0.0959, n=23; (H) ns, p<0.2537, GH, n=22 and SH, n=32. (I–J) Circling: (I) *p<0.012, n=23; (J) *p<0.0104, GH, n=24 and SH, n=31. (K–L) Aggressive lunges were recorded using a pre-existing circular setup and the DANCE setup. (M–N) Lunges of SH flies compared with those of GH flies reared on food with yeast granules: (M) **p<0.0138, n=36; (N) **p<0.0372, n=40. (O) Effect of yeast extract food on aggressive behavior; ****p<0.0001, n=38. (P–Q) Genetic knockdown of the neuropeptide Drosulfakinin (Dsk) in insulin-producing neurons using dilp2-GAL4: (P) ns, p<0.0502; ns, p>0.9999; ****p<0.0001; **p<0.0040; n=35. (Q) ****p<0.0001, ns, p>0.9999, ****p<0.0001, *p>0.0210, n=30. (R) Optogenetic silencing of dopaminergic neurons with UAS-GtACR1 driven by the TH-GAL4 driver; ns, p<0.0986; ns, p>0.9999; ****p<0.0001; **p>0.0012; **p>0.0013; n=24. (C–J and M–O) Mann–Whitney U test; (P–R) Kruskal–Wallis test with Dunn’s multiple comparisons.

Aggression chamber described by Dankert et al. (2009).

The setup consists of a bottom food plate, 12 wells (aggression arenas), and a top plate with fly-loading holes and a screw slot that allows sliding of the loading plate.

Courtship setup described by Koemans et al. (2017).

The setup consists of 18 wells (courtship arenas), a top cover plate, a sliding loading plate, and a sliding divider assembly used to separate male and female flies.

Comparison of annotations by two independent evaluators to assess observer bias during ground-truthing.

(A) Aggressive lunges, p=0.8789, n=15. (B) Courtship wing extension, p=0.9999, n=13. (C) Attempted copulation, p=0.0571, n=16. (D) Circling, p=0.4343, n=12. (E) Following, p=0.4405, n=13. (F) Copulation, p=0.9221, n=25. (A–F) All comparisons were performed using the Mann–Whitney U test.

Evaluation of the DANCE lunge classifier predictions across training videos.

(A–D) Scatter plots showing the correlation between manual ground-truth annotations and DANCE-predicted frame counts across four independent training videos (Videos 8–11). (A) R² = 0.9794, (B) R² = 0.9760, (C) R² = 0.9847, and (D) R² = 0.9893, demonstrating close agreement between automated classification and manual scoring. (E–H) Bar plots showing the precision, recall, and F1 score of the DANCE classifier for the corresponding videos. Video 9, which yielded the highest overall performance, was also used in Figure 2 for inter-method comparison with CADABRA and the Divider Assay to maintain consistency across analyses.
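
As a rough illustration of how an R² value like those in panels A–D can be obtained, the sketch below computes the coefficient of determination for a simple linear regression of predicted on manual counts, which equals the squared Pearson correlation. The function name and the example numbers are hypothetical, and the assumption that the reported values come from an ordinary least-squares fit with an intercept is ours.

```python
def r_squared(manual, predicted):
    """R^2 of a simple least-squares line fitted to (manual, predicted) pairs.

    For a one-predictor linear fit with an intercept, R^2 equals the
    squared Pearson correlation coefficient, which is what is computed here.
    """
    n = len(manual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(manual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in manual)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(manual, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-fly counts, for illustration only (not data from the study).
manual_counts = [10, 25, 40, 80, 120]
dance_counts = [12, 24, 43, 78, 118]
agreement = r_squared(manual_counts, dance_counts)
```

A high R² shows that the two scores covary closely, but not that they are equal, so it is best read alongside the precision, recall, and F1 values in panels E–H.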

Evaluation of the DANCE wing extension classifier in the mated-female dataset.

(A) Wing extension of males from 15-minute videos scored by manual ground-truth (gray), DANCE classifier (orange), and MateBook (purple) with mated females. MateBook underestimated wing extension across multiple videos (Friedman’s ANOVA with Dunn’s test; p=0.3582, p<0.0001; n=25). (B) Comparison of wing extension scores from ground-truth, DANCE, and MateBook datasets (Kruskal–Wallis ANOVA with Dunn’s test; p>0.9999, p=0.1039; n=25). (C) Regression of DANCE classifier scores vs. ground-truth (R²=0.9951, n=25). (D) Regression of MateBook vs. ground-truth (R²=0.8282, n=25). (E) Precision, recall, and F1 scores of the DANCE classifier and MateBook relative to ground-truth annotations.

Evaluation of the DANCE copulation classifier in the mixed-female dataset.

(A) Copulation scores from 15-minute videos obtained by manual ground-truth (gray), the DANCE copulation classifier (orange), and MateBook (purple) (Friedman’s ANOVA with Dunn’s test, p>0.9999; p>0.9999, n=21). (B) Box plot comparison of the manual ground-truth, DANCE copulation classifier, and MateBook (Kruskal–Wallis ANOVA with Dunn’s test, p>0.9999; p>0.9999, n=21). (C) Regression of DANCE classifier scores vs. manual ground-truth (R²=0.98, n=21). (D) Regression of MateBook vs. manual ground-truth (R²=0.81, n=21). (E) Precision, recall, and F1 score of the DANCE copulation classifier and MateBook relative to the ground-truth score.

Evaluation of the DANCE circling classifier in the mated-female dataset.

(A) Circling index of males from 15-minute videos scored by manual ground-truth (gray), the DANCE circling classifier (orange), and MateBook (purple) with mated females (Friedman’s ANOVA with Dunn’s test, p>0.9999; p<0.0001, n=19). (B) Comparison of circling scores from ground-truth, DANCE, and MateBook datasets (Kruskal–Wallis ANOVA, p>0.9999; p=0.0822, n=19). (C) Regression of DANCE classifier scores vs. ground-truth (R²=0.9494, n=19). (D) Regression of MateBook vs. ground-truth (R²=0.6938, n=19). (E) Precision, recall, and F1 scores of the DANCE and MateBook circling classifiers relative to the ground-truth scores.

Evaluation of the DANCE following classifier in the mated-female dataset.

(A) Following index of males from 15-minute videos scored by manual ground-truth (gray), the DANCE following classifier (orange), and MateBook (purple) with mated females (Friedman’s ANOVA with Dunn’s test, p=0.1794; p=0.0029; n=25). (B) Box plot comparison of the following scores from the ground-truth, DANCE, and MateBook datasets (Kruskal–Wallis ANOVA with Dunn’s test, p>0.9999; p=0.5287; n=25). (C) Regression analysis of the following classifier vs. ground-truth (R²=0.9894, n=25). (D) Regression of MateBook vs. ground-truth (R²=0.9204, n=25). (E) Precision, recall, and F1 scores of the DANCE and MateBook following classifiers relative to ground-truth annotations.

Frame-level analysis of duration-based courtship classifiers.

(A-G) Bar plots showing precision, recall, and F1 scores for DANCE (orange) and MateBook (purple) across different behaviors: (A) Wing extension toward decapitated females, (B) Wing extension toward mated females, (C) Copulation, (D) Attempted copulation (mixed dataset), (E) Circling toward decapitated females, (F) Circling toward mated females, and (G) Following. Percent error rates are shown above each bar.
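
For readers unfamiliar with the metrics, the sketch below shows one conventional way to compute frame-level precision, recall, and F1 from per-frame binary labels (1 = behavior present, 0 = absent). The function name and example labels are hypothetical, and the legend does not specify the exact scoring procedure, so this uses the standard definitions as an assumption.

```python
def frame_metrics(truth, predicted):
    """Return (precision, recall, f1) for two equal-length 0/1 label sequences.

    Standard definitions: precision = TP / (TP + FP),
    recall = TP / (TP + FN), F1 = harmonic mean of the two.
    Metrics with an empty denominator are reported as 0.0 by convention.
    """
    if len(truth) != len(predicted):
        raise ValueError("label sequences must cover the same frames")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical six-frame example, for illustration only.
truth_frames = [1, 1, 1, 0, 0, 0]
classifier_frames = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = frame_metrics(truth_frames, classifier_frames)
```

In this toy example there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.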

DANCE optogenetic recording setup for behavioral experiments.

(A) Schematic of the setup showing green LEDs (520–540 nm) controlled by an Arduino and powered via a PC, with illumination directed onto blister-pack arenas placed on an electronic tablet. The tablet provides an adjustable backlight via screen-light software, and a smartphone camera records behavior. (B) Photograph of the complete setup, showing the smartphone, tablet, LED stands, Arduino controller, and arenas under green LED illumination.

Effect of optogenetic silencing of dopaminergic neurons on daytime activity.

(A–B) Transient silencing of dopaminergic neurons using UAS-GtACR1 did not affect daytime activity between SH and GH flies. (A) On day 1 (no green-light silencing), one-way ANOVA with Tukey’s multiple comparisons revealed no differences for TH-GAL4 (ns, p=0.7383; GH, n=43; SH, n=43), UAS-GtACR1 (ns, p=0.4812; GH, n=51; SH, n=42), and TH-GAL4>UAS-GtACR1 (ns, p=0.9942; GH, n=54; SH, n=51). (B) On day 2 (with green-light silencing), no differences were observed for TH-GAL4 (ns, p=0.9976; GH, n=43; SH, n=43), UAS-GtACR1 (ns, p=0.9779; GH, n=51; SH, n=42), and TH-GAL4>UAS-GtACR1 (ns, p=0.9974; GH, n=54; SH, n=51). Two-way ANOVA across days revealed no significant interaction or main effects for TH-GAL4 (interaction: ns, p=0.5504; silencing: ns, p=0.5172; housing: ns, p=0.1602), UAS-GtACR1 (interaction: ns, p=0.4533; silencing: ns, p=0.3602; housing: p=0.255; GH, n=51; SH, n=42), or TH-GAL4>UAS-GtACR1 (interaction: ns, p=0.9977; silencing: ns, p=0.2454; housing: ns, p=0.5868). Within-day comparisons were performed using one-way ANOVA with Tukey’s multiple comparisons; across-day comparisons were performed using two-way ANOVA.

Effect of optogenetic silencing of dopaminergic neurons on male aggression across arena sizes.

(A–C) Number of lunges performed in 20 minutes by TH-GAL4>UAS-GtACR1 males tested in arenas of different diameters: (A) 13 mm (n=29–38), (B) 17 mm (n=27–32), and (C) 21 mm (n=16). Lunges of flies with GtACR1-mediated silencing were significantly greater than those of GH and control flies across all arena sizes. Statistical comparisons are indicated: ****p<0.0001; ns, not significant; ordinary one-way ANOVA followed by Tukey’s multiple comparisons test.

Quantification of wild-type Drosophila courtship behavior in DANCE chambers of varying diameters.

(A–C) Wing extension bouts (A: n=22, 33; B: n=25, 28; C: n=20, 25). (D–F) Attempted copulation bouts (D: n=21, 33; E: n=25, 28; F: n=20, 25). (G–I) Following bouts (G: n=22, 32; H: n=25, 28; I: n=21, 24). (J–L) Circling bouts (J: n=24, 31; K: n=26, 24; L: n=18, 23). Data were obtained from Canton-S (CS) males paired with either group-housed (GH) or single-housed (SH) females over 15 minutes in chambers of 11 mm, 13 mm, or 17 mm diameter. Each bar represents the median. Statistical differences are indicated as *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001; ns, not significant (Mann–Whitney U test).