Participants and experimental protocol. Thirty-two male and 17 female wild-caught, temporarily captive great-tailed grackles, inhabiting either a core (17 males, 5 females), middle (4 males, 4 females), or edge (11 males, 8 females) population of their North American breeding range (establishment years: 1951, 1996, and 2004, respectively), participate in the current study (grackle images: Wikimedia Commons). Each grackle is individually tested on a two-phase reinforcement learning paradigm: in initial learning, two colour-distinct tubes are presented, but only one coloured tube (e.g., dark grey) contains a food reward (F+ versus F-); in reversal learning, the stimulus-reward tube-pairings are swapped. The learning criterion is identical in both learning phases: 17 F+ choices out of the last 20 choices, with trial 17 being the earliest trial at which a grackle can successfully finish (for details, see Materials and methods).
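As a concrete illustration of the criterion above, the short sketch below checks when a choice sequence (coded 1 for F+ and 0 for F-) first satisfies 17 correct choices within the most recent 20; the function name and coding scheme are illustrative, not the study's actual scoring code.

```python
# Illustrative check of the learning criterion: a bird passes once 17 of its
# most recent 20 choices are F+ (coded 1). Trial 17 is the earliest possible
# pass because at least 17 correct choices must have accumulated.

def trials_to_criterion(choices, window=20, required=17):
    """Return the 1-indexed trial on which the criterion is first met,
    or None if it is never met. `choices` is a sequence of 0/1 outcomes."""
    for t in range(required, len(choices) + 1):
        recent = choices[max(0, t - window):t]
        if sum(recent) >= required:
            return t
    return None

# Example: a bird that is correct on every trial finishes at trial 17.
print(trials_to_criterion([1] * 30))  # -> 17
```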

Grackle reinforcement learning. Behaviour. Across-population learning speed and choice-option switches in (A-B) initial (M, 32; F, 17) and (D-E) reversal learning (M, 29; F, 17), with (C,F) respective posterior estimates and M-F contrasts. Mechanisms. Within- and across-population estimates and contrasts of information-updating rate φ and risk-sensitivity rate λ in (G,I) initial and (H,J) reversal learning. In (G-J) open circles show 100 random posterior draws; red filled circles and vertical lines show posterior means and 89% HPDI, respectively. Simulations. Learning speed and choice-option switches by: 10,000 full posterior-informed ‘birds’ (n = 5,000 per sex) in (K-L) initial and (N-O) reversal learning; and six average posterior-informed ‘birds’ (n = 3 per sex) in (M) initial and (P) reversal learning. In (K,N) the full simulation sample is plotted; in (L,O) open circles show 100 random simulant draws. Note (K,N) x-axes are cut to match (A,D) x-axes. Medians are plotted/labelled in (A,B,D,E,K,L,N,O).
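The mechanism panels (G-J) rest on a value-updating rule governed by φ and a softmax choice rule weighted by λ. The sketch below shows how posterior-informed 'birds' of this general model class could be forward-simulated; the parameter values, initial attractions, and trial count are illustrative assumptions, not the study's exact Stan implementation or posterior draws.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_bird(phi, lam, rewarded=0, n_trials=100):
    """Forward-simulate one 'bird' on a two-option tube task.
    phi: information-updating rate (weight given to the latest payoff).
    lam: risk-sensitivity rate (weight on attraction differences in the
         softmax choice rule)."""
    attraction = np.zeros(2)                 # initial attractions, illustrative
    choices, rewards = [], []
    for _ in range(n_trials):
        p = np.exp(lam * attraction)
        p /= p.sum()                         # softmax choice probabilities
        choice = rng.choice(2, p=p)
        reward = 1.0 if choice == rewarded else 0.0
        # value update: A <- (1 - phi) * A + phi * payoff for the chosen option
        attraction[choice] += phi * (reward - attraction[choice])
        choices.append(choice)
        rewards.append(reward)
    return np.array(choices), np.array(rewards)

# Illustrative parameter values for one simulant
choices, rewards = simulate_bird(phi=0.05, lam=5.0)
switches = int((np.diff(choices) != 0).sum())
print("F+ choices:", int(rewards.sum()), "| choice-option switches:", switches)
```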

Figure 2—figure supplement 1. Excluding extra learning trials.

Evolutionary optimality of strategising risk-sensitive learning. (A) Illustration of our evolutionary algorithm model to estimate optimal learning parameters that evolve under systematically varied pairings of two key (urban) ecology axes: environmental stability u and environmental stochasticity s. Specifically, 300-member populations evolve over 10 independent 7000-generation simulations per pairing, using ‘roulette wheel’ selection (parents are chosen for reproduction with a probability proportional to collected F+ rewards out of 1000 choices) and random mutation (offspring inherit learning genotypes with a small deviation in a random direction). (B) Mean optimal learning parameter values discovered by our evolutionary model (averaged over the last 5000 generations). As the statistical environment becomes more urban-like (lower u and higher s values), selection should favour a lower information-updating rate φ and a higher risk-sensitivity rate λ (darker and lighter squares in the left and right plot, respectively). We note arrows are intended as illustrative aids and do not correspond to a linear scale of ‘urbanness’.
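A minimal, reduced-scale sketch of the evolutionary loop described in (A). Here u is operationalised as the per-choice probability that the better option keeps its identity and s as payoff noise (the better option rewards with probability 1 - s, the worse with probability s); these operationalisations, plus the population size, generation count, and mutation scale, are illustrative stand-ins for the full model rather than its exact specification.

```python
import numpy as np

rng = np.random.default_rng(7)

def fitness(phi, lam, u, s, n_choices=100):
    """Collected F+ rewards over one agent's lifetime of choices.
    u: per-choice probability the better option keeps its identity (stability).
    s: payoff noise; the better option rewards with probability 1 - s,
       the worse with probability s (stochasticity)."""
    attraction, better, total = np.zeros(2), 0, 0.0
    for _ in range(n_choices):
        if rng.random() > u:                      # unstable environment: options swap
            better = 1 - better
        p = np.exp(lam * attraction)
        p /= p.sum()
        choice = rng.choice(2, p=p)
        reward = float(rng.random() < (1 - s if choice == better else s))
        attraction[choice] += phi * (reward - attraction[choice])
        total += reward
    return total

def evolve(u, s, pop_size=50, generations=50, mut_sd=0.05):
    """'Roulette wheel' selection on (phi, lam) genotypes with random mutation."""
    pop = np.column_stack([rng.uniform(0, 1, pop_size),     # phi genotypes
                           rng.uniform(0, 10, pop_size)])   # lam genotypes
    for _ in range(generations):
        w = np.array([fitness(phi, lam, u, s) for phi, lam in pop]) + 1e-6
        parents = pop[rng.choice(pop_size, size=pop_size, p=w / w.sum())]
        pop = parents + rng.normal(0, mut_sd, parents.shape)  # random mutation
        pop[:, 0] = np.clip(pop[:, 0], 0, 1)                  # keep phi in [0, 1]
        pop[:, 1] = np.clip(pop[:, 1], 0, None)               # keep lam non-negative
    return pop.mean(axis=0)

# One illustrative pairing: fairly stable (u = 0.9), fairly deterministic (s = 0.2)
print("mean evolved (phi, lam):", evolve(u=0.9, s=0.2))
```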

Reinforcement learning speed.

Between- and across-population total-trials-in-test Poisson regression model estimates and male-female contrasts, with corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.
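The contrasts and 89% highest-posterior density intervals reported in this and the following tables can, in principle, be computed directly from posterior draws; the sketch below shows one such computation, using fabricated placeholder draws purely for illustration rather than study estimates.

```python
import numpy as np

def hpdi(draws, prob=0.89):
    """Narrowest interval containing `prob` of the posterior draws."""
    draws = np.sort(np.asarray(draws))
    n_in = int(np.ceil(prob * len(draws)))
    widths = draws[n_in - 1:] - draws[:len(draws) - n_in + 1]
    lo = int(np.argmin(widths))
    return draws[lo], draws[lo + n_in - 1]

# Placeholder posterior draws (log-scale Poisson rates), illustration only
rng = np.random.default_rng(0)
log_rate_male = rng.normal(4.2, 0.05, 4000)
log_rate_female = rng.normal(4.0, 0.05, 4000)

contrast = log_rate_male - log_rate_female          # male-female contrast
print("mean:", contrast.mean(), "| 89% HPDI:", hpdi(contrast))
```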

Reinforcement learning switches.

Between- and across-population total-choice-option-switches-in-test Poisson regression model estimates and male-female contrasts, with corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.

Reinforcement learning information-updating rate φ.

Between- and across-population computational model φ estimates and male-female contrasts, with posterior means and corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.

Reinforcement learning risk-sensitivity rate λ.

Between- and across-population computational model λ estimates and male-female contrasts, with posterior means and corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.

Left panel: images showing a male and female great-tailed grackle (credit: Wikimedia Commons). Right panel: schematic of the colour-reward reinforcement learning experimental protocol. In the initial learning phase, great-tailed grackles are presented with two colour-distinct tubes; however, only one coloured tube (e.g., dark grey) contains a food reward (F+ versus F-). In the reversal learning phase, the colour-reward tube-pairings are swapped. The passing criterion is identical in both phases (see main text for details).

Group-level tube-choice behaviour of simulated great-tailed grackles across colour-reward reinforcement learning trials (females: yellow, n = 14; males: green, n = 35), following model validation step one. Tube option 1 (e.g., dark grey) was the rewarded option in the initial learning phase; conversely, tube option 2 (e.g., light grey) contained the food reward in the reversal learning phase. Each open circle represents an individual tube-choice; black lines indicate binomial smoothed conditional means fitted with grey 89% compatibility intervals.

Comparison of assigned and recovered φ and λ values, following model validation step two. Eighty-nine percent highest posterior density intervals (HPDI) are shown for recovered values.
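A runnable sketch of the assign-simulate-recover logic behind this validation step, re-using the value-updating and softmax rules sketched under Figure 2; for brevity, parameters are recovered here by maximum likelihood, whereas the actual study estimation was Bayesian, so the optimiser, trial count, and assigned values are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

def simulate(phi, lam, rewarded=0, n_trials=120):
    """Simulate one agent's choices under assigned phi and lam."""
    A, choices, rewards = np.zeros(2), [], []
    for _ in range(n_trials):
        p = np.exp(lam * A) / np.exp(lam * A).sum()
        c = rng.choice(2, p=p)
        r = 1.0 if c == rewarded else 0.0
        A[c] += phi * (r - A[c])
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def neg_log_lik(params, choices, rewards):
    """Negative log-likelihood of observed choices under candidate phi, lam."""
    phi, lam = params
    A, ll = np.zeros(2), 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(lam * A) / np.exp(lam * A).sum()
        ll += np.log(p[c] + 1e-12)
        A[c] += phi * (r - A[c])
    return -ll

assigned = (0.05, 5.0)                                # assigned phi, lam
choices, rewards = simulate(*assigned)
fit = minimize(neg_log_lik, x0=[0.1, 1.0], args=(choices, rewards),
               bounds=[(1e-3, 1.0), (1e-3, 15.0)])    # L-BFGS-B via bounds
print("assigned:", assigned, "| recovered:", fit.x)
```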

Comparison of learning ability in simulated female (yellow; n = 14) and male (green; n = 35) great-tailed grackles across initial and reversal colour-reward reinforcement learning, following model validation step two. (A) φ, the rate of learning, i.e., speed. (B) λ, the rate of sampling, i.e., switching between choice-options. (C) and (D) show posterior distributions for the respective contrasts between female and male learning. Eighty-nine percent highest posterior density intervals are shaded in grey; the fact that these intervals do not cross zero indicates a simulated effect of sex on learning ability.

Parameter recovery test for different sizes of simulated sex differences. Plots show posterior estimates of the effect of sex (contrasts between simulated male and female great-tailed grackles; n = 35 and 14, respectively) on speed (φ) and sampling (λ) learning parameters, following model validation step three. Black circles represent the mean recovered sex-effect estimates with grey 89% highest posterior density intervals (HPDIs); black solid diagonal lines represent a ‘perfect’ match between assigned and recovered parameter estimates (note that we would not expect a perfect correspondence, due to the stochasticity of the agent-based simulations); and black dashed horizontal lines represent a recovered null sex effect.

Individual-level tube-choice behaviour of simulated great-tailed grackles across colour-reward reinforcement learning trials (females: yellow, n = 14; males: green, n = 35). Tube option 1 (e.g., dark grey) was the rewarded option in the initial learning phase; conversely, tube option 2 (e.g., light grey) contained the food reward in the reversal learning phase. Each open circle shows an individual tube-choice; black solid lines show loess smoothed conditional means fitted with grey 89% compatibility intervals; and dashed black lines show individual-unique transitions between learning phases.

Comparison of information-updating rate φ and risk-sensitivity rate λ estimates (top and bottom row, respectively) in initial learning, excluding and including extra initial learning trials (left and right column, respectively), which are present in the original data set (see Materials and methods). Because this comparison shows no noticeable difference between exclusion and inclusion, we excluded the extra learning trials from our analyses. All plots are generated via model estimates using our full sample size: 32 males and 17 females.