Introduction

Dispersal and range expansion go ‘hand in hand’; movement by individuals away from a population’s core is a pivotal precondition of witnessed growth in species’ geographic limits (Ronce, 2007; Chuang and Peterson, 2016). Because ‘who’ disperses—in terms of sex—varies both within and across taxa (for example, male-biased dispersal is dominant among fish and mammals, whereas female-biased dispersal is dominant among birds; see Table 1 in Trochet et al., 2016), skewed sex ratios are apt to arise at expanding range fronts, and, in turn, differentially drive invasion dynamics (Miller et al., 2011). Female-biased dispersal, for instance, can ‘speed up’ staged invertebrate invasions by increasing offspring production (Miller and Inouye, 2013). Alongside sex-biased dispersal, learning is also argued to contribute to species’ colonisation capacity, as novel environments inevitably present novel (foraging, predation, shelter, and social) challenges that newcomers need to surmount in order to settle successfully (Wright et al., 2010; Sol et al., 2013; Barrett et al., 2019).

Indeed, a growing number of studies show support for this supposition (as recently reviewed in Lee and Thornton, 2021). Carefully controlled choice tests, for example, show urban-dwelling individuals—that is, the invaders—will learn novel stimulus-reward pairings more readily than do rural-dwelling counterparts, supporting the idea that urban invasion selects for learning phenotypes at the dispersal and/or settlement stage(s) (Batabyal and Thaker, 2019). Given the independent influence of sex-biased dispersal and learning on range expansion, it is perhaps surprising, then, that their potential interactive influence on movement ecology remains unexamined empirically (but not theoretically: Liedtke and Fromhage, 2021a,b), particularly in light of concerns over (in)vertebrates’ resilience to ever-increasing urbanisation (Eisenhauer and Hines, 2021; Li et al., 2022).

Great-tailed grackles (Quiscalus mexicanus; henceforth, grackles) are an excellent model for empirical examination of the interplay between sex-biased dispersal, learning, and ongoing urban-targeted rapid range expansion: over the past ∼150 years, they have seemingly shifted their historically urban niche to include more variable urban environments (e.g., arid habitat; Summers et al., 2023), moving from their native range in Central America into much of the United States, with several first-sightings spanning as far north as Canada (Dinsmore and Dinsmore, 1993; Wehtje, 2003; Fink et al., 2020). Notably, the record of this urban invasion is heavily peppered with first-sightings involving a single or multiple male(s) (41 of 63 recorded cases spanning most of the twentieth century; Dinsmore and Dinsmore, 1993). Moreover, recent genetic data show, when comparing grackles within a population, average relatedness: (i) is higher among females than among males; and (ii) decreases with increasing geographic distance among females; but (iii) is unrelated to geographic distance among males; hence confirming that urban invasion in grackles is male-led via sex-biased dispersal (Sevchik et al., 2022). Considering these life history and genetic data in conjunction with data on grackle wildlife management efforts (e.g., pesticides, pyrotechnics, and sonic booms; Luscier, 2018), it seems plausible that urban invasion might drive differential learning between male and female grackles, potentially resulting in a spatial sorting of the magnitude of this sex difference with respect to population establishment age (i.e., sex-effect: newer population > older population; Phillips et al., 2010). In range-expanding western bluebirds (Sialia mexicana), for example, more aggressive males disperse towards the invasion front; however, in as little as three years, the sons of these colonisers show reduced aggression as the invasion front moves on (Duckworth and Badyaev, 2007). Whether sex-biased dispersal and learning similarly interact in urban-invading grackles remains an open and timely question.

Here, for the first time (to our knowledge), we examine whether, and, if so, how sex mediates learning across 32 male and 17 female wild-caught, temporarily-captive grackles either inhabiting a core (17 males, 5 females), middle (4 males, 4 females) or edge (11 males, 8 females) population of their North American range (based on year of first breeding: 1951, 1996, and 2004, respectively; details in Materials and methods; Figure 1). Collating, cleaning, and curating existing reinforcement learning data (Logan, 2016c; Logan et al., 2022a, 2023b)—wherein novel stimulus-reward pairings are presented (i.e., initial learning), and, once successfully learned, these reward contingencies are reversed (i.e., reversal learning)—we test the hypothesis that sex differences in learning are related to sex differences in dispersal. As range expansion should disfavour slow, error-prone learning strategies, we expect male and female grackles to differ across at least two reinforcement learning behaviours: speed and choice-option switches. Specifically, as documented in our preregistration (see Supplementary file 1), if learning and dispersal relate, we expect male—versus female—grackles: (predictions 1 and 2) to be faster to, firstly, learn a novel colour-reward pairing, and secondly, reverse their colour preference when the colour-reward pairing is swapped; and (prediction 3) to make fewer choice-option switches during their colour-reward learning. Finally, we further expect (prediction 4) such sex-mediated differences in learning to be more pronounced in grackles living at the edge, rather than the intermediate and/or core, region of their range.

Participants and experimental protocol. Thirty-two male and 17 female wild-caught, temporarily-captive great-tailed grackles either inhabiting a core (17 males, 5 females), middle (4 males, 4 females) or edge (11 males, 8 females) population of their North American breeding range (establishment year: 1951, 1996, and 2004, respectively) are participants in the current study (grackle images: Wikimedia Commons). Each grackle is individually tested on a two-phase reinforcement learning paradigm: initial learning, in which two colour-distinct tubes are presented, but only one coloured tube (e.g., dark grey) contains a food reward (F+ versus F-); and reversal learning, in which the stimulus-reward tube-pairings are swapped. The learning criterion is identical in both learning phases: 17 F+ choices out of the last 20 choices, with trial 17 being the earliest a grackle can successfully finish (for details, see Materials and methods).

To comprehensively examine links between sex-biased dispersal and learning in urban-invading grackles, we employ a combination of Bayesian computational and cognitive modelling methods, and both agent-based and evolutionary simulation techniques. Specifically, our paper proceeds as follows: (i) we begin by describing grackles’ reinforcement learning and testing our predictions using multi-level Bayesian Poisson models; (ii) we next ‘unblackbox’ candidate learning mechanisms generating grackles’ reinforcement learning, using a multi-level Bayesian reinforcement learning model; (iii) we then try to replicate our behavioural data via agent-based forward simulations, to determine if our detected learning mechanisms underpin our grackles’ reinforcement learning; (iv) and we conclude by examining the evolutionary implications of variation in these learning mechanisms under urban-like (or not) settings via algorithmic optimisation.

Results

Reinforcement learning behaviour

We observe robust reinforcement learning dynamics across populations (full between- and across-population model outputs in Supplementary file 2). As such, we compare male and female grackles’ reinforcement learning across populations. Both sexes start out as similar learners, finishing initial learning in comparable trial numbers (median trials-to-finish: males, 32; females, 35; Figure 2A and Supplementary file 2a), and with comparable counts of choice-option switches (median switches-at-finish: males, 10.5; females, 15; Figure 2B and Supplementary file 2b). Indeed, the male-female (M-F) posterior contrasts for both behaviours centre around zero, evidencing no sex-effect (Figure 2C). Once reward contingencies reverse, however, male—versus female—grackles finish this ‘relearning’ faster: they take fewer trials (median trials-to-finish: males, 64; females, 81; Figure 2D and Supplementary file 2a), and make fewer choice-option switches (median switches-at-finish: males, 25; females, 35; Figure 2E and Supplementary file 2b). The M-F posterior contrasts, which lie almost entirely below zero, clearly capture this sex-effect (Figure 2F and Supplementary file 2a and b). Environmental unpredictability, then, dependably directs disparate reinforcement learning trajectories between male and female grackles (faster versus slower finishers, respectively), supporting our overall expectation of sex-mediated differential learning in urban-invading grackles.

Reinforcement learning mechanisms

Because (dis)similar behaviour can result from multiple latent processes (McElreath, 2018), we next employ computational methods to delimit reinforcement learning mechanisms. Specifically, we adapt a multi-level Bayesian reinforcement learning model (from Deffner et al., 2020), which we validate a priori via agent-based simulation (see Materials and methods and Supplementary file 1), to estimate the contribution of two core latent learning parameters to grackles’ reinforcement learning: the information-updating rate φ (How rapidly do learners ‘revise’ knowledge?) and the risk-sensitivity rate λ (How strongly do learners ‘weight’ knowledge?). Both learning parameters capture individual-level internal response to incurred reward-payoffs (full mathematical details in Materials and methods). Specifically, as φ approaches 1 (from 0), information-updating increases; as λ approaches ∞ (from 0), risk-sensitivity strengthens. In other words, by formulating our scientific model as a statistical model, we can reverse engineer which values of our learning parameters most likely produce grackles’ choice behaviour—an analytical advantage over less mechanistic methods (McElreath, 2018).

Looking at our reinforcement learning model’s estimates between populations to determine replicability, we observe: in initial learning, the information-updating rate φ of core- and edge-inhabiting male grackles is largely lower than that of female counterparts (M-F posterior contrasts lie more below zero; Figure 2G and Supplementary file 2c), with smaller sample size likely explaining the middle population’s more uncertain estimates (M-F posterior contrasts centre widely around zero; Figure 2G and Supplementary file 2c); while in reversal learning, the information-updating rate φ of both sexes is nearly identical irrespective of population membership, with females dropping to the reduced level of males (M-F posterior contrasts centre closely around zero; Figure 2H and Supplementary file 2c). Therefore, the information-updating rate φ across male and female grackles is initially different (males < females), but converges downwards over reinforcement learning phases (across-population M-F posterior contrasts lie mostly below, and then, tightly bound zero; Figure 2G and H and Supplementary file 2c).

These primary mechanistic findings are, at first glance, perplexing: if male grackles generally outperform female grackles in reversal learning (Figure 2D-F), why do all grackles ultimately update information at matched, dampened pace? This apparent conundrum, however, in fact highlights the potential for multiple latent processes to direct behaviour. Case in point: the risk-sensitivity rate λ is distinctly higher in male grackles, compared to female counterparts, regardless of population membership and learning phase (M-F posterior contrasts lie more, if not mostly, above zero; Figure 2I and J and Supplementary file 2d), outwith the middle population in initial learning likely due to sample size (M-F posterior contrasts centre broadly around zero; Figure 2I and Supplementary file 2d). In other words, choice behaviour in male grackles is more strongly governed by relative differences in predicted reward-payoffs, as spotlighted by across-population M-F posterior contrasts that lie almost entirely above zero (Figure 2I and J and Supplementary file 2d). Thus, these combined mechanistic data reveal, when reward contingencies reverse, male— versus female—grackles ‘relearn’ faster via pronounced reward-payoff sensitivity, a persistence-based risk-sensitive learning strategy.

Grackle reinforcement learning. Behaviour. Across-population learning speed and choice-option switches in (A-B) initial (M, 32; F, 17) and (D-E) reversal learning (M, 29; F, 17), with (C,F) respective posterior estimates and M-F contrasts. Mechanisms. Within- and across-population estimates and contrasts of information-updating rate φ and risk-sensitivity rate λ in (G,I) initial and (H,J) reversal learning. In (G-J) open circles show 100 random posterior draws; red filled circles and vertical lines show posterior means and 89% HPDI, respectively. Simulations. Learning speed and choice-option switches by: 10,000 full posterior-informed ‘birds’ (n = 5,000 per sex) in (K-L) initial and (N-O) reversal learning; and six average posterior-informed ‘birds’ (n = 3 per sex) in (M) initial and (P) reversal learning. In (K,N) the full simulation sample is plotted; in (L,O) open circles show 100 random simulant draws. Note (K,N) x-axes are cut to match (A,D) x-axes. Medians are plotted/labelled in (A,B,D,E,K,L,N,O).

Figure 2—figure supplement 1. Excluding extra learning trials.

Agent-based simulations and replication of reinforcement learning

To determine definitively whether our learning parameters are sufficient to generate grackles’ observed reinforcement learning, we conduct agent-based forward simulations; that is, we simulate ‘birds’ informed by the grackles in our data set. Specifically, whilst maintaining the correlation structure among learning parameters, we randomly assign 5000 ‘males’ and 5000 ‘females’ information-updating rate φ and risk-sensitivity rate λ estimates from the full across-population posterior distribution of our reinforcement learning model, and we track synthetic reinforcement learning trajectories. By comparing these synthetic data to our real data, we gain valuable insight into the learning and choice behaviour implied by our reinforcement learning model results. Specifically, a close mapping between the two data sets would indicate our information-updating rate φ and risk-sensitivity rate λ estimates can account for our grackles’ differential reinforcement learning; whereas a poor mapping would indicate some important mechanism(s) are missing (e.g., Deffner et al., 2020).

Ten thousand synthetic reinforcement learning trajectories, together, compellingly show our ‘birds’ behave just like our grackles: ‘males’ outpace ‘females’ in reversal but not in initial learning (median trials-to-finish initial and reversal learning: ‘males’, 31 and 62; ‘females’, 32 and 79; respectively; Figure 2K and N); and ‘males’ make fewer choice-option switches in reversal but not in initial learning, compared to ‘females’ (median switches-at-finish in initial and reversal learning: ‘males’, 11 and 20; ‘females’, 11 and 29; respectively; Figure 2L and O). Figure 2M and P show, respectively, synthetic initial and reversal learning trajectories by three average ‘males’ and three average ‘females’ (i.e., simulants informed via learning parameter estimates that average over our posterior distribution), for the reader interested in representative individual-level reinforcement learning dynamics. Such quantitative replication confirms that our reinforcement learning model results can account for our behavioural sex-difference data.

Selection and benefit of reinforcement learning mechanisms under urban-like environments

Learning mechanisms in grackles obviously did not evolve to be successful in the current study; instead, they likely reflect selection pressures and/or adaptive phenotypic plasticity imposed by urban environments (Blackburn et al., 2009; Sol et al., 2013; Lee and Thornton, 2021; Vinton et al., 2022; Caspi et al., 2022). Applying an evolutionary algorithm model (Figure 3A), we conclude by examining how urban environments might favour different information-updating rate φ and risk-sensitivity rate λ values, by estimating optimal learning strategies in settings that differ along two key ecological axes: environmental stability u (How often does optimal behaviour change?) and environmental stochasticity s (How often does optimal behaviour fail to pay off?). Urban environments are generally characterised as both stable (lower u) and stochastic (higher s): more specifically, urbanisation routinely leads to stabilised biotic structure, including predation pressure, thermal habitat, and resource availability, and to enhanced abiotic disruption, such as anthropogenic noise and light pollution (reviews in Shochat et al., 2006; Francis and Barber, 2013; Gaston et al., 2013). Seasonal survey data from (sub)urban British neighbourhoods show, for example, 40-75% of households provide supplemental feeding resources for birds (e.g., seed, bread, and peanuts; Cowie and Hinsley, 1988; Davies et al., 2009), the density of which can positively relate to avian abundance within an urban area (Fuller et al., 2008). But such supplemental feeding opportunities are necessarily traded off against increased vigilance due to unpredictable predator-like anthropogenic disturbances (e.g., automobile and airplane traffic; as outlined in Frid and Dill, 2002).

Strikingly, under characteristically urban-like (i.e., stable but stochastic) conditions, our evolutionary model shows the learning parameter constellation robustly exhibited by male grackles in our study—that is, low information-updating rate φ and high risk-sensitivity rate λ—should be favoured by natural selection (darker and lighter squares in the left and right plots of Figure 3B, respectively). These results imply, in urban and other statistically similar environments, learners benefit by averaging over prior experience (i.e., gradually updating ‘beliefs’), and by informing behaviour based on this experiential history (i.e., proceeding with ‘caution’), highlighting the adaptive value of strategising risk-sensitive learning in urban-like environments.

Evolutionary optimality of strategising risk-sensitive learning. (A) Illustration of our evolutionary algorithm model to estimate optimal learning parameters that evolve under systematically varied pairings of two key (urban) ecology axes: environmental stability u and environmental stochasticity s. Specifically, 300-member populations run for 10 independent 7000-generation simulations per pairing, using ‘roulette wheel’ selection (parents are chosen for reproduction with a probability proportional to collected F+ rewards out of 1000 choices) and random mutation (offspring inherit learning genotypes with a small deviation in random direction). (B) Mean optimal learning parameter values discovered by our evolutionary model (averaged over the last 5000 generations). As the statistical environment becomes more urban-like (lower u and higher s values), selection should favour lower information-updating rate φ and higher risk-sensitivity rate λ (darker and lighter squares in the left and right plot, respectively). We note arrows are intended as illustrative aids and do not correspond to a linear scale of ‘urbanness’.

Discussion

Mapping a full pathway from behaviour to mechanisms through to selection and adaptation, we show risk-sensitive learning is a viable strategy to help explain how male grackles—the dispersing sex—currently lead their species’ remarkable North American urban invasion. Specifically, in wild-caught, temporarily-captive core-, middle- or edge-range grackles, we show: (i) irrespective of population membership, male grackles outperform female counterparts on stimulus-reward reversal reinforcement learning, finishing faster and making fewer choice-option switches; (ii) they achieve their speedier reversal learning performance via pronounced reward-payoff sensitivity (low φ and high λ), as ‘unblackboxed’ by our mechanistic model; (iii) these learning mechanisms are sufficient to explain our sex-difference behavioural data, as we replicate our results using agent-based forward simulations; and (iv) risk-sensitive learning—i.e., low φ and high λ—appears advantageous in characteristically urban-like environments (stable but stochastic settings), according to our evolutionary model. These results set the scene for future comparative research.

The term ‘behavioural flexibility’—broadly defined as some ‘attribute’, ‘cognition’, ‘characteristic’, ‘feature’, ‘trait’ and/or ‘quality’ that enables animals to adapt behaviour to changing circumstances (Coppens et al., 2010; Audet and Lefebvre, 2017; Barrett et al., 2019; Lea et al., 2020)—has previously been hypothesised to explain invasion success (Wright et al., 2010), including that of grackles (Summers et al., 2023). But as eloquently argued elsewhere (Audet and Lefebvre, 2017), this term is conceptually uninformative, given the many ways in which it is applied and assessed. Of these approaches, reversal learning and serial—multiple back-to-back—reversal learning tasks are the most common experimental assays of behavioural flexibility (non-exhaustive examples of each assay in bees: Strang and Sherry, 2014; Raine and Chittka, 2012; birds: Bond et al., 2007; Morand-Ferron et al., 2022; fish: Lucon-Xiccato and Bisazza, 2014; Bensky and Bell, 2020; frogs: Liu et al., 2016; Burmeister, 2022; reptiles: Batabyal and Thaker, 2019; Gaalema, 2011; primates: Cantwell et al., 2022; Lacreuse et al., 2018; and rodents: Rochais et al., 2021; Boulougouris et al., 2007). We have shown, however, at least for our grackles, that faster reversal learning is governed primarily by pronounced reward-payoff sensitivity, so: firstly, these go-to experimental assays do not necessarily measure the unit they claim to measure (a point similarly highlighted in Aljadeff and Lotem, 2021); and secondly, formal models based on the false premise that variation in learning speed relates to variation in behavioural flexibility require reassessment (Lea et al., 2020; Blaisdell et al., 2021; Logan et al., 2022b; Lukas et al., 2023; Logan et al., 2023a,c). Heeding previous calls (Dukas, 1998; McNamara and Houston, 2009; Fawcett et al., 2013), our study provides an analytical solution to facilitate productive research on proximate and ultimate explanations of seemingly flexible (or not) behaviour: we publicly provide step-by-step code to examine individual decision making, two core underlying learning mechanisms, and their theoretical selection and benefit (see https://github.com/alexisbreen/Sex-differences-in-grackles-learning), which can be tailored to specific research questions. The reinforcement learning model, for example, generalises, in theory, to a variety of choice-option paradigms (Barrett, 2022), and these learning models can be extended to estimate asocial and social influence on individual decision making (e.g., McElreath et al., 2005; Aplin et al., 2017; Barrett et al., 2017; Deffner et al., 2020; Chimento et al., 2022), facilitating insight into the multi-faceted feedback process between individual cognition and social systems (Trump et al., 2023). Our open-access analytical resource thus allows researchers to dispense with the umbrella term behavioural flexibility, and to biologically inform and interpret their science—only then can we begin to meaningfully examine the functional basis of behavioural variation across taxa and/or contexts.

Ideas and speculation

Related to this final point, it is useful to outline how additional drivers outwith sex-biased risk-sensitive learning might also contribute towards urban invasion success in grackles. Grackles exhibit a polygynous mating system, with territorial males attracting multiple female nesters (Johnson et al., 2000). Recent learning ‘style’ simulations show the sex with high reproductive skew approaches pure individual learning, while the other sex approaches pure social learning (Smolla et al., 2019). During population establishment, then, later-arriving female grackles could rely heavily on vetted information provided by male grackles on ‘what to do’ (Wright et al., 2010), as both sexes ultimately face the same urban-related challenges. Moreover, risk-sensitive learning in male grackles should help reduce the elevated risk associated with any skew towards acquiring knowledge through individual learning. And, as males are the dispersing sex, this process would operate independently of their proximity to a range front—a pattern suggestively supported by our mechanistic data (i.e., risk-sensitivity: males > females; Figure 2I and J). As such, future research on potential sex differences in social learning propensity in grackles seems particularly prudent, alongside systematic surveying of population-level environmental and fitness components across spatially (dis)similar populations; for this, our annotated and readily available analytical approach should prove useful, as highlighted above.

The lack of spatial replicates in the existing data set used herein inherently poses limitations on inference. But it is worth noting that phenotypic filtering by invasion stage is not a compulsory signature of successful (urban) invasion; instead, phenotypic plasticity and/or inherent species trait(s) may be facilitators (Blackburn et al., 2009; Sol et al., 2013; Lee and Thornton, 2021; Vinton et al., 2022; Caspi et al., 2022). For urban-invading grackles, both of these biological explanations seem strongly plausible, given: firstly, grackles’ highly plastic foraging and nesting ecology (Selander and Giller, 1961; Davis and Arnold, 1972; Wehtje, 2003); secondly, grackles’ apparent historic and current realised niche being—albeit in present day more variable—urban environments, a consistent habitat preference that cannot be explained by changes in habitat availability or habitat connectivity (Summers et al., 2023); and finally, our combined behavioural, mechanistic, and evolutionary modelling results showing environments approaching grackles’ general species niche—urban environments—select for particular traits that exist across grackle populations (here, sex-biased risk-sensitive learning). Admittedly, our evolutionary model is not a complete representation of urban ecology dynamics. Relevant factors—e.g., spatial dynamics and realistic life histories—are left out. These omissions are tactical ones. Our evolutionary model solely focuses on the response of reinforcement learning parameters to two core urban-like (or not) environmental statistics, providing a baseline for future study to build on; for example, it would be interesting to investigate such selection on learning parameters of ‘true’ invaders and not their descendants, a logistically tricky but nonetheless feasible research possibility (e.g., Duckworth and Badyaev, 2007).

Conclusions

Our study reveals robust interactive links between the dispersing sex and risk-sensitive learning in an urban invader (grackles); these fully replicable insights, coupled with our finding that urban-like environments favour pronounced risk-sensitivity, imply risk-sensitive learning is a winning strategy for urban-invasion leaders. Our modelling methods, which we document in-depth and make freely available, can now be comparatively applied, establishing a biologically meaningful analytical approach for much-needed study on (shared or divergent) drivers of geographic and phenotypic distributions (Somveille et al., 2018; Bro-Jørgensen et al., 2019; Lee and Thornton, 2021; Breen et al., 2021; Breen, 2021; Deffner et al., 2022).

Materials and methods

Data provenance

The current study uses data from two types of sources: publicly archived data at the Knowledge Network for Biocomplexity (Logan, 2016c; Logan et al., 2022a); or private advance access, granted to A.J.B., to (at the time) unpublished data by Grackle Project principal investigator Corina Logan, who declined participation in this study. We note these shared data are now also available at the Knowledge Network for Biocomplexity (Logan et al., 2023b).

Data contents

The data used herein chart colour-reward reinforcement learning performance from 32 male and 17 female wild-caught, temporarily-captive grackles inhabiting one of three study sites that differ in their range-expansion demographics; that is, defined as a core, middle or edge population (based on time-since-settlement population growth dynamics, as outlined in Chuang and Peterson, 2016). Specifically: (i) Tempe, Arizona (17 males and 5 females)—herein, the core population (estimated to be breeding since 1951, by adding the average time between first sighting and first breeding to the year first sighted; Wehtje, 2003, 2004); (ii) Santa Barbara, California (4 males and 4 females)—herein, the middle population (known to be breeding since 1996; Lehman, 2020); and (iii) Greater Sacramento, California (11 males and 8 females)—herein, the edge population (known to be breeding since 2004; Hampton, 2004).

Experimental protocol

Below we detail the protocol for the colour-reward reinforcement learning test that we analysed herein.

Reinforcement learning test

The reinforcement learning test consists of two experimental phases (Figure 1): (i) stimulus-reward initial learning and (ii) stimulus-reward reversal learning. In both experimental phases, two different coloured tubes are used: for Santa Barbara grackles, gold and grey; for all other grackles, light and dark grey. Each tube consists of an outer and inner diameter of 26 mm and 19 mm, respectively; and each is mounted to two pieces of plywood attached at a right angle (entire apparatus: 50 mm wide × 50 mm tall × 67 mm deep); thus resulting in only one end of each coloured tube being accessible (Figure 1).

In initial learning, grackles are required to learn that only one of the two coloured tubes contains a food reward (e.g., dark grey); this colour-reward pairing is counterbalanced across grackles within each study site. Specifically, the rewarded and unrewarded coloured tubes are placed—either on a table or on the floor—in the centre of the aviary run (distance apart: table, 2 feet; floor, 3 feet), with the open tube-ends facing, and perpendicular to, their respective aviary side-wall. Which coloured tube is placed on which side of the aviary run (left or right) is pseudorandomised across trials. A trial begins at tube-placement, and ends when a grackle has either made a tube-choice or the maximum trial time has elapsed (eight minutes). A tube-choice is defined as a grackle bending down to examine the contents (or lack thereof) of a tube. If the chosen tube contains food, the grackle is allowed to retrieve and eat the food, before both tubes are removed and the rewarded coloured tube is rebaited out of sight (for the grackle). If a chosen tube does not contain food, both tubes are immediately removed. Each grackle is given, first, up to three minutes to make a tube-choice, after which a piece of food is placed equidistant between the tubes to entice participation; and then, if no choice has been made, an additional five minutes maximum, before both tubes are removed. All trials are recorded as either correct (choosing the rewarded coloured tube), incorrect (choosing the unrewarded coloured tube), or incomplete (no choice made). To successfully finish initial learning, a grackle must meet the learning criterion, detailed below.

In reversal learning, grackles are required to learn that the colour-reward pairing has been swapped; that is, the previously unrewarded coloured tube (e.g., light grey) now contains a food reward (Figure 1). The protocol for this second and final experimental phase is identical to that of initial learning, described above.

Reinforcement learning criterion

For all grackles in the current study, we apply the following learning criterion: to successfully finish their respective learning phase, grackles must make a correct choice in 17 of the most recent 20 trials. Therefore, the earliest a grackle can successfully finish initial or reversal learning in the current study is at trial 17. This learning criterion is the most compatible with the previous learning criteria used by the original experimenters. Specifically, Logan (Logan, 2016c) and Logan et al. (Logan et al., 2022a) used a fixed-window learning criterion for core- and middle-population grackles, in which grackles were required to make 17 out of the last 20 choices correctly, with a minimum of eight and nine correct choices across the last two sets of 10 trials, assessed at the end of each set. If a core- or middle-population grackle successfully satisfied the fixed-window learning criterion, the grackle was assigned by Logan or colleagues the final trial number for that set (e.g., 20, 30, 40), which is problematic because this trial did not always coincide with the true passing trial (by a maximum of two additional trials; see below).

For edge-population grackles, Logan and colleagues (Logan et al., 2023b) used a sliding-window learning criterion, in which grackles were required to again make 17 out of the last 20 choices correctly, with the same minimum correct-choice counts for the previous two 10-trial sets, except that this criterion was assessed at every trial (from 20 onward) rather than at the end of discrete sets. This second method is also problematic because a grackle can successfully reach criterion via a shift in the sliding window before making a choice. For example, a grackle could make three wrong choices followed by 17 correct choices (i.e., 7/10 correct and 10/10 correct in the last two sets of 10 trials), and at the start of the next trial, the grackle will reach criterion because the summed choices now consist of 8/10 correct and at least 9/10 correct in the last two sets of 10 trials no matter their subsequent choice—see initial learning performance by bird ‘Kel’ for a real example (row 1816 in https://github.com/alexisbreen/Sex-differences-in-grackles-learning; as well as in Logan et al., 2023b). Moreover, the use of different learning criteria (fixed- and sliding-window) by Logan and colleagues in different populations represents a confound when populations are compared. Thus, our applied 17/20 learning criterion ensures our assessment of grackles’ reinforcement learning is informative, straightforward, and consistent.

As a consequence of applying our 17/20 learning criterion, grackles can remain in initial and/or reversal learning beyond reaching criterion. These extra learning trials, however, already exist for some core- and middle-population grackles originally assessed via the fixed-window learning criterion (N = 18 in initial [range: 1-2 extra trials]; N = 13 in reversal [range: 1-2 extra trials]), as explained above. And our cleaning of the original data (see our Data_Processing.R script at https://github.com/alexisbreen/Sex-differences-in-grackles-learning) detected additional cases where grackles remained in-test despite meeting the applied criterion (fixed-window: N = 1 in reversal for 11 extra trials; sliding-window: N = 11 in initial [range: 1-10 extra trials]; N = 7 in reversal [range: 1-14 extra trials]), presumably due to experimenter oversight. Similarly, our data cleaning detected four birds in the core population that did not in fact meet the fixed-window learning criterion because of incorrect trial numbers entered by the original experimenters (e.g., skipping trial 24). Moreover, our data cleaning detected two birds in the middle population that were passed by the original experimenters despite not meeting the assigned fixed-window learning criterion; instead, both made 7/10 and 10/10 correct choices in the last two sets of 10 trials. We note these data issues, as well as the problematic nature of both the fixed- and sliding-window learning criterion, continue to be unaddressed in work by Logan (et al.) (Logan, 2016b,a; Logan et al., 2022b, 2023a,c). In any case, in our study we: (i) verified that our 17/20 learning criterion results in a similar proportion of male and female grackles experiencing extra initial learning trials (females, 15/17; males, 30/32); and (ii) confirmed that our learning parameter estimates during initial learning remain relatively unchanged irrespective of whether we exclude or include extra initial learning trials (Figure 2—figure supplement 1). Thus, we are confident that any carry-over effect of extra initial learning trials on grackles’ reversal learning in our study is negligible if not nonexistent, and we therefore excluded extra learning trials.
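For illustration, the applied criterion can be assessed at every trial with a simple rolling count; the following minimal R sketch (using a hypothetical binary choice vector, and not taken from our processing script) shows the logic:

```r
# Minimal sketch: find the earliest trial at which a grackle meets the
# 17-correct-of-the-most-recent-20 criterion, assessed at every trial.
# 'choices' is a hypothetical binary vector (1 = correct, 0 = incorrect).
finish_trial <- function(choices, n_correct = 17, window = 20) {
  for (t in seq_along(choices)) {
    recent <- choices[max(1, t - window + 1):t]  # at most the 20 most recent choices
    if (sum(recent) >= n_correct) return(t)      # criterion met at trial t (earliest: trial 17)
  }
  NA_integer_                                    # criterion never met
}

# Example: 3 incorrect choices followed by 17 correct choices meet the criterion at trial 20
finish_trial(c(rep(0, 3), rep(1, 17)))
```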

Statistical analyses

We analysed, processed, and visually presented our data using, respectively, the ‘rstan’ (Stan Development Team, 2020), ‘rethinking’ (McElreath, 2018), and ‘tidyverse’ (Wickham et al., 2019) packages in R (R Core Team, 2021). We note our reproducible code is available at https://github.com/alexisbreen/Sex-differences-in-grackles-learning. We further note our reinforcement learning model, defined below, does not exclude cases—two males in the core, and one male in the middle population—where a grackle was dropped (due to time constraints) early on from reversal learning by the original experimenters, because individual-level φ and λ estimates can still be obtained irrespective of trial number; the certainty around the estimates will simply be wider (McElreath, 2018). Our Poisson models, however, do exclude these three cases for our modelling of reversal learning, to keep estimation conservative. The full output from each of our models, which use weakly informative and conservative priors, is available in Supplementary file 2, including posterior means and 89% highest posterior density intervals (HPDI) (McElreath, 2018).

Poisson models

For our behavioural assay of reinforcement learning finishing trajectories, we used a multi-level Bayesian Poisson regression to quantify the effect(s) of sex and learning phase (initial versus reversal) on grackles’ recorded number of trials to successfully finish each phase. This model was performed at both the population and across-population level, and accounted for individual differences among birds through the inclusion of individual-specific varying (i.e., random) effects.

For our behavioural assay of reinforcement learning choice-option switching, we used an identical Poisson model to that described above, to predict the total number of switches between the rewarded and unrewarded coloured tubes.
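For orientation, a minimal sketch of such a model in the ‘rethinking’ package might look as follows; the data, column names, index coding, and priors are illustrative placeholders rather than our exact specification (which is available in our repository):

```r
library(rethinking)

# Hypothetical data: one row per bird and learning phase (illustrative values only)
d <- list(
  trials    = c(32, 64, 35, 81),    # trials-to-finish
  sex_phase = c(3L, 4L, 1L, 2L),    # 1 = F/initial, 2 = F/reversal, 3 = M/initial, 4 = M/reversal
  bird      = c(1L, 1L, 2L, 2L)     # bird identity for the varying effect
)

# Minimal sketch of the trials-to-finish model: a Poisson outcome with one rate
# per sex-by-phase combination plus bird-specific varying intercepts
m_speed <- ulam(
  alist(
    trials ~ dpois(mu),
    log(mu) <- a[sex_phase] + a_bird[bird],  # sex/phase means + varying bird effects
    a[sex_phase] ~ dnorm(3, 1),              # weakly informative prior on the log scale
    a_bird[bird] ~ dnorm(0, sigma_bird),
    sigma_bird ~ dexp(1)
  ),
  data = d, chains = 4, cores = 4
)

# Male-female posterior contrast on the trial scale (initial learning),
# assuming the hypothetical index coding above
post <- extract.samples(m_speed)
contrast_initial <- exp(post$a[, 3]) - exp(post$a[, 1])
```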

Reinforcement learning model

We employed an adapted (from Deffner et al., 2020) multi-level Bayesian reinforcement learning model, to examine the influence of sex on grackles’ initial and reversal learning. Our reinforcement learning model, defined below, allows us to link observed coloured tube-choices to latent individual-level attraction updating, and to translate the influence of latent attractions (i.e., expected payoffs) into individual tube-choice probabilities. As introduced above, we can reverse engineer which values of our two latent learning parameters—the information-updating rate φ and the risk-sensitivity rate λ—most likely produce grackles’ choice behaviour, by formulating our scientific model as a statistical model. Therefore, this computational method facilitates mechanistic insight into how multiple latent learning parameters simultaneously guide grackles’ reinforcement learning (McElreath, 2018).

Our reinforcement learning model consists of two equations:
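In the notation described below (with A denoting attraction and π the experienced reward-payoff), and following the formulation of Deffner et al. (2020), these equations take the form:

A_{i,j,t+1} = (1 − φ_{k,l}) · A_{i,j,t} + φ_{k,l} · π_{i,j,t}    (1)

P(i)_{j,t+1} = exp(λ_{k,l} · A_{i,j,t+1}) / Σ_{m} exp(λ_{k,l} · A_{m,j,t+1})    (2)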

Equation (1) expresses how attraction A to choice-option i changes for an individual j across time (t + 1) based on their prior attraction to that choice-option (A_{i,j,t}) plus their recently experienced choice reward-payoffs (π_{i,j,t}), whilst accounting for the relative influence of recent reward-payoffs (φ_{k,l}). As φ_{k,l} increases in value, so, too, does the rate of individual-level attraction updating based on reward-payoffs. Here, then, φ_{k,l} represents the information-updating rate. We highlight that the k, l indexing (here and elsewhere) denotes that we estimate separate φ parameters for each population (k = 1 for core; k = 2 for middle; k = 3 for edge) and for each sex-and-learning-phase combination (l = 1 for females/initial; l = 2 for females/reversal; l = 3 for males/initial; l = 4 for males/reversal).

Equation (2) is a softmax function that expresses the probability P that choice-option i is selected in the next choice-round (t + 1) as a function of the attractions A and the parameter λ_{k,l}, which governs how much relative differences in attraction scores guide individual choice behaviour. In the reinforcement learning literature, the λ parameter is known by several names—for example, ‘inverse temperature’, ‘exploration’ or ‘risk-appetite’ (Sutton and Barto, 2018; Chimento et al., 2022)—since the higher its value, the more deterministic the choice behaviour of an individual becomes (note λ = 0 generates random choice). In line with foraging theory (Stephens and Krebs, 2019), we call λ the risk-sensitivity rate, where higher values of λ imply foragers are more sensitive to risk, seeking higher expected payoffs based on their prior experience, instead of randomly sampling alternative options.
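As a brief numerical illustration (with hypothetical attraction scores), suppose an individual’s attractions to the two coloured tubes are A_1 = 0.6 and A_2 = 0.4. With λ = 1, Equation (2) gives P(1) = exp(0.6)/(exp(0.6) + exp(0.4)) ≈ 0.55—close to random choice; with λ = 10, P(1) = exp(6)/(exp(6) + exp(4)) ≈ 0.88—strongly favouring the higher-payoff option. The same attraction difference thus translates into very different choice behaviour depending on λ.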

From the above reinforcement learning model, then, we generate inferences about the effect of sex on φ_{k,l} and λ_{k,l} from at least 1000 effective samples of the posterior distribution, at both the population- and across-population-level. We note our reinforcement learning model also includes bird as a random effect (to account for repeated measures within individuals); however, for clarity, this parameter is omitted from our equations (but not our code: https://github.com/alexisbreen/Sex-differences-in-grackles-learning). Our reinforcement learning model does not, on the other hand, include trials where a grackle did not make a tube-choice, as this measure cannot clearly speak to individual learning—for example, satiation rather than any learning of ‘appropriate’ colour tube-choice could be invoked as an explanation in such cases. Indeed, there are, admittedly, a number of intrinsic and extrinsic factors (e.g., temperament and temperature, respectively) that might bias grackles’ tube-choice behaviour, and, in turn, the output from our reinforcement learning model (Webster and Rutz, 2020). But the aim of such models is not to replicate the entire study system. Finally, we further note, while we exclude extra learning trials from all of our analyses (see above), our reinforcement learning model initiates estimation of φ and λ during reversal learning based on individual-level attractions encompassing all previous choices. This parameterisation ensures we precisely capture grackles’ attraction scores up to the point of stimulus-reward reversal (for details, see our RL_Execution.R script at https://github.com/alexisbreen/Sex-differences-in-grackles-learning).

Agent-based simulations: pre- and post-study

Prior to analysing our data, we used agent-based simulations to validate our reinforcement learning model (full details in our preregistration—see Supplementary file 1). In brief, the tube-choice behaviour of simulants is governed by a set of rules identical to those defined by equations (1) and (2), and we apply the exact same learning criterion for successfully finishing both learning phases. Crucially, this a priori model vetting verifies our reinforcement learning model can (i) detect simulated sex effects and (ii) accurately recover simulated parameter values in both extreme and more realistic scenarios.

After model fitting, we used the same agent-based approach to forward simulate—that is, simulate via the posterior distribution—synthetic learning trajectories by ‘birds’, using individual-level parameter estimates generated from our across-population reinforcement learning model. Specifically, maintaining the correlation structure among sex- and phase-specific learning parameters, we draw samples from the full or averaged random-effects multivariate normal distribution describing the inferred population of grackles. We use these post-study forward simulations to gain a better understanding of the implied consequences of the estimated sex differences in grackles’ learning parameters (see Figure 2 and associated main text; for an example of this approach in a different context, see Deffner et al., 2020).
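To illustrate the mechanics of these forward simulations, the following minimal R sketch runs a single synthetic ‘bird’ through initial learning under equations (1) and (2); the parameter values and starting attractions are illustrative only, whereas our actual simulations draw parameter values from the posterior (preserving their correlation structure) and continue into reversal learning with attractions carried over:

```r
# Minimal sketch of one synthetic 'bird' (illustrative values; not our simulation code)
simulate_bird <- function(phi, lambda, max_trials = 300, n_correct = 17, window = 20) {
  A <- c(0.1, 0.1)       # illustrative starting attractions to the two coloured tubes
  rewarded <- 1          # option 1 holds the food reward during initial learning
  correct <- integer(0)
  for (t in seq_len(max_trials)) {
    p <- exp(lambda * A) / sum(exp(lambda * A))        # Equation (2): softmax choice probabilities
    choice <- sample(1:2, size = 1, prob = p)
    payoff <- as.integer(choice == rewarded)
    A[choice] <- (1 - phi) * A[choice] + phi * payoff  # Equation (1): update the chosen option
    correct <- c(correct, payoff)
    recent <- tail(correct, window)
    if (length(recent) >= n_correct && sum(recent) >= n_correct) return(t)  # 17/20 criterion met
  }
  NA_integer_  # criterion not met within max_trials
}

set.seed(2023)
simulate_bird(phi = 0.03, lambda = 4)  # trials-to-finish for one illustrative parameter pair
```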

Evolutionary model

To investigate the evolutionary significance of strategising risk-sensitive learning, we used algorithmic optimisation techniques (Yu and Gen, 2010; Otto and Day, 2011). Specifically, we construct an evolutionary model of grackle learning, to estimate how our learning parameters—the information-updating rate φ and the risk-sensitivity rate λ—evolve in environments that systematically vary across two ecologically relevant (see main text) statistical properties: the rate of environmental stability u and the rate of environmental stochasticity s. The environmental stability parameter u represents the probability that behaviour leading to a food reward changes from one choice to the next. If u is small, individuals encounter a world where they can expect the same behaviour to be adaptive for a relatively long time. As u becomes larger, optimal behaviour can change multiple times within an individual’s lifetime. The environmental stochasticity parameter s describes the probability that, on any given day, optimal behaviour may not result in a food reward due to external causes specific to this day. If s is small, optimal behaviour reliably produces rewards. As s becomes larger, there is more and more daily ‘noise’ regarding which behaviour is rewarded.

We consider a population of fixed size with N = 300 individuals. Each generation, individual agents are born naïve and make t = 1000 binary foraging decisions resulting in a food reward (or not). Agents decide and learn about the world through reinforcement learning governed by their individual learning parameters, φ and λ (see equations (1) and (2)). Both learning parameters can vary continuously, corresponding to the infinite-alleles model from population genetics (Otto and Day, 2011). Over the course of their lifetime, agents collect food rewards, and the sum of rewards collected over the last 800 foraging decisions (or ‘days’) determines their individual fitness. We ignore the first 200 choices because selection should respond to the steady state of the environment, independently of initial conditions (Otto and Day, 2011).

To generate the next generation, we assume asexual, haploid reproduction, and use fitness-proportionate (or ‘roulette wheel’) selection to choose individuals for reproduction (Yu and Gen, 2010; Otto and Day, 2011). Here, juveniles inherit both learning parameters, φ and λ, from their parent but with a small deviation (in random direction) due to mutation. Specifically, during each mutation event, a value drawn from the zero-centred normal distribution N(0, μ_φ) or N(0, μ_λ) is added to the parent value on the logit-/log-scale to ensure parameters remain within allowed limits (between 0 and 1 for φ; positive for λ). The mutation parameters μ_φ and μ_λ thus describe how much offspring values might deviate from parental values, which we set to 0.05. We restrict the risk-sensitivity rate λ to the interval 0 to 15, because greater values result in identical choice behaviour. All results reported in the main text are averaged over the last 5000 generations of 10 independent 7000-generation simulations per parameter combination. This duration is sufficient to reach steady state in all cases.
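The following minimal R sketch (with illustrative settings and simplified bookkeeping, not our full simulation code) outlines one generation of this evolutionary algorithm: agents learn and collect rewards under given u and s values via equations (1) and (2), parents are then sampled with probability proportional to fitness, and offspring inherit mutated learning parameters:

```r
# Minimal sketch of one generation (illustrative settings; not our full simulation code).
# The environment switches the rewarded option with probability u, and optimal
# behaviour fails to pay off with probability s.
run_generation <- function(phi, lambda, u, s, t_life = 1000, t_burn = 200) {
  N <- length(phi)
  fitness <- numeric(N)
  for (n in 1:N) {
    A <- c(0.1, 0.1); rewarded <- 1; reward_sum <- 0
    for (t in 1:t_life) {
      if (runif(1) < u) rewarded <- 3 - rewarded            # optimal behaviour changes
      p <- exp(lambda[n] * A) / sum(exp(lambda[n] * A))     # Equation (2)
      choice <- sample(1:2, 1, prob = p)
      payoff <- as.integer(choice == rewarded && runif(1) >= s)
      A[choice] <- (1 - phi[n]) * A[choice] + phi[n] * payoff  # Equation (1)
      if (t > t_burn) reward_sum <- reward_sum + payoff     # fitness ignores the first 200 choices
    }
    fitness[n] <- reward_sum
  }
  # Roulette-wheel selection: reproduction probability proportional to fitness
  parents <- sample(1:N, N, replace = TRUE, prob = fitness + 1e-6)
  # Mutation on the logit/log scale keeps phi in (0, 1) and lambda positive (capped at 15)
  phi_new    <- plogis(qlogis(phi[parents]) + rnorm(N, 0, 0.05))
  lambda_new <- pmin(exp(log(lambda[parents]) + rnorm(N, 0, 0.05)), 15)
  list(phi = phi_new, lambda = lambda_new)
}

# Example: a few generations of a small population under urban-like settings (low u, high s)
set.seed(1)
pop <- list(phi = runif(20, 0.01, 0.9), lambda = runif(20, 0.5, 10))
for (g in 1:3) pop <- run_generation(pop$phi, pop$lambda, u = 0.01, s = 0.4)
```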

In summary, our evolutionary model is a necessary and useful first step towards addressing targeted research questions about the interplay between learning phenotype and environmental characteristics.

Acknowledgements

We thank Jean-François Gerard and Rachel Harrison for useful feedback on our study before data analyses; and James St Clair, Sue Healy, and Richard McElreath for similarly useful presubmission feedback. We are further, and most, grateful to Richard McElreath for his full support of the study overall. And we thank all members, past and present, of the Grackle Project for collecting and making available, either via public archive (core and middle population) or permitted advance access (edge population; see Data provenance), the data analysed herein. We note this material uses illustrations from Vecteezy.com. Finally, we further note this material uses data from the eBird Status and Trends Project at the Cornell Lab of Ornithology, eBird.org. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Cornell Lab of Ornithology.

Additional information

Funding

A.J.B. and D.D. received no independent funding for this research.

Author contributions

A.J.B. conceived of the study; collated, cleaned, and curated all target data; D.D. led model and simulation building, with input from A.J.B.; A.J.B. and D.D. contributed equally to mechanistic modelling and agent-based forward simulations; D.D. performed the Poisson and the evolutionary modelling; A.J.B. prepared all figures and tables, with help on Figure 3 from D.D.; A.J.B. and D.D. together annotated all open-source material; A.J.B. wrote the manuscript, with constructive contributions and revisions by D.D.; A.J.B. and D.D. agree on and approve the final draft of the manuscript.

Author ORCIDs

Alexis J Breen https://orcid.org/0000-0002-2331-0920

Dominik Deffner https://orcid.org/0000-0002-1649-3861

Additional files

Supplementary files

  • Supplementary file 1. Study preregistration, including reinforcement learning model validation.

  • Supplementary file 2. Supplementary tables. (a) Total-trials-in-test Poisson regression model output. (b) Total-choice-option-switches-in-test Poisson regression model output. (c) Bayesian reinforcement learning model information-updating rate φ output. (d) Bayesian reinforcement learning model risk-sensitivity rate λ output. For (a-c), both between- and across-population posterior means and corresponding 89% highest-posterior density intervals are reported for males, females, and male-female contrasts.

Reinforcement learning speed.

Between- and across-population total-trials-in-test Poisson regression model estimates and male-female contrasts, with corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.

Reinforcement learning switches.

Between- and across-population total-choice-option-switches-in-test Poisson regression model estimates and male-female contrasts, with corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.

Reinforcement learning information-updating rate φ.

Between- and across-population computational model φ estimates and male-female contrasts, with posterior means and corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.

Reinforcement learning risk-sensitivity rate λ.

Between- and across-population computational model λ estimates and male-female contrasts, with posterior means and corresponding lower (L) and upper (U) 89% highest-posterior density intervals in parentheses.

Investigating sex differences in learning in a range-expanding bird

Alexis J. Breen1,* & Dominik Deffner1,2,3

1Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany

2Science of Intelligence Excellence Cluster, Technical University Berlin, Berlin 10623, Germany

3Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin 14195, Germany

*alexis_breen@eva.mpg.de

Abstract

How might differences in dispersal and learning interact in range expansion dynamics? To begin to answer this question, in this preregistration we detail the background, hypothesis plus associated predictions, and methods of our proposed study, including the development and validation of a mechanistic reinforcement learning model, which we aim to use to assay colour-reward reinforcement learning (and the influence of two candidate latent parameters—speed and sampling rate—on this learning) in great-tailed grackles—a species undergoing rapid range expansion, where males disperse.

Introduction

Dispersal and range expansion go ‘hand in hand’; movement by individuals away from a population’s core is a pivotal precondition of witnessed growth in species’ geographic limits (Chuang & Peterson, 2016; Ronce, 2007). Because ‘who’ disperses—in terms of sex—varies both within and across taxa (for example, male-biased dispersal is dominant among fish and mammals, whereas female-biased dispersal is dominant among birds; see Table 1 in Trochet et al., 2016), skewed sex ratios are apt to arise at expanding range fronts, and, in turn, differentially drive invasion dynamics. Female-biased dispersal, for instance, can ‘speed up’ staged invertebrate invasions by increasing offspring production (Miller & Inouye, 2013). Alongside sex-biased dispersal, learning ability is also argued to contribute to species’ colonisation capacity, as novel environments inevitably present novel (foraging, predation, shelter, and social) challenges that newcomers need to surmount in order to settle successfully (Sol et al., 2013; Wright et al., 2010). Indeed, a growing number of studies show support for this supposition (as recently reviewed in Lee & Thornton, 2021). Carefully controlled choice tests, for example, show that urban-dwelling individuals—that is, the ‘invaders’—will both learn and unlearn novel reward-stimulus pairings more rapidly than their rural-dwelling counterparts (Batabyal & Thaker, 2019), suggesting that range expansion selects for enhanced learning ability at the dispersal and/or settlement stage(s). Given the independent influence of sex-biased dispersal and learning ability on range expansion, it is perhaps surprising, then, that their potential interactive influence on this aspect of movement ecology remains unexamined, particularly as interactive links between dispersal and other behavioural traits such as aggression are documented within the range expansion literature (Duckworth, 2006; Gutowsky & Fox, 2011).

That learning ability can covary with, for example, exploration (e.g., Auersperg et al., 2011; Guillette et al., 2011) and neophobia (e.g., Verbeek et al., 1994), two behaviours which may likewise play a role in range expansion (Griffin et al., 2017; Lee & Thornton, 2021), is one potential reason for the knowledge gap introduced above. Such correlations stand to mask what contribution, if any, learning ability lends to range expansion—an undoubtedly daunting research prospect. A second (and not mutually exclusive) reason is that, for many species, a detailed diary of their range expansion is lacking (Blackburn et al., 2009; Udvardy & Papp, 1969). And patchy population records inevitably introduce interpretive ‘noise,’ imaginably impeding population comparisons of learning ability (or the like).

In range-expanding great-tailed grackles (Quiscalus mexicanus), however, learning ability appears to represent a unique source of individual variation; more specifically, temporarily-captive great-tailed grackles’ speed to solve colour-reward reinforcement learning tests does not correlate with measures of their exploration (time spent moving within a novel environment), inhibition (time to reverse a colour-reward preference), motor diversity (number of distinct bill and/or feet movements used in behavioural tests), neophobia (latency to approach a novel object), risk aversion (time spent stationary within a ‘safe spot’ in a novel environment), persistence (number of attempts to engage in behavioural tests), or problem solving (number of test-relevant functional and non-functional object-choices) (Logan, 2016a, 2016b). Moreover, careful combing by researchers of public records, such as regional bird reports and museum collections, means that great-tailed grackle range-expansion data is both comprehensive and readily available (Dinsmore & Dinsmore, 1993; Pandolfino et al., 2009; Wehtje, 2003). Thus, great-tailed grackles offer behavioural ecologists a useful study system to investigate the interplay between life-history strategies, learning ability, and range expansion.

Here, for the first time (to our knowledge), we propose to investigate potential differences in colour-reward reinforcement learning performance between male and female great-tailed grackles (Figure 1), to test the hypothesis that sex differences in learning ability are related to sex differences in dispersal. Since the late nineteenth century, great-tailed grackles have been expanding their range at an unprecedented rate, moving northward from their native range in Central America into the United States (breeding in at least 20 states), with several first-sightings spanning as far north as Canada (Dinsmore & Dinsmore, 1993; Wehtje, 2003). Notably, the record of this range expansion in great-tailed grackles is heavily peppered with first-sightings involving a single or multiple male(s) (Dinsmore & Dinsmore, 1993; Kingery, 1972; Littlefield, 1983; Stepney, 1975; Wehtje, 2003). Moreover, recent genetic data show that, when comparing great-tailed grackles within a population, average relatedness: (i) is higher among females than among males; and (ii) decreases with increasing geographic distance among females; but (iii) is unrelated to geographic distance among males; hence, confirming a role for male-biased dispersal in great-tailed grackles (Sevchik et al., in press). Considering these natural history and genetic data, then, we expect male and female great-tailed grackles to differ across at least two colour-reward reinforcement learning parameters: speed and sampling rate (here, sampling is defined as switching between choice-options). Specifically, we expect male—versus female—great-tailed grackles: (prediction 1 & 2) to be faster to, firstly, learn a novel colour-reward pairing, and secondly, reverse their colour preference when the colour-reward pairing is swapped; and (prediction 3) to be more deterministic—that is, sample less often—in their colour-reward learning; if learning ability and dispersal relate. Indeed, since invading great-tailed grackles face agribusiness-led wildlife management strategies, including the use of chemical crop repellents (Werner et al., 2011, 2015), range expansion should disfavour slow, error-prone learning strategies, resulting in a spatial sorting of learning ability in great-tailed grackles (Wright et al., 2010). Related to this final point, we further expect (prediction 4) such sex differences in learning ability to be more pronounced in great-tailed grackles living at the edge, rather than the intermediate and/or core, region of their range (e.g., Duckworth, 2006).

Left panel: images showing a male and female great-tailed grackle (credit: Wikimedia Commons). Right panel: schematic of the colour-reward reinforcement learning experimental protocol. In the initial learning phase, great-tailed grackles are presented with two colour-distinct tubes; however, only one coloured tube (e.g., dark grey) contains a food reward (F+ versus F-). In the reversal learning phase, the colour-reward tube-pairings are swapped. The passing criterion is identical in both phases (see main text for details).

Methods

Data

This preregistration aims to use colour-reward reinforcement learning data collected (or being collected) in great-tailed grackles across three study sites that differ in their range-expansion demographics; that is, belonging to a core, intermediate, or edge population (based on time-since-settlement population growth dynamics, as outlined in Chuang & Peterson, 2016). Specifically, data will be utilised from: (i) Tempe, Arizona—hereafter, the core population (estimated—by adding the average time between first sighting and first breeding to the year first sighted—to be breeding since 1951) (Walter, 2004; Wehtje, 2003); (ii) Santa Barbara, California—hereafter, the intermediate population (known to be breeding since 1996) (Lehman, 2020); and (iii) Woodland, California—hereafter, the edge population (known to be breeding since 2004) (Hampton, 2001). Data collection at both the Tempe, Arizona and Santa Barbara, California study sites has been completed prior to the submission of this preregistration (total sample size across sites: nine females and 25 males); however, data collection at the Woodland, California study site is ongoing (current sample size: three females and nine males; anticipated minimum total sample size: five females and ten males). Thus, the final data set should contain colour-reward reinforcement learning data from at least 14 female and 35 male great-tailed grackles.

Experimental protocol

General

A step-by-step description of the experimental protocol is reported elsewhere (e.g., Blaisdell et al., 2021). As such, below we detail only the protocol for the colour-reward reinforcement learning tests that we propose to analyse herein.

Colour-reward reinforcement learning tests

The reinforcement learning tests consist of two phases (Figure 1, right panel): (i) colour-reward learning (hereafter, initial learning) and (ii) colour-reward reversal learning (hereafter, reversal learning). In both phases, two different coloured tubes are used: for Santa Barbara great-tailed grackles, gold and grey (Logan, 2016a, 2016b); for all other great-tailed grackles, light and dark grey (Blaisdell et al., 2021). Each tube has an outer diameter of 26 mm and an inner diameter of 19 mm, and each is mounted to two pieces of plywood attached at a right angle (entire apparatus: 50 mm wide × 50 mm tall × 67 mm deep); thus, only one end of each coloured tube is accessible (Figure 1, right panel).

In the initial learning phase, great-tailed grackles are required to learn that only one of the two coloured tubes contains a food reward (e.g., dark grey; this colour-reward pairing is counterbalanced across great-tailed grackles within each study site). Specifically, the rewarded and unrewarded coloured tubes are placed—either on a table or on the floor—in the centre of the aviary run (distance apart: table, 2 ft; floor, 3 ft), with the open tube-ends facing, and perpendicular to, their respective aviary side-wall. Which coloured tube is placed on which side of the aviary run (left or right) is pseudorandomised across trials. A trial begins at tube-placement, and ends when a great-tailed grackle has either made a tube-choice or the maximum trial time has elapsed (eight minutes). A tube-choice is defined as a great-tailed grackle bending down to examine the contents (or lack thereof) of a tube. If the chosen tube contains food, the great-tailed grackle is allowed to retrieve and eat the food, before both tubes are removed and the rewarded coloured tube is rebaited out of sight (for the great-tailed grackle). If a chosen tube does not contain food, both tubes are immediately removed. Each great-tailed grackle is given, first, up to three minutes to make a tube-choice (after which a piece of food is placed equidistant between the tubes to entice participation); and then, if no choice has been made, an additional five minutes maximum, before both tubes are removed. All trials are recorded as either correct (choosing the rewarded colour tube), incorrect (choosing the unrewarded colour tube), or incomplete (no choice made); and are presented in 10-trial blocks. To pass initial learning, a great-tailed grackle must make a correct choice in at least 17 out of the most recent 20 trials, with a minimum of eight and nine correct choices across the last two blocks.
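To make the passing criterion concrete, a minimal sketch in R is given below; it is not the project’s analysis code, and it assumes a bird’s completed trials are coded as a simple binary vector (1 = correct, 0 = incorrect) in presentation order.

```r
# Minimal sketch (not the project's own code): check the passing criterion for
# one bird, given `choices`, a vector of completed trials coded 1 = correct,
# 0 = incorrect, in presentation order.
passed_criterion <- function(choices) {
  n <- length(choices)
  if (n < 20) return(FALSE)          # need at least two full 10-trial blocks
  last20 <- choices[(n - 19):n]
  block1 <- sum(last20[1:10])        # penultimate block
  block2 <- sum(last20[11:20])       # most recent block
  # >= 17/20 correct overall, with >= 8 correct in one block and >= 9 in the
  # other (the latter is implied when the total is >= 17 and neither block
  # falls below 8)
  (block1 + block2) >= 17 && min(block1, block2) >= 8
}

# Example: a bird scoring 8 then 9 correct across its last two blocks passes
passed_criterion(c(rep(0, 5), rep(1, 8), rep(0, 2), rep(1, 9), 0))
```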

In the reversal learning phase, great-tailed grackles are required to learn that the colour-reward pairing has been switched; that is, the previously unrewarded coloured tube (e.g., light grey) now contains a food reward. The protocol for this second and final learning phase is identical to that, described above, of the initial learning phase.

Analysis plan

General

Here, we will process, analyse, and visually present our data using the ‘rstan’ (Stan Development Team, 2020), ‘rethinking’ (McElreath, 2018), and ‘tidyverse’ (Wickham et al., 2019) packages in R (R Core Team, 2021). Our reproducible code is available on GitHub (https://github.com/alexisbreen/Sex-differences-in-grackles-learning).

Reinforcement learning model

In this preregistration, we propose to employ an adapted (from Deffner et al., 2020) Bayesian reinforcement learning model, to examine the influence of sex on great-tailed grackles’ initial and reversal learning performance. The reinforcement learning model, defined below, allows us to link observed coloured tube-choices to latent individual-level knowledge-updating (of attractions towards, learning about, and sampling of, either coloured tube) based on recent tube-choice reward-payoffs, and to translate such latent knowledge-updating into individual tube-choice probabilities; in other words, we can reverse engineer the probability that our parameters of interest (speed and sampling rate) produce great-tailed grackles’ observed tube-choice behaviour by formulating our scientific model as a statistical model (McElreath, 2018, p. 537). This method can therefore capture whether, and, if so, how multiple latent learning strategies simultaneously guide great-tailed grackles’ decision making—an analytical advantage over more traditional methods (e.g., comparing trials to passing criterion) that ignore the potential for equifinality (Barrett, 2019; Kandler & Powell, 2018).

Our reinforcement learning model consists of two equations:

Equation 1 expresses how attraction (A) to a choice-option (i) changes for an individual (j) across time (t + 1) based on their prior attraction to that choice-option (A_{i,j,t}) plus their recently experienced choice-payoff (π_{i,j,t}), whilst accounting for the weight given to recent payoffs (φ_{k,l}). As φ_{k,l} increases in value, so, too, does the rate of individual attraction-updating; thus, φ_{k,l} represents the individual learning rate. We highlight that the k, l indexing denotes that we estimate separate φ parameters for each phase of the experiment (k = 1 for initial, k = 2 for reversal) and each sex (l = 1 for females, l = 2 for males).
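Written out, the attraction-updating rule takes the following form (a reconstruction consistent with the verbal description above and with the formulation in Deffner et al., 2020, from which our model is adapted):

$$A_{i,j,t+1} = (1 - \phi_{k,l})\,A_{i,j,t} + \phi_{k,l}\,\pi_{i,j,t} \quad \text{(Equation 1)}$$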

Equation 2 is a softmax function that expresses the probability (P) that option (i) is selected in the next choice-round (t + 1) as a function of the attractions and a parameter (λ_{k,l}) that governs how much relative differences in attraction scores guide individual choice-behaviour. The higher the value of λ_{k,l}, the more deterministic (less option-switching) the choice-behaviour of an individual becomes (note λ_{k,l} = 0 generates random choice); thus, λ_{k,l} represents the individual sampling rate for phase k and sex l.
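In full (again, a reconstruction consistent with the description above and with the softmax form in Deffner et al., 2020):

$$P(i)_{j,t+1} = \frac{\exp\!\left(\lambda_{k,l}\,A_{i,j,t+1}\right)}{\sum_{m=1}^{2}\exp\!\left(\lambda_{k,l}\,A_{m,j,t+1}\right)} \quad \text{(Equation 2)}$$

When λ_{k,l} = 0, the two exponents are equal and each tube is chosen with probability 0.5, matching the ‘random choice’ behaviour noted above.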

From the above reinforcement learning model, then, we will generate inferences about the effect of sex on φ_{k,l} and λ_{k,l} from at least 1000 effective samples of the posterior distribution (see our model validation below). We note that our reinforcement learning model also includes both individual bird and study site as random effects (to account for repeated measures within both individuals and populations); however, for clarity, these parameters are omitted from our equations (but not our code: https://github.com/alexisbreen/Sex-differences-in-grackles-learning). Regarding our study site random effect, we further note that, as introduced above, we will also explore population-mediated sex-effects on φ and λ, by comparing these learning parameters both within and between sexes at each study site. Finally, our reinforcement learning model excludes trials where a great-tailed grackle did not make a tube-choice, as this measure cannot clearly speak to individual learning ability—for example, satiation rather than any learning of ‘appropriate’ colour tube-choice could be invoked as an explanation in such cases. Indeed, there are, admittedly, a number of intrinsic and extrinsic factors (e.g., temperament and temperature, respectively) that might bias great-tailed grackles’ tube-choice behaviour, and, in turn, the output from our reinforcement learning model (Webster & Rutz, 2020). Nonetheless, our reinforcement learning model serves as a useful first step towards addressing whether learning ability and dispersal relate in great-tailed grackles (for a similar rationale, see McElreath & Smaldino, 2015).
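As an illustration of how such inferences would be drawn, the following R sketch computes male-minus-female contrasts on φ and λ from posterior draws; the object `post` and its indexing (draw × phase × sex) are hypothetical stand-ins for the extracted posterior, not the repository’s actual code, and the draws below are simulated purely so the sketch runs.

```r
library(rethinking)  # provides HPDI() for highest posterior density intervals

# Hypothetical posterior draws standing in for the model's extracted posterior:
# arrays indexed [draw, phase, sex] (phase 1 = initial, 2 = reversal;
# sex 1 = female, 2 = male), simulated here for illustration only
set.seed(1)
post <- list(
  phi    = array(rnorm(2000 * 2 * 2, mean = 0.03, sd = 0.01), dim = c(2000, 2, 2)),
  lambda = array(rnorm(2000 * 2 * 2, mean = 4,    sd = 0.50), dim = c(2000, 2, 2))
)

# Male-minus-female contrasts in the reversal learning phase
phi_contrast    <- post$phi[, 2, 2]    - post$phi[, 2, 1]
lambda_contrast <- post$lambda[, 2, 2] - post$lambda[, 2, 1]

# Posterior means and 89% HPDIs; an interval excluding zero is taken as
# evidence of a sex difference on that learning parameter
mean(phi_contrast);    HPDI(phi_contrast, prob = 0.89)
mean(lambda_contrast); HPDI(lambda_contrast, prob = 0.89)
```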

Model validation

We validated our reinforcement learning model in three steps. First, we performed agent-based simulations. Specifically, we followed the tube-choice behaviour of simulated great-tailed grackles—that is, 14 females and 35 males from one of three populations (where population membership matched known study site sex distributions)—across the described initial learning and reversal learning phases. The tube-choice behaviour of the simulated great-tailed grackles was governed by a set of rules identical to those defined by our mathematical equations—for example, coloured tube attractions were independently updated based on the reward outcome of tube choices. Because we assigned higher average φ and λ values to simulated male (versus female) great-tailed grackles, the resulting data set should show males outperforming females on initial and reversal learning, at both the group and individual level; it did (Figures 2 and S1, respectively).
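The R sketch below illustrates the structure of such an agent-based simulation. It is deliberately simplified relative to our actual simulation code (for example, it uses a fixed number of trials per phase rather than the criterion-based phase switch, and it updates only the chosen tube’s attraction), and the parameter values are illustrative rather than those reported in Table 1.

```r
# Simplified agent-based simulation of one bird's tube choices across the
# initial and reversal learning phases (illustrative structure and values only)
set.seed(2022)

simulate_bird <- function(phi, lambda, n_trials_per_phase = 50) {
  A <- c(0.1, 0.1)                    # starting attractions to tubes 1 and 2
  rewarded <- 1                       # tube 1 rewarded in initial learning
  n_total <- 2 * n_trials_per_phase
  out <- data.frame(trial  = seq_len(n_total),
                    phase  = rep(c("initial", "reversal"), each = n_trials_per_phase),
                    choice = NA_integer_, correct = NA_integer_)
  for (t in seq_len(n_total)) {
    if (t == n_trials_per_phase + 1) rewarded <- 2       # reversal: tube 2 rewarded
    p <- exp(lambda * A) / sum(exp(lambda * A))          # softmax (Equation 2)
    choice <- sample(1:2, size = 1, prob = p)
    payoff <- as.integer(choice == rewarded)
    A[choice] <- (1 - phi) * A[choice] + phi * payoff    # attraction update (Equation 1)
    out$choice[t]  <- choice
    out$correct[t] <- payoff
  }
  out
}

# Illustrative sex difference: the 'male' agent is assigned higher phi and lambda
female_bird <- simulate_bird(phi = 0.03, lambda = 3)
male_bird   <- simulate_bird(phi = 0.06, lambda = 4)
```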

Group-level tube-choice behaviour of simulated great-tailed grackles across colour-reward reinforcement learning trials (females: yellow, n = 14; males: green, n = 35), following model validation step one. Tube option 1 (e.g., dark grey) was the rewarded option in the initial learning phase; conversely, tube option 2 (e.g., light grey) contained the food reward in the reversal learning phase. Each open circle represents an individual tube-choice; black lines indicate binomial smoothed conditional means fitted with grey 89% compatibility intervals.

Next, we ran our simulated data set through our reinforcement learning model. Here, we endeavoured to determine whether our reinforcement learning model: (i) recovered our assigned φ_{k,l} and λ_{k,l} values (it did; Table 1); and (ii) produced ‘correct’ qualitative inferences—that is, detected the simulated sex differences in great-tailed grackles’ initial and reversal learning (it did; Figure 3).

Comparison of assigned and recovered φ and λ values, following model validation step two. Eighty-nine percent highest posterior density intervals (HPDI) are shown for recovered values.

Comparison of learning ability in simulated female (yellow; n = 14) and male (green; n = 35) great-tailed grackles across initial and reversal colour-reward reinforcement learning, following model validation step two. (A) φ, the rate of learning, i.e., speed. (B) λ, the rate of sampling, i.e., switching between choice-options. (C) and (D) show posterior distributions for the respective contrasts between female and male learning. Eighty-nine percent highest posterior density intervals are shaded in grey; an interval that does not cross zero indicates a simulated effect of sex on learning ability.

Finally, we repeated step one and step two, using a range of realistically plausible φ and λ sex differences (values for female great-tailed grackles were left unchanged from Table 1), to determine whether our reinforcement learning model could detect different effect sizes of sex on our target learning parameters. This final step confirmed that, for our anticipated minimum sample size, our reinforcement learning model: (i) detects sex differences in φ values ≥ 0.03 and λ values ≥ 1; and (ii) infers a null effect for φ values < 0.03 and λ values < 1, i.e., very weak simulated sex differences (Figure 4). Together, these points show that, with our reinforcement learning model, a null result would not be attributable simply to small sample size. Additionally, estimates obtained from step three were more precise in the reversal learning phase than in the initial learning phase (Figure 4), so we can expect to detect even smaller sex differences if we analyse learning across both phases—an approach we will apply if we detect no effect of phase. In sum, model validation steps one through three confirm that our reinforcement learning model is fit for purpose.

Parameter recovery test for different sizes of simulated sex differences. Plots show posterior estimates of the effect of sex (contrasts between simulated male and female great-tailed grackles; n = 35 and 14, respectively) on the speed (φ) and sampling (λ) learning parameters, following model validation step three. Black circles represent the mean recovered sex-effect estimates with grey 89% highest posterior density intervals (HPDIs); black solid diagonal lines represent a ‘perfect’ match between assigned and recovered parameter estimates (note that we would not expect perfect correspondence, owing to the stochasticity of the agent-based simulations); and black dashed horizontal lines represent a recovered null sex effect.

Bias

AJB and DD are (at the time of submitting this preregistration) blind with respect to all but two aspects of the target data: the sex and population membership of each grackle that has, thus far, completed, or is expected to complete, the colour-reward reinforcement learning tests (because these parameters were used in model validation simulations—see above).

Open materials

https://github.com/alexisbreen/Sex-differences-in-grackles-learning

Acknowledgements

We thank all members, past and present, of the Grackle Project for collecting and sharing the data that we propose to analyse herein. We further thank Richard McElreath for study support.

Ethics

All data utilised herein were collected with ethical approval.

Supplementary material

Individual-level tube-choice behaviour of simulated great-tailed grackles across colour-reward reinforcement learning trials (females: yellow, n = 14; males: green, n = 35). Tube option 1 (e.g., dark grey) was the rewarded option in the initial learning phase; conversely, tube option 2 (e.g., light grey) contained the food reward in the reversal learning phase. Each open circle shows an individual tube-choice; black solid lines show loess smoothed conditional means fitted with grey 89% compatibility intervals; and dashed black lines mark each individual’s transition between learning phases.

Comparison of learning rate (φ) and sampling rate (λ) estimates (top and bottom row, respectively) in initial learning, excluding and including extra initial learning trials (left and right column, respectively), which are present in the original data set (see Methods). Because this comparison does not show any noticeable difference depending on their inclusion or exclusion, we excluded extra initial learning trials from our analyses. All plots are generated via model estimates using our full sample size: 32 males and 17 females.
