Our decisions are guided by how we perceive the value of an option, but this evaluation also affects how we move to acquire that option. Why should economic variables such as reward and effort alter the vigor of our movements? In theory, both the option that we choose and the vigor with which we move contribute to a measure of fitness in which the objective is to maximize rewards minus efforts, divided by time. To explore this idea, we engaged marmosets in a foraging task in which on each trial they decided whether to work by making saccades to visual targets, thus accumulating food, or to harvest by licking what they had earned. We varied the effort cost of harvest by moving the food tube with respect to the mouth. Theory predicted that the subjects should respond to the increased effort costs by choosing to work longer, stockpiling food before commencing harvest, but reduce their movement vigor to conserve energy. Indeed, in response to an increased effort cost of harvest, marmosets extended their work duration, but slowed their movements. These changes in decisions and movements coincided with changes in pupil size. As the effort cost of harvest declined, work duration decreased, the pupils dilated, and the vigor of licks and saccades increased. Thus, when acquisition of reward became effortful, the pupils constricted, the decisions exhibited delayed gratification, and the movements displayed reduced vigor.
Our results suggest that as the brainstem neuromodulatory circuits that control pupil size respond to effort costs, they alter computations in the brain regions that control decisions, encouraging work and delaying gratification, and the brain regions that control movements, reducing vigor and suppressing energy expenditure. This coordinated response suggests that decisions and actions are part of a single control policy that aims to maximize a variable relevant to fitness: the capture rate.
This important study unravels the interaction between effort cost, pupil-indexed brain state, and movement (saccadic) vigor during foraging decisions in marmoset monkeys. Based on a normative computational model, the authors derive the prediction that anticipated effort should affect both decisions and movement vigor during foraging; and then provide solid behavioural and pupillometric evidence for this prediction in a foraging task. This paper will be of interest to decision and motor neuroscience as well as to all researchers studying animal behavior.
During foraging, animals work to locate a food cache and then spend effort harvesting what they have found. As they forage, their decisions appear to maximize a measure that is relevant to fitness: the sum of rewards acquired, minus efforts expended, divided by time, termed the capture rate (1–3). For example, a crow will spend effort extracting a clam from a sandy beach, but if the clam is small, it will abandon it because the additional time and effort required to extract the small reward (dropping it repeatedly from a height onto rocks) can be better spent finding a bigger prize (4). In other words, if going to the bank entails waiting in a long line, one should go infrequently, but make each transaction a large amount.
Intriguingly, reward expectation not only affects decisions, it also affects movements: we not only prefer the less effortful option, we also move vigorously to obtain it (5, 6). This modulation of movement vigor can be justified if we consider that movements require expenditure of time and energy, which discount the value of the promised reward (7, 8). Thus, from a theoretical framework, it seems rational that the brain should have a mechanism to coordinate control of decisions with control of movements so that both contribute to maximizing a measure of fitness (9).
To study this coordination, we designed a task in which marmosets decided how long to work before they harvested their food. On a given trial, they made a sequence of saccades to visual targets and received an increment of food as their reward. However, the increment was small, and its harvest was effortful, requiring inserting their tongue inside a small tube. Theory predicted that in order to maximize the capture rate, harvest should commence only when there was sufficient reward accumulated to justify the effort required for its extraction. Indeed, the subjects chose to work in order to stockpile food, and only then initiated their harvest.
On some days the effort cost of harvest was low: the tube was placed close to the mouth. On other days the same amount of work, i.e., saccade trials, produced food that had a higher effort cost: the tube was located farther away. The theory made two interesting predictions: as the effort cost increased, the subjects should choose to work more trials, thus delaying their harvest so as to stow more food, but reduce their movement vigor, thus saving energy. Indeed, when marmosets encountered an increased effort cost, they extended their work period, stockpiling food, but reduced their vigor, slowing their saccades during the work period, and slowing their licks during the harvest period.
What might be a neural basis for this coordinated response of the decision-making and the motor-control circuits? During the work and the harvest periods, momentary changes in pupil size closely tracked the changes in vigor: pupil dilation accompanied increases in vigor, while pupil constriction accompanied decreases in vigor. Remarkably, this was true regardless of whether the movement that was being performed was a saccade, or a lick. Moreover, in response to the increased effort cost, the pupils exhibited a global change, constricting during both the work and the harvest periods.
If we view the changes in pupil size as a proxy for activity in the brainstem noradrenergic circuits (10), our results suggest that as these circuits respond to effort costs (11), they alter computations in the brain regions that control decisions, delaying gratification and encouraging work, and the brain regions that control movements, promoting sloth and conserving energy.
We tracked the eyes and the tongue of head-fixed marmosets as they performed visually guided saccades in exchange for food (Fig. 1A). Each successful trial consisted of 3 visually guided saccades, at the end of which we delivered an increment of food (a slurry mixture of apple sauce and monkey chow). Because the reward amount was small (0.015-0.02 mL), the subjects rarely harvested following a single successful trial. Rather, they worked for a few trials, allowing the food to accumulate, then initiated their harvest by licking (Fig. 1B). The key variables were how many trials they chose to work before starting harvest, and how vigorously they moved their eyes and tongue during the work and the harvest periods.
Over the course of 2.5 years, we recorded 56 sessions in subject M (29 months) and 56 sessions in subject R (23 months). A typical work period lasted about 10 seconds, during which the subjects attempted ∼8 trials, and succeeded in 4-5 trials (Fig. 1D) (a successful trial was when all three saccades were within 1.25° of the center of each target). The work period ended when the subject decided to stop tracking the targets and instead initiated harvest, which lasted about 6 seconds, resulting in 16-18 licks. Subject M completed an average of 909.5 ± 61 successful trials per session (mean ± SEM), producing an average of 241 ± 13.9 work-harvest pairs, and subject R completed an average of 1431 ± 65 successful trials, producing an average of 263 ± 8.9 work-harvest pairs.
We delivered food via either the left or the right tube for 50-300 consecutive trials, and then switched tubes. We tracked the motion of the tongue using DeepLabCut (12), as shown for a typical session in Fig. 1B. The licks required precision because the tube was just large enough (4.4 mm diameter) to allow the tongue to penetrate. As a result, about 30% of the reward-seeking licks were successful and contacted food (30±1.6% for subject M, 28±2.5% for subject R), as shown in Supporting Information Video 1. Examples of licks that failed to contact food are shown in Supporting Information Video 2, Video 3, and Video 4.
Theory and predictions
During the decision-making part of the task, the brain explicitly determined how long to work before initiating harvest. During the work and the harvest periods, the brain implicitly controlled the vigor of movements. We imagined that these two forms of behavior were not independent, but rather coordinated via a control policy that maximized a single utility: the sum of rewards acquired, minus efforts expended, divided by time, termed the capture rate. We chose this formulation because it presents a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another patch (4, 13, 14).
During a work period, our subjects decided to complete a number of saccade trials n_s, a fraction β_s of which were successful, earning food increment α, but expended effort c_s that consumed time T_s for each trial. They then stopped working and initiated harvest, producing a number of licks n_l, a fraction β_l of which succeeded, thus expending effort c_l and consuming time T_l for each lick. These actions produced the following capture rate:
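One way to write Eq. (1) that matches the verbal description of its three numerator terms is sketched here. The saturating factor that models gradual consumption of the cache and the quadratic form of the work cost are assumptions chosen to agree with that description; they are not necessarily the exact expression used in the simulations.

```latex
\[
J = \frac{\alpha \beta_s n_s \left(1 - \dfrac{1}{1 + \beta_l n_l}\right) - n_l c_l - n_s^{2} c_s}
         {n_s T_s + n_l T_l}
\tag{1}
\]
```

Under this assumed form, setting n_l = 0 gives J = -n_s c_s / T_s, which is negative and becomes more negative as the work period lengthens.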
In the numerator of Eq. (1), the first term represents the fact that the food cache increased linearly with successful trials and was then consumed gradually with successful licks. The second term represents the effort expenditure of licking, and the third term represents the effort expenditure of working. Notably, the effort expenditure of work grows faster than linearly as a function of trials. This nonlinearity is essential to reflect the idea that following a long work period, the capture rate must be more negative than following a short work period (i.e., more work trials produce a greater reduction in utility).
A control policy describes how long to work and harvest, and an optimal policy produces periods of working and harvesting that maximize Eq. (1). A closed-form solution for the optimal policy can be obtained (Mathematica notebook simulations.nb) and Fig. 2A provides an example. As the work period concludes and the harvest period begins (n_l = 0), the capture rate is negative. This reflects the fact that the subject has performed a few trials and stockpiled food, and has thus expended effort but has not yet been rewarded. The capture rate rises when licking commences. Critically, the peak capture rate is not an increasing function of the work period. Rather, there is an optimal work period (red trace, Fig. 2A) associated with a given effort cost of licking c_l. If we now move the tube away from the mouth, that is, increase the effort cost of licking c_l, the peak of the capture rate shifts and the optimal work period changes: the proper response to an increased effort cost of licking is to work longer, stowing more food before commencing harvest.
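The shift of the optimal policy with lick cost can be illustrated numerically. The sketch below is not the authors' Mathematica notebook. It assumes a capture rate of the form J = [α β_s n_s (1 − 1/(1 + β_l n_l)) − n_l c_l − n_s² c_s] / (n_s T_s + n_l T_l), which is one expression consistent with the description of Eq. (1), and it replaces the closed-form solution with a brute-force grid search. All parameter values are hypothetical; T_s, T_l, β_s and β_l are only loosely based on the averages reported above (about 10 s for 8 trials, about 6 s for 17 licks, roughly half of trials and 30% of licks successful).

```python
import numpy as np

def capture_rate(n_s, n_l, c_l, alpha=1.0, beta_s=0.6, beta_l=0.3,
                 c_s=0.01, T_s=1.25, T_l=0.35):
    """Assumed capture rate: (reward - effort) / time for one work-harvest pair.

    n_s: work trials, n_l: licks, c_l: effort cost per lick.
    All default parameter values are hypothetical.
    """
    # Food cached during work, consumed gradually by successful licks.
    reward = alpha * beta_s * n_s * (1.0 - 1.0 / (1.0 + beta_l * n_l))
    # Lick effort is linear in licks; work effort grows quadratically in trials.
    effort = n_l * c_l + c_s * n_s ** 2
    duration = n_s * T_s + n_l * T_l
    return (reward - effort) / duration

def optimal_policy(c_l):
    """Grid search for the (n_s, n_l) pair that maximizes the capture rate."""
    n_s_grid = np.arange(1.0, 30.0, 0.1)   # candidate work trials
    n_l_grid = np.arange(0.0, 30.0, 0.1)   # candidate licks
    S, L = np.meshgrid(n_s_grid, n_l_grid, indexing="ij")
    J = capture_rate(S, L, c_l)
    i, j = np.unravel_index(np.argmax(J), J.shape)
    return n_s_grid[i], n_l_grid[j], J[i, j]

if __name__ == "__main__":
    for c_l in (0.05, 0.30):  # low versus high effort cost of a lick
        n_s_opt, n_l_opt, J_opt = optimal_policy(c_l)
        print(f"c_l={c_l:.2f}: work {n_s_opt:.1f} trials, "
              f"lick {n_l_opt:.1f} times, capture rate {J_opt:.3f}")
```

With these hypothetical parameters, raising the lick cost lengthens the optimal work period, shortens the optimal harvest, and lowers the peak capture rate, the same qualitative pattern described for Fig. 2A and Fig. 2C.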
Notably, the higher cost of licking inevitably reduces the maximum capture rate (Fig. 2A). This should impact movement vigor: animals tend to respond to a reduced capture rate by slowing their movements (9), which can be viewed as an effective way to save energy (8). To incorporate vigor into the capture rate, we tried to define the effort cost of a single lick c_l in terms of its energetic cost, a relationship that is currently unknown. Fortunately, other movements provide a clue: the energetic cost of reaching (8, 15), as well as the energetic cost of walking (16, 17), are both concave upward functions of the movement's duration. That is, from an energetic standpoint, there is a reach speed, and a walking speed, that minimizes the cost of each type of movement. We generalized these empirical observations to licking and assumed that the energetic cost of a single lick was a concave upward function of its duration:
In Eq. (2), the lick is aimed at a tube located at distance d, and has a duration T_l. The parameter k describes the rate with which the cost grows as a function of duration. There is a lick duration that minimizes this energetic cost, and for an energetically optimal lick, that duration grows linearly with tube distance (Fig. 2B). However, our objective is not to minimize the cost of licking, but to maximize the capture rate. To do so, we insert Eq. (2) into Eq. (1) and find the optimal policy, which now depends on the distance of the food tube to the mouth (Mathematica notebook simulations.nb).
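To make the energetic argument concrete, the short derivation below uses one cost function with the stated properties: it is concave upward in duration, it grows at rate k for long durations, and its minimum lies at a duration that is linear in d. The specific form is an assumption made for illustration and may differ from the expression used in the simulations.

```latex
\begin{align}
c_l &= k\,T_l + \frac{d^{2}}{T_l} \tag{2} \\
\frac{\partial c_l}{\partial T_l} &= k - \frac{d^{2}}{T_l^{2}} = 0
\quad\Longrightarrow\quad T_l^{*} = \frac{d}{\sqrt{k}} \notag \\
\frac{\partial^{2} c_l}{\partial T_l^{2}} &= \frac{2 d^{2}}{T_l^{3}} > 0 \notag
\end{align}
```

Under this assumed form, the energetically optimal lick duration is proportional to tube distance, and the positive second derivative confirms that the stationary point is a minimum.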
The theory predicts that to maximize the capture rate (Eq. 1), the response to an increased effort cost of harvest (i.e., tube distance) should be as follows: the number of work trials n_s should increase (Fig. 2C), the number of licks n_l should decrease (Fig. 2C), and the lick duration T_l should increase (Fig. 2D). Notably, the rate of increase in T_l as a function of tube distance is faster than linear, while from an energetic point of view (Eq. 2), an increase in distance should produce a linear increase in lick duration. Thus, as the harvest becomes more effortful, the subject should work longer to stockpile food, but move slower to save energy.
To test our theory further, we thought it useful to have a way to alter decisions in one direction (say work longer) but change movement vigor in the opposite direction (move faster). In theory, this is possible: if the subject is hungry (darker lines in Fig. 2C and 2D), i.e., the reward is more valuable, then they should again work longer before initiating harvest. Paradoxically, they should also move faster.
In summary, if decisions and actions are coordinated via a policy that aims to maximize the capture rate, then in response to an increased cost of harvest, one should work longer, but move with reduced vigor. In response to an increased reward value, as in hunger, one should also work longer, but now move with increased vigor.
Increased effort cost of harvest promoted work but reduced saccade vigor
To vary the effort cost of harvest, we altered the tube distance to the mouth (but kept it constant during each session). Varying tube distance affected the decisions of the subjects: when the tube was placed farther, they chose to work longer before starting harvest (Fig. 3A, left subplot). They attempted more trials during each work period (ANOVA, subject M: F(2,7908)=41.5 p=5.2×10⁻²⁵, subject R: F(2,10948)=88.2 p=7×10⁻⁵⁰), and produced more successful trials per work period (ANOVA, subject M: F(2,7908)=63 p=2.8×10⁻²⁴, subject R: F(2,10948)=163 p<10⁻⁵⁰). This policy of delayed gratification was present throughout the recording session (Fig. 3A, middle plot). That is, when the harvest required more effort, the subjects worked longer to stockpile more food before initiating their harvest (Fig. 3A, right plot, effect of tube distance on food cached: subject M: F(2,9566)=176 p<10⁻⁵⁰, subject R: F(2,8907)=204 p<10⁻⁵⁰).
During the work period the subjects made saccades to visual targets and accumulated their food. They also made saccades that were not toward visual targets and thus were not eligible for reward. For each animal we computed the relationship between peak saccade velocity and saccade amplitude across all sessions and then calculated the vigor of each saccade, defined as the ratio of the actual peak velocity to the expected peak velocity for that amplitude (18, 19). For example, a saccade that exhibited a vigor of 1.10 had a peak velocity that was 10% greater than the average peak velocity of the saccades of that amplitude for that subject. As expected, the reward-relevant saccades, i.e., saccades made to visual targets (primary, corrective, and center saccades), were more vigorous than other saccades (Fig. 3B, 2-way ANOVA, effect of saccade type, subject M: F(1,391459)=7248 p<10⁻⁵⁰, subject R: F(1,355839)=13641 p<10⁻⁵⁰).
As a work period began, the reward-relevant saccades exhibited high vigor, but then trial-by-trial, this vigor declined, reaching a low vigor value just before the work period ended (Fig. 3C vigor). Remarkably, on days in which the tube was placed farther, saccade vigor was lower (RMANOVA, effect of tube distance, subject M: F(2,59033)=224 p<10⁻⁵⁰, subject R: F(2,50103)=75.51 p=1.8×10⁻³³). Thus, increasing the effort cost of extracting food during the harvest period reduced saccade vigor during the work period.
By definition, a more vigorous saccade had a greater peak velocity. This might imply that high vigor saccades should suffer from inaccuracy due to signal dependent noise (20). However, we observed the opposite tendency: as saccade vigor increased, both the magnitude and the variance of the endpoint error decreased (Fig. 3C, 2-way ANOVA, effect of vigor on error magnitude, subject M: F(8,59046)=480 p<10⁻⁵⁰, subject R: F(8,50184)=252 p<10⁻⁵⁰, effect of vigor on error variance, subject M: F(8,2673)=18200 p<10⁻⁵⁰, subject R: F(8,2673)=4170 p<10⁻⁵⁰). That is, reducing the effort costs of harvest not only promoted vigor, it also facilitated accuracy (21).
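The vigor measure defined above can be sketched in a few lines. The example assumes a power-law relationship between amplitude and peak velocity, fitted by linear regression in log-log coordinates. That functional form, and the function names `fit_main_sequence` and `vigor`, are illustrative assumptions; the fitting function actually used for the velocity-amplitude relationship is not specified here.

```python
import numpy as np

def fit_main_sequence(amplitude, peak_velocity):
    """Fit peak_velocity ~ scale * amplitude**exponent (assumed power law).

    Returns (scale, exponent) from a straight-line fit in log-log space.
    """
    log_a = np.log(np.asarray(amplitude, dtype=float))
    log_v = np.log(np.asarray(peak_velocity, dtype=float))
    exponent, log_scale = np.polyfit(log_a, log_v, 1)  # slope, intercept
    return np.exp(log_scale), exponent

def vigor(amplitude, peak_velocity, scale, exponent):
    """Vigor = actual peak velocity / expected peak velocity at that amplitude."""
    expected = scale * np.asarray(amplitude, dtype=float) ** exponent
    return np.asarray(peak_velocity, dtype=float) / expected

if __name__ == "__main__":
    # Synthetic saccades lying exactly on a hypothetical main sequence.
    amp = np.linspace(1.0, 10.0, 50)       # amplitude, deg
    vel = 100.0 * amp ** 0.5               # peak velocity, deg/s
    scale, exponent = fit_main_sequence(amp, vel)
    # A 4 deg saccade that is 10% faster than expected has vigor near 1.10.
    print(vigor(4.0, 1.10 * 100.0 * 4.0 ** 0.5, scale, exponent))
```

Fitting the relationship once per subject across all sessions, as described above, means that a vigor above 1 reflects a saccade faster than that subject's own average for the same amplitude.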
Cognitive signals such as effort and reward are associated with changes in pupil size (10), as well as transient activation of brainstem neuromodulatory circuits in locus coeruleus (11). We wondered if the changes in tube position altered the output of these neuromodulatory circuits, as inferred via pupil size. For each reward-relevant saccade, we measured the pupil size during a ±250 ms window centered on saccade onset, and then normalized this measure based on the distribution of pupil sizes that we had measured during the entire recording for that session in that subject, resulting in a z-score.
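The normalization described above can be sketched as follows. The code assumes that the pupil trace is a regularly sampled array with matching time stamps, and that the value assigned to each movement is the mean z-scored pupil size within the ±250 ms window. The averaging step and the function name `pupil_z_at_events` are assumptions for illustration.

```python
import numpy as np

def pupil_z_at_events(pupil, t, event_times, half_window=0.250):
    """Mean z-scored pupil size in a window centered on each event.

    pupil: pupil size samples for one session
    t: sample times in seconds (same length as pupil)
    event_times: e.g., saccade onsets, in seconds
    """
    pupil = np.asarray(pupil, dtype=float)
    t = np.asarray(t, dtype=float)
    # z-score against the distribution over the entire session.
    z = (pupil - np.nanmean(pupil)) / np.nanstd(pupil)
    out = np.full(len(event_times), np.nan)
    for i, t0 in enumerate(event_times):
        in_window = (t >= t0 - half_window) & (t <= t0 + half_window)
        if in_window.any():
            out[i] = np.nanmean(z[in_window])
    return out

if __name__ == "__main__":
    t = np.arange(1000) * 0.01                             # 10 s at 100 Hz
    pupil = np.concatenate([np.zeros(500), np.ones(500)])  # small, then large
    print(pupil_z_at_events(pupil, t, [2.0, 8.0]))
```

Because the z-score is computed within each session, values from sessions with different tube distances are expressed in comparable units.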
At the onset of each work period the pupils were dilated, but as the subjects performed more trials, the pupils constricted, exhibiting a trial-by-trial reduction that paralleled the changes in saccade vigor (Fig. 3C). Notably, the effort cost of harvest affected pupil size: during the work period the pupils were more dilated if the tube was placed closer to the mouth (Fig. 3C, RMANOVA, effect of tube distance, subject M: F(2,60502)=20 p=2×10⁻⁹, subject R: F(2,50431)=23.8 p=4.9×10⁻¹¹). That is, when the effort cost of harvest was lower, the pupils dilated, and the saccades were invigorated.
In summary, when we increased the effort cost of harvest, both the movements and the decisions changed: the pupils constricted and the movements slowed, but the subjects chose to work more trials before initiating harvest.
Increased effort cost of harvest reduced lick vigor
The work period ended when the subject chose to stop tracking the target and initiated harvest via a licking bout. As with saccades, we defined lick vigor via the ratio of the actual peak velocity of the lick with respect to the expected velocity for that lick amplitude. As amplitude increased, lick peak velocity increased during both protraction and retraction (Fig. 4A). Some of the licks were reward seeking and directed toward the tube, while others were grooming licks, cleaning the tongue and the area around the mouth (Supporting Information Video 5). Reward-seeking licks were more vigorous than grooming licks (2-way ANOVA, effect of lick type, protraction, subject M: F(1,272233)=66 p=4.5×10⁻¹⁶, subject R: F(1,229052)=698 p<10⁻⁵⁰), and retraction was more vigorous than protraction (reward seeking licks, retraction vs. protraction, subject M: t(241145)=532 p<10⁻⁵⁰, subject R: t(213674)=665 p<10⁻⁵⁰).
As the harvest began, the first lick had very low vigor, but lick after lick, the movements gathered velocity, reaching peak vigor by the 3rd or the 4th lick (Fig. 4B). As the harvest continued, lick vigor gradually declined. Like saccades, licks had a lower vigor in sessions in which the tube was placed farther from the mouth (RMANOVA, effect of tube distance, subject M: F(2,59033)=222.5 p<10⁻⁵⁰, subject R: F(2,133502)=224 p<10⁻⁵⁰), and this pattern was present during the entire recording session (Fig. 4C, left subplot). Thus, an increased effort cost of harvest promoted sloth: reduced vigor of saccades during the work period, and reduced vigor of licks during the harvest period.
For each reward seeking lick, we measured pupil size during a ±250 ms window centered on the moment of peak tongue displacement. During licking, the pupil size changed with a pattern that closely paralleled lick vigor: as the harvest began, pupil size was small, but it rapidly increased during the early licks, then gradually declined as the harvest continued (Fig. 4B, right subplot). Importantly, the pupils were more dilated in sessions in which the tube was closer to the mouth (Fig. 4C, right subplot, effect of tube distance, subject M: F(2,166742)=583 p<10⁻⁵⁰, subject R: F(2,130493)=118 p<10⁻⁵⁰). As a result, when the effort cost of reward increased, the pupils constricted, and the vigor of both saccades and licks decreased.
While the theory predicted that moving the tube farther would result in a longer work period and reduced movement vigor, it also predicted that the subjects would reduce their harvest duration (reduced licks, Fig. 2C). That is, it predicted that the subjects would work longer, stowing more food, but leave more of it behind. This last prediction did not agree with our data (see Discussion). For Subject R, the number of licks was approximately the same across the various tube distances, and for Subject M the number of licks increased with tube distance (Supplementary Fig. 1).
In summary, within a harvest period, lick vigor rapidly increased and then gradually declined. Simultaneous with the changes in vigor, the pupils rapidly dilated and then gradually constricted. In sessions where the tube was placed farther from the mouth, the licks had lower vigor, and the pupils were more constricted.
Expectation of greater reward increased lick vigor
As the subject worked, they accumulated food, thus increasing the magnitude of the available reward. To check whether reward magnitude affected movement vigor, for each tube distance we computed the average lick vigor during the harvest as a function of the number of trials completed in the preceding work period. We found that when the work period had included many completed trials, then the movements in the ensuing harvest period were more vigorous (Fig. 4D, 2-way ANOVA, effect of trials, subject M: F(4,164242)=353 p<10⁻⁵⁰, subject R: F(4,123411)=152 p<10⁻⁵⁰). Thus, the licks were invigorated by the amount of food that awaited harvest.
Because the tube was small, many of the licks missed their goal and failed to contact the food. The success or failure of a lick affected both the vigor of the subsequent lick, and the change in the size of the pupil. Following a successful lick there was a large increase in lick vigor (Fig. 4E, subject M: t(85182)=40 p<10⁻⁵⁰, subject R: t(81378)=104 p<10⁻⁵⁰), and a large increase in pupil size (subject M: F(84969)=57 p<10⁻⁵⁰, subject R: t(80318)=94 p<10⁻⁵⁰). In contrast, following a failed lick the subjects either reduced or did not increase their lick vigor (Fig. 4E, subject M: t(114159)=0.88 p=0.37, subject R: t(97164)=-44 p<10⁻⁵⁰). This failure also produced a smaller increase in pupil size (comparison to successful lick, two sample t-test, subject M: t(198722)=14.8 p=4.3×10⁻⁵⁰, subject R: t(176044)=53 p<10⁻⁵⁰). Thus, a single successful lick led to acquisition of reward, which then was followed by a relatively large increase in pupil size, and an invigorated subsequent lick.
For saccades we had found that increased vigor was associated with greater accuracy. To quantify the relationship between lick vigor and accuracy, for each tube distance we labeled each reward seeking lick as being high or low vigor. For subject M, high vigor licks tended to be more successful, but this was not the case for subject R (Fig. 4F). Moreover, tube distance did not produce a consistent effect on lick success.
In summary, the subject licked more vigorously following a long work period in which they had accumulated more reward. Moreover, when a lick was successful in acquiring reward, they increased the vigor of the subsequent lick.
Hunger promoted work and increased vigor
Our theory predicted that it should be possible to change decisions in one direction (say work longer), while altering movement vigor in the opposite direction (move faster). An increase in the subjective value of reward, as might occur when the subject is hungry, should have two effects: increase the number of trials that the subject chooses to perform before commencing harvest, and increase movement vigor.
We did not explicitly manipulate the weight of the subjects. Indeed, to maintain their health, we strived to keep their weights constant during the roughly 2.5-year period of these experiments. However, there was natural variability, which allowed us to test the predictions of the theory.
We found that when their weight was lower than average, the subjects chose to work a greater number of trials before commencing harvest (Fig. 5A, two sample t-test, subject M: t(11052)=7.9 p=3.4×10⁻²⁵, subject R: t(12549)=10.1 p=9.3×10⁻²⁴). This result was similar to the effect that we had seen when the effort cost of harvest was increased. However, the theory had predicted that the effect on vigor should be in the opposite direction: if hunger increased reward valuation, then one should speed the movements and hasten food acquisition. Notably, weight did not have a consistent effect on saccade vigor across the two subjects (Fig. 5A), yet during the harvest, both subjects licked with greater vigor when their weight was lower (Fig. 5A, subject M: t(219752)=88 p<10⁻⁵⁰, subject R: t(205163)=22 p<10⁻⁵⁰).
Thus, while both the effort cost of reward and hunger promoted greater work, effort promoted sloth while hunger promoted lick vigor.
Pupil size variations strongly correlated with changes in decisions and movements
Finally, we considered the data across both the work and the harvest periods and asked how well movement vigor tracked pupil size. The results demonstrated that in both the work and the harvest periods, for both saccades and licks, an increase in pupil size was associated with an increase in vigor (Fig. 5B, reward-relevant saccades, subject M: r=0.989 p=7.7×10⁻⁹, subject R: r=0.97 p=7.1×10⁻⁷, reward-seeking protraction licks, subject M: r=0.969 p=9.8×10⁻⁷, subject R: r=0.989 p=6.3×10⁻⁹). Moreover, when the pupil was dilated, the work periods tended to be shorter (Fig. 5C, subject M: r=-0.90 p=0.00014, subject R: r=-0.97 p=6.3×10⁻⁷), while harvest durations tended to be longer (Fig. 5C, subject M: r=0.894 p=0.00021, subject R: r=0.935 p=2.4×10⁻⁵). Thus, pupil dilation was associated with choosing to work less, while moving faster.
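A correlation of this kind can be computed by grouping movements into bins of pupil size and correlating the bin averages. The sketch below is a guess at such a procedure, not a description of the analysis behind Fig. 5B: the equal-count binning, the number of bins (`n_bins=12`), and the function name `binned_correlation` are all hypothetical.

```python
import numpy as np

def binned_correlation(pupil_z, vigor, n_bins=12):
    """Pearson r between bin-averaged pupil size and bin-averaged vigor.

    Movements are sorted into n_bins equal-count bins of pupil size
    (a hypothetical choice), and the bin means are correlated.
    """
    pupil_z = np.asarray(pupil_z, dtype=float)
    vigor = np.asarray(vigor, dtype=float)
    edges = np.quantile(pupil_z, np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index per sample; the maximum value is folded into the last bin.
    idx = np.clip(np.searchsorted(edges, pupil_z, side="right") - 1,
                  0, n_bins - 1)
    x = np.array([pupil_z[idx == b].mean() for b in range(n_bins)])
    y = np.array([vigor[idx == b].mean() for b in range(n_bins)])
    return np.corrcoef(x, y)[0, 1]

if __name__ == "__main__":
    # Synthetic data in which vigor rises linearly with pupil size.
    pupil_z = np.linspace(-2.0, 2.0, 1200)
    vigor = 1.0 + 0.05 * pupil_z
    print(binned_correlation(pupil_z, vigor))
```

Averaging within bins removes much of the movement-to-movement variability, so a correlation computed this way describes the trend across pupil sizes rather than the predictability of any single movement.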
What we choose to do is the purview of the decision-making circuits of our brain, while the implicit vigor with which we perform that action is the concern of the motor-control circuits. From a theoretical perspective (9), our brain should coordinate these two forms of behavior because both the act that we select, and its vigor, dictate expenditure of time and energy, contributing to a capture rate that affects longevity and fecundity (22). Does the brain coordinate decisions and movements to maximize a capture rate? If so, how might the brain accomplish this coordination?
Here, we designed a foraging task in which marmosets worked by making saccades, accumulating food for each successful trial, then stopped working and harvested their cache by licking. On every trial they decided whether to work, or to harvest. Their decision was carried out by the motor system, producing either a visually guided saccade, or a lick, each exhibiting a particular vigor. The theory predicted that to maximize the capture rate, the appropriate response to an increased effort cost of harvest was to do two things: work longer to cache more food but reduce vigor to conserve energy.
We varied the effort costs by moving the food tube with respect to the mouth. This changed the effort cost of harvest but not the effort cost of work. The subjects responded by altering how they worked as well as how they harvested. When the harvest was more effortful, they performed more saccade trials to stockpile food. They also slowed their movements, reducing saccade velocity during the work period, and reducing lick velocity during the harvest period. Notably, the most vigorous saccades were also the most accurate: as saccade vigor increased, so did endpoint accuracy.
The theory made a second prediction: as the value of reward increased, the subjects should again choose to work a longer period before initiating harvest, but unlike the effort costs, now respond by moving more vigorously. We did not directly manipulate the subjective value of reward, but rather relied on the natural fluctuations in body weight and assumed that when their weight was low, the subjects were hungrier for reward. Indeed, when their weight was low, the subjects again chose to work longer, but now elevated their vigor during the harvest period.
Finally, we quantified the effect of reward magnitude on vigor. Within a session, lick vigor increased robustly as a function of the number of trials completed in the preceding work period. Thus, the licks were invigorated by the amount of food that awaited harvest.
Notably, some of the predictions of the theory did not agree with the experimental data. An increased effort cost was not accompanied by a reduction in the duration of harvest, and hunger did not increase saccade vigor robustly. Indeed, earlier experiments have shown that if the effort cost of harvest increases, animals who expend the effort will then linger longer to harvest more of the reward that they have earned (2). This mismatch between observed behavior and theory highlights some of the limitations of our formulation. For example, our capture rate reflected a single work-harvest period, rather than a long sequence. Moreover, the capture rate did not consider the fact that the food tube had finite capacity, beyond which the food would fall and be wasted. This constraint would discourage a policy of working more but harvesting less. Finally, if we assume that a reduced body weight is a proxy for increased subjective value of reward, it is notable that we observed a robust effect on vigor of licks, but not saccades. A more realistic capture rate formulation awaits simulations, possibly one that describes capture rate not as the ratio of two sums (sum of gains and losses with respect to sum of time), but rather the expected value of the ratio of each gain and loss with respect to time (23, 24).
A shortcoming of our model is that we did not include a link between lick vigor and its probability of success. As a result, when we moved the food tube away, the model did not consider the possibility that maintaining lick accuracy may involve reduced vigor. The reason for this is that we searched for but could not find a consistent relationship, across subjects or effort conditions, between protraction speed of the tongue and its success probability. Thus, we cannot exclude this alternate hypothesis. However, the most interesting aspect of our results was that when we increased tube distance, making harvest more effortful, there was not only a reduction in lick vigor, but also a reduction in saccade vigor. That is, the decisions and actions during the work period responded to the increased effort cost of reward during the harvest period.
What might be a neural basis for this coordination of decisions and movements? A clue was the fact that the pupils were more constricted in sessions in which the effort cost of harvest was greater. This global change in pupil size accompanied delayed harvest and reduced vigor across sessions, but surprisingly, even within a session, transient changes in pupil size accompanied changes in vigor. During the work period the trial-to-trial reduction in saccade vigor accompanied trial-to-trial constriction of the pupil, and within a harvest period, the rapid rise and then the gradual fall in lick vigor paralleled rapid dilatation followed by gradual constriction of the pupil.
Pupil dilation is a proxy for activity in the brainstem neuromodulatory system (25) and is a measure of arousal (26). Control of pupil size is dependent on spiking of norepinephrine neurons in locus coeruleus (LC-NE): an increase in the activity of these neurons produces pupil dilation (27, 28). Some of these neurons show a transient change in their activity when acquisition of reward requires expenditure of either physical (11) or mental effort (29), even when there is no concomitant movement to be made. It is possible that in the present task, as the effort cost of harvest increased, LC-NE neurons decreased their activity, producing pupil constriction. If so, the reduced NE release may have had two simultaneous effects: encouraging work and promoting delayed gratification in brain regions that control decisions, and discouraging energy expenditure and promoting sloth in brain regions that control movements. Thus, the idea that emerges is that the response of NE to economic variables, as inferred via changes in pupil size, might act as a bridge to coordinate the computations in the decision-making circuits with the computations in the motor-control circuits, aiming to implement a consistent control policy that improves the capture rate.
In addition to NE, the basal ganglia, and in particular the neurotransmitter dopamine, are likely key contributors to the coordination of decisions with actions (30, 31). When the effort price of a preferred food increases, animals choose to work longer, pressing a lever a greater number of times (32, 33). This desire to expend effort to acquire a valuable reward is reduced if dopamine is blocked in the ventral striatum (34–36). Hunger activates circuits in the hypothalamic nuclei, disinhibiting dopamine release in response to food cues (37). Dopamine concentrations in the striatum drop when the effort price of a food reward increases (38), and dopamine release before onset of a movement tends to invigorate that movement (39). Thus, the presence of dopamine may not only alter decisions by encouraging expenditure of effort, but also modify movements by promoting vigor.
Experiments of Hayden et al. (40) and Barack et al. (41) suggest that the decision of when to stop work and commence harvest may rely on computations that are carried out in the cingulate cortex. They found that as monkeys deliberated between the choice of staying and acquiring diminishing rewards, or leaving and incurring a travel cost, neurons in the cingulate cortex encoded a decision variable that reflected the value of leaving the patch. The prediction that emerges from our work is that the rate of rise of these decision variables may be modulated by the presence of NE.
From a motor control perspective, a surprising aspect of our results was that an increase in saccade vigor accompanied an improvement in endpoint accuracy. In our earlier work we found that during reaching, reward increased vigor without reducing accuracy (42). Thus, the brain has the means to increase movement vigor and improve its accuracy. How is this achieved?
We found that the high vigor saccades were produced when the pupils were dilated, implying an increased release of NE. In songbirds, increased NE release acts on the basal ganglia to suppress activity of spiny neurons, and this reduced activity in the basal ganglia accompanies reduced variance in the songs that the animal sings (43). Thus, NE may play a critical role in control of movement variability. For saccades, control of endpoint accuracy depends on the coordinated activity of Purkinje cells in the oculomotor region of the cerebellar vermis (44, 45). LC projects to the cerebellum, and stimulation of LC neurons increases the sensitivity of Purkinje cells to their inputs (46).
Is movement vigor increased following increased NE inputs from LC to the basal ganglia, and accuracy improved following increased NE inputs from LC to the cerebellum? Does decision-making shift toward greater work and delayed gratification following reduced NE inputs from LC to the frontal lobe? These are some of the questions that await future experiments.
Behavioral and neurophysiological data were collected from two marmosets (Callithrix jacchus, male and female, 350-390 g, subjects R and M, 6 years old). The neurophysiological data focused on the cerebellum and are described elsewhere (44, 47, 48). Here, our focus is on the behavioral data.
The marmosets were born and raised in a colony that Prof. Xiaoqin Wang has maintained at the Johns Hopkins School of Medicine since 1996. The procedures on the marmosets were evaluated and approved by the Johns Hopkins University Animal Care and Use Committee in compliance with the guidelines of the United States National Institutes of Health.
Following recovery from head-post implantation surgery, the animals were trained to make saccades to visual targets and rewarded with a mixture of apple sauce and lab diet (47). They were placed in a monkey chair and head-fixed while we presented visual targets on an LCD screen (Curved MSI 32” 144 Hz - model AG32CQ) and tracked both eyes at 1000 Hz using an EyeLink-1000 system (SR Research, USA). Timing of target presentation on the video screen was measured using a photodiode. Tongue movements were tracked with a Sony IMX287 FLIR camera capable of 522 frames per second, with frames captured at 100 Hz.
Each trial began with a saccade to the center target followed by fixation for 200 ms, after which a primary target (0.5×0.5 deg square) appeared at one of 8 randomly selected directions at a distance of 5-6.5 deg. Onset of the primary target coincided with presentation of a tone. As the animal made a saccade to the primary target, that target was erased, and a secondary target was presented at a distance of 2-2.5 deg, also at one of 8 randomly selected directions. The subject was rewarded if following the primary saccade it made a corrective saccade to the secondary target, landed within 1.5 deg radius of the target center, and maintained fixation for at least 200 ms. Onset of reward coincided with presentation of another distinct tone. Following an additional 150-250 ms period (uniform random distribution), the secondary target was erased, and the center target was displayed, indicating the onset of the next trial. Thus, a successful trial comprised a sequence of 3 saccades: center, primary, and corrective, after which the subject received a small increment of food (0.015 mL).
The food was provided in two small tubes (4.4 mm diameter), one to the left and the other to the right of the animal (Fig. 1A). A successful trial produced a food increment in one of the tubes and would continue to do so for 50-300 consecutive trials, then switch to the other tube. Because the food increment was small, the subjects naturally chose to work for a few consecutive trials, tracking the visual targets and allowing the food to accumulate, then stopped tracking and harvested the food via a licking bout. The licking bout typically included a sequence of 15-40 licks. The subjects did not work while harvesting. As a result, the behavior consisted of a work period (targeted saccades), followed by a harvest period (targeted licking), repeated hundreds of times per session.
The critical variables were the number of trials that the subject chose to perform before initiating harvest, the vigor of their saccades during the work period, and the vigor of their licks during the harvest period.
All saccades, regardless of whether they were instructed by presentation of a visual target or not, were identified using a velocity threshold. Saccades to primary, secondary, and central targets were labeled as reward-relevant saccades, while all remaining saccades were labeled as task irrelevant.
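The velocity-threshold step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `detect_saccades`, the 30 deg/s threshold, and the 5-sample minimum duration are hypothetical values chosen for illustration, and only the 1000 Hz sampling rate comes from the text.

```python
import math


def detect_saccades(gaze_deg, sample_rate_hz=1000.0,
                    speed_threshold=30.0, min_samples=5):
    """Return (start, end) index pairs where eye speed exceeds a threshold.

    gaze_deg: sequence of (x, y) gaze positions in degrees.
    speed_threshold (deg/s) and min_samples are illustrative assumptions.
    Index i of the speed trace is the speed between samples i and i + 1.
    """
    dt = 1.0 / sample_rate_hz
    speeds = [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(gaze_deg, gaze_deg[1:])
    ]
    saccades = []
    start = None
    for i, speed in enumerate(speeds):
        if speed > speed_threshold and start is None:
            start = i                      # speed crossed above threshold
        elif speed <= speed_threshold and start is not None:
            if i - start >= min_samples:   # discard very brief crossings
                saccades.append((start, i))
            start = None
    if start is not None and len(speeds) - start >= min_samples:
        saccades.append((start, len(speeds)))
    return saccades


# Synthetic trace: fixation, a 5 deg rightward step over 10 ms, fixation.
trace = [(0.0, 0.0)] * 20 + [(0.5 * k, 0.0) for k in range(1, 11)] + [(5.0, 0.0)] * 20
events = detect_saccades(trace)
```

Each detected event could then be labeled as reward-relevant or task-irrelevant by comparing its endpoint with the target displayed at that time.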
We analyzed tongue movements using DeepLabCut (12). Our network was trained on 89 video recordings of the subjects with 15-25 frames extracted and labeled from each recording. The network was built on the ResNet-152 pre-trained model, and then trained over 1.03×10 iterations with a batch size of 8, using a GeForce GTX 1080Ti graphics processing unit (49). A Kalman filter was further applied to improve quality and smoothness of the tracking, and the output was analyzed in MATLAB to quantify varying lick events and kinematics.
We tracked the tongue tip and the edge of the food in the tube, along with control locations (nose position and tube edges). We tracked all licks, regardless of whether they were aimed toward the tube (reward seeking), or not (grooming). Reward seeking licks were further differentiated based on whether they aimed to enter the tube (inner-tube licks), hit the outer edge of the tube (outer-edge licks), or fell below the tube (under tube). If any of these licks successfully contacted the tube, we labeled that lick as a success (otherwise, a failed lick).
Pupil area was measured during a ±250 ms period centered at the onset of each reward-relevant saccade, and the onset of each lick. We then normalized the pupil measurements by representing them as z-scores with respect to the mean value for that session.
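The windowing and normalization can be sketched as follows. This is an illustration under stated assumptions: the pupil trace is taken to be sampled at 1000 Hz (so 250 samples span 250 ms), the z-score uses the population standard deviation of the event-locked measurements within a session, and the names `window_mean` and `zscore` are hypothetical.

```python
from statistics import mean, pstdev


def window_mean(trace, event_index, half_width=250):
    """Mean pupil area in a window of +/- half_width samples around an event.

    The window is clipped at the edges of the recording.
    """
    lo = max(0, event_index - half_width)
    hi = min(len(trace), event_index + half_width + 1)
    return mean(trace[lo:hi])


def zscore(values):
    """Express each measurement as a z-score relative to its session.

    Assumption: population standard deviation over all event-locked
    measurements from the same session.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-score undefined when all values are identical")
    return [(v - mu) / sigma for v in values]


# Toy session: a linearly increasing trace and five event-locked measurements.
toy_trace = list(range(1000))
centered = window_mean(toy_trace, 500)
z = zscore([1.0, 2.0, 3.0, 4.0, 5.0])
```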
Saccade and tongue vigor
We relied on previous work to define vigor of a movement (5, 9, 18, 19). Briefly, if the amplitude of a movement is x and the peak speed of that movement is v, then for each subject the relationship between the two variables can be described as:
In the above expression, α, β ≥ 0 are subject-specific parameters. For a movement with amplitude x, its vigor was defined as the ratio of the actual peak speed to the expected value of its peak speed for that amplitude. Expected value was computed by fitting Eq. (3) to all the data acquired across all sessions. When vigor is greater than 1, the movement had a peak velocity that was higher than the mean value associated with that amplitude.
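A minimal sketch of the vigor computation follows. Because the form of Eq. (3) is not reproduced in this text, the sketch assumes a saturating hyperbolic relationship, v = α(1 − 1/(1 + βx)), of the kind used in earlier vigor studies; this form, the grid search over β, and all function names are assumptions made for illustration, not the authors' fitting procedure.

```python
def expected_speed(x, alpha, beta):
    """Assumed amplitude-speed relationship (illustrative form only)."""
    return alpha * (1.0 - 1.0 / (1.0 + beta * x))


def fit_speed_amplitude(amplitudes, peak_speeds, betas=None):
    """Least-squares fit of alpha, beta >= 0 by grid search over beta.

    For a fixed beta the model is linear in alpha, so alpha has a
    closed-form least-squares solution.
    """
    if not amplitudes:
        raise ValueError("need at least one movement to fit")
    if betas is None:
        betas = [0.01 * k for k in range(1, 501)]  # hypothetical search grid
    best = None
    for beta in betas:
        g = [1.0 - 1.0 / (1.0 + beta * x) for x in amplitudes]
        denom = sum(gi * gi for gi in g)
        if denom == 0:
            continue
        alpha = max(0.0, sum(v * gi for v, gi in zip(peak_speeds, g)) / denom)
        sse = sum((v - alpha * gi) ** 2 for v, gi in zip(peak_speeds, g))
        if best is None or sse < best[0]:
            best = (sse, alpha, beta)
    return best[1], best[2]


def vigor(x, v, alpha, beta):
    """Ratio of observed peak speed to the speed expected for amplitude x."""
    return v / expected_speed(x, alpha, beta)


# Synthetic, noise-free data generated from known parameters.
amps = [float(a) for a in range(1, 11)]
speeds = [expected_speed(a, 600.0, 0.25) for a in amps]
alpha_hat, beta_hat = fit_speed_amplitude(amps, speeds)
fast = vigor(4.0, 1.2 * expected_speed(4.0, alpha_hat, beta_hat), alpha_hat, beta_hat)
```

Here `fast` is the vigor of a hypothetical 4 deg movement whose peak speed is 20% above the fitted expectation.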
We chose a formulation of utility (Eq. 1) based on a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another reward opportunity (4, 13, 14). In a typical formulation of the theory, the numerator represents the reward gained (in units of energy), minus the effort expended (also in units of energy), while the denominator represents the amount of time spent during that behavior. We represented this idea in Eq. (1) with saccades that produced reward accumulation, and licks that produced reward consumption. Thus, the utility that we aim to maximize is the rate of energy gained.
The specific functions that we used to represent the energy gained through reward acquisition, and the energy expended through effort expenditure, came either from experiment design, or from the measurements we have made in other experiments. We modeled reward accumulation as a linear rise in energy stored because successful saccades produced a linear increase in the food cache. We modeled harvesting of the food as a hyperbolic function of the number of licks to represent the fact that as the licking bout began, each successful lick depleted the food, and thus the first few licks produced a greater amount of food consumption than the last few licks. We modeled the effort cost of licking as a linear function of the number of licks.
A critical assumption that we made is that energy expended performing the saccade trials (which grew faster than linearly as a function of the number of trials attempted), grew faster than the time spent attempting those same trials (which grew linearly with the number of trials). This assumption is based on the heuristic that the average rate of energy lost following a large number of attempted trials is greater than the average rate of energy lost following a small number of attempted trials.
The model’s simplicity provided closed-form solutions across all parameter values, allowing us to make predictions without having to fit the model to the measured data. For example, for all parameter values that produce a real solution (as opposed to imaginary), the optimal number of saccade trials increases with the square root of the cost of licking. Thus, the basic prediction of the model is that to maximize the capture rate, regardless of parameter values, an increase in the effort required for harvest should be met with a greater willingness to work. The closed-form solutions are presented in the supplementary document (simulations.nb).
Other models of utility
In composing our utility (Eq. 1), we chose to combine reward and effort additively. This is in contrast to other approaches in which effort discounts reward multiplicatively (50–52). Our reasoning is that multiplicative interactions have the limitation that they are incompatible with the observation that reward invigorates movements.
To compare additive and multiplicative approaches, let us consider an arbitrary function u r that specifies how effort varies with movement duration T. Typically, this is a U-shaped function that describes energy expenditure as a function of movement duration, as in (8). In the case of multiplicative interaction between reward and effort, we can consider the following representation of utility:
In the above formulation, reward α is discounted hyperbolically with time, and an increase in reward increases the utility of the action. The optimum movement vigor has the duration T* that maximizes this utility. Notably, because increasing reward merely scales this utility, it has no effect on vigor. Thus, a utility in which reward is multiplied by a function of effort generally fails to predict dependence of movement vigor on reward.
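The scaling argument can be checked numerically with a toy example. The functions below are hypothetical stand-ins, not the utilities of the paper: `effort` is an arbitrary U-shaped function of duration, `multiplicative_utility` multiplies reward by a hyperbolic temporal discount and an effort discount, and `additive_utility` is reward minus effort, divided by time.

```python
def effort(T, a=1.0, b=1.0):
    """Hypothetical U-shaped effort cost as a function of duration T."""
    return a / T + b * T


def multiplicative_utility(T, alpha, gamma=1.0):
    """Reward alpha scaled by temporal and effort discount factors."""
    return alpha / (1.0 + gamma * T) / (1.0 + effort(T))


def additive_utility(T, alpha):
    """Reward minus effort, divided by time (a capture-rate-like form)."""
    return (alpha - effort(T)) / T


def optimal_duration(utility, alpha):
    """Grid search for the duration T* that maximizes the utility."""
    durations = [0.001 * k for k in range(10, 2001)]
    return max(durations, key=lambda T: utility(T, alpha))


# Compare a smaller and a larger reward under each formulation.
t_mult_low = optimal_duration(multiplicative_utility, 20.0)
t_mult_high = optimal_duration(multiplicative_utility, 25.0)
t_add_low = optimal_duration(additive_utility, 20.0)
t_add_high = optimal_duration(additive_utility, 25.0)
```

In this toy case the multiplicative optimum is identical for both reward values, whereas the additive optimum shortens as reward grows (analytically T* = 2a/α for these functions), which corresponds to greater vigor.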
The optimal policy specifies the decisions and movements that, for the effort cost defined in Eq. (2), maximize the capture rate defined in Eq. (1). This policy selects the number of saccade trials ns to perform during the work period, the number of licks nl to perform during the harvest period, and the vigor of each lick, represented by the average duration of a lick Tl. To compute the optimal policy, we found the derivative of the capture rate with respect to each policy variable ns, nl, and Tl, then set each derivative equal to zero, producing three simultaneous nonlinear equations. In all three cases, we were able to solve for the relevant control variable analytically (see Mathematica notebook simulations.nb for the derivations). We found that if the solution was a real number, then regardless of parameter values, with an increase in d (distance of the tube to the mouth), the optimal policy produced an increase in ns, a decrease in nl, and an increase in Tl. Thus, the results illustrated in Fig. 2 are robust to changes in parameter values.
To generate the plots in Fig. 2A, we used the following parameter values: α = 20, βs = 0.5, βL = 0.3, cs = 0.5, Ts = 1, TL = 0.2, cL = 0.5 (low effort), cL = 2.5 (high effort). For the plots in Fig. 2C and 2D, we used the same parameter values, but cL was defined via Eq. (2). Thus, tube distance d varied, and TL was unknown and was solved for. In Eq. (2), kL = 1. In the simulations, to describe state of hunger, we set α = 20 for a sated state and α = 25 for a hungry state.
Hypothesis testing was performed using functions provided by the MATLAB Statistics and Machine Learning Toolbox, version R2021b. For t-tests, across the one-sample, paired-sample, and two-sample conditions, p values were computed using the ttest and ttest2 functions with data that were combined across sessions, separated by condition. For ANOVA tests, in the one-way condition, p values were computed using a nonparametric Kruskal-Wallis test, using the kruskalwallis function. In the 2-way condition, the anovan function was used to compute p values, accounting for an unbalanced design resulting from a varied number of samples across conditions. In both cases, as in the t-tests, data were combined across sessions, separated by condition. In the repeated measures condition, each session was treated as a subject with multiple repeated measures representing a given variable (e.g., lick vigor per lick in a harvest period). To fit a repeated measures model, the fitrm function was used, then analyzed using the ranova function. In all cases of repeated measures ANOVA, compound symmetry assumptions were tested using the Mauchly sphericity test with the mauchly function. In cases where the assumption was violated (Mauchly test p < 0.05), epsilon adjustments were used, with the epsilon function, to compute corrected p values (Huynh-Feldt p values for ε > 0.75 and Greenhouse-Geisser p values for ε < 0.75). For correlation analyses, Pearson’s correlation coefficient, r, and corresponding p values were computed using the corrcoef function.
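The rule for choosing among uncorrected and epsilon-corrected p values in the repeated measures analysis can be written as a small decision function. This Python sketch only encodes the selection logic described above; it does not compute the statistics themselves. The function name and argument names are hypothetical, the text does not state which epsilon estimate is compared with 0.75, and the case ε = 0.75 exactly is not specified in the text, so assigning it to the Greenhouse-Geisser branch is an assumption.

```python
def select_p_value(mauchly_p, epsilon, p_uncorrected, p_gg, p_hf,
                   sphericity_alpha=0.05, epsilon_cutoff=0.75):
    """Pick the p value to report for a repeated measures ANOVA.

    mauchly_p: p value of the Mauchly sphericity test.
    epsilon: sphericity estimate used for the comparison (assumed input).
    p_gg, p_hf: Greenhouse-Geisser and Huynh-Feldt corrected p values.
    Returns a (label, p_value) tuple.
    """
    if mauchly_p >= sphericity_alpha:
        # Sphericity not rejected: no correction applied.
        return ("uncorrected", p_uncorrected)
    if epsilon > epsilon_cutoff:
        return ("Huynh-Feldt", p_hf)
    # Assumption: epsilon exactly at the cutoff falls in this branch.
    return ("Greenhouse-Geisser", p_gg)


# Illustrative values only, not results from the study.
mild = select_p_value(mauchly_p=0.01, epsilon=0.90,
                      p_uncorrected=0.020, p_gg=0.031, p_hf=0.024)
severe = select_p_value(mauchly_p=0.01, epsilon=0.60,
                        p_uncorrected=0.020, p_gg=0.031, p_hf=0.024)
intact = select_p_value(mauchly_p=0.40, epsilon=0.95,
                        p_uncorrected=0.020, p_gg=0.031, p_hf=0.024)
```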
The work was supported by grants from the NIH (R01-EB028156, R01-NS078311, R37-NS128416), and the Office of Naval Research (N00014-15-1-2312).
Video 1. Example of a successful inner-tube lick.
Video 2. Example of an under-tube lick that failed to contact food. Note the corrective sub-movements, as has been observed in mice (53).
Video 3. Example of an outer-tube lick that failed to contact food.
Video 4. Example of a lick that hit the outer edge of the tube and failed to contact food.
Video 5. Example of a grooming lick.
- 1. Optimal foraging, the marginal value theorem. Theor. Popul. Biol. 9:129–136
- 2. Optimal foraging in great tits (Parus major). Nature 268:137–139
- 3. Vigor: neuroeconomics of movement control
- 4. Diet selection and optimization by northwestern crows feeding on Japanese littleneck clams. Ecology 67:1219–1226
- 5. Saccade vigor and the subjective economic value of visual stimuli. J. Neurophysiol. 123:2161–2172
- 6. Saccade vigor reflects the rise of decision variables during deliberation. Curr. Biol. https://doi.org/10.1016/j.cub.2022.10.053
- 7. Temporal discounting of reward and the cost of time in motor control. J. Neurosci. 30:10507–10516
- 8. A representation of effort in decision-making and motor control. Curr. Biol. 26:1929–1934
- 9. Control of movement vigor and decision making during foraging. Proc. Natl. Acad. Sci. U.S.A. 115:E10476–E10485
- 10. Pupil Size as a Window on Neural Substrates of Cognition. Trends Cogn. Sci. 24:466–480
- 11. Locus coeruleus neurons encode the subjective difficulty of triggering and executing actions. PLOS Biol. 19
- 12. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21:1281–1289
- 13. Foraging Theory
- 14. To walk or to fly? How birds choose among foraging modes. Proc. Natl. Acad. Sci. U.S.A. 98:1089–1094
- 15. Older adults learn less, but still reduce metabolic cost, during motor adaptation. J. Neurophysiol. 111:135–144
- 16. Energy-speed relation and optimal speed during level walking. Int. Z. Angew. Physiol. 17:277–283
- 17. Effect of load and speed on the energetic cost of human walking. Eur. J. Appl. Physiol. 94:76–83
- 18. Modulation of Saccade Vigor during Value-Based Decision Making. J. Neurosci. 35:15369–15378
- 19. Movement vigor as a trait-like attribute of individuality. J. Neurophysiol. 120:741–757
- 20. Signal-dependent noise determines motor planning. Nature 394:780–784
- 21. The duration of reaching movement is longer than predicted by minimum variance. J. Neurophysiol. 116:2342–2345
- 22. Fitness consequences of foraging behaviour in the zebra finch. Nature 352:153–155
- 23. Preferences for fixed and variable food sources: variability in amount and delay. J. Exp. Anal. Behav. 63:313–329
- 24. Rate currencies and the foraging starling: the fallacy of the averages revisited. Behav. Ecol. 7:341–352
- 25. Phasic locus coeruleus activity regulates cortical encoding of salience information. Proc. Natl. Acad. Sci. U.S.A. 115:E9439–E9448
- 26. Pupillometry: Psychology, Physiology, and Function. J. Cogn. 1:1–23
- 27. Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex. Neuron 89:221–234
- 28. Active control of arousal by a locus coeruleus GABAergic circuit. Nat. Neurosci. 22:218–228
- 29. Pupil Dilation and Microsaccades Provide Complementary Insights into the Dynamics of Arousal and Instantaneous Attention during Effortful Listening. J. Neurosci. 43:4856–4866
- 30. The Basal Ganglia Do Not Select Reach Targets but Control the Urgency of Commitment. Neuron 95:1160–1170
- 31. Dynamic control of decision and movement speed in the human basal ganglia. Nat. Commun. 13
- 32. Haloperidol and nucleus accumbens dopamine depletion suppress lever pressing for food but increase free food consumption in a novel food choice procedure. Psychopharmacology (Berl.) 104:515–521
- 33. Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience 92:545–552
- 34. Role of nucleus accumbens dopamine D1 and D2 receptors in instrumental and Pavlovian paradigms of conditioned reward. Psychopharmacology (Berl.) 152:67–73
- 35. Nucleus accumbens and effort-related functions: behavioral and neural markers of the interactions between adenosine A2A and dopamine D2 receptors. Neuroscience 166:1056–1067
- 36. The role of dopamine D1 receptor transmission in effort-related choice behavior: Effects of D1 agonists. Pharmacol. Biochem. Behav. 135:217–226
- 37. Hunger and Satiety Gauge Reward Sensitivity. Front. Endocrinol. 8
- 38. A transient dopamine signal encodes subjective value and causally influences demand in an economic context. Proc. Natl. Acad. Sci. U.S.A. 114:E11303–E11312
- 39. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554:244–248
- 40. Neuronal basis of sequential foraging decisions in a patchy environment. Nat. Neurosci. 14:933–939
- 41. Posterior Cingulate Neurons Dynamically Signal Decisions to Disengage during Foraging. Neuron 96:339–347
- 42. Vigor of reaching movements: reward discounts the cost of effort. J. Neurophysiol. 119:2347–2357
- 43. Neural dynamics underlying birdsong practice and performance. Nature 599:635–639
- 44. Synchronous spiking of cerebellar Purkinje cells during control of movements. Proc. Natl. Acad. Sci. U.S.A. 119
- 45. Saccadic dysmetria and adaptation after lesions of the cerebellar cortex. J. Neurosci. 19:10931–10939
- 46. Locus coeruleus stimulation potentiates Purkinje cell responses to afferent input: The climbing fiber system. Brain Res. 222:43–64
- 47. Behavioral training of marmosets and electrophysiological recording from the cerebellum. J. Neurophysiol. 122:1502–1517
- 48. Complex spikes perturb movements, revealing the sensorimotor map of Purkinje cells. bioRxiv :2023–4
- 49. K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition (2016), pp. 770–778
- 50. Reformative self-control and discounting of reward value by delay or effort. Jpn. Psychol. Res. 46:1–9
- 51. Separate valuation subsystems for delay and effort decision costs. J. Neurosci. 30:14080–14090
- 52. Behavioral modeling of human choices reveals dissociable effects of physical effort and temporal delay on reward devaluation. PLoS Comput. Biol. 11
- 53. Cortex-dependent corrections as the tongue reaches for and misses targets. Nature :1–6