Fundamental classes of temporal decision-making regarding initiating a pursuit: “Forgo” and “Choice”. 1st row: Topologies. The temporal structure of worlds exemplifying Forgo (left) and Choice (right) decisions, mapped as their topologies. Forgo: a decision to accept or reject the purple pursuit. When exiting the gold pursuit having obtained its reward (small blue circle), an agent faces 1) a path to re-enter gold, or 2) a path to enter the purple pursuit, which, on its completion, re-enters gold. Choice: a decision between an aqua pursuit, offering a small reward after a short time, and a purple pursuit, offering a larger reward after a longer time. When exiting the gold pursuit, an agent faces a path to enter 1) the aqua or 2) the purple pursuit, both of which lead back to the gold pursuit upon completion. 2nd row: Policies. Decision-making policies chart a course through the pursuit-to-pursuit structure of a world. Policies differ in the reward obtained, and in the time required, to complete a traversal of the world under that policy. Policies investing less (left) or more (right) time to traverse the world are illustrated for the considered Forgo and Choice worlds. Forgo: a policy of rejecting the purple pursuit to re-enter the gold pursuit (left) acquires less reward, though it requires less time to traverse the world, than a policy of accepting the purple option (right). Choice: a policy of choosing the aqua pursuit (left) results in less reward, though requires less time to traverse the world, than a policy of choosing the purple pursuit (right). 3rd row: Time/reward investment.
The times (solid horizontal lines, colored by pursuit) and rewards (vertical blue lines) of pursuits, and their associated reward rates (dashed lines), acquired under a policy of forgoing or accepting in the Forgo world, or of choosing the sooner-smaller or later-larger pursuit in the Choice world.

Global reward rate with respect to parceling the world into “in” and “outside” the considered pursuit.

A-C as in Figure 1 “Forgo”. D) The world divided into “Inside” and “Outside” the purple pursuit, as the agent decides whether to forgo or accept. The axes are centered on the position of the agent, just before the purple pursuit: the upper right quadrant shows the inside (purple) pursuit’s reward rate (ρin), while the lower left quadrant shows the outside (gold) pursuit’s reward rate (ρout). The global reward rate (ρg) is shown in magenta, calculated from the equation in the box to the right. The agent can identify the policy yielding the higher reward rate by comparing the outside reward rate (ρout) with the global reward rate (ρg) that results under a policy of accepting the considered pursuit.

Forgo Decision-making.

A) When the reward rate of the considered pursuit exceeds its outside rate, the global reward rate will be greater than the outside rate, and therefore the agent should accept the considered pursuit. B) When the reward rates inside and outside the considered pursuit are equivalent, the global reward rate will be the same whether accepting or forgoing: the policies are equivalent. C) When the reward rate of the considered pursuit is less than its outside rate, the global reward rate resulting from accepting the considered pursuit will be less than the outside reward rate, and therefore the pursuit should be forgone.
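The forgo rule in panels A-C can be stated numerically: accepting is rate-maximizing exactly when the resulting global rate is at least the outside rate, which holds whenever the pursuit’s own rate meets or exceeds the outside rate. A minimal sketch (function names and example values are mine, not from the figure):

```python
def global_rate(r_in, t_in, r_out, t_out):
    """Global reward rate under a policy of accepting the pursuit:
    total reward over total time for one traversal of the world."""
    return (r_in + r_out) / (t_in + t_out)

def should_accept(r_in, t_in, r_out, t_out):
    """Accept iff accepting yields a global rate at least the outside rate."""
    return global_rate(r_in, t_in, r_out, t_out) >= r_out / t_out

# Case A: pursuit rate 2.0 exceeds outside rate 0.5 -> accept
print(should_accept(4, 2, 1, 2))   # True
# Case C: pursuit rate 0.5 is below outside rate 2.0 -> forgo
print(should_accept(1, 2, 4, 2))   # False
```

In case B (equal in/outside rates) the global rate equals the outside rate, so either policy is equivalent.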

The Subjective Value (sv) of a pursuit is the global reward rate-equivalent immediate reward magnitude.

The subjective value of a pursuit is that amount of reward requiring no investment of time that the agent would take as equivalent to accepting and acquiring the considered pursuit. For this amount to be equivalent, the immediate reward magnitude must result in the same global reward rate as that of accepting the pursuit. The global reward rate obtained under a policy of accepting the considered pursuit type is the slope of the line connecting the average times and rewards obtained in and outside the considered pursuit type. Therefore, the global reward rate equivalent immediate reward (i.e., the subjective value of the pursuit) can be depicted graphically as the y-axis intercept of the line representing the global reward rate achieved under a policy of accepting the considered pursuit.
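The graphical definition above can be stated algebraically: the subjective value is the y-intercept of the line of slope ρg passing through the point (tin, rin), i.e., sv = rin − ρg·tin. A minimal sketch (variable names and example values are mine):

```python
def subjective_value(r_in, t_in, r_out, t_out):
    """Global reward rate-equivalent immediate reward: the y-intercept of
    the line of slope rho_g through the point (t_in, r_in)."""
    rho_g = (r_in + r_out) / (t_in + t_out)
    return r_in - rho_g * t_in

# A 5-unit reward after 5 s, with 1 unit earned over 5 s outside:
# rho_g = 6/10 = 0.6, so sv = 5 - 0.6*5 = 2.0
print(subjective_value(5, 5, 1, 5))   # 2.0
```

Note that sv is positive, zero, or negative exactly as the pursuit’s rate exceeds, equals, or falls below its outside rate, consistent with the forgo rule.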

Equivalent expressions for subjective value reveal time’s cost comprises an opportunity as well as apportionment cost.

A. The subjective value of a pursuit can be expressed in terms of the global reward rate obtained under a policy of accepting the pursuit. It is the amount of reward earned from the pursuit beyond what would, on average, be earned over the pursuit’s duration under that policy. B. The cost of time of a pursuit is the amount of reward earned on average in the environment over the time needed to obtain the pursuit under a policy of accepting it. The reward rate earned on average is the global reward rate (slope of maroon line). Projecting that global reward rate over the time of the considered pursuit (dashed maroon line) provides the cost of time for the pursuit (vertical maroon bar). Therefore, the subjective value of a pursuit is equivalent to its reward magnitude less the cost of time of the pursuit. C. Expressing subjective value with respect to the outside reward rate rather than the global reward rate reveals that a portion of a pursuit’s time cost arises from an opportunity cost (orange bar). The opportunity cost of a pursuit is the amount of reward earned, on average, over the considered pursuit’s time under a policy of not taking the considered pursuit (the outside reward rate; slope of gold line). Projecting the slope of the gold line over the time of the considered pursuit (dashed gold line) provides the opportunity cost of the pursuit (vertical orange bar). The opportunity cost-subtracted reward (cyan bar) can then be scaled to the magnitude of reward, requiring no time investment, that would be equivalent to investing the time and acquiring the reward of the pursuit, i.e., its subjective value. The equation’s denominator provides this scaling term: the proportion that the outside time is of the total time to traverse the world. D. The difference between time’s cost and the opportunity cost of a pursuit is the pursuit’s apportionment cost (brown bar). The apportionment cost is the amount of the opportunity cost-subtracted reward that would, on average, accrue over the pursuit’s time under a policy of accepting the pursuit. E&F. Whether expressed in terms of the global reward rate achieved under a policy of not accepting (E) or accepting (F) the considered pursuit, the subjective value expressions are equivalent.
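The two equivalent subjective value expressions, and the decomposition of time’s cost into opportunity plus apportionment cost, can be checked numerically. A sketch (names and example values are mine):

```python
def sv_decomposition(r_in, t_in, r_out, t_out):
    """Return (sv, time_cost, opportunity_cost, apportionment_cost)."""
    rho_out = r_out / t_out
    rho_g = (r_in + r_out) / (t_in + t_out)
    time_cost = rho_g * t_in                  # panel B: global rate over t_in
    opportunity = rho_out * t_in              # panel C: outside rate over t_in
    apportionment = time_cost - opportunity   # panel D: the remainder
    return r_in - time_cost, time_cost, opportunity, apportionment

sv, tc, oc, ac = sv_decomposition(5, 5, 1, 5)

# The outside-rate form of sv (panel C's equation) agrees:
sv_outside = (5 - (1 / 5) * 5) / (1 + 5 / 5)
print(sv, sv_outside)   # both 2.0
```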

The impact of outside reward on the subjective value of a pursuit.

A) Increasing the outside reward while holding the outside time constant increases the outside reward rate (slope of gold lines), thereby increasing the global reward rate (slope of the purple lines) and decreasing the subjective value (green dots) of the pursuit. As the reward rate of the environment outside the considered pursuit type increases from lower than, to higher than, that of the considered pursuit, the subjective value of the pursuit decreases, becomes zero when the in/outside rates are equivalent, and becomes negative when ρout exceeds ρin. B) Plotting the subjective value of the pursuit as a function of increasing the outside reward (while holding tout constant) reveals that the subjective value of the pursuit decreases linearly. This linear decrease is due to the linear increase in the cost of time of the pursuit (purple dotted region). C) Time’s cost (the area, as in B, between the pursuit’s reward magnitude and its subjective value) is the sum of the opportunity cost of time (orange dotted region) and the apportionment cost of time (plum annuli region). When the outside reward rate is zero, time’s cost is composed entirely of apportionment cost. As the outside reward increases, opportunity cost increases linearly while apportionment cost decreases linearly, until the reward rates in and outside the pursuit become equivalent, at which point the subjective value of the pursuit is zero. When subjective value is zero, the cost of time is composed entirely of opportunity cost. As the outside rate exceeds the inside rate, opportunity cost continues to increase, while the apportionment cost becomes negative (which is to say, the apportionment cost of time becomes an apportionment gain of time). Adding the positive opportunity cost and the negative apportionment cost (subtracting the purple & orange region of overlap from opportunity cost) yields time’s cost, and subtracting time’s cost from the reward magnitude yields the subjective value of the pursuit.
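The linearity described in panel B follows directly from the subjective value expression: with tout fixed, each unit of outside reward lowers sv by the constant tin/(tin + tout). A quick numerical sweep (values are mine, chosen for illustration):

```python
def subjective_value(r_in, t_in, r_out, t_out):
    rho_g = (r_in + r_out) / (t_in + t_out)
    return r_in - rho_g * t_in

t_in, t_out, r_in = 4.0, 6.0, 5.0
svs = [subjective_value(r_in, t_in, r_out, t_out) for r_out in range(0, 5)]
steps = [b - a for a, b in zip(svs, svs[1:])]

# Each added unit of outside reward lowers sv by t_in/(t_in + t_out) = 0.4
print(steps)
```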

The impact of the apportionment cost of time on the subjective value of a pursuit.

A) The apportionment cost of time is best illustrated dissociated from the contribution of the opportunity cost of time by considering the special instance in which the outside has no reward, and therefore a reward rate of zero. B) Even in such instances, the pursuit still has a cost of time. C) Here, the cost of time is composed entirely of apportionment cost, which arises from the fact that the considered pursuit contributes its proportion to the global reward rate. The size of the pursuit’s time cost is thus determined by the ratio of time spent in the pursuit versus outside it: the more time spent outside the pursuit, the smaller the apportionment cost of time of the pursuit, and therefore the greater the subjective value of the pursuit. When apportionment cost solely composes the cost of time, the cost of time decreases hyperbolically as the outside time increases, resulting in the subjective value of the pursuit increasing hyperbolically.
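With zero outside reward, time’s cost reduces to rin·tin/(tin + tout), which falls hyperbolically in the outside time, so sv = rin/(1 + tin/tout) rises hyperbolically toward rin. A sketch (example values are mine):

```python
def time_cost_zero_outside(r_in, t_in, t_out):
    """With r_out = 0, time's cost is entirely apportionment cost."""
    return r_in * t_in / (t_in + t_out)

r_in, t_in = 5.0, 5.0
for t_out in (1.0, 5.0, 45.0):
    cost = time_cost_zero_outside(r_in, t_in, t_out)
    print(t_out, cost, r_in - cost)   # cost shrinks, sv grows, as t_out grows
# t_out = 5  -> cost 2.5, sv 2.5
# t_out = 45 -> cost 0.5, sv 4.5
```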

The effect of changing the outside time and the outside reward rate on the subjective value of a pursuit.

A) The subjective value (green dots) of the considered pursuit when changing the outside time and outside reward rate. B) As outside time increases under these conditions (holding positive outside reward constant), the subjective value of the pursuit increases hyperbolically, from the negative of the outside reward magnitude to, in the limit, the inside reward magnitude. Conversely, time’s cost (purple annuli) decreases hyperbolically. C) Opportunity cost decreases hyperbolically as outside time increases. Apportionment cost initially decreases to zero as the outside and inside rates become equal, and then increases as the difference of two hyperbolas (plum annuli area). When the outside reward rate is greater than the inside reward rate, apportionment could be said to have a gain (a negative cost). Summing opportunity cost and apportionment cost yields time’s cost.

Policy options considered during the initiation of pursuits in worlds with a “Choice” topology.

A-C) Choice topology, and policies of choosing the smaller-sooner or larger-later pursuit, as in Figure 1 “Choice”. D) The world divided into “Inside” and “Outside” the selected pursuit, as the agent decides whether to accept the SS (aqua) or the LL (purple) pursuit. The global reward rate (ρg) under a policy of choosing the SS or the LL pursuit (slopes of the magenta lines) is calculated from the equation in the box to the right.

Effect of opportunity cost on subjective value in choice decision-making.

The effect of increasing the outside reward while holding the outside time constant is to linearly increase the opportunity cost of time, thus decreasing the subjective value of pursuits considered in choice decision-making. When the outside reward is sufficiently small, the subjective value of the LL pursuit can exceed that of the SS pursuit, indicating that selection of the LL pursuit would maximize the global reward rate. As the outside reward increases, however, the subjective value of each pursuit decreases linearly as the opportunity cost of time increases. Since a policy of choosing the LL pursuit incurs the greater opportunity cost, the slope of its function relating subjective value to outside reward is steeper than that of a policy of choosing the SS pursuit. Thus, the outside reward can be increased sufficiently that the subjective values of the LL and SS pursuits become equal, past which point the agent will switch to choosing the SS pursuit.
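This reversal can be reproduced numerically. A sketch using illustrative pursuit values (SS: 2 units after 2.5 s; LL: 4.75 units after 8 s; outside time 10 s; all names are mine):

```python
def subjective_value(r_in, t_in, r_out, t_out):
    rho_g = (r_in + r_out) / (t_in + t_out)
    return r_in - rho_g * t_in

SS, LL, T_OUT = (2.0, 2.5), (4.75, 8.0), 10.0

def preferred(r_out):
    """Which pursuit has the higher subjective value at this outside reward."""
    sv_ss = subjective_value(*SS, r_out, T_OUT)
    sv_ll = subjective_value(*LL, r_out, T_OUT)
    return "LL" if sv_ll > sv_ss else "SS"

print(preferred(0.0))    # LL: sv_LL ~ 2.64 > sv_SS = 1.6
print(preferred(10.0))   # SS: opportunity cost penalizes the longer LL more
```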

Effect of apportionment cost on subjective value in choice decision-making.

The effect of increasing the outside time (while maintaining the outside reward rate) is to decrease the apportionment cost of the considered pursuit, thus increasing its subjective value. When the outside time is sufficiently small, the apportionment costs of the LL and SS pursuits will both be large, but the LL pursuit’s is greater still, given its proportionally longer duration relative to the outside time. As the outside time increases, however, the subjective value of each pursuit increases as its apportionment cost of time decreases. As apportionment costs diminish and the magnitudes of the pursuits’ rewards become more fully realized, the subjective value of the LL pursuit will eventually exceed that of the SS pursuit at sufficiently long outside times.
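A sketch of this apportionment-driven reversal, holding the outside rate fixed by scaling the outside reward with the outside time (illustrative values: SS: 2 units after 2.5 s; LL: 4.75 units after 8 s; outside rate 0.1 units/s; names are mine):

```python
def subjective_value(r_in, t_in, r_out, t_out):
    rho_g = (r_in + r_out) / (t_in + t_out)
    return r_in - rho_g * t_in

SS, LL, RHO_OUT = (2.0, 2.5), (4.75, 8.0), 0.1

def preferred(t_out):
    r_out = RHO_OUT * t_out          # scale reward to hold the outside rate fixed
    sv_ss = subjective_value(*SS, r_out, t_out)
    sv_ll = subjective_value(*LL, r_out, t_out)
    return "LL" if sv_ll > sv_ss else "SS"

print(preferred(1.0))     # SS: LL's apportionment cost dominates at short t_out
print(preferred(100.0))   # LL: sv approaches r_in - rho_out*t_in (3.95 vs 1.75)
```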

Effect of varying opportunity and apportionment costs on Choice behavior.

The effect of increasing the outside time while maintaining the outside reward is to decrease the apportionment cost as well as the opportunity cost of time, thus increasing the pursuits’ subjective value. Increasing the outside time (which, in turn, also decreases the outside reward rate) makes the agent appear as if to become more patient, willing to switch from a policy of selecting the SS pursuit to a policy of selecting the LL pursuit past some critical threshold (vertical dashed black line).
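The critical threshold can be located by scanning the outside time with the outside reward held fixed. A sketch under assumed values (SS: 2 units after 2.5 s; LL: 4.75 units after 8 s; outside reward fixed at 2 units; names and the 0.1 s scan step are mine):

```python
def subjective_value(r_in, t_in, r_out, t_out):
    rho_g = (r_in + r_out) / (t_in + t_out)
    return r_in - rho_g * t_in

SS, LL, R_OUT = (2.0, 2.5), (4.75, 8.0), 2.0

def prefers_ll(t_out):
    return subjective_value(*LL, R_OUT, t_out) > subjective_value(*SS, R_OUT, t_out)

# Scan for the critical outside time past which the agent switches to LL.
# Analytically the switch occurs at t_out = 5.5 s for these values;
# the 0.1 s grid reports the first grid point past it, 5.6.
threshold = next(t / 10 for t in range(1, 1000) if prefers_ll(t / 10))
print(prefers_ll(1.0), prefers_ll(100.0), threshold)
```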

The temporal discounting function of a global reward-rate optimal agent is a hyperbolic function relating the apportionment and opportunity cost of time.

A-C) The effect, as exemplified in three different worlds, of varying the outside time and reward on the subjective value of a pursuit as its reward is displaced into the future. The subjective value, sv, of this pursuit, as its temporal displacement into the future increases, is indicated by the green dots along the y-intercept in three different contexts: a world with A) zero outside reward rate & a large outside time, B) zero outside reward rate & a small outside time, and C) a positive outside reward rate & the same small outside time as in B. D-F) Replotting these subjective values at their corresponding temporal displacements yields the subjective value function of the offered reward in each of these contexts. G) Normalizing these subjective value functions by the reward magnitude and superimposing the resulting temporal discounting functions reveals how the steepness and curvature of the apparent discounting function of a reward-rate-maximizing agent change with respect to the average reward and time spent outside the considered pursuit. When the time spent outside is increased (compare B to A)—thus decreasing the apportionment cost of time—the temporal discounting function becomes less curved, making the agent appear as if more patient. When the outside reward is increased (compare B to C)—thus increasing the opportunity cost of time—the temporal discounting function becomes steeper, making the agent appear as if less patient.
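For a reward r displaced t seconds into the future, the normalized subjective value works out to DF(t) = (1 − ρout·t/r) / (1 + t/tout); with zero outside rate this is the pure hyperbola 1/(1 + t/tout), with curvature set by 1/tout. A sketch (parameter values are mine):

```python
def discount(t, r, rho_out, t_out):
    """Apparent discounting function sv(t)/r of a reward-rate-optimal agent
    for a reward r displaced t seconds into the future."""
    return (1 - rho_out * t / r) / (1 + t / t_out)

# Zero outside rate: a pure hyperbola, 1/(1 + t/t_out)
print(discount(5, 5, 0.0, 5))          # 0.5
# A positive outside rate steepens the function (opportunity cost),
# and sv can go negative at long displacements
print(discount(5, 5, 0.5, 5) < 0.5, discount(20, 5, 0.5, 5) < 0)
```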

Reward-rate maximizing agents would exhibit the “Magnitude effect”.

A&B) The global reward rate (the slope of magenta vectors) that would be obtained when acquiring a considered pursuit’s reward of a given size (either relatively large as in A or small as in B) at varying temporal removes depicts how the considered pursuit’s subjective value (green dots, y-intercept) decreases as the time needed for its obtainment increases in environments that are otherwise the same. C&D) Replotting the subjective values of the considered pursuit against their required delays forms the subjective value-time function for the “large” reward case (C) and the “small” reward case (D). E) Normalizing the subjective value-time functions by their reward magnitudes transforms these functions into their corresponding discounting functions (blue: large reward DF; red: small reward DF), and reveals that a reward-rate-maximizing agent would exhibit the “Magnitude Effect”: the steepness of the apparent discounting function changes with the size of the pursuit’s reward, manifesting as less steep the greater the magnitude of reward.
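The magnitude effect falls out of the same normalized expression: with a positive outside rate, the opportunity-cost term ρout·t/r shrinks as r grows, so larger rewards discount less steeply. A quick check (values are mine):

```python
def discount(t, r, rho_out, t_out):
    """Apparent DF of a reward-rate-optimal agent (see above derivation)."""
    return (1 - rho_out * t / r) / (1 + t / t_out)

t, rho_out, t_out = 5.0, 0.5, 5.0
small = discount(t, 1.0, rho_out, t_out)    # small reward
large = discount(t, 10.0, rho_out, t_out)   # large reward
print(small, large)   # the larger reward retains more of its value
```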

Reward-rate maximizing agents would exhibit the “Sign effect”.

A&B) The global reward rate (the slope of magenta lines) that would be obtained when acquiring a considered pursuit’s outcome of a given magnitude but differing in sign (either rewarding as in A, or punishing as in B), depicts how the subjective value (green dots, y-intercept) would decrease as the time of its obtainment increases in environments that are otherwise the same (one in which the agent spends the same amount of time and receives the same amount of reward outside the considered pursuit for every instance within it). C&D) Replotting the subjective values of the considered pursuit to correspond to their required delay forms the subjective value-time function for the reward (C) and for the punishment (D). E) Normalizing the subjective value-time functions by their outcome transforms these functions into their corresponding discounting functions (blue: reward DF; red: punishment DF). This reveals that a reward-rate maximizing agent would exhibit the “Sign Effect”, as the steepness of the apparent discounting function would change with the sign of the pursuit, manifesting as being less steep for punishing than for rewarding outcomes of the same magnitude.
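The sign effect follows likewise: normalizing by a negative outcome flips the sign of the opportunity-cost term, so a punishment’s discounting function declines more shallowly than that of a reward of equal magnitude. A sketch (values are mine):

```python
def discount(t, outcome, rho_out, t_out):
    """Normalized subjective value sv(t)/outcome; outcome may be negative."""
    return (1 - rho_out * t / outcome) / (1 + t / t_out)

t, rho_out, t_out = 5.0, 0.5, 5.0
reward_df = discount(t, 5.0, rho_out, t_out)    # (1 - 0.5)/2 = 0.25
punish_df = discount(t, -5.0, rho_out, t_out)   # (1 + 0.5)/2 = 0.75
print(punish_df > reward_df)   # True: shallower discounting for punishments
```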

Relationship between outside time and reward with optimal temporal decision-making behavioral transitions.

An agent may be presented with three decisions: the decision to take or forgo a smaller, sooner reward of 2.5 units after 2.5 seconds (SS pursuit), the decision to take or forgo a larger, later reward of 5 units after 8.5 seconds (LL pursuit), and the decision to choose between the SS and LL pursuits. The slope of the purple line indicates the global reward rate (ρg) resulting from a Choice or Take policy, while the slope of “outside” the pursuit (golden line) indicates the outside reward rate (i.e., global reward rate resulting from a Forgo policy). In each panel (A-D), an example outside reward rate is plotted, illustrating the relative ordering of ρg slopes for each policy. Location in the lower left quadrant is thereby shaded according to the combination of global rate-maximizing policies for each of the three decision types.
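With the stated pursuit values (SS: 2.5 units after 2.5 s; LL: 5 units after 8.5 s) and an assumed outside time of 10 s (the outside time is not specified here; 10 s is mine for illustration), the triplet of rate-maximizing policies at a given outside rate can be computed directly:

```python
def global_rate(r_in, t_in, r_out, t_out):
    return (r_in + r_out) / (t_in + t_out)

SS, LL, T_OUT = (2.5, 2.5), (5.0, 8.5), 10.0

def policy_triplet(rho_out):
    """(take SS?, take LL?, SS-vs-LL choice) at a given outside rate."""
    r_out = rho_out * T_OUT
    take_ss = global_rate(*SS, r_out, T_OUT) >= rho_out   # forgo rule, SS
    take_ll = global_rate(*LL, r_out, T_OUT) >= rho_out   # forgo rule, LL
    better_ll = global_rate(*LL, r_out, T_OUT) > global_rate(*SS, r_out, T_OUT)
    return take_ss, take_ll, "LL" if better_ll else "SS"

print(policy_triplet(0.0))   # take both; choose LL
print(policy_triplet(0.8))   # forgo LL (its rate 5/8.5 < 0.8); choose SS
```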

Patterns of suboptimal temporal decision-making behavior resulting from time and/or reward misestimation.

Patterns of temporal decision-making in Choice and Forgo situations deviate from optimal (top row) under various parameter misestimations (subsequent rows). Characterization of the nature of the suboptimality is aided by using the outside reward rate as the independent variable influencing decision-making (x-axis), plotted against the degree of error (y-axis) of a given parameter (ω<1 underestimation, ω=1 actual, ω>1 overestimation). The leftmost column provides a schematic exemplifying true outside (gold) and inside (blue) pursuit parameters and the nature of the parameter error (dashed red) investigated per row (all showing an instance of underestimation). For each error case, the agent’s resulting choice between SS and LL pursuits (2nd column), decision to take or forgo the LL pursuit (3rd column), and decision to take or forgo the SS pursuit (4th column) are indicated by the shaded color (legend, bottom of columns) for a range of outside rates and degrees of error. The rightmost column depicts the triplet of behaviors observed, combined across tasks. Rows: A) “No error” - Optimal choice and forgo behavior. Vertical white lines show outside reward rate thresholds for optimal forgo behavior. B-G) Suboptimal behavior resulting from parameter misestimation. B-D) The impact of outside pursuit parameter misestimation. B) “Outside Time” - The impact of misestimating the outside time (and thus misestimating the outside reward rate). C) “Outside Reward” - The impact of misestimating the outside reward (and thus misestimating the outside reward rate). D) “Outside Time & Reward” - The impact of misestimating the outside time and reward, but maintaining the outside reward rate. E-G) The impact of inside pursuit parameter misestimation. E) “Pursuit Time” - The impact of misestimating the inside pursuit time (and thus misestimating the inside pursuit reward rate). F) “Pursuit Reward” - The impact of misestimating the pursuit reward (and thus misestimating the pursuit reward rate).
G) “Pursuit Time and Reward” - The impact of misestimating the pursuit reward and time, but maintaining the pursuit’s reward rate. For this illustration, we determined the policies for an SS pursuit of 2 reward units after 2.5 seconds, an LL pursuit of 4.75 reward units after 8 seconds, and an outside time of 10 seconds. The qualitative shape of each region and the resulting conclusions are general to all situations where the SS pursuit has a higher rate than the LL pursuit (and where a region exists in which the optimal agent would choose LL at low outside rates).

Opportunity cost, apportionment cost, time cost, and subjective value functions by change in outside and inside reward and time.

Functions assume positive inside and outside rewards and times. (1) If the outside reward rate is zero, opportunity cost becomes a constant at zero. (2) If the outside reward rate is zero, apportionment cost becomes purely hyperbolic as the outside or inside time is varied.

The cost of time of a pursuit comprises both an opportunity as well as an apportionment cost.

The global reward rate under a policy of accepting the considered pursuit type (slope of the magenta line), times the time that the pursuit takes (tin), is the pursuit’s cost of time (height of maroon bar). The subjective value of a pursuit (height of green bar) is its reward magnitude (height of the purple bar) less its cost of time. Opportunity and apportionment costs are shown to compose the cost of time of a pursuit. The opportunity cost associated with a considered pursuit, ρout*tin (height of orange bar), is the reward rate of the world under a policy of not accepting the considered pursuit (its outside rate), ρout, times the time of the considered pursuit, tin. The amount of reward that would, on average, be obtained over the time of accepting the considered pursuit beyond this opportunity cost is the apportionment cost of time (height of brown bar).

Comparison of typical hyperbolic discounting versus apparent discounting of a reward-rate optimal agent.

Whereas (A) the curvature of hyperbolic discounting models is typically controlled by the free fit parameter k, (B) the curvature and steepness of the apparent discounting function of a reward-rate-optimal agent are controlled by the time spent, and reward rate obtained, outside the considered pursuit. Understanding the shape of discounting models from the perspective of a reward-rate-optimal agent reveals that k ought to relate to the apportionment of time spent in, versus outside, the considered pursuit, underscoring how typical hyperbolic discounting models fail to account for the opportunity cost of time (and thus cannot yield negative subjective values, no matter the temporal displacement of reward). Should k be understood as representing time’s apportionment cost, the failure to account for the opportunity cost of time would lead to aberrantly high values of k.
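This point can be made concrete: at any single delay, the k a hyperbolic model 1/(1 + k·t) would need to match the optimal agent’s apparent DF is k = (1/DF − 1)/t. With zero outside rate this recovers k = 1/tout exactly (pure time apportionment); with a positive outside rate, the unmodeled opportunity cost inflates k above 1/tout. A sketch (numbers are mine):

```python
def discount(t, r, rho_out, t_out):
    """Apparent DF of a reward-rate-optimal agent."""
    return (1 - rho_out * t / r) / (1 + t / t_out)

def implied_k(t, df):
    """The k a hyperbolic model 1/(1 + k*t) needs to match df at delay t."""
    return (1 / df - 1) / t

t, r, t_out = 5.0, 5.0, 10.0
k_zero = implied_k(t, discount(t, r, 0.0, t_out))   # no outside reward
k_pos = implied_k(t, discount(t, r, 0.3, t_out))    # positive outside rate

print(k_zero)           # 0.1 = 1/t_out: k reflects pure time apportionment
print(k_pos > k_zero)   # True: ignored opportunity cost inflates k
```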

The Malapportionment Hypothesis.

The Malapportionment Hypothesis holds that suboptimal decision-making, as revealed under Choice decision-making, arises in humans and animals as a consequence of the valuation process underweighting the contribution of accurately assessed reward rates outside versus inside the considered pursuit type. A) An example Choice situation where the global reward rate is maximized by choosing a larger later reward over a smaller sooner reward. B) An agent that underweights the outside time but accurately appreciates the outside and inside reward rates, overestimates the global reward rate resulting from each policy, and thus exhibits suboptimal impatience by selecting the smaller sooner reward. C) Similarly, an agent that overweights the time inside the considered pursuits but accurately appreciates the outside and inside reward rates also overestimates the global reward rate and selects the smaller sooner reward. As inside and outside reward rates are accurately assessed, forgo decisions can correctly be made despite any misappreciation of the relative time spent in/outside the considered pursuit.