(A) Distribution of inter-trial intervals (ITIs) for three representative rats. Trials were self-paced: rats were free to initiate trials within 100–200 ms of the preceding trial. If rats terminated a trial early by breaking center fixation, they received a time-out penalty (those trials are not shown). (B) Average number of trials per session for each rat, excluding trials that were terminated prematurely. The mean of this distribution (368 trials/session) is indicated by the red arrow. (C) Mean behavioral performance across rats, including >2.5 million trials. Percentage of trials on which rats chose the safe option, for each of the four safe-side volumes. Axes show the probability and volume of the risky alternatives. Performance is the mean across 36 rats (normalized to max before averaging). (D) Estimates of conditional probabilities in finite sequential data can have small biases (Miller and Sanjurjo, 2015). If this bias were driving sequential effects in our data, such as increased willingness to take risks following risky wins, we reasoned that computing it from random flips of a weighted coin (in sequences of the same length as our data) would also reveal an effect. We therefore generated random choices for each rat, with a generative probability equal to that rat's mean probability of choosing the safe option. We then calculated the change in the probability of choosing the safe option based on reward history for the simulated choices, using the same number of trials as in Figure 1F. There was no observable risky win-stay bias in the simulated dataset, indicating that the effect we observed did not reflect biased estimates of conditional probabilities. (E) Difference in probability of choosing the safe option following guaranteed rewards and risky rewards of different probabilities (relative to the mean probability of choosing safe) for simulated data, as in D.
Randomly simulated choices with the same sample sizes as the data (Figure 1G) did not exhibit a bias for risky choices with a graded dependence on reward probability (p = 0.80 for the slope parameter of the least-squares regression line, shown as a dashed line). Therefore, the risky win-stay bias we observe, with its graded dependence on reward probability, does not reflect biased estimation of conditional probabilities. (F) Difference in probability of choosing the safe option following guaranteed rewards or risky unrewarded choices. There was no systematic, significant change in the probability of choosing safe following unrewarded trials (paired t-test comparing change in probability of choosing safe).
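
The control described in panels D and E can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code: the function name, the parameter values, and the use of a single reward probability `p_win` for risky choices are assumptions made for the example. Choices are drawn independently from a weighted coin, so any apparent dependence on the previous trial's outcome can only come from finite-sample estimation bias.

```python
import numpy as np

def simulated_risky_win_effect(n_trials, p_safe, p_win, n_sims=200, seed=0):
    """Mean change in P(safe) after a risky win, for history-independent choices.

    All names and arguments here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    deltas = []
    for _ in range(n_sims):
        # Weighted-coin choices: True means the safe option was chosen.
        safe = rng.random(n_trials) < p_safe
        # Reward draw; it only matters on trials where the risky option was chosen.
        win = rng.random(n_trials) < p_win
        risky_win = ~safe & win
        # Condition each trial (from the second onward) on the preceding trial.
        prev_risky_win = risky_win[:-1]
        if not prev_risky_win.any():
            continue  # no risky wins in this sequence, nothing to condition on
        p_safe_after_win = safe[1:][prev_risky_win].mean()
        deltas.append(p_safe_after_win - safe.mean())
    return float(np.mean(deltas))

# For long sequences the estimated effect should sit close to zero.
effect = simulated_risky_win_effect(n_trials=5000, p_safe=0.5, p_win=0.5)
print(f"simulated change in P(safe) after a risky win: {effect:.4f}")
```

Running the same function with a much smaller `n_trials` is one way to see how the finite-sequence bias grows as sequences get shorter.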