Figures and data

Overview of experimental setup and conceptual model.
A Experimental setup in the dynamic Pavlovian conditioning task of Grossman et al. (16). In vivo electrophysiological recordings of the activity of individual genetically identified dorsal raphe nucleus (DRN) serotonin neurons were collected from head-fixed mice receiving stochastic water rewards in a trace conditioning task. Trials consisted of a 1 s odour cue followed by a 1 s trace period and 3 s reward collection window. Inter-trial interval (ITI) durations followed an exponential distribution with mean 3.3 s. Rewards were delivered with probability 20%, 50%, or 80% in a block structure with random uncued transitions. See Methods and ref. 16 for details. B Conceptual model. Recent reward history is used to calculate value, defined as an estimate of the probability of receiving a reward in the upcoming trial. We hypothesize that high value is reflected in both an increase in the ITI firing rates of serotonin neurons and an increase in anticipatory licking. C Time-course of reward collection probability (top), serotonin neuron firing rate (middle), and number of anticipatory licks per trial (bottom) for an example recording session. For illustration purposes, all data are smoothed using Gaussian process regression (squared exponential kernel with length scale 20 trials, observation standard deviation set to 


Reward collection, anticipatory licking behaviour, and serotonin neuron firing rates evolve slowly over time in a dynamic Pavlovian task.
Left column shows the sample autocorrelation (individual sessions or neurons in gray, mean across sessions or neurons in blue); right column shows the estimated power spectrum (computed from the corresponding mean autocorrelation, see Section 5.3.3). A Autocorrelation and power spectrum of reward collection. The power spectrum has a peak at 1/63 cycles trial⁻¹. N = 28 sessions. B Autocorrelation and power spectrum of anticipatory licking. The power spectrum has peaks at 1/377 cycles trial⁻¹ and 1/63 cycles trial⁻¹. N = 28 sessions. C Autocorrelation and power spectrum of an exponentially-weighted moving average of reward collection using a time constant of τ = 4.5 trials. Only the mean across N = 28 sessions is shown. The power spectrum has a peak at 1/64 cycles trial⁻¹. Compare with B. D Autocorrelation and power spectrum of serotonin neuron firing activity, quantified as the mean inter-spike interval during each inter-trial interval. The power spectrum has a peak at 1/339 cycles trial⁻¹. N = 37 neurons. E Autocorrelation and power spectrum of an exponentially-weighted moving average of reward collection using a time constant of τ = 45 trials. Only the mean across N = 28 sessions is shown. The power spectrum has peaks at 1/319 cycles trial⁻¹ and 1/69 cycles trial⁻¹. Compare with D. Autocorrelations in C and E are normalized such that the autocorrelations at lag one match the means shown in B and D, respectively. For clarity, the autocorrelation at zero lag is not shown. The 20–70 trial block length used in the experiment and the corresponding frequency range of 1/70–1/20 cycles trial⁻¹ are indicated in gray.
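The exponentially-weighted moving averages in panels C and E and the sample autocorrelations in the left column can be illustrated with a minimal sketch. It rests on several assumptions not stated in the caption: the per-trial decay is taken to be exp(−1/τ), the initial value defaults to 0.5, the autocorrelation uses the biased estimator, and the function names (`simulate_rewards`, `ewma`, `autocorrelation`) are hypothetical. The simulated rewards only mimic the block structure (20–70 trial blocks at 20%, 50%, or 80% reward probability); they are not the recorded data.

```python
import math
import random


def simulate_rewards(n_trials, seed=0):
    """Simulate 0/1 rewards in blocks of 20-70 trials (illustrative only)."""
    rng = random.Random(seed)
    rewards = []
    while len(rewards) < n_trials:
        p = rng.choice([0.2, 0.5, 0.8])   # block reward probability
        length = rng.randint(20, 70)      # block length in trials
        rewards.extend(1 if rng.random() < p else 0 for _ in range(length))
    return rewards[:n_trials]


def ewma(rewards, tau, v0=0.5):
    """Exponentially-weighted moving average of a reward sequence.

    Assumption: the weight on the newest outcome is 1 - exp(-1/tau).
    """
    alpha = 1.0 - math.exp(-1.0 / tau)
    v, out = v0, []
    for r in rewards:
        out.append(v)            # value available before the outcome
        v += alpha * (r - v)     # delta-rule (Rescorla-Wagner style) update
    return out


def autocorrelation(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    var = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


rewards = simulate_rewards(2000)
fast = autocorrelation(ewma(rewards, tau=4.5), max_lag=100)
slow = autocorrelation(ewma(rewards, tau=45), max_lag=100)
print(round(fast[1], 3), round(slow[1], 3))  # lag-one autocorrelations
```

Comparing `fast` and `slow` across lags gives a rough sense of why a longer time constant yields a more slowly decaying autocorrelation.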

Value coding features of serotonin neurons.
A Generative model of serotonin neuron activity. Rewards are passed into an exponentially-weighted moving average with timescale τ to estimate value vt (equivalent to learning value according to the Rescorla–Wagner rule) which is then scaled (by gain factor β1) and offset (by intercept β0) to produce an estimated firing rate 



Value is confounded with thirst when learning is slow.
A Dynamics of value as a function of the learning timescale τ. Each line represents one experimental session. An exemplar session is highlighted in blue. B Dynamics of thirst, defined as a quantity that decreases each time the animal consumes a water reward, starting at one and decreasing to zero over the course of a session. Each line represents one experimental session. The exemplar session from A is highlighted in green. C Comparison of value and thirst for the exemplar session highlighted in A and B. Value (blue) is shown using a learning timescale of τ = 70 trials. Thirst (green) is normalized so that its mean and standard deviation match those of value. D Confounding between value and thirst as a function of the learning timescale τ. Confounding is measured using the coefficient of determination (the squared correlation coefficient), representing the fraction of variance in value that can be explained by thirst (and vice versa). Each line represents one session. The exemplar session from A and B is highlighted in blue, and the black dot corresponds to the curves shown in C.
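The confounding measure in panel D can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: thirst is assumed to fall by an equal step with each consumed reward, value is an exponentially-weighted moving average with per-trial decay exp(−1/τ) and initial value 0.5, rewards are simulated as independent draws with probability 0.45, and all function names are hypothetical.

```python
import math
import random


def value_trace(rewards, tau, v0=0.5):
    """Value before each trial; assumes decay exp(-1/tau) per trial."""
    alpha = 1.0 - math.exp(-1.0 / tau)
    v, out = v0, []
    for r in rewards:
        out.append(v)
        v += alpha * (r - v)
    return out


def thirst_trace(rewards):
    """Thirst before each trial: starts at one and drops by an equal step
    per consumed reward (assumption), reaching zero after the last reward."""
    total = sum(rewards)
    if total == 0:
        raise ValueError("session contains no consumed rewards")
    consumed, out = 0, []
    for r in rewards:
        out.append(1.0 - consumed / total)
        consumed += r
    return out


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(1)
rewards = [1 if rng.random() < 0.45 else 0 for _ in range(400)]
thirst = thirst_trace(rewards)
for tau in (1, 10, 70):
    print(tau, round(r_squared(value_trace(rewards, tau), thirst), 3))
```

Because the squared correlation is symmetric in its arguments, the same number describes the variance in value explained by thirst and the variance in thirst explained by value.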

Analysis of anticipatory licking.
A Overview of behavioural model. Value computed based on a short reward history (fast value, τ = 1 trial), medium reward history (medium value, τ = 10 trials), long reward history (slow value, τ chosen to match distribution found in serotonin neurons, mean of approximately 72 trials), and thirst are combined to form a latent that drives anticipatory licking if the animal believes it is in a regular, rewarded trial rather than a catch trial. See Methods for details. B Data, MAP model prediction, and corresponding latent components for an exemplar session. For clarity, perceived catch trials are not shown (note gaps in blue line). C Mean weight assigned to 5-HT-related (slow value and thirst) vs. non-5-HT-related (fast and medium value) latent components across N = 28 sessions. D Mean weight assigned to medium value vs. fast value across N = 28 sessions. E Mean weight assigned to slow value vs. thirst across N = 28 sessions.

Summary of quantities of interest in the hierarchical Bayesian mixture model of serotonin neuron activity.

Features of value and thirst coding in serotonin neurons.
A Conceptual introduction to the value, thirst, and null coding mixture model. A hypothetical one-dimensional dataset of hairline measurements in tenured faculty is illustrated at top. A mixture model can infer the average hairlines of male and female faculty as well as the proportion of faculty that are male or female, even when gender information is missing from the data. Similarly, the mixture model of serotonin neuron coding features can infer the properties of value, thirst, and null coding neurons as well as the proportion of neurons of each type (bottom). B Inferred proportion of neurons that are value, thirst, or null coding. Brighter colours in the main plot indicate higher posterior density. The uniform prior is illustrated schematically at top right. C Mean learning timescale τ across the population of value-coding serotonin neurons. D Standard deviation of the learning timescale τ across the population of value-coding serotonin neurons on a log10 scale. E 10th percentile of the distribution of the learning timescale τ across the population of value-coding serotonin neurons. F 90th percentile of the distribution of the learning timescale τ across the population of value-coding serotonin neurons. G Fraction of cells in this dataset with positive firing rate gain for value β(val). Pie plot shows the MAP estimate of the proportion of neurons with positive (negative) firing rate gain in black (gray). Note that the prior (black line) is based on a default assumption that positive and negative gains are equally common in the population. H Fraction of cells in this dataset with positive firing rate gain for thirst β(thr). Presented as in G. All parameter estimates are based on a model fitted to 12 387 trials from 37 neurons. Black lines and coloured histograms in C–H indicate the prior and posterior, respectively, for each parameter or derived feature. B–D show parameters that are directly included in the model, while E–H are derived from fitted parameters. 
Partial pooling across neurons was used for the quantities shown in C–F, full pooling was used for the parameters in B, and no pooling was used for G or H. See Section 5.3.5 for details.
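The intuition in panel A, that a mixture model can recover group properties and group proportions from unlabelled data, can be illustrated with a toy example. The sketch below fits a two-component one-dimensional Gaussian mixture by expectation-maximization, which is a deliberately simpler technique than the hierarchical Bayesian mixture model used here. The data are synthetic, and the function names and initialization choices are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and std. dev. sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM (labels unobserved)."""
    n = len(x)
    mean = sum(x) / n
    overall_sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    mu = [min(x), max(x)]            # crude initialization (assumption)
    sd = [overall_sd, overall_sd]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each group
        resp = []
        for xi in x:
            p = [w[k] * normal_pdf(xi, mu[k], sd[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: update proportions, means, and standard deviations
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            sd[k] = math.sqrt(
                sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
            )
    return w, mu, sd


rng = random.Random(0)
# Synthetic unlabelled data: 300 points from one group, 700 from another
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
data += [rng.gauss(5.0, 1.0) for _ in range(700)]
rng.shuffle(data)
weights, means, sds = fit_two_gaussians(data)
print([round(v, 2) for v in weights], [round(v, 2) for v in means])
```

The fitted `weights` play the role of the inferred group proportions, and `means` the inferred group properties, even though no point carries a group label.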

Additional features of value, thirst, and null coding mixture model.
Related to Fig. S1. A Estimated normalized firing rate gain and learning timescale for each neuron. Points and error bars represent the median and IQR of the posterior. Neurons for which the 95 % credibility interval for the normalized firing rate gain includes zero are indicated in gray. B Distribution of estimated initial values v0,[i] from the maximum a posteriori (MAP) sample (blue histogram) compared with the prior (black line). C Posterior distribution of mean firing rate in the population (blue) compared with the prior (black).

Time-course of reward collection across recording sessions.
A Heatmaps showing whether or not a reward was collected on each trial, aligned to either the start (left) or end (right) of the experiment. Trials in which a reward was collected are shown in gray, no-reward trials are shown in black. B Time-course of reward collection aligned to fraction of session length. Each gray line represents data from one row of A smoothed using Gaussian process regression (observation variance set to 0.45 × (1 − 0.45), squared exponential kernel with length scale 1/6). Black line represents the expected mean reward probability of 45 %. Note that the reward collection probability is high at the start of the experiment because the first block of trials always has a reward probability of 80 %. C Heatmaps showing trials where a reward was offered but not collected (ignored rewards), aligned as in A. Trials in which a reward was offered but not collected are shown in gray, trials in which a reward was either not offered or offered and collected are shown in black. D Time-course of the rate of ignored rewards, presented as in B.
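The smoothing in panels B and D can be sketched as the posterior mean of a Gaussian process with a squared exponential kernel. The length scale (1/6 of the session) and observation variance (0.45 × (1 − 0.45)) are taken from the caption; the constant prior mean of 0.45, the unit signal variance, and the function names are assumptions, since the caption does not specify them.

```python
import math


def se_kernel(a, b, length_scale, signal_var):
    """Squared exponential covariance between two input locations."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky_solve(A, b):
    """Solve A x = b for symmetric positive-definite A via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n                      # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n                      # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_smooth(t, y, t_new, length_scale=1 / 6, obs_var=0.45 * (1 - 0.45),
              prior_mean=0.45, signal_var=1.0):
    """GP posterior mean at t_new given noisy observations y at t.

    prior_mean and signal_var are assumed values, not from the source.
    """
    n = len(t)
    K = [[se_kernel(t[i], t[j], length_scale, signal_var)
          + (obs_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    a = cholesky_solve(K, [yi - prior_mean for yi in y])
    return [prior_mean + sum(se_kernel(ts, t[i], length_scale, signal_var) * a[i]
                             for i in range(n)) for ts in t_new]


# Toy session: rewards collected in the first half only
t = [i / 59 for i in range(60)]        # fraction of session length
y = [1.0 if ti < 0.5 else 0.0 for ti in t]
print([round(v, 2) for v in gp_smooth(t, y, [0.0, 0.25, 0.5, 0.75, 1.0])])
```

Setting the observation variance to p × (1 − p) treats each binary outcome as a noisy reading of an underlying probability, which is why the smoothed curve can be read as a collection rate.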

Time-course of inter-trial intervals (ITIs) with missing inter-spike interval (ISI) data across neurons.
Missing ISI data is the result of ITIs with fewer than two spikes. This can occur if the ITI is shorter than the mean ISI (the minimum ITI duration is 0 s while serotonin neuron ISIs are typically approx. 0.2–1 s), or if the neuron is lost due to spike sorting or other recording issues. A Heatmaps showing whether or not each ITI has at least one ISI, aligned to either the start (left) or end (right) of the experiment. ITIs with zero ISIs are shown in black, ITIs with at least one ISI are shown in gray. B Time-course of missing ISI data aligned to fraction of session length. Each gray line represents data from one row of A smoothed using Gaussian process regression (observation variance set to p × (1 − p), where p is the fraction of non-empty ITIs in the corresponding session; squared exponential kernel with length scale 1/6).
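The rule that an ITI with fewer than two spikes yields no ISI can be made concrete with a short sketch. The half-open window convention, the use of NaN to mark missing data, the toy spike times, and the function name `mean_isi` are all assumptions made for illustration.

```python
import math


def mean_isi(spike_times, iti_start, iti_end):
    """Mean inter-spike interval within [iti_start, iti_end), in seconds.

    Returns NaN when the window holds fewer than two spikes, since no
    interval can be formed.
    """
    spikes = sorted(s for s in spike_times if iti_start <= s < iti_end)
    if len(spikes) < 2:
        return math.nan
    # Mean of consecutive differences equals (last - first) / (count - 1)
    return (spikes[-1] - spikes[0]) / (len(spikes) - 1)


# Toy example: spike times (s) and three ITI windows (s)
spikes = [0.2, 0.9, 1.5, 4.1, 7.3, 7.8]
itis = [(0.0, 2.0), (3.0, 5.0), (7.0, 8.0)]
isis = [mean_isi(spikes, start, end) for start, end in itis]

# Fraction of non-empty ITIs and the matching Bernoulli variance p(1 - p)
p = sum(not math.isnan(v) for v in isis) / len(isis)
obs_var = p * (1 - p)
print(isis, round(p, 3), round(obs_var, 3))
```

In this toy case the second window contains a single spike, so it contributes a missing value and lowers p.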

Value coding model residuals compared with model assumptions.
A Histogram of standardized inter-spike interval residuals (blue) compared with the normal distribution assumed by the model (black line). B Autocorrelation of standardized residuals for each neuron (gray lines) compared with the model assumption of zero autocorrelation (black line). For clarity, the autocorrelation at zero lag is not shown. Residuals are net of drift captured using a third-order autoregressive model.

Firing rate intercept prior.
A Histogram of estimated firing rate intercepts β0 from ref. 9 with fitted gamma distribution. N = 50 neurons from ref. Figs. 3C and 6D. B Gamma distributions fitted to bootstrapped firing rate intercept data from A. Note variability due to sampling error. C Estimated distribution of gamma distribution parameters across bootstrap samples. Contours show a kernel density estimate of the joint distribution of ln α (natural log shape) and ln θ (natural log scale) parameters across bootstrap samples. Heatmap shows fitted multivariate Gaussian distribution to be used as a prior. D Draws from Gaussian prior distribution over ln α and ln θ. Note close resemblance to the bootstrap distribution in B.
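The bootstrap procedure behind panels B–D can be sketched in a few steps: resample the intercepts with replacement, fit a gamma distribution to each resample, and summarize the resulting ln α and ln θ values with a bivariate Gaussian. The sketch below makes several assumptions: the gamma fit uses the method of moments (the caption does not state the fitting method), the intercepts are synthetic stand-ins drawn from an arbitrary gamma distribution, and the function names are hypothetical.

```python
import math
import random


def fit_gamma_moments(x):
    """Method-of-moments gamma fit; returns (shape alpha, scale theta)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    return mean * mean / var, var / mean


def bootstrap_log_params(x, n_boot, rng):
    """Fit a gamma to each bootstrap resample; return (ln alpha, ln theta)."""
    out = []
    for _ in range(n_boot):
        sample = [rng.choice(x) for _ in x]   # resample with replacement
        alpha, theta = fit_gamma_moments(sample)
        out.append((math.log(alpha), math.log(theta)))
    return out


def gaussian_summary(pairs):
    """Mean vector and 2x2 covariance matrix of (ln alpha, ln theta) pairs."""
    n = len(pairs)
    mu = [sum(p[d] for p in pairs) / n for d in (0, 1)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in pairs) / (n - 1)
            for j in (0, 1)] for i in (0, 1)]
    return mu, cov


rng = random.Random(0)
# Synthetic stand-in for N = 50 firing rate intercepts (not the real data)
intercepts = [rng.gammavariate(4.0, 1.5) for _ in range(50)]
pairs = bootstrap_log_params(intercepts, n_boot=500, rng=rng)
mu, cov = gaussian_summary(pairs)
print([round(v, 3) for v in mu], [[round(c, 4) for c in row] for row in cov])
```

The mean vector and covariance matrix define the Gaussian over ln α and ln θ from which prior draws, as in panel D, could then be sampled.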

Verification of firing rate intercept prior distribution.
Histogram of firing rate intercepts from ref. 9 Fig. 8S2A (test data; N = 28 neurons) and fitted gamma distribution (black line) compared with the distribution fitted to data from ref. Figs. 3C and 6D (dashed green line; N = 50 neurons).
