Abstract
More than four decades ago, Gibbon and Balsam (1981) showed that the acquisition of Pavlovian conditioning in pigeons is directly related to the informativeness of the conditioning stimulus (CS) about the unconditioned stimulus (US), where informativeness is defined as the ratio of the US-US interval (C) to the CS-US interval (T). However, the evidence for this relationship in other species has been equivocal. Here, we describe an experiment that measured the acquisition of appetitive Pavlovian conditioning in 14 groups of rats trained with different C/T ratios (ranging from 1.5 to 300) to establish how learning is related to informativeness. We show that the number of trials required for rats to start responding to the CS is determined by the C/T ratio and, remarkably, the specific scalar relationship between the rate of learning and informativeness aligns very closely with that previously obtained with pigeons. We also found that the response rate after extended conditioning is strongly related to T, with the terminal CS response rate being a scalar function of the CS reinforcement rate (1/T). Moreover, this same scalar relationship extended to the rats’ response rates during the (never-reinforced) inter-trial interval, which were directly proportional to the contextual rate of reinforcement (1/C). The findings establish that animals encode rates of reinforcement, and that conditioning is directly related to how much information the CS provides about the US. The consistency of the data across species, captured by a simple regression function, suggests a universal model of conditioning.
More than a century of laboratory-based research has been devoted to investigating how animals learn about simple relationships between events, such as learning to respond to a conditioned stimulus (CS) that is followed by an unconditioned stimulus (US) or learning to perform a specific action that is reinforced by a rewarding US. Much of that research has focussed on identifying which properties of the CS-US or response-US relationship are most important for learning. There is widespread consensus about the importance of three particular properties. One is the temporal contiguity between the events: conditioning emerges sooner when the US follows the CS or response closely in time. Another is the spacing of the learning trials: conditioning takes fewer trials when there is a long interval between successive CS-US or response-US pairings. The third property is the contingency between the events: conditioning is more successful when the US occurs reliably in the presence of the CS or response, and does not occur in their absence.
Acquisition of Conditioning, C-over-T, and Informativeness
A landmark study, conducted more than 40 years ago, demonstrated that the first two of these properties, contiguity and trial spacing, are interdependent and subserved by a single principle that encompasses all three properties. In two large-scale experiments, Gibbon and colleagues (1977) measured the number of trials required for pigeons to start pecking at a key-light (the CS) as they learned it was followed by food (the US). Different groups of pigeons were trained with different trial durations (i.e., the interval between onset of the CS and delivery of the US; henceforth T). T ranged from 1 s to 64 s across 41 groups in two experiments. The inter-trial interval (ITI) also varied between groups, ranging from 6 s to 768 s. Gibbon et al. recorded the number of training trials required for each pigeon to start responding reliably during the CS (responding on 3 out of 4 consecutive trials). The birds required more trials as T increased, confirming the effect of CS-US contiguity. They also required fewer trials as the ITI increased, confirming the trial-spacing effect. More importantly, however, these two effects were exactly complementary: an increase in T had no effect if it was accompanied by a proportional increase in the ITI. Thus, the rate at which the pigeons acquired the conditioned response systematically increased as the ratio of ITI to T increased, but there was no separate effect of varying ITI or T when their ratio was held constant.
The extent of the relationship between contiguity and trial spacing was established by a meta-analysis that combined data from all 41 groups tested by Gibbon et al. (1977) with data from 11 other experiments investigating the acquisition of key-peck responses in pigeons (Gibbon & Balsam, 1981). This analysis compared the number of trials to criterion against the C/T ratio (where C is the interval between USs, equal to T + ITI). Their data on pigeon autoshaping are well described by a regression model whose only parameter is the x-intercept, which is the C/T value that produces acquisition after a single trial in the median pigeon (k in Figure 1). When C/T > 4, as it is in most Pavlovian protocols, the slope of the regression model on log-log coordinates is approximately −1. Thus, over most of the C/T range, the number of reinforcements at acquisition is inversely proportional to the C/T ratio; doubling the ratio halves the number of reinforcements required.
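The C/T arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the regression fitted by Gibbon and Balsam (1981): it uses only the slope −1 approximation that holds when C/T > 4, under which trials to acquisition ≈ k/(C/T), and the value of k below is hypothetical rather than the fitted intercept shown in Figure 1.

```python
def c_over_t(t: float, iti: float) -> float:
    """Informativeness ratio C/T, where C = T + ITI is the US-US interval."""
    c = t + iti
    return c / t


def approx_trials_to_acquisition(ratio: float, k: float) -> float:
    """Slope -1 approximation on log-log axes: trials = k / (C/T).

    The line passes through 1 trial at C/T = k (the x-intercept). It is only a
    rough description for C/T > 4; k is supplied by the caller.
    """
    return k / ratio


K_HYPOTHETICAL = 200.0  # illustrative value only, not the intercept in Figure 1

# Scaling T and the ITI by the same factor leaves C/T unchanged.
print(c_over_t(8, 40), c_over_t(16, 80))  # 6.0 6.0

# Doubling C/T halves the predicted number of reinforcements.
print(approx_trials_to_acquisition(20, K_HYPOTHETICAL))  # 10.0
print(approx_trials_to_acquisition(40, K_HYPOTHETICAL))  # 5.0
```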
When reinforcements are delivered only during the CS, the C/T ratio is the ratio of the rate of reinforcement during CSs to the contextual rate, the rate subjects expect when in the experimental chamber, without regard to whether the CS is or is not present. Balsam and Gallistel (2009) termed the ratio of the CS rate of reinforcement to the contextual rate the informativeness, because the log of informativeness is the mutual information that CS onsets transmit to a subject about the expected wait for the next reinforcement (lower x-axis in Figure 1). The mutual information transmitted by a CS is the upper limit on the extent to which the CS can reduce the subject’s uncertainty about the wait for the next reinforcement. When the CS rate equals the contextual rate, no information is transmitted because the informativeness ratio is 1, and log(1) = 0.
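To make the information measure concrete, the following Python sketch computes informativeness as the ratio of the CS reinforcement rate (1/T) to the contextual rate (1/C) and takes its logarithm. Base 2 is assumed, so the result is in bits, matching the mutual-information axis of Figure 4; the paragraph above does not fix the base. The function names are illustrative.

```python
import math


def informativeness(t: float, c: float) -> float:
    """Ratio of the CS reinforcement rate (1/T) to the contextual rate (1/C).

    Assumes reinforcement is delivered only during the CS, so this equals C/T.
    """
    cs_rate = 1.0 / t
    contextual_rate = 1.0 / c
    return cs_rate / contextual_rate


def mutual_information_bits(t: float, c: float) -> float:
    """log2 of informativeness: 0 bits when the CS rate equals the contextual rate."""
    return math.log2(informativeness(t, c))


for t, c in [(10.0, 10.0), (10.0, 15.0), (10.0, 3000.0)]:
    print(f"C/T = {informativeness(t, c):g}, MI = {mutual_information_bits(t, c):.2f} bits")
```

The range of informativeness used in the present experiment (1.5 to 300) therefore spans roughly 0.6 to 8.2 bits.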
Contingency is mutual information divided by available information (Gallistel & Latham, 2023). Available information is the amount of information needed to reduce a subject’s uncertainty to 0. Because temporal measurement error scales with the duration being measured (Weber’s Law; Gibbon, 1977b), contingency = 1 only when two events coincide in time (Gallistel & Latham, 2023).
The analysis presented thus far identifies a fundamental principle of associative learning; what animals learn about events is defined in terms of measurable properties of their temporal distributions. To claim this as a general principle of learning, we must seek evidence for the importance of C/T in conditioning paradigms with other species. Several studies have sought evidence that conditioning is related to C/T using an appetitive Pavlovian conditioning paradigm in which rats or mice learn to anticipate the arrival of a food reward (indexed by monitoring their activity at the food cup or licking at a spout) during a CS (Bouton & Sunsay, 2003; Burke, Jeong, Wu, Lee, & Namboodiri, 2023; Holland, 2000; Kirkpatrick & Church, 2000; Lattal, 1999; Thrailkill, Todd, & Bouton, 2020; Ward et al., 2012). However, as summarised below, the evidence from these studies is mixed.
The first evidence for an effect of C/T on the acquisition of responding in rats was reported by Lattal (1999) and Holland (2000), who observed effects that persisted even when all rats were tested with an identical ITI (different groups had been trained with different ITIs but were shifted to a common interval between trials when tested for responding to the CS). More recently, Ward et al. (2012) reported that the log of the number of trials for mice to acquire responding scaled with log(C/T), and this relationship was similar to that described for pigeons (Gibbon et al., 1977; Gibbon & Balsam, 1981). Bouton and Sunsay (2003) also provided evidence for the importance of C/T in Pavlovian conditioning in rats. They showed that conditioning was impaired when T was increased 3-fold by inserting two CS-alone presentations between successive CS-US trials (a manipulation that affects T but not C and, therefore, reduces C/T), but that conditioning was not affected by a 3-fold increase in T brought about by omitting the US from two out of three CS-US trials (a manipulation that increases both T and C equally and, therefore, does not change C/T). Most recently, Burke et al. (2023) have shown that removing 9 out of 10 CS-US trials from a Pavlovian conditioning schedule, thus increasing C 10-fold without changing T, reduced the number of trials required for learning by a factor of 10.
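The effect of these manipulations on C/T can be checked with simple bookkeeping. The sketch below assumes, for illustration, that T is the total CS time per reinforcement and C the total session time per reinforcement, and that session length is held fixed when trials are removed or CS-alone presentations are inserted; the CS and session durations are hypothetical, not those of the cited studies.

```python
from typing import NamedTuple


class Protocol(NamedTuple):
    cs_time: float       # total CS time in a session (s)
    session_time: float  # total session time (s)
    n_us: int            # number of USs delivered in the session


def t_c_ratio(p: Protocol) -> tuple[float, float, float]:
    """Return (T, C, C/T), with T and C expressed as time per reinforcement."""
    t = p.cs_time / p.n_us
    c = p.session_time / p.n_us
    return t, c, c / t


# Hypothetical baseline: 10 reinforced 10-s CSs in a 600-s session.
baseline = Protocol(cs_time=100, session_time=600, n_us=10)
# US omitted on 2 of every 3 trials: 30 CSs and 3x the session time for 10 USs.
partial = Protocol(cs_time=300, session_time=1800, n_us=10)
# Two CS-alone presentations inserted between trials, session length unchanged.
cs_alone = Protocol(cs_time=300, session_time=600, n_us=10)
# 9 of 10 trials removed, session length unchanged.
sparse = Protocol(cs_time=10, session_time=600, n_us=1)

for name, p in [("baseline", baseline), ("partial", partial),
                ("cs_alone", cs_alone), ("sparse", sparse)]:
    print(name, t_c_ratio(p))
```

Partial reinforcement leaves C/T at 6, inserting CS-alone presentations lowers it to 2, and thinning the trials raises it 10-fold to 60, matching the qualitative descriptions above.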
In addition to finding an effect of C/T on conditioning, both Lattal (1999) and Holland (2000) also found an effect of T that was independent of the C/T ratio. They observed that an increase in T resulted in a decrease in responding even when the ITI was also increased to keep the C/T ratio constant. Further evidence against an influence of the C/T ratio comes from a study by Kirkpatrick and Church (2000) in which differences in C/T, ranging from 1.5 to 12, did not produce differences in the acquisition of conditioned responding in rats. Most recently, Thrailkill et al. (2020) observed clear effects of T, but not C/T, on conditioning. Their rats responded at much higher rates to a 10-s CS than to a 60-s CS, even though the groups had identical C/T ratios. When comparing groups on how quickly responding was acquired, they found no systematic effect of C/T on the number of 4-trial conditioning blocks required to reach a response criterion.
In sum, in contrast with the impressive evidence from experiments with pigeons, studies of appetitive conditioning in rats and mice have provided inconsistent evidence for the importance of C/T to conditioning. At the same time, these studies have shown that T has an effect on responding that is independent of C/T. This latter observation is in fact consistent with the evidence from experiments with pigeons. In the original study that established that C/T determines when pigeons start to respond to a CS, Gibbon et al. (1977) also observed that the rate at which the pigeons responded to the CS after extended training was negatively related to T and not related to C/T. In other words, while C/T affected how quickly conditioned responding emerged, T, rather than C/T, determined the level of responding that was ultimately acquired. This distinction may go some way to explaining the inconsistency in the evidence for the effect of C/T in rats and mice. The inconsistencies may be due to differences in how the point of acquisition was identified in different studies, and, in particular, whether differences in the amount of responding affected the measure of when responding was first acquired. This concern is not relevant to the results from pigeon experiments because the appearance of their key-peck response is unambiguous, given that the baseline rate of that response is effectively zero. In contrast, rats and mice show activity at the food-cup in the ITI, and therefore researchers using this appetitive paradigm must decide how to take account of the baseline response rate when quantifying conditioned responding during the CS (Lattal, 1999).
The current experiment
The experiment described here attempts to elucidate the role of C/T and T in an appetitive Pavlovian conditioning paradigm with rats by distinguishing their impact on the emergence of responding from their effect on the level of responding subsequently acquired after extended conditioning. Fourteen groups of rats were trained for 42 sessions with a single CS that was followed on every trial by delivery of food (the US). Each session contained either 10 CS-US trials (Groups 1-11) or 3 CS-US trials (Groups 12-14), giving 420 or 126 trials in total. Both T and C varied between the groups in an uncorrelated fashion (r = −0.19, p = .519) so that effects of T and C could be assessed independently (summarised in Table 1).
When the CS is spatially separated from the location of the US, as in most appetitive conditioning experiments with pigeons or rodents, CS-directed responses (such as key-pecks) and US-directed responses (such as food-cup activity) are mutually incompatible, which can complicate the measurement of conditioned responding. For example, any factor that increases food-cup activity, such as reducing the ITI, may reduce evidence for conditioning that is indexed by CS-directed responses like key-pecks. Conversely, evidence for conditioning indexed by food-cup activity may be reduced to the extent that animals also acquire CS-directed responses. These problems can be mitigated if the CS and US are co-located. In the present experiment, the CS was illumination of a small LED inside the magazine. Conditioned responses were measured using an infra-red photobeam across the opening of the magazine that should detect both food-cup activity and approach responses to the CS. CS-US intervals varied randomly from trial to trial (around a mean equal to T) so that response rates remained constant across the length of each trial (Harris & Carpenter, 2011; Harris, Gharaei, & Pincham, 2011).
Results
Rats in all groups eventually responded at a higher rate to the CS than during the ITI. Figure 2 shows the mean CS and ITI response rates for each group across the 42 conditioning sessions. The equivalent of this figure for each rat is included in the Supplementary material. The last panel of Figure 2 (Plot 15) shows, for each group, the mean response rate during each second of the CS, averaged over all trials of the final 5 sessions of the experiment. It shows that responding increased rapidly over the first few seconds of the trial and then remained constant across the rest of the trial, consistent with previous experiments using variable CS-US intervals (Harris & Carpenter, 2011; Harris et al., 2011).
Trials to acquisition
We used the cumulative response rates to identify the trial at which conditioned responding first emerged. The earliest evidence for responding to the CS was taken to be the trial after which the cumulative response rate during the CS, computed for each rat, permanently exceeded the cumulative ITI response rate. Starting from this trial, the difference between the CS and ITI cumulative response rates was subjected to one-tailed t-tests in each rat to identify the trial at which CS responding was significantly greater than pre-CS responding. The results obtained using p < .05 are shown in Figure 3A, which plots the number of trials required for each rat to reach this statistical criterion against the informativeness of the CS for that rat (C/T). The filled orange circles in the same figure show the median number of trials for each group of rats trained with the same C/T ratio. These data are superimposed on the data shown in Figure 1, which plots the results of a meta-analysis of many comparable experiments with pigeons (Gibbon & Balsam, 1981), revealing a close agreement between the present results and those previous results.
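The first step of this procedure, locating the trial from which the cumulative CS response rate stays above the cumulative ITI response rate, can be sketched as follows. This is a minimal Python illustration under assumed inputs (per-trial response counts and durations for the CS and ITI); the function name is hypothetical, and the subsequent one-tailed t-tests are not implemented.

```python
from itertools import accumulate
from typing import Optional, Sequence


def permanent_exceedance_trial(
    cs_counts: Sequence[int], cs_times: Sequence[float],
    iti_counts: Sequence[int], iti_times: Sequence[float],
) -> Optional[int]:
    """First trial (1-based) of the final unbroken run of trials on which the
    cumulative CS response rate exceeds the cumulative ITI response rate.
    Returns None if the CS rate is not above the ITI rate on the last trial."""
    cs_rate = [n / t for n, t in zip(accumulate(cs_counts), accumulate(cs_times))]
    iti_rate = [n / t for n, t in zip(accumulate(iti_counts), accumulate(iti_times))]
    onset = None
    # Walk backwards from the last trial while the CS rate stays above the ITI rate.
    for i in range(len(cs_rate) - 1, -1, -1):
        if cs_rate[i] > iti_rate[i]:
            onset = i + 1
        else:
            break
    return onset


# Hypothetical rat: a transient lead on trial 1, then a lasting one from trial 5.
print(permanent_exceedance_trial(
    [3, 0, 0, 1, 4, 5], [10.0] * 6,
    [2, 1, 1, 1, 1, 1], [10.0] * 6,
))  # -> 5
```

The backward scan makes the "permanently" requirement explicit: a brief early lead (trial 1 here) is ignored because the CS rate later falls back to or below the ITI rate.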
Correlational analyses, with α set at .017 to correct for multiple tests, were used to assess the relationships between the log of trials to criterion (based on the p<.05 criterion shown in Figure 3A) and log(C/T), log(C), and log(T). Log(C/T) was the strongest predictor of trials to criterion (r = −0.90, p < .001). Log(C) was also significantly correlated with the log of trials to criterion, r = −0.78, p < .001, but log(T) was not correlated with trials to criterion, r = 0.43, p = .13. Given that log(C) and log(C/T) were strongly correlated with one another (r = 0.90), their partial correlations with the log of trials to criterion were calculated to test whether each made independent contributions to the rate of acquisition. After partialling out the effect of log(C), the correlation with log(C/T) remained significant, r = −0.73, p = .005. In contrast, after partialling out the effect of log(C/T), the correlation with log(C) was not significant, r = 0.15, p = .629. In sum, the number of trials required for conditioned responding to emerge was strongly affected by the C/T ratio, and neither C nor T alone had any independent effect.
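The partial correlations follow from the standard first-order formula, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). Plugging the rounded zero-order correlations reported above into this formula gives about −0.73 and 0.16, close to the reported −0.73 and 0.15. The sketch below is only a check of that arithmetic; the variable names are illustrative.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = log(trials to criterion); reported (rounded) zero-order correlations:
r_trials_ct = -0.90  # with log(C/T)
r_trials_c = -0.78   # with log(C)
r_c_ct = 0.90        # between log(C) and log(C/T)

# log(C/T) with log(C) partialled out
print(round(partial_r(r_trials_ct, r_trials_c, r_c_ct), 2))
# log(C) with log(C/T) partialled out
print(round(partial_r(r_trials_c, r_trials_ct, r_c_ct), 2))
```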
Figure 3 also shows the number of reinforced trials to acquisition when the acquisition criterion was defined using an information-theoretic statistic, the nDKL. Figure 3B plots the number of trials for the nDKL to become permanently positive; from this trial on, there is consistently positive evidence that the distributions of CS response rates and pre-CS (Pre) response rates diverge. The black regression line fitted to the pigeon data (see Figure 1) accounts for 67% of the variance in the median trials to criterion (R² = 0.67). Figure 3C plots the number of trials to reach an odds ratio of 10:1 in favour of a difference between CS and Pre response rates, based on the nDKL.
In Figure 3 it can be hard to appreciate the distribution of the trials-to-acquisition data because the data for individual subjects (open grey diamonds) are superimposed. This is particularly the case for subjects that learned after their first reinforcement, whose individual data pile up on the x-axis, even when the informativeness is as low as 4. Figure 4 plots trials to acquisition (shown with a reversed scale on the right axis; criterion: odds of 10:1) as a function of the mutual information between CS and US (bottom axis; equal to log2 of the informativeness, shown on the top axis). In this plot, the individual trials-to-acquisition data have been grouped in bins of approximately equal width on a logarithmic scale. Thus, the tiles in the first 4 rows represent single numbers of trials (1, 2, 3, and 4 trials to acquisition) and tiles in subsequent rows represent bins with wider ranges (trials 5-6, 7-9, 10-15, etc.). The darkness of each tile in Figure 4 represents the number of rats within that bin. The left-hand axis shows the learning rate, computed as the reciprocal of the number of trials to acquisition. Figures 3 and 4 show the large individual differences in trials to acquisition; subjects within the same informativeness group may have trials-to-acquisition values that differ by two orders of magnitude. Nonetheless, overall, the learning rate increases as the informativeness increases. The figure also makes clear that the number of rats learning after just one reinforced trial (top row) increases as the informativeness increases.
As a measure of the terminal level of responding, the response rates during the CS were averaged over the last 5 sessions for each rat. This rate was calculated as total number of responses (summed across all trials over the 5 sessions) divided by the total CS duration (summed across all trials over the 5 sessions). Our initial analysis compared this terminal response rate with the log of T, C, and C/T (α = 0.017 after correction for multiple comparisons). The rate of responding to the CS was marginally correlated with log(T) (see Figure 5A), r = −0.53, p = .051, but was not correlated with either log(C), r = −0.3, p = .298, or log(C/T), r = −0.04, p =.894.
More detailed analyses were conducted after segmenting the response rate into 3 separate components: (1) the latency to the first response in a trial; (2) the mean duration of each response (the time spent in the magazine); and (3) the inverse of the mean inter-response interval, 1/IRI, which equals the response rate after excluding the latency and response durations. None of these indices correlated significantly with log(C), largest r = −0.35, p = .221, or with log(C/T), largest r = −0.31, p = .281. On the other hand, log(T) correlated significantly with latency, r = 0.77, p < .001 (Figure 5B), and with 1/IRI, r = 0.89, p < .001 (Figure 5C). There was a marginal correlation between duration of responding and log(T), r = −0.55, p = .043, that did not survive correction for multiple comparisons (Figure 5D). As is evident in Figure 5B, the positive correlation between latency and log(T) was largely confined to groups with long CS-US intervals (T > 25 s), whereas latency varied little among groups with shorter CS-US intervals. This invariance at short CS-US intervals may have been due to a floor effect, because mean latencies to first response did not decrease below 2 s (horizontal dotted line in Figure 5B). This is also consistent with the plots of response rate by time in the CS, shown in Plot 15 of Figure 2, where responding to the CS was low for the first few seconds after CS onset. This apparent floor effect might reflect a constraint on how quickly the rats can commence responding after CS onset, or it might have arisen because 2 s was the minimum CS-US interval used in this experiment. Regardless, these analyses indicate that neither the duration of responding nor the latency to first response is a good marker of what the rats learn about the rate of reinforcement of the CS. By contrast, the response rate, 1/IRI, varied systematically across the entire range of values of T (Figure 5C).
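A minimal Python sketch of this three-way decomposition for a single CS presentation follows. The input format, a list of (entry, exit) times for each magazine visit, and the definition of the inter-response interval as the gap from one exit to the next entry are assumptions made for illustration; the text does not specify exactly how IRIs were measured.

```python
from statistics import mean
from typing import Optional


def decompose_trial(visits: list[tuple[float, float]]) -> dict[str, Optional[float]]:
    """Split one CS presentation's magazine visits into three components.

    `visits` holds (entry, exit) times in seconds from CS onset, in order.
    Assumption: the inter-response interval (IRI) runs from the end of one
    visit to the start of the next, so 1/IRI excludes latency and visit durations.
    """
    if not visits:
        return {"latency": None, "mean_duration": None, "inv_iri": None}
    latency = visits[0][0]
    mean_duration = mean(exit_ - entry for entry, exit_ in visits)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(visits, visits[1:])]
    inv_iri = 1.0 / mean(gaps) if gaps else None  # needs at least 2 responses
    return {"latency": latency, "mean_duration": mean_duration, "inv_iri": inv_iri}


print(decompose_trial([(2.0, 2.5), (3.0, 3.5), (4.5, 5.0)]))
```

Note that 1/IRI is undefined when a trial contains fewer than two responses, which is one motivation for the alternative rate computation described next.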
This confirms earlier evidence that rats’ response rates scale with the log of the reinforcement rate (Harris & Carpenter, 2011).
The response rates in Figure 5A were computed by dividing the response count by the duration of the interval over which the count was made. However, when the CS reinforcement rate was high, subjects made anywhere from 2 to more than 5 pokes per second, and each poke lasted a substantial fraction of a second. A rat cannot make another poke while its head is in the magazine. A correct response rate should therefore be computed over the time in which it was possible for pokes to be made, that is, the cumulative time minus the cumulative head-in-magazine time. Also, the latency of the first poke was rarely shorter than 2 s (Figure 5B), whereas the inter-poke intervals after the first poke were much shorter when T was short (Figure 5C). To make a proper estimate of the rate at which a rat poked once it had begun, we divided the number of pokes made during the CS (excluding the first poke) by the remaining time in which it was possible to initiate a poke (i.e., the CS duration minus the latency to the first poke and the cumulative head-in-magazine time). Likewise, to estimate the contextual rate of poking, we divided the ITI response count by the time in which an ITI response could be initiated (i.e., the length of the pre-CS interval minus the cumulative time with head in magazine). These methods for computing the CS and contextual rates of responding correspond closely to 1/IRI (shown in Figure 5C), except that they can still be computed when there are fewer than 2 responses (unlike the IRI). Figure 6 plots the terminal CS response rates (black dots) and contextual response rates (red dots) as functions of the CS and contextual reinforcement rates, respectively, on double logarithmic coordinates.
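The difference between the simple rate used for Figure 5A and the corrected rates described above can be sketched in Python as below. The data layout and function names are hypothetical, and the text does not state how trials without any poke were handled; here such trials are assumed to have their latency censored at the CS duration, so they contribute neither pokes nor time to the corrected CS rate.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    duration: float                    # length of the CS or pre-CS interval (s)
    visits: list[tuple[float, float]]  # (entry, exit) times from interval onset (s)


def naive_rate(intervals: list[Interval]) -> float:
    """Total responses divided by total interval time (as for Figure 5A)."""
    return sum(len(iv.visits) for iv in intervals) / sum(iv.duration for iv in intervals)


def head_in_time(iv: Interval) -> float:
    """Cumulative time with the head in the magazine during one interval."""
    return sum(exit_ - entry for entry, exit_ in iv.visits)


def corrected_cs_rate(trials: list[Interval]) -> float:
    """Pokes after the first, divided by time in which a poke could be initiated.

    Available time = CS duration - latency to first poke - head-in-magazine time.
    Assumption: trials with no pokes contribute neither pokes nor time.
    """
    pokes = 0
    time_available = 0.0
    for tr in trials:
        if not tr.visits:
            continue
        latency = tr.visits[0][0]
        pokes += len(tr.visits) - 1
        time_available += tr.duration - latency - head_in_time(tr)
    return pokes / time_available


def contextual_rate(pre_cs: list[Interval]) -> float:
    """All pre-CS pokes divided by pre-CS time minus head-in-magazine time."""
    pokes = sum(len(iv.visits) for iv in pre_cs)
    time_available = sum(iv.duration - head_in_time(iv) for iv in pre_cs)
    return pokes / time_available


cs_trials = [Interval(10.0, [(2.0, 2.5), (3.0, 3.5), (4.5, 5.0)]), Interval(10.0, [])]
pre = [Interval(20.0, [(5.0, 6.0)])]
print(naive_rate(cs_trials), corrected_cs_rate(cs_trials), contextual_rate(pre))
```

In this toy example the corrected CS rate (2 pokes in 6.5 s) is about twice the naive rate (3 pokes in 20 s), illustrating how the naive computation understates the rate at which a rat pokes once it has begun.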
Figure 6 shows that the log response rate in the experimental context and the log response rate during CSs are both described by the same linear regression when plotted against the relevant reinforcement rate. This regression accounts for 81% of the variance. Its slope is essentially 1, which means that the two poke rates are the same scalar function of the two reinforcement rates. The scalar that relates response rate to reinforcement rate is 10^1.25 ≈ 18, where 1.25 is the constant (intercept) in the logarithmic regression. Thus, the average rates of poking are about 18 times the reinforcement rates. This maximally simple relation between reinforcement rate and poke rate holds from reinforcement rates almost as low as 1 in 100 min and corresponding poke rates as slow as 5 or 6/min up to reinforcement rates as high as 10/min (1 every 6 s) and corresponding poke rates of 90-110/min (1 every 0.5 s).
Put another way, the average time the rat waits between magazine entries is 1/18 (= .06) times the average time it expects to wait for food (also Harris & Carpenter, 2011). The same scalar relation between the behavioural wait and the expected wait for reinforcement applies when the rat is in an intertrial interval—when reinforcement is never delivered—and when it is in a CS interval. As the informativeness of a protocol gets lower, that is, as the CS occupies a greater and greater fraction of the subject’s time in the chamber, the difference between the CS poke rate and the contextual poke rate gets smaller because the difference between the CS reinforcement rate and the contextual reinforcement rate gets smaller.
As is also apparent in Figure 6, the variability about the scalar relation between poke rate and reinforcement rate also scales with reinforcement rate. That is why the regression must be computed in the logarithmic domain. The root-mean-square error (RMSE) of the base-10 log-log regression is 0.39. Thus, about 68% of the observed poke rates fall between 7.2 and 45 times the reinforcement rate.
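The conversion from the fitted log-log regression to the scalar, the behavioural-to-expected wait ratio, and the spread of the data can be sketched as follows, using the rounded coefficients quoted above (intercept 1.25, RMSE 0.39). With these rounded inputs the one-RMSE band comes out at roughly 7.2 to 44 times the reinforcement rate. The names are illustrative.

```python
INTERCEPT = 1.25  # constant of the base-10 log-log regression (rounded)
RMSE = 0.39       # root-mean-square error of that regression (rounded)

scalar = 10 ** INTERCEPT   # poke rate / reinforcement rate (slope of 1 assumed)
wait_ratio = 1 / scalar    # behavioural wait / expected wait for food
band = (10 ** (INTERCEPT - RMSE), 10 ** (INTERCEPT + RMSE))


def predicted_poke_rate(reinforcement_rate: float) -> float:
    """Slope-1 relation: poke rate is a fixed multiple of reinforcement rate."""
    return scalar * reinforcement_rate


print(f"scalar ~ {scalar:.1f}, wait ratio ~ {wait_ratio:.3f}")
print(f"one-RMSE band: {band[0]:.1f} to {band[1]:.1f} times the reinforcement rate")
# e.g. one reinforcement per 60 s (1/60 per s) -> about 0.3 pokes per s
print(f"{predicted_poke_rate(1 / 60):.2f} pokes/s")
```

Because the residuals are symmetric on the log scale, the band is multiplicative (a factor of 10^0.39 ≈ 2.5 either side of the scalar) rather than additive.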
Trials from onset of responding to peak responding
Having established the relationship between response rate and reinforcement rate, we next analysed how response rate increased over trials towards its maximum value. Our first analysis assumes that there is a consistent (monotonic) increase in response rate starting from the initial point of acquisition. This analysis followed a method recently described (Harris, 2022) that uses the slope of the cumulative response rate over trials to identify the trial on which the response rate had reached each decile (from 10% to 90%) of the peak response rate. Based on our earlier analysis (see Figure 5), our measure of the CS response rate was calculated by dividing the response count by the total time out of the magazine during the CS. As shown in Figure 7A, the response rate of an individual rat varies greatly from trial to trial. However, a clearer picture of the overall change in responding over trials can be obtained by plotting the cumulative response count against the cumulative opportunity to respond (cumulative time out of the magazine; Figure 7B).
The slope of the cumulative function can be used to estimate the rat’s response rate across conditioning and so to find when responding had reached a given proportion of the peak response rate. To analyse how response rates changed across the course of conditioning, we extracted a segment of each rat’s conditioning data starting from the trial, t_1, on which the response rate during the CS became reliably greater than the ITI response rate and finishing at the trial, t_end, on which the response rate reached its peak (according to a moving average with a window width of 3 sessions). The total change in responding across conditioning was calculated by subtracting the response rate at the start of this segment of trials, R_1, from the peak response rate, R_max (at the end of the segment): ΔR = R_max − R_1. To identify when responding had increased by 10% of ΔR, we estimated what the cumulative response count, cumR′, would be at each trial, t, if the rat maintained a fixed level of responding equal to the starting rate plus 10% of ΔR. Thus, cumR′_t = (R_1 + 0.1·ΔR)·cumT_t, where cumT_t is the cumulative CS-US interval from trial 1 to t. We then calculated the difference between cumR′_t and the observed cumulative count, cumR_t. The trial at which this difference (cumR′_t − cumR_t) was maximum was identified as the trial when the slope of cumR_t had increased by at least 10% of ΔR. This process was repeated for all deciles up to 90%. An example of the values obtained for one rat is shown in Figure 7C. In this example, the rat’s response rate had increased by 10% of ΔR on Trial 34, by 50% of ΔR by Trial 147, and by 90% of ΔR by Trial 233.
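The decile procedure can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: R_1 and R_max are passed in rather than estimated with the 3-session moving average, and the per-trial time available to respond is supplied by the caller. Because only the location of the maximum matters, starting the cumulative sums at t_1 rather than at trial 1 shifts the difference by a constant and does not change the result.

```python
from itertools import accumulate


def decile_trials(counts, opportunity, r1, r_max, deciles=None):
    """For each proportion d, return the trial (1-based, within the segment)
    at which cumR'_t - cumR_t is maximal, i.e. the point after which the
    observed rate, on balance, exceeds R1 + d * (Rmax - R1).

    counts:      responses on each trial of the segment t_1..t_end
    opportunity: time available to respond on each trial (s)
    r1, r_max:   starting and peak response rates (responses/s)
    """
    if deciles is None:
        deciles = [i / 10 for i in range(1, 10)]
    cum_r = list(accumulate(counts))
    cum_t = list(accumulate(opportunity))
    delta_r = r_max - r1
    result = []
    for d in deciles:
        ref_rate = r1 + d * delta_r
        # The difference grows while the observed rate is below ref_rate and
        # shrinks once it is above, so its maximum marks the crossing point.
        diff = [ref_rate * ct - cr for ct, cr in zip(cum_t, cum_r)]
        result.append(max(range(len(diff)), key=diff.__getitem__) + 1)
    return result


# Hypothetical segment: the rate steps from 1 to 3 responses/s after trial 5.
print(decile_trials([1] * 5 + [3] * 5, [1.0] * 10, r1=1.0, r_max=3.0))
# -> [5, 5, 5, 5, 5, 5, 5, 5, 5]
```

A step change places every decile on the same trial, whereas a steady linear rise spreads them evenly across trials, which is the pattern the following analysis tests for.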
The analysis illustrated in Figure 7 was conducted on the individual data of all rats (except those rats missing data from Session 1). We excluded rats with a ΔR less than 0.1 responses/s. The mean number of trials to reach each decile for every rat in the 14 groups is shown in Figure 8. With some exceptions, for most rats, the relationship between the number of trials and response decile was roughly linear, meaning that their response rate increased uniformly over trials as it approached the peak response rate. This is clearest in the averaged functions (thick black lines in Figure 8). To investigate more precisely the relationship between trials to criterion, t_c, and response decile, d, we compared 4 different functions for their fit to the data for each individual rat. Based on the apparent linear increase in trials across deciles, the first function tested was a straight line, t_c = m·d + c. The second was an exponential function, t_c = c·e^(m·d), which has a continuously increasing slope and thus predicts a systematic increase across deciles in the number of trials between deciles. This function had most successfully captured the relationship between trials and response deciles in the data analysed by Harris (2022). The third function was an inverse cumulative Gaussian: t_c = s·√2·erf⁻¹[2·(m·d + 0.5) − 1] + c. This was used to model a stepwise increase in responding, as would be observed if responding were governed by a decision process when evidence for the CS-US relationship exceeded some threshold (Gallistel, Fairhurst, & Balsam, 2004). The fourth function was a log function, t_c = −ln(1 − d)/k + c, derived as the inverse of a cumulative exponential function. This function models the relationship between trials and response criterion predicted by an error-correction learning algorithm such as that used by the Rescorla-Wagner model (Rescorla & Wagner, 1972). To compare these four models of the data, the Bayesian Information Criterion (BIC) was calculated as
BIC = n·ln(RSS/n) + p·ln(n),
where RSS is the residual sum of squares for the difference between each observed t_c and its corresponding point on the fitted function, n is the number of points being fitted (= 9) and p is the number of free parameters in the function.
The overall performance of the functions can be compared by summing the BICs obtained from all rats. The function with the smallest ΣBIC is the function with the most evidence. According to this analysis, the linear function had the most evidence, ΣBIC = 9588, followed by the exponential function, ΣBIC = 9974. The difference between these (ΔBIC = 386) constitutes overwhelming evidence in favour of the linear model: BF = e^(ΔBIC/2) ≈ 7.3 × 10^83 (Wagenmakers, 2007). The ΣBICs of the other two models were much higher again, 12128 and 10358, indicating that they had even less support from the data. In addition to this comparison of the aggregate BIC, a more specific comparison can be made by comparing the BIC for one model against the BIC for another for each rat. The scatter plot in the bottom right corner of Figure 8 plots, for each rat, the BIC for the exponential function against the BIC for the linear function (the two best-fitting functions). The orange line marks where these BICs are equal. The large majority of values (72%) sit above this line. These represent cases where the BIC for the linear function is lower than that for the exponential function, meaning that the evidence is stronger for the linear function. If we look at cases where the difference in BICs was greater than 4.6, corresponding to strong evidence in favour of one function over the other (a BF ≥ 10), there is strong evidence favouring the linear function over the exponential in 39% of cases, whereas only 9% of cases provide strong evidence in favour of the exponential function. The evidence favouring the linear function over either the inverse cumulative Gaussian or the log function is even stronger: 69% of cases provide strong evidence (BF ≥ 10) in favour of the linear function over the inverse cumulative Gaussian and 0% of cases favour the latter function; 59% of cases provide strong evidence in favour of the linear function over the log function and only 8% of cases strongly favour the log function over the linear.
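The model comparison can be illustrated for a single rat's decile data. The sketch below assumes the standard least-squares form of the BIC, BIC = n·ln(RSS/n) + p·ln(n), and fits only the two models that are linear in their parameters: the straight line, and the log function, which is linear in the regressor −ln(1 − d). The exponential and inverse cumulative Gaussian would need a nonlinear fitting routine and are omitted. The trial counts are hypothetical. The last line shows that a BIC difference of 4.6 corresponds to a Bayes factor of about 10.

```python
import math


def ols_rss(x: list[float], y: list[float]) -> float:
    """Residual sum of squares of an ordinary least-squares straight-line fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def bic(rss: float, n: int, p: int) -> float:
    """Least-squares BIC (assumed form): n*ln(RSS/n) + p*ln(n)."""
    return n * math.log(rss / n) + p * math.log(n)


def bayes_factor(delta_bic: float) -> float:
    """Approximate Bayes factor from a BIC difference (Wagenmakers, 2007)."""
    return math.exp(delta_bic / 2)


deciles = [i / 10 for i in range(1, 10)]
trials = [10, 21, 29, 41, 50, 59, 71, 80, 90]  # hypothetical trials-to-decile data

# Linear model t_c = m*d + c regresses trials on d; the log model
# t_c = -ln(1 - d)/k + c regresses trials on -ln(1 - d). Both have p = 2.
bic_linear = bic(ols_rss(deciles, trials), n=9, p=2)
bic_log = bic(ols_rss([-math.log(1 - d) for d in deciles], trials), n=9, p=2)
print(f"BIC linear = {bic_linear:.1f}, BIC log = {bic_log:.1f}")
print(f"BF for a BIC difference of 4.6: {bayes_factor(4.6):.2f}")
```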
The above analyses reveal an overall tendency for the response rate to increase approximately linearly up to the point where the peak response rate is reached. To test how the rate of this increase varied across groups, we calculated the correlation coefficient between the slope of the line for each group, as shown in Figure 8, and the value of C, T, or C/T. These correlations did not include the 3 groups given just 126 trials because the smaller number of trials substantially reduced their slopes. For the other 11 groups, the slope was not significantly correlated with C/T, r = 0.51, p = .110, or with C, r = 0.08, p = .938, but was significantly negatively correlated with T, r = −0.73, p = .011 (see plot titled “Slope” in Figure 8). This suggests that the response rate increased more quickly when the reinforcement rate of the CS (1/T) was higher, which is not surprising given that the peak rate of responding also increased systematically with reinforcement rate (Figure 6).
The preceding analysis assumes that there is an overall monotonic increase in responding over trials. However, this trend is not always apparent, particularly when the CS informativeness is low. Such irregular and non-monotonic changes in responding are revealed by an analysis that uses the nDKL (see the description of the Kullback-Leibler divergence, and the nDKL, in Methods) to parse the response rates into segments that have significantly different rates (either higher or lower). In such cases, the path to the peak rate can be bumpy and the peak can be higher than the terminal response rate, as illustrated in Figure 9 for 6 rats from the group with ι = 1.5. By contrast, when the informativeness is very high, the peak is usually reached almost immediately and there is little subsequent variation in the poke rate (Figure 10). The parsing of response rates shown in Figures 9 and 10 used a stringent decision criterion: the algorithm found a step up or down only when the evidence for a divergence exceeded 6 nats. On the null hypothesis, a divergence that large would occur by chance about once in 1800 tests, a number well above the number of tests made when parsing the data over the 420 or 126 trials of the experiment. When the divergence between the contextual rate of reinforcement and the CS rate is high, as in Figure 10, there are only between 0 and 2 steps in the CS poke rate over 126 trials. By contrast, when the divergence between the contextual rate and the CS rate of reinforcement is low, as in Figure 9, there are many up and down steps, some lasting only a few trials. Thus, the linear increase to a peak revealed by the preceding analyses should not be taken to indicate that individual subjects routinely show a steady increase regardless of informativeness. When informativeness is low, the post-acquisition rate of responding during CSs, and the difference between it and the rate during ITIs, are unstable but trend upwards. When informativeness is high, a stable asymptotic rate generally appears after only one or two early steps, the first of which is often the step after the first trial.
Discussion
The present results have provided several important pieces of evidence about the nature of the conditioned response acquired by rats in an appetitive Pavlovian paradigm. The first is that the initial acquisition of responding was determined by the C/T ratio. Neither T nor C had any independent effect after partialling out the effect of C/T. This is the same conclusion reached by Gibbon and Balsam (1981) in their meta-analysis of the acquisition of autoshaped key-pecking by pigeons. As shown in Figure 3, responding to the CS emerged sooner as the C/T ratio increased, or, more precisely, the log of the number of trials to acquire a response scaled negatively with the log of C/T.
The dependence of trials to acquisition on C/T extends over the entire range from 1 to infinity. Contrary to what Gibbon and Balsam supposed, but in agreement with what Jenkins et al. (1981) showed, the regression continues to describe the data all the way to the abscissa, that is, to acquisition after 1 reinforcement. It is remarkable that the 1-parameter regression model that describes the pigeon acquisition data also describes the current rat magazine-poking data—and with essentially the same parameter value (Figure 3).
The second finding is that, when each CS is reinforced so that reinforcement rate scales inversely with T, the response rate after extended conditioning is strongly related to T (but not C or C/T). More specifically, the terminal CS response rate is a scalar function of the CS reinforcement rate (red symbols in Figure 6). Related to this is the third, and unexpected, finding that the response rates during the never-reinforced ITIs are proportional to the contextual rates of reinforcement (black symbols in Figure 6). It is particularly noteworthy that the constant of proportionality relating the ITI response rates to contextual reinforcement rates is the same as that relating the CS response rates to CS reinforcement rates. To the best of our knowledge, the dependence of the rate of responding during the ITIs on the contextual rate of reinforcement (rather than on the probability of reinforcement or the rate of reinforcement during the ITIs) has not been previously reported.
The fourth finding is that the post-acquisition response rate trends linearly upward to a peak. The higher the rate of reinforcement of the CS, the steeper that linear increase (Figure 8). However, the path to the peak rate has several ups and downs, particularly when informativeness is low, and the terminal rate can be lower than the peak rate (Figure 9). When informativeness is high, the response rate generally rises to the peak in one or two steps. The first step, which is often also the final one, frequently occurs after the first trial (Figure 10).
The first finding confirms results reported previously in studies of Pavlovian conditioning in pigeons. Perhaps the most remarkable aspect of the observed relationship between trials to acquisition and C/T is that the data collected from conditioning experiments with pigeons and the data reported here are consistent with a simple regression function that has a slope of -1 over most of its range and an intercept of 255. As discussed later, this finding is strong evidence in favour of the notion that conditioning is directly related to how much information the CS provides about the US. That rats poking into a magazine would satisfy the same model with essentially the same value for its one parameter was not to be expected.
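In its simplest form, this regression can be written as R ≈ k/ι, where R is the number of reinforcements to acquisition, ι = C/T, and k = 255 is the informativeness at which acquisition takes a single reinforcement. The Python sketch below assumes a pure slope of −1 on log-log axes with a floor of one reinforcement; it does not model any departure from that slope at the ends of the range, and the function name is illustrative.

```python
import math

K = 255.0  # informativeness that yields 1-trial acquisition (value from the text)


def reinforcements_to_acquisition(iota, k=K):
    """Predicted reinforcements to acquisition for informativeness iota = C/T.

    Simplified sketch: slope -1 on log-log axes, floored at 1 reinforcement.
    """
    if iota <= 1:
        raise ValueError("informativeness must exceed 1 for acquisition")
    return max(1.0, k / iota)


# The lowest and highest C/T ratios of the present experiment, plus two in between.
for iota in (1.5, 10, 72, 300):
    r = reinforcements_to_acquisition(iota)
    print(f"C/T = {iota:>5}: about {r:.1f} reinforcements (log10 = {math.log10(r):.2f})")
```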
The previous published evidence concerning the relationship between C/T and acquisition of Pavlovian conditioning with rodents has been mixed. Lattal (1999) and Holland (2000) reported evidence that conditioning in rats was related to C/T, as did Ward et al. (2012) in a series of experiments with mice. However, Kirkpatrick and Church (2000), and, more recently, Thrailkill et al. (2020), found no evidence for an effect of C/T on trials to acquisition in rats. Moreover, Lattal, Holland, and Thrailkill et al. all found evidence for an effect of T when C/T was held constant. We suggest that these inconsistencies relate to differences in how response acquisition was measured across the studies. Our results show that the point at which evidence for conditioned responding first emerges is directly related to C/T, and not to C or T alone, but the level of responding subsequently acquired is directly related to T and not to C/T or C. Therefore, both C/T and T will affect any index of conditioning that is sensitive to both the time when responding emerges and how much responding is subsequently acquired. Lattal's evidence for an effect of T between groups matched on C/T was obtained in a test session conducted after 4 conditioning sessions totalling 48 trials. Holland observed more responding in the last 8 of 16 conditioning sessions and also found a steeper increase in responding over sessions for rats trained with short values of T despite matched C/T. In both cases, the observed effects of T could have been due to its effect on the level of responding acquired rather than how quickly responding emerged. A similar argument can be made for the evidence provided by Thrailkill et al., given that our rats took substantially longer to reach the Thrailkill et al. criterion for response acquisition than to reach the criterion we developed (see Supplementary Materials). This suggests that, to satisfy their criterion for acquisition, the rats acquired a higher level of responding, which would have been affected by T. Finally, the absence of evidence for an effect of C/T in the study by Kirkpatrick and Church may also have been due to the particular method they used to assess trials to acquisition. Their method, when applied to the present data (see Supplementary Materials), produced smaller differences between groups and therefore a weaker (albeit significant) correlation between trials to acquisition and log(C/T). Thus, their method may have been less sensitive to differences between groups in their rate of acquisition, which could explain why they failed to see an effect of C/T across the relatively limited range of C/T ratios they tested (from 1.5 to 12).
Had it been appreciated in the 1960s that the rate of responding during ITIs has the same scalar relation to the contextual rate of reinforcement as the rate of responding during CSs has to the CS rate of reinforcement, the results from Rescorla’s (1967, 1968) truly random control and from Kamin’s (1968) blocking experiments would have been considered entirely predictable. In both protocols, the reinforcement rate during the (target) CS is the same as the rate expected in the context in which it occurs. In the truly random protocol, the context is the test chamber; in a blocking protocol, it is the previously conditioned CS.
The contextual rate of reinforcement plays a fundamental role in Rate Estimation Theory (RET, Gallistel & Gibbon, 2000). It is the first term in the vector of uncorrected rates. The uncorrected vector is the list of the observed rates of reinforcement for each possible predictor. The corrected rate vector is the list of reinforcement rates subjects ascribe to the actions of the different predictors. The matrix equation that does the assignment is based on the assumption that the predictors act independently, in which case the ascribed rates must sum to the observed rates when the predictors co-occur. The contextual rate of reinforcement also plays a fundamental role in the information-theoretic model of acquisition (Balsam, Fairhurst, & Gallistel, 2006; Balsam & Gallistel, 2009; Ward et al., 2012). In that model, the appearance of differential responding to the CS is determined by the informativeness of the protocol, which is the ratio between the CS reinforcement rate and the contextual reinforcement rate. The information transmitted to a subject by CS onset is the log of the informativeness.
The results in Figure 6 confirm that the subjects compute the contextual rate of reinforcement. That rate determines how aroused they are when in that context. It has long been recognized that the contextual rate of reinforcement in appetitive operant conditioning has a scalar effect on foraging activities (Belke, 1992; Drew, Zupan, Cooke, Couvillon, & Balsam, 2005; Killeen, 2023; Killeen, Hall, & Bizo, 1999; Killeen, Hanson, & Osborne, 1984). The effect of the operant contingencies is to channel that activity into the activity or activities on which reinforcement is contingent (Gallistel & Shahan, 2024). It now appears that the same is true in Pavlovian conditioning. Reinforcement is contingent on poking into the magazine. During the ITIs, subjects' poke rate is 18 times the reinforcement rate expected when in the experimental context; during the CSs, it is 18 times the reinforcement rate expected when in the context and in the presence of the CS.
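The scalar relation described here can be stated as a prediction for any group: pokes per second during the CS ≈ 18/T and during the ITI ≈ 18/C, so the predicted CS-to-ITI poke-rate ratio equals C/T. A minimal Python sketch, using the median scale factor of 18 reported in the text (individual subjects scatter multiplicatively around it); the function name is hypothetical.

```python
SCALE = 18.0  # pokes per reinforcement in the median subject (value from the text)


def predicted_poke_rates(C, T, scale=SCALE):
    """Predicted pokes per second during the CS and during the ITI.

    The CS rate is scale * (1/T), the CS reinforcement rate; the ITI rate is
    scale * (1/C), the contextual reinforcement rate.
    """
    return scale / T, scale / C


# The group with the highest informativeness (C = 4200 s, T = 14 s, C/T = 300).
cs, iti = predicted_poke_rates(C=4200.0, T=14.0)
print(f"CS: {cs:.2f} pokes/s, ITI: {iti:.4f} pokes/s, ratio {cs / iti:.0f}")
```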
Information-Theoretic Contingency
Contingency in Pavlovian and operant conditioning has long resisted a mathematical definition that made it measurable in all circumstances, particularly when there is no time at which reinforcement may be anticipated and hence no time at which failures of reinforcement to occur can be counted (Donahoe, 2006; Gallistel, 2021; Gibbon, Berryman, & Thompson, 1974; Granger & Schlimmer, 1986; Hallam, Grahame, & Miller, 1992; Hammond & Paynter, 1983). Gallistel and Latham (2023) have developed a generally applicable measure based on the trivially computable prospective and retrospective mutual information between CSs and reinforcements or between responses and reinforcements. Given two distinguishable event streams X and Y (for example, a stream of CS onsets and a stream of reinforcements at CS termination), there is prospective mutual information between the x events and the y events, $I_{X \rightarrow Y}$, when the expected wait to the next y conditional on an x, $\mu_{y|x}$, is reliably shorter than the expected wait between the y's, $\mu_y$, as defined in Equation (1). There is retrospective mutual information, $I_{X \leftarrow Y}$, when the expected wait looking back from a y to the most recent x, $\mu_{x|y}$, is shorter than the expected wait between the x's, $\mu_x$, as in Equation (2):

$$I_{X \rightarrow Y} = \log_2 \frac{\mu_y}{\mu_{y|x}} \qquad (1)$$

$$I_{X \leftarrow Y} = \log_2 \frac{\mu_x}{\mu_{x|y}} \qquad (2)$$
The arguments of the log function in Equations (1) and (2) are the ratio of the unconditional wait (in the numerator) to the conditional wait (in the denominator), or equivalently, the ratio of the inverse of the waits, conditional rate in numerator and unconditional rate in denominator. Because its logarithm is the mutual information, Balsam et al. (2006) have termed this ratio the informativeness of a Pavlovian protocol.
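In the present protocol every CS ends in a US, so the unconditional wait between USs is C and the wait from CS onset to the US is T; the prospective mutual information is then log2(C/T) bits, the log of the informativeness. A minimal Python sketch, using base-2 logarithms (consistent with the 3.3 bits quoted later for an informativeness of 10); the function name is illustrative.

```python
import math


def prospective_information_bits(mu_y, mu_y_given_x):
    """Mutual information (bits) from the unconditional and conditional expected waits.

    The argument of the log is the informativeness: unconditional wait divided by
    conditional wait (equivalently, conditional rate over unconditional rate).
    """
    if mu_y <= 0 or mu_y_given_x <= 0:
        raise ValueError("waits must be positive")
    return math.log2(mu_y / mu_y_given_x)


# Extremes of the present design: C/T = 1.5 (63 s / 42 s) and 300 (4200 s / 14 s).
for C, T in ((63.0, 42.0), (4200.0, 14.0)):
    print(f"C/T = {C / T:g}: {prospective_information_bits(C, T):.2f} bits")
```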
Gallistel and Latham (2023) define contingency as the ratio of the mutual information to the available information, which is the amount that reduces subjective uncertainty to 0. Because measurement error scales with latency (Weber’s Law, see Gibbon, 1977a), contingency equals one only when the x’s and y’s coincide. How to measure the available information is often unclear. Mutual information, however, is trivially computed, as in Equations (1) and (2).
Another information-theoretic measure, nDKL, measures the degree to which computed mutual information may be trusted. Its role, relative to mutual information, is analogous to the role of p values relative to correlations. The DKL in the nDKL is the Kullback-Leibler divergence. It is analogous to the effect size in conventional statistics. The effect size is the normalized distance between two distributions assumed to have the same variance. Distance is symmetric, i.e., D(X,Y) = D(Y,X), but divergence is not: DKL(X||Y) ≠ DKL(Y||X). The asymmetric information-theoretic divergence is arguably the better measure, because the amount of data required to determine whether an X distribution differs from a Y distribution is not the same as the amount required to determine whether the opposite is true, and rats and mice are sensitive to this asymmetry (Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012).
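The DKL of one exponential distribution from another depends only on their rate parameters:

$$D_{KL}(\lambda_X \| \lambda_Y) = \ln\frac{\lambda_X}{\lambda_Y} + \frac{\lambda_Y}{\lambda_X} - 1 \qquad (3)$$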
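Equation (3) is easy to evaluate directly. With λX the CS rate and λY the contextual rate, λX/λY is the informativeness ι, so for ι = 2 the divergence is ln 2 − 0.5 ≈ 0.19 nats, the value of about 0.2 nats quoted for ι = 2 later in this section. The Python sketch below (function name illustrative) also shows the asymmetry: the reverse divergence for the same pair of rates is about 0.31 nats.

```python
import math


def dkl_exponential(lam_x, lam_y):
    """D_KL of Exp(lam_x) from Exp(lam_y), in nats (Equation 3)."""
    ratio = lam_x / lam_y
    return math.log(ratio) + 1.0 / ratio - 1.0


iota = 2.0  # CS rate twice the contextual rate
print(f"D_KL(CS || context) = {dkl_exponential(iota, 1.0):.3f} nats")
print(f"D_KL(context || CS) = {dkl_exponential(1.0, iota):.3f} nats")
```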
The uncertainty regarding the values of estimates for λX and λY depends on the sample sizes, nX and nY. Peter Latham (Gallistel & Latham, 2023, see their Appendix) has recently shown that when λX = λY (the null hypothesis),

$$n_e D_{KL}(X \| Y) \sim \Gamma\!\left(\frac{n_p}{2}, 1\right) \qquad (4)$$
where ne is the effective sample size, ne = nX⁄(1 + nX⁄nY), and np is the size of the parameter vector. For the exponential, np = 1. Therefore,

$$n_e D_{KL}(\lambda_X \| \lambda_Y) \sim \Gamma(0.5, 1) \qquad (5)$$
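Equations (4) and (5) can be checked by simulation. The Python sketch below is an illustration, not part of the original analysis: it draws two samples of exponential intervals with the same rate (the sample sizes 100 and 300 are arbitrary), estimates each rate as count over total time, and computes ne·DKL. Under Equation (5) these values should follow Γ(0.5, 1), whose mean is 0.5 and whose upper tail beyond x is erfc(√x), about 0.011 for x = 3.2.

```python
import math
import random


def dkl_exponential(lam_x, lam_y):
    """D_KL of Exp(lam_x) from Exp(lam_y), in nats."""
    ratio = lam_x / lam_y
    return math.log(ratio) + 1.0 / ratio - 1.0


def simulate_null_ndkl(n_x, n_y, lam=1.0, reps=20000, seed=1):
    """Draw n_e * D_KL values when both samples come from the same exponential."""
    rng = random.Random(seed)
    n_e = n_x / (1 + n_x / n_y)  # effective sample size
    out = []
    for _ in range(reps):
        # The sum of n exponential intervals with rate lam is Gamma(n, scale=1/lam),
        # so each rate estimate n / sum can be drawn without generating every interval.
        lam_x_hat = n_x / rng.gammavariate(n_x, 1.0 / lam)
        lam_y_hat = n_y / rng.gammavariate(n_y, 1.0 / lam)
        out.append(n_e * dkl_exponential(lam_x_hat, lam_y_hat))
    return out


vals = simulate_null_ndkl(100, 300)
mean = sum(vals) / len(vals)
tail = sum(v > 3.2 for v in vals) / len(vals)
print(f"mean {mean:.3f} (Gamma(0.5, 1) mean is 0.5); P(nDKL > 3.2) about {tail:.4f}")
```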
Equation (5) measures how “significant” an observed amount of mutual information is, that is, how unlikely it is to have arisen by chance. Because, by Equations (4) and (5), $nD_{KL}(X \| Y)_{\exp} \sim \Gamma(0.5, 1)$ under the null hypothesis, nDKL values may be converted to the more familiar p values. Both are measures of the strength of statistical evidence. Conversion is motivated only by sociological considerations.
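Because the survival function of Γ(0.5, 1) is erfc(√x), converting an nDKL (in nats) to a p value takes one line. A minimal Python sketch (function name illustrative). Applied to the criteria mentioned in this section, it gives p ≈ .011 for 3.2 nats, odds of about 370:1 for 4.5 nats, about 35:1 for 2.4 nats, roughly 1 in 1,900 for the 6-nat parsing criterion, and well over 10,000:1 for 10 nats.

```python
import math


def ndkl_to_p(ndkl):
    """Tail probability of an nDKL value under the null, nDKL ~ Gamma(0.5, 1).

    The survival function of Gamma(0.5, 1) is erfc(sqrt(x)).
    """
    return math.erfc(math.sqrt(ndkl))


for nats in (2.4, 3.2, 4.5, 6.0, 10.0):
    p = ndkl_to_p(nats)
    print(f"{nats:4.1f} nats: p = {p:.2e}, about 1 in {1 / p:,.0f}")
```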
Equations (1), (2), and (5), together with the RET equation (Gallistel, 1990), constitute a computationally simple, parameter-free model of associative learning. Equation (5) enables us to address the question, How sensitive are subjects to the strength of the evidence for observed mutual information? Put another way, does their behavioural sensitivity to the accumulating mutual information suggest that they make a rational assessment of the extent to which they can trust the mutual information so far observed? Figure 11 plots the cumulative distributions (CDFs) of the values for nDKL(λR|CS||λR|C), the nDKL for the evidence that the CS rate of reinforcement, λR|CS, differs from the contextual rate of reinforcement, λR|C, at the point when subjects' increased poke rates during CSs meet increasingly stringent evidentiary criteria for acquisition.
The medians of the blue CDFs in Figure 11 generally fall at or below 3.2 nats worth of evidence, which corresponds to a p value of about .01. This implies that subjects assess appropriately the strength of the accumulating evidence for reliable mutual information. The CDFs for the 10:1, 100:1, and 1000:1 evidentiary criteria cluster together well to the right, toward absurdly stringent criteria. (When the nDKL is greater than 10, the odds against the null hypothesis are more than 10,000:1.) The strong rightward shift in the strength of the stimulus evidence at acquisition occurs because the behavioural effect size is small and noisy (Figure 11). The low behavioural effect size requires larger n's (more trials) to satisfy increasingly stringent evidentiary criteria (Figure 3). The Kullback-Leibler divergence of the CS reinforcement rate from the contextual rate is also small when the informativeness is low; when ι = 2, it is only 0.2 nats. However, in determining the amount of evidence for mutual information that subjects have acquired after a given number of reinforcements, the effect of the small ratio of the poke rates on the n's required for increasingly stringent evidentiary criteria outweighs the small divergence.
The divergence is a decelerating function of informativeness, but its initial rise is steep: an informativeness of 10 yields a divergence of 1.5 nats. Thus, when a CS transmits 3.3 bits of mutual information per trial, 3 reinforced trials provide on average 4.5 nats of evidence, which corresponds to odds of 370:1 in favour of the conclusion that the observed mutual information is reliable. This explains why the CDFs for more stringent criteria in Figure 11 migrate leftward as informativeness increases. When informativeness exceeds the value required for 1-trial acquisition in the median subject, the median subject satisfies the 100:1 odds criterion when there is only 2.4 nats of evidence for the mutual information, which corresponds to odds of 35:1 (p < .03). Again, this is a rational evidentiary criterion.
Conclusion
Associative learning obeys simple equations that map from measurable properties of subjects' experience to measurable properties of their behaviour. The equations have at most one free parameter, and it is a scale factor with a data-anchored interpretation. The poke rate, for example, is 18 times the reinforcement rate in the median subject (Figure 6) over a range that covers 3 orders of magnitude. The variability about this regression is also multiplicative, with a scale factor of $10^{0.39} \approx 2.5$. A second example is the simple relation between the learning rate (the reciprocal of reinforcements to acquisition) and the informativeness of a Pavlovian protocol. The log of reinforcements to acquisition is −1 times the mutual information plus a constant, so the log of the learning rate rises one-for-one with the mutual information; the constant, k, in this regression is the informativeness that produces 1-trial acquisition (Figures 1, 3, and 4). A third example is the parameter-free rate estimation equation. In a 1-CS protocol, that equation is:

$$\begin{bmatrix} \hat{\lambda}_{C} \\ \hat{\lambda}_{CS} \end{bmatrix} = \begin{bmatrix} 1 & 1/\iota \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \lambda_{R|C} \\ \lambda_{R|C\&CS} \end{bmatrix} \qquad (6)$$
Equation (6) maps the observed contextual rate of reinforcement, λR|C, and the observed rate of reinforcement when the CS is also present, λR|C&CS, to the rates of reinforcement that subjects ascribe to these predictors, $\hat{\lambda}_C$ and $\hat{\lambda}_{CS}$. It does so by way of the inverse of a simple matrix. The only element not equal to 1 in the matrix in Equation (6) is the inverse of the informativeness, which is also the protocol parameter in the learning-rate equation. Equation (6) explains the results in the cue-competition literature (a.k.a. the assignment-of-credit literature). (For review and comparison to the Rescorla-Wagner model, see Gallistel, 1990, Chapter 13.)
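Written out, Equation (6) is a pair of linear equations: the contextual rate is the context's ascribed rate plus the CS's ascribed rate weighted by the fraction of time the CS is present (T/C = 1/ι), and the rate observed during the CS is the sum of the two ascribed rates. The Python sketch below solves that system by hand, assuming the matrix form shown in Equation (6); the function name is hypothetical. It reproduces two cases discussed earlier: when every reinforcement falls at CS termination, all of the rate is credited to the CS, and in a truly random control, where the two observed rates are equal, none of it is.

```python
def ascribed_rates(lam_context, lam_context_and_cs, iota):
    """Rates ascribed to the context and to the CS in a 1-CS protocol.

    Solves  lam_context        = a_c + a_cs / iota
            lam_context_and_cs = a_c + a_cs
    i.e., Equation (6) with the 2 x 2 matrix inverted by hand.
    """
    if iota <= 1:
        raise ValueError("informativeness must exceed 1 (the CS must be absent some of the time)")
    a_cs = (lam_context_and_cs - lam_context) / (1 - 1 / iota)
    a_c = lam_context_and_cs - a_cs
    return a_c, a_cs


# Every CS (mean duration T) ends in a pellet and the ITIs are never reinforced.
C, T = 140.0, 14.0
print(ascribed_rates(1 / C, 1 / T, C / T))  # context gets ~0, CS gets 1/T

# Truly random control: the same rate with or without the CS.
print(ascribed_rates(0.01, 0.01, 5.0))  # context gets 0.01, CS gets 0
```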
These equations bring out the formal structure of the data from associative learning experiments. They map an always defined and computable aspect of subjects’ experience— rate of reinforcement—to measurable properties of subjects’ behaviour. They contrast with equations that map from the often-undefinable probability of reinforcement to a hypothetical construct like associative strength or expected value (Honey, Dwyer, & Iliescu, 2020; Ludvig, Sutton, & Kehoe, 2012; Pearce & Hall, 1980; Rescorla & Wagner, 1972; Sutton & Barto, 1990; Vogel, Ponce, & Wagner, 2019).
That the rates of reinforcement rather than the probabilities are the inputs to the equations that map perceived associations to behaviour calls into question the basic assumption in the common understanding of associative processes, the assumption that associative learning is mediated by the incrementing and decrementing of some scalar brain quantity (e.g., the conductivity of a plastic synapse) by reinforcement and non-reinforcement, respectively. This assumption is central to all models of Pavlovian and operant learning that map probability of reinforcement to associative strength and is central to all reinforcement learning models that map it to a weight state in a neural net (supervised and unsupervised learning) or to an expected long-term reward (reinforcement learning). They all fail to “leverage the statistical and computational structure of [the] problem” (Russo, Roy, Kazerouni, Osband, & Wen, 2018, p. 6), because they must discretize time into unobserved trials or states with unspecified durations. They must do this because, unlike a rate that has a temporal unit, a probability is unitless.
A probability is the ratio between a count of reinforcements and the sum of the count of reinforcements and non-reinforcements. It is impossible to map from probability of reinforcement to behaviour in the general case because the non-reinforcements in the denominator are unobserved events with no physical properties and are often uncountable. They are countable only when reinforcement fails to occur at an expected time of reinforcement. In our protocol and many others, the reinforcements occur at an unpredictable time or times following CS onset. The durations of our CSs were drawn from a uniform distribution and the reinforcements coincided with their termination. In Rescorla's (1967, 1968) experiments with truly random controls (and the many follow-ons), the reinforcements were programmed by Poisson processes. The defining feature of a Poisson process is its flat hazard function: there is no moment at which a reinforcement is any more likely than at any other moment.
Unlike models that take probability of reinforcement as the essential aspect of subjects' experience, all of which have at least two free parameters, the model of associative learning provided by the simple equations we present here directly explains quantitative facts about associative learning that have gone unexplained for decades. One such fact is that reinforcements to acquisition are unaffected by partial reinforcement (Balsam & Gallistel, 2009; Gallistel, 2003; Gallistel, Craig, & Shahan, 2014; Gibbon, Farrell, Locurto, Duncan, & Terrace, 1980; Gottlieb, 2004, 2005). This follows immediately from the equation for the learning rate as a function of informativeness (top of Figure 1), because partial reinforcement does not alter informativeness. A second such fact is that deleting reinforced trials while retaining the spacing of the remaining reinforced trials does not alter the temporal progress of acquisition (Bouton & Sunsay, 2003; Gottlieb, 2008). The number of reinforced trials in a given amount of training time is irrelevant, given only that there is at least one. This, too, follows from the equation that relates the learning rate to the informativeness or to its log, the mutual information (Figures 1 and 3). Because the slope on a log-log plot is essentially −1 over most of the usable range of values for informativeness, halving the number of trials doubles the informativeness, and that doubles the learning rate; the halved number of trials therefore still suffices for acquisition in the same amount of training time.
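The arithmetic behind the second fact can be made explicit. If reinforcements to acquisition are R = k/ι = kT/C and reinforcements arrive on average once every C seconds, the training time to acquisition is R·C = k·T, which does not depend on C. A minimal Python sketch, assuming the slope −1 form with k = 255 and valid only where R ≥ 1; the function name and the example values of C and T are illustrative.

```python
K = 255.0  # informativeness that yields 1-trial acquisition (value from the text)


def acquisition_cost(C, T, k=K):
    """Return (reinforcements, training seconds) to acquisition, assuming R = k / (C / T).

    One reinforcement arrives every C seconds on average, so the training time
    is R * C = k * T, independent of C. Valid only where R >= 1.
    """
    reinforcements = k / (C / T)
    if reinforcements < 1:
        raise ValueError("outside the slope -1 range (acquisition after 1 reinforcement)")
    return reinforcements, reinforcements * C


# Deleting every other trial while keeping session length fixed doubles C.
for C in (120.0, 240.0):
    r, secs = acquisition_cost(C, T=10.0)
    print(f"C = {C:.0f} s: {r:.3f} reinforcements, {secs:.0f} s of training")
```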
We conclude that models of associative learning based on probability of reinforcement cannot be correct because the rate of responding, the learning rate and the assignment of credit are simple functions of rate of reinforcement, a quantity with temporal units. Only models that leverage the metric temporal structure of subjects' experience can be neurobiologically realisable.
Methods
Subjects
A total of 176 experimentally naive female albino Sprague Dawley rats (8 to 10 weeks of age) were obtained from Animal Resources Centre, Perth, Western Australia. They were housed in groups of 4 in split-level ventilated plastic tubs (Techniplast™), measuring 40 x 46 x 40 cm (length x width x height), located in an animal research facility at the University of Sydney. They had unrestricted access to water in their home tubs. Three days before commencing the experiment, they were placed on a restricted food schedule. Each day, half an hour after the end of the daily training session, each tub of rats received a ration of their regular dry chow (3.4 kcal/g) equal to 5% of the total weight of all rats in the tub. This amount is approximately equal to their required daily energy intake (Rogers, 1979), and took at least 2 h to be eaten (but was usually finished within 3 h). Rats on this schedule do not typically lose weight (and never more than 10%) but gain weight only very slowly. All experimental procedures were approved by the Animal Research Authority of the University of Sydney (protocol 2020/1840).
Apparatus
Rats were trained and tested in 32 Med Associates™ conditioning chambers distributed equally across four rooms. Twenty-four chambers (Set A) measured 28.5 x 30 x 25 cm (height x length x depth) and the other eight (Set B) were 21 x 30.5 x 24 cm (height x length x depth). Each chamber was individually enclosed in a sound- and light-resistant wooden shell (Set A) or PVC shell (Set B). The end walls of each chamber were made of aluminum; the sidewalls and ceiling were Plexiglas™. The floor consisted of stainless-steel rods, 0.5 cm in diameter, spaced 1.5 cm apart. Each chamber had a recessed food magazine in the center of one end wall, with an infra-red LED and sensor located just inside the magazine to record entries by the rat. A small metal cup measuring 3.5 cm in diameter and 0.5 cm deep was fixed on the floor of each food magazine either in the center (Set A) or offset to the left of center (Set B). Attached to the food magazine was a dispenser delivering 45-mg food pellets (purified rodent pellets; Bioserve, Frenchtown, NJ). Illumination of an LED (Med Associates product ENV-200RL-LED) mounted in the ceiling of the magazine served as the CS. Experimental events were controlled and recorded automatically by computers and relays located in the same room. Throughout all sessions, fans located in the rear wall of the outer shell provided ventilation and created background noise (between 61 and 66 dB, depending on the chamber).
Procedure
The rats were allocated to the 14 groups shown in Table 1. The experiment was run with three separate cohorts of 64, 80, and 32 rats, so that by the end of the experiment each of the 14 groups had at least 12 rats (Groups 6 and 10 had 16 rats, as described next). In the first cohort, the data for 4 rats in each of Groups 6 and 10 were not saved in Session 1 due to human error. These 8 rats continued through the entire experiment but only their data for the final 5 sessions were used (to calculate their terminal response rate). An extra 4 rats were run in each of these two groups in the second cohort.
The rats were not given magazine training. The experiment commenced with the first conditioning session and continued for 42 sessions over 42 consecutive days. Within each group, the CS-US interval varied from trial to trial according to a uniform distribution centred on the value of T for that group (the interval varied from a minimum of 2 s to a maximum of 2T − 2 s). The ITI in each group also varied from trial to trial according to a uniform distribution with a minimum of 15 s. Between groups, T varied from 6 s to 62 s, and C varied from 63 s to 4,200 s (70 min) as summarised in Table 1. The combinations of C and T gave rise to 14 distinct C/T ratios that were approximately evenly distributed on a log scale ranging from a ratio equal to 1.5 (63/42) up to 300 (4200/14). For the first 11 groups (those with ratios from 1.5 to 72), there were 10 trials per session; for the remaining three groups (with C/T ratios of 110, 180 and 300), each session contained only 3 trials so that the total session time remained within a manageable length (less than 4 h). All groups were trained with one session per day for a total of 42 sessions. Photo-beam interruptions by entry into the magazine were recorded during each CS and each ITI (recorded during the 20-s period immediately before CS onset).
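A trial schedule of this kind can be generated as in the Python sketch below. It is a hypothetical illustration, not the control program used in the experiment: CS durations follow the description above (uniform on [2, 2T − 2] s, mean T), but the ITI's upper bound is not stated, so the sketch assumes a mean ITI of C − T (hence a maximum of 2(C − T) − 15 s) so that the mean US-US interval is C. The function name and the example values C = 420 s, T = 14 s are illustrative.

```python
import random


def make_schedule(C, T, n_trials, seed=None):
    """Hypothetical sketch of one session's (ITI, CS duration) pairs, in seconds.

    CS durations: uniform on [2, 2T - 2], so their mean is T.
    ITIs: uniform with a 15-s minimum; a mean of C - T (maximum 2(C - T) - 15)
    is ASSUMED here so that the mean US-US interval equals C.
    """
    if T <= 2 or C - T <= 15:
        raise ValueError("need T > 2 s and C - T > 15 s")
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_trials):
        iti = rng.uniform(15.0, 2 * (C - T) - 15.0)
        cs = rng.uniform(2.0, 2 * T - 2.0)
        schedule.append((iti, cs))
    return schedule


trials = make_schedule(C=420.0, T=14.0, n_trials=10, seed=3)
for iti, cs in trials[:3]:
    print(f"ITI {iti:6.1f} s, CS {cs:5.1f} s")
```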
Data analysis
Several different indices were used to identify when responding to the CS first appeared. The first index involved creating, for each rat, cumulative records of response counts during the CS and during the ITI, then converting these into cumulative records of response rates by dividing the cumulative count at each trial by the cumulative time (CS or ITI) at that trial. The primary index used here to establish when rats start responding to the CS is based on the difference between these cumulative response rates (cumulative CS rate minus cumulative pre-CS ITI rate). The minimum value of this difference record identifies the trial after which the cumulative response rate during the CS becomes permanently greater than the cumulative ITI response rate. (This takes into account cases in which the CS response rate is initially lower than the ITI rate, wherein the difference in cumulative records decreases below zero but then reverses and starts to increase as soon as responding during the CS begins to outstrip ITI responding.) We used a one-sample t-test to identify when this difference in the cumulative response rates became statistically reliable according to a significance threshold of p < .05. We also used a new information-theoretic statistic, the nDKL (see next section), to identify when the rate of responding during the CS began to exceed that during the ITI. Additional analyses were run adopting measures used in previous studies (Kirkpatrick & Church, 2000; Thrailkill et al., 2020) to identify trials to a learning criterion. These analyses and results are described in the Supplementary Materials.
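The primary index can be sketched in a few lines. The Python sketch below, with hypothetical function names and made-up poke counts, builds the cumulative CS and pre-CS ITI response rates trial by trial, takes their difference, and returns the last trial at which that difference reaches its minimum. The t-test and nDKL steps are not included.

```python
def cumulative_rate_difference(cs_pokes, cs_durs, iti_pokes, iti_durs):
    """Per-trial difference between cumulative CS and cumulative pre-CS ITI poke rates."""
    diffs = []
    n_cs = t_cs = n_iti = t_iti = 0.0
    for a, b, c, d in zip(cs_pokes, cs_durs, iti_pokes, iti_durs):
        n_cs += a
        t_cs += b
        n_iti += c
        t_iti += d
        diffs.append(n_cs / t_cs - n_iti / t_iti)
    return diffs


def onset_trial(diffs):
    """Last trial (1-based) at which the difference record is at its minimum."""
    lowest = min(diffs)
    return max(i for i, d in enumerate(diffs, start=1) if d == lowest)


# Made-up data: 10-s CSs and the 20-s pre-CS ITI windows over 8 trials.
cs_pokes = [0, 0, 0, 0, 3, 4, 5, 5]
iti_pokes = [1, 1, 2, 2, 1, 1, 1, 1]
diffs = cumulative_rate_difference(cs_pokes, [10.0] * 8, iti_pokes, [20.0] * 8)
print(onset_trial(diffs))  # responding to the CS is taken to emerge after this trial
```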
Analyses were also conducted to assess how T, C, and C/T affect conditioned responding after the point when it first emerges. These analyses examined how the response rate increased over trials and what level of responding was reached after extended training. The level of responding ultimately acquired by each group was computed as the mean response rate over the last 5 conditioning sessions. Further analyses broke down the response rate into separate components: the latency to first response; the mean duration of each response (time in the magazine); and the interval between responses (time out of the magazine). The increase in responding over sessions was assessed using a method described by Harris (2022) that uses the slope of the cumulative response record to identify the number of trials required for responding to reach successive response milestones corresponding to each decile of the rat’s peak response rate. Based on findings relating response rates to reinforcement rates (1/T) using this paradigm (Harris & Carpenter, 2011), we hypothesised that responding at the end of conditioning would be related to T, and indeed would scale linearly with log(T), but would not be related to C or C/T. We had no clear hypotheses about whether T, C, or C/T would affect how quickly responding increases after it has emerged.
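The milestone analysis can be approximated as follows. This Python sketch is a simplified reading of the description above, not the exact procedure of Harris (2022): the response rate at each trial is taken as the slope of the cumulative response record over a trailing window of trials (the window size is a hypothetical parameter), and the function returns, for each decile of the peak rate, the first trial at which the rate reaches it. The function name and the data in the usage line are made up.

```python
def decile_milestones(counts, durations, window=1):
    """First trial (1-based) at which the response rate reaches each decile of its peak.

    The rate at trial i is the slope of the cumulative response record over the
    preceding `window` trials: change in cumulative count / change in cumulative time.
    """
    cum_n, cum_t = [0.0], [0.0]
    for n, d in zip(counts, durations):
        cum_n.append(cum_n[-1] + n)
        cum_t.append(cum_t[-1] + d)
    rates = {
        i: (cum_n[i] - cum_n[i - window]) / (cum_t[i] - cum_t[i - window])
        for i in range(window, len(cum_n))
    }
    peak = max(rates.values())
    return [min(i for i, r in rates.items() if r >= peak * k / 10) for k in range(1, 11)]


# Made-up record: a rate that climbs steadily over 11 trials of 1 s each.
print(decile_milestones(list(range(11)), [1.0] * 11))
```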
The Kullback-Leibler divergence, and the nDKL
Comparing the CS poke rate to the ITI poke rate reinforcement by reinforcement is problematic in cases where one or both rates are undefined because the subject has not yet made a poke. When a subject has made 5 pokes during the first 2 CSs and no pokes during the much longer ITIs, one does not want to conclude that the subject has not yet acquired a conditioned response to the CS. A second problem when using conventional statistics like the t-test is that one is required to specify the sample size in advance.
We circumvented these difficulties by reformulating the null hypothesis as a comparison between the contextual rate of responding (during the ITI) and the rate during the CS, and by using an information-theoretic statistic to measure the strength of the evidence that the CS rate of responding differs from the contextual rate (Gallistel & Latham, 2023). The information-theoretic statistic measures the strength of the evidence against the hypothesis that the distribution of inter-poke intervals during CSs is the same as the distribution in the context in which the CS occurs. Put another way, the null hypothesis is that poking during CSs does not differ from the poking expected in the context, where the subject pokes into the magazine because pellets sometimes drop there, without regard to the signal value of the CS.
The CS poke rate, $\lambda_{r|CS}$, at any point in training is the number of pokes made during the CSs divided by their cumulative duration: $\lambda_{r|CS} = n_{r|CS}/D_{CS}$. Likewise, the contextual poke rate is $\lambda_{r|C} = n_{r|C}/D_C$. In these calculations, $D_{CS}$ is the cumulative duration of the CSs, $D_C$ is the cumulative training time, and $n_{r|C}$ is the total number of pokes. Assume, for example, that the cumulative CS time as of the 2nd reinforcement is 20 s, the total training time is 1000 s, and the subject has made 5 pokes during the two CS intervals and no pokes during the ITIs. Then $\lambda_{r|CS} = 5/20 = .25$/s and $\lambda_{r|C} = 5/1000 = .005$/s. The ratio of the two estimates is .25/.005 = 50:1; that is, at this point in training the subject's observed poke rate during the CSs is 50 times faster than would be expected from its estimated rate of poking in the training context. A discrepancy this large is unlikely to have arisen by chance.
An information-theoretic statistic, the nDKL, measures how unlikely such a discrepancy is (Gallistel & Latham, 2023). The DKL is the Kullback-Leibler divergence, a measure of how far one distribution diverges from another. The divergence of one exponential distribution from another depends only on the rate parameters of the two distributions. Therefore, as per Equation (3), the divergence of $\lambda_{r|CS}$ from $\lambda_{r|C}$ is

$$D_{KL}(\lambda_{r|CS}\,\|\,\lambda_{r|C}) = \ln\frac{\lambda_{r|CS}}{\lambda_{r|C}} + \frac{\lambda_{r|C}}{\lambda_{r|CS}} - 1.$$
This is the information-theoretic measure of the extent to which an exponential distribution with rate parameter $\lambda_{r|CS}$ diverges from an exponential distribution with rate parameter $\lambda_{r|C}$. The divergence is, roughly speaking, the equivalent of the effect size in a conventional analysis. The equivalence is rough because the effect size (the normalized distance between the means) is symmetric, whereas the divergence is not: $D_{KL}(\lambda_{r|CS}\,\|\,\lambda_{r|C}) \neq D_{KL}(\lambda_{r|C}\,\|\,\lambda_{r|CS})$.
The nDKL measures the additional cost of encoding $n$ data points drawn from the exponential distribution with rate parameter $\lambda_{r|CS}$ on the assumption that they come from an exponential distribution with rate parameter $\lambda_{r|C}$ (Cover & Thomas, 1991). It multiplies the DKL by the effective sample size, $n = n_{r|CS}/(1 + n_{r|CS}/n_{r|C})$. Therefore, as per Equation (5),

$$nD_{KL} = \frac{n_{r|CS}}{1 + n_{r|CS}/n_{r|C}}\; D_{KL}(\lambda_{r|CS}\,\|\,\lambda_{r|C}).$$
When there is no divergence, the nDKL is distributed gamma(.5,1) (for proof, see Appendix in Gallistel & Latham, 2023). Thus, we can convert the information-theoretic measure of the strength of the evidence for divergence to the more familiar p value measure.
In the illustrative example, $D_{KL}(.25\,\|\,.005) = 2.93$ nats and $n = 5/(1 + 5/5) = 2.5$. (The nat is the unit of information obtained when logarithms are taken to the natural base, e. The Kullback-Leibler divergence must be computed using the natural logarithm, but a result in nats may be converted to bits by multiplying by $\log_2 e \approx 1.44$.) Therefore, nDKL = 2.5 × 2.93 ≈ 7.33 nats, and 1 − gamcdf(7.33, .5, 1) ≈ .0001. The evidence against the null hypothesis is very strong: the odds against the possibility that all 5 pokes occurred during the CSs purely by chance are on the order of 10,000 to 1. This example illustrates one approach to the statistical comparison of poke rates following each of the first few variable-duration CSs, each terminating in a reinforcement.
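The calculation in this example can be checked with a short Python sketch. It assumes only the exponential KL formula and effective sample size given above, and it uses the identity that the upper tail of a gamma(0.5, 1) distribution equals erfc(√x) in place of MATLAB's gamcdf. The function names are ours and purely illustrative.

```python
import math


def dkl_exp(rate_p, rate_q):
    """KL divergence (nats) of Exponential(rate_p) from Exponential(rate_q)."""
    return math.log(rate_p / rate_q) + rate_q / rate_p - 1.0


def effective_n(n_cs, n_ctx):
    """Effective sample size n = n_cs / (1 + n_cs / n_ctx)."""
    return n_cs / (1.0 + n_cs / n_ctx)


def ndkl(n_cs, dur_cs, n_ctx, dur_ctx):
    """nDKL for CS pokes versus contextual pokes, given counts and durations (s)."""
    rate_cs = n_cs / dur_cs
    rate_ctx = n_ctx / dur_ctx
    return effective_n(n_cs, n_ctx) * dkl_exp(rate_cs, rate_ctx)


def p_value(ndkl_nats):
    """Upper tail of gamma(shape=0.5, scale=1): P(X > x) = erfc(sqrt(x)).

    Equivalent to 1 - gamcdf(x, .5, 1) in MATLAB.
    """
    return math.erfc(math.sqrt(ndkl_nats))


# Worked example: 5 pokes in 20 s of CS time, 5 pokes in 1000 s of training.
d = dkl_exp(5 / 20, 5 / 1000)
stat = ndkl(5, 20, 5, 1000)
p = p_value(stat)
print(f"DKL = {d:.2f} nats, nDKL = {stat:.2f} nats, p = {p:.1e}")

# The divergence is asymmetric: swapping the arguments gives a different value.
print(f"reverse DKL = {dkl_exp(5 / 1000, 5 / 20):.2f} nats")

# p values for decision criteria of 2, 4 and 6 nats.
for c in (2, 4, 6):
    print(f"c = {c} nats -> p = {p_value(c):.4f}")
```

Run as written, the sketch should reproduce the 2.93 nats and the p of roughly .0001 quoted above; the small difference from 7.32 comes only from rounding the DKL to 2.93 before multiplying.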
Another approach to estimating the onset of conditioned poking comes from parsing the CS and ITI inter-poke interval vectors into segments with significantly different rate parameters. Parsing of the inter-poke interval vectors was also motivated by a desire to capture differences between subjects in the often bumpy evolution of their post-acquisition rate of responding.
Our parsing algorithm recursively extends the length, $n_e$, of the vector of inter-poke intervals one interval at a time. After each extension, it uses the nDKL statistic to compare the rate estimate for each successively longer initial subsequence, of length $n_s$, to the rate estimate for the full sequence. These comparisons generate the function $nD_{KL}(n_s)$ for $n_s \le n_e$. Whenever $\max[nD_{KL}(n_s)] > c$, the parser truncates the inter-poke interval vector at the location of the maximum. The decision criterion, $c$, is user supplied and generally falls between 2 and 6 nats. The rate estimate for the truncated segment is the number of pokes up to the truncation divided by the sum of the intervals up to the truncation. The algorithm then operates recursively on the post-truncation portion of the vector until no significant subsequence remains. The Matlab™ code for the custom expparser.m function is provided in the Supplementary Materials. The ParseTable.xlsx file in the Supplementary Materials gives results for decision criteria of 2, 4 and 6 nats, which correspond to p values of approximately .05, .005 and .0005. Because parsing inescapably involves multiple comparisons, highly conservative decision criteria are generally preferable. In reporting results, we give only those for c = 6 nats.
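The parsing procedure can be illustrated with a simplified Python sketch. It is not a port of expparser.m: it is our reading of the description above, it rescans prefixes naively rather than efficiently, and it assumes that each prefix-versus-whole comparison uses the same effective sample size formula as the nDKL, which the text does not state explicitly. The function name `parse_rates` is hypothetical.

```python
import math


def dkl_exp(rate_p, rate_q):
    """KL divergence (nats) of Exponential(rate_p) from Exponential(rate_q)."""
    return math.log(rate_p / rate_q) + rate_q / rate_p - 1.0


def parse_rates(intervals, criterion=6.0):
    """Split a vector of inter-poke intervals into constant-rate segments.

    Returns a list of (start, end, rate) tuples, where intervals[start:end]
    is one segment and rate is its count divided by its summed duration.
    """
    segments = []
    start = 0
    total = len(intervals)
    while start < total:
        cut = None
        # Extend the working vector one interval at a time (n_e = its length).
        for n_e in range(2, total - start + 1):
            window = intervals[start:start + n_e]
            rate_full = n_e / sum(window)
            best_stat, best_ns = 0.0, None
            prefix_sum = 0.0
            # Compare each initial subsequence (length n_s) with the whole window.
            for n_s in range(1, n_e):
                prefix_sum += window[n_s - 1]
                rate_sub = n_s / prefix_sum
                n_eff = n_s / (1.0 + n_s / n_e)  # assumed effective sample size
                stat = n_eff * dkl_exp(rate_sub, rate_full)
                if stat > best_stat:
                    best_stat, best_ns = stat, n_s
            if best_stat > criterion:
                cut = best_ns  # truncate at the location of the maximum
                break
        if cut is None:
            # No significant change: the remainder is a single segment.
            rest = intervals[start:]
            segments.append((start, total, len(rest) / sum(rest)))
            break
        segment = intervals[start:start + cut]
        segments.append((start, start + cut, cut / sum(segment)))
        start += cut  # repeat on the post-truncation portion
    return segments


# Synthetic data: 50 intervals of 1 s (1 poke/s), then 50 of 0.1 s (10 pokes/s).
intervals = [1.0] * 50 + [0.1] * 50
for seg_start, seg_end, rate in parse_rates(intervals, criterion=6.0):
    print(f"intervals {seg_start}-{seg_end}: {rate:.2f} pokes/s")
```

On this synthetic input the sketch places a single change point at index 50. Note that the change is only detected once enough post-change intervals have accumulated for the maximum nDKL to exceed the criterion, but the truncation is still placed at the location of the maximum, not at the point of detection.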
In computing the parses, we did not include the first poke in any given CS in our estimate of the CS poke rate, because the latencies to the first pokes clearly came from a different distribution than the inter-poke intervals. In protocols with a short mean CS, the mean inter-poke interval (the reciprocal of the poke rate) was as short as 0.1 s, whereas the mean latency to the first poke in a CS was rarely shorter than 2 s. The slow first pokes are probably explained by the fact that the duration of a CS, and hence the minimum reinforcement latency, was never shorter than 2 s. In the denominator of our poke-rate estimates, we excluded the interval to the first poke. We also excluded the intervals when the head was in the hopper, because a poke cannot be made while the head is in the hopper. CSs with no pokes were also excluded, because there was no way to estimate their first-poke latency. Thus, our estimates of the poke rates included only CSs in which there were pokes, and the counts in the numerators of those estimates included only the pokes made after the first poke in each CS.
The denominator was the cumulative duration of the CSs that had at least one poke, minus the cumulative latencies to the first pokes, minus the cumulative duration of the hopper entries. The poke rates during the Pre periods were estimated as the cumulative number of pokes in those intervals divided by their cumulative duration. The contextual poke rate was estimated as a weighted average of the Pre and CS poke-rate estimates, with weights equal to the cumulative ITI duration and the cumulative CS duration, respectively, each divided by the cumulative training duration.
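A minimal Python sketch of these estimates follows. The `CSRecord` layout and the function names are hypothetical, and the cumulative training duration is assumed to equal cumulative ITI time plus cumulative CS time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CSRecord:
    """One CS presentation (hypothetical record layout)."""
    duration: float            # CS duration (s)
    n_pokes: int               # pokes made during this CS
    first_poke_latency: float  # latency to the first poke (s); unused if n_pokes == 0
    head_in_time: float        # time with the head in the hopper during this CS (s)


def cs_poke_rate(records: List[CSRecord]) -> Optional[float]:
    """CS poke rate excluding first pokes, first-poke latencies and head-in time."""
    count = 0
    exposure = 0.0
    for r in records:
        if r.n_pokes == 0:
            continue  # no pokes: the first-poke latency cannot be estimated
        count += r.n_pokes - 1  # only pokes after the first one
        exposure += r.duration - r.first_poke_latency - r.head_in_time
    if count == 0 or exposure <= 0:
        return None  # no usable pokes yet: rate treated as undefined
    return count / exposure


def pre_poke_rate(n_pokes: int, duration: float) -> Optional[float]:
    """Pre-period poke rate; undefined (None) until the first Pre poke."""
    return n_pokes / duration if n_pokes > 0 else None


def contextual_poke_rate(pre_rate: Optional[float], cs_rate: Optional[float],
                         iti_time: float, cs_time: float) -> Optional[float]:
    """Weighted average of the Pre and CS rates.

    Weights are cumulative ITI and CS time over cumulative training time,
    assumed here to be iti_time + cs_time.
    """
    if pre_rate is None or cs_rate is None:
        return None
    training_time = iti_time + cs_time
    return pre_rate * iti_time / training_time + cs_rate * cs_time / training_time


records = [
    CSRecord(duration=10.0, n_pokes=5, first_poke_latency=2.0, head_in_time=3.0),
    CSRecord(duration=8.0, n_pokes=0, first_poke_latency=0.0, head_in_time=0.0),
    CSRecord(duration=12.0, n_pokes=3, first_poke_latency=4.0, head_in_time=2.0),
]
cs_rate = cs_poke_rate(records)    # (4 + 2) / (5 + 6) = 6/11 pokes/s
pre_rate = pre_poke_rate(3, 60.0)  # 3 / 60 = 0.05 pokes/s
ctx_rate = contextual_poke_rate(pre_rate, cs_rate, iti_time=270.0, cs_time=30.0)
print(cs_rate, pre_rate, ctx_rate)
```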
The initial rate estimate in a parse extends back to the beginning of observation, which makes it possible to plot, for example, an initial ITI poke rate that starts at the beginning of training even though the first poke during a Pre interval may not have occurred until the 5th ITI. The parsing gives an alternative way of comparing rates of poking as of each successive reinforcement for the first few reinforcements. This becomes important when informativeness is high because then conditioned poking appears very early in training, as early as the second CS presentation.
Given the novelty of the methods used to estimate the onset of conditioned poking, we thought it essential to provide plots of the results on which these estimates are based. The Supplementary Materials give one four-panel figure for each subject; Figures 12 and 13 are examples. The bottom row of plots in Figures 12 and 13 shows the functions plotted over the entire 420 reinforcements (hence CS trials). In the top row, these complete plots have been right-cropped to better show what happens early in training.
In the left column of Figures 12 and 13, three cumulative poke-rate functions are plotted in black against the left axes. The poke rate as of a given reinforcement is the cumulative number of pokes as of that reinforcement divided by the cumulative duration of the relevant interval; in other words, it is the average rate as of that reinforcement. The CS average rate is plotted as a solid line, the Pre average rate as a dashed line, and the contextual average rate as a dotted line.
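The cumulative averages described above can be computed as in this minimal Python sketch; the per-trial input layout and the function name are assumptions for illustration.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def cumulative_average_rates(counts: Sequence[int],
                             durations: Sequence[float]) -> List[Optional[float]]:
    """Average rate as of each reinforcement.

    counts[i] and durations[i] are the pokes and the time (s) in the relevant
    interval (CS, Pre, or whole context) on trial i. The rate is None until the
    first poke, mirroring the undefined early values described in the text.
    """
    rates = []
    for total_count, total_time in zip(accumulate(counts), accumulate(durations)):
        rates.append(total_count / total_time if total_count > 0 else None)
    return rates


# Hypothetical trials: no pokes on the first two, then 2 and 3 pokes.
print(cumulative_average_rates([0, 0, 2, 3], [10.0, 10.0, 10.0, 10.0]))
```

Because each value pools everything observed so far, a change in the underlying rate shows up only gradually in these curves; this is why the text reads a rising or falling cumulative average as evidence of an earlier change in the rate itself.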
In the left column of Figure 12, all three averages rise rapidly over the first 40 trials and then decline. The Pre average (dashed line) is generally greater than the CS average (solid line) until about Trial 110, after which it is consistently lower. Both averages decline steadily up to about Trial 300, implying a persistent drop in both rates following their peak at around Trial 40. At about Trial 300, the average CS poke rate begins to rise rapidly, implying that the CS poke rate increased at the start of the rise. The slight rise in the dashed curve indicates a slight increase in the Pre poke rate as well. The successive changes in the poke rates inferred from these plots of the cumulative rate estimates are confirmed by the parses plotted in the right column of Figure 12, where the solid black line plots the parse of the CS poke rate and the dashed black line the parse of the Pre rate. The Pre poke rate started lower than the CS rate but jumped to a higher value after about 10 reinforcements. At about Trial 40, the two rates became equal, but at about Trial 85, the Pre poke rate dropped permanently below the CS rate. At about Trial 300, there is a sequence of increases in the CS poke rate, accompanied by a single smaller increase in the Pre poke rate. The decision criterion used in this and all the plotted parses was 6 nats, so one can have substantial confidence that these changes are statistically significant.
The signed nDKL function is plotted in red against the right axis in the left columns of Figures 12 and 13. The nDKL itself is never negative, because a divergence cannot be negative. The direction of a divergence may, however, vary: the Pre rate may be greater or less than the CS rate. To make the direction of divergence apparent, we give the nDKL a positive sign when the CS rate is greater than the Pre rate and a negative sign when the reverse is true. In the left column of Figure 12, there is a short initial positive spike in the signed nDKL, followed by an interval of several tens of reinforcements in which it is negative. After about 40 reinforcements, it reaches its minimum and begins a more or less steady climb. Its last upward crossing of the 0 line is at about Trial 110. The upper limit on the nDKL axis (right axis) is set at 6 nats, because the evidence for acquisition is decisive when that value is exceeded. (When nDKL = 6, the odds against the null hypothesis are roughly 2000:1.)
The vertical red lines on the four plots are drawn at the values for various estimates of reinforcements to acquisition. The solid heavy red line is drawn at our earliest estimate of the onset of CS-conditional responding. In Figure 12, this is at the minimum of the signed nDKL, which is more often than not what we take to be the best estimate of where conditioned poking to the CS first appeared. The thinner solid red line is for a generally more conservative estimate of first appearance, namely, the reinforcement after which the signed nDKL becomes permanently positive (last upward 0 crossing). The five vertical dashed red lines are for increasingly stringent evidentiary criteria: odds against the null of 4:1, 10:1, 20:1 (p = .05), 200:1 (p = .005) and 2000:1 (p = .0005).
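The sign convention and the onset estimates marked by these red lines can be sketched in Python as below. This is our illustration, not the authors' code: the function names are hypothetical, trials are numbered from 1, and for the evidentiary criteria we assume the first trial at which the signed nDKL reaches the threshold, since the text does not say whether the threshold must subsequently be maintained. Thresholds in nats are derived from odds via p = erfc(√x), the upper tail of gamma(0.5, 1).

```python
import math
from typing import Dict, Optional, Sequence


def signed_ndkl(ndkl: float, cs_rate: float, pre_rate: float) -> float:
    """Attach a sign: positive when the CS rate exceeds the Pre rate."""
    return ndkl if cs_rate > pre_rate else -ndkl


def nats_for_odds(odds: float) -> float:
    """nDKL value at which the odds (1 - p)/p against the null reach `odds`."""
    target_p = 1.0 / (1.0 + odds)
    lo, hi = 0.0, 50.0
    for _ in range(100):  # bisection on p(x) = erfc(sqrt(x)), which decreases in x
        mid = 0.5 * (lo + hi)
        if math.erfc(math.sqrt(mid)) > target_p:
            lo = mid  # evidence still too weak: threshold lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def onset_estimates(signed: Sequence[Optional[float]],
                    odds_levels=(4, 10, 20, 200, 2000)) -> Dict[str, object]:
    """Onset estimates from a per-reinforcement signed nDKL series.

    None marks trials on which the nDKL is undefined.
    """
    defined = [(i + 1, s) for i, s in enumerate(signed) if s is not None]
    if not defined:
        return {"minimum": None, "last_upward_zero_crossing": None, "criteria": {}}
    minimum_trial = min(defined, key=lambda pair: pair[1])[0]

    # Last upward crossing of 0, kept only if the series ends positive.
    last_up, previous = None, None
    for trial, s in defined:
        if s > 0 and (previous is None or previous <= 0):
            last_up = trial
        previous = s
    if defined[-1][1] <= 0:
        last_up = None

    # First trial at which each evidentiary threshold is reached.
    criteria = {}
    for odds in odds_levels:
        threshold = nats_for_odds(odds)
        criteria[odds] = next((t for t, s in defined if s >= threshold), None)
    return {"minimum": minimum_trial, "last_upward_zero_crossing": last_up,
            "criteria": criteria}


series = [None, 0.5, -1.0, -3.0, -0.5, 1.0, 4.0, 8.0]
print(onset_estimates(series))
```

With this hypothetical series, the minimum falls at trial 4, the last upward 0 crossing at trial 6, and the 4:1 and 2000:1 criteria at trials 6 and 8. The 2000:1 threshold comes out slightly above 6 nats, consistent with the axis limit described above.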
The subject whose results are plotted in Figure 13 made no pokes during the first 2 CSs, one during the 3rd CS, none during the 4th, one during the 5th and 16 during the 6th. It made no pokes during the first five 30-s Pre intervals. Consequently, the average ITI poke rate, the average contextual poke rate and the nDKL are all undefined over the first 5 trials. On Trial 6, all 3 statistics become defined, and the nDKL is already off scale, which is why it does not appear in Figures 13a and 13c. The nDKL on Trial 6 is 20.7 nats, which corresponds to odds of 8 billion to 1 against the null. Consequently, the minimum in the signed nDKL, the point where it became permanently positive, and the five increasingly stringent evidentiary criteria for acquisition all fell at Trial 6, as indicated by the superposed red verticals at the right edge of Figure 13a. One might conclude that CS-conditional responding in this subject did not appear until Trial 6.
However, as already remarked, the first segment of a rate parse always extends back to the onset of observation. The parses of the CS and ITI poke rates, plotted in the right column of Figure 13, find no evidence for a change in either poke rate at Trial 6. Indeed, the parse of the CS poke rate shows no change over all 126 CSs, while the parse of the ITI poke rate finds the first change at Trial 13.
The failure to find changes in the poke rates at Trial 6 is not a consequence of the high value of the decision criterion. Lowering it from 6 nats to 4 did not change the parse; lowering it to 2 nats produced a parse with a single change, a drop from 52 pokes/min to 38 pokes/min at Trial 40. Likewise, lowering the decision criterion from 6 nats to 2 nats did not alter the parse of the ITI poke rate. Nor does this failure occur because the algorithm cannot find very short segments. Parses of the contextual poke rate at all three values of the decision criterion find the same 1-trial-long segment at Trial 13 (see the upward blip in the dashed plot in Figure 13d). In other subjects, upward or downward blips 1 or 2 trials long are sometimes found at the outset of training.
There are significantly fewer pokes during the first 5 CSs than expected given the initial estimate of the CS poke rate. Much of this is attributable to the initially slow reaction to CS onset. The poke on Trial 3 came 20.3 s into the 23-s CS; the poke on Trial 5 came 14.45 s into the 16-s CS. The latency to the first poke dropped rapidly over the first 9 trials, from a mean greater than 11 s for the first 5 trials to a mean of 3.6 s for trials beyond 10. During the first 5 trials, the observation intervals in which a poke could be registered and entered into the parsing algorithm totaled only 4.18 s. Given an initial poke-rate estimate of 0.7/s, the expected number of pokes in that interval is 2.9, and there is a 5% chance of observing no pokes.
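The last two figures follow from treating pokes as a Poisson process at the initial estimated rate, the counting-process counterpart of the exponential inter-poke intervals assumed throughout. A minimal Python check, using only the values quoted above:

```python
import math

# Values quoted in the text for the subject in Figure 13.
initial_rate = 0.7  # initial parsed CS poke-rate estimate (pokes/s)
observable = 4.18   # usable CS observation time over the first 5 trials (s)

expected = initial_rate * observable  # expected number of pokes
p_no_pokes = math.exp(-expected)      # Poisson probability of observing zero pokes
print(f"expected = {expected:.2f} pokes, P(0 pokes) = {p_no_pokes:.3f}")
```

This gives about 2.93 expected pokes and a probability of roughly .054 of observing none, in line with the 2.9 and 5% quoted.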
All considered, an argument can be made, in this and several other cases in which informativeness was high and the ITI poke rate very low, that the parse results are a better indicator of the onset of CS-conditional poking. Blips that momentarily put the ITI poke rate above the CS poke rate (see the dashed plot in Figure 13d) are common in post-acquisition protocols. Therefore, our parse-based estimate of the onset of conditioned poking is the trial after which the parsed CS poke rate is greater than the parsed ITI poke rate on 95% of the trials.
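One reading of this rule can be sketched in Python, under the assumption that "on 95% of the trials" refers to all trials after the candidate onset. The function name and the per-trial input layout are hypothetical.

```python
from typing import Optional, Sequence


def parse_based_onset(cs_rates: Sequence[float], iti_rates: Sequence[float],
                      proportion: float = 0.95) -> Optional[int]:
    """Earliest trial t such that, on trials t+1..N, the parsed CS rate exceeds
    the parsed ITI rate on at least `proportion` of the trials.

    cs_rates[i] and iti_rates[i] are the parsed rates on trial i + 1.
    Returns 0 if the criterion holds from the first trial, None if never.
    """
    n = len(cs_rates)
    for t in range(n):
        later = zip(cs_rates[t:], iti_rates[t:])
        wins = sum(cs > iti for cs, iti in later)
        if wins >= proportion * (n - t):
            return t
    return None


# Hypothetical parses: ITI rate above the CS rate for 30 trials, then below,
# apart from a one-trial upward blip in the ITI rate on trial 51.
cs = [1.0] * 30 + [5.0] * 70
iti = [2.0] * 30 + [1.0] * 70
iti[50] = 6.0
print(parse_based_onset(cs, iti))
```

Under this reading the single ITI blip does not delay the estimate, which is the point of the 95% tolerance. A side effect is that the estimate (trial 28 here) can fall slightly before the last trial on which the ITI rate exceeded the CS rate.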
Columns 4, 5 and 6 of the Acquisition Table in the Supplementary Materials give the trial after which conditioned responding appeared, as estimated in the three ways described above: by the location of the minimum in the signed nDKL, by the last upward 0 crossing, and by the point after which the CS parse is consistently greater than the ITI parse, respectively. Column 3 of that table gives the minimum of the three estimates. Columns 7–11 give the trial after which the evidence for acquisition exceeded increasingly stringent criteria (odds of 4, 10, 20, 100 and 1000:1). Four-panel figures equivalent to Figures 12 and 13 are included for each subject on pages 2 to 169 of the Supplementary Materials. The different estimates may coincide: in 81 subjects, the minimum of the nDKL coincides with the earliest estimate; in 69 subjects the parse-based estimate does; and in 40 subjects the last upward 0 crossing does.
References
- 1. Pavlovian contingencies and temporal information. Journal of Experimental Psychology: Animal Behavior Processes 32:284–294.
- 2. Temporal maps and informativeness in associative learning. Trends in Neurosciences 32:73–78.
- 3. Stimulus preference and the transitivity of preference. Animal Learning and Behavior 20:401–406.
- 4. Importance of trials versus accumulating time across trials in partially reinforced appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes 29:62–77.
- 5. Few-shot learning: temporal scaling in behavioral and dopaminergic learning. bioRxiv. https://doi.org/10.1101/2023.03.31.535173
- 6. Information theory. New York: Wiley.
- 7. Contingency: Its meaning in the experimental analysis of behavior. European Journal of Behavior Analysis 7:111–114.
- 8. Temporal control of conditioned responding in goldfish. Journal of Experimental Psychology: Animal Behavior Processes 31:31–39.
- 9. The organization of learning. Cambridge, MA: Bradford Books/MIT Press.
- 10. Conditioning from an information processing perspective. Behavioural Processes 62:89–101.
- 11. Robert Rescorla: Time, information and contingency. Revista de Historia de la Psicología 42:7–21.
- 12. Temporal contingency. Behavioural Processes 101:89–96. https://doi.org/10.1016/j.beproc.2013.08.012
- 13. The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America 101:13124–13131.
- 14. Time, rate, and conditioning. Psychological Review 107:289–344.
- 15. Bringing Bayes and Shannon to the study of behavioral and neurobiological timing. Timing & Time Perception 11:29–89. https://doi.org/10.1163/22134468-bja10069
- 16. Time-scale invariant contingency in reinforcement learning with extremely long delays to reinforcement.
- 17. Scalar expectancy theory and Weber's law in animal timing. Psychological Review 84:279–325. https://doi.org/10.1037/0033-295X.84.3.279
- 18. Scalar expectancy theory and Weber's law in animal timing. Psychological Review 84:279–325.
- 19. Trial and intertrial durations in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes 3:264–284.
- 20. Spreading association in time. Autoshaping and conditioning theory. New York: Academic Press, 219–253.
- 21. Contingency spaces and measures in classical and instrumental conditioning. Journal of the Experimental Analysis of Behavior 21:585–605. https://doi.org/10.1901/jeab.1974.21-585
- 22. Partial reinforcement in autoshaping with pigeons. Animal Learning and Behavior 8:45–59.
- 23. Acquisition with partial and continuous reinforcement in pigeon autoshaping. Learning & Behavior 32:321–334.
- 24. Acquisition with partial and continuous reinforcement in rat magazine approach. Journal of Experimental Psychology: Animal Behavior Processes 31:319–333.
- 25. Is the number of trials a primary determinant of conditioned responding? Journal of Experimental Psychology: Animal Behavior Processes 34:185–201.
- 26. The computation of contingency in classical conditioning. The psychology of learning and motivation (Vol. 20). New York: Academic Press, 137–192.
- 27. Exploring the edges of Pavlovian contingency space: An assessment of contingency theory and its various metrics. Learning and Motivation 23:225–249.
- 28. Probabilistic contingency theories of animal conditioning: A critical analysis. Learning and Motivation 14:527–550. https://doi.org/10.1016/0023-9690(83)90031-0
- 29. The learning curve, revisited. Journal of Experimental Psychology: Animal Learning and Cognition 48:265–280.
- 30. Response rate and reinforcement rate in Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes 37:375–384.
- 31. Response rates track the history of reinforcement times. Journal of Experimental Psychology: Animal Behavior Processes 37:277–286.
- 32. Trial and intertrial durations in appetitive conditioning in rats. Animal Learning and Behavior 28:121–135.
- 33. A model for Pavlovian learning and performance with reciprocal associations. Psychological Review 127:829–852.
- 34. Why autoshaping depends on trial spacing. Autoshaping and conditioning theory. New York: Academic Press, 255–284.
- 35. "Attention-like" processes in classical conditioning. Miami symposium on the prediction of behavior: Aversive stimulation. Miami: Miami University Press, 9–31.
- 36. Theoretical implications of quantitative properties of interval timing and probability estimation in mouse and rat. Journal of the Experimental Analysis of Behavior 108:39–72. https://doi.org/10.1002/jeab.261
- 37. Mice take calculated risks. Proceedings of the National Academy of Sciences of the United States of America 109:8776–8779. https://doi.org/10.1073/pnas.1205131109
- 38. Theory of reinforcement schedules. Journal of the Experimental Analysis of Behavior 120:289–319. https://doi.org/10.1002/jeab.880
- 39. A clock not wound runs down. Behavioural Processes 45:129–139.
- 40. Arousal: Its genesis and manifestation as response rate. Psychological Review 85:571–581. https://doi.org/10.1037/0033-295X.85.6.571
- 41. Independent effects of stimulus and cycle duration on conditioning: The role of timing processes. Animal Learning and Behavior 28:373–388.
- 42. Trial and intertrial durations in Pavlovian conditioning: Issues of learning and performance. Journal of Experimental Psychology: Animal Behavior Processes 25.
- 43. Evaluating the TD model of classical conditioning. Learning & Behavior 40:305–319.
- 44. A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review 87:532–552.
- 45. Pavlovian conditioning and its proper control procedures. Psychological Review 74:71–80.
- 46. Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology 66:1–5.
- 47. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts, 64–99.
- 48. Nutrition. The laboratory rat. New York: Academic Press, 123–152.
- 49. A tutorial on Thompson sampling. Foundations and Trends in Machine Learning 11:1–96.
- 50. Time-derivative models of Pavlovian reinforcement. Learning and computational neuroscience: Foundations of adaptive networks. Cambridge, MA: Bradford Books/MIT Press, 497–537.
- 51. Effects of conditioned stimulus (CS) duration, intertrial interval, and I/T ratio on appetitive Pavlovian conditioning. Journal of Experimental Psychology: Animal Learning and Cognition 46:243–255.
- 52. The development and present status of the SOP model of associative learning. Quarterly Journal of Experimental Psychology 72:346–374. https://doi.org/10.1177/1747021818777074
- 53. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review 14:779–804.
- 54. Conditioned stimulus informativeness governs conditioned stimulus-unconditioned stimulus associability. Journal of Experimental Psychology: Animal Behavior Processes 38:217–232.
Article and author information
Copyright
© 2024, Justin A Harris & CR Gallistel
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
Views, downloads and citations are aggregated across all versions of this paper published by eLife.