Peer review process
Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.
Editors
- Reviewing Editor: Nathan Holmes, UNSW Sydney, Sydney, Australia
- Senior Editor: Kate Wassum, University of California, Los Angeles, Los Angeles, United States of America
Reviewer #1 (Public review):
Summary:
This manuscript by Harris and Gallistel investigates how the rate of learning and strength of conditioned behavior post learning depend on the various temporal parameters of Pavlovian conditioning. They replicate results from Gibbon and Balsam (1981) in rats to show that the rate of learning is proportional to the ratio between the cycle duration and the cue duration. They further show that the strength of conditioned behavior post learning is proportional to the cue duration, and not the above ratio. The overall findings here are interesting, provide context to many conflicting recent results on this topic, and are supported by reasonably strong evidence. Nevertheless, there are some major weaknesses in the evidence presented for some of the stronger claims in the manuscript.
Strengths:
This manuscript has many strengths including a rigorous experimental design, several different approaches to data analysis, careful consideration of prior literature, and a thorough introduction and discussion. The central claim - that animals track the rates of events in their environment, and that the ratio of two rates determines the rate of learning - is supported with solid evidence.
Weaknesses:
Despite the above major strengths, some key aspects of the paper need major improvement. These are listed below.
(1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981) and the rats in this study. I think the evidence for this claim is weak as presented here. First, the exact measure used for identifying trials to criterion makes a big difference in Fig 3. As best I understand, the authors do not make any claims about which of these approaches is the "best" way. Second, the measure used for identifying trials to criterion in Fig 1 appears different from any of the criteria used in Fig 3. If so, to make the claim that the quantitative relationship is one and the same in both datasets, the authors need to use the same measure of learning rate on both datasets and show that the resultant plots are statistically indistinguishable. Currently, the authors simply plot the dots from the current dataset on the plot in Fig 1 and ask the readers to notice the visual similarity. This is not at all enough to claim that both relationships are the same. In addition to the dependence of the numbers on the exact measure of learning rate used, the plots use log-log axes, where slight visual changes can mean a big difference in actual numbers. For instance, between Figs 3B and C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5. The authors need to perform much more rigorous quantification to make the strong claim that the quantitative relationships obtained here and in Gibbon and Balsam (1981) are identical.
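To make the log-axis caveat concrete, a quick stdlib Python check (illustrative only, not tied to the paper's data) shows how small a factor-of-5 difference looks in log10 units:

```python
import math

# On a log10 axis, a vertical displacement of d corresponds to a
# multiplicative change of 10**d in the plotted value, so visually
# "slight" shifts can hide large differences.
factor = 5                  # the factor-of-5 difference noted above
shift = math.log10(factor)  # vertical distance on a log10 axis
print(round(shift, 3))      # ~0.699 log units, i.e. a modest visual move
```

On axes spanning several decades, 0.7 log units is easy to read as a minor shift, which is why numerical rather than visual comparison is being requested.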
(2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, I am confused why the data used to test the ITI proportionality come from the last 5 sessions. Specifically, if the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before the cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and thus, produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during ITI is zero. Why wouldn't it be better to test the plotted relationship for ITI before cue learning has occurred? Further, based on Fig 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ pre and post cue learning? Finally, if the authors' conceptual framework predicts that ITI response rate after cue learning should be proportional to contextual reward rate, why should the cue response rate be proportional to cue reward rate instead of cue reward rate plus contextual reward rate?
(3) I think there is a major conceptual disconnect between the gradual nature of learning shown in Figs 7 and 8 and the information-theoretic model proposed by the authors. To the extent that I understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the rule that response rate is proportional to reward rate, why does that proportionality change as animals go from 10% to 90% of peak response? I think the manuscript would be much strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, please do explicitly state this in the manuscript.
(4) I find the idea stated in the Conclusion section - that any model considering probability of reinforcement cannot be correct because probability has no temporal units - to be weak. I think the authors might mean that existing probability-based models do not work, not that no possible model can work. For any point process, the standard mathematical treatment of continuous time is to compute the expected count of events as p*dt, where p is the probability of occurrence of the event in that time bin and dt is an infinitesimal time bin. There is obviously a one-to-one mapping between the probability of an event in a point process and its rate. Existing models use an arbitrary time bin/trial, and thus I get the authors' argument in the discussion. However, I think their conclusion is overstated.
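The probability-rate equivalence invoked here can be illustrated with a minimal simulation (illustrative values only, not from the manuscript): drawing Bernoulli events with per-bin probability p = rate * dt recovers the underlying rate.

```python
import random

# Sketch: in a point process, the expected event count in a small bin
# of width dt is p = rate * dt. Simulating Bernoulli draws at that p
# and counting events recovers the rate, illustrating the one-to-one
# probability <-> rate mapping described above.
random.seed(1)
rate = 2.0        # events per second (the quantity with temporal units)
dt = 0.001        # small time bin, in seconds
duration = 500.0  # total simulated time, in seconds
p = rate * dt     # per-bin event probability (dimensionless)
count = sum(random.random() < p for _ in range(int(duration / dt)))
empirical_rate = count / duration
print(round(empirical_rate, 2))  # close to 2.0
```

The dimensionless probability p only becomes a rate once a bin width dt is fixed, which is precisely the arbitrary-time-bin issue the review raises.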
(5) The discussion states that the mutual information defined in equation 1 does not change during partial reinforcement. I am confused by this. The mean delay between reinforcements increases in inverse proportion to the probability of reinforcement, but doesn't the mean delay between cue and next reinforcement increase by more than this amount (next reinforcement is greater than or equal to the cue-to-cue interval away from the cue for many trials)? Why is this ratio invariant to partial reinforcement?
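For concreteness, the standard rate-estimation argument for invariance, which may or may not be what the authors intend by their equation 1, can be sketched with placeholder values: partial reinforcement scales both the context time per reinforcer and the cue time per reinforcer by the same factor.

```python
# Hypothetical sketch, not the authors' derivation: with reinforcement
# probability prob, each reinforcer is preceded on average by 1/prob
# cycles, so context time per reinforcer (C/prob) and CS time per
# reinforcer (T/prob) both scale by 1/prob, leaving the ratio intact.
C, T = 100.0, 10.0  # placeholder cycle and CS durations (seconds)
ratios = []
for prob in (1.0, 0.5, 0.1):
    context_per_us = C / prob
    cs_per_us = T / prob
    ratios.append(context_per_us / cs_per_us)
    print(prob, ratios[-1])  # ratio stays C/T = 10.0 at every prob
```

Whether this per-reinforcer bookkeeping matches the mean-delay quantities the reviewer describes is exactly the clarification being requested.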
Comments on revisions:
Update following revision
(1) This point is discussed in more detail in the attached file, but there are some important details regarding the identification of the learned trial that require more clarification. For instance, isn't the original criterion by Gibbon et al. (1977) the first "sequence of three out of four trials in a row with at least one response"? The authors' provided code for the Wilcoxon signed rank test and nDkl thresholds looks for a permanent exceeding of the threshold. So, I am not yet convinced that the approaches used here and in prior papers are directly comparable. Also, there's still no regression line fitted to their data (Fig 3's black line is from Fig 1, according to the legends). Accordingly, I think the claim in the second paragraph of the Discussion that the old data and their data are explained by a model with "essentially the same parameter value" is not yet convincing without actually reporting the parameters of the regression. Related to this, the regression for their data based on my analysis appears to have a slope closer to -0.6, which does not support strict timescale invariance. I think that this point should be discussed as a caveat in the manuscript.
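A sketch of the kind of regression report being requested, using placeholder noiseless values rather than the paper's data, shows how the log-log slope would be estimated and compared against the timescale-invariant prediction of -1:

```python
import math

# Illustrative only: if trials-to-acquisition N follows N = k*(C/T)**b,
# regressing log N on log(C/T) recovers the exponent b. Strict
# timescale invariance corresponds to b = -1; the review's reanalysis
# suggests a value nearer -0.6. Placeholder values below, not the data.
ct = [1.5, 3.0, 6.0, 12.0, 24.0, 48.0]   # hypothetical C/T ratios
k, b = 300.0, -1.0                        # generate under b = -1
n = [k * r ** b for r in ct]
xs = [math.log10(r) for r in ct]
ys = [math.log10(v) for v in n]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
print(round(slope, 3))  # recovers -1.0 on this noiseless placeholder
```

Reporting the fitted slope and its confidence interval in this form would let readers judge directly how far the data sit from -1.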
(2) The authors report in the response that the basis for the apparent gradual/multiple step-like increases after initial learning remains unclear within their framework. This would be important to point out in the actual manuscript. Further, the responses indicate that there are some phenomena not captured by the current model; this too would be important to state in the manuscript itself.
(3) There are several mismatches between results shown in figures and those produced by the authors' code, or other supplementary files. As one example, rat 3 results in Fig 11 and Supplementary Materials don't match and neither version is reproduced by the authors' code. There are more concerns like this, which are detailed in the attached review file.
Reviewer #2 (Public review):
A long-standing debate in the field of Pavlovian learning relates to the phenomenon of timescale invariance in learning i.e. that the rate at which an animal learns about a Pavlovian CS is driven by the relative rate of reinforcement of the cue (CS) to the background rate of reinforcement. In practice, if a CS is reinforced on every trial, then the rate of acquisition is determined by the relative duration of the CS (T) and the ITI (C = inter-US-interval = duration of CS + ITI), specifically the ratio of C/T. Therefore, the point of acquisition should be the same with a 10s CS and a 90s ITI (T = 10; C = 90 + 10 = 100, C/T = 100/10 = 10) and with a 100s CS and a 900s ITI (T = 100; C = 900 + 100 = 1000, C/T = 1000/100 = 10). That is to say, the rate of acquisition is invariant to the absolute timescale as long as this ratio is the same. This idea has many other consequences, but is also notably different from more popular prediction-error based associative learning models such as the Rescorla-Wagner model. The initial demonstrations that the ratio C/T predicts the point of acquisition across a wide range of parameters (both within and across multiple studies) were conducted in pigeons using a Pavlovian autoshaping procedure. What has remained under contention is whether or not this relationship holds across species, particularly in the standard appetitive Pavlovian conditioning paradigms used in rodents. The results from rodent studies aimed at testing this have been mixed, and often the debate around the source of these inconsistent results focuses on the different statistical methods used to identify the point of acquisition for the highly variable trial-by-trial responses at the level of individual animals.
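The C/T arithmetic in the worked example above can be captured in a few lines (the helper name is hypothetical, not from the manuscript):

```python
def informativeness(cs_duration, iti):
    """C/T ratio: cycle duration (CS + ITI) divided by CS duration."""
    return (cs_duration + iti) / cs_duration

# The two cases from the paragraph above: same C/T, hence the same
# predicted point of acquisition despite a 10x change in timescale.
print(informativeness(10, 90))    # 10.0
print(informativeness(100, 900))  # 10.0
```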
The authors successfully replicate the same effect found in pigeon autoshaping paradigms decades ago (with almost identical model parameters) in a standard Pavlovian appetitive paradigm in rats. They achieve this through a clever change to the experimental design, using a convincingly wide range of parameters across 14 groups of rats, and through a thorough and meticulous analysis of these data. It is also interesting to note that the two authors have published on opposing sides of this debate for many years, and as a result have developed and refined many of the ideas in this manuscript through this process.
Main findings
(1) The present findings demonstrate that the point of initial acquisition of responding is predicted by the C/T ratio.
(2) The terminal rates of responding to the CS appear to be related to the reinforcement rate of the CS (T; specifically, 1/T) but not to its relation to the reinforcement rate of the context (i.e. C or C/T). In the present experiment, all CS trials were reinforced, so it is also the case that the terminal rate of responding was related to the duration of the CS.
(3) An unexpected finding was that responding during the ITI was similarly related to the rate of contextual reinforcement (1/C). This novel finding suggests that the terminal rate of responding during the ITI and the CS are related to their corresponding rates of reinforcement. This finding is surprising as it suggests that responding during the ITI is not being driven by the probability of reinforcement during the ITI.
(4) Finally, the authors characterised the nature of increased responding from the point of initial acquisition until responding peaks at a maximum. Their analyses suggest that the nature of this increase was best described as linear in the majority of rats, as opposed to the non-linear increase that might be predicted by prediction-error learning models (e.g. Rescorla-Wagner). However, more detailed analyses revealed that these changes can be quite variable across rats, and more variable when the CS had lower informativeness (defined as C/T).
Strengths and Weaknesses:
There is an inherent paradox regarding the consistency of the acquisition data from Gibbon & Balsam's (1981) meta-analysis of autoshaping in pigeons, and the present results in magazine response frequency in rats. This consistency is remarkable and impressive, and is suggestive of a relatively conserved or similar underlying learning principle. However, the consistency is also surprising given some significant differences in how these experiments were run. Some of these differences might reasonably be expected to lead to differences in how these different species respond. For example:
- In the autoshaping procedures that generated the pigeon data, the birds were pretrained to retrieve rewards from a grain hopper, with an instrumental contingency between head entry into the hopper and grain availability. During Pavlovian training, pecking the key light also elicited an auditory click feedback stimulus, and when the grain hopper was made available the hopper was also illuminated.
- In the present experimental procedure, the rats were not given contextual exposure to the pellet reinforcers in the magazine (e.g. a magazine training session is typically found in similar rodent procedures). The Pavlovian CS was a cue light within the magazine itself.
These design features in the present rodent experiment are clearly intentional. Pretraining with the reinforcer in the testing chambers would reasonably alter the background rate of reinforcement parameter, so it makes sense not to include this, but it differs from the paradigm used in pigeons. Having the CS inside the magazine where pellets are delivered provides an effective way to reduce any potential response competition between CS- and US-directed responding and combines these all into the same physical response. This makes the magazine approach response more like the pecking of the light stimulus in the pigeon autoshaping paradigm. However, the location of the CS and US is separated in pigeon autoshaping, raising questions about why the findings across species are consistent despite these differences.
Intriguingly, when the insertion of a lever is used as a Pavlovian cue in rodent studies, CS-directed responding (sign-tracking) often develops over training such that eventually all animals bias their responding towards the lever rather than towards the US (goal-tracking at the magazine). However, the nature of this shift highlights the important point that these CS- and US-directed responses can be quite distinct physically as well as psychologically. Therefore, by conflating the development of these different forms of responding, it is not clear whether the relationship between C/T and the acquisition of responding describes the sum of all Pavlovian responding or predominantly CS- or US-directed responding.
Another interesting aspect of these findings is that there is a large amount of variability that scales inversely with C/T. A potential account of the source of this variability is related to the absence of preexposure to the reward pellets. This is normally done within the animals' homecage as a form of preexposure to reduce neophobia. If some rats take longer to notice and then approach and finally consume the reward pellets in the magazine, the impact of this would systematically differ depending on the length of the ITI. For animals presented with relatively short CSs and ITIs, they may essentially miss the first couple of trials and/or attribute uneaten pellets accumulating in the magazine to the background/contextual rate of reinforcement. What is not currently clear is whether this was accounted for in some way by confirming when the rats first started retrieving and consuming the rewards from the magazine.
While the generality of these findings across species is impressive, the very specific set of parameters employed to generate these data raises questions about their generality across other standard Pavlovian conditioning parameters. While this is obviously beyond the scope of the present experiment, it is important to consider that the present study explored a situation with 100% reinforcement on every trial, with a single, relatively brief, variable-duration CS (drawn from a uniform distribution; maximum of 122s) and a single US. Again, the choice of these parameters in the present experiment is appropriate and very deliberately based on refinements from many previous studies from the authors. This includes a number of criteria used to define magazine response frequency, including discarding specific responses (discussed and reasonably justified clearly in the methods section). Similarly, the finding that terminal rates of responding are reliably related to 1/T is surprising, and it is not clear whether this might be a property specific to this form of variable-duration CS, the use of a uniform sampling distribution, or the use of only a single CS. However, it is important to keep these limitations in mind when considering some of the claims made in the discussion section of this manuscript that go beyond what these data can support.
The main finding demonstrating the consistent findings across species is presented in Figure 3. In the analysis of these data, it is not clear why the correlations between C, T, and C/T and the measure of acquisition in Figure 3A were presented as r values, whereas the r2 values were presented in the discussion of Figure 3B, and no values were provided in discussing Figure 3C. The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically based metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not be performing as well on this analysis, and whether the more effective measure is either more reliable or tapping into some aspect of the processes that underlie acquisition that is not accounted for by the nDKL metric. Unfortunately, the new metric is discussed and defined at great length but its utility is not considered.
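Putting r and r2 values on a common scale before comparing fit quality across panels is a one-line conversion; the values below are placeholders, not the paper's statistics:

```python
import math

# Illustrative placeholder values only: comparing an r from one panel
# with an r**2 from another requires converting to a common metric.
r_panel_a = 0.95    # hypothetical r reported for one analysis
r2_panel_b = 0.81   # hypothetical r**2 reported for another
print(r_panel_a ** 2)         # panel A expressed as r**2 (~0.90)
print(math.sqrt(r2_panel_b))  # panel B expressed as |r| (0.9)
```

Reporting both metrics for every panel would make the comparison between the established criterion and the nDKL measure direct.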
An important analysis issue that is unclear in the present manuscript is exactly how the statistics were run (how the model was defined, whether individual subjects or group medians were used, what software was used, etc.). For example, it is not clear whether the analyses conducted in relation to Figure 3 used the data from individual rats or the group medians. Similarly, it appears that each rat contributes four separate data points, and a single regression line was fit to all these data despite the highly likely violation of the assumption of independent observations (or more precisely, the assumption of uncorrelated errors) in this analysis. Furthermore, it is claimed that the same regression line fits the ITI and CS period data in this figure; however, this claim does not appear to have been tested statistically.
If the data in Figure 3 were analyzed with log(ITI) or log(C/ITI), i.e. log(C/(C-T)), would this be a better fit for these data? Is the ratio C/T the best predictor of the trial/point of acquisition, or does another metric related to reinforcement rates provide a better fit?
Based on the variables provided in Supplementary file 3, containing the acquisition data, I was unable to reproduce the values reported in the analysis of Figure 3.
In relation to Figure 3: I am curious about whether the authors would be able to comment on whether the individual variability in trials to acquisition would be expected to scale differently based on C/T, or C, or (if a less restricted range was used) T?
It is not clear why Figure 3C is presented but not analyzed, and why the data presented in Figure 4, to clarify the spread of the distribution observed across the plots in Figure 3, use the data from Figure 3C. This would seem like the least representative data to illustrate the point of Figure 4. It also appears to my eye that the data actually plotted in Figure 4 correspond to Figures 3A and 3B rather than the 10:1 odds data indicated in the text.
What were the decision criteria used to decide on averaging the final 5 conditioning sessions as terminal responding for the analyses in Figure 5? This is an oddly specific number. Was this based on consistency with previous work, or on the greatest number of sessions from which stable data for all animals could be extracted?
In the analysis corresponding to Figures 7-8: If I understand the description of this analysis correctly, for each rat the data are the cumulative response data during the CS, starting from the trial on which responding to the CS > ITI (t = 1), and ending at the trial on which CS responding peaked (maximum over a 3-session moving average window; t = end). This analysis does not seem to account for changes (decline) in the ITI response rates over this period of acquisition, and it is likely that responding during the ITI is still declining after t = 1. Are the 4 functions that were fit to these data to discriminate between different underlying generative processes still appropriate for total CS responding, rather than CS responding corrected for changes in baseline response rates during the ITI?
Page 27, Procedure, final sentence: The magazine responding during the ITI is defined as the 20s period immediately before CS onset. The range of ITI values (Table 1) always starts as low as 15s in all 14 groups. Even in the case of an ITI on a trial that was exactly 20s, this would also mean that the start of this period overlaps with the termination of the CS from the previous trial and delivery (and presumably consumption) of a pellet. Please indicate if the definition of the ITI period was modified on trials where the preceding ITI was <20s, and if any other criteria were used to define the ITI.
Were the rats pre-exposed to the reinforcers/pellets in their home cages prior to acquisition, e.g. as is often done to reduce neophobia? Given the deliberate absence of a magazine-training phase, this information is important when assessing the experienced contingency between the CS and the US.
For all the analyses, please provide the exact models that were fit and the software used. For example, it is not necessarily clear to the reader (particularly in the absence of degrees of freedom) whether the model fits discussed in Figure 3 were fit to the individual subject data points or the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regards to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analysis does not correct for the potentially highly correlated multiple measurements from the same subjects, i.e. each rat provides 4 data points which are very likely not independent observations.
A number of sections of the discussion are speculative or not directly supported by the present experimental data (but may well be supported by previous findings that are not the direct focus of the present experiment). For example, Page 19, Paragraph 2: this entire paragraph is not really clearly explained and is presenting an opinion rather than a strong conclusion that follows directly from the present findings. Evidence for an aspect of RET in the present paper (i.e. the prediction of time scale invariance on the initial point of acquisition, but not necessarily the findings regarding the rate of terminal acquisition) - while supportive - does not necessarily provide unconditional evidence for this theory over all the alternatives.
Similarly, the Conclusion section (Page 23) makes the claim that "the equations have at most one free parameter", which may be an oversimplification that is conditionally true in the narrow context of the present experiment where many things were kept constant between groups and run in a particular way to ensure this is the case. While the equations do well in this narrow case, it is unlikely that additional parameters would not need to be added to account for more general learning situations. To clarify, I am not contending that this kind of statement is necessarily untrue, merely that it is being presented in a narrow context and may require a deeper discussion of much more of the literature to qualify/support properly - and the discussion section of the present experiment/manuscript may not be the appropriate place for this.
- Consider taking advantage of an "Ideas and Speculation" subsection within the Discussion that is supported by eLife [ https://elifesciences.org/inside-elife/e3e52a93/elife-latest-including-ideas-and-speculation-in-elife-papers ]. This might be more appropriate to qualify the tone of much of the discussion from page 19 onwards.
It seems like there are entire analyses and new figures being presented in the discussion e.g. Page 20: Information-Theoretic Contingency. These sections might be better placed in the methods section or a supplementary section/discussion.