Related to Figure 1D. (A) The tetrachoric correlation (r) between RL model-inferred explore-exploit states and HMM-inferred states. (B) Standardized regression coefficients (beta coefficients) of RL model-inferred states and HMM-inferred states in predicting response time. (C) The distribution of times between switch decisions (inter-switch intervals). A single, fixed probability of switching would produce exponentially distributed inter-switch intervals. Orange line, the maximum likelihood fit of a single discrete exponential distribution. Solid blue line, a mixture of two exponential distributions, with each component distribution shown in dotted blue. The two components reflect one fast-switching time constant (average interval, 1.7 trials) and one persistent time constant (average interval, 6.8 trials). The right plot is the same as the left, but with a log scale. Inset, the log likelihood of mixtures with different numbers of exponential distributions. (D) Probability of choice as a function of the value difference between choices for exploratory and exploitative states. (E) Difference in choice response time between explore and exploit choices. (F) The probability of animals switching targets on the next trial, given the current trial’s outcome and latent state. (G) Difference in retrieval time between explore and exploit choices. There is no significant difference in retrieval time between the two latent states, suggesting that exploration was not merely disengagement from the task. * indicates p < 0.05. Graphs depict mean ± SEM across animals.
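The comparison in (C) can be illustrated with a minimal sketch. It is not the analysis code used for this figure. It assumes that the discrete exponential distribution is the geometric distribution over intervals of 1, 2, 3, ... trials, and that the mixture is fit by expectation-maximization. The intervals are simulated, not real data. The equal mixing weights, sample size, random seed, starting values, and all function names are hypothetical. Only the two mean intervals (1.7 and 6.8 trials) are taken from the caption.

```python
import numpy as np


def geometric_pmf(x, p):
    """P(interval = x) for x = 1, 2, 3, ... given per-trial switch probability p."""
    return p * (1.0 - p) ** (x - 1.0)


def fit_single(intervals):
    """Maximum likelihood fit of one geometric distribution (p = 1 / mean interval)."""
    x = np.asarray(intervals, dtype=float)
    p = 1.0 / x.mean()
    return p, np.log(geometric_pmf(x, p)).sum()


def fit_mixture(intervals, n_iter=500):
    """EM fit of a two-component geometric mixture (assumed fitting procedure)."""
    x = np.asarray(intervals, dtype=float)
    p = np.array([0.8, 0.1])  # hypothetical starting values: one fast, one slow
    w = np.array([0.5, 0.5])  # hypothetical starting mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each interval
        lik = w[:, None] * geometric_pmf(x[None, :], p[:, None])
        resp = lik / lik.sum(axis=0, keepdims=True)
        # M-step: weighted versions of the single-component estimates
        w = resp.mean(axis=1)
        p = resp.sum(axis=1) / (resp * x[None, :]).sum(axis=1)
    lik = w[:, None] * geometric_pmf(x[None, :], p[:, None])
    return w, p, np.log(lik.sum(axis=0)).sum()


# Simulated intervals only: half from a fast process (mean 1.7 trials),
# half from a persistent process (mean 6.8 trials). Equal weights are assumed.
rng = np.random.default_rng(0)
intervals = np.concatenate([
    rng.geometric(1.0 / 1.7, size=5000),
    rng.geometric(1.0 / 6.8, size=5000),
])

p_single, ll_single = fit_single(intervals)
w_mix, p_mix, ll_mix = fit_mixture(intervals)
fast_mean, slow_mean = np.sort(1.0 / p_mix)  # mean interval of each component

print(f"single: mean interval {1.0 / p_single:.2f}, log likelihood {ll_single:.1f}")
print(f"mixture: mean intervals {fast_mean:.2f} and {slow_mean:.2f}, "
      f"log likelihood {ll_mix:.1f}")
```

On data generated by two switching regimes, the mixture reaches a higher log likelihood than the single distribution, which is the kind of comparison summarized in the inset. Because adding components cannot lower the maximized likelihood, choosing the number of components from such a comparison would normally also require a penalty for the extra parameters or a held-out test.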