We checked that our results could not be explained by differences in luminosity between the outcomes signalling reward and no reward. The stimuli signalling rewarding and non-rewarding outcomes were counterbalanced across participants: half of the participants received a cheese stimulus as the rewarding outcome and an apple stimulus as the unrewarding outcome, and the other half received the converse. We analysed the pupil data from the behavioural session that each participant completed prior to the fMRI session, and split the participants according to the outcome stimuli they received. We analysed the sixteen participants who entered the pupil-brain analyses (ten received cheese as the reward, and six received apples as the reward). Although noisier due to reduced power, pupil dilations to task factors were qualitatively the same whether the cheese or the apple was the rewarding outcome. We demonstrate this with two analyses. (a) First, we analysed baseline pupil size on trials around transitions from exploitation to exploration. Mean baseline pupil size (again expressed as a percentage of the mean of that session) is presented as a function of trial position relative to a transition from exploitation to exploration. A marked one-trial increase in pupil size was observed as participants transitioned from exploitation to exploration (left panel), as has been previously observed (Jepma and Nieuwenhuis, 2011). Although noisier due to reduced power, this result held when the data were split according to outcome identity (cheese or apples; middle and right panels, respectively). Error bars are SEM. (b) Second, we constructed a GLM with the regressors reward, changing response on the next trial, and switching into exploration, as well as a main effect regressor, and performed a timeseries analysis with this GLM on all data and on the data split according to outcome identity.
Mean beta weights for the effect of each regressor in the GLM on pupil size are plotted as a function of time relative to outcome delivery. Before the regression, the data on each trial were normalised by subtracting the mean of the one second preceding outcome delivery. A constriction following reward delivery, a small dilation on trials on which participants changed their choice on the subsequent trial, and a large dilation when switching from exploitation into exploration were observed. Again, the results are qualitatively similar regardless of outcome identity (middle and right panels). Shaded regions are SEM.
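The timeseries analysis in (b) can be illustrated with a minimal sketch. It assumes that the GLM was fitted by ordinary least squares independently at each time point, which the legend does not state, and it runs on synthetic data rather than the recorded pupil traces. The function names `baseline_correct` and `timeseries_glm`, the sampling grid, and the effect sizes are hypothetical.

```python
import numpy as np


def baseline_correct(pupil, times, window=(-1.0, 0.0)):
    """Subtract each trial's mean over the pre-outcome window (in seconds).

    pupil: (n_trials, n_timepoints); times: (n_timepoints,), 0 = outcome.
    """
    mask = (times >= window[0]) & (times < window[1])
    return pupil - pupil[:, mask].mean(axis=1, keepdims=True)


def timeseries_glm(pupil, regressors):
    """Fit the same GLM at every time point (assumed: ordinary least squares).

    regressors: (n_trials, n_regressors). A constant column is prepended,
    standing in for the main effect regressor.
    Returns betas of shape (n_regressors + 1, n_timepoints).
    """
    design = np.column_stack([np.ones(len(regressors)), regressors])
    # lstsq solves for all time points at once (one column of pupil each).
    betas, *_ = np.linalg.lstsq(design, pupil, rcond=None)
    return betas


# Synthetic demonstration (illustrative values only, not the study's data).
rng = np.random.default_rng(0)
n_trials = 200
times = np.arange(-10, 30) / 10.0  # -1.0 s to 2.9 s in 0.1 s steps

# Columns: reward, change response on next trial, switch into exploration.
regressors = rng.integers(0, 2, size=(n_trials, 3)).astype(float)

# Ground-truth effects that start at outcome delivery.
true_betas = np.zeros((4, times.size))
post = times >= 0
true_betas[0, post] = 0.5   # main effect
true_betas[1, post] = -1.0  # constriction after reward
true_betas[2, post] = 0.3   # small dilation before a change of response
true_betas[3, post] = 2.0   # large dilation when switching into exploration

design = np.column_stack([np.ones(n_trials), regressors])
offsets = rng.normal(100.0, 5.0, size=(n_trials, 1))  # per-trial baseline
pupil = design @ true_betas + offsets

betas = timeseries_glm(baseline_correct(pupil, times), regressors)
```

In this noise-free example the recovered `betas` match `true_betas`, because the baseline subtraction removes the per-trial offset. With real data, the beta time courses would be computed per participant and then averaged, with SEM taken across participants.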