Introduction

A core goal of cognitive neuroscience is to understand how sensory input describing events in the external world is translated into the patterns of thoughts we experience in our lives. Complex naturalistic states, such as movie watching, are important paradigms to understand this process because they allow cognition and brain dynamics to be understood in a situation that maps directly onto experiences in the real world [1-5]. Developments in cognitive neuroscience, leveraging state-of-the-art brain imaging techniques such as functional magnetic resonance imaging (fMRI), have established core features of neural patterns that emerge across participants during movie-watching tasks [2], highlighting their similarity across individuals [6] and their links to memory for information in the films [7]. However, it is more difficult to reliably map ongoing thought patterns in this context since experiential sampling, the gold standard for tracking thought patterns [8], has the potential to disrupt the natural unfolding of brain activity during movie watching. The goal of our study was to minimize the disruptive impact of sampling ongoing experience by using a novel approach that allows us to explicitly link patterns of ongoing thought to brain activity during movie-watching at specific moments in a film.

Contemporary theories of ongoing thought suggest that a primary dimension to differentiate subjective experiences is the extent to which they depend on immediate sensory input [8, 9]. Cognitive states that are ‘coupled’ to events in the immediate environment are assumed to be linked to greater cortical processing of sensory input [10], better task performance, and better memory for events in narrative comprehension tasks like reading [11-13]. In contrast, states that are perceptually ‘decoupled’ from sensory input, such as the experience of mind-wandering [14, 15], provide an opportunity to pursue thoughts derived from memory [13, 16] but can be linked to compromised task performance and worse memory for events [17]. Moreover, in situations where comprehension is important, states of distraction are hypothesized to be linked to poor executive control [11, 15, 18]. Given that movie-watching provides a situation where dynamic changes in visual and auditory input drive a complex narrative, it offers an ecologically valid opportunity to understand how ongoing thought patterns map onto neural activation patterns in a naturalistic context.

Consistent with the notion that movie watching is a perceptually coupled state, recent work in cognitive neuroscience suggests an important role for primary systems, such as visual and auditory cortices [19]. However, studies also hypothesize a role for regions of association cortex linked to higher-order thought, such as the default mode network (DMN) or the frontoparietal network (FPN) [2, 4, 20]. For example, the DMN is hypothesized to be important in social cognition, episodic memory, and conceptual knowledge — all of which are likely important for understanding the narrative of the film (for a review of the broad role the DMN plays in cognition, see [21]). However, the DMN has also been implicated in perceptually decoupled states, such as mind-wandering, that are likely antagonistic to movie watching [13, 22-24].

Similarly, the FPN is important for multiple types of tasks, including those superficially different from movies, such as working memory maintenance, reflecting this network’s hypothesized role in goal maintenance [25, 26]. Contemporary views of ongoing thought argue that the FPN is likely to be important in suppressing distraction, including reductions in self-generated states like mind-wandering [27]. Although studies have highlighted the role of both primary sensory and higher-order systems, such as the DMN and the FPN, in movie watching [4, 28], our lack of a formal understanding of the mapping between thought patterns when we watch films and associated patterns of brain activity means the specific role that different brain systems play in the experience of movie watching remains largely a matter of speculation [29].

Our experiment was designed to better understand how patterns of brain activity at different moments during a film map onto the ongoing thoughts accompanying them. We used multi-dimensional experience sampling (mDES) to describe ongoing thought patterns during the movie-watching experience [8]. mDES is an experience sampling method that identifies different features of thought by probing participants about multiple different dimensions of their experiences. mDES can provide a description of a person’s thoughts, generating reliable thought patterns across laboratory cognitive tasks [21, 30, 31] and in daily life [32, 33], and is sensitive to accompanying changes in brain activity [23, 34]. One challenge that arises when attempting to map the dynamics of thought onto brain activity during movie watching is accounting for the inherently disruptive nature of experience sampling: measuring experience with sufficient frequency to map the dynamics of thoughts during movies would disrupt the natural dynamics of the brain and would also alter the viewer’s experience. To overcome this obstacle, we developed a novel methodological approach in which participants were probed five times in a ten-minute movie clip (11 minutes total, no sampling in the first minute). We used a jittered sampling technique where probes were delivered at different intervals across the film for different people depending on the condition to which they were assigned. Probe orders were also counterbalanced to minimize the systematic impact of prior and later probes at any given sampling moment. We leveraged this technique to construct a precise description of the dynamics of experience for every 15 seconds of three ten-minute movie clips for which existing fMRI data were available [35]. We then mapped the time series of experience measured by mDES in our laboratory onto the time series of brain activity.
In this way, we can explicitly understand how the patterns of thoughts that dominate at different moments in a film relate to the brain activity at these time points and, therefore, better understand the contribution of different neural systems to the movie-watching experience.
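As a rough illustration of how sparse per-person probing can still yield a dense group-level time series, the sketch below assigns each 15-second bin of the sampled ten minutes to exactly one of several probe conditions. The number of conditions (eight), the even spacing of probes within a condition, and all names are assumptions made for the example; the text does not report these details, and counterbalancing of probe order is not modelled.

```python
# Hypothetical sketch: the bin layout and the number of conditions are
# assumptions for illustration, not values reported in the text.
CLIP_START = 60        # no probes during the first minute (seconds)
SAMPLED_SECONDS = 600  # ten minutes of sampled film
BIN_SECONDS = 15       # resolution of the group-level time series
PROBES_PER_PERSON = 5  # probes each participant receives per clip


def build_conditions():
    """Assign every 15 s bin to exactly one condition of five probe times."""
    n_bins = SAMPLED_SECONDS // BIN_SECONDS        # 40 bins
    n_conditions = n_bins // PROBES_PER_PERSON     # 8 conditions (assumed)
    conditions = []
    for offset in range(n_conditions):
        # Each condition takes every 8th bin, shifted by its own offset,
        # so probes are spread over the clip and no bin is shared.
        bins = range(offset, n_bins, n_conditions)
        conditions.append([CLIP_START + (b + 1) * BIN_SECONDS for b in bins])
    return conditions


conditions = build_conditions()
covered = sorted(t for probe_times in conditions for t in probe_times)
print(len(conditions), "conditions;", len(covered), "distinct probe times")
```

Under these assumptions each participant is interrupted only five times, yet pooling across conditions gives one sampled moment for every 15-second bin of the clip.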

Results

Analytic Goal

The goal of our study, therefore, was to understand the association between patterns of brain activity over time during movie clips and the patterns of thought that participants reported at the corresponding moment (see Figure 1). This can be conceptualized as identifying the mapping between two multi-dimensional spaces, one reflecting the time series of brain activity and the other describing the time series of ongoing experience (see Figure 1 right-hand panel). In our study, we selected three 11-minute clips from movies (Citizenfour, Little Miss Sunshine and 500 Days of Summer) for which recordings of brain data in fMRI already existed [35] (Figure 1, Sample 1). A second set of participants viewed the same movie clips, providing intermittent reports on their thought patterns using mDES (Figure 1, Sample 2). Our goal was to understand the mapping between the patterns of brain activity at each moment of the film and the reports of ongoing thought recorded at the same point in the movie. We applied Principal Components Analysis (PCA) to the mDES data to reduce these data to a set of four simple dimensions that explained the thought patterns reported during the movies. These are represented as word clouds in Figure 1. We performed two analyses to understand the associations between the mDES reports and brain activity at each point in the film. Our first analysis computed the mean time series of experience for each of the four thought pattern components (averaged across participants in Sample 2) and used this as a regressor of interest in a model predicting brain activity recorded from each participant from Sample 1. We refer to this as a voxel-space analysis, and it allowed us to perform a whole-brain search of the mapping between activity in each region and each dimension of ongoing thought.
In our second analysis, we projected the grand mean of brain activity for each volume of each film against the first five dimensions of brain activity from a decomposition of the Human Connectome Project (HCP) resting state data to form a 5D “brain-space” that describes the trajectory of the brain during each movie (Figure 1, note only the first four dimensions are shown) [36]. We used the results of this analysis to produce coordinates for each TR of each movie, which were used as explanatory variables in a linear mixed model in which the locations of each mDES probe in the “thought space” described by the PCA dimensions were the dependent variables. We refer to this second analysis as a state-space analysis (see [37-39] for prior examples of this approach).

Using fMRI data and experience sampling data to map patterns of ongoing thought onto brain activity during movie-watching. Left to Right - One sample of participants was scanned while watching movies (Sample 1), and a different set of participants responded to experience sampling probes (Sample 2) while watching the same movies in the laboratory. Decomposition of mDES data into low-dimension experiential patterns using principal component analysis (PCA) produced a set of dimensions that describe experience during movie watching (a “thought-space” within which the dynamics of the movie-watching experience unfold). Word clouds illustrate how the experience sampling questions map onto each dimension that describes this space. In these word clouds, the font size describes their importance (bigger = more important), and the colour describes their polarity (red = positive, blue = negative). Similarly, we created a brain space to describe the movie-watching experience by comparing each moment in the film to validated dimensions of brain variation. For this purpose, we used the dimensions defined from the resting-state data of the HCP by Margulies [36] (often referred to as gradients): Gradient 1 (Association to Primary cortex), Gradient 2 (Visual to Motor cortex), Gradient 3 (Frontoparietal to Default Mode Networks) and Gradient 4 (Dorsal Attention Network (DAN)/Visual to Default Mode Networks); Gradient 5 (Lateral Default Mode to Primary sensory cortex) is not shown [36]. Colour illustrates position on each dimension of brain variation used to map activity in the state-space analysis (purple = low, yellow = high). The two 3D scatter plots illustrate two examples from our data of how movie watching can be seen as two complementary trajectories through a “Brain Space” (focusing on Gradients 1, 2 and 3, shown at the top) and a “Thought Space” (focusing on “Episodic Knowledge,” “Verbal Detail,” and “Sensory Engagement,” shown at the bottom). In these plots, the cooler (blue) points occur earlier in the movie clip, and the warmer (red) points occur later.

Generation of the thought-space

The first step in our analysis was to decompose the mDES data using PCA to produce the dimensions that make up the “thought-space” we will use for our subsequent analyses (Figure 1, see Methods). Based on the scree plot (see Supplementary Figure 1), the data best fit a four-component solution, and the components are displayed as word clouds (see Methods for further details). In these word clouds, items with similar colours are related, and the font size highlights their importance. Component 1 contributed 26.1% of the variance explained, loaded highly on the terms “past,” “self,” and “knowledge,” and negatively on “words” and “sounds,” and is referred to as “Episodic Knowledge.” Component 2 explained 10.5% of the variance, loaded positively on the items “intrusive” and “distracting” and negatively on “deliberateness,” and is referred to as “Intrusive Distraction.” Component 3 loaded positively on “words,” “detail,” and “deliberateness,” explaining 7.7% of the variance, and was called “Verbal Detail.” Finally, Component 4 contributed 6.8% of the explained variance, loaded positively on “emotion,” “images,” “sounds,” and “people,” and was named “Sensory Engagement.” See Supplementary Table 1 for a description of the mDES questionnaire and Supplementary Table 2 for the percentage of variance explained by principal components overall and in each movie.
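To make the decomposition step concrete, the following sketch runs a plain PCA on a synthetic probes-by-items matrix and reports the proportion of variance carried by the first four components. The data are random, the number of items (16) is an assumption, and no rotation is applied; whether the original analysis rotated its components is not stated in this section, so this should be read as a generic PCA rather than the authors' exact pipeline.

```python
import numpy as np


def pca_variance_explained(responses, n_components=4):
    """Return item loadings, probe scores and proportion of variance explained."""
    X = np.asarray(responses, dtype=float)
    # z-score each item so that items on different scales contribute equally
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardised data; rows of Vt are the principal axes
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    loadings = Vt[:n_components].T   # items x components
    scores = Z @ loadings            # probes x components ("thought-space")
    return loadings, scores, explained[:n_components]


# Synthetic stand-in for mDES answers: 200 probes x 16 items (both assumed).
rng = np.random.default_rng(0)
fake_mdes = rng.normal(size=(200, 16))
loadings, scores, explained = pca_variance_explained(fake_mdes)
print(np.round(explained, 3))
```

Each row of `scores` is the location of one probe in the four-dimensional space, and each column of `loadings` is what a word cloud would visualise (size for magnitude, colour for sign).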

Split-half reliability results

A bootstrapped split-half reliability analysis was conducted to confirm that the four-component solution provided a reasonable description of our data. This analysis repeatedly divided the mDES data into two random samples and evaluated the correlation between the two halves’ components. The reliability analysis indicated that the four-component solution was reproducible, with a strong homologue similarity score (r = .96, 95% CI [.93, 1.00]; see Methods for further details).
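The split-half logic can be sketched as follows: split the probes at random, run PCA on each half, and correlate the loadings of matching components. The matching rule (best absolute correlation per component), the synthetic data with four planted patterns, and the function names are assumptions for illustration; the Methods may define homologue similarity differently.

```python
import numpy as np


def pca_loadings(X, k):
    """Loadings (items x k) of the first k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:k].T


def split_half_similarity(X, k=4, n_splits=100, seed=0):
    """Mean similarity of matched components across random half-splits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sims = []
    for _ in range(n_splits):
        order = rng.permutation(n)
        half_a, half_b = X[order[: n // 2]], X[order[n // 2:]]
        La, Lb = pca_loadings(half_a, k), pca_loadings(half_b, k)
        # |r| between every pair of components across halves
        # (the sign of a principal component is arbitrary)
        r = np.abs(np.corrcoef(La.T, Lb.T)[:k, k:])
        # pair each component with its best match in the other half
        sims.append(r.max(axis=1).mean())
    sims = np.array(sims)
    return sims.mean(), np.percentile(sims, [2.5, 97.5])


# Synthetic data with four planted patterns of different sizes (assumed).
rng = np.random.default_rng(1)
block_sizes = [6, 5, 4, 3]
factors = rng.normal(size=(1000, len(block_sizes)))
columns = []
for f, size in enumerate(block_sizes):
    for _ in range(size):
        columns.append(2.0 * factors[:, f] + rng.normal(size=1000))
fake_mdes = np.column_stack(columns)

mean_sim, ci = split_half_similarity(fake_mdes, k=4, n_splits=50)
print(round(float(mean_sim), 3), np.round(ci, 3))
```

A value close to 1 indicates that both halves recover essentially the same components, which is the sense in which a solution is called reproducible.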

Variation in thought patterns

Next, we examined how these dimensions describe experience within each movie (see Figure 2). We fitted four linear mixed models (LMM), one for each thought component (Episodic Knowledge, Intrusive Distraction, Verbal Detail, and Sensory Engagement), in which the movie was the explanatory variable of interest and participants were included as a random effect. The significance threshold was adjusted using the False Discovery Rate (FDR) to control for family-wise error (FWE) within the model (controlling for the three movies). The four models found significant differences in overall thought pattern scores across the three movies, including reported thoughts resembling “Episodic Knowledge,” F(2, 2015.3) = 5.41, p = .005, “Intrusive Distraction,” F(2, 2015.3) = 77.84, p < .001, “Verbal Detail,” F(2, 2015.4) = 13.90, p < .001, and “Sensory Engagement,” F(2, 2015.7) = 82.69, p < .001. This suggests that within each model, the reported thought pattern score differed significantly in at least one of the movies. Post-hoc pairwise comparisons using the least-squares means (lsmeans) were conducted for each model to investigate how thought component scores differ in each movie, adjusting significance thresholds using the Tukey method to control for FWE within the model. The first model suggests patterns of responses in Little Miss Sunshine showed less similarity to “Episodic Knowledge” (M = −0.12, SE = .10) than did patterns of thoughts reported in 500 Days of Summer (M = 0.11, SE = .10), t(2016) = −3.27, p = .003. However, there were no significant differences in “Episodic Knowledge” thoughts reported during Citizenfour (M = −0.02, SE = .10) compared to Little Miss Sunshine, t(2015) = 1.31, p = .392, or 500 Days of Summer, t(2016) = −1.97, p = .121.
The second model identified self-reported thoughts that were more similar to the pattern of “Intrusive Distraction” during Citizenfour (M = 0.41, SE = .09) than during Little Miss Sunshine (M = −0.15, SE = .09), t(2015) = 9.66, p < .001, or during 500 Days of Summer (M = −0.27, SE = .09), t(2015) = 11.66, p < .001. There was no difference in how similar reported thought scores were to “Intrusive Distraction” between Little Miss Sunshine and 500 Days of Summer, t(2016) = 2.03, p = .106. Model three found self-reported thoughts resembled patterns of “Verbal Detail” more for Citizenfour (M = 0.17, SE = .09) than for Little Miss Sunshine (M = −0.14, SE = .09), t(2015) = 5.14, p < .001, or 500 Days of Summer (M = −0.04, SE = .09), t(2016) = 3.58, p = .001. Again, there were no significant differences in reported “Verbal Detail” scores between Little Miss Sunshine and 500 Days of Summer, t(2016) = −1.54, p = .271. Lastly, model four found reported thoughts during Citizenfour resembled patterns of “Sensory Engagement” (M = −0.32, SE = .07) significantly less than for either Little Miss Sunshine (M = −0.01, SE = .07), t(2015) = −6.07, p < .001, or 500 Days of Summer (M = 0.33, SE = .07), t(2016) = −12.85, p < .001. Additionally, reported thoughts during Little Miss Sunshine resembled patterns of “Sensory Engagement” less than reported thoughts during 500 Days of Summer, t(2016) = −6.81, p < .001. The results are presented visually in Figure 2, and further details of the LMM are presented in Supplementary Table 3.
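For readers unfamiliar with FDR adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure, one common way of computing FDR-adjusted p-values. The section does not name the specific FDR method used, so the choice of Benjamini-Hochberg is an assumption, and the input p-values are made up for the example rather than taken from the models above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only (not results from the study).
example = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example))
```

An effect is then declared significant when its adjusted p-value falls below the chosen level, which is equivalent to comparing the raw p-values against rank-dependent thresholds.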

The relationship between how patterns differ across genres of movies and relate to comprehension. Left to Right – The 3D scatterplot shows the average location of each film on three of the four PCA dimensions, “Episodic Knowledge,” “Verbal Detail,” and “Sensory Engagement.” The bar graphs show the average loading on each dimension, with the error bars showing the 95% Confidence Intervals. The plots on the right illustrate the relationship between the mDES dimensions and memory for information in the film. The top barplot shows the average comprehension score on each film with 95% Confidence Intervals error bars. The scatter plots below show the association between mDES components and comprehension. The scatter plot on the left shows the negative linear relationship between the “Intrusive Distraction” thought and memory. The plot on the right shows a positive association with “Sensory Engagement.” The blue line represents the best-fit line, and the shaded area shows the 95% Confidence Intervals.

Comprehension

Next, we examined how the thought patterns relate to the participants’ memory of information from the movie (Figure 2). Participants answered four comprehension questions for each film (12 total) related to relevant information in the clip they had just watched (see Supplementary Table 4 for the comprehension questionnaire). We fitted an LMM for which the movies, each thought pattern, and their interaction were explanatory variables of interest. Comprehension score was the dependent variable, and participant was included as a random effect. FDR was used to control for FWE across nine comparisons. The analysis revealed three significant main effects and a significant interaction. First, there was a significant main effect of movie genre on memory, F(2, 254.12) = 49.33, p < .001. Post-hoc pairwise comparisons using the lsmeans were conducted to investigate differences in memory performance across the films. Significance thresholds for the post-hoc comparisons were adjusted using the Tukey method to control FWE within the model. Comprehension scores were significantly lower for questions related to information in Citizenfour (M = 2.42, SE = .09) compared to Little Miss Sunshine (M = 3.35, SE = .08), t(249) = −9.16, p < .001, as well as 500 Days of Summer (M = 3.33, SE = .08), t(273) = −8.33, p < .001. Notably, there was no significant difference in comprehension performance between Little Miss Sunshine and 500 Days of Summer, t(242) = −.18, p = .982. There were also two significant main effects of thought patterns: “Intrusive Distraction” was significantly associated with worse comprehension across the three movies, F(1, 324.41) = 9.27, p = .011, whereas “Sensory Engagement” was associated with better overall comprehension, F(1, 341.44) = 8.30, p = .013. Finally, there was a significant movie by thought pattern interaction for “Episodic Knowledge,” F(2, 268.96) = 4.46, p = .028.
To follow up on this significant interaction, a post-hoc simple slopes analysis was performed to assess the effect of “Episodic Knowledge” on comprehension performance across each movie, using FDR to control for multiple comparisons. The analysis found that moments when patterns of thought were more similar to “Episodic Knowledge” were associated with significantly better comprehension performance for information in 500 Days of Summer, t(319.83) = 2.54, p = .030. The corresponding slope was negative for Citizenfour, t(317.55) = −1.39, b = −0.09, SE = .06, p = .240, and positive for Little Miss Sunshine, t(321.85) = 0.93, b = 0.07, SE = .07, p = .350, although neither of these relationships was statistically significant. For the complete model output and the pairwise comparisons, see Supplementary Table 5.

Brain – Thought Mappings: Voxel-space Analysis

Having established the dimensions that characterize the mDES data, how they organize experience in each movie, and their associations to memory, we then examined how these dimensions of experience relate to the brain activity at each moment in the films. Our first analysis examined this question at the voxel level. In this analysis, the averaged time course of each PCA dimension (collapsed across all individuals in Sample 2) was included as a regressor of interest at the first level for each of the three movies for the brain activity recorded in each subject in Sample 1. To perform a group comparison of these analyses, we used FLAME in FSL with a cluster-forming threshold of z = 3.1 and FWE correction, adjusting for the number of regressors of interest to determine the significance of each cluster (p < .0125). This generated four group-level thresholded maps corresponding to regions whose activation during moments in the film was correlated with a specific thought pattern (see Figure 3 and Supplementary Table 6). “Episodic Knowledge” was significantly positively associated with activation in a region of dorsal visual cortex (b = 0.62, 95% CI [0.27, 0.97]). “Intrusive Distraction” was significantly associated with deactivation in the FPN (b = −0.78, 95% CI [−1.37, −0.20]). “Verbal Detail” was significantly associated with suppression of activity in primary auditory cortex (b = −1.64, 95% CI [−2.11, −1.17]). Lastly, “Sensory Engagement” was significantly associated with activation in both visual and auditory cortices (b = 1.26, 95% CI [0.81, 1.70]) (see Supplementary Table 7 for the table of average Gradient score for each movie derived from this analysis). We also performed a functional connectivity analysis using each set of clusters as the seed region (see Supplementary Figures 2-3 and Supplementary Table 8).
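The core idea of the voxel-space analysis, using one group-averaged thought time course as a regressor for every voxel, can be sketched with ordinary least squares. This is a deliberately simplified stand-in: it omits haemodynamic convolution, nuisance regressors and the FLAME group model, uses synthetic data, and all names are hypothetical. The threshold arithmetic assumes a conventional baseline of .05 divided by the four regressors of interest, which reproduces the reported .0125.

```python
import numpy as np

N_COMPONENTS = 4
ALPHA = 0.05 / N_COMPONENTS  # 0.0125, assuming a 0.05 baseline level


def voxelwise_slopes(thought_ts, bold):
    """OLS slope of every voxel's time series on one thought-pattern regressor.

    thought_ts: (n_timepoints,) group-averaged component scores
    bold:       (n_timepoints, n_voxels) brain activity
    """
    x = (thought_ts - thought_ts.mean()) / thought_ts.std()
    design = np.column_stack([np.ones_like(x), x])   # intercept + regressor
    coefs, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return coefs[1]                                  # one slope per voxel


# Synthetic example: 40 time points, 50 voxels, voxel 0 tracks the regressor.
rng = np.random.default_rng(0)
n_timepoints, n_voxels = 40, 50
thought_ts = rng.normal(size=n_timepoints)
bold = rng.normal(size=(n_timepoints, n_voxels))
z = (thought_ts - thought_ts.mean()) / thought_ts.std()
bold[:, 0] = 2.0 * z + 0.1 * rng.normal(size=n_timepoints)

slopes = voxelwise_slopes(thought_ts, bold)
print(round(float(slopes[0]), 2), ALPHA)
```

In the sketch, the voxel constructed to follow the regressor receives a slope near 2 while the noise voxels stay near zero, which is the pattern a thresholded map would then pick out.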

Group-level neural activation patterns associated with each of the dimensions of thought identified in a voxel space analysis. Left to Right - Regions in red are associated with activity corresponding to reports of “Episodic Knowledge,” green regions are associated with “Intrusive Distraction,” areas in purple are associated with “Verbal Detail,” and the regions in orange represent activity associated with “Sensory Engagement.” The bar plot illustrates the directionality of each parameter estimate with error bars representing 95% Confidence Intervals. Corresponding word clouds for each thought pattern are presented on the right for reference (Top to Bottom: “Episodic Knowledge,” “Intrusive Distraction,” “Verbal Detail,” and “Sensory Engagement”).

Our voxel-space analysis highlights two notable features of how thought patterns during movie watching were linked to brain activity. First, most of the regions whose activity we can predict based on mDES scores tended to fall within sensory cortex. Notably, “Sensory Engagement,” a pattern of multi-sensory thought linked to sounds and images, is associated with increased activity in both the visual and auditory systems. Interestingly, these regions overlap with regions linked to “Episodic Knowledge” maps (Posterior [orange and red]) and those linked to “Verbal Detail” (Right [orange and purple]). Notably, since both Episodic Knowledge and Sensory Engagement show positive links to comprehension and greater activity in sensory cortex regions, these results support the hypothesis that perceptual coupling is an important feature of making sense of events during movie watching (e.g. [9]). Second, the only regions identified outside sensory cortex were linked to “Intrusive Distraction” and broadly fall within regions of the FPN. Supplementary Figure 5 compares the regions linked to “Intrusive Distraction” with the FPN as defined by [40], showing that the regions showing less activation during moments when Intrusive Distraction was high generally fall within this system. This pattern is consistent with views of the FPN as playing an active role in maintaining a state of non-distracted task focus [41]. See Supplementary Table 6 for the analysis output.

Our analysis highlighted significant overlap across analyses in regions of visual and auditory cortex. To better understand these common regions, we calculated the overlap in these maps (left-hand panel of Figure 4) and performed a meta-analysis using Neurosynth to identify their likely functions (See Supplementary Table 9 for the specific loadings for each term). The results of this meta-analysis are displayed in the form of word clouds where it can be seen that regions common to reduced “Verbal Detail” and greater “Sensory Engagement” are linked to auditory processes (“sounds,” “noise,” and “pitch”). In contrast, regions common to “Sensory Engagement” and “Episodic Knowledge” are most likely associated with “videos,” providing independent meta-analytic corroboration that these regions are paramount for movie watching (see Supplementary Table 9). Finally, we conducted a resting state functional connectivity analysis using the regions overlapping as seeds (right-hand panel of Figure 4). This highlighted that both regions exhibited functional connectivity patterns, including many overlapping areas (coloured yellow). Notably, both functional connectivity maps contained the seed regions of the other analysis. Regions in red indicate those connected to the region of visual cortex, regions in green show those linked to auditory cortex, and regions in yellow are common to both spatial maps.

Brain regions are associated with multiple experiential features during movie watching. A region of superior temporal cortex is associated with positive reports of thoughts like “Sensory Engagement” and negative reports of thoughts like “Verbal Detail” (coloured green). A region of dorsal visual cortex was associated with reports of thoughts like both “Sensory Engagement” and “Episodic Knowledge” (coloured red). The word clouds in the middle panel show the results of a Neurosynth meta-analysis of the regions, highlighting the most likely functions associated with these regions. The font size describes their importance (bigger = more important), and the colour describes their polarity (darker = positive). The panel on the right shows the results of seed-based functional connectivity analysis of these regions of overlap from a separate resting-state study.

State Space Analysis

Our next analysis used a “state-space” approach to determine how brain activity at each moment in the film predicted the patterns of thoughts reported at these moments (for prior examples in the domain of tasks, see [37, 38]; see Methods). In this analysis, we used the coordinates of the group average of each TR in the “brain-space” and the coordinates of each experience sampling moment in the “thought-space.” We ran four LMMs, one for each thought component, in each case using the location of each sampling point in the movie on Gradients 1-5 as explanatory variables and the scores for each thought pattern component (“Episodic Knowledge,” “Intrusive Distraction,” “Verbal Detail” and “Sensory Engagement”) as dependent variables. Participant was included as a random intercept. The significance threshold was adjusted using the FDR to control for FWE within each model, controlling for five brain dimensions. After correction, we found two significant main effects. First, we found a significant main effect of Gradient 4 (DAN to Visual), which predicted the similarity of answers to the “Episodic Knowledge” component, t(.01) = 2.17, p = .013. This suggests that moments when thoughts were most similar to “Episodic Knowledge” were associated with moments when activity was high in visual cortex and lower in regions of the dorsal attention network (see Figure 6). There was also a significant main effect of Gradient 1 (Primary to Association) predicting patterns of thought related to “Sensory Engagement,” t(2046.34) = −3.26, p = .006. These results show that moments when thoughts were high on “Sensory Engagement” were associated with increased brain activity in regions within the primary cortex low on Gradient 1 (see Figure 6). See Supplementary Table 10 for complete results.
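One simple way to obtain “brain-space” coordinates of the kind used here is to correlate the group-average activity map at each TR with each gradient map. The sketch below does this with synthetic arrays. The use of spatial Pearson correlation as the projection is an assumption for illustration (this section does not spell out the projection), and the array shapes and names are hypothetical.

```python
import numpy as np


def brain_space_coordinates(volumes, gradients):
    """Spatial Pearson correlation of each volume with each gradient map.

    volumes:   (n_trs, n_voxels) group-average activity per TR
    gradients: (n_gradients, n_voxels) gradient maps
    returns:   (n_trs, n_gradients) coordinates in the brain-space
    """
    v = volumes - volumes.mean(axis=1, keepdims=True)
    g = gradients - gradients.mean(axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    return v @ g.T


# Synthetic maps: 5 gradients and 3 TRs over 500 voxels (all assumed).
rng = np.random.default_rng(0)
gradients = rng.normal(size=(5, 500))
volumes = rng.normal(size=(3, 500))
volumes[0] = gradients[0]    # TR 0 looks exactly like Gradient 1
volumes[1] = -gradients[2]   # TR 1 is the mirror image of Gradient 3

coords = brain_space_coordinates(volumes, gradients)
print(np.round(coords, 2))
```

Each row of `coords` is one moment of the film expressed as five numbers; in the analysis described above, coordinates of this kind at the probed moments serve as the explanatory variables in the mixed models.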

Both our voxel-space and state-space analyses linked patterns of self-reports resembling “Sensory Engagement” and “Episodic Knowledge” to patterns of brain activity. Therefore, in our next analysis, we used spin tests to formally understand the mapping between the voxel-based and state-space analyses. To this end, we sampled the location of the identified cluster in our voxel analysis on the gradient of interest (e.g. the cluster of voxels associated with “Sensory Engagement” on Gradient 1) and used spin tests [42] to determine the likelihood that a score with this magnitude would occur by chance (based on a null distribution of 2500 permutations). This analysis identified that our voxel-based estimate of “Sensory Engagement” falls within regions of sensory cortex implied by the state-space analysis (i.e. the sensory end of Gradient 1) at a level that is unlikely to occur by chance, p = .018 (two-tailed). In contrast, the location of “Episodic Knowledge” on Gradient 4 was not significant, p = .251 (see Figure 6).
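The way a two-tailed p-value is read off a permutation null can be sketched as below. Note the simplification: a genuine spin test builds its null by rotating the cortical map on a sphere so that spatial autocorrelation is preserved, whereas this example draws random sets of parcels, which is only a placeholder. The toy gradient, the cluster indices and the function names are all hypothetical; only the count of 2500 permutations comes from the text.

```python
import random


def permutation_p_value(observed, null_values):
    """Two-tailed p: share of null statistics at least as extreme as observed."""
    extreme = sum(abs(v) >= abs(observed) for v in null_values)
    return (extreme + 1) / (len(null_values) + 1)


def cluster_mean_null(gradient, cluster_idx, n_perm=2500, seed=0):
    """Observed cluster mean on a gradient (centred) and a placeholder null."""
    rng = random.Random(seed)
    centre = sum(gradient) / len(gradient)
    size = len(cluster_idx)
    observed = sum(gradient[i] for i in cluster_idx) / size - centre
    null = []
    for _ in range(n_perm):
        # Placeholder: a random parcel set of the same size. A real spin
        # test would rotate the map on the sphere instead.
        sample = rng.sample(range(len(gradient)), size)
        null.append(sum(gradient[i] for i in sample) / size - centre)
    return observed, null


# Toy gradient over 100 parcels; the "cluster" sits at the low (sensory) end.
gradient = [i / 99 for i in range(100)]
cluster_idx = list(range(10))
observed, null = cluster_mean_null(gradient, cluster_idx)
p = permutation_p_value(observed, null)
print(round(observed, 3), p)
```

Adding one to both numerator and denominator keeps the estimate away from zero, so the smallest attainable p-value with 2500 permutations is 1/2501.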

Discussion

Our study aimed to identify how patterns of thought during movie watching relate to brain activity during movie clips from three different films: Citizenfour (a documentary), Little Miss Sunshine (a comedy), and 500 Days of Summer (a romance). We used open-source fMRI data from one group of participants (Sample 1) who watched these films while brain activity was recorded using fMRI and measured ongoing thought patterns using mDES in a second group of participants (Sample 2) for whom no brain activity was acquired (Figure 1). We used a novel sampling approach that allowed us to build a detailed description of the time series of different thought patterns every 15 seconds in the clips while only sampling individual participants a relatively small number of times per movie, minimizing disruption of the subjective experience of movie watching. Our analyses examined the overlap between the time series of brain activity in Sample 1 and reported thought patterns in Sample 2 to reveal the relationship between brain activity at different moments in a film and the associated experiential states.

Across the movies, we identified four thought patterns. First, “Episodic Knowledge” was linked to experiences related to knowledge, the past, and the self. This pattern was also highest during the romance movie, specifically associated with better memory of information in this context and increased activity in dorsal medial regions of visual cortex by our state space and voxel space analysis. Second, “Intrusive Distraction” was related to thoughts with intrusive, distracting features that were spontaneous in nature. This thought pattern predicted poorer overall comprehension across all the movies and emerged in moments during the movies associated with reduced activation in regions of the FPN by our voxel space analysis. Third, “Verbal Detail,” which described experience as deliberate, detailed experiences in the form of words and with a negative emotional valence, was most prevalent in the documentary and associated with relative reductions in auditory cortex activation using our voxel space analysis. Finally, “Sensory Engagement” was related to multi-modal sensory experience (loading on images, sounds, and people with a positive emotional tone). “Sensory Engagement” was highest in the romance movie, associated with better overall comprehension performance across all movies, and linked to activity in sensory cortex by both the voxel-based analysis and the state-space analysis, which were formally linked through a spin test.

Our study supports the hypothesis that perceptual coupling between the brain and external input is a core feature of how we make sense of events in movies (e.g. Smallwood, 2013). For example, “Sensory Engagement,” a pattern of enjoyable multi-sensory experience, was linked to better memory for information across all the movies and emerged when activity was high in both auditory and visual cortices (regions at the sensory end of the principal gradient of functional brain organization). “Sensory Engagement” was the thought pattern with the most consistent and clearest links to the brain since it was the only thought pattern that showed a brain-thought mapping across our voxel and state space analyses that were formally linked using a spin test (see Figures 2 and 6). Similarly, reports of “Episodic Knowledge” emerged when brain activity was high within a dorsal region of visual cortex and were linked to better comprehension in one of the films (500 Days of Summer). Together, these data provide important corroboration for the hypothesis that states of sensory coupling support better memory for environmental events [8, 9]. Further, they also provide support for contemporary perspectives that movie-watching is a useful and important paradigm for understanding the brain basis behind naturalistic states because it allows brain function to be understood through the lens of a state rich in complex sensory input [19].

Moreover, our study provides support for the hypothesized role the frontoparietal system plays in supporting states of non-distracted focus during movie watching. “Intrusive Distraction” was the only thought pattern associated with activity outside primary sensory systems, and it emerged at moments in films when activity in regions within the FPN was reduced (see Supplementary Figure 4 for the overlap between the regions linked to “Intrusive Distraction” and the FPN as defined by Yeo and colleagues [43]). Interestingly, the association between greater distraction and reduced activity within the FPN is consistent with this network’s assumed role in goal maintenance [44]. This result also confirms predictions from psychological research that states of distraction, like mind-wandering, often emerge when executive control is reduced [18]. This hypothesis gains further support from the consistent association between reports of “Intrusive Distraction” and worse performance in movie comprehension [18]. However, given that the processes that drive the occurrence of states such as “Intrusive Distraction” are likely to depend on individual and contextual factors (e.g. poor executive control [18, 45]), we suggest further research may be necessary to understand this brain-cognition mapping fully.

Although our study highlighted neural activity in sensory cortex and regions of association cortex within the frontoparietal system, we found less evidence for the hypothesized role of the DMN during the movie-watching experience. Notably, the pattern of “Episodic Knowledge” identified by our analysis focuses on features of cognition such as knowledge, people, and oneself, all of which are terms that previous literature suggests could relate to the DMN [8, 20]. However, despite this conceptual mapping, neither our voxel space nor our state space analysis indicated that this experience was related to moments when brain activity was higher within the DMN (see Figures 3 and 6).

Although our study did not identify a role of the DMN in movie watching, there are several possible methodological reasons why such a mapping may nonetheless exist. For example, our choice of films (documentary, romance, and comedy) may have excluded genres in which the DMN may play a more obvious role (for example, mystery or suspense). Another possibility is that the DMN may be relevant to aspects of experience that are only captured during longer intervals of movie watching, such as extended plot lines, unexpected events, or other features of movies that depend on the segmentation of a movie into different events [46]. We only sampled experience in short 11-minute clips, so the DMN may relate to aspects of experience that are important for movie-watching over longer time periods. It is also possible that the unique features of the DMN make it difficult for our method to reveal its role in experience. The DMN is a spatially heterogeneous system and highly variable across individuals [47]. Since our analytic approach links thought patterns in one set of individuals to brain activity in another, it could be challenging for this method to identify the role of a highly idiosyncratic brain network such as the DMN in movie-related thought patterns [47, 48]. This possibility could be tested by examining mappings between thought patterns and brain activity within individuals using precision scanning methods [49]. Studies have also highlighted that the DMN is heterogeneous in the functions it is involved in and, in particular, is hypothesized to shift flexibly from perceptually decoupled to coupled states [13]. So, for example, the role this network is hypothesized to play in off-task or mind-wandering states (e.g. [22]) may obscure its role in perceptually coupled states (like movie watching).

It is important to note that while our study does not establish what role the DMN plays in movie-watching states, it does highlight a clear role for sensory systems in multiple ongoing thought patterns. Thus, based on our study, whatever role the DMN plays during movie-watching, it is likely to be built upon the foundational role sensory systems play in our thoughts and feelings while we watch films. Consistent with this possibility, contemporary views on the DMN argue that its function arises from its topographical location in the cortex [8]. According to this perspective, the DMN is located at the maximal distance from primary systems but also constitutes the apex of processing streams like the ventral and dorsal streams [36]. We have previously argued that whatever role the DMN plays in cognition may entail interactions with primary systems, possibly through the transformation of neural signals along different processing streams [8]. In other words, it is possible that the DMN plays a role in movie watching that complements the processing of sensory input, an important but possibly less direct contribution to the movie-watching experience than that of regions in the visual or auditory cortex. Since the DMN is widely hypothesized to be important in movie watching, we performed an exploratory functional connectivity analysis to examine whether the sensory regions we identified in our study are functionally coupled to the DMN at rest (see Supplementary Figure 3). This revealed that the sensory regions identified in our study were coupled to a common set of regions within the DMN (including anterior regions of the temporal lobe and the inferior frontal gyrus; see Supplementary Figure 3). This analysis was exploratory, so any results should be treated with caution; however, it is consistent with the possibility that a more fine-grained precision mapping approach could identify the role these regions play in ongoing thought during movie watching.

In summary, our study used a novel paradigm to establish the role primary systems play during our experiences while we watch movies. Nonetheless, important questions about other features of experience during movie-watching remain unanswered. For example, patterns of “Verbal Detail” were associated with moments in the films where auditory cortex activation was reduced. This may reflect a shift in attention away from the processing of the auditory input related to the movie towards evaluative thoughts about the people or events in the film, perhaps in the form of inner speech [39]. These thoughts may occur when participants form opinions about movie characters, elaborate on the context, or make inferences about the information they have encoded from the film [50]. This possibility could be explored by using more specific experience sampling items that directly target inner speech, or comprehension questions that target inferential processing of events within the movies (see [12]). Importantly, our study provides a novel method for answering these questions and others regarding the brain basis of experiences during films that can be applied simply and cost-effectively. In the future, the ease with which our method can be applied to different groups of individuals and to different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and the movie-watching experience.

Methods

Participant Pool – Laboratory Sample

The sample consisted of 120 participants (98 women (81.7%), 17 men (14.2%), 5 non-binary or similar gender identity (3.3%); age: M = 18.83, SD = 1.19, range 18-23) who took part in an in-person laboratory study in which they watched three 11-minute movie clips, responded to mDES probes, and completed a brief comprehension assessment. All participants spoke English, with 95% of the sample primarily residing in Canada (others: China (1.7%), India (.8%), Nigeria (.8%), USA (1.7%)). This study was granted ethics clearance by the Queen’s University General Research Ethics Board. Participants were recruited between March 2023 and April 2023 through the Queen’s University Psychology Participant Pool. Participants provided written, informed consent via electronic documentation before participating in the research study. Participants were rewarded with one course credit or $10.00 for their participation and were provided with a verbal and written debrief upon completion of the study.

Participant Pool – Brain Data Sample

See Aliko and colleagues for a description of the sample [35].

Resting-state Participant Pool

191 student volunteers (mean age = 20.1 ± 2.25 years, range 18 – 31; 123 females) with normal or corrected-to-normal vision and no history of neurological disorders participated in this study. Written informed consent was obtained from all subjects prior to the resting-state scan. The study was approved by the ethics committees of the Department of Psychology and York Neuroimaging Centre, University of York. Previous studies have used these data to examine the neural basis of memory and mind-wandering, including region-of-interest-based connectivity analysis and cortical thickness investigations [39, 51-59].

Procedure

Participants attended an individual in-person testing session at the laboratory at Queen’s University to watch movie clips after providing written informed consent and basic demographic information. Participants were assigned to a testing booth, a small room with a desk, a chair, a computer to present the stimuli, and headphones to listen to the audio stimuli. Participants were asked to attend to the computer screen to watch and listen to three 11-minute video clips presented in random order. During each movie clip, participants were briefly interrupted five times to answer mDES probes, presented in random order, about the content of their thoughts just prior to the probe. After the first minute of each clip, probes were delivered approximately once every two minutes using a jittered technique, with participants assigned to counterbalanced probe orders to minimize the systematic impact of prior and later probes at any given sampling moment (see Supplementary Figure 5 for visualization). Once participants finished watching the three clips, they completed a 12-item comprehension questionnaire on Qualtrics, with four items related to information from each of the three movie clips (see Supplementary Table 4).

Multi-dimensional experience sampling (mDES)

Participants received 15 mDES probes in total across the three clips, five for each, and all responses were made with respect to their thoughts just before the probe interrupted their viewing. No probes were administered within the first minute of the clip; the first possible probe was administered at the 75-second mark to allow participants to situate themselves within the context of the movie clip. Each of the 16 mDES questions appeared in a randomized order, and participants were asked to use the directional arrow keys to move a slider across the screen to indicate, on a scale of 1 (not at all) to 10 (completely), how much that particular feature characterized their thoughts. The specific items used in this experiment are presented in Supplementary Table 1.

Probe Orders

There were 16 probe orders that a participant could be assigned to for each movie, which determined the delivery time of the five mDES probes throughout each 11-minute clip. Each subject ID was assigned three different probe orders, one for each of the three films, so that no subject ID was given the same probe order twice. This was achieved by creating a matrix of equally distributed probe orders across subject IDs to ensure each moment in the movie was probed an equal number of times while uniquely distributing probes to control for order effects. Probe orders were designed so that, across participants, experience was sampled at every 15-second interval of the movie clip, while any single participant was probed only five times per clip. This allowed us to sample experience as frequently as possible without disrupting naturalistic viewing through oversampling or overly frequent probes, and to control for ordering effects arising from the delivery time of the other probes. Each participant received a probe approximately every two minutes using a jittered technique. The first eight probe orders do not share any probe delivery times, whereas the latter eight probe orders (9-16) have been shuffled so that each shares one probe time with any given order from the first eight. Across orders 9-16, each probe from the first eight orders is repeated in a different combination of probes so that mDES responses at each probe are derived from participants in two different orders. See Supplementary Figure 6.
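The logic of this design can be illustrated with a minimal sketch. It is not the matrix used in the study (see Supplementary Figure 6 for that); it only shows one way to build 16 orders with the properties described above. The 75-second start and 15-second step come from the text, whereas the evenly spaced base orders, the rotation rule used for orders 9-16, and the function name `build_probe_orders` are assumptions made for illustration.

```python
def build_probe_orders(first_probe_s=75, step_s=15, n_base=8, probes_per_clip=5):
    """Return 16 hypothetical probe orders (lists of probe times in seconds).

    Assumption: 8 base orders x 5 probes tile a grid of 40 sampling moments.
    """
    n_times = n_base * probes_per_clip
    times = [first_probe_s + step_s * i for i in range(n_times)]

    # Base orders 1-8: order k takes every 8th moment, so probes are 120 s apart
    # and no two base orders share a delivery time.
    base = [[times[k + n_base * p] for p in range(probes_per_clip)]
            for k in range(n_base)]

    # Orders 9-16 (assumed rotation rule): the probe at position p is borrowed
    # from a different base order for each p, so each later order shares exactly
    # one delivery time with each of five base orders. Gaps are 105 s or 225 s.
    shuffled = [[base[(j - p) % n_base][p] for p in range(probes_per_clip)]
                for j in range(n_base)]
    return base + shuffled


orders = build_probe_orders()
for number, order in enumerate(orders, start=1):
    print(number, order)
```

Under these assumptions every sampling moment is probed by exactly two orders, which mirrors the property that responses at each probe time come from participants in two different orders.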

Movie Clip Stimuli

Movie stimuli were 11-minute scenes from Citizenfour, Little Miss Sunshine, and 500 Days of Summer. Stimuli were selected from the Naturalistic Neuroimaging Database (NNDb) [35], chosen based on genre, and cut from the full-length movies down to 11-minute clips. Participants were informed that they would watch three movie clips from varying genres and that the clips would be presented in random order. Written instructions were presented on screen at the beginning of each clip. After watching the three clips and responding to the mDES probes, participants were presented with a Qualtrics questionnaire to complete a comprehension test on the content of each film clip.

Comprehension questions

Participants completed a comprehension test of 12 questions, four from each movie. The questions were created collaboratively to test knowledge about the movie that could not be answered from common sense alone and to cover events during the clip’s beginning, middle, and end. An example of one of the comprehension questions was “What breakfast item did Olive order a la mode?” for the movie clip Little Miss Sunshine. A table of all the questions with corresponding answers can be found in Supplementary Table 4. Participants responded using 1-2 words and were instructed to enter “?” if they had no answer.

Brain Analysis

Our analyses used brain data acquired and shared by the NNDb, an open-access database of pre-processed MNI 2mm fMRI data (TR = 1 s) of participants who watched one of 10 full-length movies [35]. We utilized MNI 2mm fMRI data corresponding to participants who watched Little Miss Sunshine (n = 6), Citizenfour (n = 18), and 500 Days of Summer (n = 20). The specific pre-processing steps applied to the brain data and specific details of the sample are described in [35].

Voxel-space Analysis

Our analysis used the pre-processed data from [35]. The first step in our analysis was to extract the brain activity of each individual for the 10-minute section in which we sampled experience using mDES. To map the mDES time series onto these data, we created a mean time series for each movie, which described the mDES experience, averaged across 40 observations at every 15-second interval. Next, we interpolated this time series to generate a time series of experiences that matched the TR used to sample brain activity in the brain data sample (1 second). Next, the interpolated time series for each PCA component for each film were included as regressors for each individual’s brain activity (i.e. four regressors for each movie). Finally, we used FLAME as implemented in FSL to perform a group-level analysis across the three movie clips. In this analysis, we set the cluster-forming threshold at z = 3.1 and corrected for FWE by accounting for the number of voxels in the brain, the three movies we examined, and the four PCA components in each movie. This resulted in a corrected FWE p-value threshold in FSL of p < .0025.
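The interpolation step can be sketched as follows. The interpolation method is not specified above, so linear interpolation is assumed here purely for illustration, and the function name `interpolate_to_tr` and the toy values are hypothetical.

```python
def interpolate_to_tr(probe_times_s, probe_means, tr_s=1.0):
    """Linearly interpolate probe-level mean scores onto a regular TR grid.

    probe_times_s: ascending sampling moments in seconds (e.g. every 15 s).
    probe_means:   mean component score at each sampling moment.
    Returns (grid, values), one entry per TR between the first and last probe.
    """
    span = probe_times_s[-1] - probe_times_s[0]
    n_vols = int(round(span / tr_s)) + 1
    grid = [probe_times_s[0] + i * tr_s for i in range(n_vols)]

    values = []
    k = 0  # index of the probe interval that contains the current time point
    for t in grid:
        while k < len(probe_times_s) - 2 and t > probe_times_s[k + 1]:
            k += 1
        t0, t1 = probe_times_s[k], probe_times_s[k + 1]
        w = (t - t0) / (t1 - t0)
        values.append((1 - w) * probe_means[k] + w * probe_means[k + 1])
    return grid, values


# Toy example: three sampling moments, 15 s apart, resampled at TR = 1 s.
grid, regressor = interpolate_to_tr([75, 90, 105], [0.0, 1.5, 0.0])
print(len(regressor), regressor[15])
```

The resulting vector has one value per TR and could then be entered as a regressor alongside the other thought-pattern time series.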

Brain-space Analysis

To create the “brain space” coordinates, we first calculated a group-averaged timeseries for each movie. To do this, we z-scored each individual’s timeseries data and calculated the mean activity in each voxel at each TR across the whole sample, resulting in a group-averaged brain volume at each TR. Next, we applied a binarized mask to each group-averaged brain volume at each TR. This mask was generated based on the (cortical and subcortical) gradient maps openly available on Neurovault (https://identifiers.org/neurovault.collection:1598). These gradient maps were produced from the decomposition of the Human Connectome Project resting-state fMRI data [36]. Then, we calculated the Spearman rank correlation between each group-averaged per-TR brain map and each of the first five gradient maps. Consistent with published literature, the results of these correlations constitute the coordinates of each moment of the film in the 5D “brain space” [37]. An example of these coordinates is presented in the upper left panel of Figure 5. The code for this analysis is openly available at https://github.com/willstrawson/StateSpace (v1.0.0).
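For readers who want the computation in compact form, the following is a minimal, dependency-free sketch of the steps described above. It is not the code from the StateSpace repository; the function names and the flattened-list data layout (`subjects[s][t][v]`) are assumptions made for illustration.

```python
from math import sqrt


def zscore(series):
    """Z-score one voxel's time series (constant series map to zeros)."""
    n = len(series)
    mean = sum(series) / n
    sd = sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / sd if sd > 0 else 0.0 for x in series]


def group_average(subjects):
    """subjects[s][t][v] -> group-mean volume per TR after within-subject z-scoring."""
    n_sub, n_tr, n_vox = len(subjects), len(subjects[0]), len(subjects[0][0])
    mean = [[0.0] * n_vox for _ in range(n_tr)]
    for sub in subjects:
        for v in range(n_vox):
            z = zscore([sub[t][v] for t in range(n_tr)])
            for t in range(n_tr):
                mean[t][v] += z[t] / n_sub
    return mean


def rank(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # Undefined when either input is constant.
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def state_space_coordinates(volumes, gradients, mask):
    """One coordinate per gradient for each per-TR volume, within the mask."""
    keep = [i for i, m in enumerate(mask) if m]
    coords = []
    for vol in volumes:
        masked = [vol[i] for i in keep]
        coords.append([spearman(masked, [g[i] for i in keep]) for g in gradients])
    return coords
```

With five gradient maps passed as `gradients`, each TR yields a five-element coordinate vector, matching the 5D space described above.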

Comparison of the locations of each moment across the movie clip in the (top row) “Thought Space” and the “Brain Space.” Left to Right – 3D scatterplots of the coordinate locations of each thought pattern (“Episodic Knowledge,” “Verbal Detail,” and “Sensory Engagement”) and gradients 1-3 (Gradient 1 (Associated – Primary), Gradient 2 (Visual – Somato-motor), Gradient 3 (Frontoparietal – Default)) during Citizenfour, Little Miss Sunshine, and 500 Days of Summer. Observations in blue occur earlier during the film, and observations in red occur later in the film. The gradient maps (1-3) and thought pattern word clouds are presented on the right for reference.

Comparison of State Space- and Voxel-based analyses of “Sensory Engagement” and “Episodic Knowledge” with Gradients. Left to right – The barplots illustrate the associations for the significant models using Gradients 1-5 as explanatory variables and the thought patterns “Sensory Engagement” and “Episodic Knowledge” as dependent variables. We performed two spin tests to formally compare these results to those from the voxel space analysis (2,500 permutations). The spin tests revealed that the cluster of voxels associated with “Sensory Engagement” was located within the sensory regions of Gradient 1 to a degree unlikely to have occurred by chance, p = .018. In contrast, the location of the cluster of voxels associated with “Episodic Knowledge” on Gradient 4 was within the null distribution, p = .251. The locations of the relevant clusters in gradient parcel space are presented in the scatter plots (red points indicate the location of parcels from the relevant comparison).

Cognitive Decoding

Connectivity maps were uploaded to Neurovault [60] (https://neurovault.org/collections/13821/) and decoded using Neurosynth [43]. Neurosynth is an automated meta-analysis tool that uses text-mining approaches to extract terms from neuroimaging articles that typically co-occur with specific peak coordinates of activation. It can be used to generate a set of terms frequently associated with a spatial map. The results of cognitive decoding were rendered as word clouds using in-house scripts implemented in Python. We excluded terms referring to neuroanatomy (e.g., “inferior” or “sulcus”), as well as the second occurrence of repeated terms (e.g., “semantic” and “semantics”). The size of each word in the word cloud relates to the frequency of that term across studies.

Analysis of intrinsic functional connectivity using resting-state fMRI

Our analysis additionally used resting-state cohort data (see below), with the maps created from each of the four thought pattern time series regressors in the prior analysis serving as seeds. We used this seed-based analysis to examine whether different resting-state networks converge with the maps we generated from movie-watching.

Pre-processing

Pre-processing and statistical analyses of resting-state data were performed using the CONN functional connectivity toolbox V.20a (http://www.nitrc.org/projects/conn; [61]) implemented through SPM (Version 12.0) and MATLAB (Version 19a). For pre-processing, functional volumes were slice-time (bottom-up, interleaved) and motion-corrected, skull-stripped and co-registered to the high-resolution structural image, spatially normalized to the Montreal Neurological Institute (MNI) space using the unified-segmentation algorithm, smoothed with a 6 mm FWHM Gaussian kernel, and band-pass filtered (.008 - .09 Hz) to reduce low-frequency drift and noise effects. A pre-processing pipeline of nuisance regression included motion (twelve parameters: the six translation and rotation parameters and their temporal derivatives), scrubbing (outlier volumes were identified through the composite artifact detection algorithm ART in CONN with conservative settings, including scan-by-scan change in global signal z-value threshold = 3; subject motion threshold = 5 mm; differential motion and composite motion exceeding the 95th percentile in the normative sample) and CompCor components (the first five) attributable to the signal from white matter and CSF [62], as well as a linear detrending term, eliminating the need for global signal normalization [63, 64].

Seed Selection and Analysis

Intrinsic connectivity seeds were binarized masks derived from the voxel-space analysis using FLAME through FSL. We excluded all non-grey matter voxels that fell within these masks. We performed seed-to-voxel analyses convolved with a canonical hemodynamic response function for each seed. At the group level, analyses were conducted using CONN with cluster correction at p < .05 and a threshold of p-FDR = .001 (two-tailed) to define contiguous clusters.
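Conceptually, a seed-to-voxel analysis correlates the mean time series within a seed mask with the time series of every voxel. The sketch below illustrates that core computation for a single participant. It is not CONN's implementation: it omits the hemodynamic response convolution and all group-level statistics, and the function names, the Fisher z-transform step, and the clipping constant are assumptions made for illustration.

```python
from math import atanh, sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # A constant voxel has no defined correlation; report 0.0 in this sketch.
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def seed_to_voxel(data, seed_mask):
    """Fisher z-transformed correlation of the seed mean with every voxel.

    data[t][v]:   pre-processed signal at TR t, voxel v (one participant).
    seed_mask[v]: 1 for voxels inside the binarized seed, 0 otherwise.
    """
    n_tr, n_vox = len(data), len(data[0])
    seed_idx = [v for v in range(n_vox) if seed_mask[v]]
    seed_ts = [sum(data[t][v] for v in seed_idx) / len(seed_idx)
               for t in range(n_tr)]

    z_map = []
    for v in range(n_vox):
        r = pearson(seed_ts, [data[t][v] for t in range(n_tr)])
        r = max(-0.999999, min(0.999999, r))  # keep atanh finite at |r| = 1
        z_map.append(atanh(r))
    return z_map


# Toy data: 4 TRs x 4 voxels; the first two voxels form the seed.
toy = [[1, 1, 4, 0], [2, 2, 3, 1], [3, 3, 2, 0], [4, 4, 1, 1]]
print(seed_to_voxel(toy, [1, 1, 0, 0]))
```

In a full analysis, one such map per participant and seed would be carried forward to the group-level test.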