
Temporo-parietal cortex involved in modeling one’s own and others’ attention

  1. Arvid Guterstam (corresponding author)
  2. Branden J Bio
  3. Andrew I Wilterson
  4. Michael Graziano
  1. Department of Psychology, Princeton University, United States
  2. Department of Clinical Neuroscience, Karolinska Institutet, Sweden
Research Article
Cite this article as: eLife 2021;10:e63551 doi: 10.7554/eLife.63551

Abstract

In a traditional view, in social cognition, attention is equated with gaze and people track other people’s attention by tracking their gaze. Here, we used fMRI to test whether the brain represents attention in a richer manner. People read stories describing an agent (either oneself or someone else) directing attention to an object in one of two ways: either internally directed (endogenous) or externally induced (exogenous). We used multivoxel pattern analysis to examine how brain areas within the theory-of-mind network encoded attention type and agent type. Brain activity patterns in the left temporo-parietal junction (TPJ) showed significant decoding of information about endogenous versus exogenous attention. The left TPJ, left superior temporal sulcus (STS), precuneus, and medial prefrontal cortex (MPFC) significantly decoded agent type (self versus other). These findings show that the brain constructs a rich model of one’s own and others’ attentional state, possibly aiding theory of mind.

Introduction

Reconstructing someone else’s attentional state is of central importance in theory of mind (Baron-Cohen, 1997; Calder et al., 2002; Graziano, 2013). By identifying the object of someone else’s attention, and having some intuitive understanding of the complex dynamics and consequences of attention, one can reconstruct at least some of the other person’s likely thoughts, intentions, and emotions, and make predictions about that person’s behavior. Almost all work on how people reconstruct the attention of others has focused on gaze direction. For example, the human eye has a high contrast between pupil and sclera, possibly an adaptation for better gaze tracking (Kobayashi and Kohshima, 1997). The superior temporal sulcus in monkeys and humans may contain specialized neural circuitry for processing gaze direction (Hoffman and Haxby, 2000; Marquardt et al., 2017; Perrett et al., 1985; Puce et al., 1998; Wicker et al., 1998). Seeing a face gaze at an object automatically draws one’s own attention to the object (Friesen and Kingstone, 1998; Frischen et al., 2007). These and other findings show the importance of reconstructing gaze direction in social cognition.

To be adaptive in aiding theory of mind, however, a model of attention should be far more than a vector indicating gaze direction. We previously suggested that the human brain constructs a rich, dynamic, and predictive model of other people’s attention (Graziano, 2019; Graziano, 2013; Graziano and Kastner, 2011). The model should contain information about different types of attention, about the rapidity or sluggishness with which attention tends to move from item to item, about how external factors such as salience and clutter are likely to affect a person’s attention, and about how attention profoundly affects thought, memory, and behavior. In the proposal, that deeper model is constrained by incoming information, including gaze direction. However, other cues can also constrain the model. People rely on the other person’s body posture, on cues in the surrounding environment, on speech, and on social context. For example, blind people must be able to build models of other people’s attention without seeing the other person’s eyes. Likewise, during a phone conversation, we cannot see the other person and yet we intuitively understand whether that person is attending to what we have said or is distracted by her own words or by a salient event on her end of the line.

Several recent experiments provide evidence for an automatically constructed model of the attention of others that may go beyond merely registering gaze direction (Guterstam et al., 2019; Guterstam and Graziano, 2020a; Kelly et al., 2014; Pesquita et al., 2016; Randall and Guterstam, 2020; Vernet et al., 2019). For example, Pesquita et al., 2016 found that when participants watch an actor in a video attending to an object, the participants implicitly distinguish between whether the actor’s attention was drawn to the object exogenously (bottom-up or stimulus-driven attention), or whether the actor endogenously shifted attention to the object (top-down or internally driven attention). Exogenous and endogenous attention are the two principal ways in which selective attention moves between objects. They are emphasized in distinct cortical networks (the ventral and dorsal attention networks), and they influence the behavior of agents in profoundly different manners (Corbetta et al., 2008; Corbetta and Shulman, 2002; Posner, 1980; Shulman et al., 2010). The ability to distinguish between someone else’s exogenous and endogenous attention is therefore one example of how people may construct a rich, dynamic, and useful model of other people’s attention beyond merely encoding gaze direction or identifying the object of attention.

Inspired by the vignette-style tasks widely used in studies on theory of mind (Fletcher et al., 1995; Gallagher et al., 2000; Happé, 1994; Saxe and Kanwisher, 2003; Vogeley et al., 2001), in the present study, we used functional magnetic resonance imaging (fMRI) and multi-voxel pattern analysis (MVPA) to study brain activity in participants while they read brief stories about people’s attention (Figure 1). Some of the stories implied that attention was being attracted exogenously (‘Kevin walks into his closet and notices the bright red tie…’) and some stories implied that attention was being directed endogenously (‘Kevin walks into his closet and looks for the bright red tie…’). We also included analogous stories written in the first person, casting the subject of the experiment as the agent (‘You walk into your closet and notice the bright red tie…’). The study therefore used a 2 × 2 design (exogenous versus endogenous attention × self agent versus other agent). Finally, we included a fifth, control condition, consisting of nonsocial stories in which the agent was replaced by an inanimate object that, like attention, has a source and a target, such as a camera or a light source (‘In a closet, a light shines on a red tie…’).

Figure 1
Methods.

(A) Schematic timeline of the fMRI design. In each trial, subjects were presented with a short story for 10 s, describing a scene in which an agent attended to an object in the environment. A probe statement was then shown for 4 s, relating to either the story’s spatial context or object property, to which the subjects responded either true or false by button press. (B) The agent in the story was either the subject him-/herself (self) or another person (other), and directed attention to the object endogenously (internally driven attention) or exogenously (stimulus-driven attention), yielding a 2 × 2 factorial design of attention type × agent. We created 80 unique stories in four different versions, one for each condition. We made minimal changes to the wordings to keep the story versions as semantically similar as possible. Green highlighting indicates wording specifying agent, yellow highlighting indicates wording specifying attention type (colors not part of actual visual stimuli). For each story, each subject saw only one of the four versions (balanced across subjects). We also included a nonsocial control condition (20 unique stories based on a subset of the 80 social stories) in which the agent was replaced by a non-human object.
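
The caption states that each of the 80 stories existed in four versions and that each subject saw only one version per story, balanced across subjects, but it does not say how versions were assigned. The sketch below shows one way such counterbalancing could be done, assuming a simple Latin-square rotation; the function name `assign_versions` and the rotation rule are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter

CONDITIONS = ["self-endo", "self-exo", "other-endo", "other-exo"]
N_STORIES = 80


def assign_versions(subject_index, n_stories=N_STORIES, conditions=CONDITIONS):
    """Return a list mapping story index -> condition for one subject.

    Hypothetical Latin-square rotation: the version shown for a given
    story shifts by one condition for each successive subject.
    """
    k = len(conditions)
    return [conditions[(story + subject_index) % k] for story in range(n_stories)]


# Each subject sees every story once, with 20 stories per condition.
plan = assign_versions(subject_index=0)
counts = Counter(plan)

# Across any four consecutive subjects, a story appears in all four versions.
versions_of_story_0 = {assign_versions(s)[0] for s in range(4)}
```

Under this rule, any group of subjects whose size is a multiple of four sees each story version equally often.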

We made four predictions. Our first, central prediction was inspired by the Pesquita et al., 2016 study described above. We hypothesized that participants would encode the type of attention in the story (exogenous versus endogenous), and that this encoding would be evident in some subset of the areas classically involved in theory of mind. Previous experiments on theory of mind typically recruited a network of cortical areas including the temporoparietal junction (TPJ), the superior temporal sulcus (STS), the medial prefrontal cortex (MPFC), and the precuneus (Gallagher et al., 2000; Saxe and Kanwisher, 2003; van Veluw and Chance, 2014; Vogeley et al., 2001). This first prediction, that the social cognition network will encode the exogenous-versus-endogenous distinction, represents the main, novel contribution of this study. Previous studies have used MVPA to decode various aspects of other people’s mental states from activity in social brain areas, such as their beliefs (Koster-Hale et al., 2017), intentions (Koster-Hale et al., 2013), and perceptual source (Koster-Hale et al., 2014). To the best of our knowledge, this investigation is the first to test whether activity in social brain areas can decode other people’s attentional states.

Our second prediction was that participants would encode information about the agent in the story (self versus other), and that this encoding would again be evident in some subset of the areas classically involved in theory of mind. Self-versus-other encoding has been examined in previous studies, and found to be reflected in the theory-of-mind network (e.g. Northoff et al., 2006; Ochsner et al., 2004; Passingham et al., 2010; Qin and Northoff, 2011; van Veluw and Chance, 2014). This second prediction represents a test of whether our present paradigm, using subtle wording differences between similar sentences, can produce results consistent with previous findings.

Third, we predicted that participants would encode information associated with the interaction between the two factors. We predicted that at least some subset of the areas in the theory-of-mind network may encode the type of attention (exogenous versus endogenous) to a different extent in self-related stories as compared to other-related stories.

Fourth and finally, we tested for brain regions that encoded the distinction between social stories (with human agents) and nonsocial stories (with only non-agent objects). We predicted that this social-versus-nonsocial encoding would again be evident in the same network of brain regions noted above, that are known to be involved in theory of mind. This final analysis served as a control to check on the validity of the story stimuli and confirm that they engaged social cognition as expected.

Results

Behavioral results

Participants answered a simple true-or-false probe question after each story (e.g. ‘Emma is on a bus’), to ensure alertness throughout the experiment. Although the MRI results depended on the time interval during the reading of the story and not during the reading or answering of the probe question, the behavioral response to the probe question may give some indication of whether the story conditions were well balanced. If one type of story was more difficult to process, or caused the subjects to think more deeply about the character in the story, that difference may be reflected in a different accuracy or latency in responding to the subsequent probe question. However, no significant differences were observed in accuracy (F(3,93) = 0.15, p = 0.930, ANOVA) or latency (F(3,93) = 1.66, p = 0.181, ANOVA) among the four social story conditions (Table 1).

Table 1
Behavioral results.

Mean accuracy and latency for the probe question that was presented after each story, for each of the five experimental conditions. The mean accuracy and latency across all social story conditions are also reported.

Story type            Self-endo  Self-exo  Other-endo  Other-exo  Non-social  Social (all)
Accuracy (%)  Mean    94.2       93.4      93.1        93.4       90.0        93.6
              SEM     1.9        1.8       1.6         1.9        2.2         1.4
Latency (ms)  Mean    1613       1601      1667        1653       1957        1634
              SEM     54         56        46          56         63          49

When participants responded to the probe questions after non-social, control stories, versus when they responded to the probe questions after social stories, no significant difference in accuracy was found (t(31) = −1.83, p = 0.077), although as expected, a slightly longer latency was observed after non-social stories than after social stories (t(31) = 9.18, p < 0.001; see Table 1). The latency difference almost certainly arose because the probe statements in the non-social condition were on average two words longer than those in the social conditions (eight words versus six words): the character in the probe statements ("You …" or "Emma …") was replaced by an object that required more words to describe (e.g. "The surveillance camera …"). One might therefore expect that it took participants a little longer to read the non-social probe statements compared to the social ones. We suggest that this subtle difference in latency during the post-story question period is unlikely to have affected the comparison of MRI activity between social and non-social conditions, since the relevant MRI activity was evoked during the reading of the story, not during the reading and answering of the questions.

Prediction 1

We hypothesized that participants would encode the attentional state of the agents in the stories in enough detail to distinguish between endogenous and exogenous attention, even though the difference between the story types was extremely subtle – only a few words that very slightly altered the semantic meaning of the sentences. We made the strong prediction that decoding would be found within the set of brain areas typically included in the theory-of-mind cortical network. Figure 2 shows six ROIs within the theory-of-mind network, based on a meta-analysis of previous theory-of-mind studies (van Veluw and Chance, 2014). Figure 3A shows the results (see Table 2 and Figure 3—figure supplement 1 for more details). Decoding accuracy for endogenous versus exogenous stories was significantly above chance for the left TPJ, and the significance of the left TPJ decoding survived a multiple comparison correction for the six ROIs (mean decoding accuracy 52.9%, 95% CI 50.7–55.2, uncorrected p = 0.0046, FDR-corrected p = 0.0276). For the sake of a thorough evaluation, because different researchers have defined slightly different locations for the TPJ, we replicated the finding of a significant decoding in the left TPJ using three additional, previously reported theory-of-mind ROIs in the left TPJ (Mar, 2011; Molenberghs et al., 2016; Schurz et al., 2014), suggesting that the effect is robust (Figure 3B and Figure 3—figure supplement 2). The results therefore show that activity in the left TPJ allowed for significant decoding of the attentional state – exogenous versus endogenous – of agents in a story.

Figure 2
Regions of interest (ROIs).

Six ROIs were defined based on peaks reported in an activation likelihood estimation meta-analysis of 16 fMRI studies involving theory-of-mind reasoning (van Veluw and Chance, 2014). The ROIs consisted of 10-mm-radius spheres centered on peaks in the bilateral temporoparietal junction (TPJ) and superior temporal sulcus (STS), and two midline structures: the precuneus and medial prefrontal cortex (MPFC). Here, the TPJ and STS ROIs on the left side are shown.
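
To make the ROI definition concrete, the following minimal sketch enumerates the voxels of a 10-mm-radius sphere on a regular grid. The 2-mm isotropic voxel size and the center index are assumptions chosen for illustration; they are not the acquisition parameters or peak coordinates of the study, and the helper names are hypothetical.

```python
def sphere_offsets(radius_mm=10.0, voxel_mm=2.0):
    """Integer voxel offsets whose centers lie within radius_mm of the origin."""
    reach = int(radius_mm // voxel_mm)
    r2 = radius_mm ** 2
    offsets = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                # Squared distance in mm from the sphere center.
                if (i * i + j * j + k * k) * voxel_mm ** 2 <= r2:
                    offsets.append((i, j, k))
    return offsets


def sphere_roi(center_voxel, radius_mm=10.0, voxel_mm=2.0):
    """Set of voxel indices forming a spherical ROI around center_voxel."""
    ci, cj, ck = center_voxel
    return {(ci + i, cj + j, ck + k)
            for i, j, k in sphere_offsets(radius_mm, voxel_mm)}


# Hypothetical peak location in voxel indices; not a coordinate from the study.
roi = sphere_roi(center_voxel=(30, 40, 25))
```

With these assumed settings the sphere contains 515 voxels, which gives a sense of how many features a classifier restricted to one such ROI would see.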

Figure 3 with 9 supplements
Decoding attention type, agent, and the interaction between them, in six brain areas.

For definition of ROIs, see Figure 2. Each point shows mean decoding accuracy. Error bars show SEM. Red horizontal line indicates chance level decoding. Significance indicated by * (p<0.05) and ** (p<0.01), based on permutation testing (all significant p values also survived FDR correction for multiple comparisons across all six ROIs [all significant corrected ps <0.05]). (A) The ability of a classifier, trained on BOLD activity patterns within each ROI, to decode endogenous (endo) versus exogenous (exo) attention. (B) To test the robustness of the endo-versus-exo decoding in the left TPJ, we replicated the results in three ROIs derived from theory-of-mind neuroimaging meta-analyses (Mar, 2011; Molenberghs et al., 2016; Schurz et al., 2014) other than the one used for the main analysis (van Veluw and Chance, 2014) (see Figure 3—figure supplement 2 for details). (C) The cluster shown had the highest decoding accuracy in the whole-brain, searchlight analysis, for the endo-versus-exo comparison. No clusters in this analysis survived brain-wide correction for multiple comparisons. We here report clusters surviving the conventional uncorrected voxelwise threshold p<0.001, for purely descriptive purposes. See Figure 3—figure supplement 3 and Supplementary file 1 for details. (D) Decoding accuracy for agent (self versus other). (E) Decoding accuracy for the interaction between type of attention and agent. (F) The ability of a classifier, trained to discriminate attention type in self stories, to decode attention type in other stories, and vice versa (i.e. two-way cross-classification), based on activity patterns in the left TPJ ROI.

Table 2
Decoding attention type, agent, and the interaction between the two, within the six ROIs.

For definition of ROIs, see Figure 2. Mean decoding accuracy (%), 95% confidence interval (based on bootstrap distribution), and p value (based on permutation testing) are shown for each of the six ROIs. Results shown for decoding endogenous (endo) versus exogenous (exo) attention type, self versus other agent type, and the interaction between the two variables. * indicates significant p values that survived correction for multiple comparisons across all six ROIs (FDR-corrected p<0.05).

                                                       L TPJ      R TPJ      L STS      R STS      MPFC       Precuneus
Endo vs. Exo                       Mean accuracy       52.9%      51.4%      50.4%      48.0%      49.5%      50.2%
                                   95% CI              50.7–55.2  49.1–53.9  47.8–52.8  45.9–50.1  47.6–51.4  48.5–51.8
                                   P value             0.0046*    0.1148     0.3518     0.9547     0.6439     0.4428
Self vs. Other                     Mean accuracy       53.0%      51.0%      52.3%      51.3%      52.6%      52.7%
                                   95% CI              50.1–55.6  48.5–53.4  50.6–54.1  48.9–54.1  50.5–55.0  50.4–55.0
                                   P value             0.0053*    0.1974     0.0204*    0.1241     0.0105*    0.0099*
(Self vs. Other) × (Endo vs. Exo)  Mean accuracy diff  1.6%       1.5%       2.0%       −3.0%      2.5%       0.6%
                                   95% CI              −2.7–6.3   −3.6–6.5   −2.7–6.3   −7.4–1.1   −2.3–6.7   −5.5–5.5
                                   P value             0.2430     0.2639     0.1967     0.8944     0.1414     0.3900
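
The Table 2 legend says that p values came from permutation testing and confidence intervals from a bootstrap distribution, but this section does not give the procedure. The sketch below illustrates the general idea at the group level, assuming a sign-flip permutation test against chance and a percentile bootstrap over subjects. Both are common choices that may differ from what the authors ran, and the accuracies are synthetic values invented for the example.

```python
import random

CHANCE = 0.5


def mean(values):
    return sum(values) / len(values)


def sign_flip_p(accuracies, n_perm=5000, rng=None):
    """One-sided p value for mean accuracy > chance via random sign flips."""
    rng = rng or random.Random(0)
    devs = [a - CHANCE for a in accuracies]
    observed = mean(devs)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each subject's deviation is equally likely +/-.
        flipped = [d if rng.random() < 0.5 else -d for d in devs]
        if mean(flipped) >= observed:
            hits += 1
    # Add one so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(accuracies, n_boot=5000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for the group mean, resampling subjects."""
    rng = rng or random.Random(1)
    n = len(accuracies)
    means = sorted(
        mean([rng.choice(accuracies) for _ in range(n)]) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic per-subject accuracies (invented for illustration, not study data).
accs = [0.55] * 24 + [0.47] * 8
p_value = sign_flip_p(accs)
ci_low, ci_high = bootstrap_ci(accs)
```

An interval whose lower bound sits above 50% and a small permutation p value express the same conclusion in two ways, which is how the paired CI and P value rows of Table 2 can be read.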

We then used a searchlight analysis over the whole brain to test whether any further areas may have significantly decoded the endogenous-versus-exogenous distinction. It should be noted that an exploratory searchlight analysis, compared to ROI analyses based on strong predictions, is much more statistically conservative because of the brain-wide correction for multiple comparisons. Its usefulness is that it may reveal any cluster of very strong decoding that was missed by the more sensitive analysis restricted to the ROIs. We found no brain-wide significant clusters of decoding for the endogenous versus exogenous distinction. However, four clusters survived the uncorrected p<0.001 threshold, and are reported in a purely descriptive manner in Figure 3—figure supplement 3 and Supplementary file 1. The brain-wide searchlight peak was located in the left posterior STS (decoding accuracy 53.7%, t = 4.21, p<0.001 uncorrected; Figure 3C), at a distance of 20 mm from the center of the left TPJ ROI, and coincided with the posterior TPJ (TPJp) subregion as defined by Bzdok et al., 2013 and Mars et al., 2012.

To control for potential univariate effects that could drive classifier performance, we explored the endogenous > exogenous and exogenous > endogenous univariate contrasts, which did not reveal any significant activity within the ROIs (Figure 3—figure supplement 9) or anywhere else in the brain, not even at the uncorrected threshold p<0.001 (Supplementary file 5). These findings are compatible with previous studies (e.g. Hassabis et al., 2009) that have demonstrated the superiority of pattern-sensitive multivariate analyses compared with conventional univariate approaches for detecting differences in activity between conditions with highly similar macroscopic characteristics.

In addition to these planned analyses, we explored the endogenous-versus-exogenous decoding within dorsal attention network regions. This analysis was motivated by an alternative hypothesis: people might simulate the act of attention orienting when reading the exogenous and endogenous stories, and thus activate the corresponding ventral (exogenous) attention network, to which the TPJ belongs, and dorsal (endogenous) attention network in a ‘mirror-neuron-like’ fashion (see Discussion for details). However, this control analysis revealed no significant decoding in any of the dorsal attention network ROIs (Figure 3—figure supplement 6).

Prediction 2

We hypothesized that participants would process the distinction between the two types of agent in the stories (self versus other). We made the strong prediction that decoding would be found within the same set of ROIs in the theory-of-mind cortical network. Figure 3D shows the results (see Table 2 and Figure 3—figure supplement 1 for more details). Decoding accuracy for self versus other stories was significantly above chance, and survived a multiple comparisons correction, for the left TPJ (mean decoding accuracy 53.0%, 95% CI 50.1–55.6, uncorrected p = 0.0053, FDR-corrected p = 0.0210), left STS (mean decoding accuracy 52.3%, 95% CI 50.6–54.1, uncorrected p = 0.0204, FDR-corrected p = 0.0306), MPFC (mean decoding accuracy 52.6%, 95% CI 50.5–55.0, uncorrected p = 0.0105, FDR-corrected p = 0.0210), and precuneus (mean decoding accuracy 52.7%, 95% CI 50.4–55.0, uncorrected p = 0.0099, FDR-corrected p = 0.0210). These results confirm that the present paradigm, using stories that are subtly different from each other, can obtain social cognitive results that are consistent with previous findings.
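
The relation between the uncorrected and FDR-corrected p values can be checked with a short worked example. The text says only "FDR"; the sketch below assumes the Benjamini-Hochberg procedure applied across the six ROIs, which reproduces the corrected values reported for both the endogenous-versus-exogenous and the self-versus-other rows of Table 2. The function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


ROIS = ["L TPJ", "R TPJ", "L STS", "R STS", "MPFC", "Precuneus"]

# Uncorrected p values copied from Table 2.
endo_vs_exo = [0.0046, 0.1148, 0.3518, 0.9547, 0.6439, 0.4428]
self_vs_other = [0.0053, 0.1974, 0.0204, 0.1241, 0.0105, 0.0099]

for name, row in [("endo vs. exo", endo_vs_exo), ("self vs. other", self_vs_other)]:
    adjusted = benjamini_hochberg(row)
    print(name, {roi: round(q, 4) for roi, q in zip(ROIS, adjusted)})
```

The left TPJ, MPFC, and precuneus share the corrected value 0.0210 because the adjustment takes a running minimum over ranks: the third-smallest p value (0.0105 × 6 / 3 = 0.0210) caps the two smaller ones.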

Prediction 3

We hypothesized that areas in the theory-of-mind network would not only encode the distinction between endogenous and exogenous attention, but do so to a significantly different extent in self-related stories than in other-related stories. However, the results showed no significant interaction in any of the ROIs (Figure 3E, Table 2, and Figure 3—figure supplement 1). Thus, we found no support for prediction 3.

An alternative hypothesis is that attention type is encoded similarly in self and others. In a post hoc analysis, we focused on the left TPJ which was the only ROI that showed significant attention type decoding (Prediction 1), and tested for overlap in attention encoding in self and others using a two-way cross-classification analysis (see Figure 3F and Figure 3—figure supplement 7). In this analysis, one classifier was trained to discriminate endogenous versus exogenous self-stories and tested on other-stories, and another classifier was trained to discriminate endogenous versus exogenous other-stories and tested on self-stories, from which an average cross-classification decoding accuracy for the left TPJ was obtained. Endogenous-versus-exogenous decoding significantly generalized across self-stories and other-stories (mean decoding accuracy 51.9%, 95% CI 49.9–54.1, p=0.0393), suggesting at least some degree of overlap in the encoding of attention in others and in oneself.
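
The two-way cross-classification logic can be sketched as follows. This is a toy illustration under stated assumptions: a nearest-centroid rule stands in for whatever classifier the authors used (not specified in this section), and the "voxel patterns" are synthetic Gaussian data in which the endogenous/exogenous difference is deliberately shared between self and other stories. All function names are hypothetical.

```python
import random


def centroid(rows):
    """Mean pattern across trials (one value per voxel)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(patterns, labels):
    """Nearest-centroid stand-in classifier: one mean pattern per label."""
    return {lab: centroid([p for p, t in zip(patterns, labels) if t == lab])
            for lab in set(labels)}


def accuracy(model, patterns, labels):
    predicted = [min(model, key=lambda lab: sq_dist(model[lab], p))
                 for p in patterns]
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)


def cross_classify(self_x, self_y, other_x, other_y):
    """Average of train-on-self/test-on-other and train-on-other/test-on-self."""
    a = accuracy(train(self_x, self_y), other_x, other_y)
    b = accuracy(train(other_x, other_y), self_x, self_y)
    return (a + b) / 2


def fake_trials(rng, n_per_class=20, n_voxels=8, shift=3.0):
    """Synthetic patterns in which 'endo' and 'exo' differ on every voxel."""
    x, y = [], []
    for label, mu in (("endo", shift), ("exo", -shift)):
        for _ in range(n_per_class):
            x.append([rng.gauss(mu, 1.0) for _ in range(n_voxels)])
            y.append(label)
    return x, y


rng = random.Random(0)
self_x, self_y = fake_trials(rng)
other_x, other_y = fake_trials(rng)
cross_acc = cross_classify(self_x, self_y, other_x, other_y)
```

Because the synthetic effect is large and identical in both story sets, cross-classification here is near ceiling. In real data, above-chance generalization of this kind is what suggests a shared code, whereas chance-level generalization alongside good within-set decoding would point to distinct codes.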

Prediction 4

Finally, we asked whether the activity in the theory-of-mind network would distinguish between social stories and nonsocial stories. This final analysis served as a control to check the validity of the story stimuli and confirm that they engaged social cognition as expected. We expected a signal of much greater magnitude in this analysis than in the analyses described above. The reason is that, as noted above, the types of social stories differed from each other by only a few words, and were nearly identical in semantic content; thus any brain signal reflecting those differences is expected to be subtle. The distinction between social and nonsocial stories, however, was much greater semantically, and therefore the evidence of decoding in the brain is expected to be of greater magnitude.

Figure 4 shows the results (see Table 3 and Figure 4—figure supplement 2 for more details). The results are separated into six ROIs, and for each ROI, separated into four individual analyses, corresponding to each of the four main social conditions contrasted with the nonsocial control. Decoding accuracy was significantly greater than chance in almost all analyses across the six ROIs. The right STS showed the least consistent evidence of decoding. The TPJ bilaterally and the precuneus showed the most consistent evidence of decoding. These results show strong evidence of decoding of the social versus nonsocial stimuli in the known theory-of-mind, cortical network.

Figure 4 with 2 supplements
Decoding social versus nonsocial stories.

The ability of a classifier, trained on BOLD activity patterns within each of the six ROIs, to decode each of the four social story conditions (endogenous-self, exogenous-self, endogenous-other, and exogenous-other) versus the nonsocial control. Each bar shows mean decoding accuracy, error bars show SEM, red horizontal line shows chance level decoding. Significance indicated by * (p<0.05), ** (p<0.01), and *** (p<0.001) based on permutation testing (all but one of the significant p values also survived FDR correction for multiple comparisons across all six ROIs; see Table 3 for numerical details).

Table 3
Decoding social versus nonsocial stories within the six ROIs.

For definition of ROIs, see Figure 2. Mean decoding accuracy (%), 95% confidence interval (based on bootstrap distribution), and p value (based on permutation testing) are shown for each of the six ROIs. Results shown for each of four social story conditions (endogenous-self, exogenous-self, endogenous-other, and exogenous-other) versus the nonsocial control. * indicates significant p values that survived correction for multiple comparisons across all six ROIs (FDR-corrected p<0.05).

                                          L TPJ      R TPJ      L STS      R STS      MPFC       Precuneus
Endo-Self vs. nonsocial   Mean accuracy   58.5%      56.7%      53.0%      50.9%      55.2%      54.6%
                          95% CI          55.5–61.6  53.5–59.8  49.3–56.8  47.5–54.0  52.3–58.0  51.5–57.7
                          p value         0.0001*    0.0001*    0.0338*    0.2703     0.0026*    0.0027*
Exo-Self vs. nonsocial    Mean accuracy   60.7%      56.5%      56.3%      53.6%      54.5%      55.7%
                          95% CI          57.0–64.4  53.7–58.8  53.5–59.0  50.5–56.6  51.7–57.1  51.9–59.4
                          p value         0.0001*    0.0002*    0.0001*    0.0179*    0.0044*    0.0001*
Endo-Other vs. nonsocial  Mean accuracy   58.4%      52.7%      53.5%      52.2%      53.4%      55.4%
                          95% CI          55.9–60.9  49.6–55.9  51.1–55.9  48.8–55.0  49.5–57.7  52.6–58.2
                          p value         0.0001*    0.0497     0.0147*    0.0969     0.0219*    0.0014*
Exo-Other vs. nonsocial   Mean accuracy   56.7%      54.2%      53.1%      51.5%      53.4%      55.2%
                          95% CI          53.8–59.8  51.1–57.3  50.4–55.9  49.1–54.0  49.8–56.7  52.0–58.5
                          p value         0.0002*    0.0065*    0.0319*    0.1901     0.0197*    0.0012*

Discussion

This study analyzed brain activity while people read stories about agents attending to objects in the environment. We examined whether specific brain areas could decode information about the type of attention referenced in the story (exogenous versus endogenous), and about the type of agent in the story (whether the agent was the subject reading the story or a different person). We hypothesized that if the brain constructs a model of attentional state that is used in social cognition, then areas of the brain known to be involved in social cognition should be able to distinguish between the two types of attention, exogenous and endogenous, represented in the stories. Our main analysis confirmed the hypothesis: the left TPJ showed significant decoding of information about endogenous versus exogenous attention. The finding is, arguably, remarkable, given that the semantic and wording difference between the two story types is extremely subtle.

These results support a new and growing body of evidence that the human brain constructs a model of attention to aid in theory of mind (Guterstam et al., 2019; Guterstam and Graziano, 2020b; Kelly et al., 2014; Pesquita et al., 2016; Vernet et al., 2019). The model includes information about attention that is deeper and more complex than just gaze direction or an identification of the attended object. At least one aspect of attention incorporated into the model appears to be the manner in which attention moves to an object: endogenously (internally directed) or exogenously (externally induced). The processing of the model appears to engage the theory-of-mind cortical network. The left TPJ showed the strongest decoding result.

Why should the TPJ in particular have shown involvement in decoding someone else’s attention state, rather than areas in the STS that are known for responding to the gaze direction of others (Marquardt et al., 2017)? As noted in the Introduction, we suggest that modeling someone else’s attention, and processing someone else’s gaze direction, are not the same. Gaze is only one of many cues that can be integrated to constrain a model of someone else’s attention. In Kelly et al., 2014, in a task in which participants judged the attention state of a cartoon, the TPJ was not active in association with gaze direction, and also not active in association with facial expression; but both left and right TPJ were active in association with the integration of the two cues, gaze and expression, in order to judge the cartoon’s attentional state. The current finding of significant decoding in the left TPJ is consistent with that prior finding.

It is not clear why the left hemisphere showed stronger activity than the right in the present task. Social cognition tasks often activate the TPJ bilaterally, but typically engage the right TPJ more (Saxe and Wexler, 2005). One speculation is that some aspect of the present task, perhaps explicitly instructing people that the task was a test of reading comprehension, caused an emphasis on linguistic processing, biasing the activity toward the left hemisphere. Other explanations for the left-hemisphere bias may also be possible.

One alternative interpretation of the present finding in the TPJ is that people simulated attention orienting when reading the stories, and activated the corresponding dorsal (endogenous) and ventral (exogenous) attention networks in a ‘mirror-neuron-like’ fashion. Under this hypothesis, the TPJ, as part of the ventral attention network, was activated when reading the exogenous stories, and the dorsal attention network (consisting of the frontal eye fields, intraparietal sulcus, and middle temporal complex) should be active in association with reading the endogenous stories (Corbetta et al., 2008; Corbetta and Shulman, 2002; Fox et al., 2006). However, a control analysis showed that none of these dorsal attention network regions significantly decoded information concerning the attention type in the stories (Figure 3—figure supplement 6), which is incompatible with the attention-orienting simulation account of the TPJ result. Our findings are thus more consistent with the notion that the TPJ is involved in constructing a rich, perhaps implicit model of attention that may assist in social cognition (Graziano and Kastner, 2011; Guterstam et al., 2020; Guterstam and Graziano, 2020a; Kelly et al., 2014). We are not suggesting, however, that the possible involvement of the TPJ in modeling one’s own and others’ attention, and the involvement of the TPJ in the control of attention, is a coincidence. We suggested in previous work (Graziano and Webb, 2015) that a rich model of attention may be of benefit in the control of attention.

A second, alternative interpretation of the present results is that the exogenous sentences might make the reader focus more on the object in the story (thus engaging less mentalizing), whereas the endogenous sentences might make the reader focus more on the character’s mental act of attending (thus engaging more mentalizing). In that interpretation, the TPJ shows a significant decoding result because it becomes more active in the endogenous story type, due to more mentalizing cognition. Although this possible explanation of the TPJ decoding results is difficult to exclude, we believe it is unlikely. First, if endogenous stories simply engaged more mentalizing than exogenous stories, and thus caused more activity in the TPJ, our univariate analysis should have found more activity in the TPJ to endogenous stories. It did not (see Supplementary file 5). We found no evidence that a simple difference in the amount of mentalizing between exogenous and endogenous stories affected the overall amount of activity in the TPJ or anywhere else in the brain. Although the pattern of activity in the left TPJ clearly contains information about the exogenous-endogenous distinction, that information is not in the form of a simple increase in activity during the endogenous stories. Second, it is not clear that subjects should mentalize more in the endogenous story type than in the exogenous story type. When reading that ‘Alice looks for the red ball,’ subjects might wonder why, and think about Alice’s mental state. When reading that ‘Alice notices the red ball,’ subjects might again wonder why the ball was of enough significance to be capturing her notice, or what mental reaction she experiences when noticing that apparently unanticipated object. Both story types invite mentalizing, although one focuses on an endogenous process of attention and the other on exogenous attention.
Third, the probe task was designed to limit readers from speculating too much about the deeper meaning of the stories by asking only about literal details and never about the internal states of the characters. Thus, the subjects had an incentive to focus on the same physical aspects of the story in both the exogenous and endogenous conditions. Fourth, and finally, prior studies suggest that if an experimental and a control story both include human agents with potential thoughts, then subjects automatically mentalize in both story types, and the contrast between the story types will tend not to show a differential degree of activity in theory-of-mind cortical areas, especially in the TPJ (Saxe and Kanwisher, 2003). It is likely that, in our case, subjects cannot help mentalizing about the characters in both story types. We acknowledge that any story stimuli are always so complex that a variety of unintended, subtle differences may affect the results, but for the reasons listed here, we argue that a difference in the overall amount of mentalizing during endogenous versus exogenous stories is unlikely to explain the present results in the TPJ. The results point to the left TPJ processing different information in the exogenous and endogenous story types, but not a difference in overall amount of activity.

In addition to the endogenous-versus-exogenous comparisons, we also analyzed brain areas involved in self-versus-other encoding. We found evidence of self-versus-other encoding in the left TPJ, left STS, MPFC, and precuneus. The MPFC and precuneus have been previously implicated in self processing (Northoff et al., 2006; Ochsner et al., 2004; Passingham et al., 2010; Qin and Northoff, 2011; van Veluw and Chance, 2014), and the TPJ is consistently activated in fMRI studies involving self-recognition (van Veluw and Chance, 2014) and first-person perspective taking (Ionta et al., 2011). These results lend confidence to the present paradigm, showing that even the very subtle differences between our story stimuli were able to reveal cortical results consistent with previous studies.

Contrary to our prediction 3, we found no evidence for an interaction between attention type and agent type decoding in the theory-of-mind ROIs. (As noted in the Supplementary Information, during an exploratory searchlight analysis, we also found no evidence of an interaction effect in any other brain area.) Although it is possible that our paradigm was simply not sensitive enough to detect subtle interaction effects, these results suggest that the brain encodes information about attention type in a similar manner in the self and in others. In light of previous results showing that the attribution of sensory awareness to others and to oneself have a shared representation in the TPJ (Kelly et al., 2014), we directly tested this notion in a post hoc cross-classification analysis focusing on the left TPJ, which was the only region showing significant decoding of attention type. We found that endogenous-versus-exogenous decoding significantly generalized across self-stories and other-stories (Figure 3—figure supplement 7), suggesting that there is an overlap in brain mechanisms that participate in the encoding of attention in others and in encoding of attention in the self.

Finally, significant decoding of the social-versus-nonsocial distinction was obtained across most of the theory-of-mind ROIs. This finding confirmed the validity of the paradigm, and was expected based on previous experiments on the theory-of-mind network (Gallagher et al., 2000; Saxe and Kanwisher, 2003; van Veluw and Chance, 2014; Vogeley et al., 2001).

The use of a story-reading paradigm allowed us to systematically manipulate the kind of attention represented in the stimulus while keeping other experimental factors close to identical. The endogenous and exogenous story versions differed only with respect to a few key words specifying the type of attention, while the rest of the stories were semantically the same. The absence of any univariate effect within the ROIs or anywhere else in the brain, even at a liberal uncorrected threshold, confirms that the stimuli were well matched (Supplementary file 5). To avoid cognitive bias or expectation effects, the probe task performed by the subjects concerned details about the spatial context or the objects in the stories, effectively distracting subjects from the description of attention. We also speculate that this design of the probe task minimized the theoretical risk of the reader focusing on slightly different aspects of the story in the endogenous and exogenous conditions (e.g. focusing more on the mental act of attending in endogenous stories, and more on the attention-grabbing object in the exogenous stories), because the task ‘forced’ the reader to focus on processing the spatial context and the object descriptions equally in both conditions. A post-scan questionnaire confirmed that none of the subjects came close to figuring out the purpose of the experiment (which they had been told was a ‘Reading Comprehension Experiment’). The finding of brain areas that significantly decoded the type of attention, despite the distinction between endogenous and exogenous attention being subtle and task-irrelevant, suggests that the human brain automatically, and possibly also implicitly (Pesquita et al., 2016), constructs a model of an agent’s attention that specifies at least some dynamic aspects of how that attention is moving around the scene.

Materials and methods

Subjects

Thirty-two healthy human volunteers (12 females, 30 right-handed, aged 18–52, normal or corrected to normal vision) participated in the study. Subjects were recruited either from a paid subject pool, receiving 40 USD for participation, or from among Princeton undergraduate students, who received course credits as compensation. In the subject recruitment material, the experiment was described as a ‘Reading Comprehension Study.’ All subjects provided informed consent and all procedures were approved by the Princeton Institutional Review Board.

Experimental setup

Before scanning, subjects were instructed and then shown three sample trials (which were not part of the stories presented in the subsequent experiment) on a laptop computer screen. All subjects gave the correct response to all three trials on the first try, indicating they had understood the instructions adequately. During scanning, the subjects lay comfortably in a supine position on the MRI bed. Through an angled mirror mounted on top of the head coil, they viewed a translucent screen approximately 80 cm from the eyes, on which visual stimuli were projected with a Hyperion MRI Digital Projection System (Psychology Software Tools, Sharpsburg, PA, USA) with a resolution of 1920 × 1080 pixels. A PC running MATLAB (MathWorks, Natick, MA, USA) and the Psychophysics Toolbox (Brainard, 1997) was used to present visual stimuli. A right-hand 5-button response unit (Psychology Software Tools Celeritas, Sharpsburg, PA, USA) was strapped to each subject’s right wrist. Subjects used the right index finger button to indicate a true response, and the right middle finger to indicate a false response during the probe phase of each trial.

Experimental conditions and stimuli

Five experimental conditions were included. Subjects were presented with short stories (2–3 sentences, average word count = 24) describing a scene in which an agent, which was either the subject him-/herself (self) or another person (other), directed attention to something in the external world endogenously (e.g., ‘X is attentively looking for Y’) or exogenously (e.g., ‘X’s attention is captured by Y’). These four conditions made up a 2 × 2 factorial design: attention type (endogenous versus exogenous) × agent (self versus other). In addition, we included a control condition featuring stories in which the agent was substituted by a non-human object. In each trial, after a 9–11 s inter-trial interval, the story was presented for 10 s in easily readable, white text on a black background, at the center of the screen, after which a probe statement was shown for 4 s, to which the subjects responded either true or false by button press. See Figure 1 for details, and Supplementary file 6 for all stories.

Each subject ran 100 trials and thus saw 100 stories: 80 social stories and 20 non-social control stories. The 80 social stories were constructed as follows. We began with 80 unique short stories. For each story, four versions were constructed, one for each of the factorial conditions (Figure 1B). To keep the story versions as semantically similar as possible, we made minimal changes to the wordings. To distinguish the self and other versions, we substituted the word ‘you’ with a name (e.g. ‘Karen’) and the word ‘your’ with ‘his’ or ‘her’. The names in the stories were selected from a list of the 100 most popular given names for male and female babies born during the years 1919–2018 in the United States, which is published by the Social Security Administration (https://www.ssa.gov/oact/babynames/decades/century.html). Half of the names were masculine, half feminine. To distinguish the endogenous and exogenous story versions, we used different wording for the part of the story where the agent (X) is related to the object (Y). In the endogenous versions, we used formulations such as: ‘X is trying to find Y,’ ‘X is trying to spot Y,’ or ‘X is looking attentively for Y.’ In the exogenous versions, we used formulations such as: ‘X’s eyes are drawn to Y,’ ‘X’s gaze is captured by Y,’ or ‘X’s attention is captured by Y.’ We matched the average number of words across all four conditions (24 words). The number of stories that included the words ‘attention’ or ‘attentively’ was balanced between the endogenous and exogenous categories (43 stories in each). Among the 80 stories, for each subject, 20 were randomly selected to be used in the endogenous-self version; 20 in the endogenous-other version; 20 in the exogenous-self version; 20 in the exogenous-other version. Thus, for the example story shown in Figure 1B, each subject saw only one of the four versions. 
In this manner, each subject saw 80 social stories, 20 of each type, balanced for as many properties as possible other than the two factors that were manipulated.
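
The per-subject assignment of stories to versions described above can be sketched in a few lines of Python. This is an illustration only, not the authors' stimulus code (which ran in MATLAB); the function name, the version labels, and the use of a seeded generator are assumptions made for the example.

```python
import random

# Hypothetical labels for the four factorial versions of each social story.
VERSIONS = ["endogenous-self", "endogenous-other",
            "exogenous-self", "exogenous-other"]

def assign_story_versions(n_stories=80, seed=None):
    """Randomly assign each story index to exactly one version,
    with an equal number of stories per version."""
    rng = random.Random(seed)
    story_ids = list(range(n_stories))
    rng.shuffle(story_ids)
    per_version = n_stories // len(VERSIONS)
    assignment = {}
    for v, version in enumerate(VERSIONS):
        # Consecutive blocks of the shuffled list go to successive versions.
        for story in story_ids[v * per_version:(v + 1) * per_version]:
            assignment[story] = version
    return assignment
```

Calling `assign_story_versions(seed=subject_number)` once per subject would give every subject a different but reproducible split, with each story appearing in only one of its four versions.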

Finally, we constructed 20 additional stories for the non-social control condition (Figure 1B). To keep the control stories as semantically similar as possible to the social stories, we based them on a subset of the 80 original stories. Crucially, the agent in the original story was substituted with a non-human object, such as a camera or a spotlight, that has a source and a target just as attention does. For instance, the original story, "You are in a bike shop, and numerous bikes hang on one of the walls. You are attentively looking for that red Italian sports bike", was adapted to the non-social condition by substituting the agent with a spotlight: "In a bike shop, on one of the walls, hangs numerous bikes. A bright spotlight is shining on a red Italian sports bike". The average number of words of the non-social stories (24 words, standard deviation = 3) was matched with the attention stories.

The purpose of the probe statement at the end of each trial was to ensure that subjects carefully read the stories. Each statement described one detail of the preceding story that could be either true or false. We restricted the probe statements to the spatial context of the story (place probe: e.g. ‘Emma is on a bus’) or the object being described (object probe: e.g. ‘The Van Gogh painting has sunflowers’) in order to avoid alerting subjects to the focus of the experiment on theory of mind and attention. Half of the probe statements were place probes and half object probes. Within both the place and the object probes, half were true and half were false. The probe was on screen for 4 s, during which subjects were required to indicate whether the statement was true or false by button press.

The experiment consisted of 10 runs of approximately 4 min each. In each run, the five conditions were repeated two times, yielding a total of 10 trials per run. The trial order was randomized, with the limitation that two consecutive trials could not belong to the same condition. Each run included 18 s of baseline before the onset of the first trial and 12 s of baseline after the offset of the last trial.
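
One simple way to obtain a trial order that satisfies the constraint above is rejection sampling: shuffle the ten trials of a run and reshuffle until no two neighbours share a condition. The sketch below assumes this approach; the source does not say how the constrained randomization was implemented, and the condition labels are placeholders.

```python
import random

# Placeholder labels for the five conditions (four social, one control).
CONDITIONS = ["endo-self", "endo-other", "exo-self", "exo-other", "nonsocial"]

def randomize_run(conditions=CONDITIONS, repeats=2, seed=None):
    """Return a shuffled trial list in which no condition occurs
    on two consecutive trials."""
    rng = random.Random(seed)
    trials = [c for c in conditions for _ in range(repeats)]
    while True:
        rng.shuffle(trials)
        # Accept the shuffle only if every adjacent pair differs.
        if all(a != b for a, b in zip(trials, trials[1:])):
            return list(trials)
```

With five conditions repeated twice, a valid order turns up after a handful of shuffles on average, so the loop is cheap.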

Post-scan questionnaire

At the end of the scanning session, subjects were asked what they thought the purpose of the experiment was and what they thought it was testing.

fMRI data acquisition

Functional imaging data were collected using a Siemens Prisma 3T scanner equipped with a 64-channel head coil. Gradient-echo T2*-weighted echo-planar images (EPI) with blood-oxygen-level-dependent (BOLD) contrast were used as an index of brain activity (Logothetis et al., 2001). Functional image volumes were composed of 54 near-axial slices with a thickness of 2.5 mm (with no interslice gap), which ensured that the entire brain excluding cerebellum was within the field-of-view in all subjects (54 × 78 matrix, 2.5 mm × 2.5 mm in-plane resolution, TE = 30 ms, flip angle = 80°). Simultaneous multi-slice (SMS) imaging was used (SMS factor = 2). One complete volume was collected every 2 s (TR = 2000 ms). A total of 1300 functional volumes were collected for each participant, divided into 10 runs (130 volumes per run). The first three volumes of each run were discarded to account for non-steady-state magnetization. A high-resolution structural image was acquired for each participant at the end of the experiment (3D MPRAGE sequence, voxel size = 1 mm isotropic, FOV = 256 mm, 176 slices, TR = 2300 ms, TE = 2.96 ms, TI = 1000 ms, flip angle = 9°, iPAT GRAPPA = 2). At the end of each scanning session, matching spin echo EPI pairs (anterior-to-posterior and posterior-to-anterior) were acquired for blip-up/blip-down field map correction.

FMRI preprocessing

Results included in this manuscript come from preprocessing performed using FMRIPREP version 1.2.3 (Esteban et al., 2019) (RRID:SCR_016216), a Nipype (Gorgolewski et al., 2011) (RRID:SCR_002502) based tool. Each T1w (T1-weighted) volume was corrected for INU (intensity non-uniformity) using N4BiasFieldCorrection v2.1.0 (Tustison et al., 2010) and skull-stripped using antsBrainExtraction.sh v2.1.0 (using the OASIS template). Spatial normalization to the ICBM 152 Nonlinear Asymmetrical template version 2009c (Fonov et al., 2009) (RRID:SCR_008796) was performed through nonlinear registration with the antsRegistration tool of ANTs v2.1.0 (Avants et al., 2008) (RRID:SCR_004757), using brain-extracted versions of both T1w volume and template. Brain tissue segmentation of cerebrospinal fluid (CSF), white-matter (WM) and gray-matter (GM) was performed on the brain-extracted T1w using fast (Zhang et al., 2001) (FSL v5.0.9, RRID:SCR_002823).

Functional data was slice time corrected using 3dTshift from AFNI v16.2.07 (Cox, 1996) (RRID:SCR_005927) and motion corrected using mcflirt (FSL v5.0.9) (Jenkinson et al., 2002). This was followed by co-registration to the corresponding T1w using boundary-based registration (Greve and Fischl, 2009) with six degrees of freedom, using flirt (FSL). Motion correcting transformations, BOLD-to-T1w transformation and T1w-to-template Montreal Neurological Institute (MNI) warp were concatenated and applied in a single step using antsApplyTransforms (ANTs v2.1.0) using Lanczos interpolation.

Many internal operations of FMRIPREP use Nilearn (Abraham et al., 2014) (RRID:SCR_001362) principally within the BOLD-processing workflow. For more details of the pipeline see https://fmriprep.readthedocs.io/en/latest/workflows.html.

Testing prediction 1

The purpose of the first analysis was to determine whether the brain encoded information concerning the type of attention (endogenous or exogenous) present in the stories. For this analysis, we used MVPA, which tests whether patterns of brain activity can be used to decode the distinction between two conditions. It is a more sensitive analysis than the more common, simple subtraction methods. The reason for using this sensitive measure is that the difference between exogenous and endogenous trial types was extremely subtle. Both trial types engaged social cognition, and therefore might cancel each other out in a simple subtraction. The stimuli were nearly identical, differing only in a few words that indicated the type of attention used by the agent in the story. In addition, the type of attention featured in the story was irrelevant to the task performed by the subject. To accommodate the subtlety of the distinction between conditions, we designed the study to use MVPA. We hypothesized that with MVPA, brain activity would carry information about the endogenous versus exogenous distinction; and that decoding would be evident in regions of interest (ROIs) within the network of areas typically found to be involved in social cognition.

We defined our ROIs as spheres centered on the statistical peaks reported in an activation likelihood estimation (ALE) meta-analysis of 16 fMRI studies (including 291 subjects) involving theory-of-mind reasoning (van Veluw and Chance, 2014), in accordance with generally accepted guidelines in ROI analysis (Poldrack, 2007). The ROIs are shown in Figure 2. The peaks were located in six areas: the left TPJ (Montreal Neurological Institute [MNI]: −52, −56, 24), right TPJ (MNI: 55, −53, 24), left STS (MNI: −59, −26, −9), right STS (MNI: 59, −18, −17), MPFC (MNI: 1, 58, 19), and the precuneus (MNI: −3, −56, 37). The radius of the ROI spheres was 10 mm, corresponding to the approximate volume (4000 mm³) of the largest clusters (TPJ and MPFC) reported in van Veluw and Chance, 2014. The same sphere radius was used for all ROIs.
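
As a check on the geometry, a sphere of radius 10 mm has an analytic volume of about 4189 mm³, close to the 4000 mm³ quoted above. The Python sketch below computes that volume and enumerates the voxels whose centers fall inside such a sphere. It assumes an isotropic 2.5 mm voxel grid with a voxel centered on the peak coordinate; the grid actually used for the normalized data is not stated in this section, and both function names are illustrative.

```python
import math

def sphere_volume_mm3(radius_mm):
    """Analytic volume of a sphere, in cubic millimetres."""
    return 4.0 / 3.0 * math.pi * radius_mm ** 3

def sphere_voxel_offsets(radius_mm, voxel_mm):
    """Integer (i, j, k) offsets from the center voxel whose centers lie
    within radius_mm of the sphere center, for isotropic voxels."""
    reach = int(radius_mm // voxel_mm)
    offsets = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                # Compare squared distances in millimetres.
                if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2:
                    offsets.append((i, j, k))
    return offsets
```

Multiplying `len(sphere_voxel_offsets(10.0, 2.5))` by the voxel volume (2.5³ mm³) gives the discretized ROI volume, which can be compared with the analytic value.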

The fMRI data from all participants were analyzed with the Statistical Parametric Mapping software (SPM12) (Wellcome Department of Cognitive Neurology, London, UK) (Friston et al., 1994). We first used a conventional general linear model (GLM) to estimate regression (beta) coefficients for each individual trial (i.e. 100 regressors), focusing on the 10 s story presentation phase of each trial. One regressor of no interest modeled the 4 s probe statement phase across conditions. Each regressor was modeled with a boxcar function and convolved with the standard SPM12 hemodynamic response function. In addition, ten run-specific regressors controlling for baseline differences between runs, and six motion regressors, were included. The trialwise beta coefficients for the endogenous and exogenous conditions (i.e. 80 beta maps) were then submitted to subsequent multivariate analyses (Haxby et al., 2001).
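
To make the trialwise regressors concrete, the following Python sketch builds one such regressor: a 10 s boxcar convolved with a double-gamma hemodynamic response function and sampled once per TR. The HRF here is a commonly used double-gamma form (gamma shapes 6 and 16, undershoot ratio 1/6) and is only assumed to approximate the SPM12 canonical function; the function names, the 0.1 s time grid, and the 32 s kernel length are choices made for the example.

```python
import math

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=6.0):
    """Double-gamma HRF at time t (seconds): a positive gamma-shaped
    response minus a later, smaller undershoot."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = (t ** (undershoot_shape - 1) * math.exp(-t)
             / math.gamma(undershoot_shape))
    return peak - under / ratio

def boxcar_regressor(onset_s, duration_s, n_scans, tr_s=2.0, dt=0.1):
    """Convolve a boxcar with the HRF on a fine time grid, then keep
    one sample per scan."""
    n_fine = int(round(n_scans * tr_s / dt))
    kernel = [double_gamma_hrf(i * dt) * dt for i in range(int(32 / dt))]
    convolved = [0.0] * n_fine
    for i in range(n_fine):
        # The boxcar is 1 during the event and 0 elsewhere.
        if onset_s <= i * dt < onset_s + duration_s:
            for j, k in enumerate(kernel):
                if i + j < n_fine:
                    convolved[i + j] += k
    step = int(round(tr_s / dt))
    return convolved[::step]
```

For example, `boxcar_regressor(20.0, 10.0, 127)` gives the predicted response to a story shown 20 s into a run of 127 retained volumes; the predicted signal rises after story onset and peaks several seconds later.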

The MVPA was carried out using The Decoding Toolbox (TDT) version 3.999 (Hebart et al., 2014) for SPM. For each subject and ROI, we used linear support vector machines (SVMs, with the fixed regularization parameter of C = 1) to compute decoding accuracies. To ensure independent training and testing data sets, we used a leave-one-run-out cross-validation approach. For each fold, the betas across all training runs were normalized relative to the mean and standard deviation, and the same Z-transformation was applied to the betas in the left-out test run (Misaki et al., 2010). An SVM was then trained to discriminate activity patterns belonging to the endogenous or exogenous trials in nine runs, and then tested on the left-out run, repeated for all runs, resulting in a run-average decoding accuracy for each ROI and subject.
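
The cross-validation logic can be illustrated with a short, dependency-free Python sketch. Two points are assumptions rather than a description of the actual analysis: a nearest-centroid classifier stands in for the linear SVM used with TDT, and the Z-transformation is taken to be per voxel across training trials. The key property shown is that normalization parameters come from the training runs only and are then reused on the left-out run.

```python
import statistics

def column_stats(rows):
    """Per-feature (voxel) mean and SD over the given trials."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # Replace a zero SD by 1 so the division below is always defined.
    sds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, sds

def z_transform(rows, means, sds):
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def nearest_centroid_predict(train_rows, train_labels, test_rows):
    """Stand-in classifier: assign each test pattern to the closest class mean."""
    classes = sorted(set(train_labels))
    centroids = {}
    for c in classes:
        members = [r for r, lab in zip(train_rows, train_labels) if lab == c]
        centroids[c] = [statistics.mean(col) for col in zip(*members)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(classes, key=lambda c: sq_dist(row, centroids[c]))
            for row in test_rows]

def leave_one_run_out_accuracy(patterns, labels, runs):
    """Mean accuracy over folds, each fold holding out one run."""
    fold_accuracies = []
    for test_run in sorted(set(runs)):
        train_idx = [i for i, r in enumerate(runs) if r != test_run]
        test_idx = [i for i, r in enumerate(runs) if r == test_run]
        # Normalization parameters are estimated on the training runs only.
        means, sds = column_stats([patterns[i] for i in train_idx])
        train = z_transform([patterns[i] for i in train_idx], means, sds)
        test = z_transform([patterns[i] for i in test_idx], means, sds)
        predicted = nearest_centroid_predict(
            train, [labels[i] for i in train_idx], test)
        hits = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        fold_accuracies.append(hits / len(test_idx))
    return statistics.mean(fold_accuracies)
```

Here `patterns` would hold one beta vector per trial within an ROI, `labels` the condition of each trial, and `runs` the run each trial came from.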

For statistical inference, the true group mean decoding accuracy was compared to a null distribution of group mean accuracies obtained from permutation testing. The same MVPA was repeated within each subject and ROI using permuted condition labels (10,000 iterations). A p value was computed as (1 + the number of permuted group accuracy values > true value)/(1 + the total number of permutations). To control for multiple comparisons across the six ROIs, we used the false discovery rate (FDR) correction (Benjamini and Hochberg, 1995). We also computed a bootstrap distribution around the true group mean accuracy by resampling individual-subject mean accuracies with replacement (10,000 iterations), from which a 95% confidence interval (CI) was derived (Nakagawa and Cuthill, 2007). A corrected p value < 0.05 in combination with a 95% CI that does not cross chance level was interpreted as a significant decoding effect at the group level (Nakagawa and Cuthill, 2007).
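
The three inferential steps in this paragraph can be sketched in Python as follows. The permutation p value follows the formula given above; the FDR adjustment is the standard Benjamini-Hochberg step-up procedure; the bootstrap uses a simple percentile interval, which is an assumption because the text does not say which type of bootstrap interval was computed. Function names and the seed are illustrative.

```python
import random

def permutation_p_value(true_value, null_values):
    """(1 + number of null values > true value) / (1 + number of permutations)."""
    exceed = sum(1 for v in null_values if v > true_value)
    return (1 + exceed) / (1 + len(null_values))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def bootstrap_ci(subject_means, n_boot=10000, alpha=0.05, seed=0):
    """Percentile CI of the group mean, resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(subject_means)
    boots = sorted(sum(rng.choices(subject_means, k=n)) / n
                   for _ in range(n_boot))
    lower = boots[int((alpha / 2) * n_boot)]
    upper = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Under the criterion stated above, an ROI would count as showing significant decoding when its adjusted p value is below 0.05 and the lower CI bound lies above chance (50% for a two-class problem).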

In addition, as further exploratory statistics beyond the targeted hypotheses of this study, we used a whole-brain searchlight analysis (Kriegeskorte et al., 2006) to test for possible areas of decoding outside the ROIs. This searchlight analysis is described in the Supplementary Information (Supplementary files 1–4 and 7, Figure 3—figure supplements 3–5, and Figure 4—figure supplement 1).

Testing prediction 2

The purpose of the second analysis was to determine whether the brain encoded information concerning the type of agent (self versus other) present in the stories. The analysis methods were the same as for testing hypothesis 1, except that for regressors of interest we used the self-related and other-related trials, collapsed across the type of attention (exogenous or endogenous). Just as for hypothesis 1, we tested the six defined ROIs within the theory-of-mind network.

Testing prediction 3

The purpose of the third analysis was to test for an interaction between the two variables (endogenous versus exogenous, and self versus other). We used MVPA to test whether the decoding for the type of attention was significantly different between the self-related and the other-related stories. The analysis methods were similar to those used for testing hypotheses 1 and 2, except in the following ways. We computed two MVPA decoding results, the first for distinguishing endogenous-self from exogenous-self stories, the second for distinguishing endogenous-other from exogenous-other stories. We then computed the difference between the two decoding results ([endogenous-self versus exogenous-self] – [endogenous-other versus exogenous-other]) to create a decoding difference score. Just as for hypotheses 1 and 2, we tested the six defined ROIs within the theory-of-mind network (van Veluw and Chance, 2014).

In a post hoc analysis, we tested for overlap in attention type encoding in self and others in the left TPJ ROI by using a two-way cross-classification analysis. One classifier was trained to discriminate endogenous versus exogenous self-stories and tested on other-stories (using a leave-one-run-out approach), and another classifier was trained to discriminate endogenous versus exogenous other-stories and tested on self-stories, from which an average cross-classification decoding accuracy was obtained. Overlap in attention type representations in self and other should be reflected in above-chance decoding.

Testing prediction 4

The purpose of the fourth analysis was to confirm whether our story stimuli engaged social cognition and thereby recruited brain areas within the expected theory of mind network. The analysis was meant as an added control to check the validity of the paradigm. The analysis methods were similar to those used for testing hypotheses 1–3, except in the following ways. We computed four MVPA decoding results: endogenous-self versus nonsocial, endogenous-other versus nonsocial, exogenous-self versus nonsocial, and exogenous-other versus nonsocial. (Because using MVPA to compare two conditions requires equal numbers of trials in both conditions, it was not possible to use a single analysis to compare all 80 social trials to the 20 nonsocial trials.) Each analysis represents a separate, alternative way to assess the social-versus-nonsocial decoding. Just as for hypotheses 1–3, we tested the six defined ROIs within the theory-of-mind network (van Veluw and Chance, 2014).

Eye tracking analysis

Eye movements were recorded via an MRI-compatible infrared eye tracker (SR Research EyeLink 1000 Plus), mounted just below the projector screen, sampling at 1000 Hz. Before each scanning session, a calibration routine on five screen locations was used and repeated until the maximum error for any point was less than 1°. The obtained eye position data were cleaned of artifacts related to blink events and smoothed using a 20 ms moving average. We then built an SVM decoding model analogous to the cross-validation approach used for the fMRI data, but here based purely on eye tracking data, to test whether eye movement dynamics alone were sufficient to decode the conditions of interest (endogenous versus exogenous, and self versus other). In keeping with a previous study (Schneider et al., 2013), we organized the data in the following way. The part of the display within which the stimuli appeared was divided into an 8 × 4 grid of 32 equally sized squares. The grid covered the screen area within which the stories were presented (see red outline in Figure 3—figure supplement 8), and approximately corresponded to the locations of individual words (four lines, with eight words per line). For each trial, the proportion of time that the subject fixated within each square (32 features) and the saccades between those regions (32 × 32 = 1024 features) were calculated. These 1056 features, representing information about both where people were looking and saccade dynamics, were then averaged across repetitions for each of the four main conditions within each of the 10 runs, yielding one eye movement feature vector per condition per run (per subject). The feature vectors were submitted to an SVM classifier (C = 1). Using a leave-one-run-out approach, the SVM model was trained on endogenous versus exogenous story types, and then tested in the left-out run. At the group level, the decoding accuracies were tested against chance level using t-tests. 
A similar analysis was then performed on the contrast between self-related stories versus other-related stories. The results showed that endogenous-versus-exogenous and self-versus-other story types could not be decoded significantly better than chance using the pattern of eye movement. See Supplementary Information (Supplementary file 7 and Figure 3—figure supplement 8) for the results of the eye-tracking analysis.
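
The 1056-element eye movement feature vector can be illustrated with the Python sketch below. It takes a list of fixations for one trial and returns 32 dwell-time proportions followed by a flattened 32 × 32 matrix of transitions between grid cells. Several details are assumptions, since the text does not specify them: fixations are given as (x, y, duration) tuples in screen pixels, saccades are counted as transitions between consecutive fixations (including those within the same cell), and the grid rectangle is passed in explicitly.

```python
def grid_cell(x, y, left, top, width, height, n_cols=8, n_rows=4):
    """Index (0..n_cols*n_rows-1) of the grid cell containing point (x, y).
    Points on or beyond the border are clamped to the nearest cell."""
    col = min(max(int((x - left) / width * n_cols), 0), n_cols - 1)
    row = min(max(int((y - top) / height * n_rows), 0), n_rows - 1)
    return row * n_cols + col

def eye_features(fixations, left, top, width, height, n_cols=8, n_rows=4):
    """Feature vector for one trial: dwell proportions per cell, then
    counts of transitions from each cell to each cell (row-major)."""
    n_cells = n_cols * n_rows
    dwell = [0.0] * n_cells
    transitions = [0.0] * (n_cells * n_cells)
    previous = None
    for x, y, duration in fixations:
        cell = grid_cell(x, y, left, top, width, height, n_cols, n_rows)
        dwell[cell] += duration
        if previous is not None:
            transitions[previous * n_cells + cell] += 1
        previous = cell
    total = sum(dwell)
    if total > 0:
        # Convert dwell times to proportions of total fixation time.
        dwell = [d / total for d in dwell]
    return dwell + transitions
```

Averaging such vectors over the trials of one condition within a run would give the per-condition, per-run feature vectors that were submitted to the classifier.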

References

Graziano MS (2013) Consciousness and the Social Brain. Oxford University Press.
Graziano MS (2019) Attributing awareness to others: the attention schema theory and its relationship to behavioural prediction. Journal of Consciousness Studies: Controversies in Science & the Humanities 26:17–37.
Poldrack RA (2007) Region of interest analysis for fMRI. Social Cognitive and Affective Neuroscience 2:67–70. https://doi.org/10.1093/scan/nsm006
Posner MI (1980) Orienting of attention. Quarterly Journal of Experimental Psychology 32:3–25. https://doi.org/10.1080/00335558008248231
Schneider B, Pao Y, Pea RD (2013) Predicting students’ Learning Outcomes Using Eye-Tracking Data. Learn Anal Knowl Conference.

Decision letter

  1. Marius V Peelen
    Reviewing Editor; Radboud University, Netherlands
  2. Michael J Frank
    Senior Editor; Brown University, United States
  3. Anthony Atkinson
    Reviewer; Durham University, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This paper reports the novel finding that fMRI activity patterns in the left TPJ distinguish between stories in which an agent endogenously versus exogenously attends to an object. The experimental design is straightforward and elegant, using tightly-controlled comparisons. The findings suggest that brain regions implicated in theory-of-mind represent a model of another person's attentional state.

Decision letter after peer review:

Thank you for submitting your article "Temporo-Parietal Cortex Involved in Modeling One's Own and Others' Attention" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Anthony Atkinson (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Summary:

This fMRI study tested whether theory-of-mind (ToM) regions differentially represent information about endogenous and exogenous attention of the self and of another person. Participants read short passages that described themselves or another person deliberately attending to or having their attention drawn to an object. The main finding was that activity patterns in left TPJ allowed for decoding endogenous vs. exogenous attention (collapsed across self and other). This finding indicates a role for the left TPJ in modelling attentional states.

Revisions for this paper:

Motivation and interpretation:

1) Considering that the study decodes mental states (here: attentional states) in ToM areas, as described in stories, please review work similarly decoding mental states in these regions (e.g., work by Koster-Hale, Saxe). How is your study similar or different from previous work decoding mental states? Do results follow from these previous studies? I also missed a review of (and integration with) the literature on endogenous vs exogenous attention itself (e.g., ventral vs dorsal attention network).

2) It was not clear from the Introduction why you hypothesized that exo vs endo attention should be decodable in TPJ. Is this where eye gaze direction – which appeared to have motivated the study – can be decoded from? The study cited for this hypothesis (Kelly et al., 2014; reanalysed in Igelstrom et al., 2016) showed that univariate activity in TPJ reflected the difficulty of social attribution, which does not obviously lead to the current hypothesis.

3) Does the main finding reflect a model of others' attention or differences in mental state attribution: sentences describing endogenous attention focus the reader more on the mental act of attending and might also induce further mentalizing (e.g., the reader may wonder: "why would he decide to look for that object?"). In exogenous sentences, the focus of the sentence is instead more strongly on the attention-grabbing object (e.g., "the bright red tie"). Can this alternative interpretation be excluded?

4) Searchlight analysis for exogenous v. endogenous attention: Is the cluster centred at -59, -47, 5, labelled left posterior STS (TPJ), really TPJ? It is, after all, squarely in the temporal lobe and some distance (22mm) from the centre of the left TPJ ROI, the latter being in left angular gyrus (areas PGa and PFm). Do you have some independent justification for the labelling of the location of this cluster as TPJ? For example, perhaps it lies within the anterior TPJ (TPJa) subregion identified by Mars et al. (Cerebral Cortex 2012)? The posterior STS area from the searchlight analysis seems to fall into the "gaze-following patch" in the posterior STS as reported by Marquardt et al., 2017. Since this cortical area is not generally considered to be a part of the theory of mind network, the seeming involvement of this area potentially changes the interpretation of the results.

Additional analyses:

5) Please report and compare accuracy and RTs for the response to the probe statement. If either differs between conditions, then that becomes a potential confound for the between condition contrasts in the MVPA analyses, considering that the BOLD response to the story and probe events could not be separated.

6) The TPJ is introduced in the context of theory of mind and social cognition, but it has also been implicated in attentional orienting. Could it be the case that participants simulate such orienting when reading the stories, leading to the above-chance decoding in TPJ? If this is the case, one may similarly expect above-chance decoding in areas that have been implicated in endogenous attention. This should be tested by including dorsal attention network ROIs. Above-chance decoding in such attention regions may inform the interpretation of the TPJ results.

7) Please provide univariate activity estimates, both in the ROIs and in whole-brain contrasts. This may help to interpret the multivariate results.

Further support for main finding:

8) The reported decoding accuracy values are really quite small and close to 50% (chance), especially for the endogenous vs. exogenous and self vs. other contrasts, even those values that are statistically significant. Please report appropriate effect sizes and the confidence intervals around those effect sizes (for all your reported t-tests, not just those that are statistically significant). It would also be informative if you were to include in your graphs the individual subject data (e.g., mean decoding accuracy per subject for each ROI) and/or plots of the effect sizes and their distributions. For more on effect sizes, their CIs and associated plots, I point you to the following sources and the references therein:

https://thenewstatistics.com/itns

https://thenewstatistics.com/itns/2019/05/20/reply-to-lakens-the-correctly-used-p-value-needs-an-effect-size-and-ci/

https://www.estimationstats.com/

9) The authors computed the attentional state decoding separately for "self" and "other". These decoding accuracies did not differ. Please also provide and test the accuracies of self and other separately; this would shed light on the reliability of the main result (which was collapsed across the two conditions) and might also indicate whether the decoding was more reliable (less variable) in self or other.

10) Replicate results in one or multiple additional ToM TPJ ROIs. Reviewers raised two suggestions: 1) Surface-based ROI: the TPJ is a highly anatomically variable cortical region that isn't well approximated by a single volume ROI (see Croxson et al., 2017). There are many surface-based ROIs now available that mitigate the effect of this variability, including ones from the authors' own lab. 2) Use a different meta-analysis: The used meta-analysis defines “theory of mind” solely in terms of false-belief tasks (van Veluw and Chance, 2014), which may not be appropriate for delineating the ROIs and their exact locations in your study. Different types of ToM task reliably activate different brain areas, as well as common ones (mPFC and bilateral TPJ: Molenberghs et al., Neuroscience and Biobehavioral Review 2016). Consider using the results of a different meta-analysis, e.g., one that identifies regions based on the conjunction of multiple types of theory-of-mind tasks (e.g., Mar, Annual Review of Psychology 2011; Molenberghs et al., 2016; Schurz et al., Neuroscience and Biobehavioral Reviews 2014).

11) In previous work (Kelly et al., 2014), the authors were interested in an overlap between attention in self and other. Here, this could be addressed in a cross-decoding analysis, training a classifier on exo vs endo in the "self" stories and testing this on the "other" stories. Above-chance classification in this analysis would strengthen the evidence for lTPJ involvement and would provide additional information that would help interpreting the TPJ findings.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Temporo-Parietal Cortex Involved in Modeling One's Own and Others' Attention" for consideration by eLife. Your article has been reviewed by two peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Summary:

Both reviewers noted that many of their concerns were addressed. However, they each had one remaining concern that would need to be addressed.

Essential Revisions:

1) Please report and analyze the univariate activity in the ROIs.

2) More fully report (in main Results section) and discuss the searchlight results.

Reviewer #1:

The authors have addressed many of my concerns. In particular, the additional TPJ ROIs and the demonstration of self-other cross-decoding increased my confidence in the main finding of attention state decoding in lTPJ.

One comment has not been sufficiently addressed, however: it would be highly relevant and informative to see the average univariate activity for each of the conditions in the 6 original ROIs, plotted in a graph. The endo>exo contrast should be tested, correcting for the 6 ROIs, similar to how the decoding accuracies are tested (i.e., with the same p<0.05 cut-off). This ROI-based univariate analysis is also interesting for other comparisons, e.g. to test whether the social conditions gave higher activity than the non-social condition, as would be expected based on previous work. Thus, please provide a full univariate analysis on average ROI activity.

There is a lot of interesting information in the Supplementary figures (e.g., searchlight, additional ROIs), which will not be visible when viewing or printing pdfs. You could consider moving some of this to the main text, or add panels to existing figures where you include some of this information.

Reviewer #2:

In this revised manuscript, the authors have greatly clarified some key points, most important of which was how well matched the behavior was across all of the conditions. While there is still a potential confound in the social vs. non-social contrast given the difference in reaction time, (a) the confound effects are likely negligible in size, (b) the end result matches the previous literature, and (c) the contrast was only a control analysis and doesn't greatly affect the main point of the paper. However, the handling of the left posterior STS result in the searchlight analysis remains problematic, both in the manuscript and in the authors' response.

pSTS response and discussion: The main point of the manuscript is to examine whether known theory of mind areas contain information that differentiates between the endogenous vs. exogenous story conditions. The revised manuscript makes quite clear that the answer is yes, activity in the L TPJ differentiates between the two conditions, albeit with a relatively low decoding accuracy. The fact that multiple versions of the L TPJ ROI produce the same result is particularly reassuring. The searchlight analysis, however, finds that there is potentially more discriminatory information to be found in the posterior STS/middle temporal gyrus, and this is where the manuscript runs into problems.

In the revised manuscript, this result is simply not mentioned, though it clearly has implications for the interpretations of the result as mentioned in the previous round of reviews. In addition, in the response to the reviewers' comments, the authors give a very convoluted and confusing argument based on MNI coordinates that this focus is somehow (a) part of the TPJ and (b) may be a part of the attention-related TPJa as opposed to their TPJp ROI. In the process of saying that the surface projections are misleading, they state that the focus falls on the STG on the surface projection, when it is really on the MTG. In addition, they base their arguments on MNI coordinates, but MNI coordinates are notoriously inaccurate in comparing results from different studies, so the authors' attempts to justify their conclusions using these coordinates fall flat. The authors' own volume images very clearly show the focus as being in the STS and not on either the angular gyrus or supramarginal gyrus that Mars, Corbetta, and others have generally shown the TPJp and TPJa to fall on. And even if this focus was in the TPJa, as the authors seem to hint at, they do not discuss the implications of this result.

The authors' defensiveness around this point is frankly puzzling. The searchlight results do not seem to invalidate their main point, only potentially augment it. The fact that the posterior STS (and what seem to be the right FEF and iPCS areas) exhibit higher discriminability between the endo and exo conditions may not be shocking given that the theory of mind areas are likely involved in many operations in this task, whereas the pSTS and right hemisphere areas may be involved only in the imagined attention/sensory processing aspects of the task. It is possible that these areas are "reading out" the differing attentional conditions from the L TPJ. Whatever the explanation may be, the authors seem to be trying to bury this result to shoehorn the results into fitting their a priori model, which is a disservice and misleading to the readers, and needs to be rectified before the manuscript can be published. My recommendation is to at least mention the searchlight results in the main Results section, then add a paragraph to the Discussion discussing the implications of these results. Better yet would be to quantify the discrimination accuracy within each of the foci uncovered in the searchlight analyses.

https://doi.org/10.7554/eLife.63551.sa1

Author response

Revisions for this paper:

Motivation and interpretation:

1) Considering that the study decodes mental states (here: attentional states) in ToM areas, as described in stories, please review work similarly decoding mental states in these regions (e.g., work by Koster-Hale, Saxe). How is your study similar or different from previous work decoding mental states? Do results follow from these previous studies? I also missed a review of (and integration with) the literature on endogenous vs exogenous attention itself (e.g., ventral vs dorsal attention network).

We thank the reviewers for these relevant suggestions. The work in our lab has focused on how people construct models of their own and others’ attention, an area of modeling mental states that is relatively less well studied. We now cite and briefly review previous relevant work using MVPA to decode various aspects of other people’s mental state, such as their beliefs (Koster-Hale et al., 2017), intentions (Koster-Hale et al., 2013), and perceptual source (Koster-Hale et al., 2014), from activity patterns in the theory of mind network. We also clarify what we believe distinguishes the current study from earlier ones. From the fifth paragraph of the Introduction:

“This first prediction, that the social cognition network will encode the exogenous-versus-endogenous distinction, represents the main, novel contribution of this study. Previous studies have used MVPA to decode various aspects of other people’s mental states from activity in social brain areas, such as their beliefs (Koster-Hale et al., 2017), intentions (Koster-Hale et al., 2013), and perceptual source (Koster-Hale et al., 2014). To the best of our knowledge, this investigation is the first to test whether activity in social brain areas can decode other people’s attentional states.”

We also added a brief review of endogenous versus exogenous attention in the revised Introduction (paragraph three):

“For example, Pesquita et al. (2016) found that when participants watch an actor in a video attending to an object, the participants implicitly distinguish between whether the actor’s attention was drawn to the object exogenously (bottom-up, or stimulus-driven attention), or whether the actor endogenously shifted attention to the object (top-down, or internally driven attention). Exogenous and endogenous attention are the two principal ways in which selective attention moves between objects. They may be relatively emphasized in distinct cortical networks (the ventral and dorsal attention networks), and they influence the behavior of agents in profoundly different manners (Corbetta et al., 2008; Corbetta and Shulman, 2002; Posner, 1980; Shulman et al., 2010). The ability to distinguish between someone else’s exogenous and endogenous attention is therefore one example of how people may construct a rich, dynamic, and useful model of other people’s attention beyond merely encoding gaze direction or identifying the object of attention.”

2) It was not clear from the Introduction why you hypothesized that exo vs endo attention should be decodable in TPJ. Is this where eye gaze direction – which appeared to have motivated the study – can be decoded from? The study cited for this hypothesis (Kelly et al., 2014; reanalysed in Igelstrom et al., 2016) showed that univariate activity in TPJ reflected the difficulty of social attribution, which does not obviously lead to the current hypothesis.

We thank the reviewers for allowing us to clarify the rationale behind our central Prediction 1. First, we would like to stress that the wording of Prediction 1 emphasized that we expected above-chance decoding in “some subset” of theory-of-mind areas, not only in the TPJ:

“We hypothesized that participants would encode the type of attention in the story (exogenous versus endogenous), and that this encoding would be evident in some subset of the areas classically involved in theory of mind. […] We therefore predicted that the exogenous-versus-endogenous distinction would be significantly encoded in some subset of these areas.”

In accordance with this prediction, we tested exogenous-versus-endogenous decoding in each of the six ROIs and corrected for multiple comparisons across all ROIs.

We also anticipated that, among the theory-of-mind areas, the TPJ might show the clearest evidence of attention type decoding, based on the results of Kelly et al. (2014). Kelly et al. (2014) is particularly relevant to our study because it identified areas showing greater activity when the social attribution of an attentional state to a cartoon face is more difficult, while controlling for factors such as the direction of gaze, emotional valence, and low-level visual features. An important underlying concept in our current work is that modeling someone else’s attention, and processing someone else’s gaze direction, are not the same. Gaze is only one cue to someone else’s attention. In Kelly et al. (2014), we found that the TPJ was not active in association with gaze, and also not active in association with facial expression; but was active in association with a task in which subjects integrated both gaze and expression in order to judge the attentional state of someone else. Considering that the task in Kelly et al. (2014) engaged the TPJ bilaterally, but none of the other theory-of-mind regions investigated here (i.e., STS, precuneus, and MPFC), we anticipated the TPJ might be more likely than the other theory-of-mind ROIs to be involved in encoding attention type. As discussed in more detail in our response to point #4 below, the TPJ region investigated here, which was defined based on studies of theory of mind reasoning, is spatially separated (by 2 cm) from the location of the gaze-following patch within the STS.

To address the reviewers’ concern, we removed the statement in the Introduction that predicts an effect specifically in the TPJ. Instead, in the Introduction, we confine Prediction 1 to a more cautious statement, predicting that the exogenous-versus-endogenous distinction should be significantly encoded in at least some subset of the ROIs representing the ToM cortical network.

In the revised Discussion section, we added a paragraph (third paragraph) discussing the reasons why the TPJ may have been a special focus of activity here. In that paragraph, we describe more fully why Kelly et al. (2014) is consistent with this finding.

3) Does the main finding reflect a model of others' attention or differences in mental state attribution: sentences describing endogenous attention focus the reader more on the mental act of attending and might also induce further mentalizing (e.g., the reader may wonder: "why would he decide to look for that object?"). In exogenous sentences, the focus of the sentence is instead more strongly on the attention-grabbing object (e.g., "the bright red tie"). Can this alternative interpretation be excluded?

This is an important point that we thought deeply about when constructing the story stimuli. The short answer is that the alternative interpretation can’t be entirely excluded, but we believe it is unlikely. We now address it explicitly in the sixth paragraph of the Discussion section:

“A second, alternative interpretation of the present results is that the exogenous sentences might make the reader focus more on the object in the story (thus engaging less mentalizing), whereas the endogenous sentences might make the reader focus more on the character’s mental act of attending (thus engaging more mentalizing). […] The results point to the left TPJ processing different information in the exogenous and endogenous story types, but not a difference in overall amount of activity.”

4) Searchlight analysis for exogenous v. endogenous attention: Is the cluster centred at -59, -47, 5, labelled left posterior STS (TPJ), really TPJ? It is, after all, squarely in the temporal lobe and some distance (22mm) from the centre of the left TPJ ROI, the latter being in left angular gyrus (areas PGa and PFm). Do you have some independent justification for the labelling of the location of this cluster as TPJ? For example, perhaps it lies within the anterior TPJ (TPJa) subregion identified by Mars et al. (Cerebral Cortex 2012)? The posterior STS area from the searchlight analysis seems to fall into the "gaze-following patch" in the posterior STS as reported by Marquardt et al. 2017. Since this cortical area is not generally considered to be a part of the theory of mind network, the seeming involvement of this area potentially changes the interpretation of the results.

We very much appreciate the reviewers bringing this issue of neuroanatomy to our attention, and for directing us to the Mars et al. (2012) paper. First, we would like to emphasize that the anatomical localization of the Searchlight decoding clusters was based on the projection of the clusters onto sections of the average structural scan generated from the 32 subjects. The projection of volumetric results onto a 3D canonical brain surface is an approximate and imperfect process, and was here used for visualization purposes only. To better clarify the three-dimensional location of the left posterior STS cluster in the endogenous-vs-exogenous Searchlight analysis, we have now included all three sections in the bottom part of Figure 3—figure supplement 3.

As can be seen in Figure 3—figure supplement 3, the cluster lies entirely in the posterior section of the STS (and not on the STG, which one might guess from looking at the brain surface projection). As the reviewers point out, the peak of the Searchlight cluster (MNI: -59, -47, 5) is anterior-inferior with respect to the center of our left TPJ ROI (MNI: -52, -56, 24). However, do they belong to the same or different TPJ subregions? According to the connectivity-based TPJ subdivision proposed by Mars et al. (2012), the border between the anterior (TPJa) and posterior TPJ (TPJp) within the posterior STS is somewhere between the MNI y-coordinates -40 and -48 (see Figure 3a in Mars et al.; unfortunately, a more precise definition in terms of MNI coordinates is not reported). Our TPJ ROI is clearly located in TPJp. The Searchlight cluster is likely located in TPJp but very close to the border of TPJa, given that the Searchlight cluster peak is 3 mm closer to the center of gravity for TPJp compared to TPJa (23 mm vs 26 mm). This anatomical labelling is in accordance with another functional TPJ parcellation (Bzdok et al. 2013), in which the TPJp within STS starts at the MNI y-coordinate -47 (see Figure 2 in Bzdok et al. 2013).

It should be noted that both the Mars et al. (2012) and Bzdok et al. (2013) TPJ characterizations are based on the right TPJ. However, the TPJ has a known hemispheric asymmetry regarding functional specialization (Seghier, 2013), neurological lesion effects (Corbetta et al., 2000), functional (Uddin et al., 2010) and anatomical (Caspers et al., 2011) connectivity, as well as cytoarchitectonic borders and gyral pattern (Caspers et al., 2006, 2008). The proposed TPJa and TPJp subdivisions may therefore not directly apply to the left TPJ, and should be interpreted with caution.

Finally, the endogenous-vs-exogenous decoding cluster does not overlap with the gaze-following patch (GFP) reported in Marquardt et al. 2017 (MNI: -55, -67, 6). The GFP is located 20 mm directly posterior to the decoding cluster (and inferior to our TPJ ROI).
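The separations quoted in this exchange can be checked directly from the MNI coordinates given above. The following is a minimal sketch, assuming that plain Euclidean distance between the reported coordinates is the intended measure; the helper name mni_distance is introduced here for illustration only.

```python
import math

# MNI coordinates (mm) as quoted in the text above.
searchlight_peak = (-59, -47, 5)   # endogenous-vs-exogenous Searchlight cluster peak
tpj_roi_center = (-52, -56, 24)    # center of the left TPJ ROI
gaze_patch = (-55, -67, 6)         # gaze-following patch (Marquardt et al., 2017)

def mni_distance(a, b):
    """Euclidean distance in mm between two MNI coordinates (illustrative helper)."""
    return math.dist(a, b)

# Cluster peak to TPJ ROI center: sqrt(7^2 + 9^2 + 19^2) = sqrt(491)
print(round(mni_distance(searchlight_peak, tpj_roi_center), 1))  # 22.2
# Cluster peak to gaze-following patch: sqrt(4^2 + 20^2 + 1^2) = sqrt(417)
print(round(mni_distance(searchlight_peak, gaze_patch), 1))      # 20.4
```

Both values agree with the rounded figures in the text (22 mm and 20 mm). Almost all of the second separation lies along the y-axis, which matches the description of the patch as posterior to the cluster.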

We have updated Supplementary file 7, Figure 3—figure supplement 3, and Supplementary file 1, which now briefly discuss the spatial discrepancy of the left TPJ ROI and the Searchlight decoding cluster. We have also added references to Mars et al. (2012) and Bzdok et al. (2013).

Additional analyses:

5) Please report and compare accuracy and RTs for the response to the probe statement. If either differs between conditions, then that becomes a potential confound for the between condition contrasts in the MVPA analyses, considering that the BOLD response to the story and probe events could not be separated.

We thank the reviewers for suggesting this relevant analysis. Among the four social conditions, self-endo, self-exo, other-endo, and other-exo, we did not expect any systematic differences in performance on the probe statement task, given the extremely subtle semantic differences across conditions. As predicted, neither the mean accuracies (94.2% vs 93.4% vs 93.1% vs 93.4%; F(3,93) = 0.15, p = 0.930, repeated-measures ANOVA) nor the mean RTs (1.61 s vs 1.60 s vs 1.67 s vs 1.65 s; F(3,93) = 1.66, p = 0.181, repeated-measures ANOVA) for the response to the probe statement differed significantly across the four conditions.
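For readers who want to see where the degrees of freedom of 3 and 93 come from, the following is a minimal sketch of a one-way repeated-measures ANOVA with 32 subjects and four conditions. The reaction times are simulated for illustration and are not the study data, and rm_anova_oneway is a hypothetical helper rather than the authors' analysis code.

```python
import numpy as np

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: array of shape (n_subjects, n_conditions).
    Returns (F, df_effect, df_error).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    # Partition the total sum of squares into condition, subject, and residual parts.
    ss_cond = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_subj = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((data - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error

# Simulated RTs (seconds): 32 subjects x 4 conditions, with no true condition effect.
rng = np.random.default_rng(0)
subject_offset = rng.normal(1.63, 0.2, size=(32, 1))
rts = subject_offset + rng.normal(0.0, 0.1, size=(32, 4))

f, df1, df2 = rm_anova_oneway(rts)
print(df1, df2)  # 3 93
```

With four conditions and 32 subjects, the effect has 4 - 1 = 3 degrees of freedom and the error term has (4 - 1) x (32 - 1) = 93.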

These behavioral results relating to the response to the probe statement are now included in the revised manuscript, in a new first section of the Results (Behavioral results). Note also that since the story period was 10 sec, and the subsequent probe event was 4 sec, it was possible to focus the analysis on the MRI activity evoked by the story, fairly well uncontaminated by the probe event.

6) The TPJ is introduced in the context of theory of mind and social cognition, but it has also been implicated in attentional orienting. Could it be the case that participants simulate such orienting when reading the stories, leading to the above-chance decoding in TPJ? If this is the case, one may similarly expect above-chance decoding in areas that have been implicated in endogenous attention. This should be tested by including dorsal attention network ROIs. Above-chance decoding in such attention regions may inform the interpretation of the TPJ results.

We thank the reviewers for this intriguing suggestion. We have performed the requested analysis and included it in the new submission. If we understand the reviewers’ suggestion correctly, they hypothesize that our subjects, when reading stories, may have simulated exogenous and endogenous attention reorienting by activating the corresponding ventral and dorsal attention networks in a “mirror-neuron-like” fashion. This proposal would explain our TPJ result, because this area is a key node in the ventral attention network and might thus be involved in simulating exogenous attention orienting. If this mechanism is the underlying cause of the TPJ result, one would also predict the involvement of the dorsal attention network when subjects read (and simulated) the endogenous stories.

To test this prediction, as requested, we repeated the endogenous-versus-exogenous decoding analysis in four areas typically considered to constitute the dorsal attention network: the frontal eye fields (FEF), the anterior and posterior intraparietal sulcus (aIPS and pIPS), and the middle temporal complex (MT+). The ROIs were defined as 10-mm-radius spheres around the peak coordinates reported in Fox et al. (2006). The results are shown in Figure 3—figure supplement 6. None of the dorsal attention network ROIs decoded the attention type better than chance (all ps > 0.05, based on permutation testing with 10,000 iterations, uncorrected for multiple comparisons). These findings are thus incompatible with the proposed attention simulation account of the TPJ result.
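The ROI construction described above (a 10-mm-radius sphere around a peak coordinate) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the 2-mm isotropic grid, the affine, and the center coordinate are hypothetical choices made here, and the center is not a value taken from Fox et al. (2006).

```python
import numpy as np

def sphere_mask(shape, affine, center_mm, radius_mm=10.0):
    """Boolean mask of voxels whose centers lie within radius_mm of center_mm."""
    i, j, k = np.meshgrid(*(np.arange(s) for s in shape), indexing="ij")
    vox = np.stack([i, j, k, np.ones_like(i)], axis=-1)   # homogeneous voxel indices
    world = vox @ affine.T                                 # voxel indices -> mm
    dist = np.linalg.norm(world[..., :3] - np.asarray(center_mm, dtype=float), axis=-1)
    return dist <= radius_mm

# Assumed image grid: 2 mm isotropic voxels in an MNI152-like bounding box.
affine = np.array([[-2.0, 0.0, 0.0, 90.0],
                   [0.0, 2.0, 0.0, -126.0],
                   [0.0, 0.0, 2.0, -72.0],
                   [0.0, 0.0, 0.0, 1.0]])
shape = (91, 109, 91)

# Hypothetical sphere center in mm (for illustration only).
mask = sphere_mask(shape, affine, center_mm=(-26.0, -8.0, 50.0))
print(int(mask.sum()))  # number of voxels in the spherical ROI
```

The voxel patterns inside such a mask would then be passed to the classifier, one ROI at a time.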

This issue and these new results are now discussed in the fifth paragraph of the revised Discussion section, and the analysis results are also included in the supplementary material and shown in Figure 3—figure supplement 6.

7) Please provide univariate activity estimates, both in the ROIs and in whole-brain contrasts. This may help to interpret the multivariate results.

The results for the seven univariate contrasts corresponding to the main multivariate analyses are now included in the supplementary material (Supplementary file 5). We report univariate activity corrected for multiple comparisons both at the whole-brain level and within the search volume of each ROI. Notably, there were no significant clusters in any of the ROIs, or anywhere else in the brain, for the ENDOGENOUS > EXOGENOUS and EXOGENOUS > ENDOGENOUS contrasts, not even at the uncorrected threshold p<0.001. These findings are compatible with previous studies (e.g., Hassabis et al. 2009 Current Biology) that have demonstrated the superiority of pattern-sensitive multivariate analyses compared with conventional univariate approaches for detecting differences in activity between conditions with highly similar macroscopic characteristics. In the revised manuscript, we added a sentence in the tenth Discussion paragraph referring to the univariate results.

The SELF > OTHER contrast also did not reveal any significant activity (even at the p<0.001 uncorrected level). The OTHER > SELF contrast revealed one single significant cluster, which was located in the left calcarine sulcus and survived whole-brain correction for multiple comparisons. We speculate that this activation reflects a main effect of the visual input of a name (e.g. “Emma”) in the OTHER condition versus the word “You” in the SELF condition. We observed no other activations at the p<0.001 uncorrected level in the rest of the brain, and the left calcarine activation was located at a distance from the locations of our ROIs (the STS, TPJ, MPFC, and precuneus) and did not overlap with the decoding results.

The attention type X agent type interactions did not reveal any activity at the whole-brain or ROI levels that survived multiple comparisons.

The Social > Non-Social contrast revealed more significant and widespread activity than the above contrasts, which was expected given the much greater semantic difference between the conditions. We found no significant activity within the ROIs, but four clusters survived correction at the whole-brain level: the left middle orbital gyrus, right SMG, left SFG, and the left insula.

Further support for main finding:

8) The reported decoding accuracy values are really quite small and close to 50% (chance), especially for the endogenous vs. exogenous and self vs. other contrasts, even those values that are statistically significant. Please report appropriate effect sizes and the confidence intervals around those effect sizes (for all your reported t-tests, not just those that are statistically significant). It would also be informative if you were to include in your graphs the individual subject data (e.g., mean decoding accuracy per subject for each ROI) and/or plots of the effect sizes and their distributions. For more on effect sizes, their CIs and associated plots, I point you to the following sources and the references therein:

https://thenewstatistics.com/itns

https://thenewstatistics.com/itns/2019/05/20/reply-to-lakens-the-correctly-used-p-value-needs-an-effect-size-and-ci/

https://www.estimationstats.com/

We thank the reviewers for this suggestion. We believe the decoding accuracy effects are not so small compared to other findings in the literature. This is especially so when considering that the effects noted by the reviewers are for the most subtle stimulus distinctions in our experiments. The effects are, of course, larger for the social-vs-nonsocial comparisons, because the distinction between the stimuli is much larger. In addition (as suggested by the reviewers in point 10 below), we have now replicated the crucial left TPJ decoding effect using three other anatomical definitions of that theory-of-mind ROI (see new Figure 3—figure supplement 2). All three anatomical definitions of the ROI show a highly significant effect (actually larger than the effect reported in our main analysis in the ROI based on van Veluw and Chance, 2014). Those replications of the finding may help convince the reviewers of the robustness of the result.

We also thank the reviewers for the suggestion to show more information about effect size. In Tables 1 and 2, we had already reported, for all analyses regardless of their significance, the effect size in terms of mean decoding accuracy (relative to chance), the 95% confidence interval around the mean (based on a bootstrap distribution), and the p value (based on permutation testing). To visualize the data more clearly, we have now included two additional supplementary figures featuring violin plots for all other ROI analyses (see Figure 3—figure supplement 1 and Figure 4—figure supplement 2). The violin plots show mean and median decoding accuracy, 95% confidence interval, a kernel density estimation, and individual data points for each decoding analysis and each ROI. (To avoid confusion and visual clutter, we left the figures in the main body of the paper as they were, and presented the new violin plots in the supplementary material.)
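The two statistics named above, a bootstrap 95% confidence interval around the mean decoding accuracy and a permutation-based p value against chance, can be sketched as follows. This is a minimal illustration under assumptions: the per-subject accuracies are simulated, and the group-level sign-flip test shown here is a stand-in, because the exact permutation scheme used in the paper is not described in this response.

```python
import numpy as np

def bootstrap_ci(acc, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    acc = np.asarray(acc, dtype=float)
    idx = rng.integers(0, acc.size, size=(n_boot, acc.size))
    means = acc[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

def sign_flip_p(acc, chance=0.5, n_perm=10_000, seed=0):
    """One-sided p value for mean(acc) > chance using random sign flips.

    Assumed stand-in for the paper's permutation test: under the null,
    each subject's deviation from chance is equally likely to be + or -.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(acc, dtype=float) - chance
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = (signs * d).mean(axis=1)
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Simulated per-subject decoding accuracies for one ROI (32 subjects).
rng = np.random.default_rng(1)
accuracies = rng.normal(0.53, 0.05, size=32)

ci_low, ci_high = bootstrap_ci(accuracies)
p_value = sign_flip_p(accuracies)
print(f"mean={accuracies.mean():.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f}), p={p_value:.4f}")
```

Reporting the interval alongside the p value, as requested in point 8, shows how far above chance the mean plausibly lies rather than only whether it differs from chance.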

9) The authors computed the attentional state decoding separately for "self" and "other". These decoding accuracies did not differ. Please also provide and test the accuracies of self and other separately; this would shed light on the reliability of the main result (which was collapsed across the two conditions) and might also indicate whether the decoding was more reliable (less variable) in self or other.

Thank you for this suggestion. These results are now presented in Figure 3—figure supplement 1 in the revised submission. These results revealed no significant decoding in the left TPJ (or in any of the other ROIs), which is likely due to a lack of statistical power (since only half of the data set is used for these analyses). There is, however, an alternative analysis that gets at the same question but does not suffer from a reduction in statistical power, and which does show significant decoding in the left TPJ. We included that new analysis in the revision. Please see our response to the related point #11 for a description of that new analysis.

10) Replicate results in one or multiple additional ToM TPJ ROIs. Reviewers raised two suggestions: 1) Surface-based ROI: the TPJ is a highly anatomically variable cortical region that isn't well approximated by a single volume ROI (see Croxson et al. 2017). There are many surface-based ROIs now available that mitigate the effect of this variability, including ones from the authors' own lab. 2) Use a different meta-analysis: The used meta-analysis defines “theory of mind” solely in terms of false-belief tasks (van Veluw and Chance, 2014), which may not be appropriate for delineating the ROIs and their exact locations in your study. Different types of ToM task reliably activate different brain areas, as well as common ones (mPFC and bilateral TPJ: Molenberghs et al., Neuroscience and Biobehavioral Review 2016). Consider using the results of a different meta-analysis, e.g., one that identifies regions based on the conjunction of multiple types of theory-of-mind tasks (e.g., Mar, Annual Review of Psychology 2011; Molenberghs et al., 2016; Schurz et al., Neuroscience and Biobehavioral Reviews 2014).

We welcome the reviewers’ suggestion of replicating the endogenous-versus-exogenous decoding results in different ToM TPJ ROIs. In line with their recommendation, we defined three new left TPJ ROIs based on the peak coordinates reported in Mar (2011), Schurz et al. (2014), and Molenberghs et al. (2016). The results showed significant decoding in the Mar ROI (mean decoding accuracy 53.4%, 95% CI 51.0 to 55.1, p=0.0023), the Schurz ROI (mean decoding accuracy 52.7%, 95% CI 50.8 to 54.4, p=0.0092), and in the Molenberghs ROI (mean decoding accuracy 53.4%, 95% CI 51.3 to 55.2, p=0.0020), suggesting that the attention type decoding in the left TPJ is robust. In the revised paper, these results are included in the supplementary material (see Figure 3—figure supplement 2, also pasted above in relation to major point 8).
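
As an illustration of how group-level statistics of this kind (a mean decoding accuracy, a 95% CI, and a p-value against the 50% chance level) can be computed, here is a minimal sketch. It assumes a one-sided sign-flip permutation test and a percentile bootstrap; the response does not specify either procedure for these ROIs, so both are assumptions. The function names and the per-subject accuracies are hypothetical and are not the study's data.

```python
import random
from statistics import fmean


def sign_flip_p_value(accuracies, chance=50.0, n_iter=10_000, seed=0):
    """One-sided p-value for mean(accuracies) > chance (assumed scheme).

    Under the null, each subject's deviation from chance is equally likely
    to be positive or negative, so signs are flipped at random.
    """
    rng = random.Random(seed)
    deviations = [a - chance for a in accuracies]
    observed = fmean(deviations)
    at_least_as_large = 0
    for _ in range(n_iter):
        flipped = [d if rng.random() < 0.5 else -d for d in deviations]
        if fmean(flipped) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_large + 1) / (n_iter + 1)


def bootstrap_ci(accuracies, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the group mean (assumed scheme)."""
    rng = random.Random(seed)
    n = len(accuracies)
    means = sorted(fmean(rng.choices(accuracies, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-subject decoding accuracies in percent (not real data).
example_accuracies = [53, 55, 51, 56, 52, 54, 50, 57, 53, 55]
p_value = sign_flip_p_value(example_accuracies)
ci_low, ci_high = bootstrap_ci(example_accuracies)
print(f"mean={fmean(example_accuracies):.1f}%, "
      f"95% CI {ci_low:.1f} to {ci_high:.1f}, p={p_value:.4f}")
```

The sign-flip scheme is used here because it needs only one accuracy value per subject; a label-permutation scheme within subjects would be an equally plausible reading.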

The suggestion of re-analyzing the data set using surface-based techniques is interesting and definitely something we aim to pursue in future studies. However, we feel that mixing surface- and volume-based analysis methods, and defining ROIs using fundamentally different approaches, might risk over-complicating the paper. Furthermore, the surface-based TPJ subdivision described by Igelstrom et al. (2016) suffers from the same drawback as the van Veluw and Chance (2014) meta-analysis in that it involves only one type of theory-of-mind task (a false belief task). We therefore decided to use volume-based methods throughout.

11) In previous work (Kelly et al., 2014), the authors were interested in an overlap between attention in self and other. Here, this could be addressed in a cross-decoding analysis, training a classifier on exo vs endo in the "self" stories and testing this on the "other" stories. Above-chance classification in this analysis would strengthen the evidence for lTPJ involvement and would provide additional information that would help interpreting the TPJ findings.

We thank the reviewer for this excellent suggestion. Indeed, one of the goals of the study was to test for differences in the encoding of attention type in self versus others (Prediction 3). Attention type could be significantly decoded in the left TPJ when taking all data (“self” and “other” stories) into account (Figure 3A). However, we did not find significant decoding in the left TPJ when analyzing the “self” stories and “other” stories separately (see our response to point #9), which is likely due to a lack of statistical power because the data set was split in two. The absence of a self-vs-other difference in attention decoding could thus be due to either a lack of statistical power, or a true overlap between attention type encoding in self and other.

A direct way of testing for overlap in attention encoding in self and others in the left TPJ, while maintaining statistical power by using the entire data set, is to employ a two-way cross-classification analysis. In line with the reviewers’ suggestion, we have now included this analysis in the paper (Figure 3—figure supplement 7). In each subject, one classifier was trained to discriminate endogenous versus exogenous self-stories and tested on other-stories (using a leave-one-run-out approach), and another classifier was trained to discriminate endogenous versus exogenous other-stories and tested on self-stories. The two accuracies were averaged to obtain a single cross-classification decoding accuracy for the left TPJ. The group-level result shows significant above-chance decoding (mean decoding accuracy 51.9%, 95% CI 49.9 to 54.1, p=0.0393, permutation testing with 10,000 iterations). This finding suggests that, in the left TPJ, the brain mechanisms that encode attention in others overlap with those that encode attention in the self.
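
The two-way cross-classification logic described above can be sketched as follows. This is a simplified illustration on synthetic data, not the analysis pipeline used in the paper. A nearest-centroid classifier stands in for whatever classifier was actually used, and the trial layout, the function names, and the way the held-out run is applied (train on one agent's stories from the remaining runs, test on the other agent's stories from the held-out run) are all assumptions.

```python
import random
from statistics import fmean


def make_trials(signal, n_runs=4, n_per_cell=5, n_voxels=10, seed=0):
    """Synthetic trials (hypothetical): attention type shifts the first half
    of the voxels, agent type shifts the second half."""
    rng = random.Random(seed)
    trials = []
    for run in range(n_runs):
        for agent in ("self", "other"):
            for attention in ("endo", "exo"):
                sign = 1.0 if attention == "endo" else -1.0
                for _ in range(n_per_cell):
                    pattern = [rng.gauss(0.0, 0.5) for _ in range(n_voxels)]
                    for v in range(n_voxels // 2):
                        pattern[v] += sign * signal
                    if agent == "other":
                        for v in range(n_voxels // 2, n_voxels):
                            pattern[v] += 1.0
                    trials.append({"run": run, "agent": agent,
                                   "attention": attention, "pattern": pattern})
    return trials


def centroid(patterns):
    """Voxel-wise mean of a list of activity patterns."""
    return [fmean(column) for column in zip(*patterns)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_decode(trials, train_agent, test_agent):
    """Train on one agent's stories, test on the other's, leaving one run out."""
    runs = sorted({t["run"] for t in trials})
    correct = total = 0
    for held_out in runs:
        train = [t for t in trials
                 if t["agent"] == train_agent and t["run"] != held_out]
        test = [t for t in trials
                if t["agent"] == test_agent and t["run"] == held_out]
        centroids = {
            label: centroid([t["pattern"] for t in train
                             if t["attention"] == label])
            for label in ("endo", "exo")
        }
        for t in test:
            predicted = min(
                centroids,
                key=lambda label: squared_distance(t["pattern"], centroids[label]),
            )
            correct += int(predicted == t["attention"])
            total += 1
    return 100.0 * correct / total


def two_way_cross_classification(trials):
    """Average of the self-to-other and other-to-self decoding accuracies."""
    return fmean([cross_decode(trials, "self", "other"),
                  cross_decode(trials, "other", "self")])


print(two_way_cross_classification(make_trials(signal=2.0)))
```

Because the classifier never sees the test agent's stories during training, above-chance accuracy in this scheme can only arise if the endogenous-versus-exogenous pattern is shared between the two agents.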

The new results are now included in the Results section (under Prediction 3) and Figure 3—figure supplement 7 in the revised paper. We have also expanded the eighth Discussion paragraph, relating to the interaction between attention type and agent type, to address this point.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Essential Revisions:

1) Please report and analyze the univariate activity in the ROIs

2) More fully report (in main Results section) and discuss the searchlight results.

We thank the editor and two reviewers for their detailed and constructive comments. In the revised submission, we report the univariate activity averaged across all voxels within each ROI, and report and discuss the searchlight results in the main text. In our view, the paper has been greatly improved as a result of these revisions. Below we answer all comments in a point-by-point manner.

Reviewer #1:

The authors have addressed many of my concerns. In particular, the additional TPJ ROIs and the demonstration of self-other cross-decoding increased my confidence in the main finding of attention state decoding in lTPJ.

One comment has not been sufficiently addressed, however: it would be highly relevant and informative to see the average univariate activity for each of the conditions in the 6 original ROIs, plotted in a graph. The endo>exo contrast should be tested, correcting for the 6 ROIs, similar to how the decoding accuracies are tested (i.e., with the same p<0.05 cut-off). This ROI-based univariate analysis is also interesting for other comparisons, e.g. to test whether the social conditions gave higher activity than the non-social condition, as would be expected based on previous work. Thus, please provide a full univariate analysis on average ROI activity.

We thank the reviewer for this suggestion. In the revised submission, we now include a supplementary figure showing the difference in average univariate activity across all voxels within each ROI, for the endo-vs-exo, self-vs-other, and social-vs-nonsocial contrasts. In line with the voxelwise univariate results reported in Supplementary file 5, none of these contrasts were statistically significant (all FDR-corrected ps > 0.21).
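
For readers unfamiliar with FDR correction across a small set of ROIs, the following is a minimal sketch of the Benjamini-Hochberg step-up adjustment. Whether this exact FDR variant was the one applied is an assumption, and the six p-values are hypothetical placeholders, not the study's results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical uncorrected p-values for one contrast in six ROIs (not real data).
roi_p_values = [0.04, 0.30, 0.12, 0.55, 0.21, 0.80]
for raw, corrected in zip(roi_p_values, benjamini_hochberg(roi_p_values)):
    print(f"uncorrected p={raw:.2f} -> FDR-adjusted p={corrected:.2f}")
```

A contrast is then declared significant in an ROI only if its adjusted p-value falls below the chosen cut-off (here, 0.05).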

There is a lot of interesting information in the Supplementary figures (e.g., searchlight, additional ROIs), which will not be visible when viewing or printing pdfs. You could consider moving some of this to the main text, or add panels to existing figures where you include some of this information.

We agree with the reviewer, and have incorporated some of the supplementary results into the main text in the revised submission. Specifically, we have added three panels to Figure 3, which now features the additional left TPJ ROI results, the endo-versus-exo searchlight results, and the cross-classification results.

Reviewer #2:

In this revised manuscript, the authors have greatly clarified some key points, most important of which was how well matched the behavior was across all of the conditions. While there is still a potential confound in the social vs. non-social contrast given the difference in reaction time, (a) the confound effects are likely negligible in size, (b) the end result matches the previous literature, and (c) the contrast was only a control analysis and doesn't greatly affect the main point of the paper. However, the handling of the left posterior STS result in the searchlight analysis remains problematic, both in the manuscript and in the author's response.

pSTS response and discussion: The main point of the manuscript is to examine whether known theory of mind areas contain information that differentiates between the endogenous vs. exogenous story conditions. The revised manuscript makes quite clear that the answer is yes, activity in the L TPJ differentiates between the two conditions, albeit with a relatively low decoding accuracy. The fact that multiple versions of the L TPJ ROI produces the same result is particularly reassuring. The spotlight analysis, however, finds that there is potentially more discriminatory information to be found in the posterior STS/middle temporal gyrus, and this is where the manuscript runs into problems.

In the revised manuscript, this result is simply not mentioned, though it clearly has implications for the interpretations of the result as mentioned in the previous round of reviews. In addition, in the response to the reviewers' comments, the authors give a very convoluted and confusing argument based on MNI coordinates that somehow this focus is somehow (a) part of the TPJ and (b) may be a part of the attention-related TPJa as opposed to their TPJp ROI. In the process of saying that the surface projections are misleading, they state that the focus as falling on the STG on the surface projection, when it is really on the MTG. In addition, they base their arguments on MNI coordinates, but MNI coordinates are notoriously inaccurate in comparing results from different studies, so the author's attempts to justify their conclusions using these coordinates falls flat. The authors own volume images very clearly show the focus as being in the STS and not on either the angular gyrus or supramarginal gyrus that Mars, Corbetta, and others have generally shown the TPJp and TPJa to fall on. And even if this focus was in the TPJa, as the authors seem to hint at, they do not discuss the implications of this result.

The authors' defensiveness around this point is frankly puzzling. The searchlight results do not seem to invalidate their main point, only potentially augment it. The fact that the posterior STS (and what seem to be the right FEF and iPCS areas) exhibit higher discriminability between the endo and exo conditions may not be shocking given that the theory of mind areas are likely involved in many operations in this task, whereas the pSTS and right hemisphere areas likely may only be involved in just the imagined attention/sensory processing aspects of the task. It is possible that these areas are "reading out" the differing attentional conditions from the L TPJ. Whatever the explanation may be, the authors seem to be trying to bury this result to shoehorn the results into fitting their a priori model, which is a disservice and misleading to the readers, and needs to be rectified before the manuscript can be published. My recommendation is to at least mention the searchlight results in the main Results section, then add a paragraph to the Discussion discussing the implications of these results. Better yet would be to quantify the discrimination accuracy within each of the foci uncovered in the spotlight analyses.

We are sorry to hear that the reviewer interpreted our previous response as defensive; that was not our intention. We also apologize for erroneously referring to the STG in our response (“they state that the focus as falling on the STG on the surface projection”). It should have said MTG, which is obvious from the surface projection in Figure 3—figure supplement 3. In the revised submission, we now include the searchlight results in the main Results section and in the main results Figure 3 (panel C).

First, we would like to take the opportunity to put the searchlight result into a broader context. The searchlight analysis is fundamentally different from the ROI analysis. It is not targeted to specific brain areas on the basis of strong predictions. Instead, it is a whole-brain analysis that is much more statistically conservative because of the brain-wide correction for multiple comparisons. In general, one would not necessarily expect the searchlight analysis to align perfectly with the ROI analysis. Its usefulness is that it may reveal any cluster of very strong decoding that was missed by the more sensitive analysis restricted to the ROIs. However, the results of our searchlight analysis revealed no brain-wide significant clusters of decoding for the endogenous versus exogenous distinction. Thus, we cannot make any inferences about attention-type decoding based on the searchlight results. However, in a purely descriptive manner, we report the (four) decoding clusters that survived the conventional uncorrected statistical threshold p<0.001. It is interesting to note that the brain-wide searchlight decoding peak happened to fall in the posterior STS just 20 mm from the center of our predefined left TPJ ROI, and that the two belong to the same anatomical subregion of the TPJ (TPJp). In our view, this approximate, descriptive overlap lends at least some additional confidence to our significant left TPJ ROI decoding result. Nevertheless, since the searchlight result consists of an uncorrected decoding map, we have been very careful not to over-interpret the data. We suspect that the reviewer may have taken this caution on our part as defensiveness.
Because the searchlight results for the endo-versus-exo distinction were inconclusive, interpretations such as the one offered by the reviewer (“The fact that the posterior STS [and what seem to be the right FEF and iPCS areas] exhibit higher discriminability between the endo and exo conditions may not be shocking given that the theory of mind areas are likely involved in many operations in this task, whereas the pSTS and right hemisphere areas likely may only be involved in just the imagined attention/sensory processing aspects of the task. It is possible that these areas are "reading out" the differing attentional conditions from the L TPJ.”) are in our view therefore highly speculative.

The goal of our previous response was to make an honest attempt at pinpointing the functional location of the left pSTS searchlight cluster, which was prompted by the reviewer’s previous comment that questioned whether the cluster belongs to the TPJ at all and speculated that it perhaps belongs to TPJa or coincides with the gaze-following patch reported in Marquardt et al. (2017). In contrast to the reviewer’s assertion that the TPJa and TPJp encompass only the angular gyrus and the SMG, the connectivity-based subdivisions of TPJa and TPJp described by Bzdok et al. (2013) and Mars et al. (2012) both define the pSTS as part of the TPJ. According to this definition, our pSTS searchlight cluster clearly belongs to the TPJ and specifically to the posterior subregion TPJp. The reviewer appears to have the impression that we are arguing the cluster belongs to the TPJa (“the authors [argue that] this focus is somehow (a) part of the TPJ and (b) may be a part of the attention-related TPJa as opposed to their TPJp ROI.” and “And even if this focus was in the TPJa, as the authors seem to hint at […]”); however, we are not. Furthermore, the pSTS searchlight cluster does not coincide with the gaze-following patch reported in Marquardt et al. (2017).

To help clarify the relationship between the left TPJ ROI, in which we obtained significant decoding, and the left STS, in which we found a non-significant peak in the searchlight analysis, we performed a further analysis for the reviewer. In this new analysis, we used a more lax statistical threshold, to better examine whether there is any halo of activity hidden under the statistical threshold. Instead of using the arbitrary uncorrected threshold of p<0.001, we used the uncorrected threshold of p<0.01 in the searchlight analysis. Author response image 1 shows the result. Using this threshold, the left posterior STS cluster merged with clusters encompassing the angular gyrus, the SMG, and the MTG. The general area of activity is now more clearly in the TPJ, including the TPJ ROI and extending into the STS. It is important to keep in mind, however, that the results of both methods – the one that shows a small peak in the STS, and the one that shows a large cluster encompassing the TPJ – should not be taken as definitive, statistically strong findings. They are the result of an exploratory searchlight analysis and, though interesting, should be interpreted cautiously.
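
The way separate clusters can merge when an uncorrected threshold is relaxed from p<0.001 to p<0.01 can be illustrated with a toy example. The sketch below uses a small, made-up two-dimensional p-value map with four-neighbor connectivity; a real searchlight map is three-dimensional and would typically use a different connectivity rule. The function name and all values are hypothetical.

```python
def cluster_sizes(p_map, threshold):
    """Sizes of 4-connected clusters of cells with p below the threshold."""
    rows, cols = len(p_map), len(p_map[0])
    seen = set()
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if p_map[r][c] >= threshold or (r, c) in seen:
                continue
            # Flood fill from this supra-threshold cell.
            stack = [(r, c)]
            seen.add((r, c))
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and p_map[ny][nx] < threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)


# Made-up p-values: two strong foci joined by a weaker bridge.
toy_map = [
    [0.5, 0.5,    0.5,   0.5,   0.5,    0.5],
    [0.5, 0.0005, 0.005, 0.005, 0.0004, 0.5],
    [0.5, 0.0005, 0.5,   0.5,   0.0004, 0.5],
    [0.5, 0.5,    0.5,   0.5,   0.5,    0.5],
]
print(cluster_sizes(toy_map, 0.001))  # [2, 2]: two separate clusters
print(cluster_sizes(toy_map, 0.01))   # [6]: the clusters merge into one
```

As in the toy map, a merged cluster at the lax threshold shows only that sub-threshold values connect the foci; it does not make either focus statistically significant.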

Author response image 1

We certainly want to avoid the impression of “burying” the searchlight results. Although descriptive, they are interesting and compatible with our ROI results, and constitute valuable information not least for future studies. Therefore, we now include the endo-versus-exo searchlight results in the updated main results figure (Figure 3C) and report the results, including the decoding accuracy for the pSTS peak voxel (53.7%), in the main Results section under “Prediction 1”.

https://doi.org/10.7554/eLife.63551.sa2

Article and author information

Author details

  1. Arvid Guterstam

    1. Department of Psychology, Princeton University, Princeton, United States
    2. Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing - original draft
    For correspondence
    arvidg@princeton.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3694-1318
  2. Branden J Bio

    Department of Psychology, Princeton University, Princeton, United States
    Contribution
    Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Andrew I Wilterson

    Department of Psychology, Princeton University, Princeton, United States
    Contribution
    Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Michael Graziano

    Department of Psychology, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Writing - review and editing
    Competing interests
    No competing interests declared

Funding

Wenner-Gren Foundation

  • Arvid Guterstam

Swedish Brain Foundation

  • Arvid Guterstam

Sweden-America Foundation

  • Arvid Guterstam

Promobilia Foundation

  • Arvid Guterstam

Princeton Institute for International and Regional Studies

  • Arvid Guterstam
  • Branden J Bio
  • Andrew I Wilterson
  • Michael Graziano

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the Princeton Neuroscience Institute Innovation Fund. Arvid Guterstam was supported by the Wenner-Gren Foundation, the Sweden-America Foundation, the Swedish Brain Foundation, and the Promobilia Foundation. The authors would like to thank Sam Nastase for valuable input regarding the multivoxel pattern analysis.

Ethics

Human subjects: All subjects provided informed consent and all procedures were approved by the Princeton Institutional Review Board (IRB# 10740).

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Marius V Peelen, Radboud University, Netherlands

Reviewer

  1. Anthony Atkinson, Durham University, United Kingdom

Publication history

  1. Received: September 28, 2020
  2. Accepted: February 4, 2021
  3. Version of Record published: February 15, 2021 (version 1)

Copyright

© 2021, Guterstam et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


