Introduction

A distinguishing feature of human cognition is our ability to reason about complex cause-effect relationships, particularly when causes are hidden (Tooby & DeVore, 1987; Lagnado et al., 2007; Rottman, Ahn, & Luhmann, 2011; Muentener & Schulz, 2014; Sloman & Lagnado, 2015; Goddu & Gopnik, 2024). Inferring the causes of illness provides a classic example (Keil et al., 1999; Waldmann, 2000; Meder & Mayrhofer, 2017; Legare & Shtulman, 2018). When reading something like “Hugh sat by sneezing passengers on the subway. Now he has a case of COVID,” we naturally infer a causal relationship between crowded spaces and the invisible transmission of infectious disease. Here we investigate the neurocognitive mechanisms that support such automatic inferences.

Inferring illness causes is a universal yet culturally variable phenomenon (Ackerknecht, 1982; Foster, 1976; Legare & Gelman, 2008; Lock & Nguyen, 2010). Young children reason about the causes of illness even prior to formal schooling, for instance attributing illness to contamination or contact with a sick person (Springer & Ruckel, 1992; Kalish, 1996, 1997; Keil et al., 1999; Raman & Gelman, 2005; Legare & Gelman, 2008; Legare, Wellman, & Gelman, 2009). During development and throughout life, we acquire cultural knowledge about the invisible forces that bring about illness, from pathogen transmission to divine retribution (Notaro, Gelman, & Zimmerman, 2001; Raman & Winer, 2004; Lynch & Medin, 2006; Legare & Gelman, 2008; Legare et al., 2012). In many societies, designated ‘healers’ become experts in diagnosing and curing disease (Foster, 1976; Ackerknecht, 1982; Norman et al., 2006; Lightner, Heckelsmiller, & Hagen, 2021) and non-experts routinely infer the causes of illness in themselves and others (e.g., how did my friend get COVID?).

One hypothesis is that causal inferences about illness depend on content-specific semantic representations in the ‘animacy network’ (see preregistration https://osf.io/cx9n2/) (Fairhall & Caramazza, 2013b; Deen & Freiwald, 2022). This hypothesis is consistent with the prominent view that semantic knowledge is organized into distinct causal frameworks, so-called ‘intuitive theories’ (e.g., Wellman & Gelman, 1992; Gopnik & Meltzoff, 1997; Tenenbaum et al., 2007; Gerstenberg & Tenenbaum, 2017). Illness exclusively affects living things, and semantic networks that represent living things may contain causal information. The causal mechanisms that lead to illness are related to our intuitions about bodily function and are distinguishable from intuitions about inanimate objects (Wellman & Gelman, 1992; Keil et al., 1999; Inagaki & Hatano, 2006). For example, people but not machines transmit diseases to each other through physical contact. Thinking about living things (animates) as opposed to non-living things (inanimate objects/places) depends on partly dissociable neural systems (e.g., Warrington & Shallice, 1984; Hillis & Caramazza, 1991; Caramazza & Shelton, 1998; Farah & Rabinowitz, 2003). Recent evidence suggests that the precuneus (PC) in particular responds preferentially to images and words referring to living things (i.e., people and animals) across a variety of tasks, such as semantic categorization and similarity judgment (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2023). Whether the PC is sensitive to causal inferences about illness is not known.

The PC is also active during mental state inferences (Saxe & Kanwisher, 2003; Saxe et al., 2006). We therefore tested whether inferences about illness (bodies) as opposed to mental states (minds) recruit the same or different subregions of the PC by localizing the ‘mentalizing’ network (Dodell-Feder et al., 2011; Dufour et al., 2013).

A non-mutually exclusive hypothesis is that inferring illness causes depends on neurocognitive mechanisms that support causal inferences regardless of their content, e.g., causal inferences about bodies (e.g., illness), minds, or mechanical objects (e.g., mechanical failure). A large body of behavioral evidence has shown that children and adults use similar cognitive principles to infer causality across domains (e.g., Cheng & Novick, 1992; Waldmann & Holyoak, 1992; Pearl, 2000; Gopnik et al., 2001; Steyvers et al., 2003; Gopnik et al., 2004; Schulz & Gopnik, 2004; Rehder & Burnett, 2005; Lagnado et al., 2007; Rottman & Hastie, 2014; Davis & Rehder, 2020). For instance, children as young as 3 years old correctly discard or ‘screen off’ conditionally independent variables when identifying the causes of both biological and psychological events (i.e., allergic reactions, fear responses) (Schulz & Gopnik, 2004). The deployment of content-invariant probabilistic knowledge during causal inference suggests that a common brain network could enable causal inferences about any domain, including illness.

Frontotemporal language processing mechanisms themselves could support the causal inferences that occur during language comprehension, regardless of domain. Language has a rich capacity to convey causal information (Tooby & DeVore, 1987; Pinker, 2003; Solstad & Bott, 2017) and has been suggested to play an important role in some aspects of combinatorial thought (e.g., Spelke, 2003; 2022). Alternatively, causal inference may depend on general logical deduction mechanisms (Goldvarg & Johnson-Laird, 2001; Barbey & Patterson, 2011; Khemlani et al., 2014; Operskalski & Barbey, 2017). Finally, it is possible that causal inferences are supported by a dedicated ‘causal engine’ that is distinct from language processing or logical reasoning (Gopnik et al., 2004; Saxe & Carey, 2006; Tenenbaum et al., 2007; Carey, 2011).

To our knowledge, no prior study has examined the neural basis of implicit causal inferences about illness. A handful of prior experiments investigated neural responses during explicit causality judgments that were collapsed across semantic domains (e.g., biological, mechanical, mental state inference). For example, Kuperberg et al. (2006) asked participants to rate the causal relatedness of three-sentence stories and observed higher responses to causal stories in left frontotemporal cortex. Frontotemporal responses in both hemispheres have been observed in other experiments using single words, symbols, and passages as stimuli, but across studies no consistent neural signature of causal inference has emerged (Ferstl & von Cramon, 2001; Satpute et al., 2005; Fugelsang & Dunbar, 2005; Chow et al., 2008; Fenker et al., 2009; Mason & Just, 2011; Prat et al., 2011; Kranjec et al., 2012; Pramod, Chomik-Morales, et al., 2024). Nearly all prior experiments used explicit tasks, and, in many cases, causal trials were more difficult. Some of the observed effects may therefore reflect linguistic or executive load rather than cognitive processes specific to causal inference. Additionally, prior studies did not localize language or logical reasoning networks in individual participants, making it difficult to assess the involvement of these systems (e.g., Fedorenko et al., 2010; Monti et al., 2009).

In the current fMRI study, we use an implicit task to capture automatic causal inferences that unfold during language comprehension (Black & Bern, 1981; Keenan et al., 1984; Trabasso & Sperry, 1985; Myers et al., 1987; Duffy et al., 1990). We treat causal inferences about illness as a case study of a circumscribed yet highly familiar and motivationally relevant domain (Keil et al., 1999; Legare & Shtulman, 2018). We asked participants to read two-sentence vignettes (e.g., “Hugh sat by sneezing passengers on the subway. Now he has a case of COVID”). The first sentence described a potential cause and the second sentence a potential effect. Participants performed a covert task of detecting ‘magical’ catch-trial vignettes.

Causal inferences about illness were compared to two control conditions: i) causal inferences about mechanical failure (e.g., “Jake dropped all of his things on the subway. Now he has a shattered phone”) and ii) vignettes that contained illness-related language but were causally unrelated (e.g., “Lynn dropped all of her things on the subway. Now she has a case of COVID”). This combination of control conditions allowed us to test jointly for sensitivity to content domain and causality. Critically, all vignettes, including mechanical ones, described events involving people, such that responses to causal inferences about illness in the animacy network could not be explained by the presence of animate agents in the stories. Non-causal vignettes were constructed by shuffling the causes/effects across conditions and were therefore matched in linguistic content to the causal vignettes. A separate group of participants rated the causal relatedness of all vignettes prior to the experiment. We predicted that illness inferences would activate the PC relative to both the mechanical inference (causal non-illness) and non-causal control conditions. We also localized language and logical reasoning networks in each participant to test the alternative but not mutually exclusive prediction that the language and/or logical reasoning networks respond preferentially to causal inference regardless of content domain (i.e., illness, mechanical).

Method

Open science practices

The methods and analysis of this experiment were pre-registered prior to data collection (https://osf.io/cx9n2/).

Participants

Twenty adults (7 females, 13 males, 25-37 years old, M = 28.7 years ± 3.2 SD) participated in the study. Participants either had or were pursuing graduate degrees (M = 8.8 years of post-secondary education). Two additional participants were excluded from the final dataset due to excessive head motion (> 2 mm) and an image artifact. One participant in the final dataset exhibited excessive head motion (> 2 mm) during one run of the language/logic localizer task; this run was excluded from analysis. All participants were screened for cognitive and neurological disabilities (self-report). Participants gave written informed consent and were compensated $30 per hour. The study was reviewed and approved by the Johns Hopkins Medicine Institutional Review Boards.

Causal inference experiment

Stimuli

Participants read two-sentence vignettes in 4 conditions, 2 causal and 2 non-causal (Figure 1C). Each vignette focused on a single agent, specified by a proper name in the initial sentence and by a pronoun in the second sentence. The first sentence described something the agent did or experienced and served as the potential cause. The second sentence described the potential effect (e.g., “Kelly shared plastic toys with a sick toddler at her preschool. Now she has a case of chickenpox”). Illness-Causal vignettes elicited inferences about biological causes of illness, including pathogen transmission, exposure to environmental toxins, and genetic mutations (see Supplementary Table 1 for a full list of the types of illnesses included in our stimuli).

Figure 1. Responses to illness inferences in the precuneus (PC).

Panel A: Percent signal change (PSC) for each condition among the top 5% Illness-Causal > Mechanical-Causal vertices in a left PC mask (Dufour et al., 2013) in individual participants, established via a leave-one-run-out analysis. Panel B: Whole-cortex results (one-tailed) for Illness-Causal > Mechanical-Causal and Illness-Causal > Non-Causal (both versions of non-causal vignettes), corrected for multiple comparisons (p < .05 FWER, cluster-forming threshold p < .01 uncorrected). Vertices are color coded on a scale from p=0.01 to p=0.00001. Panel C: Example stimuli. ‘Magical’ catch trials similar in meaning and structure (e.g., “Sadie forgot to wash her face after she ran in the heat. Now she has a cucumber nose”) enabled the use of a semantic ‘magic detection’ task.

Mechanical-Causal vignettes elicited inferences about physical causes of structural damage to personally valuable inanimate objects (e.g., houses, jewelry). Two non-causal conditions used the same sentences as in the Illness-Causal and Mechanical-Causal conditions but in a shuffled order: illness cause with mechanical effect (Non-Causal Illness First) or mechanical cause with illness effect (Non-Causal Mechanical First). Explicit causality judgments collected from a separate group of online participants (n=26) verified that both causal conditions (Illness-Causal, Mechanical-Causal) were rated as more causally related than both non-causal conditions, t(25) = 36.97, p < .0001. In addition, Illness-Causal and Mechanical-Causal items received equally high causality ratings, t(25) = –0.64, p = 0.53 (see Appendix 1 for details).

Illness-Causal and Mechanical-Causal vignettes were constructed in pairs such that each member of a given pair shared parallel or near-parallel phrase structure. All conditions were also matched (pairwise t-tests, all ps > 0.3, no statistical correction) on multiple linguistic variables known to modulate neural activity in language regions (e.g., Pallier, Devauchelle, & Dehaene, 2011; Shain, Blank et al., 2020). These included number of characters, number of words, average number of characters per word, average word frequency, average bigram surprisal (Google Books Ngram Viewer, https://books.google.com/ngrams/), and average syntactic dependency length (Stanford Parser; de Marneffe, MacCartney, & Manning, 2006). Word frequency was calculated as the negative log of a word’s occurrence rate in the Google corpus between the years 2017-2019. Bigram surprisal was calculated as the negative log of the frequency of a given two-word phrase in the Google corpus divided by the frequency of the first word of the phrase (see Appendix 2 for details). All conditions were matched for all linguistic variables across the first sentence, second sentence, and the entire vignette.

Procedure

We used a ‘magic detection’ task to encourage participants to process the meaning of the vignettes without making explicit causality judgments. Participants saw ‘magical’ catch trials that closely resembled the experimental trials but were fantastical (e.g., “Sadie forgot to wash her face after she ran in the heat. Now she has a cucumber nose.”). On each trial, participants indicated via button press whether ‘something magical’ occurred in the vignette (Yes/No). Both sentences in a vignette were presented simultaneously for 7 s, one above the other, followed by a 12 s inter-trial interval. Each participant saw 38 trials per condition plus 36 ‘magical’ catch trials (188 total trials) in one of two versions, counterbalanced across participants, such that individual participants did not see the same sentence in both causal and non-causal vignettes. The two stimulus versions had similar meanings but different surface forms (e.g., “Luna stood by coughing travelers on the train…” vs. “Hugh sat by sneezing passengers on the subway…”).
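For illustration, a minimal sketch of this trial structure in PsychoPy3 is given below. The actual presentation scripts were custom and are not reproduced here; all names and parameters in the sketch are hypothetical apart from the 7 s display and 12 s inter-trial interval described above.

```python
# Minimal sketch of one trial of the magic detection task (hypothetical
# helper; not the actual experiment code).
from psychopy import visual, core, event

win = visual.Window(fullscr=True)
clock = core.Clock()

def run_trial(sentence1, sentence2, duration=7.0, iti=12.0):
    """Show both sentences simultaneously, one above the other, for 7 s."""
    top = visual.TextStim(win, text=sentence1, pos=(0, 0.2))
    bottom = visual.TextStim(win, text=sentence2, pos=(0, -0.2))
    top.draw(); bottom.draw(); win.flip()
    clock.reset()
    # Collect the 'something magical?' judgment (Yes/No) during the display.
    keys = event.waitKeys(maxWait=duration, keyList=['1', '2'],
                          timeStamped=clock)
    core.wait(max(0.0, duration - clock.getTime()))  # finish the 7 s display
    win.flip()       # blank screen
    core.wait(iti)   # 12 s inter-trial interval
    return keys      # [(key, RT)] or None if no response
```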

The experiment was divided into six 10-minute runs, each containing a similar number of trials per condition, presented in a pseudorandom order. Specifically, vignettes from the same experimental condition repeated no more than twice consecutively, vignettes that were constructed to share a similar phrase structure never repeated within a run, vignettes that referred to the same illness never repeated consecutively, and vignettes from each condition, including catch trials, were equally distributed in time across the course of the experiment.
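One straightforward way to implement such constraints is rejection sampling. The sketch below illustrates the logic under hypothetical data structures (the temporal equal-distribution constraint is omitted for brevity; this is not the actual randomization code).

```python
# Minimal sketch of the ordering constraints. Hypothetical trial format:
# each trial is a dict with 'condition', 'pair_id', and 'illness' keys.
import random

def satisfies_constraints(order):
    for i, trial in enumerate(order):
        # Same condition repeats no more than twice consecutively.
        if i >= 2 and (trial['condition'] == order[i - 1]['condition']
                       == order[i - 2]['condition']):
            return False
        # Vignettes referring to the same illness never repeat consecutively.
        if i >= 1 and trial['illness'] is not None \
                and trial['illness'] == order[i - 1]['illness']:
            return False
    # Structurally paired vignettes never co-occur within a run.
    pair_ids = [t['pair_id'] for t in order if t['pair_id'] is not None]
    return len(pair_ids) == len(set(pair_ids))

def pseudorandomize(trials, max_attempts=100_000):
    for _ in range(max_attempts):
        candidate = random.sample(trials, len(trials))  # shuffled copy
        if satisfies_constraints(candidate):
            return candidate
    raise RuntimeError('No valid order found; retry or relax constraints.')
```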

Language/logic localizer experiment

A localizer task was used to identify the language and logic networks in each participant. The task had three conditions: language, logic, and math. In the language condition, participants judged whether two visually presented sentences, one in active and one in passive voice, shared the same meaning. In the logic condition, participants judged whether two logical statements were consistent (e.g., If either not Z or not Y then X vs. If not X then both Z and Y). In the math condition, participants judged whether the variable X had the same value across two equations (for details see Liu et al., 2020). Trials lasted 20 s (1 s fixation + 19 s display of stimuli) and were presented in an event-related design. Participants completed two 9-minute runs of the task, with trial order counterbalanced across runs and participants. Following prior studies, the language network was identified in individual participants by contrasting language > math and the logic network by contrasting logic > language (Monti et al., 2009; Kanjlia et al., 2016; Liu et al., 2020).

Mentalizing localizer experiment

An additional localizer task was used to identify the mentalizing network in each participant (Saxe & Kanwisher, 2003; Dodell-Feder et al., 2011; http://saxelab.mit.edu/use-our-efficient-false-belief-localizer). In this task, participants read 10 mentalizing stories (e.g., a protagonist has a false belief about an object’s location) and 10 physical stories (physical representations depicting outdated scenes, e.g., a photograph showing an object that has since been removed) before answering a true/false comprehension question. We used the mentalizing stories from the original localizer but created new stimuli for the physical stories condition. Our physical stories incorporated more vivid descriptions of physical interactions and did not make any references to human agents. They were also linguistically matched to the mentalizing stories to reduce linguistic confounds (see Shain et al., 2022). Specifically, we matched physical and mentalizing stories (pairwise t-tests, all ps > 0.3, no statistical correction) for number of characters, number of words, average number of characters per word, average syntactic dependency length, average word frequency, and average bigram surprisal, as was done for the causal inference vignettes. A comparison of both localizers in 3 pilot participants can be found in Supplementary Figure 9.

Trials were presented in an event-related design, each lasting 16 s (12 s story + 4 s comprehension question) followed by a 12 s inter-trial interval. Participants completed two 5-minute runs of the task, with trial order counterbalanced across runs and participants. The mentalizing network was identified in individual participants by contrasting mentalizing stories > physical stories (Saxe & Kanwisher, 2003; Dodell-Feder et al., 2011).

Data acquisition

Whole-brain fMRI data were acquired at the F.M. Kirby Research Center for Functional Brain Imaging on a 3T Philips Achieva Multix X-Series scanner. T1-weighted structural images were collected in 150 axial slices with 1 mm isotropic voxels using the magnetization-prepared rapid gradient-echo (MP-RAGE) sequence. T2*-weighted functional BOLD scans were collected in 36 axial slices (2.4 × 2.4 × 3 mm voxels, TR = 2 s). Data were acquired in one experimental session lasting approximately 120 minutes. All stimuli were visually presented on a rear projection screen with a Cambridge Research Systems BOLDscreen 32 UHD LCD display (image resolution = 1920 × 1080) using custom scripts written in PsychoPy3 (https://www.psychopy.org/, Peirce et al., 2019). Participants viewed the screen via a front-silvered, 45° inclined mirror attached to the top of the head coil.

fMRI data preprocessing and general linear model (GLM) analysis

Preprocessing included motion correction, high-pass filtering (128 s), mapping to the cortical surface (FreeSurfer), spatial smoothing on the surface (6 mm FWHM Gaussian kernel), and prewhitening to remove temporal autocorrelation. Covariates of no interest included signal from white matter, cerebrospinal fluid, and motion spikes.

For the main causal inference experiment, the GLM modeled the four main conditions (Illness-Causal, Mechanical-Causal, Non-Causal Illness First, Non-Causal Mechanical First) as well as the ‘magical’ catch trials during the 7 s display of the vignettes after convolving with a canonical hemodynamic response function and its first temporal derivative. For the language/logic localizer experiment, a separate predictor was included for each of the three conditions (language, logic, math), modeling the 20 s duration of each trial. For the mentalizing localizer experiment, a separate predictor was included for each condition (mentalizing stories, physical stories), modeling the 16 s display of each story and corresponding comprehension question.
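As an illustration of this modeling approach, the sketch below builds a comparable first-level design matrix with nilearn. The use of nilearn is an assumption for illustration only, and all onsets are hypothetical; the sketch does not reproduce the actual analysis pipeline.

```python
# Minimal sketch of a first-level design matrix: 7 s boxcars convolved with
# a canonical HRF plus its temporal derivative, with a 128 s high-pass filter.
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

TR = 2.0
n_scans = 300                               # hypothetical run length
frame_times = np.arange(n_scans) * TR

# Hypothetical onsets; catch trials and remaining conditions omitted.
events = pd.DataFrame({
    'onset':      [10.0, 29.0, 48.0],
    'duration':   [7.0, 7.0, 7.0],          # 7 s vignette display
    'trial_type': ['IllnessCausal', 'MechCausal', 'NonCausalIllnessFirst'],
})

design = make_first_level_design_matrix(
    frame_times, events,
    hrf_model='glover + derivative',        # canonical HRF + derivative
    high_pass=1.0 / 128,                    # 128 s high-pass filter
)
```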

For each task, runs were modeled separately and combined within subject using a fixed-effects model (Dale, Fischl, & Sereno, 1999; Smith et al., 2004). Group-level random-effects analyses were corrected for multiple comparisons across the whole cortex at p < .05 family-wise error rate (FWER) using a nonparametric permutation test (cluster-forming threshold p < .01 uncorrected) (Winkler et al., 2014; Eklund, Nichols, & Knutsson, 2016; Eklund, Knutsson, & Nichols, 2019). A control analysis of the causal inference task that modeled participant response time and number of people in each vignette revealed similar results to a model without the covariates added. We therefore report only the results of the model without the covariates.

Individual-subject ROI analysis (univariate)

We defined individual-subject functional ROIs (fROIs) in the PC, temporoparietal junction (TPJ), language (frontal and temporal masks), and logic networks. In an exploratory analysis, we defined individual-subject fROIs in anterior medial ventral occipitotemporal cortex (VOTC). For all analyses, percent signal change (PSC) was extracted and averaged over the entire duration of the trial (17 s total), allowing 4 s to account for the hemodynamic lag.

Illness inference ROIs were created in left and right PC group search spaces (Dufour et al., 2013) using an iterated leave-one-run-out procedure, which allowed us to perform sensitive individual-subject analyses while avoiding double-dipping (Vul & Kanwisher, 2011). In each participant, we identified the most illness inference-responsive vertices in left and right PC search spaces in 5 of the 6 runs (top 5% of vertices, Illness-Causal > Mechanical-Causal). We then extracted PSC for each condition compared to rest in the held-out run (Illness-Causal, Mechanical-Causal, Non-Causal Illness First, Non-Causal Mechanical First), averaging the results across all iterations. We performed the same analysis using left and right TPJ search spaces (Dufour et al., 2013). We used the same approach to create mechanical inference ROIs in left and right anterior medial VOTC search spaces from a previous study on place word representations (Hauptman, Elli, et al., 2023). All aspects of this analysis were the same as those described above, except that the most mechanical inference-responsive vertices (top 5%, Mechanical-Causal > Illness-Causal) were selected.
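The sketch below illustrates the leave-one-run-out logic under hypothetical array inputs (per-run contrast maps and PSC values restricted to a search space); it is a simplified illustration rather than the actual analysis code.

```python
# Minimal sketch of the iterated leave-one-run-out fROI analysis.
# contrast_t: (n_runs, n_vertices) t-maps for Illness-Causal > Mech-Causal.
# psc:        (n_runs, n_conditions, n_vertices) percent signal change.
# Both are assumed to be restricted to the vertices of a PC search space.
import numpy as np

def loro_psc(contrast_t, psc, top_frac=0.05):
    """Return per-condition PSC averaged across held-out runs."""
    n_runs, n_vertices = contrast_t.shape
    n_top = max(1, int(round(top_frac * n_vertices)))
    held_out = []
    for test_run in range(n_runs):
        train = np.delete(np.arange(n_runs), test_run)
        # Select the most responsive vertices in the 5 training runs...
        mean_t = contrast_t[train].mean(axis=0)
        roi = np.argsort(mean_t)[-n_top:]
        # ...then read out PSC per condition in the held-out run.
        held_out.append(psc[test_run][:, roi].mean(axis=1))
    return np.mean(held_out, axis=0)  # average across LORO iterations
```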

Mentalizing ROIs were created by taking the most mentalizing-responsive vertices (top 5%) in bilateral PC and TPJ search spaces (Dufour et al., 2013) using the mentalizing stories > physical stories contrast from the mentalizing localizer. Language ROIs were identified by taking the most language-responsive vertices (top 5%) in left frontal and temporal language areas (search spaces: Fedorenko et al., 2010) using the language > math contrast from the language/logic localizer. A logic-responsive ROI was identified by taking the most logic-responsive vertices (top 5%) in a left frontoparietal network (search space: Liu et al., 2020) using the logic > language contrast. In each ROI, we extracted PSC for all conditions in the causal inference experiment.

ROI MVPA

We performed MVPA (PyMVPA toolbox; Hanke et al., 2009) to test whether patterns of activity in the PC distinguished illness inferences from mechanical inferences. In each participant, we identified the top 300 vertices most responsive to causal inference across domains (Illness-Causal + Mechanical-Causal > rest) in a left PC mask (Dufour et al., 2013).

For each vertex in each participant’s ROIs, we obtained one observation per condition per run (z-scored beta parameter estimate from the GLM). A linear support vector machine (SVM) was then trained on data from all but one of the runs and tested on the left-out run in a cross-validation procedure. Classification accuracy was averaged across all train/test splits. We compared classifier performance within each ROI to chance (50%; one-tailed test).
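A minimal scikit-learn version of this cross-validation scheme is sketched below (the actual analysis used PyMVPA; the random data here are placeholders for the z-scored beta patterns).

```python
# Minimal sketch of leave-one-run-out SVM decoding with placeholder data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_runs, n_vertices = 6, 300
X = rng.standard_normal((n_runs * 2, n_vertices))  # one pattern per condition per run
y = np.tile([0, 1], n_runs)                        # 0 = illness, 1 = mechanical
groups = np.repeat(np.arange(n_runs), 2)           # run labels for CV folds

clf = SVC(kernel='linear')
# Train on 5 runs, test on the held-out run; average over the 6 folds.
accuracy = cross_val_score(clf, X, y, groups=groups,
                           cv=LeaveOneGroupOut()).mean()
```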

Significance was evaluated against an empirically generated null distribution using a combined permutation and bootstrap approach (Schreiber & Krekelberg, 2013; Stelzer et al., 2013). In this approach, t-statistics obtained for the observed data are compared against an empirically generated null distribution. We report the t-values obtained for the observed data and the nonparametric p-values, where p corresponds to the proportion of the shuffled analyses that generated a comparable or higher t-value.

The null distribution was generated using a balanced block permutation test by shuffling condition labels within run 1000 times for each subject (Schreiber & Krekelberg, 2013). Then, a bootstrapping procedure was used to generate an empirical null distribution for each statistical test across participants by sampling one permuted accuracy value from each participant’s null distribution 15,000 times (with replacement) and running each statistical test on these permuted samples, thus generating a null distribution of 15,000 statistical values for each test (Stelzer et al., 2013).
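The sketch below illustrates the group-level bootstrap step under hypothetical inputs (per-participant observed accuracies and 1000 within-participant permuted accuracies); it is a simplified illustration rather than the actual code.

```python
# Minimal sketch of the permutation/bootstrap significance test.
# observed:  length-n_subj array of true decoding accuracies.
# null_acc:  list of length n_subj; null_acc[s] holds 1000 accuracies from
#            shuffling condition labels within run for subject s.
import numpy as np
from scipy import stats

def permuted_p(observed, null_acc, n_boot=15_000, chance=0.5, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = stats.ttest_1samp(observed, chance).statistic
    n_subj = len(observed)
    t_null = np.empty(n_boot)
    for b in range(n_boot):
        # Sample one permuted accuracy per participant, with replacement.
        sample = [null_acc[s][rng.integers(len(null_acc[s]))]
                  for s in range(n_subj)]
        t_null[b] = stats.ttest_1samp(sample, chance).statistic
    # p = proportion of null t-values at least as large as the observed t.
    return float(np.mean(t_null >= t_obs))
```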

Searchlight MVPA

We used a whole-brain SVM classifier to decode Illness-Causal vs. Mechanical-Causal, Illness-Causal vs. Non-Causal Mechanical First, Illness-Causal vs. Non-Causal Illness First, and Causal (both) vs. Non-Causal (both) over the whole cortex using a 10 mm radius searchlight (defined by geodesic distance, which respects cortical anatomy better than Euclidean distance; Glasser et al., 2013). This yielded four classification maps per participant, each indicating the classifier’s accuracy in a neighborhood surrounding every vertex. Individual-subject searchlight accuracy maps were then averaged, and the resulting group-wise map was thresholded using the PyMVPA implementation of the 2-step cluster-thresholding procedure described in Stelzer et al. (2013) (Hanke et al., 2009). This procedure permutes block labels within participant to generate a null distribution within subject (100 permutations) and then samples from these (10,000 times) to generate a group-wise null distribution (as in the ROI analysis). The whole-brain searchlight maps were then thresholded using a combination of vertex-wise threshold (p < 0.001 uncorrected) and cluster size threshold (FWER p < 0.05, corrected for multiple comparisons across the entire cortical surface).
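For illustration, the core searchlight loop is sketched below under hypothetical inputs, with geodesic neighborhoods assumed to be precomputed on the cortical surface; the actual analysis used PyMVPA.

```python
# Minimal searchlight sketch. neighbors[v] lists the vertices within 10 mm
# geodesic distance of vertex v (assumed precomputed); X, y, groups are as
# in the ROI analysis sketch above.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def searchlight_accuracy(X, y, groups, neighbors):
    """X: (n_samples, n_vertices). Returns per-vertex decoding accuracy."""
    acc = np.zeros(X.shape[1])
    cv = LeaveOneGroupOut()
    for v, hood in enumerate(neighbors):
        # Decode within the neighborhood surrounding vertex v.
        acc[v] = cross_val_score(SVC(kernel='linear'), X[:, hood], y,
                                 groups=groups, cv=cv).mean()
    return acc
```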

Results

Behavioral results

Accuracy on the magic detection task was at ceiling (M = 97.9% ± 2.2 SD) and there were no significant differences across the 4 main experimental conditions (Illness-Causal, Mechanical-Causal, Non-Causal Illness First, Non-Causal Mechanical First), F(3,57) = 2.39, p = .08. A one-way repeated measures ANOVA on response time revealed a main effect of condition, F(3,57) = 32.63, p < .0001, whereby participants were faster on Illness-Causal trials (M = 4.73 s ± 0.81 SD) compared to Non-Causal Illness First (M = 5.33 s ± 0.85 SD) and Non-Causal Mechanical First (M = 5.27 s ± 0.89 SD) trials. There were no differences in response time between Mechanical-Causal trials (M = 5.15 s ± 0.88 SD) and any other condition. Performance on the localizer tasks was similar to that reported in previous studies using these paradigms (see Appendix 3 for details).

Inferring illness causes recruits animacy-responsive PC

In whole-cortex analysis, left and right PC were the only regions to show a preference for causal inferences about illness over both mechanical inferences and causally unrelated sentences (p < .05, corrected for multiple comparisons; Figure 1B). PC responses during illness inferences overlap with previously reported responses to people-related concepts (Fairhall & Caramazza, 2013b; Supplementary Figure 2). In individual-subject fROI analysis, we similarly found that inferring illness causes activated the left and right PC more than inferring causes of mechanical failure (leave-one-run-out analysis; left: F(1,19) = 28.69, p < .0001; right: F(1,19) = 5.14, p = .04; Figure 1A). Illness inferences also activated left PC more than illness-related language that was not causally related (i.e., average of both non-causal conditions, F(1,19) = 13.23, p < .01). MVPA performed in PC fROIs and across the whole cortex similarly revealed that illness inferences and mechanical inferences produced spatially distinguishable neural patterns in left PC (t(19) = 3.50, permuted p < .001, Supplementary Table 2).

Inferring the causes of mechanical failure relative to both illness inferences and the non-causal conditions activated bilateral anterior medial ventral occipitotemporal cortex (VOTC) (Figure 4B). This anterior medial VOTC region overlaps with the so-called anterior parahippocampal place area (PPA) (Epstein & Kanwisher, 1998; Weiner et al., 2017) and is engaged during memory and verbal tasks related to physical spaces (Baldassano et al., 2013; Fairhall et al., 2014; Silson et al., 2019; Häusler et al., 2022; Hauptman, Elli, et al., 2023). In individual-subject fROI analysis, we similarly found that mechanical inferences activated left anterior medial VOTC more than illness inferences (leave-one-run-out analysis; left: F(1,19) = 16.05, p < .001) and more than the non-causal conditions (left: F(1,19) = 17.46, p < .0001; Figure 4A).

Inferring illness causes is dissociable from other types of inference about animates within the PC

Within left PC, responses to illness inferences were spatially dissociable from responses to other types of inferences about animate entities (i.e., mental state inferences) in individual participants, with illness responses located more inferiorly (Figure 2, Supplementary Figure 3). In an exploratory analysis, we quantified this effect by identifying the peak vertices for illness inferences (illness inferences > mechanical inferences) and mentalizing (mentalizing stories > physical stories) in individual participants within a left PC mask (Dufour et al., 2013). We then compared the z-coordinates of these peaks and observed a significant difference across participants, F(1,19) = 13.52, p < .01.

Figure 2. Spatial dissociation between responses to illness inferences and mental state inferences in the precuneus (PC).

The left medial surfaces of 6 individual participants, selected for visualization purposes, are shown. The locations of the top 10% most responsive vertices to Illness-Causal > Mechanical-Causal in a PC mask (Dufour et al., 2013) are shown in red. The locations of the top 10% most responsive vertices to mentalizing stories > physical stories (mentalizing localizer) in the same PC mask are shown in blue. Overlapping vertices are shown in green.

The most illness-responsive vertices in left PC exhibited a strong preference for mentalizing, F(1,19) = 38.65, p < .0001 (Supplementary Figure 8). The most mentalizing-responsive vertices in left PC also exhibited a preference for illness inferences over mechanical inferences, F(1,19) = 6.48, p = .02. However, this effect was weaker than the preference for illness inferences observed among the most illness inference-responsive vertices (leave-one-run-out analysis) in this region, F(1,38) = 6.56, p = .01 (Supplementary Figure 7). Together, these results suggest that illness inferences are carried out by a partially distinct subset of the PC vertices captured by the mentalizing stories vs. physical stories contrast.

Responses to illness inferences and mentalizing in right TPJ provide further evidence of a partial dissociation between illness inferences and mental state inferences. In right TPJ, where the strongest univariate preference for mentalizing is classically observed (e.g., Saxe & Kanwisher, 2003; see Supplementary Figure 1 for results from our localizer), the most mentalizing-responsive vertices did not exhibit a robust univariate preference for illness inferences in individual-subject fROI analysis, F(1,19) = 3.40, p = .08. Patterns of activity in the same vertices similarly did not distinguish between illness inferences and mechanical inferences, t(19) = 0.94, permuted p = .19. These findings are consistent with the fact that right TPJ did not exhibit a preference for illness inferences in whole-cortex analysis (p < .05, corrected for multiple comparisons; Supplementary Figure 4).

No evidence for a content-invariant preference for causal inference in logic or language networks

Neither the logic nor the language network exhibited elevated neural responses during causal inferences. Language regions in frontotemporal cortex responded more to non-causal than causal vignettes (frontal search space: F(1,19) = 23.91, p < .0001; temporal search space: F(1,19) = 4.31, p = .05, Figure 3, Supplementary Figure 6). A similar pattern was observed in the logic network, F(1,19) = 3.88, p = .07 (Figure 3). These effects likely reflect the greater difficulty associated with integrating unrelated sentences. In whole-cortex analysis, no shared regions emerged across both causal contrasts (i.e., illness inferences > non-causal and mechanical inferences > non-causal). Although MVPA searchlight analysis identified several areas where patterns of activity distinguished between causal and non-causal vignettes, all of these regions showed a preference for non-causal vignettes in univariate analysis (Supplementary Figure 5).

Figure 3. Individual-subject analysis of language- and logic-responsive vertices.

Panel A: percent signal change (PSC) for each condition among the top 5% most language-responsive vertices (language > math) in a temporal language network mask (Fedorenko et al., 2010). Results from a frontal language mask (Fedorenko et al., 2010) can be found in Supplementary Figure 6. Panel B: PSC among the top 5% most logic-responsive vertices (logic > language) in a logic network mask (Liu et al., 2020). Group maps for each contrast of interest (one-tailed) are corrected for multiple comparisons (p < .05 FWER, cluster-forming threshold p < .01 uncorrected). Vertices are color coded on a scale from p=0.01 to p=0.00001.

Figure 4. Responses to mechanical inferences in anterior medial ventral occipitotemporal cortex (VOTC).

Panel A: Percent signal change (PSC) for each condition among the top 5% Mechanical-Causal > Illness-Causal vertices in a left anterior medial VOTC mask (Hauptman, Elli, et al., 2023) in individual participants, established via a leave-one-run-out analysis. Panel B: The intersection of two whole-cortex contrasts, Mechanical-Causal > Illness-Causal and Mechanical-Causal > Non-Causal, FWER cluster-corrected for multiple comparisons (p < .05 FWER, cluster-forming threshold p < .01 uncorrected). Vertices are color coded on a scale from p=0.01 to p=0.00001. Similar to PC responses to illness inferences, anterior medial VOTC is the only region to emerge across both mechanical inference contrasts. The average PPA location from a separate study involving perceptual place stimuli (Weiner et al., 2017) is overlaid in black. The average PPA location from a separate study involving verbal place stimuli (Hauptman, Elli, et al., 2023) is overlaid in blue.

Discussion

We find that an inferior precuneus (PC) region previously implicated in thinking about animates is preferentially active when participants infer causes of illness. The PC responded more to causal inferences about illness compared to vignettes that contained illness-related language but were causally unrelated. The PC also responded more to causal inferences about illness compared to causal inferences about mechanical failure. In the current study, we did not find any cortical areas that responded to implicit causal inferences across content domains. These findings suggest that implicit causal inferences about illness during language comprehension draw upon a content-specific semantic network for representing animacy.

Previous work has implicated the PC in the representation of animate entities, such as people and animals (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2023). The present results are consistent with these findings and expand upon them by showing that inferring the causes of illness, a phenomenon that is exclusive to animates, activates the PC. Thus, the PC exhibits sensitivity to causal inferences about processes specific to animates (e.g., illness) beyond the mere mention of animate entities.

The finding that the PC is sensitive to causal inferences across sentences is also consistent with prior evidence that the PC is involved in discourse-level processes and responds to semantic information across long timescales during narrative comprehension (e.g., Hasson et al., 2008; Lerner et al., 2011; Lee & Chen, 2022). PC responses observed during narrative comprehension could be driven by causal inferences related to animacy, since narratives are rich in information about the behavior of animate agents. Likewise, PC involvement in episodic memory could be related to animacy-related inferential processes (DiNicola, Braga, & Buckner, 2020; Ritchey & Cooper, 2020). Whether the PC is uniquely relevant to causal inferences compared to other types of inferences (e.g., temporal, referential; Graesser et al., 1994) about animates remains to be tested.

The results of the present study suggest that causal knowledge is embedded in semantic networks that represent animates. This hypothesis is consistent with evidence from developmental psychology that causal knowledge is central to our understanding of animacy. For example, preschoolers intuit that animates but not inanimate objects get sick, need nourishment to grow and live, and can die (e.g., Rosengren et al., 1991; Kalish, 1996; Gutheil, Vera, & Keil, 1998; Raman & Gelman, 2005; see Inagaki & Hatano, 2004; Opfer & Gelman, 2011 for reviews). Early emerging intuitions about the biological world are present across cultures and are sometimes referred to as ‘intuitive biology’ (Keil, 1992; Wellman & Gelman, 1992; Hatano & Inagaki, 1994; Simons & Keil, 1995; Atran, 1998; Keil et al., 1999; Coley, Solomon, & Shafto, 2002; Medin & Atran, 2004). The current study suggests that such knowledge is encoded in the animacy semantic network. In future work, it will be important to test whether the inferior PC is sensitive to causal knowledge about biological processes beyond illness, such as growth, inheritance, and reproduction. Another open question concerns how cultural expertise in causal reasoning about illness (e.g., medical expertise) influences representations in the PC.

Our findings additionally suggest that inferences about the biological and mental properties of animates are linked yet neurally separable. We find that inferring illness causes recruits a subregion of the PC that neighbors but is distinct from peak PC responses to mental state inferences (Saxe & Kanwisher, 2003; Saxe et al., 2006). The neural distinction between body and mind is consistent with developmental work showing that even young children provide different causal explanations for biological vs. psychological processes (Springer & Keil, 1991; Callanan & Oakes, 1992; Wellman & Gelman, 1992; Inagaki & Hatano, 1993; 2004; Keil, 1994; Hickling & Wellman, 2001; Medin et al., 2010; cf. Carey, 1985; see also Medin & Atran, 2004). For example, when asked why blood flows to different parts of the body, 6-year-olds endorse explanations referring to bodily function (e.g., “because it provides energy to the body”) rather than explanations referring to mental states (e.g., “because we want it to flow”) (Inagaki & Hatano, 1993). At the same time, animate entities have a dual nature: they have both bodies and minds (Opfer & Gelman, 2011; Spelke, 2023). The current findings are consistent with the notion of distinct but partially overlapping systems for biological and mentalistic knowledge.

In addition to responses to illness inferences in the PC, we find that mechanical inferences activate an anterior portion of left medial ventral occipitotemporal cortex (VOTC), a region that has been previously implicated in supporting abstract (i.e., non-perceptual) place representations (Baldassano et al., 2013; Fairhall et al., 2014; Silson et al., 2019; Häusler et al., 2022; Hauptman, Elli, et al., 2023). This finding illustrates a neural double dissociation between biological and mechanical causal knowledge and suggests that neural systems representing semantic knowledge are sensitive to causal information in their respective domain. Our neuroscientific evidence coheres with the ‘intuitive theories’ proposal, according to which semantic knowledge is organized into causal frameworks that serve as ‘grammars for causal inference’ (Tenenbaum et al., 2007; Wellman & Gelman, 1992; Gopnik & Meltzoff, 1997; Gerstenberg & Tenenbaum, 2017; see also Boyer 1995; Barrett, Cosmides, & Tooby, 2007; Cosmides & Tooby, 2013; Bender, Beller, & Medin, 2017). It is worth noting that many real-world causal inferences, including inferences about illness, combine knowledge from multiple domains (e.g., intentional bewitchment and viral infection together cause illness; Lynch & Medin, 2006; Legare & Gelman, 2008; Legare & Shtulman, 2018). Such causal inferences might recruit multiple semantic networks.

In the current study, we failed to find a neural signature of domain-general causal inference. That is, no brain region responded more to causal than non-causal vignettes across domains. The language network responded more to non-causal than causal vignettes, perhaps because of greater integration demands associated with the non-causal condition. This result aligns with evidence that the language network is specialized for sentence-internal processing and is not sensitive to inferences at the discourse level (Fedorenko & Varley, 2016; Jacoby & Fedorenko, 2020; Blank & Fedorenko, 2020). Interestingly, responses to causal inference in semantic networks (i.e., PC, anterior medial VOTC) were stronger in the left hemisphere. The left lateralization of responses to causal inference may enable efficient interfacing with the language system during discourse comprehension. The frontoparietal logical reasoning network similarly did not exhibit a domain-general preference for causal over non-causal vignettes. This finding coheres with prior evidence that the logic network supports inferences involving symbolic, ‘content-free’ stimuli, such as If X then Y = If not Y then not X (Monti et al., 2009; Feng et al., 2021). In whole-cortex analysis, we also did not find any region that showed a general preference for causal inferences. Our results suggest that implicit causal inferences do not depend on domain-general causal inference machinery.

These findings do not rule out the possibility that domain-general inference mechanisms contribute to causal inference under some circumstances. Causal inferences are a highly varied class. Here we focused on implicit causal inferences that unfold spontaneously during language comprehension. A centralized mechanism could still enable causal inferences during more complex explicit tasks, even explicit inferences about the causes of illness. The vignettes used in the current study stipulate illness causes, allowing participants to reason from causes to effects. By contrast, illness reasoning performed by medical experts proceeds from effects to causes and can involve searching for potential causes within highly complicated and interconnected causal systems (Schmidt, Norman, & Boshuizen, 1990; Norman et al., 2006; Meder & Mayrhofer, 2017). Domain-general mechanisms may also be relevant to learning novel causal relationships, such as identifying new illness causes or inferring causal powers of unfamiliar objects (e.g., ‘blicket detectors’; Gopnik et al., 2001). Future neuroimaging work can help test these possibilities.

Supplementary figures and tables

Supplementary Figure 1. Functional localization of language (Liu et al., 2020), logical reasoning (Liu et al., 2020), and mentalizing (Dodell-Feder et al., 2011) networks.

Group maps for each contrast of interest (one-tailed) are corrected for multiple comparisons (p < .05 FWER, cluster-forming threshold p < .01 uncorrected). Vertices are color coded on a scale from p=0.01 to p=0.00001.

Supplementary Figure 2. Overlap between left precuneus (PC) responses to illness inferences in the current study and people stimuli in a separate study (Fairhall & Caramazza, 2013b).

The average location from a separate study comparing people and place concepts (Fairhall & Caramazza, 2013b) is overlaid in blue on the response to illness inferences observed in the current study. Group map (one-tailed) is corrected for multiple comparisons (p < .05 FWER, cluster-forming threshold p < .01 uncorrected). Vertices are color coded on a scale from p=0.01 to p=0.00001.

Supplementary Figure 3. Spatial dissociation between responses to illness inferences and mental state inferences in the left precuneus (PC).

The left medial surface of all participants (n=20) is shown. The locations of the top 10% most responsive vertices to Illness-Causal > Mechanical-Causal in a PC mask (Dufour et al., 2013) are shown in red. The locations of the top 10% most responsive vertices to mentalizing stories > physical stories (mentalizing localizer) in the same PC mask are shown in blue. Overlapping vertices are shown in green.

Supplementary Figure 4. Full whole-cortex results for Illness-Causal > Mechanical-Causal.

Group maps (two-tailed) are corrected for multiple comparisons (p < .05 FWER, cluster-forming threshold p < .01 uncorrected). Vertices are color coded on a scale from p=0.01 to p=0.00001.

Supplementary Figure 5. Searchlight MVPA group maps.

Whole-brain searchlight maps were thresholded using a combination of vertex-wise threshold (p < 0.001 uncorrected) and cluster size threshold (FWER p < 0.05, corrected for multiple comparisons across the entire cortical surface). Vertices are color coded on a scale from 55-65% decoding accuracy.

Supplementary Figure 6. Responses to causal inference in the language network.

Panel A: Percent signal change (PSC) for each condition among the top 5% most language-responsive vertices (language > math) in a temporal language network mask (Fedorenko et al., 2010). Panel B: The same results in a frontal language mask (Fedorenko et al., 2010).

Supplementary Figure 7. Responses to illness inferences in bilateral PC and TPJ.

Top 4 plots: percent signal change (PSC) for each condition among the top 5% Illness-Causal > Mechanical-Causal vertices in bilateral PC and TPJ masks (Dufour et al., 2013) in individual participants, established via a leave-one-run-out analysis. Bottom 4 plots: PSC for each condition among the top 5% mentalizing stories > physical stories vertices in the same masks. We hypothesized that the PC would exhibit a preference for illness inferences and report all other responses for completeness (see preregistration https://osf.io/cx9n2/). Significance codes for the Illness-Causal > Mechanical-Causal comparison: *** p < .001, ** p < .01, * p < .05, . p < .1.

Supplementary Figure 8. Responses to mentalizing in illness-responsive vertices in bilateral PC and TPJ.

Percent signal change (PSC) for mentalizing stories and physical stories (mentalizing localizer) was extracted from the top 5% Illness-Causal > Mechanical-Causal vertices in bilateral PC and TPJ masks (Dufour et al., 2013) in individual participants. The difference between mentalizing stories and physical stories was significant (all ps < .01) across all analyses.

Supplementary Figure 9. Comparison of mentalizing localizers used in previous work and in the current study, in 3 pilot participants.

The mentalizing localizer in the current study used the same mentalizing stories as in previous work (Dodell-Feder et al., 2011) but contained new physical stories that included more vivid physical description and did not refer to animate agents. Group maps are shown at p < .01 uncorrected.

Supplementary Table 1. Illness types present in the stimulus set.

Supplementary Table 2. MVPA results in individual-subject functional ROIs. Each ROI was created by selecting the top 300 vertices for each contrast in each search space.

Accuracy refers to classifier performance against chance (50%) for Illness-Causal vs. Mechanical-Causal. Permuted and Bonferroni-corrected (across ROIs) p-values are reported. Ment_vs_phys: mentalizing stories > physical stories (mentalizing localizer). Caus_vs_rest: Illness-Causal + Mechanical-Causal > rest. Logic_vs_lang: logic > language (language/logic localizer). Lang_vs_math: language > math (language/logic localizer).

Appendix 1: Online experiment protocol

Prior to the fMRI experiment, we collected explicit causality judgments from a separate group of online participants (n=30). Each online participant read all vignettes from the causal inference experiment (152 vignettes) in addition to 12 filler vignettes that were designed to be either maximally causally related or unrelated (164 vignettes total), one vignette at a time. Their task was to judge the extent to which it was possible that the event described in the first sentence of each vignette caused the event described in the second sentence, on a 4-point scale (1 = not possible; 4 = very possible). Four participants were excluded on the basis of inaccurate responses on the filler trials (i.e., a difference between average ratings for maximally causally related and maximally causally unrelated vignettes < 2). Among the 26 remaining participants, 12 read vignettes from Version A and 14 read vignettes from Version B of the experiment. To eliminate erroneous responses, we first excluded trials with RTs more than 2.5 SD from their respective condition means within participants, and then excluded trials with outlier RTs (more than 1.5 IQR below Q1 or above Q3) across participants (approximately 5% of all trials excluded in total). We found that both causal conditions (Illness-Causal, Mechanical-Causal) were rated as more causally related than both non-causal conditions, t(25) = 36.97, p < .0001 (causal: M = 3.51 ± 0.78 SD, non-causal: M = 1.10 ± 0.45 SD). In addition, Illness-Causal and Mechanical-Causal items received equally high causality ratings, t(25) = –0.64, p = 0.53 (Illness-Causal: M = 3.49 ± 0.77 SD, Mechanical-Causal: M = 3.53 ± 0.79 SD).
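The two-step RT exclusion can be expressed compactly in pandas; the sketch below uses hypothetical column names and is an illustration rather than the actual analysis code.

```python
# Minimal sketch of the two-step RT exclusion.
# Step 1: drop trials > 2.5 SD from their condition mean within participant.
# Step 2: drop trials outside 1.5 x IQR of the across-participant RT
#         distribution.
import pandas as pd

def exclude_rt_outliers(df):
    """df has hypothetical columns: 'subject', 'condition', 'rt' (seconds)."""
    # Within-participant, within-condition z-scores.
    z = df.groupby(['subject', 'condition'])['rt'].transform(
        lambda x: (x - x.mean()) / x.std())
    df = df[z.abs() <= 2.5]
    # Tukey fences across all remaining trials.
    q1, q3 = df['rt'].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[(df['rt'] >= q1 - 1.5 * iqr) & (df['rt'] <= q3 + 1.5 * iqr)]
```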

Appendix 2: Details on measuring linguistic variables

All conditions were matched (pairwise t-tests, all ps > 0.3, no statistical correction) on multiple linguistic variables known to modulate neural activity in language regions (e.g., Pallier, Devauchelle, & Dehaene, 2011; Shain, Blank et al., 2020). These included number of characters, number of words, average number of characters per word, average word frequency, average bigram surprisal (Google Books Ngram Viewer, https://books.google.com/ngrams/), and average syntactic dependency length (Stanford Parser; de Marneffe, MacCartney, & Manning, 2006). Sentences that were incorrectly parsed by the automatic syntactic parser (i.e., past participle adjectives parsed as verbs) were corrected by hand. Word frequency was calculated as the negative log of a word’s occurrence rate in the Google corpus between the years 2017-2019. Bigram surprisal was calculated as the negative log of the frequency of a given two-word phrase in the Google corpus divided by the frequency of the first word of the phrase.

This calculation uses log base 2 so that surprisal is expressed in bits, i.e., how unexpected the second word is given the first. We used bigram surprisal as our surprisal measure to maximize the number of n-grams that had an entry in the corpus. Even so, 64 of the 1515 total bigrams (4%) did not have an entry in the corpus and were therefore assigned the highest surprisal value among the remaining bigrams (see Willems et al., 2016).
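In code, the measure reduces to a few lines; the sketch below assumes hypothetical count dictionaries derived from the Google Books Ngram data.

```python
# Minimal sketch of bigram surprisal in bits:
#   surprisal(w2 | w1) = -log2( count(w1 w2) / count(w1) )
import math

def bigram_surprisal(bigram_counts, unigram_counts, w1, w2, max_surprisal):
    """bigram_counts maps (w1, w2) pairs, unigram_counts maps single words,
    to corpus counts; both are hypothetical inputs."""
    count = bigram_counts.get((w1, w2))
    if count is None:
        # Unattested bigrams receive the highest attested surprisal value.
        return max_surprisal
    return -math.log2(count / unigram_counts[w1])
```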

Appendix 3: Behavioral results from the localizer tasks

Accuracy on the language/logic localizer task was significantly lower for the logic task compared to both the language and math tasks (logic: M = 67.5% ± 14.0 SD, math: M = 93.8% ± 6.4 SD, language: M = 98.1% ± 5.8 SD; F(2,38) = 60.38, p < .0001). Similarly, response time was slowest on the logic task, followed by math and then language (logic: M = 8.78 s ± 1.88 SD, math: M = 6.20 s ± 1.37 SD, language: M = 5.18 s ± 1.53 SD; F(2,38) = 44.28, p < .0001).

Accuracy on the mentalizing localizer task did not differ across the mentalizing stories and physical stories conditions (mentalizing: 83.50% ± 15.7 SD, physical: 90.50% ± 12.3 SD; F(1,19) = 2.73, p = .12). However, response times for the mentalizing stories were significantly longer (mentalizing: 3.46 s ± 0.55 SD, physical: 3.11 s ± 0.56 SD; F(1,19) = 16.59, p < .001).