When Abstract Becomes Concrete: Naturalistic Encoding of Concepts in the Brain

  1. Experimental Psychology, University College London, 26 Bedford Way, WC1H 0DS London

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


  • Reviewing Editor
    Andrea Martin
    Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
  • Senior Editor
    Barbara Shinn-Cunningham
    Carnegie Mellon University, Pittsburgh, United States of America

Reviewer #1 (Public Review):

In this study, the authors investigate a very interesting but often overlooked aspect of abstract vs. concrete processing in language. Specifically, they study whether the differences in processing of abstract vs. concrete concepts in the brain are static or dependent on the (visual) context in which the words occur. This study takes a two-step approach to investigate how context might affect the perception of concepts. First, the authors analyze whether concrete concepts, as expected, activate more sensory systems while abstract concepts activate higher-order processing regions. Second, they measure the contextual situatedness vs. displacement of each word with respect to the visual scenes it is spoken in and then evaluate whether this contextual measure correlates with more activation in the sensory vs. higher-order regions respectively.

This study raises a pertinent and understudied question in language neuroscience. It also combines both computational and meta-analytic approaches.

Overall, the study had many intermediary steps that required manual subsection / random sampling and variable choices (like the time lag of analysis) with almost no visualization and interpretation of how these choices affect the observed results. The approach was also roundabout.

Peaks and Valleys Analysis:
1. Doesn't this method assume that the features used to describe each word, like valence or arousal, will be linearly different for the peaks and valleys? What about non-linear interactions between the features and how they might modulate the response?
2. Doesn't it also assume that the response to a word is infinitesimal and not spread across time? How does the chosen time window of analysis interact with the HRF? From the main figures and Figures S2-S3 there seem to be differences based on the time lag.
3. Were the group-averaged responses used for this analysis?
4. Why don't the other terms identified in Figure 5 show any correspondence to the expected categories? What does this mean? Can the authors also situate their results with respect to prior findings as well as visualize how stable these results are at the individual voxel or participant level? It would also be useful to visualize example time courses that demonstrate the peaks and valleys.
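
To illustrate the concern in point 2, the sketch below shows how the response to a single brief word event is spread over many seconds once it passes through a haemodynamic response function (HRF). It is a minimal illustration only: the double-gamma shape and its parameter values (shapes 6 and 16, undershoot ratio 1/6) are assumed here for demonstration and are not taken from the manuscript, and the one-second sampling simply mirrors the 20 time points mentioned in the reviews.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma density at time t (seconds); zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Illustrative double-gamma HRF. Parameter values are assumptions, not the authors'."""
    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)

# Response to one brief word event at t = 0, sampled once per second for 20 s.
lags = list(range(0, 21))
response = [double_gamma_hrf(t) for t in lags]

# Lag (in seconds) at which the modelled response is largest.
peak_lag = max(lags, key=lambda t: response[t])

# Lags at which the response still exceeds 10% of its peak value.
spread = [t for t in lags if response[t] > 0.1 * max(response)]

for t in lags:
    print(f"lag {t:2d} s: {response[t]: .4f}")
```

Under these assumed parameters the response is non-zero at many consecutive lags, so words spoken within a few seconds of each other would contribute overlapping signal at any single analysis lag.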

Estimating contextual situatedness:
1. Doesn't this limit the analyses to "visual" contexts only? And more so, frequently recognized visual objects?
2. The measure of situatedness is the cosine similarity of GloVe vectors that depend on word co-occurrence while the vectors themselves represent objects isolated by the visual recognition models. Expectedly, "science" and the label "book" or "animal" and the label "dog" will be close. But can the authors provide examples of context displacement? I wonder if this just picks up on instances where the identified object in the scene is unrelated to the word. How do the authors ensure that it is a displacement of context as opposed to the two words just being unrelated? This also has a consequence on deciding the temporal cutoff for consideration (2 seconds).
3. While the introduction motivated the problem of context situatedness purely linguistically, the actual methods look at the relationship between recognized objects in the visual scene and the words. Can word surprisal or another language-based metric be used in place of the visual labeling? Also, it is not clear how the process identified in (2) above would come up with a high situatedness score for abstract concepts like "truth".
4. It is a bit hard to see the overlapping regions in Figures 6A-C. Would it be possible to show pairs instead of triples? Like "abstract across context" vs. "abstract displaced"? Without that, and given (2) above, the results are not yet clear. Moreover, what happens in the "overlapping" regions of Figure 3?

Miscellaneous comments:
1. In Figure 3, it is surprising that the "concrete-only" regions dominate the angular gyrus and we see an overrepresentation of this category over "abstract-only". Can the authors place their findings in the context of other studies?
2. The following line (Pg 21) regarding the necessary differences in time for the two categories was not clear: "Both categories overlap **(though necessarily at different time points)** in regions typically associated with word processing." How does this fall out from the analysis method?

Reviewer #2 (Public Review):

This study tests a plausible and intriguing hypothesis that one cause of the differences in the neural underpinnings of concrete and abstract words is differences in their grounding in the current sensory context. The authors reasoned that, in this case, an abstract word presented with a relevant visual scene would be processed in a more similar way to a concrete word. Typically, abstract and concrete words are tested in isolation. In contrast, this study takes advantage of naturalistic movie stimuli to assess the neural effects of concreteness in both abstract and concrete words (the speech within the film), when the visual context is more or less tied to the word meaning (measured as the similarity between the word co-occurrence-based vector for the spoken word and the average of this vector across all present objects). This novel approach allows a test of the dynamic nature of abstract and concrete word processing, and as such could extend the literature and add a useful perspective accounting for differences in processing these word types.

The critical contrasts needed to test the key hypothesis are not presented or not presented in full within the core text. To test whether abstract processing changes when in a situated context, the situated abstract condition would first need to be compared with the displaced abstract condition as in Supplementary Figure 6. Then to test whether this change makes the result closer to the processing of concrete words, this result should be compared to the concrete result. The correlations shown in Figure 6 in the main text are not focused on the differences in activity between the situated and displaced words or comparing the correlation of these two conditions with the other (concrete/abstract) condition. As such they cannot provide conclusive evidence as to whether the context is changing the processing of concrete/abstract words to be closer to the other condition. Additionally, it should be considered whether any effects reflect the current visual processing only or more general sensory processing.

Overall, the study would benefit from being situated in the literature more, including a) a more general understanding of the areas involved in semantic processing (including areas proposed to be involved across different sensory modalities and for verbal and nonverbal stimuli), and b) other differences between abstract and concrete words and whether they can explain the current findings, including other psycholinguistic variables which could be included in the model and the concept of semantic diversity (Hoffman et al.,). It would also be useful to consider whether difficulty effects (or processing effort) could explain some of the regional differences between abstract and concrete words (e.g., the language areas may simply require more of the same processing not more linguistic processing due to their greater reliance on word co-occurrence). Similarly, the findings are not considered in relation to prior comparisons of abstract and concrete words at the level of specific brain regions.

The authors use multiple methods to provide a post hoc interpretation of the areas identified as more involved in concrete, abstract, or both (at different times) words. These are designed to reduce the interpretation bias and improve interpretation, yet they may not successfully do so. These methods do give some evidence that sensory areas are more involved in concrete word processing. However, they are still open to interpretation bias as it is not clear whether all the evidence is consistent with the hypotheses or if this is the best interpretation of individual regions' involvement. This is because the hypotheses are provided at the level of 'sensory' and 'language' areas without further clarification and areas and terms found are simply interpreted as fitting these definitions. For instance, the right IFG is interpreted as a motor area, and therefore sensory as predicted, and the term 'autobiographical memory' is argued to be interoceptive. Language is associated with the 'both' cluster, not the abstract cluster, when abstract >concrete is expected to engage language more. The areas identified for both vs. abstract>concrete are distinguished in the Discussion through the description as semantic vs. language areas, but it is not clear how these are different or defined. Auditory areas appear to be included in the sensory prediction at times and not at others. When they are excluded, the rationale for this is not given. Overall, it is not clear whether all these areas and terms are expected and support the hypotheses. It should be possible to specify specific sensory areas where concrete and abstract words are predicted to be different based on a) prior comparisons and/or b) the known locations of sensory areas. Similarly, language or semantic areas could be identified using masks from NeuroSynth or traditional meta-analyses. A language network is presented in Supplementary Figure 7 but not interpreted, and its source is not given. 
Alternatively, there could be a greater interpretation of different possible explanations of the regions found with a more comprehensive assessment of the literature. The function of individual regions and the explanation of why many of these areas are interpreted as sensory or language areas are only considered in the Discussion when it could inform whether the hypotheses have been evidenced in the results section.

Additionally, these methods attempt to interpret all the clusters found for each contrast in the same way when they may have different roles (e.g., relate to different senses). This is a particular issue for the peaks and valleys method which assesses whether a significantly larger number of clusters is associated with each sensory term for the abstract, concrete, or both conditions than the other conditions. The number of clusters does not seem to be the right measure to compare. Clusters differ in size so the number of clusters does not represent the area within the brain well. Nor is it clear that many brain regions should respond to each sensory term, and not just one per term (whether that is V1 or the entire occipital lobe, for instance). The number of clusters is therefore somewhat arbitrary. This is further complicated by the assessment across 20 time points and the inclusion of the 'both' category. It would seem more appropriate to see whether each abstract and concrete cluster could be associated with each different sensory term and then summarise these findings rather than assess the number of abstract or concrete clusters found for each independent sensory term. In general, the rationale for the methods used should be provided (including the peak and valley method instead of other possible options e.g., linear regression).
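
The point about cluster counts can be made concrete with a toy comparison. The sketch below uses entirely hypothetical cluster sizes (in voxels) to show that ranking conditions by number of clusters and ranking them by total cluster volume can disagree; none of the numbers come from the manuscript.

```python
# Hypothetical cluster sizes in voxels for one sensory term (illustrative only).
clusters = {
    "concrete": [1200, 900],            # two large clusters
    "abstract": [40, 35, 50, 30, 45],   # five small clusters
}

# Summary 1: number of clusters per condition.
cluster_count = {cond: len(sizes) for cond, sizes in clusters.items()}

# Summary 2: total volume (voxels) per condition.
cluster_volume = {cond: sum(sizes) for cond, sizes in clusters.items()}

more_by_count = max(cluster_count, key=cluster_count.get)
more_by_volume = max(cluster_volume, key=cluster_volume.get)

print("by count: ", cluster_count, "->", more_by_count)
print("by volume:", cluster_volume, "->", more_by_volume)
```

In this toy case the condition with more clusters covers far less cortex, which is why a volume-based or per-cluster summary may be more informative than a raw count.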

The measure of contextual situatedness (how related a spoken word is to the average of the visually presented objects in a scene) is an interesting approach that allows parametric variation within naturalistic stimuli, which is a potential strength of the study. This measure appears to vary little between objects that are present (e.g., animal or room), and those that are strongly (e.g., monitor) or weakly related (e.g., science). Additional information validating this measure may be useful, as would consideration of the range of values and whether the split between situated (c > 0.6) and displaced words (c < 0.4) is sufficient.
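
As a rough sketch of the measure as described in these reviews, the code below computes the cosine similarity between a spoken word's vector and the average vector of the object labels detected in the preceding scene, then applies the thresholds quoted above (c > 0.6 situated, c < 0.4 displaced). The three-dimensional toy vectors, the function names, and the "intermediate" label for values between the thresholds are assumptions made for illustration; the authors' actual pipeline uses word co-occurrence-based embeddings and may differ in detail.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def situatedness(word_vec, object_vecs):
    """Similarity between the word and the average of the detected object labels."""
    return cosine_similarity(word_vec, mean_vector(object_vecs))

def label_context(c, situated_above=0.6, displaced_below=0.4):
    """Apply the thresholds quoted in the review; 'intermediate' is a hypothetical label."""
    if c > situated_above:
        return "situated"
    if c < displaced_below:
        return "displaced"
    return "intermediate"

# Toy 3-dimensional vectors standing in for real embeddings (made up for illustration).
toy_vectors = {
    "science": [0.9, 0.1, 0.2],
    "book": [0.8, 0.3, 0.1],
    "monitor": [0.7, 0.2, 0.4],
    "dog": [0.0, 1.0, 0.1],
}

c_related = situatedness(toy_vectors["science"], [toy_vectors["book"], toy_vectors["monitor"]])
c_unrelated = situatedness(toy_vectors["science"], [toy_vectors["dog"]])

print(label_context(c_related), label_context(c_unrelated))
```

Printing the raw similarity values across all words in a film, rather than only the thresholded labels, would show whether the 0.4 to 0.6 gap separates two distinct groups or cuts through a narrow, continuous range.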

Finally, the study assessed the relation of spoken concrete or abstract words to brain activity at different time points. The visual scene was always assessed using the 2 seconds before the word, while the neural effects of the word were assessed every second after the presentation for 20 seconds. This could be a strength of the study; however, almost no temporal information was provided. The clusters shown have different timings, but this information is not presented in any way. Giving more temporal information in the results could help to both validate this approach and show when these areas are involved in abstract or concrete word processing. Additionally, no rationale was given for this long timeframe, which is far greater than the time needed to process the word, and long after the presence of the visual context assessed (and therefore ignores the present visual context).

Reviewer #3 (Public Review):

The primary aim of this manuscript was to investigate how context, defined from visual object information in multimodal movies, impacts the neural representation of concrete and abstract conceptual knowledge. The authors first conduct a series of analyses to identify context-independent regional responses to concrete and abstract concepts in order to compare these results with the networks observed in prior research using non-naturalistic paradigms. The authors then conduct analyses to investigate whether the regional response to abstract and concrete concepts changes when the concepts are either contextually situated or displaced. A concept is considered displaced if the visual information immediately preceding the word is weakly associated with the word whereas a concept is situated if the association is high. The results suggest that, when ignoring context, abstract and concrete concepts engage different brain regions with overlap in core language areas. When context is accounted for, however, similar brain regions are activated for processing concrete and situated abstract concepts and for processing abstract and displaced concrete concepts. The authors suggest that contextual information dynamically changes the brain regions that support the representation of abstract and concrete conceptual knowledge.

There is significant interest in understanding both the acquisition and neural representation of abstract and concrete concepts, and most of the work in this area has used highly constrained, decontextualized experimental stimuli and paradigms to do so. This manuscript addresses this limitation by using multimodal narratives which allows for an investigation of how context-sensitive the regional response to abstract and concrete concepts is. The authors characterize the regional response in a comprehensive way.

The context measure is interesting, but I'm not convinced that it's capturing what the authors intended. In analysing the neural response to a single word, the authors are presuming that they have isolated the window in which that concept is processed and the observed activation corresponds to the neural representation of that word given the prior context. I question to what extent this assumption holds true in a narrative when co-articulation blurs the boundaries between words and when rapid context integration is occurring. Further, the authors define context based on the preceding visual information. I'm not sure that this is a strong manipulation of the narrative context, although I agree that it captures some of the local context. It is maybe not surprising that if a word, abstract or concrete, has a strong association with the preceding visual information then activation in the occipital cortex is observed. I also wonder if the effects being captured have less to do with concrete and abstract concepts and more to do with the specific context the displaced condition captures during a multimodal viewing paradigm. If the visual information is less related to the verbal content, the viewer might process those narrative moments differently regardless of whether the subsequent word is concrete or abstract. I think the claims could be tailored to focus less generally on context and more specifically on how visually presented objects, which contribute to the ongoing context of a multimodal narrative, influence the subsequent processing of abstract and concrete concepts.

Author Response

We thank the reviewers for their detailed and constructive criticisms of our work. They raise many important questions (such as the issue of defining context) that we have also been thinking about extensively, and they provide new and insightful avenues that have the potential to meaningfully improve the manuscript. We also appreciate that they commented on the novelty and importance of this work. Going forward, we will address the methodological concerns raised as best we can and thereby hope to make the evidence for our conclusion more compelling.
