Abstract

Context is information linked to a situation that can guide behavior. In the brain, context is encoded by sensory processing and can later be retrieved from memory. How context is communicated within the cortical network in sensory and mnemonic forms is unknown due to the lack of methods for high-resolution, brain-wide neuronal recording and analysis. Here, we report the comprehensive architecture of a cortical network for context processing. Using hemisphere-wide, high-density electrocorticography, we measured large-scale neuronal activity from monkeys observing videos of agents interacting in situations with different contexts. We extracted five context-related network structures including a bottom-up network during encoding and, seconds later, cue-dependent retrieval of the same network with the opposite top-down connectivity. These findings show that context is represented in the cortical network as distributed communication structures with dynamic information flows. This study provides a general methodology for recording and analyzing cortical network neuronal communication during cognition.

DOI: http://dx.doi.org/10.7554/eLife.06121.001

eLife digest

If we see someone looking frightened, the way we respond to the situation is influenced by other information, referred to as the ‘context’. For example, if the person is frightened because another individual is shouting at them, we might try to intervene. However, if the person is watching a horror video we may decide that they don't need our help and leave them to it. Nevertheless, it is not clear how the brain processes the context of a situation to inform our response.

Here, Chao et al. developed a new method to study electrical activity across the whole of the brain and used it to study how monkeys process context in several different social situations. In the experiments, monkeys were shown video clips in which one monkey (known as the ‘video monkey’) was threatened by a human or another monkey, or was facing an empty wall (i.e., three different contexts). Afterwards, the video monkey displayed either a frightened expression or a neutral one. Chao et al. found that if the video monkey looked frightened by the context, the monkeys watching the video clip shifted their gaze to observe the apparent threat. How these monkeys shifted their gaze depended on the context, but this behavior was absent when the video monkey gave a neutral expression.

The experiments used an array of electrodes that covered a wide area of the monkeys' brains to record the electrical activity of nerve cells as the monkeys watched the videos. Chao et al. investigated how brain regions communicated with each other in response to different contexts, and found that contextual information was represented in the interactions between distant brain regions. The monkeys' brains sent information from a region called the temporal cortex (which is involved in processing sensory and social information) to another region called the prefrontal cortex (which is involved in functions such as reasoning, attention, and memory). Seconds later, the flow of information was reversed as the monkeys utilized information about the context to guide their behavior.

Chao et al.'s findings reveal how information about the context of a situation is transmitted around the brain to inform a response. The next challenge is to experimentally manipulate the identified brain circuits to investigate if problems in context processing could lead to the inappropriate responses that contribute to schizophrenia, post-traumatic stress disorder and other psychiatric disorders.

DOI: http://dx.doi.org/10.7554/eLife.06121.002

Main text

Introduction

Context is the contingent sensory or cognitive background for a given situation. Different contexts can dramatically alter perception, cognition, emotional reactions, and decision-making, and in the brain, context can be represented during sensory encoding or during mnemonic retrieval. The study of context is important for understanding the link between perception and cognition, in terms of both behavioral and neural processing, and the neural mechanisms underlying contextual information processing have been studied in a variety of domains including visual perception (Bar, 2004; Schwartz et al., 2007), emotion (De Gelder, 2006; Barrett et al., 2007; Barrett and Kensinger, 2010), language (Hagoort, 2005; Aravena et al., 2010), and social cognition (Ibañez and Manes, 2012). In the brain, context is proposed to require an interplay between bottom-up and top-down information processing in distributed neural networks (Tononi and Edelman, 1997; Friston, 2005). However, a comprehensive functional view of the brain circuits that mediate contextual processing remains unknown because bottom-up and top-down processes are often concurrent and interdependent, making their neural network organization difficult to resolve in time and space.

To understand contextual information processing, we developed a fundamentally new approach to study high-resolution brain network architecture. The approach combines broadband neural recording of brain activity at high spatial and temporal resolution with big data analytical techniques to enable the computational extraction of latent structure in functional network dynamics. We employed this novel methodological pipeline to identify functional network structures underlying fast, internal, concurrent, and interdependent cognitive processes during context processing in monkeys watching video clips with sequentially staged contextual scenarios. Each scenario contained a conspecific showing emotional responses preceded by different situational contexts. With specific combinations of context and response stimuli, this paradigm allowed an examination of context-dependent brain activity and behavior by isolating context processing as a single variable in the task.

To measure large-scale brain network dynamics with sufficient resolution, we used a 128-channel hemisphere-wide high-density electrocorticography (HD-ECoG) array to quantify neuronal interactions with high spatial, spectral, and temporal resolution. This ECoG system has wider spatial coverage than conventional ECoG and LFP (Buschman and Miller, 2007; Pesaran et al., 2008; Haegens et al., 2011) and single-unit activity (Gregoriou et al., 2009), higher spatial resolution than MEG (Gross et al., 2004; Siegel et al., 2008), broader bandwidth than EEG (Hipp et al., 2011), and higher temporal resolution than fMRI (Rees and House, 2005; Freeman et al., 2011). After recording, we interrogated large-scale functional network dynamics using a multivariate effective connectivity analysis to quantify information content and directional flow within the brain network (Blinowska, 2011; Chao and Fujii, 2013), followed by big data analytical approaches to search the database of broadband neuronal connectivity for a latent organization of network communication structures.

Results

Large-scale recording of brain activity during video presentations

Monkeys watched video clips of another monkey (video monkey, or vM) engaging with a second agent (Figure 1) while cortical activity was recorded with a 128-channel ECoG array covering nearly an entire cerebral hemisphere. Three monkeys participated: one with a right hemisphere array (Subject 1) and two with left hemisphere arrays (Subjects 2 and 3) (Figure 1—figure supplement 1). The data are fully accessible online and can be downloaded from the website Neurotycho.org.

The video clips started with a context involving the two agents (Context period), followed by a response to the context (Response period). Six different video clips were created from three contexts (vM threatened by a human, Ch; threatened by another monkey, Cm; or facing an empty wall, Cw) combined with two responses (vM showing a frightened expression, Rf, or a neutral expression, Rn), which were termed ChRf, CmRf, CwRf, ChRn, CmRn, and CwRn (see Videos 1–6). Each video contained audio associated with the event, for example, sounds of a threatening human (Ch) and a frightened monkey (Rf). Each video represents a unique social context-response scenario. For example, ChRf shows a human threatening a monkey (vM) followed by the monkey's frightened response. These staged presentations were designed to examine whether different contexts (Ch, Cm, or Cw) would give rise to context-dependent brain activity even with the same responses (Rf or Rn).

Eye movements demonstrate context- and response-dependent behaviors

During the task, subjects freely moved their eyes to observe the video interactions. We monitored eye movements to examine these spontaneous behavioral reactions and the associated zones in the video. We divided the trials into two conditions based on whether the context stimulus was visually perceived: C+, where the subject was looking at the screen during the Context period, and C−, where the subject was either closing its eyes or looking outside of the screen. Example eye movements are shown in Figure 2—figure supplement 1.

We first investigated which side of the video monitor the monkey attended. When the context was perceived (C+) and the response stimulus was Rf, subjects focused more on the right section of the screen during the Response period, indicating greater interest in the curtain, or the threat behind it, than in the frightened vM (Figure 2A). This preference was absent when the response stimulus was Rn or when the context was not visually perceived (C−). This is behavioral evidence that the gaze direction preference required not only the vM response, but also perception of the preceding context, which demonstrated a cognitive association between the perception of the context and response stimuli.

We then compared gazing behaviors from different trial types to identify behaviors selective to different scenarios (ChRf, CmRf, CwRf, ChRn, CmRn, or CwRn) and conditions (C+ or C−). We performed nine pairwise comparisons on gaze positions from different scenarios, separating C+ and C− conditions, to examine their context and response dependence. For context dependence, we compared behaviors from trials with different context stimuli but the same response stimulus (6 comparisons: CmRf vs CwRf, CwRf vs ChRf, and ChRf vs CmRf for context dependence in Rf; CmRn vs CwRn, CwRn vs ChRn, and ChRn vs CmRn for context dependence in Rn). For response dependence, we compared behaviors from trials with the same context stimulus but with different response stimuli (3 comparisons: CmRf vs CmRn, CwRf vs CwRn, and ChRf vs ChRn).

A context and response dependence was found in gazing behavior (Figure 2B). In C+, significant differences in gaze position were found during the Response period between CmRf and CwRf, and between CwRf and ChRf, but not between ChRf and CmRf (blue circles in left panel). This indicated that gaze shifting in CmRf and ChRf was comparable and stronger than in CwRf. This context dependence was absent when the response stimuli were Rn (green circles in left panel). Furthermore, a significant response dependence was found during the Response period for all contexts (CmRf vs CmRn, CwRf vs CwRn, and ChRf vs ChRn) (red circles in left panel), consistent with the results described in Figure 2A. In C−, the context and response dependence found in C+ was absent (right panel). These results indicated that the subjects' gaze shift during the Response period showed both response dependence (Rf > Rn) and context dependence (Cm ≈ Ch > Cw), but only when the context was perceived (C+ > C−).

Mining of large-scale ECoG data for cortical network interactions

To analyze the large-scale ECoG dataset, we identified cortical areas over the 128 electrodes in the array by independent component analysis (ICA). Each independent component (IC) represented a cortical area with statistically independent source signals (Figure 3—figure supplement 1, and experimental parameters in Table 1).

Table 1.

Experimental parameters

DOI: http://dx.doi.org/10.7554/eLife.06121.013

                                                          Subject 1        Subject 2        Subject 3
Experiment
  Hemisphere implanted                                    Right            Left             Left
  # of electrodes                                         128              128              128
  # of trials per class                                   150              150              150
  # of trials preserved per class (mean ± std)
    (see trial screening in ‘Materials and methods’)      117.7 ± 3.5      122.2 ± 3.1      109.5 ± 3.6
  # of C+ trials per class (mean ± std)                   64.8 ± 5.2       60.3 ± 6.7       57.3 ± 5.6
  # of C− trials per class (mean ± std)                   52.8 ± 1.9       61.8 ± 8.1       52.2 ± 3.5
ICA (see Figure 3—figure supplement 1)
  # of ICs for 90% variance explained                     58               38               39
  # of ICs preserved
    (see IC screening in ‘Materials and methods’)         49               33               36
  Removed ICs                                             1, 2, 3, 4, 5,   1, 2, 7, 8, 29   2, 10, 27
                                                          11, 44, 46, 47

We then measured the causality of a connection from one cortical area (source area) to another (sink area) with a multivariate effective connectivity measure based on Granger causality: the direct directed transfer function (dDTF) (Korzeniewska et al., 2003), which can represent phase differences between the two source signals to provide a time-frequency representation of their asymmetric causal dependence. We acquired dDTFs from all connections for each trial type (12 types: six scenarios and two conditions), and measured event-related causality (ERC) by normalizing the dDTF of each time point and each frequency bin to the median of the corresponding baseline control values. Thus, ERCs represent the spectro-temporal dynamics of network interactions evoked by different scenarios and conditions. Examples of ERCs are shown in Figure 3A.

We compared ERCs from different trial types to identify networks selectively activated in different scenarios (ChRf, CmRf, CwRf, ChRn, CmRn, or CwRn) and conditions (C+ or C−). We performed nine pairwise comparisons on ERCs from different scenarios, separating C+ and C− trials, to examine their context and response dependence. To examine context dependence, we compared ERCs from trials with different contexts but the same response (6 comparisons). In contrast, to examine response dependence, we compared ERCs from trials with the same context but with different responses (3 comparisons). This approach is similar to the eye movement analysis (Figure 2B). The comparisons were performed with a subtractive approach to derive a significant difference in ERCs (∆ERCs, Figure 3B). Hence, ∆ERCs revealed network connections, with corresponding time and frequency, where ERCs were significantly stronger or weaker in one scenario compared to another.

We pooled ∆ERCs from all comparisons, conditions, connections, and subjects, to create a comprehensive broadband library of network dynamics for the entire study. To organize and visualize the dataset, we created a tensor with three dimensions: Comparison-Condition, Time-Frequency, and Connection-Subject, for the functional, dynamic, and anatomical aspects of the data, respectively (Figure 3C). The dimensionality of the tensor was 18 (nine comparisons under two conditions) by 3040 (160 time windows and 19 frequency bins) by 4668 (49 × 48 connections for Subject 1, 33 × 32 for Subject 2, and 36 × 35 for Subject 3).

To extract structured information from this high-volume dataset, we deconvolved the 3D tensor into multiple components by performing parallel factor analysis (PARAFAC), a generalization of principal component analysis (PCA) to higher order arrays (Harshman and Lundy, 1994) and measured the consistency of deconvolution under different iterations of PARAFAC (Bro and Kiers, 2003). Remarkably, we observed five dominant structures from the pooled ∆ERCs that represented functional network dynamics, where each structure contained a comprehensive fingerprint of network function, dynamics, and anatomy (Figure 3D, and Figure 3—figure supplement 2). These five structures were robust against model order selection for ICA (Figure 3—figure supplement 3).
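As an illustration of this decomposition step, the following Python sketch applies PARAFAC to a tensor with the same three-mode layout using the tensorly library. This is not the original pipeline (the study used the N-way MATLAB toolbox with non-negativity constraints on the second and third modes), and the array delta_erc is a reduced-size placeholder, not real data.

```python
# Illustrative PARAFAC sketch on a Comparison-Condition x Time-Frequency x
# Connection-Subject tensor; the real tensor was 18 x 3040 x 4668.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

delta_erc = np.random.rand(18, 304, 466)   # placeholder with reduced dimensions

# Fit a 5-component PARAFAC model (the paper additionally imposed
# non-negativity on the second and third modes).
weights, factors = parafac(tl.tensor(delta_erc), rank=5, tol=1e-6, n_iter_max=500)

comparison_loadings, timefreq_loadings, connection_loadings = factors
print(comparison_loadings.shape)  # (18, 5): functional profile of each structure
print(timefreq_loadings.shape)    # (304, 5): spectro-temporal dynamics
print(connection_loadings.shape)  # (466, 5): anatomical connectivity pattern
```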

Discrete structured representations of functional network dynamics

The five structures are shown in Figure 4 (Structures 1 and 2) and Figure 5 (Structures 3, 4, and 5). Each structure represented a unique set of functional network dynamics, described by its composition in the three tensor dimensions. The first tensor dimension (panel A) represented the differences across comparisons for each structure. We identified the significant differences and reconstructed the activation levels to show how each structure was activated under different scenarios and conditions (see the ‘Materials and methods’). The second tensor dimension (panel B) represented the spectro-temporal dynamics of each structure. The third tensor dimension (panel C) represented the anatomical connectivity pattern of each structure. We measured three connectivity statistics: (1) causal density is the sum of all outgoing and incoming causality for each area, showing areas with busy interactions; (2) causal outflow is the net outgoing causality of each area, indicating the sources and sinks of interactions; and (3) maximum flow between areas is the maximal causality of all connections between cortical areas (the seven areas with the busiest interactions were chosen) (see results for individual subjects in Figure 4—figure supplements 1–3). The extracted statistics were robust across all subjects with different electrode placements, suggesting that the structures were bilaterally symmetric across hemispheres (Figure 4—figure supplement 4).

Structure 1 was activated first, with context dependence only in the Context period (Cm > Ch > Cw), suggesting sensory processing that can discriminate between contextual stimuli, that is, context perception. The context dependence was weaker but remained in C− (C+ > C−), suggesting that auditory information in context stimuli was processed. The spectral dynamics of Structure 1 emerged primarily in the high-γ band (>70 Hz), and contained mostly bottom-up connections from the posterior to anterior parts of temporal cortex.

Structure 2 was the earliest activated in the Response period, with only response dependence (Rf > Rn), and was independent of whether context stimuli were visually perceived (C+ ≈ C−). Thus, Structure 2 corresponds to sensory processing that can discriminate between response stimuli, that is, response perception. Spectrally, Structure 2 emerged in both high-γ and β bands, and contained connections similar to those in Structure 1, with an additional communication channel from anterior temporal cortex to the prefrontal cortex (PFC). Therefore, Structures 1 and 2 represent the multisensory processing of audiovisual stimuli, and Structure 2 could underlie the additional evaluation of emotional valence associated with the stimuli.

Structure 3 was activated the second earliest in the Context period, showing a generalized context dependence (Cm ≈ Ch > Cw) representing the abstract categorization of the context (‘an indeterminate agent is threatening vM’). Similar to Structure 1, the dependence in Structure 3 was weaker in C− (C+ > C−), which suggested that the creation of abstract contextual information depended on the initial perception of context stimuli. Structure 3 appeared mainly in the β band (10–30 Hz), and contained primarily bottom-up connections from the posterior temporal cortex (mainly the area TEO) to the anterior temporal cortex (mainly the temporal pole) and the lateral and medial PFC.

Structure 4 showed the same generalized context dependence as Structure 3, but during the Response period when context stimuli were absent, and only in C+Rf (not in C+Rn or C−). The absence of context dependence in C+Rn and C− suggested that Structure 4 required both a vM response with high emotional valence and its preceding context. Moreover, Structure 4 exhibited spatial and spectral characteristics similar to Structure 3 (Figure 5—figure supplement 1). We conclude that Structures 3 and 4 represent the same or very similar neural substrate, differing only in when and how they were activated. Structure 3 corresponds to the initial formation/encoding of the contextual information, while Structure 4 represents the Rf-triggered reactivation/retrieval of the contextual information. Therefore, Structures 3 and 4 represent the generalized, abstract perceptual and cognitive content of the context.

Structure 5 showed context dependence (Cm ≈ Ch > Cw) in C+Rf (not in C+Rn and C−), and response dependence (Rf > Rn) in C+ (not in C−) during the Response period, and appeared mainly in the α and low-β bands (5–20 Hz). Anatomically, the structure showed primarily top-down connections between the posterior temporal cortex, the anterior temporal cortex, and the lateral and medial PFC. Remarkably, Structure 5 is the only one demonstrating clear top-down connections, with the same context and response dependence as the gaze behavior (see Figure 2B). These results suggest that Structure 5 corresponds to a network for the context-dependent feedback modulation of eye gaze or visual attention during the task, and that the other four structures index internal processes that lead to this behavioral modulation.

Functional, dynamical, and anatomical correlations between network structures

We investigated how the structures coordinated with each other during the task by examining how they correlated with each other in the functional, dynamical, and anatomical domains.

To study function, we evaluated how each structure's context and response dependence correlated with the others', by measuring correlation coefficients of the structures' differences in comparisons across contexts in Rf, across contexts in Rn, and across responses (Figure 6A) (detailed in the ‘Materials and methods’). Significant correlations between two structures indicated that one structure's activation affected the other's, and vice versa, demonstrating a causal interdependence or a common external driver. Across contexts in Rf (Figure 6A, left), Structure 1 significantly correlated with Structures 3 and 4, which were themselves significantly correlated with Structure 5. However, across contexts in Rn (Figure 6A, middle), a significant correlation was found only between Structures 1 and 3. These results confirmed that sensory perception of the context stimuli could be significantly correlated with the formation of an abstract context, and, in turn, this abstract context could be significantly correlated with its reactivation and top-down modulation when a response had high emotional valence. Across responses (Figure 6A, right), Structure 2 significantly correlated with Structure 4, which was itself significantly correlated with Structure 5. This indicated that the top-down modulation reflects the integration of response information and abstract context information.

Figure 6. Coordination and co-activation of network structures.

(A) Functional coordination: The coordination between structures was evaluated by the correlation coefficients between structures' context and response dependence (the differences shown in Figures 4A, 5A). Each panel illustrates how Structure i (y-axis) correlated with Structure j (x-axis) in context dependence in Rf (left), context dependence in Rn (middle), and response dependence (right). Significant correlations are indicated by asterisks (α = 0.05) (see ‘Materials and methods’). (B) Dynamic co-activation: The dynamic correlation is shown by correlation coefficients between structures' temporal and spectral activation. Each panel shows how Structure i correlated with Structure j in temporal dynamics (left) and frequency profile (right). Significant correlations are indicated by asterisks (α = 0.05). (C) Anatomical overlap: The anatomical similarity was indexed by the ratio of shared anatomical connections between structures. Each panel illustrates the ratio of the number of connections shared between Structures i and j to the total number of connections in Structure i. Results obtained from the three subjects are shown separately. (D) Undirected pathways of connections shared by all structures for each subject (top), and those appearing in at least one structure for each subject (bottom). The lateral cortical surface is shown on the left for Subject 1, and on the right for Subjects 2 and 3. Shared pathways (lines) between two cortical areas (circles) of the top 1, 5, 10, and 25% connections are shown. Pathways with greater strengths are overlaid on those with weaker strengths.

DOI: http://dx.doi.org/10.7554/eLife.06121.025

To examine dynamics, we tested whether network structures had mutually exclusive or overlapping spectro-temporal dynamics. We measured the temporal dynamics of each structure by summing up the activation in the second tensor dimension across frequencies. Significant correlations in temporal dynamics were found between structures activated in the Context period (Structures 1 and 3) and the Response period (Structures 2, 4, and 5) (Figure 6B, left). We then measured the spectral profile of each structure by summing up the activation in the second tensor dimension over time. Significant correlations in spectral profiles were found among structures with β band activation (Structures 2, 3, 4, and 5) (Figure 6B, right).

To investigate anatomy, we identified directed connections with the top 10% strengths in the third tensor dimension, and examined the shared top 10% connections between structures for each subject (Figure 6C). The number of shared connections was particularly high (>70% shared) between Structures 3 and 4 (abstract contextual information), and between Structures 1 and 2 (perception). We also examined the undirected pathways, which exclude the directionality of connections, and found pathways shared by all structures and subjects within the temporal cortex and from the temporal cortex to the PFC (Figure 6D, top). Pathways appearing in at least one structure were widespread across the cortex (Figure 6D, bottom).

These results demonstrate the functional coordination and spatio-spectro-temporal co-activation of the five identified network structures, and reveal the multiplexing property of large-scale neuronal interactions in the brain: simultaneous information transfer in similar frequency bands along similar anatomical pathways could be functionally reconstituted into distinct cognitive operations depending on other networks' ongoing status. This type of information would be difficult to extract from traditional EEG/MEG/fMRI analyses.

Discussion

In this study, we demonstrate that context can be represented by dynamic communication structures involving distributed brain areas and coordinated within large-scale neuronal networks, or neurocognitive networks (Varela et al., 2001; Fries, 2005; Bressler and Menon, 2010; Siegel et al., 2012). Our analysis combines three critical properties of neurophysiology—function, dynamics, and anatomy—to provide a high-resolution large-scale description of brain network dynamics for context. The five network structures we identified reveal how contextual information can be encoded and retrieved to modulate behavior with different bottom-up or top-down configurations. The coordination of distributed brain areas explains how context can regulate diverse neurocognitive operations for behavioral flexibility.

Context is encoded by interactions of large-scale network structures

These findings show that context can be encoded in large-scale bottom-up interactions from the posterior temporal cortex to the anterior temporal cortex and the lateral and medial PFC. The PFC is an important node in the ‘context’ network (Miller and Cohen, 2001; Bar, 2004), where the lateral PFC is believed to be critical for establishing contingencies between contextually related events (Fuster et al., 2000; Koechlin et al., 2003), and the medial PFC is involved in context-dependent cognition (Shidara and Richmond, 2002) and conditioning (Fuster et al., 2000; Koechlin et al., 2003; Frankland et al., 2004; Quinn et al., 2008; Maren et al., 2013). Our results indicate that abstract contextual information can be encoded not only within the PFC, but in PFC interactions with lower-level perceptual areas in the temporal cortex. These dynamic interactions between unimodal sensory and multimodal association areas could explain the neuronal basis of why context networks can affect a wide range of cognitive processes, from lower-level perception to higher-level executive functions.

Apart from the bottom-up network structure that encodes abstract context, we discovered other network structures that either process lower-level sensory inputs for context encoding or integrate contextual information for behavioral modulation. Evidently, brain contextual processing, from initial perception to subsequent retrieval, is represented not by sequential activation but rather by sequential modular communication among participating brain areas. Thus, we believe that the network structures we observed represent a module of modules, or ‘meta-module’, for brain communication connectivity. Further investigation of this meta-structure organization for brain network communication could help determine how deficits in context processing in psychiatric disorders such as schizophrenia (Barch et al., 2003) and post-traumatic stress disorder (Milad et al., 2009) could contribute to their etiology.

Cognition as a modular organization consisting of network structures

These results suggest a basic structural organization of large-scale communication within brain networks that coordinate context processing, and provide insight into how apparently seamless cognition is constructed from these network communication modules. In contrast to previous studies where brain modularity is defined as a ‘community’ of spatial connections (Bullmore and Sporns, 2009; Sporns, 2011), or as coherent oscillations among neuronal populations in overlapping frequency bands (Siegel et al., 2012), our findings provide an even more general yet finer-grained definition of modularity based not only on anatomical and spectral properties, but also on temporal, functional, and directional connectivity data. The relationships among network properties in the functional, temporal, spectral, and anatomical domains revealed network structures whose activity coordinated with each other in a deterministic manner (Figure 7A), despite being highly overlapping in time, frequency, and space (Figure 7B). Such multiplexed, yet large-scale, neuronal network structures could represent a novel meta-structure organization for brain network communication. Further studies will be needed to show whether these structures are components of cognition.

Figure 7. Context as a sequence of interactions between network structures.

(A) Coordination between network structures (S1 to S5, circles), under Rn (top) or Rf (bottom) responses. In both response contingencies, context perception (S1) encoded contextual information (S3). However, when the response stimulus contained high emotional valence (Rf, bottom), response perception (S2) reactivated the contextual information (S4), resulting in top-down feedback modulation (S5) that shared the same context and response dependence as the gazing behavior (black arrow and rounded rectangles). Green, blue, and red arrows represent correlations in context dependence in Rn, context dependence in Rf, and response dependence, respectively (see Figure 6A). (B) Temporal, spectral, and spatial profiles and overlap of the defined network structures. Network structures can be characterized by frequency range (labeled on the left) and connectivity pattern (shown on the right). Their temporal activations are plotted over trial time, with a ‘sound-like’ presentation, where a higher volume represents stronger activation. Black vertical lines represent the events as indicated in Figure 2.

DOI: http://dx.doi.org/10.7554/eLife.06121.026

Applications for large-scale functional brain network mapping

We developed an analytical approach using an unbiased deconvolution of comprehensive network activity under well-controlled and staged behavioral task conditions. This workflow enabled us to identify novel network structures and their dynamic evolution during ongoing behavior. In principle, this approach can be generally useful for investigating how network structures link neural activity and behavior. However, we caution that the latent network structures we identified were extracted computationally, and therefore will require further confirmatory experiments to verify their biological significance, particularly the causality of the connectivity patterns within each structure and the functional links bridging different structures. The biological meaning of the identified network structures could be verified by selective manipulation of neuronal pathways with electrical or optogenetic stimulation linked to the ECoG array through neurofeedback, or with neuropharmacological manipulations.

The general class of network structures we identified is not necessarily unique to context. By recording with a hemisphere-wide ECoG array and applying our analytical methodologies to other cognitive behaviors and tasks in non-human primates, we fully expect to observe similar network structures. Our approach of pooling large-scale data across subjects may be useful to extract network structures that are generalizable, because neural processes specific to individual subjects or trials will cancel out. Indeed, the stable and consistent trial responses across subjects in our chronic ECoG recordings suggest that the network structures we isolated may be candidate innate, elemental units of brain organization. Conversely, future identification of unique differences in network structures between subjects could offer insight into structures related to individual trait and state variability, and the network-level etiology of brain diseases (Belmonte et al., 2004; Uhlhaas and Singer, 2006).

Materials and methods

Subjects and materials

Customized 128-channel ECoG electrode arrays (Unique Medical, Japan) containing 2.1 mm diameter platinum electrodes (1 mm diameter exposed from a silicone sheet) with an inter-electrode distance of 5 mm were chronically implanted in the subdural space in three Japanese macaques (Subjects 1, 2, and 3). The details of the surgical methods can be found on Neurotycho.org. In Subject 1, electrodes were placed to cover most of the lateral surface of the right hemisphere, as well as the medial parts of the frontal and occipital lobes. In Subject 2, a similar layout was used, but in the left hemisphere. In Subject 3, all electrodes were placed on the lateral surface of the left hemisphere, and no medial parts were covered. The reference electrode was also placed in the subdural space, and the ground electrode was placed in the epidural space. Electrical cables leading from the ECoG electrodes were connected to Omnetics connectors (Unique Medical) affixed to the skull with an adaptor and titanium screws. The locations of the electrodes were identified by overlaying magnetic resonance imaging scans and x-ray images. For brain map registration, the electrode locations and the brain outlines from Subjects 1 and 3 were manually registered to those from Subject 2 based on 13 markers in the lateral hemisphere and 5 markers in the medial hemisphere (see Figure 1—figure supplement 1).

All experimental and surgical procedures were performed in accordance with the experimental protocols (No. H24-2-203(4)) approved by the RIKEN ethics committee and the recommendations of the Weatherall report, ‘The use of non-human primates in research’. Implantation surgery was performed under sodium pentobarbital anesthesia, and all efforts were made to minimize suffering. No animal was sacrificed in this study. Overall care was managed by the Division of Research Resource Center at RIKEN Brain Science Institute. Each animal was housed in a large individual enclosure with other animals visible in the room, and maintained on a 12:12-hr light:dark cycle. The animals were given food (PS-A; Oriental Yeast Co., Ltd., Tokyo, Japan) and water ad libitum, and also daily fruit/dry treats as a means of enrichment and novelty. The animals were occasionally provided toys in the cage. The in-house veterinary doctor checked the animals and adjusted daily feedings in order to maintain weight. We attempted to treat our subjects as humanely as possible.

Task design

During the task, each monkey was seated in a primate chair with its arms and head gently restrained, while a series of video clips was presented on a monitor (Videos 1–6). In one recording session, each of the six video clips was presented 50 times, and all 300 stimuli were presented in a pseudorandom order in which the same stimulus was never presented twice in succession. In order to keep the monkey's attention on the videos, food items were given after every 100 stimuli. Each monkey participated in three recording sessions within a week. Each stimulus consisted of three periods: Waiting, Context, and Response. During the Waiting period, a still picture created by pixel-based averaging and randomizing of all frames of the stimuli was presented without sound for 2 s. During the first 0.5 s of the Context period, a still image of an actor (a monkey) and an opponent (a monkey, a human, or a wall) was presented with the sound associated with the opponent. The actor was always positioned on the left side of the image. Then a curtain in the video started to close from the right side toward the center to cover the opponent. The curtain-closing animation took 0.5 s, and the curtain stayed closed for another 0.5 s. During the Response period, one of two emotional expressions of the actor (frightened or neutral) was presented with sound for 3 s, followed by the Waiting period of the next trial.

ECoG and behavior recordings

An iMac personal computer (Apple, USA) was used to present the stimuli on a 24-in LCD monitor (IOData, Japan) located 60 cm away from the subject. The sound was presented through one MA-8BK monitor speaker (Roland, Japan) attached to the PC. The experiments were run by a program developed in MATLAB (MathWorks, USA) with Psychtoolbox-3 extensions (Brainard, 1997). The same PC was used to control the experiments and the devices for recording the monkey's gaze and neural signals via a USB-1208LS data acquisition device (Measurement Computing Co., USA). A custom-made eye-tracking system was used for monitoring and recording the monkey's left (Subject 1) or right (Subjects 2 and 3) eye at 30 Hz sampling (Nagasaka et al., 2011). Cerebus data acquisition systems (Blackrock Microsystems, USA) were used to record ECoG signals at a sampling rate of 1 kHz.

Trial screening

Trials during which the subject's eye position was within the screen area more than 80% of the time during the first 0.5 s of the Context period were classified as C+ trials. The rest of the trials were identified as C− trials, where the subject either closed its eyes or the eye position was outside the screen or outside the recording range (±30°).
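A minimal sketch of this screening rule is given below, assuming eye position is available as per-trial arrays of horizontal and vertical gaze angles sampled at 30 Hz; the screen bounds, the 2-s Context onset, and all variable names are illustrative assumptions rather than the original implementation.

```python
# Classify one trial as C+ or C- from its eye-position samples (in degrees).
import numpy as np

def classify_trial(eye_x, eye_y, fs=30, context_onset_s=2.0,
                   screen_x=(-15.0, 15.0), screen_y=(-10.0, 10.0)):
    """Return 'C+' if gaze stays on screen >80% of the first 0.5 s of the
    Context period, otherwise 'C-'. Screen bounds are assumed values."""
    start = int(context_onset_s * fs)
    stop = int((context_onset_s + 0.5) * fs)          # first 0.5 s of the Context period
    x, y = eye_x[start:stop], eye_y[start:stop]
    on_screen = ((x >= screen_x[0]) & (x <= screen_x[1]) &
                 (y >= screen_y[0]) & (y <= screen_y[1]))
    return 'C+' if on_screen.mean() > 0.8 else 'C-'
```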

Data analysis

ICA

ICA was performed on the data combining C+ and C− trials to acquire a common basis for easier interpretation of the results. In contrast, dDTF and the subsequent analyses (ERC and ∆ERC) were calculated from C+ and C− trials separately, and later combined in the PARAFAC analysis.

Preprocessing

The 50 Hz line noise was removed from the raw ECoG data by using the Chronux toolbox (Bokil et al., 2010). The data were then downsampled by a factor of four, resulting in a sampling rate of 250 Hz. Trials with abnormal spectra were rejected by using an automated algorithm from the EEGLAB library (Delorme et al., 2011), which has been suggested as the most effective method for artifact rejection (Delorme et al., 2007). The numbers of trials preserved are shown in Table 1.
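The following is a minimal Python sketch of these first two steps, assuming raw ECoG sampled at 1 kHz; the original analysis used the Chronux and EEGLAB toolboxes, so the filter design here (an IIR notch plus scipy's decimate) is an illustrative substitute rather than the actual method.

```python
# Remove 50 Hz line noise and downsample from 1 kHz to 250 Hz.
import numpy as np
from scipy import signal

def preprocess(raw, fs=1000.0, line_freq=50.0, down=4):
    """raw: array of shape (n_channels, n_samples) sampled at fs."""
    b, a = signal.iirnotch(w0=line_freq, Q=30.0, fs=fs)                 # 50 Hz notch filter
    notched = signal.filtfilt(b, a, raw, axis=-1)                        # zero-phase filtering
    return signal.decimate(notched, down, axis=-1, zero_phase=True)      # -> 250 Hz
```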

Model order selection

The model order, that is, the number of components for ICA, was determined by PCA of the data covariance matrix, as the number of eigenvalues that accounted for 90% of the total observed variance. The resulting model order is shown in Table 1.
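A short sketch of this criterion, assuming the data are arranged as channels by samples (function and variable names are illustrative):

```python
# Choose the ICA model order as the smallest number of principal components
# whose eigenvalues explain 90% of the total variance.
import numpy as np

def ica_model_order(data, var_threshold=0.90):
    """data: array of shape (n_channels, n_samples)."""
    cov = np.cov(data)                                  # channel covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]    # eigenvalues, descending
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, var_threshold) + 1)
```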

ICA algorithm

ICA was performed in multiple runs using different initial values and different bootstrapped data sets by using the ICASSO package (Himberg et al., 2004) with the FastICA algorithm (Hyvärinen and Oja, 1997), which significantly improves the reliability of the results (Meinecke et al., 2002). Finally, artifactual components with extreme values and abnormal spectra were discarded by using an automated algorithm from EEGLAB (Delorme and Makeig, 2004). The number of components preserved after this screening process is shown in Table 1.
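A rough, ICASSO-style sketch of this procedure using scikit-learn's FastICA is shown below; the bootstrapping scheme, run count, and the component-clustering step (omitted here) are simplifications of the actual ICASSO package, and all names are illustrative.

```python
# Run FastICA several times on bootstrapped data with different seeds,
# collecting the unmixing matrices for later reliability assessment.
import numpy as np
from sklearn.decomposition import FastICA

def run_ica_multiple(data, n_components, n_runs=10, rng_seed=0):
    """data: (n_channels, n_samples); returns a list of unmixing matrices."""
    rng = np.random.default_rng(rng_seed)
    unmixing_matrices = []
    for run in range(n_runs):
        # Bootstrap samples (a simplified stand-in for trial-level bootstrapping).
        idx = rng.integers(0, data.shape[1], size=data.shape[1])
        ica = FastICA(n_components=n_components, max_iter=1000, random_state=run)
        ica.fit(data[:, idx].T)                # sklearn expects (n_samples, n_features)
        unmixing_matrices.append(ica.components_)
    return unmixing_matrices
```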

Multivariate spectral causality

Spectral connectivity measures for multitrial multichannel data, which can be derived from the coefficients of the multivariate autoregressive model, require that each time series be covariance stationary, that is, its mean and variance remain unchanged over time. However, ECoG signals are usually highly nonstationary, exhibiting dramatic and transient fluctuations. A sliding-window method was implemented to segment the signals into sufficiently small windows, and connectivity was calculated within each window, where the signal is locally stationary.

Preprocessing

Three preprocessing steps were performed to achieve local stationarity: (1) detrending, (2) temporal normalization, and (3) ensemble normalization (Ding et al., 2000). Detrending, which is the subtraction of the best-fitting line from each time series, removes the linear drift in the data. Temporal normalization, which is the subtraction of the mean of each time series and division by the standard deviation, ensures that all variables have equal weights across the trial. These processes were performed on each trial for each channel. Ensemble normalization, which is the pointwise subtraction of the ensemble mean and division by the ensemble standard deviation, targets rich task-relevant information that cannot be inferred from the event-related potential (Ding et al., 2000; Bressler and Seth, 2011).
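A compact sketch of these three steps for the trials of a single channel, in the order described above (array and function names are illustrative):

```python
# Local-stationarity preprocessing (Ding et al., 2000): detrend, temporal
# normalization, and ensemble normalization across trials.
import numpy as np
from scipy.signal import detrend

def stationarize(trials):
    """trials: array of shape (n_trials, n_samples) for one channel."""
    x = detrend(trials, axis=-1, type='linear')                # 1) remove linear drift
    x = (x - x.mean(axis=-1, keepdims=True)) / x.std(axis=-1, keepdims=True)  # 2) temporal normalization
    ens_mean = x.mean(axis=0, keepdims=True)                   # 3) ensemble normalization:
    ens_std = x.std(axis=0, keepdims=True)                     #    pointwise across trials
    return (x - ens_mean) / ens_std
```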

Window length selection

The length and the step size of the sliding-window for segmentation were set as 250 ms and 50 ms, respectively. The window length selection satisfied the general rule that the number of parameters should be <10% of the data samples: to fit a VAR model with model order p on data of k dimensions (k ICs selected from ICA), the following relation needs to be satisfied: w ≥ 10 × (k^2 × p/n), where w and n represent the window length and the number of trials, respectively.

Model order selection

Model order, which is related to the length of the signal in the past that is relevant to the current observation, was determined by the Akaike information criterion (AIC) (Akaike, 1974). In all subjects, a model order of nine samples (equivalent to 9 × 4 = 36 ms of history) resulted in minimal AIC and was selected. The selected model order also passed the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, thus maintained local stationarity. Furthermore, the VAR model was validated by the whiteness test and the consistency test.

Spectral connectivity

dDTF (Korzeniewska et al., 2003), a VAR-based spectral connectivity measure, was calculated by using the Source Information Flow Toolbox (SIFT) (Delorme et al., 2011) together with other libraries, such as Granger Causal Connectivity Analysis (Seth, 2010) and Brain-System for Multivariate AutoRegressive Timeseries (Cui et al., 2008). A detailed tutorial of VAR-based connectivity measures can be found in the SIFT handbook (http://sccn.ucsd.edu/wiki/SIFT).
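For orientation, the sketch below computes the plain directed transfer function (DTF) from fitted VAR coefficients; it is a simplified stand-in for dDTF, which additionally weights the full-frequency-normalized DTF by partial coherence (Korzeniewska et al., 2003), a step omitted here. The function signature and variable names are assumptions for illustration only, not the SIFT implementation.

```python
# Directed transfer function from VAR coefficients A_1..A_p.
import numpy as np

def dtf(var_coeffs, freqs, fs):
    """var_coeffs: array (p, k, k) of VAR coefficient matrices; returns an
    array (n_freqs, k, k) where entry [f, i, j] is the influence of channel j
    onto channel i at frequency f."""
    p, k, _ = var_coeffs.shape
    out = np.zeros((len(freqs), k, k))
    for fi, f in enumerate(freqs):
        # A(f) = I - sum_m A_m exp(-2*pi*i*f*m/fs); transfer matrix H(f) = inv(A(f))
        A_f = np.eye(k, dtype=complex)
        for m in range(1, p + 1):
            A_f = A_f - var_coeffs[m - 1] * np.exp(-2j * np.pi * f * m / fs)
        H = np.linalg.inv(A_f)
        power = np.abs(H) ** 2
        out[fi] = power / power.sum(axis=1, keepdims=True)   # normalize each sink (row)
    return out
```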

ERC

To calculate ERC at time t and frequency f, ERC(t, f), the dDTF at time t and frequency f, dDTF(t, f), was normalized by the median value during the baseline period at frequency f, dDTF_baseline(f):

\[
\mathrm{dDTF}_{\mathrm{baseline}}(f) = \operatorname{median}\big(\mathrm{dDTF}(t_{\mathrm{baseline}}, f)\big), \qquad
\mathrm{ERC}(t, f) = 10 \log_{10}\!\left(\frac{\mathrm{dDTF}(t, f)}{\mathrm{dDTF}_{\mathrm{baseline}}(f)}\right). \tag{1}
\]
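In code, this normalization amounts to the following sketch for a single connection (array names are illustrative):

```python
# Equation 1: express dDTF relative to the median of its baseline, in dB.
import numpy as np

def event_related_causality(ddtf, baseline_mask):
    """ddtf: (n_times, n_freqs) for one connection; baseline_mask: boolean (n_times,)."""
    baseline = np.median(ddtf[baseline_mask], axis=0)     # dDTF_baseline(f)
    return 10.0 * np.log10(ddtf / baseline)               # ERC(t, f) in dB
```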

Comparisons for context and response dependencies

Nine comparisons were performed on the ERCs obtained from different social scenarios in C+ and C− trials separately. To examine context dependency, three comparisons were performed between scenarios with different contexts followed by the frightened response (ChRf vs CmRf, CmRf vs CwRf, and CwRf vs ChRf), and another three comparisons were performed between scenarios with different contexts followed by the neutral response (ChRn vs CmRn, CmRn vs CwRn, and CwRn vs ChRn). To examine response dependency, three comparisons were performed between scenarios with different responses under the same contexts (ChRf vs ChRn, CmRf vs CmRn, and CwRf vs CwRn). False discovery rate (FDR) control was used to correct for multiple comparisons in multiple hypothesis testing, and a threshold of αFDR = 0.05 was used.
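The specific FDR procedure is not stated here; the sketch below assumes the standard Benjamini-Hochberg method as implemented in statsmodels, applied to the p-values of one pairwise comparison (the placeholder array is not real data).

```python
# FDR control at alpha = 0.05 over the p-values of one time-frequency comparison.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.random.uniform(size=3040)   # placeholder: one p-value per time-frequency bin
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
significant_bins = np.flatnonzero(reject)  # bins surviving FDR correction
```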

PARAFAC

PARAFAC was performed by using the N-way toolbox (Andersson and Bro, 2000), with the following constraints: no constraint on the first tensor dimension, and non-negativity on the second and third tensor dimensions. The non-negativity constraint was introduced mainly for simpler visualization of the results. The convergence criterion, that is, the relative change in fit at which the algorithm stops, was set to 1e-6. The initialization method was set to DTLD (direct trilinear decomposition)/GRAM (generalized rank annihilation method), which is considered the most accurate method (Cichocki et al., 2009). Initialization with random orthogonalized values (repeated 100 times, each time with different random values) was also shown for comparison.

Connectivity statistics

The connectivity statistics used in this study were calculated from the connectivity matrix (a weighted directed relational matrix) from each latent network structure and each subject, by using the Brain Connectivity Toolbox (Rubinov and Sporns, 2010).

Causal density and outflow

Node strength was measured as the sum of weights of links connected to the node (IC). The causal density of each node was measured as the sum of outward and inward link weights (out-strength + in-strength), and the causal outflow of each node was measured as the difference between outward and inward link weights (out-strength − in-strength). For visualization, each measure was spatially weighted by the absolute normalized spatial weights of the corresponding IC. For example, if the causal density and causal outflow of IC ic are Density_ic and Outflow_ic, respectively, and the spatial weight of IC ic on channel ch is W_ic,ch, then the spatial distributions of causal density and causal outflow on each channel are:

\[
\mathrm{Density}_{ch} = \sum_{ic} \frac{|W_{ic,ch}|}{\max(|W_{ic,ch}|)}\,\mathrm{Density}_{ic}, \qquad
\mathrm{Outflow}_{ch} = \sum_{ic} \frac{|W_{ic,ch}|}{\max(|W_{ic,ch}|)}\,\mathrm{Outflow}_{ic}. \tag{2}
\]
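A small sketch of these node statistics and the spatial projection in Equation 2, assuming a weighted directed connectivity matrix between ICs and an IC-by-channel spatial weight matrix; normalizing by the per-IC maximum over channels is one reading of Equation 2, and all names are illustrative.

```python
# Causal density and causal outflow per IC, and their projection onto channels.
import numpy as np

def node_statistics(conn):
    """conn: (n_ics, n_ics) weighted directed matrix, conn[i, j] = causality i -> j."""
    out_strength = conn.sum(axis=1)            # total outgoing causality per IC
    in_strength = conn.sum(axis=0)             # total incoming causality per IC
    density = out_strength + in_strength       # causal density (busy interactions)
    outflow = out_strength - in_strength       # causal outflow (source vs sink)
    return density, outflow

def project_to_channels(values, W):
    """Spatially weight per-IC values onto channels (Equation 2).
    values: (n_ics,); W: (n_ics, n_channels) IC spatial weights."""
    weights = np.abs(W) / np.abs(W).max(axis=1, keepdims=True)   # normalize each IC map
    return weights.T @ values                                     # (n_channels,)
```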

Maximum flow between areas

Seven cortical areas were first manually determined: the visual (V), parietal (P), prefrontal (PF), medial prefrontal (mPF), motor (M), anterior temporal (aT), and posterior temporal (pT) cortices. The maximum link among all links connecting two areas was selected to represent the maximum flow between the two areas.

Activation levels from comparison loadings

For each experimental condition (C+ or C− trials), we determined the significant comparisons for each latent network structure by performing trial shuffling. For each shuffle, dDTF and ∆ERC were recalculated after the trial types were randomly shuffled. A new tensor was formed, and we estimated how the five spectro-temporal connectivity structures identified from the original data contributed to the shuffled data, by performing PARAFAC on the tensor from the shuffled data with the last two tensor dimensions (Time-Frequency and Connection-Subject) fixed at the values acquired from the original data. Trial shuffling was performed 50 times. A loading from the original data that was significantly different from the loadings from the shuffled data was identified as a significant loading (α = 0.05). The comparisons with significant loadings showed the context and response dependencies of each structure, and further revealed the activation levels of each structure in different scenarios.

Interdependencies among latent network structures

We determined the interdependencies among latent network structures by examining how the activation of one structure could affect the activation of the others. To achieve this, we examined the comparison loadings in the first tensor dimension. For the correlations of structures' activation differences across contexts under Rf, we focused on the six comparison loadings representing activation differences across contexts under Rf (ChRf vs CmRf, CmRf vs CwRf, and CwRf vs ChRf) from C+ and C− trials, and evaluated the correlations of these comparison loadings from different structures. The p-values for testing the hypothesis of no correlation were then computed. For the correlations of structures' activation differences across contexts under Rn, we used the same approach to evaluate the correlations of structures' activation differences across contexts under Rn (ChRn vs CmRn, CmRn vs CwRn, and CwRn vs ChRn). For the correlations of structures' activation differences across responses, we evaluated the correlations of structures' activation differences across responses (ChRf vs ChRn, CmRf vs CmRn, and CwRf vs CwRn).
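A minimal sketch of one such correlation test is shown below, using hypothetical comparison loadings; the correlation measure is not specified above, so Pearson correlation (scipy's pearsonr) is an assumption.

```python
# Correlate two structures' comparison loadings (here, the six Rf context
# comparisons: 3 comparisons x 2 conditions) and test for no correlation.
import numpy as np
from scipy.stats import pearsonr

loadings_rf = np.random.rand(5, 6)   # placeholder: rows = structures, cols = Rf comparisons

r, p = pearsonr(loadings_rf[0], loadings_rf[2])   # e.g., Structure 1 vs Structure 3
print(f"r = {r:.2f}, p = {p:.3f}")
```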

Video 1. Video clip for CmRf trials.

The clip contains the Context and Response periods.

DOI: http://dx.doi.org/10.7554/eLife.06121.005

Video 2. Video clip for ChRf trials.

The clip contains the Context and Response periods.

DOI: http://dx.doi.org/10.7554/eLife.06121.006

Video 3. Video clip for CwRf trials.

The clip contains the Context and Response periods.

DOI: http://dx.doi.org/10.7554/eLife.06121.007

Video 4. Video clip for CmRn trials.

The clip contains the Context and Response periods.

DOI: http://dx.doi.org/10.7554/eLife.06121.008

Video 5. Video clip for ChRn trials.

The clip contains the Context and Response periods.

DOI: http://dx.doi.org/10.7554/eLife.06121.009

Video 6. Video clip for CwRn trials.

The clip contains the Context and Response periods.

DOI: http://dx.doi.org/10.7554/eLife.06121.010

References

Acknowledgements

We thank Charles Yokoyama for valuable discussion and paper editing, and Naomi Hasegawa and Tomonori Notoya for medical and technical assistance. We also thank Jun Tani and Douglas Bakkum for their critical comments.

Decision letter

Timothy Behrens, Reviewing editor, Oxford University, United Kingdom

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Mesoscopic brain networks regulate cognitive enchainment in social monitoring” for consideration at eLife. Your article has been favorably evaluated by three reviewers, one of whom, Timothy Behrens (Senior Editor and Reviewing Editor) is a member of our board.

The editor and the other reviewers discussed their comments before we reached this decision, and the editor has assembled the following comments to help you prepare a revised submission.

All three reviewers were very impressed by the unusual nature of the data, by the sophisticated and revealing analysis that was able to simplify complex data to the dynamics of network activity that underlies social observations in the task, and all reviewers were excited by the ability to study this particular brain network in macaque monkeys, given its importance in social cognition in many human studies.

Example praise for the manuscript in review was as follows:

Reviewer 1:

The neural mechanisms mediating social behavior appear to be distributed across the brain, including areas thought to be specialized for social information processing (such as those in the temporal lobe and medial prefrontal cortex) and others that serve more general purpose functions (such as those involved in reward, decision-making, executive control, and attention). One impediment to understanding how these circuits interact to translate sensory perception into behavior is that the methods typically applied suffer from either poor temporal resolution (fMRI in humans), poor spatial resolution (EEG), or limited coverage (single unit recording in animals). The current paper uses wide-scale recordings from intracranial EEG (ECoG) system to simultaneously assess interactions amongst cortical areas during social information processing. Importantly, the authors apply sophisticated, data-driven analytical tools to derive the information flow between and amongst these areas. This is an important advance.

Reviewer 2:

This paper describes a novel approach to extract information about the functional connectivity of distributed brain networks, and how these connections evolve through time as a function of carrying out social observation. The paper is unique in two respects. First, the dataset contains whole-hemisphere ECoG collected whilst three non-human primates perform a social task (although the authors have published several papers with similar ECoG data previously). Second, the analysis approach is novel and innovative. dDTF is used to identify connections between regions, and then factor analysis is used to isolate how these vary as a function of different task conditions. Using this approach, the authors identify five separate networks (‘structures’). These superficially have similarities in terms of their connectional structure (e.g. structures 1/2, and structures 3/4), but differ in terms of their temporal dynamics and their activation across conditions. Intriguingly, some of the networks contain classic ‘social’ regions, such as the superior temporal sulcus. By examining how these networks evolve through time, the authors attempt to reveal the chain of events underlying different forms of social observation.

Reviewer 3:

This paper reports extremely unusual data from ECoG recordings of macaque monkeys viewing other monkeys engaged in socially threatening situations. It also reports a novel and potentially powerful set of analysis tools for analysing functional networks acquired at high temporal resolution in ECoG data.

There are several key strengths and novelties about the paper.

(1) Intriguing patterns of brain activity and functional connections are reported that have perhaps never been recorded outside an fMRI scanner, and certainly not in social situations. These patterns are reminiscent of human brain areas that respond to complex social tasks.

(2) The broad coverage of the ECoG data by comparison to most other macaque monkey recordings allows the analysis of information flow between brain areas. Because of the temporal resolution, these analyses can also begin to make directional inferences.

(3) The complex nature of ECoG data requires sophisticated data compression techniques. The authors are extremely inventive in how they analyse their data – they develop tools which compress the data into its digestible patterns, but which maintain the key comparisons between conditions, between brain areas, and between task times. I find this very impressive.

However, there were several features of the data that limited the reviewers' enthusiasm. In brief, these were broadly to do with the task and the over-interpretation of the data. We believe that with a thorough rewrite of the manuscript, to focus on describing the data clearly rather than making interpretations of the data, both in terms of its relevance to particular social behaviours, and in terms of causal mechanisms that are not supported directly by the data, it will be possible to improve the manuscript to remove these concerns.

Specifically:

Comments about the task:

Reviewer #1:

1) The task, which the authors refer to as monitoring of a social context, involves only passive viewing without any differential response expected (or found) on the part of the observer monkey in reaction to different ‘social scenarios’. The only statistically significant difference in behavior is the reported difference in left vs. right gaze positions during the response phase of the trial when monkeys view a frightened vM vs. a neutral vM. Since the authors did not find this result to vary under different ‘social’ contexts, it remains unclear whether/what the observer monkeys made out of the different social situations examined in the study.

2) In the Abstract, it might be more precise to talk about “mapping the network structure” as monkeys viewed scenes leading up to examining the valence on a conspecific's face rather than calling it a “social cognitive behavior”/“social context monitoring” since the behavior per se doesn't inform us in this regard.

3) In the Methods, the task lacks a non-social control, which will in turn depend on what the authors, for the purpose of the experiment, define as ‘social’. For instance, is the context of two monkeys looking at each other ‘social’? In that case, the non-social control could be another monkey looking away from the monkey on the left of the screen. Is monkey-monkey looking at each other more ‘social’ than monkey-human looking at each other? These are of course tough questions to answer from a single experiment, but it would still be useful to discuss the authors' view on these issues as their basis for designing the experiment.

On the other hand, if the element of affect/threatening the monkey on the left is what constitutes ‘social’ in this case, a non-social control could be a non-threatening/neutral monkey on the right, or perhaps an inanimate monkey/human with a threatening expression.

Furthermore, the monkey in the video facing an empty wall does not control for the presence of an object or an individual. There are clear sensory/perceptual differences between a monkey face, a human face, and a wall.

Reviewer #3:

It is not clear from the monkey's behaviour that the social nature of the task, rather than the perceptual differences between stimuli, is what is important. In my view this slightly confounds the clear interpretation of these signals as social signals.

Comments about the interpretation of the data:

Reviewer #1:

4) In the Results, the authors say that “subjects tended to focus on the right section during the Response period when the response stimuli were Rf (C+Rf trials)” vs C+Rn trials as well as C trials. They conclude that this “indicated that gazing behavior required not only vM responses with high emotional valence, but also the context of vM's response. This suggested that the gazing behavior by the subject represented an automatic or intuitive reaction to socially relatable scenarios (e.g. ‘vM was frightened after being threatened’).”

I think the comparison that the authors analyzed suggests that monkeys are interested in knowing what is behind the curtain when vM is frightened vs. when it is not. This comparison doesn't take into account the nature of the context preceding the response, since the possible sources of the fright – monkey vs. human vs. empty wall – were not compared against each other here. In fact, when the context was indeed taken into account, the authors did not find any significant context dependence at all (Figure 1–figure supplement 3) and hence, in my opinion, the reported difference in gazing location does not make the case for subjects ‘monitoring the context of a social scenario’ at all. This is a critical concern.

Reviewer #2:

I came away with a less than clear impression of what the findings had taught us. I think that this was partly a result of the way the paper was structured. The focus, in the Abstract, Introduction and initial results, was quite heavily on the mathematical technique used as opposed to the results obtained with this technique. Upon reaching the Results, there were some clearly interesting findings, but many of the interpretations were dependent upon a reverse inference from the brain regions activated (Poldrack RA, TiCS 2006), as opposed to an inference based on the task manipulation. I would therefore urge the authors to shift the emphasis in the initial part of the paper towards what their technique and dataset tell us about the dynamics of connectivity in these brain regions, as opposed to being so heavily focused on the methodology.

Reviewer #3:

The nature of the task in combination with the complex analysis often makes interpretation of the results complex, and the authors resort to an interpretation that does not rely on the data. There are examples of this throughout the manuscript. Here is a typical one:

“This result suggests that Structure 5 underlies the context-dependent feedback modulation of response perception linked to social reasoning in the task (‘why vM is frightened?’ or ‘is vM frightened because it was threatened by something?’ or ‘should I be concerned that vM is frightened?’).”

This kind of inference is inappropriate, and is also unnecessary.

The remaining major comments were either about the technicalities of the analysis, or about the reporting of these technicalities. Where possible, we would like you to address these technical concerns. More broadly speaking, we would like you to focus on clarifying the more technical aspects of the manuscript so that the manuscript can be clearly understood by a broad audience.

Other comments:

Reviewer #1:

In the subsection “Deconvolution Analysis of Cortical Information Processing,” wouldn't it be more useful task-wise to identify independent sources by using only C+ trials instead of merging data from C+ & C trials for ICA? Did the authors do this analysis? How does it affect the results?

In the first paragraph of the subsection “Dynamic Cognitive Chain Describes Social Context Monitoring”, to examine how the network interactions change as a function of context, wouldn't it be better to examine the correlations between activation differences across contexts (Cm & Ch vs. Cw) rather than between C+ vs. C trials?

In the same subsection, as per Figure 5A, none of the correlations with structure 2 are significant. In that case, does a causal dependence of structures 4 and 5 on structure 2 apply?

In the second paragraph of the same subsection, does this analysis include the time courses of gaze positions and scanning behavior of all contexts (Cm, Ch & Cw) and trials (C+ & C)? Did you find the timing correlations to be different across contexts?

Reviewer #2:

I would therefore urge the authors to shift the emphasis in the initial part of the paper towards what their technique and dataset tell us about the dynamics of connectivity in these brain regions, as opposed to being so heavily focused on the methodology. That said, there were also times where it was unclear in what order the different techniques were being applied, and how they were being applied. For instance, it is mentioned that ICA was first applied, but it was unclear what the input dimensions of the ICA were, or how the obtained components were subsequently used for the dDTF and PARAFAC analysis. I felt that a clear ‘analysis pipeline’ diagram, starting with raw data and ending with the key results, would be very helpful to include as a supplementary figure.
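To make the requested pipeline concrete, a minimal sketch of one plausible ordering (source unmixing, MVAR fitting, directed connectivity) is shown below. The simplified directed transfer function here is a stand-in for the dDTF used in the paper, and all dimensions, data, and library choices (scikit-learn, statsmodels) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import FastICA
from statsmodels.tsa.api import VAR

# Step 1 (illustrative): unmix ECoG channels into a smaller set of sources.
ecog = np.random.randn(5000, 128)                      # placeholder: samples x channels
sources = FastICA(n_components=16, random_state=0).fit_transform(ecog)

# Step 2: fit a multivariate autoregressive model to the source time courses.
var_fit = VAR(sources).fit(maxlags=8)
A = var_fit.coefs                                      # shape (order, k, k)

# Step 3: a simplified directed-transfer-function estimate from the MVAR fit
# (the dDTF used in the paper additionally weights by partial coherence
# and integrates over frequency).
def dtf(A, freqs, fs):
    order, k, _ = A.shape
    out = np.zeros((len(freqs), k, k))
    for fi, f in enumerate(freqs):
        Af = np.eye(k, dtype=complex)
        for m in range(order):
            Af -= A[m] * np.exp(-2j * np.pi * f * (m + 1) / fs)
        H = np.linalg.inv(Af)
        out[fi] = np.abs(H) / np.sqrt((np.abs(H) ** 2).sum(axis=1, keepdims=True))
    return out                                         # out[f, i, j]: flow from source j to i

dtf_vals = dtf(A, freqs=np.arange(1, 50), fs=1000)
# Step 4 would stack such estimates across conditions and time windows into a
# tensor and factor it (e.g. with PARAFAC), as sketched earlier.
```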

Finally, I felt that the presentation of the comparison-condition component in Figures 3 and 4 was unclear, in that it seemed diagrammatic whereas presumably it was based upon the statistics of the comparison being performed. A more quantitative approach to presenting these data would help to make them clearer and more compelling.

Reviewer #3:

The manuscript is written from a very technical perspective, and does not introduce the key neuroscience issues well. This makes it a very tough read, particularly for a broad interest journal such as eLife.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled “Mesoscopic brain networks regulate cognitive enchainment in social monitoring” for further consideration at eLife. Your revised article has been evaluated by Timothy Behrens (Senior Editor and Reviewing Editor) and the original reviewers.

The reviews are appended below, and the sentiment in the reviews was reiterated in discussion between reviewers. Essentially, we all remain impressed with the data, are aware that the volume and complexity of data requires innovative new analytical tools, and we still all believe that the proposed analyses are likely very interesting. However, two of the reviewers are clear (and the editorial team agree) that the manuscript cannot possibly be published in a broad interest journal in its present form. Whilst the technical details are more revealing and the claims better substantiated in this current revision, the clarity of the manuscript has, in our view, not improved. It is very difficult indeed to parse the manuscript to understand what the central contribution is. The figures are not well explained in the legends – the legends report details that might be more appropriate in a technical methods section and do not perform the main function of a figure legend, which is to explain how to read the figure.

We would like to ask you to follow the reviewers’ advice below and to restructure and rewrite the paper, so it can be followed in detail by a naïve reader.

Reviewer #1:

I think the authors have done a fine job revising this paper, which presents a novel analytical approach to analyzing high-density neurophysiological data gathered in monkeys viewing a set of different social scenarios. I'm not yet convinced that the descriptor “social scenario viewing task” is much better than “social monitoring task” – it's both a mouthful and I think still puts too much emphasis on the idea of a task. It might be more concise and precise to say the monkeys were viewing social scenarios.

Reviewer #2:

The authors have put quite some work into making the manuscript clearer, and providing more substantial information concerning the methodology. But the main concerns of the reviewers seem to have held: because of the task design, it is very difficult to draw any strong conclusions about the role of the identified networks in social cognition. In their own words, they acknowledge “without a proper control, we can't conclusively link our results to social cognition.” As such, the authors now explicitly state (in their response) that the main conclusion from the paper is about the development of a novel method. Not much can be learnt about the explicit meaning of the underlying cognitive processes.

The question then becomes, is this novel method of sufficiently broad interest and importance that it will change the views of the community? The authors' main claim seems to be that it will reveal previously undiscovered ‘cognitive chains’. What exactly is meant by this? That brain regions are activated sequentially in response to a task, and that different brain areas will be recruited depending upon the cognitive function? On the one hand, this seems to be something that we already know from many years of MEG and EEG research in humans. On the other, it is clear that the spatial resolution of the ECoG data far surpasses this research, and the extraction of network structure is very different from what has gone before. Nevertheless, I still very much struggled to understand whether this extraction of network structure was (a) valid or (b) important. In terms of validity, if I were reviewing this paper at a methods journal, I'd expect a set of examples in simulated data to convince me that the method works well and robustly. In terms of importance, it's precisely because the task is poorly controlled that I don't get an “aha!” moment when examining the results that convinces me that it has definitely worked.
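As one concrete illustration of what such a validation could involve (purely hypothetical; the coupling matrix and dimensions below are invented for illustration), ground-truth directed data can be simulated from an explicit VAR model and fed through the analysis pipeline:

```python
import numpy as np

# Hypothetical validation sketch: three signals with a known directed chain
# 0 -> 1 -> 2, generated from an explicit VAR(1) model. A directed
# connectivity method that works should recover the 0->1 and 1->2 links
# and assign little weight to the reverse directions.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.0, 0.0],
              [0.4, 0.5, 0.0],
              [0.0, 0.4, 0.5]])          # row i is driven by column j at lag 1
x = np.zeros((10000, 3))
for t in range(1, x.shape[0]):
    x[t] = A @ x[t - 1] + rng.standard_normal(3)
# x can then be passed through the full pipeline to check that the recovered
# network structures match the ground-truth connections.
```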

I can see that there is a lot of potential in the paper, and it seems unfair to dismiss what could be an important set of findings about a novel technique. But equally I didn't find the results and structure of the paper sufficiently compelling or clear to warrant publication at present. I'd be open to other reviewers pointing out what they felt was the evidence that the technique works convincingly, or that it has produced a particularly important result.

Reviewer #3:

The authors have done a good job dealing with the technical concerns and no longer over-interpret the data.

However, there still remain concerns about whether the manuscript as currently written can be understood by a broad interest readership, or even a relatively specialised readership within the field. Indeed the mathematical expertise needed even to understand the basic analysis is extreme, and the authors do a very poor job in making the analysis comprehensible. Like the other reviewers, I suspect, I am still finding it difficult to really evaluate the neuroscientific findings, as I cannot fully understand their implications.

Despite the unusual and high quality of the data and the sophisticated nature of the analysis, it is therefore very difficult to understand what we have learnt about the cognitive processes.

For example, the figure legends do not explain how to read the figures. The new diagram asked for by a different reviewer is difficult to understand.

It is essential that the authors address this in both the Abstract and the main text before this can be published in a journal such as eLife.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled “Cortical network architecture for context processing in primate brain” for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior Editor).

The new manuscript is, in my view, dramatically clarified, and the data remain extremely exciting. However, in the rewrite the focus of the Abstract and the Introduction has been moved towards the technical achievements of the manuscript. This presents a difficulty for eLife because the manuscript has not been reviewed or considered as a technical manuscript. In the Discussion, the nature of the claims is much more balanced between the technical innovations and the neuroscience claims (except perhaps in the first overarching paragraph where the balance is again towards the technical). eLife cannot publish the manuscript on the basis of the technical claim alone, but I believe it will be relatively easy to adjust the Abstract and Introduction to highlight and to be clear about the new neuroscience claim.

In my view, the neuroscience claim is still not clearly stated in the Abstract or Introduction. You summarise your findings as follows in the Abstract:

“Collectively, the five structures delineated the flow of information in the network, including two isomorphic variants defining the encoding and retrieval, respectively, of contextual information.”

and in the Introduction:

“The structures we identified provide new insights on how contextual information is processed and help to identify relationships linking network communication and behavior.”

Both of these statements are descriptions of the success of the technique, and not of new neuroscience findings. In brief, what new insights do they provide?

In the Discussion you are much more clear about this in the subsection “Context is Encoded by Interactions of Large-Scale Network Structures”.

If I am correct, it seems like the key claims can be summarised as follows:

a) Large-scale network interactions are different in different contexts.

b) Bottom-up connections from posterior temporal cortex to anterior temporal cortex and medial PFC can encode a context whilst it is being first processed and held on-line.

c) These exact same brain regions exhibit the opposite top-down connectivity only seconds later when the context is being applied to incoming sensory information.

d) The extent to which the bottom-up connectivity is active during the processing of context predicts the extent to which the top-down connectivity will act when the context is later being applied.

To me, these claims seem to be striking and important and it is these claims that the manuscript has been judged on, but these claims no longer appear anywhere in the Abstract or Introduction and are only in a subsection of the Discussion. So I am asking for one further revision. In this round of revision, I am asking for changes to the Abstract, to the Introduction (and possibly also to the Discussion if you choose), which concisely and precisely highlight the new neuroscience claims, and which change the tone of the current version of the manuscript from being largely a methodological innovation, to being a balanced manuscript which introduces a new technique to make a clear and precise claim about the contextual processing of sensory information.

DOI: http://dx.doi.org/10.7554/eLife.06121.027

Author response