Abstract
Neural representation refers to the brain activity that stands in for one’s cognitive experience, and in cognitive neuroscience, the principal method for studying neural representations is representational similarity analysis (RSA). The classic RSA (cRSA) approach examines the overall quality of representations across numerous items by assessing the correspondence between two representational similarity matrices (RSMs): one based on a theoretical model of stimulus similarity and the other based on similarity in measured neural data. However, because cRSA cannot model representation at the level of individual trials, it is fundamentally limited in its ability to assess subject-, stimulus-, and trial-level variances that all influence representation. Here, we formally introduce trial-level RSA (tRSA), an analytical framework that estimates the strength of neural representation for individual experimental trials and evaluates hypotheses using multi-level models. First, we verified the correspondence between tRSA and cRSA in quantifying the overall representation strength across all trials. Second, we compared the statistical inferences drawn from both approaches using simulated data reflecting a wide range of scenarios. Compared to cRSA, the multi-level framework of tRSA was both more theoretically appropriate and significantly more sensitive to true effects. Third, using real fMRI datasets, we further demonstrated several issues with cRSA, to which tRSA was more robust. Finally, we presented novel findings on neural representations that could only be assessed with tRSA and not cRSA. In summary, tRSA proves to be a robust and versatile analytical approach for cognitive neuroscience and beyond.
Introduction
Since the start of any systematic examination of the mind, the concept of representation has provided a link between the external world and the content of mental life (Brentano, 1874). Current perspectives in cognitive science have defined representations as brain activity patterns that convey some behaviorally relevant content, which could be sensory perception, memory, concept knowledge, or social relations (Kriegeskorte & Diedrichsen, 2019). One of the central approaches for evaluating information represented in the brain is representational similarity analysis (RSA), an analytical approach that queries the representational geometry of the brain in terms of its alignment with the representational geometry of some cognitive model (Kriegeskorte et al., 2008; Kriegeskorte & Kievit, 2013). The RSA approach has demonstrated utility across many domains of cognitive neuroscience research, such as visual perception (Jozranjbar et al., 2023), episodic memory (Xue, 2018), concept knowledge (Bauer & Just, 2019), social information (Freeman et al., 2018), and cognitive control (Freund, Etzel, et al., 2021). Despite its proven success, we argue that the classic RSA approach, henceforth cRSA, has limited efficacy across various experimental scenarios due to its inability to reflect the proper multi-level variance structure within the data. In this paper, we present an advancement termed trial-level RSA, or tRSA.
In cognitive neuroscience, cRSA is most commonly implemented to evaluate the representational geometry of a neural system by comparing it to some model of cognition. A typical implementation of cRSA involves four main steps (see Figure 1). First, brain activity responses to a series of N trials are compared against each other (typically using correlation distance, or 1 - Pearson’s r) to form an N×N representational similarity matrix, or RSMbrain.

Representational Similarity Analysis (RSA).
Steps for classic RSA (cRSA) and trial-level RSA (tRSA). A) Neural Representational Similarity Matrix (RSMbrain) is generated by correlating multi-voxel activity patterns across all trials within a Region of Interest (ROI), reflecting similarity between neural responses. B) Model Representational Similarity Matrix (RSMmodel) is constructed by correlating stimulus properties of interest across all trials. C) First-Level cRSA (top) and tRSA (bottom). For cRSA, the lower triangular parts (black outline) of RSMbrain and RSMmodel are compared, producing a single summary statistic (e.g., Spearman’s rho) across all trials. For tRSA, representational similarity values from RSMbrain and RSMmodel for the same trial (e.g., tennis ball; red dashed outline) are compared, producing a single representational strength estimate for that trial. D) Second-Level analysis for cRSA and tRSA. In cRSA, subject-level r values are submitted to a one-sample t-test to assess whether the values reliably exceed zero. In tRSA, a linear mixed-effects model with random effects for subject and stimulus is fit, and hypothesis testing determines whether the estimated intercept is significantly greater than zero.
Second, a hypothesis of how this brain system ought to respond — a cognitive model — is created in the form of a model RSM, or RSMmodel. This RSMmodel can be created based on objective features of the stimuli (e.g., image brightness, category membership), subject ratings or behaviors (e.g., pleasantness, memory success), or outputs from computational models (e.g., neuron activations in artificial neural networks, semantic embeddings from large language models). Third, values from the lower (or equivalently, upper) triangular parts of both RSMbrain and RSMmodel are retrieved, vectorized, and compared. This RSMbrain-RSMmodel comparison focuses on the similarity between the two representational geometries, which is typically measured by Spearman’s rank correlation (Kriegeskorte et al., 2008), although alternatives have also been proposed (Bobadilla-Suarez et al., 2020; Diedrichsen et al., 2021; Walther et al., 2016). This similarity measure is referred to as a first-level RSA score or representational strength.
Fourth, once the above procedures are completed for each subject and condition, the first-level measures of representation are submitted to second-level hypothesis testing for statistical inferences — often using general linear models such as t-tests and ANOVA. One notable feature of cRSA is that the representational geometries are compared to one another in their entireties in step 3, and this step produces a single measure of representational strength collapsing across all experimental trials. In other words, cRSA estimates cannot be interpreted for specific experimental trials or stimuli; rather, they only reflect how strongly the studied neural system represents the entire set of items, limiting the realm of possible research questions. Most critically, as trial-level information is collapsed in cRSA, multiple sources of variance become intractable, and these ignored variances can render subsequent second-level analyses susceptible to erroneous inferences.
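The four first-level steps described above can be sketched in code. The following is a minimal Python illustration (the analyses in this paper were implemented in R); the function name `crsa` is ours, and the rank-based Spearman comparison assumes no tied values:

```python
import numpy as np

def crsa(patterns, model_rsm):
    """Classic RSA: one summary statistic across all trials.

    patterns:  (n_trials, n_voxels) activity estimates for one subject/ROI.
    model_rsm: (n_trials, n_trials) hypothesized similarity structure.
    Returns the Spearman correlation between the lower triangles of
    RSM_brain and RSM_model (step 3 of cRSA).
    """
    # Step 1: neural RSM from pairwise Pearson correlations of trial patterns.
    brain_rsm = np.corrcoef(patterns)
    # Step 3: vectorize the lower triangles (diagonal excluded) and compare.
    lo = np.tril_indices_from(brain_rsm, k=-1)
    x, y = brain_rsm[lo], model_rsm[lo]
    # Spearman rho = Pearson correlation of the ranks (assumes no ties).
    rank = lambda v: np.argsort(np.argsort(v))
    return np.corrcoef(rank(x), rank(y))[0, 1]
```

The returned scalar corresponds to the single first-level RSA score that cRSA carries forward to second-level testing.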
A behavioral or neural measure in a single experimental trial consists of meaningful variances from four distinct sources: condition-level (experimental task manipulations such as cognitive load and emotional valence), subject-level (individual differences in perceptual acuity, prior knowledge, or other cognitive faculties), stimulus-level (features of the stimuli with behavioral or neural relevance), and trial-level (physiological and nuisance variables affecting measurement). To explain the variance in this measure, the appropriate statistical model should reflect said multi-level variance structure. However, the cRSA approach only does so imperfectly with three major limitations, which we detail below.
First, the heterogeneity of subject- and/or condition-level variances in real datasets adversely impacts the sensitivity and reliability of cRSA. One of the fundamental assumptions of general linear models (step 4 of cRSA; see Figure 1D) is homoscedasticity or homogeneity of variance — that is, all residuals should have equal variance. This assumption is often violated in real datasets. The variance of first-level cRSA scores depends largely on the number of observations (see Figure 2), but this trial count can be highly variable across subjects or conditions, resulting in heteroscedasticity. This issue is prevalent in studies where trials are selected and grouped based on subject behavior, such as in memory or attention tasks. In those cases, treating cRSA scores obtained from 100 trials and those obtained from 20 trials as having equal variance would violate the homoscedasticity assumption and lead to unreliable results. One suggested remedy is to equate the number of trials by randomly subsampling conditions with more trials (Dimsdale-Zucker & Ranganath, 2018); however, this solution comes at the cost of not making full use of the rich information from those low-variance conditions and may not be ideal when the number of trials is highly unbalanced.

Trial count influences the reliability of correlations.
The spread of correlation estimates (y-axis) varies nonlinearly with the number of observations, or trial count (x-axis). Trial count ranges from 10 to 200 with increments of 1. For each trial count, observations were drawn from a bivariate normal distribution with the given ground truth Pearson correlation, and their empirical Pearson correlation coefficient was computed. Ten random samples were drawn for each trial count and ground truth combination. Both ground truth and estimated values in the scatter plots are Fisher-transformed (z).
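The simulation summarized in this figure can be sketched as follows. This Python sketch (the paper used R) illustrates why unequal trial counts imply unequal variance of first-level scores: the spread of Fisher-z correlation estimates shrinks roughly as 1/sqrt(n − 3); the seed and replication counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def fisher_z(r):
    # Fisher transformation, applied throughout to make r more nearly normal.
    return np.arctanh(r)

def sample_r(n, rho, rng):
    """Draw n pairs from a bivariate normal with correlation rho and
    return the empirical Pearson r."""
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]

# Spread of estimates at a small and a large trial count.
for n in (10, 200):
    zs = [fisher_z(sample_r(n, 0.3, rng)) for _ in range(500)]
    print(n, round(np.std(zs), 3))
```

At n = 10 the standard deviation of the Fisher-z estimates is several times larger than at n = 200, which is the heteroscedasticity problem described above.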
Second, cRSA is unable to model stimulus-level variance. Most studies present subjects with a fixed set of stimuli, which are presumed to be representative samples of some broader category. In this case, using cRSA brings about two issues. For one, stimuli can vary in a wealth of properties. Object images, for example, vary in terms of image complexity, concept frequency, and memorability, all of which can affect both behavior and neural activity in important and systematic ways (Bainbridge et al., 2017; Hovhannisyan et al., 2021; Naspi et al., 2021). To mitigate this issue, one might explicitly manipulate the distribution of relevant stimulus properties or cross-validate cRSA results on subsets of stimuli (Freund, Bugg, et al., 2021).
Nevertheless, neither solution could address the second issue, which is the fact that the same stimuli are presented to multiple subjects. To properly address this stimulus-level dependence in the data and generalize the results beyond the fixed stimulus set, one must model stimulus identity as random effects, for the very same reason subjects are specified as random effects (Chen et al., 2021; Yarkoni, 2022). However, this solution is not accessible for cRSA since its current implementation stipulates the collapsing of stimulus information across trials (step 3 of cRSA; see Figure 1C).
Third, cRSA is not well-suited for testing the influence of stimulus-level or trial-level properties on neural representations. For example, it may be desirable to explicitly test or control for how certain stimulus-level properties (e.g., memorability) or trial-level recordings (e.g., cardiac cycle, pupil dilation) affect neural representations (Critchley & Garfinkel, 2018; van der Wel & van Steenbergen, 2018). One strategy to analyze the effects of those stimulus- or trial-level continuous variables is to discretize them into categories (e.g., low vs. medium vs. high); however, discretization inevitably leads to a loss of information, reduced statistical power, and potentially misleading outcomes (Cohen, 1983; MacCallum et al., 2002). Alternatively, one could construct additional RSMs for the covariates and then compare all RSMs using a partial correlation or multiple regression framework, yet the statistical interpretations become much less straightforward after the conversion. For instance, even though a univariate random variable v, such as pleasantness ratings, can be conveniently converted to an RSM using pairwise distance metrics (Weaverdyck et al., 2020), the very same RSM would also be derived from the opposite random variable −v, leaving the directionality of any findings based on the RSM uncertain (see also Bainbridge & Rissman, 2018).
Here, we propose an original method for evaluating the neural representation of information at the level of individual experimental trials: trial-level RSA or tRSA. In this approach, instead of deriving a single measure of the similarity between RSMmodel and RSMbrain, we compute a series of similarity measures on a trial-by-trial (row-by-row or column-by-column) basis (Davis et al., 2021). For instance, the representational strength in the first trial is calculated as the similarity between the first row of RSMmodel and the first row of RSMbrain (see Figure 1C). We argue that our novel tRSA approach addresses all of the aforementioned limitations of cRSA: it can properly account for the multi-level variance structure in the data, affords generalizability beyond the fixed stimulus set, and allows one to test stimulus- or trial-level modulations of neural representations in a straightforward way. In this paper, we assessed the efficacy of tRSA in comparison with cRSA using data from simulations representing a wide range of possible experimental scenarios (Experiment 1) and data from an extant real fMRI study (Experiment 2). In both experiments, our results demonstrated enhanced sensitivity, robustness, and flexibility with analyzing neural representations using the tRSA approach.
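The trial-by-trial computation can be sketched as follows. This Python illustration (the paper's implementation was in R) correlates matching rows of the two RSMs while excluding the diagonal self-similarity entry; the function name `trsa` is ours:

```python
import numpy as np

def trsa(brain_rsm, model_rsm):
    """Trial-level RSA: one representational-strength estimate per trial.

    For trial i, correlate row i of RSM_brain with row i of RSM_model,
    excluding the diagonal entry, and Fisher-transform the result.
    """
    n = brain_rsm.shape[0]
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i          # drop the self-similarity entry
        r = np.corrcoef(brain_rsm[i, mask], model_rsm[i, mask])[0, 1]
        out[i] = np.arctanh(r)            # Fisher z
    return out
```

Unlike the single scalar returned by cRSA, the resulting vector of n estimates can enter a mixed-effects model with subject and stimulus random effects.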
Methods
Experiment 1: cRSA and tRSA comparisons in simulated data
Experiment 1 consisted of three main parts. First, we validated the similarity between tRSA and cRSA results in assessing the general representational strength across all trials. In other words, we established that tRSA can carry out the main purpose of cRSA. Second, we compared the statistical inferences generated by each approach in a set of synthetic within-subject studies with two conditions. Third, we demonstrated tRSA’s unique capability of assessing continuous modulators of representational strength in a set of synthetic scenarios. A host of parameters were varied to ensure the robustness of the comparisons in all three parts, such as the trial count, subject sample size, and effect size.
Estimating overall representational strength
We first set out to validate that tRSA indeed measures representation similarly to cRSA, and we conducted a series of simulations to compare the estimates of RSMmodel-RSMbrain similarity obtained from both methods. Assuming an experimental design with n = 200 trials, we sampled n(n − 1)/2 = 19900 pairs of values from a bivariate normal distribution, with their Pearson correlation fixed at some given value via the ‘MASS 7.3-60.2’ package (Venables et al., 2002) in R 4.4.1 (R Core Team, 2024). The two vectors were rearranged as the lower triangular part of two n × n matrices, and the upper triangular parts were filled out accordingly to yield two symmetric matrices M and S, with which we conducted both cRSA and tRSA. With cRSA, overall representational strength was estimated as the Fisher-transformed Pearson correlation coefficient of the lower triangular parts of the two matrices, which was always equal to the ground truth parameter z. With tRSA, the representational strength for a given trial i was computed as the Fisher-transformed Pearson correlation coefficient between two row vectors, Mi· and Si·, barring entries on the diagonal (see Figure 1C, Bottom). The tRSA values for all trials were then averaged to yield an estimate of overall representational strength comparable to that from cRSA. Note that the Fisher transformation of correlation coefficients was always applied so that the distribution of values is approximately normal (Silver & Dunlap, 1987). This set of procedures was repeated for 10,000 iterations, for each ground truth parameter z ranging from −0.2 to 0.6 in increments of 0.1.
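One iteration of this validation can be sketched in Python (the paper used R, where the pairwise correlation can be fixed exactly; here the pairs are simply sampled, so the cRSA estimate is only approximately z). The trial count and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50            # smaller than the paper's n = 200, for speed (assumption)
z = 0.3           # ground-truth Fisher-z correlation
rho = np.tanh(z)

# Sample n(n-1)/2 value pairs with the given correlation ...
npairs = n * (n - 1) // 2
pairs = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=npairs)

# ... and rearrange them into the lower triangles of two symmetric matrices.
def to_symmetric(vec, n):
    m = np.zeros((n, n))
    m[np.tril_indices(n, k=-1)] = vec
    return m + m.T  # diagonal left at 0; it is excluded from both analyses

M, S = to_symmetric(pairs[:, 0], n), to_symmetric(pairs[:, 1], n)

# cRSA: one Fisher-z correlation over the whole lower triangle.
lo = np.tril_indices(n, k=-1)
crsa_z = np.arctanh(np.corrcoef(M[lo], S[lo])[0, 1])

# tRSA: per-row correlations (diagonal excluded), averaged after Fisher z.
idx = ~np.eye(n, dtype=bool)
trsa_z = np.mean([np.arctanh(np.corrcoef(M[i, idx[i]], S[i, idx[i]])[0, 1])
                  for i in range(n)])
```

Across iterations, the average of the trial-level values tracks the single cRSA estimate, which is the correspondence this simulation verifies.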
To adhere to the mathematical constraints of correlation-based RSMs (i.e., positive semidefinite), we next simulated activity patterns, from which RSMs were derived. Specifically, assuming an experiment with n trials and q measurement channels (e.g., voxels), we sampled a set of n ⋅ q values from a univariate standard normal distribution N(0,1), and another set of n ⋅ q values from a univariate normal distribution N(0, σ2). Both vectors were rearranged into matrices of n rows and q columns, yielding a ground truth activity pattern and a noise pattern. The summation of these two patterns yielded a measured pattern. Two RSMs were then generated from the ground truth pattern (RSMmodel) and the measured pattern (RSMbrain) using Pearson’s correlation, with which cRSA and tRSA were conducted separately. This set of procedures was repeated for 10,000 iterations with different random samples. The following parameter values were used for reported results: q = 500, n ∈ {10, 15, 20, 25, 30, 40, 50, 100}, σ2 ranging from 0.8 to 2.0 with increments of 0.2.
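One iteration of this pattern-based simulation can be sketched as follows; deriving both RSMs from activity patterns via Pearson correlation guarantees positive semidefiniteness by construction. Parameter values below are one setting from the ranges above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, sigma2 = 30, 500, 1.0   # trials, measurement channels, noise variance

signal = rng.standard_normal((n, q))                 # ground-truth pattern
noise = rng.normal(0.0, np.sqrt(sigma2), (n, q))     # noise pattern
measured = signal + noise                            # measured pattern

rsm_model = np.corrcoef(signal)    # RSM_model from the ground-truth pattern
rsm_brain = np.corrcoef(measured)  # RSM_brain from the measured pattern

# Correlation RSMs are positive semidefinite: all eigenvalues >= 0
# (up to numerical error).
min_eig = np.linalg.eigvalsh(rsm_brain).min()
```

cRSA and tRSA can then be applied to `rsm_model` and `rsm_brain` exactly as in the previous simulation.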
We next assessed the correspondence between tRSA and cRSA in the presence of discrete conditions, which is often seen in real studies. Assuming a simple experimental design with two conditions, A and B, consisting of nA and nB trials, respectively, we generated a ground truth activity pattern with nA + nB rows and q columns in the same way as before. Then, we generated two noise patterns, which were controlled by parameters σA and σB, respectively, one for each condition. The measured pattern was again computed as the summation of ground truth and noise. cRSA was conducted separately for trials in each condition. Critically, tRSA was conducted in an across-condition fashion: we generated RSMs with all nA + nB trials, obtained nA + nB tRSA values as previously described, and then split the values into two sets based on the condition each trial belonged to. Finally, as before, we computed the average of tRSA values separately for Conditions A and B to allow comparisons with cRSA values. This set of procedures was repeated for 10,000 iterations with different random samples. We assessed the influence of three factors: sample size, balance, and noise level, with the following parameter values: q = 500, nA, nB ∈ {20,80,320}, σA, σB ∈ {1,2}.
Of note, tRSA could conceivably also be conducted in a within-condition fashion, whereby one would generate two separate sets of RSMs according to condition, i.e., one set of nA × nA RSMs for Condition A and another set of nB × nB RSMs for Condition B, and then compute the two sets of within-condition tRSA values separately. We hereby advocate for the use of across-condition tRSA — which we have used in previous studies (Davis et al., 2021; Howard et al., 2024; S. Huang, Howard, et al., 2024; Naspi et al., 2023) — over within-condition tRSA. Comparisons of the statistical properties of within-condition and across-condition tRSA approaches are discussed in Appendix 1.
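The across-condition procedure can be sketched in Python (the paper's simulations were in R); one RSM is built from all trials, per-trial values are computed, and only then are they split by condition. The parameter values and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
q, nA, nB = 200, 20, 30
sigmaA, sigmaB = 1.0, 2.0   # condition-specific noise levels

signal = rng.standard_normal((nA + nB, q))
sigmas = np.r_[np.full(nA, sigmaA), np.full(nB, sigmaB)]
measured = signal + rng.standard_normal((nA + nB, q)) * sigmas[:, None]

def trsa_values(rsm_a, rsm_b):
    # Per-trial Fisher-z correlations between matching rows, diagonal excluded.
    n = rsm_a.shape[0]
    idx = ~np.eye(n, dtype=bool)
    return np.array([np.arctanh(np.corrcoef(rsm_a[i, idx[i]],
                                            rsm_b[i, idx[i]])[0, 1])
                     for i in range(n)])

# Across-condition tRSA: RSMs over all nA + nB trials, then split by condition.
vals = trsa_values(np.corrcoef(signal), np.corrcoef(measured))
mean_A, mean_B = vals[:nA].mean(), vals[nA:].mean()
# The noisier condition (B) should show weaker representational strength.
```

Within-condition tRSA would instead build separate nA × nA and nB × nB RSMs before computing the per-trial values.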
Statistical inferences from tRSA and cRSA
Modeling discrete conditions
The essential theoretical advantage of tRSA over cRSA is that representational strength can be estimated at the level of experimental trials, which would allow us to properly capture the multi-level variance structure in the data. To demonstrate this benefit, four sources of variance were hypothesized: condition, subject, stimulus, and trial. Condition-level variance denotes the effect of experimental conditions or manipulations on representation, with each condition receiving its own noise-level parameter σcond,k. This is the focal effect of interest in this simulation.
Additionally, subject-level variance denotes individual differences in the quality of representation across m subjects, with each subject receiving one’s own noise-level parameter σsubj,i. A within-subject design is common in real experiments and is thus assumed here, where each subject received all experimental conditions. Stimulus-level variance denotes the diversity of stimuli that may result in some being represented more strongly than others, with each stimulus receiving its own noise level parameter σstim,j. The subject-level and stimulus-level variances were fully crossed, i.e., all subjects were assumed to view all stimuli exactly once (condition counterbalanced) during the experiment. Finally, trial-level variance denotes random fluctuations in the signal across the experiment and is controlled by a single noise-level parameter σtrial. Therefore, for a given event of subject i responding to stimulus j in condition k, we computed its noise level as the sum of the four components: σi,j,k = σcond,k + σsubj,i + σstim,j + σtrial.
Notably, this multi-level variance structure replaced the fixed noise parameters σA, σB, while all other procedures remained the same as the previous simulation. The entire set of procedures was repeated for 10,000 iterations. The base simulation parameters were as follows:
Experimental design: q = 500, m = 40, nA = nB = 100, and
Multi-level variance: σcond ∈ {1,1.2}, σsubj∼ N(0, 0.52), σstim ∼ N(0, 0.22), σtrial = 2.
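The base variance structure above can be sketched in Python (the paper's simulations were in R); the additive combination of the four noise components is our assumption for illustration, and parameter values follow the base design:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n_stim = 40, 200                        # subjects, stimuli (nA + nB)
sigma_cond = {"A": 1.0, "B": 1.2}          # condition-level noise
sigma_subj = rng.normal(0.0, 0.5, m)       # subject-level offsets
sigma_stim = rng.normal(0.0, 0.2, n_stim)  # stimulus-level offsets
sigma_trial = 2.0                          # shared trial-level noise

def noise_level(i, j, cond):
    # Assumed additive combination of the four variance components
    # (the exact functional form in the paper may differ).
    return sigma_trial + sigma_cond[cond] + sigma_subj[i] + sigma_stim[j]
```

Each simulated event for subject i and stimulus j then receives noise scaled by `noise_level(i, j, cond)`, replacing the fixed σA and σB of the previous simulation.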
To assess the robustness of RSA methods in different situations that could occur in real data, we introduced a number of variations to the base design:
Number of subjects: m ∈ {10, 20, 40, 80, 160, 320}.
Trial count per subject: ntrial ∈ {20, 30, 46, 70, 100, 150, 220, 330, 500}.
Trial count ratio: nA ∈ {20, 40, 60, . . ., 180}, nB = 200 − nA.
Variance in trial count ratio across subjects: σn ∈ {0, 1, 2, 4, 8, 16, 32, 64, 128}. Actual trial counts were drawn from a truncated normal distribution, nA ∼ truncN(100, σn, 10, 190), nB = 200 − nA.
Effect size: condition-level noise σcond,A/B ∈ {1.00, 1.05, 1.10, . . ., 1.30}. Greater differences in the condition-level noise correspond to larger condition effects.
Following data generation, condition-level representational strength was estimated using cRSA and trial-level representational strength was estimated using tRSA. We focused on the statistical inferences of the effect of conditions, namely bB−A. To this end, cRSA estimates were submitted to a paired-sample t-test. Critically, tRSA estimates were submitted to a mixed-effects model that is theoretically appropriate for the multi-level variance structure in the data (Baayen et al., 2008; Chen et al., 2021). Specifically, a linear mixed-effects model with a fixed effect of condition and random effects of both subjects and stimuli was fitted to tRSA estimates via the ‘lme4 1.1-35.3’ package in R (Bates et al., 2015), and p-values were estimated using Satterthwaite’s method via the ‘lmerTest 3.1-3’ package (Kuznetsova et al., 2017).
Given data generated with σcond,A = σcond,B, the correct inference should be a failure to reject the null hypothesis of bB−A = 0; any significant (p < 0.05) result in either direction was considered a false positive or Type I error. Given data generated with σcond,A > σcond,B, the inference was considered correct if it rejected the null hypothesis of bB−A = 0 and yielded the expected sign of the estimated contrast (bB−A > 0). A significant result with the reverse sign of the estimated contrast (bB−A < 0) was considered a Type I error, and a nonsignificant (p ≥ 0.05) result was considered a false negative or Type II error. Error rates were computed for each RSA method and were compared against the null hypothesis of equal proportions between methods.
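Comparing error rates between methods against a null hypothesis of equal proportions can be done with a pooled two-proportion z-test, one standard choice (the specific test used here is not otherwise detailed; the example counts below are hypothetical):

```python
import math

def two_prop_z(k1, n1, k2, n2):
    """Two-sided z-test for equality of two proportions (pooled SE)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    pval = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, pval

# Hypothetical Type II error counts out of 10,000 iterations per method.
z, p = two_prop_z(1200, 10000, 900, 10000)
```

With counts this far apart out of 10,000 iterations each, the null hypothesis of equal error rates would be rejected decisively.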
Modeling continuous modulators
In addition to discrete experimental conditions, we also simulated data reflecting scenarios in which representational strength varies continuously with some modulator. For example, one may be interested in the effects of continuous variables such as image complexity and memorability (stimulus-level) or reaction time (trial-level). In this case, σtrial was no longer a constant throughout the experiment but a trial-level variable. For a given event of subject i responding to stimulus j in trial k, we computed its noise level as: σi,j,k = σsubj,i + σstim,j + σtrial,k.
To capture the effect of this trial-level variable, we generated a random variable vmeasured that reflects a trial-level measurement with construct validity rval = Cor(vmeasured, σtrial), which was restricted to be nonpositive. When rval = 0, vmeasured carries no information about the underlying σtrial and should not predict trial-level representational strength. When rval < 0, vmeasured is inversely related to trial-level noise and should positively predict trial-level representational strength. The entire set of procedures was repeated for 10,000 iterations. The base simulation parameters were as follows:
Experimental design: q = 500, m = 40, n = 200, and
Variance structure: rval ∈ {0, −0.1}, σsubj∼ N(0, 0.52), σstim ∼ N(0, 0.22), σtrial = 2. As before, we introduced a number of variations, as follows:
Number of subjects: m ∈ {10, 20, 40, 80, 160, 320}.
Number of trials per subject: ntrial ∈ {20, 30, 46, 70, 100, 150, 220, 330, 500}.
Variance in the number of trials per subject: σn ∈ {0, 20, 40, . . ., 160}. Actual trial counts were drawn from a truncated normal distribution, n ∼ truncN(n, σn, 20, n).
Effect size: rval ∈ {−0.10, −0.09, . . ., 0.00}. More negative values of rval should result in greater representational strength.
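A standard way to construct a measurement with a target correlation to the underlying noise is Gaussian mixing; this sketch is our illustrative guess at how vmeasured could be generated (the distribution of σtrial and the large sample size, used here to show convergence, are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
rval = -0.1                      # target construct validity (nonpositive)
n = 100_000                      # large sample to illustrate convergence

# Trial-level noise; the N(2, 0.5^2) distribution here is illustrative.
sigma_trial = rng.normal(2.0, 0.5, n)

# Mix the standardized noise with independent Gaussian noise so that
# Cor(v_measured, sigma_trial) = rval in expectation.
z_sigma = (sigma_trial - sigma_trial.mean()) / sigma_trial.std()
v_measured = rval * z_sigma + np.sqrt(1 - rval**2) * rng.standard_normal(n)

r_emp = np.corrcoef(v_measured, sigma_trial)[0, 1]
```

At realistic trial counts (e.g., n = 200), the empirical correlation scatters noticeably around rval, which is precisely why the simulation varies effect size and trial count.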
Following data generation, two conditions were created for each subject based on a median split (A = low vmeasured, B = high vmeasured), and we tested cRSA and tRSA models as described in the previous simulation, focusing on bB−A. Furthermore, we additionally modeled trial-level representational strength as a function of vmeasured directly, using a linear mixed-effects model of tRSA estimates with a fixed effect of vmeasured and random effects of both subjects and stimuli. Given data generated with rval = 0, the correct inference should be a failure to reject the null hypothesis of b = 0 in both discrete and continuous models; any significant (p < 0.05) result in either direction was considered a false positive or Type I error. Given data generated with rval < 0, the inference was considered correct if it rejected the null hypothesis of b = 0 and yielded the expected sign of the estimated contrast or slope (b > 0). A significant result with the reverse sign of the estimated contrast or slope (b < 0) was considered a Type I error, and a nonsignificant (p ≥ 0.05) result was considered a false negative or Type II error. Error rates were computed for each RSA method and were compared against the null hypothesis of equal proportions between methods.
Experiment 2: cRSA and tRSA comparisons in fMRI data
Experiment 2 proceeded in three steps. First, we assessed how subject-level heterogeneity in the ratio of trials between two critical conditions (“Hits” and “Misses”) can lead to biased estimates of representational strength. Second, we examined the impact of modeling trial-level variance on representational strength during Object Naming as well as on Mnemonic Strength during a memory retrieval task. Lastly, stimulus-level estimates of representational strength were related to object memorability — an analysis that is not directly accessible to cRSA.
Data acquisition
Participants
A total of 38 adults, aged 18 to 30, participated in this study on a voluntary basis and received monetary compensation for their time. Eligibility criteria required participants to be native or fluent English speakers, with no history of significant neurological or psychiatric conditions, and not taking medications that could affect cognitive function or cerebral blood flow (except for antihypertensive agents). All participants provided written informed consent before the start of the study. Six participants did not complete the study and were excluded from the analysis.
Additionally, two participants were excluded from the Memory Retrieval analyses due to poor memory performance. The final sample included 32 participants (21 women and 11 men) for the Object Perception task and 30 participants (19 women and 11 men) for the Memory Retrieval task.
Experimental design
An expansive outline of the study design for the tasks described below can be found in (S. Huang, Bogdan, et al., 2024); here we briefly review relevant details pertinent to our application of trial-level RSA for a subset of that data. Data from two experimental datasets were used, including an Object Perception dataset and a Memory dataset (Figures 7A and 8A). In the Object Perception task, participants were shown images of 114 unique everyday objects on a white background. A corresponding label was presented underneath each object, and participants were asked to rate how well the label described the object on a four-point scale. The main purpose of this rating task was to ensure that participants assessed the meaning of the objects and to verify that these objects were indeed familiar to them (mean rating = 3.59). The objects were presented for 4 seconds, followed by a jittered fixation cross with an average duration of 4 seconds. In the memory task, a Memory Encoding session took place at least 7 days after Object Perception. During Encoding, participants viewed images of 114 unique real-world scenes along with the 114 objects they had seen in session 1. In each trial, participants were shown a scene image for 3 seconds, followed by a jittered empty box indicating the continuation of the trial, which lasted for an average of 3 seconds, and finally an object image for 4 seconds. During the object presentation, participants rated “how likely it is to find the object in the scene” on a 4-point scale (1 = “very unlikely,” 2 = “somewhat unlikely,” 3 = “somewhat likely,” 4 = “very likely”). Each trial was separated by a jittered fixation cross with an average duration of 4 seconds. Memory Retrieval took place one day after Memory Encoding and tested participants’ memory of the objects. It consisted of three scanning runs, each with 38 trials, lasting approximately 9 minutes and 12 seconds.
In the main Memory Retrieval task, participants were presented with 144 labels of real-world objects, of which 114 were labels for previously seen objects and 30 were unrelated novel distractors. In a subsequent Perceptual Memory Retrieval task, participants were shown 126 images of real-world objects (96 old images, 18 different exemplar images of old objects, and 12 images of unrelated novel objects). Participants were asked to determine whether each image was old, similar, or new. All analyses in the current study pertained to fMRI data from the Object Perception and main Memory Retrieval tasks, as well as behavioral data from the Perceptual Memory Retrieval task.
MRI data acquisition
MRI data were collected using a 3T GE MR750 Scanner equipped with an 8-channel head coil at the Brain Imaging and Analysis Center (BIAC) at Duke University. Each MRI session began with a localizer scan, during which 3-plane (straight axial/coronal/sagittal) fast spin echo images were obtained. A high-resolution T1-weighted (T1w) structural scan was then acquired, consisting of 96 axial slices parallel to the AC-PC plane, with voxel dimensions of 0.9 × 0.9 × 1.9 mm³. This was followed by blood-oxygenation-level-dependent (BOLD) functional scans using a whole-brain gradient-echo echo planar imaging sequence (repetition time = 2000 ms, echo time = 30 ms, field of view = 192 mm, 36 oblique slices with voxel dimensions of 3 × 3 × 3 mm³). Task instructions and stimuli were delivered using the PsychToolbox program (Kleiner et al., 2007) and projected onto a mirror at the back of the scanner bore. Participants responded using a four-button fiber-optic response box. To reduce scanner noise, participants wore earplugs, and MRI-compatible lenses were provided when necessary to correct vision. Foam padding was placed inside the head coil to minimize head movement.
Data analysis
MRI data preprocessing
fMRIPrep 23.0.1 (Esteban et al., 2019, 2023) was used to preprocess structural and functional MRI data, as well as to generate text descriptions of preprocessing details, which are condensed below. T1w structural images collected across all MRI sessions for the same participant were corrected for intensity non-uniformity with ‘N4BiasFieldCorrection’ (Tustison et al., 2010) from ANTs 2.3.3 (Avants et al., 2011). The T1w-reference was skull-stripped with a Nipype implementation of the ‘antsBrainExtraction.sh’ workflow from ANTs, using OASIS30ANTs as target template. Brain tissue segmentation of cerebrospinal fluid (CSF), white-matter (WM), and gray-matter (GM) was performed on the brain-extracted T1w using ‘fast’ from FSL (Zhang et al., 2001). An anatomical T1w-reference map was computed after registration of T1w images using ‘mri_robust_template’ from FreeSurfer 7.3.2 (Reuter et al., 2010). Brain surfaces were reconstructed using ‘recon-all’ from FreeSurfer 7.3.2 (Dale et al., 1999), and the brain mask estimated previously was refined with a custom variation of the Mindboggle method to reconcile ANTs-derived and FreeSurfer-derived segmentations of the cortical gray matter (Klein et al., 2017). Volume-based spatial normalization to the ICBM 152 Nonlinear Asymmetrical template version 2009c standard space was performed through nonlinear registration with ‘antsRegistration’ from ANTs 2.3.3, using brain-extracted versions of both T1w reference and the T1w template.
BOLD functional data across all sessions and runs were preprocessed collectively. A reference volume and its skull-stripped version were generated using a custom methodology of fMRIPrep. Head-motion parameters with respect to the BOLD reference were estimated before any spatiotemporal filtering, using ‘mcflirt’ from FSL (Jenkinson et al., 2002). BOLD runs were slice-time corrected to 0.972 s (0.5 of the 0–1.94 s slice acquisition range) using ‘3dTshift’ from AFNI (Cox & Hyde, 1997). The BOLD time-series were resampled onto their original, native space by applying the transforms to correct for head motion. The BOLD reference was then co-registered to the T1w reference using boundary-based registration via ‘bbregister’ from FreeSurfer (Greve & Fischl, 2009). Co-registration was configured with six degrees of freedom. Confounding time-series calculated from the preprocessed BOLD included: root mean square displacement (RMSD) between frames (Jenkinson et al., 2002), absolute sum of relative framewise displacement (FD) (Power et al., 2014), and the derivative of root mean square variance over voxels (DVARS) (Power et al., 2014), as well as global signals extracted within the CSF, WM, and whole-brain masks. Additionally, a set of physiological regressors was extracted to allow for component-based noise correction (CompCor) (Behzadi et al., 2007).
Principal components were estimated after high-pass filtering the preprocessed BOLD time-series (using a discrete cosine filter with a 128 s cut-off) for the two CompCor variants: temporal (tCompCor) and anatomical (aCompCor). tCompCor components were calculated from the top 2% most variable voxels within the brain mask. For aCompCor, three probabilistic masks (CSF, WM, and combined CSF+WM) were generated in anatomical space. For each CompCor decomposition, the k components with the largest singular values that cumulatively explained at least 50% of the variance across the nuisance mask (CSF, WM, combined, or temporal) were retained. The BOLD time-series were resampled into standard space with a spatial resolution of 2 × 2 × 2 mm³ (97 × 115 × 97 voxels). All resamplings were performed with a single interpolation step by composing all the pertinent transformations (i.e., head-motion transform matrices, susceptibility distortion correction when available, and co-registrations to anatomical and output spaces). Gridded (volumetric) resamplings were performed using ‘antsApplyTransforms’ (ANTs), configured with Lanczos interpolation to minimize the smoothing effects of other kernels (Lanczos, 1964). Non-gridded (surface) resamplings were performed using ‘mri_vol2surf’ (FreeSurfer).
Single-trial modeling
Neural activity in gray matter (GM) voxels for each object presentation event was estimated using first-level least-squares-separate general linear models (Mumford et al., 2012) constructed with SPM12 (Friston et al., 2006) and custom MATLAB scripts. Subject-specific GM masks were generated by binarizing the fMRIPrep-derived probabilistic masks, with the threshold set to exclude voxels with an 80% probability of being CSF or WM. Each model included a regressor for the event of interest along with a regressor for all other objects. Both regressors were convolved with the canonical double-Gamma hemodynamic response function, including their temporal and dispersion derivatives to account for variations in the timing of the peak response.
Covariates of no interest included global signal, WM signal, CSF signal, FD, DVARS, RMSD, and six motion parameters for translation and rotation. A high-pass temporal filter with a cutoff of 128 seconds was applied, and an AR(1) model was used to correct for autocorrelation. These first-level models yielded regression coefficients (betas) that estimated voxel-level neural activity corresponding to a single trial. Trials with FD greater than 1 mm either before or during the trial were excluded (range: 0–28). Additionally, runs in which more than 25% of trials exceeded this motion criterion were excluded. In total, one Object Perception run and three Memory Retrieval runs were excluded.
Behavioral Analyses
Trials for which participants gave no response in the Object Perception task (mean = 5) or the main Memory Retrieval task (mean = 4) were excluded from all analyses involving fMRI data from the respective task. As our primary behavioral measure, we assessed memory performance for the object labels. Participants varied in their ability to distinguish between old and new objects, as well as in their decision criteria, making direct comparisons of raw responses challenging. To address these biases, we performed a receiver-operating characteristic (ROC) analysis using ‘yardstick 1.3.1’ (Kuhn et al., 2024) in R. This analysis allowed us to categorize Hit and Miss trials by determining whether counting only “4” responses, or both “3” and “4” responses, as “old” yielded the better decision outcome (i.e., closest to 100% true positives and 0% false positives). The outcome of this analysis was then used to compute the sensitivity index (d’) of recognition. Adjusted hit rates were used to assess task adherence and the sufficiency of usable fMRI data during Memory Retrieval.
Behavioral analyses categorized each Memory Retrieval trial as either a “Hit” or a “Miss” to define the conditions in the fMRI data. This analysis determined whether counting only “4” responses, or both “3” and “4” responses, as “old” produced the more accurate decision outcome (i.e., closest to 100% true positives and 0% false positives). In total, data from 8 participants were adjusted to count only “4” responses as “hit” trials. The results were then used to calculate the sensitivity index (d’) of recognition within each subjective congruency condition, along with the corrected hit rate. Participants with post-adjustment hit rates below 40% were considered to have low adherence to the task and were excluded from all subsequent analyses. The number of included trials per participant for perception (mean = 110), ROC values (mean = 0.78), the count of hit trials (mean = 76), and the corrected hit rate (mean = 0.70) are detailed in Appendix 2 Table 4.
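The sensitivity-index computation described above can be sketched in a few lines. The counts below are hypothetical, and the sketch uses the textbook z-transform definition of d’ rather than the exact output of the ‘yardstick’ pipeline:

```python
from statistics import NormalDist

def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 76 hits among 110 old items, 20 false alarms among 100 new items
print(round(d_prime(76, 34, 20, 80), 2))
```

In the actual analysis, responses were first re-thresholded via the ROC procedure (counting “4” alone, or “3” and “4”, as “old”) before d’ was computed.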
Neural Representational Similarity Matrices
Matrices of neural pattern similarity (RSMbrain) were constructed using data from both the Object Perception study and the main Memory Retrieval task. Twenty-six regions were selected from the Human Brainnetome Atlas (Fan et al., 2016): eight areas within the lateral occipital cortex (LOC), 14 areas within the inferior temporal cortex (ITC), and four areas in the inferior parietal cortex (IPL; combining IPL subregions into lateral posterior and anterior regions), as these regions are critically involved in visual representations during object perception and memory retrieval (Davis et al., 2021; Favila et al., 2020; Howard et al., 2024; Long & Kuhl, 2021). For each of these 26 regions, we constructed similarity matrices of voxel activation patterns across stimuli with custom scripts in MATLAB and SPM12 (Friston et al., 2006). This was done by vectorizing the voxel-level activation values within each region and computing their pairwise Pearson correlations, excluding all within-run comparisons. Whereas each cell in an RSMmodel reflects the similarity in stimulus properties, each cell in the 114×114 RSMbrain represents the similarity in activation patterns across different stimuli (Figure 1A).
Model Representational Similarity Matrix
The C2 layer of the Hierarchical Model of object recognition (HMAX) was used to capture visual similarity between the 114 objects. We used a PyTorch implementation of the HMAX model, which contains four sequential stages, each with its own output: S1, C1, S2, and C2. The S1 layer applies Gabor filters to the input image across multiple scales and orientations. The Gabor filters capture edge-like features such as bars and gratings. The output consists of feature maps that highlight these basic visual components. Following the S1 layer, the C1 layer performs local max pooling over the S1 feature maps. This operation introduces some degree of invariance to position and scale. By selecting the maximum response within localized regions, the C1 layer reduces the spatial resolution while retaining the most salient features. The S2 and C2 stages build upon this foundation using similar pooling mechanisms. Specifically, S2 units pool information from the C1 stage using linear filters and function as radial basis functions, responding most strongly to specific prototype input patterns. These prototypes are derived from random fragments extracted from a set of natural images, independent of the stimuli used in this study. The C2 layer then pools outputs from S2 units using a MAX operation, which provides a degree of position and scale tolerance, allowing for robust representation of visual objects. This global pooling results in a feature vector that is highly invariant to position and scale, capturing the presence of complex features regardless of their location in the input image. The C2 layer’s output thus serves as a compact representation of the visual input (Kriegeskorte et al., 2008; Riesenhuber & Poggio, 1999, 2002; Serre et al., 2005, 2007; Sufikarimi & Mohammadi, 2020).
Features from the C2 layer of the HMAX model were used to create the model RSM. Pairwise similarity between images was computed using Pearson’s correlation coefficient (r), measuring the relationship between their C2 feature vectors. These values were organized into a 114×114 RSM, where each cell represented the similarity between two images. The resulting RSMmodel provided a structured representation of visual similarity, with higher correlations indicating shared complex features (Figure 1B).
Classic Representational Similarity Analysis (cRSA)
We conducted cRSA using data from both the Object Perception dataset and the Memory Retrieval dataset. For Object Perception, we estimated participants’ classic representational strength in each of the 26 regions of interest by calculating Spearman’s rho between the RSMbrain and the RSMmodel, while excluding within-run comparisons and the diagonal cells of the matrices (see Figure 1C, Top). We then assessed the strength and reliability of visual representations across participants using one-sample t-tests (testing Fisher-transformed rho > 0) for each of the 26 regions, applying a false discovery rate (FDR) correction with a threshold of q = 0.05.
To evaluate cRSA’s performance in detecting differences in representational strength between conditions, we analyzed data from the main Memory Retrieval task, focusing on old items. An RSMbrain for each of the 26 regions of interest was parsed into two separate RSMs: one containing data from “Hit” trials and the other from “Miss” trials. Similarly, the model RSMs were split based on participant responses. We then calculated Spearman’s rhos for the Hit and Miss conditions by correlating RSMbrain and RSMmodel, again excluding within-run comparisons and the diagonal cells of the matrices. These subject-specific representational strength measures were subsequently Fisher-transformed and analyzed using paired-sample t-tests.
Regions showing significantly higher representational strength for Hit trials compared to Miss trials, after FDR correction with q < 0.05, were identified as mnemonic representation regions.
To assess the impact of unbalanced trial counts on mnemonic representational strength, we calculated a Contrast Variance Factor (CVF) for each participant in regions showing evidence of mnemonic representation (p < 0.05, uncorrected). The participant contrast score was computed by subtracting the Miss representational strength from the Hit representational strength: Di = Ri,H − Ri,M.
Next, we computed the mean difference across all participants: D̄ = (1/N) Σi Di.
To quantify how much each participant’s mnemonic representational strength in each ROI deviated from the mean (thus inflating the contrast variance), we calculated the normalized absolute difference between the participant contrast score (Di) and the average contrast score (D̄).
Finally, to evaluate the influence of unbalanced trial counts on this variance factor, we correlated each participant’s CVF, averaged across regions, with the ratio of Hit trial count (Ni,H) to Miss trial count (Ni,M).
Trial-Level Representational Similarity Analysis (tRSA)
The tRSA approach used the same model and neural RSMs from the Object Perception dataset and the main Memory Retrieval dataset as described above. However, instead of correlating the two matrices in their entirety, correlations were computed on a row-by-row basis, excluding within-run similarities and the diagonal cells of the matrices (see Figure 1C, Bottom). The trial-level estimates from both phases were then fitted to a series of mixed-effects models using the ‘lme4’ package in R (R Core Team, 2024) with restricted maximum likelihood methods. These fitted models were then evaluated using the Akaike Information Criterion (AIC) to determine the optimal random effect structure (Meteyard & Davies, 2020; Park et al., 2020). To further validate model selection (Matuschek et al., 2017), models were refit using maximum likelihood estimation and subjected to model selection via log-likelihood ratio tests (LRTs). The average AICs and the results of the LRTs are reported in Appendix 2 Table 5. Restricted maximum likelihood models with the selected random effects structure were further analyzed using the ‘lmerTest’ package. Denominator degrees of freedom were estimated using Satterthwaite’s method (Satterthwaite, 1946), and fixed effects were tested using t-tests with an alpha level of 0.05, corrected for false discovery rate (FDR) to account for multiple comparisons.
To estimate tRSA representational strength, trial-level estimates from the Object Perception dataset were fit to a mixed-effects model with random intercepts for Participant and Stimulus. Regions with significantly positive estimated intercepts (b > 0) were identified as representational regions. For detecting tRSA mnemonic representations in the main Memory Retrieval task, instead of constructing separate Hit and Miss RSMs, row-wise correlations were performed across all trials in the matrices (i.e., across-condition tRSA). This approach yields trial-level estimates of representational strength that can subsequently be categorized as “Hit” or “Miss” and analyzed using a series of mixed-effects models. The model for mnemonic representation included the fixed effect of Memory Success (Hit vs. Miss trials) and random intercepts for Participant and Stimulus.
Examining continuous modulators of representation
We next assessed the capacity of tRSA to examine stimulus-level variance that modulates representation. For each item, a memorability score was calculated as the average response across participants (on a scale of 1–4 for the main Memory Retrieval task and 1–3 for Perceptual Retrieval), normalized by the maximum possible value (4 for main retrieval and 3 for perceptual retrieval). This approach produced two stimulus-specific continuous variables indicating the overall confidence with which each item would be recalled. We refer to these measures as Conceptual Memorability and Item Memorability (perceptual). The Item Memorability measure served as the fixed effect of interest for trial-level representational strength estimates from the Object Perception dataset. Additional nuisance fixed effects included Conceptual Memorability, fMRI run, and trial-level reaction time. Random intercepts for Participant and Stimulus were also included.
Results
Experiment 1: cRSA and tRSA comparisons in simulated data
Basic statistical properties of tRSA estimates
To understand the statistical properties of representational strength estimates obtained via cRSA and tRSA, we first simulated datasets with a wide range of ground truth representational strengths. With a fixed ground truth correlation between two symmetric matrices (i.e., a fixed cRSA value), our simulations indicated that tRSA could produce reliable estimates of overall representational strength comparable to cRSA. Specifically, the average of trial-level estimates from tRSA centered around the cRSA value when the ground truth correlation was zero; as the magnitude of the ground truth increased, a slight bias toward larger magnitudes appeared. Despite this, the spread of tRSA estimates remained small, with a tight range of [0.599, 0.604] around the ground truth of 0.600 (see Figure 3A).

Correspondence between classic and trial-level RSA.
A) Histograms of representational strength estimates obtained by averaging tRSA values. Each histogram depicts 10,000 iterations. Representational similarity matrices (RSMs) were simulated based on known ground truth representational strength values (cRSA). Solid vertical lines indicate the mean of the observations, and dashed vertical lines indicate the ground truth representational strength. B) Scatter plots of overall representational strength estimates, based on simulated activity patterns, produced by cRSA (x-axis) and tRSA (y-axis). Trial count (n) and measurement noise level (σ2) were both varied; 50 iterations were performed for each parameter combination. Solid diagonal lines indicate where tRSA equals cRSA (y = x). C) Histograms of overall representational strength estimates, based on simulated activity patterns for two separate conditions, produced by cRSA (gray) and across-condition tRSA (orange). Each histogram depicts 10,000 iterations. Trial counts were varied; for example, “20 : 80” means that Condition A contained 20 trials and Condition B contained 80 trials, an unbalanced scenario. Effect sizes were also varied, such that the ground truth representational strength was either equal between conditions (“A = B”) or stronger in Condition B (“A < B”). D) Histograms of condition differences in the representational strength values in C.
A limitation of this simulation is that the symmetric matrices were randomly generated without being guaranteed to be mathematically valid similarity matrices (i.e., positive semidefinite). For example, [[1, 1, 1], [1, 1, −1], [1, −1, 1]] is symmetric but is not a valid correlation RSM: correlations of 1 between the first item and each of the other two imply that those two items are perfectly positively correlated with each other, contradicting the −1 in the remaining cell. Similarly, RSMs using distance metrics such as Euclidean distance must satisfy the triangle inequality, i.e., dxy + dxz ≥ dyz. Moreover, it is difficult to implement the multi-level variance structure directly on RSMs; rather, those variances are theorized to operate directly on the underlying activity patterns. To adhere to the mathematical constraints on similarity matrices and for better correspondence to real implementations, we simulated activity patterns from which RSMs were then derived and used to estimate representational strength. Notably, while it is difficult to fix the ground truth representational strength in this scheme, we manipulated the statistical dependence between the activity patterns (and, by extension, between RSMbrain and RSMmodel) by injecting varying levels of measurement noise (σ2). Simulations indicated that increasing measurement noise indeed reduced estimates of representational strength by both cRSA and tRSA (see Figure 3B). Also as expected, greater trial counts led to more stable estimates for both methods. Most importantly, a strong positive correlation between cRSA and tRSA estimates was observed even in the “noisiest” simulation (i.e., n = 10, σ2 = 2; Intercept b0 = 0.00, SE = 0.01, t = 0.33, p = 0.74; Slope bcRSA = 1.00, SE = 0.02, t = 64.95, p < 0.001). These results demonstrate the close numerical correspondence between tRSA and cRSA in estimating the overall representational strength of a given system across all trials.
Discrete conditions
Oftentimes, researchers are interested in not only representational strength per se, but also how representation changes with other factors like experimental manipulations (e.g., “easy” vs. “hard”), behavioral performance (e.g., “remembered” vs. “forgotten”), or some combination. Analyzing these differences with cRSA entails partitioning the data to generate separate condition-specific RSMs. Subsequently, an intuitive version of within-condition tRSA can be computed using those RSMs, and its estimates would closely track cRSA estimates, as we have demonstrated in the previous section. Instead, we advocate for across-condition tRSA: full RSMs are used for computing trial-level representational strength, and the estimates are then split by conditions as necessary. For a direct comparison of within- and across-condition tRSA, see Appendix 1.
Here, we examined the correspondence of across-condition tRSA with cRSA in a wide range of scenarios where data can be meaningfully split into two subsets. Specifically, we assessed the influence of three important factors that may vary widely across studies: raw trial counts, balance of trial counts, and effect sizes. Because cRSA was performed separately for each condition, changes in the trial count or effect size of Condition B had no influence on cRSA estimates for Condition A. However, they indeed affected estimates obtained from across-condition tRSA, where all experimental trials were used for representational strength estimation.
Specifically, all else being equal, increasing the trial count in Condition B from 20 to 80 or from 80 to 320 improved the reliability of tRSA estimates in Condition A (see Figure 3C, Rows 1 and 3). Additionally, increasing the noise level for Condition A to be higher than that for Condition B led to reduced tRSA estimates in Condition B as well. These effects were also reflected in the condition difference in tRSA estimates (see Figure 3D). Moreover, it is critical to note that tRSA appeared to be more reliable than cRSA in almost all simulated scenarios, especially in terms of the estimated difference between conditions. These findings suggest that tRSA may be more sensitive to true effects and at lower risk of false positives, i.e., reduced Type II and Type I error rates in formal statistical tests.
Contrasting conditions
The next set of simulations compared across-condition tRSA to cRSA in terms of statistical inferences, focusing on the difference in representational strength between two conditions as a typical effect of interest in many empirical studies. We implemented the condition differences by altering two condition-level noise parameters, where more noise would result in lower representational strength. The statistical significance of the condition difference bB−A was estimated by a paired-sample t-test for cRSA and by a linear mixed-effects model for tRSA, using a threshold of α = 0.05. Depending on whether the two conditions ought to have the same quality of representation, outcomes were classified as true positives, true negatives, false positives (Type I error), or false negatives (Type II error). The performance of cRSA and tRSA were quantified with their specificity (1 - Type I error rate) and sensitivity (1 - Type II error rate).
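The classification of simulation outcomes can be sketched as follows; the function and its inputs are hypothetical stand-ins for the per-iteration p-values and contrast estimates:

```python
def error_rates(p_values, estimates, true_effect, alpha=0.05):
    """Classify simulation outcomes and return (type_i_rate, type_ii_rate).

    true_effect: the ground-truth condition difference (0 means no effect).
    A Type I error is a significant result when true_effect == 0; a Type II
    error is a non-significant (or wrong-signed) result when an effect exists.
    """
    sig = [p < alpha for p in p_values]
    if true_effect == 0:
        return sum(sig) / len(sig), None
    correct = [s and (e > 0) == (true_effect > 0) for s, e in zip(sig, estimates)]
    return None, 1 - sum(correct) / len(correct)

# Four hypothetical iterations: p-values and estimated condition differences
t1, _ = error_rates([0.01, 0.2, 0.04, 0.8], [0.3, 0.1, -0.2, 0.0], true_effect=0)
print(t1)  # 2 of 4 iterations declared significant
```

Specificity and sensitivity then follow as 1 minus the Type I and Type II rates, respectively.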
We first examined how cRSA and tRSA performed with effects of different sizes, under a base within-subject experimental design consisting of 40 subjects and 200 unique stimuli randomly and evenly split between two conditions. Across 10,000 simulations, cRSA and tRSA were comparable in terms of specificity, with Type I error rates for both methods near the nominal 5%. As the true condition difference increased, the sensitivity of both methods increased accordingly. Importantly, tRSA was significantly more sensitive to true effects than cRSA in a range of simulated scenarios, consistent with our hypothesis (see Figure 4).

Testing condition differences in representational strength with varying effect sizes.
The effect size of the condition difference in representation was manipulated by changing the noise level in each condition in small increments, with higher noise levels corresponding to lower ground truth representational strengths. When the noise level was the same for both conditions, Type I error rates (red) were computed as the proportion of significant contrasts across 10,000 iterations, regardless of sign of the estimate. Otherwise, proportions were computed separately for effects of the correct sign (+, or B>A; blue) and of the incorrect sign (-, or B<A). Asterisks indicate the significance level of the test of equal proportions between simulation results from cRSA (left) and tRSA (right). Significance annotation: *** p < 0.001, ** p < 0.01, * p < 0.05.
We further probed the robustness of cRSA and tRSA across datasets that varied in several important experimental design factors, namely the number of participants, trial counts per participant, and the balance of trial counts. In terms of specificity, our simulations did not suggest an effect of those manipulated factors, as the proportions of significant contrasts yielded by both cRSA and tRSA did not significantly deviate from the nominal alpha level of 5%. Tests of significant contrast proportions reported by cRSA and by tRSA also failed to reject the null hypothesis of equal proportions (p > 0.05), suggesting similarly acceptable Type I error rates of both approaches (see Figure 5, Left).

Testing condition differences in representational strength with varying designs.
The robustness of cRSA and tRSA was assessed in several variations of a base experimental design consisting of 40 subjects and 200 unique stimuli (randomly and evenly split into two sets). Simulations varied in A) number of subjects, B) trial count per subject, C) the ratio of trial counts between two conditions (with a fixed total), and D) the variance of trial count ratios across participants (even split overall). In each simulation, the statistical significance of the condition difference in representational strength was determined by a paired-sample t-test for cRSA and a linear mixed-effects model for tRSA, with α = 0.05. The proportion of significant contrasts was computed across 10,000 iterations. Type I error rates were computed regardless of the direction of the contrast. Type II error rates were computed only for the correct contrast (i.e., B > A). Error bars indicate standard errors. In the left column, dashed horizontal lines mark the nominal α level of 0.05, and dotted horizontal lines mark the critical values beyond which the estimated proportion would be significantly different from α. Asterisks indicate the significance of the deviation in error rates between cRSA and tRSA results, *** p < 0.001, ** p < 0.01, * p < 0.05.
In terms of sensitivity, our simulations revealed large impacts of all manipulated parameters. Increasing the number of subjects or the trial count per subject improved the sensitivity to true effects, benefiting both cRSA and tRSA substantially. Tests comparing the sensitivity of tRSA and cRSA suggested an advantage of tRSA at small-to-medium sample sizes of up to at least 40 subjects, and at medium trial counts of between 100 and 220 trials per subject (see Figure 5A-B, Right). We additionally tested variabilities in trial counts that may occur in real experiments. In one set of simulations, the total trial count was fixed at 200 but the ratio of trial counts between conditions varied consistently across subjects (i.e., systematic imbalance; Figure 5C). In another set of simulations, while trial counts were kept approximately equal between conditions on average, subject-specific ratios varied (Figure 5D). In both scenarios, increasing heterogeneity in trial counts substantially reduced the sensitivity of cRSA. This result was expected: the reliability of cRSA estimates depends largely on trial counts (Figure 2), so heterogeneous trial counts violated the homoscedasticity assumption of the subsequent paired-sample t-test. Meanwhile, with trial-level estimates of representational strength from tRSA, the subsequent mixed-effects models were able to properly handle this variance in trial counts. Indeed, our simulations confirmed that tRSA was more robust to these extreme scenarios and always significantly outperformed cRSA in sensitivity (see Figure 5C-D, Right).
Continuously varying effects
A crucial advantage of tRSA over cRSA is that this approach offers a straightforward way to model representational strength as a function of stimulus- or trial-level measures, such as image complexity and subjective familiarity. While these measures could be discretized into a much smaller number of bins (e.g., “low” and “high”), they are better treated as continuous variables to preserve meaningful variance. To simulate these scenarios, we generated data such that true representational strength monotonically decreased with an underlying continuous noise variable σtrial that was measured by rmeasured with construct validity rval = Cor(rmeasured, σtrial). To perform cRSA, trials were post-hoc split into two conditions by performing a median split on rmeasured for each participant. Statistical inferences were again drawn for the difference in representational strength between the two conditions, using both cRSA and tRSAdiscrete. In addition, we also assessed the performance of tRSAcontinuous, where representational strength was directly predicted by the measured continuous variable rmeasured without discretization.
Specificity and sensitivity were first examined for datasets with various effect sizes. Across 10,000 iterations, cRSA had a Type I error rate (i.e., when rval = 0) close to the nominal rate of 5%, while both tRSAdiscrete and tRSAcontinuous showed significantly improved specificity (see Figure 6A). As the construct validity and effect size of the trial-level measurement increased (i.e., more negative rval), sensitivity expectedly improved for all methods. Relative to cRSA, tRSAcontinuous appeared to show slightly lower sensitivity for small effects (e.g., rval = −0.01), though it was also more conservative than cRSA, with a lower chance of declaring significance in the incorrect direction (B < A). Moreover, tRSAcontinuous was the most sensitive method for moderate effects (e.g., rval = −0.04). We further probed the robustness of cRSA and tRSA across datasets that varied widely in the number of subjects and trial counts. Overall, we observed that the Type I error rates were stable around the nominal 5% for cRSA, as expected. However, both tRSAdiscrete and tRSAcontinuous demonstrated significantly enhanced specificity across all manipulations (see Figure 6B-D, Left). Sensitivity expectedly improved for all three approaches with increasing subject numbers and trial counts and reached ceiling with sufficiently large samples, though tRSAcontinuous significantly outperformed the other methods at smaller sample sizes, such as 10 participants with 200 trials each or 40 participants with 30 trials each (see Figure 6B-C, Right). Additionally, we simulated scenarios in which subject-level trial counts were variable (e.g., due to missingness or exclusions), resulting in not only a reduction of total sample size but also additional variance in any subject-level estimates. The sensitivity of cRSA dropped substantially with this change because cRSA does not account for such variance, while tRSA-based inferences were minimally impacted (see Figure 6D, Right).

Testing continuous modulations of representational strength with varying designs.
The robustness of cRSA and tRSA was assessed in several variations of a base experimental design consisting of 40 subjects and 200 unique stimuli presented once for each subject. Simulations varied in A) the true effect size of representational strength, B) number of subjects, C) trial count per subject, and D) the variance of trial count across subjects (assuming a fixed total). Three statistical tests were performed in each simulation: a paired-sample t-test on the difference between “high” and “low” conditions (median-split of rmeasured) for cRSA, a linear mixed-effects model on the same condition difference for tRSAdiscrete, and a linear mixed-effects model on the effect of the continuous variable rmeasured for tRSAcontinuous. Statistical significance was determined with α = 0.05. The proportion of significant results was computed across 10,000 iterations. Type I error rates were computed regardless of the sign of the estimate. Type II error rates were computed only for the correct sign (+). Error bars indicate standard errors. In the left column for B through D, dashed horizontal lines mark the nominal α level of 0.05, and dotted horizontal lines mark the critical values beyond which the estimated proportion would be significantly different from α. Asterisks indicate the significance level of the test of equal proportions between each tRSA approach and cRSA, *** p < 0.001, ** p < 0.01, * p < 0.05.
Summary of Experiment 1
Collectively, our simulations offered three key insights into the commonalities and distinctions between the novel tRSA and the commonly implemented cRSA approaches. First, when the goal is to quantify the overall representational strength across the entire set of trials, tRSA produces estimates highly similar to those generated by cRSA; in other words, little information is lost in performing the trial-level computations (Figure 3). Second, tRSA is robust to variances that cRSA is agnostic to, such as imbalances in trial count between conditions and variability in trial count ratios across subjects. In almost all cases, tRSA significantly outperformed cRSA in sensitivity (Figure 5). This outcome was expected: collapsing across trials in cRSA inevitably discards meaningful variance from other levels, whereas tRSA properly models the multi-level variance structure. Third, tRSA is uniquely advantageous when neural representations are to be linked to continuous variables rather than discrete conditions. Obviating the need for post-hoc discretization prevents additional loss of meaningful variance from the modulators, and tRSAcontinuous demonstrated significantly enhanced specificity and sensitivity over cRSA in a wide range of scenarios (Figure 6).
Experiment 2: cRSA and tRSA comparison in fMRI data
Trial-level RSA estimates for Perceptual and Mnemonic Processing
The first goal of Experiment 2 was to compare the results from tRSA and cRSA approaches obtained from real fMRI data. We used fMRI data collected while participants implicitly named 114 colored images of everyday objects on a white background (Object Perception; see Figure 7A). Estimates of regional representational strength from the tRSA models were compared with mean cRSA values across participants, for each region of interest. Unsurprisingly, the tRSA model estimates and the cRSA mean correlation values demonstrated similar distributions (Figure 7B) and were highly correlated (Figure 7C). Critically, however, the two approaches yielded different sets of statistically significant regions. Specifically, the cRSA approach identified four significant regions in the LOC during Object Perception, whereas the tRSA approach identified two additional regions within the LOC (Table 1, Figure 7D, and Figure 7E). This discrepancy indicates that tRSA identifies representational regions more accurately and robustly.

tRSA outperforms cRSA in identifying representational regions during object perception.
A) Visualization of the Object Perception Task. The fMRI task required participants to view 114 labeled objects and rate how well the labels describe the objects on a scale of 1 to 4 (mean 3.59). B) Density plots (y-axis) of regional representational strengths (x-axis) for tRSA (model estimates, dark) and cRSA (mean correlation values, light). Regions with significant representational strength greater than zero (q < 0.05) are highlighted in red. C) Scatter plot showing the correlation between regional representational strengths from tRSA (model estimates, y-axis) and cRSA (mean correlation values, x-axis). Regions with significant tRSA representation are highlighted in red, and those significant in cRSA are outlined in red. The representational strength measures are highly correlated across regions (r = 0.98, p < 0.001). D) Bar graph of t-statistics (y-axis) from cRSA (light) and tRSA (dark) across the 26 regions of interest (x-axis). Regions significant after FDR correction (q < 0.05) are highlighted in red. E) Brain regions with significant representation. Regions in light red were identified by both tRSA and cRSA. Regions in dark red were identified by the tRSA method only.



Representation regions identified by cRSA and tRSA.
Subject-level variance not accounted for by cRSA
The second goal of Experiment 2 was to demonstrate the utility of tRSA in accounting for the heterogeneity in subject-level variance. In the main Memory Retrieval task, experimental trials were categorized as “Hit” or “Miss” based on participants’ memory success, and the exact number of trials in each condition varied widely across participants (Figure 8A, Bottom).

Unmodeled within- and between-participant variance in cRSA for mnemonic representations.
A) Top: Visualization of the main memory task. Participant recognition and judgment confidence were tested for 144 concepts (114 “old”, 30 “new”). Bottom: Stacked bar plots with the number of “Hit” (blue) and “Miss” (gray) trials (y-axes) for each participant (x-axis). Participants were sorted by their Hit-to-Miss ratio in descending order. B) Density plots of cRSA representational strength (y-axes) across the 26 ROIs (x-axes) for Hits (blue) and Misses (gray) for each participant, sorted by descending Hit-to-Miss count ratio. The Hit-to-Miss ratios are displayed in the top-left corner of each subplot, with the font color transitioning from blue (indicating more Hits) to gray (indicating more Misses). C) Scatter plot illustrating the influence of the Hit-to-Miss ratio (x-axis) on the participant CVFs (y-axis). The Spearman’s correlation was significant, with a coefficient of 0.45 and a p-value of 0.016. Two participants are marked X and were excluded from this correlation as their Hit-to-Miss ratio is less than 1. D) Density plots (y-axis) of cRSA representational strength across participants for Hits (blue) and Misses (gray). E) Regions exhibiting significant cRSA mnemonic representation (Hits > Misses). Regions highlighted in blue were significant after FDR correction with q < 0.05. Regions in gray were significant at p < 0.05 but did not survive FDR correction.
Importantly, the difference in the number of trials had a strong impact on the estimation of representational strength (see Figure 2). Participants with fewer trials in a given condition exhibited a noticeably wider distribution of representational strength values across ROIs in that condition, indicating that trial count was indeed a substantial source of across-subject variance (see Figure 8B). Moreover, trial count was negatively associated with the standard deviation of representational strengths across ROIs, for both Hit trials (rho = −0.37, p = 0.043) and Miss trials (rho = −0.71, p < 0.001). However, this subject-level variance is often not accounted for in subsequent group-level t-tests or ANOVAs, which assume that each participant contributes equally reliable estimates (i.e., homoscedasticity).
This issue of neglected heteroscedasticity remains when conditional contrasts (e.g., Hit minus Miss) are computed. Given the nonlinear relationship between trial counts and the reliability of correlations (see Figure 2), a more extreme trial imbalance (e.g., 100 Hits and 10 Misses, versus 70 Hits and 40 Misses) would also lead to a less reliable estimate of the difference. To demonstrate this issue in real fMRI data, we computed the CVF for each participant, measuring how much their Hit-minus-Miss contrast in representational strength estimates deviated from the group mean. Indeed, CVFs were much higher for participants with more extreme Hit-to-Miss trial counts (rho = 0.45, p = 0.016; see Figure 8C). Critically, the two participants with Hit-to-Miss ratios < 1 were excluded from this correlation (since in those cases we would expect an increased CVF), but their data points are included in the corresponding figure for transparency.
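The nonlinearity referenced above can be illustrated with a back-of-envelope calculation. Treating each within-condition similarity estimate as a Pearson correlation computed over trial pairs and using the Fisher-z approximation, the uncertainty of the Hit-minus-Miss contrast is dominated by the rarer condition. This sketch deliberately ignores the dependence among pairs that share a trial, so it understates the true uncertainty; it is for intuition only.

```python
import math

def fisher_z_se(n_obs: int) -> float:
    """Approximate SE of a Pearson correlation on the Fisher-z scale."""
    return 1.0 / math.sqrt(n_obs - 3)

def n_pairs(n_trials: int) -> int:
    """Number of within-condition trial pairs entering a cRSA estimate."""
    return n_trials * (n_trials - 1) // 2

for hits, misses in [(70, 40), (100, 10)]:
    # SE of the Hit-minus-Miss contrast, assuming (unrealistically)
    # independent pairs; real pairs are dependent, so the true SE is larger.
    se_contrast = math.hypot(fisher_z_se(n_pairs(hits)),
                             fisher_z_se(n_pairs(misses)))
    print(f"{hits} Hits / {misses} Misses -> contrast SE ~ {se_contrast:.3f}")
```

Even in this optimistic approximation, the 100/10 split yields a contrast several times noisier than the 70/40 split with the same total trial count, because 10 trials contribute only 45 pairs.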
Additionally, the distributions of representational strengths for Hits and Misses across participants were unequal. Since most participants had more Hit trials than Miss trials (Hit-to-Miss ratio mean = 3.7, median = 3.1), the distribution of representational strength values was broader for Misses than for Hits (Figure 8D). Ultimately, results from cRSA revealed significant mnemonic representation in one right LOC region and one right anterior ITC region, with two additional ITC regions showing trend-level significance (Table 2, Figure 8E); these outcomes are revisited in comparison to the tRSA approach below. Together, these results demonstrate that cRSA failed to model the across-subject variance in real fMRI data, which stemmed from the inevitable variability in trial count ratios affecting the reliability of each participant’s cRSA contrast estimates. This violation of statistical assumptions challenges the interpretability of standard t-tests in detecting representational regions.


Mnemonic representation regions identified by cRSA and tRSA.
Subject-level variance accounted for by tRSA
We also sought to assess how the tRSA approach would improve the estimates of representational strength for memory success. The mixed-effects models testing mnemonic representation with tRSA identified four significant regions: one in the right LOC and three in the right ITC (Table 2, Figure 9A, and Figure 9B). While the t-statistics for tRSA and cRSA were correlated across ROIs (Figure 9C), the tRSA approach, which produced trial-level estimates of representational strength, detected mnemonic representation regions that the cRSA approach did not. This result demonstrated the enhanced sensitivity of tRSA in identifying subtle differences in representational strength. Moreover, a region in the LOC that was initially classified by cRSA as a mnemonic representation region, largely due to data from participants with low Miss trial counts (Appendix 2 Table 4), was correctly excluded from that classification by tRSA (Figure 9A, Figure 9B). This result highlighted tRSA’s improved specificity in identifying true mnemonic representation regions by reducing biases introduced by trial-count variance that cRSA neglects.

tRSA outperforms cRSA in detecting mnemonic representations.
A) Bar graph of t-statistics for each of the 26 regions of interest. Mnemonic representation t-statistics (y-axis) from cRSA (light) and tRSA (dark) across 26 regions of interest (x-axis). Regions that are significant after FDR correction (q < 0.05) are highlighted in blue. B) Brain regions with significant mnemonic representation identified by tRSA are shown in blue. The region that was significant in cRSA but not in tRSA is depicted in light blue. C) Scatter plot of t-statistics calculated from tRSA (y-axis) and cRSA (x-axis), showing a significant Pearson correlation (r = 0.49, p = 0.010). Significant tRSA regions are highlighted in blue. Significant cRSA regions are outlined in light blue.
Estimations of continuous modulators
Our final goal was to assess the utility of tRSA in testing hypotheses regarding continuous modulators of representational strength. Here, we focused on the effect of item memorability, a stimulus-level variable (see Figure 10A). The tRSA approach successfully detected continuous, stimulus-level modulation of representational strength during Object Perception, when participants were engaged in basic-level naming of colored objects displayed on a white background; notably, no memory task was administered before or during this phase of the experiment. Four regions within the LOC showed significant modulation by Perceptual Memorability, a measure based on average responses in the Perceptual Retrieval task (see Table 3 and Figure 10B). While not the explicit focus of this analysis, the analyses also identified significant effects of the nuisance variables Conceptual Memorability and reaction time in five regions of interest within the LOC (Appendix 2 Table 6). These regions are often associated with processing the constituent categories or visual features of an object (Tyler et al., 2013), consistent with the view that the memorability of images can be predicted by statistical properties of semantic features (Bainbridge, 2022; Hovhannisyan et al., 2021; Kramer et al., 2023).

tRSA captures stimulus-level modulations on representation.
A) Distribution of Item Memorability. Density plot of memorability values demonstrates a wide distribution of memorability scores (i.e., the average confidence of remembering a particular item). This stimulus-level variance is an underexplored target of representational analyses. B) Cortical representational effects of Memorability. Relationship between item-level Memorability and representational strength, as assessed with tRSA, for each of the 26 regions of interest. Trial-level representational strength estimates from Object Perception were significantly modulated by a continuous, stimulus-level measure of Item Memorability in four regions (blue). The t-statistics for this modulation are plotted (y-axis) across regions of interest (x-axis).


Continuous modulation in tRSA.
Summary of Experiment 2
Three key takeaways from Experiment 2 mirrored those from Experiment 1. First, while access to ground truths was not possible with empirical fMRI datasets, results from tRSA and cRSA approaches showed a relatively high degree of correspondence as before (Figure 7C).
Nonetheless, modeling trial-level estimates with tRSA showed improved sensitivity to neural representations in the Object Perception dataset, as the tRSA model was able to weigh subjects with different numbers of trials appropriately (Figure 7D). Second, in the Memory Retrieval dataset, participants had unbalanced and heterogeneous distributions of trial counts across the two conditions (“Hits” and “Misses”), as expected with any psychological task that hinges on subject performance (Figure 8A). The benefits of tRSA’s improved characterization can be seen in multiple empirical outcomes: tRSA detected mnemonic representation regions that the cRSA approach missed (improved sensitivity; Figure 9B) and afforded more appropriate weighting of participants’ contrast estimates while maintaining similar outcome statistics as cRSA (Figure 9C). Third, tRSA supported the investigation of research questions that cannot be readily addressed with cRSA, namely trial- or stimulus-level modulations of representational strength. We demonstrated this point using Item Memorability, a stable stimulus property thought to contribute to memory independently of task context, experimental context, or individual differences. Representational strength in several ventral-stream regions varied continuously with stimulus-level memorability (Figure 10B). Moreover, model fits were strongest when the models also incorporated trial-level variables such as fMRI run and reaction time. These findings show that the representational strength estimates produced by tRSA indeed captured the multi-level variance structure in the data and that tRSA can be used effectively to study item-specific or trial-level modulators of representational strength.
Discussion
This paper formally presents tRSA, a novel technique for evaluating the strength of neural representation at the level of individual experimental trials. The performance of tRSA was evaluated and compared to that of cRSA using both simulations (Experiment 1) and empirical fMRI datasets (Experiment 2). Three principal insights can be drawn from these analyses. First, tRSA produces estimates of overall representational strength highly similar to those of cRSA, such that the new approach comes at little cost in efficacy. Second, tRSA is robust to subject-level variances neglected by cRSA, namely differences in trial counts across subjects; with both simulations and empirical datasets, tRSA demonstrated enhanced sensitivity over cRSA in detecting true effects. Third, tRSA makes it possible to examine continuous modulators of representational strength with good specificity and sensitivity. With the empirical fMRI dataset, we demonstrated that tRSA detected significant associations between representational strength and item-level memorability, as well as trial-level nuisance variables. Below we discuss the methodological and conceptual implications of the tRSA approach for the field of cognitive neuroscience and beyond.
Similarities between cRSA and tRSA
Since the advent of cRSA, the method has been widely implemented in studies of neural representations with great success (Dimsdale-Zucker & Ranganath, 2018; Kriegeskorte et al., 2008). Our first and foremost objective was therefore to ensure that the novel tRSA technique produces estimates of representational strength comparable to cRSA. This objective was achieved in both Experiment 1 (simulations) and Experiment 2 (real fMRI data). Simulations suggested that tRSA and cRSA estimates reacted in very similar ways to manipulations of the number of subjects, trial counts, and noise level. While some numerical differences were observed when tRSA was computed across conditions and compared to within-condition cRSA, the divergences remained small and did not adversely impact subsequent statistical testing. Analyses with real fMRI datasets corroborated this correspondence, showing highly similar distributions of estimated representational strength across the brain. These results confirm that tRSA performs no worse than cRSA; researchers can confidently replace existing cRSA analyses with the tRSA framework without concern for loss of information or inferior statistical performance. Beyond this correspondence, our analyses with both simulations and real fMRI datasets also demonstrated several important advantages of tRSA over cRSA in a wide range of scenarios.
Methodological significance
Statistically, tRSA is designed to properly capture the multi-level variance structure in the data, yielding improved specificity and sensitivity. An important motivation for devising tRSA accords with a growing movement across scientific fields advocating proper treatment of the multi-level variance structure in data (Dedrick et al., 2009; Winter & Grice, 2021; Yarkoni, 2022). As laid out in the Introduction, we argue that (at least) four independent sources of variance contribute to how strongly a given neural system represents a single event (e.g., viewing an image of “basketball”): condition-level, subject-level, stimulus-level, and trial-level.
Subject-level variance was a major focus of analysis in both Experiments. Research subjects are samples (ideally, random and representative) drawn from some population of interest. Barring quantifiable individual differences that may be relevant to the cognitive functions being studied (e.g., age, education, personality traits), subjects are assumed to vary randomly around the population-level mean, which is typically the focus of analysis. In cRSA, subject-level estimates of representational strength or condition differences are typically entered into a general linear model where homoscedasticity is assumed (e.g., a t-test). However, this assumption is easily violated by heterogeneity in subject-level trial counts, resulting in unreliable model outputs. Indeed, artificially induced variability in trial counts led to catastrophic reductions in the sensitivity of cRSA (Experiment 1). Empirical studies that select trials based on subject performance, or that expect extensive missingness or exclusions, must be wary of this problem; we demonstrated the same issue with real fMRI data from an episodic memory study (Experiment 2). Critically, tRSA proved more robust to this subject-level heteroscedasticity in both Experiments, because trial count information remains available to the subsequent models, with subjects entered as random factors (Yarkoni, 2022).
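A minimal simulation (with purely illustrative values) shows why unequal trial counts violate the homoscedasticity assumed by a group-level t-test: the sampling variance of each subject's summary estimate scales with 1/n.

```python
import numpy as np

rng = np.random.default_rng(3)
true_effect, trial_noise = 0.05, 0.3   # illustrative values, not fit to data

def subject_estimate(n_trials: int) -> float:
    """A subject's cRSA-style summary: the mean of noisy trial-level values.
    Its sampling variance scales as 1/n_trials."""
    return float(np.mean(true_effect + rng.normal(0.0, trial_noise, n_trials)))

# Subjects with few vs. many usable trials (e.g., few Misses vs. many Hits).
few = np.array([subject_estimate(10) for _ in range(2000)])
many = np.array([subject_estimate(200) for _ in range(2000)])

# Few-trial subjects contribute far noisier estimates, so a t-test that
# weighs all subjects equally mixes reliable and unreliable observations.
print(f"SD with 10 trials: {few.std():.3f}; with 200 trials: {many.std():.3f}")
```

A mixed-effects model, by contrast, sees the individual trials, so a subject with 10 usable trials automatically contributes less certainty to the group-level estimate than one with 200.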
Stimulus-level variance was another focus of our investigations. A major advantage of RSA over multivariate pattern classification techniques is that it allows the use of “condition-rich” experimental designs with large and diverse collections of stimuli, so long as meaningful hypotheses can be constructed in the form of an RSMmodel (Kriegeskorte et al., 2008). For instance, using an RSMmodel based on human similarity judgments, past work has applied RSA to neuroimaging data collected with nearly two thousand unique natural object images (THINGS database; Hebart et al., 2023). Various stimulus properties have also been targets of neural representation research, such as the real-world size of objects (T. Huang et al., 2022), the geographical location of landmarks (Morton et al., 2021), or, more abstractly, visual and semantic features of naturalistic images (Devereux et al., 2018; Jozwik et al., 2023; Naspi et al., 2021). Despite this widespread appreciation for the neurocognitive relevance of stimulus properties, cRSA does not account for the fact that the same stimulus (e.g., “basketball”) is seen by multiple subjects and thus produces statistically dependent data. With tRSA, this issue is at least partially addressed by modeling stimulus identity as a random effect, improving generalizability and reliability (Chen et al., 2021; Yarkoni, 2022). Admittedly, a complete resolution of this stimulus-as-fixed-effect fallacy would require stimulus information to be incorporated into all preceding procedures, including first-level models of fMRI data (Westfall et al., 2017); while beyond the scope of the current paper, such analytical pipelines may eventually be integrated for a full solution. Furthermore, as demonstrated in Experiment 2, tRSA supports the straightforward investigation of how representational strength varies with stimulus-level properties such as memorability.
With the flexibility of tRSA, future studies may freely explore other interesting stimulus-level research questions.
Of note, in the current paper we only examined experimental designs in which each unique stimulus is presented once, i.e., no repetitions. Repetitions are sometimes included as targets to which participants should respond, in order to promote engagement (Allen et al., 2022; Chang et al., 2019; Lahner et al., 2024). Additionally, some preprocessing or modeling procedures use repetitions to improve the accuracy of estimated neural activity responses and representations (Charest et al., 2018; Prince et al., 2022), though the effectiveness of this approach may depend on various other factors (Ritchie et al., 2021).
Importantly, repetitions have been found to robustly induce neural activity adaptation or repetition suppression in task-relevant brain regions (Barron et al., 2016; Grill-Spector et al., 2006), and this phenomenon has profound impacts on multi-voxel activity patterns as well (Mazurchuk et al., 2023). While beyond the scope of the current paper, we briefly mention two possible treatments of stimulus repetitions in implementing tRSA. For one, repetitions can be collapsed in first-level models of brain recordings for more reliable estimations of stimulus-specific neural activations; in this case, any subsequent analyses including tRSA can simply proceed as if individual stimuli are only presented once. Alternatively, repetitions can also be kept as separate events for which individual tRSA estimates are assigned; this option allows one to probe the stability of neural representations across time and repetitions. In either case, our tRSA approach can flexibly suit the various goals of research.
Trial-level variance refers to the influence of factors that vary from trial to trial on estimates of neural representation. For example, attention fluctuates over the course of a task and changes the fidelity of neural representations (Aly & Turk-Browne, 2016; Rothlein et al., 2018). Like stimulus-level variance, trial-level variance is not accounted for by cRSA, as the comparison of RSMs collapses across trials. In our simulations, we demonstrated that tRSA can indeed capture trial-level variance and reliably estimate its effect on representational strength (Experiment 1). In a real fMRI dataset, we found that reaction time, a trial-level nuisance variable, also had statistically significant effects on representational strength (Experiment 2). In addition, past research has found that purely physiological changes in pupil size, cardiac rhythm, and respiration may also influence both neural and cognitive processes (Critchley & Garfinkel, 2018; van der Wel & van Steenbergen, 2018). While a portion of those effects could be due to incomplete removal of artifacts during signal preprocessing, some physiological changes may also have intriguing neurocognitive relevance worth exploring in future studies.
Conceptual significance
Fundamentally, approaching RSA at the level of individual trials is an advance in conceptualization as well as in methodology (discussed above). The notion of neural representation, albeit often used more broadly than intended, emphasizes the link between the representee (the content being represented, such as stimuli or memories) and its representative (an alternative format of that content, such as neural activity) (Baker et al., 2022; Favela & Machery, 2023; Vilarroya, 2017). Because it collapses trial-level information in its formulation, cRSA can only indicate the quality of a kind of representation, or the overall representational strength across events. Instead, tRSA provides a measure that directly corresponds to this conceptualization of representation, with each trial or event receiving its own estimated representational strength. This level of analysis is consistent with previous studies that focused on quantifying the similarity of multi-voxel activity patterns across different trials (S. Huang, Faul, et al., 2024; Wing et al., 2020; Yu et al., 2024) and at different stages of memory (Ritchey et al., 2013; Shao et al., 2023). As such, tRSA provides a versatile tool for inquiring about many different brain-behavior relationships at an appropriate level of analysis and with straightforward interpretations (Becker et al., 2024; Howard et al., 2024; Morales-Torres et al., 2024; Naspi et al., 2023; Pacheco-Estefan et al., 2024). Notably, tRSA is not limited to neuroimaging data; past research has also implemented this approach to perform item-specific analyses on behavioral data (Walsh & Rissman, 2023).
At a broader level, the tRSA approach also engenders a host of other innovative techniques. The emerging research on representational connectivity focuses on the intersection of representational information and functional connectivity analyses, and seeks to uncover new multivariate views on how concepts are distributed across the brain (Anzellotti & Coutanche, 2018). Using the tRSA approach, trial-level representational strength values from any pair of brain regions may be correlated to yield their model-based representational connectivity (S. Huang, De Brigard, et al., 2024; Karimi-Rouzbahani et al., 2022). Trial-level representational strength values may also be combined with other trial-level measures of neural activity such as univariate activation to probe diverse inter-regional interactions that are theoretically motivated (S. Huang, Howard, et al., 2024).
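The representational connectivity idea described above reduces to a simple computation once trial-level strengths are available: the connectivity between two regions is the correlation of their per-trial strength values. The sketch below uses entirely hypothetical numbers, with a shared trial-by-trial signal (e.g., fluctuating attention) standing in for whatever process couples the two regions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_trials = 200

# Hypothetical trial-level representational strengths for two regions that
# partially track a shared trial-by-trial signal (illustrative values).
shared = rng.normal(size=n_trials)
region_a = 0.8 * shared + rng.normal(size=n_trials)
region_b = 0.8 * shared + rng.normal(size=n_trials)

# Model-based representational connectivity between the two regions:
# the across-trial correlation of their strength estimates.
connectivity = np.corrcoef(region_a, region_b)[0, 1]
print(f"representational connectivity ~ {connectivity:.2f}")
```

Because the inputs are per-trial scalars, the same values can just as easily be correlated with univariate activation or entered into a mixed-effects model alongside other trial-level measures.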
Conclusion
We present a comprehensive overview and diagnostics of a trial-level approach to representational similarity analysis, termed tRSA. With both simulations and real fMRI datasets, we have demonstrated that tRSA properly captures the multi-level variance structure in the data, shows improved and robust specificity and sensitivity, and offers flexible modeling options. We believe that tRSA is an important advancement of the generic RSA implementation, both methodologically and conceptually, and we hope that this innovation provides a versatile and useful tool for digging deeper into the fascinating complexity of cognition.
Acknowledgements
This study was supported by the National Institute of Health grant numbers: R01-AG066901, R21-AG058161, and R01-AG075417.
Appendix 1. Comparing across-condition and within-condition tRSA
Given discrete experimental conditions, tRSA could be formulated either within each condition or across all conditions. In within-condition tRSA, one would generate condition-specific RSMs, e.g., one set of nA × nA RSMs for Condition A and another set of nB × nB RSMs for Condition B, and compute the two sets of within-condition tRSA values separately. These tRSA estimates are then subjected to subsequent analysis. In across-condition tRSA, RSMs are generated across all trials in all conditions, and tRSA estimates are computed as if there were no multiple conditions. Then, tRSA estimates are categorized based on the condition they belong to in any subsequent analysis.
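For concreteness, the trial-level computation itself can be sketched as follows. One common formulation assigns each trial a strength estimate by correlating that trial's row of the neural RSM with the matching row of the model RSM, excluding the diagonal; this is offered as an illustration of the across-condition variant on synthetic RSMs, not necessarily the exact estimator detailed in the Methods.

```python
import numpy as np

def trial_level_rsa(neural_rsm: np.ndarray, model_rsm: np.ndarray) -> np.ndarray:
    """One strength estimate per trial: correlate row i of the neural RSM
    with row i of the model RSM, dropping the diagonal self-similarity entry.
    Each estimate uses n - 1 observations, versus n*(n-1)/2 observations for
    a single classic-RSA estimate over the whole matrix."""
    n = neural_rsm.shape[0]
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        out[i] = np.corrcoef(neural_rsm[i, keep], model_rsm[i, keep])[0, 1]
    return out

# Across-condition tRSA on synthetic symmetric RSMs over all 50 trials;
# condition labels would be applied afterwards in the group-level model.
rng = np.random.default_rng(11)
model = rng.uniform(size=(50, 50))
model = (model + model.T) / 2
noise = rng.uniform(size=(50, 50))
noise = (noise + noise.T) / 2
neural = 0.5 * model + 0.5 * noise   # neural RSM partially matches the model
strengths = trial_level_rsa(neural, model)
print(strengths.shape, round(float(strengths.mean()), 2))
```

In the within-condition variant, the same function would instead be applied separately to each condition's sub-matrix, shrinking each trial's effective sample from n − 1 to the (possibly much smaller) condition-specific count.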
We advocate for the use of across-condition tRSA because of its enhanced statistical reliability over within-condition tRSA. One major limitation of within-condition tRSA is its vulnerability to small and variable trial counts. If a condition contains n trials, cRSA uses n(n − 1)/2 observations to compute its estimate, whereas within-condition tRSA can use only n − 1 observations. This effective sample size can turn out to be fairly small and variable in practice (see Figure 8A, Bottom), resulting in unreliable estimates (see Figure 2). By contrast, across-condition tRSA suffers much less from this issue, since most experiments are designed to have an adequate total trial count.
Another theoretical advantage of across-condition tRSA is that it better corresponds to tRSAcontinuous, i.e., analyses in which trial-level representational strength estimates are hypothesized to be modulated by some continuous variable such as memorability or reaction time. In both across-condition tRSA and tRSAcontinuous, full RSMs are used to estimate representational strength before any discrete condition effect or continuous modulation effect is assessed.
Appendix 1 Figure 11 below summarizes the differences between within-condition and across-condition tRSA outcomes in a series of simulations. Notably, while within-condition tRSA estimates showed better numerical correspondence with cRSA estimates, they were less reliable than across-condition tRSA estimates, as evidenced by their wider spread.

Comparing across-condition and within-condition tRSA.
Each cell depicts descriptive statistics of different RSA methods from 10,000 simulations. Cells differ in raw trial counts (20, 80, or 320 in a given condition), balance of trial counts between conditions (balanced or unbalanced), and effect (A = B, A > B, or A < B). In each cell, the top scatter plots depict condition-level tRSA estimates (y-axis; blue, within-condition; orange, across-condition) against cRSA estimates (x-axis), whereas the bottom scatter plots depict the standard deviations of trial-level representational strength estimates from across-condition tRSA (y-axis) and within-condition tRSA (x-axis).
Appendix 2. Supporting data and results for Experiment 2


Behavioral results. Bold indicates participants excluded from Conceptual Retrieval Analyses.


Model selection using 3 criteria: Akaike information criterion (AIC), Bayesian information criterion (BIC), and log-likelihood ratio testing (LRT).


Significant nuisance effects in memorability model.
References
- A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience 25:116–126. https://doi.org/10.1038/s41593-021-00962-x
- Attention Stabilizes Representations in the Human Hippocampus. Cerebral Cortex 26:783–796. https://doi.org/10.1093/cercor/bhv041
- Beyond Functional Connectivity: Investigating Networks of Multivariate Representations. Trends in Cognitive Sciences 22:258–269. https://doi.org/10.1016/j.tics.2017.12.002
- A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage 54:2033–2044. https://doi.org/10.1016/j.neuroimage.2010.09.025
- Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59:390–412. https://doi.org/10.1016/j.jml.2007.12.005
- Memorability: Reconceptualizing memory as a visual attribute. In: Visual Memory. Routledge.
- Memorability: A stimulus-driven perceptual neural signature distinctive from memory. NeuroImage 149:141–152. https://doi.org/10.1016/j.neuroimage.2017.01.063
- Dissociating neural markers of stimulus memorability and subjective recognition during episodic retrieval. Scientific Reports 8:8679. https://doi.org/10.1038/s41598-018-26467-5
- Three aspects of representation in neuroscience. Trends in Cognitive Sciences 26:942–958. https://doi.org/10.1016/j.tics.2022.08.014
- Repetition suppression: A means to index neural representations using BOLD? Philosophical Transactions of the Royal Society B: Biological Sciences 371:20150355. https://doi.org/10.1098/rstb.2015.0355
- Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67:1–48.
- Neural Representations of Concept Knowledge. In: de Zubicaray G. I., Schiller N. O. (Eds.).
- Neural Mechanisms of Creative Problem Solving—From Representational Change to Memory Formation. bioRxiv. https://doi.org/10.1101/2023.06.13.544774
- A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage 37:90–101. https://doi.org/10.1016/j.neuroimage.2007.04.042
- Measures of Neural Similarity. Computational Brain & Behavior 3:369–383. https://doi.org/10.1007/s42113-019-00068-5
- Psychology From an Empirical Standpoint. Routledge.
- BOLD5000, a public fMRI dataset while viewing 5000 visual images. Scientific Data 6:49. https://doi.org/10.1038/s41597-019-0052-3
- GLMdenoise improves multivariate pattern analysis of fMRI data. NeuroImage 183:606–616. https://doi.org/10.1016/j.neuroimage.2018.08.064
- To pool or not to pool: Can we ignore cross-trial variability in FMRI? NeuroImage 225:117496. https://doi.org/10.1016/j.neuroimage.2020.117496
- The cost of dichotomization. Applied Psychological Measurement 7:249–253. https://doi.org/10.1177/014662168300700301
- Software tools for analysis and visualization of fMRI data. NMR in Biomedicine 10:171–178. https://doi.org/10.1002/(SICI)1099-1492(199706/08)10:4/5<171::AID-NBM453>3.0.CO;2-L
- The influence of physiological signals on cognition. Current Opinion in Behavioral Sciences 19:13–18. https://doi.org/10.1016/j.cobeha.2017.08.014
- Cortical Surface-Based Analysis: I. Segmentation and Surface Reconstruction. NeuroImage 9:179–194. https://doi.org/10.1006/nimg.1998.0395
- Visual and Semantic Representations Predict Subsequent Memory in Perceptual and Conceptual Memory Tests. Cerebral Cortex 31:974–992. https://doi.org/10.1093/cercor/bhaa269
- Multilevel Modeling: A Review of Methodological Issues and Applications. Review of Educational Research 79:69–102. https://doi.org/10.3102/0034654308325581
- Integrated deep visual and semantic attractor neural networks predict fMRI pattern-information along the ventral object processing pathway. Scientific Reports 8. https://doi.org/10.1038/s41598-018-28865-1
- Comparing representational geometries using whitened unbiased-distance-matrix similarity. arXiv. https://doi.org/10.48550/arXiv.2007.02789
- Representational Similarity Analyses: A Practical Guide for Functional MRI Applications. In: Manahan-Vaughan D. (Ed.).
- fMRIPrep: A robust preprocessing pipeline for functional MRI. Nature Methods 16:Article 1. https://doi.org/10.1038/s41592-018-0235-4
- fMRIPrep: A robust preprocessing pipeline for functional MRI. Zenodo, Version 23.0.1. https://doi.org/10.5281/zenodo.11123262
- The Human Brainnetome Atlas: A New Brain Atlas Based on Connectional Architecture. Cerebral Cortex 26:3508–3526. https://doi.org/10.1093/cercor/bhw157
- Investigating the concept of representation in the neural and psychological sciences. Frontiers in Psychology 14. https://doi.org/10.3389/fpsyg.2023.1165622
- Transforming the Concept of Memory Reactivation. Trends in Neurosciences 43:939–950. https://doi.org/10.1016/j.tins.2020.09.006
- The neural representational geometry of social perception. Current Opinion in Psychology 24:83–91. https://doi.org/10.1016/j.copsyc.2018.10.003
- A Representational Similarity Analysis of Cognitive Control during Color-Word Stroop. Journal of Neuroscience 41:7388–7402. https://doi.org/10.1523/JNEUROSCI.2956-20.2021
- Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences 25:622–638. https://doi.org/10.1016/j.tics.2021.03.011
- Statistical parametric mapping: The analysis of functional brain images. Academic Press.
- Accurate and robust brain image alignment using boundary-based registration. NeuroImage 48:63–72. https://doi.org/10.1016/j.neuroimage.2009.06.060
- Repetition and the brain: Neural models of stimulus-specific effects. Trends in Cognitive Sciences 10:14–23. https://doi.org/10.1016/j.tics.2005.11.006
- THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 12:e82580. https://doi.org/10.7554/eLife.82580
- The visual and semantic features that predict object memory: Concept property norms for 1,000 object images. Memory & Cognition 49:712–731. https://doi.org/10.3758/s13421-020-01130-5
- Differential Mnemonic Contributions of Cortical Representations during Encoding and Retrieval. Journal of Cognitive Neuroscience. https://doi.org/10.1162/jocn_a_02227
- Cortico-hippocampal interactions underlie schema-supported memory encoding in older adults. bioRxiv. https://doi.org/10.1101/2024.09.18.613755
- Connectivity analyses for task-based fMRI. Physics of Life Reviews 49:139–156. https://doi.org/10.1016/j.plrev.2024.04.012
- Counterfactual thinking induces different neural patterns of memory modification in anxious individuals. Scientific Reports 14:10630. https://doi.org/10.1038/s41598-024-61545-x
- Hippocampal Functions Modulate Transfer-Appropriate Cortical Representations Supporting Subsequent Memory. Journal of Neuroscience 44. https://doi.org/10.1523/JNEUROSCI.1135-23.2023
- Real-world size of objects serves as an axis of object space. Communications Biology 5:1–12. https://doi.org/10.1038/s42003-022-03711-3
- Improved Optimization for the Robust and Accurate Linear Registration and Motion Correction of Brain Images. NeuroImage 17:825–841. https://doi.org/10.1006/nimg.2002.1132
- Using representational similarity analysis to reveal category and process specificity in visual object recognition. Cortex 166:172–187. https://doi.org/10.1016/j.cortex.2023.05.012
- Deep Neural Networks and Visuo-Semantic Models Explain Complementary Components of Human Ventral-Stream Representational Dynamics. Journal of Neuroscience 43:1731–1741. https://doi.org/10.1523/JNEUROSCI.1424-22.2022
- Caveats and Nuances of Model-Based and Model-Free Representational Connectivity Analysis. Frontiers in Neuroscience 16. https://www.frontiersin.org/article/10.3389/fnins.2022.755988
- Mindboggling morphometry of human brains. PLOS Computational Biology 13:e1005350. https://doi.org/10.1371/journal.pcbi.1005350
- What’s new in Psychtoolbox-3? Perception 36:1–16.
- The features underlying the memorability of objects. Science Advances 9:eadd2981. https://doi.org/10.1126/sciadv.add2981
- Peeling the Onion of Brain Representations. Annual Review of Neuroscience 42:407–432. https://doi.org/10.1146/annurev-neuro-080317-061906
- Representational geometry: Integrating cognition, computation, and the brain. Trends in Cognitive Sciences 17:401–412. https://doi.org/10.1016/j.tics.2013.06.007
- Representational similarity analysis—Connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience 2. https://doi.org/10.3389/neuro.06.004.2008
- yardstick: Tidy Characterizations of Model Performance. GitHub, Version 1.3.1. https://github.com/tidymodels/yardstick
- lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software 82. https://doi.org/10.18637/jss.v082.i13
- Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nature Communications 15:6241. https://doi.org/10.1038/s41467-024-50310-3
- Evaluation of Noisy Data. Journal of the Society for Industrial and Applied Mathematics Series B Numerical Analysis 1. https://doi.org/10.1137/0701007
- Cortical Representations of Visual Stimuli Shift Locations with Changes in Memory States. Current Biology. https://doi.org/10.1016/j.cub.2021.01.004
- On the practice of dichotomization of quantitative variables. Psychological Methods 7:19–40. https://doi.org/10.1037/1082-989X.7.1.19
- Balancing Type I error and power in linear mixed models. Journal of Memory and Language 94:305–315. https://doi.org/10.1016/j.jml.2017.01.001
- Stimulus repetition and sample size considerations in item-level representational similarity analysis. Language, Cognition and Neuroscience 0:1–12. https://doi.org/10.1080/23273798.2023.2232903
- Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language 112:104092. https://doi.org/10.1016/j.jml.2020.104092
- Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations. Journal of Neuroscience 44. https://doi.org/10.1523/JNEUROSCI.1479-23.2024
- Semantic Knowledge of Famous People and Places Is Represented in Hippocampus and Distinct Cortical Networks. Journal of Neuroscience 41:2762–2779. https://doi.org/10.1523/JNEUROSCI.2034-19.2021
- Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. NeuroImage 59:2636–2643. https://doi.org/10.1016/j.neuroimage.2011.08.076
- Multiple dimensions of semantic and perceptual similarity contribute to mnemonic discrimination for pictures. Journal of Experimental Psychology: Learning, Memory, and Cognition 47:1903–1923. https://doi.org/10.1037/xlm0001032
- Effects of Aging on Successful Object Encoding: Enhanced Semantic Representations Compensate for Impaired Visual Representations. Journal of Neuroscience 43:7337–7350. https://doi.org/10.1523/JNEUROSCI.2265-22.2023
- Maintenance and transformation of representational formats during working memory prioritization. Nature Communications 15:8234. https://doi.org/10.1038/s41467-024-52541-w
- Specifying the Random Effect Structure in Linear Mixed Effect Models for Analyzing Psycholinguistic Data. Methodology 16:Article 2. https://doi.org/10.5964/meth.2809
- Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage 84:320–341. https://doi.org/10.1016/j.neuroimage.2013.08.048
- Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 11:e77599. https://doi.org/10.7554/eLife.77599
- R: A Language and Environment for Statistical Computing. Version 4.4.1. https://www.R-project.org/
- Highly accurate inverse consistent registration: A robust approach. NeuroImage 53:1181–1196. https://doi.org/10.1016/j.neuroimage.2010.07.020
- Hierarchical models of object recognition in cortex. Nature Neuroscience 2:1019–1025. https://doi.org/10.1038/14819
- Neural mechanisms of object recognition. Current Opinion in Neurobiology 12:162–168. https://doi.org/10.1016/S0959-4388(02)00304-5
- Neural Similarity Between Encoding and Retrieval is Related to Memory Via Hippocampal Interactions. Cerebral Cortex 23:2818–2828. https://doi.org/10.1093/cercor/bhs258
- The unreliable influence of multivariate noise normalization on the reliability of neural dissimilarity. NeuroImage 245:118686. https://doi.org/10.1016/j.neuroimage.2021.118686
- Attentional Fluctuations Influence the Neural Fidelity and Connectivity of Stimulus Representations. Journal of Cognitive Neuroscience 30:1209–1228. https://doi.org/10.1162/jocn_a_01306
- An Approximate Distribution of Estimates of Variance Components. Biometrics Bulletin 2:110–114. https://doi.org/10.2307/3002019
- A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences 104:6424–6429. https://doi.org/10.1073/pnas.0700622104
- Object recognition with features inspired by visual cortex. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), pp. 994–1000. https://doi.org/10.1109/CVPR.2005.254
- Cross-stage neural pattern similarity in the hippocampus predicts false memory derived from post-event inaccurate information. Nature Communications 14:Article 1. https://doi.org/10.1038/s41467-023-38046-y
- Averaging correlation coefficients: Should Fisher’s z transformation be used? Journal of Applied Psychology 72:146–148. https://doi.org/10.1037/0021-9010.72.1.146
- Role of the secondary visual cortex in HMAX model for object recognition. Cognitive Systems Research 64:15–28. https://doi.org/10.1016/j.cogsys.2020.07.001
- N4ITK: Improved N3 Bias Correction. IEEE Transactions on Medical Imaging 29:1310–1320. https://doi.org/10.1109/TMI.2010.2046908
- Objects and Categories: Feature Statistics and Object Processing in the Ventral Stream. Journal of Cognitive Neuroscience 25:1723–1735. https://doi.org/10.1162/jocn_a_00419
- Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review 25:2005–2015. https://doi.org/10.3758/s13423-018-1432-y
- Modern applied statistics with S. Springer.
- Neural Representation. A Survey-Based Analysis of the Notion. Frontiers in Psychology 8. https://doi.org/10.3389/fpsyg.2017.01458
- Behavioral representational similarity analysis reveals how episodic learning is influenced by and reshapes semantic memory. Nature Communications 14:Article 1. https://doi.org/10.1038/s41467-023-42770-w
- Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage 137:188–200. https://doi.org/10.1016/j.neuroimage.2015.12.012
- Tools of the Trade: Multivoxel pattern analysis in fMRI: A practical introduction for social and affective neuroscientists. Social Cognitive and Affective Neuroscience 15:487–509. https://doi.org/10.1093/scan/nsaa057
- Fixing the stimulus-as-fixed-effect fallacy in task fMRI. Wellcome Open Research 1:23. https://doi.org/10.12688/wellcomeopenres.10298.2
- Cortical Overlap and Cortical-Hippocampal Interactions Predict Subsequent True and False Memory. Journal of Neuroscience 40:1920–1930. https://doi.org/10.1523/JNEUROSCI.1766-19.2020
- Independence and generalizability in linguistics. Linguistics 59:1251–1277. https://doi.org/10.1515/ling-2019-0049
- The Neural Representations Underlying Human Episodic Memory. Trends in Cognitive Sciences 22:544–561. https://doi.org/10.1016/j.tics.2018.03.004
- The generalizability crisis. Behavioral and Brain Sciences 45:e1. https://doi.org/10.1017/S0140525X20001685
- Subsequent Memory Effects in Cortical Pattern Similarity Differ by Semantic Class. Journal of Cognitive Neuroscience. https://doi.org/10.1162/jocn_a_02238
- Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging 20:45–57. https://doi.org/10.1109/42.906424
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.106694. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Huang et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.