Figures and data

(A) A visual depiction contrasting how perceptual processing is thought to be different in autism across the two prominent theories of WCC and EPF. (B) An illustration comparing visual sensory dominance (the Colavita effect) with decreased visual dominance or auditory sensory dominance (a reverse Colavita effect), which has been observed in autism.

Table of preregistered hypotheses (accessible at osf.io/h92gr and osf.io/47kj6).

(A) Visual vs audio perceptual preference (WV − WA) for all participants (ASD and nonASD) across visual (red), audiovisual (purple), and auditory (blue) ROIs. Perceptual preference is calculated by taking the difference of visual and audio stacked model weights. Perceptual preference is in the range -1 to 1 as stacked encoding model weights range from 0 to 1. (B) Whole-brain plot of mean visual vs audio perceptual preference (WV − WA) across all participants (ASD and nonASD). This metric is the same as in A, but now plotted at each grayordinate and colored on a scale of visual (red) to audio (blue) with purple indicating regions with a relative balance between visual and audio feature weights.
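The preference metric described above can be sketched in a few lines. This is only an illustration of the arithmetic stated in the caption (the difference of two stacked model weights, each in the range 0 to 1); the function names and the per-ROI averaging helper are hypothetical and are not taken from the authors' code.

```python
def perceptual_preference(w_visual, w_audio):
    """Return W_V - W_A for one grayordinate.

    Assumes each stacked-model weight lies in [0, 1], so the result
    lies in [-1, 1]: positive = visual preference, negative = audio.
    """
    for w in (w_visual, w_audio):
        if not 0.0 <= w <= 1.0:
            raise ValueError("stacked weights are expected to lie in [0, 1]")
    return w_visual - w_audio


def roi_mean_preference(weight_pairs):
    """Average the preference over (w_visual, w_audio) pairs in one ROI.

    Hypothetical helper: the input is a list of weight pairs, one per
    grayordinate retained for the ROI.
    """
    prefs = [perceptual_preference(wv, wa) for wv, wa in weight_pairs]
    return sum(prefs) / len(prefs)
```

For example, `roi_mean_preference([(0.75, 0.25), (0.5, 0.5)])` averages a visually weighted grayordinate with a balanced one and returns a mildly visual preference.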

Heatmaps of fixed-effect coefficients for (A) diagnostic group (Dx: nonASD vs. ASD), (B) SRS and (C) sensory subset score (SSS), across 40%, 60% and 80% FD thresholds (left to right within each cortical parcel column).
Within each of the three horizontal panels, rows denote encoding model-derived metrics (low-level Ru2, high-level Ru2, and their preference index, WH − WL) and columns denote Glasser ROIs ordered from early visual areas through association cortex and then auditory areas. Color indicates the magnitude and sign of the coefficient (purple=negative effect with ASD>nonASD; green=positive effect with ASD<nonASD). Asterisks (∗) mark FDR-corrected significance at q < .05; open circles (◦) mark uncorrected p < .05. The visual modality encoding models are coded with red text and the auditory with blue. Low-level visual Ru2 differed significantly between diagnostic groups in STSvp, and low-level audio Ru2 was not significant anywhere (A; contradicting H1.1). Visual, but not audio, WH − WL differed significantly in STSvp and STSdp (A; supporting H1.2). Visual, but not audio, WH − WL was significantly related to SRS in STSdp at the 40% threshold (B; H1.3) but not to SSS (C). Results were generally consistent across the three FD thresholds.

(A) Box plot of audio stacked encoding model perceptual preference across ASD (white) and nonASD (gray) groups for all perceptual ROIs. (B) Corresponding box plot of the visual stacked encoding models. Results correspond to the 40% FD threshold. Boxes annotated with an asterisk indicate a significant group difference (FDR q<0.05) while a circle indicates an initially significant difference between groups that did not survive FDR correction. Boxes show the quartiles of the dataset and whiskers show the distribution with the exception of outliers. Each dot is the mean perceptual preference from a single subject from all statistically significant grayordinates within each ROI.

Cortical surface maps showing visual encoding model-derived fixed-effect coefficients for diagnostic group comparisons (ASD vs. nonASD) at the 40% FD threshold.
Colors represent coefficient magnitude and direction (pink: negative effect, ASD > nonASD; green: positive effect, ASD < nonASD). Asterisks denote significance after FDR correction (*: FDR q < 0.05, **: FDR q < 0.005).

Table of post-hoc whole-brain significance results.
Only significant regions and features (after FDR correction) are included in the table. In each column, results from FD thresholds of 40%, 60%, and 80% are separated by commas. VL=low-level visual, VH=high-level visual, AH=high-level audio, WVL=low-level visual model weight, WVH=high-level visual model weight, R2=explained variance, Ru2=unique explained variance, -=q>0.05, *=q<0.05, **=q<0.005.

Perceptual regions of interest ADHD subgroup significance summary.
All effects listed here are between the nonASD and ADHD+ASD subgroups, as there were no significant differences between nonASD and ADHD-ASD or between ADHD+ASD and ADHD-ASD. Only the three regions with significant effects after FDR correction are included. In each column, the results from FD thresholds of 40%, 60% and 80% are separated by commas. VL=low-level visual, VH=high-level visual.

Visual encoding model metrics across left and right pSTS subregions.
(A) Low-level visual unique explained variance (Ru2) for ASD (white) and nonASD (gray) groups in left and right STSvp and STSdp regions. (B) High-level visual Ru2 across the same groups and regions. (C) High- vs. low-level perceptual preference (derived from the stacked visual encoding models), where positive values indicate a high-level preference, negative values indicate a low-level preference, and values near zero indicate no preference. (D) Low-level visual Ru2 for ASD-ADHD (white), ASD+ADHD (light gray), and nonASD (gray) groups across left and right STSvp and STSdp regions. (E) High-level visual Ru2 across groups in the same regions. (F) Stacked encoding model visual perceptual preference metrics (high- vs. low-level); positive values indicate a preference for high-level features, negative values indicate a low-level preference, and values near zero indicate no strong perceptual preference. All results are from the 40% FD threshold. Boxes annotated with an asterisk (*) indicate significant group differences (FDR-corrected p < 0.05); circles indicate nominal significance prior to FDR correction (uncorrected p < 0.05). Boxes represent quartiles; whiskers indicate data range excluding outliers. Individual dots represent mean metrics per subject, averaged across statistically significant grayordinates within each left or right ROI. Pairwise significance tests were conducted between all groups; annotations indicate significant differences, all of which occurred between the ASD-ADHD and nonASD subgroups.

Audio vs. visual modality preference.
Higher values indicate an audio preference and lower values indicate a visual preference. These results correspond with the 40% FD threshold. Boxes annotated with circles indicate a difference between groups that did not survive FDR correction (uncorrected p<0.05 but FDR q>0.05). Boxes show the quartiles of the dataset and whiskers show the distribution with the exception of outliers. Each dot is the mean value from a single subject from all statistically significant grayordinates within each ROI.

Heatmaps of fixed-effect coefficients from exploratory parcel-wise linear mixed models examining (A) main effects of age, (B) age-by-diagnosis (ASD vs. nonASD), and (C) age-by-SRS interactions.
Within each horizontal panel, rows represent encoding model-derived metrics (visual R2 and Ru2, audio R2 and Ru2, and their perceptual preference index WV − WA), while columns represent Glasser MMP parcels ordered roughly from early visual through higher-order association to auditory areas. Each parcel is further subdivided by FD thresholds (40%, 60%, 80%, left to right). Color indicates magnitude and direction of coefficients (green: positive effects; pink: negative effects). In panel A, green indicates encoding metrics increase with age, pink indicates metrics decrease with age. In panel B (Age × Diagnosis), green indicates the age-related slope is more positive in nonASD (greater increases with age), whereas pink indicates a more positive age-related slope in ASD. Asterisks mark FDR-corrected significance (∗ = q < 0.05, ∗∗ = q < 0.005, ∗∗∗ = q < 0.0005); open circles mark uncorrected significance (p < 0.05). Visual metrics are labeled in red; auditory metrics in blue.

Significance of Age effects on visual, audio, and audio–visual encoding model metrics across all left and right hemisphere, cortical and subcortical parcels at the whole-brain level.

Head motion vs. age and behavioral measures.
Scatterplots show the fraction of volumes with framewise displacement > 0.2 mm in relation to (A) age, (B) SRS Total T-score, and (C) Sensory Subset Score (SSS). Solid black lines denote linear fits. Reported coefficients are Spearman correlations (ρ); “partial ρ” controls for age. (A) ρ = −0.40; (B) ρ = −0.01, partial ρ = 0.03; (C) ρ = 0.06, partial ρ = 0.08. Motion decreases with age and shows only weak associations with SRS/SSS.
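As a rough sketch of the quantities in this caption, the code below computes the fraction of volumes exceeding the 0.2 mm framewise displacement threshold, a Spearman correlation, and a first-order partial Spearman correlation controlling for one covariate (age here). The partial coefficient uses the textbook formula applied to rank correlations; this is an assumption, and the authors' software may compute it differently. All function names are illustrative.

```python
import math


def fraction_high_motion(fd_mm, threshold_mm=0.2):
    """Fraction of volumes whose framewise displacement exceeds the threshold."""
    return sum(fd > threshold_mm for fd in fd_mm) / len(fd_mm)


def rank(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def partial_spearman(x, y, z):
    """Spearman rho between x and y, controlling for z (e.g. age)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

In this sketch, `x` would hold the per-participant motion fraction, `y` a behavioral score such as SRS or SSS, and `z` age.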

A flow chart illustrating how participants were selected for the final sample including the number that were excluded at each stage.

Descriptive statistics of demographic and clinical variables by diagnostic group.
Continuous measures (Age, WISC–FSIQ, Barratt Total, SRS Total T, SSS) are reported as mean, SEM, and minimum–maximum range (in parentheses). “N” indicates the number of non-missing observations. Sex shows counts of males and females. P-values are FDR-corrected (Benjamini–Hochberg, α=0.05).
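For readers unfamiliar with the Benjamini–Hochberg correction named above, a minimal standard-library sketch of the step-up adjustment follows. It returns adjusted q-values in the input order; the function name is illustrative, and the sketch is not the authors' analysis code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order.

    Each p-value is scaled by m / rank (rank 1 = smallest p), then a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for position in range(m, 0, -1):
        index = order[position - 1]
        running_min = min(running_min, p_values[index] * m / position)
        q_values[index] = running_min
    return q_values
```

A test is then declared significant at level α when its q-value is at or below α, for example `q <= 0.05`.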

Brain regions of interest with labels selected from the MMP span the visual and auditory systems and the dorsal, ventral, and lateral streams.
The schematic shows all perceptual regions of interest grouped by their classification from the MMP, with perceptual streams labeled and some simplified connections illustrated. Note that the actual connectivity between these regions is known to be far more complex; for vision, see Felleman and Van Essen (1991).

Schematic diagram illustrating the stacked encoding model approach used to relate movie stimulus features to brain activation measured via fMRI in the case of the audio and visual stacked model.
From left to right: audio and visual features are extracted from a naturalistic movie stimulus. Each feature set independently informs its own ridge regression encoding model, which predicts brain activation (R2) on held-out data. The audio and visual ridge regression models are subsequently combined into a stacked regression model, assigning each model a weight (α1, α2). The performance of the stacked model is measured by predicting overall brain activation (R2). The difference between stacked model weights (α2 − α1) quantifies the relative perceptual preference toward audio or visual features.
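The stacking step can be illustrated for the two-model case. The sketch below assumes the stacking weights are non-negative and sum to one (consistent with the stated 0 to 1 weight range, but an assumption nonetheless) and takes α1 as the audio weight and α2 as the visual weight. It starts from held-out predictions of the two ridge models rather than fitting them, and all names are hypothetical.

```python
def stack_two_models(y, pred_audio, pred_visual):
    """Return (alpha_audio, alpha_visual) for a convex combination.

    Minimizes the squared error of
        alpha_audio * pred_audio + alpha_visual * pred_visual
    subject to both weights >= 0 and summing to 1. With two models this
    is a one-dimensional quadratic, so the constrained optimum is the
    unconstrained solution clipped to [0, 1].
    """
    diff = [v - a for v, a in zip(pred_visual, pred_audio)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical predictions: any weighting fits equally well.
        return 0.5, 0.5
    numer = sum((yi - a) * d for yi, a, d in zip(y, pred_audio, diff))
    alpha_visual = min(1.0, max(0.0, numer / denom))
    return 1.0 - alpha_visual, alpha_visual


def r_squared(y, pred):
    """Coefficient of determination of predictions against held-out data."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, the preference described in the caption is `alpha_visual - alpha_audio`, and the stacked prediction at each time point is the weighted sum of the two model predictions.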

Overall model performance across the cortex.
(A) Whole-brain plot of the percentage of subjects with a significant grayordinate at each region. The significance of each grayordinate was tested via a null model by repeatedly temporally permuting the order of observations and retraining and testing the models over 1,000 permutations. (B) The Spearman-Brown corrected split-half noise ceilings for each MMP parcel.
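Both procedures in this caption can be sketched briefly. The Spearman-Brown function applies the standard prophecy formula for doubling test length to a split-half correlation. The permutation function is a generic illustration: `score_fn` is a hypothetical caller-supplied function standing in for retraining and testing the encoding model on temporally shuffled responses, and the add-one p-value estimator is an assumption rather than a detail stated in the source.

```python
import random


def spearman_brown(split_half_r):
    """Spearman-Brown correction of a split-half correlation: 2r / (1 + r)."""
    return 2.0 * split_half_r / (1.0 + split_half_r)


def permutation_p_value(observed, score_fn, responses,
                        n_permutations=1000, seed=0):
    """One-sided permutation p-value for an observed model score.

    Shuffles the temporal order of `responses`, recomputes the score with
    `score_fn` (hypothetical: refit and evaluate the model), and counts
    how often the null score reaches the observed score.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = list(responses)
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            exceed += 1
    # Add-one estimator keeps the p-value strictly above zero.
    return (exceed + 1) / (n_permutations + 1)
```

With 1,000 permutations, the smallest p-value this estimator can return is 1/1001.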

(A) Model performance (R2) for the visual (red), audio (blue), and stacked (purple) encoding models across perceptual regions of interest for all included participants (both ASD and nonASD). (B) Visual (red) and audio (blue) model weights from the stacked encoding model.

(A) Model performance (R2) for the visual low- (light red) and high-level (dark red) and stacked (grey) encoding models across all nonASD and ASD participants. (B) Corresponding R2 for audio models. (C) Model weights from the stacked visual encoding model. (D) Model weights from the stacked audio encoding model. (E), (F) High- vs. low-level visual and audio perceptual preferences (WH − WL) are calculated by taking the difference of high- and low-level weights (shown in C and D). Perceptual preference is in the range of -1 to 1, as the stacked encoding model weights range from 0 to 1. Positive values here indicate a high-level preference, negative values indicate a low-level preference, and values around zero indicate no preference.

(A) Whole-brain grayordinate-wise plot of mean high- vs. low-level perceptual preference across all participants (ASD and nonASD). (B) The same corresponding plot but from the audio stacked encoding model.

(A) Box plot of low-level audio encoding model explained variance (R2) for ASD (white) and nonASD (gray) groups across all perceptual ROIs. (B) Box plot of the unique explained variance (Ru2). The results correspond to the 40% FD threshold. Boxes annotated with circles indicate an initially significant difference between groups that did not survive FDR correction. Boxes show the quartiles of the dataset, and whiskers show the distribution, with the exception of outliers. Each dot is the mean R2 from a single subject from all statistically significant grayordinates within each ROI.

(A) Box plot of low-level visual encoding model R2 for ASD (white) and nonASD (gray) groups across all perceptual ROIs. (B) Corresponding box plot of Ru2. Results correspond to the 40% FD threshold. Boxes annotated with an asterisk indicate a significant group difference (FDR p<0.05), while a circle indicates an initially significant difference between groups that did not survive FDR correction. Boxes show the quartiles of the dataset and whiskers show the distribution with the exception of outliers. Each dot is the mean R2 from a single subject from all statistically significant grayordinates within each ROI.

(A) Box plot of high-level audio encoding model R2 for ASD (white) and nonASD (gray) groups across all perceptual ROIs. (B) Corresponding box plot of Ru2. Results correspond to the 40% FD threshold. Boxes annotated with an asterisk indicate a significant group difference (FDR p<0.05), while a circle indicates an initially significant difference between groups that did not survive FDR correction. Boxes show the quartiles of the dataset and whiskers show the distribution with the exception of outliers. Each dot is the mean R2 from a single subject from all statistically significant grayordinates within each ROI.

(A) Box plot of high-level visual encoding model R2 for ASD (white) and nonASD (gray) groups across all perceptual ROIs. (B) Corresponding box plot of Ru2. Results correspond to the 40% FD threshold. Boxes annotated with an asterisk indicate a significant group difference (FDR p<0.05), while a circle indicates an initially significant difference between groups that did not survive FDR correction. Boxes show the quartiles of the dataset and whiskers show the distribution with the exception of outliers. Each dot is the mean R2 from a single subject from all statistically significant grayordinates within each ROI.

(A) Box plot of audio encoding model Ru2 for ASD (white) and nonASD (gray) groups across all perceptual ROIs. (B) The same but for visual Ru2. Results correspond with the 40% FD threshold. Boxes annotated with circles indicate a difference between groups that did not survive FDR correction (uncorrected p<0.05, FDR q>0.05). Boxes show the quartiles of the dataset and whiskers show the distribution with the exception of outliers. Each dot is the mean value from a single subject from all statistically significant grayordinates within each ROI.

(A) Box plot of audio encoding model R2 for ASD (white) and nonASD (gray) groups across all perceptual ROIs. (B) The same but for visual R2. These results correspond with the 40% FD threshold. Boxes annotated with circles indicate a difference between groups that did not survive FDR correction (uncorrected p<0.05, FDR q>0.05). Boxes show the quartiles of the dataset and whiskers show the distribution with the exception of outliers. Each dot is the mean value from a single subject from all statistically significant grayordinates within each ROI.

Heatmaps of fixed-effect coefficients for (A) diagnostic group (Dx: nonASD vs. ASD), (B) SRS Total T-Score and (C) sensory subset score (SSS), across 40%, 60% and 80% FD thresholds (left to right within each cortical parcel column).
Within each of the three horizontal panels, rows denote encoding model-derived metrics (visual R2 and Ru2, audio R2 and Ru2, and their preference index, WV − WA) and columns denote Glasser MMP ROIs ordered left to right from early visual areas through association cortices followed by auditory areas. Color indicates the magnitude and sign of the coefficient (pink=negative effect with ASD>nonASD; green=positive effect with ASD<nonASD). Asterisks mark FDR-corrected significance at q < 0.05; open circles mark uncorrected p < 0.05. The visual modality encoding models are labeled in red and the auditory in blue.

Scatter plots showing the relationship between age and perceptual preference in three example cortical regions (A5, V3A, and IFSp) where significant effects were observed across all participants.
Perceptual preference values above zero indicate an auditory preference, and values below zero indicate a visual preference. Although no significant age-by-diagnosis interactions are displayed here, autistic (ASD; white) and non-autistic (nonASD; gray) groups are colored separately for clarity. Lines show model fits for each group colored correspondingly, with estimated age-related slopes (βAge) reported for ASD, nonASD, and the overall sample (black dashed line).

Scatter plots showing the relationship between age and perceptual preference in two example lateralized cortical regions, Right LO3 (a visual region between early visual cortex and MT+) and Left DVT (Dorsal Visual Transitional area; a region located on the posterior bank of the parieto-occipital sulcus), where significant effects were observed across all participants at the whole-brain level.
Perceptual preference values above zero indicate an auditory preference, and values below zero indicate a visual preference. Although no significant age-by-diagnosis interactions are displayed here, autistic (ASD; white) and non-autistic (nonASD; gray) groups are colored separately for clarity. Lines show model fits for each group, with estimated age-related slopes (βAge) reported for ASD, nonASD, and the overall sample (black dashed line).

Scatter plot showing the relationship between age and visual R2 in perceptual region VMV3 where a significant age:diagnosis interaction was observed.
Autistic (ASD; white) and non-autistic (nonASD; gray) diagnostic groups are displayed. Lines show model fits for each group, with estimated age-related slopes (βAge) reported for ASD, nonASD, and the overall sample (black dashed line).