Behavioral results.

A. Total number of image judgments as real (HR) and as fake (HF) for each of the 34 participants. Two subgroups emerged: participants with a preference for pressing real (21 participants; black dots and lines) and participants with a preference for pressing fake (13 participants; gray dots and lines). Distributions of responses of the two subgroups are shown on the sides of the panel (dark gray shaded area, preference real; light gray shaded area, preference fake). B. Mean response times per participant of both groups for real (HR) and fake (HF) image judgments; participants varied in their overall response time and responded faster for their most frequent judgment. Distributions (gray shaded areas) of response times are shown next to individual responses. HF, Human Fake; HR, Human Real.

Grand average event-related potentials (ERPs).

Time series of ERPs at Pz time-locked to the event when the AI classification icon (robot icon) is displayed on screen (time 0 ms) in the mismatch (red) and match (blue) conditions. At −300 ms participants made their judgment (fake or real key press). The match condition contains HRAIR and HFAIF; the mismatch condition contains HRAIF and HFAIR. The mean ERP trace is computed as the 20% trimmed mean of the mean participant-level ERP data; shaded areas denote the 95% Bayesian Highest Density Interval (see text for details). HRAIR, Human-Real & AI-Real; HFAIF, Human-Fake & AI-Fake; HRAIF, Human-Real & AI-Fake; HFAIR, Human-Fake & AI-Real.
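As an illustration of the 20% trimmed mean named in this caption, the following minimal sketch computes a trimmed-mean trace across participants at each time point. It assumes that "20%" means removing 20% of the sorted values from each tail, which the caption does not state; the function names and the toy data are hypothetical and are not taken from the authors' analysis code.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping `proportion` of the sorted values from EACH tail.

    Assumption: symmetric trimming; the caption only says "20% trimmed mean".
    """
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))  # values removed per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def trimmed_mean_trace(participant_erps, proportion=0.2):
    """Apply the trimmed mean independently at every time point.

    participant_erps: list of equal-length lists, one ERP per participant.
    """
    return [trimmed_mean(samples, proportion) for samples in zip(*participant_erps)]


# Toy data: 5 hypothetical participants, 2 time points, one outlier participant.
toy_erps = [[1, 10], [2, 20], [3, 30], [4, 40], [100, -500]]
toy_trace = trimmed_mean_trace(toy_erps)
```

With five participants, one value is dropped from each tail at every time point, so the outlier participant does not pull the trace.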

Mismatched vs. matched AI classification.

A. ANOVA results comparing mismatch and match conditions, shown as a spatiotemporal heat map of significant F-values (red). Channels are sorted by proximity; Pz is marked and examined in panels B and C. Topographic maps are shown at 296, 460, and 700 ms after AI classification display. Electrodes in significant clusters are marked white; non-significant ones in black. Results are corrected using spatiotemporal clustering (α = 5%). B. Mismatch–match ERP time series (beta difference) at Pz, time-locked to AI classification (0 ms). Significant time points (from A) are marked in gray. Topographies at 296, 460, and 700 ms correspond to the N2, P3a, and P3b components. Curves show 20% trimmed means; shaded areas denote 95% Bayesian Highest Density Intervals (HDI). C. Beta time series at Pz for mismatch (red) and match (blue) conditions. Topographies for each condition are shown at the same three latencies, with Pz marked in red (mismatch) and blue (match). Data are shown as 20% trimmed means with 95% Bayesian HDIs.
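For readers unfamiliar with Highest Density Intervals, the sketch below shows one common way to obtain an HDI from a set of posterior samples: take the narrowest window of sorted samples that contains the requested probability mass. The caption does not describe how the posterior samples are generated, so the sketch assumes they are already available; the function name and example values are hypothetical.

```python
import math


def highest_density_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples.

    Assumption: `samples` are draws from the posterior of interest; how they
    are produced is outside the scope of this sketch.
    """
    ordered = sorted(samples)
    n = len(ordered)
    window = math.ceil(mass * n)  # number of samples the interval must cover
    if window < 2 or window > n:
        raise ValueError("need enough samples for the requested mass")
    # Slide a window of fixed sample count and keep the narrowest one.
    start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[start], ordered[start + window - 1]


# Toy example: samples are densest between 10 and 14.
toy_samples = [0, 10, 11, 12, 13, 14, 30, 50, 70, 100]
toy_interval = highest_density_interval(toy_samples, mass=0.5)
```

Unlike an equal-tailed interval, the HDI follows the densest region of the samples, which matters when the distribution is skewed.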

Interaction effects of AI classification (mismatch, match) and human judgment of images (fake, real).

A. Interaction effect shown as a heat map over channels and time points indicating significant F-values. Pz is marked for further description of the beta difference time courses in panels B and C. Topographical distributions are presented at 360 and 484 ms. Electrodes belonging to significant clusters are marked white; non-significant electrodes are marked black. B. Time series of the mismatch-match beta difference wave at channel Pz when the human judged an image as real, with significant time points (as determined by the ANOVA results shown in panel A) marked in gray along the x-axis and topographical distributions at 360 and 484 ms. Beta difference time series are presented as the 20% trimmed mean; shaded areas denote the 95% Bayesian Highest Density Interval. C. Time series of the mismatch-match beta difference wave at channel Pz when the human judged an image as fake, with significant time points marked in gray along the x-axis and topographical distributions at the same latencies. HF, Human Fake; HR, Human Real.

Human judgment of images (fake versus real).

A. ANOVA statistical summary comparing human image judgments as fake or real, shown as a heat map indicating significant F-values in red over channels and time points. Pz is marked for further investigation of the beta time courses in panels B and C. Topographical distributions are presented at −508 ms relative to AI classification. Electrodes belonging to significant clusters are marked white; non-significant electrodes are marked black; results are corrected with spatiotemporal clustering at an alpha level of 5%. B. Time series of the fake-real beta difference for human judgment at Pz, with time points belonging to the significant cluster marked in gray along the x-axis, and topographical distribution at −508 ms. Beta time series are presented as the 20% trimmed mean; shaded areas denote the 95% Bayesian Highest Density Interval. C. Time series of betas at Pz for the human fake (red) and human real (blue) judgment conditions. Beta time series are presented as the 20% trimmed mean; shaded areas denote the 95% Bayesian Highest Density Interval. CPP, centro-parietal positivity; HF, Human Fake; HR, Human Real.

EEG amplitude co-varies with changes in perceived AI reliability.

A. Reliability ratings (in %) across all blocks of the experiment for participant 1 (one rating after each block, including the practice block). B. Difference in reliability ratings between blocks (e.g., reliability rating of block 1 minus rating of the practice block) for participant 1. Positive differences are colored green, negative differences purple. C. Overview of differences in reliability ratings between blocks for all participants. The first row depicts the results of participant 1 and is identical to the data shown in panel B, presented with a color map. The color map quantifies the difference in reliability ratings between blocks. D. ANOVA outcome for covariance between changes in reliability ratings and ERP amplitude over participants. Results are shown as a heat map indicating significant F-values in red over channels and time points. Topographical distributions are presented at −520 ms (before AI classification). In the topoplot, electrodes belonging to significant clusters are marked white; non-significant electrodes are marked black; results are corrected with spatiotemporal clustering at an alpha level of 5%.
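The block-to-block differences in panels B and C can be illustrated with a short sketch. It assumes that each difference is taken between consecutive blocks, as the example in the caption (block 1 minus practice block) suggests; the function names and the ratings are hypothetical, and the handling of a zero difference is an assumption because the caption only names colors for positive and negative values.

```python
def block_differences(ratings):
    """Return rating[i] - rating[i - 1] for each pair of consecutive blocks.

    ratings: reliability ratings in %, practice block first.
    """
    return [later - earlier for earlier, later in zip(ratings, ratings[1:])]


def difference_color(diff):
    """Map a difference to the panel B colors; 'none' for zero is an assumption."""
    if diff > 0:
        return "green"
    if diff < 0:
        return "purple"
    return "none"


# Hypothetical ratings for one participant (not data from the study).
example_ratings = [50, 60, 55, 55, 70]
example_diffs = block_differences(example_ratings)
example_colors = [difference_color(d) for d in example_diffs]
```

A participant with n ratings yields n - 1 differences, which would correspond to one row of the overview in panel C.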

Regression results of human response bias for judging images as fake or real and mismatch-match ERP amplitude.

A. F-statistic results from linear regression on the AI-classification-locked betas, testing for a relationship between human bias for judging images as real or fake and the ERP (beta) amplitude of the mismatch-match difference wave. Results are shown as a heat map indicating significant F-values in red over channels and time points. Channels Fz and Pz are highlighted for further investigation in panel B. Topographical distributions are presented at four latencies (120 ms, 548 ms, 636 ms and 836 ms). Electrodes belonging to significant clusters are marked white; non-significant electrodes are marked black. Results are corrected with spatiotemporal clustering at an alpha level of 5%. B. Correlations between bias for judging an image as fake and the amplitude of the mismatch-match beta difference at channel Fz (left) and Pz (right) at 636 ms. Gray dots represent participants; the red line shows the regression across participants and the red shaded area the 95% CI. X-axis: values > 0 indicate a bias for fake, < 0 for real. Gray shaded areas at the edge of panels show the distribution of the response bias across participants (x-axis direction) and the distribution of the mismatch-match beta amplitude (y-axis direction).

Overview of the experimental design.

Trial structure (top), block structure (middle), experimental procedure (bottom). The trial structure represents one example in which participants judged an image as real and the AI classification matched their judgment; see text for details on fake and mismatch conditions. The trial types are depicted top-right and are as follows: HRAIR, Human-Real & AI-Real; HFAIF, Human-Fake & AI-Fake; HRAIF, Human-Real & AI-Fake; HFAIR, Human-Fake & AI-Real.
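The four trial-type labels follow directly from the human judgment and the AI classification, and the match and mismatch conditions follow from whether the two agree. The sketch below makes that mapping explicit; the function names and the string values "real" and "fake" are hypothetical conventions chosen for illustration.

```python
def trial_type(human_judgment, ai_classification):
    """Build the trial-type label, e.g. ('real', 'fake') -> 'HRAIF'."""
    codes = {"real": "R", "fake": "F"}
    return f"H{codes[human_judgment]}AI{codes[ai_classification]}"


def condition(human_judgment, ai_classification):
    """'match' when the AI classification agrees with the human judgment."""
    return "match" if human_judgment == ai_classification else "mismatch"


# Enumerate all four combinations of judgment and classification.
all_trials = {
    trial_type(h, a): condition(h, a)
    for h in ("real", "fake")
    for a in ("real", "fake")
}
```

Enumerating the combinations gives HRAIR and HFAIF as match trials and HRAIF and HFAIR as mismatch trials, in line with the definitions in the caption.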