Experimental Paradigm, Target Tone ERPs, and Analysis Procedures. (A) Tone Detection Task: On each trial, participants heard a 2-second piece of white noise and then reported whether they had heard a tone. The target tone could appear at 500 ms, 1000 ms, or 1500 ms (indicated by colored boxes) or not at all. Two visual feedback displays followed: the first confirmed response accuracy, and the second revealed the tone location. (B) Trial Proportions and Target Tone Probability: The target tone was present in 75% of trials (‘tone trials’); the remaining trials contained only white noise (‘no-tone trials’). In tone trials, the target tone appeared at one of the three temporal locations, each with a probability of 0.33. (C) ERP Waveforms at Channel Cz: The target tone evoked larger neural responses at later temporal locations. (D) Hypotheses and Analysis Procedures for Sequential Temporal Anticipation: We hypothesized that participants would anticipate the temporal locations within the noise pieces to improve target tone detection, and we expected to observe neural signatures of these anticipatory processes on a sub-second scale. To identify relevant neural components, we used a data-driven method. We first performed a time-frequency decomposition on the EEG signals recorded in no-tone trials to extract neural patterns in phase or power at each EEG frequency. We then applied a fast Fourier transform to the decomposed neural data to derive a modulation spectrum. After surrogate tests, we expected to find significant modulation clusters corresponding to the timescale of the temporal locations. We then filtered the EEG data according to each cluster’s spectral range for further analysis.
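The second step of the procedure in (D), taking a Fourier transform of a power time course to obtain a modulation spectrum, can be illustrated with a minimal sketch. It is not the authors' pipeline: the power envelope is synthetic, the 50 Hz sampling rate and the 2 Hz modulation (one cycle per 500 ms spacing between tone locations) are assumptions chosen for illustration, and a plain discrete Fourier transform stands in for an optimized FFT routine.

```python
import cmath
import math


def dft_magnitude(x, fs):
    """Return (frequencies, magnitudes) of the one-sided DFT of a real sequence."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the mean so the DC bin does not dominate
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        freqs.append(k * fs / n)
        mags.append(abs(coeff) / n)
    return freqs, mags


# Hypothetical power envelope: 2 s sampled at 50 Hz, modulated at 2 Hz.
fs = 50.0
n_samples = 100
power = [1.0 + 0.5 * math.cos(2 * math.pi * 2.0 * i / fs) for i in range(n_samples)]

freqs, mags = dft_magnitude(power, fs)
peak_freq = freqs[max(range(len(mags)), key=mags.__getitem__)]
print(peak_freq)  # modulation frequency with the largest magnitude
```

In a full analysis this transform would be repeated for the power time course at every EEG frequency, giving one row of the modulation spectrum per power frequency.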

Exploring Power Modulation Spectrum (PMS) and Power Dynamics. (A) Spectrogram of induced power from all no-tone trials. (B) Raw PMS (left plot) and significant clusters (right plot). The analysis was conducted on channel Cz because previous research shows that anticipation-related neural signals can be largely captured at Cz (see Methods). The y-axis shows the frequency of induced power (power frequency), and the x-axis shows the modulation frequency. In the right plot, colored bins indicate significant modulation bins with high modulation strength, as determined by a surrogate test. The four dashed boxes highlight the selected power and modulation frequency ranges for the clusters. Modulation frequencies above the Trend bound denote rhythmic neural components, while those below the Trend bound represent ramping neural components. Modulation bins below the Sampling bound violate the Nyquist theorem and are therefore not meaningful. (C) Topographies of significant modulation clusters, with green dots indicating significant EEG channels. (D) Dynamics of significant clusters: the induced power was filtered according to the clusters selected in (B) to show the power dynamics of each cluster. (E) Merged clusters. Based on the power-power coupling between each pair of clusters (Fig. S2) and the overlaps in power frequency, we grouped Clusters 2, 3, and 4 into one cluster, Cluster 234. Cluster 1 and Cluster 234 collectively capture both the local temporal locations and the global elapsed time of the noise piece. (F) Corrected cluster 234, derived by removing the baseline of Cluster 234 from -1 s to 0 s. (G) Gaussian fits to each neural peak of Corrected cluster 234, providing peak latencies (mean of the Gaussian fits) and peak widths (standard deviation). (H) Topography of modulation strength of Corrected cluster 234 from 0.4 s to 1.9 s post noise onset. (I) Peak magnitude within one standard deviation (SD) of the Gaussian fits. Each gray dot represents individual participant data. Shaded areas of color and error bars signify ±1 standard error of the mean across participants. * denotes p < 0.05.
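The Gaussian fits in (G) and the peak magnitudes in (I) can be sketched as follows. This is a hedged illustration rather than the authors' fitting code: the optimizer they used is not stated here, so a simple grid search with a closed-form amplitude stands in for it, the data are synthetic, and taking the mean of the samples within one SD as the "peak magnitude" is an assumption.

```python
import math


def gaussian(t, amp, mu, sigma):
    return amp * math.exp(-((t - mu) ** 2) / (2 * sigma ** 2))


def fit_gaussian_grid(times, values, mus, sigmas):
    """Least-squares Gaussian fit over a grid of candidate means and SDs.

    For each (mu, sigma) pair the best amplitude has a closed form, so only
    the two shape parameters need to be searched.
    """
    best = None
    for mu in mus:
        for sigma in sigmas:
            g = [gaussian(t, 1.0, mu, sigma) for t in times]
            amp = sum(v * gi for v, gi in zip(values, g)) / sum(gi * gi for gi in g)
            sse = sum((v - amp * gi) ** 2 for v, gi in zip(values, g))
            if best is None or sse < best[0]:
                best = (sse, amp, mu, sigma)
    return best[1], best[2], best[3]


def magnitude_within_one_sd(times, values, mu, sigma):
    """Mean of the samples that lie within one SD of the fitted peak latency."""
    window = [v for t, v in zip(times, values) if abs(t - mu) <= sigma]
    return sum(window) / len(window)


# Synthetic neural peak: amplitude 2, latency 0.5 s, width 0.05 s.
times = [i * 0.01 for i in range(100)]
values = [gaussian(t, 2.0, 0.5, 0.05) for t in times]

candidate_mus = [0.40 + 0.01 * i for i in range(21)]     # 0.40 s to 0.60 s
candidate_sigmas = [0.02 + 0.01 * i for i in range(9)]   # 0.02 s to 0.10 s
amp, latency, width = fit_gaussian_grid(times, values, candidate_mus, candidate_sigmas)
magnitude = magnitude_within_one_sd(times, values, latency, width)
```

The fitted mean plays the role of the peak latency and the fitted SD the role of the peak width, matching the definitions in (G).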

Analyzing Power Modulation Correlation with Tone Detection Performance and Neural Power Precession. (A) Power dynamics of Corrected cluster 234 in each block, with the colored vertical line representing the group mean of reaction time per block. (B) Peak magnitudes within one standard deviation (SD) of the Gaussian fits. A significant main effect is observed for Temporal location (p < 0.05) but not for Block number (p > 0.05). (C) Correlation between d-prime values and neural peak magnitudes. Peak magnitudes of Corrected cluster 234, derived from no-tone trials, positively correlate with tone detection performance. Each dot represents individual participant data. (D) Peak latency differences of Corrected cluster 234 between blocks. Filled circles denote peak latencies derived from fitting Gaussian curves to Corrected cluster 234, with temporal locations coded by color. Gray square areas indicate the 95% confidence interval of peak latencies from permutation tests across blocks. This interval represents a ‘Uniform’ bound, that is, the null hypothesis that peak latencies at a temporal location do not differ significantly across blocks. If a peak latency falls outside the ‘Uniform’ bound, it differs significantly from the other peak latencies. (E) Peak latency distribution. Three distributions, one per temporal location, were derived from permutation tests. (F) Distribution of shortened latency per temporal location. A set of peak latencies was randomly selected from the distributions in (E), and a line was fitted to the selected peak latencies with the temporal location (1, 2, and 3) as the independent variable. Repeating this procedure generated a distribution of slopes representing shortened latency per temporal location. 97.5% of the slope values fall below 0, suggesting a significant decrease in peak latency with the temporal location. The shaded areas of color in (A) and the error bars in (B) represent ±1 standard error of the mean across participants. * denotes p < 0.05.
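The slope analysis in (F) can be sketched as a resampling loop: draw one latency per temporal location, fit a line against location index, and repeat. The sketch below is an illustration under stated assumptions, not the authors' code. The latency samples are synthetic, latencies are assumed to be expressed relative to each location's nominal time, and the function names are hypothetical.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def slope_distribution(latency_samples, n_draws, rng):
    """Draw one latency per location, fit a line, and collect the slopes."""
    locations = [1, 2, 3]
    slopes = []
    for _ in range(n_draws):
        drawn = [rng.choice(samples) for samples in latency_samples]
        slopes.append(slope(locations, drawn))
    return slopes


def fraction_below_zero(slopes):
    return sum(s < 0 for s in slopes) / len(slopes)


# Synthetic stand-ins for the three latency distributions (seconds, relative
# to each location's nominal time); the means shrink with location.
rng = random.Random(0)
latency_samples = [
    [rng.gauss(mean, 0.005) for _ in range(500)]
    for mean in (0.06, 0.04, 0.02)
]
slopes = slope_distribution(latency_samples, n_draws=2000, rng=rng)
share_negative = fraction_below_zero(slopes)
```

A share of negative slopes at or above 0.975 corresponds to the criterion quoted in the caption.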

Behavioral and Neural Findings Explained by Task-Optimized Continuous-Time Recurrent Neural Networks (CRNNs). (A) Training procedures for CRNNs. The left plot shows the inputs to the CRNNs, and the right plot outlines the CRNN structure and training procedures. (B) Task performance of trained CRNNs. Five CRNNs were trained at five different external noise levels and tested at seven noise levels. The solid black line represents the group mean of human behavioral performance (Fig. S1C), with dashed black lines indicating ±2 standard errors across participants. Filled gray circles mark the testing noise levels chosen for the corresponding CRNNs whose performance matches human performance. (C) Modulation spectra of the hidden activities of the CRNNs at the corresponding testing noise levels selected in (B). The modulation spectra were derived following the EEG analysis procedures. (D) Hidden activities of two CRNNs derived from 40 no-tone trials. The y-axis specifies whether the units are excitatory (red units) or inhibitory (blue units). (E) Comparison of CRNN hidden activities with neural data. The neural data are the group-mean data shown in Fig. 3A. The CRNN analyzed was trained at the noise level of 0.15 SD and tested at the noise level of 0.2 SD. (F) Training a CRNN with a uniform probabilistic distribution of tone locations. The upper plot illustrates a trial structure with a uniform distribution of tone locations, where a tone can appear at any time from 0.25 s to 1.75 s post noise onset with equal probability. The lower plot shows that the hidden activities of the CRNN trained using the ‘uniform’ trial structure did not exhibit any rhythmic activities. (G) Task performance in the ‘uniform’ trial structure. At the testing noise level of 0.2 SD, the CRNN trained in (F) and the CRNN trained with three temporal locations at the training noise level of 0.15 SD were tested. (H) Hidden activities of CRNNs with perturbed excitation-inhibition (EI) weights. The thick colored lines refer to the EI perturbations that caused the peak latencies of CRNN hidden activities to decrease with the temporal location. The violet line represents the unperturbed CRNN selected from (C), with dashed vertical lines denoting peak latencies of this unperturbed CRNN. (I) Task performance of perturbed CRNNs at the testing noise level of 0.2 SD.
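For readers unfamiliar with continuous-time recurrent networks that separate excitatory and inhibitory units, a minimal sketch of one Euler integration step is given below. The caption does not state the network equations, so everything here is an assumption: the leaky-integrator dynamics, the rectified-linear rates, the sign-constrained weights, and all names and parameter values are illustrative, not the authors' model.

```python
def relu(x):
    return x if x > 0 else 0.0


def make_ei_weights(magnitudes, n_exc):
    """Sign-constrain a weight matrix: columns of the first n_exc units are
    excitatory (non-negative); the remaining columns are inhibitory."""
    n = len(magnitudes)
    return [
        [abs(magnitudes[i][j]) * (1.0 if j < n_exc else -1.0) for j in range(n)]
        for i in range(n)
    ]


def crnn_step(x, weights, inputs, dt, tau):
    """One Euler step of tau * dx/dt = -x + W @ relu(x) + input."""
    rates = [relu(v) for v in x]
    new_x = []
    for i, xi in enumerate(x):
        recurrent = sum(weights[i][j] * rates[j] for j in range(len(x)))
        dx = (-xi + recurrent + inputs[i]) / tau
        new_x.append(xi + dt * dx)
    return new_x


# Two-unit toy network: unit 0 is excitatory, unit 1 is inhibitory.
weights = make_ei_weights([[0.5, 0.5], [0.5, 0.5]], n_exc=1)
state = crnn_step([1.0, 0.0], weights, inputs=[0.0, 0.0], dt=0.01, tau=0.1)
```

Under this kind of parameterization, an EI perturbation such as the one in (H) would amount to rescaling the excitatory or inhibitory columns of the weight matrix.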

Tone Detection Threshold, Condition Probabilities, and ERP Results. Refer to Figures 1 & 2 for related information.

(A) Tone detection thresholds for individual participants. Each dot signifies a participant’s threshold.

(B) Condition probabilities. The left plot illustrates the cumulative probability of the target tone appearing in tone trials over time; the right plot displays the probability of a trial being a tone trial, given that no tone is presented.
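Both probabilities in (B) follow from the trial structure alone and can be worked out exactly. The sketch below assumes that the three tone locations are equally likely (1/3 each, matching the stated 0.33) and that tone trials make up 75% of all trials; the function names are hypothetical.

```python
from fractions import Fraction

P_TONE_TRIAL = Fraction(3, 4)  # 75% of trials contain a tone
N_LOCATIONS = 3                # assumed equally likely within tone trials


def cumulative_tone_probability(k):
    """P(tone has appeared by the k-th location | tone trial)."""
    return Fraction(k, N_LOCATIONS)


def p_tone_trial_given_silence(k):
    """P(trial is a tone trial | no tone heard through the k-th location)."""
    still_possible = P_TONE_TRIAL * (1 - cumulative_tone_probability(k))
    return still_possible / (still_possible + (1 - P_TONE_TRIAL))


for k in range(N_LOCATIONS + 1):
    print(k, cumulative_tone_probability(k), p_tone_trial_given_silence(k))
```

As each location passes in silence, the probability that a tone is still to come falls from 3/4 at noise onset to 2/3, then 1/2, and finally 0 after the third location.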

(C) Behavioral results. From left to right, the plots show d-prime values, criteria, and reaction time. Line color denotes block number. Error bars indicate ±1 standard error of the mean across participants. Tone detection performance was analyzed within a signal detection framework, using a two-way repeated-measures ANOVA (rmANOVA) on the d-prime values with Temporal location and Block number as main factors. A significant main effect was found for Block number (F(2,34) = 5.74, p = .007, ηp² = 0.252), with a significant linear trend indicating decreased tone detection performance with increasing block number (F(1,17) = 9.94, p = .006, ηp² = 0.369). We attribute this effect to participants’ decreased effort or fatigue due to the repetitive trial structure of the tone detection task. Temporal location did not have a significant main effect (F(2,34) = 3.26, p = .051, ηp² = 0.166), but a significant interaction was found between Temporal location and Block number (F(4,68) = 3.80, p = .008, ηp² = 0.183). Further testing of Temporal location in each block (one-way rmANOVA) revealed a significant main effect in the second (F(2,34) = 4.97, p = .020, ηp² = 0.226) and third blocks (F(2,34) = 7.01, p = .009, ηp² = 0.292), but not in the first (F(2,34) = 2.94, p = .066, ηp² = 0.148). False discovery rate (FDR) adjustment was applied to correct for multiple comparisons.
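The partial eta squared values reported alongside each F statistic can be recovered from F and its degrees of freedom through the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The short sketch below applies it to two of the statistics above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Block number: F(2,34) = 5.74
print(round(partial_eta_squared(5.74, 2, 34), 3))  # 0.252
# Linear trend over blocks: F(1,17) = 9.94
print(round(partial_eta_squared(9.94, 1, 17), 3))  # 0.369
```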

The criteria calculated for the three temporal locations shared the same no-tone trials, so the differences between them were dominated by the hit rate in the tone trials. We therefore averaged the hit rates over the three temporal locations in the tone trials and calculated one criterion for each block. No significant main effect was found in a one-way rmANOVA on criteria with Block number as the main factor (F(2,34) = 3.06, p = .056, ηp² = 0.105). A two-way rmANOVA was conducted on the raw reaction time with Trial condition (three temporal locations in the tone trials and one no-tone condition) and Block number as main factors. The main effect of Trial condition was significant (F(3,51) = 28.68, p < .001, ηp² = 0.636); neither the main effect of Block number (F(2,34) = 1.65, p = .207, ηp² = 0.089) nor the interaction (F(6,102) = 0.92, p = .485, ηp² = 0.051) was significant. The significant main effect of Trial condition suggests that early tone detection might give participants more time to prepare for button responses.
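The signal detection measures discussed here follow the standard definitions d' = z(hit rate) − z(false alarm rate) and c = −0.5 · [z(hit rate) + z(false alarm rate)]. The sketch below shows why per-location criteria differ only through the hit rate when they share one false alarm rate, and how a single block criterion follows from the averaged hit rate. The rates are made-up numbers for illustration, and any correction for rates of exactly 0 or 1 (which the inverse normal cannot handle) is left out.

```python
from statistics import NormalDist


def z(p):
    """Inverse of the standard normal CDF; p must lie strictly between 0 and 1."""
    return NormalDist().inv_cdf(p)


def d_prime(hit_rate, false_alarm_rate):
    return z(hit_rate) - z(false_alarm_rate)


def criterion(hit_rate, false_alarm_rate):
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))


# Hypothetical rates for one participant in one block.
hit_rates = [0.70, 0.80, 0.85]   # one per temporal location
false_alarm_rate = 0.20          # shared: all locations use the same no-tone trials

d_primes = [d_prime(h, false_alarm_rate) for h in hit_rates]
mean_hit_rate = sum(hit_rates) / len(hit_rates)
block_criterion = criterion(mean_hit_rate, false_alarm_rate)
```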

(D) ERPs to the target tone averaged across three blocks. Channel Cz was selected for statistical analyses. Two temporal regions showed a significant main effect of Temporal location (p < 0.05): region of interest 1 (ROI 1) from 0.18 s to 0.26 s and ROI 2 from 0.35 s to 0.46 s (left plot). ERP amplitudes within each temporal region were averaged to compare ERP magnitude between temporal locations. In both ROIs, the absolute magnitude of ERPs to the target tone increased significantly with the temporal location: the ERP at the third tone location was larger than at the first and second tone locations, and the ERP at the second tone location was larger than at the first (p < 0.05, FDR corrected).
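The FDR correction mentioned above can be illustrated with the Benjamini-Hochberg step-up procedure. Which FDR procedure was used is not stated here, so Benjamini-Hochberg is an assumption, and the p-values in the example are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the minimum
    # of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```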

Power Coupling Analysis. Refer to Figure 2 for related information. (A) Power of no-tone trials. The plots from left to right show spectrograms of power in blocks 1, 2, and 3, respectively. (B) Topographies of routinely defined power frequency bands: delta (1-3 Hz), theta (4-7 Hz), alpha (8-12 Hz), beta1 (13-20 Hz), and beta2 (21-30 Hz). The power in each frequency band was averaged over three blocks from 0.25 s to 1.75 s following noise onset. (C) Power coupling between the significant clusters depicted in Fig. 2B & D. Phase-locking values (solid black line in each block) were calculated, and the corresponding thresholds at a one-sided alpha level of 0.01 were determined using surrogate tests (thin pink lines). A significant coupling of power dynamics was observed between Clusters 1 and 2 after noise onset. Additional significant couplings were found after noise onset between Clusters 2 and 3, and between Clusters 3 and 4, suggesting that Clusters 2, 3, and 4 collectively contribute to a common neural process.
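The phase-locking value and its surrogate threshold in (C) can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the phase series are synthetic, the surrogates are built by circularly shifting one series (the surrogate scheme actually used is not described here), and the extraction of phase from the filtered power time courses is omitted.

```python
import cmath
import random


def phase_locking_value(phase_a, phase_b):
    """Length of the mean unit vector of the phase differences (0 to 1)."""
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b)]
    return abs(sum(vectors)) / len(vectors)


def surrogate_threshold(phase_a, phase_b, n_surrogates, alpha, rng):
    """One-sided threshold from surrogates that circularly shift phase_b."""
    n = len(phase_b)
    null = []
    for _ in range(n_surrogates):
        shift = rng.randrange(1, n)
        shifted = phase_b[shift:] + phase_b[:shift]
        null.append(phase_locking_value(phase_a, shifted))
    null.sort()
    return null[min(n_surrogates - 1, int((1 - alpha) * n_surrogates))]


# Synthetic example: phase_b follows phase_a with a fixed lag plus jitter.
rng = random.Random(1)
phase_a, current = [], 0.0
for _ in range(300):
    current += rng.gauss(0.2, 1.5)   # irregular phase progression
    phase_a.append(current)
phase_b = [p + 0.3 + rng.gauss(0.0, 0.2) for p in phase_a]

observed = phase_locking_value(phase_a, phase_b)
threshold = surrogate_threshold(phase_a, phase_b, n_surrogates=200, alpha=0.01, rng=rng)
is_coupled = observed > threshold
```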

(A) ITPC Modulation Spectrum. Modulation spectrum analyses were conducted on a measure of neural phase, inter-trial phase coherence (ITPC). However, only one cluster displayed robust modulation strength below the Trend bound. (B) ITPC Cluster Dynamics. The ITPC was filtered according to the cluster defined in (A), and the dynamics of ITPC are shown. The ITPC cluster indicates increased phase coherence at noise onset and offset, without displaying signatures of coding local temporal locations in the white noise or monitoring the elapsed time of the noise. It should be noted that ITPC is closely associated with phase-locked, or event-locked, neural responses that underlie routinely defined ‘evoked power’. The power calculation method we used differs from the ‘evoked power’ calculation method, but the ITPC modulation result here illustrates that the phase-locked responses, or ‘evoked power’, do not capture the process of sequential temporal anticipation. Refer to Methods for more details. The shaded areas of color represent ±1 standard error of the mean over participants.
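ITPC at a given time point and frequency is the length of the average unit phase vector across trials. A minimal sketch of that definition follows; the phase values are toy numbers, and the extraction of single-trial phase from a time-frequency decomposition is assumed to have been done already.

```python
import cmath
import math


def itpc(phases_by_trial):
    """Inter-trial phase coherence per time point.

    phases_by_trial[trial][time] holds the phase (radians) of one frequency.
    Values near 1 mean the phase repeats across trials; values near 0 mean
    the phase is spread around the circle.
    """
    n_trials = len(phases_by_trial)
    n_times = len(phases_by_trial[0])
    return [
        abs(sum(cmath.exp(1j * trial[t]) for trial in phases_by_trial)) / n_trials
        for t in range(n_times)
    ]


# Two toy trials: identical phase at the first time point, opposite at the second.
coherence = itpc([[0.0, 0.0], [0.0, math.pi]])
```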

Analyses of Gaussian Fits to Neural Peaks of Corrected Cluster 234 and Their Correlation with Behavioral Measurements. Refer to Fig. 3 for related information. (A) Gaussian fits were applied to each neural peak of Corrected Cluster 234 in every block, resulting in derived peak latencies and widths (Standard Deviation of the Gaussian fits). (B) Analysis of the correlation between d-prime values and the magnitude of neural peaks per block. The first block showed a negative, non-significant correlation, while the following two blocks demonstrated positive, significant correlations. This suggests that, in the second and third blocks, participants utilized the timing information of temporal locations for tone detection. (C) Examination of the correlation between d-prime values and the magnitude of neural peaks per temporal location revealed all positive correlations. This indicates that the neural coding observed in no-tone trials is indeed relevant to tone detection. (D) Analysis of the correlation between the false alarm rate and the magnitude of neural peaks, after the false alarm rates were transformed using the Fisher-Z transformation. A possible explanation for the emergence of Corrected Cluster 234 could be participants’ false perception of a tone in no-tone trials, triggering neural peaks. However, the mostly negative or near-zero correlations suggest that this explanation, based on the false alarm rate, does not account for the emergence of Corrected Cluster 234.
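The correlation analyses in (B) to (D) can be sketched with a plain Pearson correlation, with the Fisher-Z transformation (the inverse hyperbolic tangent) applied to the false alarm rates as described in (D). The sketch assumes Pearson correlation, which the caption does not name explicitly, and all participant values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fisher_z(value):
    """Fisher-Z transformation; defined for values strictly between -1 and 1."""
    return math.atanh(value)


# Hypothetical per-participant data.
peak_magnitudes = [0.10, 0.25, 0.18, 0.32, 0.05]
false_alarm_rates = [0.20, 0.10, 0.15, 0.08, 0.22]

transformed = [fisher_z(rate) for rate in false_alarm_rates]
r_false_alarm = pearson_r(transformed, peak_magnitudes)
```

A negative or near-zero coefficient here would correspond to the pattern the caption describes for the false alarm account.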