(a) The cross-correlations of the original speech signal with the fundamental waveform (red) and with its Hilbert transform (blue), as well as the resulting amplitude (black), show a peak at 0 ms and no phase shift. The processing of the acoustic signal accordingly does not change the latency or phase of that signal. (b) Computing the cross-correlation of the fundamental waveform with the neural recording involved processing of the neural signal, for example by filtering. However, the cross-correlation between the recorded neural signal and its filtered version peaks at zero latency. The processing of the neural signal therefore did not alter the latency. (c) When the earphones are placed close to the ears but not inside the ear canal, so that the subject cannot hear the speech signal, the cross-correlations of the recorded neural signal with the fundamental waveform of speech (red) and with its Hilbert transform (blue) do not yield a measurable peak. The amplitude of the resulting complex correlation function (black) does not peak either, demonstrating the absence of a stimulus artifact. (d) When a subject listened to a speech signal and then to the same signal with reversed polarity, and the average of the neural recordings over both stimulus presentations was used for the analysis, the complex cross-correlation showed the same structure as when it was computed from the neural response to one stimulus only. This shows the absence of a stimulus artifact as well as of the cochlear microphonic in the measured response. (e) Putative cortical contributions to the neural response would occur at latencies above 15 ms, and likely between 50 and 500 ms. The complex correlation shows, however, no significant peak at those latencies. To enable comparison, all recordings were obtained from the same subject for whom we report the exemplary recording in Figure 1c.
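
The complex cross-correlation referred to in the panels can be illustrated with a minimal sketch. It assumes NumPy and SciPy and uses a purely synthetic signal; the function name `complex_cross_correlation`, the sampling rate, the delay, and the noise level are illustrative assumptions and not the processing pipeline used for the recordings. The real part is the correlation with the waveform (red), the imaginary part is the correlation with its Hilbert transform (blue), and the magnitude is the amplitude (black).

```python
import numpy as np
from scipy.signal import hilbert, correlate, correlation_lags


def complex_cross_correlation(response, waveform, fs):
    """Return (lags in ms, complex cross-correlation).

    Real part: correlation of the response with the waveform.
    Imaginary part: correlation of the response with the waveform's
    Hilbert transform. A positive lag means the response trails the waveform.
    """
    analytic = hilbert(waveform)  # waveform + 1j * Hilbert transform
    real_part = correlate(response, analytic.real, mode="full")
    imag_part = correlate(response, analytic.imag, mode="full")
    lags = correlation_lags(len(response), len(waveform), mode="full")
    return lags / fs * 1e3, real_part + 1j * imag_part


# Synthetic check (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
fs = 1000                # sampling rate in Hz
delay_samples = 9        # imposed delay, i.e. 9 ms at this sampling rate
waveform = rng.standard_normal(20000)

# Response: delayed copy of the waveform plus independent noise.
response = np.concatenate([np.zeros(delay_samples), waveform[:-delay_samples]])
response = response + 0.5 * rng.standard_normal(response.size)

lags_ms, corr = complex_cross_correlation(response, waveform, fs)
peak = np.argmax(np.abs(corr))
peak_latency_ms = lags_ms[peak]        # latency of the amplitude peak
peak_phase = np.angle(corr[peak])      # phase at the peak, in radians
```

In this sketch the latency is read off the peak of the amplitude and the phase off the angle of the complex value at that peak, so a pure delay appears as a shifted peak with a phase near zero.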