Reconstructing voice identity from noninvasive auditory cortex recordings

  1. Charly Lamothe (corresponding author)
  2. Etienne Thoret
  3. Régis Trapeau
  4. Bruno L Giordano
  5. Julien Sein
  6. Sylvain Takerkart
  7. Stephane Ayache
  8. Thierry Artieres (corresponding author)
  9. Pascal Belin (corresponding author)
  1. La Timone Neuroscience Institute UMR 7289, CNRS, Aix-Marseille University, France
  2. Laboratoire d’Informatique et Systèmes UMR 7020, CNRS, Aix-Marseille University, France
  3. Perception, Representation, Image, Sound, Music UMR 7061, CNRS, France
  4. Institute of Language Communication & the Brain, France
  5. Centre IRM-INT@CERIMED, France
  6. École Centrale de Marseille, France
5 figures, 17 tables and 3 additional files

Figures

Figure 1 with 1 supplement
DNN-derived Voice Latent Space (VLS).

(a) Variational autoencoder (VAE) architecture. Two networks learned complementary tasks: an encoder was trained using 182,000 voice samples to compress their spectrograms into a 128-dimensional representation, the voice latent space (VLS), while a decoder learned the reverse mapping. The network was trained end-to-end by minimizing the difference between the original and reconstructed spectrograms. (b) Distribution of the 405 speaker identities along the first 2 principal components of the VLS coordinates from all sounds, averaged by speaker identity. Each disk represents a speaker’s identity colored by gender. PC2 largely maps onto voice gender (ANOVAs on the first two components: PC1: F(1, 405)=0.10, p=0.74; PC2: F(1, 405)=11.00, p<0.001). Large disks represent the average of all male (black) or female (gray) speaker coordinates, with their associated reconstructed spectrograms (note the flat fundamental frequency (f0) and formant frequency contours caused by averaging). The bottom of the spectrograms illustrates an interpolation between stimuli of two different speaker identities: spectrograms at the extremes correspond to two original stimuli (A, B) and their VLS-reconstructed spectrograms (A’, B’). Intermediary spectrograms were reconstructed from linearly interpolated coordinates between those two points in the VLS (red line) (Audio file 1). (c, d, e) Performance of linear classifiers at categorizing speaker gender (chance level: 50%), age (young/adult, chance level: 50%), or identity (119 identities, chance level: 0.84%) based on VLS or Linear model (LIN) coordinates. Error bars indicate the standard error of the mean (s.e.m.) across the five cross-validation folds. All p<0.05. The horizontal black dashed lines indicate chance levels. *: p<0.05.

Figure 1—figure supplement 1
Projections of the DNN-derived Voice Latent Space (VLS).

Distribution of the 405 speaker identities along the first 2 principal components of the VLS coordinates from all sounds, averaged by speaker identity. Each disk represents a speaker's identity colored by either gender (as in Figure 1b), age, or language. (a) Large disks represent the average of all male (black) or female (gray) speaker coordinates. ANOVAs on the first two components: PC1: F(1, 405)=0.10, p=0.74; PC2: F(1, 405)=11.00, p<0.001. (b) Same for speaker age. ANOVAs on the first two components: PC1: F(1, 405)=4.12, p<0.01; PC2: F(1, 405)=3.99, p<0.01. (c) Same for speaker language. ANOVAs on the first two components: PC1: F(1, 405)=8.46, p<0.0001; PC2: F(1, 405)=6.09, p<0.0001.

Figure 2 with 3 supplements
Predicting brain activity from the VLS.

(a) Linear brain activity prediction from VLS for ~135 speaker identities in the different ROIs. We first fit a GLM to predict the Blood Oxygenation Level-Dependent (BOLD) responses to each voice speaker identity. Then, using the trained encoder, we computed the average VLS coordinates of the voice stimuli presented to the participants based on speaker identity. Finally, we trained a linear voxel-based encoding model to predict the speaker voxel activity maps from the speaker VLS coordinates. The cube illustrates the linear relationship between the fMRI responses to speaker identity and the VLS coordinates. The left face of the cube represents the activity of the voxels for each speaker’s identity, with each line corresponding to one speaker. The right face displays the VLS coordinates for each speaker’s identity. The cube’s top face shows the encoding model’s weight vectors. (b) Encoding results. For each region of interest, the model’s performance was assessed using the Pearson correlation score between the true and the predicted responses of each voxel on the held-out speaker identities. Pearson’s correlation coefficients were computed for each voxel on the speakers’ axis and then averaged across hemispheres and participants. Similar predictions were tested with the LIN features. Error bars indicate the standard error of the mean (s.e.m.) across voxels. *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001. (c) Venn diagrams of the number of voxels in each ROI with the LIN, the VLS, or both models. For each ROI and each voxel, we checked whether the test correlation was higher than the median of all participant correlations (intersection circle), and if not, which model (LIN or VLS) yielded the highest correlation (left or right circles).
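The voxel-based encoding step can be sketched as follows. This is a hedged illustration on random stand-in data: it uses ridge-regularized least squares (the exact estimator and regularization used in the paper are not specified here) followed by the per-voxel Pearson score on held-out speakers described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spk, n_dim, n_vox = 135, 128, 500

# Stand-in data: voxel responses are a noisy linear function of VLS coordinates
Z = rng.normal(size=(n_spk, n_dim))                    # coordinates per speaker
W_true = rng.normal(size=(n_dim, n_vox)) / np.sqrt(n_dim)
Y = Z @ W_true + 0.5 * rng.normal(size=(n_spk, n_vox))  # speaker activity maps

train, test = np.arange(100), np.arange(100, n_spk)     # held-out speakers

# Ridge-regularized linear encoding model (lambda is an arbitrary choice here)
lam = 10.0
A = Z[train].T @ Z[train] + lam * np.eye(n_dim)
W = np.linalg.solve(A, Z[train].T @ Y[train])
Y_hat = Z[test] @ W

# Per-voxel Pearson correlation between true and predicted held-out responses
Yc = Y[test] - Y[test].mean(0)
Pc = Y_hat - Y_hat.mean(0)
r = (Yc * Pc).sum(0) / np.sqrt((Yc ** 2).sum(0) * (Pc ** 2).sum(0))
print(r.mean())
```

Averaging `r` across voxels (and, in the paper, across hemispheres and participants) gives the bar heights shown in panel (b).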

Figure 2—figure supplement 1
Brain activity in response to voice measured by fMRI.

(a) A GLM is used to model fMRI activity in response to each speaker's identity. (b) The fMRI activity in response to each speaker's identity is mapped into dedicated voxel maps by contrasting the speaker's identity against silence, resulting in ~135 voxel maps. (c) The voice-sensitive ROIs used for subsequent analyses, identified in each participant via an independent Voice Localizer: the anterior, middle, and posterior Temporal Voice Areas (TVAs). (d) The Primary Auditory Cortex (A1) is defined as the intersection between a probabilistic map of Heschl’s gyri and the sound vs silence contrast map.

Figure 2—figure supplement 2
Denoising of the fMRI BOLD responses.

A general linear model (GLM) was fit to regress out the noise by predicting Y from a ‘denoising’ design matrix Xd composed of R = 38 nuisance regressors: 6 head motion parameters (three variables for the translations, three for the rotations); 18 ‘RETROICOR’ regressors (Glover et al., 2000) computed using the TAPAS PhysIO package (Kasper et al., 2017) with the hyperparameters set as specified in Snoek et al., 2021; 13 regressors modeling slow artifactual trends (sines and cosines, cut-off frequency of the high-pass filter = 0.01 Hz); and an intercept. The design matrix was convolved with a hemodynamic response function (HRF) with a peak at 6 s and an undershoot at 16 s (Glover, 1999). We denote the convolved design matrix Xd ∈ ℝ^(S×R), where S is the number of scans. The ‘denoise’ GLM’s parameters βd ∈ ℝ^(R×V), where V is the number of voxels, were optimized to minimize the amplitude of the residual: βd = argmin_{β ∈ ℝ^(R×V)} ‖Y − Xd β‖². The denoised BOLD signal Yd was then obtained from the original signal as Yd = Y − Xd βd.
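The denoising GLM amounts to ordinary least squares followed by residualization. A minimal numpy sketch, with random stand-ins for Y and the convolved design matrix Xd:

```python
import numpy as np

rng = np.random.default_rng(0)
S, R, V = 200, 38, 1000          # scans, nuisance regressors, voxels

Xd = rng.normal(size=(S, R))     # stand-in HRF-convolved nuisance design matrix
Y = rng.normal(size=(S, V))      # stand-in BOLD time series

# beta_d = argmin_beta ||Y - Xd beta||^2  (ordinary least squares)
beta_d, *_ = np.linalg.lstsq(Xd, Y, rcond=None)

# Denoised signal: remove the fitted nuisance contribution
Yd = Y - Xd @ beta_d

# Sanity check: the residuals are orthogonal to the nuisance regressors
print(np.abs(Xd.T @ Yd).max())
```

The printed value is numerically zero, which is the defining property of the least-squares residual: no remaining component of Yd can be explained by Xd.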

Figure 2—figure supplement 3
Extended predicted brain encoding results with state-of-the-art models.

(a) Encoding results with Wav2Vec and HuBERT. For each region of interest, the model's performance was assessed using the Pearson correlation score between the true and the predicted responses of each voxel on the held-out speaker identities. Pearson’s correlation coefficients were computed for each voxel on the speakers’ axis and then averaged across hemispheres and participants. Error bars indicate the standard error of the mean (s.e.m.) across voxels. *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001. (b) Same results grouped by model. (c) Venn diagrams of the number of voxels for each ROI and triplet of models. For each ROI and each voxel, we checked whether the test correlation was higher than the median of all participant correlations (intersection circle), and if not, which model yielded the highest correlation (left or right circles).

Figure 3
The VLS better explains representational geometry for voice identities in the TVAs than the linear model.

(a) Representational dissimilarity matrices (RDMs) of pairwise speaker dissimilarities for ∼135 identities (arranged by gender, sidebars), according to LIN and VLS. (b) Spearman correlation coefficients between the brain RDMs for A1, the 3 TVAs, and the 2 model RDMs. Error bars indicate the standard error of the mean (s.e.m.) across brain-model correlations. (c) Example of brain-model RDM correlation in the TVAs. The VLS RDM and the brain RDM yielding one of the highest correlations (LaTVA) are shown in the inset.
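The RSA computation in this figure, building a representational dissimilarity matrix per feature space and then correlating the ranked RDMs, can be sketched as follows. The feature matrices below are synthetic stand-ins, not the paper's data:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_spk = 50

# Stand-in per-speaker features for a "model" space and a "brain" space,
# constructed so the two share some geometry
model_feat = rng.normal(size=(n_spk, 128))
brain_feat = (model_feat @ rng.normal(size=(128, 300))) * 0.1 \
             + rng.normal(size=(n_spk, 300))

# RDMs as condensed vectors of pairwise speaker distances
rdm_model = pdist(model_feat, metric="euclidean")
rdm_brain = pdist(brain_feat, metric="euclidean")

# Second-order (Spearman, i.e. rank-based) correlation between the two RDMs
rho, p = spearmanr(rdm_model, rdm_brain)
print(rho)
```

Spearman correlation is the conventional choice for comparing RDMs because it only assumes a monotonic, not linear, relation between the two dissimilarity structures.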

Figure 4
Reconstructing voice identity from brain recordings.

(a) A linear voxel-based decoding model was used to predict the VLS coordinates of 18 Test Stimuli based on fMRI responses to ~12,000 Train stimuli in the different ROIs. To reconstruct the audio stimuli from the brain recordings, the predicted VLS coordinates were then fed to the trained decoder to yield reconstructed spectrograms, synthesized into sound waveforms using the Griffin-Lim phase reconstruction algorithm (Griffin and Lim, 1983). (b) Reconstructed spectrograms of the stimuli presented to the participants. The left panels show the spectrogram of example original stimuli reconstructed from the VLS, and the right panels show brain-reconstructed spectrograms via LIN or VLS autoencoder-based representations, and SPEC, direct regression from the audio spectrograms (Audio file 2).
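Griffin-Lim phase reconstruction, used above to turn predicted magnitude spectrograms into waveforms, can be sketched with scipy's STFT routines. This is a generic illustration; the window and overlap values are arbitrary choices, not the paper's settings:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=256, noverlap=192, seed=0):
    """Estimate a waveform whose STFT magnitude matches `mag` by
    alternating between the time and time-frequency domains."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
        _, _, S = stft(x, nperseg=nperseg, noverlap=noverlap)
        f = min(S.shape[1], phase.shape[1])  # guard against off-by-one frames
        phase[:, :f] = np.exp(1j * np.angle(S[:, :f]))  # keep only the phase
    _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
    return x

# Demo: magnitude spectrogram of a pure tone, then phase-free resynthesis
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
_, _, S = stft(x, nperseg=256, noverlap=192)
recon = griffin_lim(np.abs(S))
print(recon.shape)
```

Each iteration projects onto the set of signals with the target magnitude and onto the set of consistent STFTs, which is the core idea of the Griffin-Lim algorithm.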

Figure 5
Behavioral and machine classification of the reconstructed stimuli.

(a, b, c) Decoding voice identity information in brain-reconstructed spectrograms. Performance of linear classifiers at categorizing speaker gender (chance level: 50%), age (chance level: 50%), and identity (17 identities, chance level: 5.88%). Error bars indicate s.e.m. across 40 random classifier initializations per ROI (2 hemispheres x 20 seeds). The horizontal black dashed line indicates the chance level. The blue and yellow dashed lines indicate the LIN and VLS ceiling levels, respectively. *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001. (d, e, f) Listener performance at categorizing speaker gender (chance level: 50%) and age (chance level: 50%), and at identity discrimination (two-alternative forced-choice task, chance level: 50%) in the brain-reconstructed stimuli. Error bars indicate s.e.m. across participant scores. The horizontal black dashed line indicates the chance level, while the red, blue, and yellow dashed lines indicate the ceiling levels for the original, LIN-reconstructed, and VLS-reconstructed stimuli, respectively. *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001. (g) Perceptual ratings of voice naturalness in the brain-reconstructed stimuli, as assessed by human listeners, between 0 and 100 (zoomed between 5 and 80). *p<0.05; ****p<0.0001.

Tables

Appendix 1—table 1
Architecture of the VAE network.

The architecture of the VAE consists of 15 layers with an intermediate hidden representation of 128 neurons, which constitutes the VLS. The Encoder network (Enc; 7 layers) learns to map an input s (a spectrogram of a sound) onto the (128-dimensional) VLS representation z, while the Decoder (Dec; 7 layers) aims at reconstructing the spectrogram s from z. The learning objective of the full model is to make the output spectrogram Dec(Enc(s)) as close as possible to the original one, s. The model was trained until convergence (approximately 1000 epochs). A hyperparameter search was conducted to determine a suitable learning rate. BN: batch normalization; FC: fully connected; ReLU: Rectified Linear Unit.

Name | Layer | # Filters | Filter size | Stride | Activation
Encoder | Conv2D+BN2D | 64 | 6x3 | 2x2 | ReLU
Encoder | Conv2D+BN2D | 128 | 6x2 | 2x2 | ReLU
Encoder | Conv2D+BN2D | 256 | 6x2 | 2x1 | ReLU
Encoder | Conv2D+BN2D | 512 | 6x2 | 2x1 | ReLU
Encoder | Conv2D | 7 | 6x2 | 1x1 | -
Bottleneck | FC | 256 | - | - | -
Decoder | ConvTrans2D+BN2D | 512 | 27x3 | 1x1 | ReLU
Decoder | ConvTrans2D+BN2D | 256 | 4x2 | 2x1 | ReLU
Decoder | ConvTrans2D+BN2D | 128 | 4x2 | 2x1 | ReLU
Decoder | ConvTrans2D+BN2D | 64 | 4x2 | 2x2 | ReLU
Decoder | ConvTrans2D | 1 | 4x2 | 2x2 | -
Batch size | 64
Loss function | MSE + KL divergence
Optimizer | Adam, learning rate = 0.00005, betas = (0.5, 0.999)
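The "MSE + KL divergence" objective listed above is the standard VAE loss for a diagonal Gaussian posterior. A hedged numpy sketch of the reparameterization trick and both loss terms (shapes and scales are illustrative stand-ins, not the trained model's):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, latent_dim = 64, 128

# Stand-ins for the encoder outputs: posterior mean and log-variance per sample
mu = rng.normal(scale=0.5, size=(batch, latent_dim))
logvar = rng.normal(scale=0.1, size=(batch, latent_dim))

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
# so gradients can flow through the sampling step
eps = rng.normal(size=(batch, latent_dim))
z = mu + np.exp(0.5 * logvar) * eps

# KL divergence between N(mu, sigma^2) and the N(0, I) prior, per sample
kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1)

# Stand-ins for original and decoded spectrograms give the MSE term
s = rng.normal(size=(batch, 128, 128))
s_hat = s + 0.1 * rng.normal(size=s.shape)
mse = np.mean((s - s_hat) ** 2, axis=(1, 2))

loss = np.mean(mse + kl)
print(loss > 0)
```

The KL term pulls the latent codes toward the standard normal prior, which is what makes linear interpolation between two speakers' VLS coordinates (Figure 1b) produce plausible intermediate spectrograms.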
Appendix 1—table 2
Comparing the performance of the human listeners at discriminating speaker identity-related information by ROI.

This table reports the significance of the A1-TVAs difference in the speaker identity categorization and discrimination performance. Two-sample t-tests were conducted between the scores of human listeners at discriminating the speaker gender (2 classes), age (2 classes), and identity (17 classes) of the 18 Test Stimuli that were reconstructed from the VLS features with those from LIN features. s.e.m.=standard error of the mean. Here are reported the results of the statistical tests, t-value, degree of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen-d), Bayes Factor (BF10), and statistical power (power) for each speaker identity information and ROI.

Subject | ROI | Correlation | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power
s1 | LA1 | 0.13 ± 0.15 | 0.03 | 4.78E+00 | 32 | 1.91E-05 | **** | [0.08, inf] | 8.30E-01 | 1.22E+03 | 1.00
s1 | RA1 | 0.21 ± 0.14 | 0.03 | 8.08E+00 | 32 | 1.57E-09 | **** | [0.16, inf] | 1.41E+00 | 7.74E+06 | 1.00
s1 | LmTVA | 0.32 ± 0.13 | 0.02 | 1.34E+01 | 32 | 5.25E-15 | **** | [0.28, inf] | 2.34E+00 | 1.27E+12 | 1.00
s1 | RmTVA | 0.16 ± 0.07 | 0.01 | 1.11E+01 | 26 | 1.21E-11 | **** | [0.13, inf] | 2.13E+00 | 7.53E+08 | 1.00
s1 | LpTVA | 0.07 ± 0.13 | 0.02 | 3.15E+00 | 32 | 1.76E-03 | ** | [0.03, inf] | 5.50E-01 | 2.14E+01 | 0.92
s1 | RpTVA | 0.04 ± 0.08 | 0.02 | 2.56E+00 | 31 | 7.82E-03 | ** | [0.01, inf] | 4.50E-01 | 6.05E+00 | 0.80
s1 | LaTVA | 0.27 ± 0.15 | 0.03 | 1.00E+01 | 30 | 2.30E-11 | **** | [0.23, inf] | 1.80E+00 | 4.20E+08 | 1.00
s1 | RaTVA | 0.11 ± 0.10 | 0.02 | 5.26E+00 | 25 | 9.42E-06 | **** | [0.07, inf] | 1.03E+00 | 2.42E+03 | 1.00
s1 | A1 | 0.17 ± 0.15 | 0.02 | 8.80E+00 | 65 | 5.58E-13 | **** | [0.14, inf] | 1.08E+00 | 1.48E+10 | 1.00
s1 | mTVA | 0.25 ± 0.14 | 0.02 | 1.38E+01 | 59 | 1.71E-20 | **** | [0.22, inf] | 1.79E+00 | 2.85E+17 | 1.00
s1 | pTVA | 0.06 ± 0.11 | 0.01 | 4.02E+00 | 64 | 7.84E-05 | **** | [0.03, inf] | 5.00E-01 | 2.81E+02 | 0.99
s1 | aTVA | 0.20 ± 0.15 | 0.02 | 9.63E+00 | 56 | 8.92E-14 | **** | [0.16, inf] | 1.28E+00 | 8.76E+10 | 1.00
s1 | TVAs | 0.16 ± 0.16 | 0.01 | 1.39E+01 | 181 | 8.43E-31 | **** | [0.14, inf] | 1.03E+00 | 3.76E+27 | 1.00
s2 | LA1 | 0.04 ± 0.11 | 0.02 | 2.16E+00 | 32 | 1.94E-02 | * | [0.01, inf] | 3.80E-01 | 2.83E+00 | 0.68
s2 | RA1 | -0.01 ± 0.11 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | LmTVA | -0.02 ± 0.09 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | RmTVA | 0.03 ± 0.11 | 0.02 | 1.17E+00 | 21 | 1.27E-01 | ns | [-0.01, inf] | 2.50E-01 | 8.20E-01 | 0.31
s2 | LpTVA | -0.01 ± 0.10 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | RpTVA | 0.04 ± 0.10 | 0.03 | 1.38E+00 | 16 | 9.37E-02 | ns | [-0.01, inf] | 3.30E-01 | 1.11E+00 | 0.37
s2 | LaTVA | -0.05 ± 0.12 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | RaTVA | 0.03 ± 0.12 | 0.03 | 1.18E+00 | 19 | 1.26E-01 | ns | [-0.01, inf] | 2.60E-01 | 8.56E-01 | 0.31
s2 | A1 | 0.02 ± 0.11 | 0.01 | 1.19E+00 | 65 | 1.19E-01 | ns | [-0.01, inf] | 1.50E-01 | 5.31E-01 | 0.32
s2 | mTVA | 0.00 ± 0.10 | 0.02 | 5.00E-02 | 46 | 4.81E-01 | ns | [-0.02, inf] | 1.00E-02 | 3.17E-01 | 0.06
s2 | pTVA | 0.01 ± 0.10 | 0.02 | 5.10E-01 | 45 | 3.07E-01 | ns | [-0.02, inf] | 7.00E-02 | 3.61E-01 | 0.13
s2 | aTVA | -0.02 ± 0.12 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | TVAs | -0.00 ± 0.11 | 0.01 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s3 | LA1 | 0.04 ± 0.08 | 0.01 | 2.89E+00 | 32 | 3.39E-03 | ** | [0.02, inf] | 5.00E-01 | 1.21E+01 | 0.88
s3 | RA1 | 0.03 ± 0.13 | 0.02 | 1.48E+00 | 32 | 7.39E-02 | ns | [-0.00, inf] | 2.60E-01 | 1.01E+00 | 0.42
s3 | LmTVA | 0.04 ± 0.09 | 0.02 | 2.43E+00 | 28 | 1.10E-02 | * | [0.01, inf] | 4.50E-01 | 4.72E+00 | 0.76
s3 | RmTVA | 0.07 ± 0.09 | 0.02 | 4.38E+00 | 28 | 7.48E-05 | **** | [0.04, inf] | 8.10E-01 | 3.64E+02 | 1.00
s3 | LpTVA | 0.03 ± 0.12 | 0.02 | 1.48E+00 | 28 | 7.45E-02 | ns | [-0.00, inf] | 2.80E-01 | 1.06E+00 | 0.42
s3 | RpTVA | 0.04 ± 0.08 | 0.01 | 2.83E+00 | 35 | 3.87E-03 | ** | [0.02, inf] | 4.70E-01 | 1.05E+01 | 0.87
s3 | LaTVA | 0.09 ± 0.13 | 0.03 | 3.15E+00 | 23 | 2.24E-03 | ** | [0.04, inf] | 6.40E-01 | 1.91E+01 | 0.92
s3 | RaTVA | 0.07 ± 0.12 | 0.03 | 2.41E+00 | 17 | 1.38E-02 | * | [0.02, inf] | 5.70E-01 | 4.61E+00 | 0.75
s3 | A1 | 0.04 ± 0.11 | 0.01 | 2.80E+00 | 65 | 3.38E-03 | ** | [0.01, inf] | 3.40E-01 | 9.50E+00 | 0.87
s3 | mTVA | 0.06 ± 0.09 | 0.01 | 4.76E+00 | 57 | 6.83E-06 | **** | [0.04, inf] | 6.30E-01 | 2.76E+03 | 1.00
s3 | pTVA | 0.04 ± 0.10 | 0.01 | 2.92E+00 | 64 | 2.40E-03 | ** | [0.02, inf] | 3.60E-01 | 1.29E+01 | 0.89
s3 | aTVA | 0.08 ± 0.13 | 0.02 | 4.00E+00 | 41 | 1.28E-04 | *** | [0.05, inf] | 6.20E-01 | 2.05E+02 | 0.99
s3 | TVAs | 0.06 ± 0.11 | 0.01 | 6.62E+00 | 164 | 2.46E-10 | **** | [0.04, inf] | 5.20E-01 | 3.49E+07 | 1.00
all | LA1 | 0.07 ± 0.12 | 0.01 | 5.58E+00 | 98 | 1.05E-07 | **** | [0.05, inf] | 5.60E-01 | 1.21E+05 | 1.00
all | RA1 | 0.08 ± 0.16 | 0.02 | 4.82E+00 | 98 | 2.60E-06 | **** | [0.05, inf] | 4.80E-01 | 5.85E+03 | 1.00
all | LmTVA | 0.13 ± 0.19 | 0.02 | 6.37E+00 | 86 | 4.45E-09 | **** | [0.10, inf] | 6.80E-01 | 2.55E+06 | 1.00
all | RmTVA | 0.09 ± 0.10 | 0.01 | 7.55E+00 | 77 | 3.72E-11 | **** | [0.07, inf] | 8.50E-01 | 2.55E+08 | 1.00
all | LpTVA | 0.03 ± 0.12 | 0.01 | 2.66E+00 | 90 | 4.59E-03 | ** | [0.01, inf] | 2.80E-01 | 6.39E+00 | 0.84
all | RpTVA | 0.04 ± 0.09 | 0.01 | 4.01E+00 | 84 | 6.63E-05 | **** | [0.02, inf] | 4.30E-01 | 3.00E+02 | 0.99
all | LaTVA | 0.11 ± 0.19 | 0.02 | 5.07E+00 | 83 | 1.20E-06 | **** | [0.07, inf] | 5.50E-01 | 1.27E+04 | 1.00
all | RaTVA | 0.07 ± 0.12 | 0.01 | 5.01E+00 | 63 | 2.34E-06 | **** | [0.05, inf] | 6.30E-01 | 7.30E+03 | 1.00
all | A1 | 0.07 ± 0.14 | 0.01 | 7.25E+00 | 197 | 4.67E-12 | **** | [0.06, inf] | 5.20E-01 | 1.54E+09 | 1.00
all | mTVA | 0.11 ± 0.16 | 0.01 | 9.12E+00 | 164 | 1.30E-16 | **** | [0.09, inf] | 7.10E-01 | 4.40E+13 | 1.00
all | pTVA | 0.04 ± 0.11 | 0.01 | 4.49E+00 | 175 | 6.53E-06 | **** | [0.02, inf] | 3.40E-01 | 2.00E+03 | 1.00
all | aTVA | 0.09 ± 0.17 | 0.01 | 6.81E+00 | 147 | 1.14E-10 | **** | [0.07, inf] | 5.60E-01 | 7.58E+07 | 1.00
all | TVAs | 0.08 ± 0.15 | 0.01 | 1.18E+01 | 488 | 9.58E-29 | **** | [0.07, inf] | 5.30E-01 | 2.93E+25 | 1.00
Appendix 1—table 3
Assessing the significance of brain encoding performance with LIN features.

This table reports the significance of the brain encoding performance with LIN features. We compared the distribution of Pearson's correlation coefficients to the chance level of 0.0 by conducting one-sample t-tests. Using a linear model, we calculated the correlation between the voxels in the speaker activity maps and the predicted voxels from the LIN features. s.e.m. = standard error of the mean. all = we combined the scores of all participants before computing the test. Here are reported the results of the statistical tests, t-value, degree of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen-d), Bayes Factor (BF10), and statistical power (power) for each participant and ROI.
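The one-sample, one-tailed t-test against the 0.0 chance level (note the one-sided [x, inf] confidence intervals in the table) can be reproduced with scipy. The score vector below is a synthetic stand-in for one ROI's per-voxel correlations:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)

# Stand-in for per-voxel encoding scores in one ROI (Pearson r values)
scores = 0.08 + 0.11 * rng.normal(size=66)

# One-sample, one-tailed t-test against the chance level of 0.0
res = ttest_1samp(scores, popmean=0.0, alternative="greater")
dof = len(scores) - 1
cohen_d = scores.mean() / scores.std(ddof=1)  # effect size as in the tables
print(res.pvalue, dof, round(float(cohen_d), 2))
```

With mean 0.08 and s.d. 0.11 over 66 voxels, the test comes out clearly significant, matching the order of magnitude of several "A1"-level rows in the tables.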

Subject | ROI | Correlation | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power
s1 | LA1 | 0.03 ± 0.11 | 0.02 | 1.46E+00 | 32 | 7.71E-02 | ns | [-0.00, inf] | 2.50E-01 | 9.75E-01 | 0.41
s1 | RA1 | 0.13 ± 0.09 | 0.02 | 8.06E+00 | 32 | 1.67E-09 | **** | [0.10, inf] | 1.40E+00 | 7.28E+06 | 1.00
s1 | LmTVA | 0.25 ± 0.16 | 0.03 | 8.95E+00 | 32 | 1.58E-10 | **** | [0.20, inf] | 1.56E+00 | 6.77E+07 | 1.00
s1 | RmTVA | 0.08 ± 0.09 | 0.02 | 4.89E+00 | 26 | 2.24E-05 | **** | [0.05, inf] | 9.40E-01 | 1.09E+03 | 1.00
s1 | LpTVA | -0.03 ± 0.12 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s1 | RpTVA | -0.06 ± 0.11 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s1 | LaTVA | 0.15 ± 0.16 | 0.03 | 5.34E+00 | 30 | 4.43E-06 | **** | [0.10, inf] | 9.60E-01 | 4.70E+03 | 1.00
s1 | RaTVA | 0.03 ± 0.11 | 0.02 | 1.55E+00 | 25 | 6.70E-02 | ns | [-0.00, inf] | 3.00E-01 | 1.19E+00 | 0.44
s1 | A1 | 0.08 ± 0.11 | 0.01 | 5.65E+00 | 65 | 1.93E-07 | **** | [0.06, inf] | 7.00E-01 | 7.57E+04 | 1.00
s1 | mTVA | 0.17 ± 0.15 | 0.02 | 8.69E+00 | 59 | 1.91E-12 | **** | [0.14, inf] | 1.12E+00 | 4.57E+09 | 1.00
s1 | pTVA | -0.04 ± 0.12 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s1 | aTVA | 0.10 ± 0.15 | 0.02 | 4.94E+00 | 56 | 3.68E-06 | **** | [0.07, inf] | 6.50E-01 | 4.93E+03 | 1.00
s1 | TVAs | 0.07 ± 0.17 | 0.01 | 5.79E+00 | 181 | 1.52E-08 | **** | [0.05, inf] | 4.30E-01 | 6.37E+05 | 1.00
s2 | LA1 | 0.04 ± 0.14 | 0.02 | 1.51E+00 | 32 | 7.01E-02 | ns | [-0.00, inf] | 2.60E-01 | 1.05E+00 | 0.43
s2 | RA1 | 0.01 ± 0.12 | 0.02 | 3.60E-01 | 32 | 3.59E-01 | ns | [-0.03, inf] | 6.00E-02 | 3.96E-01 | 0.10
s2 | LmTVA | 0.04 ± 0.07 | 0.01 | 3.07E+00 | 24 | 2.61E-03 | ** | [0.02, inf] | 6.10E-01 | 1.66E+01 | 0.91
s2 | RmTVA | 0.08 ± 0.10 | 0.02 | 3.95E+00 | 21 | 3.64E-04 | *** | [0.05, inf] | 8.40E-01 | 9.46E+01 | 0.99
s2 | LpTVA | -0.01 ± 0.10 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | RpTVA | 0.02 ± 0.13 | 0.03 | 7.30E-01 | 16 | 2.39E-01 | ns | [-0.03, inf] | 1.80E-01 | 6.29E-01 | 0.17
s2 | LaTVA | -0.01 ± 0.08 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | RaTVA | 0.02 ± 0.08 | 0.02 | 1.00E+00 | 19 | 1.64E-01 | ns | [-0.01, inf] | 2.20E-01 | 7.26E-01 | 0.25
s2 | A1 | 0.02 ± 0.13 | 0.02 | 1.38E+00 | 65 | 8.61E-02 | ns | [-0.00, inf] | 1.70E-01 | 6.66E-01 | 0.39
s2 | mTVA | 0.06 ± 0.09 | 0.01 | 4.92E+00 | 46 | 5.72E-06 | **** | [0.04, inf] | 7.20E-01 | 3.43E+03 | 1.00
s2 | pTVA | -0.00 ± 0.11 | 0.02 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
s2 | aTVA | 0.00 ± 0.08 | 0.01 | 4.10E-01 | 48 | 3.43E-01 | ns | [-0.02, inf] | 6.00E-02 | 3.36E-01 | 0.11
s2 | TVAs | 0.02 ± 0.10 | 0.01 | 2.65E+00 | 141 | 4.46E-03 | ** | [0.01, inf] | 2.20E-01 | 5.41E+00 | 0.84
s3 | LA1 | 0.01 ± 0.09 | 0.02 | 3.50E-01 | 32 | 3.66E-01 | ns | [-0.02, inf] | 6.00E-02 | 3.94E-01 | 0.10
s3 | RA1 | 0.03 ± 0.11 | 0.02 | 1.62E+00 | 32 | 5.78E-02 | ns | [-0.00, inf] | 2.80E-01 | 1.21E+00 | 0.48
s3 | LmTVA | 0.05 ± 0.14 | 0.03 | 2.03E+00 | 28 | 2.61E-02 | * | [0.01, inf] | 3.80E-01 | 2.34E+00 | 0.63
s3 | RmTVA | 0.09 ± 0.08 | 0.02 | 5.64E+00 | 28 | 2.41E-06 | **** | [0.06, inf] | 1.05E+00 | 8.29E+03 | 1.00
s3 | LpTVA | 0.00 ± 0.10 | 0.02 | 2.20E-01 | 28 | 4.12E-01 | ns | [-0.03, inf] | 4.00E-02 | 4.04E-01 | 0.08
s3 | RpTVA | 0.01 ± 0.11 | 0.02 | 4.50E-01 | 35 | 3.30E-01 | ns | [-0.02, inf] | 7.00E-02 | 3.93E-01 | 0.11
s3 | LaTVA | 0.04 ± 0.12 | 0.03 | 1.60E+00 | 23 | 6.16E-02 | ns | [-0.00, inf] | 3.30E-01 | 1.31E+00 | 0.46
s3 | RaTVA | 0.11 ± 0.12 | 0.03 | 3.65E+00 | 17 | 9.96E-04 | *** | [0.06, inf] | 8.60E-01 | 4.13E+01 | 0.97
s3 | A1 | 0.02 ± 0.10 | 0.01 | 1.49E+00 | 65 | 7.09E-02 | ns | [-0.00, inf] | 1.80E-01 | 7.69E-01 | 0.43
s3 | mTVA | 0.07 ± 0.11 | 0.02 | 4.68E+00 | 57 | 9.11E-06 | **** | [0.05, inf] | 6.10E-01 | 2.12E+03 | 1.00
s3 | pTVA | 0.01 ± 0.11 | 0.01 | 4.90E-01 | 64 | 3.14E-01 | ns | [-0.02, inf] | 6.00E-02 | 3.05E-01 | 0.12
s3 | aTVA | 0.07 ± 0.13 | 0.02 | 3.53E+00 | 41 | 5.14E-04 | *** | [0.04, inf] | 5.50E-01 | 5.87E+01 | 0.97
s3 | TVAs | 0.05 ± 0.12 | 0.01 | 4.87E+00 | 164 | 1.32E-06 | **** | [0.03, inf] | 3.80E-01 | 9.31E+03 | 1.00
all | LA1 | 0.02 ± 0.11 | 0.01 | 2.04E+00 | 98 | 2.19E-02 | * | [0.00, inf] | 2.10E-01 | 1.62E+00 | 0.65
all | RA1 | 0.06 ± 0.12 | 0.01 | 4.67E+00 | 98 | 4.87E-06 | **** | [0.04, inf] | 4.70E-01 | 3.24E+03 | 1.00
all | LmTVA | 0.12 ± 0.16 | 0.02 | 7.09E+00 | 86 | 1.77E-10 | **** | [0.09, inf] | 7.60E-01 | 5.59E+07 | 1.00
all | RmTVA | 0.09 ± 0.09 | 0.01 | 8.47E+00 | 77 | 6.43E-13 | **** | [0.07, inf] | 9.60E-01 | 1.27E+10 | 1.00
all | LpTVA | -0.01 ± 0.11 | 0.01 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
all | RpTVA | -0.02 ± 0.12 | 0.01 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
all | LaTVA | 0.07 ± 0.14 | 0.02 | 4.23E+00 | 83 | 2.96E-05 | **** | [0.04, inf] | 4.60E-01 | 6.36E+02 | 0.99
all | RaTVA | 0.05 ± 0.11 | 0.01 | 3.57E+00 | 63 | 3.50E-04 | *** | [0.03, inf] | 4.50E-01 | 7.23E+01 | 0.97
all | A1 | 0.04 ± 0.12 | 0.01 | 4.76E+00 | 197 | 1.89E-06 | **** | [0.03, inf] | 3.40E-01 | 6.19E+03 | 1.00
all | mTVA | 0.11 ± 0.13 | 0.01 | 1.01E+01 | 164 | 2.66E-19 | **** | [0.09, inf] | 7.90E-01 | 1.88E+16 | 1.00
all | pTVA | -0.01 ± 0.12 | 0.01 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
all | aTVA | 0.06 ± 0.13 | 0.01 | 5.52E+00 | 147 | 7.61E-08 | **** | [0.04, inf] | 4.50E-01 | 1.46E+05 | 1.00
all | TVAs | 0.05 ± 0.14 | 0.01 | 7.88E+00 | 488 | 1.05E-14 | **** | [0.04, inf] | 3.60E-01 | 4.33E+11 | 1.00
Appendix 1—table 4
Assessing the significance of brain encoding performance with VLS features.

This table reports the significance of the brain encoding performance with VLS features. We compared the distribution of Pearson’s correlation coefficients to the chance level of 0.0 by conducting one-sample t-tests. Using a linear model, we calculated the correlation between the voxels in the speaker activity maps and the predicted voxels from the VLS features. s.e.m.=standard error of the mean. all = we combined the scores of all participants before computing the test. Here are reported the results of the statistical tests, t-value, degree of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen-d), Bayes Factor (BF10), and statistical power (power) for each participant and ROI.

Subject | ROI | Correlation VLS | Correlation LIN | s.e.m. VLS | s.e.m. LIN | T VLS vs LIN | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power
s1 | LA1 | 0.03 ± 0.11 | 0.13 ± 0.15 | 0.02 | 0.03 | -4.43E+00 | 32 | 1.03E-04 | *** | [-0.14, -0.05] | 7.30E-01 | 2.47E+02 | 0.98
s1 | RA1 | 0.13 ± 0.09 | 0.21 ± 0.14 | 0.02 | 0.03 | -3.75E+00 | 32 | 7.07E-04 | *** | [-0.11, -0.03] | 6.00E-01 | 4.39E+01 | 0.92
s1 | LmTVA | 0.25 ± 0.16 | 0.32 ± 0.13 | 0.03 | 0.02 | -3.90E+00 | 32 | 4.61E-04 | *** | [-0.11, -0.03] | 4.80E-01 | 6.43E+01 | 0.76
s1 | RmTVA | 0.08 ± 0.09 | 0.16 ± 0.07 | 0.02 | 0.01 | -5.48E+00 | 26 | 9.54E-06 | **** | [-0.10, -0.05] | 9.20E-01 | 2.24E+03 | 1.00
s1 | LpTVA | -0.03 ± 0.12 | 0.07 ± 0.13 | 0.02 | 0.02 | -6.49E+00 | 32 | 2.68E-07 | **** | [-0.13, -0.07] | 7.60E-01 | 5.95E+04 | 0.99
s1 | RpTVA | -0.06 ± 0.11 | 0.04 ± 0.08 | 0.02 | 0.02 | -5.09E+00 | 31 | 1.67E-05 | **** | [-0.14, -0.06] | 1.01E+00 | 1.31E+03 | 1.00
s1 | LaTVA | 0.15 ± 0.16 | 0.27 ± 0.15 | 0.03 | 0.03 | -7.34E+00 | 30 | 3.55E-08 | **** | [-0.15, -0.09] | 7.70E-01 | 3.95E+05 | 0.99
s1 | RaTVA | 0.03 ± 0.11 | 0.11 ± 0.10 | 0.02 | 0.02 | -4.24E+00 | 25 | 2.65E-04 | *** | [-0.11, -0.04] | 7.10E-01 | 1.11E+02 | 0.93
s1 | A1 | 0.08 ± 0.11 | 0.17 ± 0.15 | 0.01 | 0.02 | -5.81E+00 | 65 | 2.02E-07 | **** | [-0.12, -0.06] | 6.30E-01 | 6.96E+04 | 1.00
s1 | mTVA | 0.17 ± 0.15 | 0.25 ± 0.14 | 0.02 | 0.02 | -6.24E+00 | 59 | 5.16E-08 | **** | [-0.10, -0.05] | 5.00E-01 | 2.58E+05 | 0.97
s1 | pTVA | -0.04 ± 0.12 | 0.06 ± 0.11 | 0.02 | 0.01 | -8.06E+00 | 64 | 2.58E-11 | **** | [-0.12, -0.08] | 8.60E-01 | 3.62E+08 | 1.00
s1 | aTVA | 0.10 ± 0.15 | 0.20 ± 0.15 | 0.02 | 0.02 | -8.11E+00 | 56 | 5.09E-11 | **** | [-0.13, -0.08] | 6.60E-01 | 1.91E+08 | 1.00
s1 | TVAs | 0.07 ± 0.17 | 0.16 ± 0.16 | 0.01 | 0.01 | -1.29E+01 | 181 | 1.85E-27 | **** | [-0.11, -0.08] | 5.60E-01 | 1.89E+24 | 1.00
s2 | LA1 | 0.04 ± 0.14 | 0.04 ± 0.11 | 0.02 | 0.02 | -2.70E-01 | 32 | 7.93E-01 | ns | [-0.03, 0.02] | 3.00E-02 | 1.92E-01 | 0.05
s2 | RA1 | 0.01 ± 0.12 | -0.01 ± 0.11 | 0.02 | 0.02 | 6.20E-01 | 32 | 5.38E-01 | ns | [-0.03, 0.06] | 1.30E-01 | 2.23E-01 | 0.11
s2 | LmTVA | 0.04 ± 0.07 | -0.02 ± 0.09 | 0.01 | 0.02 | 3.52E+00 | 24 | 1.77E-03 | ** | [0.03, 0.11] | 8.10E-01 | 2.11E+01 | 0.97
s2 | RmTVA | 0.08 ± 0.10 | 0.03 ± 0.11 | 0.02 | 0.02 | 3.74E+00 | 21 | 1.22E-03 | ** | [0.02, 0.09] | 5.20E-01 | 3.01E+01 | 0.65
s2 | LpTVA | -0.01 ± 0.10 | -0.01 ± 0.10 | 0.02 | 0.02 | -4.20E-01 | 28 | 6.78E-01 | ns | [-0.04, 0.02] | 6.00E-02 | 2.14E-01 | 0.06
s2 | RpTVA | 0.02 ± 0.13 | 0.04 ± 0.10 | 0.03 | 0.03 | -4.10E-01 | 16 | 6.88E-01 | ns | [-0.07, 0.05] | 1.00E-01 | 2.68E-01 | 0.07
s2 | LaTVA | -0.01 ± 0.08 | -0.05 ± 0.12 | 0.02 | 0.02 | 2.78E+00 | 28 | 9.51E-03 | ** | [0.01, 0.08] | 4.70E-01 | 4.75E+00 | 0.68
s2 | RaTVA | 0.02 ± 0.08 | 0.03 ± 0.12 | 0.02 | 0.03 | -4.30E-01 | 19 | 6.69E-01 | ns | [-0.07, 0.05] | 1.20E-01 | 2.53E-01 | 0.08
s2 | A1 | 0.02 ± 0.13 | 0.02 ± 0.11 | 0.02 | 0.01 | 4.00E-01 | 65 | 6.87E-01 | ns | [-0.02, 0.03] | 5.00E-02 | 1.46E-01 | 0.07
s2 | mTVA | 0.06 ± 0.09 | 0.00 ± 0.10 | 0.01 | 0.02 | 5.06E+00 | 46 | 7.24E-06 | **** | [0.04, 0.09] | 6.40E-01 | 2.62E+03 | 0.99
s2 | pTVA | -0.00 ± 0.11 | 0.01 ± 0.10 | 0.02 | 0.02 | -5.90E-01 | 45 | 5.57E-01 | ns | [-0.04, 0.02] | 8.00E-02 | 1.89E-01 | 0.08
s2 | aTVA | 0.00 ± 0.08 | -0.02 ± 0.12 | 0.01 | 0.02 | 1.50E+00 | 48 | 1.40E-01 | ns | [-0.01, 0.06] | 2.20E-01 | 4.43E-01 | 0.33
s2 | TVAs | 0.02 ± 0.10 | -0.00 ± 0.11 | 0.01 | 0.01 | 3.06E+00 | 141 | 2.64E-03 | ** | [0.01, 0.04] | 2.40E-01 | 8.00E+00 | 0.83
s3 | LA1 | 0.01 ± 0.09 | 0.04 ± 0.08 | 0.02 | 0.01 | -2.32E+00 | 32 | 2.68E-02 | * | [-0.07, -0.00] | 4.20E-01 | 1.91E+00 | 0.64
s3 | RA1 | 0.03 ± 0.11 | 0.03 ± 0.13 | 0.02 | 0.02 | -1.00E-01 | 32 | 9.17E-01 | ns | [-0.04, 0.03] | 1.00E-02 | 1.87E-01 | 0.05
s3 | LmTVA | 0.05 ± 0.14 | 0.04 ± 0.09 | 0.03 | 0.02 | 7.20E-01 | 28 | 4.79E-01 | ns | [-0.02, 0.04] | 8.00E-02 | 2.50E-01 | 0.07
s3 | RmTVA | 0.09 ± 0.08 | 0.07 ± 0.09 | 0.02 | 0.02 | 9.30E-01 | 28 | 3.59E-01 | ns | [-0.02, 0.05] | 1.80E-01 | 2.94E-01 | 0.16
s3 | LpTVA | 0.00 ± 0.10 | 0.03 ± 0.12 | 0.02 | 0.02 | -1.82E+00 | 28 | 7.91E-02 | ns | [-0.06, 0.00] | 2.50E-01 | 8.47E-01 | 0.26
s3 | RpTVA | 0.01 ± 0.11 | 0.04 ± 0.08 | 0.02 | 0.01 | -2.26E+00 | 35 | 3.03E-02 | * | [-0.06, -0.00] | 3.10E-01 | 1.67E+00 | 0.44
s3 | LaTVA | 0.04 ± 0.12 | 0.09 ± 0.13 | 0.03 | 0.03 | -3.71E+00 | 23 | 1.15E-03 | ** | [-0.07, -0.02] | 3.70E-01 | 3.10E+01 | 0.40
s3 | RaTVA | 0.11 ± 0.12 | 0.07 ± 0.12 | 0.03 | 0.03 | 2.79E+00 | 17 | 1.25E-02 | * | [0.01, 0.07] | 3.00E-01 | 4.41E+00 | 0.23
s3 | A1 | 0.02 ± 0.10 | 0.04 ± 0.11 | 0.01 | 0.01 | -1.60E+00 | 65 | 1.14E-01 | ns | [-0.04, 0.00] | 1.80E-01 | 4.55E-01 | 0.29
s3 | mTVA | 0.07 ± 0.11 | 0.06 ± 0.09 | 0.02 | 0.01 | 1.19E+00 | 57 | 2.40E-01 | ns | [-0.01, 0.03] | 1.20E-01 | 2.79E-01 | 0.15
s3 | pTVA | 0.01 ± 0.11 | 0.04 ± 0.10 | 0.01 | 0.01 | -2.92E+00 | 64 | 4.88E-03 | ** | [-0.05, -0.01] | 2.80E-01 | 6.36E+00 | 0.61
s3 | aTVA | 0.07 ± 0.13 | 0.08 ± 0.13 | 0.02 | 0.02 | -9.50E-01 | 41 | 3.49E-01 | ns | [-0.03, 0.01] | 8.00E-02 | 2.54E-01 | 0.08
s3 | TVAs | 0.05 ± 0.12 | 0.06 ± 0.11 | 0.01 | 0.01 | -1.54E+00 | 164 | 1.25E-01 | ns | [-0.02, 0.00] | 9.00E-02 | 2.77E-01 | 0.20
all | LA1 | 0.02 ± 0.11 | 0.07 ± 0.12 | 0.01 | 0.01 | -4.25E+00 | 98 | 4.92E-05 | **** | [-0.07, -0.02] | 3.80E-01 | 3.57E+02 | 0.97
all | RA1 | 0.06 ± 0.12 | 0.08 ± 0.16 | 0.01 | 0.02 | -1.64E+00 | 98 | 1.04E-01 | ns | [-0.04, 0.00] | 1.40E-01 | 4.06E-01 | 0.29
all | LmTVA | 0.12 ± 0.16 | 0.13 ± 0.19 | 0.02 | 0.02 | -4.10E-01 | 86 | 6.80E-01 | ns | [-0.03, 0.02] | 3.00E-02 | 1.29E-01 | 0.06
all | RmTVA | 0.09 ± 0.09 | 0.09 ± 0.10 | 0.01 | 0.01 | -4.00E-01 | 77 | 6.87E-01 | ns | [-0.03, 0.02] | 4.00E-02 | 1.35E-01 | 0.07
all | LpTVA | -0.01 ± 0.11 | 0.03 ± 0.12 | 0.01 | 0.01 | -4.81E+00 | 90 | 5.96E-06 | **** | [-0.07, -0.03] | 4.00E-01 | 2.64E+03 | 0.96
all | RpTVA | -0.02 ± 0.12 | 0.04 ± 0.09 | 0.01 | 0.01 | -4.60E+00 | 84 | 1.51E-05 | **** | [-0.08, -0.03] | 5.10E-01 | 1.13E+03 | 1.00
all | LaTVA | 0.07 ± 0.14 | 0.11 ± 0.19 | 0.02 | 0.02 | -3.42E+00 | 83 | 9.61E-04 | *** | [-0.07, -0.02] | 2.40E-01 | 2.46E+01 | 0.59
all | RaTVA | 0.05 ± 0.11 | 0.07 ± 0.12 | 0.01 | 0.01 | -1.81E+00 | 63 | 7.58E-02 | ns | [-0.05, 0.00] | 2.10E-01 | 6.31E-01 | 0.37
all | A1 | 0.04 ± 0.12 | 0.07 ± 0.14 | 0.01 | 0.01 | -4.02E+00 | 197 | 8.19E-05 | **** | [-0.05, -0.02] | 2.50E-01 | 1.70E+02 | 0.94
all | mTVA | 0.11 ± 0.13 | 0.11 ± 0.16 | 0.01 | 0.01 | -5.80E-01 | 164 | 5.64E-01 | ns | [-0.02, 0.01] | 3.00E-02 | 1.02E-01 | 0.07
all | pTVA | -0.01 ± 0.12 | 0.04 ± 0.11 | 0.01 | 0.01 | -6.65E+00 | 175 | 3.68E-10 | **** | [-0.06, -0.04] | 4.50E-01 | 2.27E+07 | 1.00
all | aTVA | 0.06 ± 0.13 | 0.09 ± 0.17 | 0.01 | 0.01 | -3.79E+00 | 147 | 2.23E-04 | *** | [-0.05, -0.02] | 2.30E-01 | 7.55E+01 | 0.78
all | TVAs | 0.05 ± 0.14 | 0.08 ± 0.15 | 0.01 | 0.01 | -6.28E+00 | 488 | 7.44E-10 | **** | [-0.04, -0.02] | 2.10E-01 | 7.90E+06 | 1.00
Appendix 1—table 5
Comparing the performance of brain encoding models.

This table reports the significance of the VLS-LIN difference in the brain encoding performance. We conducted paired t-tests between the brain encoding model’s scores trained with the VLS features to predict the speaker activity maps' voxels and those trained with the LIN features. s.e.m.=standard error of the mean. all = we combined the scores of all participants before computing the test. Here are reported the results of the statistical tests, t-value, degree of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen-d), Bayes Factor (BF10), and statistical power (power) for each participant and ROI.

Subject | Model | ROI | Correlation ROI | Correlation A1 | s.e.m. ROI | s.e.m. A1 | T ROI vs A1 | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power
s1 | LIN | mTVA | 0.25 ± 0.14 | 0.17 ± 0.15 | 0.02 | 0.02 | 3.07 | 124 | 2.62E-03 | ** | [0.03, 0.13] | 5.50E-01 | 1.25E+01 | 0.86
s1 | LIN | pTVA | 0.06 ± 0.11 | 0.17 ± 0.15 | 0.01 | 0.02 | -4.71 | 129 | 6.34E-06 | **** | [-0.16, -0.06] | 8.20E-01 | 2.60E+03 | 1.00
s1 | LIN | aTVA | 0.20 ± 0.15 | 0.17 ± 0.15 | 0.02 | 0.02 | 1.15 | 121 | 2.53E-01 | ns | [-0.02, 0.09] | 2.10E-01 | 3.50E-01 | 0.21
s1 | LIN | TVAs | 0.16 ± 0.16 | 0.17 ± 0.15 | 0.01 | 0.02 | -0.13 | 246 | 8.93E-01 | ns | [-0.05, 0.04] | 2.00E-02 | 1.57E-01 | 0.05
s1 | VLS | mTVA | 0.17 ± 0.15 | 0.08 ± 0.11 | 0.02 | 0.01 | 3.86 | 124 | 1.81E-04 | *** | [0.05, 0.14] | 6.90E-01 | 1.28E+02 | 0.97
s1 | VLS | pTVA | -0.04 ± 0.12 | 0.08 ± 0.11 | 0.02 | 0.01 | -6.02 | 129 | 1.68E-08 | **** | [-0.17, -0.08] | 1.05E+00 | 6.23E+05 | 1.00
s1 | VLS | aTVA | 0.10 ± 0.15 | 0.08 ± 0.11 | 0.02 | 0.01 | 0.75 | 121 | 4.53E-01 | ns | [-0.03, 0.07] | 1.40E-01 | 2.49E-01 | 0.12
s1 | VLS | TVAs | 0.07 ± 0.17 | 0.08 ± 0.11 | 0.01 | 0.01 | -0.35 | 246 | 7.25E-01 | ns | [-0.05, 0.04] | 5.00E-02 | 1.65E-01 | 0.06
s2 | LIN | mTVA | 0.00 ± 0.10 | 0.02 ± 0.11 | 0.02 | 0.01 | -0.76 | 111 | 4.48E-01 | ns | [-0.06, 0.03] | 1.50E-01 | 2.62E-01 | 0.12
s2 | LIN | pTVA | 0.01 ± 0.10 | 0.02 ± 0.11 | 0.02 | 0.01 | -0.43 | 110 | 6.70E-01 | ns | [-0.05, 0.03] | 8.00E-02 | 2.21E-01 | 0.07
s2 | LIN | aTVA | -0.02 ± 0.12 | 0.02 ± 0.11 | 0.02 | 0.01 | -1.58 | 113 | 1.16E-01 | ns | [-0.08, 0.01] | 3.00E-01 | 6.15E-01 | 0.35
s2 | LIN | TVAs | -0.00 ± 0.11 | 0.02 ± 0.11 | 0.01 | 0.01 | -1.22 | 206 | 2.22E-01 | ns | [-0.05, 0.01] | 1.80E-01 | 3.24E-01 | 0.23
s2 | VLS | mTVA | 0.06 ± 0.09 | 0.02 ± 0.13 | 0.01 | 0.02 | 1.81 | 111 | 7.29E-02 | ns | [-0.00, 0.08] | 3.50E-01 | 8.70E-01 | 0.43
s2 | VLS | pTVA | -0.00 ± 0.11 | 0.02 ± 0.13 | 0.02 | 0.02 | -0.96 | 110 | 3.41E-01 | ns | [-0.07, 0.02] | 1.80E-01 | 3.06E-01 | 0.16
s2 | VLS | aTVA | 0.00 ± 0.08 | 0.02 ± 0.13 | 0.01 | 0.02 | -0.81 | 113 | 4.20E-01 | ns | [-0.06, 0.03] | 1.50E-01 | 2.69E-01 | 0.13
s2 | VLS | TVAs | 0.02 ± 0.10 | 0.02 ± 0.13 | 0.01 | 0.02 | -0.02 | 206 | 9.87E-01 | ns | [-0.03, 0.03] | 0.00E+00 | 1.62E-01 | 0.05
s3 | LIN | mTVA | 0.06 ± 0.09 | 0.04 ± 0.11 | 0.01 | 0.01 | 1.17 | 122 | 2.43E-01 | ns | [-0.01, 0.06] | 2.10E-01 | 3.57E-01 | 0.21
s3 | LIN | pTVA | 0.04 ± 0.10 | 0.04 ± 0.11 | 0.01 | 0.01 | -0.04 | 129 | 9.71E-01 | ns | [-0.04, 0.03] | 1.00E-02 | 1.87E-01 | 0.05
s3 | LIN | aTVA | 0.08 ± 0.13 | 0.04 ± 0.11 | 0.02 | 0.01 | 1.94 | 106 | 5.55E-02 | ns | [-0.00, 0.09] | 3.80E-01 | 1.09E+00 | 0.48
s3 | LIN | TVAs | 0.06 ± 0.11 | 0.04 ± 0.11 | 0.01 | 0.01 | 1.19 | 229 | 2.35E-01 | ns | [-0.01, 0.05] | 1.70E-01 | 3.06E-01 | 0.22
s3 | VLS | mTVA | 0.07 ± 0.11 | 0.02 ± 0.10 | 0.02 | 0.01 | 2.70 | 122 | 7.97E-03 | ** | [0.01, 0.09] | 4.90E-01 | 4.89E+00 | 0.76
s3 | VLS | pTVA | 0.01 ± 0.11 | 0.02 ± 0.10 | 0.01 | 0.01 | -0.65 | 129 | 5.16E-01 | ns | [-0.05, 0.02] | 1.10E-01 | 2.27E-01 | 0.10
s3 | VLS | aTVA | 0.07 ± 0.13 | 0.02 ± 0.10 | 0.02 | 0.01 | 2.34 | 106 | 2.11E-02 | * | [0.01, 0.10] | 4.60E-01 | 2.32E+00 | 0.64
s3 | VLS | TVAs | 0.05 ± 0.12 | 0.02 ± 0.10 | 0.01 | 0.01 | 1.61 | 229 | 1.08E-01 | ns | [-0.01, 0.06] | 2.30E-01 | 5.30E-01 | 0.36
all | LIN | mTVA | 0.11 ± 0.16 | 0.07 ± 0.14 | 0.01 | 0.01 | 2.36 | 361 | 1.86E-02 | * | [0.01, 0.07] | 2.50E-01 | 1.69E+00 | 0.65
all | LIN | pTVA | 0.04 ± 0.11 | 0.07 ± 0.14 | 0.01 | 0.01 | -2.85 | 372 | 4.57E-03 | ** | [-0.06, -0.01] | 3.00E-01 | 5.59E+00 | 0.81
all | LIN | aTVA | 0.09 ± 0.17 | 0.07 ± 0.14 | 0.01 | 0.01 | 1.20 | 344 | 2.29E-01 | ns | [-0.01, 0.05] | 1.30E-01 | 2.40E-01 | 0.22
all | LIN | TVAs | 0.08 ± 0.15 | 0.07 ± 0.14 | 0.01 | 0.01 | 0.41 | 685 | 6.79E-01 | ns | [-0.02, 0.03] | 3.00E-02 | 1.02E-01 | 0.07
all | VLS | mTVA | 0.11 ± 0.13 | 0.04 ± 0.12 | 0.01 | 0.01 | 4.91 | 361 | 1.40E-06 | **** | [0.04, 0.09] | 5.20E-01 | 9.29E+03 | 1.00
all | VLS | pTVA | -0.01 ± 0.12 | 0.04 ± 0.12 | 0.01 | 0.01 | -4.45 | 372 | 1.13E-05 | **** | [-0.08, -0.03] | 4.60E-01 | 1.31E+03 | 0.99
all | VLS | aTVA | 0.06 ± 0.13 | 0.04 ± 0.12 | 0.01 | 0.01 | 1.41 | 344 | 1.58E-01 | ns | [-0.01, 0.05] | 1.50E-01 | 3.13E-01 | 0.29
all | VLS | TVAs | 0.05 ± 0.14 | 0.04 ± 0.12 | 0.01 | 0.01 | 0.75 | 685 | 4.56E-01 | ns | [-0.01, 0.03] | 6.00E-02 | 1.23E-01 | 0.12
Appendix 1—table 6
Comparing the performance of brain encoding ROIs.

This table reports the significance of the A1-TVAs difference in the brain encoding performance. We conducted two-sample t-tests between the brain encoding model’s scores trained to predict A1 and those trained to predict temporal voice areas. s.e.m.=standard error of the mean. all = we combined the scores of all participants before computing the test. Here are reported the results of the statistical tests, t-value, degree of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen-d), Bayes Factor (BF10), and statistical power (power) for each participant and model.

| Subject | Model | ROI | Correlation | p-unc | p-corr | corr. sig. |
| --- | --- | --- | --- | --- | --- | --- |
| s1 | LIN | LA1 | 0.07 | 1.39E-02 | 1.79E-01 | ns |
| s1 | LIN | RA1 | 0.08 | 4.20E-03 | 1.04E-01 | ns |
| s1 | LIN | LmTVA | 0.08 | 1.92E-02 | 3.80E-01 | ns |
| s1 | LIN | RmTVA | 0.06 | 7.14E-02 | 5.51E-01 | ns |
| s1 | LIN | LpTVA | 0.04 | 2.05E-01 | 6.53E-01 | ns |
| s1 | LIN | RpTVA | 0.03 | 3.16E-01 | 7.66E-01 | ns |
| s1 | LIN | LaTVA | 0.12 | 1.40E-03 | 5.04E-01 | ns |
| s1 | LIN | RaTVA | 0.07 | 4.26E-02 | 6.61E-01 | ns |
| s1 | VLS | LA1 | 0.09 | 6.52E-02 | 6.53E-02 | ns |
| s1 | VLS | RA1 | 0.08 | 7.47E-02 | 7.49E-02 | ns |
| s1 | VLS | LmTVA | 0.11 | 1.28E-01 | 1.28E-01 | ns |
| s1 | VLS | RmTVA | 0.10 | 1.39E-01 | 1.39E-01 | ns |
| s1 | VLS | LpTVA | 0.09 | 1.94E-01 | 1.94E-01 | ns |
| s1 | VLS | RpTVA | 0.11 | 1.18E-01 | 1.18E-01 | ns |
| s1 | VLS | LaTVA | 0.19 | 4.17E-02 | 4.17E-02 | * |
| s1 | VLS | RaTVA | 0.13 | 1.30E-01 | 1.30E-01 | ns |
| s2 | LIN | LA1 | -0.01 | 5.03E-01 | 6.27E-01 | ns |
| s2 | LIN | RA1 | -0.00 | 1.87E-01 | 4.95E-01 | ns |
| s2 | LIN | LmTVA | 0.01 | 4.72E-01 | 7.19E-01 | ns |
| s2 | LIN | RmTVA | -0.01 | 7.98E-01 | 9.03E-01 | ns |
| s2 | LIN | LpTVA | -0.01 | 7.21E-01 | 8.13E-01 | ns |
| s2 | LIN | RpTVA | 0.00 | 4.07E-01 | 6.00E-01 | ns |
| s2 | LIN | LaTVA | -0.02 | 8.62E-01 | 9.22E-01 | ns |
| s2 | LIN | RaTVA | -0.02 | 7.69E-01 | 7.92E-01 | ns |
| s2 | VLS | LA1 | 0.02 | 2.36E-01 | 2.52E-01 | ns |
| s2 | VLS | RA1 | 0.03 | 1.12E-01 | 1.12E-01 | ns |
| s2 | VLS | LmTVA | 0.06 | 2.29E-01 | 2.29E-01 | ns |
| s2 | VLS | RmTVA | -0.01 | 8.59E-01 | 9.26E-01 | ns |
| s2 | VLS | LpTVA | -0.02 | 8.89E-01 | 9.85E-01 | ns |
| s2 | VLS | RpTVA | 0.01 | 4.22E-01 | 4.54E-01 | ns |
| s2 | VLS | LaTVA | 0.03 | 3.37E-01 | 3.38E-01 | ns |
| s2 | VLS | RaTVA | 0.00 | 2.76E-01 | 3.23E-01 | ns |
| s3 | LIN | LA1 | -0.00 | 5.71E-01 | 6.66E-01 | ns |
| s3 | LIN | RA1 | 0.05 | 3.00E-04 | 5.00E-02 | * |
| s3 | LIN | LmTVA | 0.05 | 4.10E-03 | 2.04E-01 | ns |
| s3 | LIN | RmTVA | 0.05 | 2.20E-03 | 1.16E-01 | ns |
| s3 | LIN | LpTVA | 0.05 | 5.80E-03 | 1.73E-01 | ns |
| s3 | LIN | RpTVA | 0.04 | 2.66E-02 | 4.60E-01 | ns |
| s3 | LIN | LaTVA | 0.12 | 0.00E+00 | 7.70E-02 | ns |
| s3 | LIN | RaTVA | 0.03 | 3.35E-02 | 3.26E-01 | ns |
| s3 | VLS | LA1 | 0.02 | 1.78E-01 | 2.10E-01 | ns |
| s3 | VLS | RA1 | 0.07 | 1.42E-02 | 1.42E-02 | * |
| s3 | VLS | LmTVA | 0.11 | 7.20E-03 | 7.20E-03 | ** |
| s3 | VLS | RmTVA | 0.05 | 1.23E-01 | 1.23E-01 | ns |
| s3 | VLS | LpTVA | 0.08 | 5.82E-02 | 5.82E-02 | ns |
| s3 | VLS | RpTVA | 0.13 | 1.56E-02 | 1.56E-02 | * |
| s3 | VLS | LaTVA | 0.23 | 1.00E-04 | 1.00E-04 | **** |
| s3 | VLS | RaTVA | 0.04 | 2.34E-01 | 2.34E-01 | ns |
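The two-sample comparison described in the caption above can be sketched as follows (a minimal illustration using SciPy, not the authors' code; the score arrays are synthetic stand-ins for per-voxel encoding scores, and all numbers are hypothetical):

```python
# Illustrative sketch: comparing brain encoding scores between two ROIs
# with a two-sample t-test, plus a pooled-SD Cohen's d effect size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_a1 = rng.normal(loc=0.04, scale=0.12, size=190)   # hypothetical A1 scores
scores_tva = rng.normal(loc=0.11, scale=0.13, size=190)  # hypothetical TVA scores

t, p = stats.ttest_ind(scores_tva, scores_a1)  # independent two-sample t-test
d = (scores_tva.mean() - scores_a1.mean()) / np.sqrt(
    (scores_tva.var(ddof=1) + scores_a1.var(ddof=1)) / 2  # pooled SD
)
print(f"t={t:.2f}, p={p:.2g}, d={d:.2f}")
```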
Appendix 1—table 7
Assessing the significance of the RSA brain-model correlation.

This table reports the significance of the RSA brain-model performance. The brain-model correlation coefficients were computed between the ranked representational dissimilarity matrices (RDMs). Each correlation was compared to 0 using a ‘maximum statistics’ approach, in which the observed coefficient is compared to a distribution of correlation coefficients drawn from a large number of random permutations of the model RDM’s rows and columns, while controlling for the number of comparisons performed (Methods) (Maris and Oostenveld, 2007), for each participant, model, and ROI.

| Subject | ROI | Correlation VLS | Correlation LIN | p-corr | p-unc | corr. sig. |
| --- | --- | --- | --- | --- | --- | --- |
| s1 | LA1 | 0.09 | 0.07 | 4.45E-01 | 2.99E-01 | ns |
| s1 | RA1 | 0.08 | 0.08 | 8.30E-01 | 5.05E-01 | ns |
| s1 | LmTVA | 0.11 | 0.08 | 4.63E-01 | 4.51E-01 | ns |
| s1 | RmTVA | 0.10 | 0.06 | 3.98E-01 | 3.97E-01 | ns |
| s1 | LpTVA | 0.09 | 0.04 | 2.86E-01 | 2.84E-01 | ns |
| s1 | RpTVA | 0.11 | 0.03 | 1.11E-01 | 1.11E-01 | ns |
| s1 | LaTVA | 0.19 | 0.12 | 3.94E-01 | 3.94E-01 | ns |
| s1 | RaTVA | 0.13 | 0.07 | 3.48E-01 | 3.48E-01 | ns |
| s2 | LA1 | 0.02 | -0.01 | 3.25E-01 | 1.65E-01 | ns |
| s2 | RA1 | 0.03 | -0.00 | 1.58E-01 | 1.41E-01 | ns |
| s2 | LmTVA | 0.06 | 0.01 | 1.78E-01 | 1.72E-01 | ns |
| s2 | RmTVA | -0.01 | -0.01 | 1.00E+00 | 8.15E-01 | ns |
| s2 | LpTVA | -0.02 | -0.01 | 1.00E+00 | 8.72E-01 | ns |
| s2 | RpTVA | 0.01 | 0.00 | 7.13E-01 | 4.47E-01 | ns |
| s2 | LaTVA | 0.03 | -0.02 | 1.20E-01 | 1.19E-01 | ns |
| s2 | RaTVA | 0.00 | -0.02 | 3.94E-01 | 1.13E-01 | ns |
| s3 | LA1 | 0.02 | -0.00 | 3.22E-01 | 1.05E-01 | ns |
| s3 | RA1 | 0.07 | 0.05 | 4.83E-01 | 3.22E-01 | ns |
| s3 | LmTVA | 0.11 | 0.05 | 6.61E-02 | 6.25E-02 | ns |
| s3 | RmTVA | 0.05 | 0.05 | 1.00E+00 | 5.38E-01 | ns |
| s3 | LpTVA | 0.08 | 0.05 | 4.30E-01 | 3.08E-01 | ns |
| s3 | RpTVA | 0.13 | 0.04 | 3.66E-02 | 3.66E-02 | * |
| s3 | LaTVA | 0.23 | 0.12 | 1.75E-02 | 1.75E-02 | * |
| s3 | RaTVA | 0.04 | 0.03 | 7.67E-01 | 6.08E-01 | ns |
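A simplified version of the permutation scheme described in the caption above can be sketched as follows (an illustration only, not the authors' code; the RDMs here are synthetic, and the full ‘maximum statistics’ procedure additionally takes, at each permutation, the maximum statistic across all comparisons to control for multiple tests):

```python
# Illustrative sketch: significance of a brain-model RDM correlation via
# random permutations of the model RDM's rows and columns.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 20  # hypothetical number of stimuli

# Synthetic brain and model RDMs that share some structure
latent = rng.normal(size=(n, 3))
model_rdm = np.linalg.norm(latent[:, None] - latent[None, :], axis=-1)
brain_rdm = model_rdm + rng.normal(scale=2.0, size=(n, n))
brain_rdm = (brain_rdm + brain_rdm.T) / 2  # keep it symmetric

iu = np.triu_indices(n, k=1)  # RDMs are symmetric: compare upper triangles
obs, _ = spearmanr(brain_rdm[iu], model_rdm[iu])

null = []
for _ in range(1000):
    perm = rng.permutation(n)
    shuffled = model_rdm[np.ix_(perm, perm)]  # permute rows and columns together
    r, _ = spearmanr(brain_rdm[iu], shuffled[iu])
    null.append(r)

# One-sided permutation p-value (with the +1 correction)
p = (np.sum(np.array(null) >= obs) + 1) / (len(null) + 1)
print(f"rho={obs:.2f}, p={p:.3f}")
```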
Appendix 1—table 8
Comparing the performance of the RSA models.

This table reports the significance of the RSA brain-model difference. We compared the correlation coefficients between brain RDM and VLS RDM with those from the brain RDM and LIN RDM within participants and hemispheres using one-tailed tests, based on the a priori hypothesis that the VLS models would exhibit greater brain-model correlations than the LIN models (Methods).

| Model | ROI | Accuracy (%) | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LIN | LA1 | 43.33±2.22 | 0.51 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RA1 | 50.83±1.98 | 0.46 | 1.83E+00 | 19 | 4.14E-02 | * | [50.05, inf] | 4.10E-01 | 1.89E+00 | 0.55 |
| LIN | LmTVA | 38.89±0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RmTVA | 61.39±1.21 | 0.28 | 4.10E+01 | 19 | 2.61E-20 | **** | [60.91, inf] | 9.17E+00 | 1.04E+17 | 1.00 |
| LIN | LpTVA | 66.67±0.02 | 0.00 | 4.59E+03 | 19 | 3.32E-59 | **** | [66.66, inf] | 1.03E+03 | 6.88E+33 | 1.00 |
| LIN | RpTVA | 77.50±1.21 | 0.28 | 9.90E+01 | 19 | 1.51E-27 | **** | [77.02, inf] | 2.21E+01 | 7.38E+23 | 1.00 |
| LIN | LaTVA | 44.44±0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RaTVA | 44.44±0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | A1 | 47.08±4.30 | 0.69 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | mTVA | 50.14±11.28 | 1.81 | 8.00E-02 | 39 | 4.70E-01 | ns | [47.09, inf] | 1.00E-02 | 3.42E-01 | 0.06 |
| LIN | pTVA | 72.08±5.48 | 0.88 | 2.51E+01 | 39 | 5.18E-26 | **** | [70.60, inf] | 3.98E+00 | 5.89E+22 | 1.00 |
| LIN | aTVA | 44.44±0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | TVAs | 55.56±13.94 | 1.28 | 4.35E+00 | 119 | 1.47E-05 | **** | [53.44, inf] | 4.00E-01 | 1.08E+03 | 1.00 |
| VLS | LA1 | 61.94±1.98 | 0.46 | 2.62E+01 | 19 | 1.08E-16 | **** | [61.16, inf] | 5.87E+00 | 3.93E+13 | 1.00 |
| VLS | RA1 | 60.28±1.98 | 0.46 | 2.26E+01 | 19 | 1.73E-15 | **** | [59.49, inf] | 5.05E+00 | 2.88E+12 | 1.00 |
| VLS | LmTVA | 55.56±0.02 | 0.00 | 1.53E+03 | 19 | 3.86E-50 | **** | [55.55, inf] | 3.42E+02 | 4.95E+33 | 1.00 |
| VLS | RmTVA | 44.44±0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LpTVA | 66.67±0.02 | 0.00 | 4.59E+03 | 19 | 3.32E-59 | **** | [66.66, inf] | 1.03E+03 | 6.88E+33 | 1.00 |
| VLS | RpTVA | 61.11±0.02 | 0.00 | 3.06E+03 | 19 | 7.36E-56 | **** | [61.10, inf] | 6.85E+02 | 6.53E+33 | 1.00 |
| VLS | LaTVA | 50.83±1.98 | 0.46 | 1.83E+00 | 19 | 4.14E-02 | * | [50.05, inf] | 4.10E-01 | 1.89E+00 | 0.55 |
| VLS | RaTVA | 44.17±1.21 | 0.28 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | A1 | 61.11±2.15 | 0.34 | 3.22E+01 | 39 | 4.92E-30 | **** | [60.53, inf] | 5.10E+00 | 4.78E+26 | 1.00 |
| VLS | mTVA | 50.00±5.56 | 0.89 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | pTVA | 63.89±2.78 | 0.44 | 3.12E+01 | 39 | 1.65E-29 | **** | [63.14, inf] | 4.94E+00 | 1.47E+26 | 1.00 |
| VLS | aTVA | 47.50±3.72 | 0.60 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | TVAs | 53.80±8.33 | 0.76 | 4.97E+00 | 119 | 1.14E-06 | **** | [52.53, inf] | 4.50E-01 | 1.20E+04 | 1.00 |
Appendix 1—table 9
Assessing the significance of speaker gender decoding performance using VLS and LIN models based on voxel activity.

This table reports the significance of the speaker’s gender decoding performance. Linear classifiers were pre-trained to detect speaker gender (2 classes) from either the VLS or the LIN model. The speaker gender of the 18 Test Stimuli (3 participants x 6 stimuli per participant) was classified using either the VLS coordinates or the LIN features with these classifiers. We used one-sample t-tests to compare the mean of the accuracy distribution across 20 random classifier initializations (20 classifiers trained with a different initialization seed) with the chance level of 50%. s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each model and ROI.

| Model | ROI | Accuracy (%) | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LIN | LA1 | 50.42±4.15 | 0.95 | 4.40E-01 | 19 | 3.33E-01 | ns | [48.77, inf] | 1.00E-01 | 5.07E-01 | 0.11 |
| LIN | RA1 | 10.83±3.82 | 0.88 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | LmTVA | 44.17±3.82 | 0.88 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RmTVA | 50.42±6.71 | 1.54 | 2.70E-01 | 19 | 3.95E-01 | ns | [47.76, inf] | 6.00E-02 | 4.80E-01 | 0.08 |
| LIN | LpTVA | 52.50±3.82 | 0.88 | 2.85E+00 | 19 | 5.08E-03 | ** | [50.99, inf] | 6.40E-01 | 1.01E+01 | 0.87 |
| LIN | RpTVA | 56.67±3.33 | 0.76 | 8.72E+00 | 19 | 2.29E-08 | **** | [55.34, inf] | 1.95E+00 | 6.13E+05 | 1.00 |
| LIN | LaTVA | 52.50±3.82 | 0.88 | 2.85E+00 | 19 | 5.08E-03 | ** | [50.99, inf] | 6.40E-01 | 1.01E+01 | 0.87 |
| LIN | RaTVA | 75.42±6.17 | 1.41 | 1.80E+01 | 19 | 1.11E-13 | **** | [72.97, inf] | 4.02E+00 | 5.71E+10 | 1.00 |
| LIN | A1 | 30.62±20.19 | 3.23 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | mTVA | 47.29±6.29 | 1.01 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | pTVA | 54.58±4.15 | 0.66 | 6.90E+00 | 39 | 1.45E-08 | **** | [53.46, inf] | 1.09E+00 | 9.41E+05 | 1.00 |
| LIN | aTVA | 63.96±12.55 | 2.01 | 6.94E+00 | 39 | 1.28E-08 | **** | [60.57, inf] | 1.10E+00 | 1.06E+06 | 1.00 |
| LIN | TVAs | 55.28±10.86 | 1.00 | 5.30E+00 | 119 | 2.69E-07 | **** | [53.63, inf] | 4.80E-01 | 4.69E+04 | 1.00 |
| VLS | LA1 | 66.67±0.02 | 0.00 | 4.59E+03 | 19 | 3.32E-59 | **** | [66.66, inf] | 1.03E+03 | 6.88E+33 | 1.00 |
| VLS | RA1 | 8.33±0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LmTVA | 49.17±12.61 | 2.89 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | RmTVA | 41.67±0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LpTVA | 58.33±0.02 | 0.00 | 2.30E+03 | 19 | 1.74E-53 | **** | [58.33, inf] | 5.14E+02 | 6.07E+33 | 1.00 |
| VLS | RpTVA | 71.67±4.08 | 0.94 | 2.31E+01 | 19 | 1.11E-15 | **** | [70.05, inf] | 5.17E+00 | 4.36E+12 | 1.00 |
| VLS | LaTVA | 56.67±3.33 | 0.76 | 8.72E+00 | 19 | 2.29E-08 | **** | [55.34, inf] | 1.95E+00 | 6.13E+05 | 1.00 |
| VLS | RaTVA | 64.17±3.82 | 0.88 | 1.62E+01 | 19 | 7.29E-13 | **** | [62.65, inf] | 3.62E+00 | 9.69E+09 | 1.00 |
| VLS | A1 | 37.50±29.17 | 4.67 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | mTVA | 45.42±9.67 | 1.55 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | pTVA | 65.00±7.26 | 1.16 | 1.29E+01 | 39 | 6.05E-16 | **** | [63.04, inf] | 2.04E+00 | 1.05E+13 | 1.00 |
| VLS | aTVA | 60.42±5.19 | 0.83 | 1.25E+01 | 39 | 1.46E-15 | **** | [59.02, inf] | 1.98E+00 | 4.51E+12 | 1.00 |
| VLS | TVAs | 56.94±11.30 | 1.04 | 6.70E+00 | 119 | 3.59E-10 | **** | [55.23, inf] | 6.10E-01 | 2.64E+07 | 1.00 |
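The one-sample test against chance used in these decoding analyses can be sketched as follows (a minimal illustration using SciPy, not the authors' code; the accuracy values are synthetic stand-ins for the 20 classifier initializations):

```python
# Illustrative sketch: is mean decoding accuracy above the chance level?
# One-tailed one-sample t-test across classifier initializations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
chance = 50.0                                   # chance level in %, 2-class case
acc = rng.normal(loc=61.1, scale=2.0, size=20)  # hypothetical accuracies (%)

# One-tailed test: accuracy greater than chance
t, p = stats.ttest_1samp(acc, popmean=chance, alternative="greater")
print(f"t={t:.2f}, p={p:.2g}")
```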
Appendix 1—table 10
Assessing the significance of speaker age decoding performance using VLS and LIN models based on voxel activity.

This table reports the significance of the speaker age decoding performance. Linear classifiers were pre-trained to detect speaker age (2 classes) from either the VLS or the LIN model. The speaker age of the 18 Test Stimuli (3 participants x 6 stimuli per participant) was classified using either the VLS or LIN coordinates with these classifiers. We used one-sample t-tests to compare the mean of the accuracy distribution across 20 random classifier initializations (20 classifiers trained with a different initialization seed) with the chance level of 50%. s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each model and ROI.

| Model | ROI | Accuracy (%) | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LIN | LA1 | 0.29±1.28 | 0.29 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RA1 | 18.09±3.26 | 0.75 | 1.63E+01 | 19 | 6.14E-13 | **** | [16.80, inf] | 3.65E+00 | 1.14E+10 | 1.00 |
| LIN | LmTVA | 11.18±4.01 | 0.92 | 5.75E+00 | 19 | 7.61E-06 | **** | [9.59, inf] | 1.29E+00 | 3.01E+03 | 1.00 |
| LIN | RmTVA | 2.35±3.03 | 0.69 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | LpTVA | 12.21±3.39 | 0.78 | 8.13E+00 | 19 | 6.54E-08 | **** | [10.86, inf] | 1.82E+00 | 2.32E+05 | 1.00 |
| LIN | RpTVA | 6.76±4.66 | 1.07 | 8.30E-01 | 19 | 2.10E-01 | ns | [4.92, inf] | 1.80E-01 | 6.29E-01 | 0.20 |
| LIN | LaTVA | 11.47±1.28 | 0.29 | 1.90E+01 | 19 | 4.04E-14 | **** | [10.96, inf] | 4.25E+00 | 1.48E+11 | 1.00 |
| LIN | RaTVA | 7.35±8.29 | 1.90 | 7.70E-01 | 19 | 2.25E-01 | ns | [4.06, inf] | 1.70E-01 | 6.07E-01 | 0.18 |
| LIN | A1 | 9.19±9.24 | 1.48 | 2.24E+00 | 39 | 1.55E-02 | * | [6.70, inf] | 3.50E-01 | 3.15E+00 | 0.71 |
| LIN | mTVA | 6.76±5.67 | 0.91 | 9.70E-01 | 39 | 1.68E-01 | ns | [5.24, inf] | 1.50E-01 | 5.30E-01 | 0.25 |
| LIN | pTVA | 9.49±4.90 | 0.78 | 4.59E+00 | 39 | 2.24E-05 | **** | [8.16, inf] | 7.30E-01 | 1.01E+03 | 1.00 |
| LIN | aTVA | 9.41±6.28 | 1.01 | 3.51E+00 | 39 | 5.75E-04 | *** | [7.72, inf] | 5.50E-01 | 5.39E+01 | 0.96 |
| LIN | TVAs | 8.55±5.78 | 0.53 | 5.04E+00 | 119 | 8.44E-07 | **** | [7.68, inf] | 4.60E-01 | 1.59E+04 | 1.00 |
| VLS | LA1 | 0.15±0.64 | 0.15 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | RA1 | 11.47±5.09 | 1.17 | 4.79E+00 | 19 | 6.37E-05 | **** | [9.45, inf] | 1.07E+00 | 4.49E+02 | 1.00 |
| VLS | LmTVA | 11.47±4.73 | 1.09 | 5.15E+00 | 19 | 2.87E-05 | **** | [9.59, inf] | 1.15E+00 | 9.13E+02 | 1.00 |
| VLS | RmTVA | 0.59±1.50 | 0.34 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LpTVA | 9.71±3.37 | 0.77 | 4.95E+00 | 19 | 4.44E-05 | **** | [8.37, inf] | 1.11E+00 | 6.19E+02 | 1.00 |
| VLS | RpTVA | 22.65±2.10 | 0.48 | 3.48E+01 | 19 | 5.69E-19 | **** | [21.81, inf] | 7.78E+00 | 5.62E+15 | 1.00 |
| VLS | LaTVA | 10.29±4.51 | 1.03 | 4.27E+00 | 19 | 2.09E-04 | *** | [8.51, inf] | 9.50E-01 | 1.57E+02 | 0.99 |
| VLS | RaTVA | 6.18±3.94 | 0.90 | 3.30E-01 | 19 | 3.74E-01 | ns | [4.62, inf] | 7.00E-02 | 4.88E-01 | 0.09 |
| VLS | A1 | 5.81±6.72 | 1.08 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | mTVA | 6.03±6.48 | 1.04 | 1.40E-01 | 39 | 4.44E-01 | ns | [4.28, inf] | 2.00E-02 | 3.44E-01 | 0.07 |
| VLS | pTVA | 16.18±7.05 | 1.13 | 9.12E+00 | 39 | 1.65E-11 | **** | [14.27, inf] | 1.44E+00 | 5.85E+08 | 1.00 |
| VLS | aTVA | 8.24±4.71 | 0.75 | 3.12E+00 | 39 | 1.69E-03 | ** | [6.97, inf] | 4.90E-01 | 2.09E+01 | 0.92 |
| VLS | TVAs | 10.15±7.55 | 0.69 | 6.17E+00 | 119 | 4.96E-09 | **** | [9.00, inf] | 5.60E-01 | 2.12E+06 | 1.00 |
Appendix 1—table 11
Assessing the significance of speaker identity decoding performance using VLS and LIN models based on voxel activity.

This table reports the significance of the speaker decoding performance. Linear classifiers were pre-trained to detect speaker identity (17 classes) from either the VLS or the LIN model. The speaker identity of the 18 Test Stimuli (3 participants x 6 stimuli per participant) was classified using either the VLS or LIN coordinates with these classifiers. We used one-sample t-tests to compare the mean of the accuracy distribution across 20 random classifier initializations (20 classifiers trained with a different initialization seed) with the chance level of 5.88%. s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each model and ROI.

| Category | ROI | Accuracy VLS (%) | Accuracy LIN (%) | s.e.m. VLS | s.e.m. LIN | T VLS vs LIN | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | LA1 | 61.94±1.98 | 43.33±2.22 | 0.46 | 0.51 | 3.06E+01 | 19 | 1.24E-17 | **** | [17.34, 19.88] | 8.61 | 2.94E+14 | 1.00 |
| Gender | RA1 | 60.28±1.98 | 50.83±1.98 | 0.46 | 0.46 | 1.62E+01 | 19 | 1.46E-12 | **** | [8.22, 10.67] | 4.64 | 4.85E+09 | 1.00 |
| Gender | LmTVA | 55.56±0.02 | 38.89±0.02 | 0.00 | 0.00 | INF | 19 | 0.00E+00 | **** | [nan, nan] | 1027.40 | nan | 1.00 |
| Gender | RmTVA | 44.44±0.02 | 61.39±1.21 | 0.00 | 0.28 | -6.10E+01 | 19 | 2.91E-23 | **** | [-17.53, -16.36] | 19.29 | 6.23E+19 | 1.00 |
| Gender | LpTVA | 66.67±0.02 | 66.67±0.02 | 0.00 | 0.00 | nan | 19 | nan | ns | [nan, nan] | 0.00 | nan | 0.05 |
| Gender | RpTVA | 61.11±0.02 | 77.50±1.21 | 0.00 | 0.28 | -5.90E+01 | 19 | 5.47E-23 | **** | [-16.97, -15.81] | 18.66 | 3.43E+19 | 1.00 |
| Gender | LaTVA | 50.83±1.98 | 44.44±0.02 | 0.46 | 0.00 | 1.41E+01 | 19 | 1.65E-11 | **** | [5.44, 7.34] | 4.44 | 4.96E+08 | 1.00 |
| Gender | RaTVA | 44.17±1.21 | 44.44±0.02 | 0.28 | 0.00 | -1.00E+00 | 19 | 3.30E-01 | ns | [-0.86, 0.30] | 0.32 | 3.61E-01 | 0.27 |
| Gender | A1 | 61.11±2.15 | 47.08±4.30 | 0.34 | 0.69 | 1.66E+01 | 39 | 2.67E-19 | **** | [12.32, 15.73] | 4.07 | 1.78E+16 | 1.00 |
| Gender | mTVA | 50.00±5.56 | 50.14±11.28 | 0.89 | 1.81 | -5.00E-02 | 39 | 9.59E-01 | ns | [-5.59, 5.31] | 0.02 | 1.71E-01 | 0.05 |
| Gender | pTVA | 63.89±2.78 | 72.08±5.48 | 0.44 | 0.88 | -6.21E+00 | 39 | 2.64E-07 | **** | [-10.86, -5.53] | 1.86 | 5.92E+04 | 1.00 |
| Gender | aTVA | 47.50±3.72 | 44.44±0.01 | 0.60 | 0.00 | 5.14E+00 | 39 | 8.05E-06 | **** | [1.85, 4.26] | 1.15 | 2.45E+03 | 1.00 |
| Gender | TVAs | 53.80±8.33 | 55.56±13.94 | 0.76 | 1.28 | -1.60E+00 | 119 | 1.12E-01 | ns | [-3.94, 0.42] | 0.15 | 3.49E-01 | 0.38 |
| Age | LA1 | 66.67±0.02 | 50.42±4.15 | 0.00 | 0.95 | 1.71E+01 | 19 | 5.58E-13 | **** | [14.26, 18.24] | 5.40 | 1.20E+10 | 1.00 |
| Age | RA1 | 8.33±0.02 | 10.83±3.82 | 0.00 | 0.88 | -2.85E+00 | 19 | 1.03E-02 | * | [-4.34, -0.66] | 0.90 | 5.02E+00 | 0.97 |
| Age | LmTVA | 49.17±12.61 | 44.17±3.82 | 2.89 | 0.88 | 1.71E+00 | 19 | 1.04E-01 | ns | [-1.12, 11.12] | 0.52 | 7.97E-01 | 0.60 |
| Age | RmTVA | 41.67±0.02 | 50.42±6.71 | 0.00 | 1.54 | -5.69E+00 | 19 | 1.76E-05 | **** | [-11.97, -5.53] | 1.80 | 1.32E+03 | 1.00 |
| Age | LpTVA | 58.33±0.02 | 52.50±3.82 | 0.00 | 0.88 | 6.67E+00 | 19 | 2.24E-06 | **** | [4.00, 7.66] | 2.11 | 8.58E+03 | 1.00 |
| Age | RpTVA | 71.67±4.08 | 56.67±3.33 | 0.94 | 0.76 | 1.16E+01 | 19 | 4.80E-10 | **** | [12.29, 17.71] | 3.92 | 2.12E+07 | 1.00 |
| Age | LaTVA | 56.67±3.33 | 52.50±3.82 | 0.76 | 0.88 | 3.68E+00 | 19 | 1.58E-03 | ** | [1.80, 6.53] | 1.13 | 2.46E+01 | 1.00 |
| Age | RaTVA | 64.17±3.82 | 75.42±6.17 | 0.88 | 1.41 | -6.90E+00 | 19 | 1.40E-06 | **** | [-14.66, -7.84] | 2.14 | 1.31E+04 | 1.00 |
| Age | A1 | 37.50±29.17 | 30.62±20.19 | 4.67 | 3.23 | 4.21E+00 | 39 | 1.43E-04 | *** | [3.58, 10.17] | 0.27 | 1.75E+02 | 0.39 |
| Age | mTVA | 45.42±9.67 | 47.29±6.29 | 1.55 | 1.01 | -9.50E-01 | 39 | 3.47E-01 | ns | [-5.86, 2.11] | 0.23 | 2.60E-01 | 0.29 |
| Age | pTVA | 65.00±7.26 | 54.58±4.15 | 1.16 | 0.66 | 9.78E+00 | 39 | 4.83E-12 | **** | [8.26, 12.57] | 1.74 | 1.83E+09 | 1.00 |
| Age | aTVA | 60.42±5.19 | 63.96±12.55 | 0.83 | 2.01 | -2.25E+00 | 39 | 3.03E-02 | * | [-6.73, -0.35] | 0.36 | 1.61E+00 | 0.61 |
| Age | TVAs | 56.94±11.30 | 55.28±10.86 | 1.04 | 1.00 | 1.56E+00 | 119 | 1.22E-01 | ns | [-0.45, 3.78] | 0.15 | 3.28E-01 | 0.37 |
| Identity | LA1 | 0.15±0.64 | 0.29±1.28 | 0.15 | 0.29 | -4.40E-01 | 19 | 6.66E-01 | ns | [-0.85, 0.56] | 0.14 | 2.53E-01 | 0.09 |
| Identity | RA1 | 11.47±5.09 | 18.09±3.26 | 1.17 | 0.75 | -4.58E+00 | 19 | 2.05E-04 | *** | [-9.64, -3.59] | 1.51 | 1.47E+02 | 1.00 |
| Identity | LmTVA | 11.47±4.73 | 11.18±4.01 | 1.09 | 0.92 | 2.10E-01 | 19 | 8.39E-01 | ns | [-2.70, 3.29] | 0.07 | 2.37E-01 | 0.06 |
| Identity | RmTVA | 0.59±1.50 | 2.35±3.03 | 0.34 | 0.69 | -2.11E+00 | 19 | 4.86E-02 | * | [-3.52, -0.01] | 0.72 | 1.42E+00 | 0.86 |
| Identity | LpTVA | 9.71±3.37 | 12.21±3.39 | 0.77 | 0.78 | -2.43E+00 | 19 | 2.53E-02 | * | [-4.65, -0.35] | 0.72 | 2.39E+00 | 0.86 |
| Identity | RpTVA | 22.65±2.10 | 6.76±4.66 | 0.48 | 1.07 | 1.31E+01 | 19 | 5.99E-11 | **** | [13.34, 18.42] | 4.28 | 1.48E+08 | 1.00 |
| Identity | LaTVA | 10.29±4.51 | 11.47±1.28 | 1.03 | 0.29 | -1.00E+00 | 19 | 3.30E-01 | ns | [-3.64, 1.29] | 0.35 | 3.61E-01 | 0.31 |
| Identity | RaTVA | 6.18±3.94 | 7.35±8.29 | 0.90 | 1.90 | -7.50E-01 | 19 | 4.64E-01 | ns | [-4.47, 2.12] | 0.18 | 2.98E-01 | 0.12 |
| Identity | A1 | 5.81±6.72 | 9.19±9.24 | 1.08 | 1.48 | -3.77E+00 | 39 | 5.40E-04 | *** | [-5.20, -1.57] | 0.41 | 5.30E+01 | 0.72 |
| Identity | mTVA | 6.03±6.48 | 6.76±5.67 | 1.04 | 0.91 | -8.80E-01 | 39 | 3.83E-01 | ns | [-2.42, 0.95] | 0.12 | 2.45E-01 | 0.11 |
| Identity | pTVA | 16.18±7.05 | 9.49±4.90 | 1.13 | 0.78 | 4.01E+00 | 39 | 2.65E-04 | *** | [3.32, 10.07] | 1.09 | 1.00E+02 | 1.00 |
| Identity | aTVA | 8.24±4.71 | 9.41±6.28 | 0.75 | 1.01 | -1.21E+00 | 39 | 2.32E-01 | ns | [-3.14, 0.79] | 0.21 | 3.37E-01 | 0.25 |
| Identity | TVAs | 10.15±7.55 | 8.55±5.78 | 0.69 | 0.53 | 2.07E+00 | 119 | 4.06E-02 | * | [0.07, 3.12] | 0.24 | 7.94E-01 | 0.73 |
Appendix 1—table 12
Comparing the performance of the models decoding speaker identity-related information.

This table reports the significance of the VLS-LIN difference in speaker identity decoding. Paired t-tests were conducted between the mean scores of linear classifiers pre-trained to detect gender (2 classes), age (2 classes), and identity (17 classes) from the VLS features and those of classifiers trained with the LIN features. These scores were obtained by classifying the VLS or LIN coordinates of the 18 Test Stimuli (3 participants x 6 stimuli per participant). s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each type of speaker information and ROI.

| Category | Model | ROI | Accuracy ROI (%) | Accuracy A1 (%) | s.e.m. ROI | s.e.m. A1 | T ROI vs A1 | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | LIN | mTVA | 50.14±11.28 | 47.08±4.30 | 1.81 | 0.69 | 1.58E+00 | 78 | 1.20E-01 | ns | [-0.79, 6.90] | 3.50E-01 | 6.83E-01 | 0.35 |
| Gender | LIN | pTVA | 72.08±5.48 | 47.08±4.30 | 0.88 | 0.69 | 2.24E+01 | 78 | 0.00E+00 | **** | [22.78, 27.22] | 5.01E+00 | 1.30E+32 | 1.00 |
| Gender | LIN | aTVA | 44.44±0.00 | 47.08±4.30 | 0.00 | 0.69 | -3.83E+00 | 78 | 0.00E+00 | *** | [-4.01, -1.27] | 8.60E-01 | 9.78E+01 | 0.97 |
| Gender | LIN | TVAs | 55.56±13.94 | 47.08±4.30 | 1.28 | 0.69 | 3.76E+00 | 158 | 0.00E+00 | *** | [4.02, 12.92] | 6.90E-01 | 9.90E+01 | 0.96 |
| Gender | VLS | mTVA | 50.00±5.56 | 61.11±2.15 | 0.89 | 0.34 | -1.17E+01 | 78 | 0.00E+00 | **** | [-13.01, -9.21] | 2.60E+00 | 3.45E+15 | 1.00 |
| Gender | VLS | pTVA | 63.89±2.78 | 61.11±2.15 | 0.44 | 0.34 | 4.94E+00 | 78 | 0.00E+00 | **** | [1.66, 3.90] | 1.10E+00 | 3.59E+03 | 1.00 |
| Gender | VLS | aTVA | 47.50±3.72 | 61.11±2.15 | 0.60 | 0.34 | -1.98E+01 | 78 | 0.00E+00 | **** | [-14.98, -12.24] | 4.43E+00 | 4.06E+28 | 1.00 |
| Gender | VLS | TVAs | 53.80±8.33 | 61.11±2.15 | 0.76 | 0.34 | -5.46E+00 | 158 | 0.00E+00 | **** | [-9.96, -4.67] | 1.00E+00 | 6.55E+04 | 1.00 |
| Age | LIN | mTVA | 47.29±6.29 | 30.62±20.19 | 1.01 | 3.23 | 4.92E+00 | 78 | 0.00E+00 | **** | [9.93, 23.41] | 1.10E+00 | 3.41E+03 | 1.00 |
| Age | LIN | pTVA | 54.58±4.15 | 30.62±20.19 | 0.66 | 3.23 | 7.26E+00 | 78 | 0.00E+00 | **** | [17.39, 30.53] | 1.62E+00 | 3.02E+07 | 1.00 |
| Age | LIN | aTVA | 63.96±12.55 | 30.62±20.19 | 2.01 | 3.23 | 8.76E+00 | 78 | 0.00E+00 | **** | [25.75, 40.91] | 1.96E+00 | 1.68E+10 | 1.00 |
| Age | LIN | TVAs | 55.28±10.86 | 30.62±20.19 | 1.00 | 3.23 | 9.72E+00 | 158 | 0.00E+00 | **** | [19.65, 29.66] | 1.78E+00 | 4.55E+14 | 1.00 |
| Age | VLS | mTVA | 45.42±9.67 | 37.50±29.17 | 1.55 | 4.67 | 1.61E+00 | 78 | 1.10E-01 | ns | [-1.88, 17.71] | 3.60E-01 | 7.10E-01 | 0.36 |
| Age | VLS | pTVA | 65.00±7.26 | 37.50±29.17 | 1.16 | 4.67 | 5.71E+00 | 78 | 0.00E+00 | **** | [17.92, 37.08] | 1.28E+00 | 6.21E+04 | 1.00 |
| Age | VLS | aTVA | 60.42±5.19 | 37.50±29.17 | 0.83 | 4.67 | 4.83E+00 | 78 | 0.00E+00 | **** | [13.47, 32.36] | 1.08E+00 | 2.48E+03 | 1.00 |
| Age | VLS | TVAs | 56.94±11.30 | 37.50±29.17 | 1.04 | 4.67 | 6.03E+00 | 158 | 0.00E+00 | **** | [13.07, 25.82] | 1.10E+00 | 8.68E+05 | 1.00 |
| Identity | LIN | mTVA | 6.76±5.67 | 9.19±9.24 | 0.91 | 1.48 | -1.40E+00 | 78 | 1.70E-01 | ns | [-5.88, 1.03] | 3.10E-01 | 5.42E-01 | 0.28 |
| Identity | LIN | pTVA | 9.49±4.90 | 9.19±9.24 | 0.78 | 1.48 | 1.80E-01 | 78 | 8.60E-01 | ns | [-3.04, 3.63] | 4.00E-02 | 2.36E-01 | 0.05 |
| Identity | LIN | aTVA | 9.41±6.28 | 9.19±9.24 | 1.01 | 1.48 | 1.20E-01 | 78 | 9.00E-01 | ns | [-3.34, 3.78] | 3.00E-02 | 2.34E-01 | 0.05 |
| Identity | LIN | TVAs | 8.55±5.78 | 9.19±9.24 | 0.53 | 1.48 | -5.10E-01 | 158 | 6.10E-01 | ns | [-3.11, 1.83] | 9.00E-02 | 2.19E-01 | 0.08 |
| Identity | VLS | mTVA | 6.03±6.48 | 5.81±6.72 | 1.04 | 1.08 | 1.50E-01 | 78 | 8.80E-01 | ns | [-2.76, 3.20] | 3.00E-02 | 2.35E-01 | 0.05 |
| Identity | VLS | pTVA | 16.18±7.05 | 5.81±6.72 | 1.13 | 1.08 | 6.65E+00 | 78 | 0.00E+00 | **** | [7.26, 13.47] | 1.49E+00 | 2.43E+06 | 1.00 |
| Identity | VLS | aTVA | 8.24±4.71 | 5.81±6.72 | 0.75 | 1.08 | 1.85E+00 | 78 | 7.00E-02 | ns | [-0.19, 5.04] | 4.10E-01 | 1.01E+00 | 0.45 |
| Identity | VLS | TVAs | 10.15±7.55 | 5.81±6.72 | 0.69 | 1.08 | 3.21E+00 | 158 | 0.00E+00 | ** | [1.67, 7.00] | 5.90E-01 | 1.91E+01 | 0.89 |
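The paired VLS-LIN comparison described above can be sketched as follows (a minimal illustration using SciPy, not the authors' code; the paired accuracy samples are synthetic stand-ins, with a paired-samples Cohen's d computed on the differences):

```python
# Illustrative sketch: paired t-test between VLS-based and LIN-based
# classification accuracies obtained on the same test items.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
acc_lin = rng.normal(loc=43.3, scale=2.2, size=20)            # hypothetical LIN accuracies (%)
acc_vls = acc_lin + rng.normal(loc=18.6, scale=2.0, size=20)  # paired VLS accuracies (%)

t, p = stats.ttest_rel(acc_vls, acc_lin)  # paired-samples t-test
diff = acc_vls - acc_lin
d = diff.mean() / diff.std(ddof=1)        # Cohen's d for paired samples
print(f"t={t:.2f}, p={p:.2g}, d={d:.2f}")
```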
Appendix 1—table 13
Comparing the performance of the models decoding speaker identity-related information by ROI.

This table reports the significance of the A1-TVAs difference in speaker identity decoding. Two-sample t-tests were conducted for each model to determine whether there was an A1-TVAs difference between the mean scores of linear classifiers pre-trained to detect gender (2 classes), age (2 classes), and identity (17 classes). These scores were obtained by classifying the VLS coordinates or LIN features, reconstructed from the different ROIs, for the 18 Test Stimuli (3 participants x 6 stimuli per participant). s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each type of speaker information and model.

| Model | ROI | Accuracy (%) | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LIN | LA1 | 44.87±7.99 | 2.31 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RA1 | 51.28±10.02 | 2.89 | 4.40E-01 | 12 | 3.33E-01 | ns | [46.13, inf] | 1.20E-01 | 6.06E-01 | 0.11 |
| LIN | LmTVA | 51.71±10.76 | 3.11 | 5.50E-01 | 12 | 2.96E-01 | ns | [46.17, inf] | 1.50E-01 | 6.35E-01 | 0.13 |
| LIN | RmTVA | 43.59±8.40 | 2.42 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | LpTVA | 50.00±8.72 | 2.52 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RpTVA | 52.99±10.36 | 2.99 | 1.00E+00 | 12 | 1.69E-01 | ns | [47.66, inf] | 2.80E-01 | 8.49E-01 | 0.24 |
| LIN | LaTVA | 51.28±8.48 | 2.45 | 5.20E-01 | 12 | 3.05E-01 | ns | [46.92, inf] | 1.50E-01 | 6.27E-01 | 0.13 |
| LIN | RaTVA | 45.30±9.71 | 2.80 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | A1 | 48.08±9.62 | 1.92 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | mTVA | 47.65±10.47 | 2.09 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | pTVA | 51.50±9.69 | 1.94 | 7.70E-01 | 25 | 2.24E-01 | ns | [48.18, inf] | 1.50E-01 | 5.43E-01 | 0.19 |
| LIN | aTVA | 48.29±9.59 | 1.92 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | TVAs | 49.15±10.07 | 1.15 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LA1 | 50.00±11.32 | 3.27 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | RA1 | 61.54±6.34 | 1.83 | 6.31E+00 | 12 | 1.96E-05 | **** | [58.28, inf] | 1.75E+00 | 1.31E+03 | 1.00 |
| VLS | LmTVA | 51.71±5.06 | 1.46 | 1.17E+00 | 12 | 1.32E-01 | ns | [49.11, inf] | 3.20E-01 | 9.85E-01 | 0.29 |
| VLS | RmTVA | 45.73±8.76 | 2.53 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LpTVA | 63.25±7.40 | 2.14 | 6.20E+00 | 12 | 2.29E-05 | **** | [59.44, inf] | 1.72E+00 | 1.14E+03 | 1.00 |
| VLS | RpTVA | 60.26±6.48 | 1.87 | 5.48E+00 | 12 | 7.01E-05 | **** | [56.92, inf] | 1.52E+00 | 4.30E+02 | 1.00 |
| VLS | LaTVA | 60.26±6.10 | 1.76 | 5.82E+00 | 12 | 4.10E-05 | **** | [57.12, inf] | 1.61E+00 | 6.86E+02 | 1.00 |
| VLS | RaTVA | 50.00±8.98 | 2.59 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | A1 | 55.77±10.84 | 2.17 | 2.66E+00 | 25 | 6.70E-03 | ** | [52.07, inf] | 5.20E-01 | 7.39E+00 | 0.83 |
| VLS | mTVA | 48.72±7.75 | 1.55 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | pTVA | 61.75±7.12 | 1.42 | 8.26E+00 | 25 | 6.56E-09 | **** | [59.32, inf] | 1.62E+00 | 2.00E+06 | 1.00 |
| VLS | aTVA | 55.13±9.24 | 1.85 | 2.78E+00 | 25 | 5.13E-03 | ** | [51.97, inf] | 5.40E-01 | 9.24E+00 | 0.85 |
| VLS | TVAs | 55.20±9.68 | 1.10 | 4.71E+00 | 77 | 5.29E-06 | **** | [53.36, inf] | 5.30E-01 | 3.23E+03 | 1.00 |
Appendix 1—table 14
Assessing the significance of the speaker gender categorization task.

This table reports the significance of the speaker’s gender categorization performance. A total of 342 voice stimuli were used in the experiments: the original stimuli (N=18), stimuli directly reconstructed using the LIN and the VLS models (N=36), and brain-reconstructed stimuli (18 stimuli x 2 models x 4 regions of interest x 2 hemispheres, N=288). The participants were tasked with identifying the gender of the voice presented in each trial by clicking either the ‘Female’ or ‘Male’ button. To evaluate the accuracy of the binary responses, we computed the classification accuracy for each participant and region of interest (ROI). We then used one-sample t-tests to compare the mean accuracy distribution across all participants to the chance level of 50%. s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each model and ROI.

| Model | ROI | Accuracy (%) | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LIN | LA1 | 46.15±14.48 | 4.18 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RA1 | 44.23±11.50 | 3.32 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | LmTVA | 50.00±9.81 | 2.83 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RmTVA | 57.69±12.85 | 3.71 | 2.07E+00 | 12 | 3.02E-02 | * | [51.08, inf] | 5.80E-01 | 2.80E+00 | 0.62 |
| LIN | LpTVA | 50.00±10.34 | 2.98 | 0.00E+00 | 12 | 5.00E-01 | ns | [44.68, inf] | 0.00E+00 | 5.56E-01 | 0.05 |
| LIN | RpTVA | 50.64±10.57 | 3.05 | 2.10E-01 | 12 | 4.19E-01 | ns | [45.20, inf] | 6.00E-02 | 5.67E-01 | 0.07 |
| LIN | LaTVA | 48.72±13.01 | 3.76 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RaTVA | 62.82±13.32 | 3.85 | 3.33E+00 | 12 | 2.98E-03 | ** | [55.97, inf] | 9.20E-01 | 1.77E+01 | 0.93 |
| LIN | A1 | 45.19±13.11 | 2.62 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | mTVA | 53.85±12.06 | 2.41 | 1.59E+00 | 25 | 6.17E-02 | ns | [49.73, inf] | 3.10E-01 | 1.26E+00 | 0.46 |
| LIN | pTVA | 50.32±10.46 | 2.09 | 1.50E-01 | 25 | 4.40E-01 | ns | [46.75, inf] | 3.00E-02 | 4.19E-01 | 0.07 |
| LIN | aTVA | 55.77±14.94 | 2.99 | 1.93E+00 | 25 | 3.24E-02 | * | [50.67, inf] | 3.80E-01 | 2.06E+00 | 0.59 |
| LIN | TVAs | 53.31±12.82 | 1.46 | 2.27E+00 | 77 | 1.31E-02 | * | [50.88, inf] | 2.60E-01 | 2.77E+00 | 0.73 |
| VLS | LA1 | 54.49±10.65 | 3.07 | 1.46E+00 | 12 | 8.50E-02 | ns | [49.01, inf] | 4.00E-01 | 1.32E+00 | 0.39 |
| VLS | RA1 | 50.00±8.01 | 2.31 | 0.00E+00 | 12 | 5.00E-01 | ns | [45.88, inf] | 0.00E+00 | 5.56E-01 | 0.05 |
| VLS | LmTVA | 51.28±10.26 | 2.96 | 4.30E-01 | 12 | 3.36E-01 | ns | [46.01, inf] | 1.20E-01 | 6.04E-01 | 0.11 |
| VLS | RmTVA | 54.49±10.65 | 3.07 | 1.46E+00 | 12 | 8.50E-02 | ns | [49.01, inf] | 4.00E-01 | 1.32E+00 | 0.39 |
| VLS | LpTVA | 45.51±11.14 | 3.22 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | RpTVA | 56.41±8.74 | 2.52 | 2.54E+00 | 12 | 1.30E-02 | * | [51.91, inf] | 7.00E-01 | 5.38E+00 | 0.77 |
| VLS | LaTVA | 64.74±7.42 | 2.14 | 6.88E+00 | 12 | 8.46E-06 | **** | [60.93, inf] | 1.91E+00 | 2.74E+03 | 1.00 |
| VLS | RaTVA | 61.54±14.81 | 4.28 | 2.70E+00 | 12 | 9.68E-03 | ** | [53.92, inf] | 7.50E-01 | 6.79E+00 | 0.82 |
| VLS | A1 | 52.24±9.68 | 1.94 | 1.16E+00 | 25 | 1.29E-01 | ns | [48.94, inf] | 2.30E-01 | 7.57E-01 | 0.30 |
| VLS | mTVA | 52.88±10.58 | 2.12 | 1.36E+00 | 25 | 9.24E-02 | ns | [49.27, inf] | 2.70E-01 | 9.47E-01 | 0.38 |
| VLS | pTVA | 50.96±11.40 | 2.28 | 4.20E-01 | 25 | 3.38E-01 | ns | [47.07, inf] | 8.00E-02 | 4.50E-01 | 0.11 |
| VLS | aTVA | 63.14±11.82 | 2.36 | 5.56E+00 | 25 | 4.45E-06 | **** | [59.10, inf] | 1.09E+00 | 4.79E+03 | 1.00 |
| VLS | TVAs | 55.66±12.48 | 1.42 | 3.98E+00 | 77 | 7.72E-05 | **** | [53.29, inf] | 4.50E-01 | 2.69E+02 | 0.99 |
Appendix 1—table 15
Assessing the significance of the speaker age categorization task.

This table reports the significance of the speaker age categorization performance. A total of 342 voice stimuli were used in the experiments: the original stimuli (N=18), stimuli directly reconstructed using the LIN and the VLS models (N=36), and brain-reconstructed stimuli (18 stimuli x 2 models x 4 regions of interest x 2 hemispheres, N=288). The participants were tasked with identifying the approximate age of the voice presented in each trial by clicking either the ‘Younger’ or ‘Older’ button. To evaluate the accuracy of the binary responses, we computed the classification accuracy for each participant and region of interest (ROI). We then used one-sample t-tests to compare the mean accuracy distribution across all participants to the chance level of 50%. s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each model and ROI.

| Model | ROI | Accuracy (%) | s.e.m. | T | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LIN | LA1 | 54.70±9.89 | 3.50 | 1.34E+00 | 8 | 1.08E-01 | ns | [48.20, inf] | 4.50E-01 | 1.29E+00 | 0.34 |
| LIN | RA1 | 57.41±8.69 | 3.07 | 2.41E+00 | 8 | 2.12E-02 | * | [51.70, inf] | 8.00E-01 | 4.14E+00 | 0.71 |
| LIN | LmTVA | 57.04±7.77 | 2.75 | 2.56E+00 | 8 | 1.68E-02 | * | [51.93, inf] | 8.50E-01 | 4.94E+00 | 0.76 |
| LIN | RmTVA | 42.36±8.22 | 2.91 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | LpTVA | 57.78±7.70 | 2.72 | 2.86E+00 | 8 | 1.06E-02 | * | [52.72, inf] | 9.50E-01 | 7.04E+00 | 0.83 |
| LIN | RpTVA | 50.98±9.61 | 3.40 | 2.90E-01 | 8 | 3.90E-01 | ns | [44.67, inf] | 1.00E-01 | 6.67E-01 | 0.08 |
| LIN | LaTVA | 40.48±9.52 | 3.37 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | RaTVA | 40.52±7.04 | 2.49 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | A1 | 56.05±9.41 | 2.28 | 2.65E+00 | 17 | 8.36E-03 | ** | [52.09, inf] | 6.30E-01 | 6.92E+00 | 0.82 |
| LIN | mTVA | 49.70±10.85 | 2.63 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | pTVA | 54.38±9.34 | 2.27 | 1.93E+00 | 17 | 3.51E-02 | * | [50.44, inf] | 4.60E-01 | 2.22E+00 | 0.58 |
| LIN | aTVA | 40.50±8.37 | 2.03 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| LIN | TVAs | 48.19±11.18 | 1.54 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LA1 | 72.22±9.16 | 3.24 | 6.86E+00 | 8 | 6.48E-05 | **** | [66.20, inf] | 2.29E+00 | 4.52E+02 | 1.00 |
| VLS | RA1 | 48.33±5.77 | 2.04 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LmTVA | 51.46±7.76 | 2.74 | 5.30E-01 | 8 | 3.04E-01 | ns | [46.36, inf] | 1.80E-01 | 7.25E-01 | 0.12 |
| VLS | RmTVA | 41.11±6.57 | 2.32 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | LpTVA | 60.61±5.67 | 2.00 | 5.29E+00 | 8 | 3.68E-04 | *** | [56.88, inf] | 1.76E+00 | 1.06E+02 | 1.00 |
| VLS | RpTVA | 66.05±6.65 | 2.35 | 6.83E+00 | 8 | 6.70E-05 | **** | [61.68, inf] | 2.28E+00 | 4.40E+02 | 1.00 |
| VLS | LaTVA | 52.02±8.33 | 2.94 | 6.90E-01 | 8 | 2.56E-01 | ns | [46.54, inf] | 2.30E-01 | 7.83E-01 | 0.15 |
| VLS | RaTVA | 50.00±7.53 | 2.66 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | A1 | 60.28±14.19 | 3.44 | 2.99E+00 | 17 | 4.14E-03 | ** | [54.29, inf] | 7.00E-01 | 1.24E+01 | 0.89 |
| VLS | mTVA | 46.29±8.86 | 2.15 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| VLS | pTVA | 63.33±6.75 | 1.64 | 8.14E+00 | 17 | 1.44E-07 | **** | [60.48, inf] | 1.92E+00 | 1.11E+05 | 1.00 |
| VLS | aTVA | 51.01±8.00 | 1.94 | 5.20E-01 | 17 | 3.05E-01 | ns | [47.63, inf] | 1.20E-01 | 5.49E-01 | 0.13 |
| VLS | TVAs | 53.54±10.69 | 1.47 | 2.41E+00 | 53 | 9.69E-03 | ** | [51.08, inf] | 3.30E-01 | 4.15E+00 | 0.77 |
Appendix 1—table 16
Assessing the significance of the speaker identity discrimination task.

This table reports the significance of the speaker identity discrimination performance. The participants listened to 684 voice stimuli with short breaks in between. Each trial contained 2 short sound samples, and the participants had to indicate whether the samples came from the same speaker or from different speakers. We then used one-sample t-tests to compare the mean accuracy distribution across all participants to the chance level of 50%. s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each model and ROI.

| Category | ROI | Accuracy VLS (%) | Accuracy LIN (%) | s.e.m. VLS | s.e.m. LIN | T VLS vs LIN | dof | p-val | unc. sig. | CI95% | cohen-d | BF10 | power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | LA1 | 48.33±10.56 | 46.11±6.60 | 3.52 | 2.20 | 4.70E-01 | 9 | 6.48E-01 | ns | [-8.41, 12.85] | 0.24 | 3.40E-01 | 0.10 |
| Gender | RA1 | 60.00±5.98 | 53.33±8.31 | 1.99 | 2.77 | 1.86E+00 | 9 | 9.63E-02 | ns | [-1.46, 14.79] | 0.87 | 1.08E+00 | 0.69 |
| Gender | LmTVA | 51.67±5.58 | 50.56±10.08 | 1.86 | 3.36 | 4.10E-01 | 9 | 6.93E-01 | ns | [-5.05, 7.27] | 0.13 | 3.32E-01 | 0.07 |
| Gender | RmTVA | 46.67±9.03 | 43.33±8.53 | 3.01 | 2.84 | 1.20E+00 | 9 | 2.60E-01 | ns | [-2.94, 9.60] | 0.36 | 5.51E-01 | 0.18 |
| Gender | LpTVA | 63.89±7.95 | 49.44±9.44 | 2.65 | 3.15 | 3.03E+00 | 9 | 1.43E-02 | * | [3.65, 25.24] | 1.57 | 4.66E+00 | 0.99 |
| Gender | RpTVA | 62.22±4.84 | 53.33±10.60 | 1.61 | 3.53 | 1.95E+00 | 9 | 8.26E-02 | ns | [-1.41, 19.18] | 1.02 | 1.21E+00 | 0.82 |
| Gender | LaTVA | 60.00±5.44 | 50.56±8.77 | 1.81 | 2.92 | 2.68E+00 | 9 | 2.50E-02 | * | [1.49, 17.40] | 1.23 | 3.00E+00 | 0.93 |
| Gender | RaTVA | 50.00±9.94 | 45.56±9.88 | 3.31 | 3.29 | 1.15E+00 | 9 | 2.80E-01 | ns | [-4.30, 13.19] | 0.43 | 5.26E-01 | 0.23 |
| Gender | A1 | 54.17±10.37 | 49.72±8.33 | 2.38 | 1.91 | 1.52E+00 | 19 | 1.45E-01 | ns | [-1.67, 10.56] | 0.46 | 6.25E-01 | 0.50 |
| Gender | mTVA | 49.17±7.91 | 46.94±10.01 | 1.81 | 2.30 | 1.16E+00 | 19 | 2.58E-01 | ns | [-1.77, 6.21] | 0.24 | 4.21E-01 | 0.18 |
| Gender | pTVA | 63.06±6.64 | 51.39±10.23 | 1.52 | 2.35 | 3.57E+00 | 19 | 2.06E-03 | ** | [4.82, 18.51] | 1.32 | 1.95E+01 | 1.00 |
| Gender | aTVA | 55.00±9.44 | 48.06±9.67 | 2.17 | 2.22 | 2.66E+00 | 19 | 1.54E-02 | * | [1.49, 12.40] | 0.71 | 3.59E+00 | 0.85 |
| Gender | TVAs | 55.74±9.88 | 48.80±10.15 | 1.29 | 1.32 | 4.37E+00 | 59 | 5.05E-05 | **** | [3.77, 10.12] | 0.69 | 4.08E+02 | 1.00 |
| Age | LA1 | 54.49±10.65 | 46.15±14.48 | 3.07 | 4.18 | 1.54E+00 | 12 | 1.50E-01 | ns | [-3.48, 20.14] | 0.63 | 7.18E-01 | 0.55 |
| Age | RA1 | 50.00±8.01 | 44.23±11.50 | 2.31 | 3.32 | 1.24E+00 | 12 | 2.39E-01 | ns | [-4.38, 15.92] | 0.56 | 5.25E-01 | 0.46 |
| Age | LmTVA | 51.28±10.26 | 50.00±9.81 | 2.96 | 2.83 | 2.70E-01 | 12 | 7.94E-01 | ns | [-9.17, 11.73] | 0.12 | 2.87E-01 | 0.07 |
| Age | RmTVA | 54.49±10.65 | 57.69±12.85 | 3.07 | 3.71 | -7.20E-01 | 12 | 4.88E-01 | ns | [-12.97, 6.55] | 0.26 | 3.47E-01 | 0.14 |
| Age | LpTVA | 45.51±11.14 | 50.00±10.34 | 3.22 | 2.98 | -1.17E+00 | 12 | 2.66E-01 | ns | [-12.87, 3.89] | 0.40 | 4.90E-01 | 0.27 |
| Age | RpTVA | 56.41±8.74 | 50.64±10.57 | 2.52 | 3.05 | 1.74E+00 | 12 | 1.08E-01 | ns | [-1.47, 13.00] | 0.57 | 9.09E-01 | 0.47 |
| Age | LaTVA | 64.74±7.42 | 48.72±13.01 | 2.14 | 3.76 | 5.25E+00 | 12 | 2.04E-04 | *** | [9.38, 22.68] | 1.45 | 1.55E+02 | 1.00 |
| Age | RaTVA | 61.54±14.81 | 62.82±13.32 | 4.28 | 3.85 | -2.50E-01 | 12 | 8.08E-01 | ns | [-12.51, 9.95] | 0.09 | 2.86E-01 | 0.06 |
| Age | A1 | 52.24±9.68 | 45.19±13.11 | 1.94 | 2.62 | 2.01E+00 | 25 | 5.55E-02 | ns | [-0.18, 14.28] | 0.60 | 1.16E+00 | 0.84 |
| Age | mTVA | 52.88±10.58 | 53.85±12.06 | 2.12 | 2.41 | -3.00E-01 | 25 | 7.70E-01 | ns | [-7.65, 5.72] | 0.08 | 2.16E-01 | 0.07 |
| Age | pTVA | 50.96±11.40 | 50.32±10.46 | 2.28 | 2.09 | 2.40E-01 | 25 | 8.14E-01 | ns | [-4.90, 6.19] | 0.06 | 2.13E-01 | 0.06 |
| Age | aTVA | 63.14±11.82 | 55.77±14.94 | 2.36 | 2.99 | 2.16E+00 | 25 | 4.02E-02 | * | [0.35, 14.39] | 0.54 | 1.50E+00 | 0.75 |
| Age | TVAs | 55.66±12.48 | 53.31±12.82 | 1.42 | 1.46 | 1.28E+00 | 77 | 2.03E-01 | ns | [-1.29, 6.00] | 0.18 | 2.74E-01 | 0.36 |
| Identity | LA1 | 72.22±9.16 | 54.70±9.89 | 3.24 | 3.50 | 3.64E+00 | 8 | 6.61E-03 | ** | [6.41, 28.63] | 1.73 | 8.84E+00 | 0.99 |
| Identity | RA1 | 48.33±5.77 | 57.41±8.69 | 2.04 | 3.07 | -1.97E+00 | 8 | 8.49E-02 | ns | [-19.72, 1.57] | 1.16 | 1.23E+00 | 0.86 |
| Identity | LmTVA | 51.46±7.76 | 57.04±7.77 | 2.74 | 2.75 | -1.44E+00 | 8 | 1.87E-01 | ns | [-14.49, 3.34] | 0.68 | 7.08E-01 | 0.43 |
| Identity | RmTVA | 41.11±6.57 | 42.36±8.22 | 2.32 | 2.91 | -2.50E-01 | 8 | 8.08E-01 | ns | [-12.72, 10.22] | 0.16 | 3.30E-01 | 0.07 |
| Identity | LpTVA | 60.61±5.67 | 57.78±7.70 | 2.00 | 2.72 | 1.05E+00 | 8 | 3.23E-01 | ns | [-3.37, 9.02] | 0.39 | 5.02E-01 | 0.18 |
| Identity | RpTVA | 66.05±6.65 | 50.98±9.61 | 2.35 | 3.40 | 4.37E+00 | 8 | 2.39E-03 | ** | [7.11, 23.03] | 1.72 | 2.01E+01 | 0.99 |
| Identity | LaTVA | 52.02±8.33 | 40.48±9.52 | 2.94 | 3.37 | 2.00E+00 | 8 | 8.02E-02 | ns | [-1.75, 24.84] | 1.22 | 1.29E+00 | 0.89 |
| Identity | RaTVA | 50.00±7.53 | 40.52±7.04 | 2.66 | 2.49 | 2.60E+00 | 8 | 3.16E-02 | * | [1.07, 17.88] | 1.23 | 2.59E+00 | 0.89 |
| Identity | A1 | 60.28±14.19 | 56.05±9.41 | 3.44 | 2.28 | 9.20E-01 | 17 | 3.68E-01 | ns | [-5.42, 13.86] | 0.34 | 3.54E-01 | 0.28 |
| Identity | mTVA | 46.29±8.86 | 49.70±10.85 | 2.15 | 2.63 | -1.10E+00 | 17 | 2.86E-01 | ns | [-9.96, 3.13] | 0.33 | 4.11E-01 | 0.27 |
| Identity | pTVA | 63.33±6.75 | 54.38±9.34 | 1.64 | 2.27 | 3.46E+00 | 17 | 3.02E-03 | ** | [3.49, 14.41] | 1.07 | 1.45E+01 | 0.99 |
| Identity | aTVA | 51.01±8.00 | 40.50±8.37 | 1.94 | 2.03 | 3.17E+00 | 17 | 5.62E-03 | ** | [3.51, 17.51] | 1.25 | 8.55E+00 | 1.00 |
| Identity | TVAs | 53.54±10.69 | 48.19±11.18 | 1.47 | 1.54 | 2.80E+00 | 53 | 7.15E-03 | ** | [1.51, 9.18] | 0.48 | 4.88E+00 | 0.94 |
Appendix 1—table 17
Comparing human listeners' performance in discriminating speaker identity-related information decoded with VLS versus LIN.

This table reports the significance of the VLS-LIN difference in the speaker identity categorization and discrimination performance. Paired t-tests were conducted between human listeners’ scores at discriminating the speaker gender (2 classes), age (2 classes), and identity (17 classes) of the 18 Test Stimuli reconstructed from the VLS features and their scores for the same stimuli reconstructed from the LIN features. s.e.m. = standard error of the mean. Here are reported the results of the statistical tests: t-value, degrees of freedom (dof), p-value, degree of significance (unc. sig.), 95% confidence interval (CI95%), effect size (Cohen’s d), Bayes factor (BF10), and statistical power (power) for each type of speaker identity information and ROI.

| Category | Model | ROI | Accuracy ROI (%) | Accuracy A1 (%) | s.e.m. ROI | s.e.m. A1 | T (ROI vs A1) | dof | p-val | unc. sig. | CI95% | Cohen's d | BF10 | power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gender | LIN | mTVA | 46.94±10.01 | 49.72±8.33 | 2.30 | 1.91 | -9.30E-01 | 38 | 3.60E-01 | ns | [-8.83, 3.27] | 2.90E-01 | 4.35E-01 | 0.15 |
| Gender | LIN | pTVA | 51.39±10.23 | 49.72±8.33 | 2.35 | 1.91 | 5.50E-01 | 38 | 5.80E-01 | ns | [-4.46, 7.79] | 1.70E-01 | 3.48E-01 | 0.08 |
| Gender | LIN | aTVA | 48.06±9.67 | 49.72±8.33 | 2.22 | 1.91 | -5.70E-01 | 38 | 5.70E-01 | ns | [-7.59, 4.26] | 1.80E-01 | 3.51E-01 | 0.09 |
| Gender | LIN | TVAs | 48.80±10.15 | 49.72±8.33 | 1.32 | 1.91 | -3.60E-01 | 78 | 7.20E-01 | ns | [-5.99, 4.14] | 9.00E-02 | 2.77E-01 | 0.06 |
| Gender | VLS | mTVA | 49.17±7.91 | 54.17±10.37 | 1.81 | 2.38 | -1.67E+00 | 38 | 1.00E-01 | ns | [-11.06, 1.06] | 5.30E-01 | 9.19E-01 | 0.37 |
| Gender | VLS | pTVA | 63.06±6.64 | 54.17±10.37 | 1.52 | 2.38 | 3.15E+00 | 38 | 0.00E+00 | ** | [3.17, 14.61] | 9.90E-01 | 1.21E+01 | 0.87 |
| Gender | VLS | aTVA | 55.00±9.44 | 54.17±10.37 | 2.17 | 2.38 | 2.60E-01 | 38 | 8.00E-01 | ns | [-5.68, 7.35] | 8.00E-02 | 3.17E-01 | 0.06 |
| Gender | VLS | TVAs | 55.74±9.88 | 54.17±10.37 | 1.29 | 2.38 | 6.00E-01 | 78 | 5.50E-01 | ns | [-3.64, 6.78] | 1.60E-01 | 3.05E-01 | 0.09 |
| Age | LIN | mTVA | 53.85±12.06 | 45.19±13.11 | 2.41 | 2.62 | 2.43E+00 | 50 | 2.00E-02 | * | [1.50, 15.81] | 6.70E-01 | 2.95E+00 | 0.66 |
| Age | LIN | pTVA | 50.32±10.46 | 45.19±13.11 | 2.09 | 2.62 | 1.53E+00 | 50 | 1.30E-01 | ns | [-1.61, 11.87] | 4.20E-01 | 7.23E-01 | 0.32 |
| Age | LIN | aTVA | 55.77±14.94 | 45.19±13.11 | 2.99 | 2.62 | 2.66E+00 | 50 | 1.00E-02 | * | [2.59, 18.56] | 7.40E-01 | 4.65E+00 | 0.74 |
| Age | LIN | TVAs | 53.31±12.82 | 45.19±13.11 | 1.46 | 2.62 | 2.75E+00 | 102 | 1.00E-02 | ** | [2.27, 13.97] | 6.20E-01 | 5.95E+00 | 0.78 |
| Age | VLS | mTVA | 52.88±10.58 | 52.24±9.68 | 2.12 | 1.94 | 2.20E-01 | 50 | 8.20E-01 | ns | [-5.12, 6.40] | 6.00E-02 | 2.84E-01 | 0.06 |
| Age | VLS | pTVA | 50.96±11.40 | 52.24±9.68 | 2.28 | 1.94 | -4.30E-01 | 50 | 6.70E-01 | ns | [-7.29, 4.73] | 1.20E-01 | 3.00E-01 | 0.07 |
| Age | VLS | aTVA | 63.14±11.82 | 52.24±9.68 | 2.36 | 1.94 | 3.56E+00 | 50 | 0.00E+00 | *** | [4.76, 17.04] | 9.90E-01 | 3.70E+01 | 0.94 |
| Age | VLS | TVAs | 55.66±12.48 | 52.24±9.68 | 1.42 | 1.94 | 1.26E+00 | 102 | 2.10E-01 | ns | [-1.95, 8.79] | 2.90E-01 | 4.67E-01 | 0.24 |
| Identity | LIN | mTVA | 49.70±10.85 | 56.05±9.41 | 2.63 | 2.28 | -1.82E+00 | 34 | 8.00E-02 | ns | [-13.43, 0.72] | 6.10E-01 | 1.15E+00 | 0.43 |
| Identity | LIN | pTVA | 54.38±9.34 | 56.05±9.41 | 2.27 | 2.28 | -5.20E-01 | 34 | 6.10E-01 | ns | [-8.21, 4.86] | 1.70E-01 | 3.58E-01 | 0.08 |
| Identity | LIN | aTVA | 40.50±8.37 | 56.05±9.41 | 2.03 | 2.28 | -5.09E+00 | 34 | 0.00E+00 | **** | [-21.76, -9.35] | 1.70E+00 | 1.25E+03 | 1.00 |
| Identity | LIN | TVAs | 48.19±11.18 | 56.05±9.41 | 1.54 | 2.28 | -2.65E+00 | 70 | 1.00E-02 | * | [-13.79, -1.94] | 7.20E-01 | 4.69E+00 | 0.74 |
| Identity | VLS | mTVA | 46.29±8.86 | 60.28±14.19 | 2.15 | 3.44 | -3.45E+00 | 34 | 0.00E+00 | ** | [-22.24, -5.75] | 1.15E+00 | 2.21E+01 | 0.92 |
| Identity | VLS | pTVA | 63.33±6.75 | 60.28±14.19 | 1.64 | 3.44 | 8.00E-01 | 34 | 4.30E-01 | ns | [-4.69, 10.79] | 2.70E-01 | 4.13E-01 | 0.12 |
| Identity | VLS | aTVA | 51.01±8.00 | 60.28±14.19 | 1.94 | 3.44 | -2.35E+00 | 34 | 2.00E-02 | * | [-17.30, -1.24] | 7.80E-01 | 2.53E+00 | 0.63 |
| Identity | VLS | TVAs | 53.54±10.69 | 60.28±14.19 | 1.47 | 3.44 | -2.09E+00 | 70 | 4.00E-02 | * | [-13.16, -0.31] | 5.70E-01 | 1.64E+00 | 0.54 |

Additional files

MDAR checklist
https://cdn.elifesciences.org/articles/98047/elife-98047-mdarchecklist1-v1.pdf
Audio file 1

Voice latent space interpolation.

The audio files comprise: two original voice samples (A, B); the voice samples synthesized from the spectrograms of the autoencoder reconstructions of those two samples (A', B'); and the voice samples synthesized from spectrograms decoded along the linear interpolation between the two points in the voice latent space (VLS; A_to_B; Figure 1b).

https://cdn.elifesciences.org/articles/98047/elife-98047-supp1-v1.zip
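The interpolation described above amounts to encoding the two stimuli, walking a straight line between their 128-dimensional VLS coordinates, and decoding each intermediate point. A minimal sketch, in which `encode` and `decode` are toy placeholders standing in for the paper's trained VAE:

```python
# Hedged sketch of latent-space interpolation. `encode`/`decode` are toy
# linear stand-ins for the trained VAE so the example runs self-contained.
import numpy as np

LATENT_DIM = 128  # dimensionality of the VLS reported in the paper

def encode(spectrogram):
    # Placeholder for the trained encoder: spectrogram -> VLS coordinates.
    return spectrogram.mean(axis=1)[:LATENT_DIM]

def decode(z):
    # Placeholder for the trained decoder: VLS coordinates -> spectrogram.
    return np.tile(z[:, None], (1, 10))

def interpolate_vls(spec_a, spec_b, n_steps=5):
    """Decode spectrograms along the line between two VLS points."""
    z_a, z_b = encode(spec_a), encode(spec_b)
    alphas = np.linspace(0.0, 1.0, n_steps)
    return [decode((1 - a) * z_a + a * z_b) for a in alphas]

steps = interpolate_vls(np.random.rand(128, 10), np.random.rand(128, 10))
```

The endpoints of `steps` correspond to the reconstructions A' and B'; the intermediate decodes are the morphed stimuli of the A_to_B sequence.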
Audio file 2

Brain-based voice reconstructions.

The audio files are voice samples reconstructed from the fMRI responses in the participants' temporal voice areas (TVAs). These sounds were used in the quantitative and subjective voice identity tests (Figure 4). The included samples are from a German and a Spanish speaker, each reconstructed with the two models, LIN and VLS.

https://cdn.elifesciences.org/articles/98047/elife-98047-supp2-v1.zip


  1. Charly Lamothe
  2. Etienne Thoret
  3. Régis Trapeau
  4. Bruno L Giordano
  5. Julien Sein
  6. Sylvain Takerkart
  7. Stephane Ayache
  8. Thierry Artieres
  9. Pascal Belin
(2026)
Reconstructing voice identity from noninvasive auditory cortex recordings
eLife 13:RP98047.
https://doi.org/10.7554/eLife.98047.3