1 Introduction

Processing of complex auditory stimuli is fundamental to vocal communication. In humans with typical development, auditory processing has been shown to depend on multiple regions within the temporal cortex spreading across the superior and lateral aspects of the superior temporal gyrus (STG). The superior aspect of the STG hosts tonotopically organized primary auditory regions, which have been shown to be located in regions including the first transverse temporal gyrus (TTG), i.e., Heschl’s gyrus (HG; Dick et al., 2012; Moerel et al., 2014). The planum polare (PP) is immediately anterior to HG, while the planum temporale (PT) borders it posteriorly. Of note, TTG shows high variability in shape and size, some individuals having one single gyrus (HG), others presenting duplications (with a common stem or fully separated) or multiplication of TTG (Geschwind and Levitsky, 1968; Marie et al., 2015). The superior temporal sulcus (STS) lies directly below the STG and separates it from the middle temporal gyrus. Understanding functional contributions of these different brain areas to processing of sound and vocal stimuli has been advanced by lesion studies (Hillis et al., 2017), ablation, and resection case studies (Hamilton et al., 2021; Hullett et al., 2022), functional magnetic resonance imaging (Booth et al., 2002; da Costa et al., 2011; Humphries et al., 2014), and intracranial recordings (Hamilton et al., 2021; Mesgarani et al., 2014). To date, however, no single functional parcellation of the human auditory cortical areas is generally agreed upon (Hamilton et al., 2021; Moerel et al., 2014). For example, in contrast to hierarchical models of auditory processing (e.g., Binder et al., 2000; Humphries et al., 2014; Scott and Johnsrude, 2003; Wessinger et al., 2000), recent proposals of sound processing postulate that processing of speech is distributed, and happens in parallel in different regions of the auditory cortex (Hamilton et al., 2021). 
According to parallel processing models, the posterior STG has been shown to be a crucial locus for language and phonological processing (Bhaya-Grossman and Chang, 2022; Hillis et al., 2017), and to encode acoustic-articulatory features of speech sounds (Lakertz et al., 2021; Mesgarani et al., 2014). The primary auditory regions, including Heschl’s gyrus, have been argued not to be necessary for speech perception (Hamilton et al., 2021; Hullett et al., 2022).

Structural brain imaging provides another route for gaining insight into the functional roles of cortical subregions, albeit at a different timescale. Quantifying individual differences in behavioral skill and/or language experience and relating them to cortical morphology can inform us about the relative influences of experience-dependent plasticity and of potential predisposition in different domains. Of note, distinct influences (environmental versus genetic) tend to be reflected by different underlying anatomical characteristics of the brain. Indeed, a recent large-scale genome-wide association meta-analysis suggests that cortical surface area is relatively more influenced by genetics and that cortical thickness tends to reflect environmentally driven neuroplasticity (Grasby et al., 2020). In the context of multilingualism, structural brain imaging can offer insights into whether and how the brain accommodates experience with different languages, and into what type of linguistic information is encoded, stored, and processed by particular brain structures. Previous findings point to an influence of experience-dependent plasticity on the thickness of regions associated with speech processing, including the left posterior STS and left PT (Hervais-Adelman et al., 2017), as well as the STG (Mårtensson et al., 2012). Moreover, Ressel and colleagues (2012) associated lifelong bilingualism with increased volume of Heschl’s gyrus, a finding interpreted as reflecting experience-dependent plasticity associated with the acquisition of novel phonology.

Languages differ from each other in many ways and to different degrees: language similarities vary per linguistic domain, with distinct typological distances that can be computed between the languages’ phonological, lexical, and syntactic systems. The present study investigates whether there are brain structural signatures of multilingualism in the auditory cortex, and whether typological distances between spoken languages further modulate those signatures. Given our focus on auditory brain regions, we quantified how the languages spoken by the participants differed from each other in terms of their phonological systems. Specifically, we investigated whether the neuroanatomical indices describing the auditory cortex regions were related to cross-linguistic phonological information at different levels: acoustic and articulatory feature-level, phoneme-level, or (more abstract) counts of phonological classes.

To investigate the relationship between the individual variability in the morphology of the auditory cortex and variability in language experience, we analyzed the anatomical brain scans of 204 healthy, right-handed participants from the PLORAS database (Seghier et al., 2016). The sample was split into two groups according to the date of data acquisition. The main sample included 136 participants speaking between 1 and 7 languages (2.65 languages on average) and representing a relatively wide range of linguistic diversity (34 different languages in total); see Figure 1 for a visual representation of the sample’s language experience. The replication sample included 68 participants speaking up to 5 languages (2.44 languages on average). All MRI anatomical images were processed with FreeSurfer’s brain structural pipeline (Fischl et al., 2004). Multilingualism was operationalized in a continuous way, without artificially categorizing the sample into multi-, bi-, and monolinguals (cf. DeLuca et al., 2019; Luk and Bialystok, 2013). Given the focus on auditory brain regions and on potential associations with language distance at the phonological level, and the fact that phonological perceptual space is known to be shaped relatively early in life (e.g., Werker and Hensch, 2015), we used information regarding the age of onset of acquisition (AoA) of the different languages spoken by participants, and weighted each language by its AoA. To this end, we operationalized language experience as a continuous, quantitative measure for each participant, using Shannon’s (1948) entropy equation (see Materials and Methods, Section 4.2), with high entropy values indicating more diverse language experience.
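As a rough illustration of an entropy-based language experience index, the sketch below computes Shannon entropy over per-language weights. It is not the authors' implementation: the function name `language_entropy`, the example weights, and the assumption that AoA-based weights are normalized to proportions before entering the entropy formula are all hypothetical. The exact weighting scheme is the one described in Materials and Methods, Section 4.2.

```python
import math


def language_entropy(aoa_weights):
    """Shannon entropy (natural log) over normalized per-language weights.

    aoa_weights: one positive number per language; a larger value stands
    for an earlier age of acquisition (hypothetical weighting, assumed here).
    """
    total = sum(aoa_weights)
    if total <= 0:
        raise ValueError("at least one positive weight is required")
    # Normalize weights to proportions; zero weights contribute nothing.
    proportions = [w / total for w in aoa_weights if w > 0]
    return sum(-p * math.log(p) for p in proportions)


# Hypothetical participants: a monolingual and a speaker of three languages.
monolingual = language_entropy([1.0])
trilingual = language_entropy([1.0, 0.5, 0.25])
```

Under these assumptions a monolingual profile yields an entropy of zero, and the value grows as more languages contribute comparable weights.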

Our analyses aimed at a fine localization of the structural effects of multilingualism. We predicted (see Bhaya-Grossman and Chang, 2022; Hamilton et al., 2021; Lakertz et al., 2021; Mesgarani et al., 2014) that the cortical thickness of posterior STG would be related to the composite measures of language experience in our multilingual participants, and to the differences between their languages at the acoustic and articulatory feature level. In terms of the predicted direction of these effects, according to most current views, multilingualism is hypothesized to be a dynamic process reshaping brain structure and inducing both increases (in initial stages) and decreases (in peak efficiency) in brain morphological indices as a function of quantity of the multilingual experience (Pliatsikas, 2020). Furthermore, acquisition of multiple languages is thought to have additive effects on brain structure, and to induce cortical and subcortical adaptations in order to accommodate the additional languages (see e.g., Hervais-Adelman et al., 2018). The exact nature of such adaptations of auditory brain regions to multiple languages with phonologically diverse repertoires is to date unknown. A positive relationship between the thickness of auditory regions and the degree of multilingualism (weighted by greater phonological distance) would indicate that the auditory regions need to expand in order to accommodate the variability of phonological input in the environment; a negative one would suggest a specialization of the auditory system in relation to such variability (Pliatsikas, 2020).

Figure 1. Illustration of the sample’s language experience. Each bar represents a single participant’s overall language experience; the height of the stacked bars within each bar represents the AoA index for individual languages (the taller the bar, the earlier in life a given language was acquired). The color of each stacked bar refers to the number of phonemes in each language’s phonological inventory. For reference, the English phonological inventory has 40 phonemes (Stanford Phonology Archive, 2019a). Prior to plotting, data were sorted by overall language experience, based on the sum of the AoA indices for participants’ individual languages; consequently, data of participants with the most diverse language experience can be found on the left-hand side of the figure, and the right-hand side includes data from monolinguals (i.e., exposed only to one language).

2 Results

Table 1 provides an overview of the performed analyses and of results.

Table 1. Overview of the performed analyses.

2.1 Auditory cortex and language experience

In a first, exploratory analysis, we investigated relationships between the composite language experience measure and the volume, surface area, and average thickness of the following auditory subregions: STG, STS, HG, HS, PT and PP, as parcellated by FreeSurfer (Destrieux et al., 2010). Given the large size of STG and STS, we refined their segmentations by dividing the STG into anterior and posterior parts, and the STS into anterior, middle, and posterior parts (see Materials and Methods, Section 4.6 for details). Using the lme4 R package (Bates et al., 2015), we fit three linear mixed models to the extracted anatomical measures (volume, surface area, average thickness), with participants modeled as random effects and with language experience, region of interest (ROI), and hemisphere as fixed effects, controlling for the covariates of age, sex and whole-brain volume, area, or mean thickness. Interaction terms for language experience, ROI, and hemisphere were included in the models to determine whether language experience would differentially affect any of the segmented regions. Significance was calculated using the lmerTest package (Kuznetsova et al., 2017), which applies Satterthwaite’s method to estimate degrees of freedom and generate p-values for mixed models. Out of all investigated cortical measures, only average thickness of the PT, bilaterally, was related to participants’ language experience at p = .01, see Table S 1. Specifically, participants with more extensive and diverse language experience had a significantly thinner PT.

2.2 Superior temporal plane and language experience

Given that full TTG duplications, triplications, etc. by definition belong to the PT, and that previous work has shown relationships between TTG duplication patterns and language aptitude (Turker et al., 2017), we wanted to better pinpoint the location of the above effects within PT subregions. To this end, we ran three follow-up analyses: an ROI-based analysis, a vertex-wise analysis, and an analysis investigating the effect of language proficiency.

First, we used an automatic toolbox (TASH; Dalboni da Rocha et al., 2020) to segment gyri along the superior temporal plane (HG, or 1st TTG, and additional TTG(s), when present), and to extract their cortical thickness, surface area and volume. Note that FreeSurfer does not segment additional TTGs specifically, but provides measures for the PT as a whole. We included HG in these follow-up analyses (even though we had included the FreeSurfer HG ROI in the above, broader analyses), given that the default FreeSurfer pipeline is not fine-tuned for the segmentation of this small and variable region, and is thus error-prone for this ROI (Dalboni da Rocha et al., 2020). As expected from previous work, the TTG showed large individual variability in overall duplication patterns in the sample. Segmentations revealed different numbers of gyri across hemispheres and participants, ranging from a single gyrus to four identified TTGs, see Table 2. On average, participants had 2.375 gyri in the left hemisphere (SD = 0.63), and 1.84 in the right (SD = 0.64). The number of gyri was not related to participants’ language experience, either in the left (β = 0.10, t = 0.97, p = .335), or in the right hemisphere (β = -0.01, t = -0.13, p = .90), according to a linear model with number of gyri as the dependent variable and language experience as the independent variable (controlling for participants’ age and sex).

Table 2. Number of participants, and their demographic and language experience characteristics, displaying different overall shapes of the TTG (i.e., total number of identified gyri in the left and right hemisphere). The last column lists the whole sample’s descriptive statistics.

To localize the effect of language experience on the cortical thickness of the segmented gyri, we fit another linear mixed model with participants modeled as random effects, language experience, gyrus (first, second, third), and hemisphere as fixed effects, controlling for the covariates of age, sex, and mean thickness (including interaction terms for language experience, gyrus, and hemisphere). We also ran two additional models on the volume and surface area values of the segmented regions, to confirm that the results were specific to the average thickness values, as in the model above (including all auditory regions, see Section 2.1). Out of all investigated cortical measures, only average thickness of the second TTG (bilaterally) was related to participants’ language experience at p < .01, see Table S 2.

Second, to confirm the above result, we also ran a whole-brain vertex-wise analysis in FreeSurfer. We fit a general linear model to the cortical thickness surface maps of all subjects (smoothed with a 5-mm kernel), testing for a significant effect of the language experience index across the whole brain. We found one cluster of vertices negatively related to participants’ language experience at p < .0001 (uncorrected), located in the superior aspect of the left STG, corresponding to the location of the second TTG, see Figure S 4. At the same threshold, no vertices were found in the right hemisphere, possibly due to greater known anatomical variability in right compared to left auditory regions (cf. Dalboni da Rocha et al., 2020).

We also explored whether the proficiency attained in the spoken languages was related to the thickness of the 2nd TTG. Analyses calculating cumulative language experience based on the participants’ proficiency instead of AoA yielded similar results as above, with a significant negative relationship between the proficiency-based cumulative language experience index and the thickness of the left 2nd TTG, and a trend towards significance for the right 2nd TTG (see Section 7.6 for details).

2.3 Second Transverse Temporal Gyrus and effects of Language Typology

Building on the above result, we investigated whether the thickness of participants’ 2nd TTG was related to crosslinguistic phonological information describing the languages they spoke, above and beyond being related to mere experience of speaking several languages. The analysis below was therefore performed on a sub-sample of participants who had multiple TTGs (n = 130 in the left and n = 96 in the right hemisphere). We constructed three measures of typological distance between the languages spoken by our participants: (1) overlaps in distinctive acoustic and articulatory features describing the phonemes of each language (e.g., “short”, “long”); (2) overlaps in phoneme inventories (meaning-distinguishing sounds) of each language; and (3) similarity in counts of phonological classes that share certain features (e.g., counts of “front rounded vowels”, “clicks”), see Materials and Methods for details. Next, using Rao’s quadratic entropy equation (1982), we weighted the distance for each pair of languages by the AoA of each language (with higher weights for lower AoA), and the resulting values were summed, per participant. This calculation resulted in three indices of language experience accounting for different typological relations between languages (see Kepinska et al., 2023 for a similar approach applied to lexical distances). We used these indices as independent variables in a set of multiple regression analyses fit to the average thickness values of the left and right second TTG, and performed a model comparison procedure to gauge which of the typological measures was most related to the thickness of the second TTG.
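The distance-weighted index described above can be sketched as a sum, over pairs of a participant's languages, of the pairwise typological distance multiplied by the two languages' weights. This is only an illustration in the spirit of Rao's quadratic entropy: the function name, the language labels, the weights, and the distance values are hypothetical, and the sum here runs over unordered pairs, which may differ by a constant factor from the formulation used in the study.

```python
from itertools import combinations


def distance_weighted_experience(weights, distances):
    """Sum d_ij * w_i * w_j over all unordered pairs of languages.

    weights:   dict mapping language -> AoA-based weight (assumed given).
    distances: dict mapping frozenset({lang_a, lang_b}) -> typological distance.
    """
    total = 0.0
    for lang_a, lang_b in combinations(sorted(weights), 2):
        pair_distance = distances[frozenset((lang_a, lang_b))]
        total += pair_distance * weights[lang_a] * weights[lang_b]
    return total


# Hypothetical participant with three languages (all numbers are made up).
weights = {"L_a": 0.5, "L_b": 0.3, "L_c": 0.2}
distances = {
    frozenset(("L_a", "L_b")): 0.4,
    frozenset(("L_a", "L_c")): 0.7,
    frozenset(("L_b", "L_c")): 0.6,
}
index = distance_weighted_experience(weights, distances)
```

With a single language there are no pairs, so the index is zero; it increases both with the number of languages and with how distant they are from each other.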
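Model comparison was performed by computing differences in explained variance (ΔR2 Adjusted) compared to the baseline model (i.e., with language experience without typological information), and using the Bayesian information criterion (BIC) (Schwarz, 1978), where the difference between two BICs was converted into a Bayes factor using the equation below, following Wagenmakers (2007):

$$\mathrm{BF}_{10} = \exp\left(\frac{\mathrm{BIC}_{0} - \mathrm{BIC}_{1}}{2}\right)$$

where BIC0 and BIC1 denote the BIC of the baseline model and of the model including typological information, respectively.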
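The conversion from a BIC difference to a Bayes factor can be sketched in a few lines. The sketch assumes the standard BIC approximation attributed to Wagenmakers (2007), in which BF10 is the exponential of half the difference between the baseline BIC and the alternative BIC; the function name and the BIC values are hypothetical and do not come from the study.

```python
import math


def bf10_from_bic(bic_baseline, bic_alternative):
    """Approximate Bayes factor in favor of the alternative model.

    Values above 1 favor the alternative (lower BIC); values below 1
    favor the baseline model.
    """
    return math.exp((bic_baseline - bic_alternative) / 2.0)


# Hypothetical BICs: the alternative model has a BIC lower by 4 points.
bf10 = bf10_from_bic(bic_baseline=104.0, bic_alternative=100.0)
```

In this made-up case the alternative model is favored by a factor of about 7.4; swapping the two BIC values gives the reciprocal, i.e., evidence for the baseline.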

The model without typological information served as baseline for the comparisons. The analyses showed that out of the three language experience indices (feature-level, phoneme-level and counts of phonological classes), the model containing phoneme-level information explained the most variance in the average thickness of the bilateral second TTG. It also outperformed the model containing the language experience index not accounting for any typological information, see Table 3. None of the other phonological distance measures combined with language experience performed better than language experience alone. The relationship between language experience weighted by typological distance and the thickness of the second TTG was negative, with a small effect size (β = -0.35, t = -2.96, p = .004, f2 = 0.07 and β = -0.26, t = -1.98, p = .05, f2 = 0.04 for left and right hemisphere, respectively), showing that the more extensive one’s language experience, and the more varied at the phoneme level one’s languages are, the thinner the second TTG cortex (see Figure 2). The statistics for the overall regression models are presented in the Supplementary Materials, Section 7.4.

Table 3. Left and right second transverse temporal gyri and language experience. Multiple regression model parameters (parameter estimates and standard errors, in brackets; p-values are listed according to the coding presented underneath the table) for the average cortical thickness values of the second TTG, as predicted by the four language experience indices: (1) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (2) acoustic/articulatory features, (3) phonemes, and (4) counts of phonological classes. The last two rows present model comparison results (additional variance explained and BF10 values). NB. all models including typological information were compared against the ‘No typology’ model.

Figure 2. Multilingual language experience and thickness of the second TTG. Average thickness of the second TTG in the left and right hemisphere was negatively related to the multilingual language experience index weighted by phoneme-level phonological distances between participants’ languages. Plots show residuals, controlling for age, sex and mean hemispheric thickness.

To further probe how cross-linguistic sound inventories may modulate the effect of language experience on the brain, and to account for language experience in participants with different mother tongues (L1s), we constructed a cross-linguistic description of individual languages (see Methods, Section 4.3.2) and used it to calculate a measure of cumulative phoneme inventory for each participant. Here, we counted how many unique phonemes each participant was exposed to across all their languages (i.e., phonemes that overlapped between languages were counted only once), irrespective of when these languages were acquired. We used this measure as an independent variable in another linear model fit to the average thickness values of the second TTG (controlling for age, sex, hemispheric thickness, and accounting for language experience irrespective of language typology) (see Section 4.3.2, Supplementary Materials, Section 7.4.1 and Table S 3). In the left hemisphere, this measure explained variance in the thickness of the second TTG above and beyond language experience (β = -0.004, t = -2.910, p = .004), with a small effect size (f2 = 0.06). In the right hemisphere, the thickness of the second TTG was not significantly related to the ‘cumulative phoneme inventory’ measure (p = .76), when also accounting for overall language experience irrespective of typology. This result shows that the thickness of the left second TTG in these participants was related to the specific characteristics of languages at the phoneme level of their phonological inventories, irrespective of when these languages were acquired, while the thickness of the right second TTG was relatively more related to when and how many languages were learned (even though the typological distance information did improve the model fit, see Table 3); see Figure S 5, Table S 3, and Supplementary Materials, Section 7.4.
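The cumulative phoneme inventory amounts to the size of the union of the phoneme sets of a participant's languages. A minimal sketch follows; the function name, the language labels, and the toy inventories are hypothetical and far smaller than real phonological inventories.

```python
def cumulative_phoneme_inventory(inventories):
    """Count unique phonemes across all languages of one participant.

    inventories: dict mapping language -> set of phoneme symbols.
    Phonemes shared between languages are counted only once.
    """
    unique_phonemes = set()
    for phonemes in inventories.values():
        unique_phonemes.update(phonemes)
    return len(unique_phonemes)


# Toy example: two made-up languages sharing three phonemes (p, t, a).
participant = {
    "L_a": {"p", "t", "k", "a", "i"},
    "L_b": {"p", "t", "a", "u", "y"},
}
n_unique = cumulative_phoneme_inventory(participant)
```

Because the measure is a plain set union, it ignores the order and age at which the languages were acquired, matching the "irrespective of when these languages were acquired" property described above.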

2.4 Language experience in participants with a single TTG

So far, we revealed associations between phonological language experience and the cortical thickness of the 2nd TTG bilaterally in multilinguals. Yet, 40 participants out of 136 (29.4%) had a single TTG (Heschl’s gyrus) in the right hemisphere (versus only 6 participants, i.e., 4.4%, in the left; see Table 2). We therefore repeated the above analysis in the 40 participants with a single TTG in the right hemisphere in order to investigate the relationship between TTG anatomy and language experience in this sample. The analysis revealed that in participants without posterior TTGs in the right hemisphere, the thickness of the first TTG was indeed related to their language experience. The nature of this relation, however, differed from that found for multilingual participants with multiple TTGs. First, the language experience index accounting for most variance in the thickness of the first TTG was the index not accounting for typological relations between multilinguals’ languages, see Table S 4. Second, the direction of this effect was opposite to that found for participants with multiple TTGs: the more extensive one’s language experience, the thicker the HG (β = 0.15, t = 2.89, p = .007; medium effect size: f2 = 0.237), see Figure 3. Of note, we did not find any significant relationship between the thickness of the PT and language experience in this sub-group of participants (β = 0.02, t = 2.43, p = .81). In addition, for PT thickness, comparing the model including the language experience index against a model including only covariates of no interest yielded moderate evidence against the former (BF10 = 0.16), suggesting that the observed effects between cortical thickness and language experience in the superior temporal plane are specific to gyri only, see Table S 4, and Supplementary Materials, Section 7.5.

Figure 3. Thickness of the right first TTG and PT in participants with a single TTG in the right hemisphere. Average thickness of HG in the right hemisphere was positively related to the amount of multilingual experience, irrespective of typological relations between languages (left panel). The average thickness of the right PT was not related to language experience (right panel). Plots show residuals, controlling for age, sex, and mean hemispheric thickness.

2.5 Replication analysis

In an effort to strengthen the inferences that can be made from the above results, we tested whether the relationship between multilingual language experience and thickness of the second TTG could be replicated in an independent sample of 68 participants. Using linear regression, we fit models to the average thickness values of their left and right second TTG. As in the analyses on the original sample, we first fit a model using the language experience index irrespective of typology, and then we fit models that additionally accounted for the three types of typological relations between languages (features, phonemes, and counts of phonological classes), controlling for covariates of age, sex, scanner model, and mean hemispheric thickness. We again observed that language experience significantly predicted the average thickness of the second TTG, but only in the right hemisphere (β = -0.13, t = -2.05, p = .046; small effect size: f2 = 0.087), see Table 4. Furthermore, similarly to the analysis on the main sample (see Section 2.3), the model with the language experience index weighted by phonological information at the phoneme level showed a better fit than the model without typological information, explaining 3% of additional variance in the average thickness of the right second TTG. In contrast to the results of the main sample, however, this time the model including phonological feature-level information had the best fit to the thickness data and explained an additional 14% of the variance (11% more variance than the model with language experience weighted by phoneme-level distances), see Table 4 and Figure 4C. No language experience indices were significantly related to the average thickness values of the second TTG in the left hemisphere (β = -0.03, t = -0.53, p = .60, and β = -0.09, t = -0.58, p = .57, for language experience index without and with typological phoneme-level information, respectively), see Figure 4 and Table 4.

Figure 4. Multilingual language experience and thickness of the second TTG in an independent sample of participants. (A) Average thickness of the left second TTG was not significantly related to the language experience index; average thickness of the right second TTG was significantly related to the language experience indices accounting for phoneme-level phonological overlaps between multilinguals’ languages (B) and feature-level information (C). The model including phonological feature-level information presented in panel (C) had the best fit to the average thickness data of the right second TTG.

Table 4. Thickness of left and right second TTGs and language experience in an independent sample of participants. Multiple regression model parameters (parameter estimates and standard errors, in brackets; p-values are listed according to the coding presented underneath the table) for the average cortical thickness of the second TTG, as predicted by the four language experience indices: (1) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (2) features, (3) phonemes, and (4) counts of phonological classes. The last two rows present model comparison results (additional variance explained and BF10 values). NB. all models including typological information were compared against the ‘No typology’ model.

3 Discussion

Multilingual language experience, quantified in a continuous manner, is related to the structure of the auditory cortex. Using two independent samples of participants, we find replicable effects showing that multilingual language experience, weighted by the age of acquisition of the respective languages, is specifically related to the thickness of gyri (TTGs) of the superior temporal plane. Moreover, we show that the typological distances between multilinguals’ languages, in particular typological distance measures based on phonemes of the different languages (and to some degree on their acoustic and articulatory features), explain variation in TTG thickness above and beyond the effect of AoA alone.

The effects we find are specific to the second transverse temporal gyrus in people having at least one full posterior duplication, and to the first TTG in people having only one gyrus. The anterior-most TTG, i.e., HG, is the first cortical relay station for auditory input. HG and the TTG as a whole exhibit large individual variation in shape and size: some individuals have a single TTG, but duplications (complete or partial) and multiplications of the gyrus are also common, and have been observed both ex-vivo (Geschwind and Levitsky, 1968) and in-vivo (Marie et al., 2015). The TTG are known to be formed before birth, in utero (Chi et al., 1977). Further, genes known to be involved in speech processing, in the development of the nervous system, and in X-linked deafness, have been related to the surface area and thickness of the left and right HG (Cai et al., 2014). A body of work has shown relationships between both HG volume and TTG shape (i.e., duplication patterns) and auditory and language abilities that are thought to be relatively stable and thus possibly arising from predisposition, such as speech sound processing and learning abilities (Golestani et al., 2011, 2007), linguistic tone learning (Wong et al., 2008), and general language aptitude (Turker et al., 2017). Studies have also shown larger HG and more TTG duplications in professional musicians (Schneider et al., 2005), and longitudinal work in children undergoing musical training shows that the grey matter volume of this region is relatively stable over the course of a year (Seither-Preisler et al., 2014). To date, however, little evidence has supported experience-dependent plasticity of the TTG. Ressel et al. (2012) showed that lifelong, non-elective bilingual language experience with Spanish and Catalan was associated with a larger bilateral HG volume in comparison to monolingual experience.
Importantly, the bilinguals in their study were not self-selected, so unlikely to have learned a second language because of a special talent for languages. The findings of the Ressel et al. study thus suggested that the HG volume differences arose from experience with a different phonological system, since Spanish and Catalan are quite similar at the lexical level but have different phonologies. This finding was partially replicated in our study, in a sub-sample of participants with a single gyrus in the right hemisphere. More generally, our results show that experience with multiple languages is associated with differences in the thickness of the TTG, and that increasing degrees of multilingualism are linearly related to the structure of the transverse temporal gyrus. Establishing that non-pathological environmental experiences are related to the morphology of TTG offers important insights into the adaptability of this early sensory region, and has important implications for our understanding of the neural underpinnings of auditory processing in general, which we outline below.

We find an effect of multilingualism on the cortical thickness of the second TTG in individuals with a second such gyrus, but on the thickness of the first gyrus in individuals without a full TTG duplication. The result aligns with recent findings of cortical surface area being relatively more associated with genetic factors and of cortical thickness being relatively more associated with environmental factors (e.g., Eyler et al., 2012; Grasby et al., 2020). Individual differences in the shape (i.e., duplication patterns) of the TTG seem, however, to be related to how multilingualism is accommodated by the thickness of the auditory brain regions: increasing thickness of the first gyrus when no duplications are present, and decreasing thickness of the second gyrus for participants with more than one TTG.

The fact that the most multilingual participants in the current sample had thinner cortex in their second TTG might not automatically point to the conclusion that multilingual language experience is associated with this specific neural marker. In fact, polyglotism and multilingualism could partly result from innate predispositions (i.e., language aptitude), which in turn have been associated with brain structural markers within the TTG (Turker et al., 2021, 2017). However, we observed that accounting for typology (i.e., including information regarding how languages spoken by the participants differed from each other in terms of their phonological systems) explained more variance in the cortical thickness data of the 2nd TTG than an index of language experience accounting for the age of exposure to different languages alone. The contribution of typological distance to our results is important, since it strengthens our interpretation that the observed results are more likely to be due to experience-induced plasticity and not to predispositions: it is far less likely that specific brain anatomical features would induce individuals to learn specific language combinations.

Importantly, we show that, within the auditory cortex, language experience is particularly related to gyri in the superior temporal plane, and not to the rest of the planum temporale (as shown in the analysis investigating the sub-sample of participants with a single gyrus in the right hemisphere), nor to other auditory regions such as posterior STG, as shown in the first, exploratory analysis across broader auditory cortex sub-regions. The posterior STG has previously been shown to be an essential site for language and phonological processing (Bhaya-Grossman and Chang, 2022; Hamilton et al., 2021; Hillis et al., 2017), and for representing acoustic-phonetic features of speech sounds (Lakertz et al., 2021; Mesgarani et al., 2014). We had predicted that we would find effects of differences between our multilingual participants’ languages at the acoustic and articulatory feature level within posterior STG; however, indices related to the degree of multilingualism in our sample were not related to its structure. Tuning to the different acoustic-phonetic features might be an experience-independent feature of the posterior STG (i.e., it might be sensitive to features both present and absent from the languages a person knows), and therefore multilingualism might not induce any neuroplastic changes to this region. Alternatively, our findings may suggest that current language models might underestimate the role of early auditory regions (i.e., of HG and 2nd TTG, when present) in phonological processing, and support their role beyond an initial spectro-temporal analysis of speech sounds. Future research, using both structural and functional mapping, is needed to clarify the respective roles of TTG(s) and STG in phonological processing.

With regards to the specific typological features that did best explain our results, we found that accounting for overlaps between languages in terms of their phoneme-level sound structure (i.e., discrete sound categories present in the individual languages’ phonological inventories) was the best predictor of the thickness of the second TTG, and that a measure of how many unique phonemes each participant was exposed to across all their languages explained variance in the thickness of the left second TTG above and beyond language experience (in the larger main sample). Functional neuroimaging work by Fisher and colleagues (2018) has shown that speech sounds, and specifically vowel formants, are encoded in tonotopic auditory regions including HG and the STG, suggesting that even early auditory regions such as HG and second TTG (when present) are involved in speech sound processing. Similarly, Bonte et al. (2017) found that vowels can be decoded from fMRI signal of (among others) the temporal plane, Heschl’s gyrus and sulcus (see also Jäncke et al., 2002; Obleser and Eisner, 2009; van Atteveldt et al., 2004 for findings of processing of isolated phonemes in the HG and PT regions), and Rutten et al. (2019) showed that attending to speech sounds (stop consonants) in pseudowords results in an increase in the neural processing of higher temporal modulations in regions as early as HG and also in the PT. It is therefore plausible that our finding of an association between greater multilingual language experience and decreased thickness of the second TTG arises from increased processing demands (and associated cortical pruning, see below) inherent to the acquisition and mastery of many non-overlapping phonological inventories. Moreover, to the best of our knowledge, previous studies did not account for fine variation in the exact individual anatomy of the TTG (i.e., single versus multiplicated gyri). 
Our results may thus call for a re-evaluation of the specific anatomical site within the superior temporal plane in which phonemes are preferentially processed: based on our results, the second TTG (when present) could be a possible candidate.

Whereas the phoneme-based typological distance measure explained more variance in cortical thickness of the 2nd TTG in the main sample, the data from the replication sample showed that the phonological feature-level language distance measure explained additional variance (beyond the composite language experience measure) in the thickness values. We believe that this inconsistency across samples might be related to the smaller sample size in the replication analysis and the particular phonological characteristics of participants’ languages. Specifically, the replication sample included 31 individuals (i.e., 46% of participants) who spoke German and English, a language combination that has a phonological feature distance of zero. In contrast, the main sample included a more heterogeneous mix of 34 different languages. Notably, however, in the replication sample, the phoneme-based typological distance measure numerically explained more variance in the thickness of the 2nd TTG than a composite language experience measure based on AoA of different languages, as observed in the main sample.

The decreased thickness of the second TTG in relation to greater multilingualism may arise from different, non-exclusive microstructural and physiological mechanisms, including functional remapping, experience-driven pruning and neural efficiency, or learning-related increased myelination. Primary auditory cortex is known to lie within the anterior-medial part of HG (i.e., the first TTG; Rademacher et al., 1993), although there is no exact correspondence between cytoarchitectural and macrostructural boundaries (Rademacher et al., 2001). In-vivo markers for human primary auditory cortex have been identified, including tonotopy, increased myelination (Dick et al., 2017), and decreased cortical thickness (Zoellner et al., 2019). Also, there is evidence that in professional musicians, functional activation extends to posterior TTGs during listening to musical sounds, while activation is confined to HG in non-musicians (Schneider et al., unpublished findings). Our findings of thinning in relation to extensive multilingual language experience with a wider diversity of speech sounds could be speculatively interpreted as arising from spatial expansion of primary-like functionality to the second TTG, when present. This spreading of primary-like processing may subsequently induce specialization and pruning of the more posterior gyrus. Alternatively, it could be that the secondary auditory regions beyond HG may be specialised for the processing of (cross-linguistic) phonological information, and this might result in experience-driven pruning.
Why this occurs in the second TTG and not in non-gyral portions of the superior temporal plane (i.e., in the PT more generally) remains to be explored in future work, but one possibility is that gyri are more likely to host such a specialised function due to their greater neuronal density or to specific micro-anatomical features and connectivity properties (i.e., greater proximity and thus more direct, local functional and structural connectivity of neurons within the gyrus, as opposed to with neurons within the sulcal parts). Either way, the neural efficiency account would be in line with the Dynamic Restructuring Model of bilingualism (Pliatsikas, 2020), according to which the “peak efficiency” stage of multilingual functioning comes hand in hand with reductions in local volume. Of note, our supplementary analysis (see Section 7.6) investigating second TTG thickness in relation to proficient versus non-proficient languages points to the tentative conclusion that the observed reduced thickness results might be driven by experience with proficiently spoken languages (more so than with non-proficiently spoken ones), aligning with the predictions of the Dynamic Restructuring Model. Another potential microstructural mechanism underlying the thinner cortex result could be increased myelination of the 2nd TTG; this would again be in line with the idea that multilingual language experience leads to modifications such that the 2nd TTG becomes more ‘primary-like’, since HG is characterized by higher myelination. It has been suggested that the apparent thinning of the ventral temporal cortex during childhood (as argued by e.g., Sowell et al., 2004) is actually driven by increased myelination, since increased myelination in deep cortical layers can shift the apparent gray–white boundary in MR images (Natu et al., 2019).
Given the cross-sectional design of the current study, the above explanations remain hypotheses to be tested in further longitudinal studies of language acquisition, preferably ones using myelin mapping (Lutti et al., 2014; Marques et al., 2010) to explore the underlying physiological mechanisms in more detail.

Interestingly, in individuals who have only one TTG (i.e., no duplications), we find that multilingualism is related to thickness but in the opposite direction. We speculate that there might be different mechanisms underlying plasticity in regions more likely to contain primary versus secondary auditory cortex. In other words, it may be more likely that experience-induced pruning occurs in secondary compared to primary regions, and that in primary regions, greater efficiency comes at the cost of neuronal ‘bulking’. Alternatively but non-exclusively, it could be that in people who do not have a second TTG, HG needs to accommodate both lower-level auditory processing and the processing of more complex speech sounds. Further research, especially with larger samples of participants with a single TTG, is needed to elucidate the opposite direction of the relationship between multilingualism and cortical thickness in HG versus the second TTG in our results.

Different methodological tools are available for capturing the variability of TTG. Most previous studies used either manual labelling, whole-brain approaches involving normalizing structural images to a common template space in order to assess regional variation in grey or white matter probability, or automated pipelines serving to segment and label different regions of interest (e.g., VBM or FreeSurfer). Each of these approaches has important advantages, the former allowing for high precision in capturing details of individual variation, the latter two being far less time-consuming and labor-intensive, and therefore more applicable to large datasets. In the current study, we capitalized on recent developments in cortical segmentation efforts and used an automatic toolbox (TASH; Dalboni da Rocha et al., 2020) specifically designed for fine delineation of auditory cortex gyri, and for extraction of measures describing their anatomy. Coupled with visual inspection of the data, TASH provides a “best-of-both-worlds” approach, by being anatomically precise while requiring little manual intervention (which also tends to be error-prone). The measures we obtained align with the high degree of variability reported in previous literature, but they also seem to be more sensitive, since we identify more multiplications of the TTG than previously reported. For example, in Marie et al. (2015), out of 232 subjects, 204 (88%) had one gyrus (including partially duplicated) in the left hemisphere, 28 (12%) had a complete posterior duplication; the prevalence of further multiplications was not assessed in that study. In our data, a minority of participants (4%) presented only one gyrus in the left hemisphere, and most had either two (57%) or three (35%) separate gyri. TASH identifies gyri based on anatomical landmarks and curvature values, and therefore might be more sensitive to shallow cortical folds that might otherwise not be discernible in manual labelling.
The precise functional relevance of these multiplications remains to be uncovered in future research.

Notably, such variability in shape of the auditory regions is absent from other, evolutionarily older, species: macaques’ auditory cortex is flat, and some chimpanzees have only a single gyrus (Hackett et al., 2001). By showing that a human-specific experience, i.e., speaking multiple languages, is reflected in the anatomy of cortical regions that seem to be evolutionarily particular to humans, we contribute to informing both models of neuroplasticity and of language evolution.

A large body of work has provided insights into both how the brain processes auditory information and speech signals (see Bhaya-Grossman and Chang, 2022; Moerel et al., 2014 for overviews), and into how bi- and multilingualism are related to the structure of the brain (see García-Pentón et al., 2016; Li et al., 2014; Pliatsikas, 2020 for overviews). In the current study, we brought together both strands of research in an effort to investigate how the auditory cortex accommodates multilingual experience, thereby providing insights into the intricate functioning of auditory processing in humans. Apart from showing that individual anatomy appears to constrain the architecture of phonological representations, our findings also support the idea that experience with typologically similar languages might be different from experience with typologically distant languages (Antoniou, 2018; Berthele, 2020; Li et al., 2014). Indeed, across two independent datasets, we show, for the first time, that cortical thickness of early auditory brain regions is related to the degree of one’s language experience coupled with typological distance between the languages one knows. These findings indicate that early auditory regions seem to represent (or be shaped by) phoneme-level cross-linguistic information, contrary to the most established models of language processing in the brain, which suggest that phonological processing happens in more lateral posterior STG and STS.

4 Materials and Methods

4.1 Participants

Main sample. The full sample consisted of N = 146 participants (Mage = 35.79, SD = 15.77; 85 females; 31 monolinguals) exposed to up to 7 languages (2.65 languages on average), with n = 136 complete cases. Replication sample. The replication sample consisted of N = 69 participants (Mage = 32.04, SD = 11.68; 38 females; 29 monolinguals) exposed to up to 5 languages (2.44 languages on average), with n = 68 complete cases.

4.2 Language experience

To describe the diverse language background of our participants, we expressed age of onset(s) of acquisition (AoA) of different languages in a continuous quantitative measure. First, AoA of individual languages was log-transformed to minimize differences between values for languages learned later in life, and the values were inverted to express early AoAs as the highest values; a constant value of 1 was added to the index before log-transformation and after inverting the values, each time to avoid values equal to zero. Next, the AoAs of participants’ different languages were combined into one “language experience” index per participant. It was computed using Shannon’s entropy equation (Shannon, 1948):

$$H = -\sum_{i=1}^{n} p_i \log p_i$$

where $n$ stands for the total number of languages a participant has been exposed to and $p_i$ is the AoA index of the $i$-th language, calculated as above. High entropy values indicated more diverse language experience.
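As an illustration only, Shannon's entropy over one participant's per-language AoA indices could be computed along the following lines. The function name is hypothetical, and rescaling the indices to proportions and the use of the natural logarithm are assumptions of this sketch, not details confirmed by the text.

```python
import math


def language_experience_entropy(aoa_indices):
    """Shannon entropy over one participant's per-language AoA indices.

    Hypothetical helper: the rescaling to proportions and the natural
    logarithm are assumptions of this sketch.
    """
    total = sum(aoa_indices)
    if total <= 0:
        raise ValueError("AoA indices must sum to a positive value")
    # Assumption: indices are rescaled so that they sum to 1.
    proportions = [value / total for value in aoa_indices]
    # Zero proportions contribute nothing (p * log p tends to 0 as p tends to 0).
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Toy values: two languages with equal AoA indices versus unequal ones.
balanced = language_experience_entropy([1.0, 1.0])
unbalanced = language_experience_entropy([2.0, 1.0])
```

Under these assumptions a participant with a single language obtains an entropy of zero, and entropy grows as more languages with comparable AoA indices are added.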

4.3 Typological distance measures

Across the two samples, the participants spoke a total of N = 36 different languages (n = 34 in the main sample, and n = 20 in the replication sample): Arabic, Bengali, Cantonese, Creole (Antiguan), Czech, Danish, Dutch, English, Farsi, Finnish, French, German, Greek, Gujarati, Hakka, Hebrew, Hindi, Hokkien, Hungarian, Irish, Italian, Japanese, Korean, Luxembourgish, Malay, Mandarin, Portuguese, Punjabi, Russian, Sindhi, Spanish, Swahili, Swedish, Swiss German, Urdu and Vietnamese. The PHOIBLE database (Moran et al., 2019) and open-source software (Dediu and Moisik, 2016) were used to construct three measures of typological distance between the languages: distances between the distinctive phonological features describing the phonemes of each phonological inventory, distances between the sets of phonemes of the phonological inventories, and similarity in counts of phonological classes that share certain features, as detailed below. Figure S 2 provides a visual representation of the three distance matrices between all the languages.

4.3.1 Acoustic and articulatory features

For each documented phonological inventory, PHOIBLE includes its phonemes (see Section 4.3.2 below) and the lower-level distinctive acoustic and articulatory features describing the phonemes. These features offer a qualitative description of differences between discrete sound categories, and are based on articulatory phonology (Hayes, 2011). Every phoneme in every language is described by the following 37 binary features: tone, stress, syllabic, short, long, consonantal, sonorant, continuant, delayed release, approximant, tap, trill, nasal, lateral, labial, round, labiodental, coronal, anterior, distributed, strident, dorsal, high, low, front, back, tense, retracted tongue root, advanced tongue root, periodic glottal source, epilaryngeal source, spread glottis, constricted glottis, fortis, raised larynx ejective, lowered larynx implosive, click. Individual languages were described in terms of presence or absence of these features across all phonemes belonging to the given phonological inventory, and represented as vectors. For a given pair of language feature vectors, a pairwise distance was computed as cosine distance using the scipy.spatial.distance module in Python.
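The cosine distance between two language feature vectors can be sketched as follows. The vectors below are short hypothetical toy examples rather than PHOIBLE data, and the plain-Python function is written out only to make the computation explicit; `scipy.spatial.distance.cosine` computes the same quantity.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for all-zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical presence/absence vectors over five features (toy data).
language_a = [1, 0, 1, 1, 0]
language_b = [1, 1, 1, 0, 0]
distance_ab = cosine_distance(language_a, language_b)
```

Two languages whose feature vectors point in the same direction obtain a distance of zero, which is how a language pair can end up with a feature-level distance of zero despite differing phoneme inventories.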

4.3.2 Phonemes

Phonemes are individual speech sounds such as vowels and consonants, representing the smallest units of sound that can change the meaning of a word when substituted. Lists of phonemes belonging to each of the languages represented in the database were exported from PHOIBLE (Moran et al., 2019) and represented as strings. For a given pair of languages A and B, a pairwise string distance between phonemes of both languages was computed with the Jaccard distance method:

$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$

representing one minus the ratio between the size of the intersection and the size of the union of the two sets of phonemes, and varying between 0 (both languages having exactly the same phonemes) and 1 (no phonemes in common).
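A minimal sketch of the Jaccard distance between two phoneme sets is given below. The function name and the toy inventories are hypothetical, and treating two empty inventories as identical is a choice made for this sketch.

```python
def jaccard_distance(phonemes_a, phonemes_b):
    """Jaccard distance between two phoneme inventories given as iterables."""
    set_a, set_b = set(phonemes_a), set(phonemes_b)
    union = set_a | set_b
    if not union:
        # Choice made for this sketch: two empty inventories count as identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Toy inventories (not taken from PHOIBLE): two shared phonemes out of four.
toy_a = {"p", "t", "k"}
toy_b = {"p", "t", "s"}
toy_distance = jaccard_distance(toy_a, toy_b)  # 1 - 2/4
```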

In addition, using the language-specific phoneme-level information, we constructed a cross-linguistic description of individual languages and used it to compute a measure of cumulative phoneme inventory for each participant. We counted how many unique phonemes each participant was exposed to across all their languages (i.e., phonemes that overlapped between languages were counted only once). For example, the cumulative phoneme inventory of a participant who was exposed only to English was equal to 40, while an English-French bilingual’s cumulative phoneme inventory would be equal to 64 (both English and French have 40 unique phonemes (Stanford Phonology Archive, 2019b, 2019a), with 16 phonemes overlapping across the two inventories).
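The counting rule can be sketched as the size of the union of a participant's phoneme inventories. The function name is hypothetical and the inventories below use placeholder labels rather than real phoneme symbols; they are sized to mirror the English-French example above (40 phonemes each, 16 shared).

```python
def cumulative_phoneme_inventory(inventories):
    """Number of unique phonemes across all of a participant's languages."""
    unique_phonemes = set()
    for inventory in inventories:
        # Set union counts phonemes shared between languages only once.
        unique_phonemes |= set(inventory)
    return len(unique_phonemes)


# Placeholder labels, not real phoneme symbols.
shared = {f"shared_{i}" for i in range(16)}
english_like = shared | {f"en_{i}" for i in range(24)}
french_like = shared | {f"fr_{i}" for i in range(24)}

monolingual_count = cumulative_phoneme_inventory([english_like])
bilingual_count = cumulative_phoneme_inventory([english_like, french_like])
```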

4.3.3 Counts of phonological classes

Both the lists of phonemes and phonological features describing them can be used to derive a more abstract, higher-level description of individual languages in terms of which and how many phonological classes their phonological inventories contain. Using the PHOIBLE inventories and feature system, Dediu and Moisik (2016) proposed a comprehensive list of 167 phonological classes, along with a method for selecting a language’s phonemes belonging to given classes and computing their counts. The system describes how many “segments”, “consonants”, “vowels”, “diphthongs” etc. a given language has, and counts further classes that share certain features (e.g., “bilabial consonants”, “front vowels”, “clicks”, see Supplementary Information, Section 7.8 for the full list). Per language, we created a vector representing the counts of all 167 phonological classes, and computed pairwise distances between the languages spoken by the participants in our database with cosine distance (as above).

4.4 Combining typology and language experience (indexed by AoA)

Quantifying diverse language experience with Shannon’s entropy equation does not account for similarities and differences between the languages. Therefore, using Rao’s quadratic entropy equation (QE) (De Bello et al., 2007; Pavoine and Bonsall, 2011; Rao, 1982), the language experience index was combined with each of the above typological distance measures, separately, to generate weighted measures accounting for the three different types of cross-linguistic phonological distances. The summed phonological distances (feature-level, phoneme-level or based on counts of phonological classes) between all language pairs were weighted by the log-transformed and inverted AoA index for each language, as follows:

If the AoA index for the $i$-th language in a participant’s repertoire is $p_i$ and the dissimilarity between languages $i$ and $j$ is $d_{ij}$, then the language experience index accounting for typological information has the form:

$$Q = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij} \, p_i \, p_j$$

where $n$ is the number of languages in the repertoire and $d_{ij}$ varies from 0 (i.e., where the two languages have exactly the same phonological systems) to 1 (i.e., the two languages have completely different phonological systems). The SYNCSA R package (Debastiani and Pillar, 2012) (https://rdrr.io/cran/SYNCSA/) was used to perform calculations.
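For illustration, Rao's quadratic entropy for a single participant can be sketched in Python as below. This is not the SYNCSA implementation used in the study: the function name is hypothetical, and both the rescaling of the AoA indices to proportions and the summation over unordered language pairs are assumptions of the sketch (summing over all ordered pairs instead would double the value).

```python
def rao_quadratic_entropy(aoa_indices, distances):
    """Rao's quadratic entropy over one participant's languages.

    aoa_indices: per-language AoA indices.
    distances: square matrix (list of lists) of pairwise distances in [0, 1].
    """
    n = len(aoa_indices)
    if any(len(row) != n for row in distances) or len(distances) != n:
        raise ValueError("distance matrix must be n x n")
    total = sum(aoa_indices)
    if total <= 0:
        raise ValueError("AoA indices must sum to a positive value")
    # Assumption: indices are rescaled so that they sum to 1.
    p = [value / total for value in aoa_indices]
    # Sum over unordered pairs (i < j), so each language pair is counted once.
    return sum(
        distances[i][j] * p[i] * p[j]
        for i in range(n - 1)
        for j in range(i + 1, n)
    )


# Toy example: two equally weighted languages at maximal distance.
toy_q = rao_quadratic_entropy([1.0, 1.0], [[0.0, 1.0], [1.0, 0.0]])
```

Under these assumptions a monolingual participant, or a participant whose languages are all at distance zero from one another, obtains a value of zero, so the index only increases when languages are both experienced and phonologically distinct.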

For a similar methodology, but applied to lexical distance measures, see Kepinska et al. (2023).

4.5 Neuroimaging data acquisition and processing

For all participants, structural MRIs with 176 sagittal slices were acquired using a T1-weighted three-dimensional (3D) Modified Driven Equilibrium Fourier Transform (MDEFT) sequence. The resulting images had a matrix size of 256 × 224, yielding a final resolution (voxel size) of 1 × 1 × 1 mm. Repetition time (TR)/echo time (TE)/inversion time (TI) was 12.24/3.56/530 ms for the main sample (from a 1.5T Siemens Sonata scanner), and 7.92/2.48/910 ms for the replication sample (from a 3T Siemens Trio scanner). The T1 images were denoised using the spatially adaptive non-local means filter (SANLM; Manjón et al., 2010) in MATLAB within the CAT12 toolbox. The images were then processed with FreeSurfer’s (version 7.2) brain structural pipeline (Fischl et al., 2004), which consists of motion correction, intensity normalization, skull stripping, and reconstruction of the volume’s voxels into white and pial surfaces.

4.6 Parcellation of the auditory regions

The reconstructed surfaces were parcellated into regions using an atlas-based procedure (Destrieux et al., 2010). The auditory regions were delineated using the following labels from the Destrieux atlas: planum polare of the superior temporal gyrus (G_temp_sup-Plan_polar; PP), anterior transverse temporal gyrus (of Heschl) (G_temp_sup-G_T_transv; referring to HG, or to the anterior-most TTG), transverse temporal sulcus (S_temporal_transverse; HS), planum temporale, or temporal plane of the superior temporal gyrus (G_temp_sup-Plan_tempo; PT), superior temporal sulcus (parallel sulcus) (S_temporal_sup; STS), and lateral aspect of the superior temporal gyrus (G_temp_sup-Lateral; STG). The STG and the STS labels were further divided into smaller regions using FreeSurfer’s freeview software and FreeSurfer command line tools. For this, on the inflated surface of the fsaverage template, we first drew additional ROIs, dividing the STG into anterior and posterior parts, with the transverse temporal sulcus as the dividing landmark (Hamilton et al., 2021). We further divided the STS into anterior, middle and posterior parts. The anterior versus mid-STS were divided by again aligning the boundary with the posterior border of the anterior TTG, and the mid- versus posterior STS were divided by aligning the boundary with the posterior border of STG. Figure S 3 shows an example of the parcellation in the native space of one of the participants. Next, we transformed the new ROIs into the participants’ native space (using mri_surf2surf), and intersected them with individual subjects’ unaltered STG and STS labels. Cortical volume, surface area and average thickness were computed and extracted from the final labels (with mris_anatomical_stats), and were used in the linear mixed models reported in Section 2.1.

For the analysis of the TTG along the superior temporal plane (i.e., on the superior aspect of the STG) (see Results, Section 2.2), we used an automated toolbox (TASH; Dalboni da Rocha et al., 2020). TASH runs on the output of the FreeSurfer Destrieux atlas structural segmentation, and provides a finer segmentation not only of HG, whether a single gyrus or a common stem duplication (default version of TASH), but also of additional TTGs when present (extended version of TASH, called ‘TASH_complete’; Dalboni da Rocha et al., 2023). To explore the significant interaction between average thickness of PT and language experience in the exploratory analysis on all auditory regions (Section 2.1), we used ‘TASH_complete’; this allowed us to explore possible relationships with additional TTGs, when present. We performed a visual selection of the gyri segmented by TASH_complete, excluding from the analysis gyri that lay along the portion of the superior temporal plane that curved vertically (i.e., within the parietal extension, Honeycutt et al., 2000), when present. The volume, surface area and average thickness of the resulting labels were extracted, and used as dependent variables in the statistical analysis (see Results, Section 2.2).

Abbreviations

  • STG

    superior temporal gyrus

  • STS

    superior temporal sulcus

  • PT

    planum temporale

  • TTG

    transverse temporal gyrus

  • HG

    Heschl’s gyrus (first transverse temporal gyrus)

  • HS

    Heschl’s sulcus

  • PP

    planum polare

  • L1

    first language

  • AoA

    age of onset of acquisition

  • ROI

    region of interest

Acknowledgements

This work was supported by the Wellcome Trust (grant numbers 203147/Z/16/Z and 205103/Z/16/Z to CJP). The authors gratefully acknowledge support by the NCCR Evolving Language, Swiss National Science Foundation Agreement #51NF40_180888. OK was funded by the Marie Jahoda Stipendium from University of Vienna.

7 Supplementary Information

Illustration of the replication sample’s language experience. As in Figure 1, each bar here represents a single participant’s overall language experience; the height of the stacked bars within each bar represents the AoA index for individual languages (the taller the bar, the earlier in life a given language was acquired). The color of each stacked bar refers to the number of phonemes in each language’s phonological inventory.

Similarity matrices of typological distances between all languages represented in the study (N = 36) based on: (1) distances in distinctive acoustic and articulatory features describing the phonemes of each language (e.g., “short”, “long”); (2) distances in sets of phonemes belonging to each language; and (3) distances based on counts of phonological classes that share certain features (e.g., “consonants”, “front rounded vowels”, “clicks”). Data for individual languages were collected from the PHOIBLE database (Moran et al., 2019) and open-source software (Dediu & Moisik, 2016). The figure was generated in R, with the package pheatmap (Kolde, 2019), version 1.0.12.

Auditory ROIs used in the analysis. The ROIs are overlaid on an inflated surface in the native space of one of the participants.

7.1 Auditory cortex regions and language experience

Results of linear mixed models (parameter estimates and standard errors, in brackets; p-values are listed according to the coding presented underneath the table) testing the effect of language experience on the structure (volume, area, and average thickness) of the auditory regions: planum polare, Heschl’s gyrus, Heschl’s sulcus, planum temporale, anterior and posterior superior temporal gyrus, and anterior, middle, and posterior superior temporal sulcus. Anterior STG was used as the reference level.

7.2 Superior temporal region and language experience

Results of the linear mixed models testing the effect of Language Experience on the structure (volume, area and average thickness) of the gyri in the superior temporal region: first, second and third TTG. Anterior TTG was used as the reference level.

7.3 Vertex-wise analysis

Results of a whole-brain vertex-wise analysis, aimed at establishing relations between the language experience index and whole-brain cortical thickness. Overlaid on the inflated surface of the fsaverage template brain is the significance map from the F-test (thresholded at p < .0001, uncorrected), showing a negative relationship between cortical thickness in the highlighted region and the degree of multilingual language experience.

7.4 Second transverse temporal gyrus and effects of language typology

Kolmogorov-Smirnov normality tests were run on all models’ residuals, revealing that parametric linear models were appropriate for the present data (all ps > .4). According to a frequentist analysis of the data, all eight overall regression models were statistically significant: (1) Left hemisphere: F(4,125) = 7.802, p < .001, F(4,125) = 6.32, p < 0.001, F(4,125) = 8.287, p < 0.001, F(4,125) = 6.98, p < 0.001, respectively for the models with the four language experience indices: (a) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (b) acoustic/articulatory features, (c) phonemes, and (d) counts of phonological classes; and (2) Right hemisphere: F(4,91) = 4.295, p = 0.003, F(4,91) = 3.461, p = 0.01, F(4,91) = 4.331, p = 0.002, F(4,91) = 3.722, p = 0.007, respectively for the models with the four language experience indices: (a) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (b) acoustic/articulatory features, (c) phonemes, and (d) counts of phonological classes.

7.4.1 Cumulative phoneme inventory and the second TTG

The two models including the ‘cumulative phoneme inventory’ measure were statistically significant: (1) Left hemisphere: F(5,124) = 8.308, p < .001, and (2) Right hemisphere: F(5,90) = 3.42, p = 0.007, see further Table S 3. Figure S 5 presents the relationship between the thickness of the second TTG (left and right) and the cumulative phoneme inventory.

Cumulative phoneme inventory and thickness of the second TTG. Average thickness of the second TTG in the left and right hemisphere in relation to the number of unique phonemes each participant was exposed to across all their languages (the plotted values are residuals controlled for age, sex, mean hemispheric thickness and the language experience index irrespective of typology).

Left and right second transverse temporal gyri and cumulative phoneme inventory. Multiple regression model parameters (parameter estimates and standard errors, in brackets; p-values are listed according to the coding presented underneath the table) for the average cortical thickness of the second TTG (left and right), as predicted by the “cumulative phoneme inventory” index. For comparison, models with the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of phonemes are also reported. The last two rows present model comparison results (additional variance explained, and BF10 and BF01 values).

7.5 Language experience in participants with a single TTG

Kolmogorov-Smirnov normality tests were run on all models’ residuals, revealing that parametric linear models were appropriate for the present data (all ps > .29). According to a frequentist analysis of the data, all eight overall regression models were statistically significant: (1) Right Heschl’s gyrus: F(4,35) = 6.365, p < .001, F(4,35) = 4.851, p = 0.003, F(4,35) = 5.403, p = 0.002, F(4,35) = 5.72, p = 0.001, respectively for models with the four language experience indices: (a) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (b) acoustic/articulatory features, (c) phonemes, and (d) counts of phonological classes; and (2) Right planum temporale: F(4,35) = 2.655, p = 0.049, F(4,35) = 2.639, p = 0.050, F(4,35) = 2.66, p = 0.048, F(4,35) = 2.685, p = 0.049, respectively for models with the four language experience indices: (a) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (b) acoustic/articulatory features, (c) phonemes, and (d) counts of phonological classes.

Right superior temporal plane (Heschl’s gyrus and planum temporale) and language experience in participants with one TTG. Multiple regression model parameters (parameter estimates and standard errors, in brackets; p-values are listed according to the coding presented underneath the table) for the average cortical thickness of the right Heschl’s gyrus, and the right planum temporale, as predicted by the four language experience indices: (1) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (2) phonemes, (3) acoustic/articulatory features and (4) counts of phonological classes. The last two rows present model comparison results (additional variance explained and BF10 values); NB. all models including typological information were compared against the ‘No typology’ model.

7.6 Effects of language proficiency

The finding of thinner cortex in the second TTG (when present) was followed up by an analysis investigating the effect of the proficiency that participants had attained in their individual languages. Here, we again constructed cumulative language experience measures per participant using Shannon’s entropy equation (Shannon, 1948), but this time from the self-reported proficiency ratings (on a scale from 0 to 10) of each of the participants’ languages. First, we computed a general proficiency score and used it in two linear models to establish whether the results obtained with the AoA-derived cumulative language experience measure could be replicated with the proficiency-derived measure, since AoA and proficiency are known to be highly correlated. This analysis therefore served as a ‘sanity check’ for the results obtained from the analyses reported in Section 2.2. Indeed, we observed that the thickness of the second TTG (when present) was negatively related to the cumulative language experience measure derived from proficiency ratings. This effect was significant for the thickness of the left 2nd TTG, and showed a trend towards significance for the right (β = -0.11, t = -2.41, p = .018, small effect size: f2 = 0.05; and β = -0.10, t = -1.87, p = .065, small effect size: f2 = 0.05, for left and right 2nd TTG respectively).
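A proficiency-derived entropy measure of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact computation used here: the ratings are assumed to be normalised to proportions that sum to one, languages rated 0 are assumed to contribute nothing, and the logarithm is taken to base 2. The function name is hypothetical.

```python
import math


def proficiency_entropy(ratings):
    """Shannon entropy (in bits) of proficiency ratings normalised to proportions.

    Assumptions (not taken from the source): ratings are divided by their sum,
    languages rated 0 are dropped, and the logarithm is base 2.
    """
    positive = [r for r in ratings if r > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    entropy = 0.0
    for r in positive:
        p = r / total
        entropy -= p * math.log2(p)
    return entropy


# Two equally proficient languages versus one dominant language.
balanced = proficiency_entropy([10, 10])
unbalanced = proficiency_entropy([10, 2])
```

Under these assumptions the measure is zero for a single language and increases both with the number of languages and with how evenly proficiency is spread across them.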

Given the hypothesis that multilingualism is a dynamic process reshaping brain structure and inducing both increases (in initial stages) and decreases (in peak efficiency) in brain morphological indices (Pliatsikas, 2020), we further explored the effect of proficiency by investigating the role of languages spoken proficiently and non-proficiently in the structure of the 2nd TTG. Here, we computed two further indices per participant: (1) one derived only from languages spoken proficiently (rated at 7 and higher out of 10 on a self-reported proficiency scale), and (2) one derived from languages spoken at a low level of proficiency (rated lower than 7 out of 10). We subsequently used these indices in two further analyses to test the hypothesis that decreases in brain morphological indices (here in cortical thickness) are driven by peak efficiency of language functioning (operationalized here as high proficiency in many languages). We did observe that the thickness of the 2nd TTG was related to the measure of language experience derived from languages spoken proficiently; however, this effect was only significant in the right hemisphere (β = -0.18, t = -2.89, p = .005, small effect size: f2 = 0.11; and β = -0.02, t = -0.44, p = .66, for right and left 2nd TTG respectively). The thickness of the 2nd TTG was not related to the language experience measure derived from languages spoken at a low level of proficiency (β = -0.01, t = -0.11, p = .91; and β = -0.12, t = -1.69, p = .097 for right and left 2nd TTG respectively), see Figure S 6.
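The split into proficient and non-proficient languages described above can be sketched as follows, using the cut-off of 7 out of 10 stated in the text. The function name, the dictionary layout and the example ratings are hypothetical; each resulting subset would then be passed to whichever index computation is applied to the full set of ratings.

```python
def split_by_proficiency(ratings, threshold=7):
    """Split a participant's languages at the proficiency cut-off.

    ratings: mapping from language label to self-rated proficiency (0 to 10).
    Languages rated at or above the threshold count as proficient.
    """
    proficient = {lang: r for lang, r in ratings.items() if r >= threshold}
    non_proficient = {lang: r for lang, r in ratings.items() if r < threshold}
    return proficient, non_proficient


# Hypothetical participant with four languages.
example_ratings = {"lang_a": 10, "lang_b": 7, "lang_c": 4, "lang_d": 6.5}
proficient, non_proficient = split_by_proficiency(example_ratings)
```

A rating of exactly 7 falls into the proficient subset, matching the "7 and higher" criterion.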

Multilingual proficiency and thickness of the second TTG. Average thickness of the second TTG in the left and right hemisphere as a function of language proficiency in all languages of each participant (top panel), only their proficient languages (middle panel) and only their non-proficient languages (bottom panel). Plots show residuals, controlling for age, sex and mean hemispheric thickness.

7.7 Replication analysis

Kolmogorov-Smirnov normality tests were run on all models’ residuals, revealing that parametric linear models were appropriate for the present data (all ps > .5). According to a frequentist analysis of the data, none of the four overall regression models for the left hemisphere data was statistically significant: F(5,55) = 1.109, p = .37, F(5,55) = 1.12, p = .36, F(5,55) = 8.287, p = .37, F(5,55) = 1.072, p = .38, respectively for models with the four language experience indices: (a) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (b) acoustic/articulatory features, (c) phonemes, and (d) counts of phonological classes. Models for the right hemisphere data were all statistically significant, apart from the counts of phonological classes model: F(5,48) = 2.781, p = 0.03, F(5,48) = 5.158, p < 0.001, F(5,48) = 3.244, p = 0.013, F(5,48) = 2.342, p = 0.06, respectively for models with the four language experience indices: (a) the cumulative language experience measure not accounting for typology, and cumulative language experience weighted by overlaps between languages at the level of (b) acoustic/articulatory features, (c) phonemes, and (d) counts of phonological classes.

7.8 Phonological classes as defined by Dediu and Moisik (2016)

(v = vowels, c = consonants) segments, vowels, monophtongs, diphtongs, triphtongs, heights.v, lengths.v, long.v, nasal.v, round.v, high.v, mid.v, low.v, front.v, back.v, tense.v, lax.v, atr.v, rtr.v, raised.v, retracted.v, fronted.v, glottalized.v, unique.v, unique.nasal.v, heights_mono.v, heights_di.v, heights_tri.v, lengths_mono.v, lengths_di.v, lengths_tri.v, long_mono.v, long_di.v, long_tri.v, nasal_mono.v, nasal_di.v, nasal_tri.v, round_mono.v, round_di.v, round_tri.v, high_mono.v, high_di.v, high_tri.v, mid_mono.v, mid_di.v, mid_tri.v, low_mono.v, low_di.v, low_tri.v, front_mono.v, front_di.v, front_tri.v, back_mono.v, back_di.v, back_tri.v, tense_mono.v, tense_di.v, tense_tri.v, lax_mono.v, lax_di.v, lax_tri.v, atr_mono.v, atr_di.v, atr_tri.v, rtr_mono.v, rtr_di.v, rtr_tri.v, raised_mono.v, raised_di.v, raised_tri.v, retracted_mono.v, retracted_di.v, retracted_tri.v, fronted_mono.v, fronted_di.v, fronted_tri.v, glottalized_mono.v, glottalized_di.v, glottalized_tri.v, unique_mono.v, unique_di.v, unique_tri.v, unique.nasal_mono.v, unique.nasal_di.v, unique.nasal_tri.v, consonants, places.c, bilabial.c, labiodental.c, dental.c, alveolar.c, dental_alveolar.c, palatoalveolar.c, alveolopalatal.c, postalveolar.c, true_retroflex.c, palatal.c, velar.c, uvular.c, pharyngeal_epiglottal.c, glottal.c, labial.c, coronal.c, dorsal.c, guttural.c, manners.c, obstruent.c, voiced_obstruent.c, voiceless_obstruent.c, aspirated_obstruent.c, glottalized_obstruent.c, stop.c, voiced_stop.c, voiceless_stop.c, aspirated_stop.c, glottalized_stop.c, fricative.c, voiced_fricative.c, voiceless_fricative.c, affricate.c, sonorant.c, voiced_sonorant.c, voiceless_sonorant.c, glottalized_resonant.c, nasal.c, approximant.c, tapflap.c, trill.c, trill_tap.c, coronal_trill_tap.c, glottalized.c, uvt.c, uvt.stops.c, uvt.fricatives.c, uvt.affricates.c, uvt.nasals.c, uvt.approximants.c, lvt.c, lvt.stops.c, lvt.fricatives.c, lvt.affricates.c, lvt.nasals.c, lvt.approximants.c, 
ratio.voiced.voiceless.obstruents, ratio.voiced.voiceless.stops, ratio.obstruents.sonorants, egressive, implosive, ejective, click, voiceless, voiced, breathy, creaky, tones, bilabial.fricatives, labiodental.fricatives, alveolar.fricatives, nonsibilant.dental.fricatives, sibilant.dental.fricatives, bilabiallabiodental.affricates, bilabial.affricates, retroflex.stops, retroflex.fricatives, retroflex.affricates, retroflex.nasals, retroflex.approximants