
Impaired voice processing in reward and salience circuits predicts social communication in children with autism

Research Communication
Cite this article as: eLife 2019;8:e39906 doi: 10.7554/eLife.39906

Abstract

Engaging with vocal sounds is critical for children’s social-emotional learning, and children with autism spectrum disorder (ASD) often ‘tune out’ voices in their environment. Little is known regarding the neurobiological basis of voice processing and its link to social impairments in ASD. Here, we perform the first comprehensive brain network analysis of voice processing in children with ASD. We examined neural responses elicited by unfamiliar voices and mother’s voice, a biologically salient voice for social learning, and identified a striking relationship between social communication abilities in children with ASD and activation in key structures of reward and salience processing regions. Functional connectivity between voice-selective and reward regions during voice processing predicted social communication in children with ASD and distinguished them from typically developing children. Results support the Social Motivation Theory of ASD by showing reward system deficits associated with the processing of a critical social stimulus, mother’s voice, in children with ASD.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved (see decision letter).

https://doi.org/10.7554/eLife.39906.001

Introduction

The human voice is a critical social stimulus in children’s environment, and engaging with vocal sounds is important for language (Kuhl et al., 2005a; Christophe et al., 1994) and social-emotional learning (DeCasper and Fifer, 1980) during typical development. However, children with autism spectrum disorder (ASD) are often not responsive to voices (Kanner, 1968; Harstad et al., 2016), and it has been hypothesized that voice processing deficits contribute to pronounced social communication difficulties in ASD (Klin, 1991; Kuhl et al., 2005b; Whitehouse and Bishop, 2008). A special case of voice processing impairments in children with ASD is a deficit in processing mother’s voice (Klin, 1991), a biologically salient and implicitly rewarding sound for typically developing (TD) children (Lamb, 1981; Thoman et al., 1977), which is closely associated with cognitive (Kuhl et al., 2005a; Christophe et al., 1994) and social development (DeCasper and Fifer, 1980; Adams and Passman, 1979). Compared to studies of visual face processing (Dalton et al., 2005; Dawson et al., 2002; Dichter et al., 2012; Pierce et al., 2001; Schultz et al., 2000), very little is known regarding the neurobiology of voice processing networks in children with ASD, which is fundamental to human communication.

It remains unknown why children with ASD often do not engage with the voices in their environment. Specifically, it is not known which aspects of voice processing are impaired in children with ASD. One possibility is that sensory deficits negatively affect voice processing and contribute to social communication deficits (Dinstein et al., 2012; Markram et al., 2007; Russo et al., 2010; Marco et al., 2011; Leekam et al., 2007; Woynaroski et al., 2013). A second possibility relates to the motivation to engage with socially relevant stimuli (Chevallier et al., 2012; Dawson et al., 2004; Pelphrey et al., 2011; Clements et al., 2018). The social motivation theory of ASD posits that impairments in representing the reward value of human vocal sounds impede individuals with ASD from engaging with these stimuli and contribute to social interaction difficulties (Dawson et al., 2002; Chevallier et al., 2012). While this is a prominent model for considering social communication function in ASD, there has been a dearth of compelling experimental evidence showing aberrant reward processing in response to clinically meaningful social stimuli (Clements et al., 2018).

An important approach for testing theories of ASD is the use of human brain imaging methods and functional circuit analyses. Behavioral studies are limited in their ability to provide details regarding the neural mechanisms underlying distinct aspects of social information processing, and systems neuroscience analyses can uncover important aspects of social information processing that may be impaired in individuals with ASD. For example, the social motivation theory posits that individuals with ASD show reduced engagement and connectivity in the mesolimbic reward system, including the ventral tegmental area (VTA), nucleus accumbens (NAc), orbitofrontal cortex (OFC), and ventromedial prefrontal cortex (vmPFC), and structures of the salience and affective processing systems, instantiated in the anterior insula and amygdala, during social processing (Chevallier et al., 2012).

Previous brain imaging research of voice processing in adults with ASD has supported the sensory deficit model by showing reduced regional activity in voice-selective superior temporal sulcus (STS) (Gervais et al., 2004; Schelinski et al., 2016), a core region associated with structural analysis of the human voice (Belin et al., 2000). However, several factors have precluded thorough tests of prominent ASD theories in the context of the neurobiology of voice processing. First, there have been few studies examining voice processing in ASD, particularly when compared to the extensive face processing literature (Dalton et al., 2005; Dichter et al., 2012; Pierce et al., 2001; Schultz et al., 2000; Baron-Cohen et al., 1999; Dapretto et al., 2006). Second, previous studies have not employed biologically salient voices (e.g. mother/caregiver), which are thought to be implicitly rewarding (Chevallier et al., 2012), to probe brain circuit function in children with ASD. For example, a recent study in TD children showed that, compared to unfamiliar voices, mother’s voice elicits activation within voice-selective, mesolimbic reward, affective, salience, and face-processing brain regions, and connectivity between these regions predicts social communication abilities (Abrams et al., 2016). Third, previous studies of voice processing have focused on group differences in brain activity between individuals with ASD and matched controls but have not examined how individual variation in social communication abilities is associated with social brain circuit function in ASD. Finally, although autism has been conceptualized as a disorder of brain connectivity (Uddin et al., 2013a; Wass, 2011), previous brain imaging studies of human voice processing in ASD have focused on regional activation profiles in voice-selective cortex (Gervais et al., 2004; Schelinski et al., 2016) and have not employed a brain networks perspective.
Importantly, a brain networks approach goes beyond describing activation in circumscribed brain regions and accounts for the coordinated activity in distributed brain systems during social information processing, and would provide considerable insight into aberrancies in several critical brain systems in ASD (Di Martino et al., 2011; Uddin et al., 2013b; von dem Hagen et al., 2013). For example, a previous resting state fMRI study investigated intrinsic connectivity of voice-selective cortex and showed that children with ASD have reduced connectivity between voice-selective STS and key structures of the mesolimbic reward system, anterior insula, and amygdala (Abrams et al., 2013a). Moreover, the strength of intrinsic connectivity in this network predicted social communication abilities in children with ASD. While intrinsic network findings support the social motivation theory of ASD, a critical question remains: do results from intrinsic connectivity reflect an epiphenomenon, or is aberrant brain connectivity in voice and reward brain systems during the processing of biologically salient and clinically relevant voices a signature of social communication deficits in children with ASD?

Here, we examine social information processing in children with ASD by probing brain circuit function and connectivity in response to human vocal sounds. We examined two aspects of voice processing: (1) unfamiliar voice processing compared to non-social auditory processing (i.e. environmental sounds) and (2) mother’s voice compared to unfamiliar voice processing (Figure 1a). The rationale for this approach is that these two levels of social information processing may reflect distinct neural signatures in voice-selective, salience, and reward processing brain systems in children with ASD. A key aspect of our analysis was to investigate whether brain activity and connectivity in response to these vocal contrasts reflects individual differences in social communication abilities in children with ASD (Lord et al., 2000). A second aspect of the analysis was to build on results from a previous intrinsic connectivity study of the voice processing network in children with ASD (Abrams et al., 2013a) to examine whether stimulus-evoked connectivity patterns within this network during unfamiliar and mother’s voice processing can reliably distinguish children with ASD from TD children and predict social communication abilities in children with ASD.

fMRI Experimental design, acoustical analysis, and behavioral results.

(A) Randomized, rapid event-related design: During fMRI data collection, three auditory nonsense words, produced by three different speakers, were presented to the child participants at a comfortable listening level. The three speakers consisted of each child’s mother and two control voices. Non-speech environmental sounds were also presented to enable baseline comparisons for the speech contrasts of interest. All auditory stimuli were 956 ms in duration and were equated for RMS amplitude. (B) Acoustical analyses show that vocal samples produced by the participants’ mothers were comparable between TD (yellow) and ASD groups (magenta) and were similar to the control samples (cyan) for individual acoustical measures (p>0.10 for all acoustical measures; see Appendix, Acoustical analysis of mother’s voice samples). (C) All TD children and the majority of children with ASD were able to identify their mother’s voice with high levels of accuracy; however, five children with ASD performed below chance on this measure (see Appendix, Identification of Mother’s Voice). The horizontal line represents chance level for the mother’s voice identification task.
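
The caption above notes that all stimuli were equated for RMS amplitude. A minimal sketch of that kind of normalization is given below in plain Python. The toy waveforms, the target level, and the function names `rms` and `equate_rms` are illustrative assumptions and are not taken from the study's stimulus preparation pipeline.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equate_rms(samples, target_rms):
    """Scale samples by a single gain so their RMS equals target_rms."""
    current = rms(samples)
    if current == 0:
        raise ValueError("cannot rescale a silent signal")
    gain = target_rms / current
    return [s * gain for s in samples]


# Hypothetical toy waveforms standing in for two recorded stimuli.
mother = [0.2, -0.4, 0.1, 0.3]
control = [0.05, -0.1, 0.02, 0.08]

target = 0.1  # assumed common RMS level, arbitrary units
mother_eq = equate_rms(mother, target)
control_eq = equate_rms(control, target)
```

After scaling, both toy signals have the same RMS level while keeping their original waveform shape, since only a constant gain is applied.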

https://doi.org/10.7554/eLife.39906.002

Results

TD vs. ASD activation differences in response to unfamiliar voices

Direct group comparisons between TD children and children with ASD in response to unfamiliar female voices show that children with ASD have reduced activity in a relatively small set of brain regions confined to lateral temporal cortex (Figure 2A; see Appendix 1—table 1 for effect sizes and Appendix 1—figure 2 for within-group results). Specifically, children with ASD show reduced activity in right hemisphere planum polare (PP), an area of auditory association cortex within the superior temporal gyrus. Within-group signal level analysis showed that TD children have greater activity for unfamiliar female voices, compared to environmental sounds, in this brain region (i.e. positive βs; see Appendix 1—figure 3A) while children with ASD show weaker activity for this same contrast (i.e. negative βs for unfamiliar voices compared to environmental sounds). No brain regions showed greater activity for unfamiliar female voices in the ASD, compared to the TD, group.

Brain activity difference in TD children compared to children with ASD in response to vocal stimuli.

(A) Group comparisons indicate that TD children show greater activity compared to children with ASD in right-hemisphere auditory association cortex (planum polare (PP)) in response to the unfamiliar female voices > non-vocal environmental sound contrast. No regions showed greater activity in children with ASD compared to TD children for the unfamiliar female voice contrast. (B) Group comparisons indicate that TD children show greater activity in several visual processing regions, including bilateral intercalcarine cortex, lingual gyrus, and fusiform cortex, as well as right-hemisphere posterior hippocampus and superior parietal regions, in response to the mother’s voice > unfamiliar female voices contrast. No regions showed greater activity in children with ASD compared to TD children for the mother’s voice contrast.

https://doi.org/10.7554/eLife.39906.003

TD vs. ASD activation differences in response to mother’s voice

Direct group comparisons between brain responses measured from TD children and children with ASD in response to mother’s voice relative to unfamiliar female voices revealed that children with ASD have reduced activity in several visual processing regions as well as key structures of the medial temporal lobe memory system (Figure 2B; see Appendix 1—table 1 for effect sizes and Appendix 1—figure 4 for within-group results). Specifically, whole-brain analysis revealed that TD children had greater activation compared to children with ASD for mother’s voice in bilateral intercalcarine cortex extending into lingual gyrus. Moreover, children with ASD showed reduced activity compared to TD children in a broad extent of fusiform gyrus bilaterally, including both left-hemisphere occipital regions of fusiform as well as temporal occipital regions in the right-hemisphere. Children with ASD also showed less activity for mother’s voice in right-hemisphere posterior hippocampus, a critical region for learning and memory, as well as precuneus cortex of the default mode network. Signal level analysis shows that TD children have greater activity for mother’s voice compared to unfamiliar female voices in these brain regions (i.e. positive βs; see Appendix 1—figure 3B) while children with ASD show weaker activity for mother’s voice (i.e. negative βs). No brain structures showed greater activity for mother’s voice in the ASD, compared to the TD, group. Moreover, fMRI activation profiles in children with ASD were not related to mother’s voice identification accuracy (see Appendix, fMRI activation and connectivity profiles in children with ASD are not related to mother’s voice identification accuracy).

Brain activity and social communication abilities

Identifying sources of variance in key symptom domains represents an important question for autism research. We performed a whole-brain linear regression analysis using individual social communication scores as a predictor of brain activation. We first examined this relation in the context of general vocal processing using the unfamiliar female voices minus environmental sounds contrast. Results from this analysis show a striking pattern: the strength of activity in a variety of brain systems serving auditory, reward, and salience detection is correlated with social communication abilities in children with ASD (Figure 3A; see Appendix 1—table 2 for effect sizes). Specifically, this pattern was apparent in auditory association cortex of the superior temporal plane, including the PP, but also in the nucleus accumbens of the reward pathway, and anterior insula of the salience network. Scatterplots show that brain activity and social communication abilities vary across a range of values and greater social function, reflected by lower social communication scores, is associated with greater brain activity in these auditory, reward, and salience processing regions. Support vector regression (SVR) analysis (Abrams et al., 2016; Cohen et al., 2010) showed that the strength of activity in these regions was a reliable predictor of social communication function in these children (R ≥ 0.49; p ≤ 0.011 for all regions).
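
The prediction analysis described above reports a correlation (R) between predicted and observed social communication scores. The sketch below illustrates that general logic with leave-one-out cross-validation. Note that it substitutes ordinary least-squares regression on a single regional activity value for the support vector regression used in the study, and that the activity values and scores are invented toy numbers, so it should be read as an illustration of the cross-validated prediction idea only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept (a stand-in for SVR)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Predict each subject's score from a model fit on all other subjects."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        preds.append(slope * x[i] + intercept)
    return preds


# Hypothetical data: regional activity (beta) and social communication
# scores, where lower scores indicate better social communication.
activity = [0.9, 0.7, 0.5, 0.4, 0.2, 0.1, -0.1, -0.3]
scores = [3, 4, 6, 5, 8, 9, 10, 12]

predicted = loo_predictions(activity, scores)
r_pred_obs = pearson_r(predicted, scores)
```

Because each prediction comes from a model that never saw the held-out child, `r_pred_obs` reflects out-of-sample predictive value and is not simply the in-sample fit.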

Activity in response to vocal stimuli and social communication abilities in children with ASD.

(A) In children with ASD, the whole-brain covariate map shows that social communication scores are correlated with activity strength during unfamiliar female voice processing in auditory association cortex, the NAc of the reward system, and AI of the salience network. Scatterplots show the distributions and covariation of activity strength in response to unfamiliar female voices and standardized scores of social communication abilities in these children. Greater social communication abilities, reflected by smaller social communication scores, are associated with greater brain activity in these regions. (B) The whole-brain covariate map shows that social communication scores are correlated with activity strength during mother’s voice processing in primary auditory and association cortex, voice-selective STS, vmPFC of the reward system, AI and rACC of the salience network, and SMA.

https://doi.org/10.7554/eLife.39906.004

We next examined the question of heterogeneity in the context of mother’s voice processing, and results show a similar pattern: children with ASD with greater social communication abilities showed greater activation for mother’s voice in a wide extent of primary auditory, auditory association, and voice-selective cortex as well as mesolimbic reward, salience detection, and motor regions (Figure 3B). Specifically, this brain-behavior relationship was evident in auditory regions of superior temporal cortex, including medial aspects of bilateral Heschl’s gyrus, which contains primary auditory cortex, right-hemisphere PP of the superior temporal plane, as well as bilateral voice-selective mSTS. This relationship was also observed in regions of the salience network, including dorsal aspects of AI bilaterally and right-hemisphere rostral ACC (rACC), as well as vmPFC of the reward network. SVR results indicated that the strength of activity in these particular brain regions during mother’s voice processing was a reliable predictor of social communication function in these children (R ≥ 0.50; p ≤ 0.009 for all regions).

Connectivity patterns predict group membership

Functional connectivity was examined using a generalized psychophysiological interaction (gPPI) model within an extended voice processing brain network defined a priori from intrinsic connectivity results described in a previous study in children with ASD (Abrams et al., 2013a) (Figure 4A). This approach allows us to systematically build upon our previous findings while preempting task and sample-related biases in region-of-interest (ROI) selection. This extended voice processing network included ROIs in voice-selective STS, structures of the reward and salience networks, amygdala, hippocampus, and fusiform cortex (see Appendix 1—table 3 for details of this network). There were no univariate group differences in individual links during either unfamiliar voice (Figure 4B) or mother’s voice processing (Figure 4C) after correcting for multiple comparisons (FDR, q < 0.05). Support vector classification (SVC) results showed that multivariate connectivity patterns during unfamiliar voice processing were unable to predict group membership above chance (SVC Accuracy = 51.6%, p = 0.31); however, multivariate connectivity patterns during mother’s voice processing accurately predicted TD vs. ASD group membership (SVC Accuracy = 70.4%, p = 0.001). We performed a confirmatory analysis using a different logistic regression classifier (GLMnet, generalized linear model via penalized maximum likelihood) and results were similar to the SVC results (unfamiliar voice processing: 54.8%, p = 0.78 (not significant); mother’s voice: 80.9%, p = 0.010). These SVC results held even after accounting for group differences in mother’s voice identification accuracy (see Appendix, fMRI activation and connectivity profiles in children with ASD are not related to mother’s voice identification accuracy). Results show that patterns of brain connectivity during biologically-salient voice processing, but not unfamiliar voice processing, can distinguish children with ASD from TD children.
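
The univariate link-wise tests above were corrected for multiple comparisons using FDR at q < 0.05. The text does not name the specific FDR procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up method. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests survive FDR control at level q.

    Assumes the Benjamini-Hochberg step-up procedure: find the largest
    rank k with p_(k) <= (k / m) * q and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical p-values, one per connectivity link.
link_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
survivors = benjamini_hochberg(link_p, q=0.05)
```

With these toy values only the two smallest p-values survive correction, which shows how links that look nominally significant (p < 0.05) can fail an FDR threshold when many links are tested.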

Functional connectivity in the extended voice-selective network and TD vs. ASD group membership.

(A) The brain network used in connectivity analyses, which includes voice-selective, reward, salience, affective, and face-processing regions, was defined a priori from intrinsic connectivity results described in a previous study of children with ASD (Abrams et al., 2013a). (B-C) Group difference connectivity matrices show differences in connectivity between TD children and children with ASD for all node combinations during (B) unfamiliar female voice processing and (C) mother’s voice processing. Results from multivariate connectivity analysis show that connectivity patterns during mother’s voice processing can accurately predict TD vs. ASD group membership; however, connectivity patterns during unfamiliar female voice processing are unable to accurately predict group membership.

https://doi.org/10.7554/eLife.39906.005

Connectivity patterns predict social communication abilities

We next examined the relation between connectivity beta weights in each cell of the connectivity matrix and social communication scores in children with ASD. There were no significant univariate correlations between the strength of brain connectivity during either unfamiliar (Figure 5B) or mother’s voice processing (Figure 5C) and social communication abilities. We then performed support vector regression (SVR) to examine whether multivariate patterns of connectivity during voice processing accurately predict social communication abilities in these children. Given that brain activation results showed that both unfamiliar (Figure 3A) and mother’s voice processing (Figure 3B) explained variance in social communication abilities, we used a combination of connectivity features from both vocal conditions for this analysis. SVR results showed that multivariate connectivity patterns during unfamiliar and mother’s voice processing accurately predict social communication scores in children with ASD (R = 0.42, p = 0.015). We performed a confirmatory analysis using GLMnet and results were similar to the SVR results (social communication prediction: R = 0.76, p < 0.001). Furthermore, when children with below-chance accuracy on the mother’s voice identification task were removed from the analysis, this result held and connectivity patterns were still predictive of social communication scores (see Appendix, fMRI activation and connectivity profiles in children with ASD are not related to mother’s voice identification accuracy).
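
The univariate step described above correlates the connectivity beta weight in each cell of the connectivity matrix with social communication scores across children. A minimal sketch of that link-wise correlation follows. The data layout `conn[s][i][j]`, the three-ROI toy network, and the choice to visit only the upper triangle (i < j) are illustrative assumptions; gPPI connectivity can be directional, in which case both orderings of each ROI pair would be examined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def link_correlations(conn, scores):
    """Correlate each ROI-pair connectivity value with behavioral scores.

    conn[s][i][j] is the (hypothetical) connectivity beta for subject s
    between ROI i and ROI j; scores[s] is that subject's behavioral score.
    """
    n_rois = len(conn[0])
    result = {}
    for i in range(n_rois):
        for j in range(i + 1, n_rois):
            link = [subject[i][j] for subject in conn]
            result[(i, j)] = pearson_r(link, scores)
    return result


# Toy data: four subjects, three ROIs. Only the upper triangle is used.
scores = [2, 4, 6, 8]
link_01 = [0.2, 0.4, 0.6, 0.8]   # rises with score
link_02 = [0.5, 0.1, 0.4, 0.2]   # no clear trend
link_12 = [0.9, 0.7, 0.5, 0.3]   # falls with score
conn = [
    [[0.0, a, b], [0.0, 0.0, c], [0.0, 0.0, 0.0]]
    for a, b, c in zip(link_01, link_02, link_12)
]

r_by_link = link_correlations(conn, scores)
```

Each entry of `r_by_link` corresponds to one cell of a correlation matrix like those in Figure 5B-C; the resulting p-values would then be corrected for the number of links tested.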

Functional connectivity in the extended voice-selective network and social communication abilities in children with ASD.

(A) The brain network used in connectivity analyses, which includes voice-selective, reward, salience, affective, and face-processing regions, was defined a priori from intrinsic connectivity results described in a previous study of children with ASD (Abrams et al., 2013a). (B-C) Correlation matrices show Pearson’s correlations between social communication scores and connectivity for each pairwise node combination in response to (B) unfamiliar female voice processing and (C) mother’s voice processing in children with ASD. Results from multivariate connectivity analysis show that using a combination of connectivity features from both unfamiliar female and mother’s voice processing can accurately predict social communication scores in children with ASD.

https://doi.org/10.7554/eLife.39906.006

Discussion

It is unknown why children with ASD often ‘tune out’ from the voices of social partners in their environment (Kanner, 1968), including personal relations such as family members and caregivers (Klin, 1991). Here, we identify a striking relationship between individuals’ social communication abilities and the strength of activation in reward and salience processing brain regions, notably NAc and AI, during human voice processing in children with ASD. Multivariate connectivity patterns within an extended voice processing network distinguished children with ASD from their TD peers and predicted social communication abilities in children with ASD. These findings suggest that dysfunction of the brain’s reward system provides a stable brain signature of ASD that contributes to aberrant processing of salient vocal information (Abrams et al., 2013a).

Regional and network features associated with voice processing predict individual differences in social function in children with ASD

Individuals with ASD present with a complex behavioral profile, which includes an array of sensory (Marco et al., 2011), cognitive (Mundy and Newell, 2007), and affective processing differences (Harms et al., 2010) compared to TD individuals. Consensus on the specific factors that most contribute to pronounced social communication difficulties in this population has remained elusive. Our findings showed that both regional and network features associated with voice processing, encompassing voice-selective cortex in the STS and an extended voice-processing network that includes auditory, reward, and salience regions, predicted social function in children with ASD. The diversity of this network reflects the complexities of social communication itself, which involves the ability to integrate sensory, affective, mnemonic, and reward information. Importantly, our results unify several important characteristics of ASD in the extant literature, including regional functional aberrancies within specific brain systems and their association with social abilities (Gervais et al., 2004; Kleinhans et al., 2008; Scott-Van Zeeland et al., 2010; Richey et al., 2014; Lombardo et al., 2015), network level dysfunction (Di Martino et al., 2011; Uddin et al., 2013b; von dem Hagen et al., 2013; Abrams et al., 2013a), and heterogeneity of social communication abilities (Lord et al., 2012; Lord et al., 1994). We suggest that social communication function – humans’ ability to interact with and relate to others – is a unifying factor for explaining regional activation profiles and large-scale connectivity patterns linking key elements of the social brain.

A voice-related brain network approach for understanding social information processing in autism

Brain network analyses represent an important approach for understanding brain function in autism (Di Martino et al., 2011; Uddin et al., 2013b; von dem Hagen et al., 2013; Abrams et al., 2013a), and psychopathology more broadly (Menon, 2011). These methods, which are typically applied to resting-state brain imaging data, have yielded considerable knowledge regarding network connectivity patterns in ASD and their links to behavior (Abrams et al., 2013a). A central assumption of this approach is that aberrant task-evoked circuit function is associated with clinical symptoms and behavior; however, empirical studies examining these associations have been lacking from the ASD literature. Our study addresses this gap by probing task-evoked function within a network defined a priori from a previous study of intrinsic connectivity of voice-selective networks in an independent group of children with ASD. We show that voice-related network function during the processing of a clinically and biologically meaningful social stimulus predicts both ASD group membership as well as social communication abilities in these children. Our findings bridge a critical gap between the integrity of the intrinsic architecture of the voice-processing network in children with ASD and network signatures of aberrant social information processing in these individuals.

Biologically-salient vocal stimuli for investigating the social brain in autism spectrum disorders

Our results demonstrate that brief samples of a biologically salient voice, mother’s voice, elicit a distinct neural signature in children with ASD. Our findings have important implications for the development of social skills in children with ASD. Specifically, typically developing children prefer biologically salient voices such as a mother’s voice, which provide critical cues for social (Adams and Passman, 1979) and language learning (Liu et al., 2003). In contrast, both anecdotal (Kanner, 1968) and experimental accounts (Klin, 1991) indicate that children with ASD do not show a preference for these sounds. We suggest that aberrant function within the extended voice processing network may underlie insensitivity to biologically salient voices in children with ASD, which may subsequently affect key developmental processes associated with social and pragmatic language learning.

The social motivation theory and reward circuitry in children with ASD

The social motivation theory of ASD provides an important framework for considering pervasive social deficits in affected individuals (Dawson et al., 2002; Chevallier et al., 2012). The theory posits that social skills emerge in young children from an initial attraction to social cues in their environment. For example, TD infants are highly attentive to speech despite having no understanding of words’ meanings, and this early attraction to vocal cues may be a critical step in a developmental process that includes speech sound discrimination, mimicry, and, ultimately, language learning and verbal communication (Kuhl et al., 2005b). In contrast, children with ASD often do not engage with the speech in their environment (Kanner, 1968), and a central hypothesis of the social motivation theory is that weak reward attribution to vocal sounds during early childhood disrupts important developmental processes supporting social communication.

Our findings provide support for the social motivation theory by showing a link between social communication abilities in children with ASD and the strength of activity in reward and salience detection systems in response to unfamiliar and mother’s voice. Specifically, children with ASD who have the most severe social communication deficits have the weakest responses in reward and salience detection brain regions to both of these vocal sources. Moreover, network connectivity of an extended voice-selective network, which includes nodes of the salience and reward networks, distinguished ASD and TD children and predicted social communication abilities in children with ASD. These results are the first to show that aberrant function of reward circuitry during voice processing is a distinguishing feature of childhood autism, and may limit the ability of children with ASD to experience vocal sounds as rewarding or salient. Our findings add to a growing literature suggesting that functional connectivity between voice-selective STS and reward and salience processing regions is an important predictor of social skill development in children (Abrams et al., 2016; Abrams et al., 2013a).

Our results highlighting the role of reward and salience in the context of voice processing have implications for clinical treatment of social communication deficits in children with ASD. An important direction for treatment of children with ASD involves the use of teaching strategies (Dawson et al., 2010; Koegel and Koegel, 2006) that focus on motivating children to engage in verbal interactions to improve social communication skills (Koegel et al., 2005; Mundy and Stella, 2000). Findings suggest that clinical efforts to increase the reward value of vocal interactions in children with ASD may be key to remediating social communication deficits in these individuals. Furthermore, neural activity and connectivity measures may represent a quantitative metric for assessing response to clinical treatments focused on verbal interactions.

Limitations

There are limitations to the current work that warrant consideration. First, the sample size is relatively modest compared to recent task-based brain imaging studies of neurotypical adult populations and resting-state fMRI or structural MRI studies in individuals with ASD; however, these types of studies do not face the same data collection challenges as task-based studies in clinical pediatric populations (Yerys et al., 2009). Importantly, resting-state and structural imaging studies are unable to address specific questions related to social information processing in ASD, such as biologically salient voice processing, which are critical for understanding the brain bases of social dysfunction in affected children. Indeed, our sample size is larger than, or comparable to, the majority of task-fMRI studies in children with ASD published since 2017, and has more stringent individual-level sampling compared to these studies. This is an important consideration given that the replicability of task fMRI data is not solely contingent on a large sample size but also depends on the amount of individual-level sampling. A recent report examining this question showed that modest sample sizes, comparable to those in the present study, yield highly replicable results with only four runs of task data with a similar number of trials per run as our study (Nee, 2018). In comparison, we required that each child participant had at least seven functional imaging runs of our event-related fMRI task that met our strict head movement criteria. A final limitation of this work is that, consistent with the vast majority of brain imaging studies in children with ASD, we were unable to include lower functioning children with ASD since the scanner environment is ill-suited for these children (Yerys et al., 2009).
Further studies with larger samples are needed both to capture the full range of heterogeneity of ASD and to ensure the broader generalizability of the findings reported here.

Conclusion

We identified neural features underlying voice processing impairments in children with ASD, which are thought to contribute to pervasive social communication difficulties in affected individuals. Results show that activity profiles and network connectivity patterns within voice-selective and reward regions, measured during unfamiliar and mother’s voice processing, distinguish children with ASD from TD peers and predict their social communication abilities. These findings are consistent with the social motivation theory of ASD by linking human voice processing to dysfunction in the brain’s reward centers, and have implications for the treatment of social communication deficits in children with ASD. For example, parent training has emerged as a powerful and cost-effective approach for increasing treatment intensity (National Research Council, 2001): treatment delivery in the child’s natural environment promotes functional communication (Delprato, 2001), generalization (Stokes and Baer, 1977), and maintenance of skills over time (Sheinkopf and Siegel, 1998; Moes and Frea, 2002). Findings from the current study, which demonstrate a link between social communication function and neural processing of mother’s voice, support the importance of parent training by suggesting that a child’s ability to focus on, and direct neural resources to, these critical communication partners may be a key to improving social function in affected children.

Materials and methods

Participants

The Stanford University Institutional Review Board approved the study protocol. Parental consent and the child’s assent were obtained for all evaluation procedures, and children were paid for their participation in the study.

A total of 57 children were recruited from around the San Francisco Bay Area for this study. All children were required to be right-handed and have a full-scale IQ > 80, as measured by the Wechsler Abbreviated Scale of Intelligence (WASI) (Wechsler, 1999). 28 children met ASD criteria based on an algorithm (Risi et al., 2006) that combines information from both the module 3 of the ADOS-2 (47) and the ADI–Revised (Lord et al., 1994). Specifically, these children showed mild to more severe social communication deficits, particularly in the areas of social-emotional reciprocity and verbal and non-verbal communication, and repetitive and restricted behaviors and interests (American Psychiatric Association, 2013). Five children with ASD were excluded because of excessive movement in the fMRI scanner, one child was excluded because of a metal retainer interfering with their brain images, and one child was excluded because their biological mother was not available to do a voice recording. Importantly, children in the ASD sample are considered ‘high-functioning’ and had fluent language skills and above-average reading skills (Table 1). Nevertheless, these children are generally characterized as having communication impairments, especially in the area of reciprocal conversation.

Table 1
Demographic and IQ measures
https://doi.org/10.7554/eLife.39906.007

Measure | ASD (n = 21) | TD (n = 21) | p-value
Gender ratio | 18 M: 3 F | 17 M: 4 F | 0.69†
Age (years) | 10.75 ± 1.48 | 10.32 ± 1.42 | 0.34
Full-scale IQ* | 113.75 ± 15.04 | 117.45 ± 10.83 | 0.38
VIQ* | 112.25 ± 16.13 | 118.55 ± 12.13 | 0.17
PIQ | 111.52 ± 14.30 | 113.14 ± 13.46 | 0.71
ADOS social | 9.52 ± 2.54 | - | -
ADI-A social | 6.81 ± 4.52 | - | -
ADI-B communication | 7.43 ± 5.01 | - | -
ADI-C repetitive behaviors | 4.10 ± 2.66 | - | -
Word reading | 112.24 ± 11.34 | 114.38 ± 8.96 | 0.50
Reading comprehension | 108.29 ± 11.81 | 115.38 ± 9.09 | 0.35
Max. Motion (mm) | 1.99 ± 0.93 | 1.73 ± 0.93 | 0.36
Mother's voice ID accuracy | 0.88 ± 0.21 | 0.98 ± 0.04 | 0.04

Demographic and mean IQ scores are shown for the sample.
M, Male; F, Female; WASI, Wechsler Abbreviated Scale of Intelligence.
†Chi-squared test.
*Score missing for one participant in TD and ASD groups.

TD children had no history of neurological, psychiatric, or learning disorders, personal and family history (first degree) of developmental cognitive disorders and heritable neuropsychiatric disorders, evidence of significant difficulty during pregnancy, labor, delivery, or immediate neonatal period, or abnormal developmental milestones as determined by neurologic history and examination. Three TD children were excluded because of excessive movement in the fMRI scanner, one was excluded because of scores in the ‘severe’ range on standardized measures of social function, and four female TD children were excluded to provide a similar ratio of males to females relative to the ASD participants. The final TD and ASD groups that were included in the analysis consisted of 21 children in each group who were matched for full-scale IQ, age, sex, and head motion during the fMRI scan (Table 1). All participants are the biological offspring of the mothers whose voices were used in this study (i.e. none of our participants were adopted, and therefore none of the mothers’ voices are from an adoptive mother), and all participants were raised in homes that included their mothers. Participants’ neuropsychological characteristics are provided in Table 1.

Data acquisition parameters

All fMRI data were acquired at the Richard M. Lucas Center for Imaging at Stanford University. Functional images were acquired on a 3 T Signa scanner (General Electric) using a custom-built head coil. Participants were instructed to stay as still as possible during scanning, and head movement was further minimized by placing memory-foam pillows around the participant’s head. A total of 29 axial slices (4.0 mm thickness, 0.5 mm skip) parallel to the anterior/posterior commissure line and covering the whole brain were imaged by using a T2*-weighted gradient-echo spiral in-out pulse sequence (Glover and Law, 2001) with the following parameters: repetition time = 3576 ms; echo time = 30 ms; flip angle = 80°; one interleave. The 3576 ms TR can be calculated as the sum of: (1) the stimulus duration of 956 ms; (2) a 300 ms silent interval buffering the beginning and end of each stimulus presentation (600 ms total of silent buffers) to avoid backward and forward masking effects; (3) the 2000 ms volume acquisition time; and (4) an additional 20 ms silent interval, which helped the stimulus computer maintain precise and accurate timing during stimulus presentation. The field of view was 20 cm, and the matrix size was 64 × 64, providing an in-plane spatial resolution of 3.125 mm. Reduction of blurring and signal loss arising from field inhomogeneities was accomplished by the use of an automated high-order shimming method before data acquisition.
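The timing and resolution arithmetic above can be checked with a short script. This is only an illustrative sketch: the constant and function names are hypothetical, and the values are those reported in the text.

```python
# Timing components of one trial, in milliseconds (values taken from the text).
STIMULUS_MS = 956        # duration of each auditory stimulus
SILENT_BUFFER_MS = 300   # silence before, and again after, each stimulus
ACQUISITION_MS = 2000    # time needed to acquire one volume
TIMING_PAD_MS = 20       # extra silence for stimulus-computer timing


def repetition_time_ms():
    """Sum the four components that make up one repetition time (TR)."""
    return STIMULUS_MS + 2 * SILENT_BUFFER_MS + ACQUISITION_MS + TIMING_PAD_MS


def in_plane_resolution_mm(fov_cm=20.0, matrix_size=64):
    """In-plane voxel size: field of view (converted to mm) over matrix size."""
    return fov_cm * 10.0 / matrix_size


print(repetition_time_ms())      # 3576
print(in_plane_resolution_mm())  # 3.125
```

The same components, minus the stimulus itself, give the 2620 ms silent period between stimulus presentations.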

fMRI Task

Auditory stimuli were presented in 10 separate runs, each lasting 4 min. One run consisted of 56 trials of mother’s voice, unfamiliar female voices, environmental sounds and catch trials, which were pseudo-randomly ordered within each run. Stimulus presentation order was the same for each subject. Each stimulus lasted 956 ms. Prior to each run, child participants were instructed to play the ‘kitty cat game’ during the fMRI scan. While lying down in the scanner, children were first shown a brief video of a cat and were told that the goal of the cat game was to listen to a variety of sounds, including ‘voices that may be familiar,’ and to push a button on a button box only when they heard kitty cat meows (catch trials). The function of the ‘catch trials’ was to keep the children alert and engaged during stimulus presentation. During each run, four or five exemplars of each stimulus type (i.e. nonsense word samples of mother’s and unfamiliar female voices, environmental sounds), as well as three catch trials, were presented. At the end of each run, the children were shown another engaging video of a cat. Although the button box failed to register responses during data collection in four children with ASD and nine TD children, data analysis of the catch trials for 17 children with ASD and 12 TD children showed similar catch trial accuracies between TD (accuracy = 91%) and ASD groups (accuracy = 89%; two-sample t-test results: t(2) = 0.35, p = 0.73). Across the ten runs, a total of 48 exemplars of each stimulus condition were presented to each subject (i.e. 144 total exemplars produced by each of the three vocal sources, including the child’s mother, unfamiliar female voice #1, and unfamiliar female voice #2). Vocal stimuli were presented to participants in the scanner using Eprime V1.0 (Psychological Software Tools, 2002).
Participants wore custom-built headphones designed to reduce the background scanner noise to ∼70 dBA (Abrams et al., 2011; Abrams et al., 2013b). Headphone sound levels were calibrated prior to each data collection session, and all stimuli were presented at a sound level of 75 dBA. Participants were scanned using an event-related design. Auditory stimuli were presented during silent intervals between volume acquisitions to eliminate the effects of scanner noise on auditory discrimination. One stimulus was presented every 3576 ms, and the silent period duration was not jittered. The total silent period between stimulus presentations was 2620 ms, and consisted of a 300 ms silent period, 2000 ms for a volume acquisition, another 300 ms of silence, and a 20 ms silent interval that helped the stimulus computer maintain precise and accurate timing during stimulus presentation.

Functional MRI preprocessing

fMRI data collected in each of the 10 functional runs were subject to the following preprocessing procedures. The first five volumes were not analyzed to allow for signal equilibration. A linear shim correction was applied separately for each slice during reconstruction by using a magnetic field map acquired automatically by the pulse sequence at the beginning of the scan. Translational movement in millimeters (x, y, z) was calculated based on the SPM8 parameters for motion correction of the functional images in each subject. To correct for deviant volumes resulting from spikes in movement, we used a de-spiking procedure. Volumes with movement exceeding 0.5 voxels (1.562 mm) or spikes in global signal exceeding 5% were interpolated using adjacent scans. The majority of volumes repaired occurred in isolation. After the interpolation procedure, images were spatially normalized to standard Montreal Neurological Institute (MNI) space, resampled to 2 mm isotropic voxels, and smoothed with a 6 mm full-width at half maximum Gaussian kernel.
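The de-spiking step can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names are hypothetical, each volume is reduced to a short list of voxel values, the 5% global-signal criterion is assumed to be measured against the run median, and a flagged volume is assumed to be replaced by the average of its nearest unflagged neighbours.

```python
import statistics


def flag_volumes(movement_mm, global_signal,
                 move_thresh_mm=1.562, signal_thresh=0.05):
    """Flag volumes whose movement or global-signal deviation is too large.

    Assumption: the global-signal spike is expressed relative to the run median.
    """
    reference = statistics.median(global_signal)
    flags = []
    for move, signal in zip(movement_mm, global_signal):
        spike = abs(signal - reference) / reference > signal_thresh
        flags.append(move > move_thresh_mm or spike)
    return flags


def repair_volumes(volumes, flags):
    """Replace each flagged volume by the mean of its nearest clean neighbours."""
    n = len(volumes)
    repaired = [list(v) for v in volumes]
    for i in range(n):
        if not flags[i]:
            continue
        prev_i = next((j for j in range(i - 1, -1, -1) if not flags[j]), None)
        next_i = next((j for j in range(i + 1, n) if not flags[j]), None)
        neighbours = [volumes[j] for j in (prev_i, next_i) if j is not None]
        if not neighbours:
            continue  # no clean volume to interpolate from; leave unchanged
        repaired[i] = [sum(vals) / len(neighbours) for vals in zip(*neighbours)]
    return repaired


# Toy run of four two-voxel "volumes"; the third one contains a spike.
volumes = [[100.0, 102.0], [101.0, 103.0], [130.0, 140.0], [103.0, 105.0]]
flags = flag_volumes([0.1, 0.2, 2.0, 0.1], [101.0, 102.0, 135.0, 104.0])
print(flags)                              # [False, False, True, False]
print(repair_volumes(volumes, flags)[2])  # [102.0, 104.0]
```

The fraction of flagged volumes in a run, `sum(flags) / len(flags)`, is the quantity that a per-run limit on repaired volumes would be applied to.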

Movement criteria for inclusion in fMRI analysis

For inclusion in the fMRI analysis, we required that each functional run had a maximum scan-to-scan movement of < 6 mm and no more than 15% of volumes were corrected in the de-spiking procedure. Moreover, we required that all individual subject data included in the analysis consisted of at least seven functional runs that met our criteria for scan-to-scan movement and percentage of volumes corrected; subjects who had fewer than seven functional runs that met our movement criteria were not included in the data analysis. All 42 participants included in the analysis had at least seven functional runs that met our movement criteria, and the total number of runs included for TD and ASD groups were similar (TD = 192 runs; ASD = 188 runs).
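These inclusion rules can be expressed compactly. The sketch below uses hypothetical function names and assumes each run is summarized by two numbers: its maximum scan-to-scan movement in millimeters and the fraction of volumes repaired by de-spiking.

```python
def run_passes(max_scan_to_scan_mm, fraction_repaired,
               max_move_mm=6.0, max_repaired=0.15):
    """A run is usable if movement is below 6 mm and at most 15% of volumes were repaired."""
    return max_scan_to_scan_mm < max_move_mm and fraction_repaired <= max_repaired


def subject_included(runs, min_good_runs=7):
    """A subject is included if at least seven runs meet the movement criteria.

    `runs` is a list of (max_scan_to_scan_mm, fraction_repaired) tuples.
    """
    good_runs = sum(1 for move, frac in runs if run_passes(move, frac))
    return good_runs >= min_good_runs


# Hypothetical subject: seven clean runs and three runs with excessive movement.
example_runs = [(1.0, 0.05)] * 7 + [(7.0, 0.00)] * 3
print(subject_included(example_runs))  # True
```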

Voxel-wise analysis of fMRI activation

The goal of this analysis was to identify brain regions that showed differential activity levels in response to mother’s voice, unfamiliar voices, and environmental sounds. Brain activation related to each vocal task condition was first modeled at the individual subject level using boxcar functions with a canonical hemodynamic response function and a temporal derivative to account for voxel-wise latency differences in hemodynamic response. Environmental sounds were not modeled to avoid collinearity, and this stimulus served as the baseline condition. Low-frequency drifts at each voxel were removed using a high-pass filter (0.5 cycles/min) and serial correlations were accounted for by modeling the fMRI time series as a first-degree autoregressive process (Friston et al., 1997). We performed whole-brain ANOVAs to separately investigate unfamiliar and mother’s voice processing: (1) the unfamiliar voice analysis used the factors group (TD and ASD) and auditory condition (unfamiliar voices and environmental sounds) and (2) the mother’s voice analysis used the factors group (TD and ASD) and voice condition (mother's voice and unfamiliar voices). These ANOVAs were designed to test specific hypotheses described in the Introduction. Group-level activation was determined using individual subject contrast images and a second-level analysis of variance. The main contrasts of interest were [mother’s voice – unfamiliar female voices] and [unfamiliar female voices – environmental sounds]. Significant clusters of activation were determined using a voxel-wise statistical height threshold of p < 0.005, with family-wise error corrections for multiple spatial comparisons (p < 0.05; 67 voxels) determined using Monte Carlo simulations (Forman et al., 1995; Ward, 2000) using a custom Matlab script (see Source Code). 
To examine GLM results in the inferior colliculus and NAc, small subcortical brain structures, we used a small volume correction at p < 0.05 with a voxel-wise statistical height threshold of p < 0.005. To determine the robustness of our findings, group comparisons were also performed using more stringent height and extent thresholds (Appendix 1—tables 4 and 5). To provide estimates of effect sizes within specific regions displayed in Figure 2, t-scores from the whole-brain TD vs. ASD group GLM analysis were averaged within each significant cluster. Effect sizes were then computed as Cohen’s d according to Equation 1 below, where t is the mean t-score within a cluster and N is the sample size:

(1) $\text{Cohen's } d = \dfrac{t}{\sqrt{N/2}}$

To define specific cortical regions, we used the Harvard–Oxford probabilistic structural atlas (Smith et al., 2004) with a probability threshold of 25%.

Brain-behavior analysis

Regression analysis was used to examine the relationship between brain responses to unfamiliar and mother’s voice and social communication abilities in children with ASD. Social communication function was assessed using the Social Affect subscore of the ADOS-2 (47). Brain-behavior relationships were examined using analysis of activation levels. A whole-brain, voxel-wise regression analysis was performed in which the relation between fMRI activity and social communication scores was examined using images contrasting [unfamiliar female voices > environmental sounds] and [mother’s voice > unfamiliar female voices]. Significant clusters were determined using a voxel-wise statistical height threshold of p < 0.005, with family-wise error corrections for multiple spatial comparisons (p < 0.05; 67 voxels) determined using Monte Carlo simulations (Forman et al., 1995; Ward, 2000). To determine the robustness of our findings, brain-behavior relations were also examined using more stringent height and extent thresholds (Appendix 1—tables 6 and 7). To provide estimates of effect sizes within regions displayed in Figure 3, t-scores from the whole-brain ASD Social Communication covariate analysis were averaged within each cluster identified in the GLM analysis. Effect sizes were then computed as Cohen’s f according to Equation 2 below, where t is the mean t-score within a cluster and N is the sample size:

(2) $\text{Cohen's } f = \dfrac{t}{\sqrt{N}}$

Brain activity levels and prediction of social function

To examine the robustness and reliability of brain activity levels for predicting social communication scores, we used support vector regression (SVR) to perform a confirmatory cross-validation analysis that employs a machine-learning approach with balanced fourfold cross-validation (CV) combined with linear regression (Cohen et al., 2010). In this analysis, we extracted individual subject activation beta values taken from the [unfamiliar female voices > environmental sounds] and [mother’s voice > unfamiliar female voices] GLM contrasts. For the [unfamiliar female voices > environmental sounds] GLM contrast, GLM betas were extracted from right-hemisphere PP and AI as well as left-hemisphere NAc. For the [mother’s voice > unfamiliar female voices] GLM contrast, GLM betas were extracted from left-hemisphere HG, PP, and AI as well as right-hemisphere mSTS, vmPFC, rACC, and SMA. These values were entered as independent variables in a linear regression analysis with ADOS-2 Social Affect subscores as the dependent variable. The correlation r(predicted, observed), a measure of how well the independent variables predict the dependent variable, was first estimated using a balanced fourfold CV procedure. Data were divided into four folds so that the distributions of dependent and independent variables were balanced across folds. Data were randomly assigned to four folds and the independent and dependent variables were tested in one-way ANOVAs, repeating as necessary until both ANOVAs were non-significant in order to guarantee balance across the folds. A linear regression model was built using three folds, leaving out the fourth, and this model was then used to predict the data in the left-out fold. This procedure was repeated four times to compute a final r(predicted, observed) representing the correlation between the data predicted by the regression model and the observed data. Finally, the statistical significance of the model was assessed using a nonparametric testing approach. The empirical null distribution of r(predicted, observed) was estimated by generating 1000 surrogate datasets under the null hypothesis that there was no association between changes in ADOS social communication subscore and brain activity levels.
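The logic of the cross-validated r(predicted, observed) statistic and its surrogate-data null distribution can be sketched as follows. This is a simplified illustration rather than the analysis code used in the study. It assumes a single predictor fitted by ordinary least squares instead of SVR with multiple regional betas, it omits the ANOVA-based fold balancing, and all function names are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def cv_r(x, y, n_folds=4, rng=None):
    """Fourfold CV: fit on three folds, predict the fourth, correlate with observed."""
    rng = rng if rng is not None else random.Random(0)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    predicted = [0.0] * len(x)
    for k in range(n_folds):
        test_idx = idx[k::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        for i in test_idx:
            predicted[i] = intercept + slope * x[i]
    return pearson_r(predicted, y)


def permutation_p(x, y, n_perm=1000, seed=0):
    """Compare the observed CV correlation with a label-shuffled null distribution."""
    observed = cv_r(x, y, rng=random.Random(seed))
    shuffler = random.Random(seed + 1)
    exceed = 0
    for _ in range(n_perm):
        surrogate = list(y)
        shuffler.shuffle(surrogate)  # breaks any brain-behavior association
        if cv_r(x, surrogate, rng=random.Random(seed)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `permutation_p(betas, scores)` with two equal-length lists returns the cross-validated correlation and its permutation p-value. Adding one to both the numerator and the denominator keeps the estimated p-value above zero.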

Functional connectivity analysis

We examined functional connectivity between ROIs using the generalized psychophysiological interaction (gPPI) model (McLaren et al., 2012), with the goal of identifying connectivity between ROIs in response to each task condition as well as differences between task conditions (mother’s voice, other voice, environmental sounds). We used the SPM gPPI toolbox for this analysis. gPPI is more sensitive than standard PPI to task context-dependent differences in connectivity (McLaren et al., 2012). Unlike dynamic causal modeling (DCM), gPPI does not use a temporal precedence model (x(t + 1)~x(t)) and therefore makes no claims of causality. The gPPI model is summarized in Equation 3 below:

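(3) $\mathrm{ROI}_{\mathrm{target}} \sim \mathrm{conv}\big(\mathrm{deconv}(\mathrm{ROI}_{\mathrm{seed}}) \times \mathrm{task}_{\mathrm{waveform}}\big) + \mathrm{ROI}_{\mathrm{seed}} + \mathrm{constant}$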

Briefly, in each participant, the regional timeseries from a seed ROI was deconvolved to uncover quasi-neuronal activity and then multiplied with the task design waveform for each task condition to form condition-specific gPPI interaction terms. These interaction terms were then convolved with the hemodynamic response function (HRF) to form gPPI regressors for each task condition. The final step was a standard general linear model predicting target ROI response after regressing out any direct effects of the activity in the seed ROI. In the equation above, ROI_target and ROI_seed are the time series in the two brain regions, and task_waveform contains three columns, one for each task condition. The goal of this analysis was to examine connectivity patterns within an extended voice-selective network identified in a previous study of children with ASD (Abrams et al., 2013a). That study showed weak intrinsic connectivity between bilateral voice-selective STS and regions implicated in reward, salience, memory, and affective processing. The rationale for the use of an a priori network is that it is an established method of network identification that preempts task- and sample-related biases in region-of-interest (ROI) selection. This approach therefore allows for a more generalizable set of results compared to a network defined based on nodes identified using the current sample of children and task conditions. The network used in all connectivity analyses consisted of 16 regions. All cortical ROIs were constructed as 5 mm spheres centered on the coordinates listed in Appendix 1—table 3, while subcortical ROIs were constructed as 2 mm spheres.

Functional connectivity, group classification, and prediction of social function

Support vector classification (SVC) and regression (SVR) were used to examine whether patterns of connectivity within the extended voice processing network could predict TD vs. ASD group membership and social communication abilities in children with ASD, respectively. First, to examine TD vs. ASD group membership, a linear support vector machine algorithm (C = 1) from the open-source library LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) was used to build classifiers to distinguish children with ASD from TD children during unfamiliar voice processing. Individual subject connectivity matrices (16 × 16 ROIs) taken from the [unfamiliar female voices > environmental sounds] gPPI contrast were used as features to train classifiers in each dataset. Classifier performance was evaluated using a four-fold cross-validation procedure. Specifically, a dataset was randomly partitioned into four folds. Three folds of data (training set) were used to train a classifier, which was then applied to the remaining fold (test set) to predict whether each sample in the test set should be classified as ASD or TD. This procedure was repeated four times with each of the four folds used exactly once as a test set. The average classification accuracy across the four folds (cross-validation accuracy) was used to evaluate the classifier’s performance. To further account for variation due to random data partition, we repeated the same cross-validation procedure 100 times with different random data partitions. Finally, the mean cross-validation accuracy across the 100 iterations was reported, and its statistical significance was evaluated using permutation testing (1000 times) by randomly permuting subjects’ labels and repeating the procedures above. The same SVC methods were used to examine whether connectivity features during mother’s voice processing could accurately predict TD vs. ASD group membership; however, in this analysis, individual subject connectivity matrices (16 × 16 ROIs) taken from the [mother’s voice > unfamiliar female voices] gPPI contrast were used as features to train the classifier.
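The evaluation scheme described above (four-fold cross-validation, repeated over random partitions, with a label-permutation test) can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier stands in for the linear support vector machine. This substitution, the function names, and the toy feature vectors are all assumptions made for illustration and do not reproduce the LIBSVM analysis.

```python
import random


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier: assign each test sample to the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(sorted(centroids), key=lambda lab: dist2(x, centroids[lab]))
            for x in test_X]


def cv_accuracy(X, y, n_folds=4, rng=None):
    """Mean accuracy over folds; each fold serves exactly once as the test set."""
    rng = rng if rng is not None else random.Random(0)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    fold_acc = []
    for k in range(n_folds):
        test_idx = idx[k::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        preds = nearest_centroid_predict([X[i] for i in train_idx],
                                         [y[i] for i in train_idx],
                                         [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        fold_acc.append(correct / len(test_idx))
    return sum(fold_acc) / n_folds


def repeated_cv_accuracy(X, y, n_repeats=100, seed=0):
    """Average cross-validation accuracy over several random partitions."""
    rng = random.Random(seed)
    return sum(cv_accuracy(X, y, rng=rng) for _ in range(n_repeats)) / n_repeats


def permutation_p_value(X, y, n_perm=1000, n_repeats=100, seed=0):
    """Permute group labels and repeat the full procedure to build a null."""
    observed = repeated_cv_accuracy(X, y, n_repeats, seed)
    shuffler = random.Random(seed + 1)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(y)
        shuffler.shuffle(shuffled)
        if repeated_cv_accuracy(X, shuffled, n_repeats, seed) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

With the defaults (100 repetitions and 1000 permutations) this pure-Python version is slow, so smaller values are advisable when experimenting. In an analysis of this kind, each feature vector would presumably hold the entries of one subject's 16 × 16 connectivity matrix.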

Finally, SVR was used to examine whether connectivity patterns during unfamiliar female and mother’s voice processing could predict social communication scores in children with ASD. SVR methods are the same as those described in Brain Activity Levels and Prediction of Social Function; however, features in this analysis include multivariate connectivity patterns across the extended voice-selective network (16 ROIs). Given that brain activation results showed that both unfamiliar (Figure 3A) and mother’s voice processing (Figure 3B) explained variance in social communication abilities, we used a combination of connectivity features from both vocal conditions for this analysis. Specifically, connectivity features from both the [unfamiliar female voices > environmental sounds] and [mother’s voice > unfamiliar female voices] gPPI contrasts were entered as independent variables in a linear regression analysis with ADOS-2 Social Affect subscores as the dependent variable.

As a confirmatory analysis, and to examine the robustness of SVC and SVR results, we used GLMnet (http://www-stat.stanford.edu/~tibs/glmnet-matlab), a logistic regression classifier that includes regularization and exploits sparsity in the input matrix, on the same 16 × 16 connectivity matrices described for the SVC and SVR analyses above.

Stimulus design considerations

Previous studies investigating the processing (DeCasper and Fifer, 1980; Adams and Passman, 1979) and neural bases (Imafuku et al., 2014; Purhonen et al., 2004) of mother’s voice processing have used a design in which one mother’s voice serves as a control voice for another participant. However, due to an important practical limitation, the current study used a design in which all participants heard the same two control voices. While we make every effort to recruit children from a variety of communities in the San Francisco Bay Area, some level of recruitment occurs through contact with specific schools, and in other instances our participants refer their friends to our lab for inclusion in our studies. In these cases, it is a reasonable possibility that our participants may have known other mothers involved in the study, and therefore may be familiar with these mothers’ voices, which would limit the control we were seeking in our control voices. Importantly, HIPAA guidelines are explicit that participant information is confidential, and therefore there would be no way to probe whether a child knows any of the other families involved in the study. Given this practical consideration, we concluded that it would be best to use the same two control voices, which we knew were unfamiliar to the participants, for all participants’ data collection.

Stimulus recording

Recordings of each mother were made individually while their child was undergoing neuropsychological testing. Mother’s voice stimuli and control voices were recorded in a quiet conference room using a Shure PG27-USB condenser microphone connected to a MacBook Air laptop. The audio signal was digitized at a sampling rate of 44.1 kHz and A/D converted with 16-bit resolution. Mothers were positioned in the conference room to prevent early sound wave reflections from contaminating the recordings. To provide a natural speech context for the recording of each nonsense word, mothers were instructed to repeat three sentences, each of which contained one of the nonsense words, during the recording. The first word of each of these sentences was their child’s name, which was followed by the words ‘that is a,’ followed by one of the three nonsense words. A hypothetical example of a sentence spoken by a mother for the recording was ‘Johnny, that is a keebudieshawlt.’ Prior to beginning the recording, mothers were instructed on how to produce these nonsense words by repeating them to the experimenter until the mothers had reached proficiency. Importantly, mothers were instructed to say these sentences using the tone of voice they would use when speaking with their child during an engaging and enjoyable shared learning experience (e.g. if their child asked them to identify an item at a museum). The vocal recording session resulted in digitized recordings of the mothers repeating each of the three sentences approximately 30 times to ensure multiple high-quality samples of each nonsense word for each mother.

Stimulus post-processing

The goal of stimulus post-processing was to isolate the three nonsense words from the sentences that each mother spoke during the recording session and normalize them for duration and RMS amplitude for inclusion in the fMRI stimulus presentation protocol and the mother’s voice identification task. First, a digital sound editor (Audacity: http://audacity.sourceforge.net/) was used to isolate each utterance of the three nonsense words from the sentences spoken by each mother. The three best versions of each nonsense word were then selected based on the audio and vocal quality of the utterances (i.e. eliminating versions that were mispronounced, included vocal creak, or were otherwise not ideal exemplars of the nonsense words). These nine nonsense words were then normalized for duration to 956 ms, the mean duration of the nonsense words produced by the unfamiliar female voices, using Praat software, as in previous studies (Abrams et al., 2016; Abrams et al., 2008). A 10 ms linear fade (ramp and damp) was then applied to each stimulus to prevent click-like sounds at the beginning and end of the stimuli, and then stimuli were equated for RMS amplitude. These final stimuli were then evaluated for audibility and clarity to ensure that post-processing manipulations had not introduced any artifacts into the samples. The same process was performed on the control voices and environmental sounds to ensure that all stimuli presented in the fMRI experiment were the same duration and RMS amplitude.
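The last two post-processing steps, the 10 ms linear fade and RMS equalization, can be sketched as follows. The sketch assumes a mono signal stored as a list of floating-point samples at the 44.1 kHz recording rate. The function names and the target RMS value are hypothetical, and duration normalization (performed in Praat) is not shown.

```python
import math


def apply_linear_fade(samples, sample_rate_hz=44100, fade_ms=10.0):
    """Apply a linear fade-in (ramp) and fade-out (damp) of fade_ms each."""
    n_fade = int(round(sample_rate_hz * fade_ms / 1000.0))
    n_fade = min(n_fade, len(samples) // 2)  # never let the two fades overlap
    out = list(samples)
    for i in range(n_fade):
        gain = i / n_fade  # rises linearly from 0 towards 1
        out[i] *= gain
        out[-1 - i] *= gain
    return out


def rms(samples):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms):
    """Scale a signal so that its RMS amplitude equals target_rms."""
    current = rms(samples)
    if current == 0:
        return list(samples)  # silent input cannot be rescaled
    scale = target_rms / current
    return [s * scale for s in samples]


# Toy stimulus: a 220 Hz tone lasting about 956 ms at 44.1 kHz.
tone = [math.sin(2 * math.pi * 220 * t / 44100) for t in range(42160)]
stimulus = normalize_rms(apply_linear_fade(tone), target_rms=0.1)
```

Applying the fade before the RMS scaling, as in the order described above, means the final stimuli share the same RMS amplitude with the fades already in place.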

Post-scan mother’s voice identification task

All participants in the fMRI experiment completed an auditory behavioral test following the fMRI scan. The goal of the Mother’s Voice Identification Task was to determine if the participants could reliably discriminate their mother’s voice from unfamiliar female voices. Participants were seated in a quiet room in front of a laptop computer, and headphones were placed over their ears. In each trial, participants were presented with a recording of a multisyllabic nonsense word spoken by either the participant’s mother or a control mother, and the task was to indicate whether or not their mother spoke the word. The multisyllabic nonsense words used in the behavioral task were the exact same samples used in the fMRI task. Each participant was presented with 54 randomly ordered nonsense words: 18 produced by the subject’s mother and the remaining 36 produced by unfamiliar female voices.

Signal level analysis

Group mean activation differences for key brain regions identified in the whole-brain univariate analysis were calculated to examine the basis for TD > ASD group differences for both [unfamiliar female voices > environmental sounds] (Figure 2A) and [mother’s voice > unfamiliar female voices] contrasts (Figure 2B). The reason for this analysis is that stimulus differences can result from a number of different factors. For example, both mother’s voice and unfamiliar female voices could elicit reduced activity relative to baseline and significant stimulus differences could be driven by greater negative activation in response to unfamiliar female voices. Significant stimulus differences were inherent to this ROI analysis as they are based on results from the whole-brain GLM analysis (Vul et al., 2009); however, results provide important information regarding the magnitude and sign of results in response to both stimulus conditions. Baseline for this analysis was calculated as the brain response to environmental sounds. The coordinates for the ROIs used in the signal level analysis were based on peaks in TD > ASD group maps for the [unfamiliar female voices > environmental sounds] and [mother’s voice > unfamiliar female voices] contrasts. Cortical ROIs were defined as 5 mm spheres, and subcortical ROIs were 2 mm spheres, centered at the peaks in the TD > ASD group maps for the [unfamiliar female voices > environmental sounds] or [mother’s voice > unfamiliar female voices] contrasts. Signal level was calculated by extracting the β-value from individual subjects’ contrast maps for the [unfamiliar female voices > environmental sounds] and [mother’s voice > environmental sounds] comparisons. The mean β-value within each ROI was computed for both contrasts in all subjects. The group mean β and its standard error for each ROI are plotted in Appendix 1—figure 3.

Appendix 1

Acoustical analysis of mother’s voice samples

We performed acoustical analyses of mother’s voice and unfamiliar voice samples to characterize the physical attributes of the stimuli used for fMRI data collection. The goal of these analyses was to determine if differences between vocal samples collected from mothers of children with ASD and those collected from mothers of TD controls could potentially account for group differences in fMRI activity. Human voices are differentiated according to several acoustical features, including those reflecting the anatomy of the speaker’s vocal tract, such as the pitch and harmonics of speech, and learned aspects of speech production, which include speech rhythm, rate, and emphasis (Bricker and Pruzansky, 1976; Hecker, 1971). Acoustical analysis showed that vocal samples collected from mothers of children with ASD were comparable to those collected from mothers of TD controls measured across multiple spectrotemporal acoustical features (p > 0.10 for all acoustical measures; Figure 1B). An additional goal of the fMRI data analysis was to examine individual differences in social communication abilities in children with ASD, and therefore the next analysis focused on whether acoustical features varied as a function of social communication abilities in children with ASD; there was no relationship between acoustical measures and social communication scores (p > 0.25 for all acoustical measures). Finally, acoustical analyses of the unfamiliar voice samples used in all fMRI sessions were qualitatively similar to vocal samples collected from the mothers of TD controls and children with ASD. Together, these results indicate that there are no systematic differences in the acoustical properties of vocal samples collected from participants’ mothers that could potentially bias the fMRI analysis.

Identification of mother’s voice

To examine whether children who participated in the fMRI study could accurately identify their mother’s voice in the brief vocal samples used in the fMRI experiment, participants performed a mother’s voice identification task. All TD children identified their mother’s voice with a high degree of accuracy (mean accuracy = 97.5%; Figure 1C), indicating that brief (< 1 s) pseudoword speech samples are sufficient for the consistent and accurate identification of mother’s voice in these children. Sixteen of the 21 children in the ASD sample also identified their mother’s voice with a high degree of accuracy (mean accuracy = 98.2%); however, the remaining five children with ASD performed below chance on this task. Group comparison revealed that TD children had greater mother’s voice identification accuracy than children with ASD (t(40) = 2.13, p = 0.039).

An important question is whether the five children with ASD who performed below chance on the mother’s voice identification task might show a distinct behavioral signature that may help explain why these children were unable to identify their mother’s voice in our identification task. While these children did not present with hearing impairments as noted by parents or neuropsychological assessors, who had performed extensive neuropsychological testing on these children prior to the fMRI scan and mother’s voice identification task, a plausible hypothesis is that the five children who were unable to identify their mother’s voice in the task would show greater social communication deficits, or lower scores on measures of cognitive and language function, and/or reduced brain activation in response to unfamiliar or mother’s voice stimuli. To test this hypothesis, we performed additional analyses to examine whether there are any identifying clinical or cognitive characteristics regarding these five children with low mother's voice identification accuracy.

We first examined differences in social communication scores and measures of cognitive and language abilities between children with ASD with low (N = 5) vs. high (N = 16) mother’s voice identification accuracy. Examining the distribution of ADOS Social scores revealed that the five children with low mother’s voice identification accuracy had a wide range of scores, from 7 to 16 (note that ADOS Social Affect is scored in a range of 0–20, with a score of 0 indicating no social deficit, a score of 7 indicating a milder social communication deficit, and a score of 16 a more severe deficit). Group results for this measure are plotted in Appendix 1—figure 1A (left-most violin plot), and group comparisons between low (‘Low ID’ in green) and high (‘High ID’ in blue) mother’s voice identification groups using Wilcoxon rank sum tests were not significant for ADOS Social scores (p = 0.83). In a second analysis, we examined whether mother’s voice identification accuracy is related to social communication scores. Results from Pearson’s correlation analysis indicate that mother’s voice identification accuracy is not related to ADOS Social scores (R = 0.13, p = 0.59).
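For readers unfamiliar with the two statistics used above, the following minimal sketch shows how a Wilcoxon rank sum test and a Pearson correlation can be computed. It is illustrative only: the helper names are hypothetical, the input values are toy numbers rather than study data, and the p-value uses a normal approximation without tie correction, which may differ from the exact test a statistics package would apply to groups as small as N = 5.

```python
from math import erfc, sqrt
from statistics import fmean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # its standard deviation under H0
    z = (w - mean_w) / sd_w
    return z, erfc(abs(z) / sqrt(2))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Toy values only: two interleaved groups, then two fully separated groups.
z_same, p_same = rank_sum_test([2, 5], [1, 3, 4, 6])
z_diff, p_diff = rank_sum_test([1, 2, 3, 4, 5], list(range(6, 22)))
r_toy = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

The interleaved groups give a rank sum equal to its expected value (no evidence of a difference), whereas the fully separated groups of sizes 5 and 16 give a small p-value.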

Appendix 1—figure 1
Social communication, cognitive, and language abilities in children with ASD with low vs. high mother’s voice identification accuracy.

(A) To examine whether children with ASD who were unable to identify their mother’s voice in the mother’s voice identification task (N = 5) showed a distinct behavioral profile relative to children with ASD who were able to perform this task (N = 16), we performed Wilcoxon rank sum tests using ADOS Social Affect scores (left-most violin plot) and standardized measures of IQ (Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999)) between these groups. Group comparisons between low (green) and high (blue) mother’s voice identification groups using Wilcoxon rank sum tests were not significant for social communication (p = 0.83) or IQ measures (p > 0.25 for all three measures, uncorrected for multiple comparisons). (B) To examine group differences in language abilities for low vs. high mother’s voice identification groups, we performed Wilcoxon rank sum tests using CTOPP Phonological Awareness and CELF Language measures. Group comparisons were not significant for any of the language measures (p > 0.05 for all four measures, not corrected for multiple comparisons); however, there was a trend for reduced Core Language (p = 0.062) and Expressive Language abilities (p = 0.055) in the low (green) mother’s voice identification group.

https://doi.org/10.7554/eLife.39906.011

We next examined whether there were any differences in standardized IQ scores (Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999)) for children with low and high mother’s voice identification accuracy (Appendix 1—figure 1A, three right-most violin plots). Group comparisons between low and high mother’s voice identification groups using Wilcoxon rank sum tests were not significant for any of the IQ measures (p > 0.25 for all three measures, uncorrected for multiple comparisons). We then examined whether there were any differences for children with low vs. high mother’s voice identification accuracy in standardized measures of language abilities, including CTOPP Phonological Awareness (Wagner, 1999) and CELF-4 Core Language, Receptive Language, and Expressive Language standard scores (Semel, 2003) (Appendix 1—figure 1B). Group comparisons using Wilcoxon rank sum tests were not significant for any of the language measures (p > 0.05 for all four measures, not corrected for multiple comparisons); however, there was a trend for reduced Core Language (p = 0.062) and Expressive Language abilities (p = 0.055) in the low (green) mother’s voice identification group.

Together, results from clinical (i.e., social communication), cognitive, and language measures showed that there are no distinguishing features for the children with below chance mother’s voice identification accuracy compared to children with above chance accuracy.

Activation to unfamiliar female voices in TD children and children with ASD

We identified brain regions that showed increased activation in response to unfamiliar female voices compared to non-vocal environmental sounds separately within TD and ASD groups. This particular comparison has been used in studies examining the cortical basis of general vocal processing in neurotypical adult (Belin et al., 2000) and child listeners (Abrams et al., 2016). The TD child sample showed strong activation in bilateral superior temporal gyrus (STG) and sulcus (STS), amygdala, and right-hemisphere supramarginal gyrus of the inferior parietal lobule (IPL; Appendix 1—figure 2A). Children with ASD, however, showed a reduced activity profile in response to unfamiliar female voices, including a reduced extent of bilateral STG and STS and no difference in activity between unfamiliar female voices and environmental sounds in the amygdala (Appendix 1—figure 2B).

Appendix 1—figure 2
Brain activity in response to unfamiliar female voices compared to environmental sounds in TD children and children with ASD.

(A) In TD children, unfamiliar female voices elicit greater activity throughout a wide extent of voice-selective superior temporal gyrus (STG) and superior temporal sulcus (STS), bilateral amygdala, and right-hemisphere supramarginal gyrus. (B) Children with ASD show a reduced activity profile in STG/STS in response to unfamiliar female voices and do not show increased activity compared to environmental sounds in the amygdala.

https://doi.org/10.7554/eLife.39906.012

Activation to mother’s voice in TD children

We identified brain regions that showed greater activation in response to mother’s voice compared to unfamiliar female voices separately within the TD and ASD groups. By subtracting out brain activation associated with hearing unfamiliar female voices producing the same nonsense words (i.e., controlling for low-level acoustical features, phoneme and word-level analysis, auditory attention), we estimated brain responses unique to hearing the maternal voice. TD children showed increased activity in a wide range of brain systems, including auditory, voice-selective, reward, social, and visual functions (Appendix 1—figure 3A). Specifically, mother’s voice elicited greater activation in primary auditory regions, including bilateral inferior colliculus (IC), the primary midbrain nucleus of the ascending auditory systems, and Heschl’s gyrus (HG), which includes primary auditory cortex. Mother’s voice also elicited greater activity in TD children in auditory association cortex in the superior temporal plane, including planum polare and planum temporale, with a slightly increased extent of activation in the right hemisphere. Additionally, mother’s voice elicited greater activity in a wide extent of bilateral voice-selective STS, extending from the posterior-most aspects of this structure (y = −52) to anterior STS bordering the temporal pole (y = 6). Preference for mother’s voice was also evident in the medial temporal lobe, including left-hemisphere amygdala, a key node of the affective processing system, and bilateral posterior hippocampus, a critical structure for declarative and associative memory. Structures of the mesolimbic reward pathway also showed greater activity for mother’s voice, including bilateral nucleus accumbens and ventral putamen in the ventral striatum, orbitofrontal cortex (OFC), and ventromedial prefrontal cortex (vmPFC). Mother’s voice elicited greater activity in a key node of the default-mode network, instantiated in precuneus and posterior cingulate cortex, a brain system involved in processing self-referential thoughts. Preference for mother’s voice was also evident in visual association cortex, including lingual and fusiform gyrus. Next, mother’s voice elicited greater activity in bilateral anterior insula, a key node of the brain’s salience network. Finally, preference for mother’s voice was evident in frontoparietal regions, including right-hemisphere pars opercularis [Brodmann area (BA) 44] and pars triangularis (BA 45) of the inferior frontal gyrus, the angular and supramarginal gyri of inferior parietal lobule (IPL), and supplementary motor cortex.

Appendix 1—figure 3
Signal levels in response to unfamiliar female voices and mother’s voice in TD children and children with ASD.

The reason for the signal level analysis is that stimulus-based differences in fMRI activity can result from a number of different factors. Significant differences were inherent to this ROI analysis as they are based on results from the whole-brain GLM analysis (Vul et al., 2009); however, results provide important information regarding the magnitude and sign of fMRI activity. (a) Regions were selected for signal level analysis based on their identification in the TD > ASD group difference map for the [unfamiliar female voices vs. environmental sounds] contrast (Figure 2A). ROIs are 5 mm spheres centered at the peak for these regions in the TD > ASD group difference map for the [unfamiliar female voices vs. environmental sounds] contrast. (b) Regions were selected for signal level analysis based on their identification in the [mother’s voice vs. unfamiliar female voices] contrast (Figure 2B). The posterior hippocampus ROI is a 2 mm sphere centered at the peak for this region in the [mother’s voice > unfamiliar female voices] contrast. All other ROIs are 5 mm spheres centered at the peak for these regions in the TD > ASD group difference map for the [mother’s voice vs. unfamiliar female voices] contrast.

https://doi.org/10.7554/eLife.39906.013

Activation to mother’s voice in children with ASD

Children with ASD showed a smaller collection of brain regions that were preferentially activated by mother’s voice (Appendix 1—figure 4B). This group did not show a preference for mother’s voice in primary auditory regions, including the IC, and activity in auditory cortex was confined to a small extent of left-hemisphere HG. Preference for mother’s voice was also more limited in both auditory association cortex of the superior temporal plane as well as voice selective STS, particularly in the right hemisphere, where only a focal anterior STS (aSTS) cluster showed increased activity for mother’s voice. Children with ASD also did not show a preference for mother’s voice in medial temporal lobe structures, including both amygdala and hippocampus, as well as structures of the mesolimbic reward pathway, default mode network, and occipital regions. Children with ASD did, however, show increased activation to mother’s voice in bilateral anterior insula of the salience network as well as frontoparietal regions, including left-hemisphere BA 44, bilateral supramarginal gyrus, and left-hemisphere angular gyrus.

Appendix 1—figure 4
Brain activity in response to mother’s voice compared to unfamiliar female voices in TD children and children with ASD.

(A) In TD children, mother’s voice elicited greater activity in auditory brain structures in the midbrain and superior temporal cortex (top row, left), including bilateral inferior colliculus (IC) and primary auditory cortex (medial Heschl’s gyrus; mHG), and a wide extent of voice-selective superior temporal gyrus (STG; top row, middle) and superior temporal sulcus (STS). Mother’s voice also elicited greater activity in occipital cortex, including fusiform cortex (bottom row, left), as well as core structures of the mesolimbic reward system, including bilateral medial prefrontal cortex (mPFC) and nucleus accumbens (NAc), and the anterior insula (AI) of the salience network. (B) Greater activity for mother’s voice was evident in a smaller collection of brain regions in children with ASD compared to TD children. Mother’s voice did not elicit greater activity in auditory brain structures in the midbrain, but activity extended slightly into primary auditory cortex (top row, left) and covered a more limited extent of voice-selective STG (top row, middle) and STS. Mother’s voice did not elicit greater activity compared to unfamiliar female voices in fusiform cortex or the mesolimbic reward system. Mother’s voice did elicit greater activity in AI of the salience network.

https://doi.org/10.7554/eLife.39906.014

fMRI activation and connectivity profiles in children with ASD are not related to mother’s voice identification accuracy

Behavioral results indicated that 5 of the 21 children with ASD had below chance-level accuracy on the mother’s voice identification task (Figure 1C; see Results, Identification of Mother’s Voice). An important question is whether the five children with ASD who performed below chance on the mother’s voice identification task might show a distinct neural signature that may help explain why these children were unable to identify their mother’s voice in our behavioral task. A plausible hypothesis is that the five children who were unable to identify their mother’s voice in the task would show reduced brain activation in response to unfamiliar or mother’s voice stimuli. To test this hypothesis, we performed additional analyses to examine whether there are any identifying neural characteristics regarding these five children with low identification accuracy.

We first examined neural response profiles for the five children with low vs. high mother’s voice identification accuracy by plotting ROI signal levels for the contrasts and regions identified in Figure 3A. First, results showed no group differences between children with low vs. high identification accuracy using Wilcoxon rank sum tests for any of the brain regions associated with the [unfamiliar voices vs. non-social environmental sounds] contrast (Appendix 1—figure 5A; p > 0.35 for all three regions, not corrected for multiple comparisons). We then examined low vs. high identification accuracy using Wilcoxon rank sum tests for the brain regions associated with the [mother’s voice vs. unfamiliar voices] contrast (Figure 3B) and again found no group differences (Appendix 1—figure 5B; p > 0.45 for all seven regions, not corrected for multiple comparisons).

Appendix 1—figure 5
Brain activation in response to unfamiliar voices and mother’s voice in children with ASD with low vs. high mother’s voice identification accuracy.

(A) To examine whether children with ASD who were unable to identify their mother’s voice in the mother’s voice identification task (N = 5) showed a distinct neural response profile relative to children with ASD who were able to perform this task (N = 16), Wilcoxon rank sum tests were computed using ROI signal levels (mean contrast betas) for the [unfamiliar voices minus non-social environmental sounds] contrast in regions identified in Figure 3A. Results showed no group differences between children with low (green) vs. high (blue) identification accuracy for any of the brain regions associated with the [unfamiliar voices vs. non-social environmental sounds] contrast (p > 0.35 for all three regions, not corrected for multiple comparisons). (B) Group differences in neural response profiles for low vs. high mother’s voice identification groups using ROI signal levels (mean contrast betas) for the [mother’s voice minus unfamiliar voices] contrast were computed within regions identified in Figure 3B. Results showed no group differences between children with low vs. high identification accuracy for any of the brain regions associated with the [mother’s voice minus unfamiliar voices] contrast (p > 0.45 for all seven regions, not corrected for multiple comparisons).

https://doi.org/10.7554/eLife.39906.015

We examined whether mother’s voice identification accuracy affected results from ADOS covariate analyses in children with ASD (Figure 3). Therefore, additional regression analyses were performed in which ADOS Social Affect scores were the dependent variable and predictors included mother’s voice identification accuracy and betas from ROIs identified in the [unfamiliar female voice minus environmental sounds] contrast (i.e., Figure 3A) or [mother’s voice minus unfamiliar voices] contrast (i.e., Figure 3B). Separate regression models were computed for each ROI in each vocal contrast. Results showed that all ROI signal levels reported in Figure 3 were significant predictors of social communication scores after regressing out mother’s voice identification accuracy (p ≤ 0.005 for all ROIs).
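The regression described above, with a social communication score as the dependent variable and both identification accuracy and an ROI beta as predictors, can be sketched as an ordinary least-squares fit. The sketch below is an assumption-laden illustration: the helper names and all data values are hypothetical, and it returns only coefficients, whereas the reported analysis also involved significance testing.

```python
def solve_linear(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def ols(y, *predictors):
    """Least-squares fit of y on an intercept plus the given predictors."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)  # normal equations: (X'X) coef = X'y


# Hypothetical toy data, constructed so the true coefficients are known.
accuracy = [0.2, 0.9, 1.0, 0.4, 0.95, 0.7]   # identification accuracy
roi_beta = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1]  # ROI signal level
score = [12.0 + 1.0 * a - 5.0 * b for a, b in zip(accuracy, roi_beta)]

intercept, b_accuracy, b_roi = ols(score, accuracy, roi_beta)
```

Because both predictors enter the same model, the coefficient on the ROI beta reflects its association with the score after accounting for identification accuracy, which is the logic of "regressing out" accuracy.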

We then examined whether removing the five children with low mother’s voice identification accuracy would affect group GLM and functional connectivity results. We therefore examined a sub-group comprising the 16 children with ASD who showed above-chance identification accuracy and performed whole-brain TD vs. ASD group comparisons, social communication covariate analysis within the ASD group, and functional connectivity analyses, including SVC and SVR. Results for all analyses were similar to those described previously for the entire ASD group. Specifically, whole-brain TD vs. ASD group differences and social communication covariate results were evident in similar brain regions as those described for the larger ASD group. Functional connectivity results also showed the same pattern described for the entire ASD group: SVC results showed that connectivity during unfamiliar voice processing could not distinguish children with ASD from TD children (SVC Accuracy = 50.9%, p = 0.41), while connectivity during mother’s voice processing could (SVC Accuracy = 66.3%, p = 0.014). Furthermore, SVR results showed that connectivity using combined features from both unfamiliar and mother’s voice processing could classify individuals with ASD from TD children (R = 66.3%, p = 0.003). These results indicate that patterns of brain activity and connectivity in children with ASD in response to vocal stimuli were unrelated to behavioral identification of mother’s voice.
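To make the idea of a classification accuracy with an accompanying p-value concrete, the sketch below estimates leave-one-out accuracy and a permutation-based p-value. Several things here are assumptions rather than the authors' method: a nearest-centroid classifier stands in for the support vector classifier, the leave-one-out scheme and permutation test are illustrative choices, and the features are toy numbers rather than connectivity values.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVC stand-in)."""
    classes = sorted(set(labels))
    hits = 0
    for i, (row, label) in enumerate(zip(features, labels)):
        # Train on every sample except the held-out one.
        train = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        cents = {c: centroid([f for f, l in train if l == c]) for c in classes}
        guess = min(classes, key=lambda c: sq_dist(row, cents[c]))
        hits += guess == label
    return hits / len(labels)


def permutation_p(features, labels, n_perm=200, seed=0):
    """Observed accuracy and the share of label shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = loo_accuracy(features, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        count += loo_accuracy(features, shuffled) >= observed
    return observed, (count + 1) / (n_perm + 1)


# Toy data: two well-separated groups of six samples each.
features = [[0.1 * i, 1.0] for i in range(6)] + [[5.0 + 0.1 * i, 1.0] for i in range(6)]
labels = [0] * 6 + [1] * 6
accuracy, p_value = permutation_p(features, labels)
```

Shuffling the labels breaks any true link between features and group membership, so the shuffled accuracies approximate what chance alone would produce for this sample size.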

Together, results from neural measures of voice processing showed that there are no distinguishing features for the children with below chance mother’s voice identification accuracy compared to children with above chance accuracy.

Appendix 1—table 1
Effect sizes for GLM results: TD vs. ASD Group Analysis.

The overall effect size measured across all brain clusters identified in the TD vs. ASD Group Analyses is 0.68.

https://doi.org/10.7554/eLife.39906.016
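| Contrast | Brain region | Effect size |
| --- | --- | --- |
| [Unfamiliar Voices minus Environmental Sounds] | Right-hemisphere Planum Polare (PP) | 0.70 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Intercalcarine | 0.65 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Lingual | 0.68 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Fusiform | 0.66 |
| [Mother’s Voice minus Unfamiliar Voices] | Left-hemisphere Fusiform | 0.67 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Hippocampus | 0.66 |
| [Mother’s Voice minus Unfamiliar Voices] | Left-hemisphere Superior Parietal Lobule (SPL) | 0.69 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Precuneus | 0.69 |
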
Appendix 1—table 2
Effect sizes for GLM results: Social Communication Covariate Analysis.

The overall effect size measured across all brain clusters identified in the Social Communication Covariate Analysis is 0.76.

https://doi.org/10.7554/eLife.39906.017
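| Contrast | Brain region | Effect size |
| --- | --- | --- |
| [Unfamiliar Voices minus Environmental Sounds] | Right-hemisphere Planum Polare (PP) | 0.84 |
| [Unfamiliar Voices minus Environmental Sounds] | Left-hemisphere Nucleus Accumbens (NAc) | 0.69 |
| [Unfamiliar Voices minus Environmental Sounds] | Right-hemisphere Anterior Insula (AI) | 0.84 |
| [Mother’s Voice minus Unfamiliar Voices] | Left-hemisphere Heschl’s Gyrus (HG) | 0.77 |
| [Mother’s Voice minus Unfamiliar Voices] | Left-hemisphere Planum Polare (PP) | 0.77 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Superior Temporal Sulcus (mSTS) | 0.74 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Ventromedial Prefrontal Cortex (vmPFC) | 0.73 |
| [Mother’s Voice minus Unfamiliar Voices] | Left-hemisphere Anterior Insula (AI) | 0.77 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Rostral Anterior Cingulate Cortex (rACC) | 0.73 |
| [Mother’s Voice minus Unfamiliar Voices] | Right-hemisphere Supplementary Motor Area (SMA) | 0.76 |
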
Appendix 1—table 3
Brain regions used in functional connectivity analyses.
https://doi.org/10.7554/eLife.39906.018
| Brain region | Coordinates |
| --- | --- |
| Left-hemisphere pSTS | [−63 −42 9] |
| Right-hemisphere pSTS | [57 −31 5] |
| Left-hemisphere vmPFC | [−6 32 −14] |
| Right-hemisphere vmPFC | [6 54 −4] |
| Left-hemisphere Anterior Insula | [−28 18 −10] |
| Right-hemisphere VTA | [2 −22 −20] |
| Left-hemisphere NAc | [−12 18 −8] |
| Right-hemisphere NAc | [14 18 −8] |
| Left-hemisphere OFC | [−36 24 −14] |
| Left-hemisphere Putamen | [−24 14 −8] |
| Right-hemisphere Putamen | [16 14 −10] |
| Left-hemisphere Caudate | [−18 4 20] |
| Right-hemisphere Caudate | [14 22 −6] |
| Right-hemisphere Amygdala | [30 −4 −24] |
| Right-hemisphere Hippocampus | [28 −6 −26] |
| Right-hemisphere Fusiform | [36 −28 −22] |

Appendix 1—table 4
GLM Threshold Analysis: TD vs. ASD Group Analysis [Unfamiliar Voices minus Environmental Sounds] fMRI Contrast.
https://doi.org/10.7554/eLife.39906.019
| Brain region activation | Height: p < 0.005, Extent: p < 0.05 (67 voxels) | Height: p < 0.005, Extent: p < 0.01 (87 voxels) | Height: p < 0.001, Extent: p < 0.05 (30 voxels) | Height: p < 0.001, Extent: p < 0.01 (41 voxels) |
| --- | --- | --- | --- | --- |
| Auditory Assoc. Cx, PP | Yes | Yes | Yes | Yes |

Appendix 1—table 5
GLM Threshold Analysis: TD vs. ASD Group Analysis [Mother’s Voice minus Unfamiliar Voices] contrast.
https://doi.org/10.7554/eLife.39906.020
| Brain region activation | Height: p < 0.005, Extent: p < 0.05 (67 voxels) | Height: p < 0.005, Extent: p < 0.01 (87 voxels) | Height: p < 0.001, Extent: p < 0.05 (30 voxels) | Height: p < 0.001, Extent: p < 0.01 (41 voxels) |
| --- | --- | --- | --- | --- |
| Occipital Fusiform Gyrus | Yes | Yes | Yes | No |
| Temporal Occipital Fusiform Gyrus | Yes | Yes | No | No |
| Post. Hippocampus | Yes | Yes | No | No |
| Lingual Gyrus | Yes | Yes | Yes | Yes |
| Superior Parietal | Yes | Yes | Yes | Yes |
| Precuneus | Yes | Yes | Yes | Yes |

Appendix 1—table 6
GLM Threshold Analysis: Social Communication Covariate Analysis, [Unfamiliar Voices minus Environmental Sounds] fMRI Contrast.
https://doi.org/10.7554/eLife.39906.021
| Brain region activation | Height: p < 0.005, Extent: p < 0.05 (67 voxels) | Height: p < 0.005, Extent: p < 0.01 (87 voxels) | Height: p < 0.001, Extent: p < 0.05 (30 voxels) | Height: p < 0.001, Extent: p < 0.01 (41 voxels) |
| --- | --- | --- | --- | --- |
| Auditory Assoc., PP | Yes | Yes | Yes | Yes |
| Voice Selective, STG | Yes | Yes | Yes | Yes |
| Mesolimbic Reward, NAc | Yes (SVC) | No | No | No |
| Salience, AI | Yes | Yes | Yes | Yes |

Appendix 1—table 7
GLM Threshold Analysis: Social Communication Covariate Analysis, [Mother’s Voice minus Unfamiliar Voices] fMRI Contrast.
https://doi.org/10.7554/eLife.39906.022
| Brain region activation | Height: p < 0.005, Extent: p < 0.05 (67 voxels) | Height: p < 0.005, Extent: p < 0.01 (87 voxels) | Height: p < 0.001, Extent: p < 0.05 (30 voxels) | Height: p < 0.001, Extent: p < 0.01 (41 voxels) |
| --- | --- | --- | --- | --- |
| Primary Auditory, HG | Yes | Yes | Yes | Yes |
| Voice-selective, STG/STS | Yes | Yes | No | No |
| Mesolimbic Reward, vmPFC | Yes | Yes | Yes | Yes |
| Salience, AI | Yes | Yes | Yes | Yes |
| Salience, rACC | Yes | Yes | No | No |
| Motor, SMA | Yes | Yes | Yes | Yes |

https://doi.org/10.7554/eLife.39906.010

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
    Diagnostic and Statistical Manual of Mental Disorders: DSM-5
    1. American Psychiatric Association
    (2013)
    Washington: American Psychiatric Association.
  8. 8
  9. 9
  10. 10
    Speaker recognition
    1. PD Bricker
    2. S Pruzansky
    (1976)
    In: Contemporary Issues in Experimental Phonetics. New York: Academic. pp. 295–326.
    https://doi.org/10.1016/b978-0-12-437150-7.50015-4
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
    Early Warning Signs of Autism Spectrum Disorder: Division of Birth Defects, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention
    1. L Harstad
    2. C Baum
    3. Y Yatchmink
    (2016)
    Centers for Disease Control and Prevention.
  31. 31
    Speaker Recognition: An interpretive survey of the literature
    1. MH Hecker
    (1971)
    ASHA Monographs.
  32. 32
  33. 33
    Autistic disturbances of affective contact
    1. L Kanner
    (1968)
    Acta Paedopsychiatrica 35:217–250.
  34. 34
  35. 35
  36. 36
    Child-initiated interactions that are pivotal in intervention for children with autism
    1. LK Koegel
    2. RL Koegel
    3. LI Brookman
    (2005)
    In: Psychosocial Treatments for Child and Adolescent Disorders: Empirically Based Strategies for Clinical Practice. Washington: American Psychological Association. pp. 633–657.
  37. 37
    Pivotal Response Treatments for Autism: Communication, Social, and Academic Development
    1. RL Koegel
    2. LK Koegel
    (2006)
    Baltimore: Brookes Publishing.
  38. 38
  39. 39
  40. 40
    Developing trust and perceived effectance in infancy
    1. ME Lamb
    (1981)
    In: LP Lipsitt, editor. Advances in Infancy Research. Norwood: Ablex. pp. 101–127.
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
    Autism Diagnostic Observation Schedule
    1. C Lord
    2. M Rutter
    3. PC DiLavore
    4. S Risi
    5. K Gotham
    6. S Bishop
    (2012)
    Torrance: Western Psychological Services.
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
    Joint attention, social orienting, and nonverbal communication in autism
    1. P Mundy
    2. J Stella
    (2000)
    In: Autism Spectrum Disorders: A Transactional Developmental Perspective. Baltimore: Paul H. Brookes Publishing Company. pp. 55–77.
  54. 54
    Educating Children with Autism. Committee on Educational Interventions for Children with Autism
    1. National Research Council
    (2001)
    Washington: National Academies Press.
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
    Reward processing in autism
    1. AA Scott-Van Zeeland
    2. M Dapretto
    3. DG Ghahremani
    4. RA Poldrack
    5. SY Bookheimer
    (2010)
    Autism Research: Official Journal of the International Society for Autism Research 3:53–67.
    https://doi.org/10.1002/aur.122
  65. 65
    Clinical Evaluation of Language Fundamentals
    1. E Semel
    (2003)
    Psychological Corporation.
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
    Comprehensive Test of Phonological Processing (CTOPP)
    1. RK Wagner
    (1999)
    Pro-Ed, Inc.
  75. 75
    Simultaneous Inference for fMRI Data, AFNI 3dDeconvolve Documentation
    1. BD Ward
    (2000)
    Medical College of Wisconsin.
  76. 76
  77. 77
    The Wechsler Abbreviated Scale of Intelligence
    1. D Wechsler
    (1999)
    San Antonio: The Psychological Corporation.
  78. 78
  79. 79
  80. 80

Decision letter

  1. Michael Breakspear
    Reviewing Editor; QIMR Berghofer Medical Research Institute, Australia
  2. Michael J Frank
    Senior Editor; Brown University, United States
  3. Coralie Chevallier
    Reviewer; INSERM, France

In the interests of transparency, eLife includes the editorial decision letter, peer reviews, and accompanying author responses.

[Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved.]

Re-evaluation:

We appreciate that you have highlighted the sample size as a limitation, provided documentary support for the sample size, your deeper individual testing, and noted the need for future studies that are better able to capture the heterogeneity of ASD. Nonetheless, there is increased sensitivity to the (out-of-sample) reproducibility issues inherent in studies of this size, even given the challenges of clinical research of this nature. While we support publication of the paper, this limitation has been raised in the peer review process.

Decision letter after peer review:

Thank you for submitting your article "Impaired voice processing in reward and salience circuits predicts social communication in children with autism" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The following individual involved in review of your submission has agreed to reveal her identity: Coralie Chevallier (Reviewer #1).

The Reviewing Editor has highlighted the concerns that require revision and/or responses, and we have included the separate reviews below for your consideration. If you have any questions, please do not hesitate to contact us.

This is a very interesting and nicely presented study of cortical responses to mothers' versus strangers' voices in people with Autistic Spectrum Disorders. All reviewers felt the study was well designed in its stimuli and were, in principle, impressed by the range of uni- and multivariate analyses undertaken.

However, all three reviewers raise substantial concerns regarding the small sample size: these reflect the ability of the study to adequately cover the heterogeneity of ASD, the possibility that some effects may be inflated (particularly the regressions), the under-estimate of variance in the cross-validation tests and, most importantly, the likely challenges of reproducing the principal findings (in the absence of an independent test data set). The regression effect sizes are very strong, and there is possibly some collinearity between the test to identify the ROIs (the group contrast) and the subsequent regression (ASD severity), given that ASD typically presents as an extreme on the healthy development/social spectrum.

Each analysis, on its own, may be valid, but the general feeling is that the authors are doing too much with this limited data set. In the absence of acquiring further data, there seems to be little additional work that the authors can do to address this fundamental concern.

Given this is part of the eLife "trial", there does not seem to be a deep enough concern for the paper to be withdrawn from further consideration. However, if published, the important concerns of the reviewers will need to be published alongside the paper, pending the nature of the authors' responses.

Other concerns are less fundamental and are listed below.

Separate reviews (please respond to each point):

Reviewer #1:

Overall, I very much enjoyed reading the paper: clearly written, timely, and novel. The use of recordings that are specific to each participant is an important contribution. I am not a neuroscientist so I am not qualified to evaluate the quality of the neuroimaging work. I will therefore focus my remarks on the Introduction, clinical / behavioural aspects of the Materials and methods, and Discussion. (and please pardon my naivety when I do comment on the neuroimaging part of the work).

1) Behavioural tasks can provide mechanistic insights

I think the authors should tone down their claim about the limitations of behavioural studies to understand and tease apart different cognitive mechanisms. Many behavioural studies use ingenious paradigms to tease apart various mechanistic possibilities. Recently, the use of computational modeling methods applied to behavioural data has also been very fruitful in this respect (different model parameters reflect different mechanisms).

On the same topic, in the Introduction “…and obtaining valid behavioral measurements regarding individuals’ implicit judgments of subjective reward value of these stimuli can be problematic”: I was surprised by the strong statement. Antonia Hamilton has published many papers demonstrating the validity of behavioural tools to measure social reward responsiveness. I have also published several papers on that same topic (including one using signal detection theory, in PLOS One). In these papers, no abstract judgment is required. Rather, participants' behaviours are thought to reflect underlying social motivation or social reward responsiveness.

2) Why is the accuracy for mother's voice identification different in the ASD vs. TD group?

I would like to know a bit more about the difference in identification accuracy between the groups: why did the ASD group perform below the TD group? The authors point out that 5 children performed below chance in identifying their own mother's voice. Is there evidence that this indicates a true deficit or is poor performance linked to other factors (poor hearing? deteriorated listening skills in the scanner?). Where are these five children on the regression? Do they drive the effect?

If some children did not recognize their mother's voice, it seems to me that they should be looked at differently: if the reward / memory / visual network is less activated in these children, is it because they do not find voices as rewarding / memorable or is it because they didn't recognize this familiar voice (but if they had, their brain would have reacted in the same way)?

Another naive question on the same topic. The authors report that "fMRI activation profiles in children with ASD were not related to mother's voice identification accuracy". So I was left wondering what these activations reflect (if they are not sensitive to the fact that 5 out of 21 children did not identify their mother, does it mean that these activations are picking up on something that is much more domain general than anything that might have to do with "mother's voice" specifically?). Is there a way to statistically correct all analyses for accuracy levels?

The authors should report whether identification accuracy is related to ADOS SC scores.

3) ADOS score use

ADOS scores are not meant to be used as continuous severity scores unless they are transformed (Gotham, K., Pickles, A., & Lord, C. (2009). Standardizing ADOS scores for a measure of severity in autism spectrum disorders. Journal of Autism and Developmental Disorders, 39(5), 693-705.). I do not know how this logic applies to using the social affect score only but it should be checked / discussed.

4) Are there a priori criteria for exclusion based on motion?

Materials and methods subsections “Participants” and “Movement criteria for inclusion in fMRI analysis”: Is there an a priori threshold to exclude participants based on motion? Or published guidelines? Can the authors specify the exclusion decision criteria? Is there a group difference in average motion?

5) Sample size concerns

In a behavioural study, a sample size of 21 is now considered too small. I realise that the change in standard is recent (and I have published many papers using small sample sizes myself). However, I do think that we need to accelerate change, especially for conditions that are notoriously heterogenous, such as ASD. And especially when one is interested in explaining interindividual differences (by using correlations).

Reviewers who specialise in neuroimaging methods may want to comment on this specific point but the paper would definitely be stronger if the sample size exceeded (or at least matched) the current average in the cognitive neuroscience field (i.e., 30, see Poldrack et al., 2017, Figure 1). Alternatively, the authors should report observed power. This is, I think, most important for the regression part of the paper. If power is too low, I would recommend moving this part of the paper to the SM and rewriting the main text accordingly.

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365.

Poldrack, R. A., et al. (2017). Scanning the horizon: towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115.

To sum up, I think the paper raises an interesting question and would be even better if reproducibility concerns were addressed by increasing sample size.

Reviewer #2:

"Impaired voice processing in … children with autism" by Abrams et al.

This paper reports the uni- and multivariate analysis of fMRI acquired from children with ASD while listening to their mothers' or strangers' voice. The authors find quite striking correlations between social disability rating scales and both brain activity and functional connectivity.

The experimental stimuli, writing, analysis and cohort characterization are all of a very high standard. There are some limitations in the actual design that limit the nature of the inferences drawn. In a revised manuscript, these should be addressed through a subtle reframing and possibly additional discussion points.

1) The task stimuli are very well controlled but the task is essentially a passive listening task, with a low-level vigilance task designed simply to keep the participants engaged with the stimuli. Therefore, in the absence of an explicit reward, the authors should be cautious of falling foul of reverse inference (Poldrack, 2006) when framing the findings in terms of "reward circuits". I guess listening to a voice, particularly a mother's voice, is implicitly rewarding, but caution should be taken in drawing too direct an inference with regard to dysfunctional reward processes (see Discussion section). Indeed, the whole positioning of functional neuroimaging studies as being more informative than behavioural tests should be mindful of the limitations of inferring disrupted functional circuits in disorders unless you are actually probing that function with an appropriate task.

2) Similarly, there is no manipulation of attention and therefore no means of telling whether the differences in activation during voice perception in ASD are simply a lack of due interest in, and attention to, voices preceding (or consequent to) changes in reward-based learning.

3) Again, the authors draw a very direct line between mother's vs stranger's voice and social reasoning/communication (e.g. Abstract). But maternal voice is more than a simple cue: it incorporates many other processes, including basic parental attachment and dyadic reciprocation.

4) What are the core defining disturbances underlying the diagnosis of ASD in this study? The information in paragraph two of subsection “Participants” simply refers to an algorithm. While this might be sufficient for reproducibility, it is inadequate here for two reasons: First, as eLife is a general (not a clinical) journal, more depth and context is required for the broader readership. Second, if the diagnosis relies heavily on social deficits (which I suspect it does), is it then surprising that the strength of between-group differences covaries so markedly with social deficit scores within the ASD group? If so, is there an implicit circularity between the identification of these regions (Figure 2) and the very strong correlations against ADOS scores? If not, what other effects may be driving these? Such strong correlations are bound to draw attention and I think the authors should therefore pay careful attention to this issue.

5) This is a modest sample size for the use of machine learning, particularly when using a high dimensional (functional connectivity) feature space. N-fold cross-validation does not always provide strong control here (Varoquaux, 2017): Do the authors undertake a feature reduction step, such as a LASSO? Is there some logical circularity between identifying features with a group contrast, then using these same features in a between-group classifier? Also, I would avoid using the term "prediction" for contemporaneous variables, even when cross-validation is undertaken.

6) I don't see any model-based analysis of neuronal interactions and hence don't think the authors are examining effective connectivity. gPPI is a purely linear model of statistical dependences and their moderation and hence falls into the class of functional connectivity. Personally, I would prefer to have seen the authors use a more dynamic, model-driven method of effective connectivity, using something like DCM to provide a deeper mechanistic insight into the changes in activation and information flow (I also am not sure I buy into the choice of ROIs that do not show the group effects). However, given the classification success, the authors' approach seems entirely reasonable. However, please do specify the nature of the gPPI model in the text and supplementary material (what are the nodes, inputs and modulators; is it possible to represent this graphically?).

7) Do the manipulations to the vocal signals (subsection “Stimulus post-processing”) possibly warp the sound of the speech? If so, could this influence the ASD responses (ASD being perhaps more tuned to low-level features of stimulus inputs)?

References:

Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in cognitive sciences, 10(2), 59-63.

Varoquaux, G. (2017). Cross-validation failure: small sample sizes lead to large error bars. Neuroimage.

Minor Comments:

1) Introduction, second sentence: "affected" -> "ASD"

2) Figure 1B caption: Please add stats

3) Discussion first paragraph: "These findings…"; Third paragraph: "Our findings…"

4) Discussion paragraph three: ?"predicts"?

5) Subsection “Participants”: please add more details regarding the diagnosis; e.g. "In essence, these ASD…" (see point 4 above)

6) Table 1: What are the "typical" values of ADOS-social and ADI-A social? Are these very impaired children (also see point 4 above)?

7) Materials and methods: Were there any between group differences in head motion parameters?

8) Subsection “Effective Connectivity Analysis”: I'm not sure what "preempts… biases" means here – the authors are not modelling effects in the present data, which I think is a shame (see point 6 above)

Reviewer #3:

In this study, Abrams and colleagues examined voice processing in children (average age 10 years old) with and without autism. They were specifically interested in reward and salience circuitry and relate the study back to predictions made by the social motivation theory of autism. The authors applied group-level analyses, brain-behavior correlation analyses, and PPI connectivity analyses, as well as multivariate classifier and regression analyses applied to the connectivity data. There are several issues which the authors may want to address.

1) Small sample size. The sample size for the study was n=21 per group. This sample size is likely not large enough to cover substantial heterogeneity that exists across the population of individuals with autism diagnosis. Thus, questions about generalizability arise. Can future studies replicate these findings? Small sample size also means lower statistical power for identifying more subtle effects, and this is especially important for the context of whole-brain between-group or brain-behavior correlation analysis which the authors have solely relied on for the activation and clinical correlation analyses (see Cremers, Wager, & Yarkoni, 2017, PLoS One). For multivariate classifiers and regressions, these too produce inflated and over-optimistic levels of predictions with smaller sample size (e.g., Woo et al., 2017, Nature Neuroscience).

2) Given the small sample sizes, but relatively strong and justified anatomical hypotheses, why not run ROI analyses instead of whole-brain analyses? Statistical power would likely be increased for ROI analyses, and one can cite more unbiased estimates of effect size. What whole-brain analyses can show us is where the likely effects might be, and this is helpful when we don't have strong anatomical hypotheses. But here the authors do have strong anatomical hypotheses, yet they choose an analysis approach that is not congruent with that, penalizes them in terms of statistical power, and doesn't allow for estimation of unbiased effect sizes. What is missing from the paper is an estimate of how big the effects are likely to be, as this is what we should ideally care about (Reddan, Lindquist, & Wager, 2017, JAMA Psychiatry). Future studies that may try to replicate this study will need to know what the effect sizes are likely to be. Meta-analyses ideally need unbiased estimates of effect size. However, all we have to go on here are the authors' figures showing whole-brain maps, which likely just tell us where some of the largest effects may be.

3) Reported effect sizes in Figure 3 (e.g., Pearson's r) are likely inflated given small sample sizes and also due to the fact that it appears that the reported r values in the figure are likely taken from the peak voxel.

4) Scatterplots in Figure 3 show inverted y-axes so that higher numbers are at the bottom and lower numbers are at the top. The reported correlations are negative, and yet the scatterplot shows what looks like a positive correlation. All this confusion is due to the inverted y-axes. The authors should correct the plots to avoid this confusion.

5) The authors heavily rely on reverse inference to relate their findings back to the social motivation theory. However, if their manipulations were powerful enough to create a distinction between a stimulus that was heavily socially rewarding (e.g., mother's voice) versus another that is not (e.g., unfamiliar voice), then shouldn't there be some kind of difference in the main activation analysis in reward-related areas (i.e. Figure 2B)? In other words the main contrast of interest that might have been most relevant to the social motivation theory produces no group differences in activation in areas like the ventral striatum. Because this contrast doesn't really pan out the way the theory predicts, doesn't this cast doubt on the social motivation theory, or couldn't it be that some of the contrasts in this study can be better explained by some other kind of reverse inference than the social motivation theory?

6) From what I could tell, no manipulation check was done to measure some aspect of how rewarding the stimuli were to participants. This seems critical if the authors want to make strong reverse inferences back to the social motivation theory.

7) The authors should include a limitations section to their paper.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your revised article "Impaired voice processing in reward and salience circuits predicts social communication in children with autism" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The following individual involved in review of your submission has agreed to reveal her identity: Coralie Chevallier (Reviewer #1).

The reviewers find that, on the whole, you have responded to the technical concerns raised on the prior round.

Reviewer #3 remains concerned that the sample size is not sufficiently large to capture the heterogeneity of ASD. The reviewer, BRE and senior editor have corresponded and agree that this is an important limitation regarding the broader generalizability of the study. In other words, even though you have adequately controlled for in-sample error, generalization beyond the study sample is limited by the size of the cohort.

We feel that a stronger statement that acknowledges this should be added to the manuscript. Given the advanced stage of the manuscript, we recognize that acquiring further subjects is highly unlikely to be feasible although we will note, in the published reviews, that this remained of concern at the conclusion of the reviewing process.

The paper will be promptly re-assessed by the Reviewing Editor upon resubmission.

Reviewer #1:

I would like to thank the authors for their extremely thorough response and for all the additional analyses they performed in response to my comments. The authors have suitably addressed all of my comments and incorporated a limitations section to mention the modest sample size. Thank you again for this detailed and careful response.

Reviewer #3:

In this reviewer's opinion, the authors' responses to previous comments largely dismissed many of the key issues in the comments, and the arguments brought to counter those comments were not at all convincing. The authors need to more properly consider the issues and at the very least comment on them as potential drawbacks, limitations, caveats, etc.

For example, none of the authors' responses are adequate for addressing the main problem of small sample size. Heterogeneity across the autism population is vast and one cannot likely cover much of it with an n=21. Discussion about symptom domains as 'constraining' heterogeneity is not relevant to this aspect of the issue. If another study comes up with a similar paradigm, but also a small n, how likely is it that the study will replicate the findings of the current study? It depends. If the new study samples a different stratum of the autism population, then it may not replicate. Studies with larger sample sizes are more generalizable because they can cover a better range of the population and are not as susceptible to sampling biases that are more pronounced with small samples.

In another example, the authors claim that within subject, the multiple runs somehow can counteract the issue of small sample size. More data within subject can have an impact on statistical power. The estimates for each subject are more precise with more data within-subject. However, generalizing between-subjects or between studies is still an issue, for the reasons stated above.

With regard to the comment about ROI analysis, an analysis with 16 regions is surely more highly powered than a whole-brain analysis with 20,000 voxels to correct for. For anatomically constrained hypotheses (which the authors have), the comment still stands that it is strange to take a whole-brain approach as if the authors had a less strong idea about the anatomical hypotheses. Whole-brain analyses are lower in statistical power and have the ability to over-inflate effect sizes, and the authors are merely capitalizing on those things with small sample size.

Minor Comments:

The scatterplots must be plotted in the correct way, and not reversed, as this will mainly confuse readers.

https://doi.org/10.7554/eLife.39906.028

Author response

This is a very interesting and nicely presented study of cortical responses to mothers' versus strangers' voices in people with Autistic Spectrum Disorders. All reviewers felt the study was well designed in its stimuli and were, *in principle*, impressed by the range of uni- and multivariate analyses undertaken.

However, all three reviewers raise substantial concerns regarding the small sample size: these reflect the ability of the study to provide adequate cover of the heterogeneity of ASD,

We thank the editors and reviewers for this comment. The ability of our sample to adequately cover heterogeneity in ASD is an important consideration for this study and all studies investigating the perceptual, cognitive, and biological bases of ASD. From one perspective, adequately addressing heterogeneity in ASD is a formidable challenge: studying and working with individuals with ASD has led more than one researcher and clinician to state: “If you’ve met one person with autism, you’ve met one person with autism”. Given this widely-held view, sufficiently addressing heterogeneity in ASD represents a significant research challenge, particularly for pediatric brain imaging research in which gathering high-quality data in a modest sample such as that reported here can take many years.

Here we have addressed this challenge with a comprehensive approach which both: (a) constrains heterogeneity in ASD to a critical diagnostic symptom domain, social communication abilities, and (b) explores heterogeneity within this symptom domain by identifying neural features that covary as a function of social communication abilities. The rationale for constraining heterogeneity in ASD to a symptom domain is that all individuals with ASD necessarily have pronounced impairments in diagnostic domains, and these domains have considerably less heterogeneity in individuals with ASD relative to other behavioral and cognitive domains. For example, in the social communication domain, which we investigate in the current study, all individuals with ASD show pronounced deficits that are characterized by impaired communication and/or reciprocal social interactions (1). In contrast, consider the case of language function and IQ in children with ASD. Language function and IQ in these individuals vary widely, from very high levels, which are commensurate with neurotypical individuals, through very low levels of language and cognitive function (1-3). We argue that constraining analyses to an ASD diagnostic domain is an important approach for keeping a focus on critical areas that are central to the core deficits in the disorder, thereby providing important insight to clinicians, researchers, parents, and educators regarding the neurobiological bases of these core deficits.

The rationale for exploring heterogeneity within the social communication symptom domain is that high-functioning children with ASD, such as those included in our sample, often show a range of social communication abilities, from mild/moderate deficits to more severe deficits. For example, the Autism Diagnostic Observation Schedule (ADOS-2) is the gold-standard diagnostic instrument for ASD (4), and social communication subscores on the ADOS range from 0 to 22, with a 0 indicating no social communication deficit, a 7 indicating a milder deficit, and a 22 indicating the most severe deficit. The high-functioning children included in our sample had a range of social communication abilities, with ADOS scores between 7 and 16. Children with ASD who score higher than 16 tend to have multiple comorbid symptoms and low cognitive function, thus confounding neuroscientific interpretation of findings. Importantly, the link between social communication symptom severity and social brain processing is a critical question for understanding the neurobiological basis of autism, and is a question that has not been explored in previous studies of human voice processing in children with ASD. The reason this is an important question is that the severity of social communication deficits can play a crucial role in autism models. For example, the social motivation theory of ASD posits that impairments in representing the reward value of human vocal sounds impede individuals with ASD from engaging with these stimuli and contribute to social interaction difficulties (5, 6). Given the causal link proposed in this model, a key prediction of the social motivation theory is that children with more severe deficits associated with social reward will have greater social communication deficits compared to those children with less severe social reward deficits. Therefore, by examining heterogeneity within the social communication symptom domain, we are able to test an important prediction of an influential ASD model and understand the neural features that may contribute to this heterogeneity.

In our study, we have employed this approach and constrained heterogeneity in children with ASD by focusing on social communication abilities. We then examine heterogeneity within the social communication symptom domain and show that social communication abilities explain significant variance in activity and connectivity of social reward and salience brain regions during human voice processing.

The possibility that some effects may be inflated (particularly the regressions),

We thank the reviewer for this comment. As we stated in the initial submission (see Brain Activity Levels and Prediction of Social Function subsection of the Materials and methods), the goal for performing the regression analyses was to examine the robustness and reliability of brain activity levels for predicting SC scores. The rationale for this analysis is that significant whole-brain covariate analysis can be the result of outliers in the data or other spurious effects (7). By extracting signal levels from ROIs, plotting the distributions, and performing additional regression analyses on these data, our goal was to show that the whole-brain covariate analysis was not driven by outlying data and was robust to confirmatory cross-validation regression analysis. We have made every effort to clarify this point in the revised manuscript.

The under-estimate of variance in the cross-validation tests

We acknowledge that proper estimation of variance is critical for robust and reliable cross-validation results. An important consideration in the context of the current analyses is that the permutation test used to assess statistical significance of support vector classification (SVC) and regression (SVR) results undergoes the same cross-validation procedure as that used in the original cross-validation. Therefore, the estimate of variance in the original cross-validation analysis is comparable to the null distribution from permutation. Importantly, this analytic approach accounts for any underestimate of variance in a modest sample size.
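As a rough illustration of the logic described above, the following Python sketch re-runs the same cross-validation loop on every label permutation, so that the null distribution is built under exactly the same procedure as the observed score. It is not the implementation used in the study: a one-feature nearest-centroid classifier stands in for the SVC, leave-one-out is assumed as the cross-validation scheme, and all function names and the synthetic data are hypothetical.

```python
import random

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x (single feature)."""
    centroids = {}
    for label in set(train_y):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = sum(values) / len(values)
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def loo_accuracy(xs, ys):
    """Leave-one-out cross-validated classification accuracy."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Compare the observed CV accuracy with a null distribution obtained by
    shuffling the labels and repeating the identical CV procedure."""
    rng = random.Random(seed)
    observed = loo_accuracy(xs, ys)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loo_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_perm + 1)

# Synthetic example: two groups of 21, mirroring only the group sizes.
data_rng = random.Random(1)
features = [data_rng.gauss(0.0, 1.0) for _ in range(21)] + \
           [data_rng.gauss(1.5, 1.0) for _ in range(21)]
labels = [0] * 21 + [1] * 21
accuracy, p_value = permutation_p_value(features, labels)
```

Because each permuted data set passes through the same leave-one-out loop, the spread of the null accuracies reflects the same small-sample variability as the observed accuracy. This concerns in-sample significance only and does not address generalization to new cohorts.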

The likely challenges of reproducing the principal findings (in the absence of an independent test data set).

Reproducibility in neuroimaging research represents a major challenge for the study of pediatric clinical populations, such as children with ASD, whose data is considerably more difficult to acquire compared to typically-developing children (8). One consideration here is that while the sample size used in the current study (N=21 for both TD and ASD groups) is modest in comparison to recent task-based brain imaging studies of neurotypical adult populations and resting-state or structural studies in individuals with ASD, these types of studies do not face the same data collection challenges as task-based studies in clinical pediatric populations (8). Importantly, resting-state fMRI and structural MRI studies are unable to address specific questions related to social information processing in ASD, such as biologically-salient voice processing, which is critical for understanding the brain bases of social dysfunction in affected children. Furthermore, our sample size is larger than (9-12) or comparable to (13) other task-fMRI studies in children with ASD published since 2017.

Moreover, in task-fMRI studies published since 2017 that included >22 children with ASD, a much smaller number of runs (e.g., 2 runs of each condition (14); 4 runs (15); 1 run (16)) was provided to each participant compared to our study (7-10 runs with all stimulus conditions). This is an important consideration given that sample size is not the only determinant for the replicability of fMRI task data. While previous studies have shown that increasing the sample size can improve the replicability of results (17), an important consideration is that the replicability of task fMRI data is not solely contingent on a large sample size but also depends on the amount of individual-level sampling. A recent report examining this question showed that modest sample sizes, comparable to those described in our submitted manuscript, yield highly replicable results with only four runs of task data with a similar number of trials per run as our study (18). Moreover, replicability from smaller sample sizes using four runs of event-related task fMRI data exceeds the replicability of much larger sample sizes (N > 120) using only one run of block task fMRI data (18).

In the current study, we have used rigorous standards for inclusion that are, to the best of our knowledge, a first for autism and neurodevelopmental neuroimaging research. Specifically, we required that each child participant had at least 7 functional imaging runs of our event-related fMRI task that met our strict head movement criteria. This multi-run approach yields many more trials per vocal source condition (~150) than previous studies, thereby significantly enhancing power to detect effects within each child, and has only been used previously in visual neuroscience research in adults. These rigorous within-subject criteria have been shown to be a critical factor for producing replicable task fMRI findings (18).

The regression effect sizes are very strong, and there is possibly some collinearity between the test to identify the ROIs (the group contrast) and the subsequent regression (ASD severity), given that ASD typically presents as an extreme on the healthy development/social spectrum.

We believe that there is confusion regarding how we performed the ADOS covariate analysis within children with ASD. It seems that this confusion stems from the fact that reviewer #2 misunderstood the analysis and thought that the ADOS covariate analysis used ROIs that were identified from the TD vs. ASD group analysis. This was not the case. Rather, the TD vs. ASD group analysis and the ADOS covariate analysis were separate whole-brain analyses. By performing separate whole-brain analyses, we have avoided concerns related to collinearity in the ADOS covariate analysis. We have made every effort to clarify this point in the revised Materials and methods and Results sections. Finally, the strong effect sizes we found are likely the result of our use of at least 7 functional imaging runs in each participant.

Reviewer #1:

Overall, I very much enjoyed reading the paper: clearly written, timely, and novel. The use of recordings that are specific to each participant is an important contribution. I am not a neuroscientist so I am not qualified to evaluate the quality of the neuroimaging work. I will therefore focus my remarks on the Introduction, clinical / behavioural aspects of the Materials and methods, and Discussion. (and please pardon my naivety when I do comment on the neuroimaging part of the work).

1) Behavioural tasks can provide mechanistic insights

I think the authors should tone down their claim about the limitations of behavioural studies to understand and tease apart different cognitive mechanisms. Many behavioural studies use ingenious paradigms to tease apart various mechanistic possibilities. Recently, the use of computational modeling methods applied to behavioural data has also been very fruitful in this respect (different model parameters reflect different mechanisms).

We thank the reviewer for raising this point. Our intention was to state that “behavioral studies are limited in their ability to provide insights into neural mechanisms underlying social information processing.” We have revised this section of the Introduction accordingly.

On the same topic, in the Introduction “…and obtaining valid behavioral measurements regarding individuals’ implicit judgments of subjective reward value of these stimuli can be problematic”: I was surprised by the strong statement. Antonia Hamilton has published many papers demonstrating the validity of behavioural tools to measure social reward responsiveness. I have also published several papers on that same topic (including one using signal detection theory, in PLOS One). In these papers, no abstract judgment is required. Rather, participants' behaviours are thought to reflect underlying social motivation or social reward responsiveness.

We thank the reviewer for highlighting this important point, and we agree with the reviewer that we should highlight this important line of research. An important consideration is that the behavioral work that has been done in this area by Drs. Chevalier (19, 20), Hamilton (21, 22), and others (23) has been in the visual domain, in which researchers use eye-tracking or pupillary responses as a tool for studying implicit reward. Studying implicit reward in the context of biologically-salient voice processing presents additional challenges given that, to our knowledge, validated behavioral methods for ascertaining whether children are directing their neural resources to a specific vocal source, thereby measuring auditory social reward responsiveness, have not yet been developed. Therefore, we argue that using brain imaging methods represents a critical approach in the context of implicitly rewarding voice processing. We have clarified these important points in the revised Introduction.

2) Why is the accuracy for mother's voice identification different in the ASD vs. TD group?

I would like to know a bit more the difference in accuracy detection between the groups: why did the ASD group perform below the TD group? The authors point out that 5 children performed below chance in identifying their own mother's voice. Is there evidence that this indicates a true deficit or is poor performance linked to other factors (poor hearing? deteriorated listening skills in the scanner?). Where are these five children on the regression? Do they drive the effect?

If some children did not recognize their mother's voice, it seems to me that they should be looked at differently: if the reward / memory / visual network is less activated in these children, is it because they do not find voices as rewarding / memorable or is it because they didn't recognize this familiar voice (but if they had, their brain would have reacted in the same way)?

Results and SI Results in the initial submission showed that differences in identification accuracy between TD and ASD groups for mother's voice identification were driven by 5 children with ASD who performed below chance on the “mother’s voice identification” task. The reviewer highlights an important point by suggesting that these 5 children might show a distinct behavioral or neural signature that may help explain why these children were unable to identify their mother’s voice in our behavioral task. These children did not present with hearing impairments as noted by parents or neuropsychological assessors, who had performed extensive neuropsychological testing on these children prior to the fMRI scan and mother’s voice identification task. Nevertheless, a plausible hypothesis is that the 5 children who were unable to identify their mother’s voice in the task would show greater social communication deficits, lower scores on measures of cognitive and language function, and/or reduced brain activation in response to unfamiliar or mother’s voice stimuli. To test this hypothesis, we have performed additional analyses to examine whether there are any identifying clinical, cognitive, or neural characteristics regarding these 5 children with low identification accuracy.

We first investigated differences in social communication, cognitive, and language abilities between children with ASD with low (N=5) vs. high (N=16) mother’s voice identification accuracy. Examining the distribution of ADOS Social Communication revealed that the 5 children with low mother’s voice identification accuracy had a wide range of scores from 7-16 (please note that ADOS Social Communication is scored in a range between 0-20, with a score of 0 indicating no social deficit, a score of 7 indicating a more mild social communication deficit, and a score of 16 a more severe deficit). Group results for this measure are plotted below (left-most violin plot) and group comparisons between low (“Low ID” in green) and high (“High ID” in blue) mother’s voice identification groups using Wilcoxon rank sum tests were not significant for ADOS Social Communication (P = 0.83). We next examined whether there were any differences in standardized IQ scores (Wechsler Abbreviated Scale of Intelligence (24)) for children with low and high mother’s voice identification accuracy, which are plotted below (three right-most violin plots). Group comparisons between low and high mother’s voice identification groups using Wilcoxon rank sum tests were not significant for any of the IQ measures (P > 0.25 for all 3 measures, not corrected for multiple comparisons).
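For readers unfamiliar with the test, the group comparison described above can be illustrated with a minimal sketch of a two-sided Wilcoxon rank sum test. This is not the analysis code used in the study: the function names are hypothetical, the exact permutation p-value is one of several ways to compute the test (software packages often use a normal approximation instead), and the scores in the demonstration are made-up values chosen only to match the group sizes (N=5 vs. N=16).

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_exact(group_a, group_b):
    """Rank sum of group_a and its two-sided exact permutation p-value."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(ranks) + 1) / 2  # rank sum expected under the null
    obs_dev = abs(observed - expected)
    extreme = total = 0
    # Enumerate every way of assigning n_a of the pooled ranks to group_a.
    for subset in combinations(ranks, n_a):
        total += 1
        if abs(sum(subset) - expected) >= obs_dev - 1e-9:
            extreme += 1
    return observed, extreme / total


if __name__ == "__main__":
    # Hypothetical scores for illustration only; NOT data from the study.
    low_id = [7, 9, 12, 14, 16]
    high_id = [7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 15, 15, 16]
    w, p = rank_sum_exact(low_id, high_id)
    print(f"rank sum = {w}, exact two-sided p = {p:.3f}")
```

With 5 and 16 observations there are only 20,349 possible group assignments, so full enumeration is inexpensive and avoids relying on a large-sample approximation.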

We next examined whether there were any differences for children with low vs. high mother’s voice identification accuracy in standardized measures of language abilities, including CTOPP Phonological Awareness (25) and CELF-4 Core Language, Receptive Language, and Expressive Language standard scores (26). Group comparisons using Wilcoxon rank sum tests were not significant for any of the language measures (P > 0.05 for all 4 measures, not corrected for multiple comparisons); however, there was a trend for reduced Core Language (P = 0.062) and Expressive Language abilities (P = 0.055) in the low (green) mother’s voice identification group.

We next examined neural response profiles for the 5 children with low vs. high mother’s voice identification accuracy by plotting ROI signal levels for the contrasts and regions identified in Figure 3A of the initial submission. First, results showed no group differences between children with low vs. high identification accuracy using Wilcoxon rank sum tests for any of the brain regions associated with the unfamiliar voices vs. non-social environmental sounds contrast (plotted below; P > 0.35 for all three regions, not corrected for multiple comparisons).

We next examined low vs. high identification accuracy using Wilcoxon rank sum tests for the brain regions associated with the mother’s voice vs. unfamiliar voices contrast (Figure 3B) and again found no group differences (plotted below; P > 0.45 for all seven regions, not corrected for multiple comparisons).

To examine whether the 5 children with ASD with low mother’s voice identification accuracy showed distinct relationships between social communication and neural activation profiles compared to children with high mother’s voice identification accuracy, we replotted the Figure 3 regressions and demarcated children with low identification accuracy (plotted below with “X’s”). While this only provides a qualitative description of these relationships, results show a range of neural activation profiles for unfamiliar (panel A) and mother’s voice (panel B) processing in children with low mother’s voice identification accuracy. Importantly, the relationship between social communication and neural activation profiles did not appear to distinguish the children with low mother’s voice identification accuracy.

Finally, additional analyses, which were included in the initial submission, showed that removing the 5 children with low identification accuracy from the GLM and gPPI connectivity analyses did not change any of the reported effects (please see Appendix subsection entitled fMRI activation and connectivity profiles in children with ASD are not related to mother’s voice identification accuracy).

Together, results from clinical (i.e., ADOS Social Communication), cognitive, language, and neural activity measures showed that there are no distinguishing features for the children with poor mother’s voice identification accuracy. One possible explanation is that all children performed the mother's voice identification task immediately after the fMRI scan, which took approximately 2 to 2.5 hours to complete. The reason children performed this task after the fMRI scan rather than before it (i.e., at a neuropsychological testing visit prior to the scan) is that we did not want to expose the children to the fMRI stimuli prior to performing the fMRI task. Therefore, it seems plausible that these children may have had difficulty focusing on the mother’s voice identification task due to fatigue from the fMRI scan. We have included these additional analyses and information in the revised Appendix.

Another naive question on the same question. The authors report that "fMRI activation profiles in children with ASD were not related to mother's voice identification accuracy". So I was left wondering what these activations reflect (if they are not sensitive to the fact that 5 out of 21 children did not identify their mother, does it mean that these activations are picking up on something that is much more domain general than anything that might have to do with "mother's voice" specifically?)

This is an important question, and one that we have spent much time considering. If neural activations are “picking up on something that is much more domain general than anything that might have to do with ‘mother's voice’,” as indicated by the reviewer, we would hypothesize that a signature of domain general differences would be evident in supplementary analyses of behavioral and neural measures described in section 1.3 (above), including brain activation for [unfamiliar voices vs. non-social environmental sounds], which indexes general voice processing (27). However, results from supplementary analyses of social communication, cognitive, language, and neural measures of voice processing failed to provide evidence to suggest that the 5 children who could not accurately identify their mother’s voice are different from the other participants other than on the mother’s voice identification task. One additional domain-general possibility, which was discussed in section 1.3 (above), is that fatigue may have played a role in the 5 children who could not accurately identify their mother’s voice in the post scanning session. However, we do not have additional data to further probe this possibility.

Is there a way to statistically correct all analyses for accuracy levels?

To examine whether mother’s voice identification accuracy affected results from ADOS covariate analyses in children with ASD (Figure 3), we performed additional regression analyses in which ADOS social communication values were the dependent variable and predictors included mother’s voice identification accuracy and betas from ROIs identified in the [unfamiliar female voice minus environmental sounds] contrast (i.e., Figure 3A) or [mother’s voice minus unfamiliar voices] contrast (i.e., Figure 3B). Separate regression models were computed for each ROI in each contrast. Results showed that all ROI signal levels reported in Figure 3 were significant predictors of ADOS social communication scores after regressing out mother’s voice identification accuracy (P ≤ 0.005 for all ROIs). We have added these results to the revised Appendix subsection entitled fMRI activation and connectivity profiles in children with ASD are not related to mother’s voice identification accuracy.
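The idea of testing an ROI predictor "after regressing out" identification accuracy can be sketched as follows. This is an illustrative sketch only, not the study's analysis code: the function names are hypothetical, and it relies on the Frisch-Waugh-Lovell result that the coefficient on the ROI signal in a two-predictor regression equals the slope obtained after removing accuracy from both the ROI signal and the ADOS scores. It returns only the coefficient; the significance testing reported above is not reproduced here.

```python
def mean(xs):
    return sum(xs) / len(xs)


def simple_fit(x, y):
    """Least-squares slope and intercept of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = simple_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def partial_slope(roi_beta, ados, accuracy):
    """Slope of ADOS on ROI signal after regressing accuracy out of both.

    Equals the ROI coefficient of the model ados ~ accuracy + roi_beta.
    """
    roi_resid = residuals(accuracy, roi_beta)
    ados_resid = residuals(accuracy, ados)
    slope, _ = simple_fit(roi_resid, ados_resid)
    return slope


if __name__ == "__main__":
    # Hypothetical values for illustration only; NOT data from the study.
    accuracy = [0.2, 0.5, 0.9, 1.0, 0.7, 0.4]
    roi_beta = [0.1, -0.3, 0.8, 0.2, -0.5, 0.6]
    ados = [12.0, 14.0, 8.0, 10.0, 15.0, 9.0]
    print(partial_slope(roi_beta, ados, accuracy))
```

Following the text, one such model would be fit separately for each ROI in each contrast.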

The authors should report whether identification accuracy is related to ADOS SC scores.

We thank the reviewer for this suggestion and correlation analysis indicates that mother’s voice identification accuracy is not related to ADOS social communication scores (R = 0.13, P = 0.59). We have included this result in the revised Appendix.

3) ADOS score use

ADOS scores are not meant to be used as a continuous severity scores unless they are transformed (Gotham, K., Pickles, A., & Lord, C. (2009). Standardizing ADOS scores for a measure of severity in autism spectrum disorders. Journal of autism and developmental disorders, 39(5), 693-705.). I do not know how this logic applies to using the social affect score only but it should be checked / discussed.

We thank the reviewer for this astute point. Standardization of ADOS scores (28) and subscores (29) is performed to enable comparisons across ADOS modules. Given that all of our participants were administered module 3 of the ADOS, which is stated in the Participants subsection of the Materials and methods, standardization of ADOS scores here is unnecessary.

4) Are there a priori criteria for exclusion based on motion?

Materials and methods subsections “Participants” and “Movement criteria for inclusion in fMRI analysis”: Is there an a priori threshold to exclude participants based on motion? Or published guidelines? Can the authors specify the exclusion decision criteria? Is there a group difference in average motion?

Our study incorporated stringent a priori criteria for exclusion based on head motion during all fMRI runs, which is consistent with our previous work (30) and is described in the Materials and methods subsection entitled Movement criteria for inclusion in fMRI analysis. This section states:

“For inclusion in the fMRI analysis, we required that each functional run had a maximum scan-to-scan movement of < 6 mm and no more than 15% of volumes were corrected in the de-spiking procedure. Moreover, we required that all individual subject data included in the analysis consisted of at least seven functional runs that met our criteria for scan-to-scan movement and percentage of volumes corrected; subjects who had fewer than seven functional runs that met our movement criteria were not included in the data analysis. All 42 participants included in the analysis had at least 7 functional runs that met our movement criteria, and the total number of runs included for TD and ASD groups were similar (TD = 192 runs; ASD = 188 runs).”
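The inclusion rule quoted above can be expressed as a short decision procedure. The following is a minimal sketch, assuming per-run summaries are already available; the `Run` record, its field names, and the function names are hypothetical and do not correspond to the study's actual pipeline. The three thresholds are taken directly from the quoted criteria.

```python
from dataclasses import dataclass

# Thresholds taken from the quoted Materials and methods text.
MAX_SCAN_TO_SCAN_MM = 6.0       # maximum scan-to-scan movement must be below this
MAX_FRACTION_CORRECTED = 0.15   # at most 15% of volumes corrected by de-spiking
MIN_USABLE_RUNS = 7             # at least seven runs must meet both criteria


@dataclass
class Run:
    """Hypothetical per-run motion summary."""
    max_scan_to_scan_mm: float
    fraction_volumes_corrected: float


def run_is_usable(run):
    """A run passes if it meets both the movement and de-spiking criteria."""
    return (run.max_scan_to_scan_mm < MAX_SCAN_TO_SCAN_MM
            and run.fraction_volumes_corrected <= MAX_FRACTION_CORRECTED)


def usable_runs(runs):
    return [r for r in runs if run_is_usable(r)]


def include_participant(runs):
    """A participant is included only with enough usable runs."""
    return len(usable_runs(runs)) >= MIN_USABLE_RUNS
```

Because the rule is applied per run and then per participant, a child with several high-motion runs can still be included as long as at least seven other runs pass.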

The second-to-last line in Table 1: Demographic and IQ Measures shows descriptive statistics for head motion in the TD and ASD groups as well as results from a group comparison using a 2-sample t-test, which was not significant (P = 0.36). We have further clarified this point in the Participants subsection of the revised Materials and methods.

5) Sample size concerns

In a behavioural study, a sample size of 21 is now considered too small. I realise that the change in standard is recent (and I have published many papers using small sample sizes myself). However, I do think that we need to accelerate change, especially for conditions that are notoriously heterogenous, such as ASD. And especially when one is interested in explaining interindividual differences (by using correlations).

We thank the reviewer for this comment and their concern for the ability of our study to address heterogeneity in ASD. Here we have addressed this challenge with a comprehensive approach which both: (a) constrains heterogeneity in ASD to a critical diagnostic symptom domain, social communication abilities, and (b) explores heterogeneity within this symptom domain (i.e., “explaining interindividual differences” as mentioned by the reviewer) by identifying neural features that covary as a function of social communication abilities. The rationale for constraining heterogeneity in ASD to a symptom domain is that all individuals with ASD necessarily have pronounced impairments in diagnostic domains, and these domains have considerably less heterogeneity in individuals with ASD relative to other behavioral and cognitive domains. For example, in the social communication domain, which we explore in the current study, all individuals with ASD show pronounced deficits that are categorized by impaired communication and/or reciprocal social interactions (1). In contrast, please consider language function and IQ in children with ASD. Language function and IQ in these individuals varies widely, from very high levels, which are commensurate with neurotypical individuals, through very low levels of language and cognitive function (1-3). We argue that constraining analyses to an ASD diagnostic domain is an important approach for keeping a focus on critical areas that are central to the core deficits in the disorder, thereby providing important insight to clinicians, researchers, parents, and educators regarding the neurobiological bases of these core deficits.

The rationale for exploring heterogeneity within the social communication symptom domain is that high-functioning children with ASD, such as those included in our sample, often show a range of social communication abilities, from mild/moderate deficits to more severe deficits. For example, the Autism Diagnostic Observation Schedule (ADOS-2) is the gold-standard diagnostic instrument for ASD (4), and social communication subscores on the ADOS range from 0 to 22, with a 0 indicating no social communication deficit, a 7 indicating a more mild deficit, and a 22 indicating the most severe deficit. The high-functioning children included in our sample had a range of social communication abilities as measured with the ADOS between 7 and 16. Importantly, understanding the link between social communication symptom severity and social brain processing is a critical question for understanding the neurobiological basis of autism, and is a question that has not been explored in previous studies of human voice processing in children with ASD. The reason this is an important question is that the severity of social communication deficits can play a crucial role in autism models. For example, the social motivation theory of ASD posits that impairments in representing the reward value of human vocal sounds impedes individuals with ASD from engaging with these stimuli and contributes to social interaction difficulties (5, 6). Given the causal link proposed in this model, a key prediction of the social motivation theory is that children with more severe deficits associated with social reward will have greater social communication deficits compared to those children with less severe social reward deficits. Therefore, by examining heterogeneity within the social communication symptom domain, we are able to test an important prediction of an influential ASD model and understand the neural features that may contribute to this heterogeneity.

In our study, we have employed this approach and constrained heterogeneity in children with ASD by focusing on social communication abilities. We then examine heterogeneity within the social communication symptom domain and show that social communication abilities explain significant variance in activity and connectivity of social reward and salience brain regions during human voice processing.

Finally, it is important to note that our inclusion criteria required each participant to have at least 7 functional imaging runs of our event-related fMRI task that met our strict head movement criteria. Requiring a large number of functional runs for each participant is an important approach for increasing statistical power, an issue further elaborated below. This approach has been used primarily in basic human vision experiments, which often use small samples of 3-7 participants with intense scanning in each participant (31-33), and, to our knowledge, is a first for neuroimaging studies in autism, and in children.

Reviewers who specialise in neuroimaging methods may want to comment on this specific point but the paper would definitely be stronger if the sample size exceeded (or at least matched) the current average in the cognitive neuroscience field (i.e., 30, see Poldrack et al. 2015, Figure 1).

Reproducibility in neuroimaging research represents a major challenge for the study of pediatric clinical populations, such as children with ASD, whose data is considerably more difficult to acquire compared to typically-developing children (8). One consideration here is that while the sample size used in the current study (N=21 for both TD and ASD groups) is modest in comparison to recent task-based brain imaging studies of neurotypical adult populations and resting-state or structural studies in individuals with ASD, these types of studies do not face the same data collection challenges as task-based studies in clinical pediatric populations (8). Importantly, resting-state fMRI and structural MRI studies are unable to address specific questions related to social information processing in ASD, such as biologically-salient voice processing, which is critical for understanding the brain bases of social dysfunction in affected children. Furthermore, our sample size is larger than (9-12) or comparable to (13) other task-fMRI studies in children with ASD published since 2017. Moreover, in task-fMRI studies published since 2017 that included >22 children with ASD, a much smaller number of runs (e.g., 2 runs of each condition (14); 4 runs (15); 1 run (16)) was provided to each participant compared to our study (7-10 runs with all stimulus conditions). This is an important consideration given that sample size is not the only determinant for the replicability of fMRI task data. While previous studies have shown that increasing the sample size can improve the replicability of results (17), an important consideration is that the replicability of task fMRI data is not solely contingent on a large sample size but also depends on the amount of individual-level sampling. A recent report examining this question showed that modest sample sizes, comparable to those described in our submitted manuscript, yield highly replicable results with only four runs of task data with a similar number of trials per run as our study (18). Moreover, replicability from smaller sample sizes using four runs of event-related task fMRI data exceeds the replicability of much larger sample sizes (N > 120) using only one run of block task fMRI data (18).

In the current study, we have used rigorous standards for inclusion that are, to the best of our knowledge, a first for autism and neurodevelopmental neuroimaging research. Specifically, we required that each child participant had at least 7 functional imaging runs of our event-related fMRI task (4 min each) that met our strict head movement criteria. This multi-run approach yields many more trials per condition (~150) than previous studies, thereby significantly enhancing power to detect effects within each child, and has only been used previously in visual neurosciences research in adults. To our knowledge, these rigorous within-subject criteria, which have been shown to be a critical factor for producing replicable task fMRI findings (18), are a first for autism and neurodevelopmental neuroimaging research.

Alternatively, the authors should report observed power. This is I think most important for the regression part of the paper. If power is too low, I would recommend moving this part of the paper to the SM and rewriting the main text accordingly.

To provide a better estimate of effect size, we used the originally computed t-scores from the whole-brain GLM analysis. Instead of examining the peak, we averaged the t-scores in each cluster to compute effect sizes. To estimate effect sizes for the TD vs. ASD group comparisons (i.e., regions identified in Figure 2), t-scores from the whole-brain TD vs. ASD group GLM analysis were averaged within each cluster identified in the GLM results. Effect sizes were then computed as Cohen’s d according to d = t/sqrt(N/2), where t is the mean t-score within a cluster and N is the sample size.

To provide estimates of effect sizes within regions identified in the ASD Social Communication covariate analysis (i.e., Figure 3), t-scores from the whole-brain covariate analysis were averaged within each cluster identified in the results. Effect sizes were then computed as Cohen’s f according to f = t/sqrt(N), where t is the mean t-score within a cluster and N is the ASD sample size. These effect sizes are now reported in the revised manuscript. We report an overall effect size of 0.68 averaged across all clusters identified in the TD vs. ASD group analysis (Figure 2) and an overall effect size of 0.76 averaged across all clusters identified in the ASD Social Communication Covariate analysis (Figure 3).
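The two conversions from cluster-averaged t-scores to effect sizes can be written out as a minimal sketch. The function names are hypothetical, and the sample size is left as a parameter because the text does not specify whether N in the Cohen’s d formula refers to the per-group or the total sample size. The t-scores in the demonstration are made-up values, not results from the study.

```python
import math


def cluster_mean(t_scores):
    """Average the voxel-wise t-scores within one cluster."""
    return sum(t_scores) / len(t_scores)


def cohens_d_from_t(mean_t, n):
    """Cohen's d for a group comparison: d = t / sqrt(N / 2)."""
    return mean_t / math.sqrt(n / 2)


def cohens_f_from_t(mean_t, n):
    """Cohen's f for a covariate analysis: f = t / sqrt(N)."""
    return mean_t / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical voxel-wise t-scores for a single cluster.
    cluster_t = [2.9, 3.4, 3.1, 3.8]
    t_bar = cluster_mean(cluster_t)
    print("d =", cohens_d_from_t(t_bar, n=21))
    print("f =", cohens_f_from_t(t_bar, n=21))
```

Averaging within a cluster rather than taking the peak t-score gives a more conservative estimate, since the peak is by construction the largest value in the cluster.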

Reviewer #2:

"Impaired voice processing in … children with autism" by Abrams et al.

This paper reports the uni- and multivariate analysis of fMRI acquired from children with ASD while listening to their mothers' or strangers' voice. The authors find quite striking correlations between social disability rating scales and both brain activity and functional connectivity.

The experimental stimuli, writing, analysis and cohort characterization are all of a very high standard. There are some limitations in the actual design that limit the nature of the inferences drawn. In a revised ms, these should be addressed through a subtle reframing and possibly additional discussion points.

1) The task stimuli are very well controlled but the task is essentially a passive listening task, with a low level vigilance task designed simply to keep the participants engaged with the stimuli. Therefore, in the absence of an explicit reward, the authors should be cautious of falling foul of reverse inference (Poldrack, 2006) when framing the findings in terms of "reward circuits". I guess listening to a voice, particularly a mother's voice, is implicitly rewarding, but caution should be taken in drawing too direct an inference in regards to dysfunctional reward processes (see Discussion section).

We thank the reviewer for this comment and acknowledge that, as with all naturalistic and biologically salient stimuli, we cannot know for certain whether aberrant activation and connectivity patterns measured in nucleus accumbens (NAc) and ventromedial prefrontal cortex (vmPFC) in children with ASD reflect reward processing in these regions. However, previous empirical evidence and theory, which we have highlighted in our manuscript, provide a strong theoretical foundation for considering vocal stimuli in the context of reward, even in the absence of an explicit reward task. First, there is a sizable behavioral literature that shows the implicitly rewarding nature of the human voice, including mother’s voice, in neurotypical children (34-38). Second, behavioral evidence shows that children with ASD often fail to be attracted to human vocal sounds, even when they are able to engage with other sounds in their environment, which suggests that they may not find these sounds rewarding (39, 40). Third, an influential theory posits that social reward processing, such as weak reward attribution to vocal communication, may substantially contribute to pronounced social deficits in children with ASD (5, 6). Given these converging results and theory, we believe that considering reward in the context of diminished activity and connectivity in response to vocal sounds in brain regions that are closely associated with reward processing (i.e., NAc and vmPFC) in children with ASD is an important hypothesis. Importantly, we have made every effort to temper statements to avoid issues with reverse inference in the revised manuscript. Specifically, we have used “results suggest…” or “results support/are consistent with the hypothesis that…” in all instances throughout the Abstract, Introduction, Results, and Discussion in which we discuss “reward” in the context of activity or connectivity associated with NAc or vmPFC.

Indeed, the whole positioning of functional neuroimaging studies as being more informative than behavioural tests should be mindful of the limitations of inferring disrupted functional circuits in disorders unless you are actually probing that function with an appropriate task.

We apologize for any confusion here. Our goal for these statements was to highlight neuroimaging research as an additional tool to test important hypotheses regarding voice and reward processing in children with ASD. We have clarified these statements in the revised manuscript.

2) Similarly, there is no manipulation of attention and therefore no means of telling whether the differences in activation during voice perception in ASD are simply a lack of due interest in, and attention to, voices preceding (or consequent to) changes in reward-based learning.

While parametrically manipulating attention to human voices in children with ASD is an important question, unfortunately this was beyond the scope of the current work. Importantly, we hypothesize that a lack of interest in human voice processing is a fundamental prediction of the social motivation theory (5): humans engage and pay attention to rewarding stimuli in their environment, and a consistent lack of attention to a category of stimuli strongly suggests that these stimuli may not be rewarding to an individual (5). It is hoped that future studies will test this prediction and examine the relative contributions of attention and reward for human voices in children with ASD.

3) Again the authors draw a very direct line between mother's vs stranger's voice and social reasoning/communication (e.g. Abstract). But maternal voice is more than a simple cue but incorporates many other processes, including basic parental attachment and dyadic reciprocation.

We thank the reviewer for requesting clarification here. The relationship we had intended to highlight here was that of the link between social communication abilities in children with ASD and brain activation in social and reward brain areas during voice processing. We have clarified this important point in the revised manuscript.

4) What are the core defining disturbances underlying the diagnosis of ASD in this study? The information in paragraph two of subsection “Participants” simply refers to an algorithm.

Autism spectrum disorder is characterized by pronounced social communication deficits, particularly in the areas of social-emotional reciprocity and verbal and non-verbal communication, and repetitive and restricted behaviors (RRB) and interests (1). As stated in the Participants subsection of the Materials and methods, the children in the ASD sample are considered “high-functioning” and have fluent language skills, normal IQ, and above-average reading skills. Nevertheless, these children are generally characterized as having moderate-to-severe communication impairments, especially in the area of reciprocal conversation (1). We have included additional information regarding the defining characteristics of ASD in the revised Participants section.

While this might be sufficient for reproducibility, it is inadequate here for two reasons: First, as eLife is a general (not a clinical) journal, more depth and context is required for the broader readership. Second, if the diagnosis relies heavily on social deficits (which I suspect it does), is it then surprising that the strength of between group differences covaries so markedly within the ASD with social deficit scores. If so, is there an implicit circularity between the identification of these regions (Figure 2) and the very strong correlations against ADOS scores? If not, what other effects may be driving these? Such strong correlations are bound to draw attention and I think the authors should therefore pay careful attention to this issue.

We believe that there is confusion regarding how we performed the ADOS covariate analysis within children with ASD. It seems that this confusion stems from the fact that the reviewer misunderstood the analyses and thought that the ADOS covariate analysis used ROIs that were identified from the TD vs. ASD group analysis. This was not the case. Rather, the TD vs. ASD group analysis and the ADOS covariate analysis were separate whole-brain analyses. The use of separate whole-brain analyses in this context avoids circularity mentioned by the reviewer. We have made every effort to clarify this point in the revised Materials and methods and Results sections.

5) This is a modest sample size for the use of machine learning, particularly when using a high-dimensional (functional connectivity) feature space. N-fold cross-validation does not always provide strong control here (Varoquaux, 2017): Do the authors undertake a feature reduction step, such as a LASSO?

We thank the reviewer for highlighting this important point. To examine the robustness of the SVC and SVR results reported in the Results, a confirmatory analysis was performed using GLMnet (http://www-stat.stanford.edu/~tibs/glmnet-matlab), a logistic regression classifier that includes regularization and a feature reduction step. Results from GLMnet were similar to those reported for SVC and SVR, and were reported in the Results section of the initial submission of this manuscript.

Is there some logical circularity between identifying features with a group contrast, then using these same features in a between group classifier?

We thank the reviewer for inquiring about this point. As we stated in the Materials and methods section of the initial submission:

“The rationale for the use of an a priori network is it is an established method of network identification that preempts task and sample-related biases in region-of-interest (ROI) selection. This approach therefore allows for a more generalizable set of results compared to a network defined based on nodes identified using the current sample of children and task conditions.”

We believe that this is not a circular approach for two reasons: (1) the group contrast used to identify ROIs for the functional connectivity analysis was from a previous study (41) in an independent sample of children with ASD relative to the participants in the current study, and (2) the brain imaging approach and analysis employed in that previous study was intrinsic functional connectivity using resting-state data and seed-based analyses, which provides complementary information regarding brain network organization relative to the task-based data and gPPI analysis used in the current study. The importance of this approach was highlighted in the subsection entitled A voice-related brain network approach for understanding social information processing in autism in the Discussion section of the initial submission. We would like to bring special attention to the final sentence in this paragraph (quoted below) which highlights the fact that the networks approach used in the current study bridges a critical gap between findings from intrinsic connectivity analyses (41) and task-based social information processing that is fundamental to social communication deficits in children with ASD.

“A central assumption of [the intrinsic functional connectivity] approach is that aberrant task-evoked circuit function is associated with clinical symptoms and behavior, however empirical studies examining these associations have been lacking from the ASD literature. Our study addresses this gap by probing task-evoked function within a network defined a priori from a previous study of intrinsic connectivity of voice-selective networks in an independent group of children with ASD. We show that voice-related network function during the processing of a clinically and biologically meaningful social stimulus predicts both ASD group membership as well as social communication abilities in these children. Findings bridge a critical gap between the integrity of the intrinsic architecture of the voice-processing network in children with ASD and network signatures of aberrant social information processing in these individuals.”

Also, I would avoid using the term "prediction" for contemporaneous variables, even when cross-validation is undertaken.

Consistent with many papers in the fMRI literature (42-45), the use of prediction to describe cross-validated results is a widely used convention and therefore we would prefer to use this nomenclature in our study.

6) I don't see any model-based analysis of neuronal interactions and hence don't think the authors are examining effective connectivity. gPPI is a purely linear model of statistical dependences and their moderation and hence falls into the class of functional connectivity.

We appreciate this suggestion and have removed all instances of “effective connectivity” in the revised manuscript.

Personally, I would prefer to have seen the authors use a more dynamic, model-driven method of effective connectivity, using something like DCM to provide a deeper mechanistic insight into the changes in activation and information flow.

While we share the reviewer’s interest in providing a deeper mechanistic insight into voice processing in children with ASD, we had significant concern regarding the implementation of DCM in the context of the relatively long TR (3.576 seconds) used in data collection. The reason for this long TR is that it allowed the auditory stimuli to be presented in silent periods between volume acquisitions. Moreover, serious concerns have been raised in the literature regarding DCM (46) especially when estimating causal influences with a large set of nodes as we did in the present study. The gPPI models used in our study are not faced with estimability issues.

I also am not sure I buy into the choice of ROI's that do not show the group effects.

The reason that ROIs were not selected based on a group effect is that this approach could be considered circular, and our goal was to provide a more generalizable set of results compared to a network defined based on nodes identified using the current sample of children and task conditions. The use of an a priori network is an established method of network identification that preempts task and sample-related biases in region-of-interest (ROI) selection (47-50).

However, given the classification success, the authors' approach seems entirely reasonable. However, please do specify the nature of the gPPI model in the text and supplementary material (what are the nodes, inputs and modulators; is it possible to represent this graphically?).

We examined functional connectivity between ROIs using the generalized psychophysiological interaction (gPPI) model (51), with the goal of identifying connectivity between ROIs in response to each task condition as well as differences between task conditions (mother’s voice, other voice, environmental sounds). We used the SPM gPPI toolbox for this analysis. gPPI is more sensitive than standard PPI to task context-dependent differences in connectivity (51). Unlike dynamic causal modeling (DCM), gPPI does not use a temporal precedence model (x(t+1) ~ x(t)) and therefore makes no claims of causality. The gPPI model is summarized in Equation 1 below:

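$$\mathrm{ROI}_{\mathrm{target}} \sim \mathrm{conv}\big(\mathrm{deconv}(\mathrm{ROI}_{\mathrm{seed}}) \times \mathrm{task}_{\mathrm{waveform}}\big) + \mathrm{ROI}_{\mathrm{seed}} + \mathrm{constant} \tag{1}$$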

Briefly, in each participant, the regional timeseries from a seed ROI is deconvolved to uncover quasi-neuronal activity and then multiplied with the task design waveform for each task condition to form condition-specific gPPI interaction terms. These interaction terms are then convolved with the hemodynamic response function (HRF) to form gPPI regressors for each task condition. The final step is a standard general linear model predicting target ROI response after regressing out any direct effects of the activity in the seed ROI. In the equation above, $\mathrm{ROI}_{\mathrm{target}}$ and $\mathrm{ROI}_{\mathrm{seed}}$ are the time series in the two brain regions, and $\mathrm{task}_{\mathrm{waveform}}$ contains three columns, one for each task condition. We have included this description in the revised Functional Connectivity Analysis subsection of the Materials and methods.
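The steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the SPM gPPI toolbox: the single-gamma HRF shape, the ridge-regularised deconvolution, the block design, and all names (`hrf_kernel`, `deconvolve`, `fit_gppi`) are hypothetical choices made for the example, and the data are synthetic. The design matrix follows Equation 1 as written, so it contains only the interaction terms, the seed time series, and a constant.

```python
import numpy as np

TR = 3.576  # seconds; the repetition time stated in this letter


def hrf_kernel(tr, duration=30.0):
    """Crude single-gamma HRF sampled every TR (assumed shape, peak near 5 s)."""
    t = np.arange(0.0, duration, tr)
    h = t ** 5 * np.exp(-t)
    return h / h.sum()


def convolve(signal, kernel):
    """Causal convolution truncated to the length of the input."""
    return np.convolve(signal, kernel)[: len(signal)]


def deconvolve(bold, kernel, ridge=0.1):
    """Ridge-regularised least-squares estimate of quasi-neuronal activity.

    This is a simple stand-in for the toolbox's deconvolution step.
    """
    n = len(bold)
    # Convolution matrix H, so that H @ x equals convolve(x, kernel).
    H = np.zeros((n, n))
    for lag, weight in enumerate(kernel):
        H += weight * np.eye(n, k=-lag)
    return np.linalg.solve(H.T @ H + ridge * np.eye(n), H.T @ bold)


def fit_gppi(target_bold, seed_bold, task_waveform, kernel):
    """Fit Equation 1: one interaction beta per condition, then seed, then constant."""
    neural = deconvolve(seed_bold, kernel)
    ppi = [convolve(neural * task_waveform[:, c], kernel)
           for c in range(task_waveform.shape[1])]
    design = np.column_stack(ppi + [seed_bold, np.ones(len(seed_bold))])
    betas, *_ = np.linalg.lstsq(design, target_bold, rcond=None)
    return betas


# Synthetic example: three conditions in 10-scan blocks separated by rest.
rng = np.random.default_rng(0)
n_scans = 240
kernel = hrf_kernel(TR)
task = np.zeros((n_scans, 3))
for start in range(0, n_scans, 20):
    task[start:start + 10, (start // 20) % 3] = 1.0

# Smooth random "neuronal" seed activity and its BOLD response.
seed_neural = convolve(rng.normal(size=n_scans), np.ones(5) / 5)
seed_bold = convolve(seed_neural, kernel)

# The target couples to the seed only during condition 0, plus a direct seed effect.
target_bold = (2.0 * convolve(seed_neural * task[:, 0], kernel)
               + 0.5 * seed_bold
               + 0.05 * rng.normal(size=n_scans))

betas = fit_gppi(target_bold, seed_bold, task, kernel)
```

In this simulated case the interaction beta for the first condition should dominate those of the other two conditions, which is the kind of condition-specific connectivity difference the model is meant to detect.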

7) Do the manipulations to the vocal signals (subsection “Stimulus post-processing”) possibly warp the sound of the speech? If so, could this influence the ASD responses (ASD being perhaps more tuned to low level features of stimulus inputs).

Manipulations to the vocal signals during stimulus preparation were minimal and were performed on all mother’s and unfamiliar voice and environmental sound stimuli included in the study. We hypothesize that significantly warped vocal samples would have resulted in reduced mother’s voice identification accuracy in at least one of the TD children; however, results showed that all TD children performed above chance on this task, with 20 of 21 TD children achieving >90% identification accuracy (mean mother’s voice identification accuracy in TD children was 98%; see Table 1). Moreover, while there are mixed reports of increased auditory discrimination in individuals with ASD (52-55), with studies showing a relatively small subgroup (~20%) of individuals with ASD with enhanced auditory perceptual abilities (i.e., “more tuned to low level features of stimulus inputs” as suggested by the reviewer), it is not immediately clear how these enhanced auditory perceptual abilities would diminish one’s ability to discriminate mother’s voice. An arguably more likely possibility is that established deficits in children with ASD associated with phonological abilities (2, 56-58), which involve the processing of the sound structure of language, would have been linked to reduced mother’s voice discrimination accuracy; however, results reported above (see reviewer 1) and now included in the Appendix (Appendix 1—figure 1) failed to show a difference in phonological abilities in children with low and high mother’s voice identification accuracy.

Minor Comments:

1) Introduction, second sentence: "affected" -> "ASD"

Done

2) Figure 1B caption: Please add stats

Done

3) Discussion first paragraph: "These findings.…"; Third paragraph: "Our findings.…"

Done

4) Discussion paragraph three: ?"predicts"?

As we stated previously, the use of the word “prediction” to describe cross-validated results is a widely used convention (42-45), and therefore we would prefer to continue to use this nomenclature in our study.

5) Subsection “Participants”: please add more details regarding the diagnosis; e.g. "In essence, these ASD.…" (see point 4 above)

Done.

6) Table 1: What are the "typical" values of ADOS-social and ADI-A social"? Are these very impaired children (also see point 4 above).

We have added this information to the Participants subsection of the revised Materials and methods.

7) Materials and methods: Were there any between group differences in head motion parameters?

No, there were no between-group differences in head motion parameters. Mean and standard deviation for maximum head motion for TD and ASD groups are included in Table 1, and we have further clarified this point in the Participants subsection of the revised Materials and methods.

8) Subsection “Effective Connectivity Analysis”: I'm not sure what "preempts.… biases" means here – the authors are not modelling effects in the present data, which I think is a shame (see point 6 above)

Network identification in brain imaging studies presents several challenges. One important consideration is that selecting ROIs based on a GLM contrast from a sample of participants may be considered circular when those ROIs will then be used in a subsequent functional connectivity analysis on that same contrast and sample of participants. Therefore, by using ROIs from a previous study of the intrinsic architecture of voice-selective cortex in a separate sample of children with ASD (41), we have preempted biases associated with both task contrast and participant sample that would have emerged had we used task-based GLM results from the current sample to generate ROIs. Finally, we have provided an explanation for why we have not performed additional causal analyses in section 2.11 above.

Reviewer #3:

In this study, Abrams and colleagues examined voice processing in children (average age 10 years old) with and without autism. They were specifically interested in reward and salience circuitry and relate the study back to predictions made by the social motivation theory of autism. The authors applied group-level analyses, brain-behavior correlation analyses, and PPI connectivity analyses, and applied multivariate classifier and regression analyses to the connectivity data. There are several issues which the authors may want to address.

1) Small sample size. The sample size for the study was n=21 per group. This sample size is likely not large enough to cover substantial heterogeneity that exists across the population of individuals with autism diagnosis.

We thank the reviewer for this comment and their concern for the ability of our study to address heterogeneity in ASD. As discussed in response to similar reviewer comments, here we have addressed this challenge with a comprehensive approach which both: (a) constrains heterogeneity in ASD to a critical diagnostic symptom domain, social communication abilities, and (b) explores heterogeneity within this symptom domain (i.e., “explaining interindividual differences” as mentioned by the reviewer) by identifying neural features that covary as a function of social communication abilities. The rationale for constraining heterogeneity in ASD to a symptom domain is that all individuals with ASD necessarily have pronounced impairments in diagnostic domains, and these domains have considerably less heterogeneity in individuals with ASD relative to other behavioral and cognitive domains. For example, in the social communication domain, which we explore in the current study, all individuals with ASD show pronounced deficits that are characterized by impaired communication and/or reciprocal social interactions (1). In contrast, language function and IQ in children with ASD vary widely, from very high levels, which are commensurate with neurotypical individuals, through very low levels of language and cognitive function (1-3). We argue that constraining analyses to an ASD diagnostic domain is an important approach for keeping a focus on critical areas that are central to the core deficits in the disorder, thereby providing important insight to clinicians, researchers, parents, and educators regarding the neurobiological bases of these core deficits.

The rationale for exploring heterogeneity within the social communication symptom domain is that high-functioning children with ASD, such as those included in our sample, often show a range of social communication abilities, from mild/moderate deficits to more severe deficits. For example, the Autism Diagnostic Observation Schedule (ADOS-2) is the gold-standard diagnostic instrument for ASD (4), and social communication subscores on the ADOS range from 0 to 22, with a 0 indicating no social communication deficit, a 7 indicating a milder deficit, and a 22 indicating the most severe deficit. The high-functioning children included in our sample had a range of social communication abilities as measured with the ADOS between 7 and 16. Importantly, understanding the link between social communication symptom severity and social brain processing is a critical question for understanding the neurobiological basis of autism, and is a question that has not been explored in previous studies of human voice processing in children with ASD. The reason this is an important question is that the severity of social communication deficits can play a crucial role in autism models. For example, the social motivation theory of ASD posits that impairments in representing the reward value of human vocal sounds impede individuals with ASD from engaging with these stimuli and contribute to social interaction difficulties (5, 6). Given the causal link proposed in this model, a key prediction of the social motivation theory is that children with more severe deficits associated with social reward will have greater social communication deficits compared to those children with less severe social reward deficits. Therefore, by examining heterogeneity within the social communication symptom domain, we are able to test an important prediction of an influential ASD model and understand the neural features that may contribute to this heterogeneity.

In our study, we have employed this approach and constrained heterogeneity in children with ASD by focusing on social communication abilities. We then examine heterogeneity within the social communication symptom domain and show that social communication abilities explain significant variance in activity and connectivity of social reward and salience brain regions during human voice processing.

Thus, questions about generalizability arise. Can future studies replicate these findings?

Reproducibility in neuroimaging research represents a major challenge for the study of pediatric clinical populations, such as children with ASD, whose data are considerably more difficult to acquire compared to typically-developing children (8). While previous studies have shown that increasing the sample size can improve the replicability of results (17), the replicability of task fMRI data is not solely contingent on a large sample size but also depends on the amount of individual-level sampling. A recent report examining this question showed that modest sample sizes, comparable to those described in our submitted manuscript, yield highly replicable results with only four runs of task data (18). Moreover, replicability from smaller sample sizes using four runs of event-related task fMRI data exceeds the replicability of much larger sample sizes (N > 120) using only one run of block task fMRI data (18).

In the current study, we have used rigorous standards for inclusion that are, to the best of our knowledge, a first for autism and neurodevelopmental neuroimaging research. Specifically, we required that each child participant had at least 7 functional imaging runs of our event-related fMRI task (4 min each) that met our strict head movement criteria. This multi-run approach yields many more trials per condition (~150) than previous studies, thereby significantly enhancing power to detect effects within each child, and has only been used previously in visual neurosciences research in adults. Such rigorous within-subject criteria have been shown to be a critical factor for producing replicable task fMRI findings (18).

Small sample size also means lower statistical power for identifying more subtle effects, and this is especially important for the context of whole-brain between-group or brain-behavior correlation analysis which the authors have solely relied on for the activation and clinical correlation analyses (see Cremers, Wager, & Yarkoni, 2017, PLoS One).

We thank the reviewer for drawing our attention to these important concepts. From one perspective, identifying more subtle effects comes with its own challenges: weak effects are often viewed with caution irrespective of statistical power. As stated by Reddan, Lindquist, & Wager, (2017, JAMA Psychiatry), “small effects can reach statistical significance given a large enough sample, even if they are unlikely to be of practical importance or replicable across diverse samples.”

While we agree that a larger sample size would have been preferred, we argue that the rigorous within-subject criteria implemented for the current study, which are described in detail in response to the points raised in 3.2 above and are a first for neuroimaging studies of children with ASD, bolster the ability of this study to identify more subtle GLM effects (18). It should also be noted that we identified significant between-group differences (Figure 4) and brain-behavior relationships (Figure 5) using an a priori brain network identified from an independent sample of children with ASD (41).

For multivariate classifiers and regressions, these too produce inflated and over-optimistic levels of predictions with smaller sample size (e.g., Woo et al., 2017, Nature Neuroscience).

We again thank the reviewer for this comment regarding sample size, and have considered the important report by Woo et al. (2017) in the context of our study. While we agree that a larger sample size would have been preferable, we note that the current study avoided analysis procedures identified by Woo et al. that are performed across the dataset before training and testing data (e.g., denoising, scaling, component analyses, and feature selection) and that can create “dependence and optimistic biases in [cross-validated] accuracy”. Crucially, as discussed previously, we required that each child participant had at least 7 functional imaging runs of our event-related fMRI task (4 min each) that met our strict head movement criteria. The acquisition of high quality brain imaging data, including at least 7 functional runs from each child, is extremely difficult in the context of pediatric clinical populations (8) and is unprecedented in studies of autism.
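The leakage concern raised by Woo et al. can be made concrete with a small sketch. The code below is an assumption-laden illustration, not the SVC/SVR pipeline used in the study: it runs leave-one-out cross-validation on synthetic data with a nearest-centroid classifier (a simple stand-in), and the function name `loo_accuracy` is hypothetical. The point it shows is that the scaling parameters are estimated from the training fold only, so the held-out participant never influences any step that precedes testing.

```python
import numpy as np


def loo_accuracy(features, labels):
    """Leave-one-out accuracy with feature scaling fitted on the training fold only."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    correct = 0
    for test_idx in range(n):
        train = np.arange(n) != test_idx
        # Estimate mean and SD from training rows only, so the held-out
        # row cannot leak into the preprocessing step.
        mu = features[train].mean(axis=0)
        sd = features[train].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features
        x_train = (features[train] - mu) / sd
        x_test = (features[test_idx] - mu) / sd
        y_train = labels[train]
        # Nearest-centroid classifier: a simple stand-in, not the paper's SVC.
        classes = np.unique(y_train)
        centroids = np.array([x_train[y_train == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - x_test, axis=1)
        correct += int(classes[np.argmin(dists)] == labels[test_idx])
    return correct / n


# Synthetic, hypothetical data: two groups of 21 "participants", 5 features each.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(21, 5))
group_b = rng.normal(3.0, 1.0, size=(21, 5))
X = np.vstack([group_a, group_b])
y = np.array([0] * 21 + [1] * 21)
accuracy = loo_accuracy(X, y)
```

Computing `mu` and `sd` once over all 42 rows before the loop would be the dataset-wide variant that the quoted passage warns about.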

2) Given the small sample sizes, but relatively strong and justified anatomical hypotheses, why not run ROI analyses instead of whole-brain analyses? Statistical power would likely be increased for ROI analyses, and one can cite more unbiased estimates of effect size. Whole-brain analyses can show us where the likely effects might be, and this is helpful when we don't have strong anatomical hypotheses. But here the authors do have strong anatomical hypotheses, yet they choose an analysis approach that is not congruent with that and penalizes them in terms of statistical power and doesn't allow for estimation of unbiased effect sizes.

We thank the reviewer for this remark. We did in fact perform ROI analyses on the GLM results using ROIs from our previous intrinsic functional connectivity paper in children with ASD (41), however several issues emerged when we used this approach. First, intrinsic connectivity results from this previous study identified 16 ROIs, and including all of these ROIs would have required FDR correction, which would have reduced the ability to detect subtle effects. Furthermore, had we limited the number of ROIs included in the analysis to reduce the multiple comparisons issue, it may have appeared that we were selecting and reducing the ROIs after results are known (i.e., SHARKing (59)). Finally, results showed that there was not exact overlap between ROIs from our previous intrinsic functional connectivity study and GLM effects identified in the current study for unfamiliar and mother’s voice contrasts, which resulted in GLM activity in reward and affective processing regions (i.e., NAc, vmPFC, anterior insula, anterior cingulate cortex) that did not yield significant GLM results in the a priori ROIs. Consequently, we did not report ROI-based GLM results in the initial submission of the manuscript.

While the use of ROIs based on anatomical hypotheses would have been justified for GLM analyses, we do not believe that the use of whole-brain analysis in the context of the current study presents a methodological weakness. Consistent with the reviewer’s statement, the use of whole-brain analysis penalized our ability to identify effects, and despite this penalty, GLM results identified effects in reward and salience processing regions in response to vocal sounds in children with ASD.

What is missing from the paper is an estimate of how big the effects are likely to be, as this is what we should ideally care about (Reddan, Lindquist, & Wager, 2017, JAMA Psychiatry). Future studies that may try to replicate this study will need to know what the effect sizes are likely to be. Meta-analyses ideally need unbiased estimates of effect size. However, all we have to go off of here are the authors' figures showing whole-brain maps, which likely just tell us where some of the largest effects may be.

3) Reported effect sizes in Figure 3 (e.g., Pearson's r) are likely inflated given small sample sizes and also due to the fact that it appears that the reported r values in the figure are likely taken from the peak voxel.

The plots in Figure 3 were meant to aid visualization of regional brain responses, and were not intended to reflect effect size. As noted in the excellent report by Reddan, Lindquist, & Wager, effect sizes in brain imaging studies are prone to numerous biases, including the number of tests performed, the number of brain regions/voxels examined, and the specific brain regions selected. In general, there is no good solution to this problem. A contributing factor is the more stringent GLM activation thresholds that are published in more recent fMRI papers: effect sizes increase when higher voxel-wise t-scores are used to compute them. To provide a better estimate of effect size, we used the originally computed t-scores from the whole-brain GLM analysis. Instead of examining the peak, we averaged the t-scores in each cluster and computed the effect size as (mean t-score)/sqrt(N), where N is the sample size. In the revised manuscript, these effect sizes have replaced the R and P values previously listed in Figure 3 scatterplots, and are also provided in Appendix 1—tables 1 and 2. To provide additional guidance for future studies that seek to replicate our findings, we report an overall effect size of 0.68 averaged across all brain regions examined in the TD vs. ASD group analysis (Figure 2) and an overall effect size of 0.76 averaged across all brain regions examined in the ASD Social Communication Covariate analysis (Figure 3).
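The cluster-level calculation described above amounts to a one-line formula, sketched here for clarity. The function name `cluster_effect_size` and the t-scores are hypothetical, chosen only to illustrate the arithmetic; N = 21 matches the per-group sample size mentioned in this letter, although the N used for each analysis is not restated here.

```python
import math


def cluster_effect_size(t_scores, n_subjects):
    """Average the voxel-wise t-scores in one cluster and divide by sqrt(N)."""
    if not t_scores:
        raise ValueError("cluster has no voxels")
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    mean_t = sum(t_scores) / len(t_scores)
    return mean_t / math.sqrt(n_subjects)


# Hypothetical voxel-wise t-scores for a single cluster (illustrative values only).
example_cluster_t = [3.2, 3.5, 2.9, 3.8]
example_effect = cluster_effect_size(example_cluster_t, n_subjects=21)
```

Averaging over the cluster rather than taking the peak voxel is what reduces the selection bias discussed in the response, since the peak is by construction the most extreme value in the cluster.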

4) Scatterplots in Figure 3 show inverted y-axes so that higher numbers are at the bottom and lower numbers are at the top. The reported correlations are negative, and yet the scatterplot shows what looks like a positive correlation. All this confusion is due to the inverted y-axes. The authors should correct the plots to avoid this confusion.

We inverted the y-axes in the initial submission since greater values of ADOS Social Communication scores are associated with more severe deficits, and often readers prefer to see reduced abilities at the bottom of the y-axis rather than at the top. If the reviewer still feels that we should make the change, we would be happy to.

5) The authors heavily rely on reverse inference to relate their findings back to the social motivation theory. However, if their manipulations were powerful enough to create a distinction between a stimulus that was heavily socially rewarding (e.g., mother's voice) versus another that is not (e.g., unfamiliar voice), then shouldn't there be some kind of difference in the main activation analysis in reward-related areas (i.e. Figure 2B)? In other words the main contrast of interest that might have been most relevant to the social motivation theory produces no group differences in activation in areas like the ventral striatum. Because this contrast doesn't really pan out the way the theory predicts, doesn't this cast doubt on the social motivation theory, or couldn't it be that some of the contrasts in this study can be better explained by some other kind of reverse inference than the social motivation theory?

We thank the reviewer for this comment. Our interpretation of the results is that there was a high degree of variance within the ASD group with regards to neural responses to vocal stimuli, and that variance within the ASD group was comparable to (or exceeded) the between-group variance, prohibiting significant between-group differences. This interpretation is supported by results in TD children (Appendix 1—figure 4A) and the scatterplots in Figure 3 showing the relationship between ADOS Social scores and neural activation profiles in children with ASD. While TD children, who do not have social deficits, showed robust responses in NAc and vmPFC in response to vocal stimuli (Appendix 1—figure 4A), activity in these regions was not evident in the ASD group (Appendix 1—figure 4B) until ADOS Social scores were included as a covariate in the analysis (Main Figure 3). Specifically, these latter results showed that the greater the social function in the children with ASD, the greater the activity in regions associated with reward processing, including NAc and vmPFC. Results provide new evidence for the social motivation theory (5) by suggesting that the degree of social reward impairment varies as a function of social abilities, supporting a link between being tuned into the social world and being rewarded by the social world.

6) From what I could tell, no manipulation check was done to measure some aspect of how rewarding the stimuli were to participants. This seems critical if the authors want to make strong reverse inferences back to the social motivation theory.

We thank the reviewer for this comment. While we agree that it would have been optimal to have a behavioral measure of vocal reward for these children, we were not confident that we would be able to elicit valid behavioral responses regarding an abstract concept like “reward” from children with ASD as young as 7-8 years old with moderate to severe communication deficits. Indeed, there is concern that neurotypical children in this age range might have difficulty comprehending the nature of “reward” in the context of their mother’s voice, a ubiquitous sound source in many children’s environment since before birth. Furthermore, to our knowledge, there are no validated behavioral measures for auditory processing analogous to eye-tracking (19, 21, 22) that might have been used for children in this age range to infer reward processing for these vocal sounds.

7) The authors should include a limitations section to their paper.

We have added a paragraph that identifies limitations of the current study to the revised Discussion section.

References

1. American Psychiatric Association. Diagnostic and statistical manual of mental disorders: DSM-5. Washington, D.C.: American Psychiatric Association; 2013.

2. Kjelgaard MM, Tager-Flusberg H. An Investigation of Language Impairment in Autism: Implications for Genetic Subgroups. Lang Cogn Process. 2001;16(2-3):287-308. doi:10.1080/01690960042000058.

3. Tager-Flusberg H, Paul R, Lord C. Language and Communication in Autism. In: Volkmar FR, Paul R, Klin A, editors. Handbook of Autism and Pervasive Developmental Disorders, Volume 1: Diagnosis, Development, Neurobiology, and Behavior. Hoboken, NJ: John Wiley & Sons, Incorporated; 2005. p. 335-64.

4. Lord C, Rutter M, DiLavore PC, Risi S, Gotham K, Bishop S. Autism diagnostic observation schedule, second edition. Torrance, CA: Western Psychological Services; 2012.

5. Chevallier C, Kohls G, Troiani V, Brodkin ES, Schultz RT. The social motivation theory of autism. Trends Cogn Sci. 2012;16(4):231-9. doi:10.1016/j.tics.2012.02.007.

6. Dawson G, Carver L, Meltzoff AN, Panagiotides H, McPartland J, Webb SJ. Neural correlates of face and object recognition in young children with autism spectrum disorder, developmental delay, and typical development. Child Dev. 2002;73(3):700-17. doi:10.1111/1467-8624.00433.

7. Cohen JR, Asarnow RF, Sabb FW, Bilder RM, Bookheimer SY, Knowlton BJ, Poldrack RA. Decoding continuous variables from neuroimaging data: basic and clinical applications. Front Neurosci. 2011;5:75. doi:10.3389/fnins.2011.00075.

8. Yerys BE, Jankowski KF, Shook D, Rosenberger LR, Barnes KA, Berl MM, Ritzl EK, Vanmeter J, Vaidya CJ, Gaillard WD. The fMRI success rate of children and adolescents: typical development, epilepsy, attention deficit/hyperactivity disorder, and autism spectrum disorders. Hum Brain Mapp. 2009;30(10):3426-35. doi:10.1002/hbm.20767.

9. Jao Keehn RJ, Sanchez SS, Stewart CR, Zhao W, Grenesko-Stevens EL, Keehn B, Muller RA. Impaired downregulation of visual cortex during auditory processing is associated with autism symptomatology in children and adolescents with autism spectrum disorder. Autism Res. 2017;10(1):130-43. doi:10.1002/aur.1636.

10. Wadsworth HM, Maximo JO, Donnelly RJ, Kana RK. Action simulation and mirroring in children with autism spectrum disorders. Behav Brain Res. 2018;341:1-8. doi:10.1016/j.bbr.2017.12.012.

11. Oberwelland E, Schilbach L, Barisic I, Krall SC, Vogeley K, Fink GR, Herpertz-Dahlmann B, Konrad K, Schulte-Ruther M. Young adolescents with autism show abnormal joint attention network: A gaze contingent fMRI study. Neuroimage Clin. 2017;14:112-21. doi:10.1016/j.nicl.2017.01.006.

12. Wadsworth HM, Maximo JO, Lemelman AR, Clayton K, Sivaraman S, Deshpande HD, Ver Hoef L, Kana RK. The Action Imitation network and motor imitation in children and adolescents with autism. Neuroscience. 2017;343:147-56. doi:10.1016/j.neuroscience.2016.12.001.

13. Utzerath C, Schmits IC, Buitelaar J, de Lange FP. Adolescents with autism show typical fMRI repetition suppression, but atypical surprise response. Cortex. 2018;109:25-34.

14. Greene RK, Spanos M, Alderman C, Walsh E, Bizzell J, Mosner MG, Kinard JL, Stuber GD, Chandrasekhar T, Politte LC, Sikich L, Dichter GS. The effects of intranasal oxytocin on reward circuitry responses in children with autism spectrum disorder. J Neurodev Disord. 2018;10(1):12. doi:10.1186/s11689-018-9228-y.

15. Vogan VM, Francis KE, Morgan BR, Smith ML, Taylor MJ. Load matters: neural correlates of verbal working memory in children with autism spectrum disorder. J Neurodev Disord. 2018;10(1):19. doi:10.1186/s11689-018-9236-y.

16. Lynch CJ, Breeden AL, You X, Ludlum R, Gaillard WD, Kenworthy L, Vaidya CJ. Executive Dysfunction in Autism Spectrum Disorder Is Associated With a Failure to Modulate Frontoparietal-insular Hub Architecture. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. 2017;2(6):537-45.

17. Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafo MR. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14(5):365-76. doi:10.1038/nrn3475.

18. Nee DE. Correspondence: fMRI replicability depends upon sufficient individual level data. bioRxiv. 2018.

19. Safra L, Ioannou C, Amsellem F, Delorme R, Chevallier C. Distinct effects of social motivation on face evaluations in adolescents with and without autism. Sci Rep. 2018;8(1):10648. doi:10.1038/s41598-018-28514-7.

20. Chevallier C, Tonge N, Safra L, Kahn D, Kohls G, Miller J, Schultz RT. Measuring Social Motivation Using Signal Detection and Reward Responsiveness. PLoS One. 2016;11(12):e0167024. doi:10.1371/journal.pone.0167024.

21. Dubey I, Ropar D, de C Hamilton AF. Brief Report: A Comparison of the Preference for Viewing Social and Non-social Movies in Typical and Autistic Adolescents. J Autism Dev Disord. 2017;47(2):514-9. doi:10.1007/s10803-016-2974-3.

22. Dubey I, Ropar D, Hamilton AF. Measuring the value of social engagement in adults with and without autism. Mol Autism. 2015;6:35. doi:10.1186/s13229-015-0031-2.

23. Sepeta L, Tsuchiya N, Davies MS, Sigman M, Bookheimer SY, Dapretto M. Abnormal social reward processing in autism as indexed by pupillary responses to happy faces. J Neurodev Disord. 2012;4(1):17. doi:10.1186/1866-1955-4-17.

24. Wechsler D. The Wechsler Abbreviated Scale of Intelligence. San Antonio, TX: The Psychological Corporation; 1999.

25. Wagner RK, Torgesen JK, Rashotte CA. Comprehensive Test of Phonological Processing (CTOPP). Austin, TX: Pro-Ed, Inc.; 1999.

26. Semel E, Wiig EH, Secord WH. Clinical evaluation of language fundamentals – Fourth edition (CELF-4). San Antonio, TX: Psychological Corporation; 2003.

27. Belin P, Zatorre RJ, Lafaille P, Ahad P, Pike B. Voice-selective areas in human auditory cortex. Nature. 2000;403(6767):309-12. doi:10.1038/35002078.

28. Gotham K, Pickles A, Lord C. Standardizing ADOS scores for a measure of severity in autism spectrum disorders. J Autism Dev Disord. 2009;39(5):693-705. doi:10.1007/s10803-008-0674-3.

29. Hus V, Gotham K, Lord C. Standardizing ADOS domain scores: separating severity of social affect and restricted and repetitive behaviors. J Autism Dev Disord. 2014;44(10):2400-12. doi:10.1007/s10803-012-1719-1.

30. Abrams DA, Chen T, Odriozola P, Cheng KM, Baker AE, Padmanabhan A, Ryali S, Kochalka J, Feinstein C, Menon V. Neural circuits underlying mother's voice perception predict social communication abilities in children. Proc Natl Acad Sci U S A. 2016;113(22):6295-300. doi:10.1073/pnas.1602948113.

31. Horikawa T, Kamitani Y. Generic decoding of seen and imagined objects using hierarchical visual features. Nat Commun. 2017;8:15037. doi:10.1038/ncomms15037.

32. Kashyap S, Ivanov D, Havlicek M, Sengupta S, Poser BA, Uludag K. Resolving laminar activation in human V1 using ultra-high spatial resolution fMRI at 7T. Sci Rep. 2018;8(1):17063. doi:10.1038/s41598-018-35333-3.

33. Wen H, Shi J, Chen W, Liu Z. Deep Residual Network Predicts Cortical Representation and Organization of Visual Features for Rapid Categorization. Sci Rep. 2018;8(1):3752. doi:10.1038/s41598-018-22160-9.

34. DeCasper AJ, Fifer WP. Of human bonding: newborns prefer their mothers' voices. Science. 1980;208(4448):1174-6.

35. Seltzer LJ, Prososki AR, Ziegler TE, Pollak SD. Instant messages vs. speech: hormones and why we still need to hear each other. Evolution and Human Behavior. 2012;33(1):42-5. doi:10.1016/j.evolhumbehav.2011.05.004.

36. Seltzer LJ, Ziegler TE, Pollak SD. Social vocalizations can release oxytocin in humans. Proc Biol Sci. 2010;277(1694):2661-6. doi:10.1098/rspb.2010.0567.

37. Thoman EB, Korner AF, Beason-Williams L. Modification of Responsiveness to Maternal Vocalization in Neonate. Child Dev. 1977;48(2):563-9. doi:10.1111/j.1467-8624.1977.tb01198.x.

38. Lamb ME. Developing trust and perceived effectance in infancy. Advances in Infancy Research. Norwood, NJ: Ablex; 1981. p. 101-27.

39. Klin A. Young autistic children's listening preferences in regard to speech: a possible characterization of the symptom of social withdrawal. J Autism Dev Disord. 1991;21(1):29-42.

40. Kuhl PK, Coffey-Corina S, Padden D, Dawson G. Links between social and linguistic processing of speech in preschool children with autism: behavioral and electrophysiological measures. Dev Sci. 2005;8(1):F1-F12. doi:10.1111/j.1467-7687.2004.00384.x.

41. Abrams DA, Lynch CJ, Cheng KM, Phillips J, Supekar K, Ryali S, Uddin LQ, Menon V. Underconnectivity between voice-selective cortex and reward circuitry in children with autism. Proc Natl Acad Sci U S A. 2013;110(29):12060-5. doi:10.1073/pnas.1302982110.

42. Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA. Predicting human brain activity associated with the meanings of nouns. Science. 2008;320(5880):1191-5. doi:10.1126/science.1152876.

43. Falk EB, Berkman ET, Mann T, Harrison B, Lieberman MD. Predicting persuasion-induced behavior change from the brain. J Neurosci. 2010;30(25):8421-4. doi:10.1523/JNEUROSCI.0063-10.2010.

44. Hoeft F, McCandliss BD, Black JM, Gantman A, Zakerani N, Hulme C, Lyytinen H, Whitfield-Gabrieli S, Glover GH, Reiss AL, Gabrieli JD. Neural systems predicting long-term outcome in dyslexia. Proc Natl Acad Sci U S A. 2011;108(1):361-6. doi:10.1073/pnas.1008950108.

45. Dosenbach NU, Nardos B, Cohen AL, Fair DA, Power JD, Church JA, Nelson SM, Wig GS, Vogel AC, Lessov-Schlaggar CN, Barnes KA, Dubis JW, Feczko E, Coalson RS, Pruett JR, Jr., Barch DM, Petersen SE, Schlaggar BL. Prediction of individual brain maturity using fMRI. Science. 2010;329(5997):1358-61. doi:10.1126/science.1194144.

46. Lohmann G, Erfurth K, Muller K, Turner R. Critical comments on dynamic causal modelling. Neuroimage. 2012;59(3):2322-9. doi:10.1016/j.neuroimage.2011.09.025.

47. Floris DL, Lai MC, Auer T, Lombardo MV, Ecker C, Chakrabarti B, Wheelwright SJ, Bullmore ET, Murphy DG, Baron-Cohen S, Suckling J. Atypically rightward cerebral asymmetry in male adults with autism stratifies individuals with and without language delay. Hum Brain Mapp. 2016;37(1):230-53. doi:10.1002/hbm.23023.

48. Lombardo MV, Pierce K, Eyler LT, Carter Barnes C, Ahrens-Barbeau C, Solso S, Campbell K, Courchesne E. Different functional neural substrates for good and poor language outcome in autism. Neuron. 2015;86(2):567-77. doi:10.1016/j.neuron.2015.03.023.

49. Hong SJ, Valk SL, Di Martino A, Milham MP, Bernhardt BC. Multidimensional Neuroanatomical Subtyping of Autism Spectrum Disorder. Cereb Cortex. 2018;28(10):3578-88. doi:10.1093/cercor/bhx229.

50. Pantelis PC, Byrge L, Tyszka JM, Adolphs R, Kennedy DP. A specific hypoactivation of right temporo-parietal junction/posterior superior temporal sulcus in response to socially awkward situations in autism. Soc Cogn Affect Neurosci. 2015;10(10):1348-56. doi:10.1093/scan/nsv021.

51. McLaren DG, Ries ML, Xu G, Johnson SC. A generalized form of context-dependent psychophysiological interactions (gPPI): a comparison to standard approaches. Neuroimage. 2012;61(4):1277-86. doi:10.1016/j.neuroimage.2012.03.068.

52. Jones CR, Happe F, Baird G, Simonoff E, Marsden AJ, Tregay J, Phillips RJ, Goswami U, Thomson JM, Charman T. Auditory discrimination and auditory sensory behaviours in autism spectrum disorders. Neuropsychologia. 2009;47(13):2850-8. doi:10.1016/j.neuropsychologia.2009.06.015.

53. Bonnel A, Mottron L, Peretz I, Trudel M, Gallun E, Bonnel AM. Enhanced pitch sensitivity in individuals with autism: a signal detection analysis. J Cogn Neurosci. 2003;15(2):226-35. doi:10.1162/089892903321208169.

54. Heaton P, Williams K, Cummins O, Happe F. Autism and pitch processing splinter skills: a group and subgroup analysis. Autism. 2008;12(2):203-19. doi:10.1177/1362361307085270.

55. Heaton P. Interval and contour processing in autism. J Autism Dev Disord. 2005;35(6):787-93. doi:10.1007/s10803-005-0024-7.

56. Bartolucci G, Pierce S, Streiner D, Eppel PT. Phonological investigation of verbal autistic and mentally retarded subjects. J Autism Child Schizophr. 1976;6(4):303-16.

57. Bartolucci G, Pierce SJ. A preliminary comparison of phonological development in autistic, normal, and mentally retarded subjects. Br J Disord Commun. 1977;12(2):137-47.

58. Bishop DV, Maybery M, Wong D, Maley A, Hill W, Hallmayer J. Are phonological processing deficits part of the broad autism phenotype? Am J Med Genet B Neuropsychiatr Genet. 2004;128B(1):54-60. doi:10.1002/ajmg.b.30039.

59. Poldrack RA, Baker CI, Durnez J, Gorgolewski KJ, Matthews PM, Munafo MR, Nichols TE, Poline JB, Vul E, Yarkoni T. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat Rev Neurosci. 2017;18(2):115-26. doi:10.1038/nrn.2016.167.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

We appreciate the enthusiasm of reviewer #1 for our revised manuscript. To further address the Reviewing Editor’s and reviewer #3’s concern regarding our sample size, we have added a sentence to the revised limitations section, which is tracked in the revised manuscript. We have also modified the scatterplots as requested by reviewer #3.

https://doi.org/10.7554/eLife.39906.029

Article and author information

Author details

  1. Daniel Arthur Abrams

    Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    daa@stanford.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1255-1200
  2. Aarthi Padmanabhan

    Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Supervision, Writing—original draft, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3727-5468
  3. Tianwen Chen

    Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  4. Paola Odriozola

    Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Data curation, Investigation, Project administration
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1641-4139
  5. Amanda E Baker

    Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Data curation, Investigation, Project administration
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0140-2162
  6. John Kochalka

    Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  7. Jennifer M Phillips

    Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6360-2346
  8. Vinod Menon

    1. Program in Neuroscience, Stanford University School of Medicine, Stanford, United States
    2. Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    menon@stanford.edu
    Competing interests
    No competing interests declared

Funding

National Institute of Mental Health (MH102428)

  • Daniel Arthur Abrams

Brain and Behavior Research Foundation (NARSAD Young Investigator Grant)

  • Daniel Arthur Abrams

Stanford School of Medicine, Stanford Medicine, Stanford University (UL1TR001085)

  • Daniel Arthur Abrams

National Center for Advancing Translational Sciences (UL1TR001085)

  • Daniel Arthur Abrams

National Institute on Deafness and Other Communication Disorders (DC011095)

  • Vinod Menon

National Institute of Mental Health (MH084164)

  • Vinod Menon

Singer Family Foundation

  • Vinod Menon

Simons Foundation (308939)

  • Vinod Menon

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by NIH Grants K01 MH102428 (to DAA), and DC011095 and MH084164 (to VM), a NARSAD Young Investigator Grant from the Brain and Behavior Research Foundation (to DAA), Stanford Child Health Research Institute and the Stanford NIH-NCATS-CTSA (to DAA; UL1 TR001085), the Singer Foundation, and the Simons Foundation/SFARI (308939, VM). We thank all the children and their parents who participated in our study, E Adair and the staff at the Stanford Lucas Center for Imaging for assistance with data collection, S Karraker for assistance with data processing, H Abrams and C Anderson for help with stimulus production, and C Feinstein for helpful discussions.

Ethics

Human subjects: The Stanford University Institutional Review Board approved the study protocol (Protocol # 11849). Parental written informed consent and consent to publish were obtained for all participants, and the child's assent was obtained for all evaluation procedures. Children were paid for their participation in the study. All procedures performed were in accordance with ethical standards set out by the Federal Policy for the Protection of Human Subjects (or 'Common Rule', U.S. Department of Health and Human Services Title 45 CFR 46).

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Michael Breakspear, QIMR Berghofer Medical Research Institute, Australia

Reviewer

  1. Coralie Chevallier, INSERM, France

Publication history

  1. Received: July 7, 2018
  2. Accepted: January 29, 2019
  3. Version of Record published: February 26, 2019 (version 1)

Copyright

© 2019, Abrams et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
