Recursive self-embedded vocal motifs in wild orangutans

  1. Adriano R Lameira (corresponding author)
  2. Madeleine E Hardus
  3. Andrea Ravignani
  4. Teresa Raimondi
  5. Marco Gamba
  1. Department of Psychology, University of Warwick, United Kingdom
  2. Independent Researcher, United Kingdom
  3. Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, Netherlands
  4. Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music Aarhus/Aalborg, Denmark
  5. Department of Human Neurosciences, Sapienza University of Rome, Italy
  6. Department of Life Sciences and Systems Biology, University of Turin, Italy


Recursive procedures that allow placing a vocal signal inside another of a similar kind provide a neuro-computational blueprint for syntax and phonology in spoken language and human song. There are, however, no known vocal sequences among nonhuman primates arranged in self-embedded patterns that evince vocal recursion or potential incipient or evolutionary transitional forms thereof, suggesting a neuro-cognitive transformation exclusive to humans. Here, we uncover that wild flanged male orangutan long calls feature rhythmically isochronous call sequences nested within isochronous call sequences, consistent with two hierarchical strata. Remarkably, three temporally and acoustically distinct call rhythms in the lower stratum were not related to the overarching rhythm at the higher stratum by any low multiples, which suggests that these recursive structures were neither the result of parallel non-hierarchical procedures nor anatomical artifacts of bodily constraints or resonances. Findings represent a case of temporally recursive hominid vocal combinatorics in the absence of syntax, semantics, phonology, or music. Second-order combinatorics, ‘sequences within sequences’, involving hierarchically organized and cyclically structured vocal sounds in ancient hominids may have preluded the evolution of recursion in modern language-able humans.

eLife assessment

The paper represents a novel application of recursion theory to the long call vocalizations of orangutans to demonstrate repetitive, rhythmic sub-structuring. The authors use detailed acoustic analyses to show compelling evidence for self-embedded and nested isochronic motifs. These fundamental results have the potential to significantly advance current approaches used to compare nonhuman communication systems with human language.

eLife digest

Language is the most powerful communication tool known in nature. By combining a finite set of elements, it allows us to encode infinite messages. This enables communication about virtually anything, from alerting others to potential dangers, to recommending a favourite book. The prevailing theory of the last 70 years suggests that this ability rests on a computational process in the brain that is unique to humans, known as recursion.

Recursion enables humans to produce and place a language element or pattern of elements inside another element or pattern of the same kind. In this way, a clause can be embedded inside another ‘carrier’ clause to extend a thought, argument, or scenario, for example, “the dog, which chased the cat, was barking”. While recursion offers a simple, yet potent, explanation for the endless possibilities of language, how and why recursion – and by extension language – emerged in humans but no other animals remains a mystery.

Lameira et al. observed vocal patterns in wild orangutans that appeared to be composed of different elements. As orangutans and other great apes are our closest living relatives, they represent the most realistic model for studying the ability of human ancestors to use and comprehend language. Therefore, Lameira et al. set out to determine if this was a case of vocal patterning embedded within a similar vocal pattern, which could indicate that recursion underpins production of these calls.

Analysing recordings of long calls made by wild male orangutans showed that they are organized as two layers, where calls with a regular beat (or tempo) are produced within another “carrier” call of a different tempo. Up to three different call types, each with their own signature tempo, can occur within the same carrier call. Further analysis confirmed these call types were unrelated to the carrier.

The findings of Lameira et al. demonstrate that orangutans produce recursive vocal sequences that could represent a possible precursor to recursion in humans, offering a potential avenue for studying how recursion, and ultimately language, evolved in humans. In the future, better understanding of how language evolved may help to refine machine learning algorithms that aim to recognize, predict or generate text.


Among the many definitions of recursion (Martins, 2012), the view that it represents the repetition of an element or pattern within a self-similar element or pattern has crossed centuries and disciplines, from von Humboldt, 1836 and Hockett, 1960 to Mandelbrot, 1980 and Chomsky, 2010; from fractals in mathematics (Mandelbrot, 1980) to generative grammars in linguistics (Chomsky, 2010), from graphic (e.g., ‘Print Gallery’ by M. C. Escher) to popular art (e.g., 1940’s Batman #8 comic book cover). Across varying terminologies, the common denominator across fields is that to re-curse (from the Latin to ‘re-run’ or ‘re-invoke’) is an operation that produces multiple, potentially infinite sets of items from one initial item or a finite set. This is achieved by nesting an item within itself or within another item of the same kind. Recursive patterns in everyday life are ubiquitous and include, for example, computer folders stored inside other computer folders, Russian dolls nested in each other, Romanesco broccoli’s spirals arranged in a spiral, and the same number of minutes passed within the same number of hours (e.g., 12:12). Accordingly, recursion is not the simple repetition of a pattern or item on a single level (e.g., computer folders or Russian dolls side by side), but the placement of a pattern or item within itself (e.g., computer folders or Russian dolls inside each other), hence, generating different hierarchical levels or strata. This means that the same pattern or item is encountered at least at two different scales (e.g., 12 at the scale of hours, and 12 at the scale of minutes).

In language, although classically associated with syntax (Chomsky, 2010; Idsardi et al., 2018), recursion and its diagnostic self-embedded patterns have been recognized in phonology (Bennett, 2018; Elfner, 2015; Barış and Revithiadou, 2009; Nasukawa, 2015; Nasukawa, 2020; Vogel, 2012) and in verbal and non-verbal music (Jackendoff, 2009; Koelsch et al., 2013; Martins et al., 2017; Sharma and Chimalakonda, 2018), making these systems open-ended and theoretically inexhaustible. Recursive vocal sequences or structures in nonhuman primates could potentially inform incipient or transitional states of recursion along human evolution before the rise of modern language. However, their apparent absence, notably in great apes – our closest living relatives – has been interpreted as indicating that a neuro-cognitive or neuro-computational transformation occurred in our lineage but none other (Hauser et al., 2002). This absence of evidence has led some scholars to question altogether the role of natural selection for the emergence of language, tacitly favoring sudden ‘hopeful monster’ mutant scenarios (Berwick and Chomsky, 2019; Bolhuis and Wynne, 2009).

Decades-long debates on the evolution of language have revolved around the successes and limitations of empirical comparative animal research (Bolhuis et al., 2018; Bowling and Fitch, 2015; Corballis, 2014; Lameira, 2017a; Lameira and Call, 2020; Martins and Boeckx, 2019; Rawski et al., 2021; Townsend et al., 2018). Syntax-like vocal combinatorics have been identified in some bird (Engesser et al., 2019; Engesser et al., 2016; Suzuki et al., 2016; Suzuki et al., 2017) and primate species (Jiang et al., 2018; Wang et al., 2015; Watson et al., 2020), but these vocal combinatorics were not claimed to be recursive, nor was recursion directly tested. Three notable exceptions demonstrated recursion learning in nonhuman animal settings: Gentner et al., 2006 in European starlings, Ferrigno et al., 2020 in rhesus macaques, and Liao et al., 2022 in crows. These studies show that animals can learn to recognize recursion in synthetic stimuli after dedicated human training in laboratory settings, but they do not show spontaneous production of recursive vocal combinatorics in naturalistic settings. Evidence of recursive vocal structures in wild animals (i.e., without human priming or intervention), notably in primates closely related to humans, such as great apes, would better inform what evolutionary precursors and processes could have led to the emergence of recursion in the human lineage.

Direct structural approach to recursive combinatorics

A novel, direct approach to recursive vocal combinatorics in wild primates is desirable to help infer signal patterns that were recursive in some degree or kind in an extinct past, and molded subsequently into the recursive structures observed today in humans. By virtue of their own primitive nature, proto-recursive structures did not likely fall within modern-day classifications. Therefore, they will often fail to be predicted based on assumptions guided by modern language (Kershenbaum et al., 2014; Miyagawa, 2021). To this end, a structural approach is particularly advantageous based on the cross-disciplinary definition of recursion as ‘the nesting of an element or pattern within a self-similar element or pattern’. First, no prior assumptions are required about species’ cognitive capacities. High-level neuro-motor procedures are inferred only to the extent that these are directly reflected in how signal sequences are organized. For example, Chomsky’s definition of recursion (Chomsky, 2010) can generate non-self-embedded signal structures, but these would be for that same reason operationally undetectable amongst other signal combinations. Second, no prior assumptions are required about signal meaning. There are no certain parallels between semantic content and word meaning in animals, but analyses of signal patterning allow us to identify the similarities between non-semantic (nonhuman) and semantic (human) combinatoric systems (Lipkind et al., 2013; Sainburg et al., 2019). The search for recursion can, hence, be made in the absence of lexical items, semantics, or syntax. Third, no prior assumptions are required about signal function. Under any evolutionary scenario, including punctuated hypotheses, ancestral signal function (whether cooperative, competitive, or otherwise) is expected to have derived or been leveraged by its proto-recursive structure. Otherwise, once present, recursion would not have been fixated among human ancestral populations. 
Accordingly, a structural approach opens the field to potentially untapped signal diversity in nature and yet unrecognized bona fide combinatoric possibilities within the human clade.

Exploring recursive combinatorics in a wild great ape

Here, we undertake an explorative but direct structural approach to recursion. We provide evidence for recursive self-embedded vocal patterning in a (nonhuman) great ape, namely, in the long calls of flanged orangutan males in the wild. We conducted precise rhythm analyses (De Gregorio et al., 2021; Roeske et al., 2020) of 66 long call audio recordings produced by 10 orangutans (Pongo pygmaeus wurmbii) across approximately 2510 observation hours at Tuanan, Central Kalimantan, Indonesian Borneo. We identified five different element types that comprise the structural building blocks of long calls in the wild (Hardus et al., 2009; Lameira and Wich, 2008), of which the primary type are full pulses (Figure 1A). Full pulses do not, however, always exhibit uninterrupted vocal production throughout a long call (as during a long call’s climax; Spillmann et al., 2010) but can break up into four different ‘sub-pulse’ element types: (i) grumble sub-pulses (quick succession of staccato calls that typically constitute the first build-up pulses of long calls; Hardus et al., 2009), (ii) sub-pulse transitory elements and (iii) pulse bodies (typically constituting pulses before and/or after climax pulses), and (iv) bubble sub-pulses (quick succession of staccato calls that typically constitute the last tail-off pulses of long calls) (Figure 1A). We characterized long calls’ full- and sub-pulses’ rhythmicity to determine whether orangutan long calls present a reiterated structure across different hierarchical strata. We extracted inter-onset intervals (IOIs; i.e., time difference between the start of a vocal element and the preceding one – tk) from 8993 vocal long call elements (Figure 1A): 1930 full pulses (1916 after filtering for 0.025 < tk < 5 s), 757 grumble sub-pulses (731), 1068 sub-pulse transitory elements (374), 816 pulse bodies (11), and 4422 bubble sub-pulses (4193). 
From the extracted IOIs, we calculated their rhythmic ratio by dividing each IOI by its duration plus the duration of the following interval. We then computed the distribution of these ratios to ascertain whether the rhythm of long call full and sub-pulses presented natural categories, following published protocols (De Gregorio et al., 2021; Roeske et al., 2020; Figure 1B–D).
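As a rough illustration of the ratio calculation described above, the following Python sketch derives inter-onset intervals from a list of onset times, applies the 0.025 < tk < 5 s filter, and computes rk = tk/(tk + tk+1). The function names and the example onset times are hypothetical and are not taken from the study's data or analysis scripts. How filtered-out intervals are treated when pairing adjacent intervals is also an assumption here: any pair containing an excluded interval is simply skipped.

```python
from typing import List


def inter_onset_intervals(onsets: List[float]) -> List[float]:
    """Return t_k: the time between each element onset and the preceding one."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def rhythm_ratios(intervals: List[float], lo: float = 0.025, hi: float = 5.0) -> List[float]:
    """Return r_k = t_k / (t_k + t_{k+1}) for adjacent interval pairs.

    Assumption: a pair is kept only if both intervals lie strictly
    between lo and hi seconds.
    """
    ratios = []
    for t_k, t_next in zip(intervals, intervals[1:]):
        if lo < t_k < hi and lo < t_next < hi:
            ratios.append(t_k / (t_k + t_next))
    return ratios


# Hypothetical onset times (s) of five consecutive elements of one type.
onsets = [0.0, 1.5, 3.0, 4.5, 6.5]
t = inter_onset_intervals(onsets)
r = rhythm_ratios(t)
print(t)
print(r)
```

In this toy sequence, the first two ratios equal 0.5 because adjacent intervals are identical (perfect isochrony), whereas the last ratio falls below 0.5 because the final interval is longer than the one before it.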

Organization and rhythmic features of orangutans’ long calls.

(A) Top: the spectrogram of a full pulse and its organization in sub-pulses (e.g., grumble sub-pulses). Below are the spectrograms of the three other sub-element types: sub-pulse transitory elements, pulse bodies, and bubble sub-pulses. Bars on the top of each spectrogram schematically quantify the durations of inter-onset intervals (tk): dark green denotes the higher level of organization (full pulse). Orange (in the inset) and light green (bottom right) denote the lower-level organization (sub-pulse element types). (B) Probability density function showing the distributions of the inter-onset intervals (tk) for each of the long call element types. (C) The distributions on the left show rhythm ratios (rk) per element type as calculated on 12 flanged males for a total of 1915 full pulses and 5309 sub-pulses. Solid sections of the curves indicate on-isochrony rk values; striped sections indicate off-isochrony rk values. A solid white line indicates the 0.5 rk value corresponding to isochrony. White dotted lines denote the on-isochrony peak value extracted from the probability density function. Right: a bar plot per each element type shows the percentage of observations (rk) falling into the on-isochrony boundaries (solid bars) or on off-isochrony boundaries (striped bars). The number of on-isochrony rk is significantly larger (GLMM, full vs null: Chisq = 2717.543, p<0.001) than the number of off-isochrony rk for all long call element types (full pulse: t-ratio = −25.164, p<0.001; bubble sub-pulse: t-ratio = –30.694, p<0.001; grumble sub-pulse: t-ratio = −14.526, p<0.001; sub-pulse transitory element: t-ratio = −3.148, p<0.001). Pulse body showed no rk values falling within the on-off-isochrony boundaries. (D) Distribution of a variable calculated as the ratio between the tk of a sub-pulse and the tk of the corresponding higher level of organization, the full pulse. 
We report the peak value of the curve (0.046) and tested the significance of the extent of the central quartiles, which was significantly smaller than peripheral quartiles (Wilcoxon signed-rank test: W = 2272, p<0.001).


The probability density function of orangutan full pulses showed one peak (rk = 0.493) in close vicinity to a theoretically pure isochronic rhythm, that is, full pulses were regularly paced at a 1:1 ratio, following a constant tempo along the long call (Figure 1C). Our model (GLMM, full model vs null model: Chisq = 298.2876, df = 7, p<0.001; see Supplementary file 1) showed that pulse type, range of the curve (on-off-isochrony), and their interaction had a significant effect on the count of rk values. In particular, full pulses’ isochronous peak was significant (t-ratio = −15.957, p<0.0001), that is, the number of rk values falling inside the on-isochrony range was significantly higher than the number of rk values falling inside the off-isochrony range (Figure 1C). Critically, three (of the four) orangutan sub-pulse element types – grumble sub-pulses, sub-pulse transitory elements, and bubble sub-pulses – also showed significant peaks (grumble sub-pulses: t-ratio = −5.940, p<0.0001; sub-pulse transitory elements: t-ratio = −4.048, p=0.0001; bubble sub-pulses: t-ratio = −10.640, p<0.0001) around pure isochrony (peak rk: grumble sub-pulses = 0.501; sub-pulse transitory elements = 0.495; bubble sub-pulses = 0.502; Figure 1C). That is, sub-pulses were regularly paced within regularly paced full pulses, denoting isochrony within isochrony (Figure 1C) at different average tempi [mean tk (sd): full pulses = 1.696 (0.508); grumble sub-pulses = 0.118 (0.111); sub-pulse transitory elements = 0.239 (0.468); bubble sub-pulses = 0.186 (0.292); Figure 1B]. Overall, sub-pulses’ tk was equivalent to 0.046 of that of their comprising full pulses (Figure 1D), which puts sub-pulses at an approximate ratio of 1:22 relative to full pulses, the smallest categorical temporal rhythmic interval registered thus far in a vertebrate (De Gregorio et al., 2021; Roeske et al., 2020).
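The step from the peak value of 0.046 to the approximate 1:22 ratio is simple arithmetic, sketched below in Python. The helper name is hypothetical and is used only for illustration; it takes the reciprocal of the ratio and rounds it to the nearest integer.

```python
def nearest_one_to_n(ratio: float) -> int:
    """Return n = round(1 / ratio), so that the ratio is approximately 1:n."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must lie in (0, 1]")
    return round(1 / ratio)


peak_ratio = 0.046  # peak of the sub-pulse tk / full-pulse tk distribution (Figure 1D)
n = nearest_one_to_n(peak_ratio)
print(f"1 / {peak_ratio} = {1 / peak_ratio:.2f}, i.e. approximately 1:{n}")
```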

Permuted discriminant function analyses (Mundry and Sommer, 2007) (crossed, in order to control for individual variation) in R (R Development Core Team, 2013) based on seven acoustic measures extracted from grumble sub-pulses, sub-pulse transitory elements, and bubble sub-pulses confirmed that these indeed represented acoustically distinct sub-pulse categories: the percentage of correctly classified selected cases (62.7%) was significantly higher (p=0.001) than expected (37%).


Rhythmic analyses of orangutan long calls reveal the presence of self-embedded isochrony in the vocal combinatorics of a wild great ape. Notably, we found that wild orangutan long calls exhibit two discernible structural strata – the full- and sub-pulse level – and three non-exclusive nested motifs in the form of [isochronyA [isochronya,b,c]] (Figure 2).

Isochrony nested within isochrony.

Three acoustically distinct sub-pulse calls occurring at three distinct tempi nested within the same pulse-level tempo in wild flanged male orangutan long calls.

This is fundamentally distinct from a simple repetition of calls or call isochrony – when a call repeats linearly at a constant interval – which are common features in some animal sound communication systems (De Gregorio et al., 2023). Instead, we demonstrate how a vocal element repeated at a constant interval is itself composed by (one of three possible) vocal elements that also repeat themselves at a constant interval of different tempi.

The orangutans’ production of recursive vocal motifs in the wild, and therefore, without training, is especially compelling in the context of the lab-based work that shows that nonhuman animals can learn recursion with training (Ferrigno et al., 2020; Gentner et al., 2006; Liao et al., 2022). Some aspects of these vocal combinatoric structures could be potentially learned as well (Lameira et al., 2022; Lameira et al., 2016; Lameira et al., 2015; Lameira and Shumaker, 2019b; Wich et al., 2012), but this study is agnostic on this matter because its design does not allow to single out learning effects. Nonetheless, results show that temporal recursion occurs spontaneously in the wild in great ape vocal communication.

Can great apes hear recursive isochrony?

The observation that the long calls of wild orangutans possess isochronous characteristics raises questions about the ability of apes to perceive these signals. Humans perceive an acoustic pulse as a continuous pitch, instead of a rhythm, at rates higher than 30 Hz (i.e., 30 beats per second). Human and nonhuman great apes have similar auditory capacities (Quam et al., 2015), and there are limited skeletal differences in inner ear anatomy to suggest significantly distinct sensitivity, resolution, or activation thresholds in the time domain (Quam et al., 2015; Spoor and Zonneveld, 1998; Stoessel et al., 2023). Long call sub-pulses exhibited average rhythms at ~9.263 (sd: 3.994) Hz [i.e., tk = 0.184 (0.303) s]. Therefore, ear anatomy offers confidence that orangutans (and other great apes), like humans, perceive sub-pulse rhythmic motifs at these rates as such, that is, a train of signals, instead of one uninterrupted signal. Assuming otherwise would imply that auditory time resolution differs by more than one order of magnitude between humans and other great apes in the absence of obvious anatomical culprits.

Can physiology fully explain recursive isochrony?

The occurrence of three non-exclusive recursive patterns (i.e., three acoustically distinct sub-pulse calls occurring at three distinct tempi nested within the same pulse-level tempo) substantially decreases the probability that recursion was the by-product of anatomic constraints, such as vocal fold oscillation, breath length, heartbeat, and other physiological processes or movements (Pouw et al., 2020). Such processes can generate frequency patterns nested within others; however, in these cases, sub-frequencies occur in the form of harmonics related to the reference (dominant) frequency and to each other by small whole-numbered multiples. Yet, the three observed rhythmic arrangements at the sub-pulse level were not related to the pulse level by any small integer ratios (i.e., 1/22). Also, some of these processes (e.g., vocal fold action) are oscillatory in nature, involving nested frequency waves. They are not combinatorial, involving nested sequences of events, as we report here.

Our data stimulate new questions about the relationship between oscillators and combinatoriality, which is difficult to investigate from an observational point of view in the wild. Hopefully, our results will inspire new studies using controlled experimental settings to assess how oscillators and combinatoriality may be associated in ways potentially richer than thus far suspected. Together, our findings suggest that recursive isochrony is not the absolute result of raw mechanics but is instead likely generated or tampered with by, at least, one temporally recursive neuro-motor procedure.

Can a linear algorithm produce recursive isochrony?

The occurrence of three non-exclusive recursive patterns drives down the likelihood that orangutans concatenate long call pulses and sub-pulses in a linear fashion and without bringing into play a recursive neuro-motoric process. To generate the observed vocal motifs linearly, three independent neuro-computational procedures would need to run in parallel. These three independent procedures would need to be indistinguishable, transposable, and/or interchangeable at the pulse level, whilst generating distinct rhythms and acoustics at the sub-pulse level. If theoretically possible at all, one would predict some degree of interference between the three linear procedures at the pulse level, manifested in some form of deviation around the isochrony peak. However, this was not observed; distribution of data points on and off isochrony was equivalent between pulses and sub-pulses.

Precursor forms are not modern forms

Recursive self-embedded vocal motifs in orangutans indicate that vocal recursion among hominids is not exclusive to human vocal combinatorics, at least in the form of temporally embedded regular rhythms. This is not to suggest that orangutan recursive motifs exhibit all other properties that recursion exhibits in modern language-able humans, or that the two are the same, or equivalent. Further research will be necessary to fully unveil how orangutans use and control vocal recursion to form a clearer evolutionary picture. Expecting equivalence with language is, however, unwarranted as it would imply that no evolution would have occurred in over 10 million years since the split between the orangutan and human lineages. Any differences between our findings and recursion in today’s syntax, phonology, or music do not logically reject the possibility that recursive isochrony represents an ancient, and perhaps ancestral, state for the evolution of vocal recursion within the hominid family.

Implications for the evolution of recursion

Recursion and fractal phenomena are prevalent across the universe, from celestial and planetary movement to the splitting of tree branches, river deltas and arteries, and the morphology of bacteria colonies. Patterns within self-similar patterns are the norm, not the exception. This makes the seeming singularity of human recursion amongst animal vocal combinatorics all the more enigmatic. The discovery of recursive vocal patterns organized along two hierarchical temporal levels in a hominid besides humans suggests that ‘sequences within sequences’ may have been present in ancestral hominids, and hence, that second-order sequences may have predated the emergence of language in the human lineage.

Three major implications for the evolution of recursion in language apply. First, much ink has been spilled on the topic. Yet, the possibility of self-embedded isochrony, or non-exclusive self-embedded patterns occurring within the same signal sequence, has never been formulated or conjectured as a possible state of recursive signaling, be it in vertebrates, mammals, primates, or otherwise, extant or extinct. This suggests that controversy may have been sustained by data-poor circumstances on vocal combinatorics in wild great apes, which are only now starting to receive comprehensive research effort (Bortolato et al., 2023a; Bortolato et al., 2023b; Girard-Buttoz et al., 2022; but see Lameira et al., 2013a). Resolution may come through a re-evaluation of previous studies with further related taxa and with experimental tests designed within a richer and more articulated panorama of observations on vocal combinatorics in wild great apes. Recursive vocal patterning in a wild great ape in the absence of syntax, semantics, phonology, or music opens a new charter for possible incipient and transitional states of recursion among hominids. The open discussion of what properties make a structure proto-recursive will be essential to move the state of knowledge past antithetical, dichotomous notions of how recursion and syntax evolved (Berwick and Chomsky, 2019; Martins and Boeckx, 2019).

Second, our findings invite renewed interest and reanalysis of primate vocal combinatorics in the wild (Gabrić, 2022; Girard-Buttoz et al., 2022; Leroux et al., 2023; Leroux et al., 2021). Given the dearth of such data, findings imply that it may be too hasty to discuss whether combinatorial capacities in primates or birds are equivalent to those engaged in syntax (Engesser et al., 2015; Watson et al., 2020) or phonology (Bowling and Fitch, 2015; Rawski et al., 2021). Such classifications may be putting the proverbial cart before the horse; they are based on untested assumptions that may not have applied to proto-recursive ancestors (Kershenbaum et al., 2014; Miyagawa, 2021), for example, that syntax and phonology evolved as separate ‘modules’, that one attained modern form before the other, or that they evolved in hominids regardless of whether consonant-like and vowel-like calls were present or not.

Third, given that isochrony universally governs music and that recursion is a feature of music, findings could suggest a possible evolutionary link between great ape loud calls and vocal music. Loud calling is an archetypal trait in primates (Wich and Nunn, 2002), and among ancient hominids it could have preceded, and subsequently transmuted into, modern recursive vocal structures in humans, as found today in the form of song and chants. Given their conspicuousness, loud calls represent one of the most studied aspects of primate vocal behavior (Wich and Nunn, 2002), but their rhythmic patterns have only recently started to be characterized with precision (Clink et al., 2020; De Gregorio et al., 2021; Gamba et al., 2016). Besides our analyses, there are remarkably few confirmed cases of vocal isochrony in great apes (but see Raimondi et al., 2023). The behaviors that have been rhythmically measured with accuracy have been implicated in the evolution of percussion (Fuhrmann et al., 2015) and musical expression (Dufour et al., 2015; Hattori and Tomonaga, 2020), such as social entrainment in chimpanzees in connection with the origin of dance (Lameira et al., 2019a) (a capacity once also assumed to be neurologically impossible in great apes; Fitch, 2017; Patel, 2014). This opens the intriguing, tentative possibility that recursive vocal combinatorics were first and foremost a feature of proto-musical expression in human ancestors, later recruited and ‘re-engineered’ for the generation of linguistic combinatorics. Future studies probing for recursive call sequence patterns in other orangutan vocal contexts and other great ape vocal behaviors could help test this idea.

Concluding remarks

The presence of temporally recursive vocal motifs in a wild great ape revolutionizes how we can approach the evolution of recursion along the human lineage beyond all-or-nothing accounts. Future studies on primate vocal combinatorics, particularly undertaking a structural approach and in the wild, offer promising new paths to empirically assess possible precursors and proto-states for the evolution of recursion within the hominid family, also adding temporal recursion as a new layer of analysis. These crucial data on the evolution of recursion, language, and cognition along the human lineage will materialize if, as stewards of our planetary co-habitants, humankind secures the survival of nonhuman primates and the preservation of their habitats in the wild (Estrada et al., 2022; Estrada et al., 2017; Laurance, 2013; Laurance et al., 2012).

Materials and methods

Study site

We conducted our research at the Tuanan Research Station (2°09′S; 114°26′E), Central Kalimantan, Indonesia. Long calls were opportunistically recorded from identified flanged males (P. pygmaeus wurmbii) using a Marantz Analogue Recorder PMD222 in combination with a Sennheiser Microphone ME 64 or a Sony Digital Recorder TCD-D100 in combination with a Sony Microphone ECM-M907.

Acoustic data extraction

Audio recordings were transferred to a computer with a sampling rate of 44.1 kHz. Seven acoustic measures were extracted directly from the spectrogram window (window type: Hann; 3 dB filter bandwidth: 124 Hz; grid frequency resolution: 2.69 Hz; grid time resolution: 256 samples) by manually drawing a selection encompassing the complete long call (sub)pulse from onset to offset using Raven interactive sound analysis software (version 1.5, Cornell Lab of Ornithology). These parameters were duration (s), peak frequency (Hz), peak time, peak frequency contour average slope (Hz), peak frequency contour maximum slope (Hz), average entropy (Hz), and signal-to-noise ratio (NIST quick method). Please see the software’s documentation for a full description of the parameters. Acoustic data extraction complemented the classification of long call elements, at both the pulse and sub-pulse levels, which was based on close visual and auditory inspection of spectrograms and on elements’ distinctiveness from each other as well as from the remaining cataloged orangutan call repertoire (Hardus et al., 2009; see also Supplementary files 2–5). Of these parameters, duration and peak frequency in particular have been shown to be resilient across recording settings (Lameira et al., 2013b) and adequately represent variation in the time and frequency axes (Lameira et al., 2017b).

Rhythm data analyses

Inter-onset intervals (IOIs; t_k) were calculated only from the begin time (s) of each full-pulse and sub-pulse long call element, using Raven interactive sound analysis software as explained above. Each t_k was calculated only between subsequent (full/sub) pulse elements of the same type. Ratio values (r_k) were calculated as r_k = t_k/(t_k + t_{k+1}). Following the methodology of Roeske et al., 2020 and De Gregorio et al., 2021, to assess the significance of the peaks around isochrony (corresponding to an r_k value of 0.5), we counted the number of r_k values falling inside the on-isochrony range (0.440 < r_k < 0.555) and the off-isochrony ranges (0.400 < r_k < 0.440 and 0.555 < r_k < 0.600), which fall symmetrically on the left and right sides of the 1:1 ratio (r_k = 0.5). We tested the count of on-isochrony r_k values against the count of off-isochrony r_k values, per pulse type, with a generalized linear mixed model (GLMM) for the negative-binomial family, using the glmmTMB R library. In particular, we built a full model with the count of r_k values as the response variable and the pulse type in interaction with the range the observation fell in (on- or off-isochrony) as predictors. We added an offset weighting the r_k count by the width of the bin. The contributing individual was set as a random factor. We built a null model comprising only the offset and the random intercepts. We checked the number of residuals of the full and null models and compared the two models with a likelihood ratio test (ANOVA with the ‘Chisq’ argument). We calculated p-values for each predictor using the R summary function and performed pairwise comparisons for each level of the explanatory variables with the emmeans R package, adjusting all p-values with Bonferroni correction. We checked normality, homogeneity (via a function provided by R. Mundry), and number of the residuals. We checked for overdispersion with the performance R package (Lüdecke et al., 2021). Graphic visualization was prepared using the R (R Development Core Team, 2013) packages ggplot2 (Wickham, 2009) and ggridges (Wilke, 2022). Data reshaping and organization were managed with the dplyr and tidyr R packages.
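The ratio calculation and the on/off-isochrony counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis script used in the study: the function names and the onset times are hypothetical, and only the ratio formula and the range boundaries are taken from the text. The bin widths are also computed, since on- and off-isochrony ranges differ in width and raw counts are therefore not directly comparable (the reason the model includes an offset).

```python
from collections import Counter

# Range boundaries as given in the text.
ON_RANGE = (0.440, 0.555)
OFF_RANGES = ((0.400, 0.440), (0.555, 0.600))


def inter_onset_intervals(onsets):
    """Intervals between consecutive onsets of same-type elements."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def rhythm_ratios(intervals):
    """Ratio of each interval to the sum of itself and the next interval."""
    return [t / (t + t_next) for t, t_next in zip(intervals, intervals[1:])]


def classify(ratio):
    """Label a ratio as on-isochrony, off-isochrony, or outside both ranges."""
    if ON_RANGE[0] < ratio < ON_RANGE[1]:
        return "on"
    if any(lo < ratio < hi for lo, hi in OFF_RANGES):
        return "off"
    return "outside"


def isochrony_counts(onsets):
    """Return the ratios and the on/off/outside counts for one onset series."""
    ratios = rhythm_ratios(inter_onset_intervals(onsets))
    return ratios, Counter(classify(r) for r in ratios)


# Hypothetical onset times (s) for one element type within one long call.
onsets = [0.0, 1.0, 2.0, 3.3, 4.3]
ratios, counts = isochrony_counts(onsets)

# Widths of the counting ranges, usable to scale counts per unit of ratio.
on_width = ON_RANGE[1] - ON_RANGE[0]
off_width = sum(hi - lo for lo, hi in OFF_RANGES)

print([round(r, 3) for r in ratios])
print(dict(counts), round(on_width, 3), round(off_width, 3))
```

Two equal consecutive intervals give a ratio of exactly 0.5, whereas a lengthened interval pushes the neighbouring ratios away from 0.5 in opposite directions.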

Acoustic data analyses

Permuted discriminant function analysis with cross-classification was performed using R and a function provided by R. Mundry (Mundry and Sommer, 2007). The script was: pdfa.res=pDFA.crossed(test.fac="Sub-pulse-type", contr.fac="Individual.ID", variables=c("Delta.Time", "Peak.Freq", "Peak.Time", "PFC.Avg.Slope", "PFC.Max.Slope", "Avg.Entropy", "SNR.NIST.Quick"), …, n.sel=100, n.perm=1000, …). These analyses assured that long call elements, at the pulse and sub-pulse levels, indeed represented biologically distinct categories.
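The general logic of a permutation test on classification accuracy can be sketched as follows. This is a deliberately simplified stand-in, not the pDFA function used in the study: a nearest-centroid classifier replaces the discriminant function, accuracy is computed in-sample rather than by cross-classification, and the data are synthetic. All function names and values are hypothetical. The sketch assumes that element-type labels are shuffled only within each individual, so that individual identity is controlled for.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def accuracy(features, labels):
    """In-sample accuracy of a nearest-centroid classifier."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    centroids = {y: centroid(rows) for y, rows in groups.items()}

    def nearest(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return sum(nearest(x) == y for x, y in zip(features, labels)) / len(labels)


def permute_within(labels, blocks, rng):
    """Shuffle labels separately inside each block (here, each individual)."""
    shuffled = list(labels)
    by_block = {}
    for i, b in enumerate(blocks):
        by_block.setdefault(b, []).append(i)
    for idx in by_block.values():
        vals = [labels[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            shuffled[i] = v
    return shuffled


def permutation_p_value(features, labels, blocks, n_perm=199, seed=1):
    """Share of permutations classifying at least as well as the real labels."""
    rng = random.Random(seed)
    observed = accuracy(features, labels)
    as_extreme = sum(
        accuracy(features, permute_within(labels, blocks, rng)) >= observed
        for _ in range(n_perm)
    )
    return observed, (as_extreme + 1) / (n_perm + 1)


# Synthetic data: two element types, two individuals, two features
# (duration in s, peak frequency in Hz). Values are invented.
data_rng = random.Random(0)
features, labels, blocks = [], [], []
for individual in ("M1", "M2"):
    for label, (dur, freq) in (("A", (0.2, 300.0)), ("B", (0.6, 150.0))):
        for _ in range(10):
            features.append(
                [dur + data_rng.gauss(0, 0.02), freq + data_rng.gauss(0, 10.0)]
            )
            labels.append(label)
            blocks.append(individual)

observed, p_value = permutation_p_value(features, labels, blocks)
print(observed, p_value)
```

When the element types are acoustically distinct, shuffled labels rarely classify as well as the real ones, so the permutation p-value is small.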

Data availability

Raw data available at:

The following data sets were generated
    1. Lameira A
    (2023) Open Science Framework
    ID w3ne5. Recursive self-embedded vocal motifs in wild orangutans.

References


  1. Book
    1. Kabak B
    2. Revithiadou A
    (2009) An interface approach to prosodic word recursion
    In: Grijzenhout J, Kabak B, editors. Phonological Domains, Interface Explorations. Mouton de Gruyter. pp. 105–134.
  2. Book
    1. Chomsky N
    (2010) Some simple Evo Devo theses: how true might they be for language
    In: Larson RK, Deprez V, Yamakido H, editors. The Evolution of Human Language. Cambridge University Press. pp. 45–62.
    1. Hockett CF
    The origin of speech
    Scientific American 203:89–96.
  3. Book
    1. Idsardi WJ
    2. Gallego ÁJ
    3. Martin R
    (2018) Why is phonology different?
    In: Gallego ÁJ, Martin R, editors. Language, Syntax, and the Natural Sciences. Cambridge University Press. pp. 212–223.
    1. Laurance WF
    2. Useche DC
    3. Rendeiro J
    4. Kalka M
    5. Bradshaw CJA
    6. Sloan SP
    7. Laurance SG
    8. Campbell M
    9. Abernethy K
    10. Alvarez P
    11. Arroyo-Rodriguez V
    12. Ashton P
    13. Benítez-Malvido J
    14. Blom A
    15. Bobo KS
    16. Cannon CH
    17. Cao M
    18. Carroll R
    19. Chapman C
    20. Coates R
    21. Cords M
    22. Danielsen F
    23. De Dijn B
    24. Dinerstein E
    25. Donnelly MA
    26. Edwards D
    27. Edwards F
    28. Farwig N
    29. Fashing P
    30. Forget PM
    31. Foster M
    32. Gale G
    33. Harris D
    34. Harrison R
    35. Hart J
    36. Karpanty S
    37. Kress WJ
    38. Krishnaswamy J
    39. Logsdon W
    40. Lovett J
    41. Magnusson W
    42. Maisels F
    43. Marshall AR
    44. McClearn D
    45. Mudappa D
    46. Nielsen MR
    47. Pearson R
    48. Pitman N
    49. van der Ploeg J
    50. Plumptre A
    51. Poulsen J
    52. Quesada M
    53. Rainey H
    54. Robinson D
    55. Roetgers C
    56. Rovero F
    57. Scatena F
    58. Schulze C
    59. Sheil D
    60. Struhsaker T
    61. Terborgh J
    62. Thomas D
    63. Timm R
    64. Urbina-Cardona JN
    65. Vasudevan K
    66. Wright SJ
    67. Arias-G JC
    68. Arroyo L
    69. Ashton M
    70. Auzel P
    71. Babaasa D
    72. Babweteera F
    73. Baker P
    74. Banki O
    75. Bass M
    76. Bila-Isia I
    77. Blake S
    78. Brockelman W
    79. Brokaw N
    80. Brühl CA
    81. Bunyavejchewin S
    82. Chao JT
    83. Chave J
    84. Chellam R
    85. Clark CJ
    86. Clavijo J
    87. Congdon R
    88. Corlett R
    89. Dattaraja HS
    90. Dave C
    91. Davies G
    92. Beisiegel BM
    93. da Silva RN
    94. Di Fiore A
    95. Diesmos A
    96. Dirzo R
    97. Doran-Sheehy D
    98. Eaton M
    99. Emmons L
    100. Estrada A
    101. Ewango C
    102. Fedigan L
    103. Feer F
    104. Fruth B
    105. Willis JG
    106. Goodale U
    107. Goodman S
    108. Guix JC
    109. Guthiga P
    110. Haber W
    111. Hamer K
    112. Herbinger I
    113. Hill J
    114. Huang Z
    115. Sun IF
    116. Ickes K
    117. Itoh A
    118. Ivanauskas N
    119. Jackes B
    120. Janovec J
    121. Janzen D
    122. Jiangming M
    123. Jin C
    124. Jones T
    125. Justiniano H
    126. Kalko E
    127. Kasangaki A
    128. Killeen T
    129. King H
    130. Klop E
    131. Knott C
    132. Koné I
    133. Kudavidanage E
    134. Ribeiro JS
    135. Lattke J
    136. Laval R
    137. Lawton R
    138. Leal M
    139. Leighton M
    140. Lentino M
    141. Leonel C
    142. Lindsell J
    143. Ling-Ling L
    144. Linsenmair KE
    145. Losos E
    146. Lugo A
    147. Lwanga J
    148. Mack AL
    149. Martins M
    150. McGraw WS
    151. McNab R
    152. Montag L
    153. Thompson JM
    154. Nabe-Nielsen J
    155. Nakagawa M
    156. Nepal S
    157. Norconk M
    158. Novotny V
    159. O’Donnell S
    160. Opiang M
    161. Ouboter P
    162. Parker K
    163. Parthasarathy N
    164. Pisciotta K
    165. Prawiradilaga D
    166. Pringle C
    167. Rajathurai S
    168. Reichard U
    169. Reinartz G
    170. Renton K
    171. Reynolds G
    172. Reynolds V
    173. Riley E
    174. Rödel MO
    175. Rothman J
    176. Round P
    177. Sakai S
    178. Sanaiotti T
    179. Savini T
    180. Schaab G
    181. Seidensticker J
    182. Siaka A
    183. Silman MR
    184. Smith TB
    185. de Almeida SS
    186. Sodhi N
    187. Stanford C
    188. Stewart K
    189. Stokes E
    190. Stoner KE
    191. Sukumar R
    192. Surbeck M
    193. Tobler M
    194. Tscharntke T
    195. Turkalo A
    196. Umapathy G
    197. van Weerd M
    198. Rivera JV
    199. Venkataraman M
    200. Venn L
    201. Verea C
    202. de Castilho CV
    203. Waltert M
    204. Wang B
    205. Watts D
    206. Weber W
    207. West P
    208. Whitacre D
    209. Whitney K
    210. Wilkie D
    211. Williams S
    212. Wright DD
    213. Wright P
    214. Xiankai L
    215. Yonzon P
    216. Zamzani F
    (2012) Averting biodiversity collapse in tropical forest protected areas
    Nature 489:290–294.
    1. Martins MD
    (2012) Distinctive signatures of recursion
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 367:2055–2064.
  4. Book
    1. Nasukawa K
    (2015) Recursion in the lexical structure of morphemes
    In: van Oostendorp M, van Riemsdijk H, editors. Representing Structure in Phonology and Syntax. De Gruyter. pp. 211–238.
  5. Software
    1. R Development Core Team
    (2013) R: A language and environment for statistical computing
    R Foundation for Statistical Computing, Vienna, Austria.
  6. Conference
    1. Sharma N
    2. Chimalakonda S
    (2018) Learning Recursion from Music and Music from Recursion
    2018 IEEE 18th International Conference on Advanced Learning Technologies (ICALT). pp. 257–261.
  7. Book
    1. Vogel I
    (2012) Recursion in phonology
    In: Botma B, Noske R, editors. Phonological Explorations: Empirical, Theoretical and Diachronic Issues. Berlin, Boston: De Gruyter. pp. 41–62.
  8. Book
    1. von Humboldt W
    Über Die Verschiedenheit Des Menschlichen Sprachbaues Und Ihren Einfluss Auf Die Geistige Entwickelung Des Menschengeschlechts
    Königlichen Akademie der Wissenschaften.

Peer review

Reviewer #1 (Public Review):

This study investigates the structuring of long calls in orangutans. The authors demonstrate long calls are structured around full pulses, repeated following a regular tempo (isochronic rhythm). These full pulses are themselves structured around different sub-pulses, themselves repeated following an isochronic rhythm. The authors argue this patterning is evidence for self-embedded, recursive structuring in orangutan long calls.

The analyses conducted are robust and compelling and they support the rhythmicity the authors argue is present in the long calls. Furthermore, the authors went above and beyond and confirmed acoustically the sub-categories identified were accurate.

Reviewer #3 (Public Review):

Summary: This paper presents evidence of recursive self-embedding in the vocalization structure of orangutans, using fine-grained acoustical analysis. It proves the existence of isochrony nested in isochrony in the motifs produced by a nonhuman vocal system.

Strengths: Very clearly written, clear analysis, excellent responses to the Reviewers.

Weaknesses: Jargon-heavy language could be reduced. A video showing the sound as it unfolds and the spectrogram (as in Fig 1A) of the long call could be useful to best exemplify the results.

Author response

The following is the authors’ response to the original reviews.

Thank you very much for forwarding these two important reviews on our paper. Please find hereby our point-by-point responses addressing the ideas, arguments and points of concern raised by the reviewers. We provide explanation of how these points have been incorporated in the paper.

We feel the review process has been a useful exercise and that the paper has greatly benefited in terms of clarity and accessibility. It is our hope that our findings may ignite renewed interest in unexplored and “unexpected” aspects of great ape vocal communication, inspire novel research, and invite bold new advances on the long-standing puzzle of language origins and evolution.

In several relevant sections, we have also sought to explicitly address the point of doubt raised in eLife’s editorial assessment, published alongside the reviewed preprint of our paper. The editorial assessment stated that “…However the evidence provided to support the major claims of the paper is currently incomplete. Specifically, it is not yet clear how the rhythmic structuring found in these long calls is more similar to human language recursion per se rather than isochrony as a broader, more common phenomenon.” To directly clarify this point, we now provide various examples of how recursion is distinct from repetition, using everyday objects for an intuitive understanding (e.g., lines 43-51). We have also expanded the discussion to better contextualise and clarify the implications of our findings for language evolution theory. We hope this will help address the implicit request for clarification in the previous editorial assessment.

Thank you very much for your kind and dedicated attention in the processing of our study.

Public Reviews:

Reviewer #1 (Public Review):

This study investigates the structuring of long calls in orangutans. The authors demonstrate long calls are structured around full pulses, repeated following a regular tempo (isochronic rhythm). These full pulses are themselves structured around different sub-pulses, themselves repeated following an isochronic rhythm. The authors argue this patterning is evidence for self-embedded, recursive structuring in orangutan long calls.

The analyses conducted are robust and compelling and they support the rhythmicity the authors argue is present in the long calls. Furthermore, the authors went above and beyond and confirmed acoustically the sub-categories identified were accurate.

We thank the reviewer for this important support regarding our methods and findings.

However, I believe the manuscript would benefit from a formal analysis of the specific recursive patterning occurring in the long call. Indeed, as of now, it is difficult for the reader to identify what the authors argue to be recursion and distinguish it from simple repetitions of motifs, which is essential.

We agree with the reviewer that the distinction between repetition and recursion is very important for the adequate interpretation of our findings. Following the reviewer’s point (and the Editorial Assessment), we have now rephrased several passages in the initial paragraph of the paper for added clarity, where recursion is introduced and explained. We now also provide various new examples of recursion in everyday life and popular culture to better illustrate, in an easy and accessible way, the fundamental nature of recursion. We then use two of these common examples (computer folders and Russian dolls) to specifically distinguish repetition from recursion.

Although the authors already discuss briefly why linear patterning is unlikely, the reader would benefit from expanding on this discussion section and clarifying the argument here (a lay terminology might help).

Corrected accordingly.

I believe an illustration here might help. In the same logic, I believe a tree similar to the trees used in linguistics to illustrate hierarchical structuring would help the reader understand the recursive patterning in place here. This would also help get the "big picture", as Fig 1A is depicting a frustratingly small portion of the long call.

We completely understand the reviewer’s concern here. As proposed by the reviewer, and in addition to changes in the Introduction (see above) and Discussion (see below), we have now added a new figure in the Discussion to help the reader get the “big picture” of our findings.

We have also made revisions throughout the Introduction and Discussion to simplify the text, clarify our exposition, and help the reader understand the nature and relevance of our results better and more intuitively.

Notwithstanding these comments, this paper would provide crucial evidence for recursion in the vocal production of a non-human ape species. The implication it would have would represent a key shift in the field of language evolution. The study is very elegant and well-constructed. The paper is extremely well written, and the point of view adopted is original, well-argued and compelling.

We are humbled by the reviewer’s words, and we thank the reviewer for attributing these qualities to our paper. This feedback reassures us of the disruptive potential that these and similar future findings may have on our understanding of language evolution.

Reviewer #2 (Public Review):

I am not qualified to judge the narrow claim that certain units of the long calls are isochronous at various levels of the pulse hierarchy. I will assume that the modelling was done properly. I can however say that the broad claims that (i) this constitutes evidence for recursion in non-human primates, (ii) this sheds light on the evolution of recursion and/or language in humans are, when not made trivially true by a semantic shift, unsupported by the narrow claims. In addition, this paper contains errors in the interpretation of previous literature.

We report the first confirmed case of “vocal sequences within vocal sequences” in a wild nonhuman primate, namely a great ape. The currently prevailing models of language evolution often rest on the (purely theoretical) premise that such structures do not exist in any animal bar humans. We find the discovery of such structures in a wild great ape exciting, remarkable, and promising. We regret that the reviewer does not share this sentiment with us. We feel that the statement that these findings are trivial and narrow is unfounded.

In order to clarify and better communicate the significance of our findings, we now explain in more detail in the Introduction and Discussion how the discovery of nested isochrony in wild orangutans promises to stimulate new series of studies in nature and captivity. Our findings dovetail nicely with previous captive studies that have shown that animals can learn how to recognise recursive patterns and invite new research efforts for the investigation of recursive abilities in the wild and in the absence of human priming and in nonhuman primates.

The main difficulty when making claims about recursion is to understand precisely what is meant by "recursion" (arguably a broader problem with the literature that the authors engage with). The authors offer some characterization of the concept which is vague enough that it can include anything from "celestial and planetary movement to the splitting of tree branches and river deltas, and the morphology of bacteria colonies". With this appropriately broad understanding, the authors are able to show "recursion" in orangutans' long calls. But they are, in fact, able to find it everywhere.

The reviewer is correct in highlighting that recursion is ubiquitous in nature, and this is something that we explicitly state in the paper. This only makes it all the more surprising that, when it comes to vocal combinatorics, recursion has only been described in human language and music, but in no other animals. If studies providing such evidence are known to the reviewer, we kindly request the corresponding references.

In the new revised version, we have paid attention to this aspect raised by the reviewer, and we have sought to disambiguate that our observations pertain to temporal recursion. This clarification will hopefully allow a better understanding of our results.

The sound of a plucked guitar string, which is a sum of self-similar periodic patterns, counts as recursive under their definition as well.

The example pointed out here by the reviewer is factually correct; sound harmonics represent a recursive pattern of a fundamental frequency. (In fact, we explain this phenomenon in the Discussion.) The reviewer’s comment seems to offer an analogy to oscillatory phenomena in the physiology of the vocal folds, and so it is misplaced with regard to our present study, which focused on vocal sequences. Admittedly, this misinterpretation may have been implicitly caused by our wording, and we apologise for this. We now refer to “vocal combinatorics” instead of “vocal production” throughout the paper to avoid the reader considering that our findings pertain to the physiology of the vocal folds.

One can only pick one's definition of recursion, within the context of the question of interest: evolution of language in humans. One must try to name a property which is somewhat specific to human language, and not a ubiquitous feature of the universe we live in, like self-similarity. Only after having carved out a sufficiently distinctive feature of human language, can we start the work of trying to find it in a related species and tracing its evolutionary history. When linguists speak of recursion, they speak of in principle unbounded nested structure (as in e.g., "the doctor's mother's mother's mother's mother ..."). The author seems to acknowledge this in the first line of the introduction: "the capacity to iterate a signal within a self-similar signal" (emphasis added). In formal language theory, which provides a formal and precise definition of one notion of recursivity appropriate for human language, unbounded iteration makes a critical difference: bounded "nested structures" are regular (can be parsed and generated using finite-state machines), unbounded ones are (often) context-free (require a more sophisticated automaton). The hierarchy of pulses and sub-pulses only has a fixed number of layers, moreover the same in all productions; it does not "iterate".

The reviewer explains here how recursion, in its fully fledged form in modern language(s), is defined by linguistics. We fully agree and do not contest such descriptions and definitions in any way. These descriptions and definitions aim to describe how recursion operates today, not how it evolved. Nor do these descriptions and definitions generate data-driven, testable predictions about precursors or proto-states of recursion as used by modern language-able humans. This is scientifically problematic and heuristically unsatisfying regarding the open question of language evolution.

Following human-specific definitions for recursion, as proposed by the reviewer, cannot per se be used to undertake a comparative approach to evolution because they leave nothing to compare recursion with in other (wild) species. Using human-specific definitions unavoidably leads to black-and-white notions that language is always absolutely present in humans and always absolutely absent in other animals, regardless of their degree of relatedness to humans. It is unpreventable that these descriptions flout foundational principles of evolution, such as descent with modification and shared ancestry.

This conceptual problem is not new. Less than a century ago, it was believed that humans were the only tool-user (thousands of examples are known today in nonhuman animals, including fish and invertebrates), and later, that humans were the only cultural animal (today it is known that migrating caribou and fruit flies can establish traditions based on social learning). We must follow in the footsteps of those who have helped redefine human nature in the past. As famously stated by Louis Leakey when presented with evidence for chimpanzee tool-use collected by Jane Goodall, “Now we must redefine tool, redefine man, or accept chimpanzees as human”. Therefore, as a matter of course, we must redefine recursion, embracing empirical (rather than purely theoretical) definitions that allow recursion to take on forms and functions different from those of modern language-able humans.

Another point is that the authors don't show that the constraints that govern the shape of orangutans long calls are due to cognitive processes.

The reviewer is indeed correct. This does not, however, refute our findings. We do not directly show that cognitive processes govern recursion in orangutan long calls. Instead, we show that the observed patterns cannot be explained by simple bodily or motoric processes, excluding therefore low-level explanations. With more than 50 years of accumulated field experience in primatology, this was the only possible way that our team found to go about conducting research and analyses on natural behaviour, in the wild, with a critically endangered primate. We would be very interested in learning from the reviewer what ethical and non-invasive methods, specific locations in the wild, and type of behavioural or socio-ecological data could be otherwise viably used to demonstrate what the reviewer requests. If other scientists believe that the patterns observed in wild orangutan long calls – three independent, but simultaneously-occurring recursive motifs – can be generated based on low-level physiological mechanisms alone, the burden of proof resides with them.

Any oscillating system will, by definition, exhibit isochrony.

We disagree with this statement. The example provided above by the reviewer disproves the statement: a guitar string when struck is an oscillating system, but it is neither isochronous nor combinatorial. Isochrony cannot be established with single events, only with event sequences (in practice, ideally >3).

For instance, human trills produce isochronous or near isochronous pulses. No cognitive process is needed to explain this; this is merely the physics of the articulators. Do we know that the rhythm of the pulses and sub-pulses in orangutans is dictated by cognition as opposed to the physics of the articulators?

The reviewer seems to misinterpret our results here. Our focus is on vocal combinatorics, not vocal fold oscillation (see previous response). We have now reworded all instances where the text could be unclear.

Even granting the authors' unjustified conclusion that wild orangutans have "recursive" structures and that these are the result of cognition, the conclusions drawn by the authors are too often fantastic leaps of induction. Here is a cherry-picked list of some of the far-fetched conclusions:

  • "our findings indicate that ancient vocal patterns organized across nested structural strata were likely present in ancestral hominids". Does finding "vocal patterns organized across nested structural strata" in wild orangutans suggest that the same were present in ancestral hominids?

Following the reviewer’s comment, we have now rephrased and toned down this passage, stating that such structures “may have been present” in ancestral hominids. We are grateful to the reviewer for this comment.

  • "given that isochrony universally governs music and that recursion is a feature of music, findings (sic.) suggest a possible evolutionary link between great ape loud calls and vocal music". Isochrony is also a feature of the noise produced by cicadas. Does this suggest an evolutionary link between vocal music and the noise of cicadas?

We apologise, but it is unclear what the reviewer is exactly suggesting or proposing here. It seems as though it is believed that cicadas are as phylogenetically related to humans as great apes are. Our last common ancestor with great apes diverged about 10mya, but with cicadas 600mya. The last common ancestor with great apes was a great ape (or hominid). The human-cicada last common ancestor would have looked like a worm (it is probable it would already have a nervous bulge at the head, or “brain”). In order to avoid similar misinterpretations, we have now clarified in several instances that our study and interpretation of results are based on shared ancestry within the Hominid family.

It seems that the reviewer may also be misinterpreting our findings. We do not simply report isochrony in a wild great ape (multiple references for isochronous calls in primates are provided in the Discussion). We report isochrony within isochrony in three non-exclusive rhythmic arrangements. In case the reviewer knows of a study on cicadas, or any non-human species, showing recursive sound combinatorics of this nature, we kindly request the citation. We can only hope that such new cases may be gradually unveiled in wild animals to help propel our general understanding of the possible ways in which incipient recursive vocal combinatorics in ancient hominids could have given rise to recursion as used today by language-able modern humans.

Finally, some passages also reveal quite glaring misunderstandings of the cited literature. For instance:

  • "Therefore, the search for recursion can be made in the absence of meaning-base operations, such as Merge, and more generally, semantics and syntax". It is precisely Chomsky's (disputable) opinion that the main operation that governs syntax, Merge, has nothing to do with semantics. The latter is dealt with within a putative conceptual-intentional performance system (in Chomsky's terminology), which is governed by different operations.

Following the reviewer’s comment, we have now removed “meaning-base operations, such as Merge, and more generally” from the target sentence in order to avoid confusion. Thank you.

  • "Namely, experimental stimuli have consisted of artificial recursive signal sequences organized along a single temporal scale (though not structurally linear), similarly with how Merge and syntax operate". The minimalist view advocated by Chomsky assumes that mapping a hierarchical structure to a linear order (a process called linearization) is part of the articulatory-perceptual system. This system is likewise not governed by Merge and is not part of "syntax" as conceived by the Chomskyan minimalists.

Following the reviewer’s comment, we have now omitted the target sentence for added clarity.

Reviewer #1 (Recommendations For The Authors):

L55-67: I feel there is a step missing in the logic of the argumentation here. The studies cited by the authors here are mostly about syntactic-like structuring but not recursion. Hence when the authors mention in the next sentence that these studies investigate the perception of recursive signalling, it seems incorrect. I agree with the logic, but the references do not seem appropriate. I would further suggest that if there are no other references, that would make the introduction of the study here even easier: there is very little work investigating this capacity in non-human animals, let alone from a production perspective, therefore, the study conducted here is paramount and fills this important gap in the literature.

We are grateful to Reviewer #1 for these comments, and we are honoured to hear that our findings are filling a literature gap. We have now carefully revised the manuscript, hopefully streamlining our line of reasoning and improving the paper’s overall readability. We agree that there is very little work investigating the spontaneous “production” of recursion in nonhuman animals. We decided to better detail the logic of our paper by clarifying the difference between recursion and repetition and clarifying that the motifs that we identify in wild orangutans represent a case of "temporal recursion".

L59: Johan J should be removed (same in discussion).

Removed, thanks.

L60: For example is repeated twice, here and L55.

We have rephrased this part of the manuscript, thanks.

L72-73: If we consider the Watson et al., 2020 study an example of recursive perception (which I do not think is true), this was conducted using a passive design - i.e. with no active training.

We have rephrased this part of the manuscript, thanks.

L240-241: Again, non-adjacent dependency processing does not equal recursion.

We agree that non-adjacent dependency processing does not equal recursion. We have now clarified this section accordingly.

L269: one of the most.

Corrected, thanks.

L296: add space after settings.

Corrected, thanks.

Reviewer #2 (Recommendations For The Authors):

In addition to the public portion of the review, I advise the authors to substantially alter their style of writing. The language used is not accurate and the intended meaning is often not clear. This makes it hard for any reader to follow the authors' reasoning fully. Below I list only a few of the egregious examples, but the examples abound:

  • "this hints at a neuro-cognitive or neuro-computational transformation in the human brain": what meaning do the authors assign to "neuro-cognitive" and "neuro-computational"? What difference do they place between the two (so that they would be disjoined)? What "transformation" are we talking about? From what to what?

  • "However, recursive signal structures can also unfold in other manners, such as across nested temporal scales and in the absence of semantics (Fitch, 2017a), as in music.": what is meant here by nested temporal scales?

  • "The simultaneous occurrence of non-exclusive recursive patterns excludes the likelihood that orangutans concatenate long calls and their subunits in linear structure without any recursive processes": isn't there a more straightforward way to say "excludes the likelihood"? What is meant by "non-exclusive recursive patterns"?

It seems that Reviewer #2 does not share our writing style. Nonetheless, we have tried to meet the reviewer halfway, clarifying throughout the new revised version our definitions, our line of argument, our motivations, our results, the context of our findings in what is known about recursion in animals, and the implication of our discovery for language evolution theory.

Article and author information

Author details

  1. Adriano R Lameira

    Department of Psychology, University of Warwick, Coventry, United Kingdom
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3071-4824
  2. Madeleine E Hardus

    Independent Researcher, Warwick, United Kingdom
    Resources, Data curation, Investigation, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Andrea Ravignani

    1. Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
    2. Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark
    3. Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
    Resources, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Teresa Raimondi

    Department of Life Sciences and Systems Biology, University of Turin, Torino, Italy
    Resources, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Marco Gamba

    Department of Life Sciences and Systems Biology, University of Turin, Torino, Italy
    Conceptualization, Resources, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9545-2242


Funding

UK Research and Innovation (MR/T04229X/1)

  • Adriano R Lameira

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


Acknowledgements

We thank the Indonesian Ministry of Research and Technology, the Indonesian Ministry of Environment and Forestry, the Indonesian Ministry of Home Affairs, the Directorate General of Natural Resources and Ecosystem Conservation, and the former Directorate General of Forest Protection and Nature Conservation for authorization to carry out research in Indonesia; the Universitas Nasional for supporting the project and acting as sponsor and counterpart; and the Borneo Orangutan Survival Foundation and the MAWAS Programme in Palangkaraya for their support and permission to stay and work in the MAWAS Reserve. ARL was supported by a UK Research and Innovation Future Leaders Fellowship (grant agreement number MR/T04229X/1).


Ethics

This study was performed in strict accordance with Indonesian law and the recommendations of the Indonesian Institute of Sciences (LIPI), the Universitas Nasional (UNAS) and the Borneo Orangutan Survival Foundation (BOS).

Senior Editor

  1. George H Perry, Pennsylvania State University, United States

Reviewing Editor

  1. Ammie K Kalan, University of Victoria, Canada

Version history

  1. Preprint posted: April 14, 2023
  2. Sent for peer review: April 14, 2023
  3. Preprint posted: July 12, 2023
  4. Preprint posted: December 22, 2023
  5. Version of Record published: January 22, 2024 (version 1)



© 2023, Lameira et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

  1. Adriano R Lameira
  2. Madeleine E Hardus
  3. Andrea Ravignani
  4. Teresa Raimondi
  5. Marco Gamba
Recursive self-embedded vocal motifs in wild orangutans
eLife 12:RP88348.
