A meta-analysis of the association between male dimorphism and fitness outcomes in humans

  1. Linda H Lidborg (corresponding author)
  2. Catharine Penelope Cross
  3. Lynda G Boothroyd
  1. Department of Psychology, Durham University, United Kingdom
  2. School of Psychology and Neuroscience, University of St Andrews, United Kingdom

Abstract

Humans are sexually dimorphic: men and women differ in body build and composition, craniofacial structure, and voice pitch, likely mediated in part by developmental testosterone. Sexual selection hypotheses posit that, ancestrally, more ‘masculine’ men may have acquired more mates and/or sired more viable offspring. Thus far, however, evidence for either association is unclear. Here, we meta-analyze the relationships between six masculine traits and mating/reproductive outcomes (96 studies, 474 effects, N = 177,044). Voice pitch, height, and testosterone all predicted mating; however, strength/muscularity was the strongest and only consistent predictor of both mating and reproduction. Facial masculinity and digit ratios did not significantly predict either. There was no clear evidence for any effects of masculinity on offspring viability. Our findings support arguments that strength/muscularity may be sexually selected in humans, but cast doubt regarding selection for other forms of masculinity and highlight the need to increase tests of evolutionary hypotheses outside of industrialized populations.

Editor's evaluation

This paper presents a series of meta-analyses to test the plausibility of sexual selection hypotheses for the origins and/or maintenance of six sexually differentiated traits in humans, with strength/muscularity found to be significantly associated with both mating and reproduction. The authors considered both published and unpublished datasets to help account for potential publication biases that could theoretically impact meta-analysis results, which is a particular strength of this study.

https://doi.org/10.7554/eLife.65031.sa0

eLife digest

Many species show sexual dimorphism: traits that are different or more exaggerated in either females or males. These traits are often thought to have evolved because they increase an individual’s chances of producing offspring. While the evolution of male dimorphism – often referred to as masculinity – is generally well understood in many animal species, opinions differ as to whether such traits also increase male reproduction in humans.

Lidborg et al. tried to shed light on the evolution of masculine traits in human males (such as a more robust-looking facial structure, and increased strength and muscularity) by testing whether men with these traits reported having more sexual partners and/or whether they had more children compared to men in which these traits were not as extreme. To do so, Lidborg et al. compiled previously published data from populations all across the world and tested the associations between the traits and both partner numbers and reproduction.

The results showed that men who were physically stronger and more muscular reported having more sexual partners and, in societies that do not use contraception, these men also had more children than other men. Lidborg et al. also found that in industrialized societies, men who were taller and had a lower voice pitch and higher testosterone levels reported more sexual partners, but they did not produce more offspring. Lastly, the analysis showed that men with more robust facial structures did not report having more partners or more children.

These findings suggest that traits such as strength and muscle mass in men may be favoured by evolution. Importantly, this seems to be the case across all societies from which Lidborg et al. analyzed data. The results also show that some of the traits Lidborg et al. tested – such as being tall – might increase the number of partners men in industrialized countries have, but not the number of children men in more traditional societies (such as hunter-gatherers) produce. This could be because women’s preferences for men’s traits differ between cultures.

Ultimately, Lidborg et al.’s analysis suggests that across different cultural contexts, only strength and muscularity truly do seem to matter for men’s mating and reproduction.

Introduction

Sexual dimorphism and masculinity in humans

Sexual dimorphism refers to sex differences in morphological and behavioral traits, excluding reproductive organs (Plavcan, 2001), with particular emphasis on traits thought to have evolved through sexual selection (Crook, 1972). Humans are a sexually dimorphic species (Plavcan, 2001). Sexual selection in mammalian species, including human and non-human primates, is commonly argued to have acted more strongly on male traits, as a consequence of greater variance in males’ reproductive output (Hammer et al., 2008) and a male-biased operational sex ratio, that is, a surplus of reproductively available males relative to fertile females (e.g. Mitani et al., 1996).

Dimorphic traits that are exaggerated in males are typically referred to as masculine. In humans, masculine faces are characterized by features such as a pronounced brow ridge, a longer lower face, and wider mandibles, cheekbones, and chins (Swaddle and Reierson, 2002). Men are, on average, 7–8% taller than women (Gray and Wolfe, 1980) and weigh approximately 15% more (Smith and Jungers, 1997). Relative to this fairly modest body size dimorphism, upper body musculature and strength are highly dimorphic in humans: compared to women, men have 61% more overall muscle mass, and 90% greater upper body strength (Lassek and Gaulin, 2009). Men’s bodies also tend to have a V- or wedge-shape, showing a greater shoulder-to-hip ratio (Hughes and Gallup, 2003; Singh, 1993) and waist-to-chest ratio (Maisey et al., 1999; Weeden and Sabini, 2007) than women’s. Second-to-fourth finger (digit) length ratios are often claimed to be sexually dimorphic, with men’s 2D:4D typically being lower than women’s (Manning, 2002, although this may not be universal: Apicella et al., 2016). In addition, fundamental frequency, commonly referred to as voice pitch, is nearly six standard deviations lower in men than in women (Puts et al., 2012).

The development of these masculine traits in men is influenced by exposure to androgens, particularly testosterone. With the exception of 2D:4D, which is commonly claimed to be influenced primarily by prenatal testosterone levels and is present at birth (Galis et al., 2010; but see Richards et al., 2019), masculine traits generally develop or become exaggerated following a surge in testosterone production at sexual maturity (Butterfield et al., 2009; Fechner, 2003; Weston et al., 2007) – although it is not necessarily clear whether the size of that surge corresponds directly to the extent of trait expression.

Proposed mechanisms underlying the evolution of masculine traits

Key to the assumption that men’s masculine traits are sexually selected is that masculine traits should be reliably associated with greater biological fitness. Men may increase fitness by producing a greater quantity of offspring overall (i.e. greater fertility), by acquiring a greater number of partners which may in turn mediate offspring numbers (greater mating success), and/or by producing more surviving offspring (greater reproductive success).

Two key hypotheses and attendant mechanisms have been drawn on by evolutionary behavioral scientists, predicting positive associations between masculinity and fitness outcomes. Firstly, according to the immunocompetence handicap hypothesis (Folstad and Karter, 1992), masculine traits are a costly signal of heritable immunocompetence, that is, good genetic quality, due to the putative immunosuppressive properties of testosterone (see Muehlenbein and Bribiescas, 2005). Masculine men should therefore produce healthier and more viable offspring, who are more likely to survive. Thus, women should be able to increase their fitness (via offspring survival) by selecting masculine men as mates. Authors therefore suggested that masculinity in men is intersexually selected, evolved and/or maintained through female choice, and should be associated with greater mating success in contexts where women are able to exercise choice. This should thus result in greater reproductive success, and an advantage in offspring survival.

The immunocompetence handicap hypothesis has persisted in the literature, particularly with reference to facial masculinity (although there are no a priori reasons to expect this putative mechanism to act more strongly on men’s faces than on their bodies), despite concerns regarding its validity since at least 2005 (Boothroyd et al., 2005). While beyond the scope of this article, common criticisms include that the relationship between testosterone and health is complex (Nowak et al., 2018), and facial masculinity is inconsistently linked to health (e.g. Boothroyd et al., 2013; Foo et al., 2020; Marcinkowska et al., 2019; Scott et al., 2013; Zaidi et al., 2019). Evidence is similarly mixed regarding the key assumption that women are attracted to masculinity in men’s faces (Boothroyd et al., 2013; Zeigler-Hill et al., 2015) and bodies (Frederick and Haselton, 2007; Gray and Frederick, 2012; Lukaszewski et al., 2014; Sell et al., 2017).

Secondly, under the male-male competition hypothesis, authors have argued that formidable (i.e. physically strong and imposing) men are better equipped to compete with other men for resources, status, and partners (Hill et al., 2016; Puts, 2016), through e.g. direct physical contests or by deterring rivals indirectly (Hill et al., 2016; Sell et al., 2012). For instance, increased musculature may intimidate competitors by signaling fighting prowess (Sell et al., 2009) and strength (Durkee et al., 2018), while facial masculinity and voice pitch may also have an indirect relationship with perceived formidability (Butovskaya et al., 2018; Haselhuhn et al., 2015; Raine et al., 2018; Little et al., 2015; Puts and Aung, 2019; Scott et al., 2014). Importantly, while male-male competition is often framed as an alternative to female choice, women may preferentially mate with both well-resourced men, and with competitive men, facilitating intersexual selection for masculinity (i.e. a ‘sexy sons’ effect, see Weatherhead and Robertson, 1979) where male status is due to, or competitiveness is cued by, formidability (Scott et al., 2013). Some authors have suggested that formidability increases men’s mating success through dominance over other men (which may create the circumstances that women select them as mates) rather than women’s direct preferences for formidable traits per se (Hill et al., 2013; Kordsmeyer et al., 2018; Slatcher et al., 2011). However, regardless of whether the driving mechanism is intra- or intersexual selection (or a combination thereof), the male-male competition hypothesis predicts that formidable men will acquire more partners over their lifetime, which will in turn result in more offspring. This approach, however, does not make any particular predictions regarding offspring health or survival.

It can be noted that proponents of both the immunocompetence and male-male competition hypotheses have also suggested that more masculine men may show reduced investment in romantic relationships and in offspring (Booth and Dabbs, 1993; Boothroyd et al., 2007; Muller et al., 2008; Schild et al., 2020), potentially suppressing offspring health/survival. This could arise from an association between circulating testosterone (which masculine traits are commonly argued to index) and motivation for sexual behavior (Grebe et al., 2019; Halpern et al., 1993) shifting effort away from parental investment toward pursuit of mating opportunities. Two important caveats here, however, are that the relationship between men’s testosterone levels in adolescence (when most masculine traits become exaggerated) and in adulthood is exceedingly weak (van Bokhoven et al., 2006), and masculine trait expression in adulthood is not consistently correlated with adult testosterone levels (e.g. Lefevre et al., 2013; Peters et al., 2008). Simply being more attractive to potential new partners, however, might shift behavior away from relationship investment (for discussion see e.g. Gangestad and Simpson, 2000). Because of this, many authors have previously suggested that women face a trade-off between the (health or competitive) benefits of masculinity, and paternal investment.

The association between masculine traits and biological fitness

We therefore have at least two theoretical positions which assert that masculine men should have greater numbers of sexual partners, greater offspring numbers, and perhaps a greater proportion of surviving offspring, in at least some circumstances. Studies addressing these predictions in societies without effective contraception have done so directly via offspring numbers and/or offspring survival. In most industrialized populations, where access to contraceptives attenuates the relationship between sexual behavior and reproductive success, mating success measures are often used instead. These include preferences for casual sex, number of sexual partners, and age at first sexual intercourse (earlier sexual activity allows for a greater lifetime number of sexual partners), as these are assumed to have correlated with reproductive success in men under ancestral conditions (Pérusse, 2010).

A key problem, however, is that the predictions outlined above do not always capture the diversity of human reproductive ecologies even where diverse data exist. We have already noted above that female choice may be important to outcomes. Furthermore, even amongst non-contracepting populations, differences in rates of polygyny, pair-bond breakdown, and attitudes to fertility may moderate reproductive success and its variance. For instance, monogamous cultures do not typically show greater variance in men’s versus women’s reproductive success (Brown et al., 2009) and while increasing numbers of sexual partners (e.g. in serially monogamous or polygynous cultures) may often be important for increasing male reproductive success, the inverse is true amongst the Pimbwe where women are more advantaged by increased numbers of partners (Borgerhoff Mulder, 2009). Similarly, although the strongly monogamous Agta show high rates of fertility (Boothroyd et al., 2017), data from ostensibly non-contracepting rural Catholics in 20th-century Poland (Pawlowski et al., 2008) show much lower rates of fertility. These issues highlight that humans have likely had diverse reproductive and pair-bonding norms for a long time. As such we can make two observations. Firstly, availability of contraception in low-fertility samples might ‘free’ sexual behavior from the constraints of pregnancy avoidance, and we might find stronger relationships between any evolved motivation for sex, and actual sexual behavior, in these samples than would have necessarily been found ancestrally. Secondly, however, any adaptation which has been maintained across recent hominid lineages must have been adaptive on average across diverse reproductive ecologies. As such, if the proposed adaptation (masculinity leading to enhanced reproductive success via mating, and possibly increased offspring survival) exists, we should expect to see both: i. masculinity being associated with increased mating success in both high and (perhaps especially) low fertility populations, and ii. masculinity being on average positively associated with fertility, and potentially offspring survival, in non-contraception/high fertility populations.

Meta-analysis in sexual selection

Meta-analysis can be a valuable tool in understanding overall patterns in evolutionarily relevant traits, both across and within species. Jennions and colleagues (Jennions et al., 2012) noted that many traits hypothesized to predict male mating success had not been subject to meta-analysis, and further argued that while such meta-analyses can be valuable in clarifying the nature and extent of selection for some traits, at other times they act to refute prior assumptions. They say: “A general insight from sexual selection meta-analyses … is that it is easy to be misled by a few high-profile studies into believing that a prediction is well supported. Support is often weaker than assumed.” (p.1139). This point does not just apply to comparative research, but is relevant to human sexual selection work specifically. For instance, Van Dongen and Gangestad, 2011 found that evidence for health benefits of symmetry was weaker and harder to demonstrate meta-analytically than they would have supposed, given the size of the extant literature. Similarly, when two meta-analyses into the effects of menstrual cycle on women’s behavior, mate preferences, and attractiveness reached opposing conclusions (Gildersleeve et al., 2014a; Wood et al., 2014), the exercise suggested that some cycle effects were unlikely to be robust. Indeed, the more cautious analytical methods (e.g. treating unknown null results as zero rather than excluding them from analysis) resulted in a null overall effect – a finding that was later borne out by multiple large, pre-registered studies (Jones et al., 2018; Jünger et al., 2018; Marcinkowska et al., 2018). The authors of the meta-analysis that found a null effect suggested that publication and inclusion bias was a particular problem in the field (Harris et al., 2014), although others argued against this (Gildersleeve et al., 2014b).

In terms of the current topic, previous studies explicitly testing the relationships between masculine traits and fitness outcomes have been overwhelmingly conducted in low fertility samples and have produced a mixture of positive, negative, and null results (e.g. Boothroyd et al., 2017; Arnocky et al., 2018; Rhodes et al., 2005). This creates a clear need for meta-analytic comparison of evidence from as wide a population sample as possible. To date, however, meta-analytic analyses are rare, typically exclude many aspects of masculinity, and focus on either mating or reproductive outcomes, despite both being relevant to testing the theories above. Van Dongen and Sprengers, 2012 meta-analyzed the relationships between men’s handgrip strength (HGS) and sexual behavior in only three industrialized populations (showing a weak, positive association [r = 0.24]). Across 33 non-industrialized societies, von Rueden and Jaeggi, 2016 found that male status (which included, but was not limited to, measures of height and strength) weakly predicted reproductive success (overall r = 0.19). In contrast, Xu and colleagues (Xu et al., 2018) reported no significant association between men’s height and offspring numbers across 16 studies when analyzing both industrialized and non-industrialized populations. Lastly, Grebe and colleagues’ (Grebe et al., 2019) meta-analysis of 16 effects – the majority of which came from Western samples – showed that men with high levels of circulating testosterone, assayed by blood or saliva, invested more in mating effort, indexed by mating with more partners and showing greater interest in casual sex (r = 0.22). Across all of their analyses (which also included pair-bond status, fatherhood status, and fathering behaviors), Grebe and colleagues found no significant differences between ‘Western’ and ‘non-Western’ samples, but their ‘non-Western’ grouping for the relevant analysis only included a low fertility population in 21st Century China. 
To our knowledge, facial masculinity, voice pitch, and 2D:4D have never been meta-analyzed in relation to mating and/or reproduction.

The present study

The present article therefore searched widely for published and unpublished data to meta-analyze the relationships between six main masculine traits in men (facial masculinity, body masculinity, 2D:4D, voice pitch, height, and testosterone levels) and both mating and reproductive outcomes, in both high and low fertility samples. By including multiple traits, a broad search strategy, and considering high and low fertility samples both separately and together, we can ascertain whether the current scientific evidence base provides plausible support for the sexual selection of masculine traits in humans. By further testing the publication status of each effect (whether the specific effect size/analysis was reported in a published article or not), we can also evaluate the evidence for publication bias, since this is known to artificially inflate effects in diverse literatures.

Mating measures included behavioral measures such as number of sexual partners, number of marital spouses, and age at first sexual intercourse. Since increased mating effort is an additional possible route to increased reproductive output, we also included mating attitudes, such as preferences for casual sex. Reproductive measures included: fertility measures, such as number of children/grandchildren born and age at the birth of the first child; and reproductive success measures, that is, number of offspring surviving childhood. Since offspring mortality is a measure specifically of offspring viability, we included this as a separate measure (i.e. mortality rate and/or number of deceased offspring).

Materials and methods

Literature search and study selection

A systematic search was initially carried out between November 2017 and February 2018 using the databases PsycINFO, PubMed, and Web of Science; the searches were saved and search alerts ensured inclusion of subsequently published studies. Search terms are given in Box 1.

Box 1

Search terms for meta-analysis study discovery.

(masculin* OR “sexual dimorphism” OR "sexually dimorphic" OR width-to-height OR muscularity OR shoulder-to-hip OR chest-to-waist OR “digit ratio” OR 2d:4d OR “hand grip strength” OR “handgrip strength” OR “grip strength” OR testosterone OR “voice pitch” OR “vocal pitch” OR voice OR “non-fat body mass” OR “lean body mass” OR “fundamental frequency” OR “facial* dominan*” OR height OR “sexual dimorphism in stature” OR “CAG repeat*”) AND (“sex* partner*” OR “short-term relationship*” OR “short term mating” OR “extra pair” OR sociosexual* OR “age of first intercourse” OR “age of first sexual intercourse” OR “age at first intercourse” OR “age at first sexual intercourse” OR “age of sexual debut” OR “age at first sex” OR “mating success” OR “number of offspring” OR “offspring number” OR “number of children” OR “number of grandoffspring” OR “number of grand offspring” OR “offspring health” OR “offspring mortality” OR “mortality of offspring” OR “surviving offspring” OR “offspring survival” OR “reproductive onset” OR “reproductive success” OR “long-term relationship*” OR “age of first birth”) AND (human OR man OR men OR participant*).

Studies were also retrieved through cross-referencing, citation searches/alerts, and by asking for data on social media. The systematic search generated 2,221 results, including duplicates, and a further approximately 300 articles were found by other means. After scanning titles and abstracts, 280 articles/dissertations were reviewed in full. Studies submitted up to 1 May 2020 were accepted. Eligible studies included at least one of the following predictors: facial masculinity, body masculinity (strength, body shape, or muscle mass/non-fat body mass), 2D:4D, voice pitch, height, or testosterone levels. The following outcome measures were included:

- Mating domain: global sociosexuality (i.e. preferences for casual sex: Penke and Asendorpf, 2008; Simpson and Gangestad, 1991) and specific measures of mating attitudes and mating behaviors where:

  1. Mating attitudes included: preferences for short-term relationships, and sociosexual attitudes and desires.

  2. Mating behaviors included: number of sexual partners, one-night-stands/short-term relationships, potential conceptions, sociosexual behaviors, extra-pair sex, age at first sexual intercourse, and number of marital spouses.

- Reproductive domain: including both fertility and reproductive success, described below.

  1. Fertility: number of children and grandchildren born, and age at the birth of the first child.

  2. Reproductive success: number of surviving children/grandchildren.

- Offspring mortality domain: mortality rate and number of deceased offspring.
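The outcome scheme above can be written down as a small lookup structure. The following is only an illustrative sketch: the dictionary layout, the name `OUTCOME_DOMAINS`, and the helper `classify` are hypothetical and are not taken from the authors' coding materials; the measure labels are copied from the list above.

```python
from typing import Optional, Tuple

# Hypothetical encoding of the outcome domains listed above.
OUTCOME_DOMAINS = {
    "mating": {
        "global": ["global sociosexuality"],
        "attitudes": [
            "preferences for short-term relationships",
            "sociosexual attitudes and desires",
        ],
        "behaviors": [
            "number of sexual partners",
            "one-night-stands/short-term relationships",
            "potential conceptions",
            "sociosexual behaviors",
            "extra-pair sex",
            "age at first sexual intercourse",
            "number of marital spouses",
        ],
    },
    "reproductive": {
        "fertility": [
            "number of children and grandchildren born",
            "age at the birth of the first child",
        ],
        "reproductive success": ["number of surviving children/grandchildren"],
    },
    "offspring mortality": {
        "mortality": ["mortality rate", "number of deceased offspring"],
    },
}


def classify(measure: str) -> Optional[Tuple[str, str]]:
    """Return (domain, subdomain) for a measure label, or None if it is not listed."""
    for domain, subdomains in OUTCOME_DOMAINS.items():
        for subdomain, measures in subdomains.items():
            if measure in measures:
                return domain, subdomain
    return None


print(classify("number of sexual partners"))
```

A lookup of this kind makes it explicit that a measure outside the listed set (for example, an ambiguous body size measure) maps to no domain and would be flagged for exclusion or manual review.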

Both published and unpublished studies were eligible. We restricted our sample to studies with adult participants (≥17 years old). If key variables were collected but the relevant analyses were not reported, we contacted authors to request effect sizes or raw data. If data were reported in more than one study, we selected the analysis with the larger sample size or which included appropriate control variables, such as age. Studies using measures that were ambiguous and/or not comparable to measures used in other studies were excluded (e.g. measures of body size without information about the proportion of fat/muscle mass, or reproductive data during a very restricted time period). Twin studies where participants were sampled as pairs, population level studies, and studies analyzing both sexes together were also excluded, as well as articles that were not written in English or Swedish as we were not sufficiently fluent in other languages to conduct unbiased searching and extraction. Multiple measures from the same study were retained if they met the other criteria.

We chose Pearson’s r as our effect size measure and effect sizes not given as r were converted (see Supplementary file 1 for conversion formulas); if effect sizes were not convertible, the study was excluded. Where effect sizes for non-significant results were not stated in the article and could not be obtained, an effect size of 0 was assigned (k = 28). Excluding those effects from the analyses had no effect on any of the results. Twenty-nine percent of all observations (133 of 452, selected randomly) were double coded by the first author >2 months apart. Intracoder agreement was 97%. For coding decisions, see Supplementary file 2.
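To illustrate the conversion and zero-assignment steps, the sketch below uses two common textbook conversions (a t statistic to r, and Cohen's d to r assuming equal group sizes). These are not necessarily the formulas given in Supplementary file 1. The pooling function is a deliberately simple fixed-effect average on the Fisher z scale, used here only to show how results with and without the zero-assigned effects can be compared; it is not the model used in the article. All function names and example values are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Effect = Tuple[Optional[float], int]  # (r or None if unreported, sample size)


def r_from_t(t: float, df: int) -> float:
    """Convert a t statistic with df degrees of freedom to Pearson's r."""
    return t / math.sqrt(t * t + df)


def r_from_d(d: float) -> float:
    """Convert Cohen's d to r, assuming two groups of equal size."""
    return d / math.sqrt(d * d + 4.0)


def impute_unreported(effects: Sequence[Effect]) -> List[Tuple[float, int]]:
    """Assign r = 0 to unreported non-significant effects (marked as None)."""
    return [(0.0 if r is None else r, n) for r, n in effects]


def drop_unreported(effects: Sequence[Effect]) -> List[Tuple[float, int]]:
    """Exclude unreported effects instead of assigning them a value."""
    return [(r, n) for r, n in effects if r is not None]


def pooled_r(effects: Sequence[Tuple[float, int]]) -> float:
    """Fixed-effect pooled r: mean of Fisher z values weighted by n - 3."""
    weights = [n - 3 for _, n in effects]
    z_mean = sum(w * math.atanh(r) for w, (r, _) in zip(weights, effects)) / sum(weights)
    return math.tanh(z_mean)


# Hypothetical effects; None marks a non-significant result with no reported size.
example: List[Effect] = [(r_from_t(2.0, 96), 98), (0.10, 203), (None, 53)]
with_zeros = pooled_r(impute_unreported(example))
without = pooled_r(drop_unreported(example))
print(round(with_zeros, 3), round(without, 3))
```

Assigning zero pulls the pooled estimate toward the null relative to dropping the effect, which is why it is the more conservative of the two choices when the unreported effects are known only to be non-significant.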

In total, 96 studies were selected (Lassek and Gaulin, 2009; Hughes and Gallup, 2003; Weeden and Sabini, 2007; Frederick and Haselton, 2007; Lukaszewski et al., 2014; Hill et al., 2013; Kordsmeyer et al., 2018; Peters et al., 2008; Boothroyd et al., 2017; Pawlowski et al., 2008; Arnocky et al., 2018; Rhodes et al., 2005; Van Dongen and Sprengers, 2012; Alvergne et al., 2009; Apicella, 2014; Apicella et al., 2007; Aronoff, 2017; Atkinson, 2012; Atkinson et al., 2012; Bogaert and Fisher, 1995; Booth et al., 1999; Boothroyd et al., 2011; Boothroyd et al., 2008; Charles and Alexander, 2011; Chaudhary et al., 2015; Edelstein et al., 2011; Falcon, 2016; Farrelly et al., 2015; Frederick, 2010; Frederick and Jenkins, 2015; Gallup et al., 2007; Genovese, 2008; Gettler et al., 2019; Gildner, 2018; Gómez-Valdés et al., 2013; Hartl et al., 1982; Hoppler et al., 2018; Honekopp et al., 2007; Kirchengast, 2000; Kirchengast and Winkler, 1995; Klimas et al., 2019; Klimek et al., 2014; Kordsmeyer and Penke, 2017; Krzyżanowska et al., 2015; Kurzban and Weeden, 2005; Little et al., 1989; Loehr and O’Hara, 2013; Longman et al., 2018; Luevano et al., 2018; Maestripieri et al., 2014; Manning and Fink, 2008; Manning et al., 2003; Marczak et al., 2018; McIntyre et al., 2006; Međedović and Bulut, 2019; Mosing et al., 2015; Muller and Mazur, 1997; Nagelkerke et al., 2006; Nettle, 2002; Pawlowski et al., 2000; Pollet et al., 2011; Polo et al., 2019; Price et al., 2013; Prokop and Fedor, 2011; Prokop and Fedor, 2013; Puts et al., 2006; Puts et al., 2015; Putz et al., 2004; Rahman et al., 2005; Rosenfield et al., 2020; Schwarz et al., 2011; Scott and Bajema, 1982; Shoup and Gallup, 2008; Sim and Chun, 2016; Simmons and Roney, 2011; Smith et al., 2017; Sneade and Furnham, 2016; Sorokowski et al., 2013; Steiner, 2011; Stern et al., 2020; Strong, 2014; Strong and Luevano, 2014; Subramanian et al., 2009; Suire et al., 2018; Tao and Yin, 2016; van Anders et al., 2007; Varella et al., 2014; von Rueden et al., 2011; Voracek et al., 2010; Walther et al., 2016; Walther et al., 2017c; Walther et al., 2017a; Walther et al., 2017b; Waynforth, 1998; Winkler and Kirchengast, 1994; Honekopp et al., 2006), comprising 474 effect sizes from 99 samples and 177,044 unique participants (Table 1). This exceeds the number of studies for each of the meta-analyses published previously (Grebe et al., 2019; Van Dongen and Sprengers, 2012; von Rueden and Jaeggi, 2016; Xu et al., 2018).

Table 1
All studies included in the meta-analysis.
Authors | Year | Predictor | Outcome | Sample | Sample location | Low or high fert. | N
Alvergne et al., 2009 | 2009 | T | REP | Rural villagers | Senegal | High | 53
Apicella, 2014 | 2014 | Body masc | MAT, REP, OM | Hadza | Tanzania | High | 51
Apicella et al., 2007 | 2007 | Body masc, voice pitch, height | REP, OM | Hadza | Tanzania | High | 44-52
Arnocky et al., 2018 | 2018 | Facial masc | MAT | Students | Canada | Low | 135
Aronoff, 2017 | 2017 | T | MAT | Students | US | Low | 99
Atkinson, 2012 | 2012 | Body masc | MAT | Students | US | Low | 66
Atkinson et al., 2012 | 2012 | Body masc, voice pitch, height | REP | Himba (Ovahimba) | Namibia | High | 36
Bogaert and Fisher, 1995 | 1995 | T | MAT | Students | Canada | Low | 195-196
Booth et al., 1999 | 1999 | T | MAT | Army veterans and non-veterans | US | Low | 4393
Boothroyd et al., 2008 | 2008 | Facial masc | MAT | Students | UK | Low | 18-19
Boothroyd et al., 2011 | 2011 | Facial masc | MAT | Students | UK | Low | 36
Boothroyd et al., 2017 | 2017 | Facial masc | REP, OM | Agta | Philippines | High | 65
 |  | Facial masc | MAT, REP, OM | Maya | Belize | High | 23-35
Charles and Alexander, 2011 | 2011 | 2D:4D, T | MAT | Students | US | Low | 25-42
Chaudhary et al., 2015 | 2015 | Body masc, height | MAT, REP, OM | Mbendjele BaYaka | Democratic Republic of the Congo | High | 55-73
Edelstein et al., 2011 | 2011 | T | MAT | Students | US | Low | 134
Falcon, 2016 | 2016 | 2D:4D | MAT | Students | US | Low | 137
Farrelly et al., 2015 | 2015 | T | MAT | Students | UK | Low | 75-78
Frederick, 2010 | 2010 | Body masc, 2D:4D, height | MAT | Students | US | Low | 61
Frederick and Haselton, 2007 | 2007 | Body masc | MAT | Students | US | Low | 56-121
Frederick and Jenkins, 2015 | 2015 | Height | MAT | Online | Worldwide | Low | 28759-31418
Gallup et al., 2007 | 2007 | Body masc, 2D:4D | MAT | Students | US | Low | 71-75
Genovese, 2008 | 2008 | Body masc | REP | Former teenage delinquents | US | High | 181
Gettler et al., 2019 | 2019 | T | MAT | Cebu Longitudinal Health and Nutrition Survey | Philippines | High | 288
Gildner, 2018 | 2018 | Body masc, 2D:4D, height | REP | Shuar Health and Life History Project | Ecuador | High | 48
Gómez-Valdés et al., 2013 | 2013 | Facial masc | REP | Hallstatt skulls | Austria | High | 179
Hartl et al., 1982 | 1982 | Body masc, height | MAT, REP | Former teenage delinquents | US | High | 180-185
Hill et al., 2013 | 2013 | Facial masc, body masc, voice pitch, height | MAT | Students | US | Low | 63
Hoppler et al., 2018 | 2018 | T | REP | Men’s health 40+ study | Switzerland | Low | 268
Hughes and Gallup, 2003 | 2003 | Body masc | MAT | Students | US | Low | 50-59
Honekopp et al., 2006 | 2006 | 2D:4D, height | MAT | Students and non-students | Germany | Low | 79-99
Honekopp et al., 2007 | 2007 | Facial masc, body masc, height, T | MAT | Students and non-students | Germany | Low | 77
Kirchengast, 2000 | 2000 | Height | REP, OM | !Kung San | Namibia | High | 103
Kirchengast and Winkler, 1995 | 1995 | Height | REP, OM | Urban and rural Kavango people | Namibia | High | 59-78
Klimas et al., 2019 | 2019 | T | MAT | Men’s health 40+ study | Switzerland | Low | 159
Klimek et al., 2014 | 2014 | 2D:4D, height | REP | Mogielica Human Ecology Study Site | Poland | High | 238
Kordsmeyer et al., 2018 | 2018 | Body masc, voice pitch, height, T | MAT | Students and non-students | Germany | Low | 103-164
Kordsmeyer and Penke, 2017 | 2017 | 2D:4D, height | MAT | Students and non-students | Germany | Low | 141
Krzyżanowska et al., 2015 | 2015 | Height | REP | National Child Development Study | UK | Low | 6535
Kurzban and Weeden, 2005 | 2005 | Height | MAT, REP | Speed daters | US | Low | 1503-1501
Lassek and Gaulin, 2009 | 2009 | Body masc, height | MAT | NHANES III | US | Low | 4167-5159
Little et al., 1989 | 1989 | Height | REP, OM | Rural; growth stunted | Mexico | High | 103
Loehr and O’Hara, 2013 | 2013 | Facial masc | REP | WWII soldiers | Finland | High | 795
Longman et al., 2018 | 2018 | T | MAT | Students | UK | Low | 38
Luevano et al., 2018 | 2018 | Facial masc, height | MAT | Students | US | Low | 35-66
Lukaszewski et al., 2014 | 2014 | Body masc | MAT | Students | US | Low | 48-174
Maestripieri et al., 2014 | 2014 | T | MAT | Students | US | Low | 41-61
Manning and Fink, 2008 | 2008 | 2D:4D | MAT, REP | Online | Worldwide | Low | 26872-83681
Manning et al., 2003 | 2003 | 2D:4D | REP | Community | England | Low | 189
 |  | 2D:4D | REP | Sugali and Yanadi tribal groups | India | High | 80
 |  | 2D:4D | REP | Zulus from townships near Durban | South Africa | High | 66
Marczak et al., 2018 | 2018 | 2D:4D | REP | Yali | Indonesia | High | 47
McIntyre et al., 2006 | 2006 | T | MAT | Students | US | Low | 68-81
Međedović and Bulut, 2019 | 2019 | Height | MAT | Students | Serbia | Low | 39
Mosing et al., 2015 | 2015 | Height | MAT, REP | Study of Twin Adults: Genes and Environment | Sweden | Low | 2310-2549
Muller and Mazur, 1997 | 1997 | Facial masc | REP | West Point class of 1950 | US | High | 337
Nagelkerke et al., 2006 | 2006 | Height | MAT | NHANES 99–00 | US | Low | 798-809
Nettle, 2002 | 2002 | Height | REP | National Child Development Study | UK | Low | 4474
Pawlowski et al., 2008 | 2008 | Height | REP | Rural | Poland | High | 46
Pawlowski et al., 2000 | 2000 | Height | REP | Urban and rural | Poland | High | 3201
Peters et al., 2008 | 2008 | Facial masc, body masc, T | MAT | Students | Australia | Low | 100-113
Pollet et al., 2011 | 2011 | T | MAT | National Social Life, Health, and Aging Project | US | Low | 749
Polo et al., 2019 | 2019 | Facial masc, body masc, height | MAT | Students and non-students | Chile | Low | 198-206
Price et al., 2013 | 2013 | Body masc, height | MAT | Mainly students | UK | Low | 55
Prokop and Fedor, 2011 | 2011 | Height | REP | Friends and family of students | Slovakia | Low | 499
Prokop and Fedor, 2013 | 2013 | Height | MAT | Students | Slovakia | Low | 105-150
Puts et al., 2006 | 2006 | Voice pitch | MAT | Students | US | Low | 103
Puts et al., 2015 | 2015 | T | MAT | Students | US | Low | 59-61
Putz et al., 2004 | 2004 | 2D:4D | MAT | Students | US | Low | 207-219
Rahman et al., 2005 | 2005 | 2D:4D, height | MAT | Students and non-students | UK | Low | 78-150
Rhodes et al., 2005 | 2005 | Facial masc, body masc, height | MAT | Mainly students | Australia | Low | 142-166
Rosenfield et al., 2020 | 2020 | Body masc, voice pitch, height | MAT, REP, OM | Tsimané | Bolivia | High | 55-62
Schwarz et al., 2011 | 2011 | 2D:4D | MAT | Students | Germany | Low | 52-89
Scott and Bajema, 1982 | 1982 | Height | REP | Third Harvard Growth Study | US | High | 606
Shoup and Gallup, 2008 | 2008 | Body masc, 2D:4D | MAT | Students | US | Low | 28-38
Sim and Chun, 2016 | 2016 | Body masc, 2D:4D | MAT | Students | US | Low | 90
Simmons and Roney, 2011 | 2011 | Body masc, T | MAT | Students | US | Low | 138
Smith et al., 20172017Body mascREPHadzaTanzaniaHigh51
Sneade and Furnham, 20162016Body mascMATStudentsUKLow145
Sorokowski et al., 20132013HeightREP, OMYaliIndonesiaHigh49-52
Steiner, 201120112D:4D, TREPStudents and non-studentsUSLow30
Stern et al., 20202020TMATStudentsUKLow61
Strong, 20142014Body mascMATStudentsUSLow31
Strong and Luevano, 20142014Body masc, 2D:4D, heightMATStudentsUSLow51-66
Subramanian et al., 20092009HeightOM2005-2006 National Family Health SurveyIndiaLow21120
Suire et al., 20182018Voice pitchMATMainly studentsFranceLow57-58
Tao and Yin, 20162016HeightREPThe Panel Study of Family DynamicsTaiwanLow1409
van Anders et al., 20072007TMATNon-studentsUSLow31
Van Dongen and Sprengers, 20122012Facial masc, body masc, 2D:4DMATNot specifiedNot specifiedLow52
Varella et al., 20142014Body masc, 2D:4D, heightMATStudentsBrazil, Czech RepublicLow69-80
von Rueden et al., 20112011Body masc, heightREP, OMTsimanéBoliviaHigh162-197
Voracek et al., 201020102D:4D, heightREPFirefightersAustriaLow134
Walther et al., 20162016Body mascREPMen’s health 40+ studySwitzerlandLow271
Walther et al., 2017a2017aBody mascMATMen’s health 40+ studySwitzerlandLow226
Walther et al., 2017b2017bHeightREPMen’s health 40+ studySwitzerlandLow271
Walther et al., 2017c2017cHeightMATMen’s health 40+ studySwitzerlandLow226
Waynforth, 199819982D:4D, heightMAT, REP, OMVillagersBelizeHigh35-56
Weeden and Sabini, 20072007Body masc, 2D:4D, heightMATStudentsUSLow188-212
Winkler and Kirchengast, 19941994HeightREP, OM!Kung SanNamibiaHigh31-114

Statistical analyses

We used the metafor package (Viechtbauer, 2010) in R 3.6.2 (R Development Core Team, 2019). metafor transforms Pearson’s r to Fisher’s Z for analysis; for ease of interpretation, effect sizes were converted back to r for presentation of results. For 2D:4D and voice pitch, effects were reverse coded prior to analysis because low values denote greater masculinity. Similarly, effects were reverse coded for all offspring mortality outcomes as well as the outcomes age at first birth and age at first sexual intercourse/contact, as low values denote increased fitness. In all analyses reported here, therefore, a positive value of r denotes a positive relationship between masculinity and fitness outcomes. All predicted relationships were positive.
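
The transformation and sign conventions described above can be illustrated with a short Python sketch (Python is used here purely for illustration; the analyses themselves were run with metafor in R). The function names and the two boolean flags are hypothetical, the 1/(n − 3) sampling variance is the standard large-sample approximation for Fisher's Z rather than something stated in the text, and applying two sign flips when both the trait and the outcome are reverse coded is our reading of the coding rule.

```python
import math


def r_to_z(r: float) -> float:
    """Fisher's r-to-Z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def z_to_r(z: float) -> float:
    """Back-transform Fisher's Z to Pearson's r for presentation."""
    return math.tanh(z)


def z_variance(n: int) -> float:
    """Approximate sampling variance of Fisher's Z for sample size n (assumed formula)."""
    return 1.0 / (n - 3)


def code_effect(r: float, low_is_masculine: bool = False, low_is_fit: bool = False) -> float:
    """Flip the sign once for each variable on which LOW values denote
    more masculinity (e.g. 2D:4D, voice pitch) or more fitness
    (e.g. offspring mortality, age at first birth)."""
    sign = 1.0
    if low_is_masculine:
        sign = -sign
    if low_is_fit:
        sign = -sign
    return sign * r


# Hypothetical raw correlation between voice pitch and number of partners.
coded_r = code_effect(-0.20, low_is_masculine=True)
z = r_to_z(coded_r)          # value analyzed on the Z scale
r_reported = z_to_r(z)       # value reported back on the r scale
```

After coding, a positive value always means that greater masculinity goes with greater fitness, which is what allows effects from different traits and outcomes to be pooled.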

Analyses were conducted using random-effects models, as we expected the true effect to vary across samples. We controlled for multiple comparisons by computing q-values (Storey, 2002). Note that a q-value estimates the false discovery rate incurred when calling an effect significant, that is, the expected proportion of false positives among all effects at least as extreme as the one observed; q-values are not adjusted p values. In all analyses presented below, only effects that remained significant after q-value computation (indicated by q-values < 0.05) are presented as significant. We computed q-values using all p values across all tests conducted in the whole analysis (266 in total). Q-values can be viewed in Supplementary file 7.
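
As a rough illustration of how q-values are obtained from a set of p values, the sketch below implements a simplified version of Storey's (2002) procedure. It is not the computation used for this paper: the fixed tuning parameter `lam = 0.5`, the fallback when the estimated null proportion is zero, and the function name are all assumptions made for the example.

```python
def q_values(p_values, lam=0.5):
    """Simplified Storey q-values with a fixed tuning parameter (assumed lam = 0.5)."""
    m = len(p_values)
    if m == 0:
        return []

    # Estimate the proportion of true null hypotheses from the p values above lam.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    if pi0 <= 0.0:
        # Conservative fallback (assumption): behave like Benjamini-Hochberg.
        pi0 = 1.0

    # Indices of the p values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that each q-value is the
    # smallest estimated false discovery rate over all thresholds at or above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


# Hypothetical p values; effects with q < 0.05 would be treated as significant.
example_q = q_values([0.001, 0.01, 0.03, 0.2, 0.6, 0.9])
significant = [qv < 0.05 for qv in example_q]
```

Because the estimated null proportion depends on the whole set of p values, the q-value attached to a given test changes with the number and distribution of the other tests, which is why all 266 p values were entered together.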

The analyses were conducted on three levels for both predictor traits and outcomes (Figure 1). For predictor traits, all six masculine traits were first combined and analyzed together at the global masculinity level. At the trait level, each masculine trait was then analyzed separately. Lastly, each masculine trait was further divided into separate trait indices, which were analyzed as potential moderators (see below).

Figure 1
Overall analysis structure.

For the outcomes, mating, reproduction, and offspring mortality were first analyzed together at the total fitness level. Given the widespread use of mating measures as proxies of reproductive outcomes, it is imperative where possible to test (and ideally compare) both mating and reproduction, to ensure that we are not relying on proxies that do not measure what they are assumed to measure. The domain level therefore divided outcomes into the mating domain, the reproductive domain, and the offspring mortality domain and analyzed them separately. The last level, the measures level, further divided mating and reproduction into their separate measures (mating attitudes and behaviors, and fertility and reproductive success, respectively), which were analyzed as subgroups.

The mating domain comprised mating attitudes and mating behaviors, as high mating success may result from increased mating efforts (reflected in favorable attitudes towards short-term mating) and/or encountering more mating opportunities (reflected in mating behaviors) without actively seeking them (because of female choice, for example). It is therefore necessary to divide these two measures.

The reproductive measures, fertility (number of offspring) and reproductive success (number of surviving offspring), are closely related but were also analyzed separately in subgroup analyses. Offspring mortality, on the other hand, was usually indexed by mortality rate (only two studies used absolute number of dead offspring, and it made no difference to the results whether those studies were included or not) and is not directly related to offspring numbers. Offspring mortality was therefore analyzed as a separate domain. As there were too few observations of offspring mortality to test predictor traits separately, this outcome was only analyzed at the global masculinity level.

In addition to analyzing all samples together, we also analyzed low and high fertility samples separately to assess whether results were robust in both types of populations. We used a cut-off of three or more children per woman on average within that sample, which roughly corresponds to samples with vs without widespread access to contraception (The World Bank, 2018). Samples therefore had two levels: all samples, and the two sample types low fertility and high fertility.

The analysis structure was therefore as summarized in Figure 1: overall analyses tested global masculinity as a predictor of total fitness, as well as the three domains of mating, reproduction, and offspring mortality, separately, across all samples. In our main analyses, we analyzed masculinity at the trait level, in relation to the two outcome domains mating and reproduction. The following subgroup analyses considered low and high fertility samples separately, in addition to also dividing outcomes into their respective measures (mating attitudes vs mating behaviors, and fertility vs reproductive success).

Lastly, we performed a series of exploratory meta-regressions on potential moderator variables. Such moderation analyses compare effect sizes across categories of studies as determined by a particular study characteristic, for example monogamous vs polygynous marriage systems, to determine if effect sizes were robust and/or equivalent across these categories. Since power was often low, we ran moderation analyses separately for each study characteristic rather than trying to test for interactions. For all masculine traits where we had sufficient power, trait-general moderation analyses included: domain type (mating vs reproduction), mating measure type (attitudes vs behaviors), reproductive measure type (fertility vs reproductive success), sample type (low vs high fertility), low fertility sample type (student vs non-student), high fertility sample type (traditional vs industrialized), ethnicity, marriage system, publication status (published vs not published effect), peer review status (peer reviewed vs not peer reviewed), sexual orientation, transformation of variables, conversion of effect sizes, age control, and inclusion of other control variables. Note that since we included many non-published effects from studies that were published, ‘publication status’ referred to whether the particular effects were published, not the study as a whole. The analysis can therefore detect evidence of any tendency for significant results to be ‘written up’ while nonsignificant ones are not, whether this bias occurs between or within manuscripts. We ran moderation analyses both for outcome domains and outcome measures (i.e. mating attitudes and mating behaviors, and fertility and reproductive success, respectively). For each masculine trait, we also conducted trait-specific moderation analyses (e.g. subjectively rated vs morphometric facial masculinity; for full details on trait-specific moderators, see Supplementary file 3).

Analyses sometimes included more than one observation from the same study/sample. In all analyses, therefore, effect sizes were clustered both by sample and by study. For all analyses, only relationships with a minimum of three independent samples from a minimum of two separate studies were analyzed. For moderation analyses, this meant that each category of the moderator needed observations from at least three samples from at least two studies; in many cases, there were not enough observations to test for moderators.
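
The inclusion rule above (at least three independent samples from at least two separate studies) can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, and each observation is assumed to carry a study identifier and a sample identifier that is unique across studies, so that a sample reused in several articles is counted once.

```python
def meets_inclusion_rule(observations, min_samples=3, min_studies=2):
    """Return True if the effect sizes span enough independent samples and studies.

    observations: iterable of (study_id, sample_id) pairs, one per effect size.
    Sample identifiers are assumed to be unique across studies.
    """
    observations = list(observations)
    studies = {study for study, _ in observations}
    samples = {sample for _, sample in observations}
    return len(samples) >= min_samples and len(studies) >= min_studies


# Hypothetical identifiers: three samples drawn from two studies pass the rule,
# whereas three samples from a single study do not.
ok = meets_inclusion_rule([("A", "s1"), ("A", "s2"), ("B", "s3")])
too_few_studies = meets_inclusion_rule([("A", "s1"), ("A", "s2"), ("A", "s3")])
```

For a moderation analysis, the same check would be applied separately to the observations falling in each category of the moderator.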

In the Results section, unless otherwise specified, we summarize results from trait-general moderation analyses of outcome domains only (results for outcome measures and trait-specific moderators are reported in Supplementary file 4). Additional details and full results of all analyses can be found in Supplementary files 3-5.

Results

Summary of samples

All 96 studies included in the meta-analysis are shown in Table 1. In total, 29 articles reported effect sizes from high fertility samples, which included 17 articles drawing on 13 different extant forager or subsistence populations (of the type sometimes referred to as ‘small scale societies’, coded here as non-industrialized) predominantly in Africa or Latin America. The remaining high fertility data came from historical samples or low socioeconomic status sub-populations within low-fertility countries (e.g. agricultural Polish communities, former ‘delinquents’ in the US, and Zulus living in South African townships). Sixty-nine articles reported data from low fertility populations, which came from 54 primarily student or partially-student samples (43 of which were from English-speaking countries), and only 12 samples which could be considered representative community or cohort/panel samples. Two articles reported data drawn from ‘global’ online samples (classified as low fertility). The remaining low fertility samples were either unspecified or sampled particular sub-populations (e.g. specific professions).

Overall analyses of global masculinity

In the initial overall analyses, global masculinity was weakly but significantly associated with greater total fitness (i.e. mating, reproduction, and offspring mortality combined) (r = 0.080, 95% CI: [0.061, 0.101], q = 0.001; we reiterate here that for all analyses, q-values < 0.05 denote significance after correcting for multiple comparisons). When we divided the outcome measures into their three domains, the positive (albeit weak) associations with global masculinity remained significant for mating, but not for reproduction or offspring mortality (mating: r = 0.090, 95% CI: [0.071, 0.110], q = 0.001; reproduction: r = 0.047, 95% CI: [0.004, 0.090], q = 0.080; offspring mortality: r = 0.002, 95% CI: [–0.011, 0.015], q = 0.475). While the effect was thus significant only for mating, the differences between the domain effects were not themselves significant; we note, however, that sample sizes differed considerably between domains.

Below, we present in further detail the results of the effect of global masculinity on each of the three outcome domains: mating, reproduction, and offspring mortality. We then present the associations between each masculine trait and mating and reproductive measures, separately. We also present results for subgroup and trait-general moderation analyses (for outcome domains only); for complete results, see Supplementary files 4 and 5.

Mating

Main analyses of each masculine trait

This set of analyses tested the prediction that individual masculine traits are positively associated with mating. In terms of the overall mating domain (i.e. mating attitudes and behaviors combined), all masculine traits showed the predicted positive relationships with mating, and the effects were significant for all traits except for facial masculinity and 2D:4D (Table 2). Some of these effects were very weak, however. The strongest associations with the mating domain were seen in terms of body masculinity (r = 0.133, 95% CI: [0.091, 0.176], q = 0.001; Figure 2), voice pitch (r = 0.132, 95% CI: [0.061, 0.204], q = 0.002; Figure 3), and testosterone levels (r = 0.093, 95% CI: [0.066, 0.121], q = 0.001; Figure 4). Height showed a significant but smaller effect size (r = 0.057, 95% CI: [0.027, 0.087], q = 0.002; Figure 5). While not the weakest association, the relationship between facial masculinity and mating was nonsignificant (r = 0.080, 95% CI: [–0.003, 0.164], q = 0.117). The effect for 2D:4D was also nonsignificant (r = 0.034, 95% CI: [0.000, 0.069], q = 0.102), and moderation analyses showed that this was the only trait that showed a significantly smaller effect size than the strongest predictor, body masculinity (p < 0.001, q = 0.006).

Table 2
Masculine traits predicting mating: main analyses and subgroup analyses of mating attitudes vs mating behaviors and low vs high fertility samples.

Pearson’s r (95% CI); p value for meta-analytic effect, q-value (correcting for multiple comparisons); number of observations (k), samples (s), and unique participants (n); test for heterogeneity (Q), p value for heterogeneity. Statistically significant meta-analytic associations are bolded if still significant after controlling for multiple comparisons.

| Mating outcome: Sample | Facial masculinity | Body masculinity | 2D:4D | Voice pitch | Height | T levels |
|---|---|---|---|---|---|---|
| Mating domain: All samples | r = 0.080 (-0.003, 0.164), p = 0.060, q = 0.117 | r = 0.133 (0.091, 0.176), p < 0.001, q = 0.001 | r = 0.034 (0.000, 0.069), p = 0.049, q = 0.102 | r = 0.132 (0.061, 0.204), p < 0.001, q = 0.002 | r = 0.057 (0.027, 0.087), p < 0.001, q = 0.002 | r = 0.093 (0.066, 0.121), p < 0.001, q = 0.001 |
|  | k = 30, s = 11, n = 948 | k = 121, s = 32, n = 7939 | k = 84, s = 23, n = 66,807 | k = 8, s = 5, n = 443 | k = 62, s = 25, n = 43,686 | k = 66, s = 21, n = 7083 |
|  | Q(df = 29) = 54.834, p = 0.003 | Q(df = 120) = 297.472, p < 0.001 | Q(df = 83) = 101.994, p = 0.077 | Q(df = 7) = 2.334, p = 0.939 | Q(df = 61) = 263.247, p < 0.001 | Q(df = 65) = 66.090, p = 0.439 |
| Mating attitudes: All samples | r = 0.095 (-0.072, 0.263), p = 0.263, q = 0.304 | r = 0.078 (0.002, 0.155), p = 0.045, q = 0.098 | r = 0.035 (-0.061, 0.132), p = 0.474, q = 0.385 | s = 0 | r = 0.028 (-0.013, 0.068), p = 0.179, q = 0.253 | r = 0.099 (0.026, 0.173), p = 0.008, q = 0.032 |
|  | k = 5, s = 4, n = 407 | k = 20, s = 9, n = 922 | k = 19, s = 7, n = 504 |  | k = 9, s = 6, n = 4232 | k = 21, s = 11, n = 1039 |
|  | Q(df = 4) = 8.684, p = 0.070 | Q(df = 19) = 17.606, p = 0.549 | Q(df = 18) = 24.141, p = 0.151 |  | Q(df = 8) = 5.137, p = 0.743 | Q(df = 20) = 25.379, p = 0.187 |
| Mating behaviors: All samples | r = 0.025 (-0.059, 0.109), p = 0.554, q = 0.424 | r = 0.142 (0.099, 0.187), p < 0.001, q = 0.001 | r = 0.038 (-0.002, 0.078), p = 0.061, q = 0.117 | r = 0.124 (0.043, 0.206), p = 0.003, q = 0.016 | r = 0.054 (0.021, 0.087), p = 0.001, q = 0.008 | r = 0.084 (0.058, 0.110), p < 0.001, q = 0.001 |
|  | k = 22, s = 8, n = 755 | k = 91, s = 31, n = 7738 | k = 51, s = 19, n = 1607 | k = 7, s = 5, n = 443 | k = 48, s = 24, n = 42,179 | k = 32, s = 17, n = 6765 |
|  | Q(df = 21) = 37.044, p = 0.017 | Q(df = 90) = 267.876, p < 0.001 | Q(df = 50) = 64.049, p = 0.087 | Q(df = 6) = 2.162, p = 0.904 | Q(df = 47) = 247.032, p < 0.001 | Q(df = 31) = 28.558, p = 0.592 |
| Mating domain: Low fert. samples | r = 0.089 (-0.001, 0.179), p = 0.053, q = 0.109 | r = 0.135 (0.091, 0.180), p < 0.001, q = 0.001 | r = 0.038 (0.002, 0.073), p = 0.037, q = 0.086 | r = 0.129 (0.055, 0.204), p < 0.001, q = 0.005 | r = 0.055 (0.024, 0.086), p < 0.001, q = 0.004 | r = 0.099 (0.069, 0.129), p < 0.001, q = 0.001 |
|  | k = 28, s = 10, n = 913 | k = 117, s = 28, n = 7572 | k = 82, s = 22, n = 66,751 | k = 7, s = 4, n = 388 | k = 58, s = 21, n = 43,310 | k = 58, s = 20, n = 6795 |
|  | Q(df = 27) = 54.287, p = 0.001 | Q(df = 116) = 289.080, p < 0.001 | Q(df = 81) = 101.369, p = 0.063 | Q(df = 6) = 2.234, p = 0.897 | Q(df = 57) = 259.576, p < 0.001 | Q(df = 57) = 61.443, p = 0.320 |
| Mating attitudes: Low fert. samples | r = 0.095 (-0.072, 0.262), p = 0.263, q = 0.304 | r = 0.078 (0.002, 0.155), p = 0.045, q = 0.098 | r = 0.035 (-0.061, 0.132), p = 0.474, q = 0.385 | s = 0 | r = 0.028 (-0.013, 0.068), p = 0.179, q = 0.253 | r = 0.108 (0.021, 0.195), p = 0.015, q = 0.047 |
|  | k = 5, s = 4, n = 407 | k = 20, s = 9, n = 922 | k = 19, s = 7, n = 504 |  | k = 9, s = 6, n = 4,232 | k = 17, s = 10, n = 751 |
|  | Q(df = 4) = 8.684, p = 0.070 | Q(df = 19) = 17.606, p = 0.549 | Q(df = 18) = 24.141, p = 0.151 |  | Q(df = 8) = 5.137, p = 0.743 | Q(df = 16) = 20.017, p = 0.220 |
| Mating behaviors: Low fert. samples | r = 0.028 (-0.063, 0.119), p = 0.543, q = 0.420 | r = 0.145 (0.100, 0.193), p < 0.001, q = 0.001 | r = 0.042 (0.001, 0.083), p = 0.045, q = 0.098 | r = 0.119 (0.034, 0.205), p = 0.006, q = 0.025 | r = 0.051 (0.017, 0.086), p = 0.004, q = 0.019 | r = 0.088 (0.058, 0.119), p < 0.001, q = 0.001 |
|  | k = 20, s = 7, n = 720 | k = 87, s = 27, n = 7371 | k = 49, s = 19, n = 1551 | k = 6, s = 4, n = 388 | k = 44, s = 20, n = 41,803 | k = 30, s = 16, n = 6477 |
|  | Q(df = 19) = 36.610, p = 0.009 | Q(df = 86) = 259.448, p < 0.001 | Q(df = 48) = 62.941, p = 0.073 | Q(df = 5) = 2.017, p = 0.847 | Q(df = 43) = 243.392, p < 0.001 | Q(df = 29) = 27.793, p = 0.529 |
| Mating domain: High fert. samples | s = 1 | r = 0.105 (-0.069, 0.280), p = 0.235, q = 0.285 | s = 1 | s = 1 | r = 0.089 (-0.016, 0.193), p = 0.096, q = 0.157 | s = 1 |
|  |  | k = 4, s = 4, n = 367 |  |  | k = 4, s = 4, n = 376 |  |
|  |  | Q(df = 3) = 7.282, p = 0.063 |  |  | Q(df = 3) = 3.388, p = 0.336 |  |
| Mating attitudes: High fert. samples | s = 0 | s = 0 | s = 0 | s = 0 | s = 0 | s = 1 |
| Mating behaviors: High fert. samples | s = 1 | r = 0.105 (-0.069, 0.280), p = 0.235, q = 0.285 | s = 1 | s = 1 | r = 0.089 (-0.016, 0.193), p = 0.096, q = 0.157 | s = 1 |
|  |  | k = 4, s = 4, n = 367 |  |  | k = 4, s = 4, n = 376 |  |
|  |  | Q(df = 3) = 7.282, p = 0.063 |  |  | Q(df = 3) = 3.388, p = 0.336 |  |

Note. Fert. = fertility; k = number of observations; n = number of unique participants; Q = Cochran’s Q test of heterogeneity; q = q-value; s = number of samples; T = testosterone.

Figure 2
Forest plot of the association between body masculinity and the mating domain.

Effect sizes are shown as Z-transformed r, with 95% confidence intervals in brackets. The width of the diamond corresponds to the confidence interval for the overall effect.

Figure 3
Forest plot of the association between voice pitch and the mating domain.

Effect sizes are shown as Z-transformed r, with 95% confidence intervals in brackets. The width of the diamond corresponds to the confidence interval for the overall effect.

Figure 4
Forest plot of the association between testosterone levels and the mating domain.

Effect sizes are shown as Z-transformed r, with 95% confidence intervals in brackets. The width of the diamond corresponds to the confidence interval for the overall effect.

Figure 5
Forest plot of the association between height and the mating domain.

Effect sizes are shown as Z-transformed r, with 95% confidence intervals in brackets. The width of the diamond corresponds to the confidence interval for the overall effect.

Comparison of high and low fertility samples

Across all masculine traits, most effect sizes (94%) came from low fertility samples. Moderation analyses of sample type could only be run for body masculinity and height; neither was significant, although in both cases, the effect sizes observed in the main analyses were significant only for low fertility, and not the less numerous high fertility samples (k = 4 for each trait). The other four traits had only been measured in one high fertility sample each, and the main analyses thus contained almost exclusively low fertility samples. We further compared low fertility samples which were predominantly students with other low fertility samples as part of our moderation analyses where possible, that is for body masculinity, voice pitch, height, and testosterone. For body masculinity, student samples showed a significantly stronger effect than non-student samples for mating behaviors only (B = –0.128, p = 0.009, q = 0.032) but otherwise we found no differences (see Supplementary file 4).

Inclusion bias/heterogeneity

Since the analysis included unpublished data, the distribution of effects in the funnel plots (see Supplementary file 6A) shows availability bias rather than publication bias. Apart from voice pitch, for which we did not have many effects, visual inspection of funnel plots indicated that they were generally symmetric, suggesting that the analysis did not systematically lack studies with unexpected small effects. There was significant heterogeneity of effect sizes for facial masculinity, body masculinity, and height, which is accounted for by the random-effects analysis.
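
To make the heterogeneity statistic concrete, the sketch below computes Cochran's Q and a random-effects pooled estimate on Fisher's Z scale. It uses the DerSimonian-Laird estimator of between-sample variance as a simple stand-in; it is not the model fitted in metafor for this paper and it ignores the clustering of effects by sample and study. The function name, the returned dictionary keys, and the example inputs are hypothetical.

```python
import math


def random_effects_summary(z_effects, variances):
    """Cochran's Q and a DerSimonian-Laird random-effects estimate (illustrative only).

    z_effects: effect sizes on Fisher's Z scale.
    variances: their sampling variances (e.g. 1 / (n - 3)).
    """
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * zi for wi, zi in zip(w, z_effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q_stat = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, z_effects))
    df = len(z_effects) - 1

    # DerSimonian-Laird between-sample variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q_stat - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to every sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled_z = sum(wi * zi for wi, zi in zip(w_re, z_effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lower, upper = pooled_z - 1.96 * se, pooled_z + 1.96 * se

    # Report the pooled effect and its 95% CI back on the r scale.
    return {
        "Q": q_stat,
        "df": df,
        "tau2": tau2,
        "r": math.tanh(pooled_z),
        "ci": (math.tanh(lower), math.tanh(upper)),
    }


# Two hypothetical samples with clearly different effects.
summary = random_effects_summary([0.0, 0.5], [0.01, 0.01])
```

When Q is large relative to its degrees of freedom, the estimated between-sample variance is positive, the weights become more equal across samples, and the confidence interval widens; this is the sense in which a random-effects analysis accounts for heterogeneity.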

Additional subgroup and moderation analyses for outcome domains

In this step of the analyses, we tested the hypothesis that each of the six masculine traits is positively associated with the two mating domain measures (mating attitudes and mating behaviors) and tested further potential control variables and trait-specific moderators. Results of subgroup analyses can be viewed in Table 2 and trait-general moderators in Table 4; full results of all moderation analyses are reported in Supplementary file 4.

Type of mating measure (attitudes vs behaviors) was never a significant moderator. However, for both body masculinity and height, there were significant effects for mating behaviors (body masculinity: r = 0.142, 95% CI: [0.099, 0.187], q = 0.001, height: r = 0.054, 95% CI: [0.021, 0.087], q = 0.008) but not attitudes. Voice pitch was significantly related to mating behaviors (r = 0.124, 95% CI: [0.043, 0.206], q = 0.016) but was not measured in combination with mating attitudes. Testosterone levels showed near identical effects for both mating attitudes and behaviors (r = 0.099, 95% CI: [0.026, 0.173], q = 0.032 and r = 0.084, 95% CI: [0.058, 0.110], q = 0.001, respectively).

No trait-general moderator consistently changed the pattern of the associations (Table 4). Body masculinity effects were stronger in studies where age had not been controlled for compared to where it had been controlled for (B = 0.103, p = 0.015, q = 0.047). Associations for 2D:4D were weaker in non-white/mixed ethnicity samples compared to white samples (B = –0.080, p = 0.014, q = 0.047), and stronger where variables had been transformed to approximate normality compared to when they had not been transformed (B = 0.103, p = 0.016, q = 0.047). Similarly, associations for testosterone levels were also stronger for normality-transformed variables (B = 0.057, p = 0.015, q = 0.047), and weaker in gay/mixed sexuality samples compared to in heterosexual samples (B = –0.059, p = 0.003, q = 0.016).

For trait-specific moderators, significant moderation was seen for type of body masculinity where body shape was a significantly weaker predictor than strength (B = –0.099, p = 0.003, q = 0.017). Effects for rated body masculinity were significantly stronger than for indices taken from body measurements (B = 0.177, p = 0.007, q = 0.029). For 2D:4D, studies that had measured digit ratios three times – rather than twice or an unknown number of times – showed significantly stronger effects (B = 0.102, p = 0.006, q = 0.025).

Reproduction

Main analyses of each masculine trait

In this set of analyses, we tested the hypothesis that individual masculine traits positively predict reproduction. As Table 3 shows, relationships were generally in the predicted direction, but body masculinity was the strongest and only significant predictor (r = 0.143, 95% CI: [0.076, 0.212], q = 0.001; Figure 6). The only trait with an effect size significantly smaller than body masculinity was height (B = –0.107, p = 0.005, q = 0.023).

Table 3
Masculine traits predicting reproduction: main analyses and subgroup analyses of fertility vs reproductive success and low vs high fertility samples.

Pearson’s r (95% CI); p value for meta-analytic effect, q-value (correcting for multiple comparisons); number of observations (k), samples (s), and unique participants (n); test for heterogeneity (Q), p value for heterogeneity. Statistically significant meta-analytic associations are bolded if still significant after controlling for multiple comparisons.

| Reproduction outcome: Sample | Facial masculinity | Body masculinity | 2D:4D | Voice pitch | Height | T levels |
|---|---|---|---|---|---|---|
| Reproductive domain: All samples | r = 0.099 (-0.012, 0.211), p = 0.081, q = 0.140 | r = 0.143 (0.076, 0.212), p < 0.001, q = 0.001 | r = 0.074 (-0.006, 0.154), p = 0.070, q = 0.131 | r = 0.136 (-0.053, 0.328), p = 0.158, q = 0.228 | r = 0.006 (-0.049, 0.062), p = 0.819, q = 0.491 | r = 0.039 (-0.067, 0.145), p = 0.474, q = 0.385 |
|  | k = 5, s = 5, n = 1411 | k = 14, s = 8, n = 897 | k = 19, s = 10, n = 84,558 | k = 5, s = 3, n = 143 | k = 35, s = 25, n = 22,326 | k = 3, s = 3, n = 351 |
|  | Q(df = 4) = 8.799, p = 0.066 | Q(df = 13) = 16.356, p = 0.230 | Q(df = 18) = 31.704, p = 0.024 | Q(df = 4) = 5.378, p = 0.251 | Q(df = 34) = 433.359, p < 0.001 | Q(df = 2) = 0.387, p = 0.824 |
| Fertility: All samples | r = 0.003 (-0.253, 0.260), p = 0.980, q = 0.543 | r = 0.130 (0.060, 0.201), p < 0.001, q = 0.002 | r = 0.032 (-0.065, 0.130), p = 0.514, q = 0.406 | s = 2 | r = 0.011 (-0.039, 0.062), p = 0.660, q = 0.451 | s = 2 |
|  | k = 3, s = 3, n = 437 | k = 8, s = 6, n = 813 | k = 13, s = 5, n = 84,128 |  | k = 26, s = 23, n = 22,242 |  |
|  | Q(df = 2) = 5.416, p = 0.067 | Q(df = 7) = 4.840, p = 0.679 | Q(df = 12) = 17.757, p = 0.123 |  | Q(df = 25) = 400.038, p < 0.001 |  |
| RS: All samples | s = 2 | r = 0.192 (-0.052, 0.441), p = 0.122, q = 0.189 | r = 0.174 (0.085, 0.267), p < 0.001, q = 0.002 | s = 2 | r = -0.044 (-0.201, 0.113), p = 0.584, q = 0.430 | s = 1 |
|  |  | k = 6, s = 4, n = 205 | k = 6, s = 5, n = 430 |  | k = 9, s = 9, n = 603 |  |
|  |  | Q(df = 5) = 11.344, p = 0.045 | Q(df = 5) = 0.976, p = 0.965 |  | Q(df = 8) = 33.311, p < 0.001 |  |
| Reproductive domain: Low fert. samples | s = 0 | s = 1 | r = 0.083 (-0.023, 0.190), p = 0.126, q = 0.191 | s = 0 | r = -0.037 (-0.112, 0.038), p = 0.337, q = 0.347 | s = 2 |
|  |  |  | k = 8, s = 4, n = 84,034 |  | k = 8, s = 8, n = 17,135 |  |
|  |  |  | Q(df = 7) = 13.988, p = 0.051 |  | Q(df = 7) = 244.970, p < 0.001 |  |
| Fertility: Low fert. samples | s = 0 | s = 1 | r = 0.052 (-0.065, 0.169), p = 0.386, q = 0.369 | s = 0 | r = -0.037 (-0.112, 0.038), p = 0.337, q = 0.347 | s = 2 |
|  |  |  | k = 7, s = 3, n = 83,845 |  | k = 8, s = 8, n = 17,135 |  |
|  |  |  | Q(df = 6) = 8.335, p = 0.215 |  | Q(df = 7) = 244.970, p < 0.001 |  |
| RS: Low fert. samples | s = 0 | s = 0 | s = 1 | s = 0 | s = 0 | s = 0 |
| Reproductive domain: High fert. samples | r = 0.099 (-0.012, 0.211), p = 0.081, q = 0.140 | r = 0.163 (0.104, 0.225), p < 0.001, q = 0.001 | r = 0.083 (-0.039, 0.205), p = 0.184, q = 0.257 | r = 0.136 (-0.053, 0.327), p = 0.158, q = 0.228 | r = 0.034 (-0.041, 0.109), p = 0.377, q = 0.367 | s = 1 |
|  | k = 5, s = 5, n = 1411 | k = 13, s = 7, n = 626 | k = 11, s = 6, n = 524 | k = 5, s = 3, n = 143 | k = 27, s = 17, n = 5191 |  |
|  | Q(df = 4) = 8.799, p = 0.066 | Q(df = 12) = 12.347, p = 0.418 | Q(df = 10) = 12.595, p = 0.247 | Q(df = 4) = 5.378, p = 0.251 | Q(df = 26) = 70.216, p < 0.001 |  |
| Fertility: High fert. samples | r = 0.003 (-0.253, 0.260), p = 0.980, q = 0.543 | r = 0.165 (0.095, 0.237), p < 0.001, q = 0.001 | s = 2 | s = 2 | r = 0.059 (0.007, 0.111), p = 0.025, q = 0.068 | s = 0 |
|  | k = 3, s = 3, n = 437 | k = 7, s = 5, n = 542 |  |  | k = 18, s = 15, n = 5,107 |  |
|  | Q(df = 2) = 5.416, p = 0.067 | Q(df = 6) = 0.988, p = 0.986 |  |  | Q(df = 17) = 26.458, p = 0.067 |  |
| RS: High fert. samples | s = 2 | r = 0.192 (-0.052, 0.441), p = 0.122, q = 0.189 | r = 0.170 (0.053, 0.291), p = 0.005, q = 0.022 | s = 2 | r = -0.044 (-0.201, 0.113), p = 0.584, q = 0.430 | s = 1 |
|  |  | k = 6, s = 4, n = 205 | k = 5, s = 4, n = 241 |  | k = 9, s = 9, n = 603 |  |
|  |  | Q(df = 5) = 11.344, p = 0.045 | Q(df = 4) = 0.965, p = 0.915 |  | Q(df = 8) = 33.311, p < 0.001 |  |

Note. Fert. = fertility; k = number of observations; n = number of unique participants; Q = Cochran’s Q test of heterogeneity; q = q-value; RS = reproductive success; s = number of samples; T = testosterone.

Figure 6
Forest plot of the association between body masculinity and the reproductive domain.

Effect sizes are shown as Z-transformed r, with 95% confidence intervals in brackets. The width of the diamond corresponds to the confidence interval for the overall effect.

Comparison of high and low fertility samples

The majority (77%) of observations of reproduction were from high fertility samples. Moderation analyses of low versus high fertility samples could only be conducted for 2D:4D and height; effect sizes did not differ significantly between sample types. Comparing types of high fertility samples (industrialized vs non-industrialized) for 2D:4D and height did not show any differences in effect sizes (see Supplementary file 4). It was not possible to compare sample subtypes for the other traits because observations were almost entirely from non-industrialized populations.

Inclusion bias/heterogeneity

Visual inspection of funnel plots (see Supplementary file 6B) suggested that while the effects for voice pitch, height, and testosterone levels were symmetrically distributed, our analysis may have lacked studies for the other traits. Facial masculinity and height showed significant heterogeneity.

Additional subgroup and moderator analyses for outcome domains

Results of subgroup analyses can be viewed in Table 3 and trait-general moderators in Table 4; full results of moderation analyses are found in Supplementary file 4.

Table 4
Overview of moderation analyses for the mating vs reproductive domains.

Significant associations are indicated by + and – signs, showing the direction of the moderator relative to the reference category (stated first in the moderator column); crosses indicate no significant moderation; and ‘na’ indicates that power was too low to run that specific analysis. Only associations that remained significant after controlling for multiple comparisons are indicated here. Note that this table only shows general moderators shared by all masculine traits; for trait-specific moderation analyses, see Supplementary file 4. Likewise, for moderation analyses of the two mating domain measures attitudes and behaviors, and the two reproductive domain measures fertility and reproductive success, we also refer to Supplementary file 4.

Moderator | Facial masc. | Body masc. | 2D:4D | Voice pitch | Height | T levels
 | MAT REP | MAT REP | MAT REP | MAT REP | MAT REP | MAT REP
Mating vs reproductive domain
Mating attitudes vs behaviorsnanananananana
Fertility vs reproductive successnanananananananana
Low vs high fertility samplenananananananana
Low fertility: student vs non-student samplenananananananana
High fertility: traditional vs industrialized samplenananananananananana
Predominantly white vs mixed/other/unknown ethnicity samplenanananana
Monogamous vs non-monogamous marriage systemnanananananananana
Published vs non-published resultsnananana
Peer reviewed vs not peer reviewed studynanananananananana
Heterosexual vs gay/mixed/unknown samplenananana+na
Non-normality-transformed vs transformed variablesnana+nana+na
Non-converted vs converted effect sizesnana+nanananana
Age controlled for vs not controlled forna+nananana
Inclusion of non-relevant control variables vs notnanananananananana
Note. Masc = masculinity; MAT = mating; REP = reproduction; T = testosterone.

Moderation analyses (where possible) showed no evidence that the effects of masculinity traits on fertility differed from the effects on reproductive success. However, for body masculinity, the effect on fertility was significant (r = 0.130, 95% CI: [0.060, 0.201], q = 0.002; five out of six samples high fertility) while the somewhat larger effect on reproductive success was not. For 2D:4D, there was a significant effect for reproductive success (four out of five samples from high fertility populations: r = 0.174, 95% CI: [0.085, 0.267], q = 0.002) but not for fertility.
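As a rough illustration of how an interval around a correlation like the one reported for body masculinity and fertility can be read, the sketch below assumes that confidence intervals are formed on Fisher's z scale and back-transformed to r. This section does not state the scale used for pooling, so that is an assumption, and the function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Back out the z-scale standard error implied by a reported 95% CI for r."""
    return (math.atanh(upper) - math.atanh(lower)) / (2 * z_crit)

def fisher_ci(r, se_z, z_crit=1.96):
    """Build a CI on the Fisher z scale and transform it back to the r scale."""
    z = math.atanh(r)
    return math.tanh(z - z_crit * se_z), math.tanh(z + z_crit * se_z)

# Values reported in the text for body masculinity and fertility.
reported_r, reported_low, reported_high = 0.130, 0.060, 0.201

# Assumption: the interval is symmetric on the z scale, not on the r scale.
se_z = se_from_ci(reported_low, reported_high)
low, high = fisher_ci(reported_r, se_z)
print(f"implied z-scale SE = {se_z:.3f}; reconstructed CI = [{low:.3f}, {high:.3f}]")
```

Under that assumption the reconstructed bounds land close to the reported ones, and the interval excludes zero, which is what makes the fertility effect significant here.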

Similarly, for mating, no trait-general or trait-specific moderators had any consistent effects on the results. Body masculinity effects were stronger where effect sizes had been converted to Pearson’s r compared to where they initially had been given as r (B = 0.143, p = 0.015, q = 0.047), and effects for height were stronger in gay/mixed sexuality samples than heterosexual samples (B = 0.135, p = 0.016, q = 0.047).
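The q-values quoted alongside p-values reflect a correction for multiple comparisons. A minimal sketch of one common way to obtain such values, the Benjamini-Hochberg false discovery rate adjustment, is given below. Whether this exact procedure was used is an assumption, and the p-values in the example are hypothetical rather than taken from the analysis.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest p to largest
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.015, 0.016, 0.20, 0.64]
example_q = bh_qvalues(example_p)
for p, q in zip(example_p, example_q):
    print(f"p = {p:.3f} -> q = {q:.3f}")
```

Because the adjustment enforces monotonicity, neighbouring p-values can end up with the same q-value, which is one way two slightly different p-values can share an identical q.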

Comparing mating and reproduction across traits

Moderation analyses of domain type (mating versus reproduction) for each trait showed no significant differences, although height and testosterone levels had weaker associations with reproduction than mating while body masculinity showed the opposite pattern. There were generally far fewer observations for reproductive measures, so these nonsignificant results may reflect a lack of power. For facial masculinity, voice pitch, and 2D:4D, effect sizes for global mating and reproductive measures were nearly identical.

Discussion

Summary of results

We conducted the first comprehensive meta-analysis of the relationships between men’s masculine traits and outcomes related to mating and reproduction. Various proposed (and non-mutually exclusive) hypotheses suggest that more masculine men should show increased mating success (indexed by more matings and/or preferences for short-term mating), increased reproductive output (indexed by fertility and/or reproductive success), and/or lower offspring mortality. Our results showed partial support for these predictions. Global masculinity (i.e. all masculine traits combined) significantly predicted effects in the mating domain, but not the reproductive domain or the offspring mortality domain. When we analyzed each masculine trait separately, all traits except facial masculinity and 2D:4D significantly predicted effects in the mating domain, where similarly strong associations were seen for body masculinity, voice pitch, and testosterone levels, and a weaker correlation was seen for height. In terms of the reproductive domain, the only significant predictor was body masculinity. It was not possible to analyze offspring mortality at the specific predictor level owing to a severe lack of relevant data from which to draw conclusions (total number of observations for each outcome domain: mating domain k = 371; reproductive domain k = 81; offspring mortality domain k = 22).

We also examined how these effects play out in high versus low fertility populations. Typically, however, different outcomes were measured in different groups of populations; mating outcomes were predominantly measured in low fertility populations, while reproductive outcomes were measured mainly in high fertility populations. This made it more challenging to draw direct comparisons. Where it was possible to run moderation analyses on sample type, there were no significant differences. These analyses, however, have small numbers of high and low fertility samples in mating and reproductive outcomes respectively. Therefore, while we can confidently say that most forms of masculinity (but not facial masculinity or 2D:4D) are associated with (largely self-reported) mating outcomes in low fertility samples, we cannot draw any clear conclusions regarding mating success in high fertility samples. Similarly, although we are confident that body masculinity is associated with fertility/reproductive success in high fertility samples, we cannot draw conclusions about low fertility contexts.

More generally, our moderation analyses on outcome types and factors relating to measure quality did not yield any consistent differences between effect sizes, suggesting that the effects we do find are reasonably robust within sample type at least. Two key points to note here are that: i. although effect sizes for mating attitudes and mating behaviors did differ for some traits (i.e. facial masculinity and body masculinity), these differences were never significant, despite mating behaviors being constrained by opportunities (assuming participants report truthfully), and ii. similarly, effect sizes did sometimes differ by publication status but never significantly so; in addition, the direction of the differences was not consistent (i.e. effect sizes were not consistently larger in published analyses). Even if the analysis was restricted to nonpublished effects only, the association between body masculinity and both mating and reproduction would be weaker but remain significant (mating: r = 0.077, p = 0.006; reproduction: r = 0.112, p < 0.001; both associations would remain significant after q-value computation). Overall, this suggests that researchers have not been selectively reporting larger effect sizes.

Compared to previous meta-analyses assessing associations between handgrip strength and mating outcomes (Van Dongen and Sprengers, 2012), height/strength and reproductive outcomes (von Rueden and Jaeggi, 2016; Xu et al., 2018), and testosterone levels and mating effort (Grebe et al., 2019), our analysis benefits from more comprehensive measures of masculinity, larger sample sizes, and inclusion of more unpublished effects. With the exception of Xu and colleagues’ analysis (Xu et al., 2018), we observe smaller effect sizes than previous meta-analyses, which suggests that the association between masculinity and fitness outcomes has previously been overestimated. In general, the significant associations we did observe were small, ranging between r = 0.05 and 0.17, although they are potentially meaningful in an evolutionary context. As benchmarks for interpreting correlations, Funder and Ozer (2019) suggest that a correlation of 0.10, while being a small effect, has the potential to be influential over a long time period, and a medium-size correlation of 0.20 can be consequential both in the short- and long-term. The cumulative effect of relatively ‘weak’ correlations can therefore be of real consequence, particularly when considered in terms of selection acting over many generations.

Major implications

Selection for body masculinity

The first stand-out result of our analysis is that body masculinity (i.e. strength/muscularity) is the only trait in our analysis that was consistently correlated with both mating and reproductive outcomes across populations, and the effects of body masculinity on these outcomes were among the strongest in the analysis. In contrast, other aspects of masculinity (except facial masculinity and 2D:4D) predicted mating success in low fertility samples but did not yield reproductive benefits in high fertility samples.

Body masculinity is therefore the trait where we have the most compelling evidence that selection is currently happening within naturally fertile populations, and from that we can infer that selection likely took place in prior eras as well. As such, our results are consistent with the argument that dimorphisms in strength and muscle mass are sexually selected. Overall, since traits such as body size, strength, and muscularity are associated with formidability, our findings are consistent with the male-male competition hypothesis. In species with male intrasexual competition, males tend to evolve to become larger, stronger, and more formidable than females, as they are in humans. Some authors argue that male-male violence has influenced human evolution (Hill et al., 2016; Gat, 2015), and male intergroup aggression increases mating/reproductive success in both non-industrialized human societies and in non-human primates (Glowacki and Wrangham, 2015; Manson et al., 1991). (And indeed the non-human evidence might suggest this form of dimorphism has been under selection since pre-hominid ancestors, although the strength of such selection pressures has likely fluctuated over this time [172].) For example, in the Yanomamö Indians, men who kill others have greater reproductive success (Chagnon, 1988). A relationship between formidable traits and fitness outcomes need not be a direct one, however. It might, as mentioned in the introduction, be mediated by other factors that are important in mate choice, such as interpersonal status and dominance. For example, features that are advantageous in intraspecies conflicts may also be advantageous when hunting game (Sell et al., 2012); Smith et al. (2017) reported that in a hunter-gatherer population, men with greater upper body strength and a low voice pitch had increased reproductive success, but this relationship was explained by hunting reputation.

It is of course possible that different selection pressures may have contributed to the evolution of different masculine traits. Male-male competition for resources and mates, female choice, and intergroup violence are all plausible, non-mutually exclusive explanations (Plavcan, 2012). In this article, we have focused on the effect of men’s own traits on their fitness, but it is of course equally possible that men varying in masculinity may differ in the quality of the mates they acquire. If masculine men are able to secure mates who are more fertile and/or better parents, this may also increase their fitness.

No evidence of advantage for facial masculinity

Considerable attention has been given in the literature to the hypothesis that masculinity in men’s facial structure is an indicator of heritable immunocompetence (i.e. good genes), which should then be associated with greater mating and reproductive success. While we find that the effect of facial masculinity on mating was similar in size to that of other traits (r = 0.08), it was not significantly different from zero, suggesting more variability in effects. Furthermore, the effect of facial masculinity on mating (such as it was) was largely driven by mating attitudes and was close to zero for mating behaviors, suggesting that men’s facial masculinity exerts virtually no influence on mating when moderated by female choice. Similarly, the influence of facial masculinity on fertility in high fertility samples was non-existent (r = 0.00). Although the relationship with reproductive success appeared stronger, this was based on only two samples. This is, all together, doubly striking because although voice pitch, height, and testosterone levels did not predict reproductive outcomes, they did all relate to mating in the expected direction. Facial masculinity is ergo an outlier in being so entirely unrelated to mating success in our data, while subject to so large a literature assuming the opposite.

Overall, these findings contradict a large body of literature claiming that women’s preferences for masculinity in men’s faces are adaptive. Rather, they indicate that such preferences (to the extent they exist at all) are a modern anomaly only found in industrialized populations, as suggested by Scott and colleagues (Scott et al., 2014), and as demonstrated by the positive correlation between facial masculinity preferences and national health and human development indices (Marcinkowska et al., 2019).

Students and foragers

One key observation regarding our dataset is that it shows a rather ‘bimodal’ distribution between a large number of studies sampling (predominantly English-speaking) students on one hand, and a cluster of studies sampling foragers, horticulturalists, and other subsistence farmers (predominantly from just two continents) on the other. Where it was possible to compare student vs non-student/mixed samples within low fertility populations, and traditional vs industrialized high fertility samples, we generally did not find any differences. Likewise, where it was possible to compare monogamous and formally polygynous cultures, we also found no differences. This is despite evidence that monogamy actually changes selection pressures on human men (Brown et al., 2009). Therefore, although we are reasonably confident that our results regarding body masculinity and reproduction are robust, insofar as they are based on non-industrialized populations with a range of subsistence patterns (hunter-gatherers, forager-horticulturalists, and pastoralists), it remains essential to consider rebalancing the literature. Not only do we require more holistic representation of non-industrialized populations (drawing from Asia and Oceania in particular, where we had one and zero samples, respectively), but it is also important to increase representation of non-student participants in low fertility contexts.

Disconnection between mating and reproductive literatures

As noted above, we found that voice pitch, height, and testosterone levels were associated with (largely self-reported) mating success in mostly low fertility populations, but not with actual reproductive fitness in high fertility populations. A caveat here is that effect sizes for voice pitch and reproduction were similar in strength to effect sizes for body masculinity, but we note that this analysis had the smallest sample size of our whole analysis (k = 5, n = 143), which prevents us from drawing firm conclusions regarding the relationship between voice pitch and reproductive outcomes.

Overall, however, the contradicting pattern of results for the traits mentioned above raises important concerns for the human sexual selection field, particularly with respect to whether (and which) mating measures can be used as reliable indicators of likely ancestral fitness when considering the current evidence base. Since reproductive outcomes – for good reason – are not considered meaningful fitness measures in populations with widespread contraception use, we typically test fitness outcomes in industrialized populations using mating measures such as sociosexual attitudes and casual sexual encounters. This is done under the assumption that such measures index mating strategies that ancestrally would have increased men’s offspring numbers. However, if mating outcomes (whether attitudinal or behavioral) measured in low fertility populations truly index reproductive outcomes in naturally fertile contexts, we would expect traits that predict mating to also predict reproduction on average across samples (notwithstanding the diversity in norms/reproductive behaviors across high fertility samples). We do not, however, have evidence that this is generally the case. Our findings therefore raise the question of whether these widely used measurements are truly valid proxies of what we purport to be measuring.

Our findings thus illustrate that when we attempt to test the same underlying research questions using different measurements in different populations, this may yield conclusions that are erroneous or misleading when applied outside of the studied population. We suggest, based on our analysis, that researchers could for instance consistently gather sexual partner number, age of marriage, and number/survival rates of offspring in multiple population types. Wherever possible, it is essential to use the same measurements across populations, or at least resist the temptation of applying our findings universally.

Key limitations

Non-linearity

A limitation of our analysis is that we only assessed linear relationships, ignoring possible curvilinear associations. There is evidence suggesting that moderate levels of masculinity might be associated with increased reproductive success (see e.g. Boothroyd et al., 2017, for offspring survival rates) and perceived attractiveness (Frederick and Haselton, 2007; Johnston et al., 2001, but see also Sell et al., 2017), with a decrease for both very low and very high levels of masculinity. Indeed, some of these authors have argued that masculinity may be under stabilizing, rather than directional, selection in humans. In instances such as these, our ‘null’ conclusions regarding e.g. facial masculinity, remain valid; facial masculinity does not appear to be under directional selection. However, we also note that there is data suggesting that height in men may be optimal when it is over-average but not maximal. In this scenario, although the linear relationship would be weaker, the trait remains under directional selection, and we would still expect to see positive, albeit weak, associations in our analyses. In the vast majority of studies included, only linear relationships were tested, and acquiring original data to investigate and synthesize non-linear effects was beyond the scope of the current article. However, increased publication of open data with articles may well facilitate such a project in future years.
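The point about curvilinear associations can be made concrete with a toy calculation. The sketch below is a noise-free illustration, not a reanalysis: the trait range, the quadratic fitness function, and the optimum values are all hypothetical choices. It shows that an inverted-U relationship centred on the average trait value produces a Pearson correlation of essentially zero, whereas an optimum above the average still produces a positive linear correlation, although one weaker than a strictly increasing relationship would give.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fitness(trait_value, optimum):
    """Hypothetical fitness that peaks at `optimum` and falls off on both sides."""
    return -(trait_value - optimum) ** 2

# Hypothetical standardized trait values from -2 to +2 around an average of 0.
trait = [i / 10 for i in range(-20, 21)]

# Stabilizing selection: the optimum sits at the average trait value.
r_stabilizing = pearson_r(trait, [fitness(x, 0.0) for x in trait])

# Optimum above average but below the maximum observed value.
r_above_average = pearson_r(trait, [fitness(x, 1.0) for x in trait])

print(f"optimum at average: r = {r_stabilizing:.3f}")
print(f"optimum above average: r = {r_above_average:.3f}")
```

In real data, measurement noise would shrink both correlations further, so a linear-only synthesis can miss stabilizing selection entirely while still detecting, in attenuated form, a trait whose optimum lies above the average.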

Testosterone effects

As mentioned above, in our analysis testosterone levels predicted mating outcomes – with similar effect sizes for attitudinal and behavioral measures – but did not predict reproduction. While a causal relationship between testosterone levels and mating success cannot be established from this (i.e. whether high testosterone men pursue more mating opportunities which leads to more matings, or whether high testosterone results from many matings), testosterone is commonly argued to motivate investment in mating effort. If current testosterone levels index degree of masculine trait expression in men, our results might indicate that masculine men’s increased mating success is due to greater pursuit of matings, rather than reflecting female choice and/or greater competitiveness. Two caveats for interpreting our results, however (applicable both to the significant effect we observe for mating and the nonsignificant effect for reproduction), are that circulating testosterone levels i. change over the course of a man’s lifetime, peaking in early adulthood and subsequently declining (Booth and Dabbs, 1993, although this may not be the case in non-industrialized populations: Bribiescas, 1996), and ii. are reactive. In the studies we gathered, testosterone levels were generally measured contemporaneously with mating/reproductive data collection – not when masculine traits generally become exaggerated in adolescence. Testosterone also decreases, for example, when men enter a relationship or get married (Archer, 2006; Holmboe et al., 2017), when they become fathers (Archer, 2006; Gettler et al., 2011), or when they engage in childcare (Archer, 2006). Thus, men whose testosterone levels were previously high may show declining testosterone levels because of their age and/or because their relationship or fatherhood status has changed. This limits the conclusions we can draw, both with regard to a potential mediating role of testosterone levels in the association between masculine traits and mating success, and with regard to the absence of an observed effect of testosterone levels on reproductive outcomes. We also note that the sample size for reproduction, as a function of testosterone levels, was small.

Conclusion

In summary, we used a large-scale meta-analysis of six masculine traits and their relationships with mating and reproductive outcomes to test whether such traits are currently under selection in humans. We found that all masculine traits except facial masculinity and 2D:4D were associated with significantly greater mating success. However, only body masculinity predicted higher fertility, indexed by reproductive onset, number of offspring, and grand-offspring. We further note that the mating and reproduction literature is starkly split between studying mating in predominantly student settings, and 'only' fertility in high fertility settings, which imposes constraints on both this paper and our field as a whole. We argue that our findings illustrate that when we test hypotheses about human evolution largely in industrialized populations, we risk drawing conclusions that are not supported outside of evolutionarily novel, highly niche mating and reproductive contexts. We therefore call for greater sample diversity and more homogeneous measurements in future research.

Data availability

All data generated and analysed in this article, including complete R code, are available on the Open Science Framework (https://doi.org/10.17605/OSF.IO/PHC4X).

The following data sets were generated
    1. Lidborg LH
    2. Cross CP
    3. Boothroyd LG
    (2020) Open Science Framework
    Is male dimorphism under sexual selection in humans? A meta-analysis.
    https://doi.org/10.17605/OSF.IO/PHC4X

References

Aronoff J (2017) Life History Strategies, Testosterone, and the Anthropology of Human Development. Tuscaloosa, Alabama: University of Alabama.
Atkinson JA (2012) Anthropometric Correlates of Reproductive Success, Facial Configuration, Risk Taking and Sexual Behaviors among Indigenous and Western Populations: The Role of Hand-Grip Strength and Wrist Width. Albany, New York: University at Albany.
Boothroyd LG, Scott I, Gray AW, Coombes CI, Pound N (2013) Male facial masculinity as a cue to health outcomes. Evolutionary Psychology 11:1044–1058.
Crook JH (1972) Sexual selection, dimorphism, and social organization in the primates. In: Sexual Selection and the Descent of Man. Aldine Publishing Company.
Fechner PY (2003) The biology of puberty: New developments in sex differences. In: Hayward C, editor. Gender Differences at Puberty. Cambridge University Press. pp. 17–28. https://doi.org/10.1017/CBO9780511489716.003
Frederick MJ (2010) Effects of Early Developmental Stress on Adult Physiology and Behavior. Albany, New York: State University of New York at Albany.
Gildner TE (2018) Life History Tradeoffs between Testosterone and Immune Function among Shuar Forager-Horticulturalists of Amazonian Ecuador. Eugene, Oregon: University of Oregon.
Hartl EM, Monnelly EP, Elderkin RD (1982) Physique and Delinquent Behavior: A Thirty Year Follow-up of William H. Sheldon’s Varieties of Delinquent Youth. Academic Press.
Kirchengast S, Winkler EM (1995) Differential reproductive success and body dimensions in Kavango males from urban and rural areas in northern Namibia. Human Biology 67:291–309.
Kirchengast S (2000) Differential reproductive success and body size in !Kung San people from northern Namibia. Collegium Antropologicum 24:121–132.
Little BB, Malina RM, Buschang PH, Little LR (1989) Natural selection is not related to reduced body size in a rural subsistence agricultural community in southern Mexico. Human Biology 61:287–296.
Luevano VX, Merkling K, Sanchez-Pinelo JJ, Shakeshaft M, Velazquez ML (2018) Facial width-to-height ratio, psychopathy, sociosexual orientation, and accepting detection-risk in extra-pair relationships. Poster presented at the annual meeting of the Western Psychological Association.
Manning JT (2002) Digit Ratio: A Pointer to Fertility, Behavior, and Health. Rutgers University Press.
Pawlowski B, Boothroyd LG, Perrett DI, Kluska S (2008) Is female attractiveness related to final reproductive success? Collegium Antropologicum 32:457–460.
R Development Core Team (2019) R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Scott EC, Bajema CJ (1982) Height, weight and fertility among the participants of the Third Harvard Growth Study. Human Biology 54:501–516.
Steiner ET (2011) Testosterone and Vasopressin in Men’s Reproductive Behavior. Eugene, Oregon: University of Oregon.
Strong HL (2014) Association of Prenatal Androgen Exposure with Attention and Sociosexual Orientation. Long Beach, California: California State University.
Strong H, Luevano VX (2014) Prenatal androgen exposure predicts relationship-type preference but not experience. Poster presented at the annual meeting of the Association for Psychological Science.
Zeigler-Hill V, Welling LLM, Shackelford TK (2015) Attraction and Human Mating. In: Zeigler-Hill V, Welling LLM, Shackelford TK, editors. Evolutionary Perspectives on Social Psychology. Cham: Springer. pp. 319–332. https://doi.org/10.1007/978-3-319-12697-5

Decision letter

  1. George H Perry
    Senior and Reviewing Editor; Pennsylvania State University, United States
  2. Jordan Martin
    Reviewer; University of Zurich, Switzerland
  3. Steven Gangestad
    Reviewer; University of New Mexico, United States
  4. Urszula Marcinkowska
    Reviewer; Jagiellonian University Medical College, Poland

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Is male dimorphism under sexual selection in humans? A meta-analysis" for consideration by eLife. Your article has been reviewed by 4 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by George Perry as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jordan Martin (Reviewer #2); Steven Gangestad (Reviewer #3); Urszula Marcinkowska (Reviewer #4).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

The reviewers included experts from both evolutionary psychology and evolutionary anthropology, as these two approaches do not always overlap in readership. I include a list of items below that summarizes the main issues raised by the reviewers, all of whom were in agreement that this was a strong, well-written paper that would benefit from revision. However, the recommended revisions do permeate the entire manuscript.

Summary Items

1. Framing and introduction: The introduction should be scaled back, to arrive more quickly at the analysis part of the paper. However, this scaling should not just be a cutting process, but instead involves the most important suggested revision of the paper. In this, the authors should situate the study within a more robust theoretical framing of the problem, drawing more strongly especially from evolutionary anthropology. The reason for this is linked to the next issue, which is the degree to which we should consider the study populations in the meta-analysis to be representative of humans in general. The authors provide an excellent review, but it could be shortened a bit and exchanged for the theoretical piece. This would shift the framing of the paper to be more of an exploration of the different patterns of data reported in this field, with more skepticism about reporting bias, rather than mainly a primary hypothesis test. The authors here have a terrific opportunity in this compiled dataset to examine if these hypotheses are actually testable (or to what extent) using available data. For example, do the observations between X and X trait hold true when excluding studies presenting this as their main hypothesis? In this way, their paper becomes more than just another manuscript of many that attempts to test extant hypotheses, and more of a critique and technical examination of the state of available data – which would be ultimately more useful across a range of disciplines. As a structural note, the authors could also divide the introduction and Discussion sections into subsections, for example when providing different hypotheses behind masculinity attractiveness.

2. All reviewers suggested caution that the data in most of the analyses came from WEIRD populations, and that in specific realms (especially child mortality and marriage patterns) there is likely to be strong cultural variation that can bias patterns away from a stronger ancestral signal. The authors are clearly aware of this, but it is an issue used to explain results, rather than to frame expectations. In the detailed comments we each provide specific places where the authors might temper their discussion to specifically address this issue. They should also be more up front at the start in addressing how representative the meta-analysis based on these populations is likely to be of humans in general (or, indeed, humans over evolutionary time). The authors do perform an assessment of biased reporting of results, but most of this appears in the discussion and it is difficult to know if it is possible to show, using this dataset, if there is actually bias in the underlying data. If the authors reframe the manuscript as an examination of the robustness of existing data, then they can better consider the limitations of their sample and meta-analysis for informing historical selection pressures on human populations, as well as for identifying the direct and indirect targets of sexual selection. This is closely related to the limited scope of the theoretical framework presented for interpreting the data.

3. The authors performed many, many statistical tests in this study, but did not perform any multiple test correction or estimate of false discovery rates. They then highlight individual tests with p-values lower than 0.05, but this is expected many times with such an analysis scheme. The authors should provide a sense of which (if any) tests are relatively unexpected by chance when considering all of these tests performed. One option would be to compute FDR q-values with the input being all p-values from all analyses conducted as part of the study (they could alternatively choose to be more conservative than this and use Bonferroni correction). This could then focus the paper on the most robust results (if any remain standing), which links to items #1 and #2 about reframing the manuscript in terms of the utility of current data to speak to the issue of male dimorphism, and shifting more emphasis to some of the potential biases inherent in the datasets.

Line by line items and individual figure and table requests

– Line 29: Higher variance in male reproduction and a surplus of reproductively available males seems to be a true statement mainly with respect to mammals. The references are about humans and non-human primates, so this should be specified here.

– Line 42: I suggest underlining caution when interpreting results of 2d:4d and sexual reproduction studies, as very often they were not replicated (as authors note by adding reference 15).

– Line 60: as the previous sentence is a long one (but sufficiently clear, so no need to shorten it) I would maybe rephrase the "…such putative associations…" to reiterate which associations the authors meant.

– Line 67: "better quality, more viable offspring." as I learnt from experience, readers from fields other than human biology/ecology often are put off by saying offspring can be of better or worse quality. I suggest adding a short description, of what does a good quality offspring mean.

– Line 69: shouldn't there be "what" instead of "which"?

– Line 79: although there are citations here, I still wonder about the logic that masculinity should be linked to "health", broadly construed. There are many reasons why poor "health" might be linked to masculinity, for example through lifestyle choices or the social context of masculine performance. I would appreciate more nuance here. Are there specifically studies that examine "health" and masculinity in which lifestyle and diet, for example, are comparable? If those cited here qualify, then that should be specified.

– Line 80: authors could add here some of the cross-cultural studies showing women's mixed preferences for facial masculinity (DeBruine et al., 2010, 2011, Brooks et al., 2011 or Marcinkowska et al., 2019).

– Line 104: I suggest splitting the paragraph around line 104. The previous part describes the sexy-sons hypothesis, while sentences after line 104 focus on other explanations and possible negative correlates of pronounced sexual dimorphism in men. I also believe that explaining more in-depth the trade-off that women face when judging sexual dimorphism could be useful in the introduction (now there are just two sentences on the putative negative correlates of masculinity).

– Line 123: "In societies without effective contraception, reproduction can be measured directly in terms of offspring numbers and/or offspring survival." I understand the logic here, but at the end of the day everything about human behavior is culturally mediated. Humans understand the relationship between sex and children, and so contraception is only part of the equation; if it is societally acceptable or encouraged to have children, then more children will be had – irrespective of the availability of contraception. Similarly, if there are reasons not to have children but contraception is not available, humans deploy cultural means to moderate fertility. Many societies are also in the midst of a demographic transition, as these ideas (and access to contraception) are in the midst of changing across one generation to the next. Furthermore, how representative are "high fertility" and "low fertility" populations, if there is so much underlying cultural and individual variation in reproductive choices? These are all just thoughts I had while reading the paper that could potentially feed into an expanded discussion that draws in more work from evolutionary anthropology.

– Line 141: "Lastly, a meta-analysis of 16 effects by Grebe et al., (57) showed that men with high testosterone levels invested more in mating effort, indexed by mating with more partners and showing greater interest in casual sex (r = 0.22)."

I appreciate the authors' comments on confounding within- vs between-individual T effects on mating behavior in the discussion, but it would also be useful here to emphasize that these findings are for circulating levels of testosterone, to avoid readers assuming this study clearly demonstrated between-individual effects.

– Line 142: "testosterone levels" measured how/when?

– Line 155: "We make no attempt to evaluate the hypotheses described above against each other."

This statement should probably be deleted as it seems contradicted by the claim in the discussion that "our findings could be interpreted as lending support to the male-male competition hypothesis".

– Line 207: "Studies using measures that were ambiguous and/or not comparable to measures used in other studies were excluded." Please provide an example of what such a measure would be.

– Line 214: "Where effect sizes for non-significant results were not stated in the paper and could not be obtained, an effect size of 0 was assigned." Please state here how many studies were affected by this decision.

– Line 273: "We used a cut-off of three or more children per woman on average within that sample, which roughly corresponds to samples with vs without widespread access to contraception." It would be helpful to include a raw data plot in the supplement showing this pattern visually to better justify the analytic decision. My apologies if I happened to miss this.

– Line 275: a reference could be useful here to back up the cut-off point if one exists.

– Line 281: "high" is missing before fertility.

– Line 296: authors could provide possible options for "publication status" and "peer-review status".

– Line 299: calling this variable of a study "quality" infers that one approach is better than the other (while I don't think that is the case).

– Line 395: "For 2D:4D, all samples except one were from low fertility populations. Although the relationship with the mating domain was significant (but very weak), this was no longer significant when excluding non-significant effect sizes not reported in the paper, where we had assigned an effect size of 0." Is this just a power issue?

– Line 494: missing r for the weak association.

– Line 495: would it be possible to add here a number/rate/percentage showing that there was much more missing data in the "offspring mortality" analyses than there was in other models?

– Line 555: and also visible in the positive correlation of facial masculinity preference with Health and HDI Indexes found in Marcinkowska et al., 2019.

– Line 556: "A limitation of our analysis is that we only assessed linear relationships, ignoring possible curvilinear associations… However, if such results indicate that greater-than-average levels of masculinity are associated with peak fitness/attractiveness, we would still expect to see positive, albeit weak, linear relationships." I sincerely appreciate the authors making this point in their discussion. However, they should also consider that some populations may be at an evolutionary equilibrium with regard to the measured traits, in which case we would expect null main effects accompanied by quadratic effects (i.e. stabilizing selection). In general, more attention is needed to selection on not only the mean phenotype within a population, but also its variance.

– Lande, R., and Arnold, S. J. (1983). The measurement of selection on correlated characters. Evolution, 1210-1226.

– Stinchcombe, J. R., Agrawal, A. F., Hohenlohe, P. A., Arnold, S. J., and Blows, M. W. (2008). Estimating nonlinear selection gradients using quadratic regression coefficients: double or nothing? Evolution, 62(9), 2435-2440.

– In a paragraph starting at 563 authors could add how the T was measured in the listed studies, to further back up the idea that current T might not be related to current masculinity.

– Line 588: "Wherever possible, we thus need to use the same measurements across populations, or at least resist the temptation of applying our findings universally."

– Line 589: maybe here authors could add what these variables could be according to their exhaustive analysis.

– Line 595: "higher fertility." as measured by …

– Line 598: typo, should be "evolutionarily". Also, great point. Also, the reproductive context is niche (would there be as much casual sex in a population where it is more likely to lead to offspring?), not just the mating context.

Figures

– Figure 1: This is a very helpful figure for the reader. I would consider using a slightly smaller indentation, though. Please also note that the figure is quite low resolution in the main text, if you were not already aware.

– Figure 3: This forest plot is very helpful. I would encourage the authors to include more forest plots for other meaningful effects in the paper, as they will be of much greater relevance to the typical reader than the multiple, large funnel plots, which are important but could just as well be placed in the supplement and referenced from the main text.

Tables

– I am curious why Table 1 and 2 are presented as separate tables. It is a very long section in this article, and structuring it as clearly as possible will be very helpful.

– Table 5: Please bold the checks to make them easier to see. Currently, the Xs stand out much more than the checks. Also, I would consider using a different color scheme to aid colorblind readers, as well as including the direction of effect change (i.e. + or -) next to the checks with the reference categories indicated in the footnote.

– Is there a reason why the authors did not discuss the moderation effects from Table 5 in the Discussion section?

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "A meta-analysis of the association between male dimorphism and fitness outcomes in humans" for further consideration by eLife. Your revised article has been evaluated by George Perry (Senior Editor) and a Reviewing Editor.

While I do find that your manuscript has been improved and you have done some solid work in addressing many reviewer comments, I (speaking as a Senior Editor) have some concerns on several points that I do not feel were properly addressed, and yet that I consider fundamental. I give an overview of these comments in the following, and we offer you the opportunity to address these comments robustly in a revision, or to consider submitting your manuscript to another journal. In addition, several comments shared by the Reviewing Editor who handled your manuscript, Jessica Thompson, are below.

Senior Editor comments:

1. This is a passage from essential revision #1 from the previous decision letter:

"This would shift in the framing of the paper to be more of an exploration of the different patterns of data reported in this field, with more skepticism about reporting bias, rather than mainly a primary hypothesis test. The authors here have a terrific opportunity in this compiled dataset to examine if these hypotheses are actually testable (or to what extent) using available data. For example, do the observations between X and X trait hold true when excluding studies presenting this as their main hypothesis? In this way, their paper becomes more than just another manuscript of many that attempts to test extant hypotheses, and more of a critique and technical examination of the state of available data – which would be ultimately more useful across a range of disciplines."

In my view, two of the most essential parts of this point were not accomplished in the revision. One, a shift to a different type of paper than one necessarily aiming to draw biological conclusions. Two, skepticism related to reporting bias (i.e. the tendency for negative results to not be published, or even the effects of inadvertent or worse p-hacking in the published literature, coming into the meta-analyses).

2. Related to FDR q-values. Thank you for providing these in the revision. But how was this analysis performed? It is surprising that the q-values are as low as reported, considering the large number of tests performed. I think that the most conservative approach should be used here (all p-values from all of the tests in the paper), given the implications and the large number of tests performed.

https://doi.org/10.7554/eLife.65031.sa1

Author response

Summary Items

1. Framing and introduction: The introduction should be scaled back, to arrive more quickly at the analysis part of the paper. However, this scaling should not just be a cutting process, but instead involves the most important suggested revision of the paper. In this, the authors should situate the study within a more robust theoretical framing of the problem, drawing more strongly especially from evolutionary anthropology. The reason for this is linked to the next issue, which is the degree to which we should consider the study populations in the meta-analysis to be representative of humans in general. The authors provide an excellent review, but it could be shortened a bit and exchanged for the theoretical piece. This would shift in the framing of the paper to be more of an exploration of the different patterns of data reported in this field, with more skepticism about reporting bias, rather than mainly a primary hypothesis test. The authors here have a terrific opportunity in this compiled dataset to examine if these hypotheses are actually testable (or to what extent) using available data. For example, do the observations between X and X trait hold true when excluding studies presenting this as their main hypothesis? In this way, their paper becomes more than just another manuscript of many that attempts to test extant hypotheses, and more of a critique and technical examination of the state of available data – which would be ultimately more useful across a range of disciplines. As a structural note, the authors could also divide the introduction and Discussion sections into subsections, for example when providing different hypothesis behind masculinity attractiveness.

2. All reviewers suggested caution that the data in most of the analyses came from WEIRD populations, and that in specific realms (especially child mortality and marriage patterns) there is likely to be strong cultural variation that can bias patterns away from a stronger ancestral signal. The authors are clearly aware of this, but it is an issue used to explain results, rather than to frame expectations. In the detailed comments we each provide specific places where the authors might temper their discussion to specifically address this issue. They should also be more up front at the start in addressing how representative the meta-analysis based on these populations is likely to be of humans in general (or, indeed, humans over evolutionary time). The authors do perform an assessment of biased reporting of results, but most of this appears in the discussion and it is difficult to know if it is possible to show, using this dataset, if there is actually bias in the underlying data. If the authors reframe the manuscript as an examination of the robustness of existing data, then they can better consider the limitations of their sample and meta-analysis for informing historical selection pressures on human populations, as well as for identifying the direct and indirect targets of sexual selection. This is closely related to the limited scope of the theoretical framework presented for interpreting the data.

We address points 1 and 2 together as they both concern framing and focus of the paper. We thank the reviewers for raising the key point around the importance of discussing what the data can actually show, and the need to consider diversity (or lack of) within both high and low fertility samples. Based on their suggestions above and line by line comments below we have done the following:

– Added more subheadings throughout

– Trimmed down the theoretical discussion in the introduction to make space for other points

– Expanded on the diversity in low fertility samples in the introduction

– Clarified, in the introduction, the importance of considering how predictions regarding adaptations would play out in low vs high fertility samples, and explicitly reported the high vs low fertility sample results in their own sub-sections within the results. We also created an explicit focus in the discussion on this point, and in particular on the problem of lacking mating data in high fertility samples.

– More explicitly noted our analysis of publication status in the introduction in order to comment on the results (showing no difference in effects reported as an observed result in papers vs data noted but results not reported, or data not even published).

– Added a summary of sample types in the Results, which breaks down samples beyond the simple high vs low fertility distinction, and discussed the implications in the Discussion.

Regarding the suggestion made by some reviewers that we are unduly accepting of our overall results (and their largely null direction), we would note that although one author (LB) is a known long-standing sceptic of the immunocompetence hypothesis, all authors considered it plausible at the start of this investigation that positive associations would be found for most traits with both mating success and fertility and our conclusions are based entirely on the results we found. Furthermore, our high fertility samples are in fact more diverse than the low fertility data (which is largely students) and as such the null results regarding most traits and fertility are arguably the more convincing.

We believe that we have now produced a paper which answers both the question we began with (is there evidence that masculine traits predict enhanced RS and/or mating success) and which considers more carefully the challenges arising from structural imbalances in the literature regarding the tendency in our field to collect data from either students or African and Latin American ‘small scale societies’, while largely ignoring other populations.

We believe these changes have improved the paper and hope that they will prove satisfactory.

3. The authors performed many, many statistical tests in this study, but did not perform any multiple test correction or estimate of false discovery rates. They then highlight individual tests with p-values lower than 0.05, but this is expected many times with such an analysis scheme. The authors should provide a sense of which (if any) tests are relatively unexpected by chance when considering all of these tests performed. One option would be to compute FDR q-values with the input being all p-values from all analyses conducted as part of the study (they could alternatively choose to be more conservative than this and use Bonferroni correction). This could then focus the paper on the most robust results (if any remain standing), which links to items #1 and #2 about reframing the manuscript in terms of the utility of current data to speak to the issue of male dimorphism, and shifting more emphasis to some of the potential biases inherent in the datasets.

We have used q values as the reviewers have suggested and focused on the remaining significant results. The overall pattern of results does not change much; the relationship between 2D:4D and mating success is no longer significant, but body masculinity remains the key trait associated with both mating and reproductive outcomes.
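To illustrate the difference between the two corrections discussed in this item, the following is a minimal Python sketch of Benjamini-Hochberg q-values alongside Bonferroni-adjusted p-values. The p-values are hypothetical and the function names `bh_q_values` and `bonferroni` are illustrative; this is not the procedure or software used for the paper, which the response does not specify.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values for illustration only (not taken from the paper).
p = [0.001, 0.008, 0.039, 0.041, 0.049, 0.20, 0.74]
q = bh_q_values(p)
b = bonferroni(p)

n_sig_nominal = sum(x < 0.05 for x in p)
n_sig_fdr = sum(x < 0.05 for x in q)
n_sig_bonf = sum(x < 0.05 for x in b)
```

With these made-up inputs, five tests are nominally significant, two remain below 0.05 after the Benjamini-Hochberg adjustment, and one remains after Bonferroni; a nominal p = 0.049 corresponds to q of about 0.069. Both adjustments scale with the number of tests, so the outcome depends on which set of p-values is pooled as input.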

Line by line items and individual figure and table requests

– Line 29: Higher variance in male reproduction and a surplus of reproductively available males seems to be a true statement mainly with respect to mammals. The references are about humans and non-human primates, so this should be specified here.

We now specify this refers to humans and non-human primates (now on line 30).

– Line 42: I suggest underlining caution when interpreting results of 2d:4d and sexual reproduction studies, as very often they were not replicated (as authors note by adding reference 15).

Do the reviewers mean sexual dimorphism here? If so, we agree and thus cited reference 14 (line 43).

– Line 60: as the previous sentence is a long one (but sufficiently clear, so no need to shorten it) I would maybe rephrase the "…such putative associations…" to reiterate which associations the authors meant.

This phrase has now gone due to trimmed theoretical sections.

– Line 67: "better quality, more viable offspring." as I learnt from experience, readers from fields other than human biology/ecology often are put off by saying offspring can be of better or worse quality. I suggest adding a short description, of what does a good quality offspring mean.

We appreciate this sensitivity check and have replaced ‘better quality, more viable offspring’ with ‘healthier and more viable offspring, who are more likely to survive’ (line 63).

– Line 69: shouldn't there be "what" instead of "which"?

This phrase has now gone due to trimmed theoretical sections.

– Line 79: although there are citations here, I still wonder about the logic that masculinity should be linked to "health", broadly construed. There are many reasons why poor "health" might be linked to masculinity, for example through lifestyle choices or the social context of masculine performance. I would appreciate more nuance here. Are there specifically studies that examine "health" and masculinity in which lifestyle and diet, for example, are comparable? If those cited here qualify, then that should be specified.

We agree with the reviewer that the proposed association between masculinity and health is very complex, and it is indeed a claim frequently contested in the literature. We have expanded on that point in our reworked introduction. Our aim here, however, is not to evaluate the evidence for or against this claim, but rather to describe it as one of the mechanisms that have been proposed as an explanation of how selection may act on masculine traits, whilst acknowledging to the reader that it is indeed contested. As such, this passage is intentionally brief (lines 72-74).

– Line 80: authors could add here some of the cross-cultural studies showing women's mixed preferences for facial masculinity (DeBruine et al., 2010, 2011, Brooks et al., 2011 or Marcinkowska et al., 2019).

See the previous comment (lines 72-74). This point, with references, is also raised in the discussion (lines 614-618).

– Line 104: I suggest splitting the paragraph around line 104. The previous part describes the sexy-sons hypothesis, while sentences after line 104 focus on other explanations and possible negative correlates of pronounced sexual dimorphism in men. I also believe that explaining more in-depth the trade-off that women face when judging sexual dimorphism could be useful in the introduction (now there are just two sentences on the putative negative correlates of masculinity).

We have split this paragraph after ‘which should in turn result in more offspring.’ and have briefly expanded on the quality-care trade-off (lines 94-107).

– Line 123: "In societies without effective contraception, reproduction can be measured directly in terms of offspring numbers and/or offspring survival." I understand the logic here, but at the end of the day everything about human behavior is culturally mediated. Humans understand the relationship between sex and children, and so contraception is only part of the equation; if it is societally acceptable or encouraged to have children, then more children will be had – irrespective of the availability of contraception. Similarly, if there are reasons not to have children but contraception is not available, humans deploy cultural means to moderate fertility. Many societies are also in the midst of a demographic transition, as these ideas (and access to contraception) are in the midst of changing across one generation to the next. Furthermore, how representative are "high fertility" and "low fertility" populations, if there is so much underlying cultural and individual variation in reproductive choices? These are all just thoughts I had while reading the paper that could potentially feed into an expanded discussion that draws in more work from evolutionary anthropology.

We have now expanded on the diversity of ‘high fertility’/non-contracepting populations in the introduction (lines 108-143; see also our first reply above), and have deleted this sentence.

– Line 141: "Lastly, a meta-analysis of 16 effects by Grebe et al., (57) showed that men with high testosterone levels invested more in mating effort, indexed by mating with more partners and showing greater interest in casual sex (r = 0.22)."

I appreciate the authors' comments on confounding within- vs between-individual T effects on mating behavior in the discussion, but it would also be useful here to emphasize that these findings are for circulating levels of testosterone, to avoid readers assuming this study clearly demonstrated between-individual effects.

We’ve added that the Grebe et al., study used circulating testosterone (lines 159-160).

– Line 142: "testosterone levels" measured how/when?

We now specify that testosterone was assayed by blood or saliva (line 160). The authors of the paper mention that sampling time of day was recorded, but this information is not given anywhere.

– Line 155: "We make no attempt to evaluate the hypotheses described above against each other."

This statement should probably be deleted as it seems contradicted by the claim in the discussion that "our findings could be interpreted as lending support to the male-male competition hypothesis".

We have cut this as part of our rewritten introduction more explicitly focusing on whether the effects exist, at all, in high v low fertility populations.

– Line 207: "Studies using measures that were ambiguous and/or not comparable to measures used in other studies were excluded." Please provide an example of what such a measure would be.

We have added examples of such measures (lines 234-236).

– Line 214: "Where effect sizes for non-significant results were not stated in the paper and could not be obtained, an effect size of 0 was assigned." Please state here how many studies were affected by this decision.

This has now been stated (line 243).

– Line 273: "We used a cut-off of three or more children per woman on average within that sample, which roughly corresponds to samples with vs without widespread access to contraception." It would be helpful to include a raw data plot in the supplement showing this pattern visually to better justify the analytic decision. My apologies if I happened to miss this.

The rationale for the choice of this cut-off (which was made prior to data collection/analysis) was to distinguish between populations with and without widespread access to contraception; the global fertility rate is around 2.5, and most industrialized countries have a fertility rate around or below 2 children/woman, whereas natural fertility populations typically have a fertility rate around 4 or higher (according to the World Bank Data: https://data.worldbank.org/indicator/SP.DYN.TFRT.IN). Additionally, industrialized, contracepting populations typically show a decline from a fertility rate above or around 3 to between 1.5 and 2 when contraception becomes widely available. A cut-off of 3 children/woman was therefore judged appropriate to distinguish between low and high fertility populations. We have added a reference to justify this decision (line 302).
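The classification rule described here can be written as a short Python sketch. The cut-off of 3 children per woman and the "three or more" boundary come from the text; the function name, sample names, and fertility values are hypothetical and chosen only to show how the boundary case is handled.

```python
# Cut-off stated in the response: three or more children per woman on average.
HIGH_FERTILITY_CUTOFF = 3.0


def classify_sample(mean_children_per_woman, cutoff=HIGH_FERTILITY_CUTOFF):
    """Label a sample 'high' or 'low' fertility from its average fertility."""
    return "high" if mean_children_per_woman >= cutoff else "low"


# Hypothetical samples for illustration only (not data from the meta-analysis).
samples = {
    "student_sample": 1.7,
    "boundary_sample": 3.0,
    "natural_fertility_sample": 5.2,
}
labels = {name: classify_sample(value) for name, value in samples.items()}
```

Under this reading, a sample averaging exactly 3 children per woman is labelled high fertility, because the stated rule is "three or more".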

– Line 275: a reference could be useful here to back up the cut-off point if one exists.

We have added a reference to the World Bank data also cited above (line 302).

– Line 281: "high" is missing before fertility.

This has been added (line 309).

– Line 296: authors could provide possible options for "publication status" and "peer-review status".

This has been added, along with a clarification that publication status refers to publication status of a particular effect, not status of the study it was taken from (lines 324-329).

– Line 299: calling this variable of a study "quality" infers that one approach is better than the other (while I don't think that is the case).

We have deleted this.

– Line 395: "For 2D:4D, all samples except one were from low fertility populations. Although the relationship with the mating domain was significant (but very weak), this was no longer significant when excluding non-significant effect sizes not reported in the paper, where we had assigned an effect size of 0." Is this just a power issue?

Yes – including the assigned effect sizes of 0, the association is just significant (p = 0.049), and thus becomes nonsignificant when those effects are excluded. However, this association does not survive correction for multiple comparisons (using q values as suggested), so this difference is no longer relevant.

– Line 494: missing r for the weak association.

This association did not survive correction for multiple comparisons.

– Line 495: would it be possible to add here a number/rate/percentage showing that there was much more missing data in the "offspring mortality" analyses than there was in other models?

We have added total number of observations for each outcome domain (lines 528-529).

– Line 555: and also visible in the positive correlation of facial masculinity preference with Health and HDI Indexes found in Marcinkowska et al., 2019.

This has been added (lines 617-618).

– Line 556: “A limitation of our analysis is that we only assessed linear relationships, ignoring possible curvilinear associations… However, if such results indicate that greater-than-average levels of masculinity are associated with peak fitness/attractiveness, we would still expect to see positive, albeit weak, linear relationships.” I sincerely appreciate the authors making this point in their discussion. However, they should also consider that some populations may be at an evolutionary equilibrium with regard to the measured traits, in which case we would expect null main effects accompanied by quadratic effects (i.e. stabilizing selection).

We thank the reviewers for raising this point. We were indeed referring in part to research which found such quadratic effects regarding masculinity and offspring quality and have made the point more clearly now, as well as differentiating stabilizing selection from non-linear but still directional selection (as some have argued is the case for height). We have now expanded on this point (lines 668-682).
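The reviewer's scenario (a null linear effect accompanied by a quadratic effect) can be illustrated with a small Python sketch in the spirit of the Lande and Arnold regression approach cited in the comment. The data are synthetic and noise-free, the trait is assumed to be already centred, and `fit_quadratic` is an illustrative helper rather than anything used in the paper. Following Stinchcombe et al. (2008), the quadratic selection gradient is taken as twice the fitted quadratic regression coefficient.

```python
def fit_quadratic(z, w):
    """Ordinary least squares for w = a + b*z + c*z**2; returns (a, b, c)."""
    # Power sums for the normal equations with columns [1, z, z^2].
    s = [sum(x ** k for x in z) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(z, w)) for k in range(3)]
    A = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))


# Synthetic, centred trait values (arbitrary units); illustration only.
z = [i / 2 for i in range(-4, 5)]            # -2.0, -1.5, ..., 2.0
# Absolute fitness peaks at the trait mean: pure stabilizing selection.
w_abs = [1 - 0.2 * x ** 2 for x in z]
mean_w = sum(w_abs) / len(w_abs)
w_rel = [w / mean_w for w in w_abs]          # relative fitness

a, b, c = fit_quadratic(z, w_rel)
linear_gradient = b          # close to 0: no directional selection
quadratic_gradient = 2 * c   # negative: consistent with stabilizing selection
```

In this constructed case the linear coefficient is essentially zero while the quadratic gradient is about -0.6, so an analysis restricted to linear associations would report no effect even though fitness depends strongly on the trait.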

– In a paragraph starting at 563 authors could add how the T was measured in the listed studies, to further back up the idea that current T might not be related to current masculinity.

We have noted this in order to make the point clearer that T levels were generally gathered in a manner which exacerbates the reactive T problem: “In the studies we gathered, testosterone levels were generally measured contemporaneously with mating/reproductive data collection.” (lines 697-698).

– Line 588: “Wherever possible, we thus need to use the same measurements across populations, or at least resist the temptation of applying our findings universally.”

– Line 589: maybe here authors could add what these variables could be according to their exhaustive analysis.

We have added a preceding sentence: “We suggest, based on our analysis, that researchers could for instance consistently gather sexual partner number, age of marriage, and number/survival rates of offspring in multiple population types.” (lines 686-688).

– Line 595: “higher fertility.” As measured by …

This has been added (lines 713-714).

– Line 598: typo, should be “evolutionarily”. Also, great point. Also, the reproductive context is niche (would there be as much casual sex in a population where it is more likely to lead to offspring?), not just the mating context.

Typo corrected; ‘reproductive contexts’ added (line 719).

Figures

– Figure 1: This is a very helpful figure for the reader. I would consider using a slightly smaller indentation, though. Please also note that the figure is quite low resolution in the main text, if you were not already aware.

Figures 1 and 2 have been updated to a higher resolution.

– Figure 3: This forest plot is very helpful. I would encourage the authors to include more forest plots for other meaningful effects in the paper, as they will be of much greater relevance to the typical reader than the multiple, large funnel plots, which are important but could just as well be placed in the supplement and referenced from the main text.

Funnel plots have been moved to Supplementary files. Forest plots for all significant outcome domain associations have been added.

Tables

– I am curious why Tables 1 and 2 are presented as separate tables. It is a very long section in this article, and structuring it as clearly as possible will be very helpful.

They have been combined (both Tables 1 and 2, and 3 and 4).

– Table 5: Please bold the checks to make them easier to see. Currently, the Xs stand out much more than the checks. Also, I would consider using a different color scheme to aid colorblind readers, as well as including the direction of effect change (i.e. + or -) next to the checks with the reference categories indicated in the footnote.

We’re grateful for the reminder of the need to use a color deficiency friendly palette. Moderators and reference categories have been specified for all analyses. We use + and – signs to indicate direction of significant effects, and as before, crosses for nonsignificant effects. Please note that it is hard to make significant associations ‘pop out’ more without making them bigger, which leads to formatting problems in the table.

– Is there a reason why the authors did not discuss the moderation effects from Table 5 in the Discussion section?

We did not include non-significant moderators (other than those we explicitly planned to focus on) for brevity. We now explicitly mention all significant moderators in text for outcome domains, and for brevity, refer readers to the Supplementary files for outcome measure moderations.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Senior Editor comments:

1. This is a passage from essential revision #1 from the previous decision letter:

"This would shift in the framing of the paper to be more of an exploration of the different patterns of data reported in this field, with more skepticism about reporting bias, rather than mainly a primary hypothesis test. The authors here have a terrific opportunity in this compiled dataset to examine if these hypotheses are actually testable (or to what extent) using available data. For example, do the observations between X and X trait hold true when excluding studies presenting this as their main hypothesis? In this way, their paper becomes more than just another manuscript of many that attempts to test extant hypotheses, and more of a critique and technical examination of the state of available data – which would be ultimately more useful across a range of disciplines."

In my view, two of the most essential parts of this point were not accomplished in the revision. One, for a shift to a different type of paper than one necessarily aiming to draw biological conclusions. Two, for skepticism related to reporting bias (i.e. the tendency for negative results to not be published, or even for the effects of inadvertent or worse p-hacking in the published literature, coming into the meta-analyses).

We thank the editor for the greater clarity regarding their thinking on this issue provided through our email exchange and we append the full email history at the end of this document for reference. We believe that the conclusion of this exchange was that a focus on concerns around publication bias “could ultimately be the leading reason for this work to be featured in a major general life sciences journal, rather than a more field-specific journal, and to become a maximally impactful paper.”

On re-reading the paper we can see the advantages in more clearly spelling out the role of meta-analysis in general in this area, and have added a section starting at line 147 specifically on this (incorporating the previous meta-analyses on this specific topic that were already mentioned). This places our paper within that broader history of using meta-analysis to more clearly test evolutionary hypotheses with regard to humans and sexual selection.

Regarding the editor’s concern that meta-analyses addressing specific research questions are not suitable for eLife, we note that eLife has published analyses of this kind before. For instance, Baumelle et al. (2020, https://elifesciences.org/articles/55659) tested specific hypotheses around biodiversity, and Mancini et al. (2020, https://elifesciences.org/articles/61523) investigated, as we do, which measures within a particular domain were and were not associated with a particular outcome (within neurohistology in their case). This demonstrates that theoretically driven meta-analyses of specific issues have a place within eLife alongside those that focus on methodological issues. We would also argue that a large-scale meta-analysis attempting to understand the level of selection for physical dimorphism in humans is of sufficient interest to a wide range of fields (including Psychology, Anthropology, and Biology) to justify its place within a general journal such as eLife.

While some meta-analyses in eLife have indeed focused on publication biases, such a focus was also justified by their results. As we noted in our email exchange, our results do not support an exclusive focus on this angle and, as we further noted in response to the editor’s query and have now included in the discussion (lines 551-555), the main results hold when only unpublished datasets are analysed. Publication bias is not, therefore, a primary concern with this research issue.

We hope that this, alongside our additional engagement with the goals of meta-analysis in our introduction (including concerns that publication bias may hinder interpretation), provides reassurance and satisfies the editor on this point.

2. Related to FDR q-values. Thank you for providing these in the revision. But how was this analysis performed? It is surprising that the q-values are as low as they are being reported, considering the large number of tests performed. I think that the most conservative approach should be used here (all p-values from all of the tests in the paper), given the implications and the large number of tests performed.

We believe this issue has been resolved through our email exchange, specifically:

Our previous response by email: We did indeed run our correction for multiple comparisons in the suggested way: using all p-values from all of the tests in the paper (266 p-values in total). We include here output from the q-value computation; as is evident from this output, where q-values are low, this is because the p-values were very small to begin with.

Editor’s further response: For the FDR analyses, I apologize for potentially missing the methodological statements in the text that you did indeed include all p-values into a single FDR calculation. That's great that you did do it this way.

We have also added a line specifying that this was how the q-value computation was performed (lines 291-292). Supplementary Figure 7 is also included in this resubmission; we apologise that it was erroneously left out from our previous resubmission.
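The pooled correction described in this exchange can be illustrated with a minimal sketch. It assumes a Benjamini-Hochberg adjustment, which is only one common way of controlling the false discovery rate; the response does not state which procedure or software produced the reported q-values. The function name and the p-values below are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values, in the input order.

    Assumption: BH is used here for illustration only; the source does
    not say which FDR procedure generated its q-values.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * n / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values standing in for two separate sets of tests.
analysis_a = [0.001, 0.04]
analysis_b = [0.01, 0.03, 0.5]

# Pool every p-value into a single correction, as the editor requested.
pooled = analysis_a + analysis_b
q_values = benjamini_hochberg(pooled)

for p, q in zip(pooled, q_values):
    print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Because each adjusted value scales with the total number of pooled tests divided by the rank of the p-value, a very small p-value can still yield a small adjusted value even when many tests are pooled, which is the pattern the authors describe.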

https://doi.org/10.7554/eLife.65031.sa2

Article and author information

Author details

  1. Linda H Lidborg

    Department of Psychology, Durham University, Durham, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Visualization, Writing - original draft, Writing - review and editing
    For correspondence
    lhlidborg@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9667-9326
  2. Catharine Penelope Cross

    School of Psychology and Neuroscience, University of St Andrews, St Andrews, United Kingdom
    Contribution
    Conceptualization, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8110-8408
  3. Lynda G Boothroyd

    Department of Psychology, Durham University, Durham, United Kingdom
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6660-5828

Funding

No external funding was received for this work.

Senior and Reviewing Editor

  1. George H Perry, Pennsylvania State University, United States

Reviewers

  1. Jordan Martin, University of Zurich, Switzerland
  2. Steven Gangestad, University of New Mexico, United States
  3. Urszula Marcinkowska, Jagiellonian University Medical College, Poland

Publication history

  1. Preprint posted: March 8, 2020 (view preprint)
  2. Received: November 19, 2020
  3. Accepted: February 17, 2022
  4. Accepted Manuscript published: February 18, 2022 (version 1)
  5. Version of Record published: May 13, 2022 (version 2)

Copyright

© 2022, Lidborg et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Linda H Lidborg
  2. Catharine Penelope Cross
  3. Lynda G Boothroyd
(2022)
A meta-analysis of the association between male dimorphism and fitness outcomes in humans
eLife 11:e65031.
https://doi.org/10.7554/eLife.65031
