Knowledge of biodiversity is unevenly distributed across the Tree of Life. In the long run, such disparity in awareness unbalances our understanding of life on Earth, influencing policy decisions and the allocation of research and conservation funding. We investigated how humans accumulate knowledge of biodiversity by searching for consistent relationships between scientific (number of publications) and societal (number of views in Wikipedia) interest, and species-level morphological, ecological and socio-cultural factors. Across a random selection of 3,019 species spanning 29 Phyla/Divisions, we show that socio-cultural factors are the most important correlates of scientific and societal interest in biodiversity, including whether a species is useful or harmful to humans, has a common name and is listed in the IUCN Red List. Furthermore, large-bodied, broadly distributed and taxonomically unique species receive more scientific and societal attention, whereas colorfulness and phylogenetic proximity to humans correlate exclusively with societal attention. These results highlight favoritism towards a limited set of branches of the Tree of Life and show that scientific and societal priorities in biodiversity research broadly align. This suggests that we may be missing out on key species in our research and conservation agenda simply because they are not on our cultural radar.
With a carefully collected dataset and compelling analyses, this fundamental manuscript demonstrates detailed links between societal and academic interest and natural species across the globe. In doing so, the authors reveal biases that may be diminishing our abilities to care for the species on our planet that may need our care the most. While some parts of this manuscript reflect previously published work, the authors are commended for putting all the puzzle pieces together for the first time. Their work highlights our uneven knowledge of biodiversity and its potential causes.
Human relationships with biodiversity trace back to our dawn as a species (Wilson, 1993). Wildlife permeates art, myths, and traditions; it constitutes an irreplaceable source of food and goods; and, even in the digital age, it remains one of the most powerful triggers of human emotions (Correia and Mammola, 2023; Hicks and Stewart, 2018; Jacobs, 2012; Soga and Gaston, 2016). Furthermore, the birth of modern science has turned biodiversity into a subject of intense investigation. However, scientific and societal attention towards biodiversity is unevenly distributed across the branches of the Tree of Life (Wilson et al., 2007). Whether for utilitarian reasons or due to conflictual emotional stimuli (Nyhus, 2016), we have better knowledge of some species than others (Jarić et al., 2022).
Widespread evidence indicates that biodiversity research has concentrated on certain lineages, habitats and geographic regions over others (Clark and May, 2002; García-Roselló et al., 2023; Hortal et al., 2015; Mammola et al., 2023; Šmíd, 2022; Troudet et al., 2017). At the species level, for example, research interests and conservation efforts are often skewed toward vertebrates rather than other animals (Cardoso et al., 2011b, 2011a; Leather, 2013), plants (Adamo et al., 2022; Balding and Williams, 2016) or fungi (Gonçalves et al., 2021; Oyanedel et al., 2022). Furthermore, scientific and societal attention towards species may correlate, to some degree, with aesthetic features (Adamo et al., 2021; Borgi et al., 2014; Ward et al., 1998), online popularity (Correia et al., 2016; Mammola et al., 2020) and phylogenetic proximity to humans (Miralles et al., 2019), although the relative importance of these factors is likely to vary across cultural settings and societal groups. Indeed, even the selection of model organisms is not always based on functional criteria (e.g., ease of growth under controlled conditions, cell size, genome size, ploidy level; Hedges, 2002) and instead may be driven by economic, affective, cultural, or other subjective attributes (Dietrich et al., 2020).
Importantly, most attempts to quantify which features make species attractive to humans have focused on vertebrates—typically mammals and birds (Haukka et al., 2023; Miralles et al., 2019). This means we now possess a growing understanding of research biases for selected taxa (Guedes et al., 2023; Šmíd, 2022; Sumner et al., 2018; Zvaríková et al., 2021), but we still lack a comprehensive picture of cross-taxa features that could drive human interest in biodiversity. Here, we explored research and societal interest in organisms across the Tree of Life, asking two general questions: What are the species-level and cultural drivers of scientific interest throughout the Tree of Life? And, how do those drivers differ from those explaining societal interest? To this end, we randomly sampled 3,019 species spanning 29 Phyla and Divisions (Figure S1). We sourced the number of scientific papers focusing on each species as a measure of scientific interest (Figure S2A), and the number of views of the Wikipedia page of each species as a measure of societal interest (Figure S2B). Furthermore, we collected species-level traits referring to morphology and ecology (size, coloration, range size, biome and taxonomic uniqueness) and cultural factors reflecting how humans perceive and interact with biodiversity (usefulness and harmfulness for humans, presence of a common name in English, phylogenetic distance to humans, IUCN conservation status).
The number of scientific papers focusing on these randomly selected species varied by four orders of magnitude and showed a highly skewed distribution (Figure 1A). While 52% of species lacked scientific papers associated with their scientific name in the Web of Science (median ± S.E. = 0 ± 3.96), there was a long tail of comparatively few species attracting substantial scientific attention (the most studied species in our selection, Ginkgo biloba L., appeared in as many as 7,280 scientific papers) (Figure S2A). In contrast, the distribution of the number of views in Wikipedia was less skewed (Figure 1A), but there was enormous disparity in societal attention across species (266 ± 25,217; range = 0–50,727,745) (Figure S2B). With the notable exception of Chordata (the Phylum encompassing all vertebrates), most species from other taxonomic groups attracted more scientific interest than expected from societal attention (Figure 1B). The few species that attracted disproportionately more societal than scientific attention were colorful, of larger size, and possessed a common name (Figure S3).
Next, we modeled scientific and societal interest in relation to species-level traits and cultural features using generalized linear mixed effects models, controlling for phylogenetic and geographic effects. This analysis revealed a set of drivers that were associated with a high scientific and societal interest (Figure 2A; see methods for driver-specific hypotheses), with scientific and societal priorities largely mirroring each other. First, larger species were more attractive to both scientists and the general public. Second, species with broader geographic distributions and taxonomically unique species (i.e., with fewer congenerics) all received greater scientific and societal attention. Third, several cultural features strongly correlated with both scientific and societal interest, including the presence of a common name, whether a species is useful and/or harmful for humans, and whether a species had been assessed in the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. Finally, there were three traits uniquely associated with societal interest in organisms: colorful species, freshwater-dwelling species, and species phylogenetically closer to humans all received greater societal attention.
Overall, both models explained ∼60% of variance, with an additional ∼20% captured by random effects related to taxonomic relatedness and geographic provenance. Using variance partitioning analysis, we compared the relative contribution of morphological, ecological and cultural factors in determining the observed pattern of research and societal attention. Cultural features were the most important in explaining the choice of investigated species across the scientific literature (31% of explained variance) and, to an even greater extent, the number of views on Wikipedia (38%). Species-level traits explained 12% of the variance in the scientific model and 15% of the variance in the societal interest model, whereas both sets of drivers jointly contributed an additional 19 and 16%, respectively, to the two models (Figure 2B).
We found that the strongest drivers of research and societal interest are utilitarian cultural features, namely whether a species is useful and/or harmful for humans in some way (Figure 2A), matching previous evidence based on restricted taxonomic samples. For example, Vardi et al. (2021) showed that in Israel, the most popular plants in terms of online representation often have some use for humans. Similarly, Ladle et al. (2019) found that bird representation online is strongly associated with long histories of human interactions, for example in the form of hunting or pet-keeping. From a cognitive standpoint, an interpretation of this relationship may be rooted in our ancestral past, when we more often relied on wildlife products and we were more frequently subject to predation and other hazards related to wildlife. Experimental evidence suggests that, even in today’s society, images of dangerous animals are better able to arouse and maintain human attention (Yorzinski et al., 2014). Interestingly, harmfulness to humans was not a significant driver of scientific and societal interest in Tracheophyta (Figure 3). This result may partly be an artifact because plants dangerous to humans are those that are poisonous, but many poisonous plants are contemporary medicinal plants, making it difficult to draw a clear border between usefulness and dangerousness. This is also the case for many poisonous animals, but since vascular plants do not move, the value of their poison as a medicine might override our perception of them as a threat. More broadly, many species with deeply-rooted histories of interactions with humans retain their importance in specific cultural contexts, particularly among indigenous peoples, and are thus more likely to remain salient nowadays. Disrupting these connections can have important biocultural consequences, and negatively affect both the species and the communities that value them (Ladle et al., 2023; Reyes-Garcia et al., 2023).
Species with a common name also attracted more scientific and popular interest, matching previous studies (e.g., Vardi et al., 2021). This result should be interpreted with caution, however, because we considered only whether a species possesses an English common name. While we recognize the limitations of this approach, English was selected due to the lack of a comprehensive list of common species names in multiple languages, and because most species that are relevant in other cultural and language settings are also likely to have been attributed English common names as part of legislative, scientific and other societal processes. It must also be noted that this variable entails some circularity, given that humans tend to assign common names to popular species and/or those that are relevant to humans in some way. For example, a recent study showed that across nine local villages in Mozambique, species perceived as dangerous were more likely to have a local name (Farooq et al., 2021). Interestingly, this points to the possible existence of specific interactions among different cultural traits and cultural settings, which our results do not capture and which could be further explored with targeted studies.
The positive effect of body size on scientific and societal interest suggests our attention is likely best captured by organisms with sizes similar (or larger than) our own, rather than organisms that are barely visible. Furthermore, larger species are easier to study and more detectable in the field (Johnston et al., 2014; Kéry and Gregg, 2003). Previous studies documented positive relationships between human interest and body size, e.g., in different vertebrate groups (Berti et al., 2020; Guedes et al., 2023; Ward et al., 1998; Żmihorski et al., 2013) and flowering plants (Adamo et al., 2021), while others observed negative relationships, e.g., in passerine birds (Garrett et al., 2018) and butterflies (Żmihorski et al., 2013). This hints that there may be some within-group variability that is not captured in our broad-scale analysis. However, it is worth noting that most previous studies have focused on organisms that are within the same approximate size range as humans. Indeed, when we repeated regression analyses within subsets of data corresponding to the phylum Chordata, Arthropoda and Tracheophyta, we found that the effect of body size was not significant in Tracheophyta (Figure 3). While our random sample of Tracheophyta encompassed an enormous range of sizes—from a duckweed to a sequoia—it may be that attractiveness in plants is primarily controlled by other aesthetic drivers (Adamo et al., 2021).
Different variables reflecting both commonness and rarity contributed markedly in explaining scientific and, to a lesser extent, societal interest. The positive relationship between scientific and societal interest and geographic range size suggests a broader area of distribution could make a species accessible and visible to more people, including researchers, and thus more likely to be studied and searched for in Wikipedia. This result aligns with previous studies observing a positive correlation between proxies of species familiarity and online popularity (Correia et al., 2016; Żmihorski et al., 2013) or scientific interest (Adamo et al., 2021). Furthermore, taxonomically unique species often attracted more scientific and societal interest. These species may represent unique adaptations and phylogenetic distinctiveness and thus be of interest from research or conservation standpoints. Taxonomic uniqueness may also appeal to and fascinate the general public, as in famous cases of the discovery of living individuals belonging to taxa previously restricted to the fossil record such as the maidenhair tree (Ginkgo biloba L.) or the coelacanth fish (Latimeria chalumnae Smith). Conservation rarity, measured as presence and status on the IUCN Red List, was also an important driver of scientific and societal interest. Concerning scientific interest, this was true regardless of threat status; that is, both endangered and least concern species were more studied and popular across our dataset than unlisted species. This variable also entails a certain degree of circularity: IUCN assessments require a lot of data, making it possible to confidently assess species only when there is background information on their distribution and threats.
Finally, colorfulness and phylogenetic proximity to humans correlated exclusively with societal attention. Colorfulness is an important proxy for the aesthetic value of biodiversity (Langlois et al., 2022) and has been shown to often match cultural and economic interests; for example, it was recently shown that colorful birds and fish are more frequently targeted in wildlife trade (Borges et al., 2022; Senior et al., 2022). Phylogenetic proximity to humans seemingly correlates with a range of traits including the degree of empathy and anthropomorphism toward species. This result resonates with a recent study by Miralles et al. (2019), who used an online survey to assess the empathy of 3,500 raters towards 52 taxa (animals, plants and fungi) and observed a strong negative correlation between empathy scores and the divergence time separating the different taxa from Homo sapiens. It is more difficult to explain the fact that freshwater-dwelling species were significantly more searched for in Wikipedia than species inhabiting multiple habitats. Speculatively, this may reflect human preference for species living in habitats that are more foreign to human experience, but may also be a sampling artifact (only 103 species in the model, less than 4% of the total, were freshwater-dwelling).
The fact that subjectivity might drive scientific and societal attention towards biodiversity is not a problem per se, but, in the long run, it may bias our general understanding of life on Earth to the point of influencing policy decisions and allocation of research and conservation funding. For example, more popular species tend to receive more funding and resources for conservation efforts (Adamo et al., 2022; Davies et al., 2018; Mammola et al., 2020) and the allocation of protected areas has not adequately considered non-vertebrate species, as up to two-thirds of threatened insect species are not currently covered by existing protected areas (Chowdhury et al., 2023). This disparity in awareness may also influence species’ long-term conservation prospects—a species is less likely to go extinct if humans choose to protect it. Bluntly put, it may be that we are concentrating our attention on species that humans generally consider to be useful, beautiful, or familiar, rather than species that deserve more research effort due to a higher extinction risk and/or due to the key role they play in ecosystem stability and functioning.
Excluding subjectivity when developing any research agenda is certainly challenging. However, once we are aware that utilitarian needs and emotional and familiarity factors play a key role in the development of biodiversity research globally, we can start moving toward more balanced research agendas by carefully selecting which criteria we want to focus on. Ideally, we should aim, over time, for all parameter estimates in Figure 2A to move towards the middle (with the possible exceptions of IUCN categories). This strategy would minimize the effect of aesthetic and cultural factors in the selection of research and conservation priorities, and can be achieved over time through a more even repartition of research and conservation funds (see, e.g., Mammola et al., 2020 for a concrete agenda).
Global biodiversity is disappearing at an accelerating pace, not only from the physical world (Barnosky et al., 2011; Cowie et al., 2022) but also from our minds (Donadio Linares, 2022; Jarić et al., 2022; Soga and Gaston, 2016). Given that the long-term survival of humanity is intertwined with the natural world, preserving biodiversity in all its forms and functions (including cultural awareness of it) is a central imperative of the 21st century (Díaz and Malhi, 2022; Jarić et al., 2022; Loreau et al., 2021). However, biodiversity goals can only be reached by ensuring a ‘level playing field’ in the selection of conservation priorities, rather than looking exclusively at the most appealing branches of the Tree of Life.
We carried out random stratified sampling of the eukaryotic multicellular Tree of Life [Animalia, Fungi (restricted to Agaricomycetes), and Plantae (excluding unicellular Algae)] using the Global Biodiversity Information Facility (GBIF) backbone taxonomy (www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c; accessed on 01 May 2020). To our knowledge, GBIF is the only available backbone taxonomy covering all our target groups using a congruent classification. Note that we restricted our analyses to pluricellular organisms to by-pass issues with the unstable taxonomic classification of protists (Adl et al., 2019, 2012) and the challenge of extracting comparable traits between unicellular and multicellular eukaryotes.
Initially, we cleaned the GBIF backbone taxonomy by sub-selecting only accepted names (taxonomicStatus = “accepted”), removing subspecies and varieties (taxonRank = “subspecies” and “variety”), and fossil species [by removing both entirely extinct groups (e.g., Dinosauria) and single species labeled as “Fossil_Specimen”]. We chose the following criteria for the stratified random sampling:
i) The sample was at the species level within each order of Animalia, Fungi and Plantae (this way, we sampled all extant phyla and classes in the database).
ii) For each order, we sampled a fraction of 0.002 of its species. To avoid having an excessively uneven number of species among orders, we set the following thresholds:
- If the number of species in an order was between 10,001 and 50,000, we arbitrarily sampled 20 species;
- If the number of species in an order was between 50,001 and 100,000, we arbitrarily sampled 40 species;
- If the number of species in an order was >100,000, we arbitrarily sampled 60 species.
iii) We incorporated a broader sample of tetrapods so as to reflect the typical knowledge bias (“Institutional vertebratism”; Leather, 2013). For each tetrapod order, we arbitrarily sampled 20 species. However, for small tetrapod orders with less than 10 species, we only sampled 1 species.
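The quota rules in criteria ii) and iii) can be sketched as a small function. This is an illustrative Python sketch, not the code used in the study: the function name `order_quota` is hypothetical, and two details that the text does not specify are assumptions, namely rounding the 0.002 fraction up so that every order contributes at least one species, and capping the tetrapod quota at the order's size for orders with 10 to 19 species.

```python
def order_quota(n_species: int, is_tetrapod: bool = False) -> int:
    """Return how many species to draw at random from one order."""
    if is_tetrapod:
        # Criterion iii: 20 species per tetrapod order, 1 for orders with
        # fewer than 10 species. Capping at n_species is an assumption.
        return 1 if n_species < 10 else min(20, n_species)
    # Criterion ii thresholds for very large orders.
    if n_species > 100_000:
        return 60
    if n_species > 50_000:
        return 40
    if n_species > 10_000:
        return 20
    # Criterion ii default: a fraction of 0.002 of the species in the order.
    # Integer arithmetic avoids floating-point surprises; rounding up with a
    # minimum of one species is an assumption, not stated in the text.
    return max(1, (2 * n_species + 999) // 1000)


if __name__ == "__main__":
    for n, tetrapod in [(100, False), (5_000, False), (20_000, False),
                        (75_000, False), (150_000, False),
                        (7, True), (300, True)]:
        print(n, tetrapod, order_quota(n, tetrapod))
```

Under these assumptions, the quota for each order would then be filled by drawing that many accepted species names at random without replacement.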
This random sampling procedure yielded a database consisting of 3,019 species (Figure S2C). Despite the initial cleaning of the dataset, 129 of the sampled names were synonyms, doubtful (nomina dubia) or fossils because some taxonomic names were not properly labeled in GBIF. We therefore manually inspected all records and dealt with taxonomic issues. Each expert involved in the study made decisions for their focal organisms on the invalid taxonomic names, e.g., reclassifying subspecies to the species rank, replacing any synonyms with the currently valid name, and substituting fossils with extant species.
Measures of scientific and societal interest
We collected data on two indicators of human attention towards species, pertaining to scientific and societal interest.
We measured scientific interest as the number of articles indexed in the Web of Science that refer to a given species. This is a standard quantitative estimate of research effort towards individual species (Adamo et al., 2021; dos Santos et al., 2020; Tam et al., 2022; Wilson et al., 2007). We collected data using the R package ‘wosr’ version 0.3.0 (Baker, 2018). Specifically, we queried the Web of Science’s Core Collection database using topic searches (“TS”) and the species scientific name as the search term, and recorded the total number of references published between 1945 and the date of sampling returned by each query. The use of scientific names returns comparable results to searches using vernacular names (Correia et al., 2017; Jarić et al., 2016) but avoids common problems associated with vernacular language queries [e.g., words with multiple meanings (homonyms) or used as brand names (theronyms)].
We measured societal interest for each species as the total number of pageviews across the languages where the species is represented on Wikipedia. Wikipedia is currently one of the top 10 most visited websites in the world (https://www.similarweb.com/top-websites/, accessed on 3 February 2023) and is often consulted by wildlife enthusiasts as a source of information, with many species having a dedicated page in this digital encyclopedia. Wikipedia data has been widely used to explore patterns of popular interest in biodiversity, and total pageviews may be a particularly useful metric in instances where some pages have very few visits overall (Vardi et al., 2021). To extract the number of pageviews for each species, we first obtained the identification number of each species from the Wikidata knowledge base using the R package ‘WikidataQueryServiceR’ version 1.0.0 (Popov, 2020). We then used each species’ identifier to compile a list of available Wikipedia pages for the species in any language using the same query service. Once we identified the full list of Wikipedia pages for the species, we used the R package ‘pageviews’ version 0.5.0 (Keyes and Lewis, 2020) to extract monthly user pageviews (i.e., excluding views by bots) for the period between January 1st 2016 and December 31st 2021.
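The aggregation step described above (summing monthly user pageviews over every language edition in which a species has a page) can be sketched as follows. This is a minimal Python illustration and not the R workflow used in the study; the function name `total_pageviews`, the input layout and the numbers in the example are hypothetical.

```python
from typing import Dict, List


def total_pageviews(monthly_by_language: Dict[str, List[int]]) -> int:
    """Sum monthly user pageviews across all language editions.

    monthly_by_language maps a Wikipedia language code to the list of
    monthly pageview counts retrieved for that edition (hypothetical layout).
    A species with no Wikipedia page is represented by an empty dict and
    receives a total of zero.
    """
    return sum(sum(months) for months in monthly_by_language.values())


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = {"en": [120, 95, 130], "de": [10, 12], "pt": [4]}
    print(total_pageviews(example))
    print(total_pageviews({}))
```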
Species-level traits and associated hypotheses
To investigate the relationship between species-level traits, cultural factors and scientific and popular interest, we selected a set of candidate variables hypothesized to relate to species morphology, ecology and scientific and societal preferences of humans. Extracting comparable traits across distantly related taxa is challenging (Chichorro et al., 2022; Palacio et al., 2022; Weiss and Ray, 2019), thus we restricted the analysis to a small number of scalable traits and kept trait resolution low (i.e., we scored most traits as categorical variables rather than on continuous scales). Importantly, to ensure cross-taxon comparability of traits, we made specific decisions on how to score traits for the different organisms (details of decisions made and sources of traits are provided in Supplementary Text S1).
First, we extracted the average body size for each species (in mm). Size is among the most conspicuous and ubiquitous traits in ecology, relating to diverse body functions and ecological strategies (Calder, 1996; Peters and Peters, 1986). Furthermore, we expected an innate preference for large-sized species among scientists, the media and the public alike (Berti et al., 2020; Hall et al., 2011; Mammola et al., 2017; McClain et al., 2015). We also extracted the average size of males and females to calculate sexual size dimorphism as a possible driver of interest. However, as sex-specific size values were available for <20% of species in the database, we ended up excluding this variable from analyses.
We also scored, as binary variables (Yes/No), whether individuals within a species are colorful overall (brightly-colored and/or multi-colored species), blue-colored (i.e., when the species has bright blue/light blue markings or overall coloration), or red-colored (when the species has bright red/purple markings or overall coloration). In the case of sexually dichromatic species, we scored these traits as “Yes” even if only one sex displayed colorations. While there are more sophisticated ways to compute color variables (e.g., by extracting RGB pixels from standardized photographs; Delhey et al., 2021), this was not possible in our case since photographs were available for only 57% of the species included in our database. Given the role of aesthetics in driving human preference across diverse domains (Hoyer and Stokburger-Sauer, 2012) we hypothesized colorfulness to be a strong driver of attention toward biodiversity (Langlois et al., 2022). Furthermore, we scored red and blue patterns because these colors are known to impact people’s affection, cognition and behavior (Elliot and Maier, 2014). Recent studies on European plants, for example, have highlighted that species with blue/purple flowers are more frequently studied in the scientific literature (Adamo et al., 2021) and receive more conservation funds (Adamo et al., 2022).
For each species, we calculated taxonomic uniqueness as the number of species in the same family (Family uniqueness) or the number of congeneric species (Genus uniqueness). Taxonomic uniqueness may be interesting to scientists and the general public for different reasons. On the one hand, monospecific genera or families may capture divergent phylogenetic lineages defined by the presence of rare or exclusive characters (i.e., unique synapomorphies), and thus be of interest from research or conservation standpoints. On the other hand, families or genera rich in species may be useful as case studies (e.g., to explore evolutionary radiations; Gillespie et al., 2020) or be of interest to the general public simply because of greater accessibility and familiarity.
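As an illustration, both uniqueness measures reduce to counting species per genus and per family in the backbone taxonomy. The Python sketch below assumes that the counts include the focal species itself, which the text does not state; the function name and the taxon names in the example are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, List, Tuple


def taxonomic_uniqueness(
    taxonomy: List[Tuple[str, str, str]]
) -> Dict[str, Dict[str, int]]:
    """Count species per genus and per family.

    taxonomy is a list of (species, genus, family) records covering the full
    backbone, not only the sampled species. Lower counts indicate more
    taxonomically unique species. Counts include the focal species
    (an assumption made for this sketch).
    """
    genus_counts = Counter(genus for _, genus, _ in taxonomy)
    family_counts = Counter(family for _, _, family in taxonomy)
    return {
        species: {
            "genus_uniqueness": genus_counts[genus],
            "family_uniqueness": family_counts[family],
        }
        for species, genus, family in taxonomy
    }


if __name__ == "__main__":
    # Placeholder names, not real taxa.
    records = [
        ("Aus bus", "Aus", "Aidae"),
        ("Aus cus", "Aus", "Aidae"),
        ("Dus eus", "Dus", "Aidae"),
        ("Fus gus", "Fus", "Fidae"),
    ]
    for name, scores in taxonomic_uniqueness(records).items():
        print(name, scores)
```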
We marked the main domain inhabited by each species, namely “freshwater”, “marine”, “terrestrial”, or “multiple”. Finally, we used the R package ‘rgbif’ version 3.7.1 (Chamberlain et al., 2022) to extract distribution points for each species. As in Adamo et al. (2021), we expressed the geographical range size of each species as the average distance between occurrence points. This measure (dispersion) is less influenced by sampling effort than commonly used proxies of range size (e.g., minimum convex polygon or the area of occupancy). Hence, it should be better suited when dealing with opportunistically collected occurrence data such as in GBIF (Hughes et al., 2021). Geographical range size is not only a measure of ecological commonness (Gaston, 2011), but also reflects species’ accessibility and familiarity to scientists and the general public. Indeed, there is a tendency for humans to be more interested in wildlife species with which they have direct experience (Ladle et al., 2016), e.g., common species that are available to us through direct experience (Adamo et al., 2021; Schuetz and Johnston, 2019). Using the GBIF coordinates, we also extracted the coordinate of the centroid of each species’ range, providing a rough indication of their geographic provenance (Figure S1). Using the FADA Faunistic Regions database (Balian et al., 2008) (available at www.marineregions.org; accessed on 1 November 2022), we extracted the biogeographic region in which each species occurs (Afrotropical, Antarctic, Australasian, Nearctic, Neotropical, Oriental, Pacific, and Palaearctic) based on the centroid coordinates.
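The dispersion measure of range size (average distance between occurrence points) can be sketched as the mean of all pairwise distances. The Python example below is illustrative only: it assumes great-circle (haversine) distances on a sphere of radius 6,371 km and returns zero for species with a single occurrence, neither of which is specified in the text, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_pairwise_distance(points: List[Tuple[float, float]]) -> float:
    """Average distance over all unordered pairs of occurrence points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        # Fewer than two occurrences: dispersion set to zero (an assumption).
        return 0.0
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    occurrences = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0)]
    print(round(mean_pairwise_distance(occurrences), 1))
```

Because the number of pairs grows quadratically with the number of occurrences, a species with many records may need to be subsampled before applying a sketch like this.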
To express cultural knowledge and relationships between humans and wildlife, we scored, as binary variables (Yes/No), whether: i) a species has a popular name in English (Common name); ii) is an established scientific model organism beyond ecology and evolution (Model organism); iii) is harmful to humans in some way—e.g., crop pests, invasive species, species potentially dangerous to humans (large carnivores, venomous snakes, etc.) (Harmful to human); iv) has any commercial and/or cultural use (e.g. used as pets, as food or for pharmaceuticals) (Human use); and v) whether it has been assessed by the IUCN. Although we acknowledge that for the variables Harmful to human and Human use further subcategories could be used (e.g., crop pests, invasive, and harmful to humans may elicit different reactions and interests from a scientific and societal perspective), we decided not to split them due to sample size limitations.
We obtained divergence time (in millions of years) between each organism and Homo sapiens from the Time Tree database (Kumar et al., 2022). For this, we used a modified version of the timetree() function in the R package ‘timetree’ version 1.0 (https://github.com/FranzKrah/timetree; accessed on 8 November 2021). First, we obtained pairwise divergence time between each taxon and H. sapiens by running the function at the genus rank. If the assignment failed, we ran the function iteratively up to the family rank. If still missing, we manually assigned values to the first occurring rank in Time Tree (78 taxa, 2.3% of total). We hypothesize divergence time from H. sapiens to be a key factor that may explain human interest in biodiversity (Wilson, 1993), relating to empathy and compassion towards species (Miralles et al., 2019) and the degree of anthropomorphism in human-organism interactions (Servais, 2018).
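The fallback logic described above (genus first, then higher ranks up to family, then manual assignment at the first rank available in Time Tree) can be sketched as a simple loop. This Python sketch is hypothetical: a plain dictionary stands in for Time Tree queries, the function name is invented, and the taxon names and divergence value in the example are placeholders rather than real data.

```python
from typing import Dict, List, Optional, Tuple


def divergence_from_humans(
    lineage: List[Tuple[str, str]],
    timetree: Dict[str, float],
) -> Optional[dict]:
    """Return the divergence time at the lowest rank with a match.

    lineage lists (rank, taxon name) pairs ordered from genus upwards.
    timetree maps taxon names to divergence time from Homo sapiens in
    millions of years (a stand-in for querying the Time Tree database).
    The 'manual' flag marks matches found above the family rank, which the
    text describes as manually assigned.
    """
    past_family = False
    for rank, name in lineage:
        if name in timetree:
            return {"time_myr": timetree[name], "rank": rank, "manual": past_family}
        if rank == "family":
            past_family = True
    return None  # no rank of this lineage is available


if __name__ == "__main__":
    # Placeholder lineage and value, for illustration only.
    lineage = [("genus", "Aus"), ("family", "Aidae"), ("order", "Aformes")]
    lookup = {"Aformes": 500.0}
    print(divergence_from_humans(lineage, lookup))
```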
Finally, we expressed the conservation status of each species as their IUCN extinction risk, which we extracted from the IUCN Red List of Threatened species using the R package ‘rredlist’ version 0.7.0 (Chamberlain, 2022). We assigned each species to one of the following categories: Extinct (EX), Extinct in the Wild (EW), Critically Endangered (CR), Endangered (EN), Vulnerable (VU), Near Threatened (NT), Least Concern (LC), Data Deficient (DD), and Not evaluated (NE). To balance the factor levels, we later re-grouped the different categories into three levels: “Threatened” (EX, EW, CR, EN and VU), “Non-Threatened” (NT and LC), and “Unknown” (DD and NE).
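The regrouping of IUCN categories into three factor levels amounts to a lookup table. The Python sketch below mirrors the grouping stated above; the names `IUCN_GROUPS` and `regroup_iucn` are hypothetical, and raising an error for unrecognized codes is a choice made for this illustration.

```python
# Mapping from IUCN Red List category codes to the three model levels.
IUCN_GROUPS = {
    "EX": "Threatened",
    "EW": "Threatened",
    "CR": "Threatened",
    "EN": "Threatened",
    "VU": "Threatened",
    "NT": "Non-Threatened",
    "LC": "Non-Threatened",
    "DD": "Unknown",
    "NE": "Unknown",
}


def regroup_iucn(category: str) -> str:
    """Collapse an IUCN category code into one of three levels."""
    code = category.strip().upper()
    if code not in IUCN_GROUPS:
        # Failing loudly on unexpected codes is a design choice of this sketch.
        raise ValueError(f"Unrecognized IUCN category: {category!r}")
    return IUCN_GROUPS[code]


if __name__ == "__main__":
    for code in ["CR", "LC", "NE"]:
        print(code, "->", regroup_iucn(code))
```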
We used regression analyses (Zuur and Ieno, 2016) to test whether there were consistent relationships between scientific (number of scientific papers) and societal (number of views in Wikipedia) interest in an organism and species-level traits and cultural features. We carried out all analyses in R version 4.1.0 (R Core Team, 2021). We used the package ‘glmmTMB’ version 1.1.1 for modeling (Brooks et al., 2017) and ‘ggplot2’ version 3.3.4 (Wickham, 2016) for visualizations. In all analyses, we followed the general approach of Zuur and Ieno (2016) for data exploration, model fitting and validation. For data exploration, we visually inspected variable distributions, the presence of outliers, collinearity among continuous predictors (using pairwise Pearson’s correlations) and the balance of factor levels (Zuur et al., 2009). For model validation, we used the suite of functions of the package ‘performance’ version 0.0.0.6 (Lüdecke et al., 2020) to visually inspect model residuals and evaluate overdispersion, zero-inflation and multicollinearity. Given the large sample size of our dataset, we took a conservative approach to identifying significance, setting the alpha level at 0.01 instead of the usually accepted 0.05 (Benjamin et al., 2018). Furthermore, in interpreting and discussing results, we gave more weight to explained variance and effect sizes than to significance (Muff et al., 2021).
In a first set of models, we explored the role of species-level and cultural traits in explaining scientific and popular interest (dependent variables). As a result of data exploration, we log-transformed the variables Organism size, Range size, Family uniqueness and Phylogenetic distance to humans to homogenize their distributions and minimize the effect of a few outlying observations. We dropped the categorical variable Model organism because it was highly unbalanced: our random sample of species across the Tree of Life only captured 15 species classified as model organisms. Likewise, the variables blue colored and red colored were unbalanced and, to a certain extent, associated with the variable Colorful, so we used only the latter in the analyses. Finally, we scaled continuous variables to a mean of zero and a standard deviation of one to facilitate model convergence and interpretation of the effect sizes. We fitted the initial models assuming a Poisson error structure (suitable for count data) and a log-link function (ensuring positive fitted values). The models had the formula (in R notation):
(Eq. 1) y ∼ Organism Size + Colorful + Range size + Domain + Taxonomic uniqueness + Common name + IUCN + Human use + Harmful to human + Phylogenetic distance to humans + (1 | Phylum / Class / Order) + (1 | Biogeographic region)
where y was either the number of articles in the Web of Science (Scientific interest) or the number of views in Wikipedia (Popular interest). We introduced random factors to take into account the non-independence of observations. We accounted for taxonomic relatedness among species with a nested random intercept structure (1 | Phylum / Class / Order), under the assumption that closely related species should share more similar traits than would be expected from a random sample of species. Likewise, we used the random intercept structure (1 | Biogeographic region) under the assumption that people from the same region, including researchers, might be geographically biased in their interests, i.e., share a common appreciation for similar species. Both models were overdispersed (Scientific interest: dispersion ratio = 47.2; Pearson’s χ² = 109874.8; p < 0.001; Popular interest: dispersion ratio = 632366.5; Pearson’s χ² = 1471516950.1; p < 0.001). Therefore, we fitted new models assuming a negative binomial distribution, i.e., a generalization of the Poisson distribution that relaxes the assumption that the variance equals the mean.
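For readers unfamiliar with the dispersion ratio, the following Python sketch shows one common way to compute it for a Poisson fit: the Pearson chi-square statistic divided by the residual degrees of freedom. This is an illustrative reimplementation under that assumption, not the code used in the analyses, and the toy numbers are invented.

```python
from typing import Sequence, Tuple


def pearson_dispersion(
    observed: Sequence[float], fitted: Sequence[float], n_params: int
) -> Tuple[float, float]:
    """Return (Pearson chi-square, dispersion ratio) for a Poisson fit.

    Under a Poisson model the variance equals the fitted mean, so each
    squared Pearson residual is (y - mu)^2 / mu.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    resid_df = len(observed) - n_params  # residual degrees of freedom
    return chi2, chi2 / resid_df


if __name__ == "__main__":
    # Toy counts with a constant fitted mean (invented values).
    counts = [0, 1, 2, 10, 30]
    fitted_means = [8.6] * len(counts)
    chi2, ratio = pearson_dispersion(counts, fitted_means, n_params=1)
    print(f"chi2 = {chi2:.1f}, dispersion ratio = {ratio:.1f}")
```

A ratio near 1 is what a Poisson model expects, and values far above 1 indicate overdispersion. Dividing each reported Pearson statistic by its reported ratio gives roughly 2,327 for both models, which is consistent with this definition if both fits share the same residual degrees of freedom.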
Model validation for the scientific interest model revealed the existence of a highly influential observation corresponding to the Asian elephant (Elephas maximus L.). We therefore refitted the model removing this observation, which yielded almost identical model estimates but a better distribution of residuals versus fitted values. Also in the case of the popular interest model, there was a highly influential observation corresponding to the Mugger crocodile [Crocodylus palustris (Lesson)], which we removed. Model validation further revealed that the popular interest model was underfitting zeros (Observed zeros: 176; Predicted zeros: 95; Ratio: 0.54), suggesting probable zero-inflation. Therefore, we refitted the model as a standard zero-inflated negative binomial model, using the default “NB2” parameterization implemented in ‘glmmTMB’ (Hardin and Hilbe, 2007). This substantially improved model fit (Akaike Information Criterion of 42727.9 versus 42805.1). No multicollinearity affected either final model, with all Variance Inflation Factors for covariates below 3 (Zuur et al., 2009).
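The zero-inflation diagnostics reported above can be reproduced arithmetically. The Python sketch below recomputes the ratio of predicted to observed zeros and the AIC difference from the figures in the text, and shows the standard zero probabilities under an NB2 and a zero-inflated NB2 model. The function names and the parameter symbols `mu`, `theta` and `pi` are illustrative choices, not taken from the authors' code.

```python
def zero_ratio(predicted_zeros: float, observed_zeros: float) -> float:
    """Ratio of predicted to observed zeros; values below 1 mean zeros are underfitted."""
    return predicted_zeros / observed_zeros


def nb2_prob_zero(mu: float, theta: float) -> float:
    """P(Y = 0) under NB2 with mean mu and variance mu + mu**2 / theta."""
    return (theta / (theta + mu)) ** theta


def zinb_prob_zero(mu: float, theta: float, pi: float) -> float:
    """P(Y = 0) under a zero-inflated NB2: structural zeros (pi) plus NB2 zeros."""
    return pi + (1.0 - pi) * nb2_prob_zero(mu, theta)


if __name__ == "__main__":
    # Figures reported in the text.
    print(f"zero ratio = {zero_ratio(95, 176):.2f}")
    print(f"AIC difference = {42805.1 - 42727.9:.1f}")
    # The zero-inflated model assigns more probability to zero than NB2 alone.
    print(nb2_prob_zero(mu=1.0, theta=1.0), zinb_prob_zero(mu=1.0, theta=1.0, pi=0.2))
```

The extra mass at zero contributed by `pi` is what allows the zero-inflated model to match the observed excess of zero-view species.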
Once the models were fitted and validated, we used variance partitioning analysis (Borcard et al., 1992) to estimate the relative contribution of species-level traits and cultural factors in determining the observed pattern of scientific and societal interest. We used variance explained (marginal R²) to evaluate the contribution of each variable and combination of variables to the research and societal attention each species receives, by partitioning their explanatory power with the R package ‘modEvA’ version 2.0 (Barbosa et al., 2015).
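The logic of variance partitioning for two groups of predictors (say, species-level traits and cultural factors) can be sketched in a few lines of Python. The sketch assumes the classical two-set decomposition in the spirit of Borcard et al. (1992), in which unique and shared fractions are obtained by subtraction from the R² of three models. It is not the ‘modEvA’ implementation, and the R² values in the example are hypothetical.

```python
from typing import Dict


def partition_two_sets(r2_a: float, r2_b: float, r2_ab: float) -> Dict[str, float]:
    """Split explained variance between predictor sets A and B.

    r2_a  : R^2 of the model with set A only
    r2_b  : R^2 of the model with set B only
    r2_ab : R^2 of the model with both sets
    """
    return {
        "unique_a": r2_ab - r2_b,        # explained by A beyond B
        "unique_b": r2_ab - r2_a,        # explained by B beyond A
        "shared": r2_a + r2_b - r2_ab,   # jointly explained by A and B
        "unexplained": 1.0 - r2_ab,
    }


if __name__ == "__main__":
    # Hypothetical R^2 values for illustration only.
    fractions = partition_two_sets(r2_a=0.30, r2_b=0.20, r2_ab=0.40)
    for name, value in fractions.items():
        print(f"{name}: {value:.2f}")
```

The four fractions always sum to one. A negative shared fraction can occur and is usually read as a suppression effect, not as a computational error.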
Next, we tested whether the importance of traits would change across the main groups of organisms by running three models within subsets of data corresponding to Arthropoda, Chordata, and Tracheophyta (i.e., the Phyla/Divisions with most observations). The structure of the models was:
(Eq. 2) y ∼ Organism Size + Colorful + Range size + Domain + Genus uniqueness + Common name + IUCN + Human use + Harmful to human + (1 | Class / Order) + (1 | Biogeographic region)
The formula is essentially the same as Eq. 1, except for the exclusion of Phylum from the random part (as we modeled at the Phylum/Division level) and of Phylogenetic distance to humans from the fixed part (as we lacked enough resolution in the phylogenetic distance information within Phyla). We also used Genus uniqueness instead of Family uniqueness, given that we modeled at the Phylum level. In this case too, since the Poisson models were overdispersed, we switched to a negative binomial distribution.
Finally, we ran an analysis to understand which species-level traits drive the relative interest of scientists and the general public in different taxa. First, we used a generalized additive model to model the relationship between Popular interest and Scientific interest (Figure 1A). For each species, we extracted the residuals from this regression curve, whereby positive residuals indicate species with a greater popular than scientific interest, residuals close to zero indicate species with a balanced popular and scientific interest, and negative residuals indicate species with a greater scientific than popular interest (Figure 1B). Next, we used a Gaussian linear mixed model to model the relationship between the residuals and species-level traits. This model had the same general formula as Eq. 1.
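The residual-based classification can be illustrated with a short Python sketch. To keep it self-contained, an ordinary least-squares line is swapped in for the generalized additive model used in the study, with popular interest treated as the response so that positive residuals mean more popular than scientific interest. The tolerance `tol` that defines "close to zero" is a hypothetical parameter, not a value from the paper.

```python
from typing import List, Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope of y on x (stand-in for the GAM)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def classify_residuals(
    scientific: Sequence[float], popular: Sequence[float], tol: float = 0.5
) -> List[Tuple[float, str]]:
    """Label each species by the sign of its residual from the fitted line."""
    intercept, slope = ols_fit(scientific, popular)
    labelled = []
    for s, p in zip(scientific, popular):
        residual = p - (intercept + slope * s)
        if residual > tol:
            label = "popular > scientific"
        elif residual < -tol:
            label = "scientific > popular"
        else:
            label = "balanced"
        labelled.append((residual, label))
    return labelled


if __name__ == "__main__":
    # Invented values on an arbitrary (e.g. log) scale.
    for residual, label in classify_residuals([0, 1, 2], [0, 3, 0]):
        print(f"{residual:+.1f}  {label}")
```

In the study itself the residuals then become the response of a Gaussian linear mixed model, so the labels here serve only to make the sign convention concrete.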
Conceptualization: SM, MA, MC, TC, JC, PC, DF, AM, RC
Study design: SM
Data collection (traits): all authors except RC and PC
Data mining (online databases): MC, RC, SM, and TC
Analysis: SM, RC
Interpretation: all authors
Writing, first draft: SM, RC
Writing, contributions: all authors
Thanks to Caio Graco-Roza for helping with ggplot2. Filipe Chichorro kindly compiled traits for ants. The authors acknowledge the support of NBFC, funded by the Italian Ministry of University and Research, P.N.R.R., Missione 4 Componente 2, “Dalla ricerca all’impresa”, Investimento 1.4, Project CN00000033.
Conflict of interest
Data and code availability
The database used in the analyses is available in Figshare (doi: 10.6084/m9.figshare.22731440). The R code to generate analyses and figures is available on GitHub (https://github.com/StefanoMammola/Mammola_et_al_ToL_research_interest).
Supplementary text S1
- 1. Plant scientists’ research attention is skewed towards colourful, conspicuous and broadly distributed flowers. Nat Plants 7:574–578. https://doi.org/10.1038/s41477-021-00912-2
- 2. Dimension and impact of biases in funding for species and habitat conservation. Biol Conserv 272. https://doi.org/10.1016/j.biocon.2022.109636
- 3. Revisions to the Classification, Nomenclature, and Diversity of Eukaryotes. J Eukaryot Microbiol 66:4–119. https://doi.org/10.1111/jeu.12691
- 4. The Revised Classification of Eukaryotes. J Eukaryot Microbiol 59:429–514. https://doi.org/10.1111/j.1550-7408.2012.00644.x
- 5. wosr: Clients to the “Web of Science” and “InCites” APIs
- 6. Plant blindness and the implications for plant conservation. Conserv Biol 30:1192–1199. https://doi.org/10.1111/cobi.12738
- 7. An introduction to the freshwater animal diversity assessment (FADA) project. Hydrobiologia 600. https://doi.org/10.1007/s10750-007-9235-6
- 8. modEvA: Model Evaluation and Analysis
- 9. Has the Earth’s sixth mass extinction already arrived? Nature 471. https://doi.org/10.1038/nature09678
- 10. Redefine statistical significance. Nat Hum Behav 2:6–10. https://doi.org/10.1038/s41562-017-0189-z
- 11. Body size is a good proxy for vertebrate charisma. Biol Conserv 251. https://doi.org/10.1016/j.biocon.2020.108790
- 12. Partialling out the Spatial Component of Ecological Variation. Ecology 73:1045–1055. https://doi.org/10.2307/1940179
- 13. Marine or freshwater: the role of ornamental fish keeper’s preferences in the conservation of aquatic organisms in Brazil. PeerJ 10. https://doi.org/10.7717/peerj.14387
- 14. Baby schema in human and animal faces induces cuteness perception and gaze allocation in children. Front Psychol 5. https://doi.org/10.3389/fpsyg.2014.00411
- 15. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. R J 9:378–400. https://doi.org/10.32614/RJ-2017-066
- 16. Size, function, and life history. Courier Corporation
- 17. Adapting the IUCN Red List criteria for invertebrates. Biol Conserv 144:2432–2440. https://doi.org/10.1016/j.biocon.2011.06.020
- 18. The seven impediments in invertebrate conservation and how to overcome them. Biol Conserv 144:2647–2655. https://doi.org/10.1016/j.biocon.2011.07.024
- 19. rredlist: “IUCN” Red List Client
- 20. Oldoni D, Waller J
- 21. Trait-based prediction of extinction risk across terrestrial taxa. Biol Conserv 274. https://doi.org/10.1016/j.biocon.2022.109738
- 22. Three-quarters of insect species are insufficiently represented by protected areas. One Earth. https://doi.org/10.1016/j.oneear.2022.12.003
- 23. Taxonomic Bias in Conservation Research. Science. https://doi.org/10.1126/science.297.5579.191b
- 24. Internet scientific name frequency as an indicator of cultural salience of biodiversity. Ecol Indic 78:549–555. https://doi.org/10.1016/j.ecolind.2017.03.052
- 25. Familiarity breeds content: assessing bird species popularity with culturomics. PeerJ 4. https://doi.org/10.7717/peerj.1728
- 26. The searchscape of fear: A global analysis of internet search trends for biophobias. People Nat. https://doi.org/10.1002/pan3.10497
- 27. The Sixth Mass Extinction: fact, fiction or speculation? Biol Rev 97:640–663. https://doi.org/10.1111/brv.12816
- 28. Popular interest in vertebrates does not reflect extinction risk and is associated with bias in conservation investment. PLoS One 14. https://doi.org/10.1371/journal.pone.0203694
- 29. Migratory birds are lighter coloured. Curr Biol 31:R1511–R1512. https://doi.org/10.1016/j.cub.2021.10.048
- 30. Biodiversity: Concepts, Patterns, Trends, and Perspectives. Annu Rev Environ Resour 47:31–63. https://doi.org/10.1146/annurev-environ-120120-054300
- 31. How to choose your research organism. Stud Hist Philos Sci Part C Stud Hist Philos Biol Biomed Sci 80. https://doi.org/10.1016/j.shpsc.2019.101227
- 32. The awkward question: What baseline should be used to measure biodiversity loss? The role of history, biology and politics in setting up an objective and fair baseline for the international biodiversity regime. Environ Sci Policy 135:137–146. https://doi.org/10.1016/j.envsci.2022.04.019
- 33. Drivers of taxonomic bias in conservation research: a global analysis of terrestrial mammals. Anim Conserv 23:679–688. https://doi.org/10.1111/acv.12586
- 34. Color Psychology: Effects of Perceiving Color on Psychological Functioning in Humans. Annu Rev Psychol 65:95–120. https://doi.org/10.1146/annurev-psych-010213-115035
- 35. Species perceived to be dangerous are more likely to have distinctive local names. J Ethnobiol Ethnomed 17. https://doi.org/10.1186/s13002-021-00493-6
- 36. The biased distribution of existing information on biodiversity hinders its use in conservation, and we need an integrative approach to act urgently. Biol Conserv 283. https://doi.org/10.1016/j.biocon.2023.110118
- 37. Introducing change: A current look at naturalized bird species in western North America. Trends Tradit Avifaunal Chang West North Am:116–130
- 38. Common Ecology. Bioscience 61:354–362. https://doi.org/10.1525/bio.2011.61.5.4
- 39. Comparing Adaptive Radiations Across Space, Time, and Taxa. J Hered 111:1–20. https://doi.org/10.1093/jhered/esz064
- 40. Include all fungi in biodiversity goals. Science. https://doi.org/10.1126/science.abk1312
- 41. Species out of sight: elucidating the determinants of research effort in global reptiles. Ecography (Cop) n/a:e 6491. https://doi.org/10.1111/ecog.06491
- 42. Forests and trees as charismatic mega-flora: implications for heritage tourism and conservation. J Herit Tour 6:309–323. https://doi.org/10.1080/1743873X.2011.620116
- 43. Generalized linear models and extensions
- 44. The iratebirds Citizen Science Project: a Dataset on Birds’ Visual Aesthetic Attractiveness to Humans. Sci Data 10:297. https://doi.org/10.1038/s41597-023-02169-0
- 45. The origin and evolution of model organisms. Nat Rev Genet 3:838–849. https://doi.org/10.1038/nrg929
- 46. Exploring potential components of wildlife-inspired awe. Hum Dimens Wildl 23:293–295. https://doi.org/10.1080/10871209.2018.1419518
- 47. Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annu Rev Ecol Evol Syst 46:523–549. https://doi.org/10.1146/annurev-ecolsys-112414-054400
- 48. The role of aesthetic taste in consumer behavior. J Acad Mark Sci 40:167–180. https://doi.org/10.1007/s11747-011-0269-y
- 49. Sampling biases shape our view of the natural world. Ecography (Cop) 44:1259–1269. https://doi.org/10.1111/ecog.05926
- 50. Human Emotions Toward Wildlife. Hum Dimens Wildl 17:1–3. https://doi.org/10.1080/10871209.2012.653674
- 51. Data mining in conservation research using Latin and vernacular species names. PeerJ 4. https://doi.org/10.7717/peerj.2202
- 52. Societal extinction of species. Trends Ecol Evol 37:411–419. https://doi.org/10.1016/j.tree.2021.12.011
- 53. Species traits explain variation in detectability of UK birds. Bird Study 61:340–350. https://doi.org/10.1080/00063657.2014.941787
- 54. Effects of life-state on detectability in a demographic study of the terrestrial orchid Cleistes bifaria. J Ecol 91:265–273. https://doi.org/10.1046/j.1365-2745.2003.00759.x
- 55. pageviews: An API Client for Wikimedia Traffic Data
- 56. TimeTree 5: An Expanded Resource for Species Divergence Times. Mol Biol Evol 39. https://doi.org/10.1093/molbev/msac174
- 57. Conservation culturomics. Front Ecol Environ 14:269–275. https://doi.org/10.1002/fee.1260
- 58. A culturomics approach to quantifying the salience of species on the global internet. People Nat 1:524–532. https://doi.org/10.1002/pan3.10053
- 59. Biocultural aspects of species extinctions. Cambridge Prisms: Extinction:1–21. https://doi.org/10.1017/ext.2023.20
- 60. The aesthetic value of reef fishes is globally mismatched to their conservation priorities. PLOS Biol 20. https://doi.org/10.1371/journal.pbio.3001640
- 61. Institutional vertebratism hampers insect conservation generally; not just saproxylic beetle conservation. Anim Conserv 16:379–380. https://doi.org/10.1111/acv.12068
- 62. Biodiversity as insurance: from concept to measurement and application. Biol Rev 96:2333–2354. https://doi.org/10.1111/brv.12756
- 63. performance: An R Package for Assessment, Comparison and Testing of Statistical Models. J Open Source Softw 6. https://doi.org/10.21105/joss.03139
- 64. How much biodiversity is concealed in the word ‘biodiversity’? Curr Biol 33:R59–R60. https://doi.org/10.1016/j.cub.2022.12.003
- 65. Record breaking achievements by spiders and the scientists who study them. PeerJ 5. https://doi.org/10.7717/peerj.3972
- 66. Towards a taxonomically unbiased European Union biodiversity strategy for 2030. Proc R Soc B Biol Sci 287. https://doi.org/10.1098/rspb.2020.2166
- 67. Sizing ocean giants: patterns of intraspecific size variation in marine megafauna. PeerJ 3. https://doi.org/10.7717/peerj.715
- 68. Empathy and compassion toward other species decrease with evolutionary divergence time. Sci Rep 9. https://doi.org/10.1038/s41598-019-56006-9
- 69. Rewriting results sections in the language of evidence. Trends Ecol Evol 37:203–210. https://doi.org/10.1016/j.tree.2021.10.009
- 70. Human–Wildlife Conflict and Coexistence. Annu Rev Environ Resour 41:143–171. https://doi.org/10.1146/annurev-environ-110615-085634
- 71. A way forward for wild fungi in international sustainability policy. Conserv Lett 22. https://doi.org/10.1111/conl.12882
- 72. A protocol for reproducible functional diversity analyses. Ecography 11. https://doi.org/10.1111/ecog.06287
- 73. The ecological implications of body size
- 74. WikidataQueryServiceR: API Client Library for ‘Wikidata Query Service’. R Core Team. 2021. R: A Language and Environment for Statistical Computing
- 75. Biocultural vulnerability exposes threats of culturally important species. Proc Natl Acad Sci 120. https://doi.org/10.1073/pnas.2217303120
- 76. Characterizing the cultural niches of North American birds. Proc Natl Acad Sci 116:10868–10873. https://doi.org/10.1073/pnas.1820670116
- 77. Wildlife trade targets colorful birds and threatens the aesthetic value of nature. Curr Biol 32:4299–4305. https://doi.org/10.1016/j.cub.2022.07.066
- 78. Anthropomorphism in human–animal interactions: A pragmatist view. Front Psychol 9. https://doi.org/10.3389/fpsyg.2018.02590
- 79. Geographic and taxonomic biases in the vertebrate tree of life. J Biogeogr 49:2120–2129. https://doi.org/10.1111/jbi.14491
- 80. Extinction of experience: the loss of human–nature interactions. Front Ecol Environ 14:94–101. https://doi.org/10.1002/fee.1225
- 81. Why we love bees and hate wasps. Ecol Entomol 43:836–845. https://doi.org/10.1111/een.12676
- 82. Quantifying research interests in 7,521 mammalian species with h-index: a case study. Gigascience 11. https://doi.org/10.1093/gigascience/giac074
- 83. Taxonomic bias in biodiversity data and societal preferences. Sci Rep 7. https://doi.org/10.1038/s41598-017-09084-6
- 84. Combining culturomic sources to uncover trends in popularity and seasonal interest in plants. Conserv Biol 35:460–471. https://doi.org/10.1111/cobi.13705
- 85. The Relationship between Popularity and Body Size in Zoo Animals. Conserv Biol 12:1408–1411. https://doi.org/10.1111/j.1523-1739.1998.97402.x
- 86. Unifying functional trait approaches to understand the assemblage of ecological communities: synthesizing taxonomic divides. Ecography (Cop) 42:2012–2020. https://doi.org/10.1111/ecog.04387
- 87. ggplot2: Elegant Graphics for Data Analysis
- 88. Biophilia and the conservation ethic. In: Kellert S, Wilson E O, editors. The Biophilia Hypothesis. Washington, DC
- 89. The (Bio)diversity of Science Reflects the Interests of Society. Front Ecol Environ 5:409–414
- 90. Dangerous Animals Capture and Maintain Attention in Humans. Evol Psychol 12. https://doi.org/10.1177/147470491401200304
- 91. Ecological correlates of the popularity of birds and butterflies in Internet information resources. Oikos 122:183–190
- 92. A protocol for conducting and presenting results of regression-type analyses. Methods Ecol Evol 7:636–645. https://doi.org/10.1111/2041-210X.12577
- 93. A protocol for data exploration to avoid common statistical problems. Methods Ecol Evol 1:3–14. https://doi.org/10.1111/j.2041-210x.2009.00001.x
- 94. What makes spiders frightening and disgusting to people? Front Ecol Evol 9. https://doi.org/10.3389/fevo.2021.694569