Drivers of species knowledge across the Tree of Life

  1. Molecular Ecology Group (MEG), Water Research Institute (CNR-IRSA), National Research Council, Verbania Pallanza, Italy
  2. Laboratory for Integrative Biodiversity Research (LIBRe), Finnish Museum of Natural History (LUOMUS), University of Helsinki, Helsinki, Finland
  3. National Biodiversity Future Center, Palermo, Italy
  4. Department of Life Sciences and Systems Biology, University of Turin, Torino, Italy
  5. University of Belgrade - Faculty of Biology, Belgrade, Serbia
  6. Curtin University, Perth, Australia
  7. Jodrell Laboratory, Kew Gardens, London, UK
  8. Department of Aquaculture, Isparta University of Applied Sciences, Isparta, Türkiye
  9. CIIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
  10. Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Rome, Italy
  11. CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus Gualtar, Braga, Portugal
  12. Instituto de Investigaciones Marinas, CSIC, Eduardo Cabello 6, 36208 Vigo, Pontevedra, Spain
  13. Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales, Madrid, Spain
  14. Helsinki Lab of Interdisciplinary Conservation Science (HELICS), Department of Geosciences and Geography, University of Helsinki, Helsinki, Finland
  15. Helsinki Institute of Sustainability Science (HELSUS), University of Helsinki, Helsinki, Finland
  16. CESAM – Centre for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal

Editors

  • Reviewing Editor
    David Donoso
    Escuela Politécnica Nacional, Quito, Ecuador
  • Senior Editor
    George Perry
    Pennsylvania State University, University Park, United States of America

Reviewer #1 (Public Review):

Overall, I quite enjoyed reading the manuscript and found it very well structured and organized. I congratulate the authors on this nice piece of research. I do have a few major points to raise, but they probably would not affect the general message of the manuscript.

I was confused about how IUCN data were used. The IUCN predictors are not mentioned in the model equations presented in the manuscript, but their effect sizes are reported in Figure 2. The Methods state that IUCN data were classified into three categories. I believe this coding mixes mechanisms, since at least two processes might underlie the IUCN data. First, one can test whether "scientific/societal interest" differs between assessed and non-assessed species; this would be unrelated to the assessed status itself. Assessed species are any with LC, NT, VU, EN, CR, EW or EX status, whereas non-assessed species might include DD and NE. Second, one may observe an effect of threat status itself, with threatened species being more researched than non-threatened species. This would only be possible for assessed species, although there are methods for imputing missing statuses. Inspecting Figure 2, I got the impression that only the second option was explored, but this would need to be confirmed.
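To make the two mechanisms concrete, the sketch below recodes a single IUCN category into two separate predictors: an "assessed" indicator and a "threatened" indicator that is defined only for assessed species. It is a minimal illustration, assuming the grouping described above (DD and NE treated as non-assessed) and the standard IUCN convention that VU, EN and CR count as threatened; the function name `recode_iucn` is hypothetical and is not taken from the manuscript.

```python
# Hypothetical recoding of IUCN Red List categories into two predictors.
# Assumption: DD and NE are treated as "non-assessed", as suggested above.
ASSESSED = {"LC", "NT", "VU", "EN", "CR", "EW", "EX"}
# Assumption: standard IUCN "threatened" categories.
THREATENED = {"VU", "EN", "CR"}


def recode_iucn(category):
    """Return (assessed, threatened) for one IUCN category code.

    assessed:   1 if the species has a usable status (LC..EX), else 0
                (DD, NE or missing).
    threatened: 1/0 for assessed species; None for non-assessed species,
                left missing here (imputation would be a separate step).
    """
    code = (category or "NE").strip().upper()
    if code not in ASSESSED:
        return 0, None
    return 1, int(code in THREATENED)


print(recode_iucn("EN"))  # (1, 1)
print(recode_iucn("DD"))  # (0, None)
```

With this coding, the first mechanism would be tested through the `assessed` term across all species, and the second through the `threatened` term within assessed species only.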

In Figure 2, I was confused by the presence of three categories of domain, whereas the text states that four categories were used. I believe these domains are not mutually exclusive, which is why there is a fourth category. Would it not be better to assess the influence of domain through three dummy variables (terrestrial, marine, freshwater), where multiple presences (1s) would indicate the "multiple" category?
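The suggested encoding can be sketched as follows: each species receives three 0/1 indicators, and the former "multiple" category is simply any species with two or more indicators set. This is an illustrative sketch only; the function names and the input format (a list of domain labels per species) are assumptions, not the authors' data structure.

```python
# Hypothetical three-dummy encoding of (non-mutually exclusive) domains.
DOMAINS = ("terrestrial", "marine", "freshwater")


def domain_dummies(domains):
    """Encode a species' list of domains as three 0/1 indicators."""
    present = {d.strip().lower() for d in domains}
    unknown = present - set(DOMAINS)
    if unknown:
        raise ValueError(f"unexpected domain(s): {sorted(unknown)}")
    return {d: int(d in present) for d in DOMAINS}


def is_multiple(dummies):
    """The former 'multiple' category: two or more domains present."""
    return sum(dummies.values()) >= 2


euryhaline = domain_dummies(["marine", "freshwater"])
print(euryhaline)               # {'terrestrial': 0, 'marine': 1, 'freshwater': 1}
print(is_multiple(euryhaline))  # True
```

Entering the three indicators as separate fixed effects lets each domain contribute its own estimate, so a species in several domains combines those effects rather than being assigned to a single pooled level.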

At present, I felt that the spatial component of the data remains unexplored. Since you have centroids representing species distributions, it could be interesting to explore whether species occur within protected areas or biodiversity hotspots, which might trigger scientific interest, at least. One could also derive information about the major habitat in which species occur, either by using the IUCN Major Habitat classification or by extracting the overlap of species centroids with WWF biomes (e.g., simplified to just forested vs non-forested habitats; https://ecoregions.appspot.com/). Another factor commonly considered in research on biodiversity shortfalls is proximity to research institutions (https://doi.org/10.1111/2041-210X.13152). And since societal interest is also being explored, what about proximity to major cities (doi:10.1038/nature25181)? Finally, other metrics derived from species centroids could capture "tropicality", i.e., whether or not a species is tropical. Tropical species are often neglected compared with those occurring in temperate regions.

I was also thinking about the influence of time on the models. Species described long ago are often better known to people and scientists and have had more "time" to be researched. Although the metrics of societal interest were restricted to the last decade here, this does not necessarily mean that people's interest is unaffected by their accumulated experience. Similar reasoning applies to scientific interest, which covers a longer time frame (~80 years). Therefore, the year of description, or the time since description, could be added to the models to capture this temporal effect.

Model residuals could be checked for phylogenetic or spatial autocorrelation. I am aware that no phylogenetic tree was used, but the hierarchical taxonomy (Phylum / Class / Order / Family / Genus) could serve as a proxy for phylogenetic relationships. Concerning spatial autocorrelation, one could test whether the model residuals are autocorrelated using the centroid coordinates of each species' range. The manuscript states that GLMMs were used to avoid these non-independence issues, but it would be interesting to check whether the residuals remain free of them.
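One common way to run the spatial check is Moran's I on the model residuals, with weights derived from the distances between range centroids and significance assessed by permutation. The sketch below is a minimal, dependency-free illustration of that technique, assuming inverse great-circle-distance weights, centroids given as (latitude, longitude) in degrees, and a one-sided test for positive autocorrelation; all function names are hypothetical, and in practice an established spatial statistics package would be preferable.

```python
import math
import random


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance (km) between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def inverse_distance_weights(coords):
    """Symmetric weight matrix w_ij = 1/d_ij; diagonal and coincident pairs get 0."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(coords[i], coords[j])
            if d > 0:
                w[i][j] = w[j][i] = 1.0 / d
    return w


def morans_i(values, w):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = value - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    den = sum(zi * zi for zi in z)
    if s0 == 0 or den == 0:
        raise ValueError("need non-zero weights and non-constant values")
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / den)


def moran_permutation_test(residuals, coords, n_perm=999, seed=1):
    """Observed Moran's I and a one-sided permutation p-value (positive autocorrelation)."""
    w = inverse_distance_weights(coords)
    observed = morans_i(residuals, w)
    rng = random.Random(seed)
    shuffled = list(residuals)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Because `morans_i` accepts any weight matrix, the same statistic could in principle be reused for the taxonomic check, for example with weights set to 1 for species sharing a genus and 0 otherwise; that weighting scheme is an assumption for illustration, not a recommendation from the manuscript.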

As a final point, it would be helpful to add inset plots, such as bar plots or donut plots within the current figures, showing the proportion of species across major clades and biogeographical regions.

Reviewer #2 (Public Review):

Using standard, widely used tools, the authors revealed the factors (cultural, phenotypic, phylogenetic, etc.) shaping societal and scientific interest in species around the globe. The strength of this manuscript (and of the authors) lies in its command of the available literature, its management and analysis of databases and variables, and its solid discussion. The result is a manuscript that is pleasant to read.

While I agree that a global study necessarily loses the details of local patterns, this may be exactly the biggest shortcoming of the manuscript: it is oblivious to how different cultures (compare the USA with Papua New Guinea, for example) are reflected in these global patterns.

Related to this point, my only other comment concerns the use of English (i.e., the presence of a common name in English) as a reference for societal interest. While English may be widespread in academia, it is much less common in other social circles, especially those that do not use Wikipedia for lack of internet access.
