Abstract
Comprehensive biodiversity data is crucial for ecosystem protection. The ‘Biome’ mobile app, launched in Japan, efficiently gathers species observations from the public using species identification algorithms and gamification elements. The app has amassed >6M observations since 2019. Nonetheless, community-sourced data may exhibit spatial and taxonomic biases. Species distribution models (SDMs) estimate species distribution while accommodating such bias. Here, we investigated the quality of Biome data and its impact on SDM performance. Species identification accuracy exceeds 95% for birds, reptiles, mammals, and amphibians, but seed plants, mollusks, and fishes scored below 90%. Our SDMs for 132 terrestrial plants and animals across Japan revealed that incorporating Biome data into traditional survey data improved accuracy. For endangered species, traditional survey data required >2,000 records for accurate models (Boyce index ≥ 0.9), while blending the two data sources reduced this to around 300. The uniform coverage of urban-natural gradients by Biome data, compared to traditional data biased towards natural areas, may explain this improvement. Combining multiple data sources better estimates species distributions, aiding in protected area designation and ecosystem service assessment. Establishing a platform for accumulating community-sourced distribution data will contribute to conserving and monitoring natural ecosystems.
Introduction
Nature underpins human society, and the conservation of ecosystems and associated ecosystem services contributes to the sustainable development of human society, yet these services have been rapidly declining in recent years (IPBES, 2019; Loh et al., 2005; Newbold et al., 2016; Scholes and Biggs, 2005). The Kunming-Montreal Global Biodiversity Framework (KM-GBF) by the United Nations envisions reversing nature loss by 2030. As a direct means of nature conservation, the KM-GBF targets designating 30% of Earth’s land and ocean area as protected areas by 2030 (i.e. 30by30). As an indirect but influential measure, the KM-GBF requires companies to “monitor, assess, and transparently disclose their risks, dependencies and impacts on biodiversity through their operations, supply and value chains and portfolios,” guided by the Taskforce on Nature-related Financial Disclosures (TNFD) (TNFD, 2023). To achieve these goals, it is imperative to assess the state of biodiversity with a sufficient spatiotemporal resolution to support conservation planning, adaptive management, and companies’ annual nature-related financial disclosures. The basis for such assessments lies in our knowledge of species distributions (Gonzalez et al., 2023; Newbold et al., 2016). Traditionally, distribution data was acquired through on-site surveys by experts (people who have expertise in biodiversity), but collecting distribution data with sufficient spatiotemporal resolution is challenging if we rely only on such limited human resources (Miya et al., 2022; Mori et al., 2023; Pocock et al., 2018).
Since the emergence of digital devices and the internet, people have been able to share their observations through various media, such as images and video/audio recordings. Such community-sourced data have significantly contributed to the accumulation of ecosystem information. These datasets have been instrumental in assessing the impacts of climate change and urbanisation on phenology (Fuccillo Battle et al., 2022; Klinger et al., 2023), detecting distribution changes including invasive alien species (Larson et al., 2020; Roy et al., 2023; Wallace and Bargeron, 2014), exploring large-scale geographic variations in traits (Atsumi and Koizumi, 2017; Leighton et al., 2016), and estimating species distributions (Chandler et al., 2017; Feldman et al., 2021; Johnston et al., 2018; Steen et al., 2019). Moreover, the utilisation of machine learning to describe population trends based on community-sourced data (Fink et al., 2023) offers opportunities for conducting time-series analyses. These analyses can help us understand community assembly processes, unravel species interaction networks, and assess ecosystem stability (Cornwell and Ackerly, 2009; Tilman et al., 2006; Ushio et al., 2018), capitalising on the spatio-temporally dense sampling effort facilitated by community-sourced data (Chandler et al., 2017; Kobori et al., 2016; Pocock et al., 2017). Such analytical approaches enable us to make informed predictions about changes in species distribution, population dynamics, and ecosystem stability in the face of climate change (Bury et al., 2021; Pennekamp et al., 2019; Urban et al., 2016). In essence, community-sourced data, owing to its extensive sampling across time and space, has the potential to test existing ecological theories, expand our comprehension of ecosystems and the underlying processes, eventually allowing us to forecast ecological dynamics in the context of climate change.
When people photograph organisms using digital devices with GPS capabilities, the images often contain timestamps and location details. Such images, when accompanied by species identifications, serve as evidence for tracking phenology and species occurrences. This crowdsourcing approach has been particularly successful on web- or mobile-based platforms such as eBird and iNaturalist (Chandler et al., 2017; Wood et al., 2011). Individuals submit records to these platforms for various reasons, including a desire to contribute to science and engage with cutting-edge technologies (Herodotou et al., 2023; Kaplan Mintz et al., 2023). By making the process more enjoyable (i.e. gamification), we can potentially gather even more biological data from the public (Bowser et al., 2013; Ponti et al., 2015). Yet, the collection process of community-sourced data is usually not well-designed (e.g., spatially biased “presence-only” data) (Feldman et al., 2021; Steen et al., 2019) and its interpretation is challenging without proper statistical modelling. Thus, although much effort has been invested in developing effective monitoring and modelling methods for biodiversity assessment, current approaches can be further improved by (i) adopting more enjoyable community-based survey platforms using mobile applications and (ii) employing an advanced statistical modelling framework for estimating species distributions.
To fuel communities’ engagement in biodiversity surveys and environmental education, we launched the mobile application ‘Biome’ in 2019 in Japan (Fujiki and Tatsuno, 2021). To support species identification, Biome implements artificial intelligence (AI) algorithms that generate lists of potential species and enables users to seek help/suggestions from others for species identification (Figure 1), as in other applications such as iNaturalist and eBird. The unique feature of Biome is gamification, which offers enjoyable experiences and facilitates communication among users (Fujiki and Tatsuno, 2021; Koide et al., 2023). For example, users can earn “points” by contributing in various ways such as submitting records and suggesting species identifications to others, and their levels are determined based on the total points earned. The inclusion of networking and gamification elements can attract a wider user base, including those who may not typically engage in community science (Bowser et al., 2013; Groom et al., 2021). Consequently, Biome has accumulated data rapidly. Since its launch, 6 million records have been collected through the app (as of 17 October 2023). This is more than four times greater than the number of records accumulated by GBIF (Global Biodiversity Information Facility) from all data sources, including iNaturalist and eBird, during the same period in Japan (ca. 1.3 million). The data gathered through the app has been used for conservation planning and facilitating companies’ financial disclosures by supplying and analysing species occurrence records.
Species distribution models (SDMs) are effective statistical tools for assessing biodiversity at specific sites while accounting for biases in survey efforts. SDMs use species occurrence records and environmental conditions to estimate the potential geographic ranges and suitable habitats for species (Booth et al., 2014; Box, 1981; Elith et al., 2011; Hutchinson, 1957; Phillips et al., 2006). These models play a crucial role in conservation and restoration planning by helping predict how changes in land use and climate impact species distributions (Kindt, 2023; Porfirio et al., 2014; Urban et al., 2016). While species presence/absence data, which requires extensive surveys by experts, is limited, presence-only data, which can be obtained from communities’ observations, is much more available. MaxEnt (Phillips et al., 2006; Phillips and Dudík, 2008) is one of the most popular SDM methods due to its computational efficiency and estimation accuracy (Valavi et al., 2022). It can estimate species distribution from presence-only data by maximising the entropy of the probability distribution while satisfying constraints based on the available information (Elith et al., 2011; Phillips and Dudík, 2008). Since MaxEnt only requires occurrence records, it is well-suited to using community-based observations to predict species distributions. Also, while community-sourced data often suffer from spatially biased sampling efforts (i.e., sampling tends to concentrate in densely populated or touristic areas (Kendal et al., 2020; Reddy and Dávalos, 2003)), SDMs such as MaxEnt can account for such spatial biases by considering the spatial distribution of sampling efforts when selecting pseudo-absence (background) locations (Milanesi et al., 2020; Phillips et al., 2009). When sampling efforts are adequately controlled, adding community-sourced data improves the accuracy of SDMs (Johnston et al., 2018; Robinson et al., 2020; Steen et al., 2019).
This implies that SDMs may be substantially improved by utilising Biome’s rapidly accumulating species occurrence records if we adequately control the sampling efforts.
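The bias-correction idea described above, selecting background (pseudo-absence) locations in proportion to sampling effort, can be sketched as follows. This is only a minimal illustration and not the pipeline used in this study: the grid cell names, the effort counts, and the function name `sample_background` are hypothetical.

```python
import random
from collections import Counter


def sample_background(effort_by_cell, n_points, seed=0):
    """Draw background cells with probability proportional to sampling effort.

    effort_by_cell maps a grid cell ID to a measure of survey effort,
    e.g. the number of records of all species in that cell (an assumption).
    """
    rng = random.Random(seed)
    cells = list(effort_by_cell)
    weights = [effort_by_cell[c] for c in cells]
    # Sampling with replacement: heavily surveyed cells are drawn more often,
    # so the background shares the spatial bias of the presence records.
    return rng.choices(cells, weights=weights, k=n_points)


# Hypothetical effort counts per grid cell.
effort = {"urban_A": 900, "suburb_B": 90, "forest_C": 10, "alpine_D": 0}
background = sample_background(effort, n_points=1000)
counts = Counter(background)
```

Because presences and background points then share the same sampling bias, the model is less likely to mistake survey intensity for habitat suitability. Cells with zero recorded effort, such as the hypothetical `alpine_D`, are never drawn.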
Here, we assess the quality of community-based data gathered through the smartphone app Biome and show how the data improves the prediction accuracy of species distributions. First, we assess the quality of occurrence records by investigating the fractions of non-wild and misidentified records. Second, we build SDMs based on two types of data: (i) traditional survey data (e.g. forest inventory census, museum specimens and records extracted from published research) only and (ii) a mixture of traditional survey and Biome data. We then compare the performance of the two SDMs. We model the distributions of 132 terrestrial animals and seed plants in the Japanese archipelago, which covers subtropical to boreal areas. We finally discuss how our SDMs relying on community-sourced data may contribute to meeting the goals of the KM-GBF.
Results
The amount and quality of Biome data
By 7 July 2023, Biome had accumulated 5,275,457 occurrence records of 40,957 species across the Japanese archipelago (Figure 2A). The number of occurrence records submitted to Biome has increased across the years (Figure 2B). On average in 2022, users submitted 5,407 records per day. The distribution of data along environmental gradients differs somewhat between Biome and Traditional survey data. To elucidate this distinction, we employed principal component (PC) analysis to summarise all environmental variables. The two datasets demonstrated divergent distribution patterns along PC1 (Figure 2C). This component, accounting for 6.1% of the total variation, is primarily influenced by land use, topography, and climate (Supplementary File 1). Among the environmental variables, a notable contrast between the datasets was observed in relation to the natural-urban gradient. The Biome data exhibited a relatively uniform distribution encompassing the entire gradient, while Traditional survey data was substantially biased towards natural areas (Figure 2C). The majority of records are attributed to insects (31.2%) and to seed plants (41.8%), which are relatively accessible and can be easily photographed using smartphones (Figure 2D).
Out of all the records submitted to Biome, a total of 2,373,303 records (45.0%) successfully passed through the automatic filtering process. This dataset, referred to as the Biome data, is utilised for subsequent investigations. The quality of Biome data varied across taxa and the rarities of species (Table 1). The fraction of the records of wild individuals exceeded 97% in insects and birds, while it was lower than 90% in molluscs, seed plants, mammals and fishes. Among the records of wild individuals, at the species level, identification accuracy was higher than 95% in birds, reptiles, mammals and amphibians but less than 90% in insects, fishes and seed plants. At the genus level, identification accuracy was higher than 90% in all taxa except for insects. In the case of fishes and seed plants, identifications became 5-6% more accurate at the genus level compared to the species level. The family was correctly identified in more than 94% of records in all taxa examined. Common species had higher identification accuracy than rare species (average value, 95% vs. 87%). This tendency was prominent in insects and seed plants, but less in the other taxa. These results suggest that identifying rare species in taxonomically diverse taxa (i.e. seed plants and insects) is a challenging task.
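The two quality measures reported above, the fraction of wild individuals and the identification accuracy among wild records, can be computed per taxon as in the sketch below. The record structure, field names, and example rows are hypothetical and serve only to show how the two fractions are defined; they are not the validated sample used in this study.

```python
from collections import defaultdict


def quality_by_taxon(records):
    """Summarise wild fraction and species-level accuracy per taxon.

    Each record is a dict with keys 'taxon', 'wild' (bool) and
    'species_correct' (bool). Accuracy is computed among wild records only.
    """
    tally = defaultdict(lambda: {"n": 0, "wild": 0, "correct": 0})
    for r in records:
        t = tally[r["taxon"]]
        t["n"] += 1
        if r["wild"]:
            t["wild"] += 1
            t["correct"] += int(r["species_correct"])
    summary = {}
    for taxon, t in tally.items():
        summary[taxon] = {
            "wild_fraction": t["wild"] / t["n"],
            # Undefined (None) if no wild record exists for the taxon.
            "species_accuracy": t["correct"] / t["wild"] if t["wild"] else None,
        }
    return summary


# Hypothetical expert-validated records.
validated = [
    {"taxon": "birds", "wild": True, "species_correct": True},
    {"taxon": "birds", "wild": True, "species_correct": True},
    {"taxon": "seed plants", "wild": True, "species_correct": False},
    {"taxon": "seed plants", "wild": False, "species_correct": True},
    {"taxon": "seed plants", "wild": True, "species_correct": True},
    {"taxon": "seed plants", "wild": True, "species_correct": True},
]
summary = quality_by_taxon(validated)
```

The same tally could be repeated with genus-level or family-level correctness flags to obtain accuracy at coarser taxonomic ranks.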
The performance of species distribution models
SDMs using Biome+Traditional data, including Biome data at 50%, were more accurate than those modelled only using Traditional survey data when the two datasets had the same number of occurrence records (Figure 3). Our analysis revealed that although the intercept of the Boyce Index (BI, a model accuracy metric that ranges from −1 to 1) did not differ between the two datasets (generalised linear mixed model, see Methods: β=0.02±0.03, t=0.60, P=0.55), Biome+Traditional data consistently led to a more rapid increase in SDM accuracy as the amount of data increased, compared with models relying solely on Traditional survey data (β=0.02±0.01, t=3.72, P<0.001).
When compared to SDMs using Traditional survey data, those using Biome+Traditional data achieved a high level of accuracy with a much smaller amount of data. For instance, BI exceeded 0.9 with 294±471 records (mean±SD across all species) in the Biome+Traditional data, whereas the Traditional survey data required 2,129±4,157 records to achieve the same accuracy. This was also true for endangered species (included in Japanese national or prefectural red lists); although 2,336±3,718 Traditional survey records were required for BI to exceed 0.9, only 338±571 were required for Biome+Traditional data.
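For readers unfamiliar with the Boyce index, the sketch below shows one simplified way to compute a continuous (moving-window) version of it. It is an illustration under stated assumptions, not the implementation used in this study: suitability scores are assumed to lie in [0, 1], the number and width of windows are arbitrary choices, and the example scores are invented.

```python
def _average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def _spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / (sxx * syy) ** 0.5


def boyce_index(presence_scores, background_scores, n_windows=9, width=0.2):
    """Simplified continuous Boyce index for suitability scores in [0, 1]."""
    step = (1 - width) / (n_windows - 1)
    midpoints, ratios = [], []
    for k in range(n_windows):
        lo, hi = k * step, k * step + width
        # Share of test presences vs. share of background cells in the window.
        predicted = sum(lo <= s <= hi for s in presence_scores) / len(presence_scores)
        expected = sum(lo <= s <= hi for s in background_scores) / len(background_scores)
        if expected == 0:
            continue  # no background cells here, so the ratio is undefined
        midpoints.append((lo + hi) / 2)
        ratios.append(predicted / expected)
    return _spearman(midpoints, ratios)


# Invented example: background spread evenly, presences in high-suitability cells.
background = [(i + 0.5) / 100 for i in range(100)]
presences = [0.55, 0.72, 0.81, 0.86, 0.91, 0.93, 0.96, 0.98]
bi_good = boyce_index(presences, background)
bi_bad = boyce_index([1 - s for s in presences], background)
```

A model whose high-suitability cells contain disproportionately many test presences yields a value near 1 (`bi_good`), whereas a model that ranks occupied cells as unsuitable yields a negative value (`bi_bad`).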
Because we fixed the proportion of Biome data within the Biome+Traditional data at 50%, the number of records in the Biome+Traditional data was often limited. In cases where a species had less Biome data than Traditional survey data, the total number of records in the Biome+Traditional data ended up being smaller than that of Traditional survey data alone. As a result, the two datasets did not differ in the best model performance for each species (BIs of Biome+Traditional data: 0.81±0.20; Traditional survey data: 0.83±0.20).
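The size constraint described above follows directly from fixing the Biome share. The sketch below illustrates it with hypothetical record counts; the helper names `max_blend_size` and `blend` and the subsampling procedure are assumptions made for illustration, not the study's actual code.

```python
import random


def max_blend_size(n_traditional, n_biome, biome_fraction=0.5):
    """Largest blended dataset that keeps the requested Biome share."""
    if not 0 < biome_fraction < 1:
        raise ValueError("biome_fraction must be strictly between 0 and 1")
    return int(min(n_biome / biome_fraction,
                   n_traditional / (1 - biome_fraction)))


def blend(traditional, biome, n_total, biome_fraction=0.5, seed=0):
    """Randomly subsample both sources to a fixed Biome proportion."""
    n_biome = round(n_total * biome_fraction)
    n_trad = n_total - n_biome
    if n_biome > len(biome) or n_trad > len(traditional):
        raise ValueError("not enough records for the requested blend")
    rng = random.Random(seed)
    return rng.sample(traditional, n_trad) + rng.sample(biome, n_biome)


# Hypothetical species with many Traditional records but few Biome records.
traditional = [("traditional", i) for i in range(1000)]
biome = [("biome", i) for i in range(120)]

n_blend = max_blend_size(len(traditional), len(biome))  # limited by Biome data
blended = blend(traditional, biome, n_blend)
n_from_biome = sum(1 for source, _ in blended if source == "biome")
```

In this example the blended dataset can hold at most 240 records (120 from each source), far fewer than the 1,000 Traditional survey records available on their own, which is why the best-performing models did not differ between the two datasets. The same helper with `biome_fraction=0.25` would describe a test set composed of 25% Biome data.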
Discussion
Biome: The amount and quality of submitted data
Since its launch in 2019, the app Biome has accumulated species occurrence data rapidly (Figure 2). Despite our concerted efforts to engage non-expert users through gamification features, it is important to acknowledge that an excessive influx of non-expert users could potentially compromise the quality of the collected data. This could manifest in misidentifications or incomplete documentation, such as failing to appropriately label non-wild individuals. We have thus developed algorithms to exclude such suspicious records based on the features of records and users’ behaviour on the app. The implementation of automatic data filtering techniques is expected to enhance the quality of the data, although further refinement is necessary. Notably, for insects and birds, which encompass numerous species that can be kept in captivity, the majority of records that underwent filtering procedures were restricted to observations of wild individuals. Yet, the fraction of non-wild individuals is high in several taxa such as fishes and seed plants. In response, we have updated the posting flow in the app to prompt users to differentiate between non-wild and wild individuals. Further analysis is warranted to evaluate the impact of this update on data quality.
Once non-wild individuals were excluded, species identification accuracy exceeded 95% in taxa with moderate species diversity (amphibians, reptiles, birds and mammals). In seed plants, Biome’s species identification accuracy was 90%, which is higher than the accuracy of auto-suggest identification by commonly used apps for plants (69% for PlantNet, PlantSnap, LeafSnap, iNaturalist and Google Lens; Hart et al., 2023). During an invasive plant survey in the US, the reports by non-professional volunteers were 72% correct (Crall et al., 2011). The higher accuracy of species identification in Biome data can be attributed to two key factors. Firstly, the vigilant oversight of the user community through the “suggest identification” feature plays a crucial role. Biome encourages users to participate in suggesting identifications by offering “points” as rewards for their contributions. Secondly, the species identification AI algorithm leverages past occurrence data from nearby areas, resulting in increasingly accurate automatic identifications as the data accumulates. Given these factors, the data quality of Biome is decent for a community science app. Yet, rare species generally showed lower identification accuracy, which would require identification by experts and further improvement of the species identification AI algorithm.
Species distribution modelling
The inclusion of Biome data resulted in improved accuracy of SDMs (Figure 3). The most accurate model predictions were obtained when the training data consisted of 50-70% Biome data (Appendix 1), highlighting the necessity of incorporating both traditional surveys and citizen observations for a comprehensive understanding of species distributions (Miller et al., 2019; Pacifici et al., 2017; Robinson et al., 2020).
The improvement can be attributed to introducing data with different biases compared to the Traditional survey data. Indeed, when controlling for the number of occurrence records, the model performance was higher in the Biome+Traditional data compared to the Traditional survey data. The variation in performance can be attributed to the distribution of data in relation to environmental conditions. Traditional survey data exhibits a strong bias towards natural areas, whereas Biome data is well balanced across the natural-urban habitat gradients (Figure 2C). Therefore, incorporating Biome data could significantly enhance modelling accuracy in urban and suburban landscapes, which are typically underrepresented in traditional survey data. As pseudo-absences are selected based on search effort, our models utilise numerous pseudo-absences from these areas. Consequently, this might lead to better estimation of species absence in such areas, not just presence, resulting in an overall increase in model accuracy across a wider range of species. A balanced distribution along the natural-urban gradient is noteworthy because community science data is typically biased towards human population centres (Kendal et al., 2020; Reddy and Dávalos, 2003). This could be influenced by the distribution of users’ residences, although we do not have specific information about the users’ locations. The app has collaborated with numerous local governments across Japan, including nine prefectures and 29 local municipalities such as cities and towns. Through these collaborations, the user base may be widely dispersed, enriching the geographical coverage of Biome data.
The Biome data also can improve SDM accuracy by simply increasing the overall amount of data. Essentially, SDM accuracy is enhanced with an increased amount of data (Figure 3) (Erickson and Smith, 2023; Stockwell and Peterson, 2002). In our analysis, we maintained a fixed proportion of 50% for Biome data within the Biome+Traditional dataset, which in turn restricted the amount of available Biome+Traditional data. However, our preliminary analysis (Appendix 1) demonstrates that the enhancement of SDM accuracy occurs across a range of proportion variations for Biome data blending. This implies that the proportion of Biome data does not necessarily need to be controlled. Therefore, in practical application scenarios, the incorporation of Biome data predominantly serves to augment the overall volume of training data.
The impact of community-sourced data on SDMs has primarily been investigated using birds, with a limited focus on plants (Feldman et al., 2021). In our investigation, we observed that incorporating Biome data improved SDM accuracy for seed plants and insects, while the impact on birds remained unclear (Figure 3). This ambiguity is likely because community-sourced data from platforms such as eBird are already incorporated in Traditional data through GBIF. In comparison to other taxonomic groups, our results indicate that seed plants exhibited lower model accuracy when evaluated against both Biome+Traditional survey data (Figure 3) and Traditional survey data alone (Figure 3–figure supplement 1). The variation in model accuracy among taxonomic groups may be attributed to data quality issues in both Biome and Traditional survey data. For instance, in Biome data, while the fractions of wild individuals were high in birds and insects, it was lower for seed plants (Table 1). Compared with other taxa, distinguishing between wild and non-wild individuals can be particularly difficult in plants when they are planted outside. In addition, identifying plant species may be challenging in certain taxa, primarily due to the absence of key identification traits on leaves and stems. This becomes especially problematic when flowers are not present. These difficulties could potentially impact the quality of Traditional data as well. Although few studies have simultaneously assessed the quality of community-sourced data and its impact on SDMs across different taxa, it is important to recognize that data quality can vary among taxa.
Importantly, SDMs for endangered species, which often suffer from data deficits (Erickson and Smith, 2023; Wisz et al., 2008), became accurate with far fewer records when Biome data was blended in (Figure 3). Specifically, a Boyce index above 0.9 could be reached with only around 300 records when using Biome data, whereas more than six times as many records were required when using Traditional survey data only. This finding highlights the importance of community-sourced data not only for monitoring the dynamics of endangered species (Chandler et al., 2017; Zapponi et al., 2017) but also for modelling purposes. Considering its rapid accumulation, Biome data would make a significant contribution to more effective distribution modelling of endangered species.
Limitations of this study
In assessing data quality, reidentification was impossible for records whose photographs did not capture key traits for species identification. To address this limitation, further app improvements can include allowing users to submit multiple images. Encouraging users to document various body parts of organisms through multiple images would make capturing key identification traits much easier. This will make reidentification easier, and possibly improve automatic species identification accuracy.
Given the absence of a comprehensive, environmentally unbiased occurrence dataset spanning a wide range of taxa, we assessed SDM accuracy without relying on an independent test dataset. In this evaluation, the test data was constructed to include 25% Biome data, an intermediate proportion between Biome+Traditional (50%) and Traditional survey data (0%). By leveraging the distinct distribution patterns of Biome and Traditional survey data along environmental variables (Figure 2C), the test data would better encapsulate the actual species distribution, compared to datasets composed solely of either Biome or Traditional survey data. It is noteworthy that, even when the test data exclusively consisted of Traditional survey data (i.e., unfavourable conditions for Biome+Traditional data SDMs), the accuracy of SDMs derived from Biome+Traditional and Traditional survey data did not differ (Figure 3–figure supplement 1). This result further supports our conclusions that Biome provides valuable data for SDM in terms of the amount and quality, and that blending Biome data improves SDM accuracy.
We evaluated SDMs based on spatial transferability using the central Japan region, which encompasses a range of environmental conditions. However, the evaluation results may not necessarily indicate transferability across the entire Japanese archipelago. Instead, in the near future, we anticipate that we can evaluate SDM accuracy using temporal transferability. The rapid accumulation of Biome data will allow us to evaluate the temporal transferability using the occurrence dataset from different time periods, and thus enable assessing their performance in much wider regions. In addition, limited data availability for certain taxa hindered the assessment in those taxa (e.g., molluscs, amphibians, reptiles, and mammals), but Biome would be a platform to overcome the data limitation for many taxa.
Finally, our SDMs do not directly indicate the species’ presence probability. The output from presence-only SDMs usually deviates from the probability of presence when species prevalence (i.e. the proportion of the area that the species occupies, which requires presence/absence data throughout the area) is unavailable (Elith et al., 2011; Ward et al., 2009). Due to the unavailability of absence data, SDM outputs in this work are indirect measures of species presence and thus are not directly comparable across different species. Nonetheless, they are comparable within a species, providing useful information for understanding species distributions.
Future directions
By blending data from traditional surveys and communities, we improved the accuracy of species distribution estimates. This enhanced estimation lays the groundwork for more precise subsequent analyses. For instance, estimated distributions will be useful in selecting new protected areas or areas with OECMs (Other Effective area-based Conservation Measures: allowing a wider range of land use as long as biodiversity and ecosystem services are sustained/improved). Using estimated distributions of each species, hotspots of species or evolutionarily diverse taxa can be inferred. Such sites will be good candidates for protected areas (Jones et al., 2016) or OECMs (Shiono et al., 2021). Further, estimated distributions can be used as input for spatial conservation prioritisation tools (e.g. Marxan (Ball et al., 2009)).
In our experience, stakeholders—including corporate social responsibility managers and conservation practitioners—often seek the list of species potentially inhabiting their locations. Due to the uncertainty of SDMs and their thresholding into presence/absence, on-site surveys remain essential for assessing biodiversity status. Yet, SDMs can make such surveys cost-effective by screening important locations for on-site assessment (e.g., Locate phase in TNFD framework) and narrowing down the target species for surveying. Improved estimation through SDMs can mitigate risks associated with their use in society and enable more informed decision-making for conservation efforts.
The rapid accumulation of data from diverse locations holds the potential to unveil valuable ecological patterns. The accumulated data enables early detection capabilities for range expansions of invasive species (Sakai et al., in prep). For instance, Biome data has hinted at potential range expansions in several insect species, including butterflies, dragonflies, and stink bugs, as well as changes in wintering areas for birds (Biome Inc., 2023). Given the diverse taxonomic coverage of Biome data (Figure 2D), detecting phenological changes across various taxa may be possible. This, in turn, is useful in uncovering phenological mismatches exacerbated by climate change, which can significantly change the dynamics of interacting species (Renner and Zohner, 2018; Visser and Gienapp, 2019). Moreover, Biome data is well-suited for assessing the effects of urbanisation on ecosystems since it comprehensively spans both urban and natural habitats (Figure 2C). The benefit of rapidly accumulating data, combined with recent advancements in machine learning methods, opens up opportunities for conducting time-series analyses. Community science data has rarely been used for time-series population analysis due to its notable spatio-temporal bias in sampling efforts (Feldman et al., 2021; Zhang et al., 2021). However, the two-step machine learning approach, as demonstrated by Fink and colleagues in estimating bird population trends using eBird data (Fink et al., 2023), sets a precedent. In the future, Biome data may facilitate the inference of population dynamics for multiple taxa. This will enable various time-series analyses to unveil ecosystem stability and interaction strength, which holds potential for forecasting ecosystem dynamics (Laubmeier et al., 2020; Pennekamp et al., 2019; Ushio et al., 2018).
For financial disclosures, companies will assess how their activities rely on ecosystem services and their opportunities for protecting/recovering nature (TNFD, 2023). By incorporating taxon-specific ecosystem services, multifaceted ecosystem services can be preliminarily screened (Kass et al., 2023). For example, based on estimated distributions of bumblebees or insectivorous animals, the functioning of pollination services or pest regulation services might be inferred. Using counts of “likes” or records from Biome data, charismatic species can be determined. By identifying places with a high estimated richness of charismatic species, potential areas for ecotourism can be screened. Because SDMs allow us to simulate the impacts of changes in land use and climate (Porfirio et al., 2014; Urban et al., 2016), we will be able to forecast how those changes may influence local biodiversity and/or ecosystem functioning. Hence, estimated distributions provide the basis of nature-related financial disclosures.
Our platform facilitates collaboration among diverse stakeholders, including local communities, landowners, and employees from both private companies and government agencies. Engaging a broader spectrum of stakeholders is crucial for effective biodiversity assessment, nature management planning, and nature-related financial disclosures: this inclusivity allows for the incorporation of traditional knowledge into planning processes, mitigates conflicts among stakeholders, and ultimately supports more seamless and informed decision-making (Chan et al., 2021; Keough and Blahna, 2006; Linsley et al., 2023; Roy et al., 2023; TNFD, 2023). Supporting natural experiences for a wide range of people is also expected to contribute to changing people’s minds towards nature. Through experiencing nature, people become familiar with it and subsequently make pro-nature decisions (Soga and Gaston, 2023). We believe that community science can significantly contribute to KM-GBF and create a sustainable society by fostering nature-positive awareness in society and providing data tools that enable effective action.
Methods
Occurrence record accumulation through mobile app Biome
In April 2019, a free smartphone app called Biome was launched in the Japanese market. The app had been downloaded 839,844 times as of September 13, 2023. The app allows users to collect data on the distribution of plants and animals using their mobile devices. Users can post photographs of the plants and animals they find, and the app automatically records the location and timestamp from EXIF data. If the EXIF data is unavailable, users can manually input the locality and timestamp.
To support species identification, the app provides users with two options. First, the app provides a list of candidate species based on the image and metadata (e.g., location and timestamp). Biome combines image recognition technology with geospatial data to facilitate species identification. The image recognition algorithm, built on convolutional neural networks, classifies species at higher taxonomic levels. These candidates are then refined based on their frequency of recent occurrences in the geographical area. Consequently, as correctly identified records accumulate for a given area, the accuracy of the species identification AI improves. Second, users can seek help from other users. If a user selects the “ask Biomers” button, their occurrence record is added to a waiting list that appears on the home screen. Other users can then suggest possible identifications for the record, as they can for records whose species has already been identified.
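The two-stage idea described above (a coarse image classification followed by refinement using recent local occurrences) can be illustrated with a minimal sketch. This is not Biome's actual algorithm, whose details are not given here; the function name, the taxon-to-species lookup, and the ranking rule (sorting by recent local record counts) are all assumptions made for illustration.

```python
def candidate_species(predicted_taxon, taxon_to_species, recent_local_counts, top_k=5):
    """Rank species within an image-predicted higher taxon by recent local records.

    All names here are hypothetical:
    - predicted_taxon: higher taxon returned by an image classifier
    - taxon_to_species: dict mapping a higher taxon to its member species
    - recent_local_counts: dict mapping species to recent record counts nearby
    """
    species = taxon_to_species.get(predicted_taxon, [])
    # Species recorded more often in the area recently are listed first.
    ranked = sorted(species, key=lambda sp: recent_local_counts.get(sp, 0), reverse=True)
    return ranked[:top_k]
```

Under this kind of scheme, the ranking improves automatically as valid local records accumulate, which is consistent with the behaviour described in the text.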
Users can view and comment on other users’ records. However, for conservation purposes, Biome automatically conceals the geolocations of endangered species that are listed on the Japanese national or prefectural red lists. This feature sets it apart from iNaturalist, where users must manually choose to hide the location of endangered species (Koide et al., 2023). The social networking function provides opportunities for communication among users, including non-experts (Fujiki and Tatsuno, 2021). Users earn “points” through their contributions, including record submissions and identification suggestions to other users, and progress to higher levels based on their total points. The points awarded depend on the rarity, conservation status, and societal impact of the species submitted, meaning that users earn more points when submitting records of rare, endangered, or invasive species. The app occasionally offers “Quests” events that provide users with an opportunity to earn additional points by submitting records from specific locations or of particular species, which is crucial for monitoring phenology. Through this variety of gamification features, we encourage people to participate in biological surveys as a fun activity.
We obtained occurrence records submitted to Biome by 7 July, 2023. The raw data collected through Biome contain invalid presence records, which we defined in the present study as records with unclear images, records documenting non-wild individuals, misidentifications, and images raising privacy issues. To improve data quality, we excluded records deemed invalid, mainly based on location metadata and users’ reactions to the records, as detailed below. The filtered Biome data were used in the subsequent investigations.
Filtering suspicious occurrence records in Biome data
Occurrence records of non-wild individuals were eliminated as far as possible using information provided by users and the location of records. Biome users sometimes report inappropriate records (e.g., unclear images and images from websites or books), and we excluded all such reported records. All private records were excluded because they can harbour inappropriate and misidentified records that have not been screened by other users. We also excluded occurrence records that users had marked as non-wild individuals: users have an option to label their records as photographs of bred or cultivated individuals, or of specimens. Records from cultural centres (i.e. zoos, botanical gardens, museums, and aquariums) and large pet stores were removed as well. During the data correction process, we prioritised the suggestions provided by certified users (see below for the definition), regardless of the decisions made by the users who originally created the record. Furthermore, we excluded records that had not been posted by certified users or had not received identification suggestions from certified users.
Certified users are defined as users who achieved high accuracy of species identification (<15% of public occurrence records were suggested as misidentifications by other users), submitted few inappropriate records (<0.5% of public records), and created >20 public records. We also defined specialist users, a subset of certified users identified within each taxon (see Figure 2 for the classification), who made a total of >30 records or identification suggestions with high identification accuracy (the fraction of records suggested as misidentifications is lower than the average of certified users in the taxon). Specialist users are used in determining pseudo-absences for SDMs.
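The three thresholds that define a certified user can be expressed as a simple predicate. The sketch below is illustrative only: the `UserStats` structure and its field names are hypothetical, and it assumes that both rates are computed over a user's public records.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    # Hypothetical per-user summary; field names are assumptions.
    public_records: int         # number of public occurrence records created
    flagged_misidentified: int  # public records suggested as misidentified by others
    flagged_inappropriate: int  # public records reported as inappropriate


def is_certified(user: UserStats) -> bool:
    """Apply the thresholds stated in the text: >20 records, <15%, <0.5%."""
    if user.public_records <= 20:
        return False
    misid_rate = user.flagged_misidentified / user.public_records
    inappropriate_rate = user.flagged_inappropriate / user.public_records
    return misid_rate < 0.15 and inappropriate_rate < 0.005
```

Specialist status would add a per-taxon check on top of this predicate (more than 30 records or suggestions and a below-average misidentification fraction), which is omitted here for brevity.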
Assessing the accuracy of records
We investigated the proportion of occurrence records within the Biome data that were suitable for SDMs. Since SDMs are influenced by invalid presence records, we assessed the quality of Biome data based on a total of 1,420 records from rare and common species of seed plants, molluscs, insects (including Arachnida and Insecta), fishes, mammals, birds, reptiles and amphibians (Figure 4). We defined rare species as those with 10 or fewer occurrences in Biome data, and common species as those in the top 15% by number of records in each taxonomic category. For seed plants and insects, which account for the majority of Biome data (Figure 2D), we randomly selected 145 records each from rare and common species. For each of the other taxonomic categories, we selected 70 records each from rare and common species.
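As a consistency check, the per-category sample sizes add up to the reported total, assuming that each category contributes one sample of rare species and one of common species (two categories at 145 records per sample, and the six remaining categories at 70 records per sample):

$$\underbrace{2 \times 2 \times 145}_{\text{seed plants, insects}} + \underbrace{6 \times 2 \times 70}_{\text{six other categories}} = 580 + 840 = 1420$$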
Records were first screened for whether they targeted organisms (images with no organisms were discarded) and contained wild individuals. To assess the accuracy of species identification, the species in records documenting wild individuals were manually reidentified by experts with taxonomic knowledge (Figure 4). These experts have professional backgrounds specialising in the focal taxa: a technician at a prefectural research institute (fish), highly experienced field survey conductors (plants and insects, respectively), post-doctoral researchers (amphibians and reptiles, and mammals, respectively), and a museum curator (mollusks). Then, by comparing the species identifications made by the experts with those in Biome data, the results were classified into two categories: (1) correct based on the image and locality, meaning that the identification was probably correct based on the image and the image locality matched the habitat/range of the species; and (2) misidentification, in which case records were reidentified by experts if possible. We also examined whether the identification was correct at the genus and family levels.
Species distribution models
Occurrence data
To evaluate the impact of Biome data on SDM prediction accuracy, we compiled two datasets: “Traditional survey data” and “Biome+Traditional data”. The Traditional survey data comprised records collected through conventional survey techniques (e.g. riverine census, forest inventory census, and museum specimens) primarily sourced from The National Census on River and Dam Environments (NCRE) and GBIF. In contrast, the Biome+Traditional data encompassed records submitted to Biome that passed filtering methods, in addition to the Traditional survey data. To control the relative proportion of Biome data, we constrained the fraction of Biome data within the Biome+Traditional data to 50% for each species. Our preliminary results showed that blending 50-70% of Biome data in training data improved prediction accuracy (Appendix 1).
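Constraining the Biome fraction to a fixed value amounts to downsampling whichever source is in excess. The following is a minimal sketch under stated assumptions: the function name and arguments are hypothetical, records are drawn by simple random sampling without replacement, and the result is the largest blend that both sources can support at the requested ratio.

```python
import random


def blend_records(biome, traditional, biome_fraction=0.5, seed=0):
    """Return a blended list in which Biome records make up `biome_fraction`."""
    if not 0 < biome_fraction < 1:
        raise ValueError("biome_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Largest total size that both sources can supply at the requested ratio.
    total = min(len(biome) / biome_fraction,
                len(traditional) / (1 - biome_fraction))
    n_biome = int(total * biome_fraction)
    n_traditional = int(total * (1 - biome_fraction))
    return rng.sample(biome, n_biome) + rng.sample(traditional, n_traditional)
```

With the default `biome_fraction=0.5`, a species with 30 filtered Biome records and 100 traditional records would yield 60 blended records, 30 from each source.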
For traditional survey data, we downloaded occurrence records of relevant taxa from GBIF on April 20th, 2023. To prevent significant differences between the sampling periods of the GBIF records and environmental data, we used GBIF records sampled after 1970. The clean_coordinates function of the R package ‘CoordinateCleaner’ was used to remove records with erroneous coordinates, such as records from country capitals, country centroids, and biodiversity institutions. We also obtained occurrence data from large occurrence datasets such as the National Census on River and Dam Environments (NCRE) and the Forest Ecosystem Diversity Basic Survey. For the areas or taxa where occurrences were scarce, we further compiled literature with detailed locality information, such as local species inventories. The number of occurrence records for the modelled species and the species coverage of each dataset are summarised in Table 2. For the species analysed (S9 Table), traditional survey data contains only a negligible portion of community-sourced data (5.5%), which arises because GBIF includes community-sourced data from iNaturalist and eBird.
Predictor variables
Predictors encompass a range of environmental variables recognized to impact species distribution (Table 3): land use (Newbold et al., 2015), climate (bioclim variables (Booth et al., 2014)), vegetation (Abe, 2018), lithology (Ott, 2020) and elevational range (Udy et al., 2021). Additionally, categorical variables representing known biogeographic regions, reflecting geological history, were included. We applied Blakiston’s Line (the Tsugaru Strait dividing the northern and main islands of Japan, i.e., Hokkaido and Honshu islands), which reflects a significant historical migration barrier for mammals and birds (Dobson, 1994; Saitoh et al., 2015). Due to their distinct fauna (Wepfer et al., 2016; Yamasaki, 2017), we also specified oceanic islands (i.e. Ogasawara and Daito isles), which have never been connected with the Asian continent. Continuous environmental variables were transformed into linear, quadratic and hinge feature classes to capture nonlinear associations between environments and species occurrence (Phillips et al., 2017). The regularisation multiplier was set at 2.5, falling within the established optimal range of 1.5 to 4 (Elith et al., 2010; Moreno-Amat et al., 2015).
Pseudo-absence reflecting search effort
We considered sampling efforts when selecting a total of 10,000 pseudo-absence locations. To accommodate biases in sampling efforts, we assigned picking probabilities as an increasing function of the number of occurrence records of all taxa or of relevant taxa in the grid cell (an index of sampling effort) (Milanesi et al., 2020; Phillips et al., 2009). That is, grid cells with many occurrence records of relevant taxa are more likely to be chosen as pseudo-absences than cells with few records, as detailed below (see also Figure 5).
To generate pseudo-absence (i.e. background) data, we employed two approaches considering different sampling efforts. The first approach incorporated all observers and taxa, while the second approach focused on experts and relevant taxa (Figure 4). In both cases, pseudo-absences were selected from grid cells that lacked any occurrence records of the species being modelled. However, due to variations in sampling efforts across locations, it was important to address potential bias. To mitigate this bias, we adjusted the picking probability based on the number of occurrences of other species in each grid cell (Milanesi et al., 2020; Phillips et al., 2009).
In the first approach, we assumed that the users of Biome submit records of any taxon without specifically selecting species from particular taxa. The picking probability was simply determined by the total number of records from all taxa in the Biome data in each grid cell. In the second approach, we considered the expertise of observers (Milanesi et al., 2020) and the sampling effort for relevant taxa (Phillips et al., 2009). We also assumed that Traditional surveys targeted particular taxa. Under this approach, we selected records from Biome data contributed by specialist users and all records from the Traditional survey data. From this subset of data, we calculated the number of records for the taxon (e.g. seed plant, insect or amphibian) to which the modelled species belonged. This information was then used to calculate the picking probability for each grid cell. To account for the variability in record counts among locations, we applied a logarithmic transformation to the number of records. We also added a value of 1.2 before taking logarithms to allow for the selection of pseudo-absences with low probabilities, particularly in locations with only one or no records of other species. Pseudo-absences were not chosen from the spatial block used as test data, but otherwise, there were no geographical restrictions on their selection.
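The effort-weighted selection described above can be sketched as follows. The weight log(n + 1.2) follows the text; everything else is an assumption for illustration, including the function names, the use of the natural logarithm, and sampling with replacement via `random.choices` (the text does not state whether cells could be drawn more than once).

```python
import math
import random


def picking_weights(record_counts, offset=1.2):
    """Weight per grid cell: log(count + offset); positive even for 0 records."""
    return [math.log(n + offset) for n in record_counts]


def sample_pseudo_absences(cell_ids, record_counts, occupied, test_block, k, seed=0):
    """Draw k pseudo-absence cells with probability proportional to the weights.

    occupied: cells holding records of the modelled species (excluded)
    test_block: cells inside the spatial block reserved for testing (excluded)
    """
    rng = random.Random(seed)
    eligible = [(c, n) for c, n in zip(cell_ids, record_counts)
                if c not in occupied and c not in test_block]
    cells = [c for c, _ in eligible]
    weights = picking_weights([n for _, n in eligible])
    # Assumption: sampling with replacement.
    return rng.choices(cells, weights=weights, k=k)
```

Because log(0 + 1.2) is about 0.18 and log(1 + 1.2) is about 0.79, cells with no or one record of other species remain selectable but with low probability, which is the stated purpose of the offset.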
Using the described approaches, we obtained a total of 10,000 pseudo-absences for our analyses. The number of pseudo-absences follows the default setting of MaxEnt (Elith et al., 2011). For the models using the Biome+Traditional dataset (and the Biome-blended datasets in Appendix 1), pseudo-absences were generated by merging 5,000 points from each of the two approaches. Meanwhile, for SDMs using the Traditional survey data only, we obtained 10,000 pseudo-absences by exclusively using the second approach without incorporating Biome data.
Modelling
We modelled distributions of terrestrial seed plants and animals at a resolution of 1 × 1 km grid cells, based on Traditional survey data and Biome+Traditional data. To model species distributions from presence-only data, several algorithms have been utilised, including generalised additive models, random forest, and neural networks (Norberg et al., 2019; Valavi et al., 2022). In our study, we opted for MaxEnt (Phillips and Dudík, 2008) due to its high estimation accuracy and relatively low computational burden (Valavi et al., 2022). We ran MaxEnt via the ENMeval 2.0 package (Kass et al., 2021) in R 4.1.3 (R Core Team, 2021).
Model evaluation
We evaluated the model by examining spatial transferability because we could not find occurrence data that are environmentally unbiased and independent from training data. To minimise spatial autocorrelation between training and test data, we set a spatial block for splitting data (Araújo et al., 2019; Santini et al., 2021). As the spatial block, we chose the central Japan region (latitude, 33.7°–37.7° N; longitude, 136.2°–137.6° E: Figure 6) which covers various environments—alpine to coastal lowlands, metropolis to highly intact areas.
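The spatial block split can be sketched as a bounding-box test on record coordinates. The latitude and longitude limits are taken from the text; the record structure (dictionaries with `lat` and `lon` keys) and the function names are assumptions for illustration.

```python
# Bounds of the central Japan test block, as given in the text (degrees).
LAT_RANGE = (33.7, 37.7)
LON_RANGE = (136.2, 137.6)


def in_test_block(lat, lon):
    """True if a coordinate falls inside the spatial block used for testing."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def split_by_block(records):
    """Split records into (train, test) according to the spatial block."""
    train = [r for r in records if not in_test_block(r["lat"], r["lon"])]
    test = [r for r in records if in_test_block(r["lat"], r["lon"])]
    return train, test
```

Holding out one contiguous block, rather than a random subset of records, keeps test locations spatially separated from training locations.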
To ensure a fair and balanced assessment of the accuracy of SDMs built from Traditional survey data (0% Biome data) and Biome+Traditional data (50% Biome data), we compiled a test dataset that embodies characteristics intermediate between these two datasets. This composite test dataset encompasses 25% Biome data and 75% Traditional data, effectively bridging the differences between the two original datasets and providing a comprehensive basis for evaluating SDM accuracy.
Due to the presence of invalid records, Biome records were used as test data only when multiple users recorded the same species within the same 1 km grid cell. Although Biome data may include invalid records (i.e. non-wild individuals or misidentifications), if multiple users recorded the same species at the same place, at least one of those records is likely to be valid. Because we know the fraction of valid records within the Biome dataset for each taxon (see Results), we can calculate the probability of true presence at a given location as follows, assuming that records made by different users are independent:
$$P = 1 - (1 - p)^{n}$$
where $p$ is the probability that a record of the given taxon is valid and $n$ is the number of users who reported the species at the location. If $P$ exceeded 99%, we deemed that the species occurred at the location.
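The probability that at least one of several independent records is valid is one minus the probability that all of them are invalid. The sketch below computes this quantity and the smallest number of users needed to exceed the 99% threshold. The function names are hypothetical, and the validity fractions in the usage note are illustrative values rather than results from the study.

```python
def presence_probability(p_valid, n_users):
    """Probability that at least one of n independent records is valid."""
    return 1 - (1 - p_valid) ** n_users


def min_users_needed(p_valid, threshold=0.99):
    """Smallest number of users for which the probability exceeds the threshold."""
    if not 0 < p_valid < 1:
        raise ValueError("p_valid must be strictly between 0 and 1")
    n = 1
    while presence_probability(p_valid, n) <= threshold:
        n += 1
    return n
```

For example, with an illustrative validity fraction of 0.8, two users give a probability of 0.96 and three users give 0.992, so three independent reports would be required; with 0.95, two reports suffice (0.9975).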
To reduce spatial sampling bias, we downsampled NCRE, a dataset within the Traditional survey data that contains a massive number of records from freshwaters, to match the number of records from the remaining Traditional survey data. This procedure was applied to all test datasets in both the main analysis and the preliminary analyses documented in Figure 3–figure supplement 1 and Appendix 1.
The Boyce index (BI) was used to measure model performance because it was designed to evaluate presence-only SDMs (Hirzel et al., 2006). In short, BI measures the correlation between estimated habitat preference and the frequency of actual presence, and ranges from −1 to 1. A high BI indicates high SDM accuracy, meaning that presence data points tend to be located in grid cells with higher habitat suitability values. To reliably calculate BI, at least 50 occurrences are needed in the test data (Hirzel et al., 2006). Thus, we used 132 species that have more than 50 occurrences in test data for calculating BI (Supplementary File 3).
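To make the definition concrete, the following is a simplified sketch of a Boyce-type index. It divides the suitability range into fixed, non-overlapping bins, computes the predicted-to-expected ratio in each bin, and returns the Spearman rank correlation between that ratio and the bin midpoint. The fixed bins are a simplification chosen for brevity (Hirzel et al., 2006 also describe a moving-window variant), and this is not necessarily the implementation used in the study; all function names are hypothetical.

```python
def _average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def _pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


def boyce_index(presence_suit, background_suit, n_bins=10):
    """Spearman correlation between bin suitability and predicted/expected ratio.

    presence_suit: predicted suitability at test presence locations
    background_suit: predicted suitability over all cells of the study area
    """
    lo, hi = min(background_suit), max(background_suit)
    if hi == lo:
        raise ValueError("background suitability must vary")
    width = (hi - lo) / n_bins

    def bin_of(s):
        # Clamp so that values at or beyond the edges fall in the end bins.
        return min(max(int((s - lo) / width), 0), n_bins - 1)

    p_counts = [0] * n_bins
    e_counts = [0] * n_bins
    for s in presence_suit:
        p_counts[bin_of(s)] += 1
    for s in background_suit:
        e_counts[bin_of(s)] += 1

    mids, ratios = [], []
    for b in range(n_bins):
        if e_counts[b] == 0:
            continue  # ratio undefined where no background cells fall
        predicted = p_counts[b] / len(presence_suit)
        expected = e_counts[b] / len(background_suit)
        mids.append(lo + (b + 0.5) * width)
        ratios.append(predicted / expected)
    # Spearman correlation is the Pearson correlation of the ranks.
    return _pearson(_average_ranks(mids), _average_ranks(ratios))
```

A model whose presences become steadily more frequent, relative to availability, as suitability increases scores close to 1; a model with the opposite pattern scores close to −1.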
Examining influences of blending Biome data on SDM accuracy
Given that the accuracy of SDMs is affected by the amount and quality of data (Araújo et al., 2019; Erickson and Smith, 2023; Stockwell and Peterson, 2002), blending Biome data in SDMs may affect the model performances in two possible ways: by increasing the overall amount of data, and/or by introducing data with different information than the original data. We conducted an analysis to distinguish between these effects. We prepared two different datasets: “Traditional survey data” and “Biome+Traditional data”. Then, we separately trained SDMs using these two datasets. We further varied the data size by performing random downsampling, ranging from a minimum of 20 to a maximum of 20,000 records, in order to evaluate its impact on the model. For the “Biome+Traditional data” category, the proportion of Biome data was kept at 50%. For each condition, we conducted three iterations of training and testing to reduce the impact of random sampling stochasticity. Because the modelling was performed for each species, we obtained a BI for each species, amount of records, and dataset (i.e., two datasets covering 132 species, each with a maximum of 123 conditions for the amount of records, and the models were replicated three times, resulting in a total of 12,351 individual model runs).
After obtaining BIs for each run, we evaluated the effects of data type (i.e., Biome+Traditional data or Traditional survey data) and species on BI while accounting for the amount of records. For each species and under each amount of records, the mean BI was calculated across the three iterations. Given that BI is a correlation coefficient, we applied the Fisher z-transformation to these BIs to approximate their distribution as a normal distribution. To the transformed BIs, we fitted a generalised linear mixed model that accounted for both the fixed and interaction effects of data type and amount of records. This model accommodated species identity as a random effect. The model was implemented and tested using R packages lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017), respectively.
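The Fisher z-transformation maps a correlation coefficient r in (−1, 1) to z = atanh(r) = ½ ln((1 + r)/(1 − r)), which stretches values near ±1 and makes the sampling distribution closer to normal. A minimal sketch follows; the function names are hypothetical, and the handling of values exactly equal to ±1 (rejected here) is an assumption, as the text does not describe it.

```python
import math


def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)
```

For instance, a BI of 0.9 corresponds to z of roughly 1.47, whereas a BI of 0.5 corresponds to roughly 0.55, so differences among high-performing models are expanded on the transformed scale.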
Acknowledgements
We are grateful to Jamie M. Kass for advice on the construction and evaluation of SDMs. We thank Midzuho Tatsuno, Kazumichi Morishita, Hironori Tanaka, Kotaro Takai and Shuhei Tochino for species identification; Akira Sawada and Yuko Maegawa for their advice on the analysis; Trevor H Booth and Dalelan Anderson for comments on the manuscript. We appreciate the invaluable contributions of Biome users. This research was partly supported by the collaborative research agreement between MU at Kyoto University and Biome Inc. from 2020 to 2021.
Conflict of Interest
KA and YN are employed by Biome Inc., of which SF is the CEO and TG is the CTO. SF and TG are inventors of the species identification AI algorithm covered by JPN patent 6590417 and US patent 11048969. MU and HN declare that they have no competing interests. None of the authors will benefit financially in a direct way from the publication of this paper.
Figure, tables and supporting information
Supplementary File 1. Distributions of occurrence records along with environmental variables.
Supplementary File 2. List of GBIF data DOIs and literature compiled in occurrence data.
Supplementary File 3. List of species for which species distribution models were constructed.
Appendix 1. Determining the best blend of Traditional survey and Biome data
Methods
In this investigation, we aimed to determine the optimal proportion of Biome data within the training dataset of SDMs in order to enhance the accuracy of SDMs. To conduct this assessment, we initially selected a subset of species for which sufficient test data was available (as detailed below).
For each of the selected species, we generated training datasets by combining Traditional survey data with Biome data at different proportions: 15%, 30%, 40%, 50%, 60%, 70%, 85%, and 100%, referred to as Biome-blended datasets.
To compare the accuracy of SDMs, we created and evaluated models using both the Traditional survey dataset and each of the Biome-blended datasets. SDMs were created by following the methodology employed in the main analysis. To ensure equitable comparison, we equalised the amount of data in each pair of blended and Traditional survey datasets. This equalisation was achieved by randomly downsampling the larger dataset to match the size of the smaller one.
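The size equalisation step can be sketched as random downsampling of both datasets to the size of the smaller one. The function name is hypothetical, and sampling without replacement with a fixed seed is an assumption made for reproducibility of the example.

```python
import random


def equalise_sizes(dataset_a, dataset_b, seed=0):
    """Randomly downsample so that both datasets have the size of the smaller one."""
    rng = random.Random(seed)
    n = min(len(dataset_a), len(dataset_b))
    # Sampling n items from the smaller dataset simply returns it in shuffled order.
    return rng.sample(dataset_a, n), rng.sample(dataset_b, n)
```

Equalising sizes in this way means that any difference in model accuracy between a pair of datasets reflects their composition rather than the number of training records.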
We assessed the accuracy of the models using the Boyce index (BI), which follows the same methodology as employed in the main analysis. In this specific investigation, we did not control the proportion of Biome data within the test data. We selected a set of species for which the test dataset contained at least 50 locations and randomly chose 20 species from each of the seed plants, insects, and birds (see Supplementary File 4).
Results
The analysis revealed that the relative model accuracy takes high positive values when the training data comprises 50-70% Biome data (Appendix 1–figure 1). This indicates that SDM accuracy is substantially enhanced when the training data incorporates 50-70% Biome data. The relative model accuracy remained positive in the 15-70% Biome-blended datasets, but decreased to negative values in the 85% and 100% Biome-blended datasets (S3 Fig). This suggests that blending Biome data generally enhances the accuracy of SDMs, but it is important to include at least 30% Traditional survey data to maintain accuracy. Based on its high performance and simplicity, we selected the 50% Biome-blended dataset as the Biome+Traditional data for comparing model accuracy with the Traditional survey data in the main text.
References
- Habitat classification for 69 near threatened plants based on national vegetation survey data. Veg Sci 35:67–88. https://doi.org/10.15031/vegsci.35.67
- Standards for distribution models in biodiversity assessments. Sci Adv 5. https://doi.org/10.1126/sciadv.aat4858
- Web image search revealed large-scale variations in breeding season and nuptial coloration in a mutually ornamented fish, Tribolodon hakonensis. Ecol Res 32:567–578. https://doi.org/10.1007/s11284-017-1466-z
- Marxan and relatives: Software for spatial conservation prioritization. Spatial Conservation Prioritisation: Quantitative Methods and Computational Tools. Oxford University Press
- lme4: linear mixed-effects models using S4 classes. J Stat Softw 67:1–48
- Biome Inc. 2023. The report of Climate Change Biosurvey in 2022.
- bioclim: the first species distribution modelling package, its early applications and relevance to most current MaxEnt studies. Divers Distrib 20:1–9. https://doi.org/10.1111/ddi.12144
- Using gamification to inspire new citizen science volunteers. Proceedings of the First International Conference on Gameful Design, Research, and Applications, Gamification’13. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2583008.2583011
- Macroclimate and Plant Forms, Tasks for Vegetation Science. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-009-8680-0
- Deep learning for early warning signals of tipping points. Proc Natl Acad Sci U S A 118. https://doi.org/10.1073/pnas.2106140118
- Chan L, Hillel O, Werner P, Holman N, Coetzee I, Galt R, Elmqvist T. 2021. Handbook on the Singapore Index on Cities’ Biodiversity (also known as the City Biodiversity Index). Montreal: Secretariat of the Convention on Biological Diversity and Singapore: National Parks Board, Singapore: The Secretariat of the Convention on Biological Diversity.
- Contribution of citizen science towards international biodiversity monitoring. Biol Conserv SI: Measures of biodiversity :280–294. https://doi.org/10.1016/j.biocon.2016.09.004
- Community assembly and shifts in plant trait distributions across an environmental gradient in coastal California. Ecol Monogr 79:109–126. https://doi.org/10.1890/07-1134.1
- Assessing citizen science data quality: an invasive species case study. Conserv Lett 4:433–442. https://doi.org/10.1111/j.1755-263X.2011.00196.x
- Patterns of distribution in Japanese land mammals. Mammal Rev 24:91–111. https://doi.org/10.1111/j.1365-2907.1994.tb00137.x
- The art of modelling range-shifting species. Methods Ecol Evol 1:330–342. https://doi.org/10.1111/j.2041-210X.2010.00036.x
- A statistical explanation of MaxEnt for ecologists. Divers Distrib 17:43–57. https://doi.org/10.1111/j.1472-4642.2010.00725.x
- Modeling the rarest of the rare: a comparison between multi-species distribution models, ensembles of small models, and single-species models at extremely low sample sizes. Ecography 2023. https://doi.org/10.1111/ecog.06500
- Trends and gaps in the use of citizen science derived data as input for species distribution models: A quantitative review. PLOS ONE 16. https://doi.org/10.1371/journal.pone.0234587
- A Double machine learning trend model for citizen science data. Methods Ecol Evol 14:2435–2448. https://doi.org/10.1111/2041-210X.14186
- Citizen science across two centuries reveals phenological change among plant species and functional groups in the Northeastern US. J Ecol 110:1757–1774. https://doi.org/10.1111/1365-2745.13926
- Practice of citizen science for developing biodiversity monitoring methods using mobile devices. Jpn J Ecol 71:85–90. https://doi.org/10.18960/seitai.71.2_85
- A framework for the detection and attribution of biodiversity change. Philos Trans R Soc B Biol Sci 378. https://doi.org/10.1098/rstb.2022.0182
- Species interactions: next-level citizen science. Ecography 44:1781–1789. https://doi.org/10.1111/ecog.05790
- Assessing the accuracy of free automated plant identification applications. People Nat 5:929–937. https://doi.org/10.1002/pan3.10460
- Young people in iNaturalist: a blended learning framework for biodiversity monitoring. Int J Sci Educ Part B 0:1–28. https://doi.org/10.1080/21548455.2023.2217472
- Evaluating the ability of habitat suitability models to predict species presences. Ecol Model, Predicting Species Distributions 199:142–152. https://doi.org/10.1016/j.ecolmodel.2006.05.017
- Concluding remarks. Cold Spring Harbor Symposia on Quantitative Biology. Cold Spring Harbor Laboratory Press
- Global assessment report on biodiversity and ecosystem services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services. Bonn, Germany: IPBES secretariat
- Estimates of observer expertise improve species distributions from citizen science data. Methods Ecol Evol 9:88–97. https://doi.org/10.1111/2041-210X.12838
- Incorporating climate change into spatial conservation prioritisation: A review. Biol Conserv 194:121–130. https://doi.org/10.1016/j.biocon.2015.12.008
- Multiple forms of engagement and motivation in ecological citizen science. Environ Educ Res 29:27–44. https://doi.org/10.1080/13504622.2022.2120186
- Biodiversity modeling advances will improve predictions of nature’s contributions to people. Trends Ecol Evol 0. https://doi.org/10.1016/j.tree.2023.10.011
- ENMeval 2.0: Redesigned for customizable and reproducible modeling of species’ niches and distributions. Methods Ecol Evol 12:1602–1608. https://doi.org/10.1111/2041-210X.13628
- City-size bias in knowledge on the effects of urban nature on people and biodiversity. Environ Res Lett 15. https://doi.org/10.1088/1748-9326/abc5e4
- Achieving Integrative, Collaborative Ecosystem Management. Conserv Biol 20:1373–1382. https://doi.org/10.1111/j.1523-1739.2006.00445.x
- TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Glob Change Biol. https://doi.org/10.1111/gcb.16914
- iPhenology: Using open-access citizen science photos to track phenology at continental scale. Methods Ecol Evol 14:1424–1431. https://doi.org/10.1111/2041-210X.14114
- Citizen science: a new approach to advance ecology, education, and conservation. Ecol Res 31:1–19. https://doi.org/10.1007/s11284-015-1314-y
- Species’ spatiotemporal distribution platform based on citizen science through desirable circulation between the real and digital worlds. Jpn J Conserv Ecol. https://doi.org/10.18960/hozen.2217
- lmerTest package: Tests in linear mixed effects models. J Stat Softw 82. https://doi.org/10.18637/jss.v082.i13
- From eDNA to citizen science: emerging tools for the early detection of invasive species. Front Ecol Environ 18:194–202. https://doi.org/10.1002/fee.2162
- Ecological Dynamics: Integrating Empirical, Statistical, and Analytical Methods. Trends Ecol Evol 35:1090–1099. https://doi.org/10.1016/j.tree.2020.08.006
- Just Google it: assessing the use of Google Images to describe geographical variation in visible traits of organisms. Methods Ecol Evol 7:1060–1070. https://doi.org/10.1111/2041-210X.12562
- The Taskforce on Nature-related Financial Disclosures must engage widely and justify its market-led approach. Nat Ecol Evol 7:1343–1346. https://doi.org/10.1038/s41559-023-02113-w
- The Living Planet Index: using species population time series to track trends in biodiversity. Philos Trans R Soc B Biol Sci 360:289–295. https://doi.org/10.1098/rstb.2004.1584
- Observer-oriented approach improves species distribution models from citizen science data. Ecol Evol 10:12104–12114. https://doi.org/10.1002/ece3.6832
- The recent past and promising future for data integration methods to estimate species’ distributions. Methods Ecol Evol 10:22–37. https://doi.org/10.1111/2041-210X.13110
- The use of citizen science in fish eDNA metabarcoding for evaluating regional biodiversity in a coastal marine region: A pilot study. Metabarcoding Metagenomics 6. https://doi.org/10.3897/mbmg.6.80444
- Impact of model complexity on cross-temporal transferability in Maxent species distribution models: An assessment using paleobotanical data. Ecol Model 312:308–317. https://doi.org/10.1016/j.ecolmodel.2015.05.035
- Perspective: sustainability challenges, opportunities and solutions for long-term ecosystem observations. Philos Trans R Soc B Biol Sci 378. https://doi.org/10.1098/rstb.2022.0192
- Has land use pushed terrestrial biodiversity beyond the planetary boundary? A global assessment. Science 353:288–291. https://doi.org/10.1126/science.aaf2201
- Global effects of land use on local terrestrial biodiversity. Nature 520:45–50. https://doi.org/10.1038/nature14324
- A comprehensive evaluation of predictive performance of 33 species distribution models at species and community levels. Ecol Monogr 89. https://doi.org/10.1002/ecm.1370
- How Lithology Impacts Global Topography, Vegetation, and Animal Biodiversity: A Global-Scale Analysis of Mountainous Regions. Geophys Res Lett 47. https://doi.org/10.1029/2020GL088649
- Integrating multiple data sources in species distribution modeling: a framework for data fusion. Ecology 98:840–850. https://doi.org/10.1002/ecy.1710
- The intrinsic predictability of ecological time series and its potential to guide forecasting. Ecol Monogr 89. https://doi.org/10.1002/ecm.1359
- Opening the black box: an open-source release of Maxent. Ecography 40:887–893. https://doi.org/10.1111/ecog.03049
- Maximum entropy modeling of species geographic distributions. Ecol Model 190:231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
- Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31:161–175. https://doi.org/10.1111/j.0906-7590.2008.5203.x
- Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol Appl 19:181–197. https://doi.org/10.1890/07-2153.1
- Chapter Six - A Vision for Global Biodiversity Monitoring With Citizen Science. Advances in Ecological Research, Next Generation Biomonitoring: Part 2. Academic Press :169–223. https://doi.org/10.1016/bs.aecr.2018.06.003
- The diversity and evolution of ecological and environmental citizen science. PLoS ONE 12:1–17. https://doi.org/10.1371/journal.pone.0172579
- Science and Gamification: The Odd Couple? Proceedings of the 2015 Annual Symposium on Computer-Human Interaction in Play, CHI PLAY’15. New York, NY, USA: Association for Computing Machinery :679–684. https://doi.org/10.1145/2793107.2810293
- Improving the Use of Species Distribution Models in Conservation Planning and Management under Climate Change. PLOS ONE 9. https://doi.org/10.1371/journal.pone.0113749
- R: A language and environment for statistical computing. R Found Stat Comput Vienna Austria
- Geographical sampling bias and its implications for conservation priorities in Africa. J Biogeogr 30:1719–1727. https://doi.org/10.1046/j.1365-2699.2003.00946.x
- Climate Change and Phenological Mismatch in Trophic Interactions Among Plants, Insects, and Vertebrates. Annu Rev Ecol Evol Syst 49:165–182. https://doi.org/10.1146/annurev-ecolsys-110617-062535
- Integrating citizen science data with expert surveys increases accuracy and spatial extent of species distribution models. Divers Distrib 26:976–986. https://doi.org/10.1111/ddi.13068
- IPBES Invasive Alien Species Assessment: Summary for Policymakers. Zenodo. https://doi.org/10.5281/zenodo.8314303
- DNA barcoding reveals 24 distinct lineages as cryptic bird species candidates in and around the Japanese Archipelago. Mol Ecol Resour 15:177–186. https://doi.org/10.1111/1755-0998.12282
- Assessing the reliability of species distribution projections in climate change researchDivers Distrib 27:1035–1050https://doi.org/10.1111/ddi.13252
- ssdm: An r package to predict distribution of species richness and composition based on stacked species distribution modelsMethods Ecol Evol 8:1795–1803https://doi.org/10.1111/2041-210X.12841
- A biodiversity intactness indexNature 434:45–49https://doi.org/10.1038/nature03289
- Area-based conservation planning in Japan: The importance of OECMs in the post-2020 Global Biodiversity FrameworkGlob Ecol Conserv 30https://doi.org/10.1016/j.gecco.2021.e01783
- Nature benefit hypothesis: Direct experiences of nature predict self-reported pro-biodiversity behaviorsConserv Lett https://doi.org/10.1111/conl.12945
- An evaluation of stringent filtering to improve species distribution models from citizen science dataDivers Distrib 25:1857–1869https://doi.org/10.1111/ddi.12985
- Effects of sample size on accuracy of species distribution modelsEcol Model 148:1–13https://doi.org/10.1016/S0304-3800(01)00388-X
- Biodiversity and ecosystem stability in a decade-long grassland experimentNature 441:629–632https://doi.org/10.1038/nature04742
- TNFD. 2023. Taskforce on Nature-related Financial Disclosures (TNFD) Recommendations version 1.0.
- Environmental heterogeneity predicts global species richness patterns better than area. Glob Ecol Biogeogr 30:842–851. https://doi.org/10.1111/geb.13261
- Improving the forecast for biodiversity under climate change. Science 353. https://doi.org/10.1126/science.aad8466
- Fluctuating interaction network and time-varying stability of a natural fish community. Nature 554:360–363. https://doi.org/10.1038/nature25504
- Predictive performance of presence-only species distribution models: a benchmark study with reproducible code. Ecol Monogr 92. https://doi.org/10.1002/ecm.1486
- Evolutionary and demographic consequences of phenological mismatches. Nat Ecol Evol 12. https://doi.org/10.1038/s41559-019-0880-8
- Identifying invasive species in real time: early detection and distribution mapping system (EDDMapS) and other mapping tools. Invasive Species Glob Clim Change, CABI Invasives Series:219–231. https://doi.org/10.1079/9781780641645.0219
- Presence-Only Data and the EM Algorithm. Biometrics 65:554–563. https://doi.org/10.1111/j.1541-0420.2008.01116.x
- Influences of climate and historical land connectivity on ant beta diversity in East Asia. J Biogeogr 43:2311–2321. https://doi.org/10.1111/jbi.12762
- Effects of sample size on the performance of species distribution models. Divers Distrib 14:763–773. https://doi.org/10.1111/j.1472-4642.2008.00482.x
- eBird: Engaging Birders in Science and Conservation. PLOS Biol 9. https://doi.org/10.1371/journal.pbio.1001220
- Biogeographic Pattern of Japanese Birds: A Cluster Analysis of Faunal Similarity and a Review of Phylogenetic Evidence. Species Diversity of Animals in Japan, Diversity and Commonality in Animals. Tokyo: Springer Japan:117–134. https://doi.org/10.1007/978-4-431-56432-4_4
- Citizen science data as an efficient tool for mapping protected saproxylic beetles. Biol Conserv, The role of citizen science in biological conservation 208:139–145. https://doi.org/10.1016/j.biocon.2016.04.035
- Habitat change and biased sampling influence estimation of diversity trends. Curr Biol 31:3656–3662. https://doi.org/10.1016/j.cub.2021.05.066
Copyright
© 2024, Atsumi et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.