Workflow for submitting records to Biome.

(1) Users upload images taken with the smartphone camera or import existing images from device storage, including images transferred from external devices. (2) Users select whether the image shows animals or plants to activate the species identification AI. (3) The AI analyses the image and its metadata to generate a list of candidate species. (4) Alternatively, users can enter the taxon name manually to obtain a list of candidate species. To submit the occurrence record, users can either (5) seek identification assistance from other users through the “ask Biomers” feature, or (6) identify the species from the list. Users can also add memos and tags to a record, indicating phenology, life stage, sex, and whether the individual is wild or captive.

Description of data accumulated by Biome.

Data distributions are based on all records submitted to Biome by 7 July 2023 (N=5,275,457). A Spatial distribution of records across Japan. B Accumulation of records through time. The bar plot shows the number of records submitted each month and the line shows the cumulative number of records. C Distributions of records along PC1 of all environmental variables and along the standardised area occupancy of urban-type land uses. Grey and green represent the distributions of Traditional and Biome data, respectively. D Taxonomic composition of records, shown as area sizes. ‘Other plant’ consists of non-seed terrestrial plants; ‘insects’ includes Arachnids and Insects; ‘arthropods’ covers any Arthropod not included in insects; ‘other animals’ covers all invertebrates not included in the taxa above.
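The monthly counts and cumulative line described for panel B can be sketched as follows. This is a minimal illustration only: the function name `monthly_and_cumulative` and the input format (a list of submission dates) are assumptions, not the authors' code. Months with no records are omitted rather than reported as zero.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def monthly_and_cumulative(dates):
    """Count records per (year, month) and accumulate them in time order.

    `dates` is a hypothetical list of datetime.date submission dates.
    """
    counts = Counter((d.year, d.month) for d in dates)
    months = sorted(counts)                  # chronological (year, month) keys
    monthly = [counts[m] for m in months]    # bar heights
    cumulative = list(accumulate(monthly))   # running total for the line
    return months, monthly, cumulative


# Illustrative dates only, not real Biome records.
example = [date(2023, 5, 1), date(2023, 5, 20), date(2023, 7, 7)]
months, monthly, cumulative = monthly_and_cumulative(example)
```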

Data quality of Biome.

The fraction of records documenting wild individuals is shown, together with identification accuracy at the species, genus and family levels among those records. Species were identified only for records documenting wild individuals.
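The quantities in this legend can be computed along the following lines. The record layout (keys `wild`, `submitted`, `verified`) and the function name are hypothetical and chosen for illustration; the sketch assumes each wild record has both a user-submitted and a verified identification, and that the input contains at least one wild record.

```python
RANKS = ("species", "genus", "family")


def quality_summary(records):
    """Return the wild fraction and per-rank accuracy among wild records.

    Each record is a dict with a boolean 'wild' and, for wild records,
    'submitted' and 'verified' dicts keyed by rank (hypothetical layout).
    """
    wild = [r for r in records if r["wild"]]
    summary = {"wild_fraction": len(wild) / len(records)}
    for rank in RANKS:
        # A record counts as correct at a rank when the submitted and
        # verified names agree at that rank.
        correct = sum(r["submitted"][rank] == r["verified"][rank] for r in wild)
        summary[rank + "_accuracy"] = correct / len(wild)
    return summary


def taxon(species, genus, family):
    """Small helper that builds an identification dict."""
    return {"species": species, "genus": genus, "family": family}


# Placeholder names only, not real records.
example_records = [
    {"wild": True, "submitted": taxon("s1", "g1", "f1"), "verified": taxon("s1", "g1", "f1")},
    {"wild": True, "submitted": taxon("s2", "g1", "f1"), "verified": taxon("s3", "g1", "f1")},
    {"wild": True, "submitted": taxon("s4", "g2", "f1"), "verified": taxon("s5", "g3", "f1")},
    {"wild": False},
]
summary = quality_summary(example_records)
```

Accuracy is nested by construction here: a record that is correct at species level is also correct at genus and family level, provided the taxonomy in both identifications is consistent.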

The accuracy of species distribution models.

Accuracy of SDMs using Traditional survey data (grey dots and lines) and Biome+Traditional data (i.e. 50% of Biome data: green). Each SDM was fitted with a specific dataset, species, and number of records. For each species and number of records, we computed the average model accuracy (Boyce index) from three replicated runs. We then calculated the median model accuracy across species for each number of records. These medians are shown for each taxon, as named in the strip of the respective panel. The “Endangered” category includes species listed as endangered on Japan’s national or prefectural red lists.
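The two-step aggregation described above (mean over replicated runs, then median across species) can be sketched as below. The tuple layout and the function name `summarise_accuracy` are assumptions made for illustration, and the Boyce index values are placeholders rather than results from the study.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_accuracy(runs):
    """Aggregate per-run accuracy into one median per (taxon, n_records).

    `runs` is an iterable of (taxon, species, n_records, boyce_index)
    tuples, one tuple per replicated SDM run (hypothetical layout).
    """
    # Step 1: collect replicates for each species and number of records.
    replicates = defaultdict(list)
    for taxon, species, n_records, boyce in runs:
        replicates[(taxon, species, n_records)].append(boyce)

    # Step 2: average the replicates, grouping species means by taxon.
    species_means = defaultdict(list)
    for (taxon, _species, n_records), values in replicates.items():
        species_means[(taxon, n_records)].append(mean(values))

    # Step 3: take the median across species.
    return {key: median(values) for key, values in species_means.items()}


# Placeholder values only.
example_runs = [
    ("bird", "sp_a", 100, 0.25), ("bird", "sp_a", 100, 0.5), ("bird", "sp_a", 100, 0.75),
    ("bird", "sp_b", 100, 0.0), ("bird", "sp_b", 100, 0.25), ("bird", "sp_b", 100, 0.5),
    ("bird", "sp_c", 100, 1.0), ("bird", "sp_c", 100, 1.0), ("bird", "sp_c", 100, 1.0),
]
medians = summarise_accuracy(example_runs)
```

Averaging replicates before taking the median keeps each species' weight equal, regardless of how many runs it contributed.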

The workflow for checking the accuracy of Biome data.

List of species occurrence datasets used for constructing SDMs.

To compare the Biome dataset with the other datasets, the community-science-based iNaturalist and eBird data were classified as ‘Traditional survey’ data. *For the list of GBIF download DOIs and literature sources, see Supplementary File 2.

Environmental data used for constructing SDMs.

Years indicate the data collection period. ‘Usage in the SDM’ shows how the variables were converted before use in species distribution modelling.

The workflow for selecting pseudo-absence (background) grid cells for SDMs using the Biome-Traditional dataset.

In this process, both the Biome and Traditional datasets are used to determine suitable locations for pseudo-absence grid cells. However, when SDMs are constructed using the Traditional dataset exclusively, Biome data are not involved in the selection of pseudo-absence grid cells.

Japanese archipelago, coloured by altitude.

The shaded area shows the spatial block of test data. Retrieved from Wikipedia (2023, May 30), licensed under Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).

Accuracy of SDMs using Traditional survey data (grey dots and lines) and Biome+Traditional data (i.e. 50% of Biome data: green), evaluated against test data consisting only of Traditional survey data.

Violin plots of the relative model accuracy of SDMs using Biome-blended data versus Traditional survey data.

Median values are shown as grey dots. Positive relative model accuracy indicates that SDMs using Biome data outperformed those using Traditional survey data.