Boosting biodiversity monitoring using smartphone-driven, rapidly accumulating community-sourced data

  1. Keisuke Atsumi (corresponding author)
  2. Yuusuke Nishida
  3. Masayuki Ushio
  4. Hirotaka Nishi
  5. Takanori Genroku
  6. Shogoro Fujiki (corresponding author)
  1. Biome Inc, Japan
  2. Department of Ocean Science, Hong Kong University of Science and Technology, Hong Kong
  3. Hakubi Center, Kyoto University, Japan
  4. Center for Ecological Research, Kyoto University, Japan
  5. Toyohashi Museum of Natural History, Japan
7 figures, 4 tables and 4 additional files

Figures

Workflow of submitting records to Biome.

(1) Users can upload images taken with the smartphone camera or import existing images from storage, including those imported from external devices. (2) Users select whether the image shows an animal or a plant to activate the species identification artificial intelligence (AI). (3) The AI analyses the image and its metadata to generate a candidate species list. (4) Alternatively, users can input the taxon name manually and obtain a list of candidate species. To submit the occurrence record, users can either (5) seek identification assistance from other users through the ‘ask Biomers’ feature, or (6) identify the species from the list. Users can also add memos and tags to a record, indicating phenology, life stage, sex, and whether the individual is wild or captive.

Description of data accumulated by Biome.

Data distributions are shown based on all records submitted to Biome by 7 July 2023 (N = 5,275,457). (A) Spatial distribution of records across Japan. (B) Accumulation of records through time. The bar plot represents the number of records each month and the line shows the cumulative number of records. (C) Distributions of records along PC1 of all environmental variables and along the standardised area occupancy of urban-type land uses. Grey and green represent distributions of Traditional and Biome data, respectively. (D) Taxonomic composition of records, shown by area size. ‘Other plant’ consists of non-seed terrestrial plants; ‘insects’ include Arachnids and Insects; ‘arthropods’ cover any Arthropod not included in insects; ‘other animals’ covers all invertebrates not included in the taxa above.

Figure 3 with 1 supplement
The accuracy of species distribution models.

Accuracy of species distribution models (SDMs) using Traditional survey data (grey dots and lines) and Biome + Traditional data (i.e. 50% of Biome data: green). Each SDM was fitted for a specific combination of dataset, species, and number of records. For each species and number of records, we computed the average model accuracy (Boyce index) from three replicated runs. Subsequently, we calculated the median model accuracy across species for each number of records. These medians are plotted for each taxon, indicated in the strip of the respective panel. The ‘Endangered’ category includes species that are listed as endangered on Japan’s national or prefectural red lists.
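
The two-step summary described in this caption (mean over three replicated runs, then median across species) can be sketched as follows. This is a minimal illustration only: the species names, record counts, and Boyce index values are hypothetical, and the `summarise` function is not taken from the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical input: (species, number_of_records, replicate, boyce_index).
# None of these values come from the article.
runs = [
    ("sp_a", 100, 1, 0.60), ("sp_a", 100, 2, 0.70), ("sp_a", 100, 3, 0.80),
    ("sp_b", 100, 1, 0.30), ("sp_b", 100, 2, 0.40), ("sp_b", 100, 3, 0.50),
    ("sp_c", 100, 1, 0.90), ("sp_c", 100, 2, 0.90), ("sp_c", 100, 3, 0.90),
]


def summarise(runs):
    """Return {number_of_records: median across species of per-species mean accuracy}."""
    # Step 1: collect replicate scores per (species, number of records).
    per_species = defaultdict(list)
    for species, n_records, _replicate, boyce in runs:
        per_species[(species, n_records)].append(boyce)

    # Step 2: average the replicated runs for each species and record count.
    species_means = {key: mean(values) for key, values in per_species.items()}

    # Step 3: take the median of those means across species, per record count.
    per_count = defaultdict(list)
    for (_species, n_records), value in species_means.items():
        per_count[n_records].append(value)
    return {n_records: median(values) for n_records, values in per_count.items()}


summary = summarise(runs)
```

Using the median across species keeps a few very poorly or very well modelled species from dominating the curve for a taxon.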

Figure 3—figure supplement 1
Accuracy of species distribution models (SDMs) using Traditional survey data (grey dots and lines) and Biome + Traditional data (i.e. 50% of Biome data: green), evaluated against test data consisting only of Traditional survey data.

The workflow of checking the accuracy of Biome data.

The workflow for selecting pseudo-absence (background) grid cells for species distribution models (SDMs) using the Biome-Traditional dataset.

In this process, both the Biome data and the Traditional dataset are used to determine suitable locations for pseudo-absence grid cells. However, when SDMs are constructed using the Traditional dataset exclusively, Biome data are not involved in the selection of pseudo-absence points.

Japanese archipelago, coloured by altitude.

The shaded area shows the spatial block used as test data. Retrieved from Wikipedia (2023, May 30), licensed under Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).

Appendix 1—figure 1
The violin plots of relative model accuracy between species distribution models (SDMs) using Biome-blended data and Traditional survey data.

The median values are shown as grey dots. A positive relative model accuracy indicates that SDMs using Biome data outperformed SDMs using Traditional survey data.

Tables

Table 1
Data quality of Biome.

The fraction of records documenting wild individuals, and identification accuracy at species, genus, and family levels among the records documenting wild individuals are shown. Species were identified only for records documenting wild individuals.

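| Species group | Species rarity | N | Wild/total (%) | Species correct/wild (%) | Genus correct/wild (%) | Family correct/wild (%) |
|---|---|---|---|---|---|---|
| Total | Total | 1420 | 81.6 | 91 | 93.6 | 96.9 |
| Seed plant | Total | 290 | 86.2 | 89.6 | 94.4 | 97.2 |
| Mollusca | Total | 140 | 87.9 | 90.2 | 91.1 | 96.7 |
| Insect | Total | 290 | 100 | 83.4 | 86.9 | 94.1 |
| Fish | Total | 140 | 73.6 | 87.4 | 93.2 | 96.1 |
| Amphibian | Total | 140 | 93.6 | 96.2 | 96.2 | 98.5 |
| Reptile | Total | 140 | 91.4 | 97.7 | 100 | 100 |
| Bird | Total | 140 | 98.6 | 98.6 | 99.3 | 99.3 |
| Mammal | Total | 140 | 80.7 | 95.6 | 95.6 | 96.5 |
| Total | Rare | 710 | 88.7 | 87 | 91 | 95.6 |
| Total | Common | 710 | 91 | 95 | 96.3 | 98.3 |
| Seed plant | Rare | 145 | 80.7 | 82.9 | 91.5 | 94.9 |
| Seed plant | Common | 145 | 91.7 | 95.5 | 97 | 99.2 |
| Mollusca | Rare | 70 | 82.9 | 86.2 | 87.9 | 96.6 |
| Mollusca | Common | 70 | 92.9 | 93.8 | 93.8 | 96.9 |
| Insect | Rare | 145 | 100 | 75.2 | 80 | 91.7 |
| Insect | Common | 145 | 100 | 91.7 | 93.8 | 96.6 |
| Fish | Rare | 70 | 74.3 | 88.5 | 94.2 | 94.2 |
| Fish | Common | 70 | 72.9 | 86.3 | 92.2 | 98 |
| Amphibian | Rare | 70 | 95.7 | 95.5 | 95.5 | 98.5 |
| Amphibian | Common | 70 | 91.4 | 96.9 | 96.9 | 98.4 |
| Reptile | Rare | 70 | 94.3 | 95.5 | 100 | 100 |
| Reptile | Common | 70 | 88.6 | 100 | 100 | 100 |
| Bird | Rare | 70 | 97.1 | 98.5 | 100 | 100 |
| Bird | Common | 70 | 100 | 98.6 | 98.6 | 98.6 |
| Mammal | Rare | 70 | 81.4 | 91.2 | 91.2 | 93 |
| Mammal | Common | 70 | 80 | 100 | 100 | 100 |

Key resources table

| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Software, algorithm | R 4.1.3; MaxEnt (using ENMeval 2.0 package on R) | R 4.1.3 (R Core Team, 2021); MaxEnt (Phillips et al., 2006; Phillips and Dudík, 2008); ENMeval 2.0 package (Kass et al., 2021) | | |
| Other | Species occurrence data | Biome app, GBIF and others (see ‘Methods’) | For DOIs of GBIF data, see Supplementary file 2 | For details, see section ‘Occurrence data’ |

Table 2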
List of species occurrence datasets used for constructing species distribution models (SDMs).

To compare the Biome dataset with the other datasets, community-science data from iNaturalist and eBird were classified as ‘Traditional survey’ data.

| Original dataset | Occurrence records of modelled species: N | Occurrence records of modelled species: occupancy (%) | Species coverage among modelled species | Survey method | Data group in SDM | Download date | Availability |
|---|---|---|---|---|---|---|---|
| Biome (filtering applied) | 201,114 | 8.6 | 132/132 | Citizen science through smartphone app | Biome | 7 July 2023 | https://biome.co.jp/ |
| National Census on River and Dam Environments (NCRE) | 1,413,541 | 60.2 | 126/132 | Traditional survey on freshwater and its adjacent ecosystems | Traditional survey | 10 January 2023 | http://www.nilim.go.jp/lab/fbg/ksnkankyo/ |
| Institute records registered at GBIF | 530,952 | 22.6 | 116/132 | Traditional survey and museum specimens | Traditional survey | 7 July 2023 | GBIF* |
| iNaturalist and eBird | 118,050 | 5 | 110/132 | Citizen science through smartphone app and web service | Traditional survey* | 7 July 2023 | GBIF* |
| Forest Ecosystem Diversity Basic Survey | 80,929 | 3.4 | 42/132 | Traditional survey on forest trees | Traditional survey | 30 March 2023 | http://forestbio.jp/ |
| Literature | 3293 | 0.1 | 130/132 | Traditional survey | Traditional survey | 31 March 2023 | Refs* |

* For the list of GBIF download DOIs and literature, see Supplementary file 2.
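
The occupancy values in Table 2 are consistent with each dataset's share of the pooled occurrence records, expressed as a percentage. The sketch below checks that reading; treating occupancy as "N divided by the sum of N over all datasets" is an assumption inferred from the numbers, not a definition stated in the table.

```python
# Record counts (N) copied from Table 2.
records = {
    "Biome (filtering applied)": 201_114,
    "National Census on River and Dam Environments (NCRE)": 1_413_541,
    "Institute records registered at GBIF": 530_952,
    "iNaturalist and eBird": 118_050,
    "Forest Ecosystem Diversity Basic Survey": 80_929,
    "Literature": 3_293,
}

# Assumed definition: occupancy = dataset records / pooled records * 100.
total_records = sum(records.values())
occupancy = {
    name: round(100 * n / total_records, 1) for name, n in records.items()
}

for name, share in occupancy.items():
    print(f"{name}: {share}%")
```

Under this reading, Biome contributes under a tenth of the pooled records for the modelled species, while NCRE alone contributes well over half.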

Table 3
Environmental data used for constructing species distribution models (SDMs).

Years indicate the data collection period. ‘Usage in the SDM’ shows how the variables were converted before being used in the species distribution modelling.

| Data | Variables | Year | Usage in the SDM | Available at |
|---|---|---|---|---|
| Land use | The area sizes of forests, rice fields, farms, wastelands, inland waters, beaches, ocean, golf courses, urbanised areas, and others | 2016 | Six principal components (PCs), together explaining ≥ 80% of total variation, were extracted by PCA. PCs were converted into linear, quadratic, and hinge terms. | The Ministry of Land, Infrastructure, Transport and Tourism of Japan (MLIT) (https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-L03-a.html) |
| Forest type | Forest type (planted and natural) | 1998 | Converted into linear, quadratic, and hinge terms. | The Biodiversity Centre of Japan (http://gis.biodic.go.jp/webgis/index.html) |
| Climate | Monthly average, minimum and maximum temperature and precipitation | 1981–2010 | Transformed into 19 bioclimatic variables (Booth et al., 2014); three PCs, together explaining ≥ 80% of total variation, were then extracted. Converted into linear, quadratic, and hinge terms. | MLIT (https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-G02-v3_0.html) |
| Elevational range | Differences between maximum and minimum elevation, and maximum slope | 1981 | Converted into linear, quadratic, and hinge terms. | MLIT (https://nlftp.mlit.go.jp/ksj/jpgis/datalist/KsjTmplt-G04-a.html) |
| Vegetation | The area sizes | 1998 | Transformed into 37 PCs that together explained more than 80% of total variation. Converted into linear, quadratic, and hinge terms. | MOE (http://gis.biodic.go.jp/webgis/index.html) |
| Geology | The area sizes of limestone and serpentinite | 2022 | Converted into linear, quadratic, and hinge terms. | The Research Institute of Geology and Geoinformation (https://gbank.gsj.jp/seamless/use.html) |
| Geohistory | Blakiston’s Line (Dobson, 1994; Saitoh et al., 2015), oceanic islands (Wepfer et al., 2016; Yamasaki, 2017) | | Categorical variables | |
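
The ‘Usage in the SDM’ column describes two generic steps: keeping enough principal components to explain at least 80% of total variation, and expanding each retained variable into linear, quadratic, and hinge terms. The sketch below illustrates both under stated assumptions. The function names, the example variances, and the single fixed knot are hypothetical, and the hinge term is a simplified, unscaled form; MaxEnt builds and scales its own hinge features at many knots internally.

```python
def n_components_for(variances, threshold=0.80):
    """Smallest number of leading components whose variance share >= threshold.

    `variances` are per-component variances (e.g. PCA eigenvalues); the values
    passed in below are illustrative, not taken from the article.
    """
    total = sum(variances)
    cumulative = 0.0
    for k, variance in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return k
    return len(variances)


def feature_terms(x, knot):
    """Expand one predictor value into linear, quadratic, and hinge terms.

    The hinge here is max(0, x - knot): zero below the knot, linear above it.
    """
    return {
        "linear": x,
        "quadratic": x * x,
        "hinge": max(0.0, x - knot),
    }


# Hypothetical component variances: the first two explain 85% of the total.
kept = n_components_for([6.0, 2.5, 1.0, 0.5])

# Hypothetical PC score of 3.0 with a knot placed at 1.0.
terms = feature_terms(3.0, knot=1.0)
```

Quadratic and hinge terms let the fitted response to a predictor bend rather than stay strictly linear, which is why each variable is expanded before modelling.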


  1. Keisuke Atsumi
  2. Yuusuke Nishida
  3. Masayuki Ushio
  4. Hirotaka Nishi
  5. Takanori Genroku
  6. Shogoro Fujiki
(2024)
Boosting biodiversity monitoring using smartphone-driven, rapidly accumulating community-sourced data
eLife 13:RP93694.
https://doi.org/10.7554/eLife.93694.3