Phylogenomic and mitogenomic data can accelerate inventorying of tropical beetles during the current biodiversity crisis

  1. Michal Motyka
  2. Dominik Kusy
  3. Matej Bocek
  4. Renata Bilkova
  5. Ladislav Bocak  Is a corresponding author
  1. Laboratory of Biodiversity and Molecular Evolution, Czech Advanced Technology Research Institute, Centre of Region Hana, Czech Republic

Abstract

Conservation efforts must be evidence-based, so rapid and economically feasible methods should be used to quantify diversity and distribution patterns. We have attempted to overcome current impediments to the gathering of biodiversity data by using integrative phylogenomic and three mtDNA fragment analyses. As a model, we sequenced the Metriorrhynchini beetle fauna, sampled from ~700 localities in three continents. The species-rich dataset included ~6500 terminals, ~ 1850 putative species delimited at 5% uncorrected pairwise threshold, possibly ~1000 of them unknown to science. Neither type of data could alone answer our questions on biodiversity and phylogeny. The phylogenomic backbone enabled the integrative delimitation of robustly defined natural genus-group units that will inform future research. Using constrained mtDNA analysis, we identified the spatial structure of species diversity, very high species-level endemism, and a biodiversity hotspot in New Guinea. We suggest that focused field research and subsequent laboratory and bioinformatic workflow steps would substantially accelerate the inventorying of any hyperdiverse tropical group with several thousand species. The outcome would be a scaffold for the incorporation of further data from environmental sequencing and ecological studies. The database of sequences could set a benchmark for the spatiotemporal evaluation of biodiversity, would support evidence-based conservation planning, and would provide a robust framework for systematic, biogeographic, and evolutionary studies.

Editor's evaluation

This manuscript provides clear ideas regarding the usage of next-generation sequencing data, and of more traditional mtDNA markers, to rapidly increase biodiversity inventories. You demonstrate how biodiversity information analyses done in the Metriorrhynchini, a hyperdiverse tropical insect group, can be rapidly expanded via targeted field research and large-scale sequencing. The study sets a benchmark for the spatiotemporal evaluation of tropical biodiversity, supports evidence-based conservation planning, and provides a robust framework for systematic, biogeographic, and evolutionary studies.

https://doi.org/10.7554/eLife.71895.sa0

Introduction

The number of known insects surpasses that of all other terrestrial groups (Mora et al., 2011), and we need much more detailed information to fully understand their diversity. Currently, the available biodiversity data are far from complete, and the majority of insect species remain undescribed (Novotny et al., 2006; Srivathsan et al., 2019). In addition, robust phylogenetic hypotheses are lacking for most lineages, and the genera and tribes are often artificial assemblages which are not relevant to evolutionary and biodiversity research. Therefore, we need to gather new information in order to advance our understanding of evolutionary and genetic relationships, and to build a phylogenetic scaffold for comprehensive taxonomic, biogeographic, and evolutionary studies that would be indispensable for biodiversity management.

Descriptive, morphology-based insect systematics is not keeping pace with the rapid loss and degradation of natural habitats (Theng et al., 2020), and with the ongoing decline in insect abundance as a result of human activities and climate change (van Klink et al., 2020). The largest taxonomic journal, Zootaxa, published almost 30,000 studies describing >60,000 new species and these represent over a quarter of all new animal species reported in 2001–2020 (Zhang, 2021). Although these numbers are impressive, they also show how labor-intensive is taxonomic research if, in average, only two new species are reported in a publication. To accelerate the cataloguing of biodiversity, it is vital to gather new material suitable for molecular analyses and to combine available molecular methods with traditional approaches (Riedel et al., 2013; Srivathsan et al., 2019; Yeo et al., 2020; Sharkey et al., 2021). DNA data are indisputably a valuable source for modern biodiversity research, and they can address both shallow and deep relationships (Tautz et al., 2003; Hajibabaei et al., 2007). There are two principal sources of short-fragment data: voucher-based DNA sequences typically produced by systematists (Riedel et al., 2013; Yeo et al., 2020; Sharkey et al., 2021), and DNA sequences produced by an ecosystem-based sequencing that does not associate individual samples with Linnean names (Andújar et al., 2015; Srivathsan et al., 2019). It is the responsibility of systematic biologists to assemble the natural system, that is, we need to reliably delimit genus- and tribe-level taxa, to make their ecological and distribution attributes informative. But the short fragments are often unsuitable for the building of deep phylogenies and large genomic datasets need to be used (McKenna et al., 2019; Baca et al., 2021). These include transcriptomes, whole genome sequences and anchored hybrid capture datasets. Using all information, a robust and stable natural classification will significantly facilitate detailed research into the spatial and temporal distribution of biodiversity (Morrison et al., 2009; Thomson et al., 2018). As an ultimate goal, we should attempt to construct a complete tree of life, or at least its backbone, which is invaluable in aiding the selection of groups for more detailed analyses (Chesters, 2017; McKenna et al., 2019). With a well-defined high-level classification, it is paramount to exploit all accessible data. We assume that voucher-based molecular phylogenies provide much-needed tools to researchers working on site-based biodiversity assessments (Andújar et al., 2015; Srivathsan et al., 2019) and that, in turn, the data produced by environmental and ecosystem-focused sequencing contribute to building the tree-of-life (Arribas et al., 2016; Bocak et al., 2016).

We have used hyperdiverse tropical metriorrhynchine beetles (Coleoptera, Lycidae, Metriorrhynchini) as our model. This net-winged beetle tribe contains >1500 recognised species, mostly found in the Old-World tropics (Figure 1A), and their classification is complicated by the complex taxonomic history (Bocak et al., 2020; see Appendix introductory information). The phenetic plasticity of Metriorrhynchini is relatively high (Figure 1B–D), but many distant species resemble each other due to convergent selection in Mullerian rings (Bocek et al., 2019; Motyka et al., 2020; Motyka et al., 2021). Therefore, unrelated taxa were often assumed to be closely related due to misleading morphological similarities. Although there are over 40 genera in the tribe, three-quarters of the species have been described in five ambiguously defined genera (Xylobanus, Cautires, Trichalus, Metriorrhynchus, and Cladophorus). Sometimes a single genus contains species from different subtribes (Bocak et al., 2020). In this respect, the Metriorrhynchini is a typical species-rich tropical insect group without well-founded classification and the paucity and inaccuracy of available data (Letsch et al., 2020). As a result, unlike vertebrates, these poorly known insect groups have not been considered for use in large-scale, integrative projects and data metanalyses (Myers et al., 2000; Holt et al., 2013) and have contributed little to our understanding of global biodiversity patterns.

Distribution and appearance of metriorrhynchine net-winged beetles.

(A) Distribution of Metriorrhynchini with major sampled localities designated by red dots. The numbers of analyzed specimens from individual regions are shown for regions and subtribes. (B–D) – General appearance of Metriorrhynchini.

The principal objective of this study is to demonstrate how biodiversity information for a hyperdiverse tropical group can be rapidly expanded via targeted field research and large-scale sequencing of transcriptomes, genomes, and short mtDNA fragments. Our investigation comprised four distinct steps. First, we assembled material from several hundred localities on three continents (Figure 1, Table 1). Second, as hyperdiverse groups are difficult to tackle and the current classification is unreliable, we attempted to compartmentalise diversity using phylogenomics. We then produced a tree, using all available data, to estimate species limits, intraspecific genetic variability, and species ranges. Finally, the tree was pruned and used to estimate shallow phylogenetic relationships, total and regional species diversity, and endemicity, and to define generic ranges and continental-scale range shifts. The applied methods of transcriptome and mtDNA analyses are widely used. The genomic datasets dominate among works focusing on deep relationships (transcriptomes and anchored hybrid capture; Misof et al., 2014; Kusy et al., 2019; McKenna et al., 2019; Baca et al., 2021). The mitochondrial markers have been used mainly to study the phylogeny of restricted clades (Toussaint et al., 2014; Sharkey et al., 2021). Until now, the genomic and mitochondrial data have seldom been combined to get simultaneously the phylogenetic backbone for the mid-rank classification (subtribes, groups of genera, generic limits) and the estimations of species diversity (e.g. Talavera et al., 2021). Our information and phylogenetic hypotheses can be a resource for higher level phylogenetics, population genetics, phylogeographic studies, and biodiversity estimation. At the same time, we want to show how limited our taxonomical knowledge is and how this lack is hindering biodiversity research and management (Thomson et al., 2018).

Table 1
The numbers of sampled localities per region.
AreaLocalitiesAreaLocalities
Australian region298Sino-Jap. region79
Australia118China51
New Guinea & Solomons179Japan28
New Zealand1
Wallacea49
Moluccas15Oriental region206
Sulawesi34S.India & Ceylon3
E.India & Burma12
Afrotropical Region64E.Indo-Burma44
West Africa1Malay Peninsula57
Guinean Gulf11Sumatra23
Ethiopia6Java & Bali15
East Africa10Philippines33
South Africa25
Madagascar11Total696

Results

Sampling of the Metriorrhynchini range

In total, we monitored almost 800 localities, 696 of them with occurrences of the Metriorrhynchini (Tabs. 1, Appendix 1—table 1). The distribution of sampling sites was partly biased due to the large extent of the Metriorrhynchini range, limited time and funds, different goals of various expeditions, and logistic problems (inaccessible regions, legal obstacles). The densest sampling is available from the Sundaland and New Guinea, while India and the Afrotropical region are under-sampled (Figure 1A).

Assembly of the phylogenomic tree

The phylogenomic dataset contained 35 Metriorrhynchini terminals (Appendix 1—table 2), seven outgroups, and ~4200 orthologs (1.9–5.7 × 106 aligned positions; Supplementary file 1; Appendix 1—table 3; Appendix 1—table 4). The tree shown in Figure 2A was produced using maximum likelihood (ML) analyses, whereas the coalescent method produced the topology shown in Figure 2B; additional trees are shown in Appendix 1—figures 18. For details on the data sets’ characteristics see Appendix 1—figures 912. Phylogenomic analyses resolved three subtribes (Metanoeina [Metriorrhynchina, Cautirina]), and five clades were regularly recovered within the Metriorrhynchina, that is, the hereby defined procautirines, leptotrichalines, trichalines, porrostomines, and cladophorines. Different settings (see Materials and methods) produced slightly different topologies and shifted the positions of the leptotrichalines and procautirines (Figure 3D). However, the monophyly of major subclades was not affected. The FcLM analysis favored a deeper position for the leptotrichaline clade (61.2%; Figure 2C, Appendix 1—figure 13). The position of the remaining terminals was stable across all analyses. All phylogenomic analyses question the definitions of some species rich genera (Appendix 1—figures 18) that are either polyphyletic (e.g. Cladophorus; 131 described species, most of them recovered in the Ditua subclade) or paraphyletic (Metriorrhynchus as a grade and Porrostoma in the terminal position; 194 species, see Figure 2A and B, Appendix 1—figures 18).

Topologies recovered by phylogenomic analyses.

(A) Phylogenetic relationships of Metriorrhynchinae based on the ML analyses of the concatenated amino-acid sequence data of supermatrix F-1490-AA-Bacoca-decisive. Unmarked branches are supported by 100/100 UFB/alrt; red circles depict lower phylogenetic branch support. (B) Phylogenetic relationships of Metriorrhynchini recovered by the coalescent phylogenetic analysis with ASTRAL when analysing the full set of gene trees (4109 gene trees inferred at the nucleotide level). Pie charts on branches show ASTRAL quartet support (quartet-based frequencies of alternative quadripartition topologies around a given internode). Outgroups taxa are not shown. (C) Results of FcLM analyses for selected phylogenetic hypotheses applied at the amino-acid sequence level (supermatrix F). (D) Alternative phylogenetic relationships of Metriorrhynchinae based on the ML analyses of the concatenated amino-acid sequence data of supermatrix A-4109-AA. Numbers depict phylogenetic branch support values based on 5000 ultrafast bootstrap replicates.

Topologies recovered by mitogenomic analyses.

(A) Relationships of Metriorrhynchini recovered by the constrained analysis of the pruned dataset at 2% distance. (The full resolution tree is shown in Source data 2 along with a tree recovered from the analysis of a complete dataset of 6429 terminals in Source data 1), asterisk designates a grade of Metriorrhynchina-like taxa found in a position in conflict with their morphology; (B) A chart of Robinson-Foulds distances among topologies inferred by repeated runs of the constrained and unconstrained analyses; (C) A comparison of the results obtained by two runs of the constrained analysis; (D) A comparison of trees inferred with/without the phylogenomic backbone; (E) A comparison of results obtained by two runs of the unconstrained analysis. The red lines designate terminals with conflicting positions in compared trees.

Constrained mitogenomics

The mtDNA database contained >11,500 mtDNA fragments (5935 cox1, 2381 rrnL, and 3205 nad5) representing 6429 terminals (2930 aligned positions). Using these data, we inferred additional trees using the constrained positions of 35 terminals whose relationships were determined through phylogenomic analyses, and the free positions of the other ~6400 terminals (Source data 1). The units based on uncorrected pairwise distances represent molecular operational taxonomic units (mOTUs), considered to be putative species, or ‘species’ for short. Depending on the applied 2% and 5% thresholds, we identified 34–37 mOTUs in the Metanoeina clade and 369–456 mOTUs in Cautirina. The major Metriorrhynchina clade (1376–1763 mOTUs) included procautirines, leptotrichalines, trichalines, porrostomines, and cladophorines. In addition, we identified several deeply rooted lineages, the kassemiines, and another five small clades (69–89 mOTUs in total; Source data 2), each of which comprised a limited number of species. As phylogenomic data for these terminals are still lacking, their positions were determined based only on mtDNA data and they are included in Metriorrhynchina, based on morphological traits (Table 2). The number of mOTUs does not include ~50 mOTUs for which cox1 was unavailable.

Table 2
The numbers of described species and identified mOTUs (molecular operational taxonomic units) at 2% and 5% thresholds per region and the total number of OTUs identified for subtribes.

Based on morphological identification, the OTUs of the kassemiine and other deeply rooted clades are included in Metriorrhynchina.

RegionMetriorrhynchina described/analyzed at 2%/5%Cautirina described/ analyzed at 2%/5%Metanoeina described/ analyzed at 2%/5%Metriorrhynchini described/ analyzed at 2%/5%RatioAnalyzed/described
Australian region639/1608/1239639/1608/12392.52–1.93
Australia196/167/133196/167/1310.85–0.67
New Guinea423/1434/1105423/1434/11053.39–2.61
Solomon Isl.21/9/921/9/90.43
Wallacea162/174/16214/10/9176/184/1711.05–0.97
Philippines51/18/1845/12/128/3/3104/33/330.32
Continental Asia43/52/42331/330/25730/34/31404/416/3301.03–0.82
Sundaland36/44/39201/184/14624/19/17261/247/2020.95–0.77
Indo-Burma6/7/762/52/423/4/474/63/530.85–0.72
China, Japan1/1/153/75/581/11/1155/87/701.58–1.27
India35/19/182/0/037/19/180.51–0.49
Afrotropical region231/104/94231/104/940.46–0.41
Sub-Saharan Africa178/74/65178/74/650.42–0.37
Madagascar53/30/2953/30/290.57
Total number of OTUs895/1852/1445641/456/36938/37/341574/2345/18481.50–1.17

Pruned mitogenomic tree with and without constraints

The dataset was subsequently pruned to a single terminal per mOTU based on 2% and 5% distance (see below) and was analyzed both with and without topological constraints (Figure 3A; Source data 2 and 3) show the pruned trees at 2% levels to capture the intraspecific genetic variability within the clusters of closely related mOTUs). Repeated runs with different starting seeds identified terminals with unstable positions (Figure 3A–C). The major clades were generally stable, whereas small, deeply rooted clades were prone to ‘wandering’ around the tree, as were distinct singletons. The trees that resulted from each of the seed-specific 19 ML runs differed slightly; tree similarity was thus evaluated using the Robinson-Foulds index, with values ranging from 0.180 (most similar) to 0.147 (most distant; Appendix 1—table 4).

Tree congruence

The degree of incongruence between selected topologies is shown in Figure 3C–D. The unconstrained analysis of mitochondrial data yielded a topology with a high number of terminals that were recovered in positions incongruent with their morphology (Figure 3D and E, Source data 3). The same dataset, when analyzed using the constrained position of 35 terminals (based on their relative relationships inferred by prior phylogenomic analyses), produced a topology with a much lower proportion of terminals in dubious positions (Figure 3C, Source data 2). The composition of the constituent clades is based on the topology recovered by the constrained mtDNA analyses and the position of genera was validated by morphological comparisons of vouchers with the type species of earlier described genera (all redescribed by Bocak, 2002). The named genera assigned into individual subclades are shown individual clades are characterized in Appendix results.

Species diversity

To investigate the total and regional species diversity of the Metriorrhynchini, we analyzed a dataset comprising 5935 of the 6429 terminals for which the cox1 mtDNA fragment was available (Figure 3A; Supplementary file 1). For the Metriorrhynchini, we identified 1848 and 2345 mOTUs using the 5% and 2% thresholds, respectively (Appendix 1—figure 14). We disregarded the presence of ~50 mOTUs (494 terminals) for which cox1 was missing. The number of mOTUs based on the cox1 analysis varied by thresholds and the number of delimited OTUs increased relatively slowly with decreasing threshold values from 1% to 10% (Appendix 1—figure 14).

Using an earlier published literature review (Bocak et al., 2020), we updated species lists for the Cautirina (641 spp. described species), Metanoeina (38 spp.), and Metriorrhynchina by adding taxa described in 2020 and 2021. By analysing DNA data, we identified 34–37 putative spp. of Metanoeina, 369–456 spp. of Cautirina, and 1445–1852 spp. of Metriorrhynchina, depending on the applied mtDNA uncorrected pairwise 5% and 2% mtDNA distance thresholds. The numbers of species per subregion, along with the estimated ratios between formally described and estimated species diversity, are shown in Table 2 for the 2% and 5% threshold (further information in Appendix 1—figure 14). Using both thresholds, 2% and 5%, the numbers of putative species surpass the numbers of species reported in the literature.

We observed very high species turnover even if 5% threshold was applied for delimitation. Only four mOTUs have been recorded in two landmasses separated by a deep-sea ( > 200 m). The faunas of Sulawesi and the islands across Wallace’s and Weber’s lines share two mOTUs, one mOTU was simultaneously identified in Laos and Luzon and one species in New Guinea and the Solomon islands. Similarly, only sixteen mOTUs were distributed across two landmasses separated by an inundated shelf (sea depth <100 m). Nine mOTUs were distributed in two or more islands of Southeast Asia and seven species were found in both New Guinea and Australia. The centres of species diversity of the Metriorrhynchini are New Guinea (1,105 putative spp. at 5% threshold) and the seasonally to perennially humid areas of the Sundaland (202 spp. at 5% threshold). The results suggest substantial modifications to the generic limits and ranges for numerous taxa that had been previously delimited (Appendix 1—figure 15).

Having the extensive the mtDNA topologies, we looked for examples of the evolution of neoteny and mimetic polymorphism. The detailed inspection of trees identifies the closest available relative of a putative neotenic, Cautires apterus (Cautirina). This species is morphologically very distinct (Figure 4B and D). The dated subtree indicates the recent origin of morphological divergence (Figure 4A). The mtDNA analyses recover some species with pronounced sexual dimorphism, such as an unidentified genus and species of the procautirine clade (Figure 4F–H). The origin of the polymorphism is putatively very recent (Figure 4F).

Identification of sexual dimorphism by large-scale biodiversity inventory.

(A) Relationships of lineages with modified ontogeny, the dated tree; (B, D) General appearance and head of Cautires apterus, a putative neotenic species; (C, E) ditto of the close relative with both sexes winged. Mimetic sexual dimorphism identified during diversity survey. (F) The dated tree, red colored terminal labels designate the individuals shown in G and H; (G) Dorsal view of individuals in copula; (H) Ditto, lateral view. Except of collecting individuals in copula, DNA-based assessment of relationships is the only option as the species are sexually dimorphic and no morphological traits indicate their conspecifity.

Discussion

In the context of the present loss of biodiversity (Sodhi et al., 2004; Hallmann et al., 2017; Theng et al., 2020), large-scale genomic resources are urgently needed for biodiversity assessment and conservation (Hajibabaei et al., 2007; Krehenwinkel et al., 2019). Molecular data cannot replace morphology-based taxonomy (Figure 3C–D; Thomson et al., 2018), but the analyses of our dataset complement and facilitate traditional biodiversity research in several directions. Our first step is to compartmentalize hyperdiverse Metriorrhynchini into manageable natural units (Figure 2). The densely sampled phylogeny identifies tribal and generic limits. It provides a useful foundation for detailed taxonomic research through the identification of weak areas in earlier classifications and points out the clades with undescribed diversity and non-monophyletic genera (e.g. several hundred species of Ditua and paraphyletic Metriorrhynchus; Figure 3, Appendix 1—figure 15). Furthermore, the analyses of species-rich datasets identify the areas with high species diversity as one of the critical conservation value parameters (Table 2; Baselga, 2010; Srivathsan et al., 2019). Traditional taxonomic research costs time and money, and the number of newly described species is relatively low if we consider the enormous diversity of tropical insects (Novotny et al., 2006; Sangster and Luksenburg, 2015). Therefore, we use DNA-based units as a provisional descriptor of species diversity (Hebert et al., 2003; Monaghan et al., 2009), and subsequently as a source for integrative taxonomy (Source data 1–3; Srivathsan et al., 2019). The presented large-scale monitoring project provides information on relationships (Figures 2 and 3), genetic divergence (Source data 1–3), turnover (Table 2), the extent of generic and species ranges (Appendix 1—figure 15, Source data 1–3), and on evolutionary phenomena that are usually studied using a few model organisms (Figure 4). Using phylogenomics and voucher-based sequencing, we show that taxonomic literature has provided insufficient and sometimes erroneous information, even after the formal consolidation of scattered descriptions (Bocak et al., 2020). We show that a taxon-focused continental scale project can effectively assemble comprehensive data for diversity of tropical insects.

Continent-wide taxon-specific monitoring of biodiversity: feasibility and impediments

Tissue and DNA archives have become critical in the assessment of biodiversity status (Hajibabaei et al., 2007; Blom, 2021). Although museomics is a potentially valuable source of data (Gauthier et al., 2020), in our case, museum collections are insufficient for filling data gaps due to the scarcity of material. For example, the Metriorrhynchini collection deposited in the Natural History Museum in London contains <3000 specimens, whereas there are ~6500 terminals in our dataset. At the beginning of our study, we faced critical absence of primary data. Therefore, we conducted intensive field research to obtain samples for a realistic assessment of the extant Metriorrhynchini diversity. We processed samples from our expeditions (most of which were focused on a range of topics over two decades between 2001 and 2019) and samples obtained through extensive collaboration with other researchers, both local and visiting, and with local naturalists whose contribution has increased with the growing number of citizen science projects (Jaskula et al., 2021; MacPhail and Colla, 2020). In such a way, we assembled a Metriorrhynchini tissue collection from almost 700 localities in three continents (Table 1, Figure 1). For several reasons our sampling is partly biased. We noted the serious loss of natural habitat in many regions. Previously described species were often collected in vicinity of seaports, but the lowland ecosystems are rapidly disappearing due to human exploitation. Therefore, type localities of many described species could not be sampled during recent expeditions and species known from museum collections are missing in our DNA dataset (Jiruskova et al., 2019). The habitat loss in South East Asia also affects other animal groups, and lowland primary forests are seriously endangered in the whole region (Sodhi et al., 2004; Theng et al., 2020). Further sampling bias is a consequence of the unsafe conditions and logistic problems in large areas of West Africa, Sahel, and the Congo Basin (Figure 1A), where net-winged beetles have not been systematically studied since the 1930s. Additional data gaps are caused by strict biodiversity research restrictions (Prathapan, 2018). Regardless of these limitations, we believe that the assembled dataset is a foundation for a robust classification framework and a soundly based assessment of biodiversity. Our results show the importance of field research for biodiversity studies and systematics (Basset and Lamarre, 2019).

Phylogenetic relationships: a scaffold for targeted research

Unresolved taxonomy is a common reason for the exclusion of specific groups from biodiversity research projects and this omission has an effect on conservation policies (Gutiérrez and Helgen, 2013). The current phylogenomic and mitogenomic phylogenetic hypotheses (Figures 2 and 3; Source data 1–3) supersede the morphology-based topologies (Bocak, 2002). The phylogenomic analysis incorporates a large amount of information, and we favor this method over morphological traits and short DNA sequences, both of which contain uncertainties (McKenna et al., 2019). Phylogenomics has resolved subtribe relationships and their internal structures. The analyzed 35 transcriptomes and low-coverage genomes were sufficient to identify five major Metriorrhynchina clades with several hundred putative species each and also to identify the limits of genera, which can be tested using traditional taxonomic methods (Figures 2 and 3A; Source data 1–3).

The sampling strategy is critical for building a phylogenomic backbone. Our goal was to cover as many deep lineages as possible and simultaneously to limit the number of sequenced RNA samples to avoid high costs. Therefore, we sequenced RNAlater preserved tissues and conspecific vouchers prior to assigning tissue samples for transcriptomic analyses. In this way, two rounds of sequencing provided us with critical information based on evenly distributed anchor taxa. In the next step, we re-analyzed the short-fragment dataset (Supplementary file 1) using constrained positions for taxa whose relationships had already been recovered through phylogenomics (Figures 2 and 3). A stabilized phylogenomic backbone is critical for inferring a robust topology because the large-scale analyses of short mtDNA fragments are sensitive, even to the application of starting seeds, and they often produce topologies incongruent with morphological traits (Figure 3B and E; Sklenarova et al., 2014). Only several small lineages have remained unanchored by genomic data, owing to a lack of properly fixed samples (Source data 2 and 3). For example, four small clades are much more deeply rooted than their morphology suggests (Figure 3A, Source data 2) and additional data are needed to place them in a phylogenetic context. Despite some contentious relationships that need further investigation, 35 genomic samples, that is under 2% of species, sufficiently supported relationships among most terminal clades that approximately represent genera, groups of genera and subtribes. Talavera et al., 2021 have shown that only six nuclear markers for 5%–10% of terminals can similarly stabilize the phylogeny of a species rich model group. We assume, that the combination of genomic and short DNA data can be valuable for building of the species-level trees of life.

We identified a substantial conflict between phylogenomic analyses, morphology-based classifications (Bocak, 2002 and earlier studies cited therein), and the analyses of a few short DNA fragments (Sklenarova et al., 2014, Appendix introductory information). Our analyses confirm the monophyly of the recently described Cautirina and Metanoeina and redefined Metriorrhynchina except several unanchored lineages (Sklenarova et al., 2013; Sklenarova et al., 2014), but, for the first time, we can robustly recover subtribal relationships (Figure 2A and B). We reject most internal splits suggested by morphological and mtDNA analyses (Bocak, 2002; Sklenarova et al., 2013; Sklenarova et al., 2014 and earlier studies cited therein). The present delimitation of five monophyla within Metriorrhynchina resolves the backbone of the subtribe that was contentious due to high levels of homoplasy in Sanger and morphology-based datasets. Similarly, some generic concepts are questioned as they have been mostly based on highly homoplastic traits (Sklenarova et al., 2014; Kusy et al., 2019). The taxonomic studies must consider the morphology along with molecular hypotheses. As morphology is not described in this study, we do not discuss the limits of individual genera and report only short information on newly defined subclades (Appendix results).

Our approach yielded a constrained phylogeny with 6429 terminals and almost 2000 mOTUs using 5% mtDNA distance threshold, and this provides the basis for the approximation of species diversity for constituent subclades and geographic regions (Figure 3; Source data 2 and 3). Concerning the extent of diversity, phylogenomic and mitochondrial data must be simultaneously analysed to provide a strong foundation for subsequent investigations (Figure 3C, Talavera et al., 2021). Phylogenomics cannot deal with thousands of species, and mitogenomic data are insufficient for the construction of robust deep relationships. The final steps are morphological validation of the proposed generic groups and genera (see Appendix results) and, in the future, formal descriptions of biodiversity using the Linnean classification. In such a way, the results of phylogenomic and mitogenomic inventory should be incorporated in the formal classification (Godfray and Knapp, 2004).

Species diversity: literature data and reality

Here, we deal with a species rich tropical beetle tribe ( > 1500 described species), and therefore we use the uncorrected pairwise distance thresholds for our diversity estimation (Table 2, Appendix 1—figure 14). The application of any threshold is a compromise between estimation accuracy, speed, and sequencing costs, taking into account the feasibility of inventorying a hyperdiverse group within a limited time frame (Hebert et al., 2003; Dupuis et al., 2012; Eberle et al., 2020). Several taxonomical works on Metriorrhynchini have simultaneously analysed mtDNA fragments, nuclear genes, morphology, and ecology (e.g. Bocak and Yagi, 2010; Kalousova and Bocak, 2017; Bocek et al., 2019; Jiruskova et al., 2019), but their results cannot robustly defend an application of a distance threshold for the whole tribe. To avoid a possibility of diversity overestimation, we base further discussion on the 5% distance. We assume that such a threshold might be sufficiently cautious. Future taxonomic revisions are surely needed for the validation of here presented data on species diversity.

When analysing the cox1 mtDNA fragment, we identified 1,848 mOTUs at 5% threshold and the numbers of delimited mOTUs indicate that, the substantial part of species diversity remains undescribed (Table 2, Appendix 1—figure 14). Additionally, the slope representing the relationship between the number of mOTUs and distance thresholds was gradual (Appendix 1—figure 14) due to a high number of genetically distant, indisputably distinct lineages in the dataset (Appendix 1—figure 14).

Our approach provides information about the diversity of the internal lineages. Metriorrhynchina is by far the most diverse group, within which the cladophorines comprise the largest clade (490 mOTUs, e.g, Ditua historically has included 2 spp., now ~250 spp.). The porrostomine clade is the next diverse group (373 mOTUs) and contains the speciose Porrostoma (126 mOTUs) and a paraphyletic series of lineages whose species have conventionally been placed in Metriorrhynchus. The differences between previously published data and our results are substantial and they question any reanalyses of literature data without prior verification (Bocak, 2002; Bocak et al., 2020 references therein; Source data 2 and 3).

The numbers of mOTUs must be interpreted in the context of the sampling activity in each region. We did not use standardized protocols due to the long-term character of our research, the incorporation of some samples provided by other researchers, and the necessity to apply appropriate collecting methods to maximize the number of recorded species in various ecosystems and under different weather conditions. We identified only 94 mOTUs from the Afrotropical region, mainly due to the limited number of collecting trips by authors (64 localities) and the inaccessibility of some areas. Despite intensive field research (33 localities), we collected from the Philippines less than one third of the species described (33 mOTUs). Our collection activities in the Philippines were hindered by substantial loss of natural habitats, and this is soon expected to be the case in other regions (Sodhi et al., 2004). The number of species known from the Sundaland (114 localities) was approximately equal to the number of sequenced mOTUs despite disproportionately intensive collecting effort by the authors. Even after numerous expeditions to the Sundaland, many regions remain unsampled. As metriorrhynchine species ranges are small (Jiruskova et al., 2019; Motyka et al., 2020), the number of species will probably increase in the future. The proportion of new species was regionally ~70% if DNA data and morphology were considered in detailed taxonomic studies (e.g. Jiruskova et al., 2019). While these regions house numerous unknown species, we found New Guinea to be exceptionally diverse, with almost three times the number of species reported in the literature (1,105 mOTUs at 5% threshold; 175 localities, Table 1 and Table 2). Despite the relatively large number of sampled localities, many areas of New Guinea remain unexplored and many places were only superficially sampled by colleagues and never visited by the authors (Figure 1A). Additional species were added to the dataset with each batch of sequenced samples from New Guinea and the area possibly houses much higher diversity than documented by the present study.

We observed a high turnover between regions, and few species had ranges which included landmasses separated by shallow seas (seven spp. Queensland / New Guinea, 9 spp. in Southeast Asia; Source data 1). Poorly dispersing lycids generally have very small ranges, except for the few genera that visit flowers and fly in open areas (Motyka et al., 2021). A similar small-scale turnover has recently been reported along altitudinal gradients (Bocek et al., 2019; Motyka et al., 2020; Motyka et al., 2021). A high turnover indicates a large proportion of hidden diversity, especially in tropical mountains (Merckx et al., 2015; Mastretta-Yanes et al., 2018). Mountain fauna is especially vulnerable to climate change and its inventorying is urgently needed.

The Metriorrhynchini has recently received considerable attention in taxonomic studies, and 302 species have been described by several authors over the past three decades, making a total of 1574 formally described species (Table 1, Kazantsev, 2010; Bocak et al., 2020; Appendix introductory information). Although the recent 24% increase in described diversity appears substantial, the distance-based analysis indicates the presence of almost 2000 mOTUs (Appendix 1—figure 14). An additional ~50 putative species (494 terminals) were identified, but this identification was only based on divergent morphology because of the absence of cox1. We assume that our sampling represents only a subset of all known species ( < 50%). It means that the dataset contains ~1000 undescribed species. At the current rate, formal morphological descriptions of an additional 1000 species would take decades or hundreds of years. This is a very long time in the context of the ongoing deforestation and fragmentation of natural habitats, and currently undocumented diversity might be lost long before it can be catalogued (Brooks et al., 2002; Sodhi et al., 2004; Ceballos et al., 2015; Theng et al., 2020). The rapid DNA-based inventory is an effective shortcut for obtaining basic information on the true diversity of tropical beetles and for setting a benchmark for future biodiversity re-evaluations.

The results reveal major biodiversity hotspots in New Guinea and the Sundaland. Tropical rainforests currently cover most of New Guinea, a tectonically young island that has not been considered a biodiversity hotspot for vertebrates (Myers et al., 2000; Hall, 2011; Toussaint et al., 2014). In the case of net-winged beetles, we show that the New Guinean fauna is phylogenetically diverse, spatially heterogeneous, and extremely rich as regards both the number of species and the endemic genera (Table 1). Additionally, the large clades of New Guinean species indicate that the diversification of major lineages preceded the uplift of the islands, and possibly started on the northern margin of the Australian craton and adjacent islands. Southeast Asia is a centre of phylogenetic diversity at the tribal level; its fauna contains all principal lineages and the highest diversity of Cautirina but is smaller than those of New Guinea. The Afrotropical and Palearctic regions represent only recently populated low-diversity outposts.

Impact of biodiversity inventorying on biogeographical and evolutionary research

Detailed data on Metriorrhynchini diversity indicate low dispersal propensity and this makes Metriorrhynchini a promising model for biogeographic studies (Ikeda et al., 2012). Our densely sampled phylogeny did not find any long-distance dispersal events, in contrast to many studies of flying beetles (Balke et al., 2009; Jordal, 2015). Most recovered overseas dispersal events are limited to distances of less than 100 km and are commonly accompanied by speciation (Source data 2 and 3). The high proportion of erroneous placement of many taxa (Appendix 1—figure 15; Bocak et al., 2020) renders the distribution data cited in previous literature unsuitable for phylogeographic investigations, and revision of the classification is important in order to understand the true distribution of individual taxa. The original and revised ranges of selected genera are compared in Appendix 1—figure 15 as examples.

Intensive biodiversity research has the potential to fill knowledge gaps concerning evolutionary phenomena that are mainly studied using a small number of model species, and the research can identify the unique attributes of other potential models. We document the contribution of a large-scale biodiversity inventory to evolutionary studies with two examples.

Net-winged beetles include several lineages in which females have lost the ability to completely metamorphose (Bocak et al., 2008; Mcmahon and Hayward, 2016). If a putative neotenic species is discovered, a comprehensive reference database of the group may identify its closest relatives. We used our data to place the East African Cautires apterus in a phylogenetic context, and the results indicated that it may be the youngest neotenic taxon of all net-winged beetles (36.1 my, Figure 4).

Our extensive DNA database of metriorrhynchine diversity may also play an important role in the study of the evolution of mimicry. Our inventory identified an extreme and previously unknown aposematic dimorphism in New Guinean metriorrhynchines (Figure 4; Figure 5). The placement of sexually dimorphic species in the phylogeny suggests that the shift to dimorphism was very recent (3.0 mya at the earliest) and began when both sexes were small-bodied. Mimetic sexual polymorphism is well understood in butterflies with non-mimetic males and mimetic females (Kunte, 2008), but the advergence of males and females to different aposematic models has only recently been reported in two subfamilies of net-winged beetles (Motyka et al., 2018; Motyka et al., 2020; Motyka et al., 2021). Divergent evolution in Müllerian systems appears to be more common in multi-pattern aposematic rings than was previously believed when morphology was the sole source of information.

A sequence of applied methods from sampling to hypotheses.

Conclusion

Priority areas for global conservation have usually been identified based on richness, species endemism and vulnerability of vertebrates (Myers et al., 2000; Holt et al., 2013). We assume that different patterns of biodiversity distribution can be revealed if other animal groups are studied. Reliable information on additional groups can focus our conservation efforts on valuable regions (Morrison et al., 2009; Thomson et al., 2018). Our model, beetles, is the most speciose group of animals but is much less known than vertebrates. Therefore, new data must be generated, and our research workflow must use innovations to economically produce the large-scale phylogenetic hypothesis for a high number of species. We conducted a worldwide sampling in ~700 localities, analysed transcriptomes, genomes, and mitochondrial markers, and validated our results with morphology. We show that the constrained position of less than 2% terminals increases the stability of tree topology and the congruence of molecular hypotheses with morphological traits (Appendix results). By the simultaneous consideration of genomic and mitochondrial phylogenetic signal, we achieved substantial progress with respect to the development of a Metriorrhynchini tree of life (Chesters, 2017; Linard et al., 2016). The voucher-based DNA entries established a framework for classifying samples from other studies, such as environmental sequencing (Linard et al., 2016; Andújar et al., 2015; Arribas et al., 2016) and for subsequent morphology-based studies. Despite limited time and funding, we identified almost 2000 mOTUs which indicate that there are at least twice more species than the number reported in the literature. This means that, at a conservative estimate, ~ 1000 species in the dataset were previously unknown to science. Furthermore, we identified New Guinea as a biodiversity hotspot, which is in clear contrast with studies identifying the biodiversity patterns of vertebrates. Our large-scale inventory shows that the literature records of tropical beetles cannot be used for biodiversity conservation and metanalyses without critical revision. We suggest that if focused field research is conducted even by a small research group and subsequent workflow steps are applied to any hyperdiverse tropical group, the results can set a benchmark for future evaluation of spatiotemporal changes in biodiversity.

Materials and methods

Field research

Request a detailed protocol

The analyzed individuals had been accumulated by numerous expeditions to various regions of the Metriorrhynchini range (Figure 1A, Appendix 1—table 1). The distribution of sampling sites was partly biased, and no samples are available from West Africa, Congo Basin, Sahel, Sri Lanka, and the Lesser Sundas. About 10% of samples were provided by other researchers.

Tissues for transcriptomic analyses were fixed in the field. As field identification is generally unreliable, we preferred to collect pairs in copula, then the female was fixed using RNAlater, and the male kept separately in 96% ethanol for Sanger sequencing and the voucher collection. Alternatively, the morphologically similar individual from the same place was fixed in ethanol and the identity of an individual assigned for transcriptomic analysis was confirmed by sequencing cox1 mtDNA using tissue from the specimen preserved in RNAlater and putatively conspecific voucher (Figure 2). About 100 tissue samples were fixed and thirty-five of them were used for sequencing (Appendix 1—table 2). Earlier published transcriptomes were added (McKenna et al., 2019; Kusy et al., 2019). Due to the inaccessibility of properly fixed tissue, the two critical samples were shotgun sequenced using isolated DNA.

Almost 7000 samples from 696 localities (Table 1) were included in the sequencing program to obtain short mtDNA fragments. In total, 6429 yielding at least a single fragment were included in the analysis (Supplementary file 1). The analyzed data set contained some previously published sequences (e.g. Sklenarova et al., 2013; Bocek and Bocak, 2019). Voucher specimens are deposited in the collection of the Laboratory of Biodiversity and Molecular Evolution, CATRIN-CRH, Olomouc.

Genomic and transcriptomic sequencing, data analysis

Request a detailed protocol

Libraries for thirty transcriptomes were prepared by Novogene Co., Ltd. (Beijing, China) and sequenced on the HiSeq X-ten platform (Illumina Inc, San Diego, CA). The removal of low-quality reads and TruSeq adaptor sequences were performed using fastp v.0.20.0 (Chen et al., 2018) with the following parameters: -q 5 u 50 l 50 n 15. All paired-end transcriptomic reads were assembled using SOAPdenovo-Trans-31mer (Xie et al., 2014).

Additionally, the total DNA (~33 Gb each) of Metanoeus sp. (Voucher code G19002) and an unidentified sample Metriorrhynchina species (Voucher JB0085) was shotgun-sequenced on the same platform. Reads were filtered with fastp using the same settings as above and quality was visualized with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). The draft genomes were assembled using SPAdes v.3.13.1 (Bankevich et al., 2012), with k-mer sizes of 21, 33, 55, 77, and 99. Obtained contigs were used to train Augustus (Stanke and Waack, 2003) for species-specific gene models with BUSCO (Waterhouse et al., 2018). Predicted species-specific gene models were then used for ab initio gene predictions in Augustus and predicted protein-coding sequences were used for subsequent analyses. Outgroup taxa were reported in previous studies (Kusy et al., 2018, Kusy et al., 2019; McKenna et al., 2019).

The ortholog set was collated by searching the OrthoDB 9.1 (Zdobnov et al., 2017) for single copy orthologs in six beetle genomes (Appendix 1—table 3). We used Orthograph v.0.6.3 (Petersen et al., 2017) with default settings to search in our assemblies for the presence of specified single copy orthologs. From the recovered 4193 orthologs, terminal stop codons were removed, and internal stop codons at the translational and nucleotide levels were masked. The amino acid sequences were aligned using MAFFT v.7.471 with the L-INS-i algorithm (Katoh and Standley, 2013). The alignments from each ortholog group were then checked for the presence of outliers. To identify random or ambiguous similarities within amino acid alignments, we used Aliscore v.2.076 with the maximum number of pairwise comparisons –r 1027, option -e. and we masked them using Alicut v.2.3 (Kück et al., 2010). Alinuc.pl was then used to apply the Aliscore results to match amino acids to the nucleotide data. MARE v.0.1.2-rc was used to calculate the information content of each gene partition (Misof et al., 2013). Partitions with zero information content were removed at both levels. Finally, the remaining 4109 alignments were retained for subsequent multispecies coalescent analyses, and different concatenated datasets were generated for both amino acid and nucleotide levels using FasConCat-G v.1.4 (Kück and Longo, 2014; Appendix 1—table 4 and Appendix methods). The degree of missing data and overall completeness scores (Ca) across all datasets were inspected using AliStat v.1.7 (Wong et al., 2020).

Compositional heterogeneity tests

Request a detailed protocol

To explore the effect of among species compositional heterogeneity and its possible bias to tree reconstruction, we inspected the data with BaCoCa v.1.105 (Kück and Struck, 2014) to identify the gene partitions that strongly deviate from compositional homogeneity using relative composition frequency variation value (RCFV). Following Vasilikopoulos et al., 2019, we considered compositional heterogeneity among species in a given partition to be high when RCFV ≥0.1. The heterogeneous partitions were excluded from the data to generate a more compositionally homogeneous dataset. We used Maximum Symmetry Test (Naser-Khdour et al., 2019) to identify the partitions that strongly deviate from compositional homogeneity at the nucleotide level (p-value cut off <0.05), and partitions below the threshold were excluded. The software SymTest v.2.0.49 (Ott, 2019) was used to calculate the overall deviation from stationarity, reversibility, and homogeneity (SRH) (Ababneh et al., 2006).

Phylogenomic maximum likelihood analyses

Request a detailed protocol

For all datasets, phylogenetic reconstruction was performed using the maximum likelihood (ML) criterion with IQ-TREE 2.1.2 (Minh et al., 2020). First, we analyzed all datasets using the original gene partition boundary. The model selection for each gene was performed with ModelFinder (Chernomor et al., 2016; Kalyaanamoorthy et al., 2017) implemented in IQ-TREE2 (-MFP option). GTR model was considered for nucleotide supermatrices. For the amino acid supermatrices, the substitution models LG, DCMUT, JTT, JTTDCMUT, DAYHOFF, WAG, and free rate models LG4X and LG4M were tested. All possible combinations of modeling rate heterogeneity among sites were allowed (options: -mrate E,I,G,I + G,R -gmedian -merit BIC). We used the edge-linked partitioned model for tree reconstructions (-spp option) allowing each gene to have its own rate. The optimized partition schemes and best-fitting models were inferred for some datasets using -m MFP+ MERGE and the considering same substitution models as above. The fast-relaxed clustering algorithm was used to speed up computation during partition-scheme optimization (Lanfear et al., 2017). Ultrafast bootstrap (Hoang et al., 2018) and SH-like approximate likelihood ratio test (SH-aLRT) were calculated in IQ-TREE2 (options -bb 5000 and -alrt 5000) to assess nodal supports for focal relationships.

Coalescent analyses and analyses of the confounding and alternative signal

Request a detailed protocol

To account for variation among gene trees owing to incomplete lineage sorting and to account for potential gene tree heterogeneity and discordance (Edwards, 2009), the data were also analyzed using the coalescent-based species-tree method. For every single gene partition, we calculated an ML gene tree in IQ-TREE2, with 5,000 ultrafast bootstrap replicates (-bb option) and using the same substitution models as predicted by ModelFinder in the above described partitioned analyses. For subsequent coalescent species tree estimation, the Accurate Species Tree Algorithm (ASTRAL-III v.5.7.3; Zhang et al., 2018) was used. To account for very poorly resolved branches on gene trees, branches with ultrafast bootstrap ≤10 were collapsed using newick utilities v.1.6 (Junier and Zdobnov, 2010) in every ASTRAL analysis. Local posterior probabilities (Sayyari and Mirarab, 2016) and quartet frequencies of the internal branches in every species tree were calculated using the parameter ‘-t = 2’. Pie charts representing quartet scores for the given topology and two alternatives were plotted to the resulting species trees in R using plot_Astral_trees (Bellot, 2021).

Additionally, we studied the effect of potentially confounding signals, like non-random distribution of data coverage and violations of SRH conditions, on our phylogenetic reconstructions with the Four-cluster likelihood mapping (FcLM) approach (Strimmer and Haeseler, 1997) implemented in IQ-TREE2. Based on the results of our tree reconstructions, we tested the hypotheses about the alternative placement of leptotrichaline and procautirine clades.

Mitochondrial DNA sequencing and data analysis

Request a detailed protocol

Total DNA was extracted from the metathorax with a Wizard SV96 kit (Promega Corp., Madison, WI). The yield was measured using a NanoDrop-1000 Spectrophotometer (Thermo Fisher Scientific Inc, Waltham, MA). The PCR settings and cycle sequencing conditions were the same as those used by Bocak et al., 2008. Three fragments of mitochondrial genome were sequenced: cox1+ tRNA Leu+ cox2 (~1100 bp), rrnL+ tRNA Leu+ nad1 (~800 bp), and ~1210 bp of nad5 and adjacent tRNA-Phe, tRNA-Glu, and tRNA-Ser mtDNA (the mtDNA fragments are further mentioned as rrnL, cox1, and nad5). The PCR products were purified using PCRu96 Plates (Merck Millipore Inc, Burlington, MA) and sequenced by an ABI 3130 (Applied Biosystems, Waltham, MA) sequencer using the BigDye Terminator Cycle Sequencing Kit 1.1 (Applied Biosystems, Waltham, MA). Sequences were edited using Sequencher v.4.9 software (Gene Codes Corp., Ann Arbor, MI). Altogether 6476 individuals were analyzed including some previously published (Sklenarova et al., 2013; Bocek and Bocak, 2019).

The cox1 gene fragment was used to OTUs delimitation (Blaxter et al., 2005) using CD-hit-est (Fu et al., 2012) and different thresholds (from similarity 0.99–0.90 by 0.05 steps). Therefore, we assembled two datasets: (A) the dataset containing all sequenced individuals and (B) all OTUs delineated by 0.98 similarity of the cox1 gene. The rrnL and tRNAs were aligned using MAFFT 7.2 with Q-INS-I algorithm (Katoh and Standley, 2013), protein-coding genes were eye-checked for stop codons and aligned using Trans-Align (Bininda-Emonds, 2005). All fragments were concatenated using FasConCat (Kück and Longo, 2014) and analyzed under maximum-likelihood criterium in IQ-TREE v.2.1.2 (Minh et al., 2020; Appendix 1—table 5). To assess the branch supports values, we used SH-aLRT test with 1000 iterations. ModelFinder tool implemented in IQ-TREE was used to identify the best fit models using the Bayesian Information Criterion (Chernomor et al., 2016). The results of the TSA/WGS analyses were used to constrain basal topology among major clades of Metriorrhynchini in both analyses of datasets A and B. Further, we ran unconstrained analyses of the above-mentioned datasets with identical settings except -g option to compare results. We replicated constrained tree search nineteen-times and compared resulting trees using Robinson–Foulds distances in R package phangorn (Schliep, 2011; Appendix 1—table 2). Randomly chosen trees were then compared using cophylo script (phytools; Revell, 2012) with argument rotate = TRUE.

Appendix 1

Appendix introductory information

Overview of Metriorrhynchini systematics and natural history

Lycidae (net-winged beetles) is a diverse, mostly tropical family of Elateroidea and Metriorrhynchini is the largest lycid tribe with almost 1600 species (Masek et al., 2018). The history of their classification started with Kleine’s proposals of tribes and/or subfamilies Metriorrhynchini/inae, Trichalini/inae, Cladophorini/inae, and Dilolycini/inae (see an overview by Kleine, 1933). The supergeneric classification of net-winged beetles was later revised. Conderini were merged with Metriorrhynchini and Trichalini in the widely defined Metriorrhynchinae and Cladophorini and Dilolycini were synonymized to Metriorrhynchini (Bocak, 2002 and references therein). Calder, 1998 considered Metriorhynchinae Kleine, 1926 (based on Metriorhynchus Guérin-Méneville, 1838) to be a homonym of the crocodilian subfamily Metriorhynchinae Meyer, 1832 and used Dilolycinae Kleine, 1926 as a replacement name. Trichalinae were placed as a tribe Trichalini in Metriorrhynchinae or a subtribe Trichalina in Metriorrhynchini (Bocak, 2002). Additionally, Hemiconderina Bocak et al., 2008 were erected for Hemiconderis and related genera. Detailed information on the metriorrhynchine classification history was published in the latest revision of the subtribal classification (Sklenarova et al., 2014) and the genera were assigned to subtribes by Kubecek et al., 2015 and Bocak et al., 2020.

The alpha-taxonomy is mostly based on the original, often uninformative descriptions (catalogued by Kleine, 1933; Bocak et al., 2020). Recently, Calder, 1998 compiled the catalogue of Australian net-winged beetles and proposed Metriorhynchus Guérin-Méneville, 1838 as a junior homonym of Metriorhynchus Meyer, 1832. Subsequently, Porrostoma Castelnau, 1838 was used as a replacement name for Metriorhynchus Guérin-Méneville, 1838 and new combinations and new replacement names were proposed (Calder, 1998). Bocak, 2002 reviewed the status of lycid taxa described by Guérin-Méneville, 1838 and proposed to replace Metriorhynchus Guérin-Méneville, 1838 by Metriorrhynchus Gemminger et Harold, 1869. Bocak, 2002 additionally considered Porrostoma Castelnau, 1838 as a valid name for a separate genus in Metriorrhynchini. These studies resulted in repeated transfers of several hundred of species between Metriorrhynchus and Porrostoma (Calder, 1998; Bocak et al., 2020).

There are five Metriorrhynchini genera which house the majority of described species: Xylobanus (231 spp.), Cautires (437 spp.), Trichalus (120 spp.) Metriorrhynchus (194 spp.), and Cladophorus (131 spp.; Kleine, 1933; Calder, 1998; Bocak, 2002; Bocak et al., 2020), although their limits have been ambiguous. Trichalus was recovered as a much smaller clade than Microtrichalus Pic, 1921 (Bocek and Bocak, 2017). Australian Metriorrhynchus transferred to Porrostoma Castelnau, 1838 by Calder, 1998 were formally returned to Metriorrhynchus by Metriorrhynchus by Bocak et al., 2020. Xylobanus was recovered as a clade that contains most Oriental Cautirina with four primary costae, but Afrotropical Xylobanus were recovered as a distantly related terminal clade within Afrotropical Cautires (Sklenarova et al., 2013; Sklenarova et al., 2014). Some Oriental Xylobanus belong to Metanoeina and New Guinean species to Metriorrhynchina (Kubecek et al., 2015; Bocak et al., 2020). An additional typologically defined genus, Procautires Kleine, 1925, was reported from Asia, Africa and Australasia (Kleine, 1933), but the type species belongs to Metriorrhynchina in contrast to the Afrotropical and Asian species which are close to Cautires (Cautirina).

Geographic distribution

The tribe Metriorrhynchini dominates the Australian net-winged beetle fauna with ~85% (196 spp.) of the named taxa. They are similarly common in New Guinea (423 spp.) and the Wallacea (176 spp.). Although highly diversified (261 spp. and 231 spp., respectively), they represent only about 30% of described net-winged beetle species in the Oriental and Afrotropical regions where similar numbers of species belong to Platerodini, Calochromini and Lycini, or several small tribes (Masek et al., 2018).

The tribe Metriorrhynchini has a Gondwanan distribution, and the range reaches to the Palearctic East Asia (Russian Far East, Cautirina and Metanoeina). Additionally, the tribe is well represented in the Sino-Japanese realm sensu Holt et al., 2013 (China, Japan, Cautirina and Metanoeina in the whole area, Metriorrhynchina in southernmost China only) and in the Oriental realm (most of India, Sri Lanka, the lower elevations of the Himalayas, Indo-Burma, Malaya, the Great and Lesser Sundas, the Philippines). Further, the tribe is widespread in the Oceanian realm, especially in its western part (the Moluccas, New Guinea, the Solomon Islands and a few islands further east) and in the Australian realm (Australia, Tasmania, and one introduced species in New Zealand). The Afrotropical realm houses relatively diversified fauna in forest and savannah habitats, but all species belong to Cautirina and represent only two clades which correspond with putative independent colonization events from drifting India to Madagascar and continental Africa (Sklenarova et al., 2013).

The subtribes differ in their distribution. Metriorrhynchina is mostly Oceanian and Australian, relatively rich Metriorrhynchina fauna is known from the Sulawesi, much less species occur in the Philippines and only a few species of Diatrichalus, Leptotrichalus, Trichalus, Microtrichalus, and Metriorrhynchus in the eastern part of the Oriental realm (Bocak and Yagi, 2010; Bocek and Bocak, 2019; Bocek et al., 2019) and southern Yunnan in the Sino-Japanese realm. The Cautirina occurs in the Afrotropical, Oriental, Palearctic, and Sino-Japanese realms, but they do not cross Weber’s line that was proposed as a border between Oriental and Oceanian realm (Holt et al., 2013). The Metanoeina is Oriental, with a limited number of species in China and Japan. Only Metanoeus occurs in the Philippines.

The Metriorrhynchini is the species-rich tribe and the almost 200 years of taxonomic research accumulated data that need serious revision (Bocak et al., 2020). In the taxonomy of this group, we face the burden of uninformative descriptions (e.g. the studies by the French entomologist M. Pic in the first half of the 20th century). The careless classification of newly described species, inaccessibility of type-holding collections or their poor organization, and legal regulations limit taxonomic research (Prathapan, 2018; Bocak et al., 2020).

Biology

The adults are volatile, but they are poor dispersers due to their soft-bodiedness and many species remain in shaded situations under the canopy, usually sitting on leaves. Only some species (Porrostoma, Metriorrhynchus, Leptotrichalus, and Trichalus) visit flowers. The flower frequenting species are more common in semi-dry areas where nectar represents a source of water for adults instead of water on leaves or in rotten wood. All Metriorrhynchini are unpalatable for predators and most are aposematically coloured (Eisner et al., 2008; Motyka et al., 2021). The color patterns are usually geographically restricted and the sympatrically occurring species form Müllerian mimicry rings (Bocak and Yagi, 2010; Motyka et al., 2018; Motyka et al., 2020; Motyka et al., 2021; Bocek et al., 2019). The larvae of Metanoeus, Cautires, Leptotrichalus, Metriorrhynchus, and Porrostoma were described by Bocak and Matsuda, 2003. They live in rotten wood including trunks and logs on soil surface or in rotten roots in soil in arid regions.

Appendix results

Metriorrhynchina subclades based on phylogenomic analyses and their morphological characteristics and distribution.

The procautirine clade

The phylogenomic analysis contained four taxa that clustered as the first or second deepest serial branch in Metriorrhynchina. One of the internal clades contains species that are morphologically highly similar to the type-species of Procautires. They have reduced secondary elytral costae in the middle part of the elytra. Other procautirine species have fully developed secondary costae unlike Procautires. The only morphological trait supporting the monophyly of the procautirine clade is a setose patch in the internal sac of the male genitalia.

The leptotrichaline clade

The phylogenomic backbone identified close relationships of three endemic genera from the Sulawesi (Mangkutanus, Wakarumbia, and Broxylus), and two genera, Sulabanus, Leptotrichalus, known from Sulawesi and the Philippines (Bocak et al., 2020). The mtDNA dataset identified as closely related the Australian and New Guinean Synchonnus and New Guinean Falsolucidota. Two lineages of the clade putatively colonized the Oriental region: Leptotrichalus (the Philippines, Sundas, Indo-Burma and southernmost China) and Sulabanus (only the Philippines, 2 spp. from Mindanao and Sibuyan identified herein, either undescribed species or currently placed in Cautirina: Xylobanus). We have not found any reliable external diagnostic morphological character which would support the relationships of these genera. Only Sulabanus and Mangkutanus have a complete set of lateral pronotal carinae. Other genera have simplified of absent carinae except those that form the median lanceolate areola (Bocak, 2002). The group contains genera with the highly diverse arrangement of elytral costae.

The trichaline clade

The genomic dataset contained representatives of three genera: Diatrichalus, Microtrichalus, and Eniclases, all of them sharing the characteristic morphological traits of the trichaline clade, that is a single median areola in the pronotum and the shortened first primary elytral costa. We identified as close relatives two lineages of small-bodied Metriorrhynchina with conspicuous median pronotal areola and either absent or simplified lateral carinae, but with the full-length elytral costa 1. The structure of elytral costae is diverse in the clade: most trichalines sensu stricto have well-developed secondary costae, that is, nine costae at humeri, only a few Diatrichalus have only four primary costae in the humeral part. All genera of the here delimited trichaline clade, except Diatrichalus, share similar male genitalia (Bocak, 2002). The deepest branches of the trichaline clade contain mostly New Guinean species and several lineages of the trichalines colonized the Sulawesi, Philippines, Greater Sundas, Malaya, Indo-Burma, and southernmost China.

The porrostomine clade

Six transcriptomes were analyzed to recover the backbone of the clade. Porrostoma and Metriorrhynchus are distantly related and easy to distinguish with the genital morphology (Bocak, 2002). Numerous small bodied, morphologically diverse Metriorrhynchina species were identified as their close relatives. Most Metriorrhynchus and their closest relatives come from New Guinea, but several species of a terminal clade crossed Lydderker’s, Weber’s, and Wallace’s lines and colonized Sulawesi, the Philippines, Greater Sundas, Malaya, and Indo-Burma (Bocak and Yagi, 2010).

The cladophorine clade

Altogether eleven taxa were included in the genomic analysis, and we found Cladophorus as a sister to Pseudodontocerus (=Carathrix) and Ditua. Most genera have seven pronotal areoles, flabellate to pectinate male antennae, and nine well developed costae in the elytra. Mitochondrial dataset recovered this clade as the most speciose group. The characteristic large-bodied species resembling the type-species of Cladophorus were found as a terminal subclade among several lineages of small-bodied Metriorrhynchina which do not resemble them in general appearance. Further splits contain the Pseudodontocerus clade and a large clade containing representatives of earlier described genera Ditua and Cautiromimus. The species of the Cladophorus clade are most diverse in New Guinea, but some species of the Ditua clade were recorded from the Philippines and Sulawesi, a limited number of species is native to continental Australia.

Impact of molecular phylogenetics on the assessment of morphological evolution

With the robust, data-rich phylogeny, we can test earlier hypotheses on morphological evolution. Relationships among subtribes have been based on the similarity of larvae and ambiguously supported by Sanger data (Sklenarova et al., 2013; Sklenarova et al., 2014). Our results imply that bipartite larval terga can be inferred either as a synapomorphy of Metriorrhynchini with reversal to entire tergites in Cautirina or we must alternatively consider their independent origin in Metanoeina and Metriorrhynchina. The mapping of the character on the phylogenomic tree needs two instead of a single step as earlier supposed (Sklenarova et al., 2014). When most larvae are unknown and their morphology variable, the phylogenetic hypotheses based on incomplete morphological data and uncertain polarity urgently need validation from independent sources of information.

Due to quite high morphological disparity and ambiguous interpretation of some morphological characters, there were proposed subfamilies, tribes, and/or subtribes for various subsets of genera now placed in the Metriorrhynchina (Metriorrhynchinae/ini, Cladophorinae/ini, Trichalinae/ini, Dilolycinae/ini, Hemiconderina, and possibly Melanerotini, now in Lycidae incertae sedis; Bocak, 2002; Kazantsev, 2010; Kusy et al., 2019). Phylogenomics reject the possibility to assign a high rank to Trichalini and Hemiconderina as numerous coordinated taxa would have to be erected to make all groups reciprocally monophyletic and keep the earlier proposed names valid.

The phylogenomic analyses reinforce the previously identified monophyly of the trichaline clade (=earlier Trichalini part; Bocek and Bocak, 2017), but simultaneously suggest their delayed origin as a terminal clade and the presence of some ‘non-trichaline’ lineages as serial sister-groups to the trichalines in the earlier sense. The cladophorine clade contains numerous taxa which had been placed until recently into Cautires (Cautirina), a distant lineage in respect to cladophorines. Herein, we further delimit the porrostomine clade which contains Metriorrhynchus and Porrostoma, but along with them several deeply rooted lineages which have been incorrectly placed to various genera.

For the first time, we define the leptotrichaline clade that contains the genera earlier placed into the trichalines (Leptotrichalus), Hemiconderina (Falsolucidota), Lycinae: Calopterini (Broxylus), and Cautirina (some Xylobanus). The procautirine clade is also a newly defined group. Procautires had been for long considered as related to Cautires, but it is one of earliest Metriorrhynchina.

Until recently, the classification had been based almost exclusively on the arrangement of pronotal carinae, elytral costae and the shape of antennae. These traits are easily observable and sometimes define a clade, but in many cases their presence and/or absence is misleading as they are highly plastic even within restricted clusters of species (Sklenarova et al., 2014; Bocek and Bocak, 2017; Kusy et al., 2019). Before the availability of molecular data, the homoplasy of external morphological characters could not be tested. As a result, the earlier studies coped with uncertain relationships of newly described species at the levels from a tribe down to a genus (Kleine, 1933; Calder, 1998; Bocak et al., 2020).

The list of Metriorrhynchini genera with complete synonymy:

  • Subtribe Metanoeina Sklenarova et al., 2014

  • Matsudanoeus Sklenarova et al., 2014

  • Metanoeus Waterhouse, 1878

  • Ochinoeus Kubecek et al., 2015

    Xylometanoeus Sklenarova et al. 2014

  • Subtribe Cautirina Sklenarova et al. 2014

  • Subtribe Cautirina Sklenarova et al. 2015

  • Caenioxylobanus Waterhouse, 1878

  • Cautires Waterhouse, 1878

  • Prometanoeus Kleine, 1925

  • = Tapromenoeus Bocak et Bocakova, 1989

  • Paracautires Kazantsev, 2012

  • Spartoires Kazantsev, 2012

  • Tricautires Kazantsev, 2006

  • Xylobanus Waterhouse, 1878

  • Subtribe Metriorrhynchina Kleine, 1926

  • Procautirine clade

  • Procautires Kleine, 1925b

  • Xylothrix Kazantsev, 2015

  • Leptotrichaline clade

  • Falsolucidota Pic, 1921

  • = Hemiconderis Kleine, 1926

  • Broxylus Waterhouse, 1879

  • = Samanga Pic, 1921

  • Mangkutanus Kubecek et al., 2011

  • Sulabanus Dvorak & Bocak, 2007

  • Leptotrichalus Kleine, 1925

  • Synchonnus Waterhouse, 1879

  • = Achras Waterhouse, 1879

  • = Enylus Waterhouse, 1879

  • = Strophicus Waterhouse, 1879

  • Wakarumbia Bocak, 1999

  • Cladophorine clade

  • Cautiromimus Pic, 1926

  • Cladophorinus Kleine, 1926

  • Cladophorus Guérin-Méneville, 1830

  • = Odontocerus Guérin-Méneville, 1838

  • = Spacekia Strand, 1936

  • subgenus Cladophorus s. str.

  • subgenus Falsocautires Pic, 1926

  • Ditua Waterhouse, 1879

  • Marena Kazantsev, 2007

  • Pseudodontocerus Pic, 1921

  • = Carathrix Kleine, 1926

  • Trichaline clade

  • Diatrichalus Kleine, 1926

  • = Mimoxylobanus Pic, 1921

  • Eniclases Waterhouse, 1879

  • = Trichalolus Pic, 1923

  • Flabellotrichalus Pic, 1921

  • = Stereotrichalus Kleine, 1926

  • = Villosotrichalus Pic, 1921

  • subgen. Flabellotrichalus s. str.

  • subgen. Maibrius Bocek et Bocak, 2017

  • Lobatang Bocak, 1998

  • subgenus Lobatang s. str.

  • subgenus Spinotrichalus Kazantsev, 2010

  • Microtrichalus Pic, 1921

  • = Falsoenylus Pic, 1926

  • Schizotrichalus Kleine, 1926

  • Trichalus Waterhouse, 1877

  • = Xantheros Fairmaire, 1877

  • Porrostomine clade

  • Metriorrhynchus Gemminger et Harold, 1869

  • = Metriorhynchus Guérin-Méneville, 1838

  • = Dilolycus Kleine, 1926

  • = Flabelloporrostoma Pic, 1923

  • Porrostoma Laporte, 1838

  • Oriomum Bocak, 1999

  • Metriorrhynchoides Kleine, 1926

  • Stadenus Waterhouse, 1879

  • Metriorrhynchina incertae sedis

  • Malacolycus Kleine, 1943

  • Xylobanomimus Kleine, 1926

  • Xylobanomorphus Kleine, 1935

  • Kassemia Bocak, 1998

Appendix methods

Genomic and transcriptomic sequencing, data analysis, individual datasets

Libraries for thirty transcriptomes were prepared by Novogene Co., Ltd. (Beijing, China) and sequenced on the HiSeq X-ten platform (Illumina Inc, San Diego, USA). The removal of low-quality reads and TruSeq adaptor sequences were performed using fastp v.0.20.0 (Chen et al., 2018) with the following parameters: -q 5 u 50 l 50 n 15. All paired-end transcriptomic reads were assembled using SOAPdenovo-Trans-31mer (Xie et al., 2014).

Additionally, the total DNA (~33 Gb each) of Metanoeus sp. and an unidentified sample JB0085 was shotgun-sequenced on the same platform by Novogene Co., Ltd. (Beijing, China) for 150 bp paired-end reads. Read were filtered with fastp using same setting as above and quality was visualized with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc).

The draft genomes were assembled using SPAdes v.3.13.1 (Bankevich et al., 2012), with all parameters set to default values and k-mer sizes of 21, 33, 55, 77 and 99. Obtained contig sequences were used to train Augustus (Stanke and Waack, 2003) for species specific gene models with BUSCO v.3 (Waterhouse et al., 2018), -long option, Endopterygota set of conserved genes (n = 2442) and -sp tribolium2012 as the closest relative. Predicted species specific gene models were then used for ab initio gene predictions in Augustus and predicted protein coding sequences were used for subsequent analyses. Seven outgroup taxa, Porrostoma and Mangkutanus transcriptomes were reported in previous studies (Kusy et al., 2018, Kusy et al., 2019; McKenna et al., 2019). The filtering and assembling methods for these samples have been described by Kusy et al., 2019.

The ortholog set was collated by searching the OrthoDB v.9.1 database (Zdobnov et al., 2017) for single copy orthologs in six beetle genomes Onthophagus taurus (Poelchau et al., 2015), Tribolium castaneum (Shelton et al., 2015; Richards et al., 2008) Dendroctonus ponderosae (Keeling et al., 2013), Anoplophora glabripennis (McKenna et al., 2017), Leptinotarsa decemlineata (Poelchau et al., 2015) and Agrilus planipennis (Poelchau et al., 2015; Appendix 1—table 3). OrthoDB v.9.1 predicted 4225 single copy orthologs for beetle species and Coleoptera (Polyphaga) reference node. We used Orthograph v.0.6.3 (Petersen et al., 2017) with default settings to search in our assemblies for the presence of specified single copy orthologs. From the recovered 4193 orthologs, terminal stop codons were removed, and internal stop codons at the translational and nucleotide levels were masked using the Perl script summarize_orthograph_results.pl (Petersen et al., 2017). The amino acid sequences were aligned using MAFFT v.7.471 with the L-INS-i algorithm (Katoh and Standley, 2013). The alignments from each ortholog group were then checked for the presence of outliers using the script checker_complete.1.3.1.2.pl, according to previously published methods (Misof et al., 2014; Peters et al., 2017). Identified outlier sequences were removed from amino acid and nucleotide alignments. After this step, the sequences of reference taxa and all gap-only sites were pruned. Corresponding multiple sequence alignments of nucleotides were generated using Pal2Nal (Suyama et al., 2006). To identify random or ambiguous similarities within amino acid alignments, we used Aliscore v. 2.076 with the maximum number of pairwise comparisons –r 1027, a special scoring approach for gap-filled amino acid sites option -e, and other parameters set to default values. Any random or ambiguous similarities were masked using Alicut 2.3 (Kück et al., 2010). Alinuc.pl was then used to apply the Aliscore results to match amino acids to the nucleotide data (Peters et al., 2017). Additionally, in each gene alignment, the short randomly aligned fragments were replaced with gaps and sequences with ≥80% missing data (calculated as percentage of ‘-’ and ‘X’ in the amino acid alignments and ‘-’ and ‘N’ in the nucleotide alignments) were removed using Python scripts (Zhang, 2021). MARE v. 0.1.2-rc was used to calculate the information content of each gene partition in terms of amino acid coding (Misof et al., 2013). Partitions with zero information content (IC0) were removed at both amino acid and nucleotide level.

Finally, the remaining 4,109 alignments were retained for subsequent multispecies coalescent analyses, and different concatenated datasets were generated for both amino acid and nucleotide levels using FasConCat-G v.1.4 (Kück and Longo, 2014). With all 4109 genes, we generated datasets designated as A-4199-AA and B-4199- NT (name-number of partitions-level). To reduce the effect of saturation on phylogenetic reconstruction (Breinholt and Kawahara, 2013) we created the dataset C-4199-NT12 and E-4109-NT2, with third-codon or third and first-codon positions excluded. The degree of missing data and overall completeness scores (Ca) across all datasets were inspected using AliStat v.1.7 (Wong, 2021) and heatmaps of pairwise completeness scores for all analysed datasets were generated (see Appendix 1—table 3). Further Information content and saturation values (the overall degree of data coverage with respect to gene presence or absence) were calculated with MARE v.0.1.2-rc (MAtrix REduction) (Misof et al., 2013) for each dataset.

Compositional heterogeneity tests

To explore the effect of among species compositional heterogeneity and its possible bias to tree reconstruction, we inspected the data with BaCoCa v.1.105 (Kück and Struck, 2014) to identify the gene partitions that strongly deviate from compositional homogeneity using relative composition frequency variation value (RCFV). Following Fernández et al., 2016 and Vasilikopoulos et al., 2019, we considered compositional heterogeneity among species in a partition to be high when the overall RCFV value was ≥0.1. The compositionally heterogeneous partitions were excluded from the dataset A-4109-AA to generate more compositionally homogeneous dataset D- 3370-AA. To create more homogenous and decisive dataset, we reduced the dataset D-3370-AA to contain only partitions with all taxa and created dataset F-1490-AA. Additionally, we used MARE to increase information content of dataset D-3370-AA and default output supermatrix is designed as dataset J-2129-AA_MARE. We used tree matched-pairs tests of homogeneity the MPTS (matched- pairs test of symmetry), MPTMS (matched-pairs test of marginal symmetry), and MPTIS (matched- pairs test of internal symmetry) (Naser-Khdour et al., 2019) implemented in IQ-TREE2 to identify the partitions that strongly deviate from compositional homogeneity at nucleotide level (P-value cutoff <0.05) and partitions below the threshold were excluded. And following datasets generated: G-NT-1767_MaxSymTest, H-NT-1645_MaxSymTestmarginal and I-NT-3905_MaxSymTestInternal.

The software SymTest v.2.0.49 (https://github.com/ottmi/symtest) was used to calculate the overall deviation from stationarity, reversibility, and homogeneity (SRH) (Ababneh et al., 2006) between the amino acid and nucleotide sequences. Heatmaps were generated for all datasets to visualize the pairwise deviations from SRH conditions.

Phylogenetic maximum likelihood analyses

For all datasets phylogenetic reconstruction was performed using maximum likelihood (ML) criterion with IQ-TREE 2.1.2 (Minh et al., 2020). First, we analyzed all datasets using original gene partition boundary. The model selection for each gene was performed with ModelFinder (Chernomor et al., 2016; Kalyaanamoorthy et al., 2017) implemented in IQ-TREE2 (-MFP option). GTR model were considered for nucleotide supermatrices. For the amino acid supermatrices, the substitution models LG, DCMUT, JTT, JTTDCMUT, DAYHOFF, WAG and free rate models LG4X and LG4M were tested. All possible combinations of modeling rate heterogeneity among sites were allowed (options: -mrate E,I,G,I + G,R -gmedian -merit BIC). We used the edge-linked partitioned model for tree reconstructions (-spp option) allowing each gene to have its own rate. The optimized partition schemes and best-fitting models were inferred for dataset A-4109-AA using -m MFP+ MERGE and considering same substitution models as above. The fast-relaxed clustering algorithm was used to speed up computation during partition-scheme optimization (Lanfear et al., 2017). The top 10% of partitions schemes --rclusterf 10 and maximum 12 327 partitions pairs (three times the number of partitions) --rcluster-max 12 327 were considered. Dataset A-4109-AA was also analyzed without any prior partitioning and under best fitting JTT + F + R9 substitution model. Ultrafast bootstrap (Hoang et al., 2018) and SH-like approximate likelihood ratio test (SH-aLRT) were calculated in IQ-TREE2 (options -bb 5000 and -alrt 5000) to assess nodal supports for focal relationships (Appendix 1—table 3).

Analyses of the confounding and alternative signal with four-cluster likelihood mapping (FcLM) and sequence data permutations

Additionally, we studied the effect of potentially confounding signal, like non-random distribution of data coverage and violations of SRH conditions, on our phylogenetic reconstructions with the Four- cluster likelihood mapping (FcLM) approach (Strimmer and Haeseler, 1997) as described by Misof et al., 2014 and implemented in IQ-TREE2. Based on the results of our ML tree reconstructions we tested the hypotheses about alternative placement of the leptotrichaline and procautirine clades. Permutation of data were executed as described in Misof et al., 2014 Our approach only differs in the use of the LG substitution matrix (Le and Gascuel, 2008) for permuting the supermatrices instead of WAG substitution matrix. For the analysis of the permuted supermatrices we used the option -q for the partition model in IQ-TREE2 and we used the LG model (for amino-acid alignments) and GTR model (for the nucleotide alignments). With original data, we performed the FcLM analyses by applying previously estimated substitution models (edgelinked models, -spp) with IQ-TREE2.

For estimation of Cautires apterus and sexually dimorphic Metriorrhynchus sp. splits were used subsets of the samples which represent the most common lineages. Dating of forementioned diversification was analyzed in BEAST v.1.8.1 (Drummond et al., 2012) applying HKY + I + G substitution model, uncorrelated relaxed clock model and birth–death process tree prior and 0.0115 rate for cox1 gene as applied earlier by Bocak and Yagi, 2010. The stationary phase was checked in Tracer v.1.5 (http://beast.bio.ed.ac.uk/Tracer). The consensus tree was computed with TreeAnnotator v.1.8.2 (http://beast.bio.ed.ac.uk/treeannotator) after eliminating a part of trees as a burn-in after evaluating ESS in tracer v.1.5. The final trees were visualised in Figtree v.1.4.4.

Data deposition: The additional supporting data and all the analysed supermatrices are deposited in the Mendeley Data repository DOI:https://doi.org/10.17632/ntgg6k4fjx.1.

Appendix 1—figure 1
Maximum likelihood trees from IQ-TREE amino acid analysis of dataset A –4109-AA.

(A) With partitioning by gene and (B) without partitioning. The depicted branch support values represent SH-aLRT and ultrafast bootstrap.

Appendix 1—figure 2
Maximum likelihood trees from IQ-TREE amino acid analysis.

(A) Analysis of the dataset A-4109-AA with optimization of the partitioning scheme and (B) analysis of the dataset D-3370-AA_Bacoca with partitioning by gene. The depicted branch support values represent SH-aLRT and ultrafast bootstrap.

Appendix 1—figure 3
Maximum likelihood trees from IQ-TREE amino acid analysis.

(A) analysis of the dataset A –F-1490-AA_Bacoca_decisive and (B)analysis of the dataset J-2129-AA_MARE. Both datasets were partitioned by gene. The depicted branch support values represent SH-aLRT and ultrafast bootstrap.

Appendix 1—figure 4
Maximum likelihood trees from IQ-TREE nucleotide analysis.

(A) Analysis of the dataset B-4109-NT and (B)analysis of the dataset E-4109-NT2 using only second codon positions. Both datasets were partitioned by gene. The depicted branch support values represent SH-aLRT and ultrafast bootstrap.

Appendix 1—figure 5
Maximum likelihood trees from IQ-TREE nucleotide analysis.

(A) analysis of the datasetC-4109-NT12 using codon positions 1 + 2 and (B) analysis of the dataset G-NT-1767_MaxSymTest. Both datasets were partitioned by gene. The depicted branch support values represent SH-aLRT and ultrafast bootstrap.

Appendix 1—figure 6
Maximum likelihood trees from IQ-TREE nucleotide analysis.

(A) analysis of the dataset H-NT-1645_MaxSymTestmarginal and (B) analysis of the dataset I-NT-3905_MaxSymTestInternal. Both datasets were partitioned by gene. The depicted branch support values represent SH-aLRT and ultrafast bootstrap.

Appendix 1—figure 7
Topologies recovered by Astral analyses.

(A) ASTRAL species trees with branch lengths in coalescent units as resulted from the analyses of individual IQ-TREE maximum likelihood gene trees of nucleotide dataset B-4109-NT and (B) ASTRAL species trees with branch lengths in coalescent units as resulted from the analyses of individual IQ-TREE maximum likelihood gene trees of amino acid dataset A-4199-AA. Numbers on nodes show local posterior probabilities (pp1). Quartet support for the alternative topologies (q1, q2, and q3), total number of induced quartet trees in the gene trees that support the alternative topologies (f1, f2, f3) and local posterior probabilities (pp1, pp2, pp3) are available at Dryad repository.

Appendix 1—figure 8
Phylogenomic topologies.

(A) Phylogenetic relationships of Metriorrhynchini, resulted from the summary coalescent phylogenetic analysis with ASTRAL, when analyzing the full set of gene trees (4109 gene trees) inferred at the nucleotide level. (B) doitto at amino acids level. Pie charts on branches show ASTRAL quartet supports q1,q2,q3 (quartet-based frequencies of alternative quadripartition topologies around a given internode).Tree topologies correspond with Appendix 1—figure 7.

Appendix 1—figure 9
Alistat heatmaps.

(A–D) AliStat rectangular heatmaps showing pairwise alignment completeness scores for all species included in the analyzed amino acid supermatrices. The abbreviations of datasets correspond to Appendix 1—table 4. Values closer to one indicate higher completeness scores for the pairwise sequence comparisons.

Appendix 1—figure 10
AliStat heatmaps.

(A–F) AliStat rectangular heatmaps showing pairwise alignment completeness scores for all species included in the analyzed nucleotide supermatrices. The abbreviations of datasets correspond to Appendix 1—table 4. Values closer to one indicate higher completeness scores for the pairwise sequence comparisons.

Appendix 1—figure 11
SymTest analyses.

(A–D) Rectangular heatmap calculated with SymTest showing p-values for the pairwise Bowker’s tests in the analyzed amino acid supermatrices. Darker boxes indicate lower p-values and thus larger deviation from evolution under SRH conditions.

Appendix 1—figure 12
SymTest analyses.

(A–F) Rectangular heat map calculated with SymTest showing p-values for the pairwise Bowker’s tests in the analyzed nucleotide supermatrices. Darker boxes indicate lower p-values and thus larger deviation from the evolution under SRH conditions.

Appendix 1—figure 13
Results of FcLM analyses testing alternative phylogenetic hypotheses about placement of procautirine and leptotrichaline clades applied for various supermatrices.

The first column shows the results of FcLM when the original data were analyzed. The second column shows the results of FcLM after the phylogenetic signal had been eliminated from data. The third column show the results of FcLM after elimination of the phylogenetic signal and inhomogeneous amino-acid or nucleotide composition. The fourth column show the results of FcLM after the elimination of phylogenetic signal, inhomogeneous amino-acid or nucleotide composition and with randomized data coverage within all meta-partitions (see Supplementary methods for details).

Appendix 1—figure 14
Tribal and subtribal mOTUs delimitation using CD-hit-est.

Axis X represent the number of operational taxonomic units, whereas axis Y shows the delimitation threshold.

Appendix 1—figure 15
Distribution of selected Metriorrhynchini genera (originally assumed and revised).
Appendix 1—figure 16
Comparison of the Robinson-Fould (RF) distances among tree searches with constrained topology and tree searches with unconstrained topology.

Green values represent top 10% RF- distances, and red values shows bottom 10% RF- distances. The blue part of the graph represents constrained trees values, the grey part of the graph represents unconstrained trees values, the red part of the graph represents constrained/unconstrained trees values.

Appendix 1—table 1
Detailed information on regional sampling.
MetriorrhynchinaCautirinaMetanoeinaMetriorrhynchini
Region# of specimens# of specimens# of specimens# of specimens
Australian region44894489
Australian continent39643964
Australia475475
New South Wales6464
Northern Territory1818
Queensland382382
South Australia11
Western Australia1010
New Zealand11
New Guinea34613461
Solomon Islands1919
Aru Islands88
Wallacea52515540
Maluku-Buru1515
Halmahera169169
Sulawesi34115
Oriental region2781,3111131,692
Philippines48371499
Mindanao113923
Negros2
Palawan2934568
Sibuyan44
Luzon22
Continental Asia2301264991593
Malaya342708348
Java2145672
Bali213
Sumatra5620126283
Borneo8225322357
Cambodia71320
Indo-Burma2314510178
China incl. Taiwan510023128
India5959
Japan1774181
Afrotropical region233233
Africa175175
Madagascar5858
Total476715491136429
Appendix 1—table 2
List of material for phylogenomic analyses.
Ingroup
VoucherSpeciesCladeGeographic originData sourceData typeTissue type
R18010Metriorrhynchus s. l.MetriorrhynchinaPorrostomineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18013Metriorrhynchus sp.MetriorrhynchinaPorrostomineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R20001Metriorrhynchus philippinensisMetriorrhynchinaPorrostominePhilippinesThis studyILLUMINA RNA-seq PE-readsWhole animal
R18018Metriorrhynchus s. l.MetriorrhynchinaPorrostomineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18020Metriorrhynchus s. l.MetriorrhynchinaPorrostomineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
1_kitePorrostoma rhipidiumMetriorrhynchinaPorrostomineAustraliaMcKenna et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
R18001Cladophorus sp.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18009Cladophorus sp.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18007Pseudodontocerus sp.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18012Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18025Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18028Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18003Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18016Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18005Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18015Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18006Ditua s. l.MetriorrhynchinaCladophorineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18002Microtrichalus sp.MetriorrhynchinaTrichalineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18024Eniclases sp.MetriorrhynchinaTrichalineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18022Diatrichalus sp.MetriorrhynchinaTrichalineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18026Trichaline sp.MetriorrhynchinaTrichalineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
JB0085Trichaline sp.MetriorrhynchinaTrichalineNew GuineaThis studyILLUMINA WGS-seq PE-readsThoracic muscles
R18004Procautires sp.MetriorrhynchinaProcautirineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18017Procautires sp.MetriorrhynchinaProcautirineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18021Procautires sp.MetriorrhynchinaProcautirineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18023Procautires sp.MetriorrhynchinaProcautirineNew GuineaThis studyILLUMINA RNA-seq PE-readsWhole animal
R18037Broxylus sp.MetriorrhynchinaLeptotrichalineSulawesiThis studyILLUMINA RNA-seq PE-readsWhole animal
R18030Leptotrichalus sp.MetriorrhynchinaLeptotrichalineSulawesiThis studyILLUMINA RNA-seq PE-readsWhole animal
R18034Wakarumbia sp.MetriorrhynchinaLeptotrichalineSulawesiThis studyILLUMINA RNA-seq PE-readsWhole animal
R18036Sulabanus sp.MetriorrhynchinaLeptotrichalineSulawesiThis studyILLUMINA RNA-seq PE-readsWhole animal
R18041Sulabanus sp.MetriorrhynchinaLeptotrichalineSulawesiThis studyILLUMINA RNA-seq PE-readsWhole animal
R18040Mangkutanus sp.MetriorrhynchinaLeptotrichalineSulawesiKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
R18039Xylobanus sp.CautirinaSulawesiThis studyILLUMINA RNA-seq PE-readsWhole animal
AJ0013Cautires communisCautirinaMalaysiaKusy et al., 2019ILLUMINA WGS-seq PE-readsThoracic muscles
G19002Metanoeus sp.MetanoeinaMalaysiaThis studyILLUMINA WGS-seq PE-readsThoracic muscles
Outgroup
Dilophotes sp.MalaysiaKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
Dihammatus sp.MalaysiaKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
Lycoprogentes sp.MalaysiaKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
Libnetis sp.MalaysiaKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
Platerodrilus sp.MalaysiaKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
Lyropaeus optabilisMalaysiaKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
Antennolycus constrictusMalaysiaKusy et al., 2019ILLUMINA RNA-seq PE-readsWhole animal
Appendix 1—table 3
Overview of official gene sets of six reference species used for ortholog assessment.

Number of genes correspond with OrthoDB 9.1.

SpeciesAccession# of contigsSourceDownload dateReference
Onthophagus taurusPRJNA16747817,483i5K05.03.20171
Tribolium castaneumPRJNA1254016,631iBeetle05.03.20172, 3
Dendroctonus ponderosaePRJNA36027013,088ENS Metazoa05.03.20174
Anoplophora glabripennisPRJNA16747922,035i5K05.03.20175
Leptinotarsa decemlineataPRJNA17174924,671i5K05.03.20171
Agrilus planipennisPRJNA23092115,497i5K05.03.20171
Appendix 1—table 4
Detailed information and statistics of each generated dataset.
Dataset nameNumber of taxaNumber of partitionsNumber of alignment sitesCompleteness score (Ca) AliStatPercentage of pairwise P-values < 0.05 for the Bowker’s test(SV) MARE Matrix saturation(IC) MARE Information contentPartition scheme optimalization
Amino acids
A-4109-AA424,10918926910.77212798.61%0.8880.506Yes
D-3370-AA_Bacoca423,37016723620.78036397.91%0.8800.511
F-1490-AA_Bacoca_decisive421,490673,1020.92508294.08%10.594
J-2129-AA_MARE422,129959,7410.87633190.82%0.9640.648
Nucleoides
B-4109-NT424,10956780730.772133100%NANA
C-4109-NT12424,10937853820.77213399.77%NANA
E-4109-NT2424,1091892691'0.77213492.45%NANA
G-NT-1767_MaxSymTest421,7672413164'0.71391299.88%NANA
H-NT-1645_MaxSymTestmarginal421,6452233485'0.71220899.77%NANA
I-NT-3905_MaxSymTestInternal423,9055377449'0.771645100%NANA
Appendix 1—table 5
Characteristics of concatenated super-matrices (mitochondrial fragments) and used models of the DNA evolution (descriptions of columns abbreviation listed bellow tables).

Abbreviations. Seq – Number of sequences, Site – Number of bases, Infor – Number of parsimony-informative sites, Invar – Number of invariant sites, Model – Best-fit model according to BIC using ModelFinder.

SubsetSeqsSitesInforInvarModel
Complete dataset
nad53,20598780988GTR + F + I + G4
cox5,9351,068812134GTR + F + I + G4
16 s2,381875577201GTR + F + I + G4
98% OTUs delimitation dataset
nad51,252969776115GTR + F + I + G4
tRNA Leu2,245623417GTR + F + G4
cox12,395783536141GTR + F + I + G4
cox22,19221619411GTR + F + I + G4
16 s1,115834512237GTR + F + I + G4

Data availability

All datasets are deposited in the Mendeley Data repository https://doi.org/10.17632/ntgg6k4fjx.1.

The following data sets were generated
    1. Motyka M
    2. Kusy D
    3. Bocek M
    4. Bilkova R
    5. Bocak L
    (2021) Mendeley Data
    Data for: Phylogenomic and mitogenomic data can accelerate inventorying of tropical beetles during the current biodiversity crisis.
    https://doi.org/10.17632/ntgg6k4fjx.1

References

    1. Blaxter M
    2. Mann J
    3. Chapman T
    4. Thomas F
    5. Whitton C
    6. Floyd R
    7. Abebe E
    (2005) Defining operational taxonomic units using DNA barcode data
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 360:1935–1943.
    https://doi.org/10.1098/rstb.2005.1725
  1. Book
    1. Calder A
    (1998)
    Coleoptera: Elateroidea
    In: Wells A, editors. Zoological Catalogue of Australia. CSIRO Publishing. pp. 9–248.
    1. Godfray HCJ
    2. Knapp S
    (2004) Introduction. Taxonomy for the twenty-first century
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 359:559–569.
    https://doi.org/10.1098/rstb.2003.1457
    1. Guérin-Méneville FE
    (1838)
    Zoologie
    Crustacés, arachnides et insectes. In Voyage autour du monde sur la Corvette de Sa Majesté, La Coquille, pendant les annés 1822, 1823, 1824 et 1825, Zoologie, Vol 2, Paris, France, Duperrey, L.I.
    1. Kazantsev SV
    (2010)
    New taxa of Lycidae from Samoa, Fiji and Tonga (Coleoptera: Lycidae
    Russian Journal of Entomology 18:195–199.
  2. Book
    1. Kleine R
    (1933)
    Lycidae
    In: Hincks WD, editors. Coleopterorum Catalogus Auspiciis et Auxilio W. Junk Editus a Schenkling. Berlin: W. Junk. pp. 123–145.
  3. Conference
    1. Kunte K
    (2008) Proceedings of the Royal Society B – Biological Sciences
    Mimetic butterflies support Wallace’s model of sexual dimorphism. pp. 1617–1624.
    https://doi.org/10.1098/rspb.2008.0171
    1. Richards S
    2. Gibbs RA
    3. Weinstock GM
    4. Brown SJ
    5. Denell R
    6. Beeman RW
    7. Gibbs R
    8. Beeman RW
    9. Brown SJ
    10. Bucher G
    11. Friedrich M
    12. Grimmelikhuijzen CJP
    13. Klingler M
    14. Lorenzen M
    15. Richards S
    16. Roth S
    17. Schröder R
    18. Tautz D
    19. Zdobnov EM
    20. Muzny D
    21. Gibbs RA
    22. Weinstock GM
    23. Attaway T
    24. Bell S
    25. Buhay CJ
    26. Chandrabose MN
    27. Chavez D
    28. Clerk-Blankenburg KP
    29. Cree A
    30. Dao M
    31. Davis C
    32. Chacko J
    33. Dinh H
    34. Dugan-Rocha S
    35. Fowler G
    36. Garner TT
    37. Garnes J
    38. Gnirke A
    39. Hawes A
    40. Hernandez J
    41. Hines S
    42. Holder M
    43. Hume J
    44. Jhangiani SN
    45. Joshi V
    46. Khan ZM
    47. Jackson L
    48. Kovar C
    49. Kowis A
    50. Lee S
    51. Lewis LR
    52. Margolis J
    53. Morgan M
    54. Nazareth LV
    55. Nguyen N
    56. Okwuonu G
    57. Parker D
    58. Richards S
    59. Ruiz SJ
    60. Santibanez J
    61. Savard J
    62. Scherer SE
    63. Schneider B
    64. Sodergren E
    65. Tautz D
    66. Vattahil S
    67. Villasana D
    68. White CS
    69. Wright R
    70. Park Y
    71. Beeman RW
    72. Lord J
    73. Oppert B
    74. Lorenzen M
    75. Brown S
    76. Wang L
    77. Savard J
    78. Tautz D
    79. Richards S
    80. Weinstock G
    81. Gibbs RA
    82. Liu Y
    83. Worley K
    84. Weinstock G
    85. Elsik CG
    86. Reese JT
    87. Elhaik E
    88. Landan G
    89. Graur D
    90. Arensburger P
    91. Atkinson P
    92. Beeman RW
    93. Beidler J
    94. Brown SJ
    95. Demuth JP
    96. Drury DW
    97. Du YZ
    98. Fujiwara H
    99. Lorenzen M
    100. Maselli V
    101. Osanai M
    102. Park Y
    103. Robertson HM
    104. Tu Z
    105. Wang J
    106. Wang S
    107. Richards S
    108. Song H
    109. Zhang L
    110. Sodergren E
    111. Werner D
    112. Stanke M
    113. Morgenstern B
    114. Solovyev V
    115. Kosarev P
    116. Brown G
    117. Chen HC
    118. Ermolaeva O
    119. Hlavina W
    120. Kapustin Y
    121. Kiryutin B
    122. Kitts P
    123. Maglott D
    124. Pruitt K
    125. Sapojnikov V
    126. Souvorov A
    127. Mackey AJ
    128. Waterhouse RM
    129. Wyder S
    130. Zdobnov EM
    131. Zdobnov EM
    132. Wyder S
    133. Kriventseva EV
    134. Kadowaki T
    135. Bork P
    136. Aranda M
    137. Bao R
    138. Beermann A
    139. Berns N
    140. Bolognesi R
    141. Bonneton F
    142. Bopp D
    143. Brown SJ
    144. Bucher G
    145. Butts T
    146. Chaumot A
    147. Denell RE
    148. Ferrier DEK
    149. Friedrich M
    150. Gordon CM
    151. Jindra M
    152. Klingler M
    153. Lan Q
    154. Lattorff HMG
    155. Laudet V
    156. von Levetsow C
    157. Liu Z
    158. Lutz R
    159. Lynch JA
    160. da Fonseca RN
    161. Posnien N
    162. Reuter R
    163. Roth S
    164. Savard J
    165. Schinko JB
    166. Schmitt C
    167. Schoppmeier M
    168. Schröder R
    169. Shippy TD
    170. Simonnet F
    171. Marques-Souza H
    172. Tautz D
    173. Tomoyasu Y
    174. Trauner J
    175. Van der Zee M
    176. Vervoort M
    177. Wittkopp N
    178. Wimmer EA
    179. Yang X
    180. Jones AK
    181. Sattelle DB
    182. Ebert PR
    183. Nelson D
    184. Scott JG
    185. Beeman RW
    186. Muthukrishnan S
    187. Kramer KJ
    188. Arakane Y
    189. Beeman RW
    190. Zhu Q
    191. Hogenkamp D
    192. Dixit R
    193. Oppert B
    194. Jiang H
    195. Zou Z
    196. Marshall J
    197. Elpidina E
    198. Vinokurov K
    199. Oppert C
    200. Zou Z
    201. Evans J
    202. Lu Z
    203. Zhao P
    204. Sumathipala N
    205. Altincicek B
    206. Vilcinskas A
    207. Williams M
    208. Hultmark D
    209. Hetru C
    210. Jiang H
    211. Grimmelikhuijzen CJP
    212. Hauser F
    213. Cazzamali G
    214. Williamson M
    215. Park Y
    216. Li B
    217. Tanaka Y
    218. Predel R
    219. Neupert S
    220. Schachtner J
    221. Verleyen P
    222. Raible F
    223. Bork P
    224. Friedrich M
    225. Walden KKO
    226. Robertson HM
    227. Angeli S
    228. Forêt S
    229. Bucher G
    230. Schuetz S
    231. Maleszka R
    232. Wimmer EA
    233. Beeman RW
    234. Lorenzen M
    235. Tomoyasu Y
    236. Miller SC
    237. Grossmann D
    238. Bucher G
    (2008) The genome of the model beetle and pest Tribolium castaneum
    Nature 452:949–955.
    https://doi.org/10.1038/nature06784
    1. Sklenarova K
    2. Kubecek V
    3. Bocak L
    (2014)
    Subtribal classification of Metriorrhynchini (Insecta: Coleoptera: Lycidae): an integrative approach using molecular phylogeny and morphology of adults and larvae
    Arthropod Systematics & Phylogeny 72:37–54.
    1. Thomson SA
    2. Pyle RL
    3. Ahyong ST
    4. Alonso-Zarazaga M
    5. Ammirati J
    6. Araya JF
    7. Ascher JS
    8. Audisio TL
    9. Azevedo-Santos VM
    10. Bailly N
    11. Baker WJ
    12. Balke M
    13. Barclay MVL
    14. Barrett RL
    15. Benine RC
    16. Bickerstaff JRM
    17. Bouchard P
    18. Bour R
    19. Bourgoin T
    20. Boyko CB
    21. Breure ASH
    22. Brothers DJ
    23. Byng JW
    24. Campbell D
    25. Ceríaco LMP
    26. Cernák I
    27. Cerretti P
    28. Chang CH
    29. Cho S
    30. Copus JM
    31. Costello MJ
    32. Cseh A
    33. Csuzdi C
    34. Culham A
    35. D’Elía G
    36. d’Udekem d’Acoz C
    37. Daneliya ME
    38. Dekker R
    39. Dickinson EC
    40. Dickinson TA
    41. van Dijk PP
    42. Dijkstra KDB
    43. Dima B
    44. Dmitriev DA
    45. Duistermaat L
    46. Dumbacher JP
    47. Eiserhardt WL
    48. Ekrem T
    49. Evenhuis NL
    50. Faille A
    51. Fernández-Triana JL
    52. Fiesler E
    53. Fishbein M
    54. Fordham BG
    55. Freitas AVL
    56. Friol NR
    57. Fritz U
    58. Frøslev T
    59. Funk VA
    60. Gaimari SD
    61. Garbino GST
    62. Garraffoni ARS
    63. Geml J
    64. Gill AC
    65. Gray A
    66. Grazziotin FG
    67. Greenslade P
    68. Gutiérrez EE
    69. Harvey MS
    70. Hazevoet CJ
    71. He K
    72. He X
    73. Helfer S
    74. Helgen KM
    75. van Heteren AH
    76. Hita Garcia F
    77. Holstein N
    78. Horváth MK
    79. Hovenkamp PH
    80. Hwang WS
    81. Hyvönen J
    82. Islam MB
    83. Iverson JB
    84. Ivie MA
    85. Jaafar Z
    86. Jackson MD
    87. Jayat JP
    88. Johnson NF
    89. Kaiser H
    90. Klitgård BB
    91. Knapp DG
    92. Kojima J
    93. Kõljalg U
    94. Kontschán J
    95. Krell FT
    96. Krisai-Greilhuber I
    97. Kullander S
    98. Latella L
    99. Lattke JE
    100. Lencioni V
    101. Lewis GP
    102. Lhano MG
    103. Lujan NK
    104. Luksenburg JA
    105. Mariaux J
    106. Marinho-Filho J
    107. Marshall CJ
    108. Mate JF
    109. McDonough MM
    110. Michel E
    111. Miranda VFO
    112. Mitroiu MD
    113. Molinari J
    114. Monks S
    115. Moore AJ
    116. Moratelli R
    117. Murányi D
    118. Nakano T
    119. Nikolaeva S
    120. Noyes J
    121. Ohl M
    122. Oleas NH
    123. Orrell T
    124. Páll-Gergely B
    125. Pape T
    126. Papp V
    127. Parenti LR
    128. Patterson D
    129. Pavlinov I
    130. Pine RH
    131. Poczai P
    132. Prado J
    133. Prathapan D
    134. Rabeler RK
    135. Randall JE
    136. Rheindt FE
    137. Rhodin AGJ
    138. Rodríguez SM
    139. Rogers DC
    140. Roque O
    141. Rowe KC
    142. Ruedas LA
    143. Salazar-Bravo J
    144. Salvador RB
    145. Sangster G
    146. Sarmiento CE
    147. Schigel DS
    148. Schmidt S
    149. Schueler FW
    150. Segers H
    151. Snow N
    152. Souza-Dias PGB
    153. Stals R
    154. Stenroos S
    155. Stone RD
    156. Sturm CF
    157. Štys P
    158. Teta P
    159. Thomas DC
    160. Timm RM
    161. Tindall BJ
    162. Todd JA
    163. Triebel D
    164. Valdecasas AG
    165. Vizzini A
    166. Vorontsova MS
    167. de Vos JM
    168. Wagner P
    169. Watling L
    170. Weakley A
    171. Welter-Schultes F
    172. Whitmore D
    173. Wilding N
    174. Will K
    175. Williams J
    176. Wilson K
    177. Winston JE
    178. Wüster W
    179. Yanega D
    180. Yeates DK
    181. Zaher H
    182. Zhang G
    183. Zhang ZQ
    184. Zhou HZ
    (2018) Taxonomy based on science is necessary for global conservation
    PLOS Biology 16:e2005075.
    https://doi.org/10.1371/journal.pbio.2005075

Decision letter

  1. David Donoso
    Reviewing Editor; Escuela Politécnica Nacional, Ecuador
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Developmental Biology, Germany
  3. Ming Bai
    Reviewer; Institute of Zoology, Chinese Academy of Sciences, China

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Phylogenomic and mitogenomic data can accelerate inventorying of tropical beetles during the current biodiversity crisis" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Detlef Weigel as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Ming Bai (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

The reviewers agree that the authors did a nice sampling and produce an interesting inventory, but that the authors need to be cautious in their claims (as noted in their results or this "new" methodology). One major concern is with the validity of the premise that this paper helps to accelerate, or provides a framework that accelerates tropical insect inventories. For example, it will be very important if the authors develop the 4-step procedure (Lines 90-97) they use to justify the importance of the work. What is unique (or new) in their methodology, that can accelerate inventory assembly? Despite Figure 5, which establish differences between their methods and basic Sanger sequencing, the reviewers are not yet convinced that this paper really moves the field forward. Alternatively, the authors can temper the acceleration biodiscovery framework, and provide a more robust background on how Illumina data may be essential for obtaining a backbone where to put the Sanger data, and how this is not frequently used yet.

Reviewer #2 (Recommendations for the authors):

Overall, I like the manuscript and I think it can be published after some modifications. The most important point is the claims about mOTUs/putative species. As a scientist, we should be cautious when interpreting our data and especially when selecting different parameters (like the threshold). Authors should not provide just their most optimistic (or most striking) estimates and should include different estimates and consider all of them over the manuscript (from abstract to conclusions).

Also, It is surprising the amount of text used for the phylogenomic analyses, it is a very appropriate methodological part but barely mentioned in the results. Authors should improve the result/discussion of this part, describing and comparing what they produce with other studies (if any).

Finally, the use of "person-months of focused field research" should be considered to be removed, as it did not provide any results or comparison with other study groups (which won't be possible).

Reviewer #3 (Recommendations for the authors):

Conservation efforts must be evidence-based, so rapid and economically feasible methods should be used to quantify diversity and distribution patterns. The principal objective of this study is to demonstrate how biodiversity information for a hyperdiverse tropical group can be rapidly expanded via targeted field research and large-scale sequencing. The authors have attempted to overcome current impediments to the gathering of biodiversity data by using integrative phylogenomic and three mtDNA fragment analyses. As a model, they sequenced the Metriorrhynchini beetle fauna, sampled from ~700 localities in three continents. The species-rich dataset included ~6,500 terminals, >2,300 putative species, more than a half of them unknown to science. It is an amazing finding. Their information and phylogenetic hypotheses can be a resource for higher-level phylogenetics, population genetics, phylogeographic studies, and biodiversity estimation. At the same time, they want to show how limited the taxonomical knowledge is and how this lack is hindering biodiversity research and management. It is a nice and well-designed study with a very important meaning. I would be happy to recommend it to be accepted after the revision. However before it is formally accepted, there are several questions followed:

1. The definition of species is very complex. The molecular species, e.g. OTU, is still different from morphological species using traditional taxonomic methods. Of course, it is very common to use molecular species to define and infer biodiversity in many studies. For this study, ideally, the authors can express their comments on the relationships between molecular species and morphological species. For example, Sharkey et al. (2021 Zookeys) published 403 new species only with molecular information.

2. P8L199: The constituent clades should be point out based on which tree. Pruned mitogenomic tree with and without constraints or Constrained mitogenomics?

3. P8L205-208: How many mOTUs if using thresholds of 2%? 2345 or 2356? It is inconsistent in this paragraph.

4. P10L244: This table is not too clearly to explain shared regions. If illustrate the meanings of number before and after "/", it will be better.

5. P14L369: What's the references?

6. P16L448: This discovery should be shown in Results part.

7. P18L528: What is the voucher number of Metanoeus sp.?

8. P96 S18: The figure S18 is unclear. It should be replaced by clear one.

https://doi.org/10.7554/eLife.71895.sa1

Author response

Essential revisions:

The reviewers agree that the authors did a nice sampling and produce an interesting inventory, but that the authors need to be cautious in their claims (as noted in their results or this "new" methodology). One major concern is with the validity of the premise that this paper helps to accelerate, or provides a framework that accelerates tropical insect inventories. For example, it will be very important if the authors develop the 4-step procedure (Lines 90-97) they use to justify the importance of the work. What is unique (or new) in their methodology, that can accelerate inventory assembly? Despite Figure 5, which establish differences between their methods and basic Sanger sequencing, the reviewers are not yet convinced that this paper really moves the field forward. Alternatively, the authors can temper the acceleration biodiscovery framework, and provide a more robust background on how Illumina data may be essential for obtaining a backbone where to put the Sanger data, and how this is not frequently used yet.

Many thanks for your comments. We agree with your suggestions and tried to address them.

We have added further information on our approach. Principally, we combine phylogenomics (both transcriptomes and WGS data) that has been used for deep phylogenies like insect order relationships or the beetle classification with short DNA fragments that resolve only shallow relationships. We tried to limit the number of transcriptomes to avoid high costs. Further, we elaborated each dataset contributes to results. Separately, neither phylogenomics nor short fragment analyses provide necessary information. We are aware that even our approach is labour intensive, but we argue that reporting the presence of ~1,000 undescribed species by a single group for a tribe of beetles is in the clear contrast with the slow pace of traditional morphology based taxonomy (the journal Zootaxa published 30,000 articles with 60,000 new species in 20 years; Zhang, 2021). Additionally, as we work with preserved vouchers, our research can be used for further studies integrating morphology and molecular phylogenetics.

Below, we answer how specific comments by reviewers were addressed. We tried to follow the suggestions as closely as possible.

Reviewer #2 (Recommendations for the authors):

Overall, I like the manuscript and I think it can be published after some modifications. The most important point is the claims about mOTUs/putative species. As a scientist, we should be cautious when interpreting our data and especially when selecting different parameters (like the threshold). Authors should not provide just their most optimistic (or most striking) estimates and should include different estimates and consider all of them over the manuscript (from abstract to conclusions).

Thanks for the comment. We modified the text accordingly, but we also discussed the lower threshold. We pointed more clearly to the Figure 17 where are shown numbers of putative species for 20 thresholds (1-10%, increased by 0.5%). Our preference for lower threshold has been based on the earlier published studies on Metriorrhynchini that have shown conspicuous morphological differentiation between New Guinean putative species which shared similar mtDNA (Kalousova and Bocak, 2017; Bocek et al., 2019). We studied the congruence of mtDNA/morphology species delimitation with nextRAD data in the later study. We proposed that morphological differentiation was driven by selection in different mimetic communities and by ecological shift (exemplified by possible shift to nocturnal activity in species that lost bright colouration and have much larger eyes than their closest relative – mtDNA distance around 1%). Analogically, we showed similar low mtDNA differentiation in Scarelus and Cautires in Malaysia (Bray and Bocak 2016, Jiruskova et al. 2019) and in Synchonnus in Australia (Kusy et al. 2018). To sum up, our claims are downgraded, alternative estimates included, and reference to further evidence added.

Also, It is surprising the amount of text used for the phylogenomic analyses, it is a very appropriate methodological part but barely mentioned in the results. Authors should improve the result/discussion of this part, describing and comparing what they produce with other studies (if any).

Thanks for the comment and we agree with you that we were too brief. We tried to improve these sections. We limited our discussion on the position of individual taxa as we wrote the manuscript for wide audience without specific knowledge on our group. As we noted in the manuscript, morphology must be inevitably studied, but it is not included in the present study. We already work on the next step of biodiversity research: the generic classification of the group with integrative definitions of named taxa. We have already prepared several hundreds of illustrations and the analysis of the congruence of relationships based on morphological traits, phylogenomics, and mitochondrial markers. But the taxon-specific information is interesting for taxonomists, not for wide audience.

Finally, the use of "person-months of focused field research" should be considered to be removed, as it did not provide any results or comparison with other study groups (which won't be possible).

Thanks for the suggestion that we fully follow in the revised manuscript. We added explanation why standardized methods were not used and we kept only approximate information which regions were sparsely sampled.

Reviewer #3 (Recommendations for the authors):

Conservation efforts must be evidence-based, so rapid and economically feasible methods should be used to quantify diversity and distribution patterns. The principal objective of this study is to demonstrate how biodiversity information for a hyperdiverse tropical group can be rapidly expanded via targeted field research and large-scale sequencing. The authors have attempted to overcome current impediments to the gathering of biodiversity data by using integrative phylogenomic and three mtDNA fragment analyses. As a model, they sequenced the Metriorrhynchini beetle fauna, sampled from ~700 localities in three continents. The species-rich dataset included ~6,500 terminals, >2,300 putative species, more than a half of them unknown to science. It is an amazing finding. Their information and phylogenetic hypotheses can be a resource for higher-level phylogenetics, population genetics, phylogeographic studies, and biodiversity estimation. At the same time, they want to show how limited the taxonomical knowledge is and how this lack is hindering biodiversity research and management. It is a nice and well-designed study with a very important meaning. I would be happy to recommend it to be accepted after the revision. However before it is formally accepted, there are several questions followed:

1. The definition of species is very complex. The molecular species, e.g. OTU, is still different from morphological species using traditional taxonomic methods. Of course, it is very common to use molecular species to define and infer biodiversity in many studies. For this study, ideally, the authors can express their comments on the relationships between molecular species and morphological species. For example, Sharkey et al. (2021 Zookeys) published 403 new species only with molecular information.

Thanks for an interesting suggestion. We added several sentences that summarize the relationships between nextRAD, mtDNA and morphology based species in Metriorrhynchini as we have studied them in previous works (Bocak and Yagi 2010, Kalousova and Bocak 2017, Bocek et al. 2019, Jiruskova et al. 2019). While trying to avoid selfcitations in the earlier version, we underestimated the need to support our conclusions by earlier studies. Our earlier discussion on the delimitation of species in Metriorrhynchini (especially the nextRAD,mtDNA study in Frontiers of Zoology) have shown that the net-winged beetle are often well morphologically differentiated even when species and have recently split. The studies discussed also the factors contribution to observed patterns: (the mosaic structure of mimetic communities, altitudinal differences, timing of activity etc.). The performance of delimitation must be evaluated with morphology and nuclear genes and can be solved only in restricted taxonomic studies. We consider our species numbers as estimates and do not intend to describe species without morphological evaluation. The text was modified accordingly.

2. P8L199: The constituent clades should be point out based on which tree. Pruned mitogenomic tree with and without constraints or Constrained mitogenomics?

Thanks for pointing to the problem. We expanded the text. To sum up, we used constrained mtDNA topology which contains all genomic samples in fixed position and we listed in the Supplements also genera for which transcriptomes or whole genomes have not been sequenced. The genera for which no molecular data are available are assigned to subclades using the morphology. The senior author have studied all type species of Metriorrhynchini genera and about 70 percent of all types of Metriorrhynchini.

3. P8L205-208: How many mOTUs if using thresholds of 2%? 2345 or 2356? It is inconsistent in this paragraph.

I am very sorry for this error, all number are checked and are now consistent. The wrong number was overlooked after the latest (smaller, rigorously edited) dataset was used for delimitation.

4. P10L244: This table is not too clearly to explain shared regions. If illustrate the meanings of number before and after "/", it will be better.

Thanks for your comment, we added the explanation to the headings of columns.

5. P14L369: What's the references?

The references were added. Now we cite the morphology-based phylogenetic analysis and the latest catalogue – (Bocak 2002, Bocak et al. 2020). To keep the list of references focused on relevant and more general works, we only referred readers to the history of classification described in these studies. The latest study contains almost complete bibliography for Metriorrhynchini.

6. P16L448: This discovery should be shown in RESULTS part.

Thanks for the comment, we moved illustration forward and added statements about the findings to Results.

7. P18L528: What is the voucher number of Metanoeus sp.?

The voucher number is now added.

8. P96 S18: The figure S18 is unclear. It should be replaced by clear one.

The Figure S18 is replaced by the high resolution version.

https://doi.org/10.7554/eLife.71895.sa2

Article and author information

Author details

  1. Michal Motyka

    Laboratory of Biodiversity and Molecular Evolution, Czech Advanced Technology Research Institute, Centre of Region Hana, Olomouc, Czech Republic
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Dominik Kusy
    Competing interests
    No competing interests declared
  2. Dominik Kusy

    Laboratory of Biodiversity and Molecular Evolution, Czech Advanced Technology Research Institute, Centre of Region Hana, Olomouc, Czech Republic
    Contribution
    Conceptualization, Data curation, Formal analysis, Methodology, Resources, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Michal Motyka
    Competing interests
    No competing interests declared
  3. Matej Bocek

    Laboratory of Biodiversity and Molecular Evolution, Czech Advanced Technology Research Institute, Centre of Region Hana, Olomouc, Czech Republic
    Contribution
    Data curation, Investigation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3398-6078
  4. Renata Bilkova

    Laboratory of Biodiversity and Molecular Evolution, Czech Advanced Technology Research Institute, Centre of Region Hana, Olomouc, Czech Republic
    Contribution
    Data curation, Investigation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Ladislav Bocak

    Laboratory of Biodiversity and Molecular Evolution, Czech Advanced Technology Research Institute, Centre of Region Hana, Olomouc, Czech Republic
    Contribution
    Conceptualization, Data curation, Funding acquisition, Project administration, Resources, Supervision, Writing – original draft, Writing – review and editing
    For correspondence
    ladislav.bocak@upol.cz
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6382-8006

Funding

Grantová Agentura České Republiky (18-14942S)

  • Ladislav Bocak

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Several colleagues provided samples from their respective home countries or their own field research. We are obliged to G Monteith and A Slipinski for their support during our research in Australia. The field research was enabled by the permits from the Government of Papua New Guinea, the Queensland Ministry of Environment, Malaysian Ministry of Natural Resources, and UP Los Baños. The trips to Papua New Guinea were supported by the Binatang Research Centre, Nagada and we are obliged to H Maraia and J Kua for their field assistance. A substantial part of the field research before the start of the Czech Science Foundation project was funded by the senior author and the Palacky University is acknowledged for granting the leave. L Harmackova advised with some analyses in R. We cordially acknowledge I Frebort and P Banas for their encouragement and support of this project after the liquidation of the laboratory at the Palacky University. Funding: This research was funded by The Czech Science Foundation, grant number 18–14942S.

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Developmental Biology, Germany

Reviewing Editor

  1. David Donoso, Escuela Politécnica Nacional, Ecuador

Reviewer

  1. Ming Bai, Institute of Zoology, Chinese Academy of Sciences, China

Publication history

  1. Received: July 2, 2021
  2. Preprint posted: July 15, 2021 (view preprint)
  3. Accepted: December 18, 2021
  4. Accepted Manuscript published: December 20, 2021 (version 1)
  5. Version of Record published: January 28, 2022 (version 2)

Copyright

© 2021, Motyka et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 544
    Page views
  • 93
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Michal Motyka
  2. Dominik Kusy
  3. Matej Bocek
  4. Renata Bilkova
  5. Ladislav Bocak
(2021)
Phylogenomic and mitogenomic data can accelerate inventorying of tropical beetles during the current biodiversity crisis
eLife 10:e71895.
https://doi.org/10.7554/eLife.71895
  1. Further reading

Further reading

    1. Ecology
    Tom WN Walker et al.
    Research Article Updated

    Climate warming is releasing carbon from soils around the world, constituting a positive climate feedback. Warming is also causing species to expand their ranges into new ecosystems. Yet, in most ecosystems, whether range expanding species will amplify or buffer expected soil carbon loss is unknown. Here, we used two whole-community transplant experiments and a follow-up glasshouse experiment to determine whether the establishment of herbaceous lowland plants in alpine ecosystems influences soil carbon content under warming. We found that warming (transplantation to low elevation) led to a negligible decrease in alpine soil carbon content, but its effects became significant and 52% ± 31% (mean ± 95% confidence intervals) larger after lowland plants were introduced at low density into the ecosystem. We present evidence that decreases in soil carbon content likely occurred via lowland plants increasing rates of root exudation, soil microbial respiration, and CO2 release under warming. Our findings suggest that warming-induced range expansions of herbaceous plants have the potential to alter climate feedbacks from this system, and that plant range expansions among herbaceous communities may be an overlooked mediator of warming effects on carbon dynamics.

    1. Ecology
    2. Epidemiology and Global Health
    Feifei Zhang et al.
    Research Article

    Background: The variation in the pathogen type as well as the spatial heterogeneity of predictors make the generality of any associations with pathogen discovery debatable. Our previous work confirmed that the association of a group of predictors differed across different types of RNA viruses, yet there have been no previous comparisons of the specific predictors for RNA virus discovery in different regions. The aim of the current study was to close the gap by investigating whether predictors of discovery rates within three regions-the United States, China, and Africa-differ from one another and from those at the global level.

    Methods: Based on a comprehensive list of human-infective RNA viruses, we collated published data on first discovery of each species in each region. We used a Poisson boosted regression tree (BRT) model to examine the relationship between virus discovery and 33 predictors representing climate, socio-economics, land use, and biodiversity across each region separately. The discovery probability in three regions in 2010-2019 was mapped using the fitted models and historical predictors.

    Results: The numbers of human-infective virus species discovered in the United States, China, and Africa up to 2019 were 95, 80 and 107 respectively, with China lagging behind the other two regions. In each region, discoveries were clustered in hotspots. BRT modelling suggested that in all three regions RNA virus discovery was better predicted by land use and socio-economic variables than climatic variables and biodiversity, though the relative importance of these predictors varied by region. Map of virus discovery probability in 2010-2019 indicated several new hotspots outside historical high-risk areas. Most new virus species since 2010 in each region (6/6 in the United States, 19/19 in China, 12/19 in Africa) were discovered in high-risk areas as predicted by our model.

    Conclusions: The drivers of spatiotemporal variation in virus discovery rates vary in different regions of the world. Within regions virus discovery is driven mainly by land-use and socio-economic variables; climate and biodiversity variables are consistently less important predictors than at a global scale. Potential new discovery hotspots in 2010-2019 are identified. Results from the study could guide active surveillance for new human-infective viruses in local high-risk areas.

    Funding: FFZ is funded by the Darwin Trust of Edinburgh (https://darwintrust.bio.ed.ac.uk/). MEJW has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 874735 (VEO) (https://www.veo-europe.eu/).