1. Introduction

The recent widespread availability of high-throughput sequencing facilitates reliable comparison of individual heterozygosity that requires a large number of loci (Nei, 1978; Mitton & Pierce, 1980). Extraordinarily high individual heterozygosity often reflects a heterogeneous genetic composition (e.g., Green, 1984; Muniz et al., 2020; Zhang et al., 2021). It can serve as a simple genetic clue to possible hybridization and subsequent processes, especially to hybridogenesis that is also referred to as hemiclonal reproduction (Hotz et al., 1999). This mode has been found in interspecific hybrids and involves the elimination of one parental genome during gametogenesis and clonal transmission of the left one, with the former renewed by backcrossing (Dedukh et al., 2017; Lavanchy & Schwander, 2019). As such, a hybridogenetic system is featured by elevated heterozygosities of F1-hybrid types compared to heterozygosities of the other, pure-species individuals. Partly inspired by heterozygosity patterns, classical, non-social hybridogenesis has been identified in six taxa (Bogart et al., 2007; Lafond et al., 2019; Lavanchy & Schwander, 2019; Suzuki et al., 2020; Majtanova et al., 2021). It occurs in hybrids between the live-bearing fish Poeciliopsis monacha and several congeners in North America (Miller & Schultz, 1959; Schultz, 1969; Mateos & Vrijenhoek, 2005). The other three fish taxa include an Iberian minnow complex Squalius alburnoides (Carmona et al., 1997; Collares-Pereira & Coelho, 2010; Matos et al., 2019), five Hypseleotris gudgeon species in Australia (Bertozzi et al., 2000; Schmidt et al., 2011; Thacker et al., 2022), and three Hexagrammos greenling species in coasts of southern Hokkaido and adjacent areas (Kimura-Kawaguchi et al., 2014; Munehara et al., 2016). It has also been found in hybrids between the European marsh frog Pelophylax ridibundus and two congeners (Berger, 1967; Tunner & Heppich-Tunner, 1991; Dufresnes & Mazepa, 2020; Dufresnes et al., 2024) and between stick insects Bacillus rossius and B. grandii in Sicily (Bullini & Nascetti, 1990; Mantovani et al., 1996; Scavariello et al., 2017; Lavanchy et al., 2024). Although one may share the view that these divergent animals represent a small fraction of hybridogenetic systems (Lavanchy & Schwander, 2019), there have been no new taxa reported for ten years in the era of high-throughput sequencing.

Possibly related to this gap, further evidence than individual heterozygosities is needed to confirm hybridogenesis (Majtánová et al., 2021) or to infer and distinguish it from other systems where hybrids are F1 types. The latter include hybrid sterility and parthenogenesis, gynogenesis, kleptogenesis, and androgenesis of interspecific hybrid origin. In parthenogenesis, the embryo develops from an unfertilized egg. Gynogenesis also transmits the female genome only but requires sperm from a sympatric sexual species to trigger embryogenesis. In kleptogenesis, the sperm acquired/stolen from a co-occurring sexual species usually serve to trigger embryogenesis only. Occasionally, the sperm’s nuclear genome can be incorporated and contribute to the gene pool of an asexual population (Fujita et al., 2020). In androgenesis, the nuclear genome is contributed by the male, while eggs lacking a nucleus are produced by females, or the female genome is eliminated after fertilization (Schwander & Oldroyd, 2016). Genome-wide evidence of parental types and a hybrid type characterized by elevated heterozygosity, ~50/50 ancestry proportions, and a large number of fixed heterozygous loci supports but does not distinguish among these systems. Reflecting the genetic divergence between species, considerable fixed heterozygosity is expected for the F1-hybrid types of hybridogenesis and other modes lacking interspecific recombination, such as ameiotic parthenogenesis and ameiotic androgenesis (e.g., Vrijenhoek et al., 1978; Mantovani & Scali, 1993; Nascetti et al., 2003; Scali et al., 2003; Schwander & Oldroyd, 2016; Flynn et al., 2017; Unmack et al., 2019; Wachi et al., 2021).

On the other hand, several types of evidence may be useful to distinguish hybridogenesis from the others. First, if an F1-hybrid type is widespread, hybrid sterility and modes assuming many local hybrid origins can be rejected by a lack of coexistence of hybrids with both parental species. Second, in lineages exhibiting parthenogenesis, gynogenesis, or kleptogenesis, males are very rare if not absent (Avise, 2008; Tilquin & Kokko, 2016; Fujita et al., 2020), which is not necessarily the case for hybridogenetic lineages (e.g., Schmidt et al., 2011). Third, when a single or few local hybrid origins are assumed, a prevalent local, nuclear haplotype sharing between F1-hybrid and parental types is not expected for androgenesis, parthenogenesis, gynogenesis, and kleptogenesis. In these modes by definition a co-distributed parental species has no or only occasional contribution to the nuclear genome (Lampert & Schartl, 2010; Lehtonen et al., 2013; Morgado-Santos et al., 2017). Similarly, assuming a single or few local hybrid origins, a widespread high mitochondrial DNA similarity between co-distributed F1-hybrid and parental types is not expected for parthenogenesis, gynogenesis, and kleptogenesis (Schlupp, 2005; Bi & Bogart, 2010; Schmidt et al., 2011). A gynogenetic or kleptogenetic lineage is expected to exhibit matrilineal mitochondrial DNA variation/haplotypes different from those of the sperm donor populations (Bogart et al., 2009). In a hybridogenetic system, hybridogens share nuclear and possibly mitochondrial variation with a co-occurring parental species through backcrossing (Doležálková-Kaštánková et al., 2021). Last, the non-hybridogenetic modes, if not combined with for instance a meiotic hybridogenesis-like process that forms homologous haploid gametes (e.g., Arai, 2023; Lafond & Angers, 2024), are not expected to have pure-species offspring. Nevertheless it can be accomplished by hybridogens with eggs and sperm carrying conspecific genomes as found in Pelophylax frogs (Vorburger, 2001a, b; Dufresnes & Mazepa, 2020). For clarity, a summary of such different signatures of hybridogenesis and other systems is provided in Table 1.

Several types of different signatures of reproductive systems where hybrids are F1 types.

If an identified hybridogenetic system does have pure-species offspring of hybridogens, it may be suitable to address an unexplored possibility: disjunct range expansion of a narrowly endemic species mediated by hybridogenesis. An endemic may be restricted by habitat requirements and/or competition from a widespread relative (e.g., McGlone et al., 2001; Matesanz et al., 2009). By forming a hybridogenetic complex with the latter and taking advantage of the expected heterosis (Bulger & Schultz, 1979; Hotz et al., 1999), the former may establish a remote population from pure-species offspring of hybridogens given ecological opportunity. This can be investigated using a hybridogenetic complex ideally between an endemic with disjunct distribution and a widespread species. Such a pattern of distribution, however, has not been observed in the aforementioned six hybridogenetic systems (e.g., Mantovani et al., 1996; Collares-Pereira & Coelho, 2010; Kimura-Kawaguchi et al., 2014; Dufresnes & Mazepa, 2020; Thacker et al., 2022; IUCN, 2024).

In East Asian spiny frogs, Quasipaa robertingeri has a narrow known distribution nested within the range of its widespread close relative Q. boulengeri (Che et al., 2009; Fei, 2020; Yan et al., 2021; Frost, 2023). They inhabit subtropical mountain streams and their banks, with Q. robertingeri recorded only from a small area at the southern edge of the Sichuan Basin (Fig. 1) where sympatric populations of them have been found (Wang et al., 2021). Adults of the two species are considered to be morphologically similar. The tadpole of Q. robertingeri can be readily distinguished from those of Q. boulengeri and closely related Q. shini, Q. spinosa, Q. exilispinosa, and Q. verrucospinosa by the absence of dark transverse markings on dorsal tail (Wu & Zhao, 1995; Fei et al., 2009; Pyron, 2014; Yan et al., 2021). Similar karyotypes of 2n = 26 with five large and eight small chromosome pairs have been reported in Q. robertingeri and Q. boulengeri (Tan & Wu, 1987; Wu & Zhao, 1995; Qing et al., 2012; Xia et al., 2020). In addition, an interchromosomal reciprocal translocation has been inferred to spread among some Q. boulengeri populations along the western edge of the Sichuan Basin but not detected elsewhere (Wang, 2006; Qing et al., 2012).

Sampling sites.

Symbols indicate genotype compositions of samples. Site number colors correspond to clusters BB1 and BR1 (black), BB2 and BR2 (purple), BB3 and BR3 (blue), BB4 and BR4 (white), and RR (red) in Fig. 2.

In a preliminary assessment of the genetic diversity of Q. boulengeri with 32 samples, we have observed a bimodal heterozygosity pattern that cannot be explained by reciprocal translocation. Reciprocal translocation may suppress meiotic recombination of the chromosomes involved and eventually result in high levels of heterozygosity (McKim et al., 1988; Haigis & Dove, 2003). Single nucleotide polymorphisms (SNPs) of these samples were obtained from restriction site-associated DNA sequencing (RAD-seq) data. Some estimates of individual heterozygosity were roughly an order of magnitude higher than the others, and this pattern remained unchanged when analyzing subsets of loci presumably on different chromosomes individually. Moreover, for each of the few loci further examined, a high-heterozygosity sample appeared to have one Q. boulengeri haplotype and one outgroup, Q. robertingeri haplotype. These results can be explained by that the high-heterozygosity samples are F1 type hybrids between the two species.

In this study, we obtained RAD-seq data from samples across the distribution of both species and conducted heterozygosity, ancestry proportion, principal coordinate plot, individual clustering, population tree, haplotype network, and mitochondrial gene tree analyses. We first provided evidence supporting a system where hybrids were F1 types. By integrating our results and the literature, we rejected other atypical reproductive modes and accepted hybridogenesis that could explain the occurrence of pure individuals of Q. robertingeri far from its narrow, presumed original range.

2. Materials and methods

2.1. Sampling

A total of 107 individuals from 53 representative localities throughout the range of Quasipaa boulengeri and Q. robertingeri were included as ingroup members (Fig. 1). Among them, there were 58 sexed samples, 27 males and 31 females. The others were mostly tadpoles. Among the sexed samples, 31 individuals had been previously karyotyped, including some reciprocal translocation carriers (Qing et al., 2012; Xia et al., 2020). Based on the current phylogenetic hypothesis for Quasipaa (Yan et al., 2021), the species Q. exilispinosa, with four individuals from three sites, was used as outgroup when necessary. Altogether 111 samples were subjected to RAD-seq, including the aforementioned 32 samples for a preliminary analysis. Details of samples are provided in Table S1.

2.2. RAD-seq and data processing

Liver, muscle, or tail clips preserved in 95% ethanol were sent to Novogene (Beijing, China) for genomic DNA extraction, library construction, and sequencing. The library was prepared following the SLAF-seq approach (Sun et al., 2013) with a target insert size range of 215–240 bp. The restriction enzyme MseI was used in the restriction-ligation reaction, and a subset of the product was excluded by being cut by the restriction enzyme HaeIII. Enriched libraries were sequenced on an Illumina Novaseq 6000 platform in PE150 mode.

The quality of raw reads was checked using FastQC version 0.11.9 (Andrews, 2010). Exclusion of reads containing adapter sequences was conducted by Novogene. Further data cleaning was performed using the process_radtags program in Stacks version 2.60 (Catchen et al., 2013), removing reads with an average Phred33 score of < 10 within a 22-bp sliding window or an uncalled base.

We conducted a de novo assembly of loci using ipyrad version 0.9.87 (Eaton & Overcast, 2020), which applied reverse-complement clustering to our single-enzyme RAD-seq reads. To complete the assembly in a reasonable time, the clean reads were trimmed and filtered to avoid overloading with singlecopy reads according to the ipyrad documentation (https://ipyrad.readthedocs.io/). Single-copy reads can result from sequencing errors that are more frequent toward the 3′ end of reads. The last five bases of each read were trimmed using Trimmomatic version 0.39 (Bolger et al., 2014). Then, for each sample, a file containing only single-copy reads was obtained. This was achieved by sequentially using the dedupe.sh script in BBMap version 38.87 (Bushnell, 2014), the ‘common’ option in SeqKit version 2.3.0 (Shen et al., 2016), and the filterbyname.sh script in BBMap. According to this file, single-copy reads were removed by filterbyname.sh. In the ipyrad pipeline, 3′-end bases with Phred33 scores below 20 were trimmed and three mitochondrial genomes were used as reference for excluding potential mitochondrial reads. They were under GenBank accession numbers KC686711 (Q. boulengeri; Shan et al., 2014), MZ895125 (Q. robertingeri; direct submission by Wang B and Shu G), and MT561179 (Q. exilispinosa; Wu et al., 2020). A minimum depth of 6 for both statistical and majority-rule base calling, a maximum cluster depth of 200 within samples, and an up to 70% occurrence for a shared heterozygous site were allowed. In a preliminary analysis of 22 representative ingroup samples, results obtained with clustering threshold values from 0.74 to 0.95 were compared. This parameter was determined as 0.89 according to the average observed heterozygosity, Pearson coefficient of correlation between missingness and genetic distance, and cumulative variance of the first five principal components (Ilut et al., 2014; Mastretta-Yanes et al., 2015; McCartney-Melstad et al., 2019). The correlation coefficient and cumulative variance were calculated using the vcfMissingness.pl and vcfToPCAvarExplained.R scripts (McCartney-Melstad et al., 2019), respectively. The maximum fraction of heterozygous bases in a consensus sequence was set to 0.11, i.e., one minus the clustering threshold as in the preliminary analysis for possible F1-hybrid types.

The ipyrad SNP output was filtered using VCFtools version 0.1.17 (Danecek et al., 2011) by retaining genotypes with read depth ≥ 6 and SNPs present in ≥ 60% of ingroup members. To minimize linkage disequilibrium, the randSnps.pl script (https://www.biostars.org/p/313701/) was used to extract one random SNP per locus, resulting in the final main dataset. In addition, a dataset containing mainly pure Q. robertingeri samples was generated for population tree reconstruction. Two Q. boulengeri samples from one locality were included for rooting purposes. The ipyrad output was filtered with VCFtools by keeping genotypes with read depth ≥ 6 and SNPs present in all pure Q. robertingeri samples. Then, one SNP was randomly selected for each locus by randSnps.pl.

For fixed heterozygosity and haplotype network analyses, we also used Stacks to perform loci assembly of ingroup members. Stacks provides the observed heterozygosity for each SNP for each sample group and infers haplotype sequences using an algorithm relying on the co-observation of alleles in a read or read pair (Rochette et al., 2019). Currently, ipyrad provides unphased alleles as stated in its documentation. The denovo_map.pl pipeline of Stacks was applied to the clean reads, and a minimum of 3 reads was required to form a stack. Given the read length of 144 bp, the aforementioned ipyrad clustering threshold (0.89) suggested allowing 16 (i.e., 144 − 144 × 0.89) mismatches between stacks and between sample loci. These allowed mismatches were however both reduced to 12 for the reason of computational time based on a preliminary analysis. Loci present in ≤ 60% of samples within a sample group were retained for subsequent analyses.

2.3. Individual heterozygosity, ancestry proportion, and fixed heterozygosity analyses

For each individual, the proportion of heterozygous SNPs was obtained from the main dataset and subsets of it using the --het option in VCFtools. For each SNP in this dataset, the first locus sequence in the gphocs output file from ipyrad was extracted with a custom script gphocs1stSeq2fasta.pl (https://doi.org/10.57760/sciencedb.14822) (will be published on ScienceDB upon final acceptance of the manuscript; data private access link: https://www.scidb.cn/en/s/URnIba) and TBtools version 1.120 (Chen et al., 2020). Using BLAST+ version 2.13.0 (Camacho et al., 2009) with an expectation value threshold of 1e-10, these sequences were compared against the chromosome-level genomes of Q. spinosa (Hu, 2021; Hu et al., 2022) and Nanorana parkeri (Genome Warehouse accession GWHBCKU00000000; Fu et al., 2022). Nanorana is closely related to Quasipaa (Pyron, 2014). Then loci with unique BLAST hits on homologous scaffolds of the two genomes were listed to obtain the proportion of heterozygous SNPs corresponding to individual and all 13 major scaffolds.

Ancestry proportions were inferred using ADMIXTURE version 1.3.0 (Alexander et al., 2009) based on the main dataset, which had been converted into BED input files by PLINK version 1.9 (Chang et al., 2015). For ingroup members, the number of ancestral populations (K) was set to 2. This value could be expected to result in only pure-species types and F1-hybrid types with ancestry proportions close to 50/50 when hybridogenesis occurred (e.g., Kimura-Kawaguchi et al., 2014; Dubey et al., 2019; Lavanchy et al., 2024). To identify possible intraspecific hybrids, K values ranging from 1 to 10 were applied to a subset of samples in separate runs. Then the results from K values with minimum cross-validation errors were compared.

To examine the source of extraordinarily high individual heterozygosity, a simplified 12-individual dataset was extracted from the main dataset. It contained no missing data and three sample types according to ancestry proportions. In each type four samples were selected by having the most loci. Custom scripts were written to find SNPs being heterozygous for one type but showing alternatively fixed alleles, or the same allele, in the other two types.

For fixed heterozygosity analysis, given the ancestry proportion estimates, two Stacks sample groups with the same number of individuals were defined: F1-hybrid types and a randomly selected subset of pure Q. boulengeri types. If a retained Stacks locus was found to contain a SNP with an observed heterozygosity of 1 for a sample group, it was selected. Subsequently, using BLAST+ and an expectation value threshold of 1e-10, its consensus sequence was compared against the aforementioned ipyrad sequences of loci with unique hits on homologous scaffolds.

2.4. Nuclear haplotype networks

Haplotype network construction was conducted using Stacks loci that matched an ipyrad locus with unique-hits on homologous scaffolds and exhibited fixed heterozygosity in F1-hybrid types. For each of the 13 major scaffolds, two loci also present in ≥ 60% of pure-species types were randomly selected for analysis. Sequences with any undetermined bases were excluded. Statistical parsimony network analysis was performed using TCS version 1.21 (Clement et al., 2000) at the 95% confidence level.

2.5. Mitochondrial DNA data and tree

Mitochondrial DNA sequences from three resources were combined to form a single dataset. First, mitochondrial sequences were extracted from the RAD-seq reads cleaned by process_radtags for ingroup members (Barth et al., 2020; Laczkó et al., 2022). MITObim version 1.9.1 (Hahn et al., 2013) was used to assemble mitochondrial sequences, with the aforementioned mitochondrial genomes of Q. boulengeri and Q. robertingeri as references. Reads were mapped to assembled sequences with BWA version 0.7.17 (Li & Durbin, 2009) to obtain the read depth of each site using SAMtools version 1.9 (Li et al., 2009). Sites with read depth < 6 were masked using BEDTools version 2.25.0 (Quinlan & Hall, 2010). Second, four mitochondrial DNA fragments, atp6-cox3, nad4, nad5, and cytb, were PCR amplified and Sanger sequenced (ABI 3730) with primers in Table S2. For a subset of 59 samples, in most cases, all four fragments were sequenced. Lastly, published sequences of the cytb and cox1 genes were retrieved from GenBank for 24 and 35 samples, respectively (Qing et al., 2012; Xia et al., 2016; Cao et al., 2018; Yuan et al., 2018) (Table S1). We also used the published mitochondrial genome of Q. verrucospinosa (KF199147; Zhang et al., 2018) and the aforementioned mitochondrial genome of Q. exilispinosa as outgroups according to current mitochondrial genome tree reconstructions (Zhang et al., 2018). The mitochondrial genome of Q. boulengeri has been found to exhibit tandem duplication and random loss of trnW, trnA, trnN, light-strand origin of replication, trnC, and trnY (Xia et al., 2016). It may also be the case for Q. robertingeri (Zhang et al., 2018). This “WANCY” region was not included. Alignment was performed using ClustalX version 2.1 (Larkin et al., 2007) and checked by eye. Regions with alignment gaps in the ingroup were excluded.

A maximum likelihood tree was inferred from the combined dataset using RAxML version 8.2.12 (Stamatakis, 2014) with 1000 inferences and 2000 non-parametric bootstrap replicates. The rapid hillclimbing algorithm and automatic determination of the initial rearrangement setting were used. The bestfitting combination of data partitions and substitution models was determined using PartitionFinder version 2.1.1 (Lanfear et al., 2017) under the corrected Akaike information criterion. To mitigate potential overparameterization in an analysis of close relatives, only seven data blocks were defined for PartitionFinder evaluation. They were tRNA genes, 12S rRNA gene, 16S rRNA gene, the first, second, and third codon positions of protein-coding genes, and the D-loop region. The GTR+I+G substitution model was not evaluated due to the partial overlapping of the invariable sites parameter I and the among-site rate variation parameter G.

2.6. Principal coordinate plot, neighbor-joining tree, and population tree

To further assess genetic structure, using the main dataset and PLINK, an identity-by-state distance matrix between individuals was calculated for principal coordinates and neighbor-joining cluster analyses. Principal coordinates analysis was performed using GenAlEx version 6.502 (Peakall & Smouse, 2012) for ingroup members. A neighbor-joining tree was constructed with 1000 bootstrap replicates using the Neighbor program in PHYLIP version 3.695 (Felsenstein, 2013). The script VCFbootstrap.pl (Zheng et al., 2020) was used to generate the 1000 bootstrap datasets.

To further examine relationships among pure Q. robertingeri samples and especially between those within and outside of the known species range, a population-tree approach was adopted. A maximum likelihood inference of population splitting and mixing was conducted using SNP allele frequencies and TreeMix version 1.13 (Pickrell & Pritchard, 2012). The allele frequencies for each sampling locality were calculated using PLINK. In separate TreeMix analyses, models allowing zero to six migration events were explored, and the results were compared.

3. Results

3.1. Individual heterozygosity, ancestry proportion, and fixed heterozygosity

The ipyrad genome-wide main dataset contained 45,523 SNPs, 8740−42,842 (mean = 30,600; 67.2%) for ingroup members and 2679−13,244 (8488) for outgroup individuals. Some individuals had much higher heterozygosities than the others, and the two forms occurred together and separately in the samples. The low- and high-heterozygosity samples were inferred as pure-species types and F1-hybrid types with approximately 50/50 ancestry proportions, respectively (Fig. 2). From 21 sites mainly around the Sichuan Basin, 40 F1 types showed similar proportions of heterozygous loci with a mean of 0.167 and SD 0.004. For the other 71 samples including outgroups, a mean value of 0.029 and SD 0.015 were estimated, with a range from 0.009 found in site 30 to 0.081 for sample 28B (Figs. 1 and 2). Sexed F1 types came from 13 sites, 18 females from 9 sites and 10 males from 7 sites. The interchromosomal reciprocal translocation carriers were F1 types, and F1-type tadpoles had distinct, weak, or no transverse dark markings on dorsal tail (Table S1). For the 55 pure Q. boulengeri types, the smallest cross-validation errors were obtained with ADMIXTURE K priors 2 to 4. Sample 28B was the only one that appeared to be an intraspecific hybrid in all three cases (Fig. S1). The samples showing the lowest heterozygosities, 30B and 30D, were inferred to be pure Q. robertingeri types, the other ten of which were found in the known species range (Fig. 1).

Principal coordinate plot (A), heterozygosity pattern (B), ADMIXTURE ancestry proportions (C), and neighbor-joining tree (D) based on the 45,523-SNP main dataset.

Italic numbers on the neighborjoining tree indicate sampling sites in Fig. 1. Numbers beside nodes are bootstrap support values ≥70, and circles on nodes indicate support values of 100. The sample 28B, a putative hybrid of different Quasipaa boulengeri lineages, is marked by an asterisk. In the neighbor-joining tree, the two individuals showing Q. robertingeri genotypes found far from the known species range are highlighted in purple. BB: Q. boulengeri; RR: Q. robertingeri; BR: putative Q. boulengeri × Q. robertingeri hybridogen; Qe: four Q. exilispinosa individuals.

In the main dataset, 20,912 SNPs were extracted from loci with unique BLAST hits on homologous major scaffolds of the Q. spinosa and N. parkeri genomes. These comprised 2379-3467 loci for the five larger scaffolds and 549-1219 loci for the eight smaller ones. Based on these SNPs, the above distribution pattern of heterozygosity among samples remained unchanged for all and individual scaffolds (Table S1). Besides, two datasets containing at least 70% or 80% ingroup members at each locus were generated. Compared with the main dataset, they showed similar heterozygosity, ancestry proportions, principal coordinate plot, and neighbor-joining clustering results. No further analyses were applied to the two datasets. For clarity, a summary of most datasets used in this study is shown in Table 2.

Dataset summaries.

Unique hit: loci with unique BLAST hits on homologous chromosome-level scaffolds of two genomes of the related Quasipaa spinosa and Nanorana parkeri; BB: Q. boulengeri; RR: Q. robertingeri; BR: putative Q. boulengeri × Q. robertingeri hybridogen.

The 100% complete, 12-individual subset of the main dataset contained 25,046 SNPs. A total of 2567 sites were found to be heterozygous in the F1 type and alternatively fixed in the two pure-species types, constituting the majority of the high heterozygosity estimates. The same pattern was observed when only SNPs from loci with unique BLAST hits were considered (Table 3). No sites were found to be heterozygous in the F1 type while showing the same allele in both pure-species types.

Heterozygous SNP numbers of twelve individuals in subsets of the 45,523-SNP main dataset.

The 25,046-site subset contained no missing data and 12,183 sites each corresponding to a locus with unique BLAST hits on homologous scaffolds of two published genomes of Quasipaa spinosa and Nanorana parkeri. ID: individual ID; BB: Q. boulengeri; RR: Q. robertingeri; BR: putative BB x RR hybridogen. SNPs being heterozygous for BR while showing one allele in BB and the other in RR were considered (H) or excluded (H′).

A considerable amount of fixed heterozygosity of the F1 type was detected. A total of 1260 Stacks loci exhibiting fixed heterozygosity among the 40 F1 types (≥ 60% completeness) were found to have a BLAST hit to the aforementioned 20,912 unique-hit ipyrad loci. These comprised 156−210 and 24−66 loci corresponding to the larger and smaller scaffolds, respectively. No such fixed heterozygous loci were detected among the 40 randomly selected pure Q. boulengeri types.

3.2. Nuclear haplotype networks and sharing

For each of the 26 representative Stacks loci for fixed heterozygosity, a single parsimony network was constructed (Figs. 3 and S2). Typically more haplotypes were found for the pure Q. boulengeri type (mean = 6.7) than for the pure Q. robertingeri type (1.6). Haplotypes were rarely shared between the two types, involving only three loci and three haplotypes. On the other hand, an F1 type almost always shared one haplotype with the pure Q. boulengeri type and the other with the pure Q. robertingeri type. In the few exceptional cases between F1 and Q. robertingeri types, a single substitution instead of haplotype sharing was observed. This allowed a ready identification of putative Q. robertingeri haplotypes.

Statistical parsimony network for haplotypes of a single exemplary locus, CLocus_761519 of 144 bp, from individuals of Quasipaa boulengeri, Q. robertingeri, and putative Q. boulengeri × Q. robertingeri hybridogen genotypes.

This locus was heterozygous in putative hybridogens and had unique BLAST hits on the first scaffolds of published Q. spinosa and Nanorana parkeri genomes. Lines represent single substitutions. Filled circles are haplotypes not encountered. The other circles’ sizes are proportional to haplotype frequency. Putative hybridogens’ sample IDs are shown. Samples of Q. robertingeri exhibited haplotypes marked by asterisks.

Compared with the pure Q. robertingeri type, the F1 type usually showed the same or fewer putative Q. robertingeri haplotypes for each locus. For all 26 loci, the former ( = 9) and latter (26) respectively presented 42 and 34 such haplotypes and shared 30 of them. Besides, the haplotypes of the two pure Q. robertingeri types in site 30 were also found both in pure Q. robertingeri types from the known range and in F1 types.

Haplotype sharing between F1 and pure-species types in the same locality was common across the 26 loci. For the 21 F1 types co-occurring with pure Q. boulengeri ones at nine sites, it was observed in 94.4% of the 357 cases with available data. For the two F1 types sampled with pure Q. robertingeri ones at site 30, it was found in 42 of 43 cases. Of the 26 loci, 21 had at least five haplotypes found in pure Q. boulengeri ones. Besides, six F1-type-sample only sites had a nearby pure Q. boulengeri sample, i.e., sites 1, 2, 4, 33, 34, and 41 (Fig. 1). When these sites were also considered, local haplotype sharing between F1 and pure Q. boulengeri types was observed in 92.8% of the 347 cases across the 21 loci.

3.3. Mitochondrial DNA data and tree

Mitochondrial sequence fragments were assembled from RAD-seq reads for all ingroup members, from 2472 to 6983 bp per sample with an average of 5099 bp. A total of 179 newly obtained Sanger sequences were deposited under GenBank accessions PP065924–36 and PP083013–178 (Table S1). The combined mitochondrial dataset contained 10,568 sites and was 52.7% complete. For ingroup members, 1514 sites were variable and 1154 sites were potentially parsimony-informative. This dataset is available on ScienceDB (https://doi.org/10.57760/sciencedb.14822) (will be published upon final acceptance of the manuscript; data private access link: https://www.scidb.cn/en/s/URnIba). For the RAxML analysis, a six-partition scheme with a GTR+G model for each partition was selected: tRNA, rRNA, D-loop, and each of the three codon positions.

F1 types exhibited mitochondrial haplotypes clustering together with those of co-occurring and nearby pure Q. boulengeri types on the generally well-supported RAxML tree (Fig. 4) and often shared haplotypes with the latter. In the Q. robertingeri clade Qr, haplotypes from the eastern and western known distribution range were grouped separately. In the other clade Qb, haplotypes clustered into four subclades.

Maximum likelihood population tree of Quasipaa robertingeri with three migration edges (arrows) inferred using TreeMix (A) and RAxML mitochondrial gene tree (B).

The TreeMix tree was based on the 8,535-SNP nuclear dataset of 100% completeness for ingroups. The number below an arrow is migration weight that represents the fraction of ancestry derived from the migration edge. s.e.: average standard error of the sample covariance matrix of allele frequencies. The RAxML tree was based on a 10,568-site dataset of 52.7% completeness and six partitions. Italic numbers beside nodes are bootstrap support values ≥70, and circles on nodes indicate support values of 100 (closed) and 98–99 (open). The characters BB, RR, or BR following the sample ID indicate Q. boulengeri, Q. robertingeri, or putative hybridogen nuclear genotypes, respectively. Lineage names within the Qb clade correspond to cluster names in Fig. 2. Outgroups are not shown for the RAxML tree.

Mitochondrial haplotypes from Q. boulengeri were detected in three pure Q. robertingeri types. Sample 21A had a haplotype closely related to that of a nearby F1 type, the others, at site 30, shared haplotypes with co-occurring F1 or nearby pure Q. boulengeri types. No mitochondrial haplotypes from Q. robertingeri were found in pure Q. boulengeri types.

3.4. Principal coordinate plot, neighbor-joining tree, and population tree

The principal coordinate plot showed three discrete clusters corresponding to the F1 and two pure-species types, with the former located approximately intermediate between the latter two (Fig. 2). Among them, 71.1% of the observed genetic variation was explained by the first two principal coordinates. Much more variation was observed within the F1 or pure Q. boulengeri types than in the pure Q. robertingeri type. Subclusters could be identified in the former two.

The neighbor-joining tree was generally well supported. Two pure-species clusters with bootstrap support of 100 were observed. The two pure Q. robertingeri types from site 30 were grouped with those from the eastern known species range (sites 23–25 in Fig. 1) with strong support. For the F1 type, subclusters BR1, BR2, BR3, and BR4 were identified, corresponding to the four subclusters of the pure Q. boulengeri type, BB1, BB2, BB3, and BB4, respectively (Fig. 2D). When applying these subclusters to the principal coordinate plot, parallel structures were readily detected. Along the second principal coordinate, a BB2+BB3 cluster was located between BB1 and BB4, and BR2+BR3 was located between BR1 and BR4. Besides, BB1–4 and BR1–4 corresponded to the four subclades of the mitochondrial clade Qb (Fig. 4). The full neighbor-joining tree is available as Fig. S3.

The dataset for TreeMix analysis of pure Q. robertingeri types contained 8535 SNPs. Three migration edges between populations from the known species range were found to increase the likelihood significantly (all P < 0.01). With these edges added or not, the same maximum likelihood tree topology was obtained in which site 30 was again nested within the eastern clade (Fig. 4).

4. Discussion

4.1. Hybridogenesis and insular occurrence

In the Quasipaa boulengeri-Q. robertingeri complex, hybridogenesis can be inferred and distinguished from other systems where hybrids are F1 types. Our dataset consisted of two low-heterozygosity parental types and one hybrid type characterized by much higher genome-wide heterozygosity (Fig. 2B), ~50/50 ancestry proportions (Fig. 2C), and a large number of fixed heterozygous loci (6%) throughout the genome. Previous karyotyping of these frogs has identified solely 2n = 26 diploid individuals (e.g., Tan & Wu, 1987; Xia et al., 2020), which when genotyped appear as either F1 or parental types. With F1 type hybrids widely detected (Fig. 1), the known natural range of Q. robertingeri is narrow (e.g., Qing et al., 2012; Yan et al., 2013; Fei, 2020; Wang et al., 2021). Hybrid sterility, or any mode requiring many local hybrid origins, can be ruled out by such a lack of coexistence of hybrids with both parental species (Table 1). In the present study, 10 of the 27 sexed F1 type adults were males from 7 sites (Table S1). This is not expected for parthenogenesis, gynogenesis, or kleptogenesis. In lineages exhibiting such a mode, males are absent or very rare (Avise, 2008; Tilquin & Kokko, 2016; Fujita et al., 2020). For 17 sites across the F1 type range, local nuclear haplotype sharing between F1 and pure Q. boulengeri types occurred at a rate of 92.8% in 21 representative loci. This is unexpected for androgenesis, parthenogenesis, gynogenesis, and kleptogenesis when assuming a single or few local hybrid origins. In these modes a codistributed parental species has only occasional (kleptogenesis) or no contribution to the nuclear genome (Lampert & Schartl, 2010; Lehtonen et al., 2013; Morgado-Santos et al., 2017). Furthermore, the widespread high mitochondrial DNA similarity between co-distributed F1 and Q. boulengeri types (Fig. 4B) is not expected for parthenogenesis, gynogenesis, and kleptogenesis of a single or few local hybrid origins (Bogart et al., 2009; Bi & Bogart, 2010; Schmidt et al., 2011). Last, the two pure Q. robertingeri types found with F1 types at site 30 can hardly be explained by even a combination of non-hybridogenetic processes. With divergent non-conspecific mitochondrial genomes (Plotner et al., 2008; Liu et al., 2010), these two samples were nested deeply among those from the narrow known range approximately 500 kilometers away based on genome-wide nuclear data (Figs. 2D and 4). It is unlikely that, through androgenesis, they have been produced by parthenogenetic hybrid females and local relict Q. robertingeri males (Tinti & Scali, 1996). They may also in theory have arisen from androgenesis with homologous haploid sperm formed by hybrids through a meiotic hybridogenesis-like process (Arai, 2023; Lafond & Angers, 2024). However, their low but apparent and similar chromosome-level heterozygosities indicate other infrequent events, polyspermy involving different males (Talevi, 1989; Schwander & Oldroyd, 2016; Hater et al., 2020; Iwao & Ueno, 2022). In contrast, all the above findings are consistent with a hybridogenetic system in which hybridogens can share nuclear and mitochondrial genetic variation with local parental species and can produce heterozygous pure-species offspring (Vorburger, 2001a, b; Dufresnes & Mazepa, 2020; Doležálková-Kaštánková et al., 2021).

Along with the clustering of pure Q. robertingeri types, two lines of evidence also support an “out of the known range” origin for the widespread Q. robertingeri nuclear genomes. First, the mitochondrial lineage corresponding to this species is endemic to the narrow known range, and it shows a significant east-west division that suggests long-term persistence (Figs. 1 and 4B) (Yan et al., 2013; Wang et al., 2021). Second, for nuclear loci, the variation detected within the known range is greater than and overlaps substantially with that found outside it. For the 26 representative loci, on average 7 pure-species and 28 F1/pure-species types were respectively sampled within and outside the known range. However, 23.5% more putative Q. robertingeri haplotypes were detected in the former, with 88.2% such haplotypes observed in the latter shared by the former. Based on the 45,523-SNP data, parallel geographic structures were inferred for pure Q. boulengeri and F1 types (Figs. 1 and 2A). This also agrees with the “out of the known range” scenario. Because nuclear variation in Q. boulengeri was much higher than that of Q. robertingeri from the known range, among hybridogens the former could contribute dominantly in characterizing geographic structure.

In addition, it is unlikely that accidental transport of tadpoles (along the Yangtze River) or human-mediated translocation from the known range has caused the insular occurrence of Q. robertingeri nuclear genomes at site 30 (Fig. 1). In both cases, the F1 types of this site are expected to have a local Q. boulengeri haploid nuclear genome and hence belong to subcluster BR1. However, they were nested in subcluster BR4 and were most closely related to the F1 type from an intermediate site (28) between the known range and their locality (Fig. S3). Moreover, three of the four samples from site 30 also had mitochondrial haplotypes most closely related to the one found in site 28 (Fig. 4B).

To our knowledge, this study provides the first evidence consistent with a hypothesis that hybridogenesis mediates the occurrence of individuals far from the continuous distribution of conspecifics. Such an occurrence obviously does not guarantee the successful establishment of a Q. robertingeri population, but the possibility deserves further exploration in addition to cytogenetic examination of hybridogenesis. The reason is that if this kind of reconstituted population does exist, then the finding will introduce a new mode of range expansion and extend our understanding of the interaction between narrow-ranged and widespread relatives.

4.2. Individual heterozygosity and features of the spiny frog system

Atypical reproductive systems of hybrid origin are unusual and useful research resources (Laskowski et al., 2019). For instance, hybridogenesis suggests an approach to preserve a species in non-optimal conditions by taking advantage of a relative and of hybrid vigor (Dubey & Dufresnes, 2017). This study constitutes another example of heterozygosity patterns inspiring a test of atypical reproduction, with the discovery of a putative hybridogenetic complex unrelated to any known one for the first time since 2014 (Kimura-Kawaguchi et al., 2014; Lavanchy & Schwander, 2019; Dufresnes et al., 2024). From a dataset of genome-wide loci, individual heterozygosity estimates are among the first statistics that one can obtain. If some estimates are much higher than the others which is the case here, the pattern can serve as a clue to the presence of unusual reproductive modes (e.g., Jaron et al., 2021; Lavanchy et al., 2024). Such datasets are common in the era of high-throughput sequencing. However, a simple Google Scholar search has returned only 1850 or 812 articles dated between 2014 and June 3, 2025 with the phrase “individual heterozygosity/heterozygosities” or “proportion/proportions of heterozygous loci/sites”, respectively. It implies that this variable has not been reported in most cases, making the accumulating genome-wide data a resource for searching for individual heterozygosity clues of atypical reproduction of interspecific hybrid origin.

Compared with documented hybridogenetic systems, Q. boulengeri and Q. robertingeri occur in a relatively poorly studied region from a natural history perspective, East Asian mainland. The former have been reported from North America, Europe, Australia, and the Japanese islands (Suzuki et al., 2020; Majtánová et al., 2021). Research efforts are not equally distributed across taxa. From under-studied taxa, atypical reproduction may even be found as a by-product of efforts to understand the study system.

As revealed by conventional staining, the Q. boulengeri x Q. robertingeri system shows structurally abnormal, non-Y/W chromosomes which are regionally widespread. The fixation or partial fixation of such chromosomal mutations is generally rare probably due to deleterious effects and meiotic pairing abnormalities (Wallace et al., 1999; Bidau & Marti, 2010; Mezzasalma et al., 2017, 2021). In hybridogenesis, clonal genomes, which do not face meiotic pairing problems and recombination, are prone to accumulation of deleterious mutations and can exhibit diagnostic chromosomal rearrangements (Suzuki et al., 2017, 2020; Lavanchy & Schwander, 2019; Dufresnes & Mazepa, 2020). In the spiny frog system, female and male carriers of an interchromosomal reciprocal translocation (karyotype IRT) have been found in some populations at the western edge of the Sichuan Basin (Fig. 1). Furthermore, there is a tight linkage between the two involved chromosomes, with other co-occurring abnormal karyotypes rarely observed (Qing et al., 2012). In this region, our samples with normal karyotype can be either pure Q. boulengeri (n = 10) or putative hybridogens (n = 4). As predicted by a hypothesis that the linkage is maintained by clonal transmission of the Q. robertingeri haploid genome, the 6 samples of karyotype IRT all belong to the latter (chi-square P < 0.01). The rest few samples of rare abnormal karyotypes are also putative hybridogens showing chromosome-level heterozygosity estimates of similarly large values, implying a complicated scenario with for instance possible subsequent rearrangements. When non-sex-specific abnormal chromosomes are encountered, hybridogenesis may serve as a hypothesis to be tested.

Quasipaa boulengeri and Q. robertingeri show significant mitochondrial divergence comparable to those between parental species of known hybridogenetic systems. These parental species are expected to be similar enough to produce fertile offspring. They are also expected to attain a certain divergence before hybridization leads to a transition to hybridogenesis rather than to genetic admixture (Dufresnes & Mazepa, 2020; Barley et al., 2022). Currently, genome-wide nuclear data for a comparison across known hybridogenetic systems are not available. However, a mitochondrial cytb partial sequence of 948 bp is accessible for most of these parental species in GenBank. By arbitrarily using one such sequence to represent a Pelophylax frog species (JN627421, 22, MF667576; Hofman et al., 2012; Dufresnes et al., 2017), one can obtain uncorrected p-distances of 0.141 and 0.133. They are for the two parental species pairs P. ridibundus–P lessonae and P. ridibundus–P perezi (Dufresnes & Mazepa, 2020), respectively. The corresponding value is 0.113 between Q. boulengeri (KC686711) and Q. robertingeri (MZ895125). For hybridogenetic fishes, such p-distances can be estimated as 0.100 and 0.082 in Hexagrammos (AB290742, 48, 69; Kimura et al., 2007), 0.129–0.133 in Poeciliopsis (AF412139, 51, MK860197, 98, KX229692; Mateos et al., 2002, 2019; Jeon et al., 2016), and 0.098–0.103 in Hypseleotris (DQ468188, 236, 250, 296; Thacker et al., 2007). In Squalius, a mitochondrial lineage corresponding to one of the two parental species has not been found (Matos et al., 2019). In the other, Bacillus stick insect hybridogenetic system, a similar level of mitochondrial divergence can also be observed. An uncorrected p-distance of 0.074 is estimated for a 675-bp fragment of the cox2 gene (AF148299, 315; Mantovani et al., 2001), which evolves at a rate close to that of the cytb gene (e.g., Hofman et al., 2012). These estimates are higher than many shallow interspecific distances (e.g., Young et al., 2013; Tarvin et al., 2017), and their similarity agrees with a previous comparison (Stöck et al., 2021). This suggests an investigation of the range of nuclear divergence for various parental species pairs.

Data availability

Custom scripts used and the data underlying this article including RAD-seq reads, SNP datasets, BLAST query sequences, and sequence datasets of nuclear and mitochondrial loci are available on ScienceDB (https://doi.org/10.57760/sciencedb.14822) (will be published upon final acceptance of the manuscript; data private access link: https://www.scidb.cn/en/s/URnIba). Mitochondrial sequences obtained by the Sanger method were deposited in GenBank under accessions PP065924–36 and PP083013–178.

Supplementary Files

ADMIXTURE cross-validation plot and ancestry proportions for 55 Quasipaa boulengeri individuals based on the 45,523-SNP dataset.

*: Sample 28B.

Statistical parsimony networks for haplotypes of 26 loci.

Corresponding scaffolds of published genomes of Quasipaa spinosa (Qs) and Nanorana parkeri (Np) are shown. The 0a and 1a in the sequence name indicate allele 0 and allele 1, respectively. The last character in a sequence name, b, r, or H, indicates Q. boulengeri, Q. robertingeri, or putative hybridogenetic individuals, respectively. Between them, the sample ID is given. Lines represent single substitutions, and numbers beside lines are site numbers. Circles are haplotypes not encountered.

Neighbor-joining tree based on the SNP identity-by-state distance matrix for individuals.

Numbers at nodes are bootstrap support values ≥70, and circles indicate support values of 100. BB: Qu-asipaa boulengeri; RR: Q. robertingeri; BR: putative hybridogenetic hybrids between Q. boulengeri and Q. robertingeri.

Acknowledgements

We thank Pizhu Zhang and Siqi Yuan for their help with this work and thank Jinzhong Fu for discussions. We also thank three anonymous reviewers for their helpful comments on a previous version of the manuscript. We thank editors for their comments helping to improve the manuscript. This work was conducted with the permission of the Ethics Committee of the Chengdu Institute of Biology (CIBDWLL2021009) and was supported by the National Natural Sciences Foundation of China (NSFC-32170465 to Y.Z., NSFC-32170419 to X.Z., and NSFC-32270448 to Y.X.).

Additional files

Table S1. Details and proportion of heterozygous loci (Ho) of samples. The main dataset contained a subset of 20,912 SNPs each corresponding to a locus with unique BLAST hits on homologous scaffolds of a Quasipaa spinosa genome and a Nanorana parkeri genome. BR: putative Q. boulengeri × Q. robertingeri hybridogen; IRT: karyotype with an interchromosomal reciprocal translocation.

Table S2. Primers used in PCR and sequencing of fragments of Quasipaa boulengeri mitochondrial genome.