Benchmarking and optimization of methods for the detection of identity-by-descent in high-recombining Plasmodium falciparum genomes

eLife Assessment

This important study presents an evaluation of several tools used for detecting Identity-By-Descent (IBD) segments in highly recombining genomes, using simulated data to replicate the high recombination and low marker density of Plasmodium falciparum, the parasite responsible for malaria. The evidence presented by the authors is convincing, demonstrating that users should be cautious when calling IBD where SNP density is low and recombination rate is high. This study will be of interest to scientists working in the fields of genome evolution and infectious diseases.

https://doi.org/10.7554/eLife.101924.3.sa0

Significance of the findings:

Important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence:

Convincing: Appropriate and validated methodology in line with current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Genomic surveillance is crucial for identifying at-risk populations for targeted malaria control and elimination. Identity-by-descent (IBD) is increasingly being used in Plasmodium population genomics to estimate genetic relatedness, effective population size (N_e), population structure, and signals of positive selection. Despite its potential, a thorough evaluation of IBD segment detection tools for species with high recombination rates, such as Plasmodium falciparum, remains absent. Here, we perform comprehensive benchmarking of IBD callers – probabilistic (hmmIBD, isoRelate), identity-by-state-based (hap-IBD, phased IBD) and others (Refined IBD) – using population genetic simulations tailored for high recombination, and IBD quality metrics at both the IBD segment level and the IBD-based downstream inference level. Our results demonstrate that low marker density per genetic unit, related to high recombination relative to mutation, significantly compromises the accuracy of detected IBD segments. In genomes with high recombination rates resembling P. falciparum, most IBD callers exhibit high false negative rates for shorter IBD segments, which can be partially mitigated through optimization of IBD caller parameters, especially those related to marker density. Notably, IBD detected with optimized parameters allows for more accurate capture of selection signals and population structure; IBD-based N_e inference is very sensitive to IBD detection errors, with IBD called from hmmIBD uniquely providing less biased estimates of N_e in this context. Validation with empirical data from the MalariaGEN Pf7 database, representing different transmission settings, corroborates these findings. We conclude that context-specific evaluation and parameter optimization are essential for accurate IBD detection in high-recombining species and recommend hmmIBD for Plasmodium species, especially for quality-sensitive analyses, such as estimation of N_e. 
Our optimization and high-level benchmarking methods not only improve IBD segment detection in high-recombining genomes but also enhance overall genomic analysis, paving the way for more accurate genomic surveillance and targeted intervention strategies for malaria.

Introduction

Malaria is a mosquito-borne disease caused by Plasmodium parasites. It poses a significant public health challenge globally, with an estimated 263 million clinical cases and 597,000 deaths occurring in 2023 (World Health Organization, 2024). Despite intensive efforts toward malaria control and elimination, malaria reduction has slowed or plateaued in recent years, due to multiple factors including antimalarial drug resistance and lack of highly efficacious vaccines and detailed, timely surveillance. Advances in sequencing technologies and the scale of resequencing studies now allow for parasite genomic surveillance, which can provide insights into the efficacy of malaria interventions and guide the design of targeted elimination strategies in different transmission settings (Neafsey et al., 2021; Wesolowski et al., 2018; Shetty et al., 2019).

Identity-by-descent (IBD) is an essential tool in population genomics that has been used to estimate genetic relatedness (Taylor et al., 2019b; Gerlovina et al., 2022; Henden et al., 2018; Schaffner et al., 2023), positive selection (Henden et al., 2018; Guo et al., 2024; Browning and Browning, 2020), effective population size (N_e) (Gutenkunst et al., 2009; Morgan et al., 2020), fine-scale population structure (Guo et al., 2024; Shetty et al., 2019; Nait Saada et al., 2020), and migration patterns (Shetty et al., 2019; Al-Asadi et al., 2019). IBD-based analyses of parasite genomic data have applications in diverse epidemiological settings, enhancing malaria surveillance and control by providing crucial information to researchers and policymakers (Wesolowski et al., 2018; Camponovo et al., 2023; Guo et al., 2025). In high transmission settings, a rapid decrease in genetic diversity and effective population size may indicate a successful malaria intervention (Morgan et al., 2020). In intermediate transmission settings, IBD-based analysis of parasite population structure and migration provides valuable insights into sources and sinks of transmission for planning targeted elimination strategies (Shetty et al., 2019; Guo et al., 2024; Henden et al., 2018). In low transmission settings, IBD-based estimates of pedigree relationship analysis between infections can help differentiate local transmission from importation and causes of recurrent infection (Taylor et al., 2019b; Wong et al., 2025). Additionally, IBD-based detection of positive selection can assist in identifying and monitoring the emergence and spread of antimalarial drug resistance (Henden et al., 2018; Amambua-Ngwa et al., 2019; Guo et al., 2024).

However, the reliability of IBD-based analysis is highly dependent on the accuracy of the detected IBD segments. Insufficient density of genetic markers, on a local or genome-wide scale, probably contributes to high error rates in the identification of IBD segments (Browning and Browning, 2020; Freyman et al., 2021; Zhou et al., 2020) and reduced accuracy of IBD-based estimates of population demography (Browning and Browning, 2015; Taylor et al., 2019a). Many IBD detection methods have been designed for human genomes, where the demographic history and evolutionary parameters, including the recombination rate, differ considerably from Plasmodium falciparum (Pf). The effective population size of humans has increased rapidly in recent history (Gutenkunst et al., 2009), while that of Pf is decreasing, particularly in regions such as Southeast Asia and South America (Joy et al., 2003; World Health Organization, 2024), due to enhanced malaria elimination efforts. More importantly, Pf genomes recombine about 70 times more frequently per unit of physical distance (Gardner et al., 2002; Su et al., 1999; Jiang et al., 2011; Amambua-Ngwa et al., 2019; Miles et al., 2016) than the human genome (Kong et al., 2002), while sharing a similar mutation rate (Bopp et al., 2013; Camponovo et al., 2023; Churcher et al., 2014; Hamilton et al., 2017; Huber et al., 2016; McDew-White et al., 2019; Neafsey et al., 2021) as human genomes (Campbell et al., 2012) on the order of 10^–8 per base pair per generation. The decreasing population size (Guo et al., 2024) and the high recombination rate in Pf result in a reduced number of common variants, such as single-nucleotide polymorphisms (SNPs), per unit of genetic distance. Large human whole-genome sequencing data sets typically provide millions of common biallelic SNP variants (Taliun et al., 2021), while Pf data sets only have tens of thousands (Abdel Hamid et al., 2023). 
Given that the human genome is about twice as large as Pf in genetic units, the per-centimorgan (cM) SNP density in Pf can be two orders of magnitude lower than in humans, which may not provide sufficient information for detecting IBD segments. Thus, it is critical to understand whether IBD detection methods can still generate accurate IBD segments under low SNP density conditions, considering the specific evolutionary parameters of the Pf genome.
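The density argument above can be checked with a back-of-envelope calculation. The sketch below is illustrative only: the SNP counts and the Pf genetic map length are assumed round numbers chosen to match the orders of magnitude stated in the text ("millions", "tens of thousands", "about twice as large"), not values reported in this study.

```python
def snps_per_cm(n_common_snps: float, map_length_cm: float) -> float:
    """Return the average number of common SNPs per centimorgan."""
    return n_common_snps / map_length_cm

# Assumed, illustrative inputs (not taken from the study).
HUMAN_SNPS = 5_000_000         # "millions" of common biallelic SNPs
PF_SNPS = 30_000               # "tens of thousands" of SNPs
PF_MAP_CM = 1_500              # assumed Pf genetic map length in cM
HUMAN_MAP_CM = 2 * PF_MAP_CM   # "about twice as large" in genetic units

human_density = snps_per_cm(HUMAN_SNPS, HUMAN_MAP_CM)
pf_density = snps_per_cm(PF_SNPS, PF_MAP_CM)
fold_difference = human_density / pf_density

print(f"human: {human_density:.0f} SNPs/cM, Pf: {pf_density:.0f} SNPs/cM, "
      f"fold difference: {fold_difference:.0f}")
```

With these assumed inputs the fold difference lands near two orders of magnitude; different but equally plausible inputs would shift the exact ratio without changing the conclusion.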

Evaluating the quality of the detected IBD segments requires benchmarking with the known ground truth through simulation studies (Zhou et al., 2020; Freyman et al., 2021; Tang et al., 2022; Shemirani et al., 2021). The performance of IBD detection tools developed for use in the human context is typically measured using simulated genomes reflecting demographic and evolutionary parameters of human genomes (Browning and Browning, 2011; Browning and Browning, 2011; Zhou et al., 2020; Tang et al., 2022; Freyman et al., 2021; Shemirani et al., 2021), which likely do not apply directly to Pf. For tools explicitly designed for malaria parasites, such as isoRelate and hmmIBD, the evaluation of the quality of IBD was based on parent-offspring (Schaffner et al., 2018) or pedigree-based simulations (up to 25 generations; Henden et al., 2018) that focused primarily on close relatives, which more likely mirrors low malaria transmission settings than high transmission settings. Furthermore, benchmarking methods and definitions of IBD accuracy are inconsistent across studies (Zhou et al., 2020; Freyman et al., 2021; Schaffner et al., 2018; Henden et al., 2018), making the results of the quality evaluation of IBD difficult to compare. Considering the limitations of existing evaluations of IBD detection methods for Pf genomes, a unified benchmarking framework specifically designed for high recombining Pf genomes from low- and high-transmission settings is needed. Such a framework will assist researchers in comparing and prioritizing different IBD detection methods for intended downstream analysis.

In the present study, we developed a unified IBD benchmarking framework that reflects the demographic and evolutionary parameters of Pf (Figure 1 and Supplementary file 1—Data S1). We evaluated how different recombination rates and marker densities affect the quality of detected IBD segments, performed IBD caller-specific parameter optimization, and benchmarked different IBD detection methods with their optimized parameters at both the IBD segment and downstream inference levels. Furthermore, we validated our findings from simulation analysis with empirical data sets constructed from subsets of samples from the publicly available whole-genome sequencing database MalariaGEN Pf7. Our findings indicate that a high recombination rate (given the same mutation rate) is associated with a low SNP density (per genetic unit), which substantially affects the accuracy of the detected IBD segments. To obtain optimal results, we generally recommend using hmmIBD when phased genotype data from haploid genomes are available. If human-oriented IBD callers are used, we recommend optimizing their parameters before applying them to Pf genomes.

Figure 1

Overview of methods used in benchmarking IBD detection methods.

Benchmarking and optimization of IBD callers for Pf include simulation analyses (top, green shading) and empirical data-based validation analyses (bottom, gray shading). (1) For the simulation study, the genealogical trees and phased genotype data are generated via the combination of forward (SLiM Haller et al., 2019; Haller and Messer, 2019) and coalescent (msprime Baumdicker et al., 2022) simulations (indicated by the superscript a in the diagram). True IBD is obtained from a simulated genealogy tree via tskibd (Guo et al., 2024) (b) and inferred IBD from the phased genotype using different IBD callers, including hap-IBD (Zhou et al., 2020), hmmIBD (Schaffner et al., 2018), isoRelate (Henden et al., 2018), Refined IBD (Browning and Browning, 2013), and phased IBD (Freyman et al., 2021). IBD benchmarking is performed at two levels. The first is at the IBD segment level. The metrics include false positive and false negative rates (c), population-level total IBD per length bin (Browning and Browning, 2015) (d), and total IBD per isolate pair. The second is at the level of IBD-based downstream analyses, including the effective population size (N_e) by IBDNe (Browning and Browning, 2015) (e), community membership through the InfoMap algorithm (Rosvall et al., 2009; Csardi and Nepusz, 2006) (f), and selection signals by statistics X_iHS (Guo et al., 2024) (g). As the default parameters of different IBD callers, particularly those developed for human data, may not be ideal for Pf genomes, we performed grid searches for key parameters for each IBD caller so that the comparison was based on the best performance of each caller (see Supplementary file 1—Data S1 for detailed information on simulation and IBD calling parameters and used values). (2) For validation in empirical data where true IBD is not available, we obtained IBD-based estimates using IBD from different callers and assessed which version of IBD can generate the expected patterns. 
The empirical data sets are subsampled from the MalariaGEN Pf7 database (Abdel Hamid et al., 2023) (h).

Results

Low SNP density due to high recombination rate affects the accuracy of IBD calls

Accurate estimation of IBD segments often requires dense genetic markers to capture ancestral relationships left by recent recombination events (Thompson, 2013; Browning and Browning, 2012; Zhou et al., 2020; Tang et al., 2022; Kelleher et al., 2019; Speidel et al., 2019). However, Pf has very low marker densities per genetic unit, which may significantly affect the accuracy of inferred IBD segments.

To assess how recombination rates affect marker density per genetic unit and the detection of IBD segments, we simulated genomes with varying recombination rates but a fixed mutation rate. Under a single-population demographic model (see Methods), we found that the density of common biallelic SNPs (minor allele frequency ≥ 0.01) per cM, in selectively neutral scenarios, is inversely correlated with recombination rates ranging from 3 × 10^–9 to 10^–6 per base pair per generation (Figure 2a). For instance, the SNP density of Pf-like genomes is 25 SNPs per cM, which is approximately 1/67 of that of the human-like genomes (1660 SNPs per cM). We further assessed how low marker densities associated with high recombination rates affect the accuracy of detected IBD segments by calculating two metrics (via ishare/ibdutils, see Code availability): the false negative rate (FN), which represents the proportion of a true IBD segment (obtained via tskibd; Guo et al., 2024) not covered by inferred IBD segments of the same genome pair, and the false positive rate (FP), which indicates the fraction of an inferred segment not covered by true segments of the same genome pair (see Methods for detailed definitions, and Figure 1 for the method overview). Our analysis showed that as the recombination rate increases, both the genome-wide FN and FP (Figure 2b) increase for IBD inferred from hmmIBD. The patterns vary among the other four IBD detection methods, and all suffer elevated FNs and/or FPs as the recombination rate increases (Figure 2b; Figure 2—figure supplement 1), with the exception of isoRelate, which has better IBD quality with lower marker densities. The results suggest that low SNP density per genetic unit can dramatically affect the reliability of detected IBD segments.
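The overlap-based error rates described above can be illustrated for a single genome pair. This is a minimal sketch, assuming segments are given as half-open (start, end) intervals in cM on one chromosome. The function names are hypothetical, and it is not the ishare/ibdutils implementation, whose exact definitions are given in Methods.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_cM, end_cM), half-open interval

def merge(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or touching segments into a sorted, disjoint list."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def total_length(segments: List[Segment]) -> float:
    """Total length of a disjoint segment list."""
    return sum(end - start for start, end in segments)

def overlap_length(a: List[Segment], b: List[Segment]) -> float:
    """Total length shared by two sorted, disjoint segment lists."""
    i = j = 0
    shared = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

def pair_error_rates(true_segs: List[Segment],
                     inferred_segs: List[Segment]
                     ) -> Tuple[Optional[float], Optional[float]]:
    """Return (FN, FP) for one genome pair; None where no segments exist."""
    t, f = merge(true_segs), merge(inferred_segs)
    shared = overlap_length(t, f)
    fn = 1 - shared / total_length(t) if t else None   # true IBD not recovered
    fp = 1 - shared / total_length(f) if f else None   # inferred IBD not true
    return fn, fp

true_ibd = [(0.0, 10.0), (20.0, 24.0)]
inferred_ibd = [(2.0, 10.0), (30.0, 32.0)]
fn, fp = pair_error_rates(true_ibd, inferred_ibd)
print(f"FN = {fn:.3f}, FP = {fp:.3f}")
```

In this toy pair, 8 cM of the 14 cM of true IBD is recovered and 2 cM of the 10 cM of inferred IBD has no true counterpart, so the two rates penalize different kinds of error.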

Figure 2 with 1 supplement

High recombination rates reduce genetic marker density and affect the quality of detected IBD segments.

(a) The number of common single nucleotide polymorphisms (SNPs) (minor allele frequency ≥ 0.01) per genetic unit (centimorgan, cM) in simulated genomes with different recombination rates. In these simulations (blue line), the mutation rates are fixed; the recombination rates vary widely to include the rate for both humans (red diamond) and Pf (red star). For each recombination rate, n = 4 independent simulations of chromosomes were performed. Data were plotted in the form of mean (marker) ± standard deviation (vertical lines, which are difficult to visualize given low variation among chromosomes). (b) Accuracy of IBD segments detected from genomes simulated with different recombination rates. The accuracy of IBD segments is measured by the false negative rates (top panel) and false positive rates (bottom panel). The plotted error rates represent the genome-wide rates (as defined in Methods) of IBD segments identified with default IBD caller parameters unless stated otherwise in Supplementary file 2—Table S4. These rates are based on one representative set from n = 3 simulation sets. The plotted vertical lines indicate standard deviations of error rates across all genome pairs in the representative simulated set. Both the vertical lines and markers are horizontally staggered for clarity. Only error rates for two IBD detection methods, hmmIBD and hap-IBD, are included in (b) for simplicity. The error rates for all 5 IBD callers are provided in Figure 2—figure supplement 1. For both (a) and (b), the genomes were simulated under the single-population model (see Methods). Note that log scales are used for the y axis in (a) and the x axis in (b).

Varying quality of IBD inferred from simulated Pf genomes via different IBD callers

Multiple IBD callers have been used for Pf, including Pf-oriented, Hidden Markov Model-based methods, such as hmmIBD (Schaffner et al., 2018) and isoRelate (Henden et al., 2018), and those originally designed for human genomes, such as Refined IBD (Morgan et al., 2020) and Beagle (version 4.1) (Shetty et al., 2019). In addition to hmmIBD, isoRelate, and Refined IBD, we analyzed hap-IBD (Zhou et al., 2020) and phased IBD (Freyman et al., 2021), since these two methods represent recent key advancements in IBD detection that scale well to large sample sizes and genome sizes.

To evaluate the applicability and accuracy of these IBD detection methods in analyzing Pf genomes, we performed benchmarking analyses in simulated genomes (Figure 1, top panel), mimicking the high recombination rate and the decreasing population size of Pf populations. Our analyses include three sets of comparisons: (1) baseline benchmarking, where we mainly used the default parameter values for each IBD caller and compared the performance in Pf genomes at the level of an IBD segment and their simple statistics; (2) post-optimization benchmarking, where we used parameter values optimized specifically for each IBD caller so that the comparisons are based on the optimal performance of each method; (3) human-like genome benchmarking, where detected IBD segments are expected to have low error rates for human-oriented IBD callers and thus were used as an internal control to validate our benchmarking pipeline (see Supplementary file 2—Table S1 for the IBD caller parameters and Methods for demographic models).

Baseline benchmarking analysis shows that all callers except hmmIBD suffer from high FN rates, especially for short IBD segments (genome-wide FN/FP rates reflect shorter IBD segments as most segments are short; Figure 3). Similarly, genetic relatedness metrics based on pairwise total IBD are largely underestimated for most callers (Figure 3—figure supplement 1). We found that by default, hmmIBD has relatively low FN/FP error rates (Figure 3) and is less biased for relatedness estimates (Figure 3—figure supplement 1). In contrast, isoRelate and human-oriented callers have high FN rates and varying FP rates (Figure 3). Thus, both Pf- and human-oriented IBD callers can suffer high error rates when detecting IBD from genomes with a high recombination rate and a low marker density, with the extent depending on underlying assumptions and methodologies.
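Pairwise total IBD, the relatedness metric referred to above, is simply the sum of IBD segment lengths shared by each genome pair. The sketch below assumes a hypothetical tuple layout (sample 1, sample 2, start in cM, end in cM) rather than the output format of any specific caller.

```python
from collections import defaultdict

def total_ibd_per_pair(segments):
    """Sum IBD segment lengths (cM) for each unordered genome pair."""
    totals = defaultdict(float)
    for id1, id2, start_cm, end_cm in segments:
        pair = tuple(sorted((id1, id2)))  # treat (a, b) and (b, a) as the same pair
        totals[pair] += end_cm - start_cm
    return dict(totals)

# Hypothetical toy input: (sample 1, sample 2, start_cM, end_cM).
segments = [
    ("g1", "g2", 0.0, 5.0),
    ("g2", "g1", 40.0, 43.5),
    ("g1", "g3", 10.0, 13.0),
]
totals = total_ibd_per_pair(segments)
print(totals)
```

Computing this once from true segments and once from inferred segments gives the two coordinates compared in the relatedness scatter plots; missed short segments pull the inferred value below the true one.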

Figure 3 with 1 supplement

The accuracy of IBD segments detected from Pf genomes varies across IBD callers.

IBD segments were inferred from genomes simulated under the single-population model with a shrinking population size and a recombination rate compatible with Pf. The accuracy of IBD was evaluated using the calculated false positive rate (y axis) and false negative rate (x axis). The rates were calculated for different length bins in centimorgans, including [3-4), [4-6), [6-10), [10-18), [18, inf) centimorgans and at the genome-wide level (defined by overlapping analysis between true IBD segments and inferred IBD segments from each genome pair). These rates are based on one representative set from n = 3 simulation sets. The plotted vertical and horizontal lines represent the standard deviations of error rates (horizontal for false negatives and vertical for false positives) calculated across all relevant segments for length-bin specific rates or across all genome pairs for genome-wide rates in the representative simulated set. The titles of the subplots indicate the IBD callers analyzed. The results of the simulations under the multiple-population model and the UK human demographic model are provided as Supplementary file 2—Data S2.

IBD-caller-specific parameter optimization for Pf improves IBD accuracy

Since these IBD callers are optimized for different species or genotype data sets, we hypothesized that optimization of IBD caller parameters under a unified framework designed for Pf genomes can improve the performance of these callers in analyzing malaria parasite data. As searching the entire IBD caller parameter space is inefficient, our optimization focused mainly on parameters potentially affected by or needing adjustment due to differences in marker density between the high-recombining species (e.g. Pf) and the lower-recombining species (e.g. humans). For IBD callers that do not explicitly have marker density-related parameters, such as hmmIBD, we explored other parameters that likely affect IBD quality. We performed grid searches to find parameter values that generate inferred IBD with low and balanced error rates (see Supplementary file 2—Table S2 for parameters explored and their corresponding values, and Supplementary file 2—Data S2 for detailed results).

We found that most IBD callers have a key parameter that can dramatically affect their FN/FP rates (Supplementary file 2—Table S2). For example, the FN rates of IBD called from hap-IBD change substantially when the min-marker parameter varies (see Supplementary file 2—Data S2). With a value of 70, the FN rate for short IBD segments dramatically decreases such that the FN and FP error rates become more balanced (Figure 4a). Consistently, genetic relatedness estimates change from being highly underestimated before parameter optimization (Figure 4b, left column) to being more balanced after optimization (Figure 4b, right column). Similar improvements were observed for Refined IBD and phased IBD (Figure 4—figure supplement 1). In contrast, the quality metrics remained largely unchanged during parameter optimization attempts for hmmIBD and isoRelate, with hmmIBD being more accurate and unbiased and isoRelate suffering from high false negative rates and underestimated relatedness (Figure 4—figure supplement 1).
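A grid search of this kind can be sketched as follows. Everything here is an assumption made for illustration: the objective (minimizing the larger of the FN and FP rates, which favors low and balanced errors), the parameter name `min_markers`, and the toy `toy_evaluate` function, which stands in for running a caller on simulated genomes and comparing its output with true IBD.

```python
from itertools import product

def choose_parameters(grid, evaluate):
    """Return (score, params, fn, fp) for the best parameter combination.

    grid: dict mapping parameter name -> list of candidate values.
    evaluate: callable taking a dict of parameters and returning (fn, fp).
    The score max(fn, fp) is an assumed objective, not the study's criterion.
    """
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fn, fp = evaluate(params)
        score = max(fn, fp)  # small only when both rates are small
        if best is None or score < best[0]:
            best = (score, params, fn, fp)
    return best

def toy_evaluate(params):
    """Toy error model: stricter marker requirements miss more segments."""
    m = params["min_markers"]
    fn = m / 200.0   # false negatives grow with the required marker count
    fp = 10.0 / m    # false positives shrink with the required marker count
    return fn, fp

grid = {"min_markers": [10, 30, 50, 70, 100]}
score, best_params, fn, fp = choose_parameters(grid, toy_evaluate)
print(best_params, round(fn, 3), round(fp, 3))
```

The value selected by this toy model has no bearing on the optimized values reported for real callers; it only shows how a single density-related parameter trades FN against FP.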

Figure 4 with 2 supplements

IBD caller-specific parameter optimization can improve the quality of IBD segments inferred from simulated Pf genomes (using hap-IBD as an example).

(a) Quality of detected IBD measured by false positive and false negative rates before (left column) and after (right column) hap-IBD-specific parameter optimization. As indicated in the axis legend, the error rates were calculated for different length ranges (in centimorgans), including [3-4), [4-6), [6-10), [10-18), [18, inf) and at the genome-wide level. These rates are based on one representative set from n = 3 simulation sets. The plotted vertical and horizontal lines represent the standard deviations of error rates (horizontal for false negatives and vertical for false positives) calculated across all relevant segments for length-bin-specific rates or across all genome pairs for genome-wide rates in the representative simulated set. (b) Quality of detected IBD measured by total genome pairwise IBD, an estimate of genetic relatedness, before (left column) and after (right column) parameter optimization. Data of n = 1000 haploid genomes from a representative simulation set was plotted with each subplot for hap-IBD before and after parameter optimization as indicated in the titles. Each dot represents a pair of genomes with the coordinates x and y being true and inferred total IBD. In (b), the blue dots are the pairs with nonzero true and inferred total IBD, while red dots are pairs with either true total IBD or inferred total IBD being 0; zero-valued total IBD was replaced with 1.0 cM for visualization purposes. The red dotted line of y = x indicates the expected pattern, that is, true total IBD equal to inferred total IBD if the inferred IBD was 100% accurate. Note that log scales are used in both the x and y axes in (b).

The human-oriented IBD callers, when not optimized for Pf, underperformed hmmIBD, especially Refined IBD and phased IBD (Figure 4—figure supplement 1a). To exclude potential problems in our benchmarking pipeline, we simulated genomes with recombination rates and demographic history consistent with the human population in the UK (Figure 4—figure supplement 2a, left column, and Figure 4—figure supplement 2b, left column; also see Methods). These callers, evaluated without Pf-optimized parameters, indeed perform much better on human genomes, showing consistently lower FN/FP error rates (Figure 4—figure supplement 2a, left column) and less biased total IBD-based relatedness estimates (Figure 4—figure supplement 2b, left column), compared to Pf-like genomes (Figure 4—figure supplement 1a and b, left columns). The results support the robustness of our benchmarking pipeline (Figure 4—figure supplement 2a and b, left columns) and demonstrate the challenges in applying human-oriented IBD callers to Pf. Furthermore, we found that IBD caller performance varies with demographic configurations (the single-population model in Figure 4—figure supplement 1a and b, left columns, versus UK human model in Figure 4—figure supplement 2a and b, right columns) even with the same (Pf) recombination/mutation rates and default IBD caller parameters, suggesting that the optimization is demography-dependent (see details in Supplementary file 2—Data S2).

Post-optimization benchmarking via downstream inferences

IBD-based downstream analyses, such as estimation of N_e, selection signals, and population structure, are key applications of IBD segments in population genetics, which often rely on the high quality of input IBD segments. With optimized IBD caller-specific parameters tailored for Pf-like genomes, we can expect IBD callers to perform at their best, which allows high-level benchmarking by comparing IBD-based downstream estimates (Figure 5 and supplements).

Figure 5 with 5 supplements

Post-optimization benchmarking of different IBD callers by comparing downstream estimates of N_e.

With parameters optimized for each IBD caller, the performance of IBD callers was evaluated by comparing the N_e trajectory for the recent 100 generations estimated via IBDNe based on true (black dashed line) IBD versus inferred IBD (red solid line). True IBD was calculated from simulated genealogical trees via tskibd; inferred IBD includes those inferred from hap-IBD, hmmIBD, isoRelate, Refined IBD, and phased IBD, with their N_e estimates shown from top left to bottom right. The shaded areas surrounding the red lines indicate 95% confidence intervals as determined by IBDNe. The plots show results from one representative set of n=3 replicated simulation sets. See Figure 5—figure supplement 5 for pre-optimization results. Note that log scales are used on the y axes.

For IBD-based selection detection, we simulated positive selection on each of the 14 chromosomes via the single-population model and identified IBD peaks with different callers (see Methods). These peaks, inferred as regions under selection, were considered true signals if they contained the selected site from simulations. We found that most callers can capture the majority of the simulated signals except Refined IBD, which is less sensitive and only detects 2–3 out of 14 selected loci (Figure 5—figure supplement 1a); isoRelate shows an increased level of false positives or low signal-to-noise ratios, evident in IBD coverage curves (Figure 5—figure supplement 1a).
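The idea of calling IBD peaks and checking them against a simulated selected site can be sketched as below. This is a simplified illustration, not the peak-calling procedure described in Methods: the fixed coverage threshold, the integer sampling positions, and the function names are all assumptions.

```python
def ibd_coverage(segments, positions):
    """Count how many IBD segments cover each sampled position."""
    return [sum(1 for start, end in segments if start <= pos < end)
            for pos in positions]

def call_peaks(positions, coverage, threshold):
    """Return (first, last) positions of each run with coverage above threshold."""
    peaks, start, prev = [], None, None
    for pos, cov in zip(positions, coverage):
        if cov > threshold:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            peaks.append((start, prev))
            start = None
    if start is not None:
        peaks.append((start, prev))
    return peaks

def is_true_signal(peak, selected_site):
    """A peak counts as a true signal if it contains the selected site."""
    return peak[0] <= selected_site <= peak[1]

# Toy segments on one chromosome, pooled over genome pairs (start, end).
segments = [(45, 55), (40, 60), (48, 52), (47, 58), (5, 8), (80, 83)]
positions = list(range(0, 100))
coverage = ibd_coverage(segments, positions)
peaks = call_peaks(positions, coverage, threshold=2)
print(peaks, [is_true_signal(p, selected_site=50) for p in peaks])
```

A caller with many false positive segments raises the baseline coverage everywhere, which is one way the signal-to-noise ratio of such curves can degrade.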

For IBD-based population structure inference, we simulated Pf-like genomes under a selectively neutral condition using the multiple-population demographic model and performed IBD network community detection via InfoMap (Rosvall et al., 2009; Csardi and Nepusz, 2006) to define subpopulations (see Methods). As with IBD-based selection signal detection, we found that IBD inferred from most callers can accurately recapitulate the simulated population structure, comparable to true IBD (Figure 5—figure supplement 1b). The exception is isoRelate, which tends to generate many smaller groups. This is likely because its high FN rates for short IBD segments remove connectivity among distantly related genomes, leaving only small subgroups of closely related genomes. As indicated by the low adjusted Rand indices, there was little agreement between the community labels inferred by isoRelate and the true labels.
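The adjusted Rand index used to compare inferred and true community labels can be computed from pair counts. The sketch below is a standard-library implementation of the usual formula, written for illustration; the toy labels are invented, and the study's own analysis may rely on a library routine instead.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of items placed together in both, in the true, and in the predicted labeling.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)

true_labels = [0, 0, 0, 1, 1, 1]
matching = ["a", "a", "a", "b", "b", "b"]   # same partition, different names
fragmented = [0, 1, 2, 3, 4, 5]             # every genome in its own group

ari_matching = adjusted_rand_index(true_labels, matching)
ari_fragmented = adjusted_rand_index(true_labels, fragmented)
print(ari_matching, ari_fragmented)
```

The fragmented labeling mimics the many small groups described above: it never joins two genomes that truly belong together, so its index falls to the chance level.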

For IBD-based N_e inference, we simulated neutral Pf genomes using the single-population model and estimated N_e from detected IBD via IBDNe (Browning and Browning, 2015). We found most of the compared IBD callers suffer from wild oscillations, which have previously been observed (Browning and Browning, 2015), and deviate significantly from the truth for older generations (Figure 5), consistent with the general pattern of high error rates in shorter IBD length bins for these IBD callers (Figure 4—figure supplement 1a, right column). Meanwhile, the IBD inferred by hmmIBD generated highly accurate estimates comparable to true IBD (Figure 5). We explored the mechanisms underlying bias in N_e estimates and found that the strong bias is likely due to the underestimation of population-level total IBD for short IBD segments, which is most obvious for hap-IBD, isoRelate, and phased IBD, followed by Refined IBD (Figure 5—figure supplement 2). For hmmIBD, both the estimates of N_e (Figure 5) and population total IBD are relatively unbiased (Figure 5—figure supplement 2, column 2), consistent with its relatively low and balanced FP/FN rates (Figure 3, leftmost panel). These results suggest that IBD-based N_e estimates are highly sensitive to the quality of input IBD segments, and that hmmIBD is more accurate for this analysis.
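The comparison of population-level total IBD per length bin can be illustrated with a short sketch. It uses the length bins quoted in the figure legends; the toy segment lengths are invented and the function name is hypothetical.

```python
from bisect import bisect_right

# Lower edges of the bins [3-4), [4-6), [6-10), [10-18), [18, inf) in cM.
BIN_EDGES_CM = [3, 4, 6, 10, 18]

def total_ibd_by_bin(segment_lengths_cm):
    """Sum segment lengths within each bin; segments shorter than 3 cM are ignored."""
    totals = [0.0] * len(BIN_EDGES_CM)
    for length in segment_lengths_cm:
        idx = bisect_right(BIN_EDGES_CM, length) - 1
        if idx >= 0:
            totals[idx] += length
    return totals

# Toy example in which the inferred set misses two of the short segments.
true_lengths = [3.5, 3.2, 5.0, 7.0, 12.0, 20.0]
inferred_lengths = [3.5, 7.0, 12.0, 20.0]

true_totals = total_ibd_by_bin(true_lengths)
inferred_totals = total_ibd_by_bin(inferred_lengths)
ratios = [i_tot / t_tot if t_tot else None
          for i_tot, t_tot in zip(inferred_totals, true_totals)]
print(ratios)
```

Ratios below 1 in the shortest bins correspond to the underestimation of short-segment IBD that the text links to biased N_e estimates for older generations.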

Given that N_e estimation in Pf is very sensitive to the quality of detected IBD, we explored whether excluding short, error-prone IBD segments (< 4 cM) could improve N_e estimates for callers other than hmmIBD. The exclusion results in reduced oscillation of the trajectory for some callers, like hap-IBD, but wide confidence intervals or underestimation in older generations, in both simulated and empirical data (Figure 5—figure supplement 3). We then explored reasons underlying the recent oscillation or drop (around 20 generations ago) commonly observed in the estimated N_e trajectories (Figure 5—figure supplement 3a second row and Figure 5—figure supplement 3b, both rows) (Morgan et al., 2020; Browning and Browning, 2011; Harris et al., 2020). We hypothesized that this oscillation is partially due to IBD segments with TMRCA < 1.5 generations ago being included in the IBDNe input (Browning and Browning, 2015). We found that removing these segments can greatly mitigate this problem (Figure 5—figure supplement 3a), especially for hmmIBD and Refined IBD. The findings suggest caution when interpreting (1) a recent drop in an estimated N_e trajectory in empirical data sets where TMRCA-based filtering is less practical, and (2) extremely large estimates in older generations stemming from high error rates for short IBD segments.
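The two filters discussed above (dropping short segments and dropping segments with very recent TMRCA) amount to simple predicates applied before N_e estimation. In this sketch the dictionary keys `length_cm` and `tmrca` are hypothetical field names, and TMRCA is assumed to be known, as it is only for simulated data.

```python
def filter_for_ne(segments, min_length_cm=None, min_tmrca=None):
    """Keep segments that pass the optional length and TMRCA filters.

    segments: iterable of dicts with 'length_cm' and, for simulated data,
    'tmrca' in generations (both keys are hypothetical field names).
    """
    kept = []
    for seg in segments:
        if min_length_cm is not None and seg["length_cm"] < min_length_cm:
            continue  # drop short, error-prone segments
        if min_tmrca is not None and seg["tmrca"] < min_tmrca:
            continue  # drop segments from very recent common ancestors
        kept.append(seg)
    return kept

segments = [
    {"length_cm": 3.2, "tmrca": 40},
    {"length_cm": 25.0, "tmrca": 1},
    {"length_cm": 6.0, "tmrca": 12},
]
by_length = filter_for_ne(segments, min_length_cm=4)
by_tmrca = filter_for_ne(segments, min_tmrca=1.5)
both = filter_for_ne(segments, min_length_cm=4, min_tmrca=1.5)
print(len(by_length), len(by_tmrca), len(both))
```

Only the length filter is available for empirical data, where TMRCA is unknown, which is why the text advises caution when interpreting a recent drop in empirical N_e trajectories.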

To confirm that parameter optimization improves downstream inferences, we compared post-optimization results (Figure 5—figure supplement 1 and Figure 5) with pre-optimization estimates (Figure 5—figure supplement 4 and Figure 5—figure supplement 5). We found that parameter optimization increased the accuracy of selection detection (for isoRelate, Refined IBD, and phased IBD), improved population structure inference (for phased IBD and hap-IBD), and reduced oscillation in the N_e trajectory (for Refined IBD).

Validation in empirical data sets

We further validated the findings from simulation analysis in empirical data sets, by constructing ‘single’ or ‘multiple’ population data sets based on the MalariaGEN Pf7 data (Abdel Hamid et al., 2023; see Methods for details). As true IBD segments are not available here, we focused on high-level benchmarking by evaluating whether IBD-based downstream estimates are consistent with expected patterns, including N_e estimation and selection signal detection with ‘single’ population data sets and InfoMap population structure inference with the ‘multiple’ population data set (Supplementary file 2—Table S3-S5).

With optimized parameters, all callers, except Refined IBD, capture most known selection signals from the Southeast Asia data set. These signals include selective sweeps associated with antimalarial drug resistance and sexual commitment, such as dihydrofolate reductase (dhfr) (Miotto et al., 2013), multidrug resistance protein 1 (pfmdr1) (Koenderink et al., 2010), amino acid transporter 1 (pfaat1) (Amambua-Ngwa et al., 2019), chloroquine resistance transporter (pfcrt) (Martin and Kirk, 2004), dihydropteroate synthase (dhps) (Brooks et al., 1994), Apicomplexan-specific ApiAP2-g (ap2-g) (Early et al., 2022) and apicoplast ribosomal protein S10 (arps10) (Miotto et al., 2015; Figure 6a). hmmIBD detects more peaks but suffers from noise, likely due to the relatively high FP to FN ratios for short (<4 cM) IBD segments (Figure 2 and Figure 3).

Figure 6 (with 1 figure supplement). Validation of the performance of IBD callers in empirical data sets by comparing IBD-based downstream analyses.

(a) IBD coverage and detected selection signals in the SEA data set using different IBD callers (rows 1 to 5). Annotations and corresponding vertical dotted lines at the top indicate the center of known and putative drug resistance genes and genes related to sexual commitment; red shading indicates regions that are inferred to be under positive selection (see Methods for definitions). (b) N_e estimates of the SEA data set based on IBD inferred from different callers. Line plots are point estimates; the shaded areas around the line plots indicate confidence intervals based on bootstrapping (generated by IBDNe). (c) Inference of the population structure of the structured data set by the InfoMap community detection algorithm using the IBD inferred from different IBD callers. The rows of the heatmap are geographic regions of isolates, and the columns are the largest, inferred communities, labeled as c1 to c6. The heatmap color represents the number of isolates in each block with the given row and column labels. The columns are rearranged so that the diagonal blocks tend to have the largest values per row for better visualization. Note that log scales are used in y axes in (b).

Similar to the simulation analysis, IBD detected using most callers resulted in N_e estimates with unrealistic oscillations for the Southeast Asia data set, including extremely large estimates in the more distant past (> 20 generations ago; Figure 6b). The problems are much less severe with the estimates from hmmIBD, which mirrored the expected reduction in malaria in this region due to the intense efforts to eliminate malaria in recent decades (World Health Organization, 2024).

InfoMap-based community detection reveals expected continental population structure across most callers: African Pf parasites are less structured, and Southeast Asian parasites are more structured and distinct from Oceanian parasites (Figure 6c), consistent with previous non-IBD-based methods (Abdel Hamid et al., 2023). isoRelate IBD estimates generate many small, close groups, likely due to high false negative rates for short IBD segments, especially in high transmission settings like Africa where parasites have low relatedness and mainly share short IBD segments (Figure 6c).

To further confirm the improvement of IBD-based estimates through parameter optimization, we performed analyses with IBD detected with unoptimized parameters (see Supplementary file 2—Table S1). We found that the height and number of IBD peaks for hap-IBD and phased IBD decreased significantly (Figure 6—figure supplement 1 versus Figure 6a). Pre-optimization N_e estimates show more extreme oscillations, especially for human-oriented IBD callers (Figure 6—figure supplement 1b). Pre-optimization IBD estimates from hap-IBD and phased IBD fail to reveal the expected population structure, particularly in African parasite populations (Figure 6—figure supplement 1c). These differences underscore the importance of parameter optimization for Pf, especially for IBD callers not validated for Pf.

Computational efficiency comparison

With the decrease in whole-genome sequencing cost and the increase in sample availability, it is important to prioritize IBD callers that scale well for large sample sizes, such as MalariaGEN Pf7 (n = 20,864; Abdel Hamid et al., 2023). We compared the IBD inference time and maximum memory usage for different IBD callers with or without parallelization. When using a single thread, probabilistic inference algorithms like isoRelate, Refined IBD, and hmmIBD are about two orders of magnitude slower than algorithms based on identity-by-state (IBS) or the positional Burrows-Wheeler transform (PBWT), such as hap-IBD and phased IBD (Figure 7a). Maximum memory consumption is highest in Refined IBD, with hmmIBD being around 10 times more efficient (Figure 7—figure supplement 1). With multithreading, the patterns are similar to the single-thread comparison, as most callers allow parallelization (Figure 7b and Figure 7—figure supplement 1b). The exception is hmmIBD, as it currently only supports single-thread computation. Despite the computational efficiency of hmmIBD compared to isoRelate and its high accuracy in detecting IBD segments from Pf genomes, it remains significantly slower than IBS-based methods, highlighting the need for further enhancements for large data sets like MalariaGEN Pf7.

Figure 7 (with 1 figure supplement). Comparison of computational runtime of the IBD calling process for different callers.

(a) Runtime for different IBD callers to detect IBD from genomes of different sample sizes in single-thread mode. The comparison is based on Pf genomes of size 100 cM simulated under the single-population model. The x-axis tick labels include the number of pairs of genomes simulated and analyzed (below the plot, reflecting the number of computation units) and the number of haploid genomes (above the plot, representing sample size) analyzed. The line styles and markers for different callers/tools are provided in the legend box on the far right of the figure, which is shared across the two subplots. Values on the y axis represent means (markers) and standard deviations (vertical error lines) from n = 3 sets of independent simulations. Note that at each sample size, the x values for different IBD callers are slightly staggered to prevent the error lines from overlapping; some of the error lines are hard to visualize as they are relatively small. (b) Runtime in multithreading mode. (b) is organized similarly to (a) except that the IBD calling processes were run in multithreading mode with 10 CPU threads. Note that log scales are used in y axes and bottom x axes. Also, see Figure 7—figure supplement 1 for the maximum memory usage for different callers.

Discussion

In this work, we evaluated the reliability of IBD detection methods in high-recombining genomes of malaria parasites. Our findings indicate that low marker density per genetic distance significantly affects IBD detection accuracy. Optimizing parameters of IBD detection methods for Plasmodium genomes enhances the accuracy of detected IBD segments, thereby improving subsequent downstream analyses such as the inference of positive selection signals and population structures. However, variations in performance among IBD detection methods remain substantial, particularly in analyses sensitive to IBD quality, such as the estimation of effective population size (N_e), even after parameter optimization. We generally recommend hmmIBD for IBD estimation in Plasmodium genomes when phased genotype data of haploid genomes are available. We further emphasize the necessity of performance evaluation and parameter optimization for IBD callers prior to application in untested species or scenarios.

Comparing the performance of multiple IBD detection methods requires a unified framework, which should include a uniform definition of accuracy and a simulated ground truth that mimics Pf genomes. Our benchmarking framework incorporates several novel features. First, it utilizes a consistent definition of IBD length-specific accuracy based on the overlap of IBD segment lengths, closely aligned with metrics used in hap-IBD (Zhou et al., 2020), phased IBD (Freyman et al., 2021), and Refined IBD (Browning and Browning, 2013), as detailed in our Methods section. The approach differs from the original evaluation of hmmIBD, which assesses the accuracy of IBD based on the fraction of SNPs that share IBD, which could overlook the precise accuracy of the length of the detected IBD segments (Schaffner et al., 2018). In particular, the original study of isoRelate defined the accuracy (true positive rate) using a less stringent overlap-by-count criterion where a segment is counted as accurate when at least 50% of a detected IBD segment is overlapped by true segments (Henden et al., 2018). Second, we generated Pf-like genomes via population genetic simulation that reflect a realistic distribution of IBD segment lengths. This contrasts with previous studies, where methods such as hap-IBD and Refined IBD used human-like data from population genetic simulations (Browning and Browning, 2013; Zhou et al., 2020), whereas hmmIBD, isoRelate, and phased IBD relied on simulations based on artificial recombination or pedigree models (Henden et al., 2018; Schaffner et al., 2018). These models (non-population-based genetic simulation) often produce long shared IBD segments typical of close relatives, failing to capture the IBD length distribution in population samples predominantly comprising distant relatives. 
Third, our benchmarking extends beyond the segment-level evaluation of IBD callers and includes downstream inferences of population structure and effective population size, providing a more thorough assessment of their application in real-world analyses.

Of note, our benchmarking highlights the high false negative rate of short segments detected by isoRelate, despite it being developed for malaria parasites. The relatively poor performance of isoRelate compared to hmmIBD is likely due to differences in the underlying HMM models, where the isoRelate model assumes unphased data even when phase is provided in our benchmarking analysis. Given that isoRelate can be applied directly to unphased genotype data, it might outperform hmmIBD when genotype phasing, required by hmmIBD, is error-prone. However, this possibility was not examined in this study and warrants further investigation beyond the current scope.

The density of markers per genetic unit plays a crucial role in IBD detection. IBS-based methods, such as hap-IBD and phased IBD, first identify long IBS segments (≥ 2 centimorgan) as candidate IBD segments, and subsequently merge short ones separated by small gaps (Zhou et al., 2020), allow a certain number of discordant markers to account for phasing errors (Freyman et al., 2021), or eliminate false positive segments by removing candidate segments supported by only a small number of markers (Zhou et al., 2020). Similarly, Refined IBD, which combines an IBS-based method with an HMM probabilistic model, uses a LOD score to decide whether a candidate IBD segment should be rejected or accepted (Browning and Browning, 2013). In these studies, default values for marker density-related parameters were shown to be effective for human genomes but have not been evaluated in high-recombining genomes. We evaluated how different levels of per-genetic-unit marker density affect the detection of IBD segments by varying recombination rates in simulations. With simulated Pf-like genomes of low marker density and high recombination rate, we found that human-oriented IBD callers suffer high false negative rates. One potential explanation is that the thresholds optimized for human data are too stringent for Pf, causing excessive rejection of candidate segments. The effect of marker density on IBD detection is further confirmed by our findings that adjusting the values of marker density-related parameters using a grid-search approach could significantly reduce IBD error rates and generate more accurate IBD-based downstream estimates. Even though IBD accuracy can be improved by parameter optimization, we found that error rates of the detected IBD segments are still higher in Pf genomes than those of human genomes even after IBD caller parameters are optimized. 
There are several possible explanations for the high error rates of IBD segments detected from low marker density data: (1) detected IBD segments can only start and end at genotyped marker sites, which may not reflect true IBD end points; (2) a lower marker density is linked with greater uncertainty in the distribution of IBD endpoints (Browning and Browning, 2020); (3) ancestral relationships, including IBD, are difficult to infer reliably given limited mutational information (Ishigohoka and Liedvogel, 2025; Speidel et al., 2019; Kelleher et al., 2019; Mehra et al., 2025).

In our benchmark analysis, we used only common biallelic SNPs as markers for inferring IBD, excluding rare variants and indels. The use of this additional information can potentially provide denser genotype information, thus enhancing our understanding of the population’s ancestral relationships, a key aspect on which the inference of the IBD segment depends. For instance, large-scale whole-genome sequencing studies reveal that rare variants account for the majority of all segregating sites (Ahouidi et al., 2021; Taliun et al., 2021), which contain crucial information for deciphering recent evolutionary history. However, rare variants are typically not utilized for two main reasons: (1) rare genotypes are very sparsely distributed across many sites and are less informative per site (Browning and Browning, 2020; Schaffner et al., 2018); (2) rare genotype calls are more prone to genotyping or phasing errors. As a result, including rare variation may cause reduced accuracy in detected IBD segments due to genotyping/phasing errors and increase IBD inference time due to marker density. Although our simulation analysis indicates that including rare variants is of little effect or detrimental for IBD detection (Supplementary file 2—Data S2), the findings could be skewed due to the absence of genotyping errors, the small sample size simulated, and the limited number of cut-off values tested. Further investigation in the context of large sample sizes and varying levels of genotype errors is needed to inform the usability of rare variants for different IBD detection algorithms. Indels are another significant source of underutilized genetic variations in the Pf genome, with abundance on par with that of biallelic SNPs in the MalariaGEN Pf7 data set (Abdel Hamid et al., 2023). These indels are, in part, the result of microplasticity related to the high AT content (up to 90% in non-coding regions) in Pf genomes (Hamilton et al., 2017). 
Using these variants could also increase the marker density to infer IBD segments. However, additional research is necessary to determine whether the inclusion of these variants can reduce uncertainty in the inference of IBD for Pf or introduce more bias due to challenges such as sequence-read mapping.

While demonstrating high accuracy in IBD detection and downstream analysis, hmmIBD tends to be slower than IBS-based methods like hap-IBD. The trade-off between speed and accuracy may limit hmmIBD’s suitability for large sample-size data sets. Our ongoing efforts are directed towards enhancing the computational efficiency and functionalities of the model used by hmmIBD in an adapted tool, thereby extending its applicability to large-scale data sets of relatively small and high-recombining genomes, such as Plasmodium genomes.

Although a substantial portion of this study concentrated on Pf, the main findings and methodologies may be relevant to high-recombining species beyond Pf. For instance, in regions with intermediate and low malaria transmission, the incidence of Pf has markedly decreased, allowing other species, such as Plasmodium vivax, to become predominant (World Health Organization, 2024). Clinical malaria caused by simian Plasmodium species, for example Plasmodium knowlesi, has also increased in some geographic areas where human Plasmodium species have declined (Amir et al., 2018). We expect that the performance of IBD callers will be similar in other Plasmodium species, given their likely comparable high recombination rates (Bright et al., 2014; Ibrahim et al., 2023); however, the generalization would need further exploration as part of future work, considering variations in evolutionary histories and parasite biology (Escalante et al., 2004; Lee et al., 2011; Loy et al., 2017). Beyond Plasmodium parasites, there are many other high-recombining organisms (Stapley et al., 2017) such as Apicomplexan species like Theileria (Sivakumar et al., 2014), insects like Apis mellifera (honeybee; Kent et al., 2012; Leroy et al., 2024), and fungi like Saccharomyces cerevisiae (Baker’s yeast; Barton et al., 2008; Peter et al., 2018). For these species, our optimized parameters may not be directly applicable, but the benchmarking framework established in this study can be utilized to prioritize and optimize IBD detection methods in a context-specific manner.

While we have conducted numerous simulation analyses complemented by carefully designed validation studies, our work is subject to a few caveats: (1) Our simulations did not explicitly incorporate inbreeding within the complicated life cycle of the parasite (Anderson et al., 2000), except for the increased inbreeding potential due to the reduced population size in the single-population model. Inbreeding can be pervasive, especially in low transmission settings, leading to a change in the length distribution of IBD toward longer segments and a potentially reduced marker density. (2) Our optimization is based on simple accuracy metrics and only focuses on a subset of parameters to allow faster iteration over different values. Investigating a larger parameter space, including genotyping error rate and higher-level accuracy metrics, may generate different optimal values and further improve IBD-based downstream estimates. (3) We assume a constant recombination rate and mutation rate as static genomic/population parameters, rather than traits capable of evolving over time. If this assumption proves to be inaccurate, such as with recombination rates that vary between individuals and populations (Smukowski and Noor, 2011), a more complex benchmarking framework will be required.

In conclusion, we evaluated the performance of existing IBD segment detection methods for analyzing genomic data of the malaria parasite Pf, which is characterized by high recombination rates and low marker densities. Our findings underscore that a high recombination rate, relative to the mutation rate, can compromise the accuracy of detected IBD segments when using methods originally calibrated for the human genome, characterized by a significantly lower recombination rate and a higher marker density (per genetic unit). The accuracy of IBD detection can be improved by parameter optimization via grid search techniques. We advocate for a context-specific evaluation of IBD detection methods when applying them to untested species. Specifically for Pf, our research indicates that hidden Markov model-based probabilistic methods, such as hmmIBD, produce less biased IBD estimates, leading to more accurate downstream inferences. This is especially important for analyses that heavily rely on the accuracy of detected IBD segments, such as N_e inference. These findings will improve the accuracy of IBD detection and downstream analysis, providing more robust estimates of malaria transmission patterns, essential to effective malaria control and elimination efforts.

Methods

Simulation overview

We used population genetic simulations to allow the generation of (1) ground truth, including true IBD segments, true sites under positive selection, true trajectory of population size, and true sub-population assignments (population structure), and (2) inferred patterns, including IBD inferred from phased genotype data via different IBD callers and IBD-based downstream inferences of N_e, positive selection, and population structure. By comparing inferred patterns with ground truth, we calculated metrics at the IBD segment level and the IBD-based downstream estimate level (high level) for benchmarking and optimizing various IBD detection methods for Pf genomes. As described in our accompanying work (Guo et al., 2024), we combined the flexible forward simulator SLiM (Haller and Messer, 2019; Haller et al., 2019) and the efficient coalescent simulator msprime (Baumdicker et al., 2022) to simulate genomes similar to Pf, reflecting the high recombination rate, strong positive selection, and population size shrinkage due to malaria reduction (Figure 1). Detailed simulation parameter values used are provided in Supplementary file 1—Data S1. A detailed implementation of the simulations can be found in a dedicated GitHub repository (https://github.com/bguo068/bmibdcaller_simulations, copy archived at Guo, 2025a).

In these simulations, we assumed constant recombination rates over the genome, such as 6.67 × 10^–7 per base pair per generation for Pf (Amambua-Ngwa et al., 2019; Conway et al., 1999) and 1.0 × 10^–8 for humans (Kong et al., 2002), and a mutation rate of 1.0 × 10^–8 for both Pf (Camponovo et al., 2023; Bopp et al., 2013) and humans unless otherwise specified. Parameters for modeling population size changes, population structure, and positive selection are detailed in the following section or our related publication (Guo et al., 2024).
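
As a worked illustration of why these rates translate into a low marker density per genetic unit for Pf, the following sketch converts the constant recombination rates quoted above into physical distance per centimorgan. The helper names are hypothetical and are not part of the simulation pipeline; only the three rate values come from the text.

```python
# Illustrative arithmetic only; helper names are assumptions, not pipeline code.

PF_RECOMB = 6.67e-7    # per bp per generation (Pf, from the text)
HUMAN_RECOMB = 1.0e-8  # per bp per generation (human, from the text)
MUTATION = 1.0e-8      # per bp per generation (both species, from the text)


def bp_per_cm(recomb_rate: float) -> float:
    """Physical length (bp) spanning 1 cM under a constant recombination rate."""
    # 1 cM corresponds to an expected 0.01 crossovers per generation.
    return 0.01 / recomb_rate


def bp_to_cm(position_bp: float, recomb_rate: float) -> float:
    """Genetic position (cM) of a physical position on a constant-rate map."""
    return position_bp * recomb_rate * 100.0


def mutations_per_cm(mutation_rate: float, recomb_rate: float) -> float:
    """Expected new mutations per generation within a 1 cM tract."""
    return mutation_rate * bp_per_cm(recomb_rate)


pf_kb_per_cm = bp_per_cm(PF_RECOMB) / 1e3
human_kb_per_cm = bp_per_cm(HUMAN_RECOMB) / 1e3
density_ratio = mutations_per_cm(MUTATION, HUMAN_RECOMB) / mutations_per_cm(
    MUTATION, PF_RECOMB
)
```

Under these rates, 1 cM spans roughly 15 kb in Pf versus about 1 Mb in humans, so with equal mutation rates a 1 cM tract is expected to carry about 67 times fewer new mutations in Pf.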

Simulated demographic models

We used three different demographic models in the simulations, including the single-population model, the multiple-population model, and the UK European human population model (Zhou et al., 2020).

Single- and multiple-population models have been described in our accompanying work (Guo et al., 2024). The single-population model mimics malaria reduction in settings like Southeast Asia, with a population size decreasing from 10,000 to 1000 over the last 200 generations. This model was used to benchmark IBD detection methods at the IBD segment level and the (high) level of downstream estimation, including selection signal detection and N_e estimation. The multiple-population model was mainly used to benchmark IBD calling methods via IBD network-based community detection. Implementation of the two models is provided in a GitHub repository (see Code availability).

The UK human demographic model, similar to the one used in Zhou et al., 2020, simulates a population bottleneck event from a constant size of 10,000 to 3000 that occurred 5000 generations ago, followed by growth at rates of 1.4% and 25% per generation beginning 300 and 10 generations ago, respectively. We simulated 14 chromosomes with a size of 60 cM each for a smaller genome size to reduce simulation time. This demographic model serves as a control to detect IBD segments with human-oriented callers, which can help validate our IBD accuracy evaluation pipeline. We also used this model to test whether demographic models impact the performance of IBD callers by replacing the human recombination rate with that of Pf.

To separate the effects of positive selection from those of demographic models and recombination rates, we mostly simulated neutral genomes by setting the selection coefficient s to 0 for the above models, except in the case where we needed to benchmark IBD callers for detecting positive selection signals.

Positive selection simulation

To evaluate the performance of IBD callers via IBD-based selection signal detection, we simulated positive selection within the single-population model with a selection coefficient s of 0.2 and a single origin starting 80 generations ago. Fourteen chromosomes with a size of 100 cM were simulated independently, each with a selected site at 33.3 cM from its left end. For selection simulation, we conditioned on the establishment of the selective sweeps; that is, the allele under selection should not be lost in the present-day generation. If lost, the simulation was rerun for a maximum of 100 times until the selective sweep was established.
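
The conditioning-on-establishment logic can be illustrated with a toy example. The sketch below is not the SLiM model used in this study: it assumes a haploid Wright-Fisher population of constant size (the population size `n`, the seed, and all function names are illustrative assumptions) and only shows the rerun-until-established loop with a cap of 100 attempts.

```python
import random


def sweep_once(n, s, generations, rng):
    """Toy haploid Wright-Fisher sweep from one copy; returns final frequency."""
    count = 1
    for _ in range(generations):
        if count == 0 or count == n:
            break  # allele lost or fixed; frequency no longer changes
        p = count / n
        # Expected frequency after selection with relative fitness 1 + s.
        p_sel = p * (1 + s) / (1 + p * s)
        # Binomial sampling of the next generation (genetic drift).
        count = sum(rng.random() < p_sel for _ in range(n))
    return count / n


def simulate_established_sweep(n=1000, s=0.2, generations=80, max_tries=100, seed=1):
    """Rerun the toy sweep until the selected allele is not lost."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        freq = sweep_once(n, s, generations, rng)
        if freq > 0:
            return freq, attempt
    raise RuntimeError("sweep not established within max_tries")
```

Raising an error after the final attempt, rather than returning a lost sweep, keeps failed replicates from silently entering a downstream benchmark.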

IBD calling and default parameters

We generated true IBD from simulated genealogical trees in the tree sequences using the tskibd algorithm (Guo et al., 2024). Briefly, we sampled local/marginal trees along a chromosome and tracked changes in the most recent common ancestor (MRCA) for each pair of sample nodes. If the MRCA changes, the shared ancestral segment breaks. We report a shared ancestral segment as an IBD segment if its length exceeds a threshold, such as 2 cM.
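
The MRCA-tracking idea can be sketched as follows. This is a simplified illustration, not the tskibd implementation: it assumes that, for one pair of sample nodes, the MRCA has already been extracted for each marginal tree interval, and the input format and function name are hypothetical.

```python
from typing import List, Tuple

# (left_cM, right_cM, mrca_node_id) for consecutive marginal tree intervals
Interval = Tuple[float, float, int]


def ibd_from_mrca_track(track: List[Interval], min_cm: float = 2.0) -> List[Interval]:
    """Merge consecutive tree intervals sharing one MRCA; keep long segments."""
    segments = []
    current = None
    for left, right, mrca in track:
        if current is not None and current[2] == mrca and current[1] == left:
            # Same MRCA and contiguous: the shared ancestral segment continues.
            current = (current[0], right, mrca)
        else:
            # MRCA changed: the previous shared ancestral segment breaks here.
            if current is not None and current[1] - current[0] >= min_cm:
                segments.append(current)
            current = (left, right, mrca)
    if current is not None and current[1] - current[0] >= min_cm:
        segments.append(current)
    return segments


example_track = [(0.0, 1.0, 7), (1.0, 3.5, 7), (3.5, 4.0, 9), (4.0, 6.5, 12)]
example_ibd = ibd_from_mrca_track(example_track, min_cm=2.0)
```

In this example the first two intervals share MRCA 7 and merge into one 3.5 cM segment, the 0.5 cM interval under MRCA 9 falls below the threshold, and the last interval is reported as a 2.5 cM segment.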

For inferred IBD, we used phased genotype data as input. Only biallelic sites with a minor allele frequency no less than 0.01 were included, unless otherwise stated. When needed, a genetic map was generated based on the constant recombination rate specified in the corresponding simulations. For IBD detection methods designed for diploids, we converted each haploid to a pseudo-homozygous diploid. The resulting IBD segments of each pair of pseudo-homozygous diploids (A1/A2 and B1/B2) carry redundant information due to the 100% runs of homozygosity. We kept only one haplotype pair, A1-B1, and removed the other combinations.
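
A minimal sketch of the haploid-to-diploid conversion and the subsequent de-duplication is given below. The record layout (dictionary keys `hap1` and `hap2` with values 1 or 2) is a hypothetical stand-in for caller output, which differs between tools.

```python
def haploid_to_pseudo_diploid(haplotypes):
    """Duplicate each haploid allele vector to form a homozygous diploid."""
    return {name: (list(alleles), list(alleles)) for name, alleles in haplotypes.items()}


def keep_first_haplotype_pairs(segments):
    """Keep only IBD records between haplotype 1 of both samples (A1-B1)."""
    return [seg for seg in segments if seg["hap1"] == 1 and seg["hap2"] == 1]


diploids = haploid_to_pseudo_diploid({"A": [0, 1, 1], "B": [0, 1, 0]})

# The same physical segment reported for all four haplotype combinations.
redundant = [
    {"id1": "A", "hap1": h1, "id2": "B", "hap2": h2, "start_cm": 0.0, "end_cm": 3.0}
    for h1 in (1, 2)
    for h2 in (1, 2)
]
unique = keep_first_haplotype_pairs(redundant)
```

Because both haplotypes of a pseudo-diploid are identical, the four haplotype combinations describe the same segment, and retaining one of them loses no information.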

As each IBD detection method provides multiple tunable parameters, we detailed values used in Supplementary file 2—Table S1 for both default and optimized scenarios. For the default scenarios, the parameters mostly follow the original documentation. Exceptions are parameters that need consistency across different IBD callers for benchmarking, including the minimum IBD length and the minimum minor allele frequency. The process of obtaining optimized parameter values is described below.

Benchmarking metrics at the IBD segment level

Several metrics were calculated to benchmark IBD methods at the IBD segment level or using their simple aggregates, including the false negative rate and false positive rate, pairwise total IBD, and population-level total IBD per length bin.

The false positive rate and the false negative rate were obtained following the definition used in the work of Zhou et al., 2020. Rates were first calculated per segment via segment-overlapping analysis; then, they were averaged over all segments of the same length bin. The false positive rate per segment is defined as the proportion of a given IBD segment of some genome pair from the inferred set (for example, IBD called via hmmIBD) that is not covered by any IBD segment from the truth set (generated by tskibd), for the same genome pair. Similarly, the false negative rate per segment is defined as the proportion of a given IBD segment of some genome pair from the truth set that is not covered by any IBD segment from the inferred set. The average false positive rate is calculated as the average of the per-segment false positive rates for all inferred IBD segments whose length falls in a certain range (length bin); the average false negative rate is calculated as the average of the per-segment false negative rates for all true segments of a length bin. The following length bins were used: [3, 4), [4, 6), [6, 10), [10, 18), and [18, ∞) cM, similar to the bins used by Zhou et al., 2020.
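
The per-segment definitions above can be sketched in a few lines. This is an illustrative re-implementation under simple assumptions, not the ishare/ibdutils code: segments are `(start_cM, end_cM)` tuples grouped by genome pair in plain dictionaries, and all function names are hypothetical.

```python
BINS = [(3, 4), (4, 6), (6, 10), (10, 18), (18, float("inf"))]


def covered_length(seg, others):
    """Length of `seg` covered by the union of intervals in `others`."""
    start, end = seg
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in others if s < end and e > start
    )
    total, cursor = 0.0, start
    for s, e in clipped:
        if e > cursor:
            total += e - max(s, cursor)  # count only the not-yet-covered part
            cursor = e
    return total


def uncovered_fraction(seg, others):
    """Proportion of `seg` not covered by any interval in `others`."""
    return 1.0 - covered_length(seg, others) / (seg[1] - seg[0])


def binned_error_rates(query, reference):
    """Average uncovered fraction of `query` segments, per length bin.

    query=inferred, reference=true gives FP rates; swap them for FN rates.
    """
    sums = {b: [0.0, 0] for b in BINS}
    for pair, segs in query.items():
        ref = reference.get(pair, [])
        for seg in segs:
            length = seg[1] - seg[0]
            for lo, hi in BINS:
                if lo <= length < hi:
                    sums[(lo, hi)][0] += uncovered_fraction(seg, ref)
                    sums[(lo, hi)][1] += 1
                    break
    return {b: (t / n if n else None) for b, (t, n) in sums.items()}


inferred = {("a", "b"): [(0.0, 4.0)]}
truth = {("a", "b"): [(1.0, 5.0)]}
fp_rates = binned_error_rates(inferred, truth)  # query = inferred set
fn_rates = binned_error_rates(truth, inferred)  # query = truth set
```

In the toy input, the 4 cM inferred segment has 1 cM outside the true segment and the 4 cM true segment has 1 cM outside the inferred one, so both the false positive and the false negative rate in the [4, 6) cM bin equal 0.25.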

Genome-wide FP/FN rates per pair and their averages across all genome pairs were calculated to capture genome-wide bias. The per-pair genome-wide false positive rate is the ratio of two sums: (1) the numerator sum is the total length of parts of all inferred IBD segments of a certain genome pair that are not covered by any true IBD segments of the same genome pair; (2) the denominator sum is the total length of all inferred IBD segments of a certain genome pair. The per-pair genome-wide false negative rate is defined in a similar way as the percentage of pairwise total true IBD that is not overlapped by any inferred segments of the same pair. We then obtain the aggregate metrics by averaging these rates for all genome pairs.
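
The genome-wide rates can be sketched in the same spirit. The code below is a self-contained illustration with hypothetical function names; it assumes that segments of the same pair do not overlap within a set, and it skips pairs whose denominator is zero, which is an assumption rather than a documented choice of the pipeline.

```python
def union_length(intervals):
    """Total length of the union of (start, end) intervals."""
    total, cursor = 0.0, float("-inf")
    for s, e in sorted(intervals):
        if e > cursor:
            total += e - max(s, cursor)
            cursor = e
    return total


def overlap_length(segs, others):
    """Total length of the parts of `segs` covered by any interval in `others`."""
    covered = 0.0
    for s, e in segs:
        clipped = [(max(s, a), min(e, b)) for a, b in others if a < e and b > s]
        covered += union_length(clipped)
    return covered


def mean_or_none(values):
    return sum(values) / len(values) if values else None


def genome_wide_rates(inferred, truth):
    """Return (mean FP rate, mean FN rate) averaged over genome pairs."""
    fp, fn = [], []
    for pair in set(inferred) | set(truth):
        inf, tru = inferred.get(pair, []), truth.get(pair, [])
        inf_total = sum(e - s for s, e in inf)  # pairwise total inferred IBD
        tru_total = sum(e - s for s, e in tru)  # pairwise total true IBD
        if inf_total > 0:
            fp.append(1.0 - overlap_length(inf, tru) / inf_total)
        if tru_total > 0:
            fn.append(1.0 - overlap_length(tru, inf) / tru_total)
    return mean_or_none(fp), mean_or_none(fn)


inferred = {("a", "b"): [(0.0, 4.0), (10.0, 12.0)]}
truth = {("a", "b"): [(1.0, 5.0)]}
fp_rate, fn_rate = genome_wide_rates(inferred, truth)
```

Here 3 of the 6 cM of inferred IBD are supported by true IBD (false positive rate 0.5), and 3 of the 4 cM of true IBD are recovered (false negative rate 0.25).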

Pairwise total IBD was calculated for both the truth set and the inferred set because it is a useful metric for estimating genetic relatedness and building IBD sharing networks. It was calculated as the sum of the lengths of all inferred or true IBD segments of each genome pair.

Given that the N_e estimator IBDNe internally utilizes quantities of population-level total IBD of different length bins, these quantities were calculated here to better examine IBD accuracy for N_e inference. We defined non-overlapping length bins of 0.05 cM width that cover all possible lengths. For each length bin, a population total IBD was defined as the sum of the length of all IBD segments with segment lengths falling into this bin from any genome pair.
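
A minimal sketch of the binning is shown below, assuming the segment lengths from all genome pairs have been pooled into one list. The function name is hypothetical, and bins are keyed by an integer index (bin `k` covers lengths from `k * 0.05` up to, but not including, `(k + 1) * 0.05` cM).

```python
import math
from collections import defaultdict


def population_total_ibd(segment_lengths_cm, bin_width=0.05):
    """Sum of IBD segment lengths per non-overlapping length bin."""
    totals = defaultdict(float)
    for length in segment_lengths_cm:
        # Lengths lying exactly on a bin edge are subject to floating-point
        # rounding; a production implementation may prefer integer units.
        idx = math.floor(length / bin_width)
        totals[idx] += length
    return dict(totals)


pooled_lengths = [2.02, 2.03, 3.51]  # cM, pooled over all genome pairs
totals = population_total_ibd(pooled_lengths)
```

Comparing these per-bin totals between the truth set and an inferred set shows which length ranges are under- or over-represented in the IBDNe input.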

To expedite IBD segment-level analysis and alleviate computational burdens, we developed an open-source tool, ishare/ibdutils (available at https://github.com/bguo068/ishare, copy archived at Guo, 2025b), which harnesses algorithms such as interval trees to efficiently calculate metrics like those described above.

IBD caller parameter optimization

We optimized key IBD caller-specific parameters by iterating each parameter over the list of discrete values, or two or more parameters over a grid of discrete values. Many optimized parameters were related to marker density for IBD callers hap-IBD, phased IBD, Refined IBD, and isoRelate. Other parameters were searched to see if they potentially have a great impact on the quality of the detected IBD. The optimal values for explored parameters are determined by the length-bin-specific or genome-wide error rates (FN and FP) for detected IBD segments as defined above. The parameter value or combination of values that generates lower and generally balanced error rates was selected as optimal values. The parameters searched, the value lists explored, and the optimal values selected are summarized in Supplementary file 2—Tables S1 and S2. When the optimized values vary across different demographic models, we used the ones optimized for the single-population model for downstream analyses. We provided detailed simulation and IBD calling parameter values in Supplementary file 1—Data S1 and heatmaps of error rates for all demographic models tested in Supplementary file 2—Table S2.
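
The grid search can be outlined as follows. This is a hedged sketch rather than the pipeline used in this study: `evaluate` stands in for running an IBD caller and computing its error rates, the parameter names `a` and `b` are placeholders rather than real caller options, and the scoring rule (sum of the two error rates plus their absolute difference) is only one assumed way to express "lower and generally balanced" error rates.

```python
from itertools import product


def balanced_score(fn_rate, fp_rate):
    """Assumed scoring rule: penalize high and unbalanced error rates."""
    return fn_rate + fp_rate + abs(fn_rate - fp_rate)


def grid_search(param_grid, evaluate, score=balanced_score):
    """Try every combination in `param_grid`; return the best one.

    `evaluate(params)` must return a (fn_rate, fp_rate) tuple.
    """
    best = None
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        fn_rate, fp_rate = evaluate(params)
        current = score(fn_rate, fp_rate)
        if best is None or current < best[0]:
            best = (current, params, (fn_rate, fp_rate))
    return best[1], best[2]


def toy_evaluate(params):
    """Stand-in for calling IBD and measuring error rates (not a real caller)."""
    fn_rate = 0.05 + 0.1 * abs(params["a"] - 2)
    fp_rate = 0.05 + 0.01 * abs(params["b"] - 10)
    return fn_rate, fp_rate


best_params, best_rates = grid_search({"a": [1, 2, 3], "b": [5, 10, 20]}, toy_evaluate)
```

Passing the scoring rule as an argument makes it straightforward to repeat the search with length-bin-specific or genome-wide error rates.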

Benchmarking via IBD-based downstream analyses

At a higher level, we benchmarked different IBD callers by comparing downstream estimates based on true IBD sets versus inferred IBD sets. These IBD-based estimates include positive selection scans, N_e estimates, and population structure inference via the community detection algorithm InfoMap.

We scanned for positive selection signals using the IBD-based thresholding method followed by validation with integrated haplotype score-based statistics X_iHS as previously described (Guo et al., 2024).

We inferred the trajectory of effective population size, N_e, for the last 100 generations using IBDNe. As this method takes IBD shared by diploid individuals as input, we converted each haploid genome to a pseudo-homozygous diploid individual. We inferred the N_e trajectory using mostly default parameters, except that the minregion parameter was set to 10 cM to allow the inclusion of short contigs in the analysis. The final estimates were scaled by 0.25 to compensate for the haploid-to-diploid conversion. By default, we set the mincm parameter to 2 so that only IBD segments ≥ 2 cM were used for inferring the N_e trajectory; as indicated in the Results section, we also set mincm to 4 to test whether excluding short IBD segments improves the accuracy of N_e estimates. As the IBDNe algorithm does not work well when IBD segments shared by close relatives are included in the input, we followed the procedure described in the original work (Browning and Browning, 2015). For simulated data, we used the TMRCA information of the true IBD segments to filter the IBD segments before running IBDNe. For true IBD segments (generated by tskibd), we excluded IBD segments with TMRCA < 1.5; for inferred IBD segments (called by hap-IBD, hmmIBD, isoRelate, Refined IBD, or phased IBD), we removed segments that overlap with any true IBD segment with TMRCA < 1.5 shared by the same genome pair. For empirical data, where true IBD and TMRCA are not available, we pruned highly related isolates by iteratively removing the genome with the highest number of close relatives, defined by pairwise total IBD > 0.5 of the genome size, until no close relatives remained in the subgroup (Guo et al., 2024).
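The iterative pruning of close relatives used for empirical data can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline code: the function name, the input layout (a dict keyed by genome pairs), the tie-breaking rule, and the toy genome size are all hypothetical.

```python
def prune_close_relatives(pairwise_total_ibd, genome_size_cm, min_fraction=0.5):
    """Iteratively drop the genome with the most close relatives.

    pairwise_total_ibd: {(genome_a, genome_b): total IBD in cM}
    A pair counts as close relatives when its total IBD exceeds
    min_fraction * genome_size_cm. Ties are broken by genome name, which is
    an arbitrary choice made here for determinism.
    Returns (kept genomes, removed genomes in order of removal).
    """
    relatives = {}
    for (a, b), total_cm in pairwise_total_ibd.items():
        relatives.setdefault(a, set())
        relatives.setdefault(b, set())
        if total_cm > min_fraction * genome_size_cm:
            relatives[a].add(b)
            relatives[b].add(a)

    removed = []
    while relatives:
        worst = max(relatives, key=lambda g: (len(relatives[g]), g))
        if not relatives[worst]:
            break  # no close relatives remain
        for other in relatives.pop(worst):
            relatives[other].discard(worst)
        removed.append(worst)
    return sorted(relatives), removed


# Toy example with a hypothetical genome size of 1000 cM.
toy_pairs = {("A", "B"): 600.0, ("A", "C"): 700.0, ("B", "C"): 100.0, ("C", "D"): 50.0}
kept, removed = prune_close_relatives(toy_pairs, genome_size_cm=1000.0)
print(kept, removed)
```

Removing the most connected genome first tends to retain more samples than dropping one member of every related pair.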

For population structure inference, we first built the pairwise total IBD matrix, each element being the total IBD for a pair of genomes. The matrix was then squared and used as a weighted adjacency matrix to construct an IBD-sharing network. We then ran the InfoMap algorithm to infer community membership. Genomes assigned the same membership were inferred to be of the same subpopulation. We calculated an adjusted Rand index using the igraph-python package (Csardi and Nepusz, 2006) to analyze the agreement between true population labels and inferred community labels. For empirical data, we excluded IBD segments shorter than 4 cM when calculating the total IBD matrix to help reduce noise due to false positives and set each element with a value < 5 to zero in the unsquared IBD matrix to decrease the density of the IBD matrix.
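Two steps of the paragraph above can be sketched in plain Python: building the network weights from the total IBD matrix, and scoring agreement with the adjusted Rand index. The community detection step (InfoMap) is omitted. The function names are hypothetical, the squaring is assumed to be elementwise, and the index is computed from its standard definition rather than through the igraph call used in the study.

```python
from collections import Counter
from math import comb


def ibd_network_weights(total_ibd, min_total=0.0):
    """Zero out entries below min_total, then square each element.

    Elementwise squaring is an assumption of this sketch; the text only says
    the matrix was squared.
    """
    return [[(x if x >= min_total else 0.0) ** 2 for x in row] for row in total_ibd]


def adjusted_rand_index(true_labels, inferred_labels):
    """Adjusted Rand index between two labelings of the same genomes."""
    n = len(true_labels)
    if n < 2:
        return 1.0
    cells = Counter(zip(true_labels, inferred_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_inferred = sum(comb(c, 2) for c in Counter(inferred_labels).values())
    expected = sum_true * sum_inferred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_inferred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Toy total IBD matrix for three genomes (hypothetical values).
toy_matrix = [[0.0, 6.0, 3.0], [6.0, 0.0, 10.0], [3.0, 10.0, 0.0]]
print(ibd_network_weights(toy_matrix, min_total=5.0))
print(adjusted_rand_index([0, 0, 1, 1], ["x", "x", "y", "y"]))
```

The index is invariant to how communities are labeled, so inferred community identifiers do not need to be matched to population names before scoring.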

Before all high-level benchmarking analyses, we pruned highly related samples as mentioned above. As IBD-based estimates can be biased by strong positive selection in empirical data, we removed high IBD-sharing peaks with peak impact index > 0.01 as previously described (Guo et al., 2024) before any downstream analyses.

Processing empirical data sets

We constructed empirical data sets for validation using genotype data from whole-genome sequencing samples from the MalariaGEN Pf7 database (Abdel Hamid et al., 2023). We used malariagen_data 7.13 to download high-quality monoclonal samples that pass quality control. Monoclonal samples were determined by F_ws > 0.95 (F_ws table available from the MalariaGEN website). Quality control labels were extracted directly from the metadata (provided in the malariagen_data package).

We then generated haploid genomes (phased genotype data) using the dominant allele from genotype calls. The dominant allele of each genotype call was determined by the per-sample allele depth (AD) fields. For each sample and site, the allele supported by 90% of total reads (total AD values) in a genotype call with at least 5 total reads was used as the dominant allele. Genotype calls without dominant alleles were marked as missing; those with dominant alleles were replaced with a phased genotype homozygous for the dominant allele. The genotype data were further filtered by sample missingness and SNP minor allele frequency and missingness. The resulting genotype data had per-SNP and per-sample missingness < 0.1 and minor allele frequency ≥ 0.01. Genotypes based on dominant alleles were further imputed without a panel using Beagle 5.1 (Browning et al., 2018). These processing steps generated phased, imputed, pseudo-homozygous diploid genotype data, ready for IBD detection.
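The dominant-allele rule above can be sketched for a single genotype call as follows. The function names are hypothetical, the 90% threshold is interpreted here as "at least 90%" (an assumption), and the `.|.` string is simply the usual VCF way of writing a missing phased genotype.

```python
def dominant_allele(allele_depths, min_total_reads=5, min_fraction=0.9):
    """Return the index of the dominant allele, or None if there is none.

    allele_depths: per-allele read depths for one sample at one site
    (the AD field). A call needs at least min_total_reads reads in total,
    and the dominant allele must carry at least min_fraction of them.
    """
    total = sum(allele_depths)
    if total < min_total_reads:
        return None
    best = max(range(len(allele_depths)), key=allele_depths.__getitem__)
    if allele_depths[best] >= min_fraction * total:
        return best
    return None


def to_pseudo_homozygous_gt(allele_depths):
    """Phased genotype homozygous for the dominant allele, or missing."""
    allele = dominant_allele(allele_depths)
    return ".|." if allele is None else f"{allele}|{allele}"


# Toy AD values: clear reference call, clear alternate call,
# mixed call, and a call with too few reads.
for ad in ([19, 1], [1, 19], [6, 4], [3, 1]):
    print(ad, to_pseudo_homozygous_gt(ad))
```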

We constructed different data sets, including two ‘single’ population data sets and a ‘structured, multiple’ population data set, by subsampling the above haploid genomes according to sampling time and location. For each data set, we set the time window to 2–3 years to reduce sampling-time heterogeneity, then shifted the window across all possible sampling years and chose the position that maximized the sample size. For the ‘single’ population data sets, we further restricted the sample locations to a relatively small geographic region, such as eastern Southeast Asia, because these data sets were used for N_e estimation, which assumes a homogeneous population. For the ‘multiple’ population data set, we included samples from different continental or subcontinental regions using the ‘Population’ labels from the meta-information table provided with the MalariaGEN Pf7 database. To balance the sample size across populations, we set a maximum of 300 samples per population: populations with more than 300 samples were subsampled to 300, and populations with fewer than 100 samples were excluded from the multiple-population data set. The sampling locations and times are summarized in Supplementary file 2—Tables S3, S4 and S5.
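The window selection and population balancing described above can be sketched as follows. The function names, the input layouts, and the random seed are assumptions made for illustration; the actual selection code is in the bmibdcaller_empirical repository.

```python
import random
from collections import Counter


def best_time_window(sample_years, window_years):
    """Slide a fixed-width window over sampling years and keep the fullest one.

    Returns (first_year, last_year, n_samples); the earliest window wins ties.
    """
    counts = Counter(sample_years)
    first, last = min(counts), max(counts)
    best_start, best_n = first, -1
    for start in range(first, max(first, last - window_years + 1) + 1):
        n = sum(counts[year] for year in range(start, start + window_years))
        if n > best_n:
            best_start, best_n = start, n
    return best_start, best_start + window_years - 1, best_n


def balance_populations(samples_by_pop, min_size=100, max_size=300, seed=0):
    """Drop populations below min_size and subsample those above max_size."""
    rng = random.Random(seed)
    balanced = {}
    for pop, samples in samples_by_pop.items():
        if len(samples) < min_size:
            continue
        if len(samples) > max_size:
            samples = rng.sample(samples, max_size)
        balanced[pop] = list(samples)
    return balanced


# Toy sampling years (hypothetical values).
toy_years = [2007, 2008, 2008, 2009, 2009, 2009, 2012]
print(best_time_window(toy_years, window_years=3))
```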

Details about the preparation and analysis of the empirical data sets can be found at https://github.com/bguo068/bmibdcaller_empirical (copy archived at Guo, 2025c).

Measuring computational runtime and memory usage

The genomes were simulated with N₀ = 1000 and s = 0.0 under the single-population model. The runtime and maximum memory usage were measured using the GNU time 1.7 utility. To allow a more appropriate comparison, we ensured: (1) the time used for data pre- and post-processing was excluded; (2) memory resource allocation was capped at 30 gigabytes per IBD call; (3) input genotype data included only common SNPs with minor allele frequency ≥ 0.01; (4) the minimum reported IBD segment length was set to 2.0 cM.

Replications and uncertainty of measures

Simulations were replicated at two levels: (1) Replication of simulation sets: for each combination of simulation parameters, we performed n = 3 full sets of population simulations and sampled n = 1000 haploid genomes per population. (2) Replication across chromosomes: each of the 14 chromosomes in the genomes of a population was simulated independently, so the chromosomes serve as replicates of each other (see Supplementary file 1—Data S1 for detailed simulation parameters and replications at the simulation-set and chromosome levels).

Different analyses reported the uncertainty of measures at different levels of replication or units of observation (as mentioned in the corresponding figure legends and Supplementary file 2—Data S2). The level was chosen according to the uncertainty most relevant to each measure. (1) For the overlap-based analysis of IBD segment accuracy, the mean ± standard deviation (SD) was calculated at the segment level for IBD segment false positive and false negative rates for each length bin, or at the genome-pair level for IBD genome-wide error rates. (2) For IBD-based genetic relatedness, the uncertainty is directly visualized in scatter plots at the genome-pair level. (3) For IBD-based selection signal scans, the mean ± SD of the number of true selection signals (peaks) and false selection signals was calculated at the simulation set level (n = 3 full simulation sets). (4) For IBD network community detection, the mean ± SD of the adjusted Rand index was reported at the simulation set level (n = 3). (5) For IBD-based N_e estimates, bootstrap confidence intervals were obtained directly from IBDNe using a single simulation set. (6) For computational runtime and memory usage, the mean ± SD was calculated across chromosomes from the same simulation sets.

Given our large sample size of 1000 haploid genomes, the uncertainty reported at the simulation set level is relatively small and can be measured with a limited number of replications. Additionally, full sets of simulation replications were computationally intensive. Therefore, we opted to run n = 3 full simulation sets when it was necessary to measure uncertainty at the simulation set level. For measures for which uncertainty was reported at the segment level or genome-pair levels, only results from a representative simulation set were reported if the results were consistent across n = 3 simulation sets.

Code availability

Custom tools and scripts are provided in the following GitHub repositories: (1) bmibdcaller_simulations: a Nextflow pipeline to benchmark different IBD detection methods and optimize IBD caller-specific parameters by simulating Plasmodium falciparum-like genomes and using true IBD (https://github.com/bguo068/bmibdcaller_simulations, v0.1.0, copy archived at Guo, 2025a). (2) bmibdcaller_empirical: a Nextflow pipeline to benchmark IBD callers with empirical data by comparing IBD-based estimates with expected patterns (https://github.com/bguo068/bmibdcaller_empirical, v0.1.0, copy archived at Guo, 2025c). (3) ishare/ibdutils: a Rust crate and command-line tools designed to facilitate the analysis of rare-variant sharing and identity-by-descent (IBD) sharing, used here mainly for fast IBD segment overlap analysis via the command-line tool ibdutils (https://github.com/bguo068/ishare, v0.1.11, copy archived at Guo, 2025b).

Data availability

All empirical data used are publicly available from MalariaGEN Pf7 (https://www.malariagen.net/resource/34/; Abdel Hamid et al., 2023).

References

1. Abdel Hamid MM
2. Abdelraheem MH
3. Acheampong DO
4. Ahouidi A
5. Ali M
6. Almagro-Garcia J
7. Amambua-Ngwa A
8. Amaratunga C
9. Amenga-Etego L
10. Andagalu B
11. Anderson T
12. Andrianaranjaka V
13. Aniebo I
14. Aninagyei E
15. Ansah F
16. Ansah PO
17. Apinjoh T
18. Arnaldo P
19. Ashley E
20. Auburn S
21. Awandare GA
22. Ba H
23. Baraka V
24. Barry A
25. Bejon P
26. Bertin GI
27. Boni MF
28. Borrmann S
29. Bousema T
30. Bouyou-Akotet M
31. Branch O
32. Bull PC
33. Cheah H
34. Chindavongsa K
35. Chookajorn T
36. Chotivanich K
37. Claessens A
38. Conway DJ
39. Corredor V
40. Courtier E
41. Craig A
42. D’Alessandro U
43. Dama S
44. Day N
45. Denis B
46. Dhorda M
47. Diakite M
48. Djimde A
49. Dolecek C
50. Dondorp A
51. Doumbia S
52. Drakeley C
53. Drury E
54. Duffy P
55. Echeverry DF
56. Egwang TG
57. Enosse SMM
58. Erko B
59. Fairhurst RM
60. Faiz A
61. Fanello CA
62. Fleharty M
63. Forbes M
64. Fukuda M
65. Gamboa D
66. Ghansah A
67. Golassa L
68. Goncalves S
69. Harrison GLA
70. Healy SA
71. Hendry JA
72. Hernandez-Koutoucheva A
73. Hien TT
74. Hill CA
75. Hombhanje F
76. Hott A
77. Htut Y
78. Hussein M
79. Imwong M
80. Ishengoma D
81. Jackson SA
82. Jacob CG
83. Jeans J
84. Johnson KJ
85. Kamaliddin C
86. Kamau E
87. Keatley J
88. Kochakarn T
89. Konate DS
90. Konaté A
91. Kone A
92. Kwiatkowski DP
93. Kyaw MP
94. Kyle D
95. Lawniczak M
96. Lee SK
97. Lemnge M
98. Lim P
99. Lon C
100. Loua KM
101. Mandara CI
102. Marfurt J
103. Marsh K
104. Maude RJ
105. Mayxay M
106. Maïga-Ascofaré O
107. Miotto O
108. Mita T
109. Mobegi V
110. Mohamed AO
111. Mokuolu OA
112. Montgomery J
113. Morang’a CM
114. Mueller I
115. Murie K
116. Newton PN
117. Ngo Duc T
118. Nguyen T
119. Nguyen T-N
120. Nguyen Thi Kim T
121. Nguyen Van H
122. Noedl H
123. Nosten F
124. Noviyanti R
125. Ntui VN-N
126. Nzila A
127. Ochola-Oyier LI
128. Ocholla H
129. Oduro A
130. Omedo I
131. Onyamboko MA
132. Ouedraogo J-B
133. Oyebola K
134. Oyibo WA
135. Pearson R
136. Peshu N
137. Phyo AP
138. Plowe CV
139. Price RN
140. Pukrittayakamee S
141. Quang HH
142. Randrianarivelojosia M
143. Rayner JC
144. Ringwald P
145. Rosanas-Urgell A
146. Rovira-Vallbona E
147. Ruano-Rubio V
148. Ruiz L
149. Saunders D
150. Shayo A
151. Siba P
152. Simpson VJ
153. Sissoko MS
154. Smith C
155. Su X-Z
156. Sutherland C
157. Takala-Harrison S
158. Talman A
159. Tavul L
160. Thanh NV
161. Thathy V
162. Thu AM
163. Toure M
164. Tshefu A
165. Verra F
166. Vinetz J
167. Wellems TE
168. Wendler J
169. White NJ
170. Whitton G
171. Yavo W
172. van der Pluijm RW
173. MalariaGEN
(2023) Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples
Wellcome Open Research 8:22.

https://doi.org/10.12688/wellcomeopenres.18681.1
1. Ahouidi A
2. Ali M
3. Almagro-Garcia J
4. Amambua-Ngwa A
5. Amaratunga C
6. Amato R
7. Amenga-Etego L
8. Andagalu B
9. Anderson TJC
10. Andrianaranjaka V
11. Apinjoh T
12. Ariani C
13. Ashley EA
14. Auburn S
15. Awandare GA
16. Ba H
17. Baraka V
18. Barry AE
19. Bejon P
20. Bertin GI
21. Boni MF
22. Borrmann S
23. Bousema T
24. Branch O
25. Bull PC
26. Busby GBJ
27. Chookajorn T
28. Chotivanich K
29. Claessens A
30. Conway D
31. Craig A
32. D’Alessandro U
33. Dama S
34. Day NPJ
35. Denis B
36. Diakite M
37. Djimdé A
38. Dolecek C
39. Dondorp AM
40. Drakeley C
41. Drury E
42. Duffy P
43. Echeverry DF
44. Egwang TG
45. Erko B
46. Fairhurst RM
47. Faiz A
48. Fanello CA
49. Fukuda MM
50. Gamboa D
51. Ghansah A
52. Golassa L
53. Goncalves S
54. Hamilton WL
55. Harrison GLA
56. Hart L
57. Henrichs C
58. Hien TT
59. Hill CA
60. Hodgson A
61. Hubbart C
62. Imwong M
63. Ishengoma DS
64. Jackson SA
65. Jacob CG
66. Jeffery B
67. Jeffreys AE
68. Johnson KJ
69. Jyothi D
70. Kamaliddin C
71. Kamau E
72. Kekre M
73. Kluczynski K
74. Kochakarn T
75. Konaté A
76. Kwiatkowski DP
77. Kyaw MP
78. Lim P
79. Lon C
80. Loua KM
81. Maïga-Ascofaré O
82. Malangone C
83. Manske M
84. Marfurt J
85. Marsh K
86. Mayxay M
87. Miles A
88. Miotto O
89. Mobegi V
90. Mokuolu OA
91. Montgomery J
92. Mueller I
93. Newton PN
94. Nguyen T
95. Nguyen T-N
96. Noedl H
97. Nosten F
98. Noviyanti R
99. Nzila A
100. Ochola-Oyier LI
101. Ocholla H
102. Oduro A
103. Omedo I
104. Onyamboko MA
105. Ouedraogo J-B
106. Oyebola K
107. Pearson RD
108. Peshu N
109. Phyo AP
110. Plowe CV
111. Price RN
112. Pukrittayakamee S
113. Randrianarivelojosia M
114. Rayner JC
115. Ringwald P
116. Rockett KA
117. Rowlands K
118. Ruiz L
119. Saunders D
120. Shayo A
121. Siba P
122. Simpson VJ
123. Stalker J
124. Su X
125. Sutherland C
126. Takala-Harrison S
127. Tavul L
128. Thathy V
129. Tshefu A
130. Verra F
131. Vinetz J
132. Wellems TE
133. Wendler J
134. White NJ
135. Wright I
136. Yavo W
137. Ye H
138. MalariaGEN
(2021) An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples
Wellcome Open Research 6:42.

https://doi.org/10.12688/wellcomeopenres.16168.2
(2019) Estimating recent migration and population-size surfaces
PLOS Genetics 15:e1007908.

https://doi.org/10.1371/journal.pgen.1007908
(2019) Major subpopulations of Plasmodium falciparum in sub-Saharan Africa
Science 365:813–816.

https://doi.org/10.1126/science.aav5427
1. Amir A
2. Cheong FW
3. de Silva JR
4. Liew JWK
5. Lau YL
(2018) Plasmodium knowlesi malaria: current research perspectives
Infection and Drug Resistance 11:1145–1155.

https://doi.org/10.2147/IDR.S148664
(2000) Do malaria parasites mate non-randomly in the mosquito midgut?
Genetical Research 75:285–296.

https://doi.org/10.1017/s0016672300004481
(2008) Meiotic recombination at the ends of chromosomes in Saccharomyces cerevisiae
Genetics 179:1221–1235.

https://doi.org/10.1534/genetics.107.083493
1. Baumdicker F
2. Bisschop G
3. Goldstein D
4. Gower G
5. Ragsdale AP
6. Tsambos G
7. Zhu S
8. Eldon B
9. Ellerman EC
10. Galloway JG
11. Gladstein AL
12. Gorjanc G
13. Guo B
14. Jeffery B
15. Kretzschumar WW
16. Lohse K
17. Matschiner M
18. Nelson D
19. Pope NS
20. Quinto-Cortés CD
21. Rodrigues MF
22. Saunack K
23. Sellinger T
24. Thornton K
25. van Kemenade H
26. Wohns AW
27. Wong Y
28. Gravel S
29. Kern AD
30. Koskela J
31. Ralph PL
32. Kelleher J
(2022) Efficient ancestry and mutation simulation with msprime 1.0
Genetics 220:iyab229.

https://doi.org/10.1093/genetics/iyab229
1. Bopp SER
2. Manary MJ
3. Bright AT
4. Johnston GL
5. Dharia NV
6. Luna FL
7. McCormack S
8. Plouffe D
9. McNamara CW
10. Walker JR
11. Fidock DA
12. Denchi EL
13. Winzeler EA
(2013) Mitotic evolution of Plasmodium falciparum shows a stable core genome but recombination in antigen families
PLOS Genetics 9:e1003293.

https://doi.org/10.1371/journal.pgen.1003293
1. Bright AT
2. Manary MJ
3. Tewhey R
4. Arango EM
5. Wang T
6. Schork NJ
7. Yanow SK
8. Winzeler EA
(2014) A high resolution case study of a patient with recurrent Plasmodium vivax infections shows that relapses were caused by meiotic siblings
PLOS Neglected Tropical Diseases 8:e2882.

https://doi.org/10.1371/journal.pntd.0002882
1. Brooks DR
2. Wang P
3. Read M
4. Watkins WM
5. Sims PFG
6. Hyde JE
(1994) Sequence variation of the hydroxymethyldihydropterin pyrophosphokinase: dihydropteroate synthase gene in lines of the human malaria parasite, Plasmodium falciparum, with differing resistance to sulfadoxine
European Journal of Biochemistry 224:397–405.

https://doi.org/10.1111/j.1432-1033.1994.00397.x
1. Browning BL
2. Browning SR
(2011) A fast, powerful method for detecting identity by descent
American Journal of Human Genetics 88:173–182.

https://doi.org/10.1016/j.ajhg.2011.01.010
1. Browning SR
2. Browning BL
(2012) Identity by descent between distant relatives: detection and applications
Annual Review of Genetics 46:617–633.

https://doi.org/10.1146/annurev-genet-110711-155534
1. Browning BL
2. Browning SR
(2013) Improving the accuracy and efficiency of identity-by-descent detection in population data
Genetics 194:459–471.

https://doi.org/10.1534/genetics.113.150029
1. Browning SR
2. Browning BL
(2015) Accurate non-parametric estimation of recent effective population size from segments of identity by descent
American Journal of Human Genetics 97:404–418.

https://doi.org/10.1016/j.ajhg.2015.07.012
(2018) A one-penny imputed genome from next-generation reference panels
American Journal of Human Genetics 103:338–348.

https://doi.org/10.1016/j.ajhg.2018.07.015
1. Browning SR
2. Browning BL
(2020) Probabilistic estimation of identity by descent segment endpoints and detection of recent selection
American Journal of Human Genetics 107:895–910.

https://doi.org/10.1016/j.ajhg.2020.09.010
1. Campbell CD
2. Chong JX
3. Malig M
4. Ko A
5. Dumont BL
6. Han L
7. Vives L
8. O’Roak BJ
9. Sudmant PH
10. Shendure J
11. Abney M
12. Ober C
13. Eichler EE
(2012) Estimating the human mutation rate using autozygosity in a founder population
Nature Genetics 44:1277–1281.

https://doi.org/10.1038/ng.2418
(2023) Measurably recombining malaria parasites
Trends in Parasitology 39:17–25.

https://doi.org/10.1016/j.pt.2022.11.002
(2014) Public health: Measuring the path toward malaria elimination
Science 344:1230–1232.

https://doi.org/10.1126/science.1251449
1. Conway DJ
2. Roper C
3. Oduola AMJ
4. Arnot DE
5. Kremsner PG
6. Grobusch MP
7. Curtis CF
8. Greenwood BM
(1999) High recombination rate in natural populations of Plasmodium falciparum
PNAS 96:4506–4511.

https://doi.org/10.1073/pnas.96.8.4506
1. Csardi G
2. Nepusz T
(2006)
The igraph software package for complex network research

InterJournal, Complex Systems 1695:1–9.
1. Early AM
2. Camponovo F
3. Pelleau S
4. Cerqueira GC
5. Lazrek Y
6. Volney B
7. Carrasquilla M
8. de Thoisy B
9. Buckee CO
10. Childs LM
11. Musset L
12. Neafsey DE
(2022) Declines in prevalence alter the optimal level of sexual investment for the malaria parasite Plasmodium falciparum
PNAS 119:e2122165119.

https://doi.org/10.1073/pnas.2122165119
(2004) Assessing the effect of natural selection in malaria parasites
Trends in Parasitology 20:388–395.

https://doi.org/10.1016/j.pt.2004.06.002
(2021) Fast and robust identity-by-descent inference with the templated positional burrows-wheeler transform
Molecular Biology and Evolution 38:2131–2151.

https://doi.org/10.1093/molbev/msaa328
1. Gardner MJ
2. Hall N
3. Fung E
4. White O
5. Berriman M
6. Hyman RW
7. Carlton JM
8. Pain A
9. Nelson KE
10. Bowman S
11. Paulsen IT
12. James K
13. Eisen JA
14. Rutherford K
15. Salzberg SL
16. Craig A
17. Kyes S
18. Chan M-S
19. Nene V
20. Shallom SJ
21. Suh B
22. Peterson J
23. Angiuoli S
24. Pertea M
25. Allen J
26. Selengut J
27. Haft D
28. Mather MW
29. Vaidya AB
30. Martin DMA
31. Fairlamb AH
32. Fraunholz MJ
33. Roos DS
34. Ralph SA
35. McFadden GI
36. Cummings LM
37. Subramanian GM
38. Mungall C
39. Venter JC
40. Carucci DJ
41. Hoffman SL
42. Newbold C
43. Davis RW
44. Fraser CM
45. Barrell B
(2002) Genome sequence of the human malaria parasite Plasmodium falciparum
Nature 419:498–511.

https://doi.org/10.1038/nature01097
(2022) Dcifer: an IBD-based method to calculate genetic distance between polyclonal infections
Genetics 222:iyac126.

https://doi.org/10.1093/genetics/iyac126
1. Guo B
2. Borda V
3. Laboulaye R
4. Spring MD
5. Wojnarski M
6. Vesely BA
7. Silva JC
8. Waters NC
9. O’Connor TD
10. Takala-Harrison S
(2024) Strong positive selection biases identity-by-descent-based inferences of recent demography and population structure in Plasmodium falciparum
Nature Communications 15:2499.

https://doi.org/10.1038/s41467-024-46659-0
Software
1. Guo B
(2025a) bmibdcaller_simulations, version swh:1:rev:df4f1520030097b9c4b7217ae8f1561ae87b354d
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:c890bb60d5f8be0330cb8d7031eb030dd8a8eabe;origin=https://github.com/bguo068/bmibdcaller_simulations;visit=swh:1:snp:993410f6cca773a63504ea49a37c1035fe140f3e;anchor=swh:1:rev:df4f1520030097b9c4b7217ae8f1561ae87b354d
Software
1. Guo B
(2025b) ishare, version swh:1:rev:0a9f1add82d91213fa72965d314cb66f65e46733
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:b0ac7d0fe5022f94d87e2bca25a03efb3b6d0cb3;origin=https://github.com/bguo068/ishare;visit=swh:1:snp:0e2437b3dae7975cca3ebecd9a1159a2e3f47a05;anchor=swh:1:rev:0a9f1add82d91213fa72965d314cb66f65e46733
Software
1. Guo B
(2025c) bmibdcaller_empirical, version swh:1:rev:576990262b229544eeec0f54af4eb181a8a73659
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:d621ac8ee80da30c94dc7c52ceb24fe8a3abdbfb;origin=https://github.com/bguo068/bmibdcaller_empirical;visit=swh:1:snp:ac7c545a987b83eda8a72c77ccf62c5a8d4f7089;anchor=swh:1:rev:576990262b229544eeec0f54af4eb181a8a73659
(2025) Potential and pitfalls of using identity-by-descent for malaria genomic surveillance
Trends in Parasitology 41:387–400.

https://doi.org/10.1016/j.pt.2025.03.012
(2009) Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data
PLOS Genetics 5:e1000695.

https://doi.org/10.1371/journal.pgen.1000695
(2019) Tree-sequence recording in SLiM opens new horizons for forward-time simulation of whole genomes
Molecular Ecology Resources 19:552–566.

https://doi.org/10.1111/1755-0998.12968
1. Haller BC
2. Messer PW
(2019) SLiM 3: Forward genetic simulations beyond the wright-fisher model
Molecular Biology and Evolution 36:632–637.

https://doi.org/10.1093/molbev/msy228
(2017) Extreme mutation bias and high AT content in Plasmodium falciparum
Nucleic Acids Research 45:1889–1901.

https://doi.org/10.1093/nar/gkw1259
(2020) Evolutionary history of modern Samoans
PNAS 117:9458–9465.

https://doi.org/10.1073/pnas.1913157117
1. Henden L
2. Lee S
3. Mueller I
4. Barry A
5. Bahlo M
(2018) Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens
PLOS Genetics 14:e1007279.

https://doi.org/10.1371/journal.pgen.1007279
(2016) Quantitative, model-based estimates of variability in the generation and serial intervals of Plasmodium falciparum malaria
Malaria Journal 15:490.

https://doi.org/10.1186/s12936-016-1537-6
(2023) Population-based genomic study of Plasmodium vivax malaria in seven Brazilian states and across South America
The Lancet Regional Health - Americas 18:100420.

https://doi.org/10.1016/j.lana.2022.100420
1. Ishigohoka J
2. Liedvogel M
(2025) High-recombining genomic regions affect demography inference based on ancestral recombination graphs
Genetics 229:iyaf004.

https://doi.org/10.1101/2024.02.05.579015
1. Jiang H
2. Li N
3. Gopalan V
4. Zilversmit MM
5. Varma S
6. Nagarajan V
7. Li J
8. Mu J
9. Hayton K
10. Henschen B
11. Yi M
12. Stephens R
13. McVean G
14. Awadalla P
15. Wellems TE
16. Su X
(2011) High recombination rates and hotspots in a Plasmodium falciparum genetic cross
Genome Biology 12:R33.

https://doi.org/10.1186/gb-2011-12-4-r33
1. Joy DA
2. Feng X
3. Mu J
4. Furuya T
5. Chotivanich K
6. Krettli AU
7. Ho M
8. Wang A
9. White NJ
10. Suh E
11. Beerli P
12. Su X
(2003) Early origin and recent expansion of Plasmodium falciparum
Science 300:318–321.

https://doi.org/10.1126/science.1081449
1. Kelleher J
2. Wong Y
3. Wohns AW
4. Fadil C
5. Albers PK
6. McVean G
(2019) Inferring whole-genome histories in large population datasets
Nature Genetics 51:1330–1338.

https://doi.org/10.1038/s41588-019-0483-y
1. Kent CF
2. Minaei S
3. Harpur BA
4. Zayed A
(2012) Recombination is associated with the evolution of genome structure and worker behavior in honey bees
PNAS 109:18012–18017.

https://doi.org/10.1073/pnas.1208094109
(2010) The ABCs of multidrug resistance in malaria
Trends in Parasitology 26:440–446.

https://doi.org/10.1016/j.pt.2010.05.002
(2002) A high-resolution recombination map of the human genome
Nature Genetics 31:241–247.

https://doi.org/10.1038/ng917
1. Lee KS
2. Divis PCS
3. Zakaria SK
4. Matusop A
5. Julin RA
6. Conway DJ
7. Cox-Singh J
8. Singh B
(2011) Plasmodium knowlesi: reservoir hosts and tracking the emergence in humans and macaques
PLOS Pathogens 7:e1002015.

https://doi.org/10.1371/journal.ppat.1002015
1. Leroy T
2. Faux P
3. Basso B
4. Eynard S
5. Wragg D
6. Vignal A
(2024) Inferring long-term and short-term determinants of genetic diversity in honey bees: beekeeping impact and conservation strategies
Molecular Biology and Evolution 41:msae249.

https://doi.org/10.1093/molbev/msae249
1. Loy DE
2. Liu W
3. Li Y
4. Learn GH
5. Plenderleith LJ
6. Sundararaman SA
7. Sharp PM
8. Hahn BH
(2017) Out of Africa: origins and evolution of the human malaria parasites Plasmodium falciparum and Plasmodium vivax
International Journal for Parasitology 47:87–97.

https://doi.org/10.1016/j.ijpara.2016.05.008
1. Martin RE
2. Kirk K
(2004) The malaria parasite’s chloroquine resistance transporter is a member of the drug/metabolite transporter superfamily
Molecular Biology and Evolution 21:1938–1949.

https://doi.org/10.1093/molbev/msh205
1. McDew-White M
2. Li X
3. Nkhoma SC
4. Nair S
5. Cheeseman I
6. Anderson TJC
(2019) Mode and tempo of microsatellite length change in a malaria parasite mutation accumulation experiment
Genome Biology and Evolution 11:1971–1985.

https://doi.org/10.1093/gbe/evz140
1. Mehra S
2. Neafsey DE
3. White M
4. Taylor AR
(2025) Systematic bias in malaria parasite relatedness estimation
G3 15:jkaf018.

https://doi.org/10.1093/g3journal/jkaf018
1. Miles A
2. Iqbal Z
3. Vauterin P
4. Pearson R
5. Campino S
6. Theron M
7. Gould K
8. Mead D
9. Drury E
10. O’Brien J
11. Ruano Rubio V
12. MacInnis B
13. Mwangi J
14. Samarakoon U
15. Ranford-Cartwright L
16. Ferdig M
17. Hayton K
18. Su X-Z
19. Wellems T
20. Rayner J
21. McVean G
22. Kwiatkowski D
(2016) Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum
Genome Research 26:1288–1299.

https://doi.org/10.1101/gr.203711.115
1. Miotto O
2. Almagro-Garcia J
3. Manske M
4. Macinnis B
5. Campino S
6. Rockett KA
7. Amaratunga C
8. Lim P
9. Suon S
10. Sreng S
11. Anderson JM
12. Duong S
13. Nguon C
14. Chuor CM
15. Saunders D
16. Se Y
17. Lon C
18. Fukuda MM
19. Amenga-Etego L
20. Hodgson AVO
21. Asoala V
22. Imwong M
23. Takala-Harrison S
24. Nosten F
25. Su X-Z
26. Ringwald P
27. Ariey F
28. Dolecek C
29. Hien TT
30. Boni MF
31. Thai CQ
32. Amambua-Ngwa A
33. Conway DJ
34. Djimdé AA
35. Doumbo OK
36. Zongo I
37. Ouedraogo J-B
38. Alcock D
39. Drury E
40. Auburn S
41. Koch O
42. Sanders M
43. Hubbart C
44. Maslen G
45. Ruano-Rubio V
46. Jyothi D
47. Miles A
48. O’Brien J
49. Gamble C
50. Oyola SO
51. Rayner JC
52. Newbold CI
53. Berriman M
54. Spencer CCA
55. McVean G
56. Day NP
57. White NJ
58. Bethell D
59. Dondorp AM
60. Plowe CV
61. Fairhurst RM
62. Kwiatkowski DP
(2013) Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia
Nature Genetics 45:648–655.

https://doi.org/10.1038/ng.2624
1. Miotto O
2. Amato R
3. Ashley EA
4. MacInnis B
5. Almagro-Garcia J
6. Amaratunga C
7. Lim P
8. Mead D
9. Oyola SO
10. Dhorda M
11. Imwong M
12. Woodrow C
13. Manske M
14. Stalker J
15. Drury E
16. Campino S
17. Amenga-Etego L
18. Thanh TNN
19. Tran HT
20. Ringwald P
21. Bethell D
22. Nosten F
23. Phyo AP
24. Pukrittayakamee S
25. Chotivanich K
26. Chuor CM
27. Nguon C
28. Suon S
29. Sreng S
30. Newton PN
31. Mayxay M
32. Khanthavong M
33. Hongvanthong B
34. Htut Y
35. Han KT
36. Kyaw MP
37. Faiz MA
38. Fanello CI
39. Onyamboko M
40. Mokuolu OA
41. Jacob CG
42. Takala-Harrison S
43. Plowe CV
44. Day NP
45. Dondorp AM
46. Spencer CCA
47. McVean G
48. Fairhurst RM
49. White NJ
50. Kwiatkowski DP
(2015) Genetic architecture of artemisinin-resistant Plasmodium falciparum
Nature Genetics 47:226–234.

https://doi.org/10.1038/ng.3189
1. Morgan AP
2. Brazeau NF
3. Ngasala B
4. Mhamilawa LE
5. Denton M
6. Msellem M
7. Morris U
8. Filer DL
9. Aydemir O
10. Bailey JA
11. Parr JB
12. Mårtensson A
13. Bjorkman A
14. Juliano JJ
(2020) Falciparum malaria from coastal Tanzania and Zanzibar remains highly connected despite effective control efforts on the archipelago
Malaria Journal 19:47.

https://doi.org/10.1186/s12936-020-3137-8
1. Nait Saada J
2. Kalantzis G
3. Shyr D
4. Cooper F
5. Robinson M
6. Gusev A
7. Palamara PF
(2020) Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations
Nature Communications 11:6130.

https://doi.org/10.1038/s41467-020-19588-x
(2021) Advances and opportunities in malaria population genomics
Nature Reviews. Genetics 22:502–517.

https://doi.org/10.1038/s41576-021-00349-5
1. Peter J
2. De Chiara M
3. Friedrich A
4. Yue JX
5. Pflieger D
6. Bergström A
7. Sigwalt A
8. Barre B
9. Freel K
10. Llored A
11. Cruaud C
12. Labadie K
13. Aury JM
14. Istace B
15. Lebrigand K
16. Barbry P
17. Engelen S
18. Lemainque A
19. Wincker P
20. Liti G
21. Schacherer J
(2018) Genome evolution across 1,011 Saccharomyces cerevisiae isolates
Nature 556:339–344.

https://doi.org/10.1038/s41586-018-0030-5
(2009) The map equation
The European Physical Journal Special Topics 178:13–23.

https://doi.org/10.1140/epjst/e2010-01179-1
1. Schaffner SF
2. Taylor AR
3. Wong W
4. Wirth DF
5. Neafsey DE
(2018) hmmIBD: software to infer pairwise identity by descent between haploid genotypes
Malaria Journal 17:196.

https://doi.org/10.1186/s12936-018-2349-7
1. Schaffner SF
2. Badiane A
3. Khorgade A
4. Ndiop M
5. Gomis J
6. Wong W
7. Ndiaye YD
8. Diedhiou Y
9. Thwing J
10. Seck MC
11. Early A
12. Sy M
13. Deme A
14. Diallo MA
15. Sy N
16. Sene A
17. Ndiaye T
18. Sow D
19. Dieye B
20. Ndiaye IM
21. Gaye A
22. Ndiaye A
23. Battle KE
24. Proctor JL
25. Bever C
26. Fall FB
27. Diallo I
28. Gaye S
29. Sene D
30. Hartl DL
31. Wirth DF
32. MacInnis B
33. Ndiaye D
34. Volkman SK
(2023) Malaria surveillance reveals parasite relatedness, signatures of selection, and correlates of transmission across Senegal
Nature Communications 14:7268.

https://doi.org/10.1038/s41467-023-43087-4
1. Shemirani R
2. Belbin GM
3. Avery CL
4. Kenny EE
5. Gignoux CR
6. Ambite JL
(2021) Rapid detection of identity-by-descent tracts for mega-scale datasets
Nature Communications 12:3546.

https://doi.org/10.1038/s41467-021-22910-w
(2019) Genomic structure and diversity of Plasmodium falciparum in Southeast Asia reveal recent parasite migration patterns
Nature Communications 10:2665.

https://doi.org/10.1038/s41467-019-10121-3
(2014) Evolution and genetic diversity of Theileria
Infection, Genetics and Evolution 27:250–263.

https://doi.org/10.1016/j.meegid.2014.07.013
1. Smukowski CS
2. Noor MAF
(2011) Recombination rate variation in closely related species
Heredity 107:496–508.

https://doi.org/10.1038/hdy.2011.44
1. Speidel L
2. Forest M
3. Shi S
4. Myers SR
(2019) A method for genome-wide genealogy estimation for thousands of samples
Nature Genetics 51:1321–1329.

https://doi.org/10.1038/s41588-019-0484-x
(2017) Variation in recombination frequency and distribution across eukaryotes: patterns and processes
Philosophical Transactions of the Royal Society B 372:20160455.

https://doi.org/10.1098/rstb.2016.0455
1. Su XZ
2. Ferdig MT
3. Huang Y
4. Huynh CQ
5. Liu A
6. You J
7. Wootton JC
8. Wellems TE
(1999) A genetic map and recombination parameters of the human malaria parasite Plasmodium falciparum
Science 286:1351–1353.

https://doi.org/10.1126/science.286.5443.1351
1. Taliun D
2. Harris DN
3. Kessler MD
4. Carlson J
5. Szpiech ZA
6. Torres R
7. Taliun SAG
8. Corvelo A
9. Gogarten SM
10. Kang HM
11. Pitsillides AN
12. LeFaive J
13. Lee S-B
14. Tian X
15. Browning BL
16. Das S
17. Emde A-K
18. Clarke WE
19. Loesch DP
20. Shetty AC
21. Blackwell TW
22. Smith AV
23. Wong Q
24. Liu X
25. Conomos MP
26. Bobo DM
27. Aguet F
28. Albert C
29. Alonso A
30. Ardlie KG
31. Arking DE
32. Aslibekyan S
33. Auer PL
34. Barnard J
35. Barr RG
36. Barwick L
37. Becker LC
38. Beer RL
39. Benjamin EJ
40. Bielak LF
41. Blangero J
42. Boehnke M
43. Bowden DW
44. Brody JA
45. Burchard EG
46. Cade BE
47. Casella JF
48. Chalazan B
49. Chasman DI
50. Chen Y-DI
51. Cho MH
52. Choi SH
53. Chung MK
54. Clish CB
55. Correa A
56. Curran JE
57. Custer B
58. Darbar D
59. Daya M
60. de Andrade M
61. DeMeo DL
62. Dutcher SK
63. Ellinor PT
64. Emery LS
65. Eng C
66. Fatkin D
67. Fingerlin T
68. Forer L
69. Fornage M
70. Franceschini N
71. Fuchsberger C
72. Fullerton SM
73. Germer S
74. Gladwin MT
75. Gottlieb DJ
76. Guo X
77. Hall ME
78. He J
79. Heard-Costa NL
80. Heckbert SR
81. Irvin MR
82. Johnsen JM
83. Johnson AD
84. Kaplan R
85. Kardia SLR
86. Kelly T
87. Kelly S
88. Kenny EE
89. Kiel DP
90. Klemmer R
91. Konkle BA
92. Kooperberg C
93. Köttgen A
94. Lange LA
95. Lasky-Su J
96. Levy D
97. Lin X
98. Lin K-H
99. Liu C
100. Loos RJF
101. Garman L
102. Gerszten R
103. Lubitz SA
104. Lunetta KL
105. Mak ACY
106. Manichaikul A
107. Manning AK
108. Mathias RA
109. McManus DD
110. McGarvey ST
111. Meigs JB
112. Meyers DA
113. Mikulla JL
114. Minear MA
115. Mitchell BD
116. Mohanty S
117. Montasser ME
118. Montgomery C
119. Morrison AC
120. Murabito JM
121. Natale A
122. Natarajan P
123. Nelson SC
124. North KE
125. O’Connell JR
126. Palmer ND
127. Pankratz N
128. Peloso GM
129. Peyser PA
130. Pleiness J
131. Post WS
132. Psaty BM
133. Rao DC
134. Redline S
135. Reiner AP
136. Roden D
137. Rotter JI
138. Ruczinski I
139. Sarnowski C
140. Schoenherr S
141. Schwartz DA
142. Seo J-S
143. Seshadri S
144. Sheehan VA
145. Sheu WH
146. Shoemaker MB
147. Smith NL
148. Smith JA
149. Sotoodehnia N
150. Stilp AM
151. Tang W
152. Taylor KD
153. Telen M
154. Thornton TA
155. Tracy RP
156. Van Den Berg DJ
157. Vasan RS
158. Viaud-Martinez KA
159. Vrieze S
160. Weeks DE
161. Weir BS
162. Weiss ST
163. Weng L-C
164. Willer CJ
165. Zhang Y
166. Zhao X
167. Arnett DK
168. Ashley-Koch AE
169. Barnes KC
170. Boerwinkle E
171. Gabriel S
172. Gibbs R
173. Rice KM
174. Rich SS
175. Silverman EK
176. Qasba P
177. Gan W
178. Papanicolaou GJ
179. Nickerson DA
180. Browning SR
181. Zody MC
182. Zöllner S
183. Wilson JG
184. Cupples LA
185. Laurie CC
186. Jaquish CE
187. Hernandez RD
188. O’Connor TD
189. Abecasis GR
190. NHLBI Trans-Omics for Precision Medicine Consortium
(2021) Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program
Nature 590:290–299.

https://doi.org/10.1038/s41586-021-03205-y
1. Tang K
2. Naseri A
3. Wei Y
4. Zhang S
5. Zhi D
(2022) Open-source benchmarking of IBD segment detection methods for biobank-scale cohorts
GigaScience 11:giac111.

https://doi.org/10.1093/gigascience/giac111
(2019a) Estimating relatedness between malaria parasites
Genetics 212:1337–1351.

https://doi.org/10.1534/genetics.119.302120
1. Taylor AR
2. Watson JA
3. Chu CS
4. Puaprasert K
5. Duanguppama J
6. Day NPJ
7. Nosten F
8. Neafsey DE
9. Buckee CO
10. Imwong M
11. White NJ
(2019b) Resolving the cause of recurrent Plasmodium vivax malaria probabilistically
Nature Communications 10:5595.

https://doi.org/10.1038/s41467-019-13412-x
1. Thompson EA
(2013) Identity by descent: variation in meiosis, across genomes, and in populations
Genetics 194:301–326.

https://doi.org/10.1534/genetics.112.148825
1. Wesolowski A
2. Taylor AR
3. Chang HH
4. Verity R
5. Tessema S
6. Bailey JA
7. Alex Perkins T
8. Neafsey DE
9. Greenhouse B
10. Buckee CO
(2018) Mapping malaria by combining parasite genomic and epidemiologic data
BMC Medicine 16:190.

https://doi.org/10.1186/s12916-018-1181-9
1. Wong W
2. Wang L
3. Schaffner SF
4. Li X
5. Cheeseman I
6. Anderson TJC
7. Vaughan A
8. Ferdig M
9. Volkman SK
10. Hartl DL
11. Wirth DF
(2025) MalKinID: a classification model for identifying malaria parasite genealogical relationships using identity-by-descent
Genetics 229:iyae197.

https://doi.org/10.1093/genetics/iyae197
Website
1. World Health Organization
(2024) World malaria report 2024
Accessed December 11, 2024.

https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2024
(2020) A fast and simple method for detecting identity-by-descent segments in large-scale data
American Journal of Human Genetics 106:426–437.

https://doi.org/10.1016/j.ajhg.2020.02.010

Article and author information

Author details

Bing Guo
1. Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, United States
2. Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, United States
Contribution
Conceptualization, Software, Formal analysis, Visualization, Writing – original draft

Competing interests
No competing interests declared

ORCID iD: 0000-0002-3998-5981

Shannon Takala-Harrison

Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, United States

Contribution
Supervision, Funding acquisition, Writing – review and editing

Contributed equally with
Timothy D O'Connor

For correspondence
stakala@som.umaryland.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0003-4674-8500

Timothy D O'Connor

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, United States

Contribution
Supervision, Funding acquisition, Writing – review and editing

Contributed equally with
Shannon Takala-Harrison

For correspondence
timothydoconnor@gmail.com

Competing interests
No competing interests declared

ORCID iD: 0000-0002-0276-1896

Funding

National Institutes of Health (1R01AI145852)

Shannon Takala-Harrison
Timothy D O'Connor

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This publication uses MalariaGEN data as described in “Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples” (MalariaGEN et al., Wellcome Open Research 2023, 8:22, https://doi.org/10.12688/wellcomeopenres.18681.1). This work was supported by U.S. National Institutes of Health grant 1R01AI145852 to ST-H and TDO.

Version history

Preprint posted: July 14, 2024
Sent for peer review: August 9, 2024
Reviewed Preprint version 1: January 6, 2025
Reviewed Preprint version 2: July 7, 2025
Version of Record published: August 19, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.101924. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

939 views
50 downloads
1 citation

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Citations by DOI

1 citation for Reviewed Preprint v1 https://doi.org/10.7554/eLife.101924.1

Cite this article

Bing Guo, Shannon Takala-Harrison, Timothy D O'Connor (2025) Benchmarking and optimization of methods for the detection of identity-by-descent in high-recombining Plasmodium falciparum genomes. eLife 14:RP101924. https://doi.org/10.7554/eLife.101924.3

Categories and tags

Research organism

P. falciparum


Overview of methods used in benchmarking IBD detection methods.

High recombination rates reduce genetic marker density and affect the quality of detected IBD segments.

The accuracy of IBD segments detected from Pf genomes varies across IBD callers.

IBD caller-specific parameter optimization can improve the quality of IBD segments inferred from simulated Pf genomes (using hap-IBD as an example).

Post-optimization benchmarking of different IBD callers by comparing downstream estimates of N_e.

Validation of the performance of IBD callers in empirical data sets by comparing IBD-based downstream analyses.

Comparison of computational runtime for IBD calling process for different callers.
