A method for low-coverage single-gamete sequence analysis demonstrates adherence to Mendel’s first law across a large sample of human sperm

  1. Sara A Carioscia
  2. Kathryn J Weaver
  3. Andrew N Bortvin
  4. Hao Pan
  5. Daniel Ariad
  6. Avery Davis Bell
  7. Rajiv C McCoy  Is a corresponding author
  1. Department of Biology, Johns Hopkins University, United States
  2. School of Biological Sciences, Georgia Institute of Technology, United States

Abstract

Recently published single-cell sequencing data from individual human sperm (n=41,189; 969–3377 cells from each of 25 donors) offer an opportunity to investigate questions of inheritance with improved statistical power, but require new methods tailored to these extremely low-coverage data (∼0.01× per cell). To this end, we developed a method, named rhapsodi, that leverages sparse gamete genotype data to phase the diploid genomes of the donor individuals, impute missing gamete genotypes, and discover meiotic recombination breakpoints, benchmarking its performance across a wide range of study designs. We then applied rhapsodi to the sperm sequencing data to investigate adherence to Mendel’s Law of Segregation, which states that the offspring of a diploid, heterozygous parent will inherit either allele with equal probability. While the vast majority of loci adhere to this rule, research in model and non-model organisms has uncovered numerous exceptions whereby ‘selfish’ alleles are disproportionately transmitted to the next generation. Evidence of such ‘transmission distortion’ (TD) in humans remains equivocal in part because scans of human pedigrees have been under-powered to detect small effects. After applying rhapsodi to the sperm data and scanning for evidence of TD, our results exhibited close concordance with binomial expectations under balanced transmission. Together, our work demonstrates that rhapsodi can facilitate novel uses of inferred genotype data and meiotic recombination events, while offering a powerful quantitative framework for testing for TD in other cohorts and study systems.

Editor's evaluation

The paper reports a method to study deviations from Mendelian inheritance in genomic data from gametes. The authors use this method to study the existence of the phenomenon in human sperm data but do not find it. The method will be useful for future studies on segregation distortion, and the findings are an important step for the systematic study of segregation distortion in humans and other organisms.

https://doi.org/10.7554/eLife.76383.sa0

eLife digest

Many species on Earth can carry up to two different versions of a given gene, with each of these ‘alleles’ having only a 50/50 chance of being transmitted to the next generation via sexual reproduction. Certain ‘selfish’ sequences, however, can hijack this process and increase their probability of being passed on to an offspring. Known as transmission distortion, this phenomenon may result in alleles spreading through the population even if they are detrimental to fertility.

Transmission distortion has been detected in many species such as flies, mice and some plants. It can take place at various stages during reproduction; for example, the selfish alleles may become overrepresented among eggs or sperm. However, scientists need to study a large number of offspring or reproductive cells to be able to detect whether an allele is inherited more often than expected. This has made it difficult to determine whether transmission distortion also happens in humans, and research so far has resulted in conflicting conclusions.

A recently published dataset of human sperm from 25 donors offered Carioscia, Weaver et al. the opportunity to examine this question. Every volunteer had produced between 969 and 3377 sperm cells, each with about 1% of their genome sequenced. Carioscia, Weaver et al. developed a computational method, which they named rhapsodi, that allowed them to ‘fill in the gaps’ and infer missing regions of the genome for each cell. To do so, they relied on the fact that sperm cells from a given individual are highly related to one another.

With this more complete data at hand, it became possible to look for evidence of transmission distortion by searching for alleles that were overrepresented in sperm from a given donor. No selfish sequence could be detected in any of the 25 individuals, suggesting that human sperm may not be subject to pervasive transmission distortion. Signatures of selfish alleles detected in previous human studies may have therefore not resulted from this mechanism taking place at the sperm level. Instead, transmission distortion in humans could primarily target eggs or happen at later stages (for instance, if embryos carrying the selfish allele have better chances of survival).

The ‘rhapsodi’ method developed by Carioscia, Weaver et al. should allow other scientists to work with datasets for which large portions of the genetic information is missing. It may therefore become easier for researchers to track selfish alleles which are difficult to detect, and to examine bigger, more diverse samples which also include individuals with known fertility challenges.

Introduction

The recent development of high-throughput single-cell genome sequencing of human sperm (termed “Sperm-seq”) (Bell et al., 2020; Leung et al., 2021) offers an opportunity to study various aspects of meiosis and inheritance with improved statistical power. Using a highly multiplexed droplet-based approach, Sperm-seq facilitated sequencing of thousands of sperm from each of 25 donor individuals (n=41,189 total cells), in turn revealing detailed patterns of meiotic recombination and aneuploidy. However, the low sequencing coverage per cell (∼0.01×) necessitates the development of tailored statistical methods for recovering gamete genotypes.

To this end, we developed a method called rhapsodi (R haploid sperm/oocyte data imputation) that uses low-coverage single-cell DNA sequencing data from large samples of gametes to reconstruct phased donor haplotypes, impute gamete genotypes, and map meiotic recombination events (Figure 1). Here, we introduce this method and quantify its performance over a broad range of gamete sample sizes, sequencing depths, rates of recombination, and rates of genotyping error. Key improvements to the haplotype phasing and crossover calling methods from the Sperm-seq paper (Bell et al., 2020) include evaluating model performance over a wide range of possible study designs, directly comparing our method to an existing tool, and offering a thoroughly documented and accessible software package.

Schematic of the methods underlying rhapsodi (R haploid sperm/oocyte data imputation).

Low coverage data from individual gametes (A) is clustered to phase the diploid donor haplotypes (B). A Hidden Markov Model, with tunable rates of genotyping error and meiotic crossover, is applied to trace the most likely path along the phased haplotypes for each gamete (C) thereby imputing the missing gamete genotypes (D) which can be used to discover meiotic recombination events as transitions from one donor haplotype to the other (e.g., purple [H2] to teal [H1] in gamete G4 between SNPs 7 and 8).

We then used the resulting imputed genotype data to test adherence to expected rules of inheritance. Specifically, in typical diploid meiosis, each gamete randomly inherits one of two alleles from a heterozygous parent—a widely supported observation that forms the basis of Mendel’s Law of Segregation. However, many previous studies have also uncovered notable exceptions, collectively termed ‘transmission distortion’ (TD), whereby “selfish” alleles cheat this law to increase their frequencies in the next generation. Indeed, examples of TD have been characterized in nearly all of the classic genetic model organisms, as well as numerous other systems (Fishman and McIntosh, 2019; Koide et al., 2012; Kim et al., 2014; Xu et al., 2014; Tang et al., 2013; Yu et al., 2011; Hulse-Kemp et al., 2015; Dai et al., 2016; Larracuente and Presgraves, 2012; Mcdermott and Noor, 2012; Reinhardt et al., 2014; Eversley et al., 2010; Casellas et al., 2012; Wei et al., 2017). Mechanisms include meiotic drive (Kursel and Malik, 2018), gamete competition or killing (Bravo Núñez et al., 2020), embryonic lethality (Bikard et al., 2009), and mobile element insertion (Ross and Shoemaker, 2018). Such phenomena are frequently associated with sterility or subfertility (Schimenti, 2000; Higgins et al., 2018), but may spread through a population despite negative impacts on these components of fitness (Phadnis and Orr, 2009).

Previous attempts to study TD in humans have revealed intriguing global signals but did not identify individual loci that achieved genome-wide significance and could be discerned from sequencing or analysis artifacts (Mitchell et al., 2003). For example, using data from large human pedigrees, Zöllner et al., 2004 reported a slight excess of allele sharing among siblings (50.43%)—a signal that was diffuse across the genome, with no individual locus exhibiting a strong signature. Meyer et al., 2012 applied the transmission disequilibrium test (TDT) (Spielman et al., 1993) to genotype data from three large datasets of human pedigrees. While multiple loci exhibited signatures suggestive of TD, the authors could not confidently exclude genotyping errors, and the signatures did not replicate in data from additional pedigrees. Similarly, Paterson et al., 2009 applied the TDT to large-scale genotype data from the Framingham Heart Study but attributed the vast majority of observed signals to single-nucleotide polymorphism (SNP) genotyping errors.

Analysis of large samples of gametes, either by pooled (Corbett-Detig et al., 2015; Corbett-Detig et al., 2019) or single-cell genotyping (Meyer et al., 2012), offers an alternative approach for discovering TD, albeit only for mechanisms operating prior to the timepoint at which the gametes are collected (e.g., meiotic drive or gamete killing). Many well-characterized instances of TD across organisms relate to male gametogenesis (Navarro-Dominguez et al., 2022; Verspoor et al., 2020; Corbett-Detig et al., 2019), and genotyping of sperm cells allows isolated investigation of this process, without possible opposing effects of selection at later stages. Meyer et al., 2012, for example, performed sperm genotyping in the attempted replication of their pedigree-based test. Wang et al., 2012 and Odenthal-Hesse et al., 2014 used sequencing and targeted genotyping, respectively, to scan samples of sperm cells for evidence of TD. While neither study observed the long tracts of TD signals that are expected under a classic model of meiotic drive, they did uncover short tracts suggestive of biased gene conversion. This observation is potentially consistent with the reported rapid evolutionary turnover in the landscape of meiotic recombination hotspots (Coop and Myers, 2007). Such hotspots are associated with high rates of crossovers and non-crossovers, given that the repair of meiotic double-strand breaks produces short tracts of gene conversion.

Importantly, previous studies of TD in humans have been limited in their statistical power for detecting weak TD. Power of pedigree-based studies has been constrained by the small size of human families, although power may be gained for common polymorphisms by aggregating signal across multiple families. Gamete-based studies were historically constrained by costs and technical challenges of single-cell genotyping, limiting analysis to relatively few gametes or small portions of the genome. Specifically, previous single-cell studies used sample sizes of fewer than 500 sperm per donor and performed targeted genotyping of specific loci of interest (e.g., as validation of candidate TD hits) (Meyer et al., 2012; Crouau-Roy and Clayton, 2002).

To address these limitations, we applied rhapsodi to published single-cell sequencing data from 41,189 human sperm (969–3377 cells from each of 25 donors) (Bell et al., 2020; Leung et al., 2021). After stringent filtering for segmental duplications and other sources of genotyping error, our results exhibited close concordance with null expectations under Mendelian inheritance, both with regard to individual loci and to aggregate genome-wide signal. Our study thus suggests balanced transmission of alleles to the gamete pool in this sample.

Results

A method for single-gamete sequencing analysis

We developed a method to phase donor haplotypes, impute gamete genotypes, and discover meiotic recombination events using low-coverage single-cell DNA sequencing data obtained from multiple gametes from a given donor (see Materials and methods; Figure 1a). We describe here the default behavior of rhapsodi, but note in later Results sections and in the Materials and methods additional options and arguments available to the user. Briefly, chromosomes are segmented into overlapping windows, and within each window, the sparse gamete genotype observations at detected heterozygous SNPs (hetSNPs) are clustered to distinguish the two haplotypes (i.e., phase the genotypes) of the diploid donor individual (Figure 1b). The sequences of alleles that compose these haplotypes are decoded based on majority ‘votes’ within each cluster, and haplotypes from overlapping windows are then stitched together based on sequence identity, thereby achieving chromosome-scale phasing. A Hidden Markov Model (HMM) with (1) emission probabilities defined by rates of genotyping error and (2) transition probabilities defined by rates of meiotic crossover is then used to infer the most likely path along the phased haplotypes for each gamete (Figure 1c), thereby imputing missing genotype data (Figure 1d). Points where the paths are inferred to transition from one donor haplotype to the other suggest the locations of meiotic recombination events.

Evaluating performance on simulated data

To benchmark rhapsodi’s performance, we developed a generative model to construct input data with varying gamete sample sizes, sequencing depths of coverage, rates of meiotic recombination, and rates of genotyping error (Figure 2—figure supplement 1). We then applied rhapsodi to the simulated data, matching input parameters to those used in the simulations (average of 1 recombination event per chromosome and genotyping error rate of 0.005). Phasing was assessed based on accuracy, completeness, switch error rate, and largest haplotype segment (Figure 2a, Figure 2—figure supplement 2a). Briefly (but see Materials and methods for complete definitions), ‘accuracy’ is defined as the proportion of positions where the inferred sequence matches the truth sequence. ‘Completeness’ is defined as the proportion of non-missing genotypes. ‘Switch error rate’ is defined as the number of tracts of adjacent mismatches between the inferred and truth sequence, divided by the total number of sequence positions (see Figure 2—figure supplement 3). ‘Largest haplotype segment’ is defined as the longest tract of adjacent matches between the inferred and truth sequence.

Figure 2 with 10 supplements see all
Benchmarking performance across a range of study designs.

Values represent the average of three independent trials. FDR: False Discovery Rate; TPR: True Positive Rate. For phasing and imputation, gray indicates that no hetSNPs remained after downsampling. For meiotic recombination discovery, gray indicates the absence of a prediction class (e.g., zero FNs, FPs, TNs, or TPs). Simulations roughly matching the characteristics of the Sperm-seq data are outlined in red.

Across the range of study designs, we observed that phasing performance (of the default method, termed ‘windowWardD2’) improved with increasing amounts of data (i.e., increased gamete sample size and coverage). For specific scenarios involving low coverages or small numbers of gametes, this relationship was not always monotonic. This suggests that other parameters that we currently hold fixed (e.g., window size used in phasing) may interact and influence performance. We therefore added an option in rhapsodi to optimize window size based on features of the input data (Figure 2—figure supplement 4; see Materials and methods section titled ‘Automatic phasing window size calculation’). With the exception of very small gamete counts, we observed that phasing performance reaches a plateau at ∼0.1× coverage. However, large sample sizes of gametes can compensate for lower coverages, leading to high performance. Qualitatively similar trends were observed for the tasks of imputing gamete genotypes and discovering meiotic recombination breakpoints (Figure 2b and Figure 2c; Figure 2—figure supplement 2b and Figure 2—figure supplement 2c).

Discovery of meiotic recombination events exhibited the weakest relative performance among the three tasks (phasing donor haplotypes, imputing gamete genotypes, and discovering meiotic recombination events), although still strong in absolute terms. This is likely due to a combination of (1) this task’s dependence on the successful completion of the previous two tasks, (2) the simplifying assumptions employed within the generative model, and (3) the inherent challenge of this task in data-limited scenarios (i.e., those with low coverage or few SNPs). In relation to the last point, we observed that the resolution of inferred meiotic recombination breakpoints (i.e., the length of the genomic intervals to which inferred crossover events could be localized) was strongly associated with the depth of coverage of the input data (Figure 2—figure supplement 5a), as well as the density of underlying hetSNPs across the genome. Assuming a pairwise nucleotide diversity of 1 per 1000 basepairs (bp) (Sachidanandam et al., 2001) and given that the theoretical limit of resolution is two hetSNPs, we found that a coverage of 2.3× (i.e., a missing genotype rate of 10%) was required to approach this theoretical limit. Meanwhile, for coverage resembling the Sperm-seq data (Bell et al., 2020) (∼0.01×), we estimate a median resolution of 167.5 kilobase pair (kbp), in line with empirical observations from the original study (Bell et al., 2020).

Formulating discovery of simulated meiotic recombination events as a classification problem where a predicted recombination breakpoint (or lack thereof) could either be a true positive (TP), true negative (TN), false positive (FP), or false negative (FN) prediction (Figure 2—figure supplement 5b), we assessed rhapsodi’s performance by computing a false discovery rate (FDR), true positive rate (TPR), and F1 Score, as well as several related metrics (Figure 2c; Figure 2—figure supplement 2c). Briefly (but see Materials and methods for complete definitions), the ‘FDR’ is the ratio of false predicted recombination breakpoints to the total number of predicted breakpoints. The ‘TPR’ is the ratio of true predicted breakpoints to total simulated breakpoints. The ‘F1 Score’ is the harmonic mean of precision (ratio of true predicted breakpoints to total predicted breakpoints) and TPR (also called ‘recall’). As was observed for phasing and imputation, meiotic crossover discovery improved as the amount of data increased, as indicated by increasing F1 scores and TPRs and decreasing FDRs.

Through closer investigation of the locations of FP and FN recombination events (Figure 2—figure supplement 5b), we identified three typical error modes. Specifically, we attribute the vast majority of FNs to (1) crossovers occurring near the ends of chromosomes or (2) pairs of crossovers occurring in close proximity to one another, especially at low coverages (Figure 2—figure supplement 5c). In the case of co-occurring FNs, the genotype data may be too sparse to capture one or more informative markers that flank the recombination breakpoint(s). Notably, such nearby crossovers should be mitigated by the phenomenon of crossover interference (Broman and Weber, 2000), which causes crossovers to be spaced farther apart than expected by chance. By simulating crossover locations under a uniform distribution, our benchmarking strategy is thus conservative in regard to this specific error mode (i.e., over-estimating the FN rate). However, our estimates of the FN rate may be under-conservative in regard to the terminal edges of chromosomes, which have been shown to exhibit high rates of recombination in males (Halldorsson et al., 2019). A third mode of error, which manifests as pairs of FPs and FNs, owes to slight displacement of the inferred crossover breakpoint (Figure 2—figure supplement 5c), which may arise by consequence of premature or delayed switching behaviors of the HMM.

For study designs mirroring the published Sperm-seq data (Bell et al., 2020) (1000 gametes, 30,000 hetSNPs, coverage of 0.01×), rhapsodi phased donor haplotypes with 99.993 (±0.003)% accuracy and 99.96 (±0.03)% completeness (Figure 2a); imputed gamete genotypes with 99.962 (±0.002)% accuracy and 99.34 (±0.04)% completeness (Figure 2b); and discovered meiotic recombination breakpoints with a mean F1 Score of 0.959 (±0.003), a mean FDR of 1.8 (± 0.3)%, and a mean TPR of 93.7 (±0.3)% (Figure 2c). Values are reported as the mean, plus or minus one standard deviation.

We next assessed rhapsodi’s robustness to parameter mis-specification by altering the recombination and genotyping error rate parameters relative to those used in generating the simulated data. Only one parameter was mis-specified at a time, while the other was matched to the simulation. While our results suggest overall robustness to model mis-specification, we found that underestimating the genotyping error rate or recombination rate (Figure 2—figure supplement 6 and Figure 2—figure supplement 7) had a greater effect on performance than overestimating either of these parameters (Figure 2—figure supplement 8 and Figure 2—figure supplement 9). In practice, such parameters may be informed based on outside knowledge for a given species (e.g., recombination rate ≈ 1 × 10-8 per bp for humans) and sequencing platform (e.g., error rate ≈ 0.005 per bp for Illumina short-read sequencing; Lou et al., 2013).

rhapsodi is designed to work for large existing datasets such as Sperm-seq (Bell et al., 2020; Leung et al., 2021) and to remain applicable as future single-gamete sequencing datasets grow in size. Specifically, rhapsodi was rigorously benchmarked for datasets containing up to 5000 gametes per donor with 100,000 SNPs per chromosome at coverages up to 2.3× (Figure 2). However, rhapsodi is capable of analyzing much larger datasets, as we demonstrate through its successful application to simulated data comprising 32,900 gametes (0.01× coverage, 90,795 SNPs) in under 24 hr, multi-threaded on 48 CPU cores (Figure 2—figure supplement 10). This represents a dataset ∼20× the size of that produced by Bell et al., 2020.

Benchmarking against existing methods: Hapi and HapCUT2

We compared rhapsodi to the existing software tool Hapi, which was previously developed for diploid donor phasing, gamete genotype imputation, and crossover discovery (Li et al., 2020), as well as HapCUT2, which was originally developed for read-based phasing of diploid samples (Edge et al., 2017) but can be adapted for single-gamete sequence-based phasing using an approach inspired by Bell et al., 2020. As the latter approach assumes that alleles originating from a single gamete and chromosome are linked, we hereafter refer to this adaptation as ‘linkedSNPHapCut2’.

Hapi was previously shown to outperform the only other haploid-based algorithm, PHMM (pairwise Hidden Markov Model) (Hou et al., 2013), as well as two diploid-based phasing methods, WhatsHap (Martin et al., 2016) and HapCUT2 (standard implementation) (Edge et al., 2017) in terms of accuracy, reliability, completeness, and cost-effectiveness (Li et al., 2020). While those results were based on different data characteristics than those encountered in our study, we selected Hapi for our phasing, gamete genotype imputation, and crossover discovery comparisons because it was designed specifically for low-coverage gamete imputation, is a reproducible and user-friendly package, and outperformed the existing programs considered in Li et al., 2020. We compared the performance of rhapsodi to Hapi using simulated gamete sample sizes ranging from 3 to 150 and depths of coverage ranging from 0.001× to 2.3× (Figure 3).

Figure 3 with 4 supplements see all
Comparison of the performance of rhapsodi to Hapi, an existing gamete genotype imputation tool.

For each panel, we depict the difference in performance between the tools (rhapsodi minus Hapi). Each point represents a simulated dataset, and only datasets successfully analyzed by both tools are displayed.

Hapi was designed and optimized for use with low numbers of gametes and was benchmarked using datasets where coverage was greater than 1× (Li et al., 2020). Hapi and rhapsodi performed comparably under these conditions (Figure 3). Datasets with more than 150 gametes were not possible to benchmark because Hapi’s runtimes with larger sample sizes became intractable, taking up to 39 hr per simulated dataset (compared to less than 90 s for rhapsodi; see Figure 2—figure supplement 10 for a comparison of rhapsodi’s runtimes). Of 2916 simulated datasets, Hapi phased, imputed, and detected crossovers in 1902 datasets (65%), while rhapsodi completed the three tasks in 2754 datasets (94%) (Figure 3—figure supplement 1). For datasets that Hapi could not analyze, rhapsodi maintained high accuracy and completeness (Figure 3—figure supplement 2), with low cost to performance in comparison to datasets that both tools analyzed successfully. Hapi typically achieved high phasing accuracy, but often at the cost of completeness (Figure 3a). In contrast, rhapsodi exhibited relative balance between accuracy and completeness. Across a large range of study designs, rhapsodi phased and imputed a greater proportion of SNP genotypes than Hapi with little cost to accuracy (Figure 3a, Figure 3b). This improvement in completeness was most pronounced at the low coverages (<0.1×) that characterize the Sperm-seq data (Figure 3).

While Hapi was previously shown to outperform HapCUT2 (original implementation) in phasing single-gamete genomes (Li et al., 2020), Bell et al., 2020 adapted HapCUT for use with single gamete sequencing data by assuming that alleles originating from the same gamete cell and chromosome were linked. We thus developed a pipeline (to which we refer as ‘linkedSNPHapCut2’) that converts the files into the necessary format and executes HapCUT2, largely following the previously developed pipeline and detailed in Materials and methods (Bell et al., 2020; Edge et al., 2017). We then benchmarked rhapsodi’s default windowWardD2 phasing approach against linkedSNPHapCUT2. The data for these simulations are explained in the Methods section ‘Assessing performance with simulation’. We applied linkedSNPHapCUT2 to 5713 simulated datasets, on which it ran successfully in 77% of cases, but failed in 21% of cases based on the computing resources available (3 days runtime, 185 Gb memory). Another 1% were unable to be processed due to extreme low coverages (88% below 0.001×) resulting in too few SNPs per chromosome (43 simulations with 1 SNP and 32 simulations with 2–5 SNPs) (Edge et al., 2017). Of the simulations that failed with linkedSNPHapCut2, 43% were successfully phased with rhapsodi’s default windowWardD2 method.

The major cost of the linkedSNPHapCUT2 approach is the time and memory resources necessary to convert the input data files to the format necessary for use in HapCUT2 (Figure 3—figure supplement 4). Because windowWardD2 operates in windows along the chromosome, it runs as a multithreaded process. As such, its overall system time is larger than that of linkedSNPHapCUT2 for datasets with high numbers of gametes, but wall-clock time may be much lower. Both options for phasing within rhapsodi offer high levels of completeness and accuracy across most study designs (i.e., sequencing coverage and number of gametes), including those matching the Sperm-seq dataset (Bell et al., 2020). Overall, we find that in data-limited scenarios, linkedSNPHapCUT2 phases haplotypes with a higher completeness than the default windowWardD2 method, but with comparable accuracy (Figure 3—figure supplement 3). For higher numbers of gametes, windowWardD2 offers comparable completeness and much higher accuracy (Figure 3—figure supplement 3), partially due to the HapCUT2 approach ignoring the positional information that was already encoded in the alignment. In doing so, this method does not take full advantage of the co-inheritance patterns of the SNP alleles, which is an advantage offered by the windowWardD2 approach. Based on these results, we include both windowWardD2 and linkedSNPHapCUT2 in rhapsodi as alternative methods for phasing and recommend use of the latter in scenarios with small numbers of gametes and low coverage.

Applying rhapsodi to data from single-cell human sperm genomes

Given the strong performance of our method on simulated data, we proceeded to analyze published (Bell et al., 2020; Leung et al., 2021) single-cell DNA sequencing data from 41,189 human sperm (969–3377 cells from each of 25 donors) (Figure 5—figure supplement 1). These data possessed an average sequencing depth of ∼0.01× coverage per cell, with a range of ∼0.002× to ∼0.03×. Of the 25 sperm donors, samples from 20 individuals were obtained from a sperm bank and were of presumed (but unknown) normal fertility status (Bell et al., 2020), while five donors had known reproductive issues (failed fertilization after intracytoplasmic sperm injection [n=2] or poor blastocyst formation [n=3]) (Leung et al., 2021). Using principal component analysis, we compared the genetic similarity of each donor individual to globally diverse populations from the 1000 Genomes Project (Auton et al., 2015, Figure 5—figure supplement 2). A total of 16 sperm donor individuals clustered with reference samples from the European superpopulation, three donors clustered with reference samples from the East Asian superpopulation, and 2 donors clustered with reference samples from the South Asian superpopulation. Three donors clustered on an axis between the African and European superpopulations, consistent with the admixed African American backgrounds reported at sample collection (Bell et al., 2020). Meanwhile, one donor (NC26) showed similarities with both the African and East Asian populations, again consistent with the potential admixed ancestry information reported at sample collection (Bell et al., 2020).

Before applying rhapsodi, we performed stringent filtering to mitigate sequencing, alignment, and genotyping errors (see Materials and methods section titled ‘Genotype filtering to mitigate spurious TD signatures’). Specifically, we removed low-quality cells and those called as aneuploid for the chromosome of interest; excluded regions of low mappability or extreme repeat content; restricted analysis to known SNPs from the 1000 Genomes Project (Auton et al., 2015); and excluded regions exhibiting coverage abnormalities (e.g., suggestive of segmental duplications) in each donor individual’s genome (Figure 5—figure supplement 3). Applying rhapsodi to these filtered data, we phased donor haplotypes at an average of 99.9 (±0.16)% of observed hetSNP positions per donor and chromosome (i.e. leaving ∼0.1% as unknown phase); imputed an average 99.3 (±1.23)% of genotypes per gamete and chromosome; and identified an average of 1.17 (median of 1) meiotic recombination events per gamete and chromosome, with a mean of 25.75 and mode of 24 autosomal crossovers per gamete genome, broadly consistent with the reported lengths of autosomal genetic maps for males (Broman et al., 1998; Halldorsson et al., 2019). Values are reported as the mean, plus or minus one standard deviation. The average resolution of meiotic recombination breakpoints was 664 kbp (±1.25 Mbp), with a median of 357 kbp, in line with empirical observations from the original study (Bell et al., 2020). As previously reported (Bell et al., 2020), the inferred crossover locations were concentrated in similar genomic regions across all 25 donors, with the highest densities occurring near the ends of chromosomes (Figure 5—figure supplement 4). The overall distributions of crossover locations were qualitatively similar to the patterns observed in the prior analysis of a subset of these donors (Bell et al., 2020), as well as strongly correlated with a published male-specific recombination map inferred by trio sequencing of an Icelandic population (r=0.9) (Halldorsson et al., 2019, Figure 5—figure supplement 5). The modest discrepancies between these maps may be driven by a combination of biological (e.g., PRDM9 genotype, age, etc.) and technical factors—the latter underscored by the observation that the sperm donor sample-specific correlation with the Halldorsson et al., 2019 map was positively associated with the average depth of coverage of sperm cells from those donors (β^ = 5.1, SE = 1.6, p-value = 0.00356).

Statistical power to detect moderate and strong TD

The scale of the Sperm-seq dataset offers strong statistical power to detect even slight deviations from Mendelian expectations, as supported by our simulations of TD across a range of gamete sample sizes and transmission rates (Figure 4, Figure 4—figure supplement 1). The 25 donors have an average of 1711 gametes each (range: 969–3377). With this average sample size, we estimate a statistical power of 0.681 to detect deviations of 0.07 (i.e. 57% transmission of one allele in a single donor) and 0.912 to detect deviations of 0.08, accounting for multiple hypothesis testing (p-value threshold = 1.78 × 10-7; see below and Materials and methods). For an individual with 950 gametes, we estimate a statistical power of 0.637 to detect deviations of 0.09 and power of 0.84 to detect deviations of 0.1.

Figure 4 with 4 supplements see all
Simulation demonstrating power to detect deviations from binomial expectations across sample sizes of sperm, without (A) and with (B) multiple hypothesis testing correction.

For each combination of transmission rate and number of gametes, power was calculated based on 1000 independent simulations and assuming full knowledge of gamete genotypes. Panel A uses the standard α = 0.05, while panel B uses an adjusted p-value threshold of 1.78 x 10-7 as employed in our study. Note that this correction is conservative in that it adjusts for multiple testing across the genome as well as across donor individuals. Red arrows indicate gamete sample sizes roughly matching the Sperm-seq data (average n = 1711 sperm cells per donor).

Cases of extreme TD pose a potential technical concern, as such loci may appear as homozygous across a sample of sperm, thereby evading detection without outside knowledge of donor hetSNPs. However, our simulations of extreme TD (transmission rate, k = 0.99) demonstrate that despite the homozygosity of the causal SNP and nearby SNPs in near-perfect linkage disequilibrium (LD), recombination in flanking regions recovers heterozygosity but still manifests as extreme and detectable TD (Figure 4—figure supplement 3). Specifically, across 2200 simulations (100 independent simulations × 22 chromosomes; k = 0.99) with parameters matching a typical Sperm-seq donor, we identified the TD signature in all 2200 cases (Power = 1) despite homozygosity (and thus filtering) of the causal SNP in most cases (1958/2200 [89%]). This high power also holds for donor samples with higher (coverage = 0.01; Power = 1) and lower (coverage = 0.002; Power = 1) coverages, respectively (see Materials and methods).

Adherence to Mendelian expectations across sperm genomes

Encouraged by these power simulations, we proceeded to analysis of the Sperm-seq data. Before applying rhapsodi, we performed stringent filtering to mitigate sequencing, alignment, and genotyping errors that could confound our downstream tests of TD, as described above. While our stringent filtering removed a total of 15,138,461 (∼30%) SNPs from the input data, we emphasize that true signatures of TD should be minimally affected by such filtering, as such signatures are expected to extend across large genomic intervals due to the extreme nature of LD among the sample of gametes (Figure 4—figure supplement 2). For each of the 25 donors, we scanned across the imputed sperm genotypes, performing a binomial test for each detected hetSNP.

Naive multiple testing approaches based on the nominal number of tests would be over-conservative in this context, given the extreme levels of LD across the sample of closely related sperm. We therefore applied a principal component analysis-based approach to infer the effective number of independent statistical tests (Gao et al., 2008; Gao et al., 2010) and used this value as the basis of a Bonferroni correction that additionally accounted for the multiple testing across donor individuals (p-value threshold = 1.78 × 10-7; see Materials and methods). After applying this correction, no individual SNP exhibited evidence of TD at the level of genome-wide significance (Figure 5). The distribution of transmission ratio for each allele is shown, by donor, in Figure 5—figure supplement 6.

Figure 5 with 6 supplements see all
Manhattan plot displaying genome-wide TD scan results for all 25 sperm donors across the 22 autosomes.

P-values are correlated across large genomic intervals due to the high degree of linkage disequilibrium among sperm cells from a single donor. Colors distinguish results from different donors. No individual test was significant after multiple testing correction (p-value threshold = 1.78 × 10-7).

The strongest signal of TD occurred at the end of chromosome 2 (transmission rate = 833 of 1571 [56.2%]; p-value = 9.6×10-7) for donor NC18 (depicted in dark yellow in Figure 5) and encompassed 7 SNPs (rs7603674, rs7578293, rs7567762, rs12478306, rs2084684, rs2385305, rs2100268) within this gene-poor region. These linked SNPs segregate at frequencies of ∼0.25 in European populations of the 1000 Genomes Project (Auton et al., 2015; Marcus and Novembre, 2017; of greatest relevance given the genetic similarities between this donor and individuals from the European superpopulation based on principal component analysis [Figure 5—figure supplement 2]).

As noted above, extreme TD (e.g., transmission rate >0.9) could manifest as tracts of homozygosity across the sperm sample at the causal and nearby sites, although TD signal should be detectable in flanking regions, where recombination has restored heterozygosity (Figure 4—figure supplement 3). While tracts of apparent homozygosity are indeed observed within the Sperm-seq data (including two donors with tracts exceeding 3.5 Mbp), such tracts are better explained by filtering of technically challenging regions (e.g., low mappability; see Bell et al., 2020; Materials and methods), heterozygous deletions, or tracts of homozygosity (e.g., due to identity by descent; Browning and Browning, 2011), as in all cases the TD signal at flanking SNPs was unremarkable. Specifically, no site within 1 Mbp of the window of homozygosity yielded a statistically significant result (minimum p-value = 0.072).

No global signal of biased transmission in human sperm

We also observed that the combined distribution of p-values closely mirrored the expected distribution under the null hypothesis of no TD (Figure 6). This absence of global signal is an intriguing contrast to the results of Zöllner et al., 2004, especially given the strong statistical power underlying our analysis. We do note that Zöllner et al., 2004 and other pedigree studies are based on surviving offspring (rather than gametes) and thus may reflect additional sources of TD, as well as selection (see Discussion). To quantify global evidence of TD in our dataset, we calculated the overall degree of allele sharing at observed hetSNPs among sperm from the same donor (Figure 6). After pruning the data to SNPs in approximate linkage equilibrium (see Materials and methods section titled ‘Calculation of global signal of TD’), we found that across all pairwise comparisons of sperm from each of 25 donors, the average proportion of shared alleles was 0.499996, consistent with the distribution of this proportion under null simulations with no TD (n=500 simulations; p-value = 0.47).

Quantifying global evidence of TD.

(A) Quantile-quantile plot comparing the distribution of p-values from our genome-wide scan for TD to that expected under the null hypothesis. The dashed line corresponds to x = y. (B) Mean allele sharing across all pairs of sperm from null simulations (n=500) is depicted with grey bars. The same mean computed on the observed data (0.499996) is depicted with a red line, while the null proportion of 0.5 is depicted with a gray dashed line.

Discussion

Here, we developed a genotype phasing and imputation method, rhapsodi, that leverages sparse gamete genotype data to phase the diploid genomes of the donor individuals, impute missing gamete genotypes, and discover meiotic recombination breakpoints. rhapsodi is available as a thoroughly documented and accessible software package. We benchmarked its performance across a wide range of study designs, with parameters including and much larger than those from existing gamete sequencing datasets.

We then applied rhapsodi to published single-cell sequencing data from individual human sperm (n=41,189; 969–3377 cells from each of 25 donors) and tested for TD. In contrast to the tentative locus-specific and genome-wide signals reported in previous studies, our results were consistent with adherence to Mendel’s Law of Segregation across this large sample. The signals that we observed are well explained based on the expected variance of the binomial process of Mendelian segregation under the null hypothesis of balanced transmission genome-wide. This negative result hinged on stringent filtering of the input data, as phenomena such as hidden structural variation and ensuing mapping and genotyping errors may otherwise generate false signatures of TD. While TD could of course occur in these filtered regions, it is important to emphasize that even stringent filtering is not expected to eliminate true signals, which should span large genomic intervals due to the extreme LD within the samples of sperm genomes from a given donor (as supported by our simulations as well as empirical data).

One caveat of this and most previous scans for TD in humans is the inability to analyze highly repetitive and other technically challenging regions of the genome, including heterochromatic sequences such as telomeres and centromeres, which have been implicated in TD in other species (e.g., Brand et al., 2015; Axelsson et al., 2010). Indeed, the strongest signal of TD in our study, albeit below the threshold of genome-wide significance, occurred near the end of chromosome 2. Stringent filtering of repetitive regions was, however, necessary to avoid spurious signatures of TD that may arise due to read mapping and genotyping artifacts (see Materials and methods ‘Genotype filtering to mitigate spurious TD signatures’). The recent assembly of a complete human reference genome resolved many heterochromatic sequences for the first time and offers great promise for the future rigorous study of TD genome-wide, especially when combined with other technological advancements such as long-read sequencing (Nurk et al., 2022; Aganezov et al., 2022).

Past studies of TD in humans have been limited in statistical power, due largely to (1) the challenge of amassing a sufficiently large set of human families (number of families heterozygous at a given site) and number of individuals within a family and (2) the technical limitations of gamete genotyping. Such limitations are compounded by the high burden of multiple testing in genome-wide scans. Importantly, in a single-gamete sequencing study, the number of informative transmissions is equal to the number of genotyped gametes for all observed hetSNPs. In a pedigree-based study, the number of informative transmissions varies across SNPs, as not all parent-offspring trios will include one or more parent heterozygous for a given SNP (Figure 4—figure supplement 4). For example, the Hardy-Weinberg expected proportion of heterozygous parents for a common, autosomal bi-allelic SNP with an allele frequency (p) of 0.5 is 2p(1 - p) = 0.5. Meanwhile, variants with lower frequencies will possess smaller proportions of heterozygotes, thus capturing fewer informative transmissions and limiting statistical power. One implication of this distinction is that pedigree-based studies rely on distorter alleles that act across multiple families and are therefore best suited to discover TD involving variants that are common in the population. Single-gamete sequencing studies require only one individual to be heterozygous for the causal TD allele, although the probability that a donor is heterozygous still depends on the frequency of the allele in the population.

Intuitively, patterns of allelic co-inheritance among single sperm cells from the same donor facilitates the imputation of missing genotype data even at very low depths of coverage (∼0.01×) per cell, thereby augmenting our sample size at each heterozygous SNP by approximately 100-fold and improving statistical power compared to naive scans of aligned reads. An alternative but related approach, implemented in several previous studies of non-human species, entails pooled sequencing of gametes (Corbett-Detig et al., 2015; Corbett-Detig et al., 2019) or offspring individuals (Wei et al., 2017). Under the assumption that different gametes are sampled by the reads aligned at each SNP position, TD signal can then be aggregated within localized genomic windows. While this aggregation approach improves statistical power, it sacrifices resolution for identifying the causal locus as well as the ability to simultaneously discover recombination breakpoints. It is also important to note that such aggregation requires external knowledge of the phased haplotypes (e.g., obtained through sequencing of inbred parental strains). Applying a similar approach to pooled sperm sequencing data from humans (e.g., Breuss et al., 2020; Yang et al., 2021) would thus require additional phasing experiments, either based on the sequencing of relatives, single-molecule long-read sequencing, or patterns of LD in an external reference panel. Formal benchmarking of such strategies and comparison to single-cell methods offers a promising area for future study, and the decision to pursue single versus pooled sequencing likely depends on factors such as library preparation costs and effort, sample availability, and broader goals of the project.

Given the strong statistical power of our study, the absence of TD signal suggests the absence of moderate or strong TD in this sample. This negative result provides an intriguing contrast to the numerous validated examples uncovered in other species. However, our observations do not preclude the occurrence of TD in humans, for reasons that we enumerate here. First, our study was limited to sperm from 25 donor individuals. We would therefore miss TD involving rare or population-specific alleles that may be absent from our sample. A second and related consideration is that if not balanced by countervailing forces, alleles that are subject to TD may rapidly fix in the population, such that detecting these alleles during their ephemeral existence as polymorphisms may be unlikely. Work in model organisms has also demonstrated the possibility that after distorter alleles arise, suppressors may subsequently evolve to re-balance the transmission ratio (e.g., Greenberg Temin, 2020). Admixture between populations may then separate the distorter from the suppressor, revealing hidden distorter alleles. A third limitation is that our analysis is solely sensitive to mechanisms of TD that would operate prior to the sampling of the sperm cells (e.g., sperm killing), and is therefore blind to mechanisms that may operate in female meiosis (e.g., meiotic drive) or at different stages of development (e.g., fertilization, maternal-fetal incompatibility, or postzygotic selection). Notably, female meiosis may be particularly susceptible to meiotic drive given the asymmetric nature of the cell divisions that produce the oocyte and polar bodies (Clark and Akera, 2021). Within-ejaculate sperm competition (Sutter and Immler, 2020) may also contribute to signals of TD observed in other study designs (such as pedigrees) but would not be detected in this study where gametes were sampled indiscriminately. Finally, our analysis does not exclude the existence of TD driven by biased gene conversion, which would generate short tracts below the resolution of our study. Indeed, the occurrence of this phenomenon is well documented based on both comparative evolutionary data and targeted analysis of human recombination hotspots, and is beyond the scope of our study (Coop and Myers, 2007).

Conclusions and future directions

In summary, we introduced and benchmarked a method to phase donor haplotypes, impute gamete genotypes, and infer meiotic recombination events using low-coverage single-cell sequencing data from a large sample of human gametes. Our method is uniquely tailored to extremely sparse sequencing data from large numbers of gametes—a form of data that will be increasingly common as Sperm-seq and related methods are more broadly adopted and improved. Our simulations demonstrate that rhapsodi outperforms existing methods for phasing and imputation of single-gamete data, while also efficiently scaling to datasets 20-fold the size of that analyzed in our study.

As sequencing large numbers of gametes becomes tenable, even at lower coverages, rhapsodi can facilitate novel uses of inferred genotype data and meiotic recombination events. For example, the ability to construct sex-specific genetic maps could provide essential genetic resources for non-model species to complement genome assemblies and annotations (Lyu et al., 2021). In humans, the construction of personalized recombination maps offers opportunities to investigate sources of variability in the recombination landscape, as well as its relation to phenotypes such as fertility status (Lyu et al., 2021). For example, whole and partial chromosome gains and losses (i.e., aneuploidies) are the leading cause of human pregnancy loss, and one potential mechanism of aneuploidy formation is the abnormal number or placement of meiotic crossovers (Lamb et al., 2005). The chromosome-length haplotypes generated by rhapsodi also offer additional opportunities for population genetic analyses, including inferring kinship and population structure, detecting signatures of positive selection, and assessing demographic history (Li et al., 2020).

We applied rhapsodi to scan a large sample of human sperm genomes for evidence of ‘selfish’ alleles that violate Mendel’s Law of Segregation. We observed no significant evidence of TD, either with respect to individual loci or aggregate genome-wide signal, in turn supporting a model of balanced transmission in this sample. Measuring the fidelity of Mendelian segregation and uncovering the mechanisms that safeguard or subvert this process remain important goals for human genetics. Future investigation of TD in humans can employ rhapsodi to examine large samples of sparsely sequenced gametes from diverse individuals and populations—both fertile and infertile. Given the documented fertility impacts of TD in other organisms, a thorough understanding of the loci driving such inheritance patterns could help reveal novel genetic contributions to human infertility. More broadly, our method offers a flexible and reproducible toolkit for testing this and related hypotheses in other cohorts and study systems, especially in light of the declining costs and expanding application of single-cell sequencing methods.

Materials and methods

Data input and filtering

Request a detailed protocol

The main algorithmic steps of rhapsodi consist of (1) data preprocessing and filtering, (2) reconstruction of phased donor haplotypes (Figure 1b), (3) imputation of gamete genotypes (Figure 1d), and (4) discovery of meiotic recombination breakpoints. A table of sparse gamete genotypes (SNPs × gametes) with corresponding genome positions is required by rhapsodi as input (Figure 1a). Genotypes may be encoded in one of two forms: 0/1/NA (denoting reference, alternative, or missing genotypes, respectively) or A/C/G/T/NA (denoting nucleotides or missing genotypes, respectively). For A/C/G/T/NA encoding, reference and alternative alleles for each position are also required as input. Data is then filtered to consider only SNP positions that are heterozygous for the given individual, with at least one reference and one alternative allele observed across gametes. rhapsodi returns as output phased donor haplotypes, imputed genotype (and corresponding source haplotype) for each SNP in each gamete, and a list of locations of recombination breakpoints detected for each gamete (Figure 1c). We detail the methods that produce each output below.

Reconstruction of phased donor haplotypes

Request a detailed protocol

rhapsodi’s default windowWardD2 phasing approach leverages patterns of co-inheritance across gametes to reconstruct donor haplotypes from the sparse genotype matrix derived from low coverage sequencing (Figure 1a). Chromosomes are divided into overlapping windows; within each window, we compute pairwise binary (Jaccard) distances, cluster the gamete genotypes using Ward’s method (Ward, 1963) (implemented with the ‘ward.D2’ option in the hclust() function in the R ‘stats’ package), and cut the resulting tree into two groups to distinguish the haplotypes (Figure 1b). For each cluster within a window, we use majority voting to decode the phased haplotype sequences. This approach depends on the reasonable assumption that recombination is sufficiently rare, affecting a minority of gametes within each window. If no genotype achieves a majority at a given position, we designate the phase at that position as ambiguous (denoted as ‘NA’). Finally, haplotypes of adjacent (overlapping) windows are stitched together by considering the genotype consensus (i.e., rate of matching) for the overlapping SNPs. In the stringent stitching mode, if consensus is greater than a given threshold (default = 0.9), the haplotypes under consideration are inferred to be the same (i.e., to derive from the same donor homolog). If the consensus is less than 1 minus the threshold (default = 1 - 0.9 = 0.1), the haplotypes under consideration are inferred to be distinct (i.e., to derive from different donor homologs). Meanwhile, if the consensus lies between these upper and lower boundaries, the windows are considered too discordant to ascertain the relation of the haplotypes, and the method returns an error. However, rhapsodi can optionally be run in a lenient stitching mode which continues with phasing regardless of the level of consensus. In this case, the haplotypes with the highest consensus are assumed to derive from the same homolog.

To summarize, the SNP positions are split into windows. Within each of these windows, binary clustering of the gamete data across SNPs and majority voting are used to reconstruct haplotypes within each window. Finally, adjacent windows are stitched together based on the amount of consensus in the genotypes within the overlapping regions to reconstruct the two haplotypes of the donor genome.

rhapsodi also offers the option to apply a modified version of the diploid read-based phasing approach from HapCUT2 (Edge et al., 2017). This includes the functionality necessary for data conversion and calling HapCUT2 within rhapsodi as an alternative phasing approach, termed linkedSNPHapCUT2. While previous work modified HapCUT for use in haploid phasing (Bell et al., 2020), we chose to use HapCUT2, which extends and improves upon the original HapCUT algorithm (e.g., with regard to speed) (Edge et al., 2017). To use HapCUT2, the gamete data input to rhapsodi (cell genotype format) is converted into the fragment file format that would be produced as output from the extractHAIRS step if HapCUT2 were applied directly to sample BAM files. This conversion effectively produces synthetic ‘fragments’ based on the assumption that alleles originating from the same gamete and chromosome are linked.

In our application of rhapsodi to the Sperm-Seq datset, we used the default phasing method (windowWardD2) rather than linkedSNPHapCUT2, due to the poor efficiency of the latter for datasets of this scale. The 0.1% of hetSNP positions that remained of unknown phase were due to ties during the majority vote step.

Imputation of gamete genotypes

Request a detailed protocol

Our imputation approach uses a Hidden Markov Model (HMM) and a subsequent filling step to infer the genotypes of each gamete, effectively tracing the most likely path through the donor haplotypes, given the sparse observations. Our model assumes that (1) the hidden haplotype state at a given SNP depends only on the haplotype state at the immediately preceding SNP, (2) adjacent genotypes originate from the same donor haplotype unless a meiotic recombination event occurred between them, and (3) the observed SNP alleles are either correct or in rare cases arise from genotyping errors (e.g., PCR errors, contaminant DNA co-encapsulated with the sperm genome during library preparation, sequencing errors, mapping errors, etc.). The emission for each position is a single allele, either the H1 allele or the H2 allele (Figure 1c). The HMM stays in the current state (either H1 or H2) with some probability and changes state (via meiotic recombination) with some probability. The HMM uses the phased donor haplotypes and accounts for probabilities of genotyping errors and meiotic crossovers: the probability of genotyping errors is reflected in the emission probabilities, while the probability of meiotic crossovers (i.e., the number of expected meiotic recombination events per chromosome per gamete divided by the number of hetSNPs) is reflected in the transition probabilities (Figure 1c). We then apply the Viterbi algorithm (Forney, 1973) to determine the most likely sequence of hidden states, iterating over all gametes independently (Figure 1d). To avoid potential over-smoothing behavior of the HMM, we optionally overwrite the inferred alleles with any observed alleles for rare cases where these genotypes conflict. For example, if the HMM calls a site as deriving from haplotype 1, but the raw data exhibits an allele matching haplotype 2, the method replaces the inferred haplotype 1 allele with the observed haplotype 2 allele.

The HMM considers only observed genotypes and makes no predictions for missing genotypes. To impute missing genotypes, we assume that tracts of unobserved sites that are bordered by SNPs originating from a single donor haplotype originate from that same haplotype, thus filling in the genotypes for these unobserved sites. Similarly, we assume that recombination does not occur between the last observed SNPs and the ends of the chromosomes, again filling in genotypes at unobserved sites in these terminal regions. However, because of known high rates of recombination at the ends of chromosomes, we provide an option to leave these sites at the ends of chromosomes as unassigned (denoted as ‘NA’). Similarly, tracts within recombination breakpoints (i.e., between observed SNPs that transition from haplotype 1 to haplotype 2 or vice versa) remain unassigned (denoted as ‘NA’).

To summarize, rhapsodi first formulates a Hidden Markov Model (HMM) with emission probabilities defined by the expected genotyping error rate and transition probabilities defined by the expected average recombination rate. The HMM is then fit to the sparse gamete data, tracing the most likely path along the phased donor haplotypes for each gamete. Internal missing genotypes are filled if bordered by matching haplotypes. Missing genotypes at the ends of chromosomes are assigned to haplotypes based on the haplotype state at the first non-missing position or optionally may remain unassigned. Finally, by default, if original genotype observations disagree with inferred genotypes (based on the haplotype state), the original observations are superimposed. These steps together recover the dense gamete genotype matrix.

Discovery of meiotic recombination breakpoints

Request a detailed protocol

Using the imputed gamete genotypes and the underlying phased donor haplotypes, we identified meiotic recombination breakpoints as sites where, for a given gamete chromosome, the inferred path transitions from one donor haplotype to the other. This step should typically be run on the smoothed data (i.e. without superimposing the original genotype observations), as even a low rate of genotyping error could otherwise manifest as false meiotic recombination. Recombination breakpoints are reported as intervals, defined by the last observed site inferred to derive from one haplotype and the first observed site inferred to derive from the other. These start and end points thus demarcate regions in which the meiotic recombination events likely occurred.

Assessing performance with simulation

Request a detailed protocol

We developed a generative model to simulate input data and assess rhapsodi’s performance (Figure 2—figure supplement 1) while varying (a) the number of gametes, (b) the underlying number of hetSNPs, (c) depth of coverage, (d) recombination rate, and (e) genotyping error rate. Given these arguments, we construct a sparse matrix for input to the rhapsodi pipeline. The algorithmic steps of this generative model include (1) building phased diploid donor haplotypes (Figure 2—figure supplement 1a), (2) randomly tracing through these haplotypes to simulate recombination and construct gamete genotypes (Figure 2—figure supplement 1b), (3) masking genotype observations to mimic low sequencing coverage (Figure 2—figure supplement 1c), (4) superimposing genotype errors at a given rate (Figure 2—figure supplement 1d), and (5) filtering to only retain observable hetSNPs. We detail each step in turn below.

In the first step of the generative model, we construct phased haplotypes of the diploid donor by randomly generating a vector of 0s and 1s (with equal probabilities) at a length equal to the underlying number of hetSNPs. This binary vector is then inverted to generate the complementary haplotype (Figure 2—figure supplement 1a).

We then use these simulated homologs to generate genotypes for each gamete (Figure 2—figure supplement 1b). Specifically, we sample a count of recombination breakpoints from a Poisson distribution, where lambda reflects the average recombination rate. Each recombination breakpoint is randomly placed on the respective chromosome, and genotypes occurring after that breakpoint are inverted. In cases where a gamete chromosome does not recombine, the chromosome is identical to one of the parental chromosomes. By placing crossovers randomly, our method ignores heterogeneity in the landscape of meiotic recombination, as well as the phenomenon of crossover interference (Broman and Weber, 2000), which suppresses nearby crossovers.

After obtaining the gamete genotype matrix, we simulated the sparse coverage of single-cell sequencing data by masking observed genotypes. The missing genotype rate (MGR) is obtained from the probability density function of a Poisson distribution with x=0 and lambda equal to the sequencing coverage. We then multiplied the MGR by the total number of simulated SNP observations (number of underlying hetSNPs × number of gametes) to compute the total number of genotypes that should be masked (denoted as ‘NA’) (Figure 2—figure supplement 1c). Following a similar approach, we computed the number of genotyping errors by multiplying the error rate by the number of non-missing genotypes and then randomly placed these errors at individual SNPs by inverting the respective alleles (replacing 0 with 1 or vice versa) (Figure 2—figure supplement 1d). Finally, SNPs or rows with at least one 0 genotype and one 1 genotype are kept (and the rest of the rows are filtered out), retaining only sites where both alleles were observed. In this way, the final number of observed hetSNPs in the generated output may be less than or equal to the specified number of underlying hetSNPs in the input.

We used this generative model to construct input data for ranges of gamete sample sizes (3, 15, 50, 150, 500, 1000, 2500, 5000), underlying numbers of hetSNPs (5000, 30,000, 100,000), and depths of coverage (0.001, 0.01, 0.1, 0.223, 0.357, 0.511, 0.693, 1.204, and 2.303×, corresponding to missing genotype rates of 99.9, 99, 90, 80, 70, 60, 50, 30, and 10%, respectively). For each combination of these three parameters, nine different conditions were simulated, with three independent runs for each condition. The conditions were defined by combinations of the genotyping error rate (0.001, 0.005, and 0.05) and recombination rate (0.6, 1, and 3) used to construct the gamete data. Simulations in which no hetSNPs remained after filtering were dropped from downstream benchmarking (2% of the generated datasets). In total, we produced 5713 simulations for downstream benchmarking across these conditions.

We evaluated the quality of donor haplotype phasing by considering the switch error rate, largest haplotype segment, completeness, and phasing accuracy (see Materials and methods section titled ‘Definition of metrics’). To assess the quality of imputation of gamete genotypes, we quantified accuracy, completeness, and switch error rate. Finally, we assessed meiotic recombination discovery by computing the True Positive Rate (TPR), False Discovery Rate (FDR), and meiotic recombination breakpoint resolution. Given the imbalance in the number of true recombination breakpoints compared to the number of false recombination breakpoints, we also considered precision and F1 Score.

Definition of metrics

Request a detailed protocol
  • Accuracy (Phasing and Imputation): For two sequences of genotypes, s1 and s2 (of equal length and both represented by vectors comprising 0, 1, and NA), accuracy was defined as the Hamming error rate subtracted from 1. The Hamming error rate was defined to be equal to the number of positions with mismatches between s1 and s2 (ignoring a mismatch when a single sequence had an NA value) divided by the total number of sequence positions (Li et al., 2020; Porubsky et al., 2017).

  • Completeness: For a sequence of genotypes, s1 (represented by 0 s, 1 s, and NAs), completeness was defined as the number of non-NA positions divided by the total number of positions in s1 (Li et al., 2020).

  • Switch Error Rate: For two sequences of genotypes, s1 and s2 (of equal length and both represented by 0 s, 1 s, and NAs), the switch error rate was defined as the number of first mismatches in any stretch of adjacent mismatches (where the stretch is greater than or equal to length 1 and uninterrupted by NAs) divided by the total number of sequence positions (Li et al., 2020). This is visualized in Figure 2—figure supplement 3.

  • Largest Haplotype Segment: For two sequences of genotypes, s1 and s2 (of equal length and both represented by 0s, 1s, and NAs), the largest haplotype segment was defined as the maximum number of adjacent matched positions, uninterrupted by NAs. For plotting purposes, we divide this number by the total number of sequence positions (Li et al., 2020).

  • True Positive (TP): A true simulated meiotic recombination breakpoint intersects with predicted meiotic recombination breakpoint (Figure 2—figure supplement 5b).

  • True Negative (TN): Simulated chromosome with no meiotic recombination events is correctly predicted to have no recombination breakpoints (Figure 2—figure supplement 5b).

  • False Positive (FP): A predicted meiotic recombination breakpoint does not intersect with any true simulated meiotic recombination breakpoints (Figure 2—figure supplement 5b).

  • False Negative (FN): A true simulated meiotic recombination breakpoint does not intersect with any predicted meiotic recombination breakpoints (Figure 2—figure supplement 5b).

  • Recall or True Positive Rate (TPR): TPTP+FN

  • Precision: TPTP+FP

  • F1 Score: 2×precision×recallprecision+recall

  • False Discovery Rate (FDR): FPTP+FP

  • False Positive Rate (FPR): FPTN+FP

  • Specificity or True Negative Rate (TNR): TNTN+FP

  • Accuracy (Discovery): TP+TNTP+TN+FP+FN

Automatic phasing window size calculation

Request a detailed protocol

We noted that for specific scenarios involving low coverages or small numbers of gametes, the positive relationship between phasing performance and increasing amounts of data (i.e., gamete sample size and coverage) was not always monotonic. This suggested that other parameters that were held fixed (notably, window size) may interact with sample size and coverage to influence performance.

To address this possibility, we performed a series of simulations (n=860) across a range of window sizes, tracking the phasing accuracy for each window size possibility. These simulations covered 459 unique combinations of number of gametes, number of SNPs, coverage, genotyping error, and recombination rate (1–3 independent replicate simulations each). For each set of parameters, the optimal window size which led to highest phasing accuracy was recorded. In cases of ties, the larger window sizes were preferred.

Optimal window size was expressed as a proportion (optimal number of SNPs per window / total number of SNPs on the simulated chromosome) such that data was bound between 0 and 1. We randomly designated 25% (n=213) of simulations as the test set, while using the remaining 75% (n=647) of simulations were designated as the training set. We fit a beta regression to the training data with the optimal window proportion as the response variable and number of gametes, coverage, genotyping error rate, and recombination rate as the predictor variables. As expected, we found that the number of gametes (β^=6.6×104, pvalue=2.2×1041), coverage (β^=2.1×101, p-value = 0.009), and recombination rate (β^=3.8×101, pvalue=2.7×1018) were significant predictors of optimal window size. The model performed well on both the training and the held-out test set (Figure 2—figure supplement 4) with a training pseudo R-squared of 0.32 and no obvious loss of generalization in performance in the test set.

The results of this model are currently provided via an optional feature in the rhapsodi software package for automatic selection of phasing window size.

Benchmarking run time

Request a detailed protocol

To estimate rhapsodi’s runtime, we simulated sparse gamete datasets across the same range of coverage (0.001–2.303×) and gamete sample size (3–5000) as used in our broader benchmarking analysis (Figure 2). To be conservative, we used 100,000 hetSNPs for all simulations, equivalent to the largest dataset used in the other benchmarking analyses. Each simulation was run in triplicate with different random seeds. We then applied the rhapsodi_autorun() function to each dataset on 48 Intel(R) Xeon(R) Gold 6248 R CPUs @ 3.00 GHz. All simulations of 2500 gametes or fewer were run with an allocation of 189 GB of RAM. Simulations of 5000 gametes were run with an allocation of 1.47 TB of RAM. Time was recorded using R’s system.time() function, and CPU seconds were calculated by summing together the user and self times for parent and child processes. All analyses took under 90 min of wall-clock time to run (Figure 2—figure supplement 10).

Comparison to existing methods

Request a detailed protocol

We compared our method to the software package Hapi (haplotyping with imperfect genotype data) (Li et al., 2020), which was similarly developed for phasing and imputation based on single-cell DNA sequencing of gametes. Hapi was demonstrated to outperform the only other haploid-based algorithm, PHMM (pairwise Hidden Markov Model) (Hou et al., 2013), as well as two commonly applied diploid-based phasing methods, WhatsHap (Martin et al., 2016) and HapCUT2 (Edge et al., 2017) in terms of accuracy, reliability, completeness, and cost-effectiveness. Because Hapi outperforms PHMM and the PHMM software was not readily available, we did not directly compare rhapsodi to this method.

In contrast to rhapsodi, which is intended for datasets where many gametes are sequenced at individually low coverages, Hapi was designed for datasets that contain relatively small numbers of gametes (∼3-7) sequenced at individually higher coverages (>1×). Given this distinction, the default behavior of Hapi’s autorun function was not appropriate for our simulated datasets. In particular, Hapi’s imputation function (hapiImpute()) by default filtered out any SNPs that had a missing genotype observation for at least one gamete, which results in all data being removed from our low-coverage datasets. We thus disabled this filtering behavior prior to benchmarking against our method to facilitate comparison.

As part of Hapi’s autorun function, the HMM used to filter errors uses a fixed set of transition and emission probabilities. To more fairly compare the tools to each other, we instead provided Hapi with an HMM with parameters matching those of our simulation, thus closely mirroring the approach we used to evaluate rhapsodi in this analysis. After rhapsodi and Hapi were run on each dataset, the performances of both methods were evaluated using rhapsodi’s assessment functions.

We then compared the default phasing method in rhapsodi (windowWardD2) to the modified HapCUT2 phasing approach (linkedSNPHapCUT2). Following the same steps outlined in Materials and methods section titled ‘Assessing performance with simulation’, we calculate the accuracy and completeness for phasing using each method and take the difference between the measures for each of the two methods Figure 3—figure supplement 3. The time requirements of linkedSNPHapCUT2 follow the same steps as those outlined in Materials and methods section titled ‘Benchmarking run time’, with the additional use of the Unix time function for the calling of HapCUT2. Evaluation of linkedSNPHapCUT2 does not include the timing of file conversion for use with HapCUT2, only the phasing; timing of windowWardD2 only evaluates the timing of phasing itself because no file conversion is necessary. We report both total user and system time as well as real time Figure 3—figure supplement 4. Simulations of linkedSNPHapCUT2 were allowed to time out if not completed after three days (including both file conversion and execution of HapCUT2’s phasing).

While leveraging the data from Bell et al., 2020, rhapsodi offers substantive changes in the analysis approach. First, the Methods presented here build a reproducible software toolkit for similar analyses. Second, rhapsodi takes full advantage of the co-inheritance patterns of the SNP alleles; this phasing method is a principled approach tailored to the structure of the input data and knowledge of the biological process of meiosis. Third, for crossover discovery, rhapsodi assigns gamete-level genotypes via HMM-based imputation prior to detecting recombination breakpoints. Dealing with the error prior to crossover discovery enables rhapsodi to use the strengths of the HMMs.

Inferring genetic similarity of sperm donors to reference samples

Request a detailed protocol

For inference of genetic similarity of donor individuals to reference samples from the 1000 Genomes Project (Figure 5—figure supplement 2; Auton et al., 2015), we merged data from all sperm cells of each donor, effectively mimicking sequencing of the donors’ diploid genomes. We then randomly downsampled coverage 10-fold and limited analysis to chromosome 21 for computational tractability. Variants were called across all donor samples using freebayes (Garrison and Marth, 2012) with default settings and restricted to SNPs (as opposed to insertions or deletions). The list of variants observed in the Sperm-seq sample was then intersected and merged with genotype data from the 1000 Genomes Project reference panel of 3202 globally diverse samples (Auton et al., 2015). Using PLINK (Purcell et al., 2007), principal component analysis was applied to these merged genotype data, visualizing sperm donor sample principal component scores with respect to the labelled reference panel.

Genotype filtering to mitigate spurious TD signatures

Request a detailed protocol

Using metadata from Bell et al., 2020 and Leung et al., 2021, we removed any sperm cells designated as poor quality and any cell that was called as aneuploid for the chromosome of interest by the original researchers. To limit potential artifacts, we excluded all technically challenging regions of the genome previously identified by the Genome in a Bottle Consortium (GRCh38 ‘union’) (Krusche et al., 2019) and/or the ENCODE Consortium (Amemiya et al., 2019). These include tandem repeats, homopolymers >6 bp, imperfect homopolymers >10 bp, difficult to map regions, segmental duplications, GC content <25% or >65%, and other problematic regions. To further filter individual variants, we restricted our analysis to SNPs that were also present in the 1000 Genomes Project dataset (Auton et al., 2015). Notably, even if TD were to be caused by a rare variant (which we would have removed from our dataset in this step), it is expected to occur on a haplotype that also carries common variants shared with the 1000 Genomes Project, allowing us to discern its effect. Finally, we removed any SNPs exhibiting an excess of observations across sperm cells (>1 standard deviation above the mean), with the goal of filtering out potential segmental duplications that are unique to a given donor. Such duplications contribute to alignment artifacts, whereby both copies of the sequence (including divergent sites) would pile up at the same location in the reference genome. This phenomenon may generate false signatures of TD, which we detail here.

Specifically, we identified two potential indicators that a given signature of TD may arise from rare segmental duplications (Figure 5—figure supplement 3). First, we consider the number of sperm in which the SNP is observed to carry the reference allele vs. the alternative allele (Figure 5—figure supplement 3a, Figure 5—figure supplement 3b). In the absence of TD, we expect points (each representing an SNP) to cluster to the line with slope = 1, indicating balanced representation of the reference and the alternative allele across the sample of sperm from that donor. Such is the case in Figure 5—figure supplement 3b, which is representative of most loci throughout the genome. However, if no filtering is applied, we observe that certain regions with strong TD signatures exhibit predictable patterns of allelic imbalance manifesting here as lines with slope = 0.5 and slope = 2 (Figure 5—figure supplement 3a). The second indicator that such regions derive from a rare segmental duplication regards the overall number of genotype observations (Figure 5—figure supplement 3c). Because of the low coverage (∼0.01×), we expect most SNPs to be covered by one or more reads in only a few of the 969–3377 sperm per donor. If no filtering is applied, we find that SNPs exhibiting strong signatures of TD also exhibit an extreme excess of genotype observations, again consistent with duplication. To avoid these potential confounding impacts of segmental duplications, we set a stringent threshold for our filtering pipeline, removing SNPs with excess (>1 standard deviation above the mean) genotype observations across the pool of sperm. We calculated these means and standard deviations separately for each donor’s chromosomes (25 donors × 22 chromosomes).

Power analyses for detecting TD

Request a detailed protocol

To evaluate the statistical power of our TD scanning approach, we conducted simulations of varying levels of TD (0–20% deviations from Mendelian expectations) across a range of sample sizes of human sperm and applied two-tailed binomial tests (Figure 4). We used a Bonferroni correction to account for multiple hypothesis testing across both SNPs and donor individuals, as described in the Materials and methods section titled ‘Significance threshold for TD scan’. Power is calculated here based on fully known gamete genotype data. While donors with fewer gametes may experience slightly decreased imputation and thus decreased power, even for the smallest sample size encountered in our study (n=969 sperm cells from a single donor), we reach both high accuracy and high completion in genotype imputation (Figure 2b). Larger samples increase our power to detect even smaller deviations from binomial expectations. Power was computed for each study design (number of gametes and rate of transmission) based on 1000 independent trials.

To evaluate our power to discover very strong TD, whereby the causal SNP and nearby SNPs may appear as homozygous across the sperm sample, we also applied rhapsodi to simulated data with a transmission rate (k) of 0.99. To simulate TD signatures, we developed a modified version of our generative model, whereby we randomly selected a causal SNP and haplotype to be under-transmitted relative to the other (Figure 4—figure supplement 2). We then generated a larger than desired set of gametes, removed a specified fraction of gametes containing this SNP/haplotype combination from the dataset, and sampled the desired number of gametes from this pool. By setting the fraction of gametes to remove, we effectively control the transmission rate (k), based on the following equation: 2-(1/k).

Replicates of 100 independent simulations were performed across each chromosome for each of three donors (100 × 22 × 3 = 6600 total simulations): NC26 (average coverage, but small number of gametes), NC2 (slightly higher than average coverage and average number of gametes), and FF3 (lowest coverage, largest number of gametes). For each set of simulations, the coverage, number of gametes, and number of SNPs were set to match the data profile of that chromosome and donor combination, and datasets with strong TD (k = 0.99) were simulated with the modified generative model. These simulated datasets were assessed with rhapsodi, and a TD scan was applied to the imputed data. The lead observed SNP was recorded and its p-value compared to the p-value threshold of 1.78 × 10-7 used with the Sperm-seq TD scan. Power was then computed as the proportion of simulated lead observed hetSNPs with p-values below this significance threshold. The observed transmission rates for the lead observed hetSNPs ranged from 0.82 to 0.996 across the 6600 simulations.

Genome-wide scan for TD

Request a detailed protocol

To scan the genome for TD, single-cell genotype calls were generated from sperm sequencing data as described by Bell et al., 2020. Briefly, genomic variants were called for each donor using all sequence data from all sperm cells. After selection of heterozygous sites, the software GenotypeSperm (Drop-seq Tools v2.2, https://github.com/broadinstitute/Drop-seq/releases) was used to determine which sites were captured by each sperm cell barcode and which allele(s) were present at each of these observed sites. Aneuploidy status and cell doublets were identified as previously described (Bell et al., 2020; Leung et al., 2021). Only cell barcodes associated with single sperm cells (non-doublets) with even sequence coverage were included in the analysis, while aneuploid autosomes were excluded from the analysis.

We used a two-tailed binomial test (comparing counts of each allele under a null hypothesis of balanced transmission) to scan for TD at each hetSNP across sperm genomes from each of 25 donors. As input to this analysis, we used the imputed gamete data output by rhapsodi, applying the option to superimpose any observed genotypes in cases of discordance with inferred genotype state.

Significance threshold for TD scan

Request a detailed protocol

As described above, we applied a separate binomial test for TD at each hetSNP in each donor individual. Given the relative infrequency of recombination (per base pair per meiosis), sperm genomes are composed of large tracts of the donor’s two parental haplotypes. By consequence, genotype data from the sample of sperm from a given donor are affected by extreme LD, which typically extends across entire chromosomes. As such, our binomial tests are highly correlated across the genome, and naive multiple testing correction based on the number of SNPs would be overly conservative.

To account for this effect, we applied the method simpleM (Gao et al., 2008; Gao et al., 2010), which uses a PCA approach to compute the effective number of independent statistical tests. For the Sperm-seq dataset (combining across all donor individuals), simpleM estimated 281,368 effective tests (compared to 34,799,282 nominal tests). We then used the effective number of independent tests to apply a Bonferroni correction of 0.05/281,368 = 1.78 × 10-7.

Calculation of global signal of TD

Request a detailed protocol

We quantified global evidence of TD across the sperm genomes by calculating the rate of allele sharing between all pairs of sperm from each donor. First, we applied PLINK (Purcell et al., 2007) to the fully imputed sperm genotype matrix output by rhapsodi to generate a list of the SNPs deemed to segregate in near linkage equilibrium (plink –file <infilename> –no-fid –no-parents –no-sex –no-pheno –indep 50 5 2 –out <outfilename>). We then computed a Hamming distance matrix, effectively counting mismatches between each pair of sperm genomes from each donor. After repeating the above steps for all donors, we reported the mean across all pairs of sperm cells.

Null simulations were conducted by applying a modified version of the generative model (without missing genotypes or errors), using gamete numbers, pruned SNP counts, and crossover counts identical to those inferred from the empirical data. We repeated the above procedure for computing pairwise distances on each simulated dataset, recording the mean proportion of shared alleles across all sperm from each of the 25 donors. Repeating this entire procedure 500 times allowed the construction of a null distribution to which we compared our allele sharing calculation from the Sperm-seq data. We then computed a one-tail p-value, defined as the proportion of null simulations where the average pairwise allele sharing was greater than or equal to the observed value.

Power analysis for detecting TD with pedigrees

Request a detailed protocol

Pedigree-based studies typically test for TD using the transmission disequilibrium test (TDT) (Spielman et al., 1993). The TDT is a McNemar’s test of the binomial (H0: pA1 = pA2 = 1/2), where pA1 is the probability of transmitting the A1 alele and pA2 is the probability of transmitting the A2 allele (Meyer et al., 2012). As a point of comparison for the single-sperm analysis, we computed the power of the pedigree-based McNemar test (using the test statistic X=(b-c)2/(b+c), where b and c are the numbers of observed transmissions of the A1 and A2 alleles, respectively Meyer et al., 2012) at varying levels of TD (0–20% deviations from Mendelian expectations) across a range of sample sizes of informative transmissions (Figure 4—figure supplement 4). The sample size refers to the number of offspring in which transmission of the allele of interest can be assessed (i.e., one heterozygous parent). We then plot the number of informative transmissions per SNP using published data from Meyer et al., 2012 (their supplementary table 1).

Availability of data and materials

Request a detailed protocol

Data analysis scripts specific to our study are available at https://github.com/mccoy-lab/transmission-distortion, (copy archived at swh:1:rev:d30bde24ca6d385036b88527c66a4a7d6261f8a5; Carioscia et al., 2022). Our package rhapsodi is available at: https://github.com/mccoy-lab/rhapsodi, (copy archived at swh:1:rev:8a5d712b1eb500594ac75428aa8dd94494bf81f3; Weaver et al., 2022), and a vignette detailing use of rhapsodi is included there, at https://github.com/mccoy-lab/rhapsodi/blob/master/vignettes/rhapsodi.html.

Raw sperm sequencing data from Bell et al., 2020 can be accessed via dbGaP (study accession number phs001887.v1.p1), as described in the original publication. Raw sperm sequencing data from Leung et al., 2021 was accessed upon request from the authors. We filtered the cells in our analysis using metadata published by Bell et al., 2020 at: https://zenodo.org/record/3561081#.YLAdO2ZKhb9. Analogous metadata from Leung et al., 2021 was obtained upon request from the authors.

Data availability

Data analysis scripts specific to our study are available at https://github.com/mccoy-lab/transmission-distortion (copy archived at swh:1:rev:d30bde24ca6d385036b88527c66a4a7d6261f8a5). Our package rhapsodi is available at: https://github.com/mccoy-lab/rhapsodi, (copy archived at swh:1:rev:8a5d712b1eb500594ac75428aa8dd94494bf81f3). Raw sperm sequencing data from Bell et al. (2020) can be accessed via dbGaP (study accession number phs001887.v1.p1), as described in the original publication. Raw sperm sequencing data from Leung et al. (2021) was accessed upon request from the authors. We filtered the cells in our analysis using metadata published by Bell et al. (2020) at: https://zenodo.org/record/3561081#.YLAdO2ZKhb9. Analogous metadata from Leung et al. (2021) was obtained upon request from the authors.

The following previously published data sets were used

References

    1. Spielman RS
    2. McGinnis RE
    3. Ewens WJ
    (1993)
    Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (iddm)
    American Journal of Human Genetics 52:506.
    1. Sutter A
    2. Immler S
    (2020) Within-ejaculate sperm competition
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 375:20200066.
    https://doi.org/10.1098/rstb.2020.0066

Decision letter

  1. Daniel R Matute
    Reviewing Editor; University of North Carolina, Chapel Hill, United States
  2. Molly Przeworski
    Senior Editor; Columbia University, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting the paper "Strict adherence to Mendel's First Law across a large sample of human sperm genomes" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The reviewers have opted to remain anonymous.

Comments to the Authors:

Your manuscript was reviewed by three experts in the field and they concurred that the research is important and valuable. In particular, the development of the rhapsodi will be important for future studies. We also appreciated the application to human gamete data. The conclusions of the study largely confirm the results from previous studies and for that reason, we have decided to decline the manuscript. The work will not be considered further for publication by eLife. We hope the extensive comments from the reviewers will be useful.

Reviewer #1 (Recommendations for the authors):

The manuscript reports a new method (rhapsodi) to impute haplotypes in the sequencing of human gametes. The method performs well in simulated data and the authors explore parameter spaces relevant to the research. In particular, the authors evaluate how well the method performs at different genotyping depths and number of gametes. Even with relatively low coverage (~0.1X), the method phases the haplotypes effectively. Next, the authors evaluate the performance of rhapsodi to mis-specification of recombination and genotype error rate. Some simulations show a non-monotonic relationship with coverage which means the method is currently hard-coded for an unidentified source of variation. The authors acknowledge this caveat.

The second part of the manuscript compares the results from rhapsodi with those Hapi, another tool to phase haplotypes in gamete sequencing data and show that rhapsodi method seems superior in terms of computational time, completeness, and accuracy.

The next, and final, part of the manuscript is to analyze the data from the SpermSeq dataset (Bell et al. 2020; ~41K sperm haploid genomes). The authors do several steps of filtering and use the reduced dataset to infer crossover location along the genome (and the results are consistent with those of previous reports). Most importantly, the authors also use the method to infer Transmission Distortion (TD). There are a few sites (e.g., seven linked SNPs in chromosome 2) that show some deviations, but after using a multiple hit correction, there is no single SNP that shows a strong signal of TD. In general the manuscript concludes there is no significant deviations from random segregation along the whole human genome. This is a rigorous study that provides a novel tool for the study of gamete genomes. Nonetheless, the conclusion is not particularly novel. Perhaps if the authors provide more context on how the findings are contextualized in our current understanding of TD in humans, the manuscript would be more appealing. As it stands, the discussion is almost entirely about the method performance. An example of this lack of context comes in paragraphs 1 and 2 of the discussion. I would elaborate on the studies that are 'limited in statistical power'. Paragraph 2 suggest that previous studies were problematic but the authors do not describe the specifics. This segment of the discussion could be expanded for the benefit of the reader.

Reviewer #2 (Recommendations for the authors):

As mentioned briefly in the public review, the method developed in this work is well-motivated, high performing, and clearly explained. The descriptions of both simulations and application of the method to real data are very thorough, and the authors have taken care to reduce potential confounding effects of genotyping error and other data artifacts. The highlighted results regarding human TD are challenging to interpret in the context of various limitations, including an absence of discussion of how the method might perform under strong TD, as well as missing or unclear information about the TD simulations and the source of the samples. While some limitations of the TD analysis are discussed, other important ones are missing. Overall, the paper may be stronger if it focuses on the method rather than the TD analysis and/or incorporates or suggests an additional use case beyond searching for TD.

Major suggestions for improvement:

– Expand simulations of TD beyond 70% overtransmission of one allele, to identify an upper bound for TD beyond which allelic dropout substantially reduces power.

– The methods and Figure 4 – —figure supplement 4 provide only the number of gametes and not the sequencing coverage used for simulations of TD. Include information about coverage for these simulations, and highlight in the figure the range of parameters/conditions represented in the real dataset.

– The authors claim that they have ">80% power to detect even subtle TD." However, the results from the analysis with multiple test correction shown in Figure 4 – —figure supplement 4B show 80% power for a sample size of ~1000 gametes (the smallest sample size in the sperm dataset) at roughly a 0.58 transmission rate (it is challenging to see precisely from the colors in the figure). It would be helpful to provide an exact lower bound for the detection of TD under realistic conditions. A transmission rate of 0.58 is arguably not subtle and is comparable to the rate at which pedigree studies would also be reasonably well powered.

– In several places the authors state that a true positive signature of TD would not be eliminated through stringent filtering due to the persistence of strong LD. One effective way to demonstrate this would be to show that their filtering has not masked large regions of the genome in some donors. What are the longest stretches in which all SNPs have been eliminated through filtering?

– Repetitive regions, particularly centromeres and telomeres, are often involved in TD in other species, yet these would be filtered out as "challenging regions" in this analysis. Is it possible to infer how well these regions might be captured through LD with included SNPs? The section of the Discussion describing cases of TD that might not be captured in this analysis should include this limitation.

– Relatedly, if the marginal signal of TD on chromosome 2 shown in Figure 4 and described in brief on pp. 5 – 6 is at the end of the genotype-able portion of the chromosome, it could represent stronger TD involving telomeric repeats in LD with this region. This possibility could be excluded or discussed.

– It would be helpful for interpreting the results to include more details about the source of the sperm samples, particularly the donors' ancestry and whether the donors were known to have any fertility issues. This would provide useful context for the possibility of population-specific TD in the Discussion, as well as any comparisons to results of earlier studies using other methods.

– The Conclusion could be substantially strengthened by suggesting other potential uses for this method, including but not limited to detection of TD. Which samples would be most interesting to use for subsequent scans of TD? What other uses might this method have in non-TD-related genetic or evolutionary studies?

Reviewer #3 (Recommendations for the authors):

This method for phasing and imputing gamete genotypes requires substantially more benchmarking. Although accuracy is of interest and reasonably well evaluated through simulations, nothing is presented about the computational performance. In particular, given that much larger Sperm-seq datasets must already be in production, it is important to determine if this approach will scale sufficiently for those applications. Runtimes and memory requirements should be reported.

The code is reasonably well documented. However, I am not certain I could reproduce the haplotype phasing approach nor the HMM-based on the description provided in the text. The description should be expanded substantially possibly in a supplemental text section. For instance, in the state transition diagram in Figure 1C, as shown it indicates that the HMM does not transition back from error states. Is that right or am I misunderstanding?

The section, "Power analysis for detecting TD" states that the power across simulations is 80%. Does this account for a multiple testing correction? Both corrected and uncorrected values should be reported.

Two biological factors might be worth some additional clarification. First, what do we know about the ancestry of the donors? E.g., are some individuals admixed? Admixture might be expected to "unmask" cryptic distorter/suppressor pairs. Second, it should be clearly stated in the discussion that this study does not exclude the possibility of rare strong distorters at modest or low frequencies in human populations.

To me, figure 4-S3 is somewhat misleading. Note that this might actually be figure 4-s4 given the caption and references in the text, but on the pdf that page is labelled figure 4-s3. None of the donors considered had more than about 3300 total sperm sampled. After QC, I assume it is less. Yet the X-axis proceeds to 10k. I'd like to see the X-axis pruned back to a reasonable range for these analyses and a line plotted with points for each donor where the experiment would have power to detect TD. I suggest plotting the line at 80% power by convention, but 50% would allow for a more direct comparison to Meyer et al. (2012). This really should be a main text figure, too, since all of the major biological claims in the paper hinge on these analyses it is necessary to explore power in much greater depth.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Adherence to Mendel's First Law across a large sample of human sperm genomes" for further consideration by eLife. Your revised article has been evaluated by Molly Przeworski (Senior Editor) and a Reviewing Editor.

The manuscript was reviewed by two of the original reviewers and two additional ones. Overall, the reviewers agree on the merit of the work. The manuscript has been improved but there are some important remaining issues that must be addressed. In summary, we have three requests and one suggestion.

Essential:

– A comparison to hapcut2 seems to be crucial for benchmarking the method. Since your method has a phasing component, it is important to present a comparison that is adequate given the scale of the dataset.

– The criticisms about pedigree-based studies need to be presented in a more nuanced way. Similarly, the broad statement about the absence of strong TD in humans seems poorly supported. The reviewers all suggested a more balanced presentation of these previous efforts and more of a discussion of how rhapsodi can be integrated in current research.

– The manuscript reads as two disjointed pieces, one on method development and the other on applications. We would ask that the manuscript be revised with this issue in mind, as it persists from the previous submission.

Potentially useful but not essential:

The reviewers list some suggestions, including changing the title, modifying the abstract, and reordering the introduction and discussion.

Reviewer #2 (Recommendations for the authors):

I appreciate the extensive further simulations, analyses, and revisions the authors have provided, particularly the additional simulations under strong TD and the clarified details about their samples, metrics, and computing resources. The suggestions for future uses of rhapsodi are also helpful in communicating the significance of this work. These additions substantially strengthen the manuscript and alleviate many of my initial concerns.

I still feel that statements about generalizing a negative finding in the search for TD within this sample to a broader statement about the absence of strong TD in humans (e.g., line 107 "underscores the fidelity of human male meiosis for ensuring balanced transmission of alleles to the gamete pool"), are somewhat too strong, for the following reasons:

1) All power simulations for rhapsodi-based TD detection are conditional on observing a distorter in heterozygous state within the sample, which would require the distorter not to be very rare (roughly >1.4% allele frequency required for 50% probability that at least one of 25 individuals is heterozygous). Observed distortion alleles in other species are often at very low frequency under an equilibrium model where the distortion is balanced by fitness costs in homozygotes (e.g., SD is found at 1 – 5% frequency per ref #9). Unbalanced distorters would fix or be lost rapidly, and would be unlikely to be detected except in the very brief window at which they are locally at intermediate frequency. The authors note these as caveats in the Discussion, but the need for a relatively common distorter is not described as an issue of power (see discussion of comparison to pedigree-based studies below).

2) I may have missed it, but I did not see mention of the possibility of population-specific TD aside from the discussion about fixed distorters uncovered through admixture. While it is true that there are very few alleles with extreme patterns of frequency differentiation across human populations (lines 620 – 621), this is not highly relevant for whether one should expect distorters to be population-specific. As mentioned above, distortion loci are likely to be rare and/or ephemeral, and either would make them very likely to be population-specific.

3) Some TD systems involve epistasis between alleles at distorter and responder loci, which would further require the responder allele to be observed heterozygous in the sample.

I did not notice this before, but there are statements in both the Introduction (line 63) and Discussion (line 505) about pedigree-based TD scans being underpowered due to the small size of human families. This is a bit misleading because such studies typically combine data across multiple families (as noted by the authors in lines 523 – 524), which reduces their power to detect TD present in any one individual but not to detect TD involving an allele at intermediate frequency that is heterozygous in many parents within the sample. Both pedigree-based and sperm sequencing studies have issues of power due to sample size of individuals that are currently discussed primarily in the context of pedigree-based studies; donor sample size concerns are separated from considerations of power when discussing the sperm sequencing study design.

The statement that "single-gamete sequencing studies… provide equal power for detecting TD involving common and rare alleles" is undermined by the following clause "provided that they are heterozygous in the sampled donor individual." The probability that any of the sampled donor individuals are heterozygous for a distorter depends upon that distorter's frequency in the population, so the study's overall power to detect TD is not equal for common and rare alleles (see point 1 above). For example, in the context of Figure 4—figure supplement 4, 200 informative transmissions out of 1518 trios would imply p(heterozygous) = 0.066 (200/3036, MAF = 0.034). In the Sperm-seq sample of 25 individuals, 18% of such cases would be expected to have no heterozygotes in the sample. As this example demonstrates, the power for the present study is substantially higher than for previous pedigree-based studies, but not as much higher as implied by the power simulations.

It might be worth considering for future work whether power may be gained for detecting TD using rhapsodi by combining data across individuals following haplotype inference.

Reviewer #3 (Recommendations for the authors):

Overall, the manuscript has improved and many of my concerns have been well addressed. In particular, emphasizing the power considerations of this specific approach in the main text was an important addition. I appreciate the authors' correcting inaccuracies in my understanding of their previous work.

I respectfully disagree with the authors' dismissal of several considerations.

First, the paper remains a pretty sharp subdivision of methods vs biology. The methods are really about phasing and recombination detection in sperm-seq data, the biology is about TD. I believe it will be hard for a non-specialist to read this manuscript, though the authors are correct that extremely specialized users --- i.e., those with their own sperm-seq datasets --- may benefit from improved software usability.

Second, a comparison to an approach that is suited to the scale of data used is necessary. If hapcut2 is the only option, it should be applied despite being an "off-label" use. Also, yes, it is somewhat outside of typical hapcut2, but linking the reads bioinformatically is pretty straightforward and reasonable. It bears some similarity to read-backed phasing Certainly if one of this studies' co-authors believed this to be a valid use in previous published work it is reasonable to include as a point of comparison here.

On a related note, in "Benchmarking against existing methods", while the description of previous analysis is accurate, the relevant underlying datasets vary so much from previous works that is it not clear that the same results would hold here. Minimally, some acknowledgement should be added that the results described from previous work are based on a dramatically different dataset than was considered here.

Reviewer #4 (Recommendations for the authors):

The authors describe two main things in this paper:

First is the software package called "rhapsodi". Rhapsodi is an R package which is designed to use sparse whole-genome sequencing data from gametes to phase a sperm donor's haplotypes, impute gamete genotypes and identify meiotic crossover breakpoints.

Second, rhapsodi was used with both simulated data and a published dataset of 41,189 individual sperm cells' sequence. The main biological focus of this second part of the paper involved testing for transmission distortion, of any alleles or broader linkage blocks, for which none is detected. Previous publications, referenced in the manuscript, have suggested that transmission distortion can occur in human populations. The possible causes of the discrepancies in results between the study and previous studies is discussed.

Strengths

In the set of variables assessed, rhapsodi appears to outperform Hapi, which the authors present as the benchmark to meet or exceed for a package focused on efficient, accurate, complete and reliable phasing of haplotypes.

Another strength is that the paper also assesses transmission distortion with particularly large datasets for which no signal is detected. This contrasts from previous studies cited in the manuscript. However, the claims relevant to the analysis of this public dataset appears to be convincing. Further, the authors go on to offer good reasoning, particularly in the Discussion, about how they arrived at these conclusions, what any potential limitations of the approach may be, and how the dataset used differs fundamentally from pedigree-based data i.e. this study investigates transmission rates in gametes and does not consider different stages of fertilisation and subsequent aspects embryonic development.

Weaknesses

The approaches in the paper generally are quite strong. However, the study lacks a positive control for transmission distortion, which is not simulated data. This positive control does not exist in published human datasets to the best of my knowledge. However, outside of the scope of this study, one could consider reanalysing published non-human single-gamete sequencing datasets of which there are a number of studies.

The analyses of meiotic crossovers with non-simulated data are quite light – limited to Figure 5 sup 4 and 5 – compared to the two other features of rhapsodi (phasing and imputation), which feature more extensively. Given that a significant portion of the manuscript presents a re-analysis of the data from Bell et al. 2019 Nature, it be helpful to visually demonstrate the variation in crossover numbers, positioning, interference between the 25 donors, and where possible compare these results to published analyses e.g. Bell et al. and other studies. For example, Figure 5—figure supplement 5 suggests that rhapsodi is much more conservative with calling of crossovers/haplotype transitions than Bjarni et al. and I think that this is worthy of discussion. Particularly because false positive rates of double crossovers can have a large effect on particular study types, such as those concerning crossover interference.

The outputs of this work will be very useful for future researchers; both the rhapsodi software, and the finding that there is no strong transmission distortion signal in these 25 male sperm samples. Rhapsodi is one of a very limited number of user-friendly generalised pieces of software – as opposed to a collection of scripts – for the analysis of sparsely sequenced gametes for haplotype phasing, imputation and meiotic crossover breakpoint analysis. The package should attract significant attention from the potential userbase.

The biological findings of this study – failure to identify transmission distortion in these samples – should be of general interest to geneticists and adjacent fields. Despite being a negative finding, it challenges existing literature with a robust analysis and will be sure to stimulate further research directions.

I found the paper very interesting. I also found the paper to be very well written, clear, accessible to a broad readership, and look forward to using rhapsodi.

Reviewer #5 (Recommendations for the authors):

The authors introduce a new method rhapsodi, which accurately infers haplotypes from low-coverage sequence data of large sample sizes of haploid gametes. They demonstrate that rhapsodi performs better than the existing approach Hapi, particularly for larger samples of gametes. They apply rhapsodi to test for evidence of transmission distortion (TD) in a published human Sperm-seq dataset (single cell sequencing of >30k sperm from 25 donors), and report no significant evidence of TD.

rhapsodi is a powerful approach, which will be useful in a variety of contexts. The TD analysis is more rigorous than previous pedigree-based studies, providing convincing evidence for a lack of strong distorters. However, power to detect weak TD remains limited after accounting for multiple testing. The method is a larger contribution, and likely to be of interest to a broader audience. The manuscript would benefit from re-framing to focus on introducing rhapsodi – with human TD results demonstrating its application.

Recommendations for authors:

The authors have thoroughly and carefully addressed the technical concerns of previous reviewers, and better placed the results in context with previous literature.

However, as noted by previous reviewers, the major strength of the manuscript is the method rather than the human TD analysis. Re-framing the manuscript with focus first on the method would better reflect the relative contribution of the aims, and broaden readership. This could be accomplished without extensive re-writing. For example, I suggest:

– Change title: The title refers only to the TD results – including mention of the method in the title will be more accurate and make it less likely to be overlooked by readers interested in applying the method to other questions. e.g. something like:

New method for analyzing haploid gamete sequences applied to a large sample of human sperm shows adherence to Mendel's first law.

– Re-ordering abstract: to first highlight the need for new method/software to analyse SpermSeq data – ie low-coverage sequencing data from large sample size of gametes. Next highlight TD in humans as an example of a question that can be addressed with SpermSeq data better than previous. [summary of findings] Add at the end more explicit mentions of other applications of rhapsodi (outlined in L653-672 of Discussion)

– Re-order Introduction/beginning of Discussion in similar manner.

Other comments:

L402 "even slight deviations from Mendelian expectations, as supported by our simulations of TD across a range of gamete sample sizes and transmission rates"

– I would consider ~50-55% slight deviations – power is low in this range

– Single noted peak on Chr 2 is 56% for 1571 gametes – but doesn't reach genome-wide significance

– This concern has been addressed somewhat by removing "strict" etc in response to previous reviewers, but could

L491 comparison to Zollner – excess of allele sharing among sibs in pedigree – 50.43% -

this is surviving offspring not gametes so reflects additional sources of TD and selection on embryos/offspring

L621-631 discusses other sources of TD but does not mention this may explain the discrepancy with pedigree based approaches

Could add within-ejaculate sperm competition as factor – reviewed in:

https://royalsocietypublishing.org/doi/full/10.1098/rstb.2020.0066#d1e627

L581 this paragraph makes more sense combined with points in L543-550

Filtering of repeat regions and segmental duplications could remove sites with TD (as noted by the previous reviewer) – in L545 point out the benefits of stringent filtering but doesn't point out that some of these regions could be effected by TD

Methods – in beginning, briefly reiterate the differences compared to methods in the sperm-seq paper (Bell et al. 2020) – it's unclear that the method here is similar but formalised into the package

Figure 4 – Figure supp 1 and Figure 2-supp9 – can't distinguish different colors – make points bigger and/or change shape also according to scale

https://doi.org/10.7554/eLife.76383.sa1

Author response

[Editors’ note: The authors appealed the original decision. What follows is the authors’ response to the first round of review.]

Reviewer #1 (Recommendations for the authors):

The manuscript reports a new method (rhapsodi) to impute haplotypes in the sequencing of human gametes. The method performs well in simulated data and the authors explore parameter spaces relevant to the research. In particular, the authors evaluate how well the method performs at different genotyping depths and number of gametes. Even with relatively low coverage (~0.1X), the method phases the haplotypes effectively. Next, the authors evaluate the performance of rhapsodi to mis-specification of recombination and genotype error rate. Some simulations show a non-monotonic relationship with coverage which means the method is currently hard-coded for an unidentified source of variation. The authors acknowledge this caveat.

We noted in the previous manuscript draft that for specific scenarios involving low coverages or small numbers of gametes, the positive relationship between phasing performance and increasing amounts of data (i.e., gamete sample size and coverage) was not always monotonic. This suggested that other parameters that we currently hold fixed (notably, window size) may interact with sample size and coverage to influence performance.

To address this possibility, we performed a series of simulations (n = 860) across a range of window sizes, tracking the resultant phasing accuracy. The optimal window size (maximum accuracy) was then identified for each simulation. We designated 25% (213) of these simulations as the test set, while using the remaining 75% (647) of simulations as a training set. For the training set, we fit a β regression with the optimal window size (number of SNPs in the window / total number of SNPs) as the response variable and number of gametes, coverage, genotyping error rate, and recombination rate as the predictor variables. As expected, we found that the number of gametes, coverage, and recombination rate were significant predictors of optimal window size. Additionally, the model performed well on both the training and the held-out test sets (Figure 2—figure supplement 10).

We have used the model above as the basis of an automatic window size calculation. We currently implement it as an optional feature within the software package and briefly describe these results in the manuscript text (Figure 2—figure supplement 10; Methods lines 988-1028). We plan to further develop this feature through more extensive simulations in the future.

The second part of the manuscript compares the results from rhapsodi with those Hapi, another tool to phase haplotypes in gamete sequencing data and show that rhapsodi method seems superior in terms of computational time, completeness, and accuracy.

In response to comments from another reviewer, we expanded the comparisons to Hapi in regard to computation time, while also providing additional context about why Hapi is the most relevant software for comparison (lines 287-295; lines 299-303).

The next, and final, part of the manuscript is to analyze the data from the SpermSeq dataset (Bell et al. 2020; ~41K sperm haploid genomes). The authors do several steps of filtering and use the reduced dataset to infer crossover location along the genome (and the results are consistent with those of previous reports). Most importantly, the authors also use the method to infer Transmission Distortion (TD). There are a few sites (e.g., seven linked SNPs in chromosome 2) that show some deviations, but after using a multiple hit correction, there is no single SNP that shows a strong signal of TD. In general the manuscript concludes there is no significant deviations from random segregation along the whole human genome. This is a rigorous study that provides a novel tool for the study of gamete genomes. Nonetheless, the conclusion is not particularly novel. Perhaps if the authors provide more context on how the findings are contextualized in our current understanding of TD in humans, the manuscript would be more appealing. As it stands, the discussion is almost entirely about the method performance. An example of this lack of context comes in paragraphs 1 and 2 of the discussion. I would elaborate on the studies that are 'limited in statistical power'. Paragraph 2 suggest that previous studies were problematic but the authors do not describe the specifics. This segment of the discussion could be expanded for the benefit of the reader.

Thank you for your careful reading and helpful suggestions. We agree that providing additional context about current understanding of TD in humans would improve our work. To this end, we bolstered both the Introduction and Discussion for a more thorough background about the strengths and limitations of pedigree-based studies, pooled gamete sequencing studies, and single-gamete sequencing studies in both humans and non-human systems (lines 37-46; lines 508-529; lines 557-580).

For additional context about why single-gamete sequencing is a powerful method, we explain that patterns of allelic co-inheritance among single sperm cells from the same donor individuals facilitate the imputation of missing genotype data even at very low depths of coverage (~0.01x) per cell, thereby augmenting our sample size at each heterozygous SNP by approximately 100-fold and improving statistical power compared to naive scans of aligned reads (lines 551-557).

In response to this and other reviewers’ comments, we have also extended our power analyses and more directly compared to power achieved by previous pedigree-based scans (Figure 4, Figure 4—figure supplement 1, Figure 4—figure supplement 4). Importantly, it is not the statistical test or pedigree nature of the data per se that limit statistical power, but rather the fact that the number of informative transmissions (i.e., the relevant sample size) per SNP in a pedigree based study is limited by the number of heterozygous individuals. This contrasts with our single-sperm analysis, where the relevant sample size for all observed SNPs is equal to the total number of gametes (lines 508-521).

Reviewer #2 (Recommendations for the authors):

As mentioned briefly in the public review, the method developed in this work is well-motivated, high performing, and clearly explained. The descriptions of both simulations and application of the method to real data are very thorough, and the authors have taken care to reduce potential confounding effects of genotyping error and other data artifacts. The highlighted results regarding human TD are challenging to interpret in the context of various limitations, including an absence of discussion of how the method might perform under strong TD, as well as missing or unclear information about the TD simulations and the source of the samples. While some limitations of the TD analysis are discussed, other important ones are missing. Overall, the paper may be stronger if it focuses on the method rather than the TD analysis and/or incorporates or suggests an additional use case beyond searching for TD.

Thank you for these comments and suggestions, which have strengthened our manuscript.

Major suggestions for improvement:

– Expand simulations of TD beyond 70% overtransmission of one allele, to identify an upper bound for TD beyond which allelic dropout substantially reduces power.

We added simulations of very strong TD (k = 0.99; Figure 4—figure supplement 3) as noted in the responses above. Our simulations support the accuracy of phasing and imputation under these circumstances. Importantly, while extreme TD does indeed result in “allelic dropout”, whereby the causal SNP and nearby flanking SNPs appear as homozygous (and thus undetectable without external knowledge of donor heterozygous SNPs), recombination recovers heterozygosity as one extends out in either direction along the chromosome, and massive signatures of TD become apparent within these regions (Power = 100% with coverage and gamete sample sizes resembling the Sperm-seq data).

Notably, while tracts of homozygosity do occasionally appear in the Sperm-seq data by consequence of data filtering and phenomena such as heterozygous deletions or copy neutral runs of homozygosity, no TD signal is apparent on the borders of these tracts as would be expected of strong TD (lines 471-485).

– The methods and Figure 4 – —figure supplement 4 provide only the number of gametes and not the sequencing coverage used for simulations of TD. Include information about coverage for these simulations, and highlight in the figure the range of parameters/conditions represented in the real dataset.

This point is related to points raised by Reviewer 3. To address these comments, we have modified this figure (now main Figure 4) to highlight the range of parameters (including coverage) that is consistent with the Sperm-seq dataset. These changes clarify the statistical power of our study and facilitate more direct comparison to power of previous studies from the literature (Figure 4, Figure 4—figure supplement 1).

– The authors claim that they have ">80% power to detect even subtle TD." However, the results from the analysis with multiple test correction shown in Figure 4 – —figure supplement 4B show 80% power for a sample size of ~1000 gametes (the smallest sample size in the sperm dataset) at roughly a 0.58 transmission rate (it is challenging to see precisely from the colors in the figure). It would be helpful to provide an exact lower bound for the detection of TD under realistic conditions. A transmission rate of 0.58 is arguably not subtle and is comparable to the rate at which pedigree studies would also be reasonably well powered.

We have added an additional figure (Figure 4—figure supplement 1) that more clearly illustrates our power to detect TD within the Sperm-seq dataset, as well as an additional figure that depicts the power of pedigree-based studies (Figure 4—figure supplement 4).

Our simulations of strong TD (Figure 4—figure supplement 3) and modification of the title (to remove the word “strict”) also help address this point, as does the additional context about pedigree, pooled gamete sequencing, and single gamete sequencing studies provided in the updated Introduction and Discussion (lines 37-46; lines 508-529; lines 557-580).

– In several places the authors state that a true positive signature of TD would not be eliminated through stringent filtering due to the persistence of strong LD. One effective way to demonstrate this would be to show that their filtering has not masked large regions of the genome in some donors. What are the longest stretches in which all SNPs have been eliminated through filtering?

We have noted in “Results: Adherence to Mendelian expectations across sperm genomes” that the longest observed tracts of homozygosity after filtering were ~3.5 Mbp (lines 476-485). Our TD simulations, however, demonstrate that given the extent of LD (which spans entire chromosomes) within a sample of sperm from a single donor, TD signal should still be apparent on the flanks of such regions where heterozygosity is restored (Figure 4—figure supplement 3). We have also added text to the Discussion further explaining why stringent filtering is necessary to avoid spurious signatures of TD (lines 541-550; lines 581-586)—a point that is not unique to our study, but that we approached very conservatively.

– Repetitive regions, particularly centromeres and telomeres, are often involved in TD in other species, yet these would be filtered out as "challenging regions" in this analysis. Is it possible to infer how well these regions might be captured through LD with included SNPs? The section of the Discussion describing cases of TD that might not be captured in this analysis should include this limitation.

We have updated the Discussion to acknowledge that a caveat of this and most previous scans for TD in humans is the inability to analyze highly repetitive and other technically challenging regions of the genome, which include heterochromatic sequences such as telomeres and centromeres that have been implicated in TD in other species (e.g., Brand et al. 2015, Axelsson et al. 2010). We also briefly describe resources and future technologies that could facilitate the analysis of TD within these technically challenging regions (lines 593-598).

– Relatedly, if the marginal signal of TD on chromosome 2 shown in Figure 4 and described in brief on pp. 5 – 6 is at the end of the genotype-able portion of the chromosome, it could represent stronger TD involving telomeric repeats in LD with this region. This possibility could be excluded or discussed.

In relation to the previous point, we now address the possibility of TD occurring at sites in LD with the region at the end of chromosome 2 in the Discussion (lines 586-593). We note that the stringent filtering of repetitive regions was necessary for avoiding spurious signatures of TD that may arise due to read mapping and genotyping artifacts, and we discuss future technologies that could facilitate genotyping within these regions.

– It would be helpful for interpreting the results to include more details about the source of the sperm samples, particularly the donors' ancestry and whether the donors were known to have any fertility issues. This would provide useful context for the possibility of population-specific TD in the Discussion, as well as any comparisons to results of earlier studies using other methods.

Using a principal component analysis (PCA) and data from the 1000 Genomes Project, we compared the genetic similarity of each donor relative to a reference panel. We have added this comparison as Figure 5—figure supplement 2. A total of 16 sperm donor individuals clustered with reference samples from the European superpopulation, 3 donors clustered with reference samples from the East Asian superpopulation, 2 donors clustered with reference samples from the South Asian superpopulation, and 3 samples clustered on an axis between the African and European superpopulations, consistent with their previously reported admixed African American backgrounds. Meanwhile, one sample (NC26) showed similarities with both the African and East Asian populations, again consistent with the previous report (lines 339-354).

We added information about the fertility status of each donor to the “Results: Application to data from human sperm” (lines 333-339). Of the 25 sperm donors, samples from 20 individuals were obtained from a sperm bank and of presumed (but unknown) normal fertility status (Bell et al. 2020), while 5 donors had known reproductive issues: failed fertilization after intracytoplasmic sperm injection (n=2) or poor blastocyst formation (n=3) (Leung et al. 2021).

In the Discussion (lines 613-621), we briefly consider the possibility of population-specific TD (a comment also raised by Reviewer 3). One related possibility raised by that reviewer is that admixture between populations could separate a (locally fixed) distorter from a (locally fixed) suppressor, revealing hidden distorter alleles. We hypothesize that this scenario is unlikely in humans, given that very few alleles exhibit such extreme patterns of frequency differentiation across human populations (1000 Genomes Project Consortium, 2015).

– The Conclusion could be substantially strengthened by suggesting other potential uses for this method, including but not limited to detection of TD. Which samples would be most interesting to use for subsequent scans of TD? What other uses might this method have in non-TD-related genetic or evolutionary studies?

Thank you for these suggestions, which we agree have strengthened our paper and established a starting point for others in the field to use our method.

As noted in our response to the previous point, the recent assembly of a complete human reference genome has resolved many heterochromatic sequences for the first time and offers great promise for the accurate study of TD genome-wide, especially with the use of long-read sequencing (Nurk et al. 2022, Aganezov et al. 2022).

In the Conclusion, we now also consider potential uses for rhapsodi in other genetic and evolutionary studies, beyond analysis of TD (lines 653-672). These include the construction of genetic maps for non-model species to complement other genomic resources such as genome assemblies. Within humans, the construction of personalized recombination maps could be useful to better understand the sources of variation in recombination and relation to phenotypes such as fertility status (Lyu et al. 2021). Notably, whole and partial chromosome gains and losses (i.e., aneuploidies) are the leading cause of human pregnancy loss, and one potential mechanism of aneuploidy formation is the abnormal number or placement of meiotic crossovers (Lamb et al. 2005). The chromosome-length haplotypes generated by rhapsodi offer additional context for population genetic analyses, including inferring kinship and population structure, detecting signatures of positive selection, and assessing demographic history (Li et al. 2020).

Reviewer #3 (Recommendations for the authors):

This method for phasing and imputing gamete genotypes requires substantially more benchmarking. Although accuracy is of interest and reasonably well evaluated through simulations, nothing is presented about the computational performance. In particular, given that much larger Sperm-seq datasets must already be in production, it is important to determine if this approach will scale sufficiently for those applications. Runtimes and memory requirements should be reported.

We have added information in the Results regarding rhapsodi’s runtimes and memory requirements for processing datasets ranging from the size (number of gametes, number of SNPs, coverage) of the sperm data used here to datasets 10x larger. We detail these requirements in Figure 2—figure supplement 9.

In the section titled “Results: Evaluating performance on simulated data”, we show that while rhapsodi was rigorously benchmarked for datasets containing up to 5000 gametes with 100,000 SNPs at coverages up to 2.3x coverage, the method is capable of analyzing much larger datasets (assuming coverages and SNP densities comparable to the data in Bell et al. [2020]; 0.01x coverage, 90,795 SNPs). To demonstrate this point, we successfully applied rhapsodi to simulated datasets of up to 32,900 gametes in under 24 hours running multi-threaded on 48 CPU cores (lines 1029-1047). This represents a dataset 20x the size of that produced by Bell et al. (2020).

We reference these results and their implications for future studies in the Discussion. In summary, our simulations demonstrate that rhapsodi outperforms existing methods for phasing and imputation of single-gamete data, while also efficiently scaling to datasets 20-fold the size of the data analyzed in our study (lines 274-286).

The code is reasonably well documented. However, I am not certain I could reproduce the haplotype phasing approach nor the HMM-based on the description provided in the text. The description should be expanded substantially possibly in a supplemental text section. For instance, in the state transition diagram in Figure 1C, as shown it indicates that the HMM does not transition back from error states. Is that right or am I misunderstanding?

We added additional detail regarding the haplotype phasing approach in the section titled “Methods: Reconstruction of phased donor haplotypes” (lines 718-761). We have also provided additional detail regarding the gamete imputation approach and the HMM in the section titled “Methods: Imputation of gamete genotypes” (lines 762-831).

We also linked a vignette for using rhapsodi in the main text section “Availability of data and materials” (lines 1321-1340), which details each of the three algorithmic steps of rhapsodi.

The section, "Power analysis for detecting TD" states that the power across simulations is 80%. Does this account for a multiple testing correction? Both corrected and uncorrected values should be reported.

Thank you for raising this point. We have now clarified that we report Power without (Figure 4A) and with (Figure 4B) multiple testing correction. We similarly report both values in Figure 4—figure supplement 1.

Two biological factors might be worth some additional clarification. First, what do we know about the ancestry of the donors? E.g., are some individuals admixed? Admixture might be expected to "unmask" cryptic distorter/suppressor pairs. Second, it should be clearly stated in the discussion that this study does not exclude the possibility of rare strong distorters at modest or low frequencies in human populations.

In response to this comment and to Reviewer 2’s comment regarding the ancestry of the donor individuals, we compared the genetic relatedness of each donor using a principal component analysis (PCA) with human genetic data from the 1000 Genomes Project (Figure 5—figure supplement 2; lines 339-354). A total of 16 sperm donor individuals clustered with reference samples from the European superpopulation, 3 donors clustered with reference samples from the East Asian superpopulation, 2 donors clustered with reference samples from the South Asian superpopulation, and 3 samples clustered on an axis between the African and European superpopulations, consistent with their previously reported admixed African American backgrounds. Meanwhile, one sample (NC26) showed similarities with both the African and East Asian populations, again consistent with the previous report.

The potential for admixture to unmask distorter/suppressor pairs is a fascinating one that we now briefly address in the Discussion (lines 608-621). After distorter alleles arise, suppressors may subsequently evolve to re-balance the transmission ratio (e.g., Greenberg et al. 2020). Admixture between populations may then separate the distorter from the suppressor, revealing hidden distorter alleles. We hypothesize that this scenario is unlikely in humans, given that very few alleles exhibit such extreme patterns of frequency differentiation across human populations (1000 Genomes Project Consortium, 2015).

As you note, there is the potential for a rare, strong distorter to affect TD in only a subset of the population. Such distorter alleles may not be detected in our study, especially due to the small sample size. We agree with this point and discuss this fundamental limitation in the Discussion (lines 605-608).

To me, figure 4-S3 is somewhat misleading. Note that this might actually be figure 4-s4 given the caption and references in the text, but on the pdf that page is labelled figure 4-s3. None of the donors considered had more than about 3300 total sperm sampled. After QC, I assume it is less. Yet the X-axis proceeds to 10k. I'd like to see the X-axis pruned back to a reasonable range for these analyses and a line plotted with points for each donor where the experiment would have power to detect TD. I suggest plotting the line at 80% power by convention, but 50% would allow for a more direct comparison to Meyer et al. (2012). This really should be a main text figure, too, since all of the major biological claims in the paper hinge on these analyses it is necessary to explore power in much greater depth.

Thank you for your careful consideration of our power analysis figure, which is a crucial aspect of our manuscript. We agree with your feedback and have improved the figure to address your comments.

Specifically, as suggested, we have made this a main text figure (Figure 4) with the X-axis pruned back to a reasonable range for these analyses: 500-3500 gametes for a given individual, reflective of our study design. We also added additional supplementary panels (Figure 4—figure supplement 1) that more clearly show the power to detect various strengths of transmission distortion across the sample sizes used in our study, including 80% power as is convention, as well as 50% power to allow direct comparison to previous work (e.g., Meyer et al. 2012). We indicate on all panels (both Figure 4 and Figure 4—figure supplement 1) the average number of gametes (n=1711) across donors in our study.

[Editors’ note: what follows is the authors’ response to the second round of review.]

Essential:

– A comparison to hapcut2 seems to be crucial for benchmarking the method. Since your method has a phasing component, it is important to present a comparison that is adequate given the scale of the dataset.

We did not originally compare rhapsodi to HapCUT2 for two main reasons. First, HapCUT2 was developed as a diploid read-based phasing algorithm and is not compatible with the data structures and files used as input by rhapsodi. Second, we extensively compare rhapsodi to Hapi, a software package developed to impute sparsely sequenced gamete genotypes, and Hapi had been shown to outperform HapCUT2. However, given the comments from these Reviewers, as well as the adaptation of HapCUT developed and employed by our collaborator in Bell et al. (2020) for gamete sequencing data, we have revisited this topic for the revision. Note that while Bell et al. (2020) adapted the original version of HAPCUT, we instead adapted HAPCUT2 based on its improvements over the previous HapCUT algorithm (e.g., with regard to speed).

Applying HapCUT2 to the data input to rhapsodi requires a lengthy conversion process. We thus developed a file conversion pipeline (largely in line with that described by Bell et al. [2020]), with modifications to integrate with the other steps of rhapsodi. We then adapt the HapCUT2 algorithm for phasing, under the assumption that alleles originating from a single gamete and chromosome are linked. We offer this pipeline as an alternative phasing method within rhapsodi, called "linkedSNPHapCUT2". This pipeline is discussed in the Methods section "Reconstruction of phased donor haplotypes."

We compare the two phasing methods in rhapsodi – the default phasing approach termed "windowWardD2" and the HapCUT2 adaptation termed "linkedSNPHapCUT2." These comparisons are discussed in the Methods section "Comparison to existing methods'' (lines 1217-1237) and the Results section "Benchmarking against existing methods: Hapi and HapCUT2" (lines 297-454).

We applied "linkedSNPHapCUT2" to 5713 simulated datasets, on which it ran successfully in 77% of cases, but failed in 21% of cases based on the computing resources available (3 days runtime, 185Gb memory). Another 1% were unable to be processed due to extreme low coverages (88% below 0.001x) resulting in too few SNPs per chromosome (43 simulations with 1 SNP and 32 simulations with 2-5 SNPs). Of the simulations that failed with linkedSNPHapCut2, 43% were successfully phased with rhapsodi's default "windowWardD2" method. We compare completeness and accuracy between the two methods (Figure 3—figure supplement 3) as well as system/user and real time (Figure 3—figure supplement 4).

The major cost of the "linkedSNPHapCUT2" approach is the time and memory resources necessary to convert the input data files to the format necessary for use in HapCUT2. Because "windowWardD2" operates in windows along the chromosome, it runs as a multithreaded process. As such, its overall system time is larger than that of "linkedSNPHapCUT2" for datasets with high numbers of gametes, but wall-clock time may be much lower. Both options for phasing within rhapsodi offer high levels of completeness and accuracy across most data profiles (i.e., sequencing coverage and number of gametes), including those matching the Sperm-seq dataset. Overall, we find that in data limited scenarios, "linkedSNPHapCUT2" phases haplotypes with a higher completeness than the default "windowWardD2 method", but with comparable accuracy. For higher number of gametes, windowWardD2 offers comparable completeness and much higher accuracy, partially due to the HapCUT2 approach ignoring the positional information that was already encoded in the alignment. In doing so, this method does not take full advantage of the co-inheritance patterns of the SNP alleles, which is an advantage offered by the "windowWardD2" approach. Based on these results, we include both "windowWardD2" and "linkedSNPHapCUT2" in rhapsodi as alternative methods for phasing and recommend use of the latter in scenarios with small numbers of gametes and low coverage.

The above results are included in the text on lines 352-404, as well as in Figure 3—figure supplement 3 and Figure 3—figure supplement 4.

– The criticisms about pedigree-based studies need to be presented in a more nuanced way. Similarly, the broad statement about the absence of strong TD in humans seems poorly supported. The reviewers all suggested a more balanced presentation of these previous efforts and more of a discussion of how rhapsodi can be integrated in current research.

We have updated our presentation of previous efforts to investigate TD in humans (including pedigree-based studies), per the Reviewers’ specific comments below, in both the Introduction and the Discussion. These changes have better contextualized the nuances in statistical power, sample size, and allele frequency in each study design.

Specifically, we have clarified the power issues related to donor sample size and gamete size; we note that our study design differs from pedigree-based studies with regard to detecting TD affecting rare alleles. In our study, if an allele is rare in the population but heterozygous within just one donor, we are well-powered to detect TD. In pedigree-based studies, statistical power improves as the number of families heterozygous at a given site increases.

We further clarify our considerations of statistical power and allele frequency across populations: there is the issue of actually sampling the distorter allele, given its frequency in the population – a consideration that applies to both pedigree-based and gamete sequencing studies. Then, conditional on observing the allele, there is the issue of detecting TD given the sample size of informative transmissions. We have updated our statement regarding this topic (“single-gamete sequencing studies… provide equal power for detecting TD involving common and rare alleles”) to address this nuance.

We have addressed other differences between our study design and previous pedigree-based studies, including the possibility that pedigree studies capture TD or other effects occurring at other developmental timepoints, as well as the effects of selection (lines 579-583) and possible contributions of phenomena such as within-ejaculate sperm competition to signals of TD observed in other study designs that would not be detected in our current work (lines 724-743).

We also reworked our statement about the absence of strong TD in humans to more accurately reflect the evidence we have provided, which is specific to the sample under consideration. This begins with our statement of the TD result in the last paragraph of the Introduction and continues throughout the manuscript. In particular:

– We have added text to clarify that although TD occurring in repeat regions and segmental duplications would be removed through the filtering, we still expect to be able to detect the TD occurring across the chromosome, as evidenced by our simulations (lines 619-638).

– We address the possible cases of distorters masked or uncovered by admixture, and detection of multi-allele TD systems such as distorter and responder systems (lines 717-723).

Finally, we have expanded our discussion of how rhapsodi can be integrated in current research, facilitating novel uses of inferred genotype data and meiotic recombination events. These include:

– The ability to construct sex-specific genetic maps could provide essential genetic resources for non-model species to complement genome assemblies and annotations.

– The construction of personalized recombination maps for humans, which offers opportunities to investigate sources of variability in the recombination landscape, as well as its relation to phenotypes such as fertility status. For example, whole and partial chromosome gains and losses (i.e., aneuploidies) are the leading cause of human pregnancy loss, and one potential mechanism of aneuploidy formation is the abnormal number or placement of meiotic crossovers (Lamb et al. 2005).

– The opportunity for population genetic analyses using the chromosome-length haplotypes generated by rhapsodi, including inferring kinship and population structure, detecting signatures of positive selection, and assessing demographic history.

– The manuscript reads as two disjointed pieces, one on method development and the other on applications. We would ask that the manuscript be revised with this issue in mind, as it persists from the previous submission.

We have made significant updates to the manuscript, particularly following the suggestions from Reviewer 5, to better connect the method with its application and results. These changes have allowed us to reframe the paper, so that it more centrally focuses on our method and now features the scan for TD as a demonstration of an application of rhapsodi. We have added more clear transitions throughout all sections of the paper to improve the flow between use case, method, and application. We have also changed section headers to reflect this improved flow. As noted by multiple Reviewers, we believe these changes have improved the manuscript and will increase its readability and usefulness. Specifically:

– We have changed the title to more accurately summarize the work and appeal to potential readers and users by including the method in the title. It is now “A method for low-coverage single-gamete sequence analysis demonstrates adherence to Mendel's first law across a large sample of human sperm.”

– We have re-ordered the Abstract to highlight the need for new methods (such as rhapsodi) to analyze increasingly common datasets of sparsely sequenced gametes (such as SpermSeq). We then highlight TD in humans as an open question and one single gamete sequencing is well-suited to address. We then mention additional applications of rhapsodi.

– We have re-ordered the Introduction in a similar manner: starting with the availability of such datasets and the need for new methods to impute such sparsely sequenced data, we mention rhapsodi. We then move into the application to the SpermSeq data, outlining the state of the question of TD in humans currently and underscoring the opportunity offered by the SpermSeq data. Finally, we end with our findings regarding the absence of evidence of TD in this sample.

– We have applied these same changes to the Discussion: we begin with rhapsodi before introducing its application to the SpermSeq dataset and our findings regarding TD in that data.

Potentially useful but not essential:

The reviewers list some suggestions, including changing the title, modifying the abstract, and reordering the introduction and discussion.

Reviewer #2 (Recommendations for the authors):

I appreciate the extensive further simulations, analyses, and revisions the authors have provided, particularly the additional simulations under strong TD and the clarified details about their samples, metrics, and computing resources. The suggestions for future uses of rhapsodi are also helpful in communicating the significance of this work. These additions substantially strengthen the manuscript and alleviate many of my initial concerns.

Thank you for considering our manuscript again at this step. We agree your previous comments strengthened the paper and look forward to further improvement from this second set of reviews.

I still feel that statements about generalizing a negative finding in the search for TD within this sample to a broader statement about the absence of strong TD in humans (e.g., line 107 "underscores the fidelity of human male meiosis for ensuring balanced transmission of alleles to the gamete pool"), are somewhat too strong, for the following reasons:

1) All power simulations for rhapsodi-based TD detection are conditional on observing a distorter in heterozygous state within the sample, which would require the distorter not to be very rare (roughly >1.4% allele frequency required for 50% probability that at least one of 25 individuals is heterozygous). Observed distortion alleles in other species are often at very low frequency under an equilibrium model where the distortion is balanced by fitness costs in homozygotes (e.g., SD is found at 1 – 5% frequency per ref #9). Unbalanced distorters would fix or be lost rapidly, and would be unlikely to be detected except in the very brief window at which they are locally at intermediate frequency. The authors note these as caveats in the Discussion, but the need for a relatively common distorter is not described as an issue of power (see discussion of comparison to pedigree-based studies below).

We have addressed this and the next two reasons in detail throughout the manuscript. We have modified the above sentence to better reflect our finding of the absence of evidence of TD in this sample, as articulated by your comments here: “Our study thus suggests balanced transmission of alleles to the gamete pool in this sample.” While we acknowledge that TD detection is conditional on a donor being heterozygous for the distorter allele, our power analyses reflect the statistical power of our approach to detect TD if it occurs in a donor. We have added additional context addressing this limitation (the need for a relatively common distorter; lines 668-672) and added more detail regarding the comparison to pedigree-based studies, as outlined below.

2) I may have missed it, but I did not see mention of the possibility of population-specific TD aside from the discussion about fixed distorters uncovered through admixture. While it is true that there are very few alleles with extreme patterns of frequency differentiation across human populations (lines 620 – 621), this is not highly relevant for whether one should expect distorters to be population-specific. As mentioned above, distortion loci are likely to be rare and/or ephemeral, and either would make them very likely to be population-specific.

We agree that population-specific TD could be missed based on the limited diversity of this sample. We compared the genetic similarity of each of the 25 donor individuals in our dataset to globally diverse populations from the 1000 Genomes Project (see Figure 5 – —figure supplement 2). 16 individuals clustered with reference samples from the European superpopulation, 3 donors clustered with reference samples from the East Asian superpopulation, and 2 donors clustered with reference samples from the South Asian superpopulation. 3 donors clustered on an axis between the African and European superpopulations, and 1 donor showed similarities with both the African and East Asian populations. If distorters are rare or ephemeral, such distorters could be missed even in a diverse sample, which is a fundamental limitation of our study and other existing study designs for investigating TD in humans. We have added the additional caveat that population-specific distorters would likely not be included in our sample of 25 donors and discuss it in lines 709-713.

3) Some TD systems involve epistasis between alleles at distorter and responder loci, which would further require the responder allele to be observed heterozygous in the sample.

We agree that observing TD in such systems would require sampling an individual who is heterozygous at both loci. We would thus be less likely to capture this phenomenon, especially if the individual alleles are rare. More broadly, the possibility of missing rare alleles is a fundamental limitation of our study, but also of previous studies for detecting TD. Our study offers the advantage of requiring only one donor to be heterozygous for an allele for us to be well powered to detect TD, as opposed to other study designs that rely on aggregating data across multiple families and are thus more powered to detect TD involving more common alleles.

I did not notice this before, but there are statements in both the Introduction (line 63) and Discussion (line 505) about pedigree-based TD scans being underpowered due to the small size of human families. This is a bit misleading because such studies typically combine data across multiple families (as noted by the authors in lines 523 – 524), which reduces their power to detect TD present in any one individual but not to detect TD involving an allele at intermediate frequency that is heterozygous in many parents within the sample. Both pedigree-based and sperm sequencing studies have issues of power due to sample size of individuals that are currently discussed primarily in the context of pedigree-based studies; donor sample size concerns are separated from considerations of power when discussing the sperm sequencing study design.

We agree with the Reviewer that this is a nuanced issue. We have clarified the power issues related to donor sample size and gamete size, with changes to the paragraphs in the Introduction and the Discussion mentioned by the Reviewer. Specifically, we note that our study design differs from pedigree-based studies with regard to detecting TD affecting rare alleles: in our study, if an allele is rare in the population but heterozygous within just one donor, we are well powered to detect TD; in pedigree-based studies, statistical power improves as the number of families featuring the allele increases.

The statement that "single-gamete sequencing studies… provide equal power for detecting TD involving common and rare alleles" is undermined by the following clause "provided that they are heterozygous in the sampled donor individual." The probability that any of the sampled donor individuals are heterozygous for a distorter depends upon that distorter's frequency in the population, so the study's overall power to detect TD is not equal for common and rare alleles (see point 1 above). For example, in the context of Figure 4—figure supplement 4, 200 informative transmissions out of 1518 trios would imply p(heterozygous) = 0.066 (200/3036, MAF = 0.034). In the Sperm-seq sample of 25 individuals, 18% of such cases would be expected to have no heterozygotes in the sample. As this example demonstrates, the power for the present study is substantially higher than for previous pedigree-based studies, but not as much higher as implied by the power simulations.

We agree with the Reviewer that statistical power is affected by the distorter’s frequency in the population. We have chosen to consider these aspects of study design and power separately. There is the issue of actually sampling the distorter allele, given its frequency in the population a consideration that applies to both pedigree-based and gamete sequencing studies. Then, conditional on observing the allele, there is the issue of detecting TD given the sample size of informative transmissions. We have updated the sentence mentioned here to clarify this distinction and its impact on our study.

It might be worth considering for future work whether power may be gained for detecting TD using rhapsodi by combining data across individuals following haplotype inference.

Thank you for this suggestion. The idea of combining signals across donors (as is done with pedigree studies) is interesting and could potentially improve statistical power for common distorter alleles. However, it would require us to rework our approach for multiple testing correction (based on the effective number of statistical tests in light of LD). We will certainly consider this for future extensions of our work.

Reviewer #3 (Recommendations for the authors):

Overall, the manuscript has improved and many of my concerns have been well addressed. In particular, emphasizing the power considerations of this specific approach in the main text was an important addition. I appreciate the authors' correcting inaccuracies in my understanding of their previous work.

I respectfully disagree with the authors' dismissal of several considerations.

First, the paper remains a pretty sharp subdivision of methods vs biology. The methods are really about phasing and recombination detection in sperm-seq data, the biology is about TD. I believe it will be hard for a non-specialist to read this manuscript, though the authors are correct that extremely specialized users --- i.e., those with their own sperm-seq datasets --- may benefit from improved software usability.

Thank you for this feedback on the organization of the paper. We agree with the need to generalize the paper and make it more applicable and useful to non-specialist readers. We have made extensive changes to address this, including implementing many suggestions shared by Reviewer 5. Specifically, we updated the title to encapsulate both the method and its application; re-ordered the Abstract, Introduction, and Discussion (to first articulate the need for a new method/software to analyze low-coverage sequencing data from a large sample of haploid gametes; highlight TD in humans as an example of a question that can be addressed with the SpermSeq data more effectively than previous datasets; and summarize our findings); and added explicit mentions of other applications of rhapsodi to the Discussion. These changes are detailed in our response to essential revisions above.

Second, a comparison to an approach that is suited to the scale of data used is necessary. If hapcut2 is the only option, it should be applied despite being an "off-label" use. Also, yes, it is somewhat outside of typical hapcut2, but linking the reads bioinformatically is pretty straightforward and reasonable. It bears some similarity to read-backed phasing Certainly if one of this studies' co-authors believed this to be a valid use in previous published work it is reasonable to include as a point of comparison here.

As outlined in our response to essential revisions above, we have developed a pipeline to convert the input files from rhapsodi into a format usable in HapCUT2 and then call the phasing algorithm from HapCUT2 from within the rhapsodi package. We now include this as an alternative phasing approach in rhapsodi called `linkedSNPHapCUT2.` We compare this phasing approach with the default phasing option in rhapsodi (`windowWardD2`) for accuracy and completeness (Figure 3—figure supplement 3) and for system/user and real time (Figure 3—figure supplement 4) in simulated datasets (5,713 and 216, respectively) generated across a range of parameter spaces (coverage ranging from 0.001x to 2.303x and number of gametes ranging from 3 to 5,000).

On a related note, in "Benchmarking against existing methods", while the description of previous analysis is accurate, the relevant underlying datasets vary so much from previous works that is it not clear that the same results would hold here. Minimally, some acknowledgement should be added that the results described from previous work are based on a dramatically different dataset than was considered here.

We have added text (line 299-305) at the beginning of this section acknowledging that the results described from previous work are based on a dramatically different dataset than was considered here and clearly outlining our selection of Hapi for downstream comparisons. We also address this by comparing the default phasing method in rhapsodi, `windowWardD2`, with the modified phasing from HapCUT2, `linkedSNPHapCUT2` on the same simulated datasets.

Reviewer #4 (Recommendations for the authors):

The authors describe two main things in this paper:

First is the software package called "rhapsodi". Rhapsodi is an R package which is designed to use sparse whole-genome sequencing data from gametes to phase a sperm donor's haplotypes, impute gamete genotypes and identify meiotic crossover breakpoints.

Second, rhapsodi was used with both simulated data and a published dataset of 41,189 individual sperm cells' sequence. The main biological focus of this second part of the paper involved testing for transmission distortion, of any alleles or broader linkage blocks, for which none is detected. Previous publications, referenced in the manuscript, have suggested that transmission distortion can occur in human populations. The possible causes of the discrepancies in results between the study and previous studies is discussed.

Strengths

In the set of variables assessed, rhapsodi appears to outperform Hapi, which the authors present as the benchmark to meet or exceed for a package focused on efficient, accurate, complete and reliable phasing of haplotypes.

Another strength is that the paper also assesses transmission distortion with particularly large datasets for which no signal is detected. This contrasts from previous studies cited in the manuscript. However, the claims relevant to the analysis of this public dataset appears to be convincing. Further, the authors go on to offer good reasoning, particularly in the Discussion, about how they arrived at these conclusions, what any potential limitations of the approach may be, and how the dataset used differs fundamentally from pedigree-based data i.e. this study investigates transmission rates in gametes and does not consider different stages of fertilisation and subsequent aspects embryonic development.

Thank you for acknowledging the contributions of our work, and for your close reading of our paper.

Weaknesses

The approaches in the paper generally are quite strong. However, the study lacks a positive control for transmission distortion, which is not simulated data. This positive control does not exist in published human datasets to the best of my knowledge. However, outside of the scope of this study, one could consider reanalysing published non-human single-gamete sequencing datasets of which there are a number of studies.

We agree that a positive control of a non-human example of TD in single-gamete data would be a useful demonstration, though we agree that it is outside the scope of the current manuscript.

The analyses of meiotic crossovers with non-simulated data are quite light – limited to Figure 5 sup 4 and 5 – compared to the two other features of rhapsodi (phasing and imputation), which feature more extensively. Given that a significant portion of the manuscript presents a re-analysis of the data from Bell et al. 2019 Nature, it be helpful to visually demonstrate the variation in crossover numbers, positioning, interference between the 25 donors, and where possible compare these results to published analyses e.g. Bell et al. and other studies. For example, Figure 5—figure supplement 5 suggests that rhapsodi is much more conservative with calling of crossovers/haplotype transitions than Bjarni et al. and I think that this is worthy of discussion. Particularly because false positive rates of double crossovers can have a large effect on particular study types, such as those concerning crossover interference.

Thank you for acknowledging the contributions of our paper. We feel our analysis and discussion of meiotic crossovers is sufficiently detailed, supported by the two figures with non-simulated data mentioned by the Reviewer (Figure 5 – —figure supplements 4 and 5). Our method, rhapsodi, is a thoroughly documented and accessible software package for phasing, imputation, and identifying recombination hotspots. Our current manuscript focuses on phasing and imputation, which are the key steps that enable downstream applications of rhapsodi, including the scan for transmission distortion (which is novel and not investigated in Bell et al. 2020). Further investigating the crossovers of these donors (including in comparison to work by Bell et al.) is beyond the scope of the current paper; indeed, topics including the analysis of inter-individual variability in the crossover landscape were the main focus of Bell et al. (2020).

As described in our responses to previous reviewer comments, one such technical reason for the observed modest differences between our findings and those from Bjarni et al. appears to be related to the sample sequencing depth of coverage. Rather than pooling the number of inferred recombination events for each bin across all donors, we repeated the correlation analysis in a donor-specific manner. We then fit a linear regression model with the donor-specific sequencing depths of coverage as the predictor and the donor-specific correlations as the response variable. We found that the donor-specific correlation with the deCODE map was positively associated with depth of coverage (lines 463-470).

The outputs of this work will be very useful for future researchers; both the rhapsodi software, and the finding that there is no strong transmission distortion signal in these 25 male sperm samples. Rhapsodi is one of a very limited number of user-friendly generalised pieces of software – as opposed to a collection of scripts – for the analysis of sparsely sequenced gametes for haplotype phasing, imputation and meiotic crossover breakpoint analysis. The package should attract significant attention from the potential userbase.

Thank you for your enthusiasm about our software and findings.

The biological findings of this study – failure to identify transmission distortion in these samples – should be of general interest to geneticists and adjacent fields. Despite being a negative finding, it challenges existing literature with a robust analysis and will be sure to stimulate further research directions.

Thank you for your comment on our findings.

I found the paper very interesting. I also found the paper to be very well written, clear, accessible to a broad readership, and look forward to using rhapsodi.

Reviewer #5 (Recommendations for the authors):

The authors introduce a new method rhapsodi, which accurately infers haplotypes from low-coverage sequence data of large sample sizes of haploid gametes. They demonstrate that rhapsodi performs better than the existing approach Hapi, particularly for larger samples of gametes. They apply rhapsodi to test for evidence of transmission distortion (TD) in a published human Sperm-seq dataset (single cell sequencing of >30k sperm from 25 donors), and report no significant evidence of TD.

rhapsodi is a powerful approach, which will be useful in a variety of contexts. The TD analysis is more rigorous than previous pedigree-based studies, providing convincing evidence for a lack of strong distorters. However, power to detect weak TD remains limited after accounting for multiple testing. The method is a larger contribution, and likely to be of interest to a broader audience. The manuscript would benefit from re-framing to focus on introducing rhapsodi – with human TD results demonstrating its application.

Thank you for your comments, and for your close reading of our work. We agree that reframing the manuscript has helped emphasize the contributions of the Method and have addressed your feedback to that end, as detailed in the response to essential revisions above. We walk through those updates in response to your individual comments below.

Recommendations for authors:

The authors have thoroughly and carefully addressed the technical concerns of previous reviewers, and better placed the results in context with previous literature.

However, as noted by previous reviewers, the major strength of the manuscript is the method rather than the human TD analysis. Re-framing the manuscript with focus first on the method would better reflect the relative contribution of the aims, and broaden readership. This could be accomplished without extensive re-writing. For example, I suggest:

– Change title: The title refers only to the TD results – including mention of the method in the title will be more accurate and make it less likely to be overlooked by readers interested in applying the method to other questions. e.g. something like:

New method for analyzing haploid gamete sequences applied to a large sample of human sperm shows adherence to Mendel's first law.

Thank you for the title suggestion. We have reworded it slightly, keeping the mention of the method and the suggestion of its other applications: “A method for low-coverage single-gamete sequence analysis demonstrates adherence to Mendel's first law across a large sample of human sperm.” We agree that this more accurately summarizes our findings and emphasizes the method, as suggested by this Reviewer and by Reviewer 3.

– Re-ordering abstract: to first highlight the need for new method/software to analyse SpermSeq data – ie low-coverage sequencing data from large sample size of gametes. Next highlight TD in humans as an example of a question that can be addressed with SpermSeq data better than previous. [summary of findings] Add at the end more explicit mentions of other applications of rhapsodi (outlined in L653-672 of Discussion)

Thank you for the specific suggestions, which we agree have strengthened our paper. We have re-ordered the Abstract to highlight the opportunity for new methods to analyze increasingly common datasets of sparsely sequenced gametes (such as SpermSeq). We then highlight TD in humans as an open question and one single gamete sequencing is well-suited to address. We then mention additional applications of rhapsodi.

– Re-order Introduction/beginning of Discussion in similar manner.

We have re-ordered the Introduction in a similar manner: starting with the availability of such datasets and the need for new methods to impute such sparsely sequenced data, we mention rhapsodi. We then move into the application to the SpermSeq data, outlining the current knowledge regarding TD in humans and underscoring the opportunity offered by the single-cell gamete data such as SpermSeq. Finally, we end with our conclusion regarding the absence of evidence of TD in this sample.

We have applied these same changes to the Discussion: we begin with rhapsodi before introducing its application to the SpermSeq dataset and our findings regarding TD in that data.

Other comments:

L402 "even slight deviations from Mendelian expectations, as supported by our simulations of TD across a range of gamete sample sizes and transmission rates"

– I would consider ~50-55% slight deviations – power is low in this range

– Single noted peak on Chr 2 is 56% for 1571 gametes – but doesn't reach genome-wide significance

– This concern has been addressed somewhat by removing "strict" etc in response to previous reviewers, but could

We acknowledge that the statement regarding slight deviations is open-ended, especially in light of the other information in that paragraph and results we present in our study. As such, we have added additional notes in the text regarding the power in the range of 50-55% deviations.

L491 comparison to Zollner – excess of allele sharing among sibs in pedigree – 50.43% -

this is surviving offspring not gametes so reflects additional sources of TD and selection on embryos/offspring

Thank you for illustrating this important point that pedigree-based studies and gamete-based studies focus on data at completely different developmental stages and are thus subject to numerous different factors. We have added a sentence after this line to note that Zollner et al. and other pedigree studies are based on surviving offspring (rather than gametes) and thus may reflect additional sources of TD as well as selection and other factors.

L621-631 discusses other sources of TD but does not mention this may explain the discrepancy with pedigree based approaches

Could add within-ejaculate sperm competition as factor – reviewed in:

https://royalsocietypublishing.org/doi/full/10.1098/rstb.2020.0066#d1e627

We agree that the other factors (e.g., selection, physiology) affect samples between the study timepoints of gamete-based studies (i.e., pre-fertilization) and pedigree-based studies (i.e., two-generation family trios). As such, these factors can explain the discrepancy between our finding of balanced transmissions in the gamete pool and past findings of TD in pedigrees. We now mention this in the Results section “No global signal of biased transmission in human sperm” (lines 579-583). We have added text in the Discussion after listing these factors and specifically state that these can explain the discrepancy between the results and conclusions from past work and our current study: “These other forms of TD (in addition to other factors occurring throughout development between the timepoints of gamete-based studies such as ours and pedigree-based studies) can explain the differences in the conclusions of these studies” (lines 723-732). We have also added a sentence in this section mentioning that within-ejaculate sperm competition may contribute to signals of TD observed in other study designs (such as pedigrees) but would not be detected in this study of a pool of gametes, and have included the citation recommended here (lines 733-736).

L581 this paragraph makes more sense combined with points in L543-550

We agree that the paragraph referred to (L581, beginning with “One caveat of this and most previous scans for TD in humans”) makes more sense combined with the other points mentioned; we have thus combined that paragraph with that referred to by L543-550.

Filtering of repeat regions and segmental duplications could remove sites with TD (as noted by the previous reviewer) – in L545 point out the benefits of stringent filtering but doesn't point out that some of these regions could be effected by TD

We have added text to clarify that although TD occurring in these regions would be removed through the filtering, we still expect to be able to detect the TD occurring across the chromosome, as evidenced by our simulations (lines 615-625).

Methods – in beginning, briefly reiterate the differences compared to methods in the sperm-seq paper (Bell et al. 2020) – it's unclear that the method here is similar but formalised into the package

To clarify the differences between rhapsodi and the methods in Bell et al. 2020, we have added an additional paragraph to the Methods subsection “Comparison to existing methods.” In this we explain the differences in the phasing and crossover detection steps, and emphasize that rhapsodi is a formalized package, as noted by this Reviewer (lines 1238-1250).

Figure 4 – Figure supp 1 and Figure 2-supp9 – can't distinguish different colors – make points bigger and/or change shape also according to scale

Thank you for pointing out the issues with these figures. We have increased the size of the points in both figures to increase visibility.

https://doi.org/10.7554/eLife.76383.sa2

Article and author information

Author details

  1. Sara A Carioscia

    Department of Biology, Johns Hopkins University, Baltimore, United States
    Contribution
    Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Kathryn J Weaver
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0844-615X
  2. Kathryn J Weaver

    Department of Biology, Johns Hopkins University, Baltimore, United States
    Contribution
    Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Sara A Carioscia
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8701-4197
  3. Andrew N Bortvin

    Department of Biology, Johns Hopkins University, Baltimore, United States
    Contribution
    Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8784-6786
  4. Hao Pan

    Department of Biology, Johns Hopkins University, Baltimore, United States
    Contribution
    Software, Formal analysis, Investigation, Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5543-4792
  5. Daniel Ariad

    Department of Biology, Johns Hopkins University, Baltimore, United States
    Contribution
    Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7427-6435
  6. Avery Davis Bell

    School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
    Contribution
    Data curation, Investigation, Methodology, Writing – review and editing
    Competing interests
    is an inventor on a US Patent Application (US20210230667A1, applicant: President and Fellows of Harvard College) relating to the Sperm-seq single-cell sequencing method. Was an occasional consultant for Ohana Biosciences between October 2019 and March 2020
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1837-302X
  7. Rajiv C McCoy

    Department of Biology, Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Resources, Software, Supervision, Funding acquisition, Visualization, Methodology, Writing – original draft, Writing – review and editing
    For correspondence
    rajiv.mccoy@jhu.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0615-146X

Funding

National Science Foundation (1746891)

  • Sara A Carioscia

National Institutes of Health (R35GM133747)

  • Rajiv C McCoy

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank members of the McCoy lab for feedback and helpful discussions. Thank you to Angela Leung and Alec Wysoker for assistance accessing Sperm-seq data. We also thank the staff at the Advanced Research Computing at Hopkins for computing support.

Senior Editor

  1. Molly Przeworski, Columbia University, United States

Reviewing Editor

  1. Daniel R Matute, University of North Carolina, Chapel Hill, United States

Publication history

  1. Preprint posted: November 20, 2021 (view preprint)
  2. Received: December 14, 2021
  3. Accepted: December 5, 2022
  4. Accepted Manuscript published: December 7, 2022 (version 1)
  5. Version of Record published: January 17, 2023 (version 2)

Copyright

© 2022, Carioscia, Weaver et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 548
    Page views
  • 100
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Sara A Carioscia
  2. Kathryn J Weaver
  3. Andrew N Bortvin
  4. Hao Pan
  5. Daniel Ariad
  6. Avery Davis Bell
  7. Rajiv C McCoy
(2022)
A method for low-coverage single-gamete sequence analysis demonstrates adherence to Mendel’s first law across a large sample of human sperm
eLife 11:e76383.
https://doi.org/10.7554/eLife.76383

Further reading

    1. Genetics and Genomics
    2. Neuroscience
    Michael J Lafferty, Nil Aygün ... Jason L Stein
    Research Article Updated

    Expression quantitative trait loci (eQTL) data have proven important for linking non-coding loci to protein-coding genes. But eQTL studies rarely measure microRNAs (miRNAs), small non-coding RNAs known to play a role in human brain development and neurogenesis. Here, we performed small-RNA sequencing across 212 mid-gestation human neocortical tissue samples, measured 907 expressed miRNAs, discovering 111 of which were novel, and identified 85 local-miRNA-eQTLs. Colocalization of miRNA-eQTLs with GWAS summary statistics yielded one robust colocalization of miR-4707–3p expression with educational attainment and brain size phenotypes, where the miRNA expression increasing allele was associated with decreased brain size. Exogenous expression of miR-4707–3p in primary human neural progenitor cells decreased expression of predicted targets and increased cell proliferation, indicating miR-4707–3p modulates progenitor gene regulation and cell fate decisions. Integrating miRNA-eQTLs with existing GWAS yielded evidence of a miRNA that may influence human brain size and function via modulation of neocortical brain development.

    1. Genetics and Genomics
    2. Microbiology and Infectious Disease
    Po Jui Chen, Anna B McMullin ... David Bates
    Research Article Updated

    Bidirectional DNA replication complexes initiated from the same origin remain colocalized in a factory configuration for part or all their lifetimes. However, there is little evidence that sister replisomes are functionally interdependent, and the consequence of factory replication is unknown. Here, we investigated the functional relationship between sister replisomes in Escherichia coli, which naturally exhibits both factory and solitary configurations in the same replication cycle. Using an inducible transcription factor roadblocking system, we found that blocking one replisome caused a significant decrease in overall progression and velocity of the sister replisome. Remarkably, progression was impaired only if the block occurred while sister replisomes were still in a factory configuration – blocking one fork had no significant effect on the other replisome when sister replisomes were physically separate. Disruption of factory replication also led to increased fork stalling and requirement of fork restart mechanisms. These results suggest that physical association between sister replisomes is important for establishing an efficient and uninterrupted replication program. We discuss the implications of our findings on mechanisms of replication factory structure and function, and cellular strategies of replicating problematic DNA such as highly transcribed segments.