Low-coverage data from individual gametes (A) are clustered to phase the diploid donor haplotypes (B). A hidden Markov model, with tunable rates of genotyping error and meiotic crossover, is applied …
Values represent the average of three independent trials. FDR: False Discovery Rate; TPR: True Positive Rate. For phasing and imputation, gray indicates that no hetSNPs remained after downsampling. …
(I) The first step of the generative model builds the phased haplotypes of the diploid donor, with n hetSNPs. (II) In the second step, gamete genotypes are derived from the diploid donor haplotypes …
Input data were created from the generative model and analyzed with rhapsodi. For all data in this figure, the genotyping error and recombination rates were matched between models. Each value is the …
The top row shows the true haplotypes of a given chromosome and the bottom row shows the inferred haplotypes. Purple denotes alleles assigned to one haplotype and teal alleles to the other. The …
A beta regression model relating optimal phasing window size (represented as window size / number of SNPs) to number of gametes, coverage, genotyping error rate, and recombination rate was fit on the …
(A) Breakpoint resolution stratified by depth of coverage. Resolution is scaled to base pairs assuming pairwise nucleotide diversity of 0.001 (i.e., one hetSNP per 1000 bp). The minimum possible …
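The base-pair scaling described in this caption is simple arithmetic: a diversity of 0.001 implies roughly one hetSNP per 1000 bp, so a resolution measured in hetSNPs is divided by the diversity. A minimal sketch follows; the function name `snps_to_bp` and its arguments are illustrative assumptions, not part of rhapsodi.

```python
def snps_to_bp(resolution_in_snps: float, diversity: float = 0.001) -> float:
    """Scale a breakpoint resolution from hetSNPs to base pairs.

    Assumes hetSNPs are evenly spaced at one per (1 / diversity) bp,
    which is the simplification stated in the caption.
    """
    if diversity <= 0:
        raise ValueError("diversity must be positive")
    return resolution_in_snps / diversity


# Hypothetical example: a crossover localized to within 5 hetSNPs.
resolution_bp = snps_to_bp(5)
```

Under this assumption, a crossover localized to an interval of 5 hetSNPs corresponds to about 5000 bp.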
Plotted values are the difference between the mean performance (across three independent trials) when the generative model and rhapsodi use the same parameters (Figure 2) and the mean performance …
Plotted values are the difference between the mean performance (across three independent trials) when the generative model and rhapsodi use the same parameters (Figure 2) and the mean performance …
Plotted values are the difference between the mean performance (across three independent trials) when the generative model and rhapsodi use the same parameters (Figure 2) and the mean performance …
Plotted values are the difference between the mean performance (across three independent trials) when the generative model and rhapsodi use the same parameters (Figure 2) and the mean performance …
We generated sperm-seq datasets with 100,000 SNPs and varied coverages and numbers of gametes. We measured runtime of rhapsodi analysis using the rhapsodi_autorun function. Reported time is in CPU …
For each panel, we depict the difference in performance between the tools (rhapsodi minus Hapi). Each point represents a simulated dataset, and only datasets successfully analyzed by both tools are …
The datasets were stratified into groups according to the number of gametes used in construction, signified by color. The run completion trend for each tool is signified by line type.
False = Not successful; True = Successful. Each column is defined by the number of gametes used in data construction. (A) The relationship between phasing accuracy and completeness. (B) The …
Datasets have varied coverages and gamete numbers, and all have 100,000 SNPs. Each panel shows the difference in performance between phasing methods (windowWardD2 minus linkedSNPHapCUT2) at a given …
Datasets have varied coverages, gamete numbers, and numbers of SNPs. Figure shows both CPU time and wall time. Points above the y=0 line represent datasets for which the windowWardD2 method performs …
For each combination of transmission rate and number of gametes, power was calculated based on 1000 independent simulations and assuming full knowledge of gamete genotypes. Panel A uses the standard …
For each combination of transmission rate and number of gametes, power was calculated based on 1000 independent simulations and assuming full knowledge of gamete genotypes. Panel A uses the standard …
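A simulation-based power calculation of the kind described in these two captions can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: an exact two-sided binomial test against a transmission rate of 0.5 stands in for whatever test was actually used, the significance threshold `alpha` is left as a parameter because the thresholds are elided in the captions, and the names `binom_two_sided_p` and `estimate_power` are hypothetical.

```python
import math
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value under the null rate of 0.5."""
    tail = min(k, n - k)
    # The null distribution is symmetric, so double the smaller tail.
    cum = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * cum)


def estimate_power(rate: float, n_gametes: int, alpha: float,
                   n_sims: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated datasets in which the null is rejected.

    Each simulation draws the transmitted allele for every gamete
    directly, i.e. gamete genotypes are assumed to be fully known.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        k = sum(rng.random() < rate for _ in range(n_gametes))
        if binom_two_sided_p(k, n_gametes) < alpha:
            rejections += 1
    return rejections / n_sims


# Hypothetical usage: power to detect a 0.6 transmission rate with
# 500 gametes at an assumed threshold of 1e-7.
power = estimate_power(rate=0.6, n_gametes=500, alpha=1e-7)
```

Caching the p-value by allele count keeps the 1000 simulations cheap, since many simulated datasets share the same count.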
We simulated 1000 gametes with 10,000 SNPs. We generated TD by choosing an allele at random and removing 30% of gametes that carried that allele. We simulated coverage of 0.01× and genotyping error …
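The gamete-removal step in this caption can be illustrated with a short sketch. It assumes that the distortion is introduced at a single causal SNP and that each gamete is represented only by its allele (0 or 1) at that site; the function name `induce_td` and this representation are hypothetical, and the coverage and genotyping-error steps are not modeled.

```python
import random


def induce_td(gamete_alleles, remove_fraction=0.30, seed=None):
    """Remove a fraction of the gametes carrying a randomly chosen allele.

    gamete_alleles: list of 0/1 alleles at the (assumed) causal SNP.
    Returns the chosen allele and the indices of the retained gametes.
    """
    rng = random.Random(seed)
    target = rng.choice([0, 1])
    carriers = [i for i, a in enumerate(gamete_alleles) if a == target]
    n_remove = round(remove_fraction * len(carriers))
    removed = set(rng.sample(carriers, n_remove))
    kept = [i for i in range(len(gamete_alleles)) if i not in removed]
    return target, kept


# Hypothetical usage: 1000 gametes with balanced transmission before removal.
alleles = [0] * 500 + [1] * 500
target, kept = induce_td(alleles, seed=0)
rate = sum(alleles[i] == target for i in kept) / len(kept)
```

With balanced starting transmission, removing 30% of carriers leaves the targeted allele at a rate of 0.7 / 1.7, or about 0.41, among the retained gametes.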
We simulated data with 974 gametes, 79,630 SNPs, and 0.0075× coverage, corresponding to the data profile of donor NC26 chromosome 8. Zooming in to the region surrounding the causal SNP (denoted with the …
The latter threshold is the standard for genome-wide significance (10⁻⁷) used in past studies (Meyer et al., 2012). The sample size refers to the number of informative transmissions, where each trio …
P-values are correlated across large genomic intervals due to the high degree of linkage disequilibrium among sperm cells from a single donor. Colors distinguish results from different donors. No …
Raw data containing hetSNPs from each chromosome are filtered to limit spurious signal caused by sequencing error. The autorun function from rhapsodi is applied, which generates and exports data …
We use a principal component analysis approach to investigate the genetic similarity of each donor with individuals from the 1000 Genomes Project (Ma and Amos, 2012). Donor individuals are shown in …
(A) For a segment of the chromosome enriched for low uncorrected p-values (p-value < 1 × 10⁻⁸), we plot each hetSNP based on the number of sperm in which it was observed as the reference allele or …
This map displays counts of the inferred number of crossovers within each 1 Mbp bin for each chromosome, pooling inference across the 25 donors. We used the midpoint of each crossover’s predicted …
The deCODE recombination map was binned into 1 Mbp bins for each chromosome, and a weighted average recombination rate was calculated for each bin and compared to the number of Sperm-seq inferred crossovers …
Points are distributed around a transmission rate of 0.5 (shown as red line).