Schematic distribution of two markers along the genealogy and four genomes.

A) Schematic distribution of marker 1 (yellow star) and marker 2 (green star) along the genealogies in a sample of four genomes, both following a homogeneous Poisson process. B) The green marker 2 is not heritable, so its distribution is independent of the genealogy. C) The green marker 2 is spatially structured along the genome, violating the assumption of a Poisson process along the genome and conflicting with the genealogy. D) The green marker 2 does not follow a Poisson process through time, e.g. a burst of mutations at a specific time point, represented by the branches of the genealogies drawn in green. The yellow marker 1 follows an identical Poisson process along the genome and the genealogy in all four panels. For readability, marker 2 exhibits light and dark green states.
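As a rough illustration of the homogeneous Poisson assumption in panel A, the sketch below drops mutations of one marker on the branches of a single genealogy. It is a minimal sketch, not the simulator used for the figures: the function names, the dictionary of branch lengths, and all numerical values are hypothetical and chosen only for illustration.

```python
import random


def poisson_events(rate, length, rng):
    """Return sorted event positions of a homogeneous Poisson process on [0, length)."""
    positions = []
    # Waiting times between consecutive events are exponential with mean 1/rate.
    pos = rng.expovariate(rate)
    while pos < length:
        positions.append(pos)
        pos += rng.expovariate(rate)
    return positions


def place_marker(branch_lengths, mu, span_bp, seed=0):
    """Drop marker mutations on each branch of one genealogy.

    branch_lengths: hypothetical dict mapping a branch name to its length in generations.
    mu: mutation rate per generation per bp.
    span_bp: number of base pairs sharing this genealogy.
    """
    rng = random.Random(seed)
    rate = mu * span_bp  # expected mutations per generation over the whole span
    return {name: poisson_events(rate, t, rng) for name, t in branch_lengths.items()}


# Illustrative values only: a four-genome genealogy with made-up branch lengths.
branches = {"a": 1000.0, "b": 1000.0, "c": 3000.0, "d": 5000.0}
marker1 = place_marker(branches, mu=1e-8, span_bp=1_000_000, seed=1)
```

Under this assumption the expected number of mutations on a branch is proportional to its length, which is what ties the marker to the genealogy; panels B to D depict departures from it.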

Performance of SMC approaches using different markers.

Estimated demographic history of a bottleneck (black line) by SMC approaches using two genomic markers. The estimates by MSMC2 and eSMC2 based on marker 1 only are in orange and red. Estimates from SMCtheo integrating both markers are in green (with known µ2) and in blue (with unknown µ2). The demographic scenarios are A) a 10-fold recent bottleneck with an ancestral population size N = 10,000, B) a 10-fold recent bottleneck with an ancestral population size N = 1,000, C) a 10-fold bottleneck with an ancestral population size N = 10,000, and D) a very severe (1,000-fold) and very recent bottleneck with incomplete size recovery. In A, B and D, we assume r/µ1 = 1 (with r = µ1 = 10⁻⁸ and µ2 = 10⁻⁴ per generation per bp) and in C, r/µ1 = 10 (with r = 10⁻⁷, µ1 = 10⁻⁸, and µ2 = 10⁻⁴ per generation per bp). In all cases (A, B, C and D), 10 sequences (5 diploid individuals) of 100 Mb were used as input.

Average estimated values of the mutation rate of marker 2 (µ2), knowing that of marker 1. We use 10 sequences (5 diploid individuals) of 100 Mb (r = µ1 = 10⁻⁸ per generation per bp) under a constant population size fixed at N = 10,000. The coefficient of variation over 10 repetitions is indicated in parentheses.
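For reference, the coefficient of variation reported in parentheses is the standard deviation of the estimates divided by their mean. The sketch below shows one way to compute it; whether the sample (n - 1) or population standard deviation was used is not stated here, so the sample version is an assumption, and the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(estimates):
    """Standard deviation of the estimates divided by their mean.

    Assumption: uses the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(estimates)
    return statistics.stdev(estimates) / mean
```

A value close to 0 means the repetitions agree closely; the measure is unitless, so it can be compared across rates of very different magnitude such as µ1 and µ2.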

Estimates of recombination rates with one or both markers. For SMCtheo, BW stands for the use of the Baum-Welch algorithm to infer parameters, and LH for the use of the likelihood. We use 10 sequences of 100 Mb with r = 10⁻⁷, µ1 = 10⁻⁸ and µ2 = 10⁻⁴ per generation per bp in a population with a past bottleneck event. The coefficient of variation over 10 repetitions is indicated in brackets.

Schematic representation of site and region epimutations.

Schematic representation of a sequence undergoing epimutation at A) the cytosine site level and B) the region level. A methylated cytosine in CG context is indicated in black and an unmethylated cytosine in white.

Key statistics for epimutations and mutations.

A) Histogram of the length between two recombination events (genomic span of a genealogy) and of DMR size in bp in the simulated data. B) Histogram of genealogy span and DMR size in bp in the A. thaliana data (10 German accessions). C) Linkage disequilibrium decay of epimutations in our samples of A. thaliana (red) and in simulated data (blue). D) Linkage disequilibrium decay of mutations in our A. thaliana samples (red) and in simulated data (blue). The simulations reproduce the outcome of a recent bottleneck with a sample of n = 5 diploid individuals and sequences of 100 Mb. The rates per generation per bp are r = 3.5 × 10⁻⁸, µ1 = 7 × 10⁻⁹, µSM = 3.5 × 10⁻⁴ and µSU = 1.5 × 10⁻³, and the rates per 1 kb region are µRM = 2 × 10⁻⁴ and µRU = 1 × 10⁻³.

Performance of SMC approaches using site epimutations (SMPs) and mutations (SNPs) under a bottleneck scenario.

Estimated demographic history under a recent severe bottleneck (black) by eSMC2 (blue) and by SMCm, assuming the epimutation rate is known (B and D) or unknown (A and C). For SMCm, the percentage of CG sites with methylation information varies between 20% (red), 10% (orange) and 2% (green). We use 10 sequences of 100 Mb in A and B (with 10 repetitions) and 10 sequences of 10 Mb in C and D (three repetitions displayed). The parameters per generation per bp are: recombination rate r = 3.5 × 10⁻⁸, mutation rate µ1 = 7 × 10⁻⁹, methylation rate µSM = 3.5 × 10⁻⁴ and demethylation rate µSU = 1.5 × 10⁻³.

Integrating epimutations and mutations on German accessions of A. thaliana.

Estimated demographic history of the German population by eSMC2 (only SNPs, purple) and SMCm when keeping polymorphic methylation sites (SMPs) only: green with epimutation rates estimated by SMCm, blue with epimutation rates fixed to empirical values. The region epimutation effect is ignored. The parameters are r = 3.6 × 10⁻⁸, µ1 = 6.95 × 10⁻⁹, and when assumed known, the site methylation rate is µSM = 3.5 × 10⁻⁴ and the demethylation rate is µSU = 1.5 × 10⁻³.