Comment on 'Single nucleus sequencing reveals evidence of inter-nucleus recombination in arbuscular mycorrhizal fungi'
Abstract
Chen et al. recently reported evidence for inter-nucleus recombination in arbuscular mycorrhizal fungi (Chen et al., 2018a). Here, we report a reanalysis of their data. After filtering the data by excluding heterozygous sites in haploid nuclei, duplicated regions of the genome, and low-coverage depths base calls, we find the evidence for recombination to be very sparse.
https://doi.org/10.7554/eLife.47301.001Introduction
For many years arbuscular mycorrhizal fungi (AMF) were presumed to be asexual as no one had witnessed sexual structures in these fungi. This was puzzling because AMF retain core meiosis genes (Halary et al., 2011), indicating that a meiosis-like process most likely occurs in this lineage. Previous evidence for recombination later turned out to be based on duplicated gene copies (Croll and Sanders, 2009), or ribosomal RNA sequences that were paralogs (Pawlowska and Taylor, 2004; Maeda et al., 2018). Recently, based on work comparing single nuclei whole-genome sequences to bulk sequencing data, new evidence for recombination in these fungi was reported (Chen et al., 2018a). The isolates were dikaryotic, containing nuclei of two classes defined by their mating type (MAT) locus (Ropars et al., 2016). For each sequenced nucleus, PCR-amplification was attempted to assign a mating type class (MAT-1 up to MAT-5). Recombination was then inferred based on: (i) base-pair calls classed as one mating type found in the alternate mating type; (ii) nuclei of the same mating type showing variation in consecutive blocks of single nucleotide polymorphisms (SNPs); (iii) SNPs from nucleus 7 (SL1 strain) being more similar to SNPs of the alternate mating type, consistent with a recombination event spanning the MAT locus.
Here, we ask how strong the signal of within-strain recombination was in the data from Chen et al. (2018a) if we excluded heterozygous sites in haploid nuclei, duplicated regions of the genome, and low-coverage depths base calls. By removing data that cannot confidently be distinguished from sequencing errors and repeated regions, we find that the evidence for recombination is very sparse. We also report specific examples of these possible errors to justify our more stringent filtering of the data.
Results
The effect of filtering positions based on reads
Our first analysis was of the dataset reported in Supplementary file 6 of Chen et al. (2018a), used for both Figures 2 and 3 of that manuscript. We filtered out: (1) positions where any single nucleus was heterozygous (defined as sites with read depth >10, and alternate allele >10%), (2) any individual site with less than five reads coverage, and (3) positions with more than one high-confidence BLAST hit using the settings specified in Chen et al. (2018a). We applied the filters individually or in combination, to see how many of the variable sites inferred as signals of recombination would be removed. Applying each filter individually removed between 19–77% of recombined positions. Filtering of low coverage sites had the strongest effect, a ~ 75% reduction. Applying all three filters together removed 91% of recombined sites (Figure 1). Notably, these filters had much less effect on the total number of analyzed sites, with the combined application of all three filters reducing the total number of sites by only ~22%.
Recombination, when involving crossing over, exchanges physical blocks between homologous chromosomes resulting in consecutive allelic differences. We calculated the number of recombined sites, as well as the number of recombined blocks, consecutive SNPs of the alternate haplotype, shown in Table 1 (details of identified blocks can be found in Table 1—source data 1). Based on the analysis presented in Chen et al., all isolates show recombined blocks, in some cases spanning over a thousand base pairs. However, applying our three filters reduced these blocks. After filtering no consecutive SNPs remained for strain A4. Filtering applied to strains A5 and SL1 reduced the number of recombined blocks to 2 and 4, respectively, and also reduced the length of the remaining blocks.
-
Table 1—source data 1
- https://doi.org/10.7554/eLife.47301.005
A specific example of repeated regions associated with biallelic sites in haploid nuclei
Chen et al. (2018a) interpreted consecutive SNPs differing between nuclei of the same mating type as a sign of recombination, as highlighted in Figure 3 of their article. The filtering used in Chen et al. (2018a) removed multi-copy sites ‘[i]f BLAST results returned more than two good hits’, but retained regions with two BLAST hits. This could lead to the inclusion of SNPs that are heterozygous due to duplicated regions of the genome.
To show how repeated regions may lead to a false signal of recombination, we focused on an example highlighted in Figure 3 of Chen et al. (2018a), and discussed in the main text of that article. We found that several positions on scaffold 70 from isolate A4 were heterozygous in several nuclei (Figure 2), although they are treated as homozygous in Chen et al. (2018a). High sequencing depth (>30) eliminates rare sequencing errors as the cause. To test if duplicated regions could be the cause of the heterozygosity, we performed a BLAST search against the A4 reference genome with sequences from scaffold 70:100354–100657. This search resulted in two BLAST matches: the self match on scaffold 70, as well as an additional match on scaffold 3570 (Figure 2B). When the short reads of the dikaryon (bulk sequencing of all nuclei) were aligned to the reference genome, both these SNPs on scaffold 70 and their match on scaffold 3570 were heterozygous, and the BLAST hit result for both showed 100% identity match. Thus, this repeated sequence seems to have been assembled as a chimera of the two variants in both scaffolds, and the short reads from either copy are mapped equally to both.
We feel this example illustrates the need to exclude repetitive regions from analyses of recombination.
Confidence in low coverage sites to infer recombination
The data presented in Chen et al. (2018a) was filtered with a minimum of two reads. This is a very low threshold, and insufficient even for a consensus in the event of disagreeing reads. To look at the effect of low coverage on the signal of recombination, we compared the distribution of read depths between random and recombined sites. We first needed to identify recombined sites, as the method was lacking from the original manuscript, so we applied a parsimony criterion as detailed in Figure 1—figure supplement 1. While imperfect, this method certainly underestimates recombination, as it cannot identify recombined sites when equal numbers of nuclei within a mating type have alternate genotypes. Our method identified 733 positions, sufficient for analysis. Looking at the distribution of read depths, overall SL1 nuclei had ~95% fewer high coverage sites (average of 97 sites > 10 read depth for nuclei from SL1 versus 2290 for A4 and 2441 for A5) compared to A4 and A5 (Figure 3). We note here, as described in Table 1 of Chen et al. (2018a), that SL1 nuclei cover much less of the genome (14%) compared to A4 and A5 (53% and 42%, respectively). Another fact visible from Figure 3 is that, for A4 and A5, recombined sites are overrepresented by sites with low depth compared to sub-sampled non-recombined sites (Wilcoxon ranked sum test A4; p=3.2×10−09, A5 p=2.9×10−6). We note that for nuclei from isolate SL1, fewer overall recombined sites can be identified since the decreased breadth of coverage reduces overlap between nuclei, making it difficult to say whether this pattern of excess low-coverage sites is also present (p=0.11).
Genome-wide pairwise SNP differences are reduced after filtering
We then assessed the evidence for genome-wide recombination based on pair-wise SNP differences. This is the analysis presented in Figure 3 of Chen et al. (2018a), showing overall more recombination in SL1 than A4 and A5, indicated by a ‘mosaic pattern’. We again note here that sequence from SL1 nuclei covers very little of the assembly (average of 14% from Table 1 of Chen et al., 2018a) means that very few positions will be covered between any two nuclei between nuclei of SL1 (14% in one nucleus x 14% in the other nucleus = 2% in both). After applying our filters, we find that in A4 and A5 almost all differences between nuclei of one mating type disappear (Figure 4). For nuclei from SL1, the filtering reduced the differences within a mating type, but since these nuclei cover so little of the genome, the overall dataset is reduced such that on average nuclei only share 9 SNPs that can be compared. Many nuclei have no overlapping SNPs and no comparison can be made (black squares in Figure 4). A few nuclei of opposite mating types, such as nuclei 17 and 25, show high similarity, but for these pairs the similarity is based on only one or two shared SNP positions.
Confirming the placement of nucleus 07 (SL1) could be strong evidence for recombination
In Figure 2 of Chen et al. (2018a), nucleus 07 of SL1 shows a strong similarity with nuclei of the alternate mating type, seen by the clustering of nucleus 07 with MAT-5. However, this nucleus was PCR-genotyped to be of mating type MAT-1. The incongruence between PCR-genotyped mating type and the SNP clustering would be evidence of recombination spanning the MAT locus. To confirm this, we looked in the mapped reads of each nucleus to find reads mapping to either alternate mating type locus (Figure 4—figure supplement 1]). We found that there were no Illumina sequencing reads of nucleus 07 mapped to either mating type locus, indicating the whole genome amplification step may not have amplified the mating locus. As such, we have no available evidence for the mating type of this nucleus. Without corresponding Illumina evidence, we consider the PCR product the sole remaining evidence of this recombination event. This PCR experiment that is not confirmed with Illumina data represents the only remaining evidence for recombination after read filtering. We find cross-contamination of the PCR to be a more likely scenario in the face of many billions of sequenced bases from an Illumina run.
Conclusion
Finding a balance between filtering poor data and losing informative data is a critical component of any analysis. For this dataset, we provide evidence for the necessity of stringent filtering to avoid inferences based on erroneous or misleading data. We do not consider our filters to be particularly strict, as removing low coverage sites, repeated regions, and heterozygous sites from haploid data is commonplace in genomic analyses. In SL1, three blocks of consecutive SNPs remained after our filtering, and two regions in A5. Some of these regions likely remain because our heterozygosity filter requires a minimum of ten reads, thus low coverage heterozygous sites are not excluded. This analysis used the first 100 contigs, covering approximately 10 Mb. As such, finding only a handful of putative recombined SNPs certainly cannot be confidently separated from amplification/sequencing noise. While small blocks of genetic exchange may be compatible with gene conversion, the limited number of markers involved greatly increases the difficulty in identifying high-confidence gene conversion events (Wijnker et al., 2013; Qi et al., 2014).
Mapping recombination inside repetitive regions would require longer reads than available from standard short-read technologies. This is a formidable task due to the input requirements of PacBio technologies, but it has been accomplished using linked short reads from individual pollen cells (Sun et al., 2019). Notably, the use of a more contiguous reference genome will actually include additional repetitive regions, and the exclusion of repetitive regions will become even more important. As the apparent recombined blocks are much smaller than the contigs, a more contiguous genome assembly would not change our analysis. Single nucleus genome amplification with multiple displacement amplification produces extremely variable genome coverage. Normalization of the data was shown to improve the quality of AMF genomic data when coverage is variable (Montoliu-Nerin et al., 2019). In addition to normalization, low coverage sites could still be used with a model-based approach, which incorporates the associated uncertainty (Hinch et al., 2019; Bloom et al., 2013). Finally, removing heterozygous sites from haploid single-nuclei seems like a self-evident requirement.
The conserved meiosis genes found in the genomes of Glomeromycotan species strongly suggests a meiosis-like process allowing recombination and re-shuffling of genetic material among genomes. Uncovering the details of this process would be a major scientific breakthrough. Given this importance, claims made regarding recombination in Glomeromycota require rigorous examination. While we acknowledge that the models presented in Chen et al. (2018a) are valuable framework to test hypotheses for meiosis-like mechanisms found in these fungi, the data presented are not robust enough to support or reject them. As such, we believe that one of the greatest remaining mysteries in mycology remains unknown.
Materials and methods
Raw data
Request a detailed protocolWe obtained the SPAdes assemblies for the three dikaryotic R. irregularis isolates from NCBI, as well as the paired-end read libraries from the dikaryons and the paired-end reads of the single nuclei. Details of the accession numbers used are found in Chen et al. (2018a).
Read processing
Request a detailed protocolShort reads were cleaned with FASTP (Chen et al., 2018b), then aligned using BWA mem as in Chen et al. (2018a). Reads per nucleus were analyzed using the python modules pysam (Heger and Jacobs, 2019, and BLAST searches were scripted using biopython (Cock et al., 2009). Scripts used are available on GitHub repository (https://github.com/BenAuxier/Chen.2018.Response; Auxier, 2019; copy archived at https://github.com/elifesciences-publications/Chen.2018.Response). The parsimony criterion shown in Figure 1—figure supplement 1A. for identification of recombined sites, was performed manually. Filtering of excel files and calculations of recombined sites was performed with the R statistical language, which was also used to prepare plots with ggplot2 and distance matrices using ape (Wickham, 2016; Paradis and Schliep, 2019).
We compared the coverage representation between non-recombined and recombined sites. We subsampled from non-recombined sites to match the number of reads in recombined sites and performed Wilcoxon ranked sum test in R between the subsampled set of non-recombined reads and the recombined reads.
Mating-type loci identification and mapping
Request a detailed protocolAs the location of the mating type loci was not specified in the results of Chen et al. (2018a), we identified them based on data from Ropars et al. (2016). Specifically, we used the primers sequences kary001, kary002, and kary003 as query sequences for BLAST searches against the A4, A5, and SL1 genomes. Each primer sequence had strong matches against two different scaffolds, consistent with divergent ideomorphs as found in Ropars et al. (2016). As these primers only target a small region, we extended the locus using the annotations found on NCBI to identify the boundaries of the pair of genes. These locations are reported in Figure 4—figure supplement 1—source data 1.. As no annotations are available for SL1, we used the entire sequence of the ideomorph on scaffold511 of A5 as a BLAST query to identify the homologous regions.
With the mating locus identified, we then counted the number of reads that mapped to each sequence using samtools (Li et al., 2009).
Calculation of overlapping SNPs
Request a detailed protocolTo calculate the expected number of overlapping SNPs found in Figure 4D, we used the following formula:
Data availability
No data was generated for this study. Analyses and scripts can be found on https://github.com/BenAuxier/Chen.2018.Response (copy archived at https://github.com/elifesciences-publications/Chen.2018.Response).
References
-
Fastp: an ultra-fast all-in-one FASTQ preprocessorBioinformatics 34:i884–i890.https://doi.org/10.1093/bioinformatics/bty560
-
Conserved meiotic machinery in Glomus spp., a putatively ancient asexual fungal lineageGenome Biology and Evolution 3:950–958.https://doi.org/10.1093/gbe/evr089
-
The sequence alignment/Map format and SAMtoolsBioinformatics 25:2078–2079.https://doi.org/10.1093/bioinformatics/btp352
-
BookGgplot2: Elegant Graphics for Data AnalysisNew York: Springer-Verlag.https://doi.org/10.1007/978-0-387-98141-3
Article and author information
Author details
Funding
The authors declare that there was no funding for this work.
Acknowledgements
We thank Dr. Duur Aanen, Dr. Anna Rosling, Dr. Marisol Sanchez-Garcia, Dr. Erik Wijnker, and Mathijs Nieuwenhuis for critical feedback. We also thank the three anonymous reviewers.
Version history
- Received: April 9, 2019
- Accepted: October 9, 2019
- Version of Record published: October 25, 2019 (version 1)
Copyright
© 2019, Auxier and Bazzicalupo
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 1,126
- Page views
-
- 73
- Downloads
-
- 5
- Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Evolutionary Biology
Life as we know it relies on the interplay between catalytic activity and information processing carried out by biological polymers. Here we present a plausible pathway by which a pool of prebiotic information-coding oligomers could acquire an early catalytic function, namely sequence-specific cleavage activity. Starting with a system capable of non-enzymatic templated replication, we demonstrate that even non-catalyzed spontaneous cleavage would promote proliferation by generating short fragments that act as primers. Furthermore, we show that catalytic cleavage function can naturally emerge and proliferate in this system. Specifically, a cooperative catalytic network with four subpopulations of oligomers is selected by the evolution in competition with chains lacking catalytic activity. The cooperative system emerges through the functional differentiation of oligomers into catalysts and their substrates. The model is inspired by the structure of the hammerhead RNA enzyme as well as other DNA- and RNA-based enzymes with cleavage activity that readily emerge through natural or artificial selection. We identify the conditions necessary for the emergence of the cooperative catalytic network. In particular, we show that it requires the catalytic rate enhancement over the spontaneous cleavage rate to be at least 102–103, a factor consistent with the existing experiments. The evolutionary pressure leads to a further increase in catalytic efficiency. The presented mechanism provides an escape route from a relatively simple pairwise replication of oligomers toward a more complex behavior involving catalytic function. This provides a bridge between the information-first origin of life scenarios and the paradigm of autocatalytic sets and hypercycles, albeit based on cleavage rather than synthesis of reactants.
-
- Cell Biology
- Evolutionary Biology
The genomes of close unicellular relatives of animals encode orthologs of many genes that regulate animal development. However, little is known about the function of such genes in unicellular organisms or the evolutionary process by which these genes came to function in multicellular development. The Hippo pathway, which regulates cell proliferation and tissue size in animals, is present in some of the closest unicellular relatives of animals, including the amoeboid organism Capsaspora owczarzaki. We previously showed that the Capsaspora ortholog of the Hippo pathway nuclear effector Yorkie/YAP/TAZ (coYki) regulates actin dynamics and the three-dimensional morphology of Capsaspora cell aggregates, but is dispensable for cell proliferation control (Phillips et al., 2022). However, the function of upstream Hippo pathway components, and whether and how they regulate coYki in Capsaspora, remained unknown. Here, we analyze the function of the upstream Hippo pathway kinases coHpo and coWts in Capsaspora by generating mutant lines for each gene. Loss of either kinase results in increased nuclear localization of coYki, indicating an ancient, premetazoan origin of this Hippo pathway regulatory mechanism. Strikingly, we find that loss of either kinase causes a contractile cell behavior and increased density of cell packing within Capsaspora aggregates. We further show that this increased cell density is not due to differences in proliferation, but rather actomyosin-dependent changes in the multicellular architecture of aggregates. Given its well-established role in cell density-regulated proliferation in animals, the increased density of cell packing in coHpo and coWts mutants suggests a shared and possibly ancient and conserved function of the Hippo pathway in cell density control. Together, these results implicate cytoskeletal regulation but not proliferation as an ancestral function of the Hippo pathway kinase cascade and uncover a novel role for Hippo signaling in regulating cell density in a proliferation-independent manner.