Clusters of polymorphic transmembrane genes control resistance to schistosomes in snail vectors

  1. Jacob A Tennessen  Is a corresponding author
  2. Stephanie R Bollmann
  3. Ekaterina Peremyslova
  4. Brent A Kronmiller
  5. Clint Sergi
  6. Bulut Hamali
  7. Michael Scott Blouin
  1. Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, United States
  2. Department of Integrative Biology, Oregon State University, United States
  3. Center for Genome Research and Biocomputing, Oregon State University, United States

Abstract

Schistosomiasis is a debilitating parasitic disease infecting hundreds of millions of people. Schistosomes use aquatic snails as intermediate hosts. A promising avenue for disease control involves leveraging innate host mechanisms to reduce snail vectorial capacity. In a genome-wide association study of Biomphalaria glabrata snails, we identify genomic region PTC2 which exhibits the largest known correlation with susceptibility to parasite infection (>15 fold effect). Using new genome assemblies with substantially higher contiguity than the Biomphalaria reference genome, we show that PTC2 haplotypes are exceptionally divergent in structure and sequence. This variation includes multi-kilobase indels containing entire genes, and orthologs for which most amino acid residues are polymorphic. RNA-Seq annotation reveals that most of these genes encode single-pass transmembrane proteins, as seen in another resistance region in the same species. Such groups of hyperdiverse snail proteins may mediate host-parasite interaction at the cell surface, offering promising targets for blocking the transmission of schistosomiasis.

eLife digest

Schistosomiasis is a widespread parasitic disease, affecting over 200 million people in tropical countries. It is caused by schistosome worms, which are carried by freshwater snails. These snails release worm larvae into the water, where they can infect humans – for example, after bathing or swimming.

Treatment options for schistosomiasis are limited. Eliminating the freshwater snails is one way to control the disease, but this is not always effective in the long term and the chemicals used can also harm other animals in the water.

Another way to manage schistosomiasis could be to stop the worms from infecting their snail host by breaking the parasites’ life cycle without killing the snails. It is already known that some snails are naturally resistant to infection by some strains of schistosomes. Since this immunity is also inherited by the offspring of resistant snails, there is likely a genetic mechanism behind it. However, very little else is known about any genes that might be involved. Tennessen et al. therefore set out to identify what genes were responsible for schistosome resistance and how they worked.

The experiments used a large laboratory colony of snails, whose susceptibility to schistosome infection varied among individual animals. To determine the genes behind this variation, Tennessen et al. first searched for areas of DNA that also differed between the immune and infected snails. Comparing genetic sequences across over 1,000 snails revealed a distinct region of DNA that had a large effect on how likely they were to be infected.

This section of DNA turned out to be highly diverse, with different snails carrying varying numbers and different forms of the genes within this region. Many of these genes appear to encode proteins found on the surface of snail cells, which could affect whether snails and worms can recognize each other when they come into contact. This in turn could determine whether or not the worms can infect their hosts.

These results shed new light on how the snails that carry schistosomes may be able to resist infections. In the future, this knowledge could be key to controlling schistosomiasis, either by releasing genetically engineered, immune snails into the wild (thus making it harder for the parasites to reproduce) or by using the snails’ mechanism of resistance to design better drug therapies.

Introduction

Schistosomiasis is a chronic and debilitating disease suffered by over 200 million people worldwide (Evan Secor, 2014; Lo et al., 2018). It is caused by infection with schistosome trematode parasites that are transmitted by aquatic snails (Lo et al., 2018). Infection can be treated with regular doses of a single drug, praziquantel (Doenhoff et al., 2009). However, it has become increasingly clear that mass drug administration alone will not adequately control schistosomiasis, and that successful elimination of transmission requires intervention at the snail stage (Lo et al., 2018; Sokolow et al., 2018).

Immunogenetic interactions between snails and schistosomes represent a crucial stage in the parasite life cycle that can be targeted to block transmission. Parasite resistance is highly heritable in snails, and there is substantial strain-by-strain interaction between hosts and parasites (Richards and Shade, 1987; Richards et al., 1992; Knight et al., 1999; Webster et al., 2004; Webster and Woolhouse, 1998; Theron et al., 2014). Genetically diverse parasite cultures can infect nearly all snails, while bottlenecked laboratory strains of parasite can only infect a subset (Theron et al., 2008), suggesting a ‘trench warfare’ model in which numerous alleles are maintained in both host and parasite populations because each matches a different phenotype in the other species (Seger, 1988; Stahl et al., 1999). There are likely to be snail genes with large effects on resistance to particular parasite genotypes (Lewis et al., 2001). Finding these genes will open two potential avenues for disease mitigation. First, they may uncover mechanisms of infection by the parasite that could be therapeutically targeted. Second, they may facilitate genetic manipulation of wild snail populations so as to reduce parasite transmission by eliminating alleles that permit infection by certain schistosome genotypes (Theron et al., 2014; Reardon, 2016; Famakinde, 2018; Maier et al., 2019).

For invertebrates, highly strain-specific responses to parasites remain unexplained (Schmid-Hempel, 2005; Schulenburg et al., 2007), as these taxa lack the combinatorial immune system found in vertebrates with its vast and adjustable repertoire of recognition molecules. Invertebrate resistance to macroparasites or parasitoids often involves large-effect loci (Carton et al., 2005), but most of these are poor candidates for mediating strain specificity, as they reflect generic enhanced defenses conveying a constitutive fitness cost (Kraaijeveld and Godfray, 1997; Webster and Woolhouse, 1999; Koella and Boëte, 2002), or encode signaling molecules (Hita et al., 2006) or effectors (Goodall et al., 2006) rather than recognition molecules. Strain specificity may be conveyed by suites of highly diverse host genes that act synergistically, especially if these interact with similarly varying and coevolving sets of parasite genes to mediate host-parasite recognition (Schmid-Hempel, 2005; Schulenburg et al., 2007; Cerenius and Söderhäll, 2013). Such loci will not necessarily produce a large phenotypic signal unless several are clustered in the same genomic region.

The neotropical snail Biomphalaria glabrata has been the focus of recent efforts to develop genomic resources for schistosomiasis vector biology, although the 916 Mb, repetitive genome remains poorly assembled (reference genome BglaB1 N50 = 48 kb; Adema et al., 2017; Tennessen et al., 2017). B. glabrata snails can be readily challenged with Schistosoma mansoni miracidia under controlled laboratory conditions, with successful infections diagnosed by subsequent shedding of cercariae (Bonner et al., 2012). To date, four genomic regions have been identified in which allelic variation influences resistance to infection by S. mansoni (Knight et al., 1999; Goodall et al., 2006; Tennessen et al., 2015a; Tennessen et al., 2015b). An F2 mapping cross between resistant B. glabrata strain BS-90 and a susceptible strain yielded two RAPD markers linked to resistance (Knight et al., 1999). One of those was subsequently aligned to a contig on Linkage Group (LG) XII (Tennessen et al., 2017), although the identities of any candidate genes to which it may be linked remain unclear (the other could not be uniquely mapped). Two other genomic regions (sod1 and RADres; Goodall et al., 2006; Tennessen et al., 2015a) influence S. mansoni infection in B. glabrata population 13–16-R1 which is admixed from Caribbean and Brazilian populations and has been maintained free from parasites for several decades as a large laboratory population with substantial segregating variation. However, together those two regions explain only 7% of the variance in resistance in 13-16-R1, suggesting that other resistance loci remain undiscovered in this population. In B. glabrata from Guadeloupe, the Guadeloupe Resistance Complex (GRC) shows an 8-fold effect on the odds of S. mansoni infection (Tennessen et al., 2015b; Allan et al., 2017) via a hemocyte-mediated mechanism (Allan et al., 2018a) that also affects the proteome (Allan et al., 2019) and microbiome (Allan et al., 2018b). Seven clustered GRC genes encode hyperdiverse single-pass transmembrane (TM1) proteins that appear to recognize parasite-associated molecules (Tennessen et al., 2015b), likely including saccharides (Allan and Blouin, 2018).

Here, we use a genome-wide association study to pinpoint a new resistance region in snails, with a very large (over 15-fold) effect on odds of infection by schistosomes. It comprises a cluster of highly polymorphic transmembrane genes, and as GRC (=PTC1) was the first such cluster described in Biomphalaria, we designate this second region as Polymorphic Transmembrane Cluster 2 (PTC2). Using PacBio, we have vastly improved the assembly of the B. glabrata genome, allowing us to fully characterize the chromosomal vicinity of PTC2. Transcriptomic data show that, like GRC, PTC2 harbors exceptionally divergent suites of TM1 genes, suggestive of coevolutionary dynamics. These results support a general immunogenetic scenario in which clusters of highly polymorphic TM1 genes mediate host-parasite interaction.

Results and discussion

In pooled whole-genome sequencing of 600 infected and 600 uninfected 13–16-R1 snails (298x and 333x coverage, respectively), a single genomic region showed by far the greatest difference in allele frequencies between pools (Figure 1A; Supplementary file 1A). The highest outliers occurred in a 450 kb section of LG XII, here called PTC2. Genetic divergence between pools (FST) at numerous PTC2 variants exceeds 0.1, a value unobserved among one billion simulated neutral variants, which is therefore significant even if corrected for the nearly 7 million empirical variants examined (p<0.01). Many variants even show FST over 0.2, more than twice the FST at sod1 and RADres. By subsequently genotyping indel polymorphisms at PTC2 in individual snails, we observed three alleles at intermediate frequency (R: 44%, S1: 24%, and S2: 32%). Infection was rare for RR homozygotes (12.9%), and much more common for S1S1 (75.3%) and S2S2 (29.6%) homozygotes, a difference in infection odds of over 15-fold (i.e. infection odds of 0.15 vs. 3.0; Figure 1B). Heterozygotes showed intermediate phenotypes. There was weak partial dominance of S1 over R (observed Clopper-Pearson 95% confidence interval of infection probability for S1R = 51.0–61.8%; expected intermediate phenotype = 44.1%), such that relative to RR, carrying an S1 allele increases the odds of infection 5.9-fold (p=6 × 10−42) while a second S1 allele further increases the odds of infection 2.7-fold for a 15.9-fold difference (p=1 × 10−4). The S2 allele acts additively, such that each S2 allele increases the odds of infection 1.5-fold (p=6 × 10−5). We confirmed the PTC2 signal using an independent set of 392 snails from 13-16-R1 that had previously been phenotyped (Tennessen et al., 2015a) (p=7 × 10−12 for R vs. S1; p=4 × 10−5 for R vs. S2; Figure 1—figure supplement 1). These snails had also been genotyped at sod1 and RADres, revealing that all three loci had significant independent associations when included together in the same model (p≤10−4 for each), with no evidence for epistasis (p>0.05 for interaction terms). Segregating variation at PTC2 has a stronger association with odds of infection than that of any other known B. glabrata locus (Tennessen et al., 2015a; Tennessen et al., 2015b). The BS-90 RAPD marker (Knight et al., 1999) is only 5 Mb and 23 cM from PTC2 (Figure 1—figure supplement 2; Tennessen et al., 2017). This marker is predicted to be 17 cM (range 6–33 cM) from a causal locus, which could therefore plausibly be PTC2 (Supplementary file 1B).

Figure 1 with 2 supplements see all
A region on Linkage Group XII displays a major association with infection risk (Figure 1—source data 1).

(A) Genetic divergence (FST) between infected and uninfected snail pools in 10 kb windows, for variants on contigs arranged by linkage map position (Tennessen et al., 2017). The strongest signal is from PTC2 on Linkage Group XII, far exceeding the signal of known regions RADres and sod1 which reflect wide haplotype blocks (Tennessen et al., 2015a). (B) PTC2 genotypes are associated with infection outcome (Source code 1). Genotypes are displayed for all three pairs of alleles, revealing a substantial difference between R and S1 (center panel), and an intermediate signal of S2 relative to the others (left and right panels).

Figure 1—source data 1

Genetic divergence between infected and uninfected snail pools in 10 kb windows.

https://cdn.elifesciences.org/articles/59395/elife-59395-fig1-data1-v2.txt.zip

Using PacBio whole-genome assemblies from snails homozygous for each of the three PTC2 alleles (Supplementary file 1C), we find striking sequence and structural divergence among the haplotypes (Figure 2). Alignable regions show 3.3% nucleotide divergence on average (SD = 2.1%). A majority of PTC2 sequence shows no similarity among alleles; the percentage of sequence that could even be aligned ranged from 12.9% (R onto S2) to 40.0% (S1 onto R). All three PTC2 haplotypes harbor unique insertions tens of thousands of bp in size, some of which contain complete coding genes, such that each genotype carries a distinct combination of genes (Figure 2). Shared orthologous genes at PTC2 show many nonsynonymous differences and in some cases homology can only be identified at the protein, not DNA, level (Figure 2—figure supplement 1; Figure 3). This degree of polymorphism is unusually high for conspecific haplotypes in most genomic regions in any taxon (Leffler et al., 2012). In contrast to PTC2, 89.5% of sequence on other contigs can be aligned between assemblies, with a mean of 0.4% nucleotide divergence. It is not obvious how a chromosomal rearrangement (e.g. inversion) could maintain more than two distinct haplotypes, and in any case we see no evidence for one in our assemblies.

Figure 2 with 4 supplements see all
Divergent haplotypes of Polymorphic Transmembrane Complex 2 (PTC2).

As in Figure 1A, genetic divergence (FST) between infected and uninfected snail pools is shown, here both for individual variants (grey circles, only FST ≥0.01 shown) and as mean values for sliding windows of 10 kb (red line). FST is undefined for regions present on only one haplotype. PTC2 (peach) is defined as the 450 kb region containing all windows for which mean FST exceeds 0.1. Within PTC2, the three haplotypes (R, S1, and S2) are aligned with multi-kilobase indels and genes indicated. Assemblies are comprehensive and alignment gaps represent annotated indels, not missing data. PTC2 is characterized by extensive sequence divergence, including large indels containing entire genes (numbered), and is enriched for single-pass transmembrane (TM1) loci (dark blue).

Figure 3 with 2 supplements see all
Divergence of PTC2 single-pass transmembrane (TM1) genes.

For each of eight TM1 genes (numbered as in Figure 2; Supplementary file 1D), the protein product is shown, including the transmembrane domain (peach). Orthologous alleles are aligned. Amino acid residues that differ from the R allele are shown in blue. Uncorrected protein sequence divergence between orthologous alleles ranges from 3% to 85%. Three genes (7, 8, and 10) are present only on a single haplotype.

Using RNA-Seq data from homozygotes of each genotype to identify expressed genes, we fully annotated PTC2 (Supplementary file 1D,E). Of the eleven PTC2 genes, eight are predicted to be TM1 genes, including all five genes that are shared among the three haplotypes (genes 1, 2, 4, 5, and 9; Figure 2; Figure 3). Of the three non-TM1 genes, gene 3 and gene 11 show homology to TM1 genes 2 and 8, respectively, but without the TM1 domains. Gene 6 contains a conserved protein domain of unknown function (DUF2732). Only 11% of B. glabrata genes are TM1 genes, so they would be unlikely to constitute eight of eleven genes by chance alone (p<10−5). PTC2 TM1 genes are all between 166 and 530 codons, have TM1 domains that are displaced from the N-terminus (Figure 3—figure supplement 1), and like the rest of PTC2 they are highly polymorphic (Figure 3), with amino acid level divergence exceeding 50% in several cases. Sequences similar to both R and S haplotypes are present in other B. glabrata populations without admixed histories (genomic and transcriptomic sequence from Brazilian strain BB02, Adema et al., 2017; transcriptomic sequence from Guadeloupe, Tennessen et al., 2015b; Figure 3—figure supplement 2), suggesting that they co-occur throughout the species range and the polymorphism is old. Synonymous divergence among alleles is higher than nonsynonymous, and a phylogeny of concatenated genes shows 24% synonymous divergence from the midpoint root. Thus, haplotypes are more consistent with an ancient origin (24 million years assuming a neutral mutation rate of 10−8 per year) rather than recent divergence via selection for protein diversity. Other than the transmembrane segment, these genes contain no known protein domains or homology to sequences outside of gastropods, nor are they homologous to GRC genes. Some show homology to each other and/or to other genes near PTC2 or elsewhere in the genome, but amino acid level sequence similarity among paralogs is low (<50%). The phenotypic effects of individual genes and polymorphisms will be an exciting subject for future work involving knockdowns or knockouts (Allan et al., 2017; Abe and Kuroda, 2019) and/or additional RNA-Seq from multiple individuals allowing quantification of expression differences.

Both GRC and PTC2 suggest a model of snail-schistosome interaction via molecular recognition (either of the parasite by the host, or of the host by the parasite) that is mediated by TM1 gene polymorphism. Across metazoans, TM1 genes often play a role in immunological recognition, and include B- and T-cell receptors, Toll-like receptors, major histocompatibility complex genes, and similar host defense genes (Pahl et al., 2013). Other polymorphic clusters of host transmembrane genes are used by parasites as receptors for host recognition and invasion (e.g. human glycophorins and Plasmodium; Malaria Genomic Epidemiology Network et al., 2017), and at least one of the GRC TM1 genes controls shedding of S. mansoni cercariae (Allan et al., 2017). One PTC2 TM1 gene is present only on the R haplotype and is an obvious candidate if it functions to recognize the parasite. However, allelic divergence among shared genes could also be important, and an R-specific gene alone would not explain the difference between S1 and S2. In contrast to GRC, in which a completely dominant allele confers resistance, all three alleles in PTC2 differ in their susceptibility, and allelic associations are additive or show partial dominant susceptibility (Figure 1B). This pattern suggests that multiple loci along the haplotypes may jointly contribute to phenotype by interacting with different combinations of parasite molecules such as SmPoMucs (Roger et al., 2008) or other glycoproteins (Allan and Blouin, 2018) to determine the outcome of infection.

As with GRC, we suspect non-neutral host-parasite coevolutionary processes have shaped sequence polymorphism at PTC2. The inferred ancient origin (>20 million years) of PTC2 is inconsistent with a neutral coalescent process. Because 13–16-R1 is admixed from geographically isolated populations, we can’t infer natural allele frequencies or compare the site frequency spectrum to a neutral expectation for a randomly-mating population, as was possible for GRC (Tennessen et al., 2015b), though these allelic lineages do segregate in natural populations (Figure 3—figure supplement 2). Therefore, while the remarkable structural and nonsynonymous polymorphism appears adaptive, it is difficult to distinguish among plausible scenarios including overdominance (Woolhouse et al., 2002), negative-frequency dependent selection (Woolhouse et al., 2002; Koskella and Lively, 2009; Bento et al., 2017), adaptive introgression from distantly related species (Hedrick, 2013), or selection for an epistatically-interacting supergene (Thompson and Jiggins, 2014). Introgression appears unlikely, as it would have had to occur twice independently to generate three distinct haplotypes, and all of the closest relatives of B. glabrata occur allopatrically in Africa (DeJong et al., 2001). Therefore, the most plausible explanations involve some form of long-term balancing selection. Although S. mansoni is not native to the neotropics (Desprès et al., 1993), selection may have been driven by other trematodes, a clade that has ubiquitously infected snails for millions of years (Blair et al., 2001) and which can be a strong selective force favoring rare alleles (Koskella and Lively, 2009). Schistosomes castrate snails (Faro et al., 2013) but wild snails show no sign of evolving universal resistance, suggesting that the R haplotype is either specific to parasite genotype or else costly to fitness in some undetected manner. The R allele has persisted within the 13–16-R1 population for decades in the absence of challenge by parasites, so any fitness cost must be relatively weak or context-dependent.

We have thus far only observed an effect on one parasite strain, PR-1, precluding inferences about gene-for-gene interaction. Nevertheless, a system of polymorphic matching alleles could explain the substantial schistosome-strain by snail-strain interaction in compatibility that is often observed (Richards and Shade, 1987; Richards et al., 1992; Knight et al., 1999; Webster et al., 2004; Webster and Woolhouse, 1998; Theron et al., 2014), including schistosome-infection dose-response curves that fit a simple phenotype-matching model (Theron et al., 2008; Theron et al., 2014). If more than one PTC2 gene contributes to resistance, then synergistic interactions among these genes and other unlinked loci could begin to explain the pronounced variation in host-parasite compatibility (Schulenburg et al., 2007). In other invertebrates, highly polymorphic haplotypes can be major-effect loci for infection and show striking coevolutionary signatures including variable presence/absence of genes. For example, the PR-locus mediating bacterial resistance in Daphnia via matching-allele interactions also features haplotypes of vastly different sizes (differences > 60 kb at both this locus and PTC2) with large non-homologous sections, and these contain glycosyltransferase genes that could mediate host-pathogen compatibility (Bento et al., 2017). Similarly, the APL1 immune factor impacting Plasmodium development in Anopheles consists of adjacent paralogs that differ in copy number among species and show extreme diversity within species (Rottschaefer et al., 2011; Mitri et al., 2020). Thus, while invertebrates lack the acquired immunity of vertebrates and its associated adaptive genetic variation (Spurgin and Richardson, 2010), their defenses can show similar nuance conveyed by molecular diversity (Loker et al., 2004; Cerenius and Söderhäll, 2013). However, the evolutionary consequences may not match those for vertebrate acquired immunity loci like the major histocompatibility complex or immunoglobulins, where sequence diversity per se tends to enhance immune effectiveness though perhaps at the cost of autoimmunity. For invertebrates, increased numbers of distinct immunogenetic sequences may not necessarily lead to increased resistance if parasites also use these sequences to recognize hosts or mount evasion strategies (Schmid-Hempel, 2005). If discarding immune genes is often as advantageous as gaining them, the result could be a patchwork of genes as observed at PTC2. A more appropriate vertebrate analog might be blood groups used by parasites to invade host cells, for which haplotypic differences often include loss of functional genes, and diversity has been maintained for millions of years by balancing selection (e.g. Dantu group, Malaria Genomic Epidemiology Network et al., 2017; ABO group, Ségurel et al., 2013). The generation and maintenance of such divergent haplotypes remain to be fully explained and could reflect long-term fluctuating selection among alleles with different combinations of specificity and cost (Seger, 1988; Ashby and Boots, 2017).

As with mosquitoes (Marshall et al., 2019), a promising strategy for disease control involves recruiting the natural immunogenetic variation of vectors (Reardon, 2016). The successful implementation of CRISPR/Cas9 in gastropods (Abe and Kuroda, 2019) will facilitate the creation of genetically modified snails having enhanced immunity to block disease transmission (Maier et al., 2019). As large-effect loci, the TM1 clusters are excellent candidates to target in such efforts. However, more work is needed to characterize the functional effects of these genes, as well as the molecular and evolutionary dynamics between hosts and parasites. For example, if host polymorphism is adaptive, it may not be readily replaced in natural populations. Furthermore, gene-by-gene interactions between snail and schistosome genotypes could permit the rapid evolution of parasite counterstrategies. In the context of ancient trench warfare coevolution, it is unlikely that a universally resistant snail could be generated by a single genetic change, although successive changes could enhance parasite resistance enough to impact patterns of transmission (Theron et al., 2014). As an alternative to genetic modification, future work could leverage the hypothesis that the snail TM1 proteins bind to key schistosome molecules that mediate invasion of the host. One could use TM1 proteins to find such molecules, as with snail fibrinogen related proteins and schistosome SmPoMucs (Roger et al., 2008), or GRC and galactose (Allan and Blouin, 2018). More broadly, clusters of immune recognition loci with elevated functional diversity have long been used to track and predict patterns of adaptive variation across populations and species (Sommer, 2005; Spurgin and Richardson, 2010). Thus, we anticipate that this class of genes will play a central role in disease control as the molecular aspects of vector biology are fully brought to bear on schistosomiasis.

Materials and methods

Key resources table
Reagent type
(species) or resource
DesignationSource or referenceIdentifiersAdditional
information
Biological sample (Biomphalaria glabrata)13–16-R1PMID:5050093; PMID:72995815050093:NIH-MH-cc-13-16-1; 7299581:13–16-R1Oregon State University population established by C. Bayne
Strain, strain background (Schistosoma mansoni)PR-1PMID:5050093; PMID:72995815050093:NIH-Sm-PR1; 7299581:PR-1
Sequence-based reagentPB35_1696 k_FThis paperPCR primerGGTTCTCGCTTTTTATTGGCTTTTG
Sequence-
based reagent
PB35_1696 k_RThis paperPCR primerTTAGACGCACCCAAGGATCTC
Sequence-based reagentVB13_859 k_FbThis paperPCR primerACAAATGGGGCAGTTACACTGTTTAC
Sequence-based reagentVB13_859 k_RbThis paperPCR primerAGCGAAATGTGAGATTGGTTATGTTG
Sequence-based reagentVB13_868 k_FbThis paperPCR primerTCTTTTCACTAAAGCCGCACAAGTT
Sequence-based reagentVB13_868 k_RbThis paperPCR primerCCTACGTTCTCAATATCAACGGGAA
Software, algorithmSimulatePools.plhttps://github.com/jacobtennessen/GOPOPS/Perl scriptpower analysis for pooled sequencing
Software, algorithmMakeFreqTableFromPooledPileup.plhttps://github.com/jacobtennessen/GOPOPS/Perl scriptestimates allele frequencies
Software, algorithmFstFromJoinedFreqTablesWindow.plhttps://github.com/jacobtennessen/GOPOPS/Perl scriptcalculates mean FST per window
Software, algorithmChopFastaStaggered.plhttps://github.com/jacobtennessen/MiSCVARS/Perl scriptsubdivides sequence data
Software, algorithmAssessBlatChopped.plhttps://github.com/jacobtennessen/MiSCVARS/Perl scriptsummarizes sequence matches

Animals and Ethics

Request a detailed protocol

We used the Oregon State University population of 13–16-R1 that has been maintained as a large population (hundreds) since the mid-1970s (Bonner et al., 2012). 13–16-R1 is descended from snails collected in Brazil and Puerto Rico (Richards and Merritt, 1972; Sullivan and Richards, 1981) but its exact history is not entirely clear. Our population has been maintained in the absence of parasite exposure, and therefore under relaxed selective pressure in regard to parasite resistance.

We used mice to maintain the schistosome parasites and to produce miracidia for challenge experiments. Infection is through contact with inoculated water and involves minimal discomfort. Infected rodents are euthanized with CO2 prior to showing clinical signs of disease and are dissected to recover parasitic eggs. Animal numbers were held to the minimum required for the research. Institutional approval: Oregon State University Animal Care and Use Protocols 4749 and 5115.

Genome-wide scan of 13–16-R1

Request a detailed protocol

We challenged snails of the 13–16-R1 population with PR-1 miracidia, following previous methods (Bonner et al., 2012). In brief, we arbitrarily chose 1700 outbred juvenile snails (4–6 mm diameter), challenged them each with five miracidia, and classified them as infected or uninfected. About 40% of snails became infected. From these, we randomly selected 600 infected and 600 uninfected snails for sequencing. These sample sizes were chosen based on a simulation of variants with minor allele frequencies ≥ 0.2, with copies randomly assigned to 600 infected and 600 uninfected individuals at the expected sequencing coverage depth (script SimulatePools.pl at https://github.com/jacobtennessen/GOPOPS/), which revealed that FST between simulated sequencing pools was unlikely to exceed 0.05 (p<10−5) and very unlikely to exceed 0.1 (p<10−9) and therefore we had substantial power to detect larger FST differences. We divided the empirical pools into two technical replicates, and four pools (each combination of infected/uninfected and technical replicate) were sequenced across six lanes of the Illumina HiSeq 3000 (paired-end reads of 151 bp) at the Center for Genome Research and Biocomputing (CGRB) at Oregon State University (Illumina data at NCBI SRA, BioProject Accession PRJNA638474).

Infected snails contain DNA from S. mansoni, which could potentially generate false sequence variants correlated with resistance. To prevent this, we converted reads to FASTA format, used BLASTN (version 2.6.0) to identify reads that matched the S. mansoni reference genome (v. 5.2, Berriman et al., 2009) with an E-value cutoff of 1e-040, and then filtered all such reads, as well as their mate pairs, from all downstream analysis. Filtered FASTQ files, having had adapters removed with Cutadapt (version 1.15, Marcel, 2011) and trimmed with Trimmomatic (v. 0.30, Bolger et al., 2014; options: LEADING:20 TRAILING:20 SLIDINGWINDOW:5:20 MINLEN:50), were aligned using BWA version 0.7.12 (command: bwa mem -P -M -t 4; Li and Durbin, 2009) initially to reference genome BglaB1 and ultimately to our PacBio assemblies (Figure 2—figure supplement 2). All reads marked as secondary alignments were filtered out of the sam files. We used SamTools version 1.3 (Li et al., 2009) to convert these to sorted bam files (commands: samtools view -bT; samtools sort) and generate pileup files (command: samtools mpileup -t DP -A). From these files, we estimated allele frequencies at each variant within each pool, and calculated FST in overlapping 10 kb windows across the genome, using the scripts MakeFreqTableFromPooledPileup.pl (options: -a 0.1 and -d 15) and FstFromJoinedFreqTablesWindow.pl (default options), available at https://github.com/jacobtennessen/GOPOPS/. We only considered windows with at least 20 single-nucleotide polymorphisms, in order to exclude associations that are supported by few variants and which are therefore likely to be spurious.

To more precisely estimate genotype-phenotype associations at the LG XII candidate region, we genotyped candidate loci from the region in individual snails. We designed primers for genotyping using Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) on the consensus of our assemblies and used them for PCR amplification (Supplementary file 1F). These surrounded indels, such that after initial confirmatory sequencing in test samples, samples could be genotyped with PCR and gel electrophoresis alone. We genotyped the candidate locus in 1570 of the original 1700 phenotyped snails, including 1165 of the 1200 samples used in the genome-wide association study. Furthermore, in order to independently validate the candidate region, we also genotyped it in 392 snails (also 13–16-R1) from a set of 439 that had been phenotyped several years previously (Tennessen et al., 2015a). We tested for effects between genotype and phenotype using logistic regression, following our standard approach (Tennessen et al., 2015a; R Development Core Team, 2020; Source code 1). Specifically, we first coded infection as binary (1 or 0) and each allele as either additive (‘add’: 0, 1, or 2 copies of the allele), dominant (‘dom’: 1 or 0 for presence/absence of allele), or recessive (‘rec’: 1 if homozygous, 0 otherwise). We first confirmed an independent effect of both S alleles relative to the R allele with model glm(infection~S1add+S2add, family = binomial), and then we found the best-fitting parameter combination (minimum Akaike information criterion) which was model glm(infection~S1dom+S1rec+S2add, family = binomial). The positive effect of both S1dom and S1rec on infection odds was interpreted as partial dominance (i.e. increased susceptibility if the allele is present plus additional increased susceptibility for homozygotes). We tested for epistasis by first adding terms for RADres and sod1 (known to act additively; Tennessen et al., 2015a) to the model and then testing if interaction terms among loci were significant.

PacBio sequencing and assembly

Request a detailed protocol

We generated three PacBio assemblies from inbred snail lines homozygous for the three PTC2 alleles. (Supplementary file 1C; assemblies NCBI Genome, BioProject Accession PRJNA639204). The first assembly (homR) used snail line R68, which is derived from 13-16-R1 and is highly resistant to S. mansoni strain PR-1, as described previously (Tennessen et al., 2015a). We pooled and sequenced these snails in 15 SMRT cells (78x coverage) on the Pacific Biosciences Sequel I at the CGRB. We assembled the resulting raw sequences using the HGAP4/FALCON assembler (Chin et al., 2016; options: Genome Length 1 Gb, Seed coverage 30, Min Map Concordance 70). Similarly, the other two assemblies (homS1 and homS2) were generated using the same methodology from snail lines i90 (6 SMRT cells, 58x coverage) and i171 (5 SMRT cells, 46x coverage), respectively. By default, we treat homR as the reference genome unless stated otherwise. To assign PacBio contigs to the existing linkage map (Tennessen et al., 2017), we aligned 46,023 fragments of 100 bp each from BglaB1 (the published genome) that had previously been screened for uniqueness and used for targeted capture (Tennessen et al., 2017) using BLASTN (version 2.6.0) with default parameters. PacBio contigs were then assigned to linkage groups if at least one unique fragment from a mapped BglaB1 contig aligned to it, and if at least 75% of these matching unique fragments pertained to the same linkage group. We thus assigned 1489 homR contigs to linkage groups, representing 635 Mb; these assignments were supported by a median of seven mapped fragments per homR contig, with an average of 96% of fragments per homR contig mapping to the same linkage group.

In the vicinity of PTC2, we assessed sequence similarity with dot plots. Each assembly was broken into overlapping 600 bp segments (script ChopFastaStaggered.pl at https://github.com/jacobtennessen/MiSCVARS/), which were tested for sequence similarity in pairwise comparison using BLAT (Kent, 2002; options: stepSize = 1 -minScore = 300) followed by script AssessBlatChopped.pl (at https://github.com/jacobtennessen/MiSCVARS/). To estimate average genomic sequence similarity outside of PTC2, we used BLASTN (version 2.6.0) to identify pairs of orthologous contigs between homR and homS1 (our two best assemblies, which should represent random samples of 13–16-R1 in regions unlinked to PTC2), and performed a similar BLAT comparison for all such pairs in which both contigs were over 2 Mb. For assemblies homR and homS2, PTC2 is split between two contigs each (Supplementary file 1C). We manually combined these contigs into continuous haplotypes. For homR, the ends of contigs R-35 and R-304 both align to each other with 99.9% similarity for 22 kb, indicating that they are in fact directly adjacent and the assembly algorithm was overly conservative in failing to join them (Figure 2—figure supplement 3). For the homS2 contigs, raw reads aligning to contig S2-78 overlapped with raw reads aligning to contig S2-773, indicating a gap of only 12.6 kb which was confirmed by alignment to BglaB1 (Figure 2—figure supplement 4).

RNA-Seq and annotation

Request a detailed protocol

Although BglaB1 is annotated, many genes were likely missed, especially those spanning multiple contigs. Furthermore, some PTC2 haplotypes may contain genes missing from the reference genome. Therefore, we performed RNA-Seq on snail lines homozygous for each of the three PTC2 haplotypes in order to identify all expressed proteins on each haplotype. Samples were prepared as described previously (Tennessen et al., 2015b). A single sample from each homozygous genotype was included in the same lane of the Illumina HiSeq 3000 at the CGRB (single-end reads of 151 bp; Illumina data at NCBI SRA, Bioproject Accession PRJNA639026). This single-sample approach precludes quantifying expression in a rigorous way (Supplementary file 1E), but not our goal of assembling transcriptomes for the purpose of annotation. We performed a de-novo annotation of each PTC2 haplotype. Each haplotype-specific RNA-Seq dataset was adapter and quality trimmed using Cutadapt (version 1.15, Marcel, 2011; options: -q 15,10) and de-novo assembled into a transcriptome assembly using Trinity (Grabherr et al., 2011; default assembly parameters). Transcriptome assemblies were reduced to longest open reading frames using TransDecoder (Haas et al., 2013) by first identifying the longest open reading frames (TransDecoder.LongOrfs), then using BLAST (Altschul et al., 1990) to map the longest open reading frames to the UNIPROT (UniProt Consortium, 2019) gastropod protein database (options: -max_target_seqs 1 -outfmt 6 -evalue 1e-5), and finally by predicting protein sequences from the assembled transcripts (TransDecoder.Predict). AUGUSTUS (Stanke et al., 2004) gene prediction training model was built from the UNIPROT Biomphalaria dataset. BUSCO (Seppey et al., 2019) was run on the homS1 genome assembly for use across all assemblies. Single copy orthologs found by BUSCO were used to make the SNAP (Korf, 2004) gene prediction training set. A snail-specific repeat library was constructed using data from BglaB1 (Giraldo-Calderón et al., 2015; Adema et al., 2017; https://www.vectorbase.org), and mollusca-specific repeats from Repbase (Bao et al., 2015), and these repeats were then masked using RepeatMasker (Smit et al., 2013). De-novo gene prediction was run with MAKER (Cantarel et al., 2008) on the repeat-masked genome assembly using the TransDecoder reduced transcriptome assembly as EST evidence, the UNIPROT Biomphalaria proteins as protein evidence, and de-novo gene prediction was conducted using SNAP and AUGUSTUS using the constructed prediction models. We used these automated annotations, along with predictions from genomic sequence from GENSCAN (Burge and Karlin, 1997), and putative orthologous transcripts in the reference genome project (Giraldo-Calderón et al., 2015; Adema et al., 2017) identified with BLASTN (version 2.6.0), to guide manual alignment of RNA-Seq reads. Putative coding genes were rejected and subsequently ignored if they showed homology to transposable elements (e.g. RNA transcriptase or transposase) which are very abundant in the snail genome, if the open reading frame was less than 100 codons, or if the sequence could not be confirmed via manual alignment of RNA-Seq reads (sequences in NCBI GenBank, Accessions MT787302-MT787323). Secondary structure was predicted using TMHMM v. 2.0 (Sonnhammer et al., 1998).

To investigate the phylogenetic history of alleles, we first focused on the coding sequence of the two most conserved TM1 genes (4 and 5) as these could be aligned the most unambiguously. We searched for similar sequences in the genomic and transcriptomic data of reference genome BglaB1 generated from BB02 (Giraldo-Calderón et al., 2015; Adema et al., 2017; https://www.vectorbase.org) and RNA-Seq data from Guadeloupe population GUA (Tennessen et al., 2015b; Bioproject Accession PRJNA264063). We conducted phylogenetic analysis using RAxML (options: -N 100 m GTRCAT; Stamatakis, 2006) and displayed trees with FigTree version 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). We also aligned and concatenated coding sequence from the 13–16-R1 alleles of the five genes present on all three haplotypes and used SNAP version 2.1.1 (https://www.hiv.lanl.gov/content/sequence/SNAP/SNAP.html; Korber, 2000) to calculate nonsynonymous and synonymous divergence among alleles, and to infer synonymous site divergence from the midpoint root of a three-taxon neighbor-joining tree.

Data availability

All data not included in the manuscript are available at NCBI. PacBio genome assemblies are available under BioProject Accession PRJNA639204. Pooled whole-genome sequencing reads are available under BioProject Accession PRJNA638474. RNA-Seq reads are available under BioProject Accession PRJNA639026. Assembled transcripts are on Genbank, Accessions MT787302-MT787323.

The following data sets were generated
    1. Tennessen JA
    2. Bollmann SR
    3. Peremyslova E
    4. Kronmiller BA
    5. Sergi C
    6. Hamali B
    7. Blouin MS
    (2020) NCBI BioProject
    ID PRJNA639204. PacBio assemblies of Biomphalaria glabrata snails derived from lab strain 13-16-R1.
    1. Tennessen JA
    2. Bollmann SR
    3. Peremyslova E
    4. Kronmiller BA
    5. Sergi C
    6. Hamali B
    7. Blouin MS
    (2020) NCBI BioProject
    ID PRJNA638474. Genome-wide association study of Biomphalaria glabrata resistance to Schistosoma mansoni.
The following previously published data sets were used

References

  1. Book
    1. Korber B
    (2000) HIV signature and sequence variation analysis
    In: Rodrigo A. G, Learn G. H, editors. Computational Analysis of HIV Molecular Sequences, 4. Dordrecht, Netherlands: Kluwer Academic Publishers. pp. 55–72.
    https://doi.org/10.1007/b112102
  2. Software
    1. R Development Core Team
    (2020) R: A Language and Environment for Statistical Computing
    R Foundation for Statistical Computing, Vienna, Austria.
    1. Seger J
    (1988) Dynamics of some simple host-parasite models with more than two genotypes in each species
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 319:541–555.
    https://doi.org/10.1098/rstb.1988.0064
    1. Sonnhammer EL
    2. von Heijne G
    3. Krogh A
    (1998)
    A hidden Markov model for predicting transmembrane helices in protein sequences
    Proceedings. International Conference on Intelligent Systems for Molecular Biology 6:175–182.

Decision letter

  1. Dieter Ebert
    Reviewing Editor; University of Basel, Switzerland
  2. Dominique Soldati-Favre
    Senior Editor; University of Geneva, Switzerland
  3. Yann Bourgeois
    Reviewer; University of Portsmouth, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Your study investigates the genetic bases of snail resistance to schistosomes. Schistosomiasis is considered by the WHO as one of the most (after malaria) socioeconomically devastating parasitic disease. Your study showed extreme divergence between three haplotypes, that explain variation in resistance, a major breakthrough in a system hardly accessible to many of the genetic tools commonly used in other systems. The work constitutes a basis for more detailed studies in natural populations and other host-parasite combinations. In the long run it may contribute to reducing the burden of schistosomiasis.

Decision letter after peer review:

Thank you for submitting your article "Clusters of polymorphic transmembrane genes control resistance to schistosomes in snail vectors" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Dominique Soldati-Favre as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Yann Bourgeois (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Tennessen et al. investigated the genetic bases of snail resistance to schistosomes. Snails are the intermediate host (or vector) for schistosomes and therefore of high medical importance. Schistosomiasis is considered by the WHO as one of the most (after malaria) socioeconomically devastating parasitic disease. The current study examines genome-wide polymorphism and differentiation between snails susceptible and resistant to infection by schistosomes. Using pool-sequencing, they identify a region of high differentiation between resistant and susceptible genotypes. They genotype most of the same individuals that were included in the pools and reveal a system with one locus and three haplotypes (R, S1, S2) showing varying in the degree of protection they confer against infection. Using PacBio sequencing, they also show extreme divergence between the three haplotypes, to the extent that alignment is impossible in some regions. RNA-sequencing is used to characterize the region further. The evolutionary mechanisms maintaining this polymorphism remains unknown. The work constitutes a basis for more detailed studies in natural populations and other host-parasite combinations. As a long term perspective, the authors suggest that the knowledge about this genomic region can contribute to disease control.

The manuscript has been reviewed by three specialists. All three agree that this is a major step forward. However, all of them raise important points that need to be addressed in a revised version (with a point by point rebuttal letter), before a final decision can be reached. The main points are summarized here, the detailed reviews are below.

As it stands, the paper is of little interest for people outside the "schistosome world". While schistosomes are certainly important in their own right, a bit more effort could be invested into making this study more accessible to a wider audience.

Many aspects of the analysis, including the statistics, are not clear. More details and care in presentation are needed. Scripts should be fully available.

Even so, one reviewer stated that "The discussion remains cautious", there are numerous places where this is not the case (see reviewer #2). It is important present the results and their interpretation in an adequate way without overstating.

The paper hints at the idea that the mechanisms maintaining this extreme polymorphisms is host-parasite coevolution with balancing selection. Without much more work, there are a few further analysis that might be undertaken to support this better. This would also allow a link to other literature reporting on the role of balancing selection in the framework of host-parasite interactions.

Reviewer #1:

This study describes the finding of a genomic region for resistance in the schistosome vector Biomphalaria. The study includes a series of experiments including phenotypic screens, poolseq, re-sequencing and RNA-Seq. Technically the study is mostly fine. I have a few points I would like to see in a revised version.

1) The bioinformatic analysis of the data is described in a way that it would not be possible to trace the results. Parameter setting, threshold value etc. are not reported. Just naming the software used in not sufficient. Also, many intermediate steps are not reported.

2) The authors mention another study that reported a similar complex genomic region providing resistance to a parasite. They hint that invertebrates may differ in the way diversity is created with regard to resistance to parasites. Are there more such cases? It would be nice to compare the new results directly with the earlier study and discuss the similarities and differences.

3) The entire article is very Schistsome/Biomphalaria centred and unusually rich in self-citations. For people not working in this field, the article is not appealing. I would be happy to see much more efforts of the authors to widen their scope and make the study of interest for other people studying host-parasite interactions, at least those being interested in invertebrate hosts.

4) The dot plots in Figure 2B are impressive. It would be good to add dot plots of S1 against S1, R against R and S2 against S2. This would be helpful for a better understanding of the structure of the genomic region.

Reviewer #2:

The authors present an association study for resistance to a Schistosome parasite in Biomphalaria snails. The short report consists of the association study, a careful genome assembly of the resistant and two susceptible strains and annotation of those strains to compare genes present in the region. The manuscript stops short of any attempt to validate that the PTM2 region plays a major role in infection outcomes.

1) As written, the manuscript assumes a significant familiarity with the system that is unlikely for a broad audience. As a methods last style manuscript, much was left unclear until the end of the paper: how were individuals infected?, what is the nature of the population used? etc. More background on the system (previous founding about PTM1, etc. would be useful. I realize that any such expansion might jeopardize the "short report" status, but the increased clarity might outweigh the benefit of brevity.

2) While I understand that these snails are not a genetic model system with abundant tools to do validations of candidate loci, I feel that the language used is often too definitive, suggesting that the association is causal. For example, in the figure legend 1 "(B) PTC2 genotypes determine infection outcome." This is false. PTC2 genotypes are associated with infection outcome. Care should be taken to not oversell the association.

3) The annotations presented in Figure 3 are both interesting and a mess. Actually, they are partly interesting because they are a mess. The genomic region is highly divergent among haplotypes. Given the apparent homogenization of the admixed backgrounds across the genome in the population, it is amazing how divergent this region continues to be. Both LG7 and LG9 appear to be entire chromosomes associated with the phenotype, but PTC2 is a relatively small region. Is introgression from another species a possibility? Also, the authors discuss the structure of the genes in those different regions, but is it possible to make any additional predictions about the associations from the gene content?

4) There is no discussion of how statistical significance is assessed in the association study. In the Materials and methods, simulations suggest that Fst as always less than 0.05, but that isn't presented in the main text. I don't suspect that any analysis would diminish peak at PTC2, but some indication of "how" significant that peak is would be useful. There are several methods to do this, but the permutations should be appropriate and have already been completed.

Reviewer #3:

Summary

In this work, Tennessen et al. conduct a comprehensive study of the genetic bases of resistance to schistosomes in snail vectors. They examine genome-wide polymorphism and differentiation between snails susceptible and resistant to infection by schistosomes in a population followed for more than 40 years. Using pool-sequencing, they first identify a region of high differentiation between the two categories. They genotype most of the same individuals that were included in the pools and reveal a system with one locus and three alleles (R, S1, S2) showing partial overdominance and varying in the degree of protection they confer against infection. Using PacBio sequencing, they also show extreme divergence between the three haplotypes, to the extent that alignment is impossible in some regions. Using RNA-sequencing of one individual per homozygous genotype, they reannotate each of the three haplotypes and reveal a large excess of TM1 genes in this region, displaying elevated amino-acid divergence and diversity. The exact evolutionary mechanisms maintaining such polymorphisms are difficult to pinpoint, given that only one parasite strain was used in infection experiments, but the work constitutes a solid basis for more detailed studies in natural populations and other host-parasite combinations.

Assessment

This is a compelling and impressive work. The signal is strong, and is unlikely to be an artefact. The work goes well beyond simply studying association between genetic and phenotypic variation, and uses a broad set of techniques (infection experiments, Pool-seq, individual genotyping, RNA-seq, PacBio assembly) to characterize the molecular bases of resistance. The discussion remains cautious, and takes care of not over-interpreting results.

The manuscript is already good in its current state, but I had a few comments detailed below:

General suggestions

As the authors state in the Discussion, it is difficult to determine what drives the maintenance of this polymorphism (which vector(s) is the actual selective pressure)? Which selective dynamic underlies this diversity? Is there any chance that mutational load be exposed at the homozygous state? However, it may be possible to obtain more insights about the evolutionary history of this region based on the data already available. For example, what is the diversity within the three divergent haplotypes? A way to assess this in more details could be using ancestral recombination graphs for a few selected individuals (for example by calling SNPs for the three reference individuals and running ARGWeaver (https://github.com/mdrasmus/argweaver) or Relate (https://myersgroup.github.io/relate/index.html)). This could give an idea of the age of alleles and genealogies in the relatively conserved regions that flank the highly divergent loci. This may also provide more evidence for long-term balancing selection.

The authors mention that they performed RNA-sequencing on a single individual for each of the three homozygous genotypes. Do the authors have any plan to examine in more details how gene expression changes depending on allele combinations? It may be interesting to obtain an even more detailed picture of regulatory and non-synonymous variation, and how dominance/recessivity impact it. I do not think that more sequencing is necessary for this paper, but maybe discuss a bit more the potential of such an experiment?

https://doi.org/10.7554/eLife.59395.sa1

Author response

The manuscript has been reviewed by three specialists. All three agree that this is a major step forward. However, all of them raise important points that need to be addressed in a revised version (with a point by point rebuttal letter), before a final decision can be reached. The main points are summarized here, the detailed reviews are below.

As it stands, the paper is of little interest for people outside the "schistosome world". While schistosomes are certainly important in their own right, a bit more effort could be invested into making this study more accessible to a wider audience.

We have added substantial text and new citations about invertebrate immunity and host-parasite coevolution more broadly. Details below.

Many aspects of the analysis, including the statistics, are not clear. More details and care in presentation are needed. Scripts should be fully available.

We now include links to multiple scripts available on Github and more details about methodological parameters and statistics. Details below.

Even so, one reviewer stated that "The discussion remains cautious", there are numerous places where this is not the case (see reviewer #2). It is important present the results and their interpretation in an adequate way without overstating.

We have revised the wording throughout to clarify that we have only found a genotype-phenotype association and we have not demonstrated a specific causal gene or polymorphism. Similarly, while we speculate about immunogenetic function and host-parasite coevolution, we emphasize that these inferences are not definitive. Details below.

The paper hints at the idea that the mechanisms maintaining this extreme polymorphisms is host-parasite coevolution with balancing selection. Without much more work, there are a few further analysis that might be undertaken to support this better. This would also allow a link to other literature reporting on the role of balancing selection in the framework of host-parasite interactions.

As we note below in response to reviewer 3, inferences about adaptive evolution are limited by the artificial history of this laboratory population and by the available data. Nevertheless, we have added some additional evolution analyses, including reporting on nonsynonymous and synonymous diversity, sequence similarity to distant paralogs, similarity to alleles in other populations (new figure supplement), and estimated age of the haplotypes based on synonymous site divergence. We have also added additional references on balancing selection mediated by host-parasite interaction. Details below.

Reviewer #1:

This study describes the finding of a genomic region for resistance in the schistosome vector Biomphalaria. The study includes a series of experiments including phenotypic screens, poolseq, re-sequencing and RNA-Seq. Technically the study is mostly fine. I have a few points I would like to see in a revised version.

1) The bioinformatic analysis of the data is described in a way that it would not be possible to trace the results. Parameter setting, threshold value etc. are not reported. Just naming the software used in not sufficient. Also, many intermediate steps are not reported.

We have added many more details to our analytical methods, including specific parameters, links to several scripts available on Github, and a new source code file supporting our logistic regression analysis. e.g. subsections “Genome-wide scan of 13-16-R1”, “RNA-Seq and Annotation” and the Key Resources Table.

2) The authors mention another study that reported a similar complex genomic region providing resistance to a parasite. They hint that invertebrates may differ in the way diversity is created with regard to resistance to parasites. Are there more such cases? It would be nice to compare the new results directly with the earlier study and discuss the similarities and differences.

We have added more background and citations about invertebrate immunity, including a new paragraph in the Introduction and we have expanded our comparison with other loci in other species that show similar patterns.

3) The entire article is very Schistsome/Biomphalaria centred and unusually rich in self-citations. For people not working in this field, the article is not appealing. I would be happy to see much more efforts of the authors to widen their scope and make the study of interest for other people studying host-parasite interactions, at least those being interested in invertebrate hosts.

As mentioned above, we have added more text about invertebrate hosts more generally. We are actually one of the only groups so far to publish genome wide association analyses on Biomphalaria glabrata resistance to schistosomes, and in previous work we found TM1 genes similar to those described here. So, many relevant citations come from our group. However, we have added numerous citations from other systems to enhance the general appeal of this manuscript.

4) The dot plots in Figure 2B are impressive. It would be good to add dot plots of S1 against S1, R against R and S2 against S2. This would be helpful for a better understanding of the structure of the genomic region.

We have added dot plots of each haplotype against itself. In response to reviewer 2, we have also converted this figure to a figure supplement.

Reviewer #2:

The authors present an association study for resistance to a Schistosome parasite in Biomphalaria snails. The short report consists of the association study, a careful genome assembly of the resistant and two susceptible strains and annotation of those strains to compare genes present in the region. The manuscript stops short of any attempt to validate that the PTM2 region plays a major role in infection outcomes.

1) As written, the manuscript assumes a significant familiarity with the system that is unlikely for a broad audience. As a methods last style manuscript, much was left unclear until the end of the paper: how were individuals infected?, what is the nature of the population used? etc. More background on the system (previous founding about PTM1, etc. would be useful. I realize that any such expansion might jeopardize the "short report" status, but the increased clarity might outweigh the benefit of brevity.

We have added more details about the population and the infection method to the Introduction. Background about the other loci including GRC (aka PTC1) is also included in the Introduction.

2) While I understand that these snails are not a genetic model system with abundant tools to do validations of candidate loci, I feel that the language used is often too definitive, suggesting that the association is causal. For example, in the figure legend 1 "(B) PTC2 genotypes determine infection outcome." This is false. PTC2 genotypes are associated with infection outcome. Care should be taken to not oversell the association.

We have reworded the language here and elsewhere to avoid implying causality of any particular gene or marker.

3) The annotations presented in Figure 3 are both interesting and a mess. Actually, they are partly interesting because they are a mess. The genomic region is highly divergent among haplotypes. Given the apparent homogenization of the admixed backgrounds across the genome in the population, it is amazing how divergent this region continues to be. Both LG7 and LG9 appear to be entire chromosomes associated with the phenotype, but PTC2 is a relatively small region. Is introgression from another species a possibility? Also, the authors discuss the structure of the genes in those different regions, but is it possible to make any additional predictions about the associations from the gene content?

We have added a new figure supplement (Figure 3—figure supplement 2) illustrating how this sequence divergence occurs throughout the species range. Thus, the PTC2 polymorphism is not an artifact of the admixed origin of 13-16-R1 and is likely to show more allelic divergence than most other genomic regions even in natural populations. We have previously noted that the signals on LG7 and LG9 appear to reflect large haplotype blocks (Tennessen et al., 2015a), possibly maintained by inversions. We now state this in the legend of Figure 1. The regions on LG7 and LG9 have been discussed previously (Tennessen et al., 2015a); it will be interesting to revisit those regions and identify candidate genes using the PacBio assemblies presented here, but such an analysis is beyond the scope of this manuscript. The PTC2 signal appears small on a genome-wide scale (Figure 1) but this still represents hundreds of kilobases. We now discuss why introgression is an unlikely explanation for the polymorphism (Results and Discussion paragraph four). We have added some additional analyses of the PTC2 genes (e.g. phylogenetic, Results and Discussion paragraph two; assessment of transmembrane domain position using simulations, legend Figure 3—figure supplement 1).

4) There is no discussion of how statistical significance is assessed in the association study. In the Materials and methods, simulations suggest that Fst as always less than 0.05, but that isn't presented in the main text. I don't suspect that any analysis would diminish peak at PTC2, but some indication of "how" significant that peak is would be useful. There are several methods to do this, but the permutations should be appropriate and have already been completed.

Our power analysis was previously only mentioned in the Materials and methods, but now we discuss it in more detail in the Results and Discussion. Specifically, using simulations, we show that an Fst value greater than 0.1 (our threshold for defining PTC2) is very unlikely to occur by chance in pooled sequencing reads (uncorrected p < 10-9; corrected p < 0.01).

Reviewer #3:

[…]

General suggestions

As the authors state in the Discussion, it is difficult to determine what drives the maintenance of this polymorphism (which vector(s) is the actual selective pressure)? Which selective dynamic underlies this diversity? Is there any chance that mutational load be exposed at the homozygous state? However, it may be possible to obtain more insights about the evolutionary history of this region based on the data already available. For example, what is the diversity within the three divergent haplotypes? A way to assess this in more details could be using ancestral recombination graphs for a few selected individuals (for example by calling SNPs for the three reference individuals and running ARGWeaver (https://github.com/mdrasmus/argweaver) or Relate (https://myersgroup.github.io/relate/index.html)). This could give an idea of the age of alleles and genealogies in the relatively conserved regions that flank the highly divergent loci. This may also provide more evidence for long-term balancing selection.

We appreciate and share the reviewer’s enthusiasm for tracking down the evolutionary basis for the polymorphism. Unfortunately, any additional insight that can be gained from the available data is limited, for two reasons. First, our snails have a contrived demographic history that violates the assumptions of most tests for selection. Specifically, 13-16-R1 is descended from several distinct populations and its founding likely involved an extreme bottleneck, and our reference assemblies are derived from inbred lines generated from 13-16-R1. Second, three haploid sequences is too small of a sample size for an informative ancestral recombination graph. Thus, there is no biologically meaningful way to use ARGWeaver or Relate with this dataset. However, a few additional evolutionary analyses are possible, and we have added them to highlight the evidence for balancing selection. We have added a new brief analysis showing the average synonymous divergence since the common ancestor is 24%, indicating an ancient origin millions of years ago. Ancient haplotypes are a commonly noted signature of balancing selection, though the specific form of balancing selection is not known conclusively. This pattern is also consistent with introgression, though we note that this is unlikely (Results and Discussion paragraph four). We have also added a figure showing the polymorphism occurs naturally in populations from across the species range. We have no evidence for mutational load in homozygotes (e.g. no heterozygotes excess in violation of Hardy-Weinberg equilibrium).

The authors mention that they performed RNA-sequencing on a single individual for each of the three homozygous genotypes. Do the authors have any plan to examine in more details how gene expression changes depending on allele combinations? It may be interesting to obtain an even more detailed picture of regulatory and non-synonymous variation, and how dominance/recessivity impact it. I do not think that more sequencing is necessary for this paper, but maybe discuss a bit more the potential of such an experiment?

We agree with the reviewer that additional gene expression studies would be interesting but are outside the scope of the present manuscript. We now discuss the potential for future work in paragraph two of the Results and Discussion.

https://doi.org/10.7554/eLife.59395.sa2

Article and author information

Author details

  1. Jacob A Tennessen

    1. Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, United States
    2. Department of Integrative Biology, Oregon State University, Corvallis, United States
    Contribution
    Data curation, Formal analysis, Visualization, Writing - original draft, Writing - review and editing
    For correspondence
    jtennessen@hsph.harvard.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5015-4740
  2. Stephanie R Bollmann

    Department of Integrative Biology, Oregon State University, Corvallis, United States
    Contribution
    Data curation, Formal analysis, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Ekaterina Peremyslova

    Department of Integrative Biology, Oregon State University, Corvallis, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  4. Brent A Kronmiller

    1. Department of Integrative Biology, Oregon State University, Corvallis, United States
    2. Center for Genome Research and Biocomputing, Oregon State University, Corvallis, United States
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  5. Clint Sergi

    Department of Integrative Biology, Oregon State University, Corvallis, United States
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  6. Bulut Hamali

    Department of Integrative Biology, Oregon State University, Corvallis, United States
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  7. Michael Scott Blouin

    Department of Integrative Biology, Oregon State University, Corvallis, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8439-2878

Funding

National Institutes of Health (AI143991)

  • Michael Scott Blouin

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Leeah Whittier, Ryan Wilson, and Haley Hutcheson for snail husbandry assistance, and Angela Early for helpful comments. This work was partially supported by National Institutes of Health (http://www.nih.gov/) grant AI143991 to M Blouin.

Ethics

Animal experimentation: We used mice to maintain the schistosome parasites and to produce miracidia for challenge experiments. Infection is through contact with inoculated water and involves minimal discomfort. Infected rodents are euthanized with CO2 prior to showing clinical signs of disease and are dissected to recover parasitic eggs. Animal numbers were held to the minimum required for the research. Institutional approval: Oregon State University Animal Care and Use Protocols 4749 and 5115.

Senior Editor

  1. Dominique Soldati-Favre, University of Geneva, Switzerland

Reviewing Editor

  1. Dieter Ebert, University of Basel, Switzerland

Reviewer

  1. Yann Bourgeois, University of Portsmouth, United Kingdom

Publication history

  1. Received: May 27, 2020
  2. Accepted: August 25, 2020
  3. Accepted Manuscript published: August 26, 2020 (version 1)
  4. Version of Record published: September 16, 2020 (version 2)

Copyright

© 2020, Tennessen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,930
    Page views
  • 207
    Downloads
  • 11
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Jacob A Tennessen
  2. Stephanie R Bollmann
  3. Ekaterina Peremyslova
  4. Brent A Kronmiller
  5. Clint Sergi
  6. Bulut Hamali
  7. Michael Scott Blouin
(2020)
Clusters of polymorphic transmembrane genes control resistance to schistosomes in snail vectors
eLife 9:e59395.
https://doi.org/10.7554/eLife.59395

Further reading

    1. Computational and Systems Biology
    2. Epidemiology and Global Health
    Oliver Robinson, Chung-Ho E Lau ... Martine Vrijheid
    Research Article

    Background: While biological age in adults is often understood as representing general health and resilience, the conceptual interpretation of accelerated biological age in children and its relationship to development remains unclear. We aimed to clarify the relationship of accelerated biological age, assessed through two established biological age indicators, telomere length and DNA methylation age, and two novel candidate biological age indicators , to child developmental outcomes, including growth and adiposity, cognition, behaviour, lung function and onset of puberty, among European school-age children participating in the HELIX exposome cohort.

    Methods: The study population included up to 1,173 children, aged between 5 and 12 years, from study centres in the UK, France, Spain, Norway, Lithuania, and Greece. Telomere length was measured through qPCR, blood DNA methylation and gene expression was measured using microarray, and proteins and metabolites were measured by a range of targeted assays. DNA methylation age was assessed using Horvath's skin and blood clock, while novel blood transcriptome and 'immunometabolic' (based on plasma protein and urinary and serum metabolite data) clocks were derived and tested in a subset of children assessed six months after the main follow-up visit. Associations between biological age indicators with child developmental measures as well as health risk factors were estimated using linear regression, adjusted for chronological age, sex, ethnicity and study centre. The clock derived markers were expressed as Δ age (i.e., predicted minus chronological age).

    Results: Transcriptome and immunometabolic clocks predicted chronological age well in the test set (r= 0.93 and r= 0.84 respectively). Generally, weak correlations were observed, after adjustment for chronological age, between the biological age indicators. Among associations with health risk factors, higher birthweight was associated with greater immunometabolic Δ age, smoke exposure with greater DNA methylation Δ age and high family affluence with longer telomere length. Among associations with child developmental measures, all biological age markers were associated with greater BMI and fat mass, and all markers except telomere length were associated with greater height, at least at nominal significance (p<0.05). Immunometabolic Δ age was associated with better working memory (p = 4e -3) and reduced inattentiveness (p= 4e -4), while DNA methylation Δ age was associated with greater inattentiveness (p=0.03) and poorer externalizing behaviours (p= 0.01). Shorter telomere length was also associated with poorer externalizing behaviours (p=0.03).

    Conclusions: In children, as in adults, biological ageing appears to be a multi-faceted process and adiposity is an important correlate of accelerated biological ageing. Patterns of associations suggested that accelerated immunometabolic age may be beneficial for some aspects of child development while accelerated DNA methylation age and telomere attrition may reflect early detrimental aspects of biological ageing, apparent even in children.

    Funding: UK Research and Innovation (MR/S03532X/1); European Commission (grant agreement numbers: 308333; 874583).

    1. Epidemiology and Global Health
    Katharine Sherratt, Hugo Gruson ... Sebastian Funk
    Research Article Updated

    Background:

    Short-term forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here, we report on the performance of ensembles in predicting COVID-19 cases and deaths across Europe between 08 March 2021 and 07 March 2022.

    Methods:

    We used open-source tools to develop a public European COVID-19 Forecast Hub. We invited groups globally to contribute weekly forecasts for COVID-19 cases and deaths reported by a standardised source for 32 countries over the next 1–4 weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equally-weighted average (initially the mean and then from 26th July the median) of all individual models’ predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), comparing models’ forecast accuracy relative to all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models’ past predictive performance.

    Results:

    Over 52 weeks, we collected forecasts from 48 unique models. We evaluated 29 models’ forecast scores in comparison to the ensemble model. We found a weekly ensemble had a consistently strong performance across countries over time. Across all horizons and locations, the ensemble performed better on relative WIS than 83% of participating models’ forecasts of incident cases (with a total N=886 predictions from 23 unique models), and 91% of participating models’ forecasts of deaths (N=763 predictions from 20 models). Across a 1–4 week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over 4 weeks for incident death forecasts. In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among several choices of ensemble methods we found that the most influential and best choice was to use a median average of models instead of using the mean, regardless of methods of weighting component forecast models.

    Conclusions:

    Our results support the use of combining forecasts from individual models into an ensemble in order to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance more than ones based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than incident case forecasts at forecast horizons greater than 2 weeks.

    Funding:

    AA, BH, BL, LWa, MMa, PP, SV funded by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No.: OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA1-19-D-0007, and respectively Virginia Dept of Health Grant VDH-21-501-0141, VDH-21-501-0143, VDH-21-501-0147, VDH-21-501-0145, VDH-21-501-0146, VDH-21-501-0142, VDH-21-501-0148. AF, AMa, GL funded by SMIGE - Modelli statistici inferenziali per governare l'epidemia, FISR 2020-Covid-19 I Fase, FISR2020IP-00156, Codice Progetto: PRJ-0695. AM, BK, FD, FR, JK, JN, JZ, KN, MG, MR, MS, RB funded by Ministry of Science and Higher Education of Poland with grant 28/WFSN/2021 to the University of Warsaw. BRe, CPe, JLAz funded by Ministerio de Sanidad/ISCIII. BT, PG funded by PERISCOPE European H2020 project, contract number 101016233. CP, DL, EA, MC, SA funded by European Commission - Directorate-General for Communications Networks, Content and Technology through the contract LC-01485746, and Ministerio de Ciencia, Innovacion y Universidades and FEDER, with the project PGC2018-095456-B-I00. DE., MGu funded by Spanish Ministry of Health / REACT-UE (FEDER). DO, GF, IMi, LC funded by Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20200700ER. DS, ELR, GG, NGR, NW, YW funded by National Institutes of General Medical Sciences (R35GM119582; the content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health). FB, FP funded by InPresa, Lombardy Region, Italy. HG, KS funded by European Centre for Disease Prevention and Control. IV funded by Agencia de Qualitat i Avaluacio Sanitaries de Catalunya (AQuAS) through contract 2021-021OE. JDe, SMo, VP funded by Netzwerk Universitatsmedizin (NUM) project egePan (01KX2021). JPB, SH, TH funded by Federal Ministry of Education and Research (BMBF; grant 05M18SIA). KH, MSc, YKh funded by Project SaxoCOV, funded by the German Free State of Saxony. Presentation of data, model results and simulations also funded by the NFDI4Health Task Force COVID-19 (https://www.nfdi4health.de/task-force-covid-19-2) within the framework of a DFG-project (LO-342/17-1). LP, VE funded by Mathematical and Statistical modelling project (MUNI/A/1615/2020), Online platform for real-time monitoring, analysis and management of epidemic situations (MUNI/11/02202001/2020); VE also supported by RECETOX research infrastructure (Ministry of Education, Youth and Sports of the Czech Republic: LM2018121), the CETOCOEN EXCELLENCE (CZ.02.1.01/0.0/0.0/17-043/0009632), RECETOX RI project (CZ.02.1.01/0.0/0.0/16-013/0001761). NIB funded by Health Protection Research Unit (grant code NIHR200908). SAb, SF funded by Wellcome Trust (210758/Z/18/Z).