High resolution species assignment of Anopheles mosquitoes using k-mer distances on targeted sequences
Abstract
The ANOSPP amplicon panel is a genus-wide targeted sequencing panel designed to facilitate large-scale monitoring of Anopheles species diversity. Combining information from the 62 nuclear amplicons present in the ANOSPP panel allows for a more nuanced species assignment than single-gene (e.g. COI) barcoding, which is desirable in light of permeable species boundaries. Here, we present NNoVAE, a method using Nearest Neighbours (NN) and Variational Autoencoders (VAE), which we apply to k-mers resulting from the ANOSPP amplicon sequences in order to hierarchically assign species identity. The NN step assigns a sample to a species-group by comparing the k-mers arising from each haplotype’s amplicon sequence to a reference database. The VAE step is required to distinguish between closely related species, and also has sufficient resolution to reveal population structure within species. In tests on independent samples with over 80% amplicon coverage, NNoVAE correctly classifies to species level 98% of samples within the An. gambiae complex and 89% of samples outside the complex. We apply NNoVAE to over two thousand new samples from Burkina Faso and Gabon, identifying unexpected species in Gabon. NNoVAE presents an approach that may be of value to other targeted sequencing panels. It will be used to survey Anopheles species diversity and Plasmodium transmission patterns through space and time on a large scale, with plans to analyse half a million mosquitoes in the next five years.
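The idea behind the NN step, comparing the k-mers of a haplotype against a reference database, can be illustrated with a minimal sketch. This is not the NNoVAE implementation: the function names, the Manhattan distance on k-mer counts, the value of k, and the toy reference sequences are all assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: Counter, b: Counter) -> int:
    """Manhattan distance between two k-mer count profiles (an assumed metric)."""
    return sum(abs(a[kmer] - b[kmer]) for kmer in a.keys() | b.keys())


def nearest_group(query: str, reference: dict, k: int):
    """Return (label, distance) of the reference haplotype closest to the query."""
    q = kmer_counts(query, k)
    best_label, best_dist = None, None
    for label, haplotypes in reference.items():
        for hap in haplotypes:
            d = kmer_distance(q, kmer_counts(hap, k))
            if best_dist is None or d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical toy reference: group label -> known haplotype sequences
# for a single amplicon. Real amplicons are far longer than these strings.
reference = {
    "group_A": ["ACGTACGTAC"],
    "group_B": ["TTTTGGGGCC"],
}
query = "ACGTACGTAA"

print(nearest_group(query, reference, k=4))  # ('group_A', 2)
```

In this toy case the query differs from the group_A haplotype by a single base, so their k-mer profiles are close, while it shares no 4-mers with group_B. A real application would repeat the comparison across all amplicons and haplotypes of a sample and combine the results before assigning a species-group.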
Data availability
Raw sequencing data will be made available on ENA (accession to be confirmed). Pipelines and analysis code, together with processed target haplotypes, are available on GitHub: https://github.com/mariloubodde/NNoVAE.
Article and author information
Funding
Wellcome Trust (206194/Z/17/Z)
- Mara KN Lawniczak
Wellcome Trust (RG92770)
- Marilou Boddé
Wellcome Trust (WT207492)
- Richard Durbin
Agence Nationale de la Recherche (ANR-18-CE35-0002-01 - WILDING)
- Diego Ayala
Institut de Recherche pour le Développement (Bourse ARTS/IRD)
- Lemonde Bouafou
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2022, Boddé et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.