Ribosomal RNA (rRNA) sequences from 33 globally distributed mosquito species for improved metagenomics and species identification
Abstract
Total RNA sequencing (RNA-seq) is an important tool in the study of mosquitoes and the RNA viruses they vector, as it allows both host and viral RNA in a specimen to be assessed. However, it faces two main constraints. First, as in many other species, abundant mosquito ribosomal RNA (rRNA) serves as the predominant template from which sequences are generated, so the desired host and viral templates are sequenced far less. Second, mosquito specimens captured in the field must be correctly identified, in some cases to the sub-species level. Here, we generate mosquito rRNA datasets that will substantially mitigate both of these problems. We describe a strategy to assemble novel rRNA sequences from mosquito specimens and produce an unprecedented dataset of 234 full-length 28S and 18S rRNA sequences of 33 medically important species from countries with known histories of mosquito-borne virus circulation (Cambodia, the Central African Republic, Madagascar, and French Guiana). These sequences will allow rRNA to be removed both physically and computationally during RNA-seq protocols. We also assess the utility of rRNA sequences for molecular taxonomy and compare phylogenies constructed using rRNA sequences with those constructed using the gold standard for molecular species identification of specimens: the mitochondrial cytochrome c oxidase I (COI) gene. We find that rRNA- and COI-derived phylogenetic trees are incongruent and that 28S and concatenated 28S+18S rRNA phylogenies reflect evolutionary relationships more closely aligned with contemporary mosquito systematics. This substantial expansion of the current rRNA reference library for mosquitoes will improve mosquito RNA-seq metagenomics by permitting the optimization of species-specific rRNA depletion protocols for a broader range of species and by streamlining species identification through rRNA sequence and phylogenetic analysis.
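Computational rRNA removal generally means discarding sequencing reads that match a reference rRNA set. The Python sketch below illustrates one simple version of that idea, assuming exact k-mer matching against both strands of the reference sequences. It is not the authors' pipeline: the function names, the k = 31 default, and the 0.5 match-fraction threshold are hypothetical choices for illustration, and dedicated read-filtering tools handle sequencing errors, memory use, and scale far more carefully.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Complement table for unambiguous bases plus N; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_rrna_index(references: Iterable[str], k: int = 31) -> set[str]:
    """Collect k-mers from both strands of each reference rRNA sequence (hypothetical helper)."""
    index: set[str] = set()
    for ref in references:
        index.update(kmers(ref, k))
        index.update(kmers(reverse_complement(ref), k))
    return index


def is_rrna(read: str, index: set[str], k: int = 31, min_fraction: float = 0.5) -> bool:
    """Flag a read if at least min_fraction of its k-mers occur in the rRNA index."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:  # read shorter than k: cannot be classified, so keep it
        return False
    hits = sum(1 for km in read_kmers if km in index)
    return hits / len(read_kmers) >= min_fraction


def deplete(reads: Iterable[str], index: set[str], k: int = 31,
            min_fraction: float = 0.5) -> list[str]:
    """Return only the reads that are NOT flagged as rRNA."""
    return [r for r in reads if not is_rrna(r, index, k, min_fraction)]


# Toy usage with made-up sequences (not real mosquito rRNA); k is tiny for readability.
toy_reference = "ACGTTGCATGCCGATAGCTTAGGC"
toy_index = build_rrna_index([toy_reference], k=5)
toy_reads = [
    toy_reference[2:18],                      # forward-strand rRNA fragment
    reverse_complement(toy_reference[4:20]),  # reverse-strand rRNA fragment
    "TTTTTTTTTTTTTTTT",                       # unrelated read
]
kept = deplete(toy_reads, toy_index, k=5)
print(kept)  # ['TTTTTTTTTTTTTTTT']
```

Because the index includes reverse complements, reads from either strand are caught. Raising `min_fraction` makes the filter stricter, lowering the chance of discarding non-rRNA reads that share a few k-mers with the references by chance.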
Data availability
Multiple sequence alignment files are included as source data files. All sequences generated in this study have been deposited in GenBank under the accession numbers OM350214-OM350327 for 18S rRNA sequences, OM542339-OM542460 for 28S rRNA sequences, and OM630610-OM630715 for COI sequences.
Article and author information
Funding
Defense Advanced Research Projects Agency (Cooperative Agreement HR001118S0017)
- Maria-Carla Saleh
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2023, Koh et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.