Ribosomal RNA (rRNA) sequences from 33 globally distributed mosquito species for improved metagenomics and species identification

Abstract

Total RNA sequencing (RNA-seq) is an important tool in the study of mosquitoes and the RNA viruses they vector, as it allows assessment of both host and viral RNA in specimens. However, there are two main constraints. First, as with many other species, abundant mosquito ribosomal RNA (rRNA) serves as the predominant template from which sequences are generated, meaning that the desired host and viral templates are sequenced far less often. Second, mosquito specimens captured in the field must be correctly identified, in some cases to the sub-species level. Here, we generate mosquito rRNA datasets which will substantially mitigate both of these problems. We describe a strategy to assemble novel rRNA sequences from mosquito specimens and produce an unprecedented dataset of 234 full-length 28S and 18S rRNA sequences of 33 medically important species from countries with known histories of mosquito-borne virus circulation (Cambodia, the Central African Republic, Madagascar, and French Guiana). These sequences will allow both physical and computational removal of rRNA from specimens during RNA-seq protocols. We also assess the utility of rRNA sequences for molecular taxonomy and compare phylogenies constructed using rRNA sequences versus those created using the gold standard for molecular species identification of specimens, the mitochondrial cytochrome c oxidase I (COI) gene. We find that rRNA- and COI-derived phylogenetic trees are incongruent and that 28S and concatenated 28S+18S rRNA phylogenies reflect evolutionary relationships that are more aligned with contemporary mosquito systematics. This significant expansion to the current rRNA reference library for mosquitoes will improve mosquito RNA-seq metagenomics by permitting the optimization of species-specific rRNA depletion protocols for a broader range of species and streamlining species identification by rRNA sequence and phylogenetics.

Data availability

Multiple sequence alignment files are included as source data files. All sequences generated in this study have been deposited in GenBank under the accession numbers OM350214-OM350327 for 18S rRNA sequences, OM542339-OM542460 for 28S rRNA sequences, and OM630610-OM630715 for COI sequences.
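
For readers who want to retrieve the deposited records in bulk, the accession ranges above can be expanded into explicit identifier lists. The following is a minimal sketch, not part of the original study; it assumes each range is contiguous and inclusive, and the names `ACCESSION_RANGES` and `expand_accession_range` are hypothetical helpers introduced here for illustration.

```python
import re

# Accession ranges copied from the data availability statement.
ACCESSION_RANGES = {
    "18S rRNA": "OM350214-OM350327",
    "28S rRNA": "OM542339-OM542460",
    "COI": "OM630610-OM630715",
}


def expand_accession_range(span):
    """Expand e.g. 'OM350214-OM350327' into every accession in the range.

    Assumption: both ends share the same letter prefix and digit width,
    and the range is inclusive of both end points.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)-([A-Z]+)(\d+)", span)
    if match is None:
        raise ValueError(f"unrecognised accession range: {span!r}")
    prefix, first, end_prefix, last = match.groups()
    if prefix != end_prefix or len(first) != len(last):
        raise ValueError(f"range end points do not match: {span!r}")
    if int(first) > int(last):
        raise ValueError(f"range is reversed: {span!r}")
    width = len(first)  # keep any zero padding in the numeric part
    return [f"{prefix}{n:0{width}d}" for n in range(int(first), int(last) + 1)]


# Map each marker to its full list of accession identifiers.
accessions = {
    label: expand_accession_range(span)
    for label, span in ACCESSION_RANGES.items()
}

if __name__ == "__main__":
    for label, ids in accessions.items():
        print(f"{label}: {len(ids)} accessions, {ids[0]} to {ids[-1]}")
```

The resulting lists can be passed to whichever GenBank download tool is preferred; the sketch deliberately stops short of any network call.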

Article and author information

Author details

Cassandra Koh

Viruses and RNA Interference Unit, Institut Pasteur, Paris, France

For correspondence
cassandra.koh@pasteur.fr

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-2466-6731
Lionel Frangeul

Viruses and RNA Interference Unit, Institut Pasteur, Paris, France

Competing interests
The authors declare that no competing interests exist.
Hervé Blanc

Viruses and RNA Interference Unit, Institut Pasteur, Paris, France

Competing interests
The authors declare that no competing interests exist.
Carine Ngoagouni

Medical Entomology Laboratory, Institut Pasteur de Bangui, Bangui, Central African Republic

Competing interests
The authors declare that no competing interests exist.
Sébastien Boyer

Medical and Veterinary Entomology Unit, Institut Pasteur du Cambodge, Phnom Penh, Cambodia

Competing interests
The authors declare that no competing interests exist.
Philippe Dussart

Virology Unit, Institut Pasteur du Cambodge, Phnom Penh, Cambodia

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-1931-3037
Nina Grau

Medical Entomology Unit, Institut Pasteur de Madagascar, Antananarivo, Madagascar

Competing interests
The authors declare that no competing interests exist.
Romain Girod

Medical Entomology Unit, Institut Pasteur de Madagascar, Antananarivo, Madagascar

Competing interests
The authors declare that no competing interests exist.
Jean-Bernard Duchemin

Vectopôle Amazonien Emile Abonnenc, Institut Pasteur de la Guyane, Cayenne, French Guiana

Competing interests
The authors declare that no competing interests exist.
Maria-Carla Saleh

Viruses and RNA Interference Unit, Institut Pasteur, Paris, France

For correspondence
carla.saleh@pasteur.fr

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0001-8593-4117

Funding

Defense Advanced Research Projects Agency (Cooperative Agreement HR001118S0017)

Maria-Carla Saleh

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.