1. Microbiology and Infectious Disease
Download icon

Unprecedented genomic diversity of RNA viruses in arthropods reveals the ancestry of negative-sense RNA viruses

  1. Ci-Xiu Li
  2. Mang Shi
  3. Jun-Hua Tian
  4. Xian-Dan Lin
  5. Yan-Jun Kang
  6. Liang-Jun Chen
  7. Xin-Cheng Qin
  8. Jianguo Xu
  9. Edward C Holmes
  10. Yong-Zhen Zhang Is a corresponding author
  1. National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, China
  2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, China
  3. Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Australia
  4. Wuhan Center for Disease Control and Prevention, China
  5. Wenzhou Center for Disease Control and Prevention, China
Research Article
Cited
60
Views
6,222
Comments
0
Cite as: eLife 2015;4:e05378 doi: 10.7554/eLife.05378

Abstract

Although arthropods are important viral vectors, the biodiversity of arthropod viruses, as well as the role that arthropods have played in viral origins and evolution, is unclear. Through RNA sequencing of 70 arthropod species we discovered 112 novel viruses that appear to be ancestral to much of the documented genetic diversity of negative-sense RNA viruses, a number of which are also present as endogenous genomic copies. With this greatly enriched diversity we revealed that arthropods contain viruses that fall basal to major virus groups, including the vertebrate-specific arenaviruses, filoviruses, hantaviruses, influenza viruses, lyssaviruses, and paramyxoviruses. We similarly documented a remarkable diversity of genome structures in arthropod viruses, including a putative circular form, that sheds new light on the evolution of genome organization. Hence, arthropods are a major reservoir of viral genetic diversity and have likely been central to viral evolution.

https://doi.org/10.7554/eLife.05378.001

eLife digest

Many illnesses, including influenza, hemorrhagic fever, and rabies, are caused by a group of viruses called negative-sense RNA viruses. The genetic information—or genome—of these viruses is encoded in strands of RNA that must be copied before they can be translated into the proteins needed to build new viruses. It is currently known that there are at least eight different families of these viruses, which have a wide range of shapes and sizes and arrange their RNA in different ways.

Insects, spiders, and other arthropods carry many different RNA viruses. Many of these viruses have not previously been studied, and those that have been studied so far are mainly those that cause diseases in humans and other vertebrates. Researchers therefore only know a limited amount about the diversity of the negative-sense RNA viruses that arthropods harbor and how these viruses evolved. Studying how viruses evolve helps scientists to understand what makes some viruses deadly and others harmless and can also help develop treatments or vaccines for the diseases caused by the viruses.

Li, Shi, Tian, Lin, Kang et al. collected 70 species of insects, spiders, centipedes, and other arthropods in China and sequenced all the negative-sense RNA viruses in the creatures. This revealed an enormous number of negative-sense RNA viruses, including 112 new viruses. Many of the newly discovered arthropod viruses appear to be the ancestors of disease-causing viruses, including influenza viruses and the filoviruses—the group that includes the Ebola virus. Indeed, it appears that arthropods host many—if not all—of the negative-sense RNA viruses that cause disease in vertebrates and plants.

While documenting the new RNA viruses and how they are related to each other, Li et al. found many different genome structures. Some genomes were segmented, which may play an important role in evolution as segments can be easily swapped to create new genetic combinations. Non-segmented and circular genomes were also found. This genetic diversity suggests that arthropods are likely to have played a key role in the evolution of new viruses by acting as a site where many different viruses can interact and exchange genetic information.

https://doi.org/10.7554/eLife.05378.002

Introduction

Negative-sense RNA viruses are important pathogens that cause a variety of diseases in humans including influenza, hemorrhagic fever, encephalitis, and rabies. Taxonomically, those negative-sense RNA viruses described to date comprise at least eight virus families and four unassigned genera or species (King et al., 2012). Although they share (i) a homologous RNA-dependent RNA polymerase (RdRp), (ii) inverted complementary genome ends, and (iii) an encapsidated negative-sense RNA genome, these viruses display substantial diversity in terms of virion morphology and genome organization (King et al., 2012). One key aspect of genome organization is the number of distinct segments, which is also central to virus classification. Among negative-sense RNA viruses, the number of segments varies from one (order Mononegavirales; unsegmented) to two (family Arenaviridae), three (Bunyaviridae), three-to-four (Ophioviridae), and six-to-eight (Orthomyxoviridae) and is further complicated by differences in the number, structure, and arrangement of the encoded genes.

Despite their diversity and importance in infectious disease, the origins and evolutionary history of the negative-sense RNA viruses are largely obscure. Arthropods harbor a diverse range of RNA viruses, which are often divergent from those that infect vertebrates (Marklewitz et al., 2011, 2013; Cook et al., 2013; Ballinger et al., 2014; Qin et al., 2014; Tokarz et al., 2014a, 2014b). However, those arthropod viruses sampled to date are generally those that have a relationship with vertebrates or are known to be agents of disease (Junglen and Drosten, 2013). To determine the extent of viral diversity harbored by arthropods, as well as their evolutionary history, we performed a systematic survey of negative-sense RNA viruses using RNA sequencing (RNA-seq) on a wide range of arthropods.

Results

Discovery of highly divergent negative-sense RNA viruses

We focused our study of virus biodiversity and evolution on 70 potential host species from four arthropod classes: Insecta, Arachnida, Chilopoda, and Malacostraca (Table 1 and Figure 1). From these samples, 16 separate cDNA libraries were constructed and sequenced, resulting in a total of 147.4 Gb of 100-base pair-end reads (Table 1). Blastx comparisons against protein sequences of negative-sense RNA virus revealed 108 distinct types of complete or nearly complete large (L) proteins (or polymerase protein 1 (PB1) in the case of orthomyxoviruses) that encode the relatively conserved RdRp (Tables 2–4). Four additional types of previously undescribed RdRp sequence (>1000 amino acids) were identified from the Transcriptome Shotgun Assembly (TSA) database. Together, these proteins exhibited an enormous diversity in terms of sequence variation and structure. Most notably, this data set of RdRp sequences is distinct from both previously described sequences and from each other, with the most divergent showing as little as 15.8% amino acid sequence identity to its closest relatives (Tables 2–4). Overall, these data provide evidence for at least 16 potentially new families and genera of negative-sense RNA viruses, defined as whose RdRp sequences shared less than 25% amino acid identity with existing taxa.

Table 1

Host and geographic information and data output for each pool of arthropod samples

https://doi.org/10.7554/eLife.05378.003
PoolNo of unitsOrderSpeciesLocationsData generated (bases)
Mosquitoes—Hubei24DipteraAedes sp, Armigeres subalbatus, Anopheles sinensis, Culex quinquefasciatus, Culex tritaeniorhynchusHubei26,606,799,000
Mosquitoes—Zhejiang26DipteraAedes albopictus, Armigeres subalbatus, Anopheles paraliae, Anopheles sinensis, Culex pipiens, Culex sp, Culex tritaeniorhynchusZhejiang7,233,954,480
True flies24DipteraAtherigona orientalis, Chrysomya megacephala, Lucilia sericata, Musca domestica, Sarcophaga dux, S. peregrina, S. spHubei6,574,954,320
Horseflies24DipteraUnidentified Tabanidae (5 species)Hubei8,721,642,060
Cockroaches24BlattodeaBlattella germanicaHubei6,182,028,000
Water striders12HemipteraUnidentified Gerridae (2 species)Hubei3,154,714,200
Insects mix 16Diptera, Coleoptera, Lepidoptera, NeuropteraAbraxas tenuisuffusa, Hermetia illucens, unidentified Chrysopidae, unidentified Coleoptera, Psychoda alternata, unidentified Diptera, unidentified StratiomyidaeZhejiang7,745,172,660
Insects mix 24Diptera, HemipteraUnidentified Hippoboscidae (2 species), Cimex hemipterusHubei5,916,431,520
Insects mix 3 (insect near water)10Odonata, Hemiptera, Hymenoptera, IsopodaPseudothemis zonata, unidentified Nepidae (2 species), Camponotus japonicus, Diplonychus sp, Asellus spHubei11,973,368,200
Insects mix 4 (insect in the mountain)12Diptera, Orthoptera, Odonata, Hymenoptera, HemipteraPsychoda alternata, Velarifictorus micado, Crocothemis servilia, unidentified Phoridae, unidentified Lampyridae, Aphelinus sp, Hyalopterus pruni, Aulacorthum magnoliaHubei6,882,491,800
Ticks16IxodidaDermacentor marginatus, Dermacentor sp, Haemaphysalis doenitzi, H. longicornis, H. sp, H. formosensis, Hyalomma asiaticum, Rhipicephalus microplus, Argas miniatusHubei, Zhejiang, Beijing, Xinjiang24,708,479,580
Ticks Hyalomma asiaticum1IxodidaHyalomma asiaticumXinjiang2,006,000,100
Spiders32AraneaeNeoscona nautica, Parasteatoda tepidariorum, Plexippus setipes, Pirata sp, unidentified AraneaeHubei11,361,912,300
Shrimps48DecapodaExopalaemon carinicauda, Metapenaeus sp, Solenocera crassicornis, Penaeus monodon, Litopenaeus vannameiZhejiang5,365,359,900
Crabs and barnacles35Decapoda, ScalpelliformesCapitulum mitella, Charybdis hellerii, C. japonica, Uca arcuataZhejiang5,833,269,360
Millipedes12PolydesmidaUnidentified Polydesmidae (2 species)Hubei, Beijing7,176,702,400
Host component of each pool used in the RNA-seq library construction and sequencing.

The taxonomic units in the tree correspond to the unit samples used in the RNA extraction. Species or genus information is marked to the left of the tree.

https://doi.org/10.7554/eLife.05378.004
Table 2

Mononegavirales-related RdRp sequences discovered in this study

https://doi.org/10.7554/eLife.05378.005
Virus nameLength of RdRpClassificationPoolAbundancePutative arthropod hostClosest relative (aa identity)
Bole Tick Virus 32155ChuvirusTicks202.35Hyalomma asiaticumMidway virus (17.1%)
Changping Tick Virus 22156ChuvirusTicks185.73Dermacentor spMidway virus (17.6%)
Changping Tick Virus 32209ChuvirusTicks41.80Dermacentor spMidway virus (16.5%)
Lishi Spider Virus 12180ChuvirusSpiders5.82Parasteatoda tepidariorumMidway virus (16.9%)
Shayang Fly Virus 12459ChuvirusTrue flies8.99Atherigona orientalisMaize mosaic virus (16.8%)
Shuangao Fly Virus 12097ChuvirusInsect mix 123.63Unidentified DipteraLettuce big-vein associated virus (16.3%)
Shuangao Insect Virus 52291ChuvirusInsect mix 1209.31Unidentified Diptera, Abraxas tenuisuffusa, unidentified ChrysopidaePotato yellow dwarf virus (16.3%)
Shuangao Lacewing Virus2145ChuvirusInsect mix 144.48Unidentified ChrysopidaePotato yellow dwarf virus (16.8%)
Tacheng Tick Virus 42101ChuvirusTicks137.22Argas miniatusMidway virus (17.5%)
Tacheng Tick Virus 52201ChuvirusTicks276.32Dermacentor marginatusMidway virus (16.8%)
Wenzhou Crab Virus 22208ChuvirusCrabs and barnacles4054.25Charybdis japonica, Charybdis lucifera, Charybdis helleriiMidway virus (15.8%)
Wenzhou Crab Virus 32077ChuvirusCrabs and barnacles169.21Charybdis japonicaMidway virus (16.3%)
Wuchang Cockroach Virus 32203ChuvirusCockroaches440.14Blattella germanicaMidway virus (16.3%)
Wuhan Louse Fly Virus 62182ChuvirusInsect mix 24.12Unidentified HippoboscidaeMidway virus (16.4%)
Wuhan Louse Fly Virus 72174ChuvirusInsect mix 299.83Unidentified HippoboscidaeMidway virus (17.2%)
Wuhan Mosquito Virus 82159ChuvirusMosquito Hubei300.33Culex tritaeniorhynchus, C. quinquefasciatus, Anopheles sinensis, Armigeres subalbatusMidway virus (16.7%)
Wuhan Tick Virus 22189ChuvirusTicks154.46Rhipicephalus microplusMidway virus (16.7%)
Culex tritaeniorhynchus rhabdovirus2142Culex tritaeniorhynchus rhabdovirusMosquito Hubei3517.32Culex tritaeniorhynchus, C. quinquefasciatus, Anopheles sinensis, Armigeres subalbatus, Aedes spIsfahan virus (38.5%)
Wuhan Insect Virus 42105CytorhabdovirusInsect mix 494.92Hyalopterus pruni OR Aphelinus spLettuce necrotic yellows virus (40.6%)
Wuhan Insect Virus 52098CytorhabdovirusInsect mix 4622.97Hyalopterus pruni OR Aphelinus spPersimmon virus A (47.9%)
Wuhan Insect Virus 62079CytorhabdovirusInsect mix 4991.99Hyalopterus pruni OR Aphelinus spPersimmon virus A (45.2)
Wuhan Louse Fly Virus 52123Kolente virus likeInsect mix 298.92Unidentified HippoboscidaeKolente virus (54.5%)
Yongjia Tick Virus 22113Nishimuro virus likeTicks13.14Haemaphysalis hystricisNishimuro virus (54.2%)
Shayang Fly Virus 22170Sigmavirus likeTrue flies36.83Musca domestica, Chrysomya megacephalaIsfahan virus (44.1%)
Wuhan Fly Virus 22134Sigmavirus likeTrue flies18.37Musca domestica, Sarcophaga spVesicular stomatitis Indiana virus (43.4%)
Wuhan House Fly Virus 12098Sigmavirus likeTrue flies31.04Musca domesticaIsfahan virus (42.8%)
Wuhan Louse Fly Virus 102146Sigmavirus likeInsect mix 2235.94Unidentified HippoboscidaeDrosophila melanogaster sigmavirus (51.2%)
Wuhan Louse Fly Virus 82145Sigmavirus likeInsect mix 2292.11Unidentified HippoboscidaeDrosophila melanogaster sigmavirus (50.6%)
Wuhan Louse Fly Virus 92145Sigmavirus likeInsect mix 269.37Unidentified HippoboscidaeDrosophila melanogaster sigmavirus (51.4%)
Bole Tick Virus 22171Unclassified dimarhabdovirus 1Ticks38.19Hyalomma asiaticumIsfahan virus (38.1%)
Huangpi Tick Virus 32193Unclassified dimarhabdovirus 1Ticks15.81Haemaphysalis doenitziEel virus European X (40%)
Tacheng Tick Virus 32182Unclassified dimarhabdovirus 1Ticks96.30Dermacentor marginatusEel virus European X (39.8%)
Taishun Tick Virus2226Unclassified dimarhabdovirus 1Ticks24.56Haemaphysalis hystricisVesicular stomatitis Indiana virus (36.6%)
Wuhan Tick Virus 12191Unclassified dimarhabdovirus 1Ticks119.92Rhipicephalus microplusEel virus European X (38.3%)
Wuhan Insect Virus 72120Unclassified dimarhabdovirus 2Insect mix 4241.7Hyalopterus pruni OR Aphelinus spIsfahan virus (42.6%)
Lishi Spider Virus 22201Unclassified mononegavirus 1Spiders5.57Unidentified AraneaeMaize fine streak virus (19.6%)
Sanxia Water Strider Virus 42108Unclassified mononegavirus 1Water striders4767.82Unidentified GerridaeOrchid fleck virus (20.5%)
Tacheng Tick Virus 62068Unclassified mononegavirus 1Ticks17.92Argas miniatusMaize mosaic virus (20.6%)
Shuangao Fly Virus 21966Unclassified mononegavirus 2Insect mix 125.94Psychoda alternataMidway virus (21.3%)
Xincheng Mosquito Virus2026Unclassified mononegavirus 2Mosquito Hubei400.12Anopheles sinensisMidway virus (19.2%)
Wenzhou Crab Virus 11807Unclassified mononegavirus 3Crabs and barnacles382.29Capitulum mitella, Charybdis japonica, Charybdis luciferaMidway virus (22.2%)
Tacheng Tick Virus 72215Unclassified rhabdovirus 1Ticks35.86Argas miniatusOrchid fleck virus (24.5%)
Jingshan Fly Virus 21970Unclassified rhabdovirus 2True flies4.43Sarcophaga spMaize fine streak virus (23.4%)
Sanxia Water Strider Virus 52264Unclassified rhabdovirus 2Water striders4373.68Unidentified GerridaeNorthern cereal mosaic virus (22.6%)
Shayang Fly Virus 32231Unclassified rhabdovirus 2True flies27.73Chrysomya megacephala, Atherigona orientalisMaize fine streak virus (22.6%)
Shuangao Bedbug Virus 22207Unclassified rhabdovirus 2Insect mix 216.29Cimex hemipterusMaize fine streak virus (22.5%)
Shuangao Insect Virus 62088Unclassified rhabdovirus 2Insect mix 114.37Unidentified Diptera, Abraxas tenuisuffusaPotato yellow dwarf virus (21.2%)
Wuhan Ant Virus2118Unclassified rhabdovirus 2Insect mix 3169.79Camponotus japonicusLettuce necrotic yellows virus (21.4%)
Wuhan Fly Virus 32230Unclassified rhabdovirus 2True flies6.00Musca domestica, Sarcophaga spMaize fine streak virus (21.9%)
Wuhan House Fly Virus 22233Unclassified rhabdovirus 2True flies221.04Musca domesticaNorthern cereal mosaic virus (23.4%)
Wuhan Mosquito Virus 92260Unclassified rhabdovirus 2Mosquito Hubei56.19Culex tritaeniorhynchus, C. quinquefasciatus, Aedes spPersimmon virus A (23.2%)
Wuhan Louse Fly Virus 112110Vesiculovirus likeInsect mix 26.11Unidentified HippoboscidaeVesicular stomatitis Indiana virus (52.9%)
Table 3

Bunya-arenaviridae-related RdRp sequences discovered in this study

https://doi.org/10.7554/eLife.05378.006
Virus nameLength of RdRpClassificationPoolAbundancePutative arthropod hostClosest relative (aa identity)
Huangpi Tick Virus 13914Nairovirus likeTicks11.32Haemaphysalis doenitziHazara virus (39.5%)
Tacheng Tick Virus 13962Nairovirus likeTicks88.91Dermacentor marginatusHazara virus (39.6%)
Wenzhou Tick Virus3967Nairovirus likeTicks44.30Haemaphysalis hystricisCrimean-Congo hemorrhagic fever virus (39.1%)
Shayang Spider Virus 14403Nairovirus likeSpiders90.95Neoscona nautica, Parasteatoda tepidariorum, Plexippus setipesCrimean-Congo hemorrhagic fever virus (26.2%)
Xinzhou Spider Virus4037Nairovirus likeSpiders3.79Neoscona nautica, Parasteatoda tepidariorumErve virus (22.9%)
Sanxia Water Strider Virus 13936Nairovirus likeWater striders26,483.38Unidentified GerridaeHazara virus (23.4%)
Wuhan Louse Fly Virus 12250OrthobunyavirusInsect mix 267.06Unidentified HippoboscoideaLa Crosse virus (57.8%)
Shuangao Insect Virus 12335Orthobunyavirus likeInsect mix 17.97Unidentified Chrysopidae, Psychoda alternataKhurdun virus (29.1%)
Wuchang Cockroach Virus 12125Phasmavirus likeCockroaches11,283.22Blattella germanicaKigluaik phantom virus (35.9%)
GAQJ010071891554Phasmavirus likeDatabaseN/AOstrinia furnacalisKigluaik phantom virus (35.9%)
Shuangao Insect Virus 21765Phasmavirus likeInsect mix 136.32Abraxas tenuisuffusa, unidentified DipteraKigluaik phantom virus (31.9%)
Wuhan Mosquito Virus 12095Phasmavirus likeMosquito Hubei, Mosquito Zhejiang3523.08Culex tritaeniorhynchus, Anopheles sinensis, Culex quinquefasciatusKigluaik phantom virus (39.5%)
Wuhan Mosquito Virus 22111Phasmavirus likeMosquito Hubei, Mosquito Zhejiang39.66Culex tritaeniorhynchus, Anopheles sinensis, Culex quinquefasciatus, Aedes spKigluaik phantom virus (39.6%)
Huangpi Tick Virus 22121PhlebovirusN/AN/AHaemaphysalis spUukuniemi virus (49.3%)
Bole Tick Virus 12148PhlebovirusTicks67.86Hyalomma asiaticumUukuniemi virus (37.9%)
Changping Tick Virus 12194PhlebovirusTicks335.25Dermacentor spUukuniemi virus (39.7%)
Dabieshan Tick Virus2148PhlebovirusTicks250.62Haemaphysalis longicornisUukuniemi virus (39.2%)
Lihan Tick Virus2151PhlebovirusTicks60.40Rhipicephalus microplusUukuniemi virus (38.6%)
Tacheng Tick Virus 22189PhlebovirusTicks132.59Dermacentor marginatusUukuniemi virus (39.0%)
Yongjia Tick Virus 12138PhlebovirusTicks119.49Haemaphysalis hystricisUukuniemi virus (40.5%)
GAIX010000592151Phlebovirus likeDatabaseN/APararge aegeriaCumuto virus (24.1%)
GAKZ010482601583Phlebovirus likeDatabaseN/AProcotyla fluviatilisCumuto virus (22.8%)
GAQJ010086812261Phlebovirus likeDatabaseN/AOstrinia furnacalisGouleako virus (22.0%)
Shuangao Insect Virus 32050Phlebovirus likeInsect mix 1339.41Unidentified Chrysopidae, unidentified DipteraCumuto virus (23.7%)
Wuhan Louse Fly Virus 22327Phlebovirus likeInsect mix 23.57Unidentified HippoboscoideaUukuniemi virus (25.2%)
Wuhan Insect Virus 12099Phlebovirus likeInsect mix 3178.53Asellus sp, unidentified Nepidae, Camponotus japonicusCumuto virus (24.8%)
Huangshi Humpbacked Fly Virus2009Phlebovirus likeInsect mix 413.13Unidentified PhoridaeCumuto virus (18.1%)
Yichang Insect Virus2100Phlebovirus likeInsect mix 471.50Aulacorthum magnoliaeGouleako virus (45.3%)
Wuhan Millipede Virus 11854Phlebovirus likeMillipedes and insect mix 3825.66Unidentified PolydesmidaeCumuto virus (25.3%)
Qingnian Mosquito Virus2243Phlebovirus likeMosquito Hubei17.09Culex quinquefasciatusRazdan virus (21.0%)
Wutai Mosquito Virus2185Phlebovirus likeMosquito Hubei70.72Culex quinquefasciatusRice stripe virus (26.4%)
Xinzhou Mosquito Virus2022Phlebovirus likeMosquito Hubei98.95Anopheles sinensisCumuto virus (24.7%)
Zhee Mosquito Virus2443Phlebovirus likeMosquito Hubei, Mosquito Zhejiang308.98Anopheles sinensis, Armigeres subalbatusCumuto virus (22.6%)
Wenzhou Shrimp Virus 12051Phlebovirus likeShrimps5859.37Penaeus monodonUukuniemi virus (32.2%)
Wuhan Spider Virus2251Phlebovirus likeSpiders17.71Neoscona nautica, Parasteatoda tepidariorum, Plexippus setipesUukuniemi virus (21.7%)
Wuhan Fly Virus 12192Phlebovirus likeTrue flies68.58Atherigona orientalis, Chrysomya megacephala, Sarcophaga sp, Musca domesticaGrand Arbaud virus (27.8%)
Wuhan Horsefly Virus3117Tenuivirus likeHorseflies13.50Unidentified TabanidaeUukuniemi virus (28.2%)
Jiangxia Mosquito Virus 11889Unclassified segmented virus 1Mosquito Hubei11.55Culex tritaeniorhynchusGouleako virus (16.7%)
Shuangao Bedbug Virus 12015Unclassified segmented virus 2Insect mix 212.71Cimex hemipterusMurrumbidgee virus (16.3%)
Jiangxia Mosquito Virus 21860Unclassified segmented virus 2Mosquito Hubei2.81Culex tritaeniorhynchusHantavirus (18.9%)
Shuangao Mosquito Virus1996Unclassified segmented virus 2Mosquito Zhejiang11.67Armigeres subalbatusHantavirus (18.7%)
Wenzhou Shrimp Virus 22241Unclassified segmented virus 3Shrimps3824.55Penaeus monodon, Exopalaemon carinicaudaLa Crosse virus (19.0%)
Shayang Spider Virus 22165Unclassified segmented virus 4Spiders12.75Neoscona nautica, Pirata sp, Parasteatoda tepidariorum, unidentified AraneaeAkabane virus (16.6%)
Wuhan Insect Virus 22377Unclassified segmented virus 5Insect mix 4223.06Hyalopterus pruni OR Aphelinus spKigluaik phantom virus (19.2%)
Sanxia Water Strider Virus 22349Unclassified segmented virus 5Water striders707.09Unidentified GerridaeKigluaik phantom virus (19.8%)
Wuhan Millipede Virus 23709Unclassified segmented virus 6Millipedes1513.41Unidentified PolydesmidaeDugbe virus (17.2%)
Wuhan Insect Virus 32231Unclassified segmented virus 7Insect mix 33.50Asellus spHerbert virus (17.2%)
Table 4

Orthomyxoviridae-related RdRp sequences discovered in this study

https://doi.org/10.7554/eLife.05378.007
Virus nameLength of RdRpClassificationPoolAbundancePutative arthropod hostClosest relative (aa identity)
Jingshan Fly Virus 1795QuaranjavirusTrue flies21.93Atherigona orientalis, Chrysomya megacephala, Sarcophaga sp, Musca domesticaJohnston Atoll virus (36.9%)
Jiujie Fly Virus653QuaranjavirusHorseflies10.30Unidentified TabanidaeJohnston Atoll virus (39.7%)
Sanxia Water Strider Virus 3789QuaranjavirusWater striders1101.03Unidentified GerridaeJohnston Atoll virus (36.7%)
Shayang Spider Virus 3768QuaranjavirusSpiders1.95Neoscona nauticaJohnston Atoll virus (38.5%)
Shuangao Insect Virus 4793QuaranjavirusInsect mix 159.90Unidentified Diptera, unidentified StratiomyidaeJohnston Atoll virus (36.9%)
Wuhan Louse Fly Virus 3784QuaranjavirusInsect mix 2500.77Unidentified HippoboscoideaJohnston Atoll virus (37.7%)
Wuhan Louse Fly Virus 4783QuaranjavirusInsect mix 296.80Unidentified HippoboscoideaJohnston Atoll virus (38.2%)
Wuhan Mosquito Virus 3801QuaranjavirusMosquito Hubei40.07Culex tritaeniorhynchus, Culex quinquefasciatus, Armigeres subalbatusJohnston Atoll virus (35.6%)
Wuhan Mosquito Virus 4792QuaranjavirusMosquito Hubei86.21Culex tritaeniorhynchus, Culex quinquefasciatus, Armigeres subalbatusJohnston Atoll virus (34.8%)
Wuhan Mosquito Virus 5806QuaranjavirusMosquito Hubei75.05Culex tritaeniorhynchus, Culex quinquefasciatus, Armigeres subalbatusJohnston Atoll virus (35.5%)
Wuhan Mosquito Virus 6800QuaranjavirusMosquito Hubei56.30Culex quinquefasciatusJohnston Atoll virus (34.2%)
Wuhan Mosquito Virus 7779QuaranjavirusMosquito Hubei20.74Anopheles sinensis, Culex quinquefasciatusJohnston Atoll virus (34.1%)
Wuhan Mothfly Virus710QuaranjavirusInsect mix 414.47Psychoda alternataJohnston Atoll virus (39.7%)
Wuchang Cockroach Virus 2671Unclassified orthomyxovirus 1Cockroaches4.01Blattella germanicaInfluenza C virus (27.0%)

Next, we measured the abundance of these sequences as the number transcripts per million (TPM) within each library after the removal of rRNA reads. The abundance of viral transcripts calculated in this manner exhibited substantial variation (Figure 2, Tables 2–4): while the least abundant L segment (Shayang Spider Virus 3) contributed to less than 0.001% to the total non-ribosomal RNA content, the most abundant (Sanxia Water Strider Virus 1) was at a frequency of 21.2%, and up to 43.9% if we include the matching M and S segments of the virus. The remaining viral RdRp sequences fell within a range (10–1000 TPM) that matched the abundance level of highly expressed host mitochondrial genes (Figure 2).

Abundance level (transcripts per million—TPM) of the RdRp genes from the negative-sense RNA viruses detected in this study.

Abundance is calculated after the removal of ribosomal RNA reads. As a comparison, we show the abundance of the two well characterized (positive-sense) RNA viruses: Japanese encephalitis virus and Gill-associated virus found in the Mosquito-Hubei and Shrimp libraries, respectively, as well as the range of abundance of host mitochondrial COI genes in these same multi-host libraries.

https://doi.org/10.7554/eLife.05378.008

Evolutionary history of negative-sense RNA viruses

With this highly diverse set of RdRp sequences in hand we re-examined the evolution of all available negative-sense RNA viruses by phylogenetic analysis (Figure 3; Figure 3—figure supplement 3). These data greatly expand the documented diversity of four viral families/orders—the Arenaviridae, Bunyaviridae, Orthomyxoviridae, and Mononegavirales—as well as of three floating genera—Tenuivirus, Emaravirus, and Varicosavirus (King et al., 2012). Most of the newly described arthropod viruses fell basal to the known genetic diversity in these taxa: their diversity either engulfed that of previously described viruses, as in the case of phlebovirus, nairovirus, and dimarhabdovirus, or appeared as novel lineages sandwiched between existing genera or families, and hence filling in a number of phylogenetic ‘gaps’ (Figure 3; Figure 3—figure supplement 3). One important example was a large monophyletic group of newly discovered viruses that fell between the major groups of segmented and unsegmented viruses (Figure 4); we name this putative new virus family the ‘Chuviridae’ reflecting the geographic location in China where most of this family were identified (‘Chu’ is a historical term referring to large area of China encompassing the middle and lower reaches of the Yangzi River). Also of note was that some of the previously defined families no longer appear as monophyletic. For example, although classified as distinct families, the family Arenaviridae fell within the genetic diversity of the family Bunyaviridae and as a sister group to viruses of the genus Nairovirus. Furthermore, the floating genus Tenuivirus was nested within the Phlebovirus-like clade, and another floating genus, Emaravirus, formed a monophyletic group with the Orthobunyavirus and Tospovirus genera (Figure 3C; Figure 3—figure supplement 2). Hence, there are important inconsistencies between the current virus classification scheme and the underlying evolutionary history of the RdRp revealed here.

Figure 3 with 3 supplements see all
Evolutionary history of negative-sense RNA viruses based on RdRp.

This is initially displayed in an unrooted maximum likelihood (ML) tree including all major groups of negative-sense RNA viruses (A). Separate and more detailed ML phylogenies are then shown for the Orthomyxoviridae-like (B), Bunya-Arenaviridae-like (C), and Mononegavirales-like viruses (D). In all the phylogenies, the RdRp sequences described here from arthropods are either shaded purple or marked with solid gray circles. The names of previously defined genera/families are labeled to the right of the phylogenies. Based on their host types, the branches are shaded red (vertebrate-specific), yellow (vertebrate and arthropod), green (plant and arthropod), blue (non-arthropod invertebrates), or black (arthropod only). For clarity, statistical supports (i.e., approximate likelihood-ratio test (aLRT) with Shimodaira–Hasegawa-like procedure/posterior probabilities) are shown for key internal nodes only.

https://doi.org/10.7554/eLife.05378.009
The unrooted ML phylogeny based on RdRp showing the topological position of segmented viruses within the genetic diversity of negative-sense RNA viruses.

The segmented viruses are labeled with segment numbers and shaded red. The unsegmented viruses are shaded green. The Chuviridae, which exhibit a wide variety of genome organizations, are shaded cyan. Three major types of putative chuvirus genomes (circular, circular and segmented, and linear) are shown in the right panel and are annotated with predicted ORFs: putative RdRp genes are shaded blue, putative glycoprotein genes are shaded orange, and the remaining ORFs are shaded gray.

https://doi.org/10.7554/eLife.05378.013

A key result of this study is that much of the genetic diversity of negative-sense RNA viruses in vertebrates and plants now appears to be contained within viruses that utilize arthropods as hosts or vectors. Indeed, it is striking that all vertebrate-specific segmented and unsegmented viruses (arenavirus, bornavirus, filovirus, hantavirus, influenza viruses, lyssavirus, and paramyxovirus) fall within the genetic diversity of arthropod-associated viruses (Figures 3, 5). Also nested with arthropod-associated diversity were plant viruses (emaravirus, tospovirus, tenuiviruses, nucleorhabdovirus, cytorhabdovirus, and varicosavirus) (Figures 3, 5). Surprisingly, our phylogeny similarly placed two non-arthropod invertebrate viruses, found in nematodes (Heterodera glycines) and flatworms (Procotyla fluviatilis), within arthropod-associated diversity (Figure 3C, Figure 3—figure supplement 2), indicating that the role of non-arthropod invertebrates should be explored further. Finally, it was striking that although individual arthropod species can harbor a rich diversity of RNA viruses, many viruses seemed to be associated with different arthropod species that share the same ecological niche (Tables 2–4). Interestingly, host species in the same niche had similar viral contents that were generally incongruent with the host phylogeny (Figure 6). Such a pattern is indicative of frequent cross-species and occasional cross-genus virus transmission in the context of ecological and geographic proximity.

The unrooted ML phylogeny of negative-sense RNA viruses (RdRp) with the common names of the principle arthropod hosts analyzed in this study indicated.

Vertebrate-specific viruses are shaded red, those infecting both vertebrates and arthropods (or with unknown vectors) are shaded yellow, those infecting both plants and arthropods are shaded green, those infecting non-arthropod invertebrates are shaded blue, and the remainder (arthropod only) are shaded black.

https://doi.org/10.7554/eLife.05378.014
Phylogenetic congruence between viruses (M segments) and hosts.

The comparisons include (A) Wuhan Horsefly Virus, (B) Wuhan Fly Virus 1, (C) Wuhan Mosquito Virus 2, and (D) Wuhan Mosquito Virus 1. Different host species/genera are distinguished with different colors, which are then mapped onto virus phylogeny to assess the phylogenetic congruence. ML phylogenetic trees were inferred in all cases.

https://doi.org/10.7554/eLife.05378.015

Diversity and evolution of virus genome organizations

The diversity of genome structures in these virus data was also striking. This can easily be documented with respect to the evolution of genome segmentation. The number of genome segments in negative-sense RNA viruses varies from one to eight. Our phylogenetic analysis revealed no particular trend for this number to increase or decrease through evolutionary time (Figure 4). Hence, genome segmentation (i.e., genomes with >1 segment) has clearly evolved on multiple occasions within the negative-sense RNA viruses (Figure 4), such that it is a relatively flexible genetic trait. Although most segmented viruses were distantly related to those with a single segment (Figure 4), close phylogenetic ties were seen in other cases supporting the relatively recent evolution of multiple segments, with the plant-infecting varicosavirus (two segments) and orchid fleck virus (bipartite) serving as informative examples.

In this context, it is notable that the newly discovered chuviruses fell ‘between’ the phylogenetic diversity of segmented and the unsegmented viruses. Although monophyletic, the chuviruses display a wide variety of genome organizations including unsegmented, bi-segmented, and a circular form, each of which appeared multiple times in the phylogeny (Figures 4, 7). The circular genomic form, which was confirmed by ‘around-the-genome’ RT-PCR and by the mapping of sequencing reads to the genome (Figure 7C), is a unique feature of the Chuviridae and can be distinguished from a pseudo-circular structure seen in some other negative-sense RNA viruses including the family Bunyaviridae and the family Orthomyxoviridae. Furthermore, this circular genomic form was also present in both segments of the segmented chuviruses (Figure 7B). In addition, the chuviruses displayed a diverse number and arrangement of predicted open reading frames that were markedly different from the genomic arrangement seen in the order Mononegavirales even though these viruses are relatively closely related (Figures 4, 7). In particular, the chuviruses had unique and variable orders of genes: the linear chuvirus genomes began with the glycoprotein (G) gene, followed by the nucleoprotein (N) gene, and then the polymerase (L) gene, whereas the majority of circular chuviruses were most likely arranged in the order L-(G)-N (i.e., if displayed in a linear form) as the only low coverage point throughout the genome lay between the 5′ end of N gene and the 3′ end of L gene (Figure 7B). In addition, the genome organizations of the chuviruses were far more concise than those of the order Mononegavirales, with ORFs encoding only 2–3 major (>20 kDa) proteins (Figure 7), and hence showing more similarity to segmented viruses in this respect.

The differing genome organizations in the Chuviridae.

(A) ML trees of three main putative proteins conserved among the chuviruses. Viruses with circular genomes (Type I) are shaded blue, while those with segmented genomes (Type II) are shaded red. (B) Structures of all complete chuvirus genomes. Circular genomes are indicated with the arrow (blue) situated at the 3′ end, and the genome is drawn in a linear form for ease of comparison only, being broken at the region of variable sequence (refer to the ‘Materials and methods’). (C) An example showing mapping of sequencing reads to the circular chuvirus genome. The template for mapping contains two genomes connected head-to-tail. The two boxes magnify the genomic region containing abundant sequence variation.

https://doi.org/10.7554/eLife.05378.016

Although our phylogenetic analysis focused on the relatively conserved RdRp, in the case of segmented viruses we searched for other putative viral proteins from the assembled contigs. Accordingly, we were able to find the segments encoding matching structural proteins (mainly glycoproteins and nucleoproteins) for many of the viral RdRp sequences (Figure 8), although extensive sequence divergence prevented this in some cases. Surprisingly, M segments were apparently absent in a group of tick phleboviruses whose RdRps and nucleoproteins showed relatively high sequence similarity to Uukuniemi virus (genus Phlebovirus; Table 3 and Figure 8). Genomes with missing glycoprotein genes were also found in the chuviruses (Changping Tick Viruses 3 and 5, Wuhan Louse Viruses 6 and 7, Figure 7) and the unsegmented dimarhabdovirus (Taishun Tick Virus, Wuhan Tick Virus 1, Tacheng Tick Virus 6, Figure 9). Although it is possible that the glycoprotein gene may have been replaced with a highly divergent or even non-homologous sequence, we failed to find any candidate G proteins within the no-Blastx-hit set of sequences under the following criteria: (i) structural resemblance to G proteins, (ii) similar level of abundance to the corresponding RdRp and nucleoprotein genes, and (iii) comparable phylogenies or levels of divergence (among related viruses) to those of RdRps and nucleoproteins. The cause and biological significance of these seemingly ‘incomplete’ virus genomes require further study. Finally, it was also of interest that a virus with four segments was discovered in the horsefly pool. Although the predicted proteins of all four segments showed sequence homology to their counterparts in Tenuivirus (Falk and Tsai, 1998), this virus lacked the ambisense coding strategy of tenuiviruses (Figure 10). While the capability of this virus to infect plants is unknown, it is possible that it represents a transitional form between plant-infecting and arthropod-specific viruses.

Genome structures of segmented negative-sense RNA viruses.

Predicted viral proteins homologous to known viral proteins are shown and colored according to their putative functions. The numbers below each ORF box give the predicted molecular mass.

https://doi.org/10.7554/eLife.05378.017
Genome structures of unsegmented negative-sense RNA viruses.

Predicted ORFs encoding viral proteins with >10 kDa molecular mass are shown and colored according to their putative functions. The numbers below each ORF box give the predicted molecular mass.

https://doi.org/10.7554/eLife.05378.018
Comparison of the genome structure of a potential tenui-like virus from horsefly with a prototype tenuivirus (Rice grassy stunt virus) genome.
https://doi.org/10.7554/eLife.05378.019

Novel Endogenous Virus Elements (EVEs)

As well as novel exogenous RNA viruses, our metagenomic analysis also revealed a large number of potential EVEs (Katzourakis and Gifford, 2010) in more than 40 arthropod species; these resembled complete or partial genes of the major proteins—the nucleoprotein, glycoprotein, and RdRp—but without fully intact genomes (Table 5). As expected given their endogenous status, most of these sequences have disrupted reading frames and many are found within transposon elements, suggesting that transposons have been central to their integration. Interestingly, in some cases, such as the putative glycoprotein gene of chuviruses, the homologous EVEs from within a family (Culicidae) or even an order (Hymenoptera) form monophyletic groups (Figure 11). However, they are unlikely to be orthologous because they do not share homologous integration sites in the host genome as determined by an analysis of flanking sequences, which in turn limited the applicability of molecular-clock based dating techniques. Furthermore, phylogenetic analyses of those EVEs shared among different host species revealed extremely complex tree topologies which do not exhibit simple matches to the host phylogeny at both the species and genera levels (Figure 11B–C). In sum, these results suggest that EVEs are relative commonplace in arthropod genomes and have been often generated by multiple and independent integration events.

Table 5

Summary of Endogenous Virus Elements (EVEs) determined here

https://doi.org/10.7554/eLife.05378.020
Host classificationHost nameVirus classificationGene(s) present
ChelicerataIxodes scapularisChuvirusG, N
DimarhabdovirusRdRp, N
Nairovirus likeN
PhlebovirusRdRp, N
QuaranjavirusRdRp
Tetranychus urticaeDimarhabdovirusN
CrustaceaDaphnia pulexPhlebovirus likeRdRp
Eurytemora affinisChuvirusG
DimarhabdovirusRdRp, N
Hyalella aztecaChuvirusG, N
Unclassified mononegavirus 3RdRp, N
Lepeophtheirus salmonisPhlebovirus likeN, G
Insecta: ColeopteraDendroctonus ponderosaeChuvirusG
PhasmavirusG, N
Tribolium castaneumChuvirusG
Insecta: DipteraAedes aegyptiChuvirusRdRp
DimarhabdovirusRdRp, N
PhasmavirusG
Phlebovirus likeN
QuaranjavirusRdRp
Anopheles spp.ChuvirusG
DimarhabdovirusRdRp, N
PhasmavirusG, N
Phlebovirus likeN
QuaranjavirusRdRp
Culex quinquefasciatusChuvirusG, N
DimarhabdovirusN
Drosophila spp.DimarhabdovirusRdRp, N
PhasmavirusN
Unclassified rhabdovirus 2RdRp, N
Insecta: IsopteraZootermopsis nevadensisChuvirusN
Insecta: HemipteraAcyrthosiphon pisumChuvirusG, N
DimarhabdovirusN
Phlebovirus likeN
QuaranjavirusRdRp
Unclassified mononegavirus 1RdRp, N
Rhodnius prolixusChuvirusG
PhasmavirusG
Insecta: HymenopteraAtta cephalotesUnclassified mononegavirus 2RdRp
Acromyrmex echinatiorChuvirusG
Unclassified mononegavirus 2RdRp
Camponotus floridanusChuvirusG
Unclassified mononegavirus 1N
Unclassified mononegavirus 3RdRp
Unclassified rhabdovirus 2RdRp
Harpegnathos saltatorChuvirusG
Linepithema humileChuvirusG
Nasonia spp.ChuvirusG
Pogonomyrmex barbatusChuvirusG
Solenopsis invictaChuvirusG
Unclassified mononegavirus 1N
Unclassified mononegavirus 3RdRp, N
Insecta: LepidopteraBombyx moriChuvirusRdRp, G
QuaranjavirusRdRp
Unclassified rhabdovirus 2RdRp
Melitaea cinxiaDimarhabdovirusN
QuaranjavirusRdRp
Plutella xylostellaDimarhabdovirusN, G
Spodoptera frugiperdaPhlebovirus likeG
MyriapodaStrigamia maritimaChuvirusN
Phlebovirus likeG
ML phylogeny of EVEs.

The phylogeny is based on the glycoprotein of chuviruses in the context of exogenous members of this family (A), with subtrees magnified for (B) the Culicidae clade and (C) the Hymenoptera clade. The EVEs used in the phylogeny covered the complete or near complete length of the glycoprotein gene and are shown in red and labeled according to host taxonomy in the overall tree. For clarity, monophyletic groups are collapsed based on the host taxonomy. Only bootstrap values >70% are shown.

https://doi.org/10.7554/eLife.05378.021

Discussion

Our study suggests that arthropods are major reservoir hosts for many, if not all, of the negative-sense RNA viruses in vertebrates and plants, and hence have likely played a major role in their evolution. This is further supported by the high abundance of viral RNA in the arthropod transcriptome, as well as by the high frequencies of endogenous copies of these viruses in the arthropod genome, greatly expanding the known biodiversity of these genomic ‘fossils’ (Katzourakis and Gifford, 2010; Cui and Holmes, 2012). The often basal position of the arthropod viruses in our phylogenetic trees is also compatible with the idea that the negative-sense RNA viruses found in vertebrates and plants ultimately have their ancestry in arthropods, although this will only be confirmed with a far wider sample of virus biodiversity.

The rich genetic and phylogenetic diversity of arthropod RNA viruses may in part reflect the enormous species number and diversity of arthropods, and that they sometimes live in large and very dense populations that provide abundant hosts to fuel virus transmission. Furthermore, arthropods are involved in almost all ecological guilds and actively interact with other eukaryotes, including animals, plants, and fungi, such that it is possible that they serve as both sources and sinks for viruses present in the environment. In addition, not only were diverse viruses present, but they were often highly abundant. For example, in the pool containing 12 individuals (representing two species) from the Gerridae (Water striders) collected at the same site, we identified at least five negative-sense RNA viruses whose TPM values are well above 100, and where the viral RNA collectively made up more than 50% of the host total RNA (rRNA excluded). Determining why arthropods are able to carry such a large viral diversity and at such frequencies clearly merits further investigation.

The viruses discovered here also exhibited a huge variation in level of abundance. It is possible that this variation is in part due to the stage or severity of infection in individual viruses and may be significantly influenced by the process of pooling, since most of our libraries contain an uneven mixture of different host species or even genera. In addition, it is possible that some low abundance viruses may in fact be derived from other eukaryotic organisms present in the host sampled, such as undigested food or prey, gut micro flora, and parasites. Nevertheless, since the majority of the low abundance viruses appear in the same groups as the highly abundant ones in our phylogenetic analyses, these viruses are most likely associated with arthropods.

Viral infections in vertebrates and plants can be divided into two main categories: (i) arthropod-dependent infections, in which there is spill-over to non-arthropods but where continued virus transmission still requires arthropods, and (ii) arthropod-independent infections, in which the virus has shifted its host range to circulate among vertebrates exclusively (Figure 12). The first category of infections is often associated with major vector-borne diseases (Zhang et al., 2011, 2012). Given the biodiversity of arthropod viruses documented here, it seems likely that arthropod-independent viruses were ultimately derived from arthropod-dependent infections, with subsequent adaptation to vertebrate-only transmission (Figure 12).

Transmission of negative-sense RNA viruses in arthropods and non-arthropods.

Three types of transmission cycle are shown: (i) those between arthropods and plants are shaded green; (ii) those between arthropods and vertebrates are shaded yellow; and (iii) those that are vertebrate-only are shaded red. Viruses associated with each transmission type are also indicated.

https://doi.org/10.7554/eLife.05378.022

One of the most notable discoveries was that of a novel family, the Chuviridae. The identification of this diverse virus family provides a new perspective on the evolutionary origins of segmented and unsegmented viruses. In particular, the chuviruses occupy a phylogenetic position that is in some sense ‘intermediate’ between the segmented and unsegmented negative-sense RNA viruses and display genomic features of both. Indeed, our phylogenetic analysis reveals that genome segmentation has evolved multiple times within the diversity of chuviruses (Figure 7), such that this trait appears to be more flexible than previously anticipated. In addition, the majority of the chuviruses possess circular genomes. To date, the only known circular RNA virus is (hepatitis) deltavirus, although this potentially originated from the human genome (Salehi-Ashtiani et al., 2006) and requires hepatitis B virus for successful replication. As such, the chuviruses may represent the first report of autonomously replicating circular RNA viruses, which opens up an important line of future research.

Our results also provide insights into the evolution of genome segmentation. Within the bunya-arena-like viruses (Figures 3C, 4), the three-segment structure is the most common, with the viral polymerase, nucleoprotein, and surface glycoproteins present on different segments. Notably, our phylogenetic analysis seemingly revealed independent occurrences of both increasing (Tenuivirus and Emaravirus) and decreasing (Arenavirus) segment numbers from the three-segment form (Figure 4). Independent changes of genome segmentation numbers are also observed in the mononegavirales-like viruses (Figure 4) and, more frequently, in the chuviruses (Figure 7A). Consequently, the number of genome segments appears to be a relatively flexible trait at a broad evolutionary scale, although the functional relevance of these changes remains unclear. While the segmented viruses (bunya-arenaviruses, orthomyxoviruses, and ophioviruses) appear to be distinct from the largely unsegmented mononegavirales-like viruses in our phylogenetic analysis, this may be an artifact of under-sampling, especially given that only a tiny fraction of eukaryotes have been sampled to date. With a wider sample of eukaryotic viruses it will be possible to more accurately map changes in segment number onto phylogenetic trees and in so doing come to a more complete understanding of the patterns and determinants of the evolution of genome segmentation.

In sum, our results highlight the remarkable diversity of arthropod viruses. Because arthropods interact with a wide range of organisms including vertebrate animals and plants, they can be seen as the direct or indirect source of many clinically or economically important viruses. The viral genetic and phenotypic diversity documented in arthropods here therefore provides a new perspective on fundamental questions of virus origins, diversity, host range, genome evolution, and disease emergence.

Materials and methods

Sample collection

Between 2011 and 2013 we collected 70 species of arthropods from various locations in China (Table 1). Among these, ticks were either directly picked from wild and domestic animals or captured using a tick drag-flag method; mosquitoes were trapped by light-traps; common flies were captured by fly paper; horseflies were picked from infested cattle; bed bugs and cockroaches were trapped indoors; louse flies were plucked from the skin of bats; millipedes were picked up from the ground; spiders were collected from their webs; water striders were captured using hand nets from river surfaces; and crabs and shrimps were bought (alive) from local fisherman. In addition, three pools of mixed insect samples (Table 1) were collected from a rural area adjacent to rice fields (Insect Mix 1), from a lakeside (Insect Mix 3), and from a mountainous area near Wuhan (Insect Mix 4). After brief species identification by experienced field biologists, these samples were immediately stored in liquid nitrogen and were later put on dry ice for shipment to our laboratory.

Total RNA extraction

The specimens were first grouped into several units (Table 1). Depending on the size of specimens, one unit could include from 1 to 20 individual arthropods belonging to the same species and sampling location. These units were first washed with phosphate-buffered saline (PBS) three times before homogenized with the Mixer mill MM400 (Restsch, Germany). The resultant homogenates were then subjected to RNA extraction using TRIzol LS reagent (Invitrogen, Carlsbad, CA). After obtaining the aqueous phase containing total RNA, we performed purification steps from the E.Z.N.A Total RNA Kit (OMEGA, Portugal) according to the manufacturer's instructions. The concentration and quality of final extractions were examined using a ND-1000 UV spectrophotometer (Nanodrop, Wilmington, DE). Based on host types and/or geographic locations, these extractions were further merged into 16 pools for RNA-seq library construction and sequencing (Table 1).

Species identification

To verify the field species identification, we took a proportion of the homogenates from each specimen or specimen pool for genomic DNA extraction using E.Z.N.A. DNA/RNA Isolation Kit (OMEGA). Two genes were used for host identification: the partial 18S rRNA gene (∼1100 nt) which was amplified using primer pairs 18S#1 (5′-CTGGTGCCAGCGAGCCGCGGYAA-3′) and 18S#2RC (5′-TCCGTCAATTYCTTTAAGTT-3′) and partial COI gene (∼680 nt) using primer pairs LCO1490 (5′-GGTCAACAAATCATAAAGATATTGG-3′) and HCO2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′). PCRs were performed as described previously (Folmer et al., 1994; Machida and Knowlton, 2012). For taxonomic determination, the resulting sequences were compared against the nt database as well as with all COI barcode records on the Barcode of Life Data Systems (BOLD).

RNA-seq sequencing and reads assembly

Total RNA was subjected to a slightly modified RNA-seq library preparation protocol from that provided by Illumina. Briefly, following DNase I digestion, total RNA was subjected to an rRNA removal step using Ribo-Zero Magnetic Gold Kit (Epicentre, Madison, WI). The remaining RNA was then fragmented, reverse-transcribed, ends repaired, dA-tailed, adaptor ligated, purified, and quantified with Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System. Pair-end (90 bp or 100 bp) sequencing of the RNA library was performed on the HiSeq 2000 platform (Illumina, San diego, CA). All library preparation and sequencing steps were performed by BGI Tech (Shenzhen, China). The resulting sequencing reads were quality trimmed and assembled de novo using the Trinity program (Grabherr et al., 2011). All sequence reads generated in this study were uploaded onto NCBI Sequence Read Achieve (SRA) database under the BioProject accession SRP051790.

Discovery of target virus sequences

The assembled contigs were translated and compared (using Blastx) to reference protein sequences of all negative-sense RNA viruses. Sequences yielding e-values larger than 1e−5 were retained and compared to the entire nr database to exclude non-viral sequences. The resulting viral sequences were merged by identifying unassembled overlaps between neighboring contigs or within a scaffold using the SeqMan program implemented in the Lasergene software package v7.1 (DNAstar, Madison, WI). To prevent missing highly divergent viruses, the newly found viral sequences were included in the reference protein sequences for a second round of Blastx.

Sequence confirmation and repairing by Sanger methods

For each potential viral sequence, we first used nested RT-PCR to examine which unit contained the target sequence, utilizing primers designed based on the deep-sequencing results. In the case of segmented viruses this information was also used to determine whether and which of the segments recovered from the pool belonged to the same virus. We next designed overlapping primers to verify the sequence obtained from the deep sequencing and assembly processes. Based on the verified sequences, we determined the sequencing depth and coverage by mapping reads to target sequences using bowtie2 (Langmead and Salzberg, 2012). All virus genome sequences generated in this study have been deposited in the GenBank database under accession numbers KM817593–KM817764‏.

Quantification of relative transcript abundances

Before quantification, we first removed the rRNA reads from the data sets to prevent any bias due to the unequal efficiency of rRNA removal steps during library preparation. To achieve this, we blasted the Trinity assembly results against the SILVER rRNA database (Quast et al., 2013) and then used the resulting rRNA contigs as a template for mapping using BOWTIE2 (Langmead and Salzberg, 2012). The remaining reads from each library were then mapped on to the assembled transcripts and analyzed with RSEM (Li et al., 2010), using the run_RSEM_align_n_estimate.pl scripts implemented in the Trinity program (Grabherr et al., 2011). The relative abundance of each transcript is presented as transcripts per million (TPM) which corrects for the total number of reads as well as for transcript length (Li et al., 2010).

Genome walking

Some of the sequences obtained were substantially shorter than expected. To obtain longer sequences, we used a Genome walking kit (TaKaRa, Japan). Briefly, three gene-specific primers close to the end of the known sequence were designed. RNA from positive samples was used as input for reverse transcription primed by random primer N6. TAIL-PCR (thermal asymmetric interlaced PCR) was performed according to the manufacturer's protocol. The cDNA was used as a template for PCR with specific primers and the manufacturer-supplied degenerate primers. After three rounds of amplification, the products were analyzed on 1.0% agarose gels, and single fragments were recovered from the gels and purified using an agarose gel DNA extraction kit (TaKaRa). The purified products were then ligated into pMD19-T vector (TaKaRa) which contains the gene for ampicillin resistance. The vector was transformed into DH5α cells, which were spread on agar plates and incubated overnight at 37°C. A total of 10 clones were randomly selected and sequenced using M13 primers on ABI 3730 genetic analyzer (Applied Biosystems, Carlsbad, CA).

Determination of genome/segment termini

The extreme 5′ sequences were recovered by performing a 5′-Full RACE kit with TAP (TaKaRa) according to the manufacturer's protocol. Briefly, two gene-specific primers close to the end of the known sequence were designed. The 5′ end of RNA was ligated to the 5′RACE adaptor (without 5′ end dephosphorylating and decapping) and then reverse-transcribed using random 9 mers. The resulting cDNA was used as a template for nested PCR with 5′ RACE primers provided by the kit and gene-specific reverse primers. The PCR products were separated on an agarose gel, cloned into pMD19-T cloning vector, and subsequently sequenced.

The extreme 3′ sequences were recovered by performing a 3′-full RACE Core Set with PrimeScript RTase (TaKaRa) according to the manufacturer's protocols. Because the RNA template lacks a polyadenylated tail, a Poly(A) Tailing Kit (Applied Biosystems) was used to add this to the RNAs prior to first-strand 3′-cDNA synthesis. 20 μl of the Poly(A)-tailing reaction mixture was prepared according to the manufacturer's instructions and was incubated at 37°C for 1 hr before reverse transcription using PrimeScript Reverse Transcriptase. The cDNA was then amplified by nested PCR using the 3′ RACE primers provided by the kit and gene-specific reverse primers. The PCR products were separated on agarose gels, cloned into pMD19-T cloning vector, and subsequently sequenced. The 5′ and 3′ ends of the genome fragment were also determined by RNA circularization. RT-PCR amplification was performed across the ligated termini and the resulting PCR products were subsequently cloned and sequenced.

Phylogenetic analyses

Potential viral proteins identified from this study were aligned with their corresponding homologs of reference negative-sense RNA viruses using MAFFT version 7 and employing the E-INS-i algorithm (Katoh and Standley, 2013). The sequence alignment was limited to conserved domains, with ambiguously aligned regions removed using TrimAl (Capella-Gutierrez et al., 2009). The final alignment lengths were 224 amino acids (aa), 412aa, 727aa, and 364aa for data sets of overall, bunya-arena-like, mononega-like, and orthomyxo-like data sets, respectively. Phylogenetic trees were inferred using the maximum likelihood method (ML) implemented in PhyML version 3.0 (Guindon and Gascuel, 2003), with the WAG + Γ amino acid substitution model and a Subtree Pruning and Regrafting (SPR) topology searching algorithm. Phylogenetic trees were also inferred using a Bayesian method implemented in MrBayes version 3.2.2 (Ronquist and Huelsenbeck, 2003), with the same substitution model as used in ML tree inference. In the MrBayes analyses, we used two simultaneous runs of Markov chain Monte Carlo sampling, and the runs were terminated upon convergence (standard deviation of the split frequencies <0.01). The phylogeny was subsequently summarized from both runs with an initial 10% of trees discarded as burn-in.

Prediction of protein domains and functions

For each of the putative viral protein sequences, we used TMHMM v2.0 (http://www.cbs.dtu.dk/services/TMHMM/) to predict the transmembrane domains, SignalP v4.0 (http://www.cbs.dtu.dk/services/SignalP/) to determine signal sequences, and NetNGlyc v1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/) to identify N-linked glycosylation sites. For some of the highly divergent viruses belonging to the Mononegavirales and the Chuviridae, a protein was regarded as a potential glycoprotein if it contained (i) a N-terminal signal domain, (ii) a C-terminal transmembrane domain, and (iii) glycosylation sites in cytoplasmic domains.

Identification and characterization of endogenous viruses

Endogenous copies of the exogenous negative-sense RNA viruses newly described here were detected using the tBlastn algorithm against arthropod genomes available in the Reference Genomic Sequences Database (refseq_genomic) and Whole Genome Shotgun Database (WGS) in GenBank, and using viral amino acid sequences as queries. The threshold for match was set to 1e−05 for the e-value and 50 amino acids for matched length. The query process was reversed for each potential endogenous virus to determine their corresponding phylogenetic group. Orthologous insertion events were determined by examining flanking gene sequences. Sequence alignment and phylogenetic analyses were carried out as described above.

Characterization of bi-segmented viruses in the Chuviridae

Within the Chuviridae, Wuhan Louse Fly Virus 6 and 7, Wenzhou Crab Virus 2, Lishi Spider Virus 1, and Wuchang Cockroach Virus 3 possessed bi-segmented genomes.

Both segments were discovered using Blastx against pools of predicted proteins from unsegmented chuvirus or mononegavirales sequences. To determine that these sequences were indeed from separate segments, we performed all combinations of head-to-tail RT-PCR which allowed us to ascertain whether the sequence fragments came from a single genome. Furthermore, checking sequencing depth can help to eliminate the possibility of separate contigs being generated due to inadequate sequencing coverage. To prove that a pair of segments belonged to the same virus, we checked: (i) sequencing depth for both segments, (ii) the presence of conserved regulatory sequences at non-coding regions of the genome, (iii) whether there is match for PCR-positive units, and (iv) the phylogenetic positions of the different viral proteins (Figure 7A).

Characterization of a circular genome form within the Chuviridae

The circular genome organization within the Chuviridae was identified after we found that their genome sequences were ‘over assembled’ (i.e., generating contigs that contained more than one genome connected head-to-tail). This circular genomic form was also observed in both segments of the segmented chuviruses (Figure 7B). In addition, RT-PCR and sequencing over the entire genome did not reveal any break-points. As a control, the same protocol failed to connect the genome termini within the Mononegavirales, suggesting the circular genomic form is unique to the chuviruses. To further validate that these genomes are circular, we mapped the high-throughput sequencing reads to these assembled genomes. The coverage and depth were adequate throughout the genome with the exception of one location upstream to the 3′ end of the ORF encoding RdRp (Figure 7C). This genomic location had only 0–20 X coverage depending on the virus, although all RT-PCRs were successful across this location. Interestingly, sequencing of the cloned PCR products revealed extensive sequence variation (i.e., insertions and deletions) (Figure 7C), which is the likely cause of the low sequence coverage in this location. Collectively, these data provide strong evidence for circular genomes in the chuviruses, although this does not exclude the potential presence of linear genomic forms.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
    DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates
    1. O Folmer
    2. M Black
    3. W Hoeh
    4. R Lutz
    5. R Vrijenhoek
    (1994)
    Molecular Marine Biology and Biotechnology 3:294–299.
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
    Virus taxonomy: 9th report of the international committee on taxonomy of viruses
    1. AM King
    2. MJ Adams
    3. EB Carstens
    4. EJ Lefkowitz
    (2012)
    Elsevier Academic Press.
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
    Hemorrhagic fever caused by a novel tick-borne Bunyavirus in Huaiyangshan, China
    1. YZ Zhang
    2. DJ Zhou
    3. Y Xiong
    4. XP Chen
    5. YW He
    6. Q Sun
    7. B Yu
    8. J Li
    9. YA Dai
    10. JH Tian
    11. XC Qin
    12. D Jin
    13. Z Cui
    14. XL Luo
    15. W Li
    16. S Lu
    17. W Wang
    18. JS Peng
    19. WP Guo
    20. MH Li
    21. ZJ Li
    22. S Zhang
    23. C Chen
    24. Y Wang
    25. MD de Jong
    26. J Xu
    (2011)
    Zhonghua Liu Xing Bing Xue Za Zhi = Zhonghua Liuxingbingxue Zazhi 32:209–220.

Decision letter

  1. Stephen P Goff
    Reviewing Editor; Howard Hughes Medical Institute, Columbia University, United States

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Unprecedented genomic diversity of RNA viruses in arthropods reveals the ancestry of negative-sense RNA viruses” for consideration at eLife. Your article has been favorably evaluated by Chris Ponting (Senior editor) and three reviewers, one of whom, Stephen Goff, is a member of our Board of Reviewing Editors.

The Reviewing editor and the other reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.

This paper reports the recovery of sequences of a large number of new RNA viral genomes extracted from arthropods from China. These new sequences have then been placed in a phylogenetic tree of the virus family and reveal a hugely expanded branch of the negative-sense RNA virus tree. There is interesting discussion about the evolution of segmented and nonsegmented viral genomes, a current topic of great interest in virology.

These data provide a significant amount of new food for thought about virus diversity, virus evolution, and host range for this large and important group of viruses. We found the paper well-written, informative, and highly accessible to virologists and especially to nonvirologists. The text is terse and to-the-point. There are also surprises: there is a new family, the so-called Chuviridae. The circular RNA genomes are quite remarkable and require some new twists in RNA genome replication. This should spark considerable new exploration of RNA polymerase functions.

Concerns to address in a resubmission:

The manuscript primarily requires some textual revisions to 'tone down' some of its more general claims and to elaborate on some of the methodological details.

1) The most substantive concern regarding the data is 'where is the rest of it?'. The methods described would also identify positive sense RNA viruses and dsRNA viruses: given this, why is the focus limited to negative sense viruses? Do the other data tell a substantially different/additional/separable story? If they do not (i.e. there is another manuscript coming with substantially the same story for + ssRNA viruses) then we do not think it appropriate to split the dataset.

2) We would encourage some expanded discussion of the implications of the evolution of segmented and nonsegmented viral genomes.

3) We would encourage discussion of the possibility that these viruses were not infecting the presumed host, but were instead infecting any or all of shared gut contents such as animal prey or food plants, gut microflora (yeasts?) or parasites (notably nematodes and/or mites, single-celled eukaryotic parasites).

4) In the Introduction: To say arthropod viruses are 'characterised by' persistent infection, vertical transmission and genomic integration is an extreme overstatement, and is potentially misleading. These things happen, but surely there no evidence that they are even that common, let alone 'characteristic'? It might be arguable for 'persistent infection', but surely the other two really only pertain to a minority? Also, these things seem to be ascribed to all viruses, rather than the–ssRNA that are the focus of the manuscript.

5) In the Results, the authors state that “a key result of this study is that the majority of the diversity of negative-sense RNA viruses now appears to be contained within viruses that utilize arthropods as hosts or vectors”: this over claims the case. This large-scale systematic metagenomic survey (relative to the small-scale ad hoc analyses that represent the majority of the previously described–ssRNA viruses) certainly identifies an exciting diversity; but who's to say that a similar survey of marine molluscs (or basal metazoan?) wouldn't change the perspective again? The only strong claim that can be made here is that viruses are undersampled. This is revisited in the first paragraph of the Discussion section, where the authors acknowledge the problem, but we think it needs to be toned down.

6) We don't really 'buy' the argument in the second paragraph of the Discussion section that 'RNAi enables benign and persistent infections'. Since vertebrates are the only deep eukaryotic lineage (probably) lacking the antiviral RNAi response, this amounts to claiming that vertebrates are unique is not easily developing persistent benign viral infections (what about plants and fungi?). Is this really the case? What evidence is there to support this?

7) It seems a missed opportunity to not discuss timescales. We appreciate this is hard, but given the large diversity of new viruses here (and the ease with which some other RNA viruses form endogenous copies/EVEs) it seems a shame not to search genomic DNA datasets to identify shared ancient insertions that could be used to date some of the splits.

8) In the Results, where two different hosts are identified for a 'single' virus, we need to know the (synonymous site and amino acid) divergence between the two different viral lineages. The timescale over which such transmission occurs is vital—they could be sharing an infection pool, or they might not have had a common ancestor for thousands of years.

9) The experimental approach is quite straightforward, but comprehensive and well conducted, and the results are impressive. But it is unclear what efforts have been taken by the authors to ensure that the viruses were indeed from arthropods and not internal or external contaminants. This point should at least be discussed.

10) The results are presented in a rather cursory manner, and we would like to see a lot more detail, such as fully labeled trees being available in supplementary information.

11) While we appreciate that the RdRp is highly conserved and unlikely to provide strong statistical supports, there are unequivocal statements about phylogenetic relationships that may not have any support. Sequencing of other more variable genes and a more considered analysis would help to resolve this. As it is, it seems that only 171 sequences were generated from this study (sixth paragraph of the Materials and methods section), which is baffling, and it is not clear whether the deep sequencing assemblies are going to be made available anywhere (please provide details).

12) Ideally, we would like to see some attempt made at dating these new viruses but we understand that this could be very challenging.

https://doi.org/10.7554/eLife.05378.023

Author response

1) The most substantive concern regarding the data is 'where is the rest of it?'. The methods described would also identify positive sense RNA viruses and dsRNA viruses: given this, why is the focus limited to negative sense viruses? Do the other data tell a substantially different/additional/separable story? If they do not (i.e. there is another manuscript coming with substantially the same story for + ssRNA viruses) then we do not think it appropriate to split the dataset.

The reviewers are correct that we did indeed discover a number of other RNA viruses (positive-sense, double-stranded) in the arthropods examined here. However, the newly discovered diversity in other virus groups, although novel, is not as comprehensive and all-encompassing as that of the negative-sense RNA viruses described here, and tells a rather different story. In particular, the positive- and double-stranded RNA viruses (which we are still characterizing) tend to fall within the diversity of known families, rather than filling phylogenetic ‘gaps’ as is the case with the negative-sense RNA viruses. In addition, the role played by arthropods in such families as the Flaviviridae and the Picornavirales is somewhat different (and more complex) to what we present here (for example, see our recent paper on the evolutionary and genomic complexity of the Flaviviridae: Qin et al., PNAS, 111, 6744-6749, 2014). We therefore feel that it is difficult to present a single, coherent picture for all these RNA viruses, and that some families of positive-sense viruses will likely merit individual publications. Hence, because the negative-sense RNA viruses are a distinct phylogenetic group, we feel that it is simplest to present a stand-alone paper on these viruses, rather than greatly complicating it with the inclusion of other viral groups.

2) We would encourage some expanded discussion of the implications of the evolution of segmented and nonsegmented viral genomes.

We agree with the reviewers that the implications of virus segmentation could be discussed more thoroughly and we have now added some additional text on this matter. Because we would like to avoid unnecessary speculation, and it seems premature to make strong claims in the absence of a wider sampling of eukaryotic viruses, we have focused this extra text on the future work that needs to be done. The expanded discussion can be found in the sixth paragraph of the Discussion section:

“Independent changes of genome segmentation numbers are also observed in the mononegavirales-like viruses (Figure 4) and, more frequently, in the chuviruses (Figure 7A). Consequently, the number of genome segments appears to be a relatively flexible trait at a broad evolutionary scale, although the functional relevance of these changes remains unclear. While the segmented viruses (bunya-arenaviruses, orthomyxoviruses, and ophioviruses) appear to be distinct from the largely unsegmented mononegavirales-like viruses in our phylogenetic analysis, this may be an artifact of under-sampling, especially given that only a tiny fraction of eukaryotes have been sampled to date. With a wider sample of eukaryotic viruses it will be possible to more accurately map changes in segment number onto phylogenetic trees and in so doing come to a more complete understanding of the patterns and determinants of the evolution of genome segmentation.”

3) We would encourage discussion of the possibility that these viruses were not infecting the presumed host, but were instead infecting any or all of shared gut contents such as animal prey or food plants, gut microflora (yeasts?) or parasites (notably nematodes and/or mites, single-celled eukaryotic parasites).

The reviewers make a very good point, and we agree that the potential source of these viruses should be examined and discussed in more detail. Although it is impossible to achieve a clear picture of the true source for every virus in a metagenomic sequencing project such as this, we found that by measuring the relative abundance of viral transcripts we could at least distinguish those viruses that are clearly infecting the host organism we have sampled (i.e. those with high abundance) from those with possible other origins (which we predict to be of low abundance). Furthermore, the possibility that food/microflora/parasites are the source of the viruses is further reduced if the less abundant viruses appear closely related to the abundant viruses in phylogenetic trees. Collectively, we believe that this approach provides an effective working solution to the source of the viruses observed here. The description of RNA transcript quantification can be found in the Methods section in the seventh paragraph:

“Quantification of relative transcript abundances. Before quantification, we first removed the rRNA reads from the data sets to prevent any bias due to the unequal efficiency of rRNA removal steps during library preparation. To achieve this, we blasted the Trinity assembly results against the SILVER rRNA database (Quast et al., 2013), and then used the resulting rRNA contigs as a template for mapping using BOWTIE2 (Langmead and Salzberg, 2012). The remaining reads from each library were then mapped on to the assembled transcripts and analyzed with RSEM (Li et al., 2010), using the run_RSEM_align_n_estimate.pl scripts implemented in the Trinity program (Grabherr et al., 2011). The relative abundance of each transcript is presented as transcripts per million (TPM) which corrects for the total number of reads as well as for transcript length (Li et al., 2010).”

The new results include an additional figure and an extra “Abundance” column in Tables 2–4. The revised text can be found in the second paragraph of the Results section and reads:

“Next, we measured the abundance of these sequences as the number transcripts per million (TPM) within each library after the removal of rRNA reads. The abundance of viral transcripts calculated in this manner exhibited substantial variation (Figure 2, Tables 2–4): while the least abundant L segment (Shayang Spider Virus 3) contributed to less than 0.001% to the total non-ribosomal RNA content, the most abundant (Sanxia Water Strider Virus 1) was at a frequency of 21.2%, and up to 43.9% if we include the matching M and S segments of the virus. The remaining viral RdRp sequences fell within a range (10-1000 TPM) that matched the abundance level of highly expressed host mitochondrial genes (Figure 2).”

Finally, in the Discussion section of the manuscript, we examined the potential causes of this variation of abundance level, including the possibility that they did not infect the presumed host. The revised text can be found in the third paragraph and reads as follows:

“The viruses discovered here also exhibited a huge variation in level of abundance. It is possible that this variation is in part due to the stage or severity of infection in individual viruses, and may be significantly influenced by the process of pooling, since most of our libraries contain an uneven mixture of different host species or even genera. In addition, it is possible that some low abundance viruses may in fact be derived from other eukaryotic organisms present in the host sampled, such as undigested food or prey, gut micro flora, and parasites. Nevertheless, since the majority of the low abundance viruses appear in the same groups as the highly abundant ones in our phylogenetic analyses, these viruses are most likely associated with arthropods.”

4) In the Introduction: To say arthropod viruses are 'characterised by' persistent infection, vertical transmission and genomic integration is an extreme overstatement, and is potentially misleading. These things happen, but surely there no evidence that they are even that common, let alone 'characteristic'? It might be arguable for 'persistent infection', but surely the other two really only pertain to a minority? Also, these things seem to be ascribed to all viruses, rather than the–ssRNA that are the focus of the manuscript.

We agree with the reviewers that this was an overstatement and the relevant text has therefore been deleted.

5) In the Results, the authors state that “a key result of this study is that the majority of the diversity of negative-sense RNA viruses now appears to be contained within viruses that utilize arthropods as hosts or vectors”: this over claims the case. This large-scale systematic metagenomic survey (relative to the small-scale ad hoc analyses that represent the majority of the previously described–ssRNA viruses) certainly identifies an exciting diversity; but who's to say that a similar survey of marine molluscs (or basal metazoan?) wouldn't change the perspective again? The only strong claim that can be made here is that viruses are undersampled. This is revisited in the first paragraph of the Discussion section, where the authors acknowledge the problem, but we think it needs to be toned down.

We apologize for the confusion caused by our wording, which we agree was too strong. The intended point was simply that currently known virus diversity (which is largely from vertebrates and plants) is mainly nested within the viral diversity newly discovered in arthropods. The relevant context has therefore been changed in the fourth paragraph of the Results section:

“A key result of this study is that the much of the genetic diversity of negative-sense RNA viruses in vertebrates and plants now appears to be contained within viruses that utilize arthropods as hosts or vectors.”

And also in the first paragraph of the Discussion section:

“Our study suggests that arthropods are major reservoir hosts for many, if not all, of the negative-sense RNA viruses in vertebrates and plants, and hence have likely played a major role in their evolution.”

6) We don't really 'buy' the argument in the second paragraph of the Discussion section that 'RNAi enables benign and persistent infections'. Since vertebrates are the only deep eukaryotic lineage (probably) lacking the antiviral RNAi response, this amounts to claiming that vertebrates are unique is not easily developing persistent benign viral infections (what about plants and fungi?). Is this really the case? What evidence is there to support this?

The reviewers make a fair point. Accordingly, the relevant text has been deleted, and replaced by the following text in the second paragraph of the Discussion section:

“Furthermore, arthropods are involved in almost all ecological guilds and actively interact with other eukaryotes, including animals, plants and fungi, such that it is possible that they serve as both sources and sinks for viruses present in the environment. In addition, not only were diverse viruses present, but they were often highly abundant. For example, in the pool containing twelve individuals (representing two species) from the Gerridae (Water striders) collected at the same site, we identified at least five negative-sense RNA viruses whose TPM values are well above 100, and where the viral RNA collectively made up more than 50% of the host total RNA (rRNA excluded). Determining why arthropods are able to carry such a large viral diversity and at such frequencies clearly merits further investigation.”

7) It seems a missed opportunity to not discuss timescales. We appreciate this is hard, but given the large diversity of new viruses here (and the ease with which some other RNA viruses form endogenous copies/EVEs) it seems a shame not to search genomic DNA datasets to identify shared ancient insertions that could be used to date some of the splits.

As the reviewers correctly state, inferring the time-scale of this evolutionary history is very difficult (although it is clearly ancient) and something we considered closely (because it is obviously very interesting). We did find a large number of potential EVEs using tBlastx (please see the new Table 5 in the paper). Interestingly, most of these EVE sequences have disrupted reading frames and many are found within transposon elements, suggesting that the transposons have been central to their integration. Nevertheless, and critically for the context of molecular clock dating, the EVEs we discovered do not share homologous integration sites in the host genome as revealed by an analysis of flanking sequences (i.e. they are likely not orthologous and orthology greatly aids dating). In addition, many are in short contig/scaffolds such that it is difficult to trace their locations. Furthermore, phylogenetic analyses of those EVEs shared among different host species revealed extremely complex tree topologies which do not reflect the host phylogeny; these are indicative of independent integration events that again prohibit effective molecular clock dating (please find an example tree in the new Figure 11). In sum, after a careful analysis of the EVE data, we could not find any clear examples that could provide a reliable time-scale of viral evolution. In addition, the general incongruence between the (exogenous) virus and host phylogenies precluded any reliable molecular clock dating analysis. We would therefore prefer not to include any such analysis as it will clearly only add error (i.e. we cannot safely say that these dating estimates will be accurate). These points are now made in the revised version of the paper. In addition, we have added some text outlining the discovery and characterization of the novel endogenous viruses. The description of the methods can be found in the thirteenth paragraph of the Materials and methods section and reads:

“Identification and characterization of endogenous viruses. Endogenous copies of the exogenous negative-sense RNA viruses newly described here were detected using the tBlastn algorithm against arthropod genomes available in the Reference Genomic Sequences Database (refseq_genomic) and Whole Genome Shotgun Database (WGS) in GenBank, and using viral amino acid sequences as queries. The threshold for match was set to 1e-05 for the e-value and 50 amino acids for matched length. The query process was reversed for each potential endogenous virus to determine their corresponding phylogenetic group. Orthologous insertion events were determined by examining flanking gene sequences. Sequence alignment and phylogenetic analyses were carried out as described above.”

These results are described in a new Figure 11 and Table 5. The related text can be found in the last paragraph of the Results section and reads:

“Novel Endogenous Virus Elements (EVEs). As well as novel exogenous RNA viruses, our metagenomic analysis also revealed a large number of potential EVEs (Katzourakis and Gifford, 2010) in more than 40 arthropod species: these resembled complete or partial genes of the major proteins—the nucleoprotein, glycoprotein and RdRp—but without fully intact genomes (Table 5). As expected given their endogenous status, most of these sequences have disrupted reading frames and many are found within transposon elements, suggesting that transposons have been central to their integration. Interestingly, in some cases, such as the putative glycoprotein gene of chuviruses, the homologous EVEs from within a family (Culicidae) or even an order (Hymenoptera) form monophyletic groups (Figure 11). However, they are unlikely to be orthologous because they do not share homologous integration sites in the host genome as determined by an analysis of flanking sequences, which in turn limited the applicability of molecular-clock based dating techniques. Furthermore, phylogenetic analyses of those EVEs shared among different host species revealed extremely complex tree topologies which do not exhibit simple matches to the host phylogeny at both the species and genera levels (Figure 11B,C). In sum, these results suggest that EVEs are relative commonplace in arthropod genomes and have been often generated by multiple and independent integration events.”

8) In the Results, where two different hosts are identified for a 'single' virus, we need to know the (synonymous site and amino acid) divergence between the two different viral lineages. The timescale over which such transmission occurs is vital—they could be sharing an infection pool, or they might not have had a common ancestor for thousands of years.

The genetic diversity of these viruses is very limited across different hosts and the viral phylogenies do not reflect those of their hosts. As a consequence, it seems reasonable to conclude that they are likely from the same infection pool. We have included a new figure illustrating the relationship between these viruses in the context of multiple hosts, and the revised text can be found in the fourth paragraph of the Results section:

“Interestingly, host species in the same niche had similar viral contents that were generally incongruent with the host phylogeny (Figure 6). Such a pattern is indicative of frequent cross-species and occasional cross-genus virus transmission in the context of ecological and geographic proximity.”

9) The experimental approach is quite straightforward, but comprehensive and well conducted, and the results are impressive. But it is unclear what efforts have been taken by the authors to ensure that the viruses were indeed from arthropods and not internal or external contaminants. This point should at least be discussed.

Please refer to our response to Major Comment #3.

10) The results are presented in a rather cursory manner, and we would like to see a lot more detail, such as fully labeled trees being available in supplementary information.

We apologize for the over-simplification of the Results; we were trying to be as succinct as possible. Detailed phylogenic trees have now been included as Figure 3–figure supplement 3.

11) While we appreciate that the RdRp is highly conserved and unlikely to provide strong statistical supports, there are unequivocal statements about phylogenetic relationships that may not have any support. Sequencing of other more variable genes and a more considered analysis would help to resolve this. As it is, it seems that only 171 sequences were generated from this study (sixth paragraph of the Materials and methods section), which is baffling, and it is not clear whether the deep sequencing assemblies are going to be made available anywhere (please provide details).

We agree with the reviewers that the RdRp phylogeny may lack resolution if: (i) the sequences are too conserved (i.e. below species level), and (ii) the sequence are too divergent such that they do not share conserved domains. However, in our case, the replicase (RdRp) domains were well conserved across the negative-sense RNA viruses such that a reliable topology could be inferred (and very similar phylogenies were obtained using both ML and Bayesian methods). In contrast, the other major viral genes (glycoproteins and nucleoprotein) are far more divergent and lack conserved domains across genera (some are not even homologous), and therefore they are not helpful in a phylogenetic analysis aimed at resolving ancient relationships. As the result, the only viable way to infer the deep phylogenetic relationships among negative-sense viruses is with the RdRp which, of course, is why there is a rich literature on the use of this gene but not any others. We therefore strongly believe that our phylogenetic analysis of RdRp is the only scientifically rigorous way to proceed.

To clarify, the 171 sequences we deposited in the GenBank are confirmed complete or near complete genomes of all the negative-sense RNA viruses found in this study. We agree with the reviewers that raw data should be made available and, consequently, all the reads generated in this study have now been uploaded to NCBI Sequence Read Achieve. The corresponding description is added in the fourth paragraph of the Materials and methods section:

“All sequence reads generated in this study were uploaded onto NCBI Sequence Read Achieve (SRA) database under the BioProject accession SRP051790.”

12) Ideally, we would like to see some attempt made at dating these new viruses but we understand that this could be very challenging.

Please refer to our response to Major Comment #7. Sadly, we believe that dating is too difficult to perform on these data and will just add error.

https://doi.org/10.7554/eLife.05378.024

Article and author information

Author details

  1. Ci-Xiu Li

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, China
    Contribution
    C-XL, Acquisition of data, Analysis and interpretation of data
    Contributed equally with
    Mang Shi, Jun-Hua Tian, Xian-Dan Lin and Yan-Jun Kang
    Competing interests
    The authors declare that no competing interests exist.
  2. Mang Shi

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, China
    3. Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, Australia
    Contribution
    MS, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Ci-Xiu Li, Jun-Hua Tian, Xian-Dan Lin and Yan-Jun Kang
    Competing interests
    The authors declare that no competing interests exist.
  3. Jun-Hua Tian

    Wuhan Center for Disease Control and Prevention, Wuhan, China
    Contribution
    J-HT, Acquisition of data, Analysis and interpretation of data, Contributed unpublished essential data or reagents
    Contributed equally with
    Ci-Xiu Li, Mang Shi, Xian-Dan Lin and Yan-Jun Kang
    Competing interests
    The authors declare that no competing interests exist.
  4. Xian-Dan Lin

    Wenzhou Center for Disease Control and Prevention, Wenzhou, China
    Contribution
    X-DL, Acquisition of data, Analysis and interpretation of data, Contributed unpublished essential data or reagents
    Contributed equally with
    Ci-Xiu Li, Mang Shi, Jun-Hua Tian and Yan-Jun Kang
    Competing interests
    The authors declare that no competing interests exist.
  5. Yan-Jun Kang

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, China
    Contribution
    Y-JK, Acquisition of data, Analysis and interpretation of data, Contributed unpublished essential data or reagents
    Contributed equally with
    Ci-Xiu Li, Mang Shi, Jun-Hua Tian and Xian-Dan Lin
    Competing interests
    The authors declare that no competing interests exist.
  6. Liang-Jun Chen

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, China
    Contribution
    L-JC, Acquisition of data, Analysis and interpretation of data, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  7. Xin-Cheng Qin

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, China
    Contribution
    X-CQ, Acquisition of data, Analysis and interpretation of data, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  8. Jianguo Xu

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, China
    Contribution
    JX, Acquisition of data, Analysis and interpretation of data, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  9. Edward C Holmes

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, Australia
    Contribution
    ECH, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  10. Yong-Zhen Zhang

    1. State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
    2. Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, China
    Contribution
    Y-ZZ, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    For correspondence
    zhangyongzhen@icdc.cn
    Competing interests
    The authors declare that no competing interests exist.

Funding

National Natural Science Foundation of China (NSFC) (81290343, 81273014)

  • Ci-Xiu Li
  • Mang Shi
  • Jun-Hua Tian
  • Xian-Dan Lin
  • Yan-Jun Kang
  • Liang-Jun Chen
  • Xin-Cheng Qin
  • Jianguo Xu
  • Edward C Holmes
  • Yong-Zhen Zhang

Ministry of Science and Technology of the People's Republic of China (2014ZX10004001-005)

  • Yong-Zhen Zhang

National Health and Medical Research Council (NHMRC) (AF30)

  • Edward C Holmes

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was supported by National Natural Science Foundation of China (Grants 81290343, 81273014), the 12th Five-Year Major National Science and Technology Projects of China (2014ZX10004001-005). ECH is funded by an NHMRC Australia Fellowship (AF30). The authors sincerely thank Xiu-Nian Diao (Veterinary Station, Jiulingtuan of Wushi, Bole, Xinjiang Uygur Autonomous Region, China) and Ming-Hui Chen (Veterinary Station, Emin, Jiushi, Xinjiang Uygur Autonomous Region, China) for their assistance in sampling.

Reviewing Editor

  1. Stephen P Goff, Reviewing Editor, Howard Hughes Medical Institute, Columbia University, United States

Publication history

  1. Received: October 29, 2014
  2. Accepted: January 27, 2015
  3. Accepted Manuscript published: January 29, 2015 (version 1)
  4. Version of Record published: March 9, 2015 (version 2)

Copyright

© 2015, Li et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 6,222
    Page views
  • 1,375
    Downloads
  • 60
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Epidemiology and Global Health
    2. Microbiology and Infectious Disease
    Teja Turk et al.
    Research Article Updated
    1. Biochemistry
    2. Biophysics and Structural Biology
    Lindsay D Clark et al.
    Research Article Updated