When scientists hunt for new DNA sequences, they sometimes get a lot more than they bargained for. Such is the case in metagenomic surveys, which analyze not just the DNA of a particular organism, but all the DNA in an environment at large. A vexing problem with these surveys is the overwhelming number of detected DNA sequences that are so different from those of any known microbe that they cannot be classified using traditional approaches. However, some of these “known unknowns” are undoubtedly viral sequences, because only a fraction of the enormous diversity of viruses has been characterized.
This “viral dark matter” is a major obstacle for those studying viruses, and it led Tisza et al. to attempt to classify some of the unknown viral sequences in their metagenomic surveys. The search, which specifically focused on viruses with circular DNA genomes, detected over 2,500 circular viral genomes. Intensive analysis revealed that many of these genomes had a similar makeup to previously discovered viruses, but hundreds of them were totally different from any known virus, based on typical methods of comparison.
Computational analysis of genes that were conserved among some of these brand-new circular sequences often revealed virus-like features. Experiments on a few of these genes showed that they encoded proteins capable of forming particles reminiscent of characteristic viral shells, implying that these new sequences are indeed viruses.
Tisza et al. have added the more than 2,500 newly characterized viral sequences to the publicly accessible GenBank database, and the sequences are being considered for the more authoritative RefSeq database, which currently contains around 9,000 complete viral genomes. The expanded databases will hopefully better equip scientists to explore the enormous diversity of viruses and help medics and veterinarians to detect disease-causing viruses in humans and other animals.