Abstract

Shotgun metagenomic sequencing is a powerful approach to study microbiomes in an unbiased manner and is of increasing relevance for identifying novel enzymatic functions. However, the potential of metagenomics to relate microbiome composition to function has thus far been underutilized. Here, we introduce the Metagenomics Genome-Phenome Association (MetaGPA) study framework, which links genetic information in metagenomes with a dedicated functional phenotype. We applied MetaGPA to identify enzymes associated with cytosine modifications in environmental samples. From the 2365 genes that met our significance criteria, we confirmed known pathways for cytosine modifications and proposed novel cytosine-modifying mechanisms. Specifically, we identified and characterized a novel nucleic acid modifying enzyme, 5-hydroxymethylcytosine carbamoyltransferase, that catalyzes the formation of a previously unknown cytosine modification, 5-carbamoyloxymethylcytosine, in DNA and RNA. Our work introduces MetaGPA as a novel and versatile tool for advancing functional metagenomics.
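
To make the idea of a genome-phenome association concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes that each gene family has been counted in contigs from a phenotype-selected library and from a control library, and that enrichment is scored with a one-sided Fisher's exact test. The test choice, the function names, the family labels, and the counts are all illustrative assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: selected contigs carrying the gene family
    b: selected contigs lacking it
    c: control contigs carrying it
    d: control contigs lacking it
    """
    row1 = a + b          # all selected contigs
    col1 = a + c          # all contigs carrying the family
    n = a + b + c + d     # all contigs
    total = comb(n, row1)
    upper = min(row1, col1)
    # Hypergeometric tail: probability of seeing a or more carriers
    # among the selected contigs if carriers were distributed at random.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / total


def rank_families(counts, n_selected, n_control, alpha=0.05):
    """Return (family, p-value) pairs with p < alpha, most significant first.

    counts maps a hypothetical family name to
    (contigs carrying it in the selected set, contigs carrying it in the control set).
    """
    hits = []
    for family, (in_selected, in_control) in counts.items():
        p = fisher_exact_greater(in_selected, n_selected - in_selected,
                                 in_control, n_control - in_control)
        if p < alpha:
            hits.append((family, p))
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up counts for illustration only.
    toy_counts = {"family_A": (40, 5), "family_B": (12, 11)}
    for family, p in rank_families(toy_counts, n_selected=100, n_control=100):
        print(family, p)
```

In a real analysis the p-values would need correction for the number of families tested, and the significance criteria used in the study itself should be taken from the full article.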

Data availability

All raw and processed sequencing data generated in this study have been submitted to the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) under accession number PRJNA714147.

Article and author information

Author details

  1. Weiwei Yang

    Research department, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Weiwei Yang: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  2. Yu-Cheng Lin

    Research department, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Yu-Cheng Lin: The author was an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  3. William Johnson

    Research department, New England Biolabs Inc, Ipswich, United States
    Competing interests
    William Johnson: The author was an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  4. Nan Dai

    RNA Biology, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Nan Dai: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  5. Romualdas Vaisvila

    Research department, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Romualdas Vaisvila: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  6. Peter Weigele

    Research department, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Peter Weigele: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  7. Yan-Jiun Lee

    Research department, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Yan-Jiun Lee: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  8. Ivan R Corrêa Jr

    RNA Biology, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Ivan R Corrêa: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
    ORCID: 0000-0002-3169-6878
  9. Ira Schildkraut

    Research department, New England Biolabs Inc, Ipswich, United States
    Competing interests
    Ira Schildkraut: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
  10. Laurence Ettwiller

    Research department, New England Biolabs Inc, Ipswich, United States
    For correspondence
    laurence.ettwiller@gmail.com
    Competing interests
    Laurence Ettwiller: The author is an employee of New England Biolabs Inc., a manufacturer of restriction enzymes and molecular reagents.
    ORCID: 0000-0002-3957-6539

Funding

New England Biolabs (no data)

  • Weiwei Yang
  • Yu-Cheng Lin
  • William Johnson
  • Nan Dai
  • Romualdas Vaisvila
  • Peter Weigele
  • Yan-Jiun Lee
  • Ivan R Corrêa Jr
  • Ira Schildkraut
  • Laurence Ettwiller

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. María Mercedes Zambrano, CorpoGen, Colombia

Version history

  1. Preprint posted: March 23, 2021
  2. Received: May 4, 2021
  3. Accepted: November 5, 2021
  4. Accepted Manuscript published: November 8, 2021 (version 1)
  5. Version of Record published: December 14, 2021 (version 2)

Copyright

© 2021, Yang et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,649 views
  • 211 downloads
  • 4 citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Cite this article

Weiwei Yang, Yu-Cheng Lin, William Johnson, Nan Dai, Romualdas Vaisvila, Peter Weigele, Yan-Jiun Lee, Ivan R Corrêa Jr, Ira Schildkraut, Laurence Ettwiller (2021)
A Genome-Phenome Association study in native microbiomes identifies a mechanism for cytosine modification in DNA and RNA
eLife 10:e70021.
https://doi.org/10.7554/eLife.70021
