Global analysis of cytosine and adenine DNA modifications across the tree of life

  1. Sreejith Jayasree Varma
  2. Enrica Calvani
  3. Nana-Maria Grüning
  4. Christoph B Messner
  5. Nicholas Grayson
  6. Floriana Capuano
  7. Michael Mülleder
  8. Markus Ralser (corresponding author)
  1. Department of Biochemistry, Charité Universitätsmedizin, Germany
  2. The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, United Kingdom
  3. Department of Biochemistry and Cambridge Systems Biology Center, University of Cambridge, United Kingdom
  4. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, United Kingdom
  5. Core Facility-High Throughput Mass Spectrometry, Charité Universitätsmedizin, Germany


Interpreting the function and metabolism of enzymatic DNA modifications requires both position-specific and global quantities. Sequencing-based techniques that deliver the former have become broadly accessible, but analytical methods for the global quantification of DNA modifications have thus far been applied mostly to individual problems. We established a mass spectrometric method for the sensitive and accurate quantification of multiple enzymatic DNA modifications. Then, we isolated DNA from 124 archaeal, bacterial, fungal, plant, and mammalian species, as well as from several tissues, and created a resource of global DNA modification quantities. Our dataset provides insights into the general nature of enzymatic DNA modifications, reveals unique biological cases, and provides complementary quantitative information to normalize and assess the accuracy of sequencing-based detection of DNA modifications. We report that only three of the studied DNA modifications, methylcytosine (5mdC), methyladenine (N6mdA), and hydroxymethylcytosine (5hmdC), were detected above a picomolar detection limit across species, and dominated in higher eukaryotes (5mdC), in bacteria (N6mdA), or in the vertebrate central nervous system (5hmdC). All three modifications were detected simultaneously in only one of the tested species, Raphanus sativus. In contrast, these modifications were either absent or detected only at trace quantities across all yeast and insect genomes studied. Further, we reveal interesting biological cases. For instance, in Allium cepa, Helianthus annuus, or Andropogon gerardii, more than 35% of cytosines were methylated. Additionally, next to the mammalian CNS, 5hmdC was also detected in plants like Lepidium sativum and was found on 8% of cytosines in the Garra barreimiae brain samples. By identifying unexpected levels of DNA modifications in several wild species, our resource underscores the need to address biological diversity when studying DNA modifications.

Editor's evaluation

DNA methylation is an important mechanism to control gene expression, yet methods for quantifying global DNA methylation are limited. This work provides a new sensitive method for the quantitation of global DNA methylation and applies it to over 100 species of eukaryotes and prokaryotes, finding interesting differences across species. This is a useful tool and resource for those interested in DNA methylation and evolution.


Enzyme-catalyzed DNA modifications are studied for their roles in chromatin structure, gene-expression regulation, prevention of viral DNA integration, epigenetic inheritance, cell–environment interactions, developmental biology, immunity, memory, aging, and cancer (Miller and Grant, 2013; Breiling and Lyko, 2015; Guo et al., 2011; Jessop et al., 2018; de la Calle-Fabregat et al., 2020; Day and Sweatt, 2010; Masser et al., 2018; Han et al., 2019; Cusack et al., 2020; Day, 2017). The methylation of the fifth carbon (C5) of the cytosine ring to yield 5-methyl-2′-deoxycytidine (5mdC) was the first nucleotide modification to be discovered (Hotchkiss, 1948) and has remained the most intensively studied (Umer and Herceg, 2013; Smith and Meissner, 2013). 5mdC can be enzymatically oxidized into 5-hydroxymethyl-2′-deoxycytidine (5hmdC) and further into 5-formyl-2′-deoxycytidine (fdC) and 5-carboxyl-2′-deoxycytidine (cadC) (Hu et al., 2015; Ito et al., 2011; Tahiliani et al., 2009). Although these modifications have been described as transient intermediates of 5mdC demethylation, at least one (5hmdC) has been found to accumulate in the mammalian brain, specifically in the large Purkinje neurons, indicating a regulatory function (Kriaucionis and Heintz, 2009). N4-methyl-2′-deoxycytidine (4mdC), found in bacteria, is yet another form of cytosine modification (Janulaitis et al., 1983; Ehrlich et al., 1987). Cytosine thus exists in multiple chemical states (dC, 5mdC, 5hmdC, fdC, cadC, 4mdC, as well as the rare 4,5-dimethyl-2′-deoxycytidine [4,5dmdC]) (Umer and Herceg, 2013; Klimasauskas et al., 2002). Another important modification is the N6 methylation of adenine. N6-methyl-2′-deoxyadenosine (N6mdA) was initially discovered in bacterial genomes (Dunn and Smith, 1955) and later also in archaea, plants, and nematodes (Couturier and Lindås, 2018; Liang et al., 2018). 
Although N6mdA is not essential in microbial model organisms, this modification has been increasingly associated with functions that promote virulence or counteract viral DNA integration (Heusipp et al., 2007; O’Brown and Greer, 2016). Indeed, it seems likely that DNA modifications play different roles in different species, as indicated by the varying amounts of DNA modifications across model organisms. For instance, Arabidopsis thaliana has orders of magnitude higher levels of 5mdC than the dominant insect model Drosophila melanogaster, while the dominant yeast model organism Saccharomyces cerevisiae lacks this modification altogether (Münzel et al., 2011; Capuano et al., 2014).

Until recently, studying DNA modifications was technically challenging, and information concerning their content and function was scarce for species other than model organisms, several crops, and humans. Moreover, it was rather difficult to translate the knowledge derived from those intensively studied species into a broader biological context. For instance, without a broader multi-species dataset for comparison, it is hard to judge from the current literature whether the low amount of DNA modifications in laboratory yeast and D. melanogaster, or the high amount in A. thaliana (Capuano et al., 2014), represents the rule or the exception in the respective phylogenetic group.

In addition to the position-specific information provided by sequencing technologies (Chen et al., 2020; Liu et al., 2019), global quantities of DNA modifications are required to obtain a complete picture of their function and metabolism. For instance, quantitative values are required to determine the activity of the biochemical pathways that modify nucleic acids. Moreover, there are roles of DNA modifications that do not necessarily depend on their specific location in the genome, as in anti-viral immunity. Also, there might be relationships between different modifications that depend on their chemistry rather than their function. Last but not least, absolute concentrations can help to normalize the values provided by sequencing technologies and to assess their false positive and false negative rates. We and others (Capuano et al., 2014; Le et al., 2011; Chowdhury et al., 2017; Tang et al., 2015; Chilakala et al., 2019; Gosselt et al., 2019) have shown previously that targeted mass spectrometry is an ideal technology to determine absolute quantities of DNA modifications, specifically if they are of low abundance and in the noise range of sequencing technologies. Mass spectrometry is further suitable for studying poorly characterized species, as no prior knowledge about the genome is required for data analysis. Aside from that, targeted mass spectrometry is economical, with running costs per sample amounting to single-digit dollars. For these reasons, mass-spectrometric quantification is well suited for identifying interesting patterns in the amount and relative abundances of DNA modifications, specifically within understudied species.

Results and discussion

Global quantification of a panel of enzymatic DNA modifications using liquid chromatography/multiple reaction monitoring

In order to quantify the global levels of multiple enzymatic DNA modifications in a single analysis, we expanded a previous method that is based on liquid chromatography-multiple reaction monitoring (LC-MRM) and was designed for the quantification of 5mdC (Tsuji et al., 2014). This method is characterized by a sensitivity down to attomoles and a broad dynamic range, and discriminates between RNA and DNA modifications, clarifying the previously debated content of 5mdC in several yeast species (Capuano et al., 2014). In this method, isolated DNA is first enzymatically digested to obtain the corresponding nucleosides using a nuclease enzyme mixture (DNA Degradase Plus, Zymo Research). The resulting digest is directly analyzed by a targeted LC-MRM assay on a triple quadrupole (QQQ) mass spectrometer. Nucleosides arising from a DNA monomer are distinguished from those of a potentially co-purified RNA monomer on the basis of the precursor mass difference of the sugar moiety. Because many base modifications are also present in RNA, this strategy ensures that the measured nucleosides are free from RNA contamination (Capuano et al., 2014; Tsuji et al., 2014). For quantifying other DNA modifications, namely 5hmdC, N6mdA, cadC, and fdC, we obtained synthetic standards for these molecules and optimized the instrumental and chromatography parameters accordingly (Tables 1 and 2; Figure 1—figure supplement 1). Moreover, we supplemented the method with a neutral loss scan as a strategy to confirm the MRM results, as well as to detect additional modifications, such as 4mdC, that were not included among the standards. Combined with the high sensitivity offered by a triple quadrupole mass spectrometer (Agilent 6470), we were able to achieve detection limits in picomolar ranges (Figure 1A).
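The DNA/RNA discrimination described above follows from simple mass arithmetic, which the Python sketch below illustrates. It is not the authors' acquisition method: the elemental compositions are standard chemistry, but the function names and the choice of 5-methylcytidine as the RNA counterpart of 5mdC are illustrative assumptions, and the printed values are computed monoisotopic masses rather than the instrument transitions of Table 2.

```python
# Monoisotopic masses (u) of the elements found in nucleosides.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # added for singly protonated ions in positive mode


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mh_plus(formula):
    """m/z of the [M+H]+ ion of a neutral molecule."""
    return mass(formula) + PROTON


# Illustrative pair (assumption): the same modified base on a DNA or an RNA sugar.
NUCLEOSIDES = {
    "5mdC (DNA)": {"C": 10, "H": 15, "N": 3, "O": 4},  # 5-methyl-2'-deoxycytidine
    "5mC (RNA)": {"C": 10, "H": 15, "N": 3, "O": 5},   # 5-methylcytidine
}
BASE_5MC = {"C": 5, "H": 7, "N": 3, "O": 1}  # free 5-methylcytosine

for name, formula in NUCLEOSIDES.items():
    precursor = mh_plus(formula)
    product = mh_plus(BASE_5MC)  # protonated base after loss of the sugar
    print(f"{name}: precursor {precursor:.4f} -> product {product:.4f} "
          f"(neutral loss {precursor - product:.4f})")
```

Both nucleosides fragment to the same protonated base, but their precursors differ by one oxygen atom (about 15.995 u). The corresponding neutral losses, about 116.047 u for 2-deoxyribose and 132.042 u for ribose, are what a neutral loss scan for DNA nucleosides can exploit.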

Table 1
Concentrations of pure nucleoside standards and their sources.

Molecule | Vendor/code | Pure stock concentration (µM) | Pool concentration (µM)
2dC | Sigma/D3897-100MG | 5,000 | 100
5hmdC | Berry and Associates/PY7588 | 0.5 | 0.04
5mdC | Santa Cruz/sc-278256 | 100 | 0.02
cadC | Berry and Associates/PY7593 | 0.5 | 0.02
dA | Sigma/D7400-250MG | 5,000 | 100
dG | Sigma/854999 | 5,000 | 100
fdC | Berry and Associates/PY7589 | 0.5 | 0.02
N6mdA | Alfa Aesar/J64961 | 0.5 | 0.02
T | Sigma/89270-1G | 5,000 | 100

Table 2
Retention times and transitions for nucleosides analyzed.

Molecule | Precursor ion | Qualifier product ion | Quantifier product ion | Retention time (min)

Figure 1 (with 4 supplements)
Quantification of DNA modifications across species.

(A) Multiplex analysis of various genomic DNA modifications using liquid chromatography-multiple reaction monitoring following enzymatic digestion of DNA. The regression curves and limit of detection (LOD) for modifications 5mdC, 5hmdC, and N6mdA are represented. Although our method also quantifies cadC and fdC, we did not detect significant concentrations of these in any of the measured samples; these modifications were hence omitted from the graphical illustrations. (B) A total of 286 tissue samples from 124 species were analyzed in the present study: 19 species from plants, 12 from animals, 6 from yeast, 2 from archaea, and 85 from bacteria. (C–D) Distribution of 5mdC, 5hmdC, and N6mdA across (C) archaeal, bacterial, and eukaryotic domains, and (D) animal, fungi, monera, plant, and protozoan kingdoms. The values depict percentage of cytosine residues bearing either methyl (%5mdC) or hydroxymethyl (%5hmdC) modification and percentage of adenine residues bearing methyl modification (N6mdA). Percentage modifications were calculated as ratio of modified cytosine residue and guanosine for 5mdC and 5hmdC; and ratio of modified adenine residue and thymine for N6mdA. The limits of detection for 5mdC, 5hmdC, and N6mdA are 4.6 nM, 320 pM, and 19 pM, respectively.
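The percentage calculation stated in the legend can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis script: the function name and the example concentrations are hypothetical, and applying the detection limits directly to the measured nucleoside concentrations is an assumption. The denominators (guanosine for cytosine modifications, thymidine for N6mdA) and the limits of detection are taken from the legend.

```python
# Limits of detection from the Figure 1 legend, converted to nM.
LOD_NM = {"5mdC": 4.6, "5hmdC": 0.320, "N6mdA": 0.019}

# Reference nucleoside used as denominator for each modification (Figure 1 legend).
REFERENCE = {"5mdC": "dG", "5hmdC": "dG", "N6mdA": "T"}


def percent_modification(conc_nm):
    """Map each modification to its percentage, or None if below the LOD.

    conc_nm: measured nucleoside concentrations in nM, keyed by nucleoside name.
    """
    result = {}
    for modification, reference in REFERENCE.items():
        value = conc_nm.get(modification, 0.0)
        if value < LOD_NM[modification]:
            result[modification] = None  # treated as not detected
        else:
            result[modification] = 100.0 * value / conc_nm[reference]
    return result


# Hypothetical digest, for illustration only (not data from this study).
example = {"dG": 10000.0, "T": 15000.0, "5mdC": 500.0, "5hmdC": 0.1, "N6mdA": 6.0}
print(percent_modification(example))
```

Using guanosine as the denominator works because, in double-stranded DNA, every cytosine in any modification state pairs with a guanine, so the guanosine concentration equals the total cytosine concentration. The same reasoning applies to thymidine and adenine.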

Upon setting up the method, we sampled cells or tissues for a large number of species across the three domains of life. Because our method does not include any amplification steps and detects modifications on the DNA directly, it requires clean DNA at microgram levels, at least for the detection of low-abundance DNA modifications. Unfortunately, for some rare specimens we only had limited sample amounts, and in many cases standard DNA preparation protocols did not yield DNA of sufficient quality or concentration for our assay. However, by combining different protocols and sources, we were able to obtain clean DNA at microgram levels for 286 distinct tissues. To isolate DNA, we mostly employed a spin-column kit (Genomic-tip 20/G, Qiagen), which is chemically mild to DNA, and avoided strategies that involve the use of oxidants and reactive chemicals. However, for plant species, due to their biochemical composition, we were forced to use phenol–chloroform extraction to obtain sufficient quantities of DNA. In such cases, reagents like β-mercaptoethanol (2-sulfanylethan-1-ol) were included to keep DNA damage to a minimum during the extraction. The obtained DNAs were from 124 different species, including 85 bacterial species, 6 yeast species, 2 archaeal species, 19 plant species, and 18 tissue and cell-culture samples from multiple animal species, including human and mouse. The collection included the typical model organisms and, specifically for bacteria, vertebrates, and plants, a significant number of species that have been barely characterized at the molecular level so far (Figure 1B).
Furthermore, for a number of vertebrates, including human, the model organisms mouse (Mus musculus) and African clawed frog (Xenopus laevis), but also some less studied species, namely the opossum (Monodelphis domestica), the Alpine marmot (Marmota marmota), and the Oman garra (Garra barreimiae), we obtained DNA from multiple tissues and/or cell lines in order to quantify tissue differences in the absolute DNA modification content. For plants, we focused on seedlings that were germinated in the lab (Varma and Calvani, 2022). The seedlings not only allowed for efficient DNA extraction, which can be hampered by high concentrations of plant polymers in fully differentiated plant tissues, but also for direct comparison between the plants at a similar developmental stage. Multiple species were analyzed in replicates to identify the extent of variation in the analytical technique; this revealed reasonably consistent values for modifications measured across different species (Figure 1—figure supplement 2).

While multiple lower eukaryotes lack DNA modifications, N6mdA dominates in bacteria, and 5mdC is the dominating DNA modification across higher eukaryotes

Our results reveal major differences in the nature and global concentration of DNA modifications when comparing the domains of life (Figure 1C, D). First, despite the broad coverage, high sensitivity, and precision of our method, we did not detect significant levels of fdC and cadC in any of the genomes measured (limits of detection were 238 pM and 251 pM, respectively). These oxidized forms of 5-methyl-2′-deoxycytidine have been associated with the degradation of 5mdC (Ito et al., 2011). According to our results they remain undetectable across species, consistent with their known lability: they do not accumulate to significant levels on a genome-wide scale. In addition, neutral loss scans conducted in parallel confirmed the picture that, across species, only 5mdC, 4mdC, 5hmdC, and N6mdA reached notable concentrations on the genome-wide level. A notable exception was that we detected hardly any of these DNA modifications in the unicellular fungi studied (Supplementary file 1). Hence it is not merely 5mdC (Capuano et al., 2014; Binz et al., 2018; Nai et al., 2020), but also its oxidized form 5hmdC along with N6mdA that are very low if not absent in typical yeast species. It is interesting in this context that the insects Trichoplusia ni, Spodoptera frugiperda, and D. melanogaster (Supplementary file 1) all had DNA modifications, but at much lower levels than both higher organisms and bacteria. Indeed, the fruit fly D. melanogaster has so far been considered an unusual case among the laboratory model organisms, as it contains only trace amounts, if any, of cytosine methylation (Capuano et al., 2014; Lyko et al., 2000; Zhang et al., 2015), but our data suggests this picture could be common to insects and other lower eukaryotes.

The presence of other DNA modifications in D. melanogaster like N6mdA has also been contested due to the presence of an appreciable gut microbiome, which could confound the results (O’Brown et al., 2019). We assessed this situation, comparing the genomic DNA obtained from fruit flies that possessed a functioning gut microbiome vs. ones grown under germ-free conditions. N6mdA was also detected in germ-free D. melanogaster (~0.04%, Figure 1—figure supplement 3). In a recent study comparing DNA adenine methylation levels in multiple eukaryotic species, the bacterial contamination affected the N6mdA measurements. However, it was possible to distinguish the N6mdA in D. melanogaster tissue from microbial contamination using quantitative deconvolution (Kong et al., 2022). While the adult D. melanogaster contained methylated adenine as a DNA building block, ovarian cells collected from two moth species (T. ni and S. frugiperda) principally contained methylated cytosine as the preferred base modification (0.2 and 0.1%, respectively).

What conclusions can be drawn from the low concentrations of DNA modifications in yeasts and insects? First, these results support the notion that enzymatic DNA modifications are not universal, which could have peculiar evolutionary consequences. Studies in yeast have concluded that DNA modifications could have been specifically lost during yeast evolution (Bhattacharyya et al., 2020). However, our result that insects can have similarly low DNA modification levels raises another possibility: that DNA modifications could have evolved in higher eukaryotes and bacteria after yeasts and insects branched off. As a rule, most genomes contained a single modification type that passed the limit of detection of the highly sensitive method. Some exceptions to this were, however, encountered. A subset of the eukaryotes and a subset of prokaryotic species also contained low concentrations of a second modification, which could be either 5mdC, N6mdA, or 5hmdC (Figure 2, Figure 2—figure supplement 1). For instance, Diplotaxis tenuifolia had low amounts of N6mdA (0.1%, Supplementary file 1) next to high amounts of 5mdC. Notably, species that exhibited 5hmdC were also observed to contain its precursor 5mdC. Of particular interest was Raphanus sativus, which was the only species among those analyzed that possessed all three modifications at detectable levels in parallel. Among prokaryotes, we observed only cytosine and adenine methylation modifications, with 5hmdC entirely missing. Our study further featured two archaeal genomes (Sulfolobus acidocaldarius and Halobacterium salinarum), which shared a similar level of the cytosine modification but differed in their levels of adenosine modification. While we detected N6mdA in H. salinarum, no adenosine modification was observed for S. acidocaldarius (Supplementary file 1).

Figure 2 (with 1 supplement)
The number of species detected containing one, two, or three DNA modification types above picomolar detection limit, grouped as eukaryotes (left) and prokaryotes (right).

The outer ring represents the kingdoms present within these domains. The groupings per number of modifications are shown as fill patterns on the inner ring, where dots represent species in which only one among 5mdC, 5hmdC, and N6mdA was found; crosses represent species bearing two modifications simultaneously; and no fill represents species carrying all three modifications.

Tissue divergence of 5mdC concentrations in vertebrate and plant genomes

Among the DNA modifications, 5mdC had the highest abundance and was specifically abundant in plants. Most vertebrate genomes studied had a 5mdC content of around 5% (mean 4.66, SD 2.17) of the cytosine residues. Some species, including the model organisms Danio rerio and X. laevis, had higher levels, consistent with early observations (Colwell et al., 2018). In plants, however, 5mdC concentrations of 10% (mean 20.34, SD 9.81) and higher were typical (Figure 1D). Extremely high values for cytosine methylation were observed in Andropogon gerardii and Allium cepa, where more than 35% of cytosines were methylated (Figure 1D, Supplementary file 1). As plants are known to possess polyploid genomes, high cytosine methylation values could be attributed to silencing of multiplied genes and the much larger non-functional parts of their genome (Masterson, 1994). Given that very low levels or no 5mdC were detected in yeast and insects, cytosine C5 methylation levels hence differ by several orders of magnitude within the eukaryotic domain.

In multicellular organisms, DNA modifications are important for development, and tissue differences between DNA modification patterns are observed (De Bustos et al., 2009; He et al., 2020; Zhu et al., 2018). Our data suggests, however, that a change in the modification pattern or sequence context does not necessarily have a strong impact on the total concentrations of the DNA modifications. We analyzed spleen, muscle, lung, liver, kidney, heart, and CNS samples from five animal species, of which two are model organisms (X. laevis, M. musculus) and three are non-model organisms (G. barreimiae, M. domestica, M. marmota). From M. musculus we further examined tissues from multiple inbred laboratory lines: BALB/c, FVB/N, Hsd/Ola/MF1, B6SJL/CD451/CD452, BALB/cAnN, 129S8, and F1/CBAxB6. In parallel, we analyzed multiple human cell lines (Supplementary file 1). The obtained data was consistent, in the sense that the values for 5mdC levels were highly similar as long as the tissues were derived from the same species (Figure 3A, left). For instance, most tissues in G. barreimiae, M. marmota, and M. musculus had 5mdC levels of around 5–6% (Figure 3A). Between the different mouse lines, there were no significant differences in 5mdC levels (Supplementary file 1). We noted, however, some small but notable differences between specific tissues. Heart tissue presented a broad range of cytosine methylation levels, and brain tissue had a higher median value for percentage methylation compared to other tissues (5.3 vs. 4.9%; Figure 3B). We then tested whether different nutritional conditions would change the picture by growing a commonly used mammalian cell line (HeLa) under different growth conditions. The different growth conditions affected 5mdC levels, and the detected differences were of a similar magnitude as the small differences detected between tissues (Figure 3—figure supplement 1).

Figure 3 (with 1 supplement)
Distribution of DNA modifications in eukaryotes.

(A) The concentration of 5-methyl deoxycytidine (left) and 5-hydroxymethyl deoxycytidine (right) in different vertebrate genomes. n = 4 for G. barreimiae, M. marmota, M. musculus, and X. laevis, and n = 3 for M. domestica. (B) Distribution of 5-methyl deoxycytidine (left) and 5-hydroxymethyl deoxycytidine (right) in different mouse tissues (n = 5). Variations in percentage modification across different (C) non-plant eukaryotes, including representatives from vertebrates like mammals, amphibians, and fish, invertebrates like insects and mollusks, and unicellular fungi and protozoa, and (D) plant species comprising both gymnosperms and angiosperms.

Overall, 5mdC concentrations in M. domestica (opossum) and X. laevis differed from those in the aforementioned species. In opossum, we detected much lower levels (2%) of 5mdC in all tissues examined. Conversely, in X. laevis, all tissues had much higher concentrations (about 9.4%). Higher values in X. laevis could be attributed to the tetraploid genome of this species compared to its relative X. tropicalis, which is diploid (Head et al., 2014). However, in both cases the tissue differences in the 5mdC concentrations were again minimal, at least when compared to the differences that exist between species. Although we tested fewer cases in plants, our data suggest the situation could be similar there too. We tested different tissues (roots, leaf, stem, and seed cotyledon) from Phaseolus vulgaris and obtained consistently high (16.7%) 5mdC concentrations in all measured tissues (Supplementary file 1). Hence, the several tissues examined from animal species, cell lines, and P. vulgaris provided a largely consistent picture: in a given organism, several tissues exhibit similar levels of 5mdC, and differences between tissues are typically smaller than the differences that can be detected between species.

Tissue specificity of 5-hydroxymethyl deoxycytidine in the vertebrate CNS

Tissue specificity was, however, detected for another modification, 5hmdC. Indeed, 5hmdC was previously discovered in mammalian brain tissue, where it is formed via oxidation of 5mdC by TET enzymes (Tahiliani et al., 2009; Globisch et al., 2010). Our dataset shows that 5hmdC is detected in a broad range of vertebrate tissues except for spleen, but reaches significantly higher concentrations specifically in samples from the CNS. Although the spleen tissues had similar 5mdC levels as other mouse tissues, 5hmdC was not detected in these tissues (Figure 3B). Interestingly, our data reveals that the highest 5hmdC levels were not detected in the mammalian brain, but in the fish G. barreimiae, where levels could reach up to 8% of cytosine residues. Although lower than in G. barreimiae, levels in the mammal M. musculus (3.3%) and the amphibian X. laevis (2%) were still high specifically in brain tissue relative to other tissues in those organisms (Figure 3A, right). An interesting exception was the opossum, the only vertebrate species analyzed in which 5hmdC levels were not higher in the brain compared to peripheral tissue.

Apart from vertebrates, 5hmdC was also observed in A. thaliana and Oryza sativa (Mahmood and Dunwell, 2019). Our data shows that the presence of 5hmdC is by no means universal in plants; indeed, we did not detect it in the majority of plant samples. Nonetheless, our data adds several species (A. cepa, Laurus nobilis, Lepidium sativum, and R. sativus) in which we confirmed low concentrations of 5hmdC. Furthermore, we did not detect 5hmdC in any of the bacterial or fungal genomes analyzed. Our results support the notion that 5hmdC is more widespread in biological systems than previously assumed. However, the fact that quantities above picomolar levels were not detected in any bacteria, in yeasts, or in many tissues from higher organisms implies that 5hmdC is neither universal nor specific to any one part of the phylogenetic tree.

Variations in DNA modification across different bacterial species

In prokaryotes, high amounts of DNA modifications all concerned N6mdA, with the highest levels detected in Mobiluncus curtisii (~1.4%) and Moorella thermoacetica (~1.1%). In total, the prokaryotic genomes hence contained higher amounts of DNA modifications compared to lower eukaryotes such as yeasts and insects, but lower amounts compared to higher eukaryotes, plants and vertebrates in particular. Typical bacterial species contain only one type of modification, mostly N6mdA (Figure 4A). Our data reveals some exceptions. Certain genera such as Campylobacter contain trace quantities of 5mdC (<0.1%) next to the dominating N6mdA modification (Supplementary file 2). In general, the observed trend was that the occurrence of one type of modification limits the occurrence of the other. For instance, M. curtisii, with ~1.4% of its adenine residues methylated, shows only 0.3% 5mdC, while Sebaldella termitidis, with unusually high cytosine methylation (~2.4%), has only 0.1% of its adenines methylated. Interestingly, we observed that median values for 5mdC dominate over N6mdA in those bacteria that colonize or enter mutualistic relationships with higher eukaryote species that carry 5mdC as their main modification (Figure 4—figure supplement 1, Supplementary file 2). This included the genus Neisseria, mucosal-surface-colonizing bacteria in which 1.4 and 2% of cytosine residues were methylated (Neisseria gonorrhoeae and Neisseria lactamica, respectively) while N6mdA remained below 0.3%, as well as Faecalicoccus pleomorphus and Bifidobacterium adolescentis, with >1.5% 5mdC and no detectable N6mdA. Indeed, others made a similar observation in single-cell fungi. While the environmental yeasts studied herein and previously lacked any modifications (Capuano et al., 2014), the most frequent commensal yeast pathogen, Candida albicans, was the only yeast species found to contain 5mdC (Mishra et al., 2011).
This result is interesting because it could mean that host–pathogen interactions select for similar DNA modifications in the pathogen as in the host. Future studies of additional host–pathogen pairs are necessary to substantiate this observation, which suggests that the picture of the functions of DNA modifications in prokaryotes is incomplete; 5mdC has thus far not been associated with a function in pathogen immunity.

Figure 4 (with 2 supplements)
DNA modifications in bacteria.

(A) Percentage of cytidine methylated against the percentage of adenine methylated in bacterial species. (B) Variation of % 5-methyl deoxycytidine and % N6-methyl deoxyadenosine among taxonomic divisions: phylum, class, and genus. One-way ANOVA, p-values for phylum, class, and genus are 0.017, 7 × 10⁻⁴, and 0.16, respectively. (C) Distribution of 5mdC and N6mdA among 85 bacterial species depicted together with their phylogenetic relationships. Percentage modifications are calculated as ratio of modified cytosine residue and guanosine for 5mdC and 5hmdC; and ratio of modified adenine residue and thymine for N6mdA.

Having analyzed 85 species, we were able to ask if bacterial species with a close evolutionary relationship or similar habitat or genome properties also have a more similar modification makeup. We did not detect any relationship between the nature and level of modification and genome size or GC content (Figure 4—figure supplement 2). Similarly, we detected no significant correlation between factors such as pathogenicity, temperature of growth, or tolerance to oxygen and the amount of modifications per unit genome size (not shown). We did, however, observe obvious patterns at the different taxonomic levels once we grouped the different bacterial strains according to phylum, class, and genus. Similarities were detected at the genus level (Figure 4B, C): members of the same genus often displayed similar values for a given modification. For example, species of the Vibrio genus presented similar quantities of N6mdA. At the class level, we observed trends between the different classes and the amount of modification. α- and γ-Proteobacteria had the highest N6mdA content among the classes present, while Bacteroidetes presented with more cytidine methylation than adenosine methylation. At the phylum level, the patterns were more prominent in Proteobacteria, which contained more N6mdA than 5mdC, while a reverse trend of more 5mdC than N6mdA was observed for Bacteroidetes and Firmicutes. Finally, we also observed a third modification, 4mdC, to be frequent in prokaryotes. 4mdC was detected in tandem with 5mdC as a second modification in Shewanella putrefaciens, Stenotrophomonas maltophilia, Bifidobacterium dentium, M. curtisii, and Gallibacterium anatis (Figure 1—figure supplement 2, not quantified).

Although the exact mode of inheritance could not be inferred from the present study, it is worth pointing out that the vastly different amounts of DNA modifications also indicate differences in the way they are inherited. It is plausible that the activities and specificities of DNA methylases and demethylases differ between species with high or low amounts of global DNA modifications. Combined, these results suggest that differences in the modifications do not reflect basic structural genome features such as size or GC content. Rather, more closely related species show higher similarities in DNA modification, implying that genetic drift and gene function are key drivers in the evolution of DNA modifications.
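The taxonomic comparison reported with Figure 4B relies on a one-way ANOVA, which asks whether the variation between groups (for example genera) is large relative to the variation within them. The following Python sketch computes the F statistic behind such a test. It is an illustration under stated assumptions, not the authors' analysis: the function name and the toy values are hypothetical, and converting F into a p-value additionally requires the F distribution, which a statistics library would normally supply.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, means) for x in group)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percentage values for three made-up taxonomic groups.
toy_groups = {
    "group A": [1.0, 2.0, 3.0],
    "group B": [2.0, 3.0, 4.0],
    "group C": [5.0, 6.0, 7.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(toy_groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value means that group membership explains much of the variance, which corresponds to the qualitative statement above that related species tend to carry similar modification levels.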

Materials and methods

For a description of the sample sources and their extraction, please refer to ‘Supplementary Information for sample sources: Global analysis of cytosine and adenine DNA modifications across the tree of life’ (Varma and Calvani, 2022).

DNA extraction

DNA extracts were treated with RNase A (VWR, Cat.No. E866-5ML) at 37°C for 45 min, and DNA purification was performed using QIAGEN Genomic tip-20/G according to the manufacturer’s instructions. Purified DNA was precipitated with isopropanol, washed with 70% ethanol, and resuspended in 10 mM Tris-HCl, pH 8.0. Quantification was done using a dsDNA BR Assay Kit (Qubit). The DNA sample was then digested into corresponding nucleosides using DNA Degradase Plus (Zymo Research, E2020). 1 µg of DNA was treated with 5 U of DNA degradase at 37°C for 2 hr in a final volume of 25 μl and the reaction was inactivated by incubating the samples for 20 min at 70°C as described by the manufacturer. Calibration standards were prepared in 1:4:4:2:2:2:2:4:4:4:4:4:4:4:10 serial dilutions from a standard stock that was prepared as per Table 1 and stored at –80°C.
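The calibration series above can be turned into cumulative dilution factors with a short script. This is a minimal sketch that assumes each entry in the 1:4:4:2:… series is the fold-dilution of one calibration level relative to the previous one, with the first entry being the undiluted stock; that reading, and the function names, are illustrative and not taken from the authors' protocol.

```python
from itertools import accumulate
from operator import mul

# Stepwise fold-dilutions as listed in the text. Interpreting each entry as
# "dilution relative to the previous level" is an assumption of this sketch.
STEP_FACTORS = [1, 4, 4, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 10]


def cumulative_dilutions(steps):
    """Return the total fold-dilution of each level relative to the stock."""
    return list(accumulate(steps, mul))


def level_concentrations(stock_conc, steps):
    """Return the concentration of each level for a given stock concentration."""
    return [stock_conc / fold for fold in cumulative_dilutions(steps)]


if __name__ == "__main__":
    for level, fold in enumerate(cumulative_dilutions(STEP_FACTORS), start=1):
        print(f"level {level:2d}: 1:{fold}")
```

Under this reading the series has 15 levels, which would explain how a single stock can cover analytes of very different abundance.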


The samples were diluted 1:1 with 10% (v/v) MeOH containing 0.2% formic acid, and 10 µl, corresponding to 200 ng of gDNA, were injected onto a liquid chromatography system equipped with a reverse-phase Acquity UPLC HSS T3 column, 100 Å, 1.8 µm, 2.1 mm × 150 mm (Waters), at a column temperature of 25°C and a flow rate of 0.2 ml/min. Mobile phase A was 0.1% formic acid + 10 mM ammonium formate in water, and mobile phase B was 0.1% formic acid + 10 mM ammonium formate in methanol. The elution gradient ran from 5% to 35% mobile phase B over 11.5 min, followed by a sharp increase to 80% B over the next 1.5 min. The gradient was held at 80% B for 2 min, returned to the starting conditions over 1 min, and equilibrated for 6.5 min. The total run length was 22.5 min.
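The injection amount and gradient timing given above can be checked with simple arithmetic. The sketch below assumes that "1:1" means mixing equal volumes (a two-fold dilution) and that each gradient segment is a linear ramp; both are assumptions, and the names are illustrative only.

```python
# (duration in min, %B at segment start, %B at segment end), from the text.
GRADIENT = [
    (11.5, 5, 35),   # shallow ramp
    (1.5, 35, 80),   # sharp increase
    (2.0, 80, 80),   # hold
    (1.0, 80, 5),    # return to starting conditions
    (6.5, 5, 5),     # equilibration
]


def total_run_time(segments):
    """Sum of all segment durations in minutes."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t, segments):
    """%B at time t (min), assuming linear ramps within each segment."""
    elapsed = 0.0
    for duration, start, end in segments:
        if t <= elapsed + duration:
            return start + (end - start) * (t - elapsed) / duration
        elapsed += duration
    raise ValueError("time is beyond the end of the run")


def injected_ng(dna_ng=1000.0, digest_ul=25.0, fold_dilution=2.0, inject_ul=10.0):
    """DNA mass on column: 1 µg digested in 25 µl, diluted two-fold, 10 µl injected."""
    return dna_ng / digest_ul / fold_dilution * inject_ul


if __name__ == "__main__":
    print(total_run_time(GRADIENT), "min total")
    print(injected_ng(), "ng injected")
```

With these assumptions the segments add up to the stated 22.5 min and the injection corresponds to the stated 200 ng of gDNA.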

The eluent was directed to an electrospray ion source connected to a triple quadrupole mass spectrometer (Agilent 6470 QQQ) equipped with an Agilent Jet stream source, operating in positive mode. The ESI source settings were: gas temperature: 300°C; gas flow: 6.4 l/min; nebulizer: 50 psi; sheath gas heater: 350°C; sheath gas flow: 7 l/min; capillary: 2000 V. The transitions monitored for MRM experiments are listed in Table 2.

For neutral loss experiments, the samples were injected using the same LC parameters as for the MRM experiment, while the mass spectrometer was set to a neutral loss scan (M=116 Da), scanning the quadrupoles from 230 to 250 Da. The scan time was 1000 with a step size of 0.05 amu, and the values for fragmentor, collision energy, and cell accelerator voltage were 73, 8, and 5, respectively. 4mdC was detected as the second peak in the neutral loss (Δ = 116) chromatogram corresponding to parent ion 242 Da.
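The logic of the neutral loss scan can be illustrated numerically: every precursor in the scanned window is paired with a product ion that is lighter by the mass of the lost sugar moiety. The sketch below uses nominal masses only and is an illustration of the principle, not the instrument method file; all names are hypothetical.

```python
NEUTRAL_LOSS_DA = 116  # loss of the sugar moiety, as stated in the text
SCAN_START, SCAN_END, STEP = 230.0, 250.0, 0.05  # precursor window and step


def product_mz(precursor_mz, loss=NEUTRAL_LOSS_DA):
    """Product ion expected for a precursor that loses the neutral fragment."""
    return precursor_mz - loss


def in_scan_window(precursor_mz):
    """True if the precursor falls inside the scanned range."""
    return SCAN_START <= precursor_mz <= SCAN_END


def n_scan_points(start=SCAN_START, end=SCAN_END, step=STEP):
    """Number of precursor values visited, including both end points."""
    return round((end - start) / step) + 1


if __name__ == "__main__":
    parent = 242  # nominal parent ion shared by 5mdC and 4mdC
    print(parent, "->", product_mz(parent), "in window:", in_scan_window(parent))
    print(n_scan_points(), "points per scan")
```

Because 5mdC and 4mdC share the same parent and product masses, the scan alone cannot separate them; the distinction rests on chromatographic retention, as the text describes.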

Data processing and analysis

Peak areas were extracted and integrated using MassHunter for QQQ to obtain the concentrations after applying the necessary limits of quantification. Subsequent processing for batch-to-batch variation and technical outlier removal was carried out using R or Python. A single reference mouse DNA sample was included in every measured batch to monitor batch-to-batch variation. Median-value-based normalization of the reference mouse samples was used to obtain the correction factor with which the corresponding batch was corrected. The results are depicted as percentage modification with respect to dG (for 5mdC and 5hmdC) and T (for N6mdA). The phylogenetic clustering was carried out using a Newick file generated using NCBI Taxonomy (PhyloT) and the ggtree package (Yu, 2020). Features of bacteria were retrieved from the bacterial metadatabase BacDive (accessed April 14, 2020; Reimer et al., 2019).
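The normalization described above could look roughly as follows. This is a minimal sketch, not the authors' script: it assumes the correction factor for a batch is the median of the reference mouse values across all batches divided by the reference value measured in that batch, and that percentages are computed as (modified nucleoside / partner nucleoside) × 100, with dG as partner for 5mdC and 5hmdC and T for N6mdA. All function names and input layouts are hypothetical.

```python
from statistics import median


def batch_correction_factors(reference_by_batch):
    """Map each batch to (median reference value over all batches) / (its own reference value)."""
    target = median(reference_by_batch.values())
    return {batch: target / value for batch, value in reference_by_batch.items()}


def percent_modification(modified, partner):
    """Percentage of a modified nucleoside relative to its partner (dG or T)."""
    if partner <= 0:
        raise ValueError("partner nucleoside concentration must be positive")
    return 100.0 * modified / partner


def corrected_percentages(samples, reference_by_batch):
    """samples maps name -> (batch, modified conc., partner conc.)."""
    factors = batch_correction_factors(reference_by_batch)
    return {
        name: percent_modification(modified, partner) * factors[batch]
        for name, (batch, modified, partner) in samples.items()
    }


if __name__ == "__main__":
    # Hypothetical reference mouse 5mdC/dG percentages measured in three batches.
    reference = {"batch1": 4.0, "batch2": 5.0, "batch3": 4.0}
    samples = {"sample_a": ("batch2", 5.0, 100.0)}
    print(corrected_percentages(samples, reference))
```

Normalizing to dG works because, in double-stranded DNA, every cytosine variant pairs with a guanine, so dG approximates the total cytosine pool regardless of how many cytosines are modified.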

Data availability

All data are available as Supplementary Materials. The sample source information is provided as a separate document: Varma, Sreejith; Calvani, Enrica (2022), Supplementary Information for sample sources: Global analysis of cytosine and adenine DNA modifications across the tree of life, Mendeley Data, V1.

The following data sets were generated
    1. Varma S
    2. Calvani E
    (2022) Mendeley Data
    Supplementary Information for sample sources: Global analysis of cytosine and adenine DNA modifications across the tree of life.


Decision letter

  1. Jessica K Tyler
    Senior and Reviewing Editor; Weill Cornell Medicine, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

Author response

We would like to thank the reviewers for their valuable and constructive comments. We would further like to state that we find the approach of Review Commons very refreshing. It’s great to receive comments on the science that is presented, without being judged on the suitability of the manuscript for a specific journal (and the cultural bias that has evolved around that).

Reviewer #1:

Specific points:

1. The authors observed a wide range of cytosine modification percentages across different living species, but have not really offered a plausible biological explanation of why that is. Could the authors provide some more speculation about the biological significance of this observation and species-dependent variations?

We thank the reviewer for their constructive feedback. The quantification of DNA modifications across many species does of course not provide a mechanistic explanation of why the concentrations of these DNA modifications have evolved so differently. Generally, it is however possible to speculate about the key differentiators. It is hard to avoid noticing that the most complex and largest genomes have high concentrations of DNA cytosine methylation, while the most compact genomes, the bacterial genomes, possess N6mdA. The fungal and insect genomes, which lie in between (compact, but with the structure of a eukaryotic genome), are low in modifications in general. The key drivers are hence likely gene expression regulation in the complex genomes, and the prevention of elements that interfere with high compactness (like transposable elements or viral genome replication) in the bacterial genomes. (Page 8, first paragraph)

2. As a rather striking example (Figure 1D/D), why would the species of Eukaryota and Plantae have much higher frequencies of 5mdC as compared to the other species that were examined?

Not only do these species possess larger genomes, but they also contain much larger intergenic regions, more pseudogenes, and more transposable elements. We hence speculate that one of the main uses of DNA modifications is both to suppress the expression of non-functional genomic elements and to regulate gene expression. (page 9 and 10)

3. As a conclusion, the authors mention "….related species have higher similarities in DNA modification suggests that gene drift and gene function are key drivers in the evolution of DNA modifications". Do the authors relate to a degree of "inheritance" of these modifications? This is in analogy to histone epigenetic modifications and is worth discussing also in this context.

We can derive conclusions on the evolutionary relationships, but cannot conclude on the mode of inheritance from our comparative analytical study. But it's worth speculating that the vastly different amounts of DNA modifications also point to differences in the way they are inherited. It is plausible that the activities and specificities of DNA methylases and demethylases differ between species with high or low amounts of global DNA modifications. We have hence expanded the section accordingly. (page 12, last paragraph).

4. The methodology or DNA detection by mass spectrometry is described in the Supplementary section, including Table S2 showing retention times and MRM transitions used by the authors. However, the authors do not show any examples of primary data, such as chromatography (e.g. TIC / EIC) profiles and mass spectra using standards and showing how they detected these modified nucleosides within the context of sample matrix. Examples reflecting the detection of each DNA modification should be provided and included.

We appreciate this suggestion. EICs for standards and sample matrix measurements from cell lines, bacteria, mice and plant samples are now included as Figure S1.

Reviewer #2:

– The Title should reflect the fact that this study is mainly focused on 5mC, 5hmC and 6mA. The term DNA modifications is much wider than the modifications assessed by the authors (e.g. the authors do not attempt to analyse the content of 5hmU, 5gmC etc as well as DNA lesions such as 8oxoG in the corresponding genomes). Therefore, the current title is slightly misleading.

We appreciate the comment of the Reviewer. There are two aspects here. Indeed, we have measured, and screened for, a much broader set of modifications. The reason we have concentrated on 5mC, 5hmC and 6mA is, however, that these were the only modifications that we detected at significant concentrations in the samples. 8oxoG is an exception: it is also present, but this modification has a very different biology, as an intermediate step of the excision repair pathway, and was omitted as it’s a stress signal, and hence strongly condition dependent. Also, the neutral loss scan experiments we conducted confirmed that, in all species analyzed, only 5mC, 4mC, 5hmC or 6mA reach more than trace levels of abundance. We apologize that this important point was not sufficiently explained and have reworked the abstract and rationale accordingly.

– The authors should clearly state what they mean by the 'modification percentage' in each figure/legend. E. g. 5hmC/C %, or 5mC/C percentage. This should be explicitly written in each panel.

We have now included the description for "modification percentage" for each figure.

– Slightly more speculation on the biological functions of the DNA modifications and their phylogenetic distribution would make the paper more interesting. In this regards the incorporation of a separate 'Discussion' section would be beneficial for the manuscript.

We agree, also reviewer #1 made that point. As both reviewers encourage us to become more speculative, we have now expanded the discussion with some suggestions and hopefully interesting speculations. (page 8, 9 and 10)

Reviewer #3:

1. It would be relevant to conduct at least a subset of analyses as independent duplicates or triplicates in order to define the expected error margins.

We have conducted the analyses in triplicates (or more) in most instances, i.e. in all those situations where we could obtain sufficient biological material. We have now included a figure (Figure S1) for species where measurements have been carried out in replicates. Generally, owing to the very high analytical precision of LC-SRM methods, the replicates are in excellent agreement.

2. It would be highly interesting to indicate LODs (or at least approximate LODs) in some of the figures, like 1C, D, 2A, B or 3A, B.

We have included the limits of detection for Figures 1-4.

3. I find the general concept of "presence" of a modification not convincing. This refers to Figure 2 and several occasions in the text, where it is said, that most species (and most bacteria) tend to "have" only one modification. Here the threshold between presence and absence is not properly defined. The statements also go against established knowledge telling us that many eukaryotes have 5mC and 5hmC (i.e. two modifications), and several bacteria do as well, like the well-know E. coli, which have the Dam and Dcm system generating 6mA and 5mC.

We apologize for the confusion, and the reviewer is of course right that we can and should be very specific here. By "presence of a modification", we meant to convey that the modification was detected above the (extremely sensitive) detection limit of our method. A modification not being detected certainly does not exclude the possibility of it being present at low levels, but owing to the high sensitivity and the data-independent nature of our approach, this can only be trace quantities (just to note, we have far better detection limits than sequencing methods). We agree with the Reviewer that we can, and have to, be more precise in this respect, and have updated our text accordingly to include limits of detection with every figure.

4. 5hmC cannot exist without having 5mC, so I doubt if this concept overall makes sense.

Our data suggest the reviewer is right, but in theory there was the remote possibility that there are species out there which convert all 5mC to 5hmdC. Our technology is agnostic to the order in which modifications are made or removed. Our data thus far support the reviewer’s stance: in all cases where 5hmdC was detected, we consistently also detected 5mdC (Table S3; page 8, paragraph 2). The manuscript has been updated accordingly.

5. From a technical point of view, it would be relevant to document that distinction of 5mC and 4mC is really reliable.

We can distinguish the two modifications based on their chromatographic retention. Chromatograms for the neutral loss transition 242 -> 126, corresponding to the loss of the ribose moiety (M=116), have been included along with the authentic 5mdC TIC. To illustrate this, we are now including chromatograms presenting an example of an organism containing mostly 4mdC (Citrobacter koseri), an example of an organism containing mostly 5mdC (Listeria innocua), and an example containing both (Shewanella putrefaciens).

6. I wonder whether it would not be better to have a log scale y-axis in several of the figures.

We have considered the reviewer's suggestion of using a log scale to present the data. However, part of the value of our technology is that we can also accurately quantify low levels of, and small differences in, the modifications and distinguish closely related species. We believe a log scale could make it difficult for readers to compare against many of the existing reports that do not use log scales. Rather, we have tried to group species with similar magnitudes together. We agree it’s a compromise: depending on whether one is more interested in the macroscopic or the microscopic picture, one or the other would be better.

7. Current literature has documented an import of modified nucleotides with the biochemical reagents. Can the authors exclude this and/or comment on this?

Indeed, the fact that artifacts of sample preparation affect the values of modifications reported in the literature, due to chemical reactions, has also been a motivation for us in developing this analytical method, which is complementary to sequencing techniques. The reviewer is right that the chemistry does not go away in LC-SRM technologies either, although here the problem is confined to chemical conversion of nucleotides, and we do not suffer from the much bigger problem that comes, for example, from the amplification of nucleic acids in several sequencing methods.

Indeed, the danger of artefacts was one reason not to include 8oxoG, as a stress-sensitive modification, in our study.

Further, our method avoids the use of oxidants and other reactive reagents and instead uses strategies that are much milder on the DNA (e.g. spin columns). For plant species, due to their biochemical composition, we were forced to use phenol-chloroform extraction to obtain enough DNA. But the inclusion of reagents like 2-thio ethanol could keep this to a minimum. We now discuss this in much more depth in the manuscript, and have added a caveat (page 6 paragraph 1).

8. Minor comments

The sentence in the abstract regarding host-pathogen interactions is not backed up by data.

We agree in principle. These were only anecdotal observations; we did not have enough host-pathogen interactions represented in our study to be able to claim that this conclusion is broadly robust. We have removed the statement from the abstract and toned down the claim in the results, and now just mention it in the Results and Discussion sections as a possibility that would be compatible with the few cases contained in our data.

Introduction: Sequencing methods do provide quantitative methylation information, but only for individual sites not globally. This should be corrected.

The proposed correction has been implemented. We have used the reviewer's sentence and have also updated the abstract to highlight the complementarity.

Introduction: typo in consent, which should be content

The proposed correction has been implemented.

Figure 4 legend: …species displayed together with their phylogenetic tree.

The proposed correction has been implemented.

Reviewer #4:

Major comments:

– The manuscript is well written, but brief, in some cases, too brief. There needs to be a more robust discussion about the function of DNA methylation, what is already known about its function, patterning, levels across different kingdoms and phyla, and what this finding brings to our understating of epigenetics aside from cataloging the presence or absence of specific modifications.

We agree. Indeed, Reviewers 1 and 2 also encouraged us to be less neutral in presenting our results and to speculate a little more. We have hence expanded the discussion.

– Some of the data are presented in a robust way while others are presented in a redundant or not linear way with scant detail or depth. More consistency in the way data is presented is warranted. For instance, Figure 2 is overly simplified, figure 3 lacks labels of the kingdoms and phyla represented, and there is no meaningful evolutionary distance assigned to may of the graphs.

We apologize for the situation but this is not our fault. The taxonomy in biological sciences is far from being ideal with different taxonomic classifications used by different communities (ex. phylum, class etc. are valid for prokaryotes and eukaryotes but not for plants which are described using clades). A solution, although not ideal, was to cluster the species based on their taxonomic proximity. We give however all results in the detailed supplementary tables (Table S3), so results from each species are well contained and accessible.

– Since most studies of DNA methylation include the number of modified C/ number of total C, the parallelism of this study with other data is difficult.

We are a bit confused by the reviewer’s comments. We have re-searched the literature, but could not confirm that our study replicates others. Most species contained in our dataset are indeed analyzed for their content of DNA modifications for the first time. In other respects, our study complements manuscripts that use sequencing technologies by providing overall numbers.

– There is a glaring omission the data: since methylation is measured on cytosines, and, for 5methyl-cytosine typically in a CpG dyad, without knowing how CpG specifically and cytosine overall is present in these genomes, it is difficult to interpret. While this data cannot be obtained accurately in the absence of a reference genome, for those organisms where there is a reference, it should be included. This point should also be added to the introduction and discussion

The reviewer's comment applies to the higher eukaryote genomes, several of which are included in our study. We agree with the reviewer that position-specific information as obtained by DNA sequencing is essential to interpret the specific role of 5mC in these species, specifically in CpG islands, but there are other reasons why one also needs the total numbers (see above). We have revised the introduction and rationale accordingly, and hope that the complementary nature of our study is now clearer.

– It is not clear why the authors choose to represent the DNA modification values as a percentage of modified C or A over respectively G and T [as reported in the supplemental material (5mdC/dG or N6mdA/T)*100]. This analysis generated values of DNA methylation that are not apparently consistent with previous datasets (i.e human, mouse, zebrafish, etc.). It is difficult to understand the data of the new species analyzed and to make parallelism across those. The authors should consider using the total number of C or A recovered in the analysis of each sample and where possible compare and verify that this amount corresponds to the one annotated in the available genomes.

For most of the species studied in our manuscript, there are no analytically determined absolute concentrations of DNA modifications in the literature. Furthermore, it is difficult to infer the real total amount of modified residues from the local concentrations determined by sequencing methods, specifically in species with complex genome structure, high ploidies, and lots of repetitive sequences. That is why one needs to measure these total concentrations with a direct and quantitative analytical method, as we have done in our paper, and not extrapolate these values from indirect measures, like a genome sequence.

– The method used to calculate DNA modification percentage, represented as normalized over G and T, does not take into account that the relative amount of G and T can differ in the respective genome/organism analyzed. When making comparisons directly across different species, the authors should consider to normalize and scale the total number of G or T present in each sample.

We apologize for any confusion and if there was lack of clarity. In this method, we have of course not only measured the modified bases but also the unmodified ones (C, T, G and A). Therefore, the variation in the amount of G or T is reflected in the measurements and shouldn't influence the DNA modification percentage, and we can accurately represent the relative levels of modifications, irrespective of genome size.

– The authors claim for Figure 3B "Our dataset shows that 5hmdC is detected in a broad range of vertebrate tissues except for spleen, but reaches significantly higher concentrations specifically in samples from the CNS". Even if the trend is clear the authors should consider using some statistical test to claim significant differences.

Assuming a normal distribution, we have applied a simple t-test, which showed that the difference is highly significant.

– In Figure 4B the authors state "At the phylum level the patterns were more prominent in Proteobacteria, containing more N6mdA than 5mdC, while a reverse trend of more 5mdC than N6mdA was observed for Bacteroidetes and Firmicutes". The reverse trend of Bacteroidetes and Firmicutes is not present as the median of the box plot is lower for 5mdC compare to N6mdA and the data have just bigger dispersion in that Phylum. As already mentioned, the authors should perform some statistical test to claim differences.

Assuming a normal distribution, we have performed an analysis of variance, which showed that the difference is highly significant. The p values are given in the figure description.

Minor comments:

– For some of the plant samples, the authors use seeds as they represent an easier and cleaner source of genomic DNA. Is the analysis of DNA modifications in seeds comparable to "adult" fully developed tissues? Or is it different, as it happens when comparing gametes to adult tissues in animals?

We would like to clarify that we use plant seedlings (not seeds). These, unlike gametes, have differentiated tissues. Our (admittedly limited) data on different plant tissues suggest that the total concentration is quite similar between different tissues, a situation that is also observed in mammals. We discuss this in the revision.

– Figure 4.B-C is not described in a linear way in the text, not reflecting the order in which the panels are positioned. Authors should consider reshuffling either the text or panels.

We apologize for the lack of clarity. We have performed the necessary corrections.

– In the text, there are many references to "Methods" section but this section is missing in the main text and that information are kind of reported altogether within the supplementary file. The authors should consider creating an appropriate Material and Method section in the main text.

As the materials section is quite large, due to the large sample size and the respective protocols, we had to move this section to the supplementary information file. But to help the reader find the sections more easily, we have now included the important aspects of the section in the main manuscript.

– The common names and species names are used interchangeably; for readers to keep track, it would help to have a table listing the common names with the species names.

We apologize for the inconsistency in the usage. To simplify, we have reverted to the usage of only scientific binomial names of all the species used in this study.

– The authors should consider investigating deeper some of the interesting findings they observed in the group of bacteria hosted by higher eukaryotes that showed higher 5mdC compared to the free-living corresponding bacteria. This could be of interest not only at the epigenetic or molecular level but could also harbor a clinical relevance since some of them are able to induce human pathologies.

As this manuscript was intended only as a resource manuscript, and microbiology is not the specialty of our laboratory, the request of the reviewer is out of scope. We hope, however, that exactly such investigations are stimulated by our resource.

Article and author information

Author details

  1. Sreejith Jayasree Varma

    Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
    Data curation, Formal analysis, Visualization, Writing - original draft
    Contributed equally with
    Enrica Calvani
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1669-2254
  2. Enrica Calvani

    1. The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
    2. Department of Biochemistry and Cambridge Systems Biology Center, University of Cambridge, Cambridge, United Kingdom
    Conceptualization, Data curation, Investigation, Methodology
    Contributed equally with
    Sreejith Jayasree Varma
    Competing interests
    No competing interests declared
  3. Nana-Maria Grüning

    Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
    Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Christoph B Messner

    1. The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
    2. Department of Biochemistry and Cambridge Systems Biology Center, University of Cambridge, Cambridge, United Kingdom
    Methodology, Resources
    Competing interests
    No competing interests declared
  5. Nicholas Grayson

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9998-6783
  6. Floriana Capuano

    Department of Biochemistry and Cambridge Systems Biology Center, University of Cambridge, Cambridge, United Kingdom
    Competing interests
    No competing interests declared
  7. Michael Mülleder

    Core Facility-High Throughput Mass Spectrometry, Charité Universitätsmedizin, Berlin, Germany
    Methodology, Resources
    Competing interests
    No competing interests declared
  8. Markus Ralser

    1. Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
    2. The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
    3. Department of Biochemistry and Cambridge Systems Biology Center, University of Cambridge, Cambridge, United Kingdom
    Conceptualization, Funding acquisition, Supervision, Writing – review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9535-7413


Cancer Research UK (FC001134)

  • Markus Ralser

Medical Research Council (FC001134)

  • Markus Ralser

Wellcome Trust (FC001134)

  • Markus Ralser

Federal Ministry of Education and Research (BMBF) (031L0220)

  • Markus Ralser

Wellcome Trust (200829/Z/16/Z)

  • Markus Ralser

Wellcome Trust (101503/Z/13/Z)

  • Markus Ralser

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.


We thank the Biological Research Facility at the Francis Crick Institute for Mus musculus, Danio rerio, and Xenopus laevis samples, Bryony Lee (Turner lab, The Francis Crick Institute) for opossum samples, Annick Sawala (Gould Lab) for Drosophila samples, Cell Services (The Francis Crick Institute) for animal cell lines, the National Yeast Collection for yeast samples, Felix Forest (Kew Gardens) and Nell Jones (Chelsea Physic Garden) for plant samples, Barbara Tautsher, Elisabeth Haring, and Luise Kruckenhauser (Natural History Museum of Vienna) for Cepaea hortensis and Garra barreimiae samples, and Florian Winkler, Heinrich Aukenthaler, Erhard Seehauser, and Gottfried Hopfgartner (Forestry and Hunting Authorities South Tyrol, or Jagdrevier Mauls, Bolzano Province, Italy) for their support in obtaining tissue samples from alpine marmot in their wild habitats of Mauls and Gsies (Italy). We thank Christiane Kilian and Daniela Ludwig (Charité Universitätsmedizin Berlin) for yeast, plant and cell line samples. We thank Skirmantas Kriaucionis, Rob Klose, Julian Parkhill, Benjamin Heineike and Hezi Tenenboim for providing feedback on our manuscript. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001134), the UK Medical Research Council (FC001134), and the Wellcome Trust (FC001134), and received specific support from the Wellcome Trust (200829/Z/16/Z, 101503/Z/13/Z) and the German Ministry of Education and Research (BMBF) as part of the National Research Node “Mass spectrometry in Systems Medicine (MSCoresys)”, under grant agreement 031L0220A.

Senior and Reviewing Editor

  1. Jessica K Tyler, Weill Cornell Medicine, United States

Version history

  1. Preprint posted: March 23, 2022
  2. Received: June 13, 2022
  3. Accepted: June 15, 2022
  4. Version of Record published: July 28, 2022 (version 1)


© 2022, Varma, Calvani et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

  1. Sreejith Jayasree Varma
  2. Enrica Calvani
  3. Nana-Maria Grüning
  4. Christoph B Messner
  5. Nicholas Grayson
  6. Floriana Capuano
  7. Michael Mülleder
  8. Markus Ralser
Global analysis of cytosine and adenine DNA modifications across the tree of life
eLife 11:e81002.
