
Epigenetic Markers: How to build your own island

  1. Colum Walsh (corresponding author)
  2. Avinash Thakur (corresponding author)
  1. University of Ulster, United Kingdom
Cite this article as: eLife 2014;3:e04779 doi: 10.7554/eLife.04779


Inserting artificially-generated ‘DNA islands’ into a genome has shed new light on why some DNA sequences are methylated and others are not.

Main text

Navigating your way through the sea of information that is contained in the DNA of a genome can be a daunting task. The human genome, for example, contains about 3 billion base pairs and encodes around 20,000 genes. It is fortunate, therefore, that this sea is punctuated by ‘islands’ that mark the locations of important features. CpG islands, for example, mark the start of genes so reliably that they were used to identify genes in the pre-genomics era, before sequence information was so readily available. Despite this, we still do not fully understand why such islands are associated with essential DNA features, or which properties of these islands are crucial to their function. Now in eLife, two independent teams—one led by Adrian Bird, the other led by Dirk Schübeler—have shed more light on these enigmatic elements by taking advantage of recent advances in recombinant DNA technology.

While the four bases in DNA—A, C, T and G—are found in almost equal numbers in most DNA molecules, particular combinations of two bases can be more common in certain stretches of DNA than others. In particular, C is rarely followed by G (written as CpG); however, early research showed there are ‘islands’ where these CpG sites are common (Bird, 1980). Furthermore, these islands also have more Gs and Cs overall than the rest of the genome. This G + C enrichment does not always correlate with frequency of CpG sites, suggesting that the two properties should be considered separately.
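The two properties described above can be measured separately for any stretch of sequence. The following is a minimal sketch, not taken from the studies discussed here: the function names are hypothetical, and the observed/expected ratio is a commonly used normalisation (the CpG count multiplied by the sequence length, divided by the product of the C and G counts) that is assumed here for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Return the observed/expected CpG ratio for seq.

    Assumed formula (illustrative): (CpG count * length) / (C count * G count).
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every CpG site.
    return seq.count("CG") * len(seq) / (c_count * g_count)


# A sequence can be G + C rich and still contain no CpG sites at all.
gc_rich_no_cpg = "GGCC"
print(gc_content(gc_rich_no_cpg), cpg_obs_exp(gc_rich_no_cpg))
```

Keeping the two measures in separate functions mirrors the point made in the text: G + C enrichment and CpG frequency need not move together.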

CpG islands were also discovered to be important for transmitting epigenetic marks (that is, for transmitting heritable information that does not depend on the DNA itself; Mohandas et al., 1981; Li et al., 1993). A CpG site is recognised by enzymes that add a chemical tag called a methyl group to the C base. The methylation of the DNA in CpG islands inactivates nearby genes. Moreover, these epigenetic marks can be recreated whenever DNA is copied, and can therefore be passed on to new cells at each cell division. It is also known that some proteins bind specifically to methylated islands, and others to unmethylated islands (Nan et al., 1993; Lee et al., 2001). Both of these groups of binding proteins then recruit enzymes to the island. The enzymes in turn add methyl groups to the histone proteins, which help to package DNA in the cell. Different kinds of histone modifications can either activate or inactivate the nearby genes, and can also be passed on to new cells.

The Bird and Schübeler labs have taken advantage of recently developed ‘recombineering’ approaches which use high-throughput DNA technology to allow the efficient insertion of different DNA fragments into the same site in a genome (see Figure 1A). Bird and colleagues in Edinburgh and Dresden—including Elisabeth Wachter as first author—generated an artificial CpG island that resembled naturally-occurring ones in terms of CpG frequency and G + C density (Wachter et al., 2014). After this island was inserted into a region of the genome that contained relatively few genes, the DNA on the island was unmethylated (Figure 1B). Furthermore, the artificial island was able to recruit histone-modifying enzymes, as shown by the accumulation of two kinds of chemical marks on the histones. These histone marks are normally associated with promoters in embryonic stem cells: one is found on active promoters, and the other is associated with inactive promoters. The artificial island therefore seemed ‘poised’ for either activation or repression, even though there were no genes nearby.

Figure 1. An artificial CpG island reveals its secrets.

(A) Wachter et al. inserted stretches of DNA that were G + C rich, and also had a high frequency of CpG sites, into an area of the genome that is devoid of genes (‘gene-poor region’). Histones associated with the DNA were assayed for histone marks: the number 2 indicates histones associated with the artificial CpG island; 1 and 3 indicate histones not associated with the island. (B) The CpG sites in these artificial islands remained unmethylated and recruited both activating and inactivating histone marks (labelled H3K4me3 and H3K27me3 respectively). The rows of open or filled circles represent unmethylated or methylated CpG sites; and the graph represents the frequency of each histone mark in different regions, in and around the artificial island. (C) Removal of CpG sites from the artificial island prevented the accumulation of both types of histone mark. (D) Decreasing the G + C content caused CpG sites to be over-methylated, which blocked histone modification. However, this could be overcome, at least in part, by either preventing CpG site methylation to begin with, or (as revealed by Krebs et al.) by adding a strong binding site for a transcription factor into the island which could drive its demethylation (arrow).

Wachter et al. then resynthesized versions of the same artificial CpG island, but with either fewer CpG sites and the same number of Gs and Cs, or vice versa. By altering these two features separately, they could address the roles of each of these features for the first time. When CpG sites were essentially removed, but G + C content remained the same, histone modifications were absent (Figure 1C). This suggests that a high frequency of CpG sites is what leads to the ‘poised’ state described above.
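To see that CpG frequency and G + C content really can be varied independently, consider the toy sketch below. It is not the design procedure used by Wachter et al.; the function name and the swapping strategy are assumptions made purely for illustration. It removes every CpG site by swapping each adjacent "CG" pair to "GC", which leaves the count of each base, and hence the G + C content, exactly unchanged.

```python
def remove_cpg_keep_composition(seq: str) -> str:
    """Remove all CpG sites from seq without changing its base composition.

    Hypothetical illustration: each adjacent C-G pair is swapped to G-C.
    A swap can create a new CpG one position to the left (e.g. CCG -> CGC),
    so passes are repeated until no CpG remains. Every swap moves a G to the
    left of a C, so the loop always terminates.
    """
    bases = list(seq.upper())
    changed = True
    while changed:
        changed = False
        for i in range(len(bases) - 1):
            if bases[i] == "C" and bases[i + 1] == "G":
                bases[i], bases[i + 1] = "G", "C"
                changed = True
    return "".join(bases)


original = "ACGTCGCG"
depleted = remove_cpg_keep_composition(original)
print(original.count("CG"), depleted.count("CG"))
```

A real redesign would have to respect further constraints, such as avoiding the accidental creation of protein-binding motifs, which this sketch ignores.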

To test this, Wachter et al. reduced the G + C content of the artificial island but left the CpG frequency unchanged. Unexpectedly, this island also showed no histone modifications (Figure 1D). However, the DNA of this ‘G + C poor’ island was methylated, which seemed to block modifications to histones. This idea was confirmed by making artificial CpG islands in cells that cannot target any new DNA for methylation (Figure 1D, arrow). In this background, the G + C poor island could once more recruit both activating and inactivating histone modifications, albeit at lower levels. These results highlight the importance of a high G + C content in protecting the DNA of CpG islands from being methylated.

The Schübeler lab in Basel has previously used a limited recombineering approach to begin to identify DNA sequences that can recreate their normal methylation pattern even when they are moved to a new location in the genome (Lienert et al., 2011). Now Schübeler and colleagues, including Arnaud Krebs as first author, have used a similar approach but with a more extensive collection of DNA fragment sizes and types (Krebs et al., 2014). They compared the methylation of each fragment in its original position and in its ‘transplanted’ location. The findings of Krebs et al. largely matched those of Wachter et al., and show that DNA fragments or islands with a high CpG frequency retain a non-methylated state, whereas other fragments show more variation in their methylation patterns. Krebs et al. also observed that CpG islands with binding sites for transcription factors (proteins that help to switch gene expression on or off) were more likely to be unmethylated. Furthermore, Krebs et al. could drive the demethylation of an artificial island with an intermediate CpG density by inserting a binding site for one such transcription factor (called REST) into it (Figure 1D, arrow). These results suggest that proteins that strongly activate gene expression can influence the final methylation state that is acquired.

Wachter et al. and Krebs et al. both highlight the ability of the primary DNA sequence to program the default epigenetic state, and show that this can be further influenced by transcription factors for some types of genomic island. They also indicate that proteins that bind to areas of high G + C content may be crucial for protection against DNA methylation. However, it remains unclear whether proteins also exist that recognise A + T rich regions and recruit DNA-methylating enzymes to them. Furthermore, the component that recruits the enzymes that add the inactivating marks to the histones in areas of high CpG frequency is currently unknown. As such, there is still a lot to learn about island-binding proteins. However, research in this field looks promising and it will undoubtedly help guide us through the sea of information that is contained within our own genome, and the genomes of other species.


Article and author information

Author details

  1. Colum Walsh

    Biomedical Sciences Research Institute, University of Ulster, Coleraine, United Kingdom
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-9921-7506
  2. Avinash Thakur

    Biomedical Sciences Research Institute, University of Ulster, Coleraine, United Kingdom
    Competing interests
    The authors declare that no competing interests exist.

Publication history

  1. Version of Record published: October 21, 2014 (version 1)


© 2014, Walsh and Thakur

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


