Sequence features of retrotransposons allow for epigenetic variability

  1. Kevin R Costello
  2. Amy Leung
  3. Candi Trac
  4. Michael Lee
  5. Mudaser Basam
  6. J Andrew Pospisilik
  7. Dustin E Schones  Is a corresponding author
  1. Department of Diabetes Complications and Metabolism, Beckman Research Institute, United States
  2. Irell and Manella Graduate School of Biological Sciences, City of Hope, United States
  3. Van Andel Research Institute, United States

Abstract

Transposable elements (TEs) are mobile genetic elements that make up a large fraction of mammalian genomes. While select TEs have been co-opted in host genomes to have function, the majority of these elements are epigenetically silenced by DNA methylation in somatic cells. However, some TEs in mice, including the Intracisternal A-particle (IAP) subfamily of retrotransposons, have been shown to display interindividual variation in DNA methylation. Recent work has revealed that IAP sequence differences and strain-specific KRAB zinc finger proteins (KZFPs) may influence the methylation state of these IAPs. However, the mechanisms underlying the establishment and maintenance of interindividual variability in DNA methylation still remain unclear. Here, we report that sequence content and genomic context influence the likelihood that IAPs become variably methylated. IAPs that differ from consensus IAP sequences have altered KZFP recruitment that can lead to decreased KAP1 recruitment when in proximity of constitutively expressed genes. These variably methylated loci have a high CpG density, similar to CpG islands, and can be bound by ZF-CxxC proteins, providing a potential mechanism to maintain this permissive chromatin environment and protect from DNA methylation. These observations indicate that variably methylated IAPs escape silencing through both attenuation of KZFP binding and recognition by ZF-CxxC proteins to maintain a hypomethylated state.

Introduction

Sequences derived from transposable elements (TEs) make up a large fraction of mammalian genomes (Bourque et al., 2018). TEs are silenced by both histone modifications and DNA methylation to prevent these elements from having deleterious impact on the host genome. Certain TEs, however, can escape silencing and behave as sites of interindividual epigenetic variability (Bakshi et al., 2016; Dolinoy et al., 2007a; Gunasekara et al., 2019; Hernando-Herraez et al., 2015; Oey et al., 2015; Reiss et al., 2010). These loci have been reported to be sensitive to environmental stimuli such as diet and exposure to toxins (Dolinoy et al., 2007b; Morgan et al., 1999; Rosenfeld et al., 2013; Waterland et al., 2007; Waterland and Jirtle, 2003), although this field remains an area of active research (Bertozzi et al., 2021). The well-most studied metastable epiallele is present in the Agouti viable yellow mouse, where genetically identical mice display phenotypic diversity that manifests as a range of coat colors and susceptibility to obesity (Bernal et al., 2011; Rakyan et al., 2002). This variable coat color has been linked to the methylation state of a novel insertion of an Intracisternal A-particle (IAP) retrotransposon (Dolinoy et al., 2010). Variable methylation of this IAP contributes to variable transcription of the agouti gene across all tissues and variation in coat color, hyperphagia and hyperinsulinemia (Bernal et al., 2011; Rakyan et al., 2002). These observations in the Agouti viable mouse model highlights the impact of variable methylation of TEs on phenotypic outputs. Thus, elucidating the mechanisms underlying the establishment and maintenance of these variably methylated TEs is important to further our understanding of non-Mendelian phenotypic variability and epigenetic plasticity (Ecker et al., 2018; Ecker et al., 2017; Tejedor and Fraga, 2017).

Recent work profiling interindividual epigenetic variability in mice identified roughly 50 loci that are variably methylated in a manner similar to the IAP that drives the agouti mouse phenotype (Adams et al., 2012; Bertozzi et al., 2020; Elmer et al., 2020; Kazachenka et al., 2018). (Adams et al., 2012; Kazachenka et al., 2018). The majority of these variably methylated loci are IAP retrotransposons (VM-IAPs) (Bertozzi et al., 2020; Faulk et al., 2013; Kazachenka et al., 2018). IAPs are silenced in a sequence-specific manner by KRAB-domain containing zinc finger proteins (KZFPs) (Coluccio et al., 2018). KZFPs recruit KAP1, also known as TRIM28, which, in turn, recruits repressive protein complexes to deposit H3K9me3 and DNA methylation at these loci (Bulut-Karslioglu et al., 2014; Ecco et al., 2017; Ecco et al., 2016; Schultz et al., 2002; Turelli et al., 2014). While this has been thought to happen primarily in embryonic stem cells, recent work has demonstrated that KZFPs and KAP1 regulated TEs in somatic cells as well (Ecco et al., 2016). KZFPs have evolved in clusters through imperfect gene duplication and spontaneous mutations that can allow for the recognition of novel TEs (Imbeault et al., 2017; Kauzlaric et al., 2017). Over time, TEs can mutate and escape from KZFP targeted silencing (Imbeault et al., 2017; Kauzlaric et al., 2017; Wolf et al., 2020). It has been proposed that these KZFPs are engaged in an evolutionary arms race with TEs to maintain their silencing (Jacobs et al., 2014).

IAPs have an unusually high CpG content that is more similar to CpG islands (CGIs) than to the rest of the mouse genome (Elmer et al., 2020; Kazachenka et al., 2018). CGIs are generally hypomethylated and can recruit ZF-CxxC proteins, which help maintain a permissive chromatin environment (Clouaire et al., 2012; Gu et al., 2018; Mikkelsen et al., 2007; Thomson et al., 2010; van de Lagemaat et al., 2018). Notable CxxC domain containing proteins include CFP1, which is a subunit of the SET1 complex that deposit H3K4me3, and TET1/TET3, which are responsible for the active removal of methylation from CpG dinucleotides to maintain a hypomethylated state (Gu et al., 2018; Tahiliani et al., 2009; Williams et al., 2011). Mammalian genomes in general have an underrepresentation of CpG dinucleotides as a result of deamination at methylated cytosines over evolutionary time (Duncan and Miller, 1980; Crayle et al., 2016; Shen et al., 1994). The ‘richness’ of CpG dinucleotides at IAPs is potentially due to the fact these elements are more recent additions to the genome that have yet to undergo ‘evolutionary’ deamination at methylated CpG dinucleotides. It has been previously shown that recently evolved CpG dense TEs have the potential to be hypomethylated when there is no targeted suppression of these elements (Jacobs et al., 2014; Long et al., 2016; Ward et al., 2013).

Based on this evidence, we postulated that variably methylated CpG dense IAPs can have sequence variants that allow for partial escape from KZFP-mediated silencing (Bertozzi et al., 2020; Faulk et al., 2013; Kazachenka et al., 2018) and subsequently be bound by ZF-CxxC proteins, thereby preventing stable repression. Focusing on the IAPLTR1 subfamily identified as IAPLTR1_Mm in RepBase,, we performed multiple sequence alignment and hierarchical clustering of these elements and confirmed that the VM-IAPs have a unique sequence that is distinct from other IAPLTRs. These IAPLTRs have altered KZFP binding and diminished KAP1 recruitment. Relative to silenced IAPs – that are found in gene poor regions or proximal to tissue specific transcripts – we find that VM-IAPs are over-represented proximal to constitutively expressed genes. Importantly, we find that CpG dense TEs (regardless of subfamily) have increased recruitment of ZF-CxxC proteins such as CFP1 and TET1 in the absence of KZFP-mediated silencing.

Results

IAP sequence influences KZFP recruitment and establishment of variable methylation

Recent work has identified novel VM loci, which are largely IAPs of the IAPLTR1 and IAPLTR2 subfamilies, across tissues in C57BL/6J mice (Bertozzi et al., 2020; Elmer et al., 2020; Kazachenka et al., 2018). IAPLTR1 and IAPLTR2 are among the most evolutionarily recent IAPs (Qin et al., 2010) As KZFPs are responsible for silencing IAP elements in a sequence-specific manner (Coluccio et al., 2018), we examined if the previously identified VM-IAPs have shared sequence variants that could allow for escape from KZFP-mediated silencing at both the LTR and internal element. While many older IAPs have lost their internal elements over time, many IAPLTR1 and IAPLTR2, which among the most evolutionarily recent IAPLTRs, flank an IAPEz-int (Qin et al., 2010; Stoye, 2001). We performed multiple sequence alignment and hierarchical clustering of all LTRs belonging to the IAPLTR1 and IAPLTR2 subfamilies that were greater than 300 bps long using the ETE3 pipeline (Huerta-Cepas et al., 2016). This analysis treated all LTRs independently, regardless of whether the LTR was a solo LTR or part of an ERV. We also performed multiple sequence alignment and hierarchical clustering the first 150bps of IAPEz-ints flanked by either IAPLTR1 or IAPLTR2 elements, which has previously been show to recruit the KZFP Gm14419 (Wolf et al., 2020). This profiling resulted in the characterization of four sequence ‘clades’ for IAPLTR1s (Figure 1A, Figure 1—figure supplements 1 and 2 and Supplementary file 1), two sequence ‘clades’ for the IAPLTR2s (Figure 1—figure supplement 1 and Supplementary file 1), and three sequence clades for IAPEz-ints. All IAPLTRs flanking an IAPEz-int belonged to the same sequence clade.

Figure 1 with 8 supplements see all
Sequence and chromatin context influence the establishment of a VM-IAPLTRs.

(A) All IAPLTR1 elements larger than 300 bps and the first 150bps of IAPEz-ints flanked by IAPLTR1 or IAPLTRs were clustered by sequence using PhyML with default settings. Major sequence variants for IAPLTR1s were separated into separate clades: clade1 (blue), clade 2 (Green), clade 3 (red), and clade 4 (orange). Major sequence variants for IAPEz-ints were separated into separate clades: clade α (dark blue), clade β (light blue), and clade γ (yellow). KZFP and KAP1 ChIP-seq signal was mapped across the consensus IAPLTR1 sequence as determined by MAFFT multiple sequence alignment. Gaps in the multiple sequence alignment for the IAPLTR1 sequence are displayed as grey. Heatmaps for KZFP and KAP1 ChIP-seq signals centered on the 5’ end of the IAPEz-int elements are show as well (B) Average KAP1 ChIP-seq signal across all IAPLTR1 and IAPEz-int clades. Each data point refers to the average KAP1 ChIP-seq signal at an individual IAPLTR1 element. Mean and standard deviation for each clade is shown as well. (C) Distribution of the IAPLTR1 elements clades (outer circle) and IAPEz-int clades (inner circle) for all IAP elements and VM-IAP elements. (D) Percent of VM-IAPLTR1s clade three elements and non VM-IAPLTR1 clade three elements that are within 50 kb of a constitutively expressed gene or 1 kb enhancer element, as a proxy for constitutive euchromatin environment. (E) Conservation of IAPLTR1 and IAPEz-int variants across mouse strains. Presence or absence of a IAP was determined using structural variants identified from the Sanger mouse genome project. Mouse KZFP and KAP1 ChIP-seq date are from GEO: GSE115291. VM-loci coordinates were obtained from Elmer et al., 2020.

To determine whether these clades have altered KZFP recruitment, we used published KZFP ChIP-seq profiles (Wolf et al., 2020) to survey the occupancy of KZFPs at each identified clade. We aligned all ChIP-seq libraries to the mm10 genome with multimapping, but allowed no SNPs, as we were specifically interested in identifying which sequences the KZFPs could bind (see Materials and methods for details). We identified that clade 1 IAPLTR1 elements have evidence of binding by ZFP989 and Gm21082, while clade 2 IAPLTR1 elements have evidence of binding by ZFP429 (Figure 1A and Figure 1—figure supplement 3). However, clade 3 and clade 4 IAPLTR1 elements do not show evidence of binding by these KZFPs and have decreased KAP1 binding (Figure 1A and B and Figure 1—figure supplement 3). Similar analysis with IAPLTR2s demonstrated that clade 1 IAPLTR2s are bound by ZFP429, while clade 2 IAPLTR2 elements are largely bound by ZFP989 (Figure 1—figure supplement 4). Both clades have evidence of KAP1 binding (Figure 1—figure supplement 4). When profiling KZFP and KAP1 occupation at the IAPEz-int variants, we observed that clade γ elements displayed the weakest Gm14419 binding and the lowest KAP1 signal compared to clades α and β (Figure 1A and B). Additionally, we observed that IAPLTR2s can flank either clade α or β IAPEz-ints, (Figure 1—figure supplement 5), while IAPLTR1s can flank any IAPEz-int, including IAPEz-int clade γ elements that have decreased KAP1 binding (Figure 1C). These results suggest that both LTR and internal sequence variants can result in altered KZFP recruitment and KAP1-mediated silencing.

Finally, we were interested in profiling whether the elements that have decreased KZFP-mediated silencing were significantly more likely to be variably methylated. We observed that VM-IAPLTR1s were significantly enriched at IAPs with IAPLTR1 clade three elements flanking IAPEz-int clade γ elements (Figure 1C), and VM-IAPLTR2 elements were significantly enriched at solo IAPLTR2 clade 1 elements (P-value < 1e-16 as determined using a Fisher exact test) (Figure 1—figure supplement 5). Consistent with previous work (Bertozzi et al., 2020; Faulk et al., 2013; Kazachenka et al., 2018), these results suggest that sequence is a contributing factor in the establishment of VM-IAPs. Overall, these results demonstrate that the IAPs that are most prone to be variably methylated are those that are escaping KZFP-mediated silencing.

VM-IAP are proximal to constitutively expressed transcripts and enhancer elements

While sequence has been shown to be a factor in the establishment of variable methylation (Bertozzi et al., 2020; Faulk et al., 2013; Kazachenka et al., 2018), not all IAPs with sequences identical to VM-IAPs are variably methylated (Figure 1A), as has been reported previously (Kazachenka et al., 2018). Previous studies have identified that CTCF enrichment, a protein involved in regulating chromatin architecture, has been bound at these VM-IAPs when hypomethylated (Elmer et al., 2020; Kazachenka et al., 2018). However, as CTCF binding motifs appear to be present at both VM-IAPs and silenced IAPs (Elmer et al., 2020) we examined whether genomic context had an impact on the establishment of variable methylation at the VM-IAPs. We used proximity of the IAP to constitutively expressed genes (Li et al., 2017) and ENCODE annotated enhancer elements from any cell type (Moore et al., 2020) as our proxy for a euchromatic environment (see Materials and methods for details). We found that 83 % of the VM-IAPLTR1s were within 50 kb of a constitutively expressed gene, while the remaining 17 % of VM-IAPLTR1s were less than 1 kb from an annotated enhancer element. In contrast, only 12 % of the non-VM IAPs were proximal to a constitutively expressed gene and 7 % were proximal to an enhancer (Figure 1D). Clade 3 VM-IAPLTR1s were more likely to be proximal to a constitutively expressed gene than other clade 3 IAPLTR1s regardless of the FPKM or distance profiled (Figure 1—figure supplement 6). This indicates that while sequence allows these elements to have the potential to become variably methylated, VM-IAPs are present in a permissive chromatin environment to escape being silenced, as heterochromatin spreading from neighboring loci may drive silencing of these elements.

Conservation of IAPs between mouse strains

IAP elements are a potential source of novelty between mouse strains and VM-IAPs in particular have been shown to be highly polymorphic (Gagnier et al., 2019; Kazachenka et al., 2018; Nellåker et al., 2012; Rebollo et al., 2020). In order to investigate the relationship between IAP sequence, potential for variable methylation and conservation across strains, we leveraged structural variant differences between mouse strains identified by the Sanger mouse genome project (Keane et al., 2011). We found that clade 3–4 IAPLTR1s, which have the lowest KAP1 binding of the IAPLTR1 clades and IAPEz-int clade γ elements are more polymorphic across mouse strains than the other clades (Figure 1E). Additionally, we found that IAPLTR2 clade 1 elements, which are more prone to be variably methylated, are more likely to be polymorphic across mouse strains (Figure 1—figure supplement 7). Thus, IAPs that have diminished KZFP recruitment are more likely to be polymorphic between mouse strains (Figure 1—figure supplement 8).

IAPs that have loss of KZFP binding are bound by the ZF-CxxC containing proteins TET1 and CFP1

It has previously been reported that IAPLTR1s and IAPLTR2s have a significantly higher CpG density than the genomic average (Elmer et al., 2020; Kazachenka et al., 2018). Given the prevalence of variable methylation at these elements, we wanted to assess the relationship between CpG content and variable methylation directly. For each TE subfamily, we profiled the CpG density of all LTRs (see Materials and methods for details) and the percentage of elements that were variably methylated using the previously identified list of VM-TEs in C57BL/6J mice (Kazachenka et al., 2018). We observed that TE subfamilies with a higher average CpG density had a larger percentage of elements that are variably methylated, with IAPLTRs being the most CpG dense and having the largest percentage of variably methylated elements (Figure 2A). The CpG content of VM-IAPLTR1s and VM-IAPLTR2s is similar to non-VM-IAPLTR1s and VM-IAPLTR2s, indicating that high CpG content alone is not sufficient to induce variable methylation (Figure 2B). As CpG dense loci are capable of being recognized and bound by ZF-CxxC proteins (Blackledge et al., 2010; Thomson et al., 2010), we examined the ability of ZF-CxxC proteins to bind to IAPLTR1s. Utilizing a publicly available ChIP-seq dataset for the ZF-CxxC-domain containing protein TET1 in mouse embryonic stem cells (mESCs) (Gu et al., 2018), we identified increased TET1 binding at clade3 IAPLTR1s elements, which contains the most variably methylated elements (Figure 2C and Figure 2—figure supplement 1). We furthermore profiled CpG methylation of VM-IAPLTRs in Tet1 knockout (KO) and Tet1/Dnmt3a double knockout (DKO) mESCs (Gu et al., 2018) and found that VM-IAPs displayed a significant increase in CpG methylation in the Tet1 KO cells (Figure 2D). This increase in methylation is reversed when both Dnmt3a and Tet1 are knocked out in mESCs (Figure 2D). We also profiled the recruitment of another ZF-CxxC protein, CFP1, using existing CFP1 ChIP-seq data from C57Bl/6 mice and observed CFP1 binding appears to be specific to VM-IAPLTRs (Figure 2—figure supplement 2). As CFP1 is a subunit of the SET1 complex responsible for depositing H3K4me3, we examined publicly available H3K4me3 ChIP-seq data from B cells (Adams et al., 2012) to determine if H3K4me3 is enriched at VM-IAPs. Consistent with previous observations (Bertozzi et al., 2020), we observed that VM-IAPs were more likely to ben enriched for H3K4me3 compared to non-variable IAPs (Figure 2—figure supplement 3). De novo motif discovery applied to VM-IAP sequences furthermore revealed known recognition sequences for ZF-CxxC proteins such as CFP1 (Figure 2—figure supplement 4). ZF-CxxC protein binding has been shown to protect the underlying DNA from methylation (Blackledge et al., 2010; Thomson et al., 2010). From these results we propose that recruitment of ZF-CxxC proteins, which have been shown to be involved in maintaining a euchromatic state at CpG islands, may be involved in the establishment of variable methylation at VM-IAPs in mice.

Figure 2 with 4 supplements see all
Divergent VM-IAPs elements have high CpG density and recruitment of ZF-CxxC proteins.

(A) Percent of ERV LTRs elements that are variably methylated for a given TE subfamily and average maximum CpG score of the TE subfamily. The size of each dot is determined by the number of VM-loci for each subfamily, with the largest dots indicating the greatest number of variably methylated elements. Average CpG score was determined by identifying the most CpG dense 200 bp window of each LTR and calculating the average CpG score for the whole subfamily. (B) Average CpG density of silenced and variably methylated IAPLTRs, as well as the average CpG score of a randomly selected background the same size as the IAPLTRs. Each dot refers to an individual IAPLTR element in the mm10 genome. Mean and standard deviation for each group is shown. (C) Aggregate plots of TET1 ChIP-seq signal across all identified IAPLTR1 clades. (D) Average methylation percentage of non-variable and VM-IAPLTR elements in wild type, Tet1 knockout (KO), and Dnmt3a/Tet1 double knockout cells (DKO). Each dot refers to the average methylation of an individual IAPLTR element. Only CpGs with >5 x coverage were retained to calculate methylation. Significance determined using a Wilcoxon rank sum test. *** indicates a p.value <0.0001. Bars for mean and standard deviation is shown as well for (B and D). TET1 ChIP-seq from GEO:GSE100957. Mouse Tet1 and Tet1/Dnmt3a DKO WGBS from GEO:GSE134396.

To determine if evolutionarily recent CpG-dense TEs are also more prone to be variably methylated in humans, we used a previously identified list of loci that display interindividual epigenetic variation in humans (Gunasekara et al., 2019). We determined the observed frequency at which variably methylated loci were found to be present on a TEs and compared this to an expected distribution if the variable methylation was only due to chance. Expected distribution was determined by selecting random loci in the genome the same size of the variably methylated loci. We observed that evolutionarily recent TEs were overrepresented in the variably methylated dataset (Figure 3A). TE subfamily ages were obtained from DFAM (Hubley et al., 2016). We also found the TE subfamilies with a higher CpG density are more likely to be variably methylated (Figure 3B and Figure 3—figure supplement 1), with the highly CpG dense transposable element subfamily LTR12C having both one of the highest CpG densities and one of highest percentage of elements that are variably methylated. LTR12Cs are recently evolved TEs that have previously been shown to have latent regulatory potential (Ward et al., 2013) and can act as cryptic promoter elements in cancer cells (Babaian and Mager, 2016). These results demonstrate a correlation between CpG density and variable methylation in both mice and humans.

Figure 3 with 1 supplement see all
Variably methylated loci in humans are enriched for evolutionarily recent CpG-dense TEs.

(A) Observed over expected distribution of VM-TE elements in humans stratified by the evolutionary age of the TE. Expected distribution was determined using a random sampling of the hg38 genome the same size as the VM-loci. Evolutionary age for each TE subfamily was obtained from DFAM. (B) Scatterplot showing the percent of LTR elements which display VM for a given TE subfamily and average CpG score of the TE subfamily. The size of each dot is determined by the number of VM-loci for each subfamily. Average CpG score was determined by identifying the most CpG dense 200 bp window of each ERV and calculating the average CpG score for the whole subfamily. The coordinates of VM-loci for humans were obtained from Gunasekara et al., 2019.

CpG dense TEs are hypomethylated and recruit ZF-CxxC proteins in the absence of KZFP-mediated silencing

To further investigate the relationship between KZFP/KAP1 silencing, DNA methylation, and ZF-CxxC binding to CpG dense TEs, we profiled DNA methylation and ZF-CxxC binding in Tc1 mouse liver. This ‘transchromosomic’ mouse carries the majority of human chromosome 21 (Hsa21), which possesses human specific transposable elements without the corresponding KZFPs for silencing (O’Doherty et al., 2005) and has previously been used to identify the latent regulatory potential of primate specific TEs (Jacobs et al., 2014; Long et al., 2016; Ward et al., 2013). We performed MinION nanopore sequencing to globally profile DNA methylation of human chromosome 21 in Tc1 mouse liver and compared this with existing DNA methylation data from human liver cells (Li et al., 2016; Figure 4A and B). Consistent with previous reports (Jacobs et al., 2014; Long et al., 2016; Ward et al., 2013), we found that TE-derived CGIs were hypomethylated in the absence of KZFP-mediated silencing (Figure 4B). In contrast, TE-derived CGIs that can be targeted by KZFPs shared between mice and humans were largely methylated (Figure 4—figure supplement 1). These silenced TE-derived CGIs are largely Alu elements that are derived from the same ancestral sequence as mouse B2 SINE elements and can be silenced by the KZFP ZNF182, which is present in both mice and humans (Imbeault et al., 2017). Additionally, other primate specific TEs that are not CpG rich are methylated in both the Tc1 and human livers (Figured 4B), highlighting the importance of CpG density in escaping CpG methylation at TEs. To profile ZF-CxxC occupancy at these hypomethylated TEs, we performed CUT&RUN for CFP1, a ZF-CxxC protein expressed in somatic tissue, in Tc1 mouse livers and compared this with existing CFP1 ChIP-seq data from human erythroblasts (van de Lagemaat et al., 2018). We observed that CpG dense TEs that are not targeted by KZFPs displayed CFP1 recruitment, while the methylated CpG poor TEs have no CFP1 signal (Figure 4C). This CFP1 binding was also unique to the Tc1 mice as there was no CFP1 binding at these loci in humans (Figure 4C), while non-TE CGIs in both the mouse and human genomes have CFP1 binding (Figure 4—figure supplement 2). Together this work demonstrates that in the absence of targeted KZFP-mediated silencing, TEs with high CpG content can be hypomethylated and bound by ZF-CxxC proteins.

Figure 4 with 2 supplements see all
CpG dense TEs are hypomethylated and recruit ZF-CxxC proteins in the absence of KZFP-mediated silencing.

(A) UCSC genome browser screenshot of an LTR12C element which is hypomethylated in Tc1 mice and shows novel CFP1 recruitment in Tc1 mice. (B) CpG methylation of TE CpG islands (CGIs) and other TEs for human chromosome 21 in both Tc1 mouse and human genomes. Each data point refers to an individual CpG island that had greater than 5 x coverage. Mean and standard deviation for DNA methylation at is shown as well. (C) CFP1 signal in Tc1 mice and humans at the human TE CGIs and other TEs (from B) on human chromosome 21. Human CFP1 ChIP-seq from GEO:GSM3132538. WGBS from human liver GEO:GSM1716957.

Trim28 haploinsufficiency activates evolutionarily recent TEs

As TRIM28/KAP1 is a key protein involved in the establishment of heterochromatin at TEs, we profiled Trim28 haploinsufficient mice to determine if decreased abundance of TRIM28 allows for CpG dense TEs to escape silencing. Trim28 haploinsufficiency has previously been shown to impact the Agouti viable yellow mouse coat color (Daxinger et al., 2016) and drive a bimodal obesity phenotype in FVB/NJ mice (Dalgaard et al., 2016). We performed both H3K4me3 ChIP-seq and CFP1 CUT&RUN in the livers of Trim28 haploinsufficient (Trim28+/D9) mice and their wild-type littermates (Materials and methods; Figure 5A). We identified 69 loci that had novel H3K4me3 signal in the Trim28+/D9 mice compared to control (Materials and methods; Figure 6—figure supplement 1). The majority of these loci with novel H3K4me3 recruitment were evolutionarily recent TEs (Figure 6B and - supplement 2). These loci also had an increase in CFP1 signal in Trim28+/D9 mice relative to control mice (Figure 6C and Figure 6—figure supplement 1). Additionally, we profiled global changes in expression of TE subfamilies in the Trim28+/D9 mice compared to wild-type mice using our previously generated RNA-seq datasets (Dalgaard et al., 2016). We identified that TEs which have a significant increase in gene expression in the Trim28+/D9 mice compared to wild-type mice have significantly higher CpG density than expected when compared to non-responsive TE subfamilies (Figure 6D). Together this work demonstrates that loss of KAP1-mediated silencing allows for some evolutionary recent TEs to be active and recruit ZF-CxxC proteins.

Figure 5 with 2 supplements see all
Trim28 haploinsufficiency leads to activation of evolutionarily recent and CpG dense TEs.

(A) Genome screenshot of an IAPLTR2 element with novel H3K4me3 enrichment in Trim28 haploinsufficient mice. (B) Breakdown of loci with novel H3K4me3 in Trim28 haploinsufficient mice. Age of each TE was determined by DFAM. (C) Aggregate plots of H3K4me3 and CFP11 signal across all loci that have novel H3K4me3 signal in Trim28 haploinsufficient mice (D) CpG Score of TE subfamilies with a global increase expression in Trim28 haploinsufficient mice and a random selection of non-responsive TEs subfamilies. Expression levels for each TE subfamily was determined using RepEnrich, and Deseq2 was used to determine TE subfamilies with a significant increase in expression in the Trim28 haploinsufficient mice. Bar is placed at mean and error bars cover one standard deviation. Each data point refers to an individual TE subfamily. p-Values for CpG density difference was calculated using a Wilcoxon rank sum test. Trim28 haploinsufficient and wild-type mouse RNA-seq was obtained from ENA:PRJEB11740.

Figure 6 with 1 supplement see all
Model for variable methylated transposable elements.

Loci with high CpG density and loss of KZFP binding have the potential to recruit ZF-CxxC proteins to protect these TEs from being silenced. However, elements with high CpG density but strong KZFP recruitment will remain methylated.

Discussion

This work provides the foundation for a deeper understanding of how mammalian genomes are shaped during evolution. It has been proposed that genomes are locked in an ‘evolutionary arms race’ between endogenous retrovirus and KZFPs to maintain proper silencing of the genome. The variable methylation observed at IAPs could be the result of the IAPs winning this ‘battle’. We observed that VM-IAPs have unique sequence variants that allow them to have diminished KZFP recruitment. This provides an explanation to previous works that identified sequence biases for variably methylated IAPs in mice (Bertozzi et al., 2020; Faulk et al., 2013; Kazachenka et al., 2018). We also found that IAPs with the same sequence variants as the VM-IAPs are more likely to be polymorphic between individual mouse strains. Interestingly, we find that the IAPLTRs flanking the ‘master’ IAP that has been proposed to be responsible for an expansion of IAPs in C3H/HeJ mice (Rebollo et al., 2020) has the same sequence as the clade 3 IAPLTR1s (Figure 6—figure supplement 1). Together this implies that these VM-IAPs may be more active in certain mouse strains and may be propagating due to a lack of repression.

Our work also highlights the potential importance of CpG density in the silencing of the endogenous retroviruses. We observed that VM-IAPs have recruitment of ZF-CxxC proteins. In the absence of targeted silencing, CpG-dense TEs have the potential to become hypomethylated and recruit ZF-CxxC proteins, while CpG poor TEs remain methylated. As TEs can lose CpG density over time, the observation that the CpG rich TEs have the potential for reactivation supports the idea that deamination of methylated loci could be one means for driving permanent silencing of TEs over time (Long et al., 2016).

Based on our results, we propose a model that VM-IAP loci are established as a result of partial KZFP mediated silencing at CpG dense TEs, which can then be protected from silencing through recruitment of ZF-CxxC containing proteins such as TET1 (Figure 6). This potentially results in a stochastically silenced locus, which is then inherited by all tissues during development. Together our work has significant implications in furthering our understanding of how TEs can alter both the genomes and epigenomes of mammals.

Materials and methods

Key resources table
Reagent type (species) or resourceDesignationSource or referenceIdentifiersAdditional information
Strain, strain background (Mus musculus)Trim29+/D9Blewitt et al., 2005(RRID:MGI:3821610)Haploinsufficient for Trim28
Strain, strain background (Mus musculus)B6129S-Tc(HSA21)1TybEmcf/JThe Jackson LaboratoryStock No: 010801 (JAX)(RRID:IMSR_JAX:010801)2 Mb of a freely segregating human fragment of Chr21
AntibodyRabbit Polyclonal anti-CFP1 antibodyMilliporeABE211(RRID:AB_10806210)CUT&RUN(1:50 dilution)
AntibodyRabbit polyclonal anti-H3K4me3AntibodyAbcamab8580(RRID:AB_306649)ChIP-seq(2 µg antibody per25 µg chromatin)

Multiple sequence alignment and hierarchical clustering

Request a detailed protocol

IAPLTR1_Mm, IAPLTR2_Mm, and IAPEz-int elements were defined by RepeatMasker (RRID:SCR_012954) (Smit, AFA, Hubley, R & Smit et al., 2010) in the mm10 genome. Only IAPLTRs greater than 300bps in length were profiled. For IAPEz-ints, only IAPEz-int elements flanked by either IAPLTR1 or IAPLTR2 elements were selected for profiling, and only the first 150bps of the 5’ end of the IAPEz-int were aligned and clustered. All elements were processed using the ETE3 pipeline (Huerta-Cepas et al., 2016) using MAFFT (RRID:SCR_011811) with default settings (Katoh and Standley, 2013), and bases that were found in less than 10 % of the samples were trimmed from the multiple sequence alignment using trimAL (RRID:SCR_017334) (Capella-Gutiérrez et al., 2009). The multiple sequence alignments were then hierarchically clustered using PhyML (RRID:SCR_014629) (Guindon et al., 2010). Sequence clades were empirically selected.

Alignment of existing ChIP-Seq data

Request a detailed protocol

Reads were trimmed using Trimgalore version 0.5.0 (RRID:SCR_011847), which utilizes cutadapt (RRID:SCR_011841) (Martin, 2011), and were aligned to either the mm10 genome for mouse data, hg38 genome for human data, or a custom assembly that included both the mm10 genome and human chromosome 21 for the Tc1 mice. Reads were aligned using bowtie1 version 1.2.3 (RRID:SCR_005476) retaining only reads that could be mapped to unambiguously to a single locus using the -m one option (Langmead et al., 2009). Aligned reads were sorted using samtools version 1.10 (RRID:SCR_002105) (Li et al., 2009) and filtered to remove duplicate reads using the MarkDuplicates function of Picardtools version 2.21.1 (RRID:SCR_006525). Regions of enrichment were called using the callpeaks function of MACS2 version 2.2.5 (RRID:SCR_01329) (Zhang et al., 2008) with a q-val threshold of 1e-3.

KZFP and KAP1 data in Figure 1 was aligned using different parameters to allow for potential multimapping as the reads were too short to provide confident unique mapping at repetitive elements. These reads were aligned to the mm10 genome using bowtie2 version 2.3.5.1 (RRID:SCR_016368) with the --end-to-end --very-sensitive options (Langmead and Salzberg, 2012). Aligned reads were sorted using samtools version 1.10 (RRID:SCR_002105) (Li et al., 2009) and filtered for duplicate reads using the MarkDuplicates function of Picardtools version 2.21.1 (RRID:SCR_006525). Regions of enrichment were called using MACS2 version 2.2.5 callpeaks (RRID:SCR_01329) (Zhang et al., 2008) with a q-val threshold of 1e-3.

For all datasets, Bedgraph files were generated using bedtools version 2.29.0 genomecov (RRID:SCR_006646) with the -bg -ibam options (Quinlan and Hall, 2010). BigWigs (RRID:SCR_007708) were generated using the UCSCtools bedGraphToBigWig (Karolchik et al., 2004). Heatmaps of ChIP-seq signal across the IAP multiple sequence alignments were generated using a custom script to profile the read coverage at each base and were visualized using pheatmap (RRID:SCR_016418). All other heatmaps and aggregate plots of loci that extend were generated using deeptools (RRID:SCR_016366) (Ramírez et al., 2016).

Genomic context analysis

Request a detailed protocol

We used proximity to a constitutively expressed gene or annotated enhancer element as a proxy for profiling a euchromatic environment. Constitutive expressed genes were identified as genes that have a fragments per kilobase of transcript per million mapped reads (FPKM) greater than two in more than 90 % of the samples previously profiled in Li et al., 2017, while enhancers were all ENCODE annotated enhancer elements (RRID:SCR_006793) (Moore et al., 2020). While FPKM of >2 was used, observed trends appeared regardless of FPKM threshold set (Figure 1—figure supplement 6).

Polymorphism analysis between mouse strains

Request a detailed protocol

Coordinates of the structural variants between mouse strains were obtained from the Sanger mouse genome project (RRID:SCR_006239) (Keane et al., 2011). Deletions between mouse strains were extracted for each of the profiled mouse strains. Bedtools version 2.29.0 intersect (RRID:SCR_006646) (Quinlan and Hall, 2010) with the -f one option was used to determine if an IAP was fully deleted in each mouse strain relative to C57BL/6 J. Phylogeny of the mouse strains was obtained from previous work (Nellåker et al., 2012). For branches where multiple mouse strains are similarly diverged from C57BL6/J, the IAP was considered to be present if the IAP existed in any of profiled mice at that branch.

Profiling of CpG score

Request a detailed protocol

CpG score was calculated as previously described (Gardiner-Garden and Frommer, 1987). Briefly, we calculated CpG score as Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G). When profiling the CpG density of across the TE using sliding 200 bp tiles with 10 bp steps between windows and selected the most CpG dense portion of the TE to remove and impact of TE size. Window size was chosen to match the minimum size of a CpG island (Gardiner-Garden and Frommer, 1987). Differences in CpG density across an element was most pronounced on L1 elements, which can contain a CpG island in their 5’ UTR but have low CpG density across the rest of the element. TEs with low copy numbers, such as IAPLTR4, were removed from this analysis.

MinION sequencing and alignment

Request a detailed protocol

Livers from B6129S-Tc(HSA21)1TybEmcf/J mice (RRID:IMSR_JAX:010801) were purchased from JAX. Purified DNA from Tc1 mouse liver was prepared using the SQK-LSK109 ligation sequencing kit and run on R9 flow cells purchased from Nanopore. 1 ug of DNA was end repaired with NEBNext FFPE DNA repair and Ultra II End-prep purchased from New England Biolabs. The repaired DNA was cleaned with KAPA Pure Beads. Adapters were ligated using NEBNext quick T4 ligase purchased from New England Biolabs. Prepared libraries were mixed with Sequencing Buffer from the SQK-LSK109 kit and approximately 200 ng was loaded onto the MinION flowcell. Reads were aligned using bwa mem with the options -x ont2d -t 100 (RRID:SCR_010910) (Li, 2013) to a custom assembly that included all of the mm10 genome and human chromosome 21, and then sorted using samtools version 1.10 (RRID:SCR_002105) (Li et al., 2009). Reads were processed using minimap2 (RRID:SCR_018550) with the options -a -x map-ont (Li, 2018). Methylation state of CpGs was called using nanopolish (RRID:SCR_016157) with the options call-methylation -t 100 (Loman et al., 2015). Only loci with greater than 5 x coverage were considered in the analysis. CpG islands were obtained from the UCSC table browser (Gardiner-Garden and Frommer, 1987; Karolchik et al., 2004). Methylation percentage was averaged across CpG islands.

CFP1 CUT&RUN sequencing and alignment

Request a detailed protocol

CUT&RUN was performed as previously described (Skene and Henikoff, 2017) on livers from B6129S-Tc(HSA21)1TybEmcf/J mice (RRID:IMSR_JAX:010801) purchased from JAX and Trim28+/D9 livers using the CFP1 antibody (RRID:AB_10806210) (ABE211, Millipore). Briefly, unfixed permeabilized cells are incubated with the CFP1 antibody fused to A-Micrococcal Nuclease. Fragmented DNA was isolated and sequenced on an Illumina HiSeq 2,500 System. We assessed standard QC measures on the FASTQ file using FASTQC (RRID:SCR_014583) and adapters were trimmed using Trimgalore version 0.5.0 (RRID:SCR_011847). Trimmed reads were aligned to a custom assembly that included both the mm10 genome and human chromosome 21, or only the mm10 genome using bowtie1 version 1.2.3 (RRID:SCR_005476) using the -m one option to only retain reads that could be mapped to unambiguously to a single locus (Langmead et al., 2009). Aligned reads were sorted using samtools version 1.10 (RRID:SCR_002105) (Li et al., 2009) and filtered for duplicate reads using the MarkDuplicates function of Picardtools version 2.21.1 (RRID:SCR_006525). Regions of CFP1 enrichment were called using the call peaks function of MACS2 version 2.2.5 (RRID:SCR_01329) (Zhang et al., 2008) with a q-val threshold of 1e-3. Only peaks found in two animals were retained to remove biological noise. Additionally, due to small fragment size, loci that could not be mapped unambiguously by 90 bp fragments were removed from consideration using the deadzones tool from RSEG (RRID:SCR_007695) (Song and Smith, 2011).

Animals

The generation of Trim28+/D9 mice (RRID:MGI:3821610) has been described elsewhere (Blewitt et al., 2005). Animals were kept on a 12 hr light/dark cycle with free access to food and water and housed in accordance with international guidelines. Livers were isolated at 2 weeks of age and immediately snap-frozen for further processing.

Tc1 mouse livers (RRID:IMSR_JAX:010801), freshly harvested and snap frozen, were purchased from JAX Laboratories (Stock No: 010801).

ChIP-Seq analysis of H3K4me3 ChIP-Seq

Request a detailed protocol

Chromatin immunoprecipitation was performed with an H3K4me3 antibody (RRID:AB_306649) (ab8580, Abcam) as previously described (Leung et al., 2013). Isolated DNA was sequenced on an Illumina HiSeq 2,500 System. We assessed standard QC measures on the FASTQ file using FASTQC (RRID:SCR_014583) and adapters were trimmed using Trimgalore version 0.5.0 (RRID:SCR_011847). Reads were aligned to the mm10 genome using bowtie1 version 1.2.3 (RRID:SCR_005476) using the -m one option to only retain reads that could be mapped to unambiguously to a single locus (Langmead et al., 2009). Aligned reads were sorted using samtools version 1.10 (RRID:SCR_002105) (Li et al., 2009) and filtered for duplicate reads using the MarkDuplicates function of Picardtools version 2.21.1 (RRID:SCR_006525). Regions of enrichment were called using the call peaks function of MACS2 version 2.2.5 (RRID:SCR_01329) (Zhang et al., 2008) with a q-val threshold of 1e-3. Loci that displayed enrichment of H3K4me3 in at least two animals were retained to remove biological noise. The loci with increased H3K4me3 as shown in Figure 6B–C were identified by finding peaks present in Trim28+/D9 mice that were absent from WT mice and also had 3 x more H3K4me3 signal in Trim28+/D9 mice compared to WT. Heatmaps and aggregate plots were generated using deeptools (RRID:SCR_016366) (Ramírez et al., 2016).

Differential expression analysis of repetitive element subfamilies

Request a detailed protocol

RNA-seq reads were downloaded from ENA:PRJEB11740, and adapters were trimmed using Trimgalore version 0.5.0 (RRID:SCR_011847). Reads were then aligned and processed using the RepEnrich2 pipeline RRID:SCR_021733 as previously described (Criscione et al., 2014). This package sums the number of reads mapping to each repetitive element subfamily. Differentially expressed subfamilies between Trim28+/D9 and wild-type mice were identified using DEseq2 (RRID:SCR_015687) (Love et al., 2014) with a p-val threshold of 0.05.

Data availability

All datasets generated in this study have been submitted to GEO under accession code GSE176176.

The following data sets were generated
    1. Costello KR
    2. Leung A
    3. Trac C
    4. Lee M
    5. Basam M
    6. Pospisilik JA
    7. Schones DE
    (2021) NCBI Gene Expression Omnibus
    ID GSE176176. Mechanisms of interindividual epigenetic variability at CpG dense transposable elements.
The following previously published data sets were used
    1. Wolf G
    2. de Iaco A
    3. Sun MA
    4. Bruno M
    5. Tinkham M
    6. Hoang D
    7. Mitra AP
    8. Ralls S
    9. Trono D
    10. Macfarlan TS
    (2019) NCBI Gene Expression Omnibus
    ID GSE115291. Retrotransposon reactivation and mobilization upon deletions of megabase-scale KRAB zinc finger gene clusters in mice.
    1. Li X
    2. Liu Y
    3. Salz T
    4. Hansen KD
    5. Feinberg A
    (2016) NCBI Gene Expression Omnibus
    ID GSM1716957. Whole genome analysis of the methylome and hydroxymethylome in normal and malignant lung and liver.
    1. Li B
    2. Qing T
    3. Zhu J
    4. Wen Z
    5. Yu Y
    6. Fukumura R
    7. Zheng Y
    8. Gondo Y
    9. Shi L
    (2017) NCBI BioProject
    ID PRJNA375882. A Comprehensive Mouse Transcriptomic BodyMap across 17 Tissues by RNA-seq.

References

  1. Book
    1. Bernal AJ
    2. Murphy SK
    3. Jirtle RL
    (2011)
    Mouse Models of Epigenetic Inheritance
    In: Tollefsbol T, editors. Handbook of Epigenetics. Academic Press. pp. 233–249.
    1. Tejedor JR
    2. Fraga MF
    (2017) Interindividual epigenetic variability: Sound or noise?
    BioEssays: News and Reviews in Molecular, Cellular and Developmental Biology 39:201700055.
    https://doi.org/10.1002/bies.201700055

Decision letter

  1. Deborah Bourc'his
    Reviewing Editor; Institut Curie, France
  2. Patricia J Wittkopp
    Senior Editor; University of Michigan, United States
  3. Todd S Macfarlan
    Reviewer; The Eunice Kennedy Shriver National Institutes of Child Health and Human Development, NIH, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

This study aims at understanding how certain retrotransposons escape chromatin-based silencing. Focusing on variably methylated IAP copies (VM-IAPs) in the mouse, the authors show that escapees share sequence variations that alter KRAB zinc finger protein (KZFP) binding and KAP1 recruitment, proximity to expressed genes and high CpG content. Analysis of human retrotransposons in a human KZFP-free mouse cell line recapitulates some of these observations. The authors propose that ZF-CxxC proteins play a role in establishing permissive chromatin at retrotransposons that harbor high CpG content and weak KZFP binding. Although correlative, these findings open the path for further mechanistical demonstration. The paper is of interest to readers in the field of epigenetics, genome biology and transposable elements.

Decision letter after peer review:

Thank you for submitting your article "Mechanisms regulating interindividual epigenetic variability at transposable elements" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Patricia Wittkopp as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Todd S Macfarlan (Reviewer #4).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

The reviewers all agreed on the interest of the question and the significance and robustness of the analyses, even though the conclusions may not be entirely novel in some places. They also raised the requirement for 1- providing improved method description, 2-performing additional bioinformatic analyses, 3- not overinterpreting correlative data, 4- discussing previous literature. Please refer to the main three points further expanded below to prepare your revisions, and please consult the detailed points raised by the reviewers, for further modification, editing and discussion in a point-by-point rebuttal.

1) More information is needed regarding the Material and Methods. This is key to understand how the data were treated in terms of bioinformatic analyses: i) precise everywhere it is required whether unique or multiple mapping was used, ii) precise whether 5'LTR and/or 3'LTR were used for LTR alignment when referring to full length VM-IAP copies, iii) considering the strain-specificity of some VM-IAPS and KZFP controllers, when relying on public mouse datasets, specify from which mouse strain they come from. The same mouse strain should ideally be used throughout the study.

2) Additional bioinformatic analyses are needed to strengthen the paper:

– plot KAP1 across IAPLTR2 sequences and over IAPEz internal sequence (reviewer #4).

– in VM versus non-VM comparisons, please use a random set of sequences to compare the same number of sequences (the number of VM-IAPs being overwhelmingly smaller compared to other categories of IAPS (non-VM, others)) (reviewer #3).

– use GTEX human liver methylation instead of the HepG2 cell line WGBS dataset (reviewer #2).

– from their CPF1 CandR in wildtype liver, the authors should analyze the level of CPF1 binding at VM-IAPs specifically, as they did for H3K4me3 (from public B cell dataset).

4 – Data are only correlative as this stage. VM-IAPs are less bound by KAP1/KZFP, but whether this lower recruitment is key to the VM status would need to be genetically tested: by modifying the KZFP binding site of a unique VM-IAP to convert towards weaker or stronger binding. Similarly, whether CFP1 presence on VM-IAPs was not demonstrated here to be causative of the VM-IAP status. IAPs that fail to be protected by KZFPs may just be accessible to all DNA binding proteins. It would require testing whether KAP1 or KZFP loss leads to CFP1 recruitment and then test if CPF1 is required for the VM status in mice. Considering the amount of time and work that these approaches would require, they are not requested within the frame of this revision. However, without formal demonstration, the authors should balance their discussion, and acknowledge that their data provide correlative, not causative, evidence, and should be treated as such. Reviewer #4 listed several instances where the text should be toned down.

Reviewer #2 (Recommendations for the authors):

Overall I think this could be a good manuscript if more emphasis was put on precision and an extensive description of patterns observed, as well as a more detailed analysis and a balanced discussion of the CPF1 results and their meaning for our mechanistical understanding of VM-IAPs and other epistable epialles. In addition to the points mentioned in the public review:

– I think the title is not supported by the content of the article and should be modified to something more neutral that showcases the novel findings but does not overplay them. There is no new 'mechanism' in this article, but a few correlations and observations, some of which could suggest future research avenues to prove if they play a mechanistical role. Similarly, claims of the same nature in the abstract, results and Discussion sections should be toned down.

– Much of my problems with the current manuscript have to be with the interpretation of the CPF1 findings – in my view there is absolutely no reason to think there is a mechanistical link between VM-IAP status and CPF1. Derepressed elements being now bound by various factors because they are accessible is not surprising and does not imply any causality with the VM status.

– As described in the public review, a clear description of what strains which dataset comes from is imperative, as KZFPs and young IAPs can be strain specific.

– the 'IAPLTR#' nomenclature should be changed to IAP LTRs #.

– 'euchromatic environment' should be changed to 'proximity of expressed genes' or a similar definition. There can be pockets of local repression close to expressed genes.

– less than 1 kb from an annotated enhancer element… in what cell type(s)? What is the genome distribution of these elements – is there one every 10 kb, which is possible if the database from all cell types was used (which would not be relevant at all)? Some kind of enrichment calculation should be performed instead.

– sentence starting at line 131: I see no support for such a strong statement, even if the above point is clarified. There is enrichment for sure but it is not exclusive, which is implied here with 'must'.

– Figure 1: Why only IAPLTR1s, and not show IAPLTR2s?

– line 137: 'intact' is most probably not correct here – these all have their full open reading frames with no mutations?

– line 142-145: wrong definition of the grey sections – these are gaps, not sequence with no homology.

– provide justification for using only the first 150 bps of IAPEz-ints – repression from KZFPs can have a longer range than a single nucleosome.

– line 181: 'least conserved' is potentially misleading – is it that they were lost in other strains, or that they are new a few related strains? 'conserved' implies the former, while I think here it is the latter. Should be changed elsewhere in the text and in figure legends (for example 2C) to something more precise.

– Figure 2A: IAPEz-clade # should be IAPEz-int clade #.

– Figure 2B: is there no clade 3-4 LTRs associated with IAPLTR2s?

– Figure 2C: again, why no mention of LTR2s?

– Figure 2C: 'cade' instead of 'clade' in a few spots.

– Figure 2C: This one is overall confusing – are clade 2, 3 and 4 never associated with IAPEz-Int? Figure 2B show that they are, and the legend should reflect this, as now this gives the impression that they are solo LTRs given the lack of annotation following the initial presence of it.

– line 224-225: this is backward logic, there's no such demonstration. It should be precised that 'in the absence of repression from KZFPs, IAPs are capable…'

– Figure 3A: 'Mouse ERVs LTRs', in both the figure and the text.

– Figure 3A: The use of the CpG Score metric is puzzling me – why restrict it to 'the 200 bp window with the densest CpG'? This is never explained – why not use the full element?

– line 245: comparing to random sampling of the genome does not make any sense – compare instead to non variably methylated loci of the same TE family (or families).

– Figure 4: be careful in calling 40 million year old elements 'young' – the timescale between mouse and human-specific elements is very different.

– Use GTEx liver methylation instead of in addition to the HepG2 datasets.

– line 285: what tissue was used from the Tc1 mice? Erythroblasts just the same? The expression of KZFPs is also tissue specific, this comparison should be made in ES cells where DNA methylation patterns are established.

– line 322: The sentence is completely unsupported by the data.

– Figure 6B: please compare to the genome-wide distribution of TEs – is it surprising to find that proportion of Mus-specific elements?

– Figure 6D: the use of random sampling of non-responsive TEs is not adequate – compare with random sampling from the same families where you see increased expression.

– The discussion should be rewritten to take into account the above comments – as of now, much of the CPF1 section is not supported by the data.

– provide version numbers for all software and databases used.

– justify the use of bowtie1 in some cases, and bowtie2 in others. Use a single pipeline is preferred.

– Bowtie2 to map RNA-seq is not appropriate as it does not take splicing into account – please use a RNA-seq specific mapping strategy.

Reviewer #3 (Recommendations for the authors):

The authors should state the number of copies they are referring to (the VM-IAPs). It seems to me that there is a small number of variably methylated IAPs and this is such a tiny number that reinforces the idea that most IAPs are silenced, and only a few are able to escape silencing. This discussion strengthens the fact that is not only a matter of IAP sequence, but also context, and potentially strain as the authors suggest in the conclusion. Finally, this would put into context the idea that the IAPs are "winning this arms race".

Since the authors are using clusters of IAP sequences, it would be interesting to estimate the age of such clusters and their relationship. This would imply discussing other articles that have previously classified IAP sequences (for instance 10.1002/mc.20576), and even taken into account their methylation level (10.1186/1471-2164-14-48), and potentially their CpG density.

As stated by the authors, 93% of VM-IAPLTR1s are full length sequences, so I wonder how LTRs stemming from the same copy were treated? Are both included in the analysis? Also, of these 93% FL copies, what are the differences in sequence/methylation/binding KZFP/KAP1 etc, between the two LTRs? Are there any FL copy that has different sequences at their LTR and therefore different chromatin state? Previous articles have shown that LTRs bearing the same sequence can have different methylation/chromatin state depending on their distance to nearby genes. Can the authors include this observation in their discussion?

I don't understand the consensus mapping for ChIP-seq for the ZFPs, while the KAP1 ChIP-seq is done on individual IAPLTR1 element (is this uniquely mapping reads?). From the methods, I understand that the ChIP-seq reads are mapped to the mm10 genome, and then the enrichment for each base is translated to a consensus in order to illustrate as a heatmap in Figure 1? Are these uniquely or multimapped reads? These methods should be stated clearly.

In order to associate chromatin environment and VM-IAP state, the authors demonstrate that VM-IAPs are enriched within 50Kb of constitutively expressed genes. This distance seems very large for me, and I wonder how the authors have decided to chose it. Finally, In the Elmer et al., article, where the VM-IAP set is chosen, there is a valid association between CTCF and variably methylated IAP copies. In my opinion this shows how the chromatin environment is associated with VM-IAP state. The authors should discuss this in their manuscript.

The four wild-derived mouse inbred strains from Nellaker et al., ((CAST/EiJ, PWK/PhJ, WSB/EiJ and SPRET/EiJ)) have disproportionate numbers of ERV TEV calls (see Figure 1c from their paper), which could be associated with false positive calls in those species. It would be more stringent to remove these strains from this analysis, in order to have a conservation estimation between lab strains.

Given that VM-IAPs are a smaller set than all the other IAPs, comparisons stating enrichment or "more likely", should take into account these differences. For instance, on Figure 3 D or E, the "other IAPs" or "non-VM" should be a random set that has the same number of copies as the "VM" set.

In the transchromosomic mouse, the authors show that human TE derived CpG islands are hypomethylated due to lack of KZFP that would silence such TEs. I wonder what is the extent of hypomethylation in TEs that are not CpG-rich? It would strengthen the authors argument to show that non-CpG rich TEs are highly methylated (I hope I didn't miss something here though!).

Reviewer #4 (Recommendations for the authors):

How the conclusions of this paper could be strengthened:

– To establish that the identified sequence variants are causal to diminished KZFP recruitment and increased CxxC domain containing protein recruitment, motifs could be called in the sequence variants and identified motifs could be deleted and KZFP and CxxX domain containing protein recruitment profiled to assess if the proposed competition between KZFP and CxxC is dependent on the sequence variant.

– As decreased Kap1 recruitment is central to the conclusions of the paper, the authors should provide Kap1 coverage heatmaps for the IAP regions in Figures S1.6 and 2.A. This data could support the conclusion that sequence clades confer altered KAP1 mediated silencing for IAP elements by expanding the observation for IAPLTR1 to IAPLTR2 elements.

– To strengthen the claims of the paper in respect to mouse VM-IAPs, the authors could assess CFP1 binding by ChIP-seq or cutandrun (which they did successfully in human) in previously used available mouse embryonic stem cell lines and mice that lack mouse-specific KZFPs and no longer repress VM-IAPs (see Bertozzi 2020 and Wolf 2020).

– To test if genetic context indeed causes VM, the authors could e.g. remove and ectopically insert a VM-IAP to a different context to test if VM is lost.

– Line 121 'While sequence has been shown to be a factor in the establishment of variable methylation' Please add a reference (Bertozzi et al., for example) to the experimental evidence supporting this claim.

– The authors could consider more directly comparing human and mouse data by performing CFP1 cutandrun or ChIP-seq (that the lab successfully performed in Tc1 liver) also in mouse instead of H3K4me3 as a proxy.

How the presentation of the data could be improved:

– Given that the sequence features distinguishing IAPLTR1 and 2 elements that are or are not variably methylated are a main claim of the paper, the authors should consider providing more reader-friendly sequence information for the observed sequence changes that correlate with diminished KZFP recruitment (e.g. in Figure S1.1 and also for internal seq in S2.1). In the related pre-print, but not the manuscript, one can zoom into the pdf to observe the sequences, but it is very hard to identify actual sequence changes and where they reside. A detailed file in the supplement or zoom ups of the exact sequence changes referred to would be helpful.

– The logic of the sentence in line 154f is not obvious "As the majority of IAPLTR2s exist as solo LTRs this indicates that these elements can be repressed through binding of KZFPs to the IAP internal sequence". Why does it follow from existing as solo LTR that the LTR can be repressed via KZFP binding to the internal sequence? Figure S1.7 does not help understand this point. Please clarify.

– Figure 2C: the authors could consider showing conservation of all VM-sequences individually (given there are not so many) in the supplement in addition to the whole clade.

– In some cases, the manuscript would benefit from toning down statements about causality, when only correlation is assessed (see public review). Further examples include:

– Figure 1 heading "Sequence and chromatin context influence the establishment of a VM-IAPLTRs." The figure shows that VM-IAPs occur in a different chromatin context (less Kap1, more close to constitutively expressed genes and regulatory elements), but not that this context influences variable methylation.

– Relating to the same Figure in 107f: "these results demonstrate that sequence can be a contributing factor in the establishment of VM-IAPs". More appropriate would be "these results suggest that sequence could be a contributing factor in the establishment of VM-IAPs.

– Relating to the same Figure the header in Line 120 'chromatin environment can influence establishment of VM-IAP'. More appropriate would be 'correlates with a distinct chromatin environment'.

– Line 198 'IAPs that have loss of KZFP binding have recruitment of the ZF-CxxC containing proteins TET1 and CFP1' while data supporting this statement is shown in the Tc1 mouse model, the data in this section for mouse is correlative. It does support presence, rather than 'recruitment' of the ZF-CxxC domain containing proteins.

– Line 253f 'our results highlight the potentially conserved importance of high CpG density in the establishment of variable methylation' The human data does not show that CpG density establishes LTR12C elements as variably methylated. It shows that LTR12C elements that are variably methylated are also CpG dense.

https://doi.org/10.7554/eLife.71104.sa1

Author response

Essential revisions:

The reviewers all agreed on the interest of the question and the significance and robustness of the analyses, even though the conclusions may not be entirely novel in some places. They also raised the requirement for 1- providing improved method description, 2-performing additional bioinformatic analyses, 3- not overinterpreting correlative data, 4- discussing previous literature. Please refer to the main three points further expanded below to prepare your revisions, and please consult the detailed points raised by the reviewers, for further modification, editing and discussion in a point-by-point rebuttal.

1) More information is needed regarding the Material and Methods. This is key to understand how the data were treated in terms of bioinformatic analyses: i) precise everywhere it is required whether unique or multiple mapping was used, ii) precise whether 5'LTR and/or 3'LTR were used for LTR alignment when referring to full length VM-IAP copies, iii) considering the strain-specificity of some VM-IAPS and KZFP controllers, when relying on public mouse datasets, specify from which mouse strain they come from. The same mouse strain should ideally be used throughout the study.

We thank that the reviewers for raising these points. In the revised manuscript:

1. We have elaborated on unique vs. multimapping approaches for all analyses. Briefly, we used only uniquely mapped reads in all analyses except for the analysis of KAP1 and KZFP binding at different sequence clades (Figure 1). The KAP1 and KZFP ChIP-seq datasets were unable to map uniquely to these highly repetitive loci. However, given that we were interested in determining which KZFPs, and KAP1, can bind to which clades, multimapped reads are appropriate. Bowtie1 with the -m 1 option was used for unique alignments and Bowtie2 with default settings was used where multimapping was required. This was done as bowtie1 has a higher true positive rate for uniquely mapped single end reads than bowtie2 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6935493/. This has been more thoroughly explained in the methods of the revised manuscript (see page 19, “Alignment of existing ChIP-seq data”).

2. We used both 5’ and 3’ IAPLTRs for all multiple sequence alignments and hierarchical clustering. All elements with the same sequence had the same KZFP binding profile regardless of their position relative to the IAPEz-int. We have attempted to make this more clear in the revised manuscript (see page 5, line 109).

3. We have included more information about the mouse strains used for each dataset. (see page 24, “Availability of data and materials”).

2) Additional bioinformatic analyses are needed to strengthen the paper:

– plot KAP1 across IAPLTR2 sequences and over IAPEz internal sequence (reviewer #4).

– in VM versus non-VM comparisons, please use a random set of sequences to compare the same number of sequences (the number of VM-IAPs being overwhelmingly smaller compared to other categories of IAPS (non-VM, others)) (reviewer #3).

– use GTEX human liver methylation instead of the HepG2 cell line WGBS dataset (reviewer #2).

– from their CPF1 CandR in wildtype liver, the authors should analyze the level of CPF1 binding at VM-IAPs specifically, as they did for H3K4me3 (from public B cell dataset).

We thank that the reviewers for suggesting these analyses to strengthen the manuscript. In the revised manuscript:

1. We have reworked Figure 1 to show KAP1/KZFP binding profiles for IAPLTR2s (see Figure 1 —figure supplement 4) and IAPEz-ints (see Figure 1A).

2. We have compared the VM-IAPs to: (1) a random set of loci and (2) a random selection of non VM-IAPLTRs (see Figure 2 —figure supplement 3) and observed that the VM-IAPs have increased H3K4me3 relative to both of these.

3. We have changed the HepG2 methylation data to human liver methylation data (see Figure 4). We observe the same trends with the human liver data as we saw with the HepG2 methylation profiling.

4. We profiled CFP1 binding at VM-IAPs using an existing CFP1 ChIP-seq dataset from C57BL/6 mice (see Figure 2 —figure supplement 2). The CFP1 CUTandRUN profiles that we generated from the WT and Trim28 D9/+ mouse livers were in an FVB background, so we used the existing data from C57BL/6 mice to keep the strains consistent.

4 – Data are only correlative as this stage. VM-IAPs are less bound by KAP1/KZFP, but whether this lower recruitment is key to the VM status would need to be genetically tested: by modifying the KZFP binding site of a unique VM-IAP to convert towards weaker or stronger binding. Similarly, whether CFP1 presence on VM-IAPs was not demonstrated here to be causative of the VM-IAP status. IAPs that fail to be protected by KZFPs may just be accessible to all DNA binding proteins. It would require testing whether KAP1 or KZFP loss leads to CFP1 recruitment and then test if CPF1 is required for the VM status in mice. Considering the amount of time and work that these approaches would require, they are not requested within the frame of this revision. However, without formal demonstration, the authors should balance their discussion, and acknowledge that their data provide correlative, not causative, evidence, and should be treated as such. Reviewer #4 listed several instances where the text should be toned down.

As suggested by the reviewers, the language has been toned down throughout the paper. We also have added new analyses/figures to examine the role of CpG density and ZF-CxxC proteins more clearly throughout the manuscript and added additional references to frame our results in the context of previous work.

Reviewer #2 (Recommendations for the authors):

Overall I think this could be a good manuscript if more emphasis was put on precision and an extensive description of patterns observed, as well as a more detailed analysis and a balanced discussion of the CPF1 results and their meaning for our mechanistical understanding of VM-IAPs and other epistable epialles. In addition to the points mentioned in the public review:

– I think the title is not supported by the content of the article and should be modified to something more neutral that showcases the novel findings but does not overplay them. There is no new 'mechanism' in this article, but a few correlations and observations, some of which could suggest future research avenues to prove if they play a mechanistical role. Similarly, claims of the same nature in the abstract, results and Discussion sections should be toned down.

We have adjusted the text and the title accordingly.

– Much of my problems with the current manuscript have to be with the interpretation of the CPF1 findings – in my view there is absolutely no reason to think there is a mechanistical link between VM-IAP status and CPF1. Derepressed elements being now bound by various factors because they are accessible is not surprising and does not imply any causality with the VM status.

We have toned down the language throughout the revised manuscript. Additionally, we have moved some supplemental figures into the main text and added additional analyses that support that fact that VM-IAPs are bound by ZF-CxxC proteins (see Figure 2).

– As described in the public review, a clear description of what strains which dataset comes from is imperative, as KZFPs and young IAPs can be strain specific.

Thank you for this comment. We have included the strain information for all datasets.

– the 'IAPLTR#' nomenclature should be changed to IAP LTRs #.

Thank you for the comment. We chose the IAPLTR# nomenclature to keep our naming consistent with the nomenclature used in Repbase and Dfam, as described in https://www.nature.com/articles/nrg2165-c1 and https://www.dfam.org/family/DF0001789/ summary.

– 'euchromatic environment' should be changed to 'proximity of expressed genes' or a similar definition. There can be pockets of local repression close to expressed genes.

Thank you. We have made this changed our terminology in the revised manuscript.

– less than 1 kb from an annotated enhancer element… in what cell type(s)? What is the genome distribution of these elements – is there one every 10 kb, which is possible if the database from all cell types was used (which would not be relevant at all)? Some kind of enrichment calculation should be performed instead.

We have elaborated on of this work was profiled (see page 20, “Genomic context analysis”). We used annotated enhancers across cell. Additionally, we compared the frequency at which the clade 3 VM-IAPLTR1s and clade 3 non VM-IAPLTR1s were proximal to these expressed genes and enhancers. We observed that the VM-IAPs were more likely to be proximal to these elements compared to other IAPs (Figure 1D). In regards to the window sizes, we have included a figure demonstrating VM-IAPLTR1 clade3 elements proximal to the expressed transcripts and compared this to the non VM-IAPLTR1 clade 3 elements (see Figure 1 —figure supplement 6).

– sentence starting at line 131: I see no support for such a strong statement, even if the above point is clarified. There is enrichment for sure but it is not exclusive, which is implied here with 'must'.

As suggested by the reviewer, the language has been toned down.

– Figure 1: Why only IAPLTR1s, and not show IAPLTR2s?

We have profiled IAPLTR1s and IAPEz-ints in Figure 1, while IAPLTR2s are included in the supplement (see Figure 1 —figure supplements 1,4,5, and 7). This was done because the potential impact of KZFPs on IAPLTR2 elements has been previously explored (Bertozzi et al., 2020).

– line 137: 'intact' is most probably not correct here – these all have their full open reading frames with no mutations?

We thank the reviewer for the suggestion. We have clarified the language to make it clear that we were profiling IAPLTR elements > 300 bps in size.

– line 142-145: wrong definition of the grey sections – these are gaps, not sequence with no homology.

We thank the reviewer for recognizing this mistake. We have changed the terminology to “gap” as suggested.

– provide justification for using only the first 150 bps of IAPEz-ints – repression from KZFPs can have a longer range than a single nucleosome.

We apologize for not making this clear in the initial submission. This window was used because it has been reported that Gm14419 binds to the 5’ most end of IAPEz-ints. We have made this more explicit in the revised manuscript.

– line 181: 'least conserved' is potentially misleading – is it that they were lost in other strains, or that they are new a few related strains? 'conserved' implies the former, while I think here it is the latter. Should be changed elsewhere in the text and in figure legends (for example 2C) to something more precise.

We thank the reviewer for the suggestion. We have clarified that the language and now state that the elements are more/less polymorphic.

– Figure 2A: IAPEz-clade # should be IAPEz-int clade #.

We thank the reviewer for catching this. This has been corrected in the text.

– Figure 2B: is there no clade 3-4 LTRs associated with IAPLTR2s?

We apologize that this was unclear. IAPLTR1 and IAPLTR2 have independent clades. IAPLTR1s were determined to have 4 clades, while IAPLTR2s were determined to have 2 clades. The IAPLTR1 clade 1 elements have distinct sequences compared to IAPLTR2 clade 1 elements (see Figure 1 —figure supplement 1). This figure’s representation has been changed to try to improve clarity (see Figure 1C and Figure 1 —figure supplement 5).

– Figure 2C: again, why no mention of LTR2s?

IAPLTR2s are now included (see Figure 1 —figure supplement 7).

– Figure 2C: 'cade' instead of 'clade' in a few spots.

We thank the reviewer for catching this. This has been corrected in the text.

– Figure 2C: This one is overall confusing – are clade 2, 3 and 4 never associated with IAPEz-Int? Figure 2B show that they are, and the legend should reflect this, as now this gives the impression that they are solo LTRs given the lack of annotation following the initial presence of it.

We apologize that this was unclear in the original version on the manuscript. The updated manuscript has a completely new Figure 1 which more clearly demonstrates the relationship between the LTRs and IAPEz-Int elements (see Figure 1C, 1E).

– line 224-225: this is backward logic, there's no such demonstration. It should be precised that 'in the absence of repression from KZFPs, IAPs are capable…'

We have adjusted both the language of the text and the content of Figure 2. We now include analysis of DNA methylation in cells where TET1 (a ZF-CxxC protein) is present or absent.

– Figure 3A: 'Mouse ERVs LTRs', in both the figure and the text.

– Figure 3A: The use of the CpG Score metric is puzzling me – why restrict it to 'the 200 bp window with the densest CpG'? This is never explained – why not use the full element?

We apologize that this was unclear in the original manuscript. The 200 bp window was used to prevent issues caused by TE size. For example, some L1 elements contain a CpG island at their 5’ UTR, but are relatively CpG poor throughout the rest of the element. The 200 bp window size was chosen to match the minimum size of a CpG island (Gardiner-Garden and Frommer, 1987). We have tried to make this clearer in the revised manuscript by adding a section describing CpG profiling in the methods section (see page 21, “Profiling of CpG score”).

– line 245: comparing to random sampling of the genome does not make any sense – compare instead to non variably methylated loci of the same TE family (or families).

We apologize for any confusion. In Figure 3A (to which this line was referring), we were looking to determine whether evolutionarily recent TEs were more likely to be found at variably methylated loci in humans. We determined the observed frequency that variably methylated loci were found at TEs in the genome and compared this to the frequency expected if this distribution was only due to chance. The random sampling of the genome was preformed to determine that expected distribution. We have worked to make this clearer in the text.

– Figure 4: be careful in calling 40 million year old elements 'young' – the timescale between mouse and human-specific elements is very different.

We thank the reviewer for the suggestion. We have adjusted this language from “young” to “evolutionarily recent” throughout the text.

– Use GTEx liver methylation instead of in addition to the HepG2 datasets.

We thank the reviewer for this suggestion. We have changed the methylation data to human liver data. This change did not impact any of our conclusions.

– line 285: what tissue was used from the Tc1 mice? Erythroblasts just the same? The expression of KZFPs is also tissue specific, this comparison should be made in ES cells where DNA methylation patterns are established.

We agree with the reviewer that is potential shortcoming. The CFP1 ChIP-seq data is from human erythroblasts, while the mouse is profiling liver data. However, as the KZFPs are missing from all cells, the lack of methylation should be consistent across tissue types, as seen reported here https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3560060/.­­

– line 322: The sentence is completely unsupported by the data.

We have toned down the language and expanded the data in the figure corresponding to these results. We show that there is hypomethylation only at the CpG dense TEs and that these TEs are the elements that we see specific CFP1 enrichment in the Tc1 mice.

– Figure 6B: please compare to the genome-wide distribution of TEs – is it surprising to find that proportion of Mus-specific elements?

We have included a figure showing the breakdown of TE ages in mm10 genome (see Figure 5 —figure supplement 2).

– Figure 6D: the use of random sampling of non-responsive TEs is not adequate – compare with random sampling from the same families where you see increased expression.

We apologize for any confusion. Each point refers to the total expression of every element in a TE subfamily, and not the expression of an individual TE in the genome. The goal of this figure is to globally profile changes in expression for each TE subfamily, and not to identify novel TE-initiated transcripts. It is therefore not possible to compare responsive TEs to non-responsive TEs of the same subfamily as each data point contains all TEs for the given family. This has been clarified in the methods (see Methods “Differential expression analysis of repetitive element subfamilies”).

– The discussion should be rewritten to take into account the above comments – as of now, much of the CPF1 section is not supported by the data.

We thank the reviewer for the suggestion and have toned down the language throughout the paper.

– provide version numbers for all software and databases used.

We have added version numbers for all software used.

– justify the use of bowtie1 in some cases, and bowtie2 in others. Use a single pipeline is preferred.

We thank the reviewer for the suggestion. Bowtie1 with the -m 1 option was used to allow us to uniquely align the reads to the genome. Bowtie2 with default settings was used in the few instances where multimapping was required. This was done as bowtie1 has a higher true positive rate for uniquely mapped single end reads than bowtie2 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6935493/. More information has been included in the methods section to clarify which tool was used for processing each dataset (see the updated methods on page 19, “Alignment of existing ChIP-seq data”).

– Bowtie2 to map RNA-seq is not appropriate as it does not take splicing into account – please use a RNA-seq specific mapping strategy.

We thank the reviewer for the comment. This was done as this is part of the RepEnrich pipeline which is used to globally profile all TEs subfamilies. This has been clarified in the method section (see Methods “Differential expression analysis of repetitive element subfamilies”).

Reviewer #3 (Recommendations for the authors):

The authors should state the number of copies they are referring to (the VM-IAPs). It seems to me that there is a small number of variably methylated IAPs and this is such a tiny number that reinforces the idea that most IAPs are silenced, and only a few are able to escape silencing. This discussion strengthens the fact that is not only a matter of IAP sequence, but also context, and potentially strain as the authors suggest in the conclusion. Finally, this would put into context the idea that the IAPs are "winning this arms race".

We thank the reviewer for the suggestion. We have altered figures (see Figure 1C and Figure 1—figure supplement 5) to more clearly state the number of IAPs with are variably methylated compared to the total population.

Since the authors are using clusters of IAP sequences, it would be interesting to estimate the age of such clusters and their relationship. This would imply discussing other articles that have previously classified IAP sequences (for instance 10.1002/mc.20576), and even taken into account their methylation level (10.1186/1471-2164-14-48), and potentially their CpG density.

We thank the reviewer for this suggestion and agree that this is an interesting question. Previous studies, such as the suggested paper (10.1002/mc.20576), have observed that “older” TEs have a higher sequence divergence from the consensus sequence. Interestingly, we observed that the clade 3-4 elements have a higher sequence divergence from the consensus IAPLTR1 sequence, but that these elements are more polymorphic and potentially younger.

This is an area of interest for future research. Additionally, we did not observe any major differences in CpG content between the IAPLTR1 sequence clades.

As stated by the authors, 93% of VM-IAPLTR1s are full length sequences, so I wonder how LTRs stemming from the same copy were treated? Are both included in the analysis? Also, of these 93% FL copies, what are the differences in sequence/methylation/binding KZFP/KAP1 etc, between the two LTRs? Are there any FL copy that has different sequences at their LTR and therefore different chromatin state? Previous articles have shown that LTRs bearing the same sequence can have different methylation/chromatin state depending on their distance to nearby genes. Can the authors include this observation in their discussion?

As suggested by the reviewer, we have elaborated on how we treated these elements in the text (see page 5, line 105).

Briefly, we treated 5’ and 3’ LTRs the same during the clustering analysis. We found that both the 5’ and 3’ LTRs have same KZFP binding profile. Additionally, all FL-IAPs have IAPLTRs belonging to the same clades flanking the internal element, so differences in the 5’ and 3’ sequence do not appear to be responsible for any changes in VM status.

I don't understand the consensus mapping for ChIP-seq for the ZFPs, while the KAP1 ChIP-seq is done on individual IAPLTR1 element (is this uniquely mapping reads?). From the methods, I understand that the ChIP-seq reads are mapped to the mm10 genome, and then the enrichment for each base is translated to a consensus in order to illustrate as a heatmap in Figure 1? Are these uniquely or multimapped reads? These methods should be stated clearly.

We apologize for any confusion. We have clarified this in the methods section (see the updated methods on page 19, “Alignment of existing ChIP-seq data”).

While we used uniquely mapped reads when possible, the KAP1 and KZFP ChIP-seq datasets were unable to map uniquely to these highly repetitive TEs. For these datasets, we used multimapped reads to determine if KZFP/KAP1 had the potential to bind to the identified sequence clades.

In order to associate chromatin environment and VM-IAP state, the authors demonstrate that VM-IAPs are enriched within 50Kb of constitutively expressed genes. This distance seems very large for me, and I wonder how the authors have decided to chose it. Finally, In the Elmer et al., article, where the VM-IAP set is chosen, there is a valid association between CTCF and variably methylated IAP copies. In my opinion this shows how the chromatin environment is associated with VM-IAP state. The authors should discuss this in their manuscript.

We thank the reviewer for the suggestion. Regarding the window size, we sampled multiple different window sizes and ultimately selected 50Kb, which covered the greatest number of VM-IAPs. We have included an additional figure to demonstrate that the trends that we observe are consistent regardless of the window size used (see Figure 1 —figure supplement 6).

Additionally, we have included discussion on how Elmer et al., identified an association between CTCF and variable methylation in our main text, as this is an important observation.

The four wild-derived mouse inbred strains from Nellaker et al., (CAST/EiJ, PWK/PhJ, WSB/EiJ and SPRET/EiJ) have disproportionate numbers of ERV TEV calls (see Figure 1c from their paper), which could be associated with false positive calls in those species. It would be more stringent to remove these strains from this analysis, in order to have a conservation estimation between lab strains.

We thank the reviewer for the suggestion. We have removed these outbred strains from our comparison.

Given that VM-IAPs are a smaller set than all the other IAPs, comparisons stating enrichment or "more likely", should take into account these differences. For instance, on Figure 3 D or E, the "other IAPs" or "non-VM" should be a random set that has the same number of copies as the "VM" set.

We thank the reviewer for this suggestion. We have compared our results using both a smaller selection of other IAPs and compared them to a random selection of loci in the genome (Figure 2 —figure supplement 3). We have also moved the H3K4me3 results to the supplement as this has been observed in Bertozzi et al., 2020.

In the transchromosomic mouse, the authors show that human TE derived CpG islands are hypomethylated due to lack of KZFP that would silence such TEs. I wonder what is the extent of hypomethylation in TEs that are not CpG-rich? It would strengthen the authors argument to show that non-CpG rich TEs are highly methylated (I hope I didn't miss something here though!).

We thank the reviewer for this suggestion. We have added the non-CpG rich TEs to the main figure to show that the high CpG density appears to be necessary for the hypomethylation (Figure 4B-C).

Reviewer #4 (Recommendations for the authors):

How the conclusions of this paper could be strengthened:

– To establish that the identified sequence variants are causal to diminished KZFP recruitment and increased CxxC domain containing protein recruitment, motifs could be called in the sequence variants and identified motifs could be deleted and KZFP and CxxX domain containing protein recruitment profiled to assess if the proposed competition between KZFP and CxxC is dependent on the sequence variant.

We thank the reviewer for this suggestion. To begin to address this point, we have improved our profiling of the IAPLTR sequences. We highlight the region with the greatest KZFP recruitment and compare the sequences for each clade to the previously determined KZFP binding motif (see Figure 1 —figure supplement 3). However, we believe that this is an area for future exploration.

– As decreased Kap1 recruitment is central to the conclusions of the paper, the authors should provide Kap1 coverage heatmaps for the IAP regions in Figures S1.6 and 2.A. This data could support the conclusion that sequence clades confer altered KAP1 mediated silencing for IAP elements by expanding the observation for IAPLTR1 to IAPLTR2 elements.

We thank the reviewer for the suggestion. We have included the KAP1 heatmaps for IAPLTR1, IAPLTR2, and IAPEz-int elements (see Figure 1 and Figure 1 —figure supplement 4).

– To strengthen the claims of the paper in respect to mouse VM-IAPs, the authors could assess CFP1 binding by ChIP-seq or cutandrun (which they did successfully in human) in previously used available mouse embryonic stem cell lines and mice that lack mouse-specific KZFPs and no longer repress VM-IAPs (see Bertozzi 2020 and Wolf 2020).

We thank the reviewer for the suggestion. We agree that it would be interesting to determine whether we can observe an increase of CFP1 recruitment in the absence of KZFP silencing. We believe that this should be further explored in the future.

– To test if genetic context indeed causes VM, the authors could e.g. remove and ectopically insert a VM-IAP to a different context to test if VM is lost.

We agree that this is an interesting idea that could experimentally confirm that location impacts the establishment of a VM-IAP and hope to explore this in the future.

– Line 121 'While sequence has been shown to be a factor in the establishment of variable methylation' Please add a reference (Bertozzi et al., for example) to the experimental evidence supporting this claim.

We apologize for missing this. We have now included this citation.

– The authors could consider more directly comparing human and mouse data by performing CFP1 cutandrun or ChIP-seq (that the lab successfully performed in Tc1 liver) also in mouse instead of H3K4me3 as a proxy.

We thank the reviewer for the suggestion. We profiled existing CFP1 ChIP-seq data from C57BL/6 mice and observed that there is an enrichment of CFP1 signal at the VM-IAPs (see Figure 2 —figure supplement 2). However, we moved the H3K4me3 results to the supplement, as we find the TET1 occupation and the methylation changes in Tet1 KO mESCs to be more important to the overall story.

How the presentation of the data could be improved:

– Given that the sequence features distinguishing IAPLTR1 and 2 elements that are or are not variably methylated are a main claim of the paper, the authors should consider providing more reader-friendly sequence information for the observed sequence changes that correlate with diminished KZFP recruitment (e.g. in Figure S1.1 and also for internal seq in S2.1). In the related pre-print, but not the manuscript, one can zoom into the pdf to observe the sequences, but it is very hard to identify actual sequence changes and where they reside. A detailed file in the supplement or zoom ups of the exact sequence changes referred to would be helpful.

We thank the reviewer for the suggestion. We have included a more reader friendly version of the IAPLTR sequence variants (see Figure 1 —figure supplement 2). We have also included a zoomed in image of the loci with increased KZFP binding (see Figure 1 —figure supplement 3), in addition to the previously annotated KZFP binding motif.

– The logic of the sentence in line 154f is not obvious "As the majority of IAPLTR2s exist as solo LTRs this indicates that these elements can be repressed through binding of KZFPs to the IAP internal sequence". Why does it follow from existing as solo LTR that the LTR can be repressed via KZFP binding to the internal sequence? Figure S1.7 does not help understand this point. Please clarify.

We thank the reviewer for the suggestion. We have reworked this entire section to improve the clarity and flow of the section ( see the section on “IAP sequence influences KZFP recruitment and establishment of variable methylation” in the revised manuscript).

– Figure 2C: the authors could consider showing conservation of all VM-sequences individually (given there are not so many) in the supplement in addition to the whole clade.

We thank the reviewer for the suggestion. This is not included as it has been previously reported in Kazachenka et al., 2018. We were looking to expand upon their work to see if all elements with shared sequence variants were also more likely to be more polymorphic.

– In some cases, the manuscript would benefit from toning down statements about causality, when only correlation is assessed (see public review). Further examples include:

– Figure 1 heading "Sequence and chromatin context influence the establishment of a VM-IAPLTRs." The figure shows that VM-IAPs occur in a different chromatin context (less Kap1, more close to constitutively expressed genes and regulatory elements), but not that this context influences variable methylation.

– Relating to the same Figure in 107f: "these results demonstrate that sequence can be a contributing factor in the establishment of VM-IAPs". More appropriate would be "these results suggest that sequence could be a contributing factor in the establishment of VM-IAPs.

– Relating to the same Figure the header in Line 120 'chromatin environment can influence establishment of VM-IAP'. More appropriate would be 'correlates with a distinct chromatin environment'.

– Line 198 'IAPs that have loss of KZFP binding have recruitment of the ZF-CxxC containing proteins TET1 and CFP1' while data supporting this statement is shown in the Tc1 mouse model, the data in this section for mouse is correlative. It does support presence, rather than 'recruitment' of the ZF-CxxC domain containing proteins.

– Line 253f 'our results highlight the potentially conserved importance of high CpG density in the establishment of variable methylation' The human data does not show that CpG density establishes LTR12C elements as variably methylated. It shows that LTR12C elements that are variably methylated are also CpG dense.

We thank the reviewer for the suggestions. We have toned down the language throughout the document.

https://doi.org/10.7554/eLife.71104.sa2

Article and author information

Author details

  1. Kevin R Costello

    1. Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
    2. Irell and Manella Graduate School of Biological Sciences, City of Hope, Duarte, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing - original draft, Resources
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6104-0776
  2. Amy Leung

    Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
    Contribution
    Investigation, Methodology, Resources
    Competing interests
    No competing interests declared
  3. Candi Trac

    Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
    Contribution
    Investigation, Resources
    Competing interests
    No competing interests declared
  4. Michael Lee

    1. Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
    2. Irell and Manella Graduate School of Biological Sciences, City of Hope, Duarte, United States
    Contribution
    Investigation, Resources
    Competing interests
    No competing interests declared
  5. Mudaser Basam

    Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
    Contribution
    Investigation, Methodology, Funding acquisition, Resources
    Competing interests
    No competing interests declared
  6. J Andrew Pospisilik

    Van Andel Research Institute, Grand Rapids, United States
    Contribution
    Methodology, Project administration, Resources
    Competing interests
    No competing interests declared
  7. Dustin E Schones

    1. Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
    2. Irell and Manella Graduate School of Biological Sciences, City of Hope, Duarte, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Methodology, Writing – review and editing, Software, Writing - original draft, Resources
    For correspondence
    dschones@coh.org
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7692-8583

Funding

National Institutes of Health (R01DK112041)

  • Dustin E Schones

National Institutes of Health (R01CA220693)

  • Dustin E Schones

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Vanessa Wegert for valuable technical support. We would also like the thank the and members of the Schones lab for helpful comments and suggestions. This work was supported by the National Institutes of Health, grants R01DK112041, R01CA220693 (D.E.S.). The research reported in this publication included work performed in the Integrative Genomics and Analytical Cytometry Cores supported by the National Cancer Institute of the National Institutes of Health under award number P30CA033572.

Ethics

All animal protocols were in accordance with German and United Kingdom legislation; Project license numbers 80/2098, 80/2497, and 35-9185.81/G-10/94.

Senior Editor

  1. Patricia J Wittkopp, University of Michigan, United States

Reviewing Editor

  1. Deborah Bourc'his, Institut Curie, France

Reviewer

  1. Todd S Macfarlan, The Eunice Kennedy Shriver National Institutes of Child Health and Human Development, NIH, United States

Version history

  1. Received: June 9, 2021
  2. Accepted: October 20, 2021
  3. Accepted Manuscript published: October 20, 2021 (version 1)
  4. Version of Record published: October 29, 2021 (version 2)
  5. Version of Record updated: November 5, 2021 (version 3)

Copyright

© 2021, Costello et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,781
    Page views
  • 254
    Downloads
  • 5
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Kevin R Costello
  2. Amy Leung
  3. Candi Trac
  4. Michael Lee
  5. Mudaser Basam
  6. J Andrew Pospisilik
  7. Dustin E Schones
(2021)
Sequence features of retrotransposons allow for epigenetic variability
eLife 10:e71104.
https://doi.org/10.7554/eLife.71104

Further reading

    1. Genetics and Genomics
    2. Microbiology and Infectious Disease
    William Matlock, Samuel Lipworth ... REHAB Consortium
    Research Article Updated

    Plasmids enable the dissemination of antimicrobial resistance (AMR) in common Enterobacterales pathogens, representing a major public health challenge. However, the extent of plasmid sharing and evolution between Enterobacterales causing human infections and other niches remains unclear, including the emergence of resistance plasmids. Dense, unselected sampling is essential to developing our understanding of plasmid epidemiology and designing appropriate interventions to limit the emergence and dissemination of plasmid-associated AMR. We established a geographically and temporally restricted collection of human bloodstream infection (BSI)-associated, livestock-associated (cattle, pig, poultry, and sheep faeces, farm soils) and wastewater treatment work (WwTW)-associated (influent, effluent, waterways upstream/downstream of effluent outlets) Enterobacterales. Isolates were collected between 2008 and 2020 from sites <60 km apart in Oxfordshire, UK. Pangenome analysis of plasmid clusters revealed shared ‘backbones’, with phylogenies suggesting an intertwined ecology where well-conserved plasmid backbones carry diverse accessory functions, including AMR genes. Many plasmid ‘backbones’ were seen across species and niches, raising the possibility that plasmid movement between these followed by rapid accessory gene change could be relatively common. Overall, the signature of identical plasmid sharing is likely to be a highly transient one, implying that plasmid movement might be occurring at greater rates than previously estimated, raising a challenge for future genomic One Health studies.

    1. Evolutionary Biology
    2. Genetics and Genomics
    Carolina A Martinez-Gutierrez, Josef C Uyeda, Frank O Aylward
    Research Article

    Microbial plankton play a central role in marine biogeochemical cycles, but the timing in which abundant lineages diversified into ocean environments remains unclear. Here, we reconstructed the timeline in which major clades of bacteria and archaea colonized the ocean using a high-resolution benchmarked phylogenetic tree that allows for simultaneous and direct comparison of the ages of multiple divergent lineages. Our findings show that the diversification of the most prevalent marine clades spans throughout a period of 2.2 Ga, with most clades colonizing the ocean during the last 800 million years. The oldest clades – SAR202, SAR324, Ca. Marinimicrobia, and Marine Group II – diversified around the time of the Great Oxidation Event, during which oxygen concentration increased but remained at microaerophilic levels throughout the Mid-Proterozoic, consistent with the prevalence of some clades within these groups in oxygen minimum zones today. We found the diversification of the prevalent heterotrophic marine clades SAR11, SAR116, SAR92, SAR86, and Roseobacter as well as the Marine Group I to occur near to the Neoproterozoic Oxygenation Event (0.8–0.4 Ga). The diversification of these clades is concomitant with an overall increase of oxygen and nutrients in the ocean at this time, as well as the diversification of eukaryotic algae, consistent with the previous hypothesis that the diversification of heterotrophic bacteria is linked to the emergence of large eukaryotic phytoplankton. The youngest clades correspond to the widespread phototrophic clades Prochlorococcus, Synechococcus, and Crocosphaera, whose diversification happened after the Phanerozoic Oxidation Event (0.45–0.4 Ga), in which oxygen concentrations had already reached their modern levels in the atmosphere and the ocean. Our work clarifies the timing at which abundant lineages of bacteria and archaea colonized the ocean, thereby providing key insights into the evolutionary history of lineages that comprise the majority of prokaryotic biomass in the modern ocean.